Summary of "Probabilistic Modeling in Psycholinguistics: Linguistic Comprehension and Production"
Probability is not really about numbers; it is about the structure of reasoning.
1. Main argument: Research on probability disappeared through the '60s, '70s, and '80s. This omission is astonishing when we consider that the input to language comprehension is noisy, ambiguous, and unsegmented. In order to deal with these problems, computational models of speech processing have had to rely on probabilistic models for over 30 years. Computational techniques for processing text, an input medium which is much less noisy than speech, rely just as heavily on probability theory.
2. Probability theory is certainly the best normative model for solving problems of decision-making under uncertainty. But perhaps it is a good normative model and a bad descriptive one. Despite the fact that probability theory was originally invented as a cognitive model of human reasoning under uncertainty, perhaps people do not use probabilistic reasoning in cognitive tasks like language production and comprehension. Perhaps human language processing is simply a non-optimal, non
3. There is an emerging consensus that human cognition is in fact rational, and relies on probabilistic processing. Probabilistic models are also now finally being applied in psycholinguistics, drawing from early Bayesian-esque precursors in perception such as the Luce (1959) choice rule and the work of Massaro.
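As a brief illustration of the Luce choice rule mentioned above, the following minimal sketch computes the probability of choosing each alternative as its response strength divided by the summed strength of all alternatives. The function name and the example strengths are hypothetical and are not taken from the chapter.

```python
def luce_choice(strengths):
    """Return choice probabilities proportional to response strengths.

    strengths: dict mapping each alternative to a non-negative strength.
    """
    total = sum(strengths.values())
    if total <= 0:
        raise ValueError("strengths must sum to a positive value")
    return {option: s / total for option, s in strengths.items()}


# Hypothetical strengths for two readings of an ambiguous word.
probs = luce_choice({"noun": 3.0, "verb": 1.0})
print(probs)  # {'noun': 0.75, 'verb': 0.25}
```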
4. Three claims have been made about the role of probability in language comprehension. Consider first the task of accessing linguistic structure from the mental lexicon or grammar. Perhaps more probable structures are accessed more quickly, or with less effort. Or perhaps they can merely be accessed with less evidence than less probable structures:
(1) Another role for probability in comprehension is in disambiguation. Ambiguity is ubiquitous in language comprehension; speech input is ambiguously segmented, words are syntactically and semantically ambiguous, sentences are syntactically ambiguous, utterances have ambiguous illocutionary force, and so on.
(2) Probability may also play a key role in comprehension in explaining processing difficulty.
(3) A third role for probability is in learning. Many models of how linguistic structure is empirically induced rely on probabilistic and information-theoretic models.
5. A final implication of probabilities in psycholinguistics is representational. In order for humans to use probabilities of linguistic structure in processing, there must be some mental representation of probability.
6. Probability theory is a good model of language processing at what Marr called the 'computational level'; it characterizes the input-output properties of the computations that the mind must somehow be doing.
7. What probabilistic modeling offers psycholinguistics is a model of the structure of evidential reasoning: a principled and well-understood algorithm for weighting and combining evidence to choose interpretations in comprehension, and to choose certain outcomes in production.
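One standard way to make "weighting and combining evidence" concrete is Bayes' rule, where a prior over interpretations is multiplied by the likelihood of the observed evidence under each interpretation and then normalized. The sketch below assumes this reading; the interpretation labels and all numbers are hypothetical.

```python
def posterior(priors, likelihoods):
    """Combine priors P(i) with likelihoods P(evidence | i) via Bayes' rule."""
    unnormalized = {i: priors[i] * likelihoods[i] for i in priors}
    evidence = sum(unnormalized.values())  # P(evidence), the normalizer
    return {i: p / evidence for i, p in unnormalized.items()}


# Hypothetical prior preference for each reading of an ambiguous sentence.
priors = {"main_verb": 0.7, "reduced_relative": 0.3}
# Hypothetical probability of the next word under each reading.
likelihoods = {"main_verb": 0.1, "reduced_relative": 0.6}

post = posterior(priors, likelihoods)
best = max(post, key=post.get)  # interpretation chosen after the evidence
print(best, post)
```

In this toy case the evidence outweighs the prior, so the initially dispreferred reading ends up with the higher posterior.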
8. Probabilistic modeling has been applied to many areas of psycholinguistics: phonological processing, lexical processing, syntactic processing, discourse processing.
B. Summary of Evidence for Probabilistic Knowledge
1. Probabilistic modeling gives us tools to estimate the prior probability of these structures by making various independence assumptions, allowing us to estimate the probability of one large complex object from the counts of many smaller objects.
(1) A corpus is an instance of language production, yet the frequencies derived from corpora are often used to model or control experiments in comprehension.
(2) In the Brown corpus, more frequent words tended to be shorter.
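The idea in point 1 of estimating the probability of a large object from counts of smaller ones can be illustrated with a bigram model, which assumes (as a simplification) that each word depends only on the preceding word. This is a minimal sketch: the toy corpus, the boundary markers `<s>` and `</s>`, and the function names are hypothetical, and no smoothing is applied.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and adjacent word pairs in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return unigrams, bigrams


def sentence_probability(sentence, unigrams, bigrams):
    """Multiply relative-frequency estimates of P(word | previous word)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob


corpus = ["the dog barks", "the cat sleeps", "the dog sleeps", "a cat barks"]
unigrams, bigrams = train_bigram(corpus)

# "the cat barks" never occurs as a whole sentence in the toy corpus,
# yet it receives a non-zero probability from its parts.
p = sentence_probability("the cat barks", unigrams, bigrams)
print(p)
```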
3. One of the earliest and most robust effects in psycholinguistics is the word frequency effect. Word frequency plays a role in both the auditory and visual modalities, and in both comprehension and production.
4. More frequent words are accessed more quickly (shorter latency) and are articulated more quickly (shorter duration).
5. The frequencies of the semantic, syntactic, or morphological categories associated with an ambiguous word play an important role in comprehension. More frequent categories are accessed more quickly and are preferred in disambiguation. Rather surprisingly, given the robust effect of the frequency of lexical semantic/syntactic category in comprehension, there may not be any such effect in production. Instead, some studies have suggested that frequency effects in lexical production are confined to the level of the word form.
6. The psycholinguistic role of word-to-word frequencies or probabilities has also been extensively studied in production. The production studies have generally focused on the effect of frequency or probability given neighboring words on the phonetic form of a word. The main result is that words in high-frequency or high-probability word pairs are phonetically reduced in some way.
7. The probability of a word given the previous or following word plays a role in comprehension and production. Words which have a high joint or conditional probability given preceding or following words have shorter duration in production. In comprehension, if a word pair has a high frequency, any ambiguous words in that pair are likely to be disambiguated consistently with the category of the word pair.
8. The conditional probability of a subcategorization frame given a verb plays a role in disambiguation in comprehension. The higher the conditional probability of the frame, the more it will be preferred in disambiguation. In production, the evidence is less conclusive.
(1) The more likely a verb was to take a sentential complement, the more likely it was to undergo heavy NP shift.
(2) The binned conditional probability of a subcategorization frame given a verb seems to be stable over different corpora after controlling for verb sense.
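The conditional probability of a subcategorization frame given a verb, as discussed in point 8, can be estimated by relative frequency: the count of the verb with that frame divided by the total count of the verb. The sketch below assumes this simple estimator; the verbs, frame labels, and counts are hypothetical and not drawn from any corpus.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | verb) from (verb, frame) observations."""
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1
    probabilities = {}
    for verb, frames in counts.items():
        total = sum(frames.values())
        probabilities[verb] = {frame: n / total for frame, n in frames.items()}
    return probabilities


# Hypothetical observations: "S" = sentential complement, "NP" = direct object.
observations = (
    [("suspect", "S")] * 3 + [("suspect", "NP")] * 1
    + [("remember", "NP")] * 3 + [("remember", "S")] * 1
)
p_frame = frame_probabilities(observations)

# The frame with the higher conditional probability is the one predicted
# to be preferred in disambiguation.
preferred = {verb: max(frames, key=frames.get) for verb, frames in p_frame.items()}
print(p_frame["suspect"]["S"], preferred)
```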
9. The conditional probability of a word or a lexical category given previous words or structure plays a role in disambiguation. Words or categories with higher conditional probabilities are preferred.
10. While some studies seem to suggest that the frequency of larger non-lexical syntactic structures plays a role in disambiguation, the evidence is quite preliminary and not very robust. None of the studies that found an effect of non-lexical syntactic or idiom structure did so after carefully controlling for lexical frequencies and two-word (bigram) or three-word frequencies. But of course the frequency of complex constructions is much lower than lexical frequencies, and so we expect frequency effects from larger constructions to be harder to find. This remains an important area of future research.
11. The frequency of multiple-word structures plays a role in both comprehension and production. Frequent word pairs or idioms are faster to access and/or preferred in disambiguation. Frequent word pairs, or words which have a high Markov bigram probability given neighboring words, are shorter in duration and phonologically more reduced.
12. Various kinds of conditional probabilities play a role in comprehension and production. For verbs which have more than one possible syntactic subcategorization, the more frequent subcategorization frame is preferred in disambiguation. The probability of a verb appearing non-contiguous with its complement plays a role in production. For words with more than one potential part of speech, the part of speech with higher conditional probability given the preceding part of the sentence is preferred.
C. Probabilistic Architectures and Models
1. The competition-integration model of Spivey-Knowlton uses a neural network to combine multiple constraints to support alternative interpretations in parallel. Each syntactic alternative is represented by a pre-built localist node in a network; thus the network models only the disambiguation process itself rather than the generation or construction of syntactic alternatives.
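The sketch below shows one way a network of this kind could let constraints compete: each constraint's support is normalized across the alternatives, the weighted supports are summed into one activation per interpretation, and that activation is fed back to the constraints until one interpretation reaches a threshold. The specific update rule, the weights, the threshold, and the constraint names are assumptions of this sketch and are not details given in the summary above.

```python
def compete(constraints, weights, threshold=0.95, max_cycles=100):
    """Run normalize / integrate / feedback cycles over fixed alternatives.

    constraints: dict name -> list of positive support values, one per alternative.
    weights: dict name -> constraint weight (assumed here to sum to 1).
    Returns (interpretation activations, number of cycles used).
    """
    n = len(next(iter(constraints.values())))
    support = {name: list(values) for name, values in constraints.items()}
    for cycle in range(1, max_cycles + 1):
        # 1. Normalize each constraint's support across the alternatives.
        for name, values in support.items():
            total = sum(values)
            support[name] = [v / total for v in values]
        # 2. Integrate: weighted sum of support for each alternative.
        interp = [sum(weights[c] * support[c][a] for c in support) for a in range(n)]
        if max(interp) >= threshold:
            return interp, cycle
        # 3. Feedback: strengthen support in proportion to the integrated activation.
        for name in support:
            support[name] = [s + interp[a] * weights[name] * s
                             for a, s in enumerate(support[name])]
    return interp, max_cycles


# Hypothetical constraints over two pre-built alternatives (index 0 and 1).
constraints = {"verb_bias": [0.8, 0.2], "thematic_fit": [0.4, 0.6]}
weights = {"verb_bias": 0.5, "thematic_fit": 0.5}
interp, cycles = compete(constraints, weights)
print(interp, cycles)
```

In a model of this kind, the number of cycles needed to settle can be read as a proxy for processing difficulty: the more evenly the constraints are balanced, the longer the competition lasts.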
2. The Competition model may have been the first probabilistic model of sentence processing. The goal of the model is to map from the 'formal' level to the functional level. Since input is ambiguous and noisy, the model assumes that the sentence processor relies in a probabilistic manner on various surface cues for building the correct functional structure. An English-speaking comprehender relies heavily on word order cues in making this mapping, while a German speaker relies more heavily on morphological (case) cues.
3. The Competition model also considers factors related to the cost of a cue.
4. Chater et al. (1998) apply Anderson's rational model to sentence processing. They first propose that the goal of the human parser is to maximize the probability of obtaining the globally correct parse.
5. Jurafsky (1996) proposed a probabilistic model for syntactic disambiguation. His probabilistic parser kept multiple interpretations of an ambiguous sentence, ranking each interpretation by its probability. The probability of an interpretation was computed by multiplying two probabilities: the stochastic context-free grammar (SCFG) 'prefix' probability of the currently-seen portion of the sentence, and the 'valence' probability for each verb.
6. The Jurafsky parser has the advantages of a clean, well-defined probabilistic model, the ability to model the changes in probability word-by-word, a parallel processing architecture which could model both lexical and syntactic processing, accurate modeling of parse preference, and a probabilistic beam-search architecture which explains difficult garden-path sentences. The model has many disadvantages, however. It makes only very broad-grained reading-time predictions: it predicts extra reading time at difficult garden-path sentences, because the correct parse falls out of the parser's beam width.
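The ranking and pruning described in points 5 and 6 can be sketched as follows: each interpretation is scored by the product of its prefix probability and its valence probability, and interpretations that fall too far below the best one are dropped from the beam. Treating the beam as a ratio to the best score, the value of that ratio, and all probabilities below are assumptions of this sketch, not figures from the chapter.

```python
def rank_and_prune(interpretations, beam_ratio=5.0):
    """Score interpretations and keep those within the beam.

    interpretations: dict name -> (prefix_probability, valence_probability).
    An interpretation is pruned when the best score exceeds its own score
    by more than beam_ratio (an assumed definition of beam width).
    """
    scores = {name: prefix * valence
              for name, (prefix, valence) in interpretations.items()}
    best = max(scores.values())
    kept = {name: s for name, s in scores.items() if s * beam_ratio >= best}
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    pruned = sorted(set(scores) - set(kept))
    return ranked, pruned


# Hypothetical probabilities for two readings of a garden-path prefix.
interpretations = {
    "main_verb": (0.10, 0.90),
    "reduced_relative": (0.01, 0.10),
}
ranked, pruned = rank_and_prune(interpretations)
print(ranked, pruned)
```

If the pruned reading later turns out to be the correct one, it is no longer available in the beam, which is how an architecture of this kind accounts for the difficulty of garden-path sentences.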
7. The probabilistic models of human parsing based on Markov models and stochastic context-free grammars use the SCFG or HMM probability to predict which parse of an ambiguous sentence a human will prefer.
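In an SCFG, the probability of a parse is the product of the probabilities of the rules used to build it, and the predicted preference is the parse with the highest product. The minimal sketch below applies this to a prepositional-phrase attachment ambiguity; the grammar, its rule probabilities, and the parse labels are hypothetical, and lexical rules are omitted because they are shared by both parses.

```python
from math import prod

# Hypothetical rule probabilities; rules with the same left-hand side sum to 1.
RULE_PROBS = {
    ("VP", ("V", "NP", "PP")): 0.3,
    ("VP", ("V", "NP")): 0.7,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Det", "N")): 0.8,
    ("PP", ("P", "NP")): 1.0,
}


def parse_probability(rules_used):
    """Multiply the probabilities of every rule application in a parse."""
    return prod(RULE_PROBS[rule] for rule in rules_used)


# Two parses of a verb phrase like "saw the man with the telescope".
parses = {
    "vp_attachment": [
        ("VP", ("V", "NP", "PP")),
        ("NP", ("Det", "N")),
        ("PP", ("P", "NP")),
        ("NP", ("Det", "N")),
    ],
    "np_attachment": [
        ("VP", ("V", "NP")),
        ("NP", ("NP", "PP")),
        ("NP", ("Det", "N")),
        ("PP", ("P", "NP")),
        ("NP", ("Det", "N")),
    ],
}

probabilities = {name: parse_probability(rules) for name, rules in parses.items()}
preferred = max(probabilities, key=probabilities.get)
print(preferred, probabilities)
```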
1. Probabilistic models do a good job of selecting the preferred interpretation of ambiguous input, and are starting to make headway in predicting the time-course of this disambiguation process.