This is an attempt to answer questions NickW 211 asked in a comment on my previous post: “Hugh Dellar and Lexical Priming Part 2”. First, thanks Nick for taking the time to share your thoughts.
Question 1: You have criticized Lexical Priming, saying that “the problem is that Hoey nowhere operationalises his term “noticing” in any way that allows us to test the theory.” Do you happen to know if this same criticism of Hoey also applies to the theories of others with broadly similar theories of language? e.g. Sinclair or Hunston and Francis?
My Answer: No, I don’t think it does. None of them, as far as I know, claims that sub-conscious noticing is the key construct in language learning.
Question 2. You’ve also noted that Chomsky has successfully argued that language use is “stimulus independent” and “historically unbound”. How has Chomsky tested these two notions of stimulus independence and historical unboundedness? I was under the impression that these had not been tested, or at least that if they had been tested, it was under highly specialized conditions that bear little or no relation to language as it is used by 99.9% of speakers.
My Answer: Skinner’s (1957) theory claims that “Language is stimulus dependent”. All the examples of language use which are not stimulus dependent therefore combine to form a convincing body of evidence that Skinner’s theory is false. Similarly, the claim that language is historically bound is refuted by examples of new, creative language use. So Chomsky’s claims that language use is “stimulus independent” and “historically unbound” are part of his (1959) refutation of Skinner’s theory of language. In the 1950s, Skinner’s behaviourism was the paradigm theory of learning not only language but everything else, and claimed that any instance of human behaviour can be explained as a response to a stimulus. This view is based on a strict empiricist epistemology which regards all talk of mental states as so much unscientific mumbo jumbo. Chomsky was responsible for the fastest and most dramatic paradigm shift of his time, ushering in the new “mentalist” or “nativist” paradigm for linguistics, and cognitive science for psychology.
In a way, you’re right to say that Chomsky’s theory has nothing to do with the language used by 99.9% of speakers. Chomsky’s model of language distinguishes between competence and performance, between language knowledge and the use of language, influenced as the latter is by limits in the availability of computational resources, stress, tiredness, alcohol, etc. Chomsky says he’s concerned with “the rules that specify the well-formed strings of minimal syntactically functioning units” and with “an ideal speaker-listener, in a completely homogeneous speech-community, who knows his language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance” (Chomsky, 1965: 3).
Only by dealing with the abstract knowledge he so carefully defines can Chomsky make the claims he does for his theory of UG, and this is its great strength as a theory. To put it another way, Chomsky is interested in I-Language, not E-Language. I-Language refers to internalised language, the linguistic knowledge in the mind of the speaker. E-language refers to linguistic output: shouts, poems, sentences, songs, texts of every description. The important thing is that, in Chomsky’s view, E-Language is epiphenomenal; it is the result of I-Language. Chomsky sees I-Language as the phenomenon and E-language as performance data. All good theories are based on a careful distinction between phenomena and data. When you look at Hoey’s theory, it tries to base a case on patterns detected in corpora – the bigger the better (i.e., the more raw data, the better). Patterns are the phenomena which we notice in the data, but what’s the explanation for these patterns?
Question 3. How does Chomsky then account for the fact that, while it may be possible to “say things that we have not been trained to say and that we have never heard anybody else say” we nevertheless only seldom do so?
My Answer: I don’t think it’s the case that we rarely say things we have not been trained to say and that we have never heard anybody else say; I think it happens all the time. I agree that novel utterances can be seen as reformulations, hybrids, syncretisms of previous utterances, and that these novel utterances will contain patterns of text of the type Hoey goes on about. But the list of variations on the patterns is literally countless and as an explanation of language use, the construct of “lexical chunks” amounts to very little indeed. I think Hoey is right to say that the grammatical category we assign to a word is “a convenient label we give the combination of (some of) the word’s most characteristic and genre-independent primings”, but he seems to miss the point of the categories, which is to act as organising principles. If we abandon these categories we’re left with a mess: “a cluster of other primings”!
You say Hoey’s theory of Lexical Priming seems to point toward an explanation of why each of the following ‘novel’ utterances would be more or less acceptable and appear to carry some kind of meaning (even if it is one we can’t quite access):
Colorless green ideas sleep furiously.
a) The greening of colorlessly furious sleep ideas.
b) To colorless ideas with furious sleeping.
c) Furiously greening colorless idea sleeps.
I’m afraid I think these examples, far from supporting Hoey’s theory of Lexical Priming, actually give strong support to Chomsky’s claim about grammaticality judgements!
To return to lexical priming, it isn’t clear to me how sub-consciously noticing lexical patterns in input “explains” anything. Unlike Chomsky’s theory which says that we are hard-wired with knowledge about the principles underlying language, so that learning a particular language is a question of setting parameters as a result of exposure to it and then subsequently acquiring more lexis (including lexical chunks), Hoey limits himself to describing a small selection of the countless number of connections we make between words in terms of collocation, semantic association, pragmatic association, colligation, etc.; saying that repeated exposure to naturally occurring data is the sufficient condition for language learning; and claiming that what lexical patterns you acquire depends on the frequency with which you are exposed to them. You suggest that the child, as it learns language, is able to transfer the knowledge of primings for the words it does know onto those that it does not, or that it only knows partially. The syntax is not there from the outset but emerges as the result of the growth in sophistication that comes as more and more primings cluster about the ‘word’ – which here seems to be a shorthand for a place-marker over a ‘space’ in the linguistic system. So, instead of Chomsky’s claim that the child starts out with a knowledge of certain underlying principles of grammar which are common to all languages, we substitute the claim that the child starts with “a capacity for the acquisition of primings”. The problem is that it doesn’t make sense to say that we acquire primings. Priming means something like “readied by our prior experience” which is a mental state, not something which is acquired. 
Lexical priming means readied by our prior experience of words to expect words to be in the company of other words (their collocations); to appear in certain grammatical situations (colligations); to be in certain positions in text and discourse (textual colligations) and on and on. In my opinion Hoey doesn’t describe the competence acquired or explain how it is acquired. The latter question is obviously the key to an explanation, and Hoey has to either explain what the mechanism is or revert to an empiricist framework where prior experience is seen as a sufficient explanation and any black box is dispensed with. So far, there is no convincing explanation of how the language Hoey describes is acquired, and no answer to the fundamental question of the poverty of the stimulus is provided.
Question 4. Is there any possibility the order of acquisition of grammatical functors may be influenced by learners’ beliefs about what a language is and, therefore, how to learn it?
My Answer: Early work on error analysis, followed by two phases of morpheme studies, together with work on studies of acquisition order, indicates that there’s a more or less fixed order in the acquisition of certain aspects of English as an L2. That this order is influenced by learners’ beliefs and attitudes towards the target language is certainly possible, but, first, it’s been very difficult for researchers to present a construct of a belief or an attitude whose effects can be clearly measured, and second, these beliefs and attitudes are claimed to influence rate but not route. The evidence suggests that believing that “accuracy is more important than fluency”, for example, might affect how quickly or slowly you get accurate or fluent, but it doesn’t affect the order of acquisition of certain aspects of the language.
Question 5. Would you be prepared to concede that there might be several plausible alternative explanations for the order of L2 acquisition other than one purely or at least mainly related to cognitive processes – which as far as I know are unobservable to researchers?
My Answer: First, cognitive processes are not entirely unobservable these days, but it’s certainly the case that “interlanguage” is an entirely unobservable theoretical construct, and none the worse for that, IMHO. Like gravity, interlanguage is posited to exist in order to explain something we want to understand. But, yes, of course I accept that other explanations are possible. It’s just that if one subjects all the current candidate theories to critical examination, I think some version of a processing theory, which sees SLA as a process by which attention-demanding, controlled processes become more automatic through practice, and which results in the restructuring of the existing mental representation, is the strongest theory to date. Note that such a processing theory relies to some extent on the acquisition of grammatical knowledge, although it can easily cope with the suggestion that what is being acquired is not Chomsky’s linguistic competence, but rather communicative competence, which, as I suggested in the previous post, may be something like Widdowson’s description of it as “a matter of knowing a stock of partially pre-assembled patterns, formulaic frameworks, and a kit of rules, so to speak, and being able to apply the rules to make whatever adjustments are necessary according to contextual demands. Communicative competence is a matter of adaption, and rules are not generative but regulative and subservient”.
The Competition Model
I think you might be interested in the Competition Model, which incorporates some of the ideas you touch on, and fleshes out many missing parts of Hoey’s theory.
Bates and MacWhinney’s Competition Model, first outlined in 1982, challenges the two fundamental bases on which processing theories rest: innateness, and a formalist approach to language. In contrast to Chomsky’s Principles and Parameters model, the Competition Model sees language learning as non-modular and non-specific, i.e. it results from the same kinds of cognitive mechanisms as those involved in other kinds of learning. Also in contrast to Chomsky, Bates and MacWhinney do not separate the linguistic form of language from its function; they argue that the two are inseparable. As a result of their rejection of both innateness and formalism, the third difference between the Competition Model and Chomsky’s theory of UG is that while Chomsky offers a theory of competence, Bates and MacWhinney offer a theory of performance. The Competition Model is concerned with how language is used, and while it is certainly true that this is also the main interest for other psycholinguistic approaches to SLA, the difference is that instead of adopting the formalist approach to language as a given, the Competition Model, by adopting a particular version of the functional approach to linguistics, considers language to be constructed through use.
MacWhinney (1997: 114) explains that the Competition Model makes a commitment to four major theoretical issues. These are:
(i) Lexical Functionalism. Functionalism claims that the forms of language are determined by the communicative functions they perform; language is a set of mappings between forms and functions. “Forms are the external phonological and word order patterns that are used in words and syntactic constructions. Functions are the communicative intentions or meanings that underlie language usage” (MacWhinney, 1997: 115).
(ii) Connectionism. The Competition Model uses connectionist models to model the interactions between lexical mappings. Connectionism rejects the assumption made by nativists that the brain is a symbol processing device similar to a digital computer, and argues that the brain relies on a type of computation that emphasises patterns of connectivity and activation. MacWhinney, in keeping with the empiricist approach he adopts, uses evidence from studies in the field of cognitive neuroscience to help build his model. “The human brain is basically a huge collection of neurons. These neurons are connected through axons. When a neuron fires, it passes activation or inhibition along these axons and across synapses to all the other neurons with which it is connected. This passing of information occurs in an all-or-none fashion. There is no method for passing symbols down axons and across synapses. Brain waves cannot be used to transmit abstract objects such as phrase structures. Rather, it appears that the brain relies on a type of computation that emphasizes patterns of connectivity and activation. Models based on this type of computation are called ‘connectionist’ models….. A fundamental feature of these models is that they view mental processing in terms of interaction and connection, rather than strict modularity and separation. Although connectionist models often postulate some types of modules, they tend to view these modules as emergent and permeable (MacWhinney, 1998), rather than innate and encapsulated (Fodor, 1983)” (MacWhinney, 2001: 80).
(iii) Input-driven Learning. Language learning can be explained in terms of input rather than innate principles and parameters. Cue validity is the key construct in this explanation. “The basic claim of the Competition Model is that the system of form-function mappings embodied in language-processing networks is acquired in accord with a property we will call cue validity. ... The single most common interpretation of cue validity is in terms of the conditional probability that an event X will occur given a cue Y, that is p(X|Y). If this probability is high, then Y is a good cue to X. The most straightforward prediction from this initial analysis is that forms with a high conditional probability should be acquired early and be the strongest determinants of processing in adults” (MacWhinney, 1997: 121). MacWhinney adds in a later paper that “the most basic” determinant of cue strength is task frequency, while “the most important and most basic cue validity dimension is the dimension of reliability. A cue is reliable if it leads to the right functional choice whenever it is present” (MacWhinney, 2001: 75).
(iv) Capacity. Short-term verbal memory has limited capacity and the use of language in real time is continually subject to these limitations. “The Competition Model focuses on the role of underlying conceptual interpretation in determining the utilization of processing capacity” (MacWhinney, 1997: 115). “Although our results for online processing are still far from complete, we now have the outlines of a Competition Model approach to real-time sentence processing. This account treats sentence interpretation as a constraint satisfaction process that balances the limitations imposed by verbal memory against the requirements of conceptual interpretation. Our raw memory for strings of nonsense words is not more than about four. However, when words come in meaningful groups, we can remember dozens of words, even when the message is unfamiliar. The most likely candidate for this additional storage is some form of conceptual representation. We…..claim that words are quickly converted into integrated conceptual representations through a process of structure building (Gernsbacher, 1990). This process begins with the identification of a starting point (MacWhinney, 1977), or perspective, from which the entire clause can be interpreted. In English, this is usually the subject” (MacWhinney, 1997: 133).
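The conditional-probability reading of cue validity in (iii) can be made concrete with a small calculation. What follows is only an illustrative sketch: the toy observations, the cue names and the two helper functions are invented for the example and are not taken from MacWhinney’s papers. It estimates reliability as p(agent | cue present) and, alongside it, availability (how often the cue is present at all), which Competition Model work usually treats as the second ingredient of validity.

```python
# Hypothetical toy data, invented for illustration only. Each observation
# records which cues were present on a noun and whether that noun really
# was the agent of its clause.
observations = [
    ({"preverbal", "animate"}, True),
    ({"preverbal", "animate"}, True),
    ({"preverbal"}, True),
    ({"preverbal"}, False),
    ({"animate"}, True),
    ({"animate"}, False),
    ({"animate"}, False),
    (set(), False),
]


def reliability(cue, data):
    """Estimate p(agent | cue present), i.e. the conditional probability p(X|Y)."""
    outcomes = [is_agent for cues, is_agent in data if cue in cues]
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


def availability(cue, data):
    """Estimate how often the cue is present at all in the data."""
    return sum(1 for cues, _ in data if cue in cues) / len(data)


for cue in ("preverbal", "animate"):
    r = reliability(cue, observations)
    a = availability(cue, observations)
    print(f"{cue}: reliability={r:.2f}, availability={a:.2f}")
```

On these made-up counts the preverbal cue is the more reliable one, even though animacy is available more often, which is the kind of contrast the model uses to predict which cues are acquired early.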
In brief, the Competition Model argues that language encodes functions like ‘topic’ and ‘agent’ onto surface grammatical conventions in various ways such as word order and subject-verb agreement. Because of the limits on processing, these functional categories compete for control of the surface grammatical conventions. Speakers of languages use four types of cues – word order, vocabulary, morphology, and intonation – to facilitate their interpretation of these form-function mappings. Because of the principle of limited capacity mentioned above, human languages find different ways of using these cues. A central concept in the Competition Model is that speakers must have a way to determine relationships among elements in a sentence. Language processing involves competition among various cues, each of which contributes to a different resolution in sentence interpretation. Although the range of cues is universal, there is language-specific instantiation of cues, and language-specific “strength” assigned to cues. Another way of putting this is to say that language forms are used for communicative functions, but any one form may realise a number of functions, and any one function can be realised through a number of forms.
In English, for example, word order is very typically SVO in active declarative sentences, and, it is argued, word order is a strong cue for the realisation of many functions. Bates and MacWhinney claim that in Romance languages like Italian and Spanish, however, word order is not so important: they rely more on morphological agreement, semantics and pragmatics. Within a language, the cues often converge to give a clear interpretation of a sentence. In the English sentence “John kicks the ball.” the cues are word order (SVO), knowledge of the lexical items, the animacy criteria (balls do not kick), and subject-verb agreement. But sometimes there is competition among the cues that signal a particular function. For example, in the sentence “That teacher we like a lot.” there is competition between “teacher”, “we” and “lot” for agency. “Lot” can be eliminated because it is inanimate and follows the verb. “We” wins, because although “teacher” is in the optimum position, “we” is in the nominative case and because it agrees in number with the verb.
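The competition just described can be sketched as a weighted vote among cues. This is a minimal illustration, not Bates and MacWhinney’s implementation: the cue weights are made-up numbers chosen so that case and agreement outweigh position, mirroring the outcome described for “That teacher we like a lot”, and the names CUE_WEIGHTS, agent_scores and pick_agent are hypothetical.

```python
# Hypothetical cue strengths (assumed values, not measured ones), chosen so
# that case and agreement outweigh position, as in the example in the text.
CUE_WEIGHTS = {
    "sentence_initial": 0.5,
    "preverbal": 0.3,
    "animate": 0.3,
    "nominative_case": 0.6,
    "agrees_with_verb": 0.6,
}

# Candidate nouns in "That teacher we like a lot." and the cues each carries.
CANDIDATES = {
    "teacher": {"sentence_initial", "preverbal", "animate"},
    "we": {"preverbal", "animate", "nominative_case", "agrees_with_verb"},
    "lot": set(),  # inanimate and after the verb: no agent cues at all
}


def agent_scores(candidates, weights):
    """Sum the strengths of the cues that support each noun as agent."""
    return {
        noun: sum(weights[cue] for cue in cues)
        for noun, cues in candidates.items()
    }


def pick_agent(candidates, weights):
    """Return the noun whose supporting cues win the competition."""
    scores = agent_scores(candidates, weights)
    return max(scores, key=scores.get)


print(pick_agent(CANDIDATES, CUE_WEIGHTS))
```

Changing the weights changes the winner: raising the assumed strength of sentence_initial far enough makes “teacher” the preferred agent, which is one way to picture the claim that languages assign different strengths to the same universal set of cues.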
So far, the discussion holds for both first and second language learning. Turning to SLA, since the connectionist view assumes that all mental processing uses a common interconnected set of cognitive structures, this implies that transfer plays a key role. “the early second language learner should experience a massive amount of transfer from L1 to L2. Because connectionist models place such a strong emphasis on analogy and other types of pattern generalisation, they predict that all aspects of the first language that can possibly transfer to L2 will transfer. This is an extremely strong and highly falsifiable prediction. However, it seems to be in accord with what we currently know about transfer effects in second language learning” (MacWhinney, 1997: 119).
The Competition Model claims that the second language learner begins learning with a parasitic lexicon, a parasitic phonology, and a parasitic set of grammatical constructs. “Over time, the second language grows out of this parasitic status and becomes a full language in its own right” (MacWhinney, 1997: 119). As far as the lexicon is concerned, “this development is explained by the strengthening of direct associations from the L2 phonological form to the underlying referent, and by the restructuring of the meanings of some words. If two words in L1 map onto a single word in L2, the basic transfer process is unimpeded. It is easy for a Spanish speaker to take the L2 English form know and map it onto the meanings underlying saber and conocer (Stockwell, Bowen and Martin, 1965): What is difficult is for the L1 English speaker to acquire this new distinction when learning Spanish. In order to correctly control this distinction, the learner must restructure the concept underlying know into two new, related structures” (MacWhinney, 1997: 120).
In phonology, the L2 learner has to gradually “undo” the inappropriate direct transfer that occurs in the early stages of learning. In grammar, “the weights connecting functions to clusters of forms must be retuned” (MacWhinney, 1997: 120). Sometimes, the L2 requires the learner to make new conceptual distinctions not present in the L1. In order to acquire this new category, the L2 learner begins by attempting to transfer from the L1, and in case of difficulties the learner is “resigned to picking up the pieces of this new category one by one and restructuring them into a working system” (MacWhinney, 1997: 121).
The second language learner’s task is thus seen as adjusting the internal speech-processing mechanisms from those appropriate to his L1 to those appropriate for the target language. Ellis, in his treatment of the Competition Model, puts it another way – the learner has to discover the particular form-function mappings that characterise the target language. The task facing the L2 learner is to discover (1) which forms are used to realise which functions in the L2, and (2) what weights to attach to the use of individual forms in the performance of specific functions. (Ellis, 1994: 375) Ellis comments that the question is: how does the learner do this? Does he use the same cues and the same weights as in his L1, or different ones? MacWhinney’s 1997 account goes some way to answering that question: the learner does it by massive transfer, and by then making the necessary adjustments on the basis of the input. The end result of this process of restructuring “is the tightening of within-language links in contrast to between-language links. In this way, a certain limited form of emergent linguistic modularity is achieved” (MacWhinney, 1997: 120).
The Competition Model rests on an empiricist view which attempts to do without non-sensory knowledge, and reduce learning to associationism. It’s important to stress that the Competition Model is based on a commitment to empiricism, but not an empiricism that refuses to consider causal explanations and attempts to rid observation of all “theoretical bias”. The empiricism Bates and MacWhinney champion talks of “mental processes” (though, true to the tradition of empiricism, it prefers to treat mental processes as far as possible as “neurological facts”), “conceptual interpretation”, “processing capacity”, “universals of cognitive structure” and indeed “general explanation”. What is most encouraging is that MacWhinney concludes his 1997 paper by saying “The wise reader will take these arguments for an empiricist position with a healthy grain of salt. We all know that the most reasonable and tenable positions on major issues, such as nativism versus empiricism, inevitably rest somewhere in the middle between the two extremes. However, it is often helpful to view the competing positions in their most undiluted form, so that we can navigate between these alternatives, coming always a bit closer to the truth” (MacWhinney, 1997: 137). Amen to that.
The Competition Model is coherent, cohesive, consistent, and its terms are reasonably well-defined. Furthermore, its hypotheses are precise and have a great deal of empirical content – as we would expect! As a result, researchers have been able to carry out many empirical studies that test the hypotheses. The basic test format used in most of the numerous studies of the Competition Model was to present L2 learners whose native language uses cues and cue strengths that differ from those of the L2 with sentences designed to offer competing cues. The learners were asked to say what the subject of the sentence was. The analysis of the results was based on “choice” (which nouns the subjects chose) and “latency” (the time taken to make the choice). The studies found that L2 learners are indeed faced with conflicts between native language and target language cues and cue strengths, and that, to resolve the conflict, they first resort to their L1 processing strategies when interpreting L2 sentences. For example, English learners of Japanese initially made use of rigid word order as a cue: their initial hypothesis was rigid word order. Their next task was to figure out that in Japanese the order is SOV – which they then rigidly applied. On encountering incongruities, learners often resorted to meaning-based cues as opposed to word order or morphology-based cues. In general, the studies strongly suggest transfer and indicate that the processing strategies of the L2 learners could be located between the two poles represented by the strategies used by native speakers of the two languages involved.
Unfortunately, the research methodology used in the studies is not without its problems. The task that forms the basis of the tests is extremely artificial. This is not in itself enough to invalidate the research (much of the work done in UG could be similarly criticised), but it does make it difficult to be sure that the analysis of the results is valid. McLaughlin and Harrington (1989) suggest that, since many of the sentences used in the studies are extremely deviant, there is the possibility that the wrong thing is being tested. Perhaps subjects are not processing such sentences as they would in actual communicative situations, but are settling on a particular problem-solving strategy to get them through the many judgements of this nature they have to make (McLaughlin and Harrington, 1989, cited in Ellis, 1994: 378).
The theory certainly lays itself open to falsification; as MacWhinney himself argues, the basic claims of the Competition Model regarding transfer and cue validity effects in SLA are highly falsifiable. The clearest counter-evidence would be instances of strong cue use in L1 that failed to transfer to L2. “If transfer is possible and does not occur, the model would be strongly falsified” (MacWhinney, 1997: 131). MacWhinney lists over 30 studies that he, Bates, and others have conducted in over a dozen languages over a period of fifteen years on aspects of cue validity. A large number of other studies (e.g., Gass, 1987; Harrington, 1987; Sasaki, 1991) have examined aspects of the model for SLA. In MacWhinney’s words “These studies have yielded a remarkably consistent body of results.” Ellis (1994) and Braidi (1999), for example, agree that the Competition Model has survived empirical tests well. Most of the tests seem to confirm that different L1 users consistently use the same weighting of cues: word order is by far the most significant factor for English, for example. The studies on L2 learning give a great deal of support to the hypothesis that transfer of L1 weightings to L2 is an important feature of SLA.
Many in the field of SLA see the Competition Model in particular, and connectionist approaches in general, as being one of the most promising areas of all for SLA. The model is, of course, associated with connectionism, a movement in cognitive science which attempts to explain human intellectual abilities using artificial neural networks. Neural networks are simplified models of the brain, composed of large numbers of units (the analogs of neurons) together with weights that measure the strength of connections between the units. The central task of connectionist research is to find the correct set of weights to accomplish a given task by “training” the network. An early connectionist model was a network trained by Rumelhart and McClelland (1986) to predict the past tense of English verbs. The network showed the same tendency to overgeneralise as children, but there is still no agreement about the ability of neural networks to learn grammar. The interest in connectionism is that it may provide an alternative to the modular theory of mind. If it can be shown that these artificial networks can “learn”, then successive advances in what is known about the brain – which is seen as a neural network comprised of neurons and their connections (synapses) – may be enough to explain cognitive processes and learning without recourse to the “black box” of the mind.
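The idea of finding weights by “training” can be illustrated with a very small network. The sketch below is a single-unit perceptron trained with the classic perceptron learning rule on invented data; it is not the Rumelhart and McClelland past tense model, and the cue names, the training examples and the learning rate are all assumptions made for the example.

```python
# Invented training data. Each example is (cue vector, target), where the
# cue order is [preverbal position, animate, agrees with verb] and the
# target is 1 if the noun is the agent. In this toy set, agreement alone
# decides the answer; the network has to discover that from the examples.
EXAMPLES = [
    ([1, 1, 1], 1),
    ([1, 0, 1], 1),
    ([0, 1, 1], 1),
    ([1, 1, 0], 0),
    ([0, 1, 0], 0),
    ([0, 0, 0], 0),
    ([1, 0, 0], 0),
]


def predict(weights, bias, cues):
    """Fire (1) if the weighted sum of active cues exceeds zero."""
    total = bias + sum(w * x for w, x in zip(weights, cues))
    return 1 if total > 0 else 0


def train(examples, epochs=50, rate=1.0):
    """Adjust the weights after every error (perceptron learning rule)."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for cues, target in examples:
            error = target - predict(weights, bias, cues)
            if error:
                weights = [w + rate * error * x for w, x in zip(weights, cues)]
                bias += rate * error
    return weights, bias


weights, bias = train(EXAMPLES)
print("learned weights:", weights, "bias:", bias)
```

After training, the connection from the agreement cue carries the largest weight, so the “knowledge” that agreement signals agency is stored only as connection strengths, with no explicit rule anywhere in the program.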
Bates, E. and MacWhinney, B. (1987) Second language acquisition from a functionalist perspective: pragmatic, semantic, and perceptual strategies. In Winitz, H. (ed.) Native language and foreign language acquisition. Annals of the New York Academy of Sciences.
Braidi, S. M. (1995) Reconsidering the role of interaction and input in second language acquisition. Language Learning 45, 141-75.
Chomsky, N. (1959) Review of Skinner’s Verbal Behavior. Language 35, 26-58.
Chomsky, N. (1965) Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.
Ellis, R. (1994) The study of second language acquisition. Oxford: Oxford University Press.
MacWhinney, B. and Bates, E. (eds.) (1989) The crosslinguistic study of sentence processing. Cambridge: Cambridge University Press.
MacWhinney, B. (1997) Second Language Acquisition and the Competition Model. In de Groot, A. B. M. and Kroll, J. F. (eds.) Tutorials in Bilingualism: Psycholinguistic Perspectives. Hillsdale, N.J.: Erlbaum.
Rumelhart, D. and McClelland, J. (1986) On learning the past tense of English verbs. In McClelland, J. and Rumelhart, D. (eds.) Parallel Distributed Processing: Explorations in the microstructure of cognition. Cambridge, Mass.: MIT Press.
Skinner, B. F. (1957) Verbal behavior. New York: Appleton-Century-Crofts.