This is adapted from my book. I include it because there’s rising interest in emergentism. Scott Thornbury recently discussed it under the heading “Turning Point” in his blog http://scottthornbury.wordpress.com/ and made some very silly (IMHO) comments about phase shift. I wrote this (or rather, copied it from the book) before I read Scott’s post, and so I think I’ll have to deal with it again in a post on this website, where I can say what I like 🙂
The growing interest in connectionist views and associative learning is reflected in the development of what has been dubbed the “emergentist” approach to SLA. Ellis (1999) explains that emergentists (he includes MacWhinney and himself among them) “believe that the complexity of language emerges from relatively simple developmental processes being exposed to a massive and complex environment.” The Competition Model is a good example of an emergentist approach, rejecting, as it does, the nativist UG account of language, and the nativist assumption that human beings are born with linguistic knowledge and a special language learning mechanism. Ellis’s paper “Frequency Effects in Language Processing” is another.
In a special issue of Studies in Second Language Acquisition, Ellis shows how language processing is “intimately tuned to input frequency”, and expounds a “usage-based” theory which holds that “acquisition of language is exemplar based”. (Ellis, 2002: 143) The power law of practice is taken by Ellis as the underpinning for his frequency-based account, and then, through an impressive review of literature on phonology and phonotactics, reading and spelling, lexis, morphosyntax, formulaic language production, language comprehension, grammaticality, and syntax, Ellis argues that “a huge collection of memories of previously experienced utterances” rather than knowledge of abstract rules, is what underlies the fluent use of language. In short, emergentists take most language learning to be “the gradual strengthening of associations between co-occurring elements of the language”, and they see fluent language performance as “the exploitation of this probabilistic knowledge.” (Ellis, 2002: 173.)
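For readers who want to see what the power law of practice amounts to, here is a minimal sketch. It assumes the usual textbook form, T(N) = a × N^(-b), where T is the time taken on the Nth encounter with an item. The function name and the parameter values are mine, chosen purely for illustration; none of them come from Ellis (2002).

```python
def power_law_time(n_trials, initial_time=1000.0, rate=0.4):
    """Predicted processing time after n_trials encounters: a * N ** (-b).

    initial_time (a) and rate (b) are hypothetical values for illustration,
    not estimates taken from any study.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return initial_time * n_trials ** (-rate)


# Predicted times after 1, 10, 100 and 1000 encounters with an item.
encounters = (1, 10, 100, 1000)
times = [power_law_time(n) for n in encounters]

# Improvement gained across each tenfold increase in practice.
gains = [earlier - later for earlier, later in zip(times, times[1:])]

for n, t in zip(encounters, times):
    print(f"{n:>5} encounters: {t:7.1f}")
```

The point to notice is that each tenfold increase in practice buys a smaller improvement than the one before: early encounters matter a great deal, later ones progressively less.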
Seidenberg and MacDonald (1999) suggest a “probabilistic constraints approach” to language acquisition, which adopts the connectionist approach to knowledge representation, and provides an alternative framework to “the generative paradigm”. In place of equating knowing a language with knowing a grammar, the probabilistic constraints approach adopts the functionalist assumption that language knowledge is “something that develops in the course of learning how to perform the primary communicative tasks of comprehension and production.” (Seidenberg and MacDonald, 1999: 571) This knowledge is viewed as a neural network that maps between forms and meanings, and further levels of linguistic representation, such as syntax and morphology, are said to emerge in the course of learning tasks. An alternative to “Competence” is also offered by Seidenberg and MacDonald, who argue that the competence-performance distinction excludes information about statistical and probabilistic aspects of language, and that these aspects play an important role in acquisition. The alternative is to characterize a performance system that handles all and only those structures that people can. Performance constraints are embodied in the system responsible for producing and comprehending utterances, not extrinsic to it (MacDonald & Christiansen, 1999; Christiansen & Chater, 1999). This approach obviates the paradox created by a characterization of linguistic knowledge that generates sentences that people neither produce nor comprehend. (Seidenberg and MacDonald, 1999: 573)
A third difference in this approach is the way in which the language learning task confronting the learner is characterised. “The generative approach sees the task as grammar identification … The alternative view is that the child is engaged in learning to use language.” (Seidenberg and MacDonald, 1999: 574). This change in orientation from grammar identification to learning to use language has important consequences for standard poverty of the stimulus arguments. “In brief it turns out that many of the classic arguments rest on the assumption that the child’s task is grammar identification, and these arguments simply no longer apply if the task is instead acquiring the performance system underlying comprehension and production” (Seidenberg and MacDonald, 1999: 574). As a final example of emergentism, Bates et al. (1998) look at innateness through the emergentist perspective and attempt to translate innateness claims into empiricist statements. They argue that innateness is often used as a logically inevitable, fall-back explanation. “In the absence of a better theory, innateness is often confused with (1) domain specificity (Outcome X is so peculiar that it must be innate), (2) species specificity (we are the only species who do X so X must lie in the human genome), (3) localization (Outcome X is mediated by a particular part of the brain, so X must be innate), and (4) learnability (we cannot figure out how X could be learned so X must be innate)” (Bates et al., 1998: 590). Instead of this unsatisfactory “explanation”, Bates et al. believe that an explicit, empirically-based theory of interaction, a theory that will explain the process by which nature and nurture, genes and the environment, interact without recourse to innate knowledge, is “around the corner”. They further argue that this theory of interaction, when it arrives, will have an emergentist form.
In an emergentist theory, outcomes can arise for reasons that are not predictable from any of the individual inputs to the problem. “Soap bubbles are round because a sphere is the only possible solution to achieving maximum volume and minimum surface (i.e., their spherical form is not explained by the soap, the water, or the little boy who blows the bubble.)” (Bates et al., 1998). This unlikely view is supported by Jean Piaget, who argued that logic and knowledge emerge in just such a fashion, from successive interactions between sensorimotor activity and a structured world. In the same way, it has been argued that grammars represent the class of possible solutions to the problem of mapping hyperdimensional meanings onto a low-dimensional channel, heavily constrained by the limits of human information processing (e.g., MacWhinney and Bates, 1989). “Logic, knowledge and grammar are not given in the world, but neither are they given in the genes” (Bates et al., 1998: 590).
Bates et al. (1998) propose to start by specifying the constraints on emergent forms offered by genes and environment. “Innateness” is defined as a claim about the amount of information in a complex outcome that was contributed by the genes. They use a taxonomy proposed by Elman et al. to identify different types of innateness and their location in the brain. A major achievement would be to locate the “mental organ” that Chomsky and others claim is responsible for language: “Pinker suggests that this innate knowledge must lie in the ‘microcircuitry’ of the brain. We think that he is absolutely right: If the notion of a language instinct means anything at all, it must refer to a claim about cortical microcircuitry, because this is (to the best of our knowledge) the only way that detailed information can be laid out in the brain” (Bates et al., 1998: 594).
Bates et al. (1998) concede that while this kind of representational nativism is theoretically plausible and attractive, “it has proven hard to defend on both mathematical and empirical grounds.” Other kinds of innate constraint – innate architecture for example – are proving easier to conceptualise and locate in the brain, but in any case Bates et al. were doing no more than making the claim that it is feasible to give an account of how people learn language without resorting to innate knowledge.
We cannot conclude from the presence of eccentric structures that those structures are innate – not even if they are unique to our species, universal among all normal members of that species, localized in particular parts of the system, and learnable only under specific conditions. The same facts can be explained by replacing innate knowledge (i.e. representations) with architectural and temporal constraints that require much less genetically specified information. This kind of emergentist solution to the Nature-Nurture controversy has been around for many years, but it has only become a scientifically viable alternative in the past decade. (Bates et al., 1998: 598)
Emergentism claims that complex systems exhibit ‘higher-level’ properties that are neither explainable by, nor predictable from, ‘lower-level’ physical properties, while they nevertheless have causal and hence explanatory efficacy. This would not seem to be, at first glance, a very attractive doctrine for an empiricist, but it certainly provides a way out of the difficulties empiricists encounter in explaining complex representational systems. Modern empiricists, in the fields of cognitive psychology and linguistics, for example, who are attempting to do without the concept of innate knowledge (and even of the mind), could well do with such an added ingredient. Without wishing to impute motives to anyone, it is precisely the problem of the poverty of the stimulus that an emergentist approach can overcome.
Ellis makes little reference to, or use of, what we might call the emergentist part of emergentism. His paper on frequency effects, outlined briefly in the section above, seems far more in the associationist camp. Starting from the empiricist premise that items in the mind get there through experience, associationism argues that items that go together in experience will go together in thought. If two items are paired with sufficient frequency in the environment, they will go together in the mind. In this way we learn that milk is white, -ed is the past tense marker for English verbs, and so on. Associationism shares the general empiricist view that complex ideas are constructed from simple “ideas”, which in turn are derived from sensations. The sensations are not governed by association, and are caused by interaction with the outside world.
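The associationist idea that things paired often enough in experience end up paired in the mind can be caricatured in a few lines of code. What follows is a toy sketch of my own, assuming that “association strength” is nothing more than a co-occurrence count; the function names and the sample data are hypothetical and are not taken from Ellis or from any connectionist model.

```python
from collections import Counter, defaultdict


def learn_associations(experiences):
    """Tally how often each cue is paired with each outcome in experience."""
    pair_counts = defaultdict(Counter)
    for cue, outcome in experiences:
        pair_counts[cue][outcome] += 1
    return pair_counts


def strongest_associate(pair_counts, cue):
    """Return the outcome most often paired with cue, or None if the
    cue has never been experienced (a frequency of zero)."""
    if cue not in pair_counts:
        return None
    return pair_counts[cue].most_common(1)[0][0]


# Hypothetical experience: (cue, outcome) pairings met in the environment.
experiences = (
    [("milk", "white")] * 5
    + [("milk", "cold")] * 2
    + [("past tense", "-ed")] * 8
    + [("past tense", "vowel change")] * 3
)

associations = learn_associations(experiences)
print(strongest_associate(associations, "milk"))
print(strongest_associate(associations, "past tense"))
print(strongest_associate(associations, "unicorn"))
```

A learner of this kind knows only what it has counted: for a cue it has never met, it has nothing to say.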
There is no explicit reference to emergentism in Seidenberg and MacDonald (1999), and, while the commitment to empiricism and to the view that complex systems can be built from relatively simple sub-systems is evident, there is no overt suggestion that the probabilistic constraints view goes any further than associationism.
Nor, I believe, did the Competition Model start out with a clear emergentist epistemology. I make this attempt to distance the models and theories of those in SLA who adopt a functionalist view of grammar and an empiricist epistemology from emergentism because I think the latest radical empiricist versions of emergentism are actually quite dangerous; no more dangerous than one of those meteorites that might collide with planet Earth, but dangerous. While classic emergentism was unable to explain how novel properties could emerge from complex systems, and thus remained somewhat mysterious (even smacking of dialectics), the latest versions of emergentism seem to be getting closer to a model of the process. The problem with this for a rationalist (as Dr. Slors of Tilburg University points out) is that the more it becomes possible to demonstrate the systematic interconnections between psychology and physics, for example (the more we can do away with the construct of the mind, and just talk about the brain), the closer we get to describing the necessary and sufficient conditions for psychological states in physical terms, and the closer we get to reductionism. Reductionism finds the ultimate meaning of the “object” not in its inherent qualities but in the parts which compose it, which is to say that we enter the topsy-turvy world where there are only parts.
The usual response to this problem is to argue that it is impossible to construct such a theory. Bates et al., as we have seen, think that it is just round the corner, and appear to take seriously the task of reducing the mind to the brain. This brings from me the same reaction it brought from my daughter when I told her what I was writing about: “Scary!” On the assumption that we still have a few years to go, let us return to the debate about nativism versus emergentism among those interested in constructing a theory of SLA.
Gregg (2003), in his discussion of emergentism in SLA, distinguishes between ‘nativist’ and ‘empiricist’ emergentists. “Emergentists are a fairly heterogeneous group, although having in common a rejection of anything like a ‘Chomskian’ UG, but one can distinguish two subsets: ‘nativist’ emergentists – mainly O’Grady and his associates – and what I call ‘empiricist’ emergentists, a term that I think accurately includes all other self-proclaimed emergentists. In SLA specifically, empiricist emergentism has been forcefully and accurately advocated in a series of articles by Nick Ellis (e.g., Ellis, 1998; 1999; 2002a; 2002b; 2003)” (Gregg, 2003: 43).
Gregg chooses to deal with the ‘empiricist’ emergentists, and, in order not to be confused with the O’Grady ‘special nativist’ position, uses Fodor’s (1984) term ‘mad dog nativism’ to refer to his own, ‘Chomskian’ position. Gregg gives this succinct summary of the two positions: “So the lines are drawn: On the one hand, we have mad dog nativist theories which posit a rich, innate representational system specific to the language faculty, and non-associative mechanisms, as well as associative ones, for bringing that system to bear on input to create an L2 grammar. On the other hand, we have the emergentist position, which denies both the innateness of linguistic representations (Chomsky modularity) and the domain-specificity of language learning mechanisms (Fodor-modularity)” (Gregg, 2003: 46).
Gregg argues that empiricist emergentism has no property theory to offer, but this is surely not surprising if the aim of emergentism is to do away with innate, domain-specific representational systems, and show that all that the learner needs is “an ability to do distributional analyses and to remember the products of the analyses” (Gregg, 2003: 55). What is surprising, as Gregg notes, is that Ellis seems to accept the validity of the linguist’s account of grammatical structure. As to the emergentist transition theory, Gregg takes this to be based on associative learning, which is certainly a fair description of Ellis’ position. Gregg says that these days one can model associative learning processes with connectionist networks:
“If empirical emergentism is to be a viable rival to mad dog nativism, it is important – perhaps even essential – that connectionism can be recruited to implement the emergentist transition theory” (Gregg, 2003: 55).
The severe limitations of connectionist models are highlighted by Gregg, who goes to the trouble of examining the Ellis and Schmidt model (see Gregg, 2003: 58 – 66) in order to emphasise just how little the model has learned and how much is left unexplained. The sheer implausibility of the enterprise strikes me as forcefully as it seems to strike Gregg. How can emergentists seriously propose that the complexity of language emerges from simple cognitive processes being exposed to frequently co-occurring items in the environment?
At the root of the problem of any empiricist account is the poverty of the stimulus argument. Emergentists, by adopting an associative learning model and an empiricist epistemology (where some kind of innate architecture is allowed, but not innate knowledge, and certainly not innate linguistic representations), have a very difficult job explaining how children come to have the linguistic knowledge they do. How can general conceptual representations acting on stimuli from the environment explain the representational system of language that children demonstrate? How come, as Eubank and Gregg put it, “children know which form-function pairings are possible in human-language grammars and which are not, regardless of exposure” (Eubank and Gregg, 2002: 238)? How can emergentists deal with cases of instantaneous learning, or “knowledge that comes about in the absence of exposure (i.e., a frequency of zero) including knowledge of what is not possible” (Eubank and Gregg, 2002: 238)?
Gregg (2003) summarises Laurence and Margolis’ (2001: 221) “lucid formulation” of the poverty of the stimulus argument:
1. An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.
2. The correct set of principles need not be (and typically is not) in any pretheoretic sense simpler or more natural than the alternatives.
3. The data that would be needed for choosing among those sets of principles are in many cases not the sort of data that are available to an empiricist learner.
4. So if children were empiricist learners they could not reliably arrive at the correct grammar for their language.
5. Children do reliably arrive at the correct grammar for their language.
6. Therefore children are not empiricist learners.
(Gregg, 2003: 48)
This extremely telling argument leads on to the more general conclusion that emergentists have no convincing account of what language is. Which is, of course, where the scary, inexplicable, unpredictable outcomes that arise from individual inputs to a problem come in. Leaving these aside, if I may, to the extent that the emergentists insist on a strict empiricist epistemology, they will find it extremely difficult to provide any causal explanation of SLA. Combining observed frequency effects with the power law of practice, for example, and thus explaining acquisition order by appealing to frequency in the input does not go very far in explaining the acquisition process itself. What role do frequency effects have, and how do they interact with other aspects of the SLA process? In other words, we need to know how frequency effects fit into a theory of SLA, because frequency and the power law of practice do not provide a sufficient theoretical framework in themselves. Neither does connectionism; as Gregg points out, “connectionism itself is not a theory… It is a method, and one that in principle is neutral as to the kind of theory to which it is applied.” (Gregg, 2003: 55)
See Suggested Reading and References page for sources referred to here.