Chomsky’s Critics 2: Elizabeth Bates

Elizabeth Bates (1947 – 2003) was a brilliant scholar perhaps best known for her work with Brian MacWhinney on the Competition Model and Connectionism. In her often outspoken work, Bates challenges the modular theory of mind and, more specifically, criticises the nativists’ use of accounts of “language savants” and of people with cognitive or language impairments to support their theory.  In particular, in her review of Smith and Tsimpli’s The mind of a savant, Bates (2000) challenges the authors’ conclusions about Christopher, the savant in question, and, along the way, challenges the two main arguments supporting the UG “ideology”, as she calls it: the existence of universal properties of language, and the poverty of the stimulus.

First, the existence of language universals does not provide compelling evidence for the innateness of language, because such universals could arise for a variety of reasons that are not specific to language itself (e.g., universal properties of cognition, memory, perception, and attention).  (Bates, 2000: 5)

Bates, following Halliday, gives the analogy of eating food with one’s hands (with or without tools like a fork or chopsticks), which can be said to be universal. Rather than posit “an innate hand-feeding module, subserved by a hand-feeding gene”, a simpler explanation is that, given the structure of the human hand, the position of the mouth, and the nature of the food we eat, this is the best solution to the problem.

In the same vein, we may view language as the solution (or class of solutions) to a difficult and idiosyncratic problem: how to map a rich high-dimensional meaning space onto a low-dimensional channel under heavy information-processing constraints, guaranteeing that the sender and the receiver of the message will end up with approximately the same high-dimensional meaning state.  Given the size and complexity of this constraint satisfaction problem, the class of solutions may be very small, and (unlike the hand-feeding example) not at all transparent from an a priori examination of the problem itself  (Bates, 2000: 5).

Bates gives other examples to support her argument that solutions to particular problems of perception and cognition often evolve in an ad hoc way, and that there is no need to jump to the convenient conclusion that the problem was solved by nature.  As she says, “That which is inevitable does not have to be innate!”  (Bates, 2000: 6)

Bates sees language as consisting of a network, or set of networks, and she was one of the first to begin work on a connectionist model, known now as the Competition Model. She’s refreshingly frank in recognising that neural network simulations of learning are still in their infancy, and that it’s still not clear how much of human language learning such systems will be able to capture. Nevertheless, she says, the neural network systems which have already been constructed are able to generalise beyond the data and recover from error. “The point is, simply,” says Bates, “that the case for the unlearnability of language has not been settled one way or the other” (Bates, 2000: 6).

Bates goes on to say that when the nativists point to the “long list of detailed and idiosyncratic properties” described by UG, and ask how these could possibly have been learned, this begs the question of whether UG is a correct description of the human language faculty.  Bates paraphrases their argument as follows:

  1. English has property P.
  2. UG describes this property of English with Construct P’.
  3. Children who are exposed to English, eventually display the ability to comprehend and produce English sentences containing property P.
  4. Therefore English children can be said to know Construct P’.

Bates comments:

There is, of course, another possibility: Children derive Property P from the input, and Construct P’ has nothing to do with it. (Bates, 2000: 6)

An important criticism of Chomsky’s theory, raised by many and taken up by Bates, is that it is difficult to test. In principle, one of the strong points of UG is precisely its empirical testability – find a natural language where the description does not fit, or find a mature language user of a natural language who judges an ill-formed sentence to be grammatical, and you have counter-evidence. However, Bates argues that the introduction of parameters and parameter settings “serve to insulate UG from a rigorous empirical test.” In the case of binary universals (e.g., the Null Subject Parameter), any language either will or will not display them, so they “exhaust the set of logical possibilities and cannot be disproven.” Other universals are allowed to be silent or unexpressed if a language does not offer the features to which these universals apply. For example, universal constraints on inflectional morphology cannot be applied in Chinese, since Chinese has no inflectional morphology. Rather than allow Chinese to serve as a counter-example to the universal, the apparent anomaly is resolved by saying that the universal is present but silent. Bates comments: “It is difficult to disprove a theory that permits invisible entities with no causal consequences.”


1. Poverty of the Stimulus

Many of the criticisms made by Sampson and Bates do not seem to me to be well-founded.  While Bates is obviously correct to say that language universals could arise for a variety of reasons that are not specific to language itself, Bates provides no evidence against Chomsky’s claims. To say that “the case for the unlearnability of language has not been settled” amounts to the admission that no damning evidence has yet been found against the poverty of the stimulus argument, and, of course, such an argument can never be “proved”.

In general, to suggest that learning a language is just one more problem-solving task that the general learning machinery of the brain takes care of ignores all the empirical evidence of those adults who attempt and fail to learn a second language, and the evidence of atypical populations who successfully learn their L1.  Despite Bates’ careful and convincing unpicking of the more strident claims made by nativists in their accounts of atypical populations, it’s hard to explain the cases of those with impaired general intelligence who have exceptional linguistic ability (see Smith, 1999: 24), or the cases of those with normal intelligence who, after a stroke, lose their language ability while retaining other intellectual functions (see Smith 1999: 24-29), if language learning is not in fact localised.

Turning to Sampson, when he challenges Chomsky’s poverty of the stimulus argument by saying that many children have in fact been subjected to input like Blake’s Tyger poem, he ignores the obvious fact that many children have not, and when he says that children need input of yes/no questions in order to learn how to form them, nobody would disagree; the question remains of how the child also learns about aspects of the grammar that are not present in the input. In my recent discussion with Scott about the poverty of the stimulus argument, he claimed, as does Sampson, that “everything the child needs” is, in fact, present in the input, and thus no resort to nativist arguments of modular mind, innate knowledge, the LAD, or any of that, is necessary. While Sampson attempts, bizarrely and without success, to use Popper’s arguments for progress in science through conjectures and refutations as a model for language acquisition, I think Scott was relying more on the kind of emergentist theory of learning that Bates has promoted. But, in my opinion, only Bates shows any appreciation for just how hard it is to do without any appeal to innateness. Let’s take a quick look.

Nativism vs. Emergentism

Gregg (2003) highlights the differences between the two approaches. On the one hand, he says, we have Chomsky’s theory which posits a rich, innate representational system specific to the language faculty, and non-associative mechanisms, as well as associative ones, for bringing that system to bear on input to create a grammar. On the other hand, we have the emergentist position, which denies both the innateness of linguistic representations  and the domain-specificity of language learning mechanisms.

Starting from the premise that items in the mind get there through experience, emergentists adopt a form of associationism and argue that items that go together in experience will go together in thought. If two items are paired with sufficient frequency in the environment, they will go together in the mind.  In this way we learn that milk is white, that -ed is the past tense marker for English verbs, and so on. Associationism shares the general empiricist view that complex ideas are constructed from simple “ideas”, which in turn are derived from sensations caused by interaction with the outside world. Gregg (2003) acknowledges that these days one certainly can model associative learning processes with connectionist networks, but he highlights the severe limitations of connectionist models by examining the Ellis and Schmidt model (see Gregg, 2003: 58 – 66) in order to emphasise just how little the model has learned and how much is left unexplained.  Re-reading the 2003 article makes me wonder if Scott and others who dismiss innateness as an explanation appreciate the sheer implausibility of a project which does without it. How can emergentists seriously propose that the complexity of language emerges from simple cognitive processes being exposed to frequently co-occurring items in the environment?
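To make the associationist claim concrete, here is a deliberately crude sketch in Python of a learner that does nothing but count which items turn up together. It is not the Ellis and Schmidt model, nor anything Bates or Gregg proposed; the function names and the toy "experiences" are invented purely for illustration.

```python
from collections import Counter
from itertools import combinations


def learn_associations(experiences):
    """Count how often each unordered pair of items occurs in the same experience."""
    pair_counts = Counter()
    for items in experiences:
        # sorted(set(...)) gives every pair one canonical key, e.g. ("milk", "white")
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def strongest_associate(pair_counts, item):
    """Return the item most often paired with `item`, or None if it was never seen."""
    best, best_count = None, 0
    for (a, b), count in pair_counts.items():
        if item in (a, b) and count > best_count:
            best = b if a == item else a
            best_count = count
    return best


# Hypothetical toy data: each list is a bundle of items experienced together.
experiences = [
    ["milk", "white"],
    ["milk", "white", "cold"],
    ["milk", "cold"],
    ["milk", "white"],
    ["snow", "white"],
]

counts = learn_associations(experiences)
print(strongest_associate(counts, "milk"))     # the most frequent partner of "milk"
print(strongest_associate(counts, "unicorn"))  # never encountered, so no association
```

A learner like this can only ever report what it has counted. It has nothing to say about pairings it has never met, which is where the argument that follows puts the pressure.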

And so we return to the root of the problem of any empiricist account: the poverty of the stimulus argument.  Emergentists, by adopting an associative learning model and an empiricist epistemology, where some kind of innate architecture is allowed, but not innate knowledge, and certainly not innate linguistic representations, have a very difficult job explaining how children come to have the linguistic knowledge they do. They haven’t managed to explain how general conceptual representations acting on stimuli from the environment produce the representational system of language that children demonstrate, or to explain how, as Eubank and Gregg put it “children know which form-function pairings are possible in human-language grammars and which are not, regardless of exposure” (Eubank and Gregg, 2002: 238). Neither have emergentists so far dealt with “knowledge that comes about in the absence of exposure (i.e., a frequency of zero) including knowledge of what is not possible” (Eubank and Gregg, 2002: 238).

I gave Vivian Cook’s version of the PoS argument in Part 1, but let me here give  Gregg’s  summary of Laurence and Margolis’ (2001: 221) “lucid formulation”:

  1. An indefinite number of alternative sets of principles are consistent with the regularities found in the primary linguistic data.
  2. The correct set of principles need not be (and typically is not) in any pretheoretic sense simpler or more natural than the alternatives.
  3. The data that would be needed for choosing among those sets of principles are in many cases not the sort of data that are available to an empiricist learner.
  4. So if children were empiricist learners they could not reliably arrive at the correct grammar for their language.
  5. Children do reliably arrive at the correct grammar for their language.
  6. Therefore children are not empiricist learners (Gregg, 2003: 48).
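Stripped down, the last three steps are a simple modus tollens. The notation below is my own shorthand, not Gregg’s or Laurence and Margolis’: E stands for “children are empiricist learners” and R for “children reliably arrive at the correct grammar”.

```latex
\begin{align*}
\text{(4)}\quad & E \rightarrow \neg R \\
\text{(5)}\quad & R \\
\text{(6)}\quad & \therefore\ \neg E
\end{align*}
```

Steps 1 to 3 are there to support step 4. Since step 5 is presumably not in dispute, anyone who wants to resist the conclusion has to reject step 4, which means taking issue with at least one of the first three premises.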

To the extent that the emergentists insist on a strict empiricist epistemology, they’ll find it extremely difficult to provide any causal explanation of language acquisition, or, more relevant to us, of SLA. Combining observed frequency effects with the power law of practice, for example, and thus explaining acquisition order by appealing to frequency in the input, doesn’t go far in explaining the acquisition process itself.  What role do frequency effects have, and how do they interact with other aspects of the SLA process?  In other words, we need to know how frequency effects fit into a theory of SLA, because frequency and the power law of practice don’t provide a sufficient theoretical framework in themselves. Neither does connectionism; as Gregg points out, “connectionism itself is not a theory….. It is a method, and one that in principle is neutral as to the kind of theory to which it is applied” (Gregg, 2003: 55).
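For readers who haven’t met it, the power law of practice is usually written along the following lines. The symbols are the conventional textbook ones, not taken from any of the works cited here: T(N) is the time (or error rate) on the N-th practice trial, and a and b are positive constants fitted to the data.

```latex
T(N) = a\,N^{-b}, \qquad a > 0,\ b > 0
\quad\Longleftrightarrow\quad
\log T(N) = \log a - b\,\log N
```

In other words, performance plotted against practice on log-log axes gives a straight line with slope -b. That is a descriptive regularity about how performance changes with exposure; it says nothing by itself about what is being learned or why.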

2. Idealisation

There is also the question of idealisation, stressed by Sampson in his criticisms, and probably the most frequently-expressed objection made to UG. The assumption Chomsky makes of instantaneous acquisition, like the idealisation of the “ideal speaker-listener in a completely homogeneous speech-community”, is a perfectly respectable tool used in theory construction: it amounts to no more than the “ceteris paribus” argument that allows “all other things to be equal” so that we can isolate and thus better examine the phenomenon in question. Idealisations are warranted because they help us focus on the important issues and get rid of distractions, which does not mean that this step is immune to criticism, of course. It’s up to Chomsky to make sure that any theories based on idealisations are open to empirical tests, and it is then up to those who disagree with Chomsky to come up with some counter-evidence and/or to show that the idealisation in question has protected the theory from the influence of an important factor.  Thus, if Sampson wants to challenge Chomsky’s instantaneous acquisition assumption, he will have to show that there are differences in the stages of people’s language acquisition which result in significant differences in the end state of their linguistic knowledge.

While on the subject of idealisations, we may deal with the criticism of sociolinguists who challenge Chomsky’s idealisation to a homogeneous speech community by saying that Chomsky is ruling out of court any discussion of variations within a community.  Chomsky would reply that he’s doing no such thing, and that if anybody is interested in studying such variations they are welcome to do so.  Chomsky’s opinion of the scant possibility of progress in such an investigation is well-known, but he of course admits that it’s only an opinion. What Chomsky is interested in, however, is the language faculty, and the acquisition of a certain type of well-defined knowledge. In order to better investigate this domain, Chomsky idealises the speech community.  Sociolinguists can either produce arguments and data which show that such an idealisation is illegitimate (i.e. that it isolates part of the theory from the influence of a significant factor), or say that they are interested in a completely different domain.  It seems to be often the case that criticisms of Chomsky arise from misunderstandings about the role of idealisations in theory construction, or about the domain of a theory.

Weaknesses of UG theory

Chomsky’s theory runs into difficulties in confronting the question of how UG evolves, and how the principles and parameters arrive at a stable state in a normal child’s development.  Furthermore, there’s no doubt that the constant re-formulation of UG results in “moving the goalposts” and protecting the theory from awkward empirical evidence by the use of ad hoc hypotheses.

And we shouldn’t forget that when we discuss UG we have the “principles and parameters” theory in mind, and not the “Minimalist” programme, let alone Internalism. Internalism sees Chomsky insisting that the domain of his theory is not grammar but “I-language”, where “I” is “Internal” and where “Internal” means in the mind. While exposure to external stimuli is necessary for language acquisition, Chomsky maintains that, as Smith puts it “the resulting system is one which has no direct connection with the external world” (Smith, 1999: 138). This highly counter-intuitive claim takes us into the technicalities of a philosophical debate about semantics in general and “reference” in particular, where Chomsky holds the controversial view that semantic relations “are nothing to do with things in the world, but are relations between mental representations: they are entirely inside the head”  (Smith, 1999: 167).  Perhaps the most well-known example of this view is Chomsky’s assertion that while we may use the word “London” to refer to the capital city of the UK, it’s unjustified to claim that the word itself refers to some real entity in the world.  Go figure, as they say.

But the most important criticism I personally have of UG is that it is too strict and too narrow to be of much use to those trying to build a theory of SLA. I think it’s important to challenge Chomsky’s claim that questions about language use “lie beyond the reach of our minds”, and that they “will never be incorporated within explanatory theories intelligible to humans” (Chomsky, 1978).  Despite Chomsky’s assertion, I think we may assume that the L2 acquisition process is capable of being rationally and thoroughly examined.  Further, I suggest that it need not be, indeed should not be, idealised as an instantaneous event, which is to say, I assume that we can ask rational questions about the stages of development of interlanguages, that we can study the real-time processing required to understand and produce utterances in the L2, that we can talk about not just the acquisition of abstract principles but of skills, and even that we can study how different social environments affect SLA.

By insisting on a “scientific” status for his theory, Chomsky severely limits its domain, and to appreciate just how limited the domain of UG is, let us remind ourselves of Chomsky’s position on modularity.  Chomsky argues that in the human mind there is a language faculty, or grammar module, which is responsible for grammatical knowledge, and that other modules handle other kinds of knowledge. Not all of what is commonly referred to as “language” is the domain of the language module; certain parts of peripheral grammatical knowledge, and all pragmatic knowledge, are excluded. To put it another way, the domain of Chomsky’s theory is restricted by his distinction between I-language and E-language; Chomsky is concerned with the individual human capacity for language, and with the universal similarities between languages – his domain deliberately excludes the community. No justification needs to be offered for deciding to focus on a particular phenomenon or a particular hypothesis, but it is essential to grasp the domain of Chomsky’s theory.  Cook (1994) puts it this way:

Chomskian theory claims that, strictly speaking, the mind does not know languages but grammars; ‘the notion “language” itself is derivative and relatively unimportant’ (Chomsky, 1980, p. 126).  “The English Language” or “the French language” means language as a social phenomenon – a collection of utterances.  What the individual mind knows is not a language in this sense, but a grammar with the parameters set to particular values.  Language is another epiphenomenon: the psychological reality is the grammar that a speaker knows, not a language (Cook, 1994: 480).

Gregg (1996) has this to say:

… “language” does not refer to a natural kind, and hence does not constitute an object for scientific investigation.  The scientific study of language or language acquisition requires the narrowing down of the domain of investigation, a carving of nature at its joints, as Plato put it. From such a perspective, modularity makes eminent sense (Gregg, 1996: 1).

Chomsky himself says that what he seeks to describe and explain is

The cognitive state that encompasses all those aspects of form and meaning and their relation, including underlying structures that enter into that relation, which are properly assigned to the specific subsystem of the human mind that relates representations of form and meaning. A bit misleadingly perhaps, I will continue to call this subsystem ‘the language faculty’ (Chomsky 1980).

Pragmatic competence, on the other hand, is left out because

there is no promising approach to the normal creative use of language, or to other rule-governed acts that are freely undertaken…..  the creative use of language is a mystery that eludes our intellectual grasp (Chomsky, 1980).

Chomsky would obviously agree that syntax provides no more than clues about the content of any particular message that someone might try to communicate, and that pragmatics takes these clues and interprets them according to their context.  If one is interested in communication, then pragmatics is vital, but if one is interested in language as a code linking representations of sound and meaning, then it is not.  Chomsky’s strict demarcation between science and non-science effectively rules out the study of E-Language, and consequently his theory neither describes nor explains many of the phenomena that interest linguists. Far less does UG describe or explain the phenomena of SLA. By denying the usefulness of attempts to explain aspects of language use and usage that fall outside the domain of I-Language, UG  can’t be taken as the only valid frame of reference for SLA research and theory construction, or even as a good model.


Bates, E. (2000) Language Savants and The Structure of The Mind.  International Journal of Bilingualism. 

Bates, E.; Elman, J.; Johnson, M.; Karmiloff-Smith, A.; Parisi, D.; and Plunkett, K. (1998) Innateness and Emergentism.  In Bechtel, W., and Graham, G., (eds) A Companion to Cognitive Science. 590-601. Oxford: Basil Blackwell.

Bates, E. and Goodman, J. (1997) On the inseparability of grammar and the lexicon: evidence from aphasia, acquisition and real-time processing.  Language and Cognitive Processes, 12, 507-584.

Chomsky, N. (1980) Rules and representations. Oxford: Basil Blackwell.

Cook, V. J. (1994) The Metaphor of Access to Universal Grammar in L2 Learning.  In Ellis, N. (ed.)  Implicit and Explicit Learning of Languages.  London: Academic Press.

Gregg, K. R. (1996) The logical and developmental problems of second language acquisition.  In Ritchie, W.C. and Bhatia, T.K. (eds.) Handbook of second language acquisition.  San Diego: Academic Press.

Gregg, K. R. (2000) A theory for every occasion: postmodernism and SLA.  Second Language Research 16, 4, 34-59.

Gregg, K. R. (2003) The state of emergentism in second language acquisition.  Second Language Research, 19, 2, 42-75.

Laurence, S. and Margolis, E. (2001) The Poverty of the Stimulus Argument.  British Journal for the Philosophy of Science, 52, 2, 217-276.

Smith, N. (1999) Chomsky: Ideas and Ideals.  Cambridge: Cambridge University Press.

Smith, N. and Tsimpli, I-M. (1995) The mind of a savant: Language learning and modularity.  Oxford: Basil Blackwell.

5 thoughts on “Chomsky’s Critics 2: Elizabeth Bates”

  1. Thanks for this Geoff, I think this is the post I have most liked in ages, very well argued, clearly presented and shows a wonderful understanding of what Chomsky is on about, even for those of us who read a lot…. Chomsky is still a challenge… thank you.


  2. Hi Geoff,

    While reading your highly informative post, I suddenly wondered how prenatal stimuli, for example, could relate to this debate. I came across this article, which argues that infants may have learned about the properties of the native language while still in the womb. The results of the experiment the article presents indicate that even prior to birth, the human brain is tuning to the language environment. At first I thought this was not very relevant to Chomsky and the concept of UG because the stimuli a foetus gets in utero are of purely auditory nature, e.g. rhythm and pitch. However, this bit caught my attention: “Phonologists have traditionally classified the world’s languages into three main rhythmic categories: stress-timed (e.g., English, Dutch), syllable-timed (e.g., Spanish, French), and mora-timed (e.g., Japanese). This distinction is critically important to language learning as rhythmicity is associated with word order in a language (Nespor et al., 2008), rendering it one of the most potentially informative perceptual cues for bootstrapping language acquisition”. A question that immediately occurred to me was whether this argument would support or disprove Chomsky’s theory. What do you think?


  3. Hi Hana,

    Chomsky is of course interested in phonology – the grammar of sounds – but I’m afraid I’m not, so can’t help much.

    As you probably know, Chomsky and Morris Halle founded the Generative School of Phonology in the late 1950s and published “The Sound Pattern of English” in 1968. For those who know nothing about any of this, if you go to the end of page 6 of this article (which I found a copy of among my stuff because the authors are good on the philosophy of science), you’ll get some idea of what’s involved. At the beginning of the book they say “We may think of a language as a set of sentences, each with an ideal phonetic form and an associated intrinsic semantic interpretation. The grammar of the language is the system of rules that specifies this sound-meaning correspondence”. Sound familiar? From what I understand (which is very little), Chomsky’s view of phonology is that phonological representations are segments made up of “distinctive features”. As far as I can make out, these features are like the principles and parameters of syntax: they’re binary and universal. So I would guess that the stimuli a foetus gets in utero are just that – stimuli – and therefore unable to account for the knowledge which the nativists, using their battery of tests, say that children have about language. I’d also guess that nativists would say that the kinds of auditory input of rhythm and pitch, for example, that the researchers measured, would not explain the children’s putative linguistic competence.

