by Noam Chomsky
As for the connectionist's claim to turn a Sellars–Lewis view of language and its embodiment in the brain into what purports to be a natural science (cf. Morris, Cotterell & Elman 2000), consider Chomsky's criticism of the much-hailed success of a recent connectionist form of Sellars's behaviorist-connectionist account of language learning. By way of background, the connectionists have learned some lessons since Sellars's time. Unlike Sellars (and Lewis), they have in recent years come to devote effort to trying to show that their training procedures operating on what they take to be computer models of ‘plastic’ neural nets (“simple recurrent networks” or SRNs in Elman's case) can yield behavioral analogues of Chomsky's linguistic principles. It is not obvious why. Their efforts are puzzling in the way the Sellarsian and Lewisian efforts were, but also for another reason. In choosing what to train SRNs to produce in the way of outputs, they choose behaviors that conform to some rule statement or another that has appeared in the work in the Chomskyan tradition. They devote considerable time and experimental resources to trying to get a computer model of a plastic neural net (more realistically, very many of them going through massive training sessions in various ‘epochs’ of training, sometimes with the best performers subjected to other epochs in an attempt to simulate a [naïve: see Appendix II] view of evolution, and so on) after a long process of training to duplicate in its/their outputs some range of a set of ‘sentences’ (thought of here as sets of binary code, not as internal expressions) chosen from a linguistic corpus and thought to represent behavior that accords with the chosen rule. 
The connectionists clearly have no intention of adopting Chomsky's naturalistic approach to languages themselves, and appear to ignore the background facts, assumptions, and methods that led to the improving degree of success that Chomskyan linguistic theory has had in recent years, the theories that include the rule- or principle-statements. They refuse to treat the rule/principle they focus on at a given time as a rule/principle of derivation/computation of a natural ‘organ,’ one that does not produce linguistic behavior itself but that offers anyone who has such a system the means to derive any of the infinite number of expressions that their I-languages make possible. They seem to think that the facts of language acquisition and creative language use must be wrong, and while they do take Chomskyan rules/principles into account in the superficial way indicated, their concern is to attempt to show that neural nets can be trained to produce behaviors that they believe indicate that the net has ‘learned’ the rule/principle. One way to measure how successful they have been in their efforts is found in Elman's (2001) claim that he got a neural net to deal with the common phenomenon of nested dependencies in natural languages. An example of nesting is center-embedded clauses in sentences; dependencies include subject-verb number agreement. They are important, for dependencies are closely related to linguistic structures and to constraints on them; they play a central role in syntax-semantics. As for Elman's claim to be successful, Chomsky remarks (in comments in personal correspondence that also appear in the third (2009) edition of Cartesian Linguistics): “No matter how much computer power and statistics . . . [connectionists] throw at the task [of language acquisition], it always comes out . . . wrong. Take [Jeff] Elman's . . . paper[s] . . . on learning nested dependencies.
Two problems: (1) the method works just as well on crossing dependencies, so doesn't bear on why language near universally has nested but not crossing dependencies. (2) His program works up to depth two, but fails totally on depth three. So it's about as interesting as a theory of arithmetical knowledge that handles the ability to add 2+2 but has to be completely revised for 2+3 (and so on indefinitely).” Details aside, the point is clear. Those convinced that language is a learned form of behavior and that its rules can be thought of as learned social practices, conventions, induced habits, etc. that people conform to because they are somehow socially constrained, are out of touch with the facts. They are so because they begin with assumptions about language and its learning that have nothing to do with natural languages and their acquisition and use, refuse to employ standard natural science methodology in their investigation, and so offer ‘theories’ of language and its learning that have little to do with what languages are and how they are used.
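To make the contrast between nested and crossing dependencies concrete, here is a minimal sketch in Python. It is offered only as an illustration, not as anything drawn from Elman's or Chomsky's work. It assumes that a dependency can be represented as a pair of word positions, and the function names (crosses, classify, depth) and the example sentences are hypothetical choices made for this sketch.

```python
from itertools import combinations


def crosses(a, b):
    """Return True if two dependency arcs (pairs of word positions) cross."""
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j


def classify(arcs):
    """Label a set of arcs 'crossing' if any two cross, otherwise 'nested'.

    Here 'nested' simply means that no two arcs cross.
    """
    if any(crosses(a, b) for a, b in combinations(arcs, 2)):
        return "crossing"
    return "nested"


def depth(arcs):
    """Greatest number of arcs stacked inside one another (0 if no arcs)."""
    spans = [tuple(sorted(a)) for a in arcs]
    return max(
        (sum(1 for (k, l) in spans if k <= i and j <= l) for (i, j) in spans),
        default=0,
    )


# "The rat the cat chased fled": rat(1)-fled(5), cat(3)-chased(4)
depth_two = [(1, 5), (3, 4)]
# "The rat the cat the dog bit chased fled": one more level of embedding
depth_three = [(1, 8), (3, 7), (5, 6)]
# Abstract crossing pattern N1 N2 V1 V2: N1-V1 and N2-V2
crossing = [(0, 2), (1, 3)]

print(classify(depth_two), depth(depth_two))
print(classify(depth_three), depth(depth_three))
print(classify(crossing))
```

On this representation the step from depth two to depth three involves nothing new in kind, which is the force of the arithmetic analogy in the second objection: a system that has the relevant principle should not need revising at each further level of embedding.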
Enough, then, of externalist or ‘representationalist’ and clearly non-naturalistic efforts to deal with language and its meaning. In a bit more detail, how does one proceed to construct a naturalistic theory of meaning for natural languages? Some of the details are in the main text; some prospects for ways to proceed are also found in Appendix V. In the interests of not prolonging an already overly long discussion, I will just outline some plausible-looking steps to take. They are plausible-looking in part not just because they do try to take the facts about language and its acquisition into account and adopt standard naturalistic scientific methodology, but because there has already been some progress on the way to constructing such a ‘science of linguistically expressed meaning.’
1. An early step is settling on the methods to pursue and facts to take into account in constructing a science of meaning. The methods, as suggested, are those of naturalistic scientific research: to aim for descriptive and explanatory adequacy, objectivity, simplicity (where available), and possible accommodation to another science (here, surely biology and any physiochemical, physiological, and computational constraints that might apply to linguistically expressed meanings and their growth/development in a human). And one must over time make genuine progress in one or more of these dimensions. No other standards are worth pursuing if one wants anything like a genuine theory, and no other is worth attempting, judging by the unfortunate results of insisting that in studying the mind and language, one must do something else – default to some form of behaviorism. Its sole ‘advantage’ is that anyone can understand it; it is not simple in the way scientific theories are, however, but simple in a considerably less complimentary way.
Surely the right course is to pursue a methodology that has yielded good results in the science of mind elsewhere (vision, for example), and in linguistic syntax, morphology, phonology, formal semantics (construed in an internalist, syntactic way), phonetics, and aspects of formal pragmatics – particularly since at least syntax and morphology are directly involved in determining linguistically expressed meaning at the language system's semantic interface. The methodology applied to the mind is internalist: it apparently works only where one focuses on the internal operations and ‘outputs’ of internal systems that develop or grow automatically (because they are natural ‘organs’) and that ‘interface’ with other internal systems. The methodology apparently does not work with the acts and actions of a person. As for the relevant facts, at a general level, they include the creative aspect of language use observations as well as the poverty of the stimulus ones, the latter not only for the progress made in syntax and phonology by respecting them and deciding to not only look for a ‘natural’ system located in the head, but because the rate of acquisition of lexical items is remarkably swift with no apparent training involved, and because infants clearly understand many of the concepts expressed in natural languages before they can speak or sign (express) the concepts themselves. 
At a finer-grained level, the facts include the richness and human interest-focused natures of commonsense concepts (making their swift acquisition all the more remarkable), the open-ended nature of concept and lexical acquisition and the ease with which individuals (including the very young) manage them, an apparent difference between human and non-human concepts, the degree of flexibility that our conceptual systems have available in them (due perhaps in part to morphological operations), the apparent universality (assuming acquisition) of remarkably rich concepts, the facts of polyadicity and its limits, and the like.
2. The next two stages consist of choosing how to place the study of meaning within an existing naturalistic research enterprise. Assuming that the theory at issue aims to offer a naturalistic account of linguistically expressed meanings, start by coming to an understanding of what the fundamental ‘atoms’ of meanings are. Any scientific effort – perhaps because of the natures of our minds, as mentioned in the main text – proceeds by looking for fundamental elements and assigning them the properties they need to have to provide an adequate theory. At the same time, come to an understanding of how these elements are put together to yield the complex meanings that sentences express. As it stands, doing the latter amounts to adopting an adequate version of the architecture of the language faculty. One of the simplest and easiest to understand is found in the idea that meaning ‘information’ is lodged in ‘words’ or lexical items (or some other technical notion with similar effect) in some way and that syntactic operations (Merge, at least) combine these to provide the complex form of meaning ‘information’ in sententially expressed ‘concepts’ provided at what Chomsky calls the “conceptual-intentional interface.”
Then decide on the scope of the theory in a way that respects the poverty and creativity observations, and any other basic facts that a serious theorist must take into account. To this end, choose an internalist approach. That is why I put scare-quotes around ‘information’ in the last paragraph. The word invites in many people's minds an intentional reading, which for them implies a referential reading. Fodor, mentioned above, is one. To try to avoid that implication, it might help to use Chomsky's technical-sounding term “semantic feature,” although this too can invite a referential reading because of ‘semantic.’ And “meaning feature” can do the same. So in what follows, I stipulate: ‘semantic feature’ and the other terms mentioned here are to be read non-intentionally, and ‘computation’ does not track truth or truth-conditions, as some insist it should; it must only somehow meet the conditions set by the systems with which language ‘interfaces,’ and by the semantic interface(s) in particular. Thus, these terms are to be read as semi-technical terms that, at the least, serve to distinguish the kind of information that – if a computation/derivation is successful – plays a role at the “semantic interface(s)” SEM rather than the “phonetic interface” PHON. Intuitively, then, semantic, phonological, and formal ‘information’ is placed in lexical items, or in some other way inserted into a sentential computation, and the computation/derivation offers the relevant kinds of information at the relevant interfaces to other systems.
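The architecture just outlined can be pictured with a toy sketch in Python. It is an illustration only, not a statement of Chomsky's formal system. It assumes, as one common textbook gloss has it, that Merge forms an unordered two-member set, and it assumes that a lexical item can be modelled as a bundle of feature labels. The class name LexicalItem, the function names, and the feature labels in capitals are all hypothetical placeholders, to be read non-intentionally in the sense stipulated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalItem:
    """Hypothetical bundle of features; the label strings are placeholders."""
    phonological: str
    semantic: frozenset
    formal: frozenset = frozenset()


def merge(a, b):
    """Combine two syntactic objects into an unordered two-member set."""
    return frozenset((a, b))


def semantic_features(obj):
    """Pool the semantic features of every lexical item inside obj.

    Pooling is a deliberate oversimplification: it ignores the structure
    that the derivation builds and only shows where the features come from.
    """
    if isinstance(obj, LexicalItem):
        return set(obj.semantic)
    pooled = set()
    for part in obj:
        pooled |= semantic_features(part)
    return pooled


the = LexicalItem("the", frozenset({"DEFINITE"}))
book = LexicalItem("book", frozenset({"ARTIFACT", "CONTAINS-INFORMATION"}))
read = LexicalItem("read", frozenset({"ACTIVITY"}))

noun_phrase = merge(the, book)          # {the, book}
verb_phrase = merge(read, noun_phrase)  # {read, {the, book}}

print(sorted(semantic_features(verb_phrase)))
```

Nothing in the sketch refers to anything outside the data structure: the ‘information’ offered at the end is whatever the lexical items contributed, which is the internalist point. A separate function could collect the phonological fields in the same way to stand in for what is offered at PHON.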
3. That much is basic; the decisions in (2) are difficult to reverse because that would risk abandoning assumptions that have worked and proven fruitful in advancing naturalistic research into the mind so far. After that, decisions about what kind of theory to construct – which hypotheses to offer and find data concerning – reflect issues disputed by those working in the framework chosen in (2). For example, should one adopt what Hagit Borer (2005) calls an “endoskeletal” account of computation, where the relevant ‘information’ concerning a computation and how it proceeds is contained in some selection of lexical items, or instead adopt as I did above an “exoskeletal” account that assigns a ‘package’ of semantic features a status as noun or verb as computation proceeds, and if verb, assigns an adicity (the number of ‘arguments’ or nouns in “referring positions” (meaning by this in positions in sentences where they could be used to refer/where they have a case assignment)). Choosing one over another, one must make other choices consistent with one of the options. On an exoskeletal account, for example, the semantic information in a package of semantic features will be something that can in one form of computation get read as having the form of a noun, and in another, the form of a verb. And so on. Another decision to make is whether one treats the semantic information in a lexical item as compositional itself, or whether to conceive it as essentially ‘atomic,’ a morphological root that even in an account of concept/lexical semantic feature acquisition is not put together out of more basic features, but is one of the many thousands of ‘root’ concepts that humans can acquire. If one chooses the other option, one can explore the possibility that although from the point of view of syntax/morphology a lexical semantic root is treated as atomic, from the point of view of acquisition, it is composed. There is some discussion of this in the main text. 
Then there is the matter of how to conceive of the way in which semantic composition takes place. Does it have recourse in a fundamental way to notions such as truth, or does it proceed in a fully internalist manner? “Functionist” accounts are popular (Heim & Kratzer 1998) and many of them do seem, at least, to give a central role to truth, making them less than desirable for someone who wants to proceed in an internalist way, and they also have problems in explaining why there seem to be limits on the adicity of natural language verbs. There are alternatives that are more nearly internalist, and that do speak to other problems with the functionist approach. One is found in Pietroski's (2005) internalist version of Davidsonian-based event semantics joined to Boolos-style second-order quantification. Another possible move is to adopt one or another form of functionist semantic theory and model-theoretic work but denature them of any of their (supposed?) representationalist aspects. For reference, Chomsky (1986) pointed to a denaturing move mentioned above: adopt a model-theoretic way of saying what ‘appears’ at SEM in the way of semantic information, and continue to use terms such as “refer,” but read them as “Relation R,” where R is “reference,” but taken to be a ‘relation’ defined over mental models. It is not clear what to do about truth if one wants to keep anything like ‘real’ truth (likely impossible, I suspect). Model theory allows it to be denatured too: truth becomes largely stipulative. And there is an advantage in denaturing both: one gets an easy way to conceive of how to appropriate much of the machinery of formal semantics and appropriate the insights – and they are real – of the many who work in formal semantics now. But one can do that in other ways too. I will not mention other disputed issues.
4. The last stage (and really, the idea that there are stages or steps involved is a fiction; theory-construction is usually a matter of proceeding on all fronts at once, although attending to what must be dealt with at any given time) is trying to speak in at least a preliminary way to basic questions that no one has much of a clue about yet. Chomsky in the text says this of human concepts, and he is right: they are, as we have seen, a puzzle. However, the naturalistically inclined theoretician can do at least some things. For one, restricting discussion to linguistically expressed concepts focuses attention on what appears at the semantic interface, and what is needed at that place where a sentence/expression offers information to other systems (or places, if one adopts a staged or “phased” conception of computation). By assuming this, and by assuming (see pp. 28–29 and commentary) that the semantic interface works much as the phonetic one does, so that there is no need to link linguistically expressed meaning information to a separate concept (in the manner of Fodor with his “language of thought,” for example), we could call the semantic information at SEM a “linguistically expressed concept,” period. In a connected vein, another thing to do (also anticipated above) is acknowledge that while from the point of view of morphology and syntax a lexical concept is ‘atomic,’ from the point of view of the acquisition of the concepts expressed in lexical items themselves, they could be treated as decomposable. Neither of these moves says what the basic elements of concept acquisition are, what human linguistically expressed concepts in general are, and how either of these might have evolved. As Chomsky points out in the text in connection with the question of the evolution of language, these are very likely matters that we cannot now even begin to address. 
We cannot because – unlike the state of “narrow syntax” in its minimalist form, which is sufficiently strong to begin to address these matters – there really are no respectable theories of concepts yet in place.
In addition, even when we can try to address issues for the study of linguistically expressed concepts in a reasonable way, lying in wait are the kinds of problems that Richard Lewontin emphasized in his (1998). With the possible exception of a plausible saltational account of evolution of the sort that Chomsky discusses in the main text, where it is possible to gather at least some paleoanthropological and archaeological evidence (and the like), it is extremely difficult to conceive of how one could gather any evidence for the evolutionary development of human conceptual capacities – capacities that on the face of it are very different from those available to other creatures.
I am very grateful for discussion of these issues not only to Chomsky, but to Paul Pietroski and Terje Lohndal at the University of Maryland, and to some students at McGill: Oran Magal, Steve McKay, Lauren de la Parra, Lara Bourdin, and Tristan Tondino.