Noam Chomsky's Contribution to Linguistics: A Sketch - Oxford Handbooks

PRINTED FROM OXFORD HANDBOOKS ONLINE. (c) Oxford University Press, 2015. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy).

date: 26 July 2016

Noam Chomsky's Contribution to Linguistics: A Sketch

Abstract and Keywords

This sketch attempts to convey the magnitude of Chomsky’s contribution to linguistics by comparing his initial formulation of generative grammar with his structuralist predecessors’ approach to syntax and then comparing that formulation to the current perspective. In the intervening six decades, Chomsky: (a) constructed a formal theory of grammar and explored its foundations; (b) developed a cognitive/epistemological interpretation of the theory, leading to the biolinguistic perspective; (c) contributed major proposals for constraints on grammars resulting in a significant reduction in and simplification of the formal grammatical machinery; and (d) re-evaluated the theory of grammar in terms of language design, raising the possibility of empirical proposals about the language faculty as a biological entity with properties of economy, simplicity, and efficient computation. In redefining the science of language (a–d), Chomsky has wrought a revolution without precedent in the history of linguistics.

19.1 Preliminaries1

Over the past sixty years Noam Chomsky has produced a body of work on the study of language that includes over twenty-five books (some in their second and third editions) and over 100 articles in journals and book chapters. His work ranges from detailed technical studies of formal grammar (in phonology as well as syntax, see Chomsky and Halle 1968) to trenchant general discussions of foundational issues in philosophy and psychology concerning language and mind. In formal grammar, Chomsky pioneered the first work on (modern) generative grammar, laying the groundwork in his 1949 University of Pennsylvania undergraduate honours thesis on the morphophonemics of Modern Hebrew (revised and expanded as a 1951 master's thesis, a further revised 1951 version published 1979, henceforth MMH) and then developing the formal foundations with applications to English syntax in a 913 page manuscript, The Logical Structure of Linguistic Theory (henceforth LSLT, written 1955–6 and published in a somewhat shortened version in 19752), which included a detailed theory of transformations. Since then and up to the present day, Chomsky has continued to develop the (p. 440) theory of grammar, refining the formulations of grammatical mechanisms and the constraints on their operation and output as well as raising challenging general questions for linguistic theory (most recently in the form of the Minimalist Program of the past two decades). From the outset, Chomsky connected his work in linguistics to questions of epistemology and mind in philosophy and to learning and cognition in psychology, initially in his famous 1959 critique (1959b) of the behaviorist account of language and its acquisition in Skinner's Verbal Behavior (1957). In later years he extended that critique of an extreme form of empiricism to less extreme mainstream empiricist approaches to the study of language, including Wittgenstein and Quine (see e.g. Chomsky 1969 and more recently Chomsky 2000c: ch 3). 
In contrast, Chomsky's work engages the perspective of mentalism while rejecting dualism3 and resurrects a theory of innate knowledge—Universal Grammar—to explain how speakers come to know so much about the language they speak that cannot be explained solely on the basis of their limited and impoverished experience (the contribution of their environment).

Chomsky has presented this work on generative grammar within the context of an intellectual history spanning several centuries. His historical commentary begins with a discussion of the seventeenth-century Cartesian roots of modern generative grammar (see Chomsky 1966/2009, 1968/2006) and expands to the history and philosophy of science more generally, focusing on the naturalistic approach from Newton to the present, ‘the quest for theoretical understanding, the specific kind of inquiry that seeks to account for some aspects of the world on the basis of usually hidden structures and explanatory principles’ (Chomsky 2000c: 134). His discussion shows how the study of language falls within the scope of naturalistic inquiry (hence the natural sciences), contrary to much philosophical debate in the opposite direction (see again Chomsky 2000c). Part of his achievement has been to align modern linguistics with the natural sciences, in particular as a sub-part of human biology, biolinguistics.

Chomsky's work has consistently moved the technical study of linguistic structure forward as well as deepened the general understanding of how the study of language fits into the larger investigation of human cognition (cognitive science) from the perspective of natural science. In the history of linguistics this achievement is unparalleled.

The first formulation of generative grammar developed in response to a suggestion by Chomsky's professor Zellig Harris that he construct a structural analysis of some language. Chomsky describes the origin of this work as follows:

Harris suggested that I undertake a systematic structural grammar of some language. I chose Hebrew, which I knew fairly well. For a time I worked with an informant and applied methods of structural linguistics as I was then coming to understand them. The results, however, seemed to me rather dull and unsatisfying. Having no very clear idea as to how to proceed further, I abandoned these efforts (p. 441) and did what seemed natural; namely, I tried to construct a system of rules for generating the phonetic forms of sentences, that is, what is now called a generative grammar. I thought it might be possible to devise a system of recursive rules to describe the form and structure of sentences, recasting the devices in Harris's methods for this purpose, [footnote omitted] and thus perhaps to achieve the kind of explanatory force that I recalled from historical grammar [footnote omitted, RF]. (LSLT p. 25)4

The methods Chomsky abandoned were basic taxonomic procedures of segmentation and classification applied to a limited, though supposedly representative, linguistic corpus.5 Harris describes these analytic procedures in Methods of Structural Linguistics (1951), which Chomsky read as a manuscript in 1947 as his introduction to modern linguistics (see Harris's preface dated 1947, which acknowledges Chomsky's help with the manuscript).

This volume presents methods of research used in descriptive, or more exactly, structural, linguistics. It is thus a discussion of the operations which the linguist may carry out in the course of his investigations, rather than a theory of the structural analyses which result from these investigations. The research methods are arranged here in the form of the successive procedures of analysis imposed by the working linguist upon his data. (1951: 1)

Harris applies these operations bottom-up from phonemes to morphemes on up to the level of the utterance. These operations yield a grammar of lists, taking the term ‘grammar’ to be some characterization of a language. In §20.21 Harris identifies two contrasting purposes of linguistic analysis: stating ‘all the regularities which can be found in any stretch of speech, so as to show their interdependences (e.g. in order to predict successfully features of the language as a whole)’ vs synthesizing ‘utterances in the language such as those constructed by native speakers’ on the basis of some minimal information (p. 365). The procedures presented in Methods are most naturally compatible with the former purpose. Moreover, as Chomsky notes in the 1975 introduction to the published version of LSLT, ‘Harris did not elaborate on the suggestion (p. 442) that a grammar can be regarded as a device for “synthesizing utterances,” an idea that does not, strictly speaking, seem compatible with the general approach of Methods’ (p. 50 n. 45).

The formulation of generative grammar in MMH constitutes ‘a theory of structural analyses’ that Harris excludes as a focus in Methods.6 Furthermore, its fundamental purpose is to synthesize utterances, ‘the process of converting an open set of sentences—the linguist's incomplete and in general expandable corpus—into a closed [footnote omitted] set—the set of grammatical sentences—and of characterizing this latter set in some interesting way’ (MMH p. 1), what Chomsky characterizes as ‘a process of “description”,’ bypassing the ‘process of “discovery” consisting of the application of the mixture of formal and experimental procedures constituting linguistic method’ (p. 1). The footnote regarding the set of grammatical sentences notes that this set is ‘not necessarily finite. Thus the resulting grammar will in general contain a recursive specification of a denumerable set of sentences’ (p. 67). This shows that Chomsky had connected the notion of recursive function and language from the beginning.

It is worth recalling that Chomsky initially evaluated his work on generative grammar as unrelated to scientific linguistics.

While I found this work intriguing and exciting, I thought of it as more or less of a private hobby, having no relationship to ‘real linguistics.’ Part of my skepticism about this work derived from the fact that as the grammar was improved in terms of its explanatory force and degree of generalization, it became more and more difficult to isolate anything approximating a phonemic representation, and I took for granted that phonemic analysis was the unchallengeable core of linguistic theory. More generally, I was firmly committed to the belief that the procedural analysis of Harris's Methods and similar work should really provide complete and accurate grammars if properly refined and elaborated. But the elements that I was led to postulate in studying the generative grammar of Hebrew were plainly not within the range of such procedures. (LSLT p. 29).

Chomsky notes that his evaluation of this work ‘was reinforced by the almost total lack of interest in MMH on the part of linguists whose work I respect’ (with the sole exception of Henry Hoenigswald) (p. 30).

The initial work on generative grammar in MMH Chomsky carried out virtually on his own, and for this he deserves sole credit. However, his work that followed benefited from the input of teachers, colleagues, and students (and their students, etc.)—in effect, a collaborative effort in a scientific enterprise that Chomsky's early work established in the 1950s.7 Although the discussion that follows will focus on the specific proposals in Chomsky's publications, it should be kept in mind that this work responds to (p. 443) developments in linguistics based partially on the work of others and therefore belongs to a collaborative enterprise.

The preliminary sketch that follows contains four parts. The first characterizes the state of linguistics before Chomsky's first formulation of generative grammar in MMH. The second outlines the basic form of generative grammar that he postulated in LSLT and Syntactic Structures (1957a, henceforth SS). The third explicates the psychological interpretation of grammar that he developed in 1958–9 (see Chomsky 1982: 62), published seven years later as the first chapter of Aspects of the Theory of Syntax. The fourth sketches the current formulation of generative grammar under the Minimalist Program.

19.2 Syntax Before Generative Grammar

To fully appreciate Chomsky's contribution to linguistics, it is necessary to consider the state of American linguistics prior to 1957, the year his initial groundbreaking work reached a wide audience as the monograph SS. Consider for example the 1955 first edition of Henry A. Gleason's An Introduction to Descriptive Linguistics, a standard introduction to linguistics at that time. Linguistics is defined as ‘the science which attempts to understand language from the point of view of its internal structure’ (p. 2). The focus of discussion is primarily on phonology and morphology, concerned with phonemes and morphemes. Syntax occupies one thin chapter sandwiched between five chapters on morphology (‘The Morpheme,’ ‘The Identification of Morphemes,’ ‘Classing Allomorphs into Morphemes,’ ‘Outline of English Morphology,’ and ‘Some Types of Inflection’) and a sixth (‘Some Inflectional Categories’). In chapter 5, morphology and syntax are linked together as ‘not precisely delimitable’ subdivisions under the heading of grammar (following Bloomfield 1914: 167, see also Bloomfield 1933: 184), which might account for the sandwiching of the chapter on syntax between two chapters about inflection.

At the beginning of the chapter on syntax, Gleason provides the following definition:

Syntax may be roughly defined as the principles of arrangement of the constructions formed by the process of derivation and inflection (words) into longer constructions of various kinds. (1955: 128)8

(p. 444) What these principles are is not made explicit. Gleason begins with a ‘first hypothesis’ that every word in an utterance ‘has a statable relationship to each other word. If we can describe these interrelationships completely, we will have described the syntax of the utterance in its entirety’ (p. 129).9 To this end, Gleason introduces the concepts of construction (‘any significant group of words (or morphemes)’), constituent (‘any word or construction (or morpheme) which enters into some larger construction’), and immediate constituent (IC) (‘one of the two, or a few, constituents of which any given construction is directly formed’) (pp. 132–3).10 He goes on to identify word order (§1.13) and constituent classes (§1.14) as universal syntactic devices, defining the latter as ‘any group of constituents (words or constructions) which have similar or identical syntactic function.’11 However, none of this material solves what Gleason characterizes in §10.8 as ‘the basic problem of syntax’: ‘to establish a method of finding the best possible organization of any given utterance and of insuring comparable results with comparable material’ (p. 132). According to his assessment, ‘unfortunately, the methodology has not as yet been completely worked out in a generally applicable form. Moreover, some of the best approximations to a general theory are beyond the scope of an introductory text’ (p. 132). No references to these best approximations are given.

Compared to SS, the discussion of syntax in Gleason's 1955 textbook barely scratches the surface. In striking contrast, his 1961 revised edition, which references Syntactic Structures and thanks Chomsky for comments on the manuscript, discusses phrase structure rules and transformations, and contains three chapters on syntax (p. 445) (‘Immediate Constituents,’ ‘Syntactic Devices,’ and ‘Transformations’) followed by ‘Language and Grammars.’

19.3 The Birth of Modern Generative Grammar

In marked contrast to the structuralist definitions of syntax cited above, Chomsky's definition in SS focuses instead on the grammatical mechanisms for constructing sentences. ‘Syntax is the study of the principles and processes by which sentences are constructed in particular languages’ (SS p. 11). One of the goals of syntactic investigation of a given language is ‘the construction of a grammar that can be viewed as a device of some sort for producing the sentences of the language under analysis’ (SS p. 11). In effect, Chomsky's definition shifts the focus of syntactic investigation from languages (viewed as a collection of sentences, superficially as a set of phonetic forms) to grammars, the mechanisms by which sentences of a given language are constructed. This is clarified further in Chomsky and Miller (1963).

It should be obvious, however, that a grammar must do more than merely enumerate the sentences of a language (though, in actual fact, even this goal has never been approached). We require as well that the grammar assign to each sentence it generates a structural description that specifies the elements of which the sentence is constructed, their order, arrangement, and interrelations and whatever other grammatical information is needed to determine how the sentence is used and understood. A theory of grammar must, therefore, provide a mechanical way of determining, given a grammar G and a sentence s generated by G, what structural description is assigned to s by G. If we regard a grammar as a finitely specified function that enumerates a language as its range, we could regard linguistic theory as specifying a function that associates with any pair (G, s), in which G is a grammar and s a sentence, a structural description of s with respect to G; and one of the primary tasks of linguistic theory, of course, would be to give a clear account of the notion of structural description. (p. 285)

In this way, syntax is concerned with the grammatical mechanisms that construct structural descriptions for sentences, thereby specifying their internal structure.

The formulation of generative grammar developed in LSLT and SS contains two explicit grammatical mechanisms as the processes for constructing sentences: phrase structure rules and transformations. Phrase structure rules in a grammar are formulated as rewrite rules, where the grammar contains ‘a sequence of conversion statements “α→β,” where α and β are strings, and derivations are constructed mechanically (p. 446) by proceeding down the list of conversions’ (LSLT p. 190).12 The second appendix of SS provides the following simplified example (p. 111).13


(1) (i) Sentence → NP + VP

    (ii) VP → Verb + NP

    (iii) Verb → Aux + V

    (iv) NP → {NPsingular, NPplural}

    (v) NPsingular → T + N + Ø

    (vi) NPplural → T + N + S

    (vii) Aux → C (M) (have + en) (be + ing)

    (viii) T → the

    (ix) N → man, ball, etc.

    (x) V → hit, take, walk, read, etc.

    (xi) M → will, can, may, shall, must

Applying the rules in a sequence starting with the initial element ‘Sentence’ yields a set of strings, with a final string consisting entirely of terminal elements to which no phrase structure rule can apply. For example, one derivation of the sentence (2) from the grammar in (1) would produce a set of strings (3).

(2) The man takes the books.

(3) {Sentence, NP+VP, NP+Verb+NP, NP+Aux+V+NP, NPsingular+Aux+V+NP, NPsingular+Aux+V+NPplural, T+N+Ø+Aux+V+NPplural, T+N+Ø+Aux+V+T+N+S, T+N+Ø+C+V+T+N+S,…, the+man+Ø+C+take+the+book+S}

(p. 447) The ellipsis in (3) stands in for the subset of strings that would be produced by applying the lexical rules (viii–xi) one at a time to the string T+N+Ø+C+V+T+N+S. (3) constitutes a phrase-marker of the sentence which carries all the information about its constituent structure (see SS p. 87: n. 2, and LSLT §§53.2–54.1 for a more complicated and precise formulation). In practice, phrase markers are given equivalently as tree diagrams; in this case (3) would be represented as (4).

(4) [Tree diagram not reproduced: the phrase marker (3) drawn as a tree rooted in Sentence.]
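The mechanical character of such a derivation can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the optional elements of Aux in rule (vii) are omitted, the noun book is added to rule (ix) (the source covers it only by ‘etc.’), and a hypothetical CHOICES table picks among alternatives so that the derivation targets sentence (2). The names RULES, CHOICES, and derive are illustrative and do not come from SS or LSLT.

```python
# Grammar (1), kept in list order. Each rule pairs a symbol with its alternatives.
RULES = [
    ("Sentence",   [["NP", "VP"]]),
    ("VP",         [["Verb", "NP"]]),
    ("Verb",       [["Aux", "V"]]),
    ("NP",         [["NPsingular"], ["NPplural"]]),
    ("NPsingular", [["T", "N", "Ø"]]),
    ("NPplural",   [["T", "N", "S"]]),
    ("Aux",        [["C"]]),  # assumption: optional M, have+en, be+ing left out
    ("T",          [["the"]]),
    ("N",          [["man"], ["ball"], ["book"]]),  # "book" is an added assumption
    ("V",          [["hit"], ["take"], ["walk"], ["read"]]),
    ("M",          [["will"], ["can"], ["may"], ["shall"], ["must"]]),
]

# Hypothetical choice table: which alternative to use for the 1st, 2nd, ...
# occurrence of a symbol (left to right). Symbols not listed use alternative 0.
CHOICES = {"NP": [0, 1], "N": [0, 2], "V": [1]}


def derive(start="Sentence"):
    """Proceed down the rule list, rewriting one symbol occurrence per step."""
    line = [start]
    steps = [line]
    for lhs, alternatives in RULES:
        picks = iter(CHOICES.get(lhs, []))
        while lhs in line:
            i = line.index(lhs)  # leftmost remaining occurrence
            replacement = alternatives[next(picks, 0)]
            line = line[:i] + replacement + line[i + 1:]
            steps.append(line)
    return ["+".join(step) for step in steps]


if __name__ == "__main__":
    for string in derive():
        print(string)
```

Each printed line corresponds to one member of the set of strings in (3); the final line is written with the noun stem book followed by the plural morpheme S.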

In Chomsky's earliest formulations, C marks the position of the element combining person/number agreement and tense (present vs past) that ultimately attaches to the finite verbal element in the sentence, in this case the main verb.14 The separation of C from the verbal element it ultimately attaches to in phonetic form introduces a level of abstraction into the syntactic analysis of phrase structure, creating a distinction between abstract underlying form and concrete phonetic form. Chomsky motivates this analysis by demonstrating how it accounts for the distribution of periphrastic do in English. Chomsky's use of abstract structure to explain the patterns that occur in concrete phonetic form constitutes a large part of his legacy to linguistics.

In starting with the initial element Sentence, phrase structure grammar gives a top-down analysis whereby a phrase is subdivided into its constituent parts, each conversion specifying the immediate constituents of each phrasal element identified.15 In this way the phrase structure rules provide an analysis of the internal structure of sentences, both the basic linear order of the lexical elements and also their hierarchical structure.
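The two kinds of information mentioned here, linear order and hierarchical structure, can be made concrete with a small Python sketch. It encodes one plausible rendering of the phrase marker for (2) as nested tuples; the exact shape of the tree (in particular the intermediate NPsingular and NPplural nodes) is an assumption read off the rewrite rules, and the names TREE, terminals, immediate_constituents, and bracket are illustrative only.

```python
# A node is (label, child, child, ...); a bare string is an unexpanded symbol
# or a word. This tree is an assumed rendering of the phrase marker for (2).
TREE = (
    "Sentence",
    ("NP", ("NPsingular", ("T", "the"), ("N", "man"), "Ø")),
    ("VP",
     ("Verb", ("Aux", "C"), ("V", "take")),
     ("NP", ("NPplural", ("T", "the"), ("N", "book"), "S"))),
)


def terminals(node):
    """Linear order: the left-to-right sequence of leaves."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1:] for leaf in terminals(child)]


def immediate_constituents(node):
    """Hierarchy: the labels directly dominated by this node."""
    return [child if isinstance(child, str) else child[0] for child in node[1:]]


def bracket(node):
    """Labelled bracketing, an equivalent notation for the tree."""
    if isinstance(node, str):
        return node
    return "[" + node[0] + " " + " ".join(bracket(c) for c in node[1:]) + "]"


if __name__ == "__main__":
    print(terminals(TREE))
    print(immediate_constituents(TREE))
    print(bracket(TREE))
```

Reading the leaves from left to right gives the terminal string, while each node's children are its immediate constituents, so a single object carries both orderings.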

To get from the abstract underlying representation in (4) to the phonetic form of (2) involves the application of transformational rules, which are defined as mappings from one phrase marker to another. Under the grammar given in SS the derivation of (2) involves one transformation that has the effect of specifying C as a morpheme S (representing present tense and third person singular) and another (generally referred (p. 448) to as Affix Hopping) that attaches C to the verb take (yielding takes). Thus the syntactic derivation of the simplest sentences will involve transformations as well as phrase structure rules.
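The two obligatory transformations can be sketched in Python as operations on the terminal string. The sketch makes several simplifying assumptions: C is rewritten as S after a singular subject and as Ø after a plural one (the past tense option is ignored), an affix immediately followed by a verbal element is reordered as verbal element + affix + #, and the final spell-out step is a toy stand-in for the morphophonemic rules. All function and variable names are hypothetical.

```python
AFFIXES = {"S", "Ø", "past", "en", "ing"}
VERBALS = {"hit", "take", "walk", "read", "have", "be",
           "will", "can", "may", "shall", "must"}


def number_transformation(tokens):
    """Specify C from its context: S after NPsingular (ends in Ø), else Ø."""
    i = tokens.index("C")
    value = "S" if tokens[i - 1] == "Ø" else "Ø"
    return tokens[:i] + [value] + tokens[i + 1:]


def affix_hopping(tokens):
    """Reorder each affix + verbal pair as verbal + affix + word boundary."""
    out, i = [], 0
    while i < len(tokens):
        if (i + 1 < len(tokens) and tokens[i] in AFFIXES
                and tokens[i + 1] in VERBALS):
            out += [tokens[i + 1], tokens[i], "#"]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def spell_out(tokens):
    """Toy morphophonemics: handles only the suffixes S and Ø."""
    words = []
    for token in tokens:
        if token == "#" or token == "Ø":
            continue
        if token == "S":
            words[-1] += "s"
        else:
            words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    underlying = ["the", "man", "Ø", "C", "take", "the", "book", "S"]
    hopped = affix_hopping(number_transformation(underlying))
    print("+".join(hopped))
    print(spell_out(hopped))
```

Note that the symbol S does double duty, as in the grammar itself: it marks plural on the noun and third person singular present on the verb, and only its position in the string distinguishes the two.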

Under the earliest formulations of transformational generative grammar the phrase marker (3–4) also serves as the underlying structure of the passive construction (5) corresponding to (2).

(5) The books are taken by the man.

The derivation of (5) involves the application of a passive transformation as well as the two transformations required for the derivation of (2). The formulation of the passive transformation in SS, given in (6), is complex.

(6) Passive—optional:

Structural analysis: NP − Aux − V − NP

Structural change: X1 − X2 − X3 − X4 → X4 − X2 + be + en − X3 − by + X1


It applies optionally, as opposed to the other two transformations, which must apply and are therefore obligatory. Thus transformations must be designated as obligatory or optional. Furthermore, it produces multiple changes in the phrase-marker (4), inverting the positions of the two NPs it analyses and inserting two lexical elements, the passive auxiliary be+en and the passive by. As formulated, (6) is specific to English and linked to a specific construction, the passive.
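A minimal Python sketch may help to show how much a construction-specific rule of this kind does in a single step. It assumes the structural change has the form X1 − X2 − X3 − X4 → X4 − X2 + be + en − X3 − by + X1, which matches the description just given (the two NPs exchange positions, be+en is added to the auxiliary, and by is inserted); the representation of a factored string as a list of labelled segments is a hypothetical convenience.

```python
STRUCTURAL_ANALYSIS = ["NP", "Aux", "V", "NP"]


def passive(factors):
    """Apply the passive to a string factored into labelled segments.

    `factors` is a list of (label, tokens) pairs. Returns the transformed
    token list, or None when the structural analysis is not met.
    """
    labels = [label for label, _ in factors]
    if labels != STRUCTURAL_ANALYSIS:
        return None  # the transformation is inapplicable
    (_, x1), (_, x2), (_, x3), (_, x4) = factors
    # X4 - X2 + be + en - X3 - by + X1
    return x4 + x2 + ["be", "en"] + x3 + ["by"] + x1


if __name__ == "__main__":
    factored = [
        ("NP", ["the", "man", "Ø"]),
        ("Aux", ["C"]),
        ("V", ["take"]),
        ("NP", ["the", "book", "S"]),
    ]
    print("+".join(passive(factored)))
```

Because every term in the structural analysis is a constant, the rule applies only to strings that factor exactly as NP, Aux, V, NP; any other factorization is simply left alone.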

In deriving an active sentence and its passive counterpart from the same underlying representation, the transformational analysis provides a straightforward account of sentence relatedness. In the earliest transformational grammars, this extends to the relation between affirmative and negative sentences (e.g. (2) vs (7a)), and between declarative and interrogative sentences (e.g. (2) vs (7b)).


(7) a. The man does not take the books.

    b. Does the man take the books?

Compounding the three basic distinctions (active/passive, affirmative/negative, declarative/interrogative) yields at least eight possible outcomes (e.g. affirmative-active-declarative (2) vs. negative-passive-interrogative (8)), all of which are derived from the same underlying structure (3–4).

(8) Aren't the books taken by the man?

And given that (3–4) would also be the underlying structure for the wh-questions (9), the transformational account would extend to the relatedness between (2), (5), (7–8) and (9).


(9) a. Who takes the books?

    b. What does the man take?

(p. 449)

Thus in addition to a transformation that derives passive constructions, there would be transformations for deriving negative constructions and interrogative constructions—in the latter case one for yes/no questions and another for wh-questions.16

Another essential property of the early formulation of transformational grammars concerned the ordering of rules. For example, given (3–4) as the underlying phrase marker for the passive construction (5), if the transformation that determines the content of C applies before the passive transformation, then the deviant (10) results.

(10) *The books is taken by the man.

However, if the passive transformation applies first, then C is mapped onto Ø (representing the morpheme for third person plural and present tense), since the plural NP the books now precedes it, and the legitimate (5) is derived. Therefore early transformational grammars included statements about the necessary ordering of transformations.
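The effect of rule ordering can be checked with a small Python sketch that applies the same two transformations in both orders. It rests on several assumptions: the string is kept factored into four segments, the number rule reads the subject's number off the last symbol of the first segment, the word boundary symbol is omitted, and a toy FORMS table stands in for the morphophonemic rules. None of the names below come from SS or LSLT.

```python
AFFIXES = {"S", "Ø", "en"}
VERBALS = {"be", "take"}
# Toy spell-out table covering only the forms needed for this example.
FORMS = {("book", "S"): "books", ("man", "Ø"): "man",
         ("be", "S"): "is", ("be", "Ø"): "are", ("take", "en"): "taken"}


def number(segs):
    """C becomes S after a singular first NP (ends in Ø), otherwise Ø."""
    subject, aux, verb, rest = segs
    value = "S" if subject[-1] == "Ø" else "Ø"
    return [subject, [value if t == "C" else t for t in aux], verb, rest]


def passive(segs):
    """X1 - X2 - X3 - X4 becomes X4 - X2+be+en - X3 - by+X1."""
    x1, x2, x3, x4 = segs
    return [x4, x2 + ["be", "en"], x3, ["by"] + x1]


def pronounce(segs):
    """Hop each affix over a following verbal element, then spell out."""
    tokens = [t for seg in segs for t in seg]
    hopped, i = [], 0
    while i < len(tokens):
        if (i + 1 < len(tokens) and tokens[i] in AFFIXES
                and tokens[i + 1] in VERBALS):
            hopped += [tokens[i + 1], tokens[i]]
            i += 2
        else:
            hopped.append(tokens[i])
            i += 1
    words = []
    for t in hopped:
        if t in AFFIXES:
            words[-1] = FORMS[(words[-1], t)]  # attach affix to preceding stem
        else:
            words.append(t)
    return " ".join(words)


UNDERLYING = [["the", "man", "Ø"], ["C"], ["take"], ["the", "book", "S"]]
number_first = pronounce(passive(number(UNDERLYING)))
passive_first = pronounce(number(passive(UNDERLYING)))

if __name__ == "__main__":
    print(number_first)
    print(passive_first)
```

With the number rule applied first, agreement is fixed by the underlying subject the man and the deviant string results; with the passive applied first, agreement is computed against the derived subject the books.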

The early formulation of transformational grammar involved two distinct types of transformations. Those involved in the derivation of simple sentences like (2) and (5) applied to a single phrase marker, the singulary transformations. The derivation of complex and compound sentences required another kind of transformation that operated on pairs of phrase markers to produce larger phrase markers, the generalized transformations. These transformations instantiate the recursive property of grammars by means of which an infinite class of structures can be generated. The ordering of these two types of transformations in a derivation was expressed as an ordering statement called a transformation marker. In this way, the ordering of transformations constituted a central focus of the earliest generative grammars.

While phrase structure rules in (1) provide an explicit and complete account of immediate constituent structure (for a finite set of sentences), the passive transformation (6) does not. Thus transformations formulated in terms of strings as in (6) raise the problem of derived constituent structure. As Chomsky notes in §82.1 of LSLT:

It is necessary to study the internal effects of these transformations on strings in greater detail than we have done so far. The basic reason for this is that we must provide a derived constituent structure for transforms, for one thing, so that transformations can be compounded.

Each transformation contributes to derived constituent structure, so in derivations involving multiple transformations the compound effects on constituent structure need to be specified at each step. LSLT (p. 321) rejects the solution in which transformations are defined ‘in so detailed a fashion that all information about the constituent structure of the transform is provided by the transformation itself, and that any constituent hierarchy can be imposed on any transform by an appropriate transformation’ because (p. 450) it would make the definition of transformations ‘extremely cumbersome’ and also because some information is already provided by phrase structure grammar (e.g. the passive by-phrase which can be analysed as a PP). The alternative solution assumes the existence of general principles that determine derived constituent structure for the output of transformations.

Keeping to a minimal formulation of transformations, however, creates the possibility that transformations can (mis)generate deviant constructions. LSLT §95.3 cites the following examples, both of which would be derived from the same underlying phrase-marker:


(11) a. Your interest in him seemed to me rather strange.

     b. *Whom did your interest in seem to me rather strange?

Chomsky notes that the deviance of (11b) cannot be attributed to the stranded preposition in, because it does not create deviance in other constructions, as illustrated in (12) (LSLT p. 437).


(12) a. You lost interest in him (this year).

     b. Who did you lose interest in (this year)?

So the deviant (11b) is not blocked under the minimal formulation of the transformation that relocates the wh-phrase whom to clause-initial position from its underlying position where it occurs as the object of the preposition in. Consider the formulation of this rule in SS (rule 19 on p. 112), as given in (13) with details slightly modified for ease of exposition.


(13) Structural analysis: X − NP − Y

     Structural change: X1 − X2 − X3 → X2 − X1 − X3

In the structural analysis, X and Y are variables that range over strings (possibly the null string), whereas NP is a constant term. In this formulation, the displacement of NP to clause-initial position is over a variable, which without further constraints can represent a string of any size and internal structure. Note that the use of variables in the formulation of the wh-movement rule (13), which allows the misgeneration, contrasts with the formulation of the passive transformation (6), where all of the terms identified in the structural analysis are constants. One solution proposed in LSLT introduces a complicated constraint on both variables as part of the transformation, thus departing from minimal formulations. The alternative, developed several years later, starting with Chomsky's A-over-A Principle,17 is to keep to minimal formulations by adding general (p. 451) constraints (‘hypothetical universals,’ Chomsky 1964b: 931) that apply to the application of all transformations as part of a theory of transformations—i.e. not tied to specific transformations or specific languages.
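The overgeneration that follows from unconstrained variables can be seen in a deliberately crude Python sketch of (13). It assumes a flat list of words, treats only wh-words as candidates for the NP term, and ignores auxiliary inversion and do-support, so its outputs are word orders rather than finished sentences; the names WH_WORDS and front_np are hypothetical.

```python
WH_WORDS = {"who", "whom", "what"}  # assumption: only wh-words count as the NP term


def front_np(tokens):
    """Yield every output of X - NP - Y -> NP - X - Y.

    X and Y are unrestricted: any stretch of words, of any length and any
    internal structure, may precede or follow the fronted element.
    """
    for i, token in enumerate(tokens):
        if token in WH_WORDS:
            yield [token] + tokens[:i] + tokens[i + 1:]


if __name__ == "__main__":
    object_case = ["you", "lost", "interest", "in", "who"]
    subject_case = ["your", "interest", "in", "whom",
                    "seemed", "to", "me", "rather", "strange"]
    for source in (object_case, subject_case):
        for result in front_np(source):
            print(" ".join(result))
```

The rule fronts the wh-word out of the subject phrase (the pattern of (11b)) exactly as readily as out of the object phrase (the pattern of (12b)); nothing in its formulation distinguishes the two, which is why an additional constraint is needed.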

The model of grammar that emerges from LSLT contains a syntactic component consisting of two subcomponents, a phrase structure grammar and a set of transformations, where the output of the former serves as the input of the latter, as diagrammed in (14).

(14) PS rules → transformations →

The output of the transformational component serves as input to the rules of morphology and phonology, which determine the phonetic form (PF) of sentences. In later work beginning with Chomsky (1965), the outputs of these two syntactic subcomponents are treated as specific levels of representation: deep structure (later D-structure) derived from the application of phrase structure rules and surface structure (later S-structure) derived from the application of transformations. (See footnote 29 below for further discussion.)

Although there is no discussion of semantic analysis in LSLT, the first sentence of the preface distinguishes syntax from semantics as the major subdivisions of linguistic theory, the former being ‘the study of linguistic form’, which would naturally include phonology and morphology (p. 57). According to LSLT, ‘the goal of syntactic study is to show that the complexity of natural languages, which appears superficially to be so formidable, can be analyzed into simple components; that is, that this complexity is the result of repeated application of principles of sentence construction that are in themselves quite simple’ (a formulation that has been vindicated in a spectacular way in light of the evolution of the current theory: see §19.5 below). In contrast, semantics ‘is concerned with the meaning and reference of linguistic expressions,’ involving a study of how natural language, ‘whose formal structure and potentialities of expression are the subject of syntactic investigation, is actually put to use in a speech community.’ LSLT views ‘syntax and semantics as distinct fields of investigation,’ and therefore focuses on syntax as ‘an independent aspect of linguistic theory’ given that ‘how much each draws from the other is not known, or at least has never been clearly stated.’ Nonetheless, as Chomsky comments, ‘syntactic study has considerable import for semantics,’ unsurprising because ‘Any reasonable study of the way language is actually put to work will have to be based on a clear understanding of the nature of the syntactic devices which are available for the organization and expression of content.’18

SS develops the discussion of syntax and semantics by examining the question of how semantics might play a role in determining the form and function of a theory of linguistic structure (see esp. chapter 9). In §9.3, Chomsky observes that core semantic notions like reference, significance and synonymy play no role in the formulation of the syntactic processes that construct sentences with their structural analyses. In Chomsky (1975b) this observation is reformulated as the restrictive thesis that the formulation of syntactic rules excludes all core notions of semantics—which Chomsky designates ‘absolute autonomy of formal grammar’ (p. 91). This formulation presupposes that the primitive notions of linguistic theory can be separated into distinct categories of formal vs semantic, where ‘the choice of primitives is an empirical matter’ (p. 91 n. 33). However, the thesis does not deny the existence of systematic connections between linguistic structure and meaning. Rather, it ‘constitutes an empirical hypothesis about the organization of language, leaving ample scope for systematic form-meaning connections while excluding many imaginable possibilities’ (p. 92). Chomsky (1975b) examines a representative sample of challenges to the absolute autonomy of formal grammar, concluding ‘that although there are, no doubt, systematic form–meaning connections, nevertheless the theory of formal grammar has an internal integrity and has its distinct structures and properties, as Jespersen [1924] suggested’ (pp. 106–7).19

From the outset, Chomsky's goals in proposing generative grammar transcend the development of new grammatical tools for syntactic analysis (i.e. phrase structure rules and transformations). LSLT identifies two interrelated general goals: the construction of both grammars for particular languages and a formalized general theory of linguistic structure.20 SS extends the second goal to include an exploration of ‘the foundations of such a theory’ (p. 5). It also announces a further goal that, in light of the developments of the past two decades, now stands as visionary: ‘The ultimate outcome of these investigations should be a theory of linguistic structure in which the descriptive devices utilized in particular grammars are presented and studied abstractly, with no specific reference to particular languages’ (p. 11). It took more than four decades of research to begin to understand how this goal might be realized within the Minimalist Program (see §19.5). Thus the conceptual shift in focus from describing the internal structure of languages to constructing formal grammars and their underlying formal theory, which marks the advent of modern generative grammar, was the first step toward this goal. The second involved the psychological interpretation of grammars and also the theory of grammar as discussed in the first chapter of Chomsky (1965).

19.4 Knowledge of Language

Beyond the initial formulation of modern generative grammars and the beginnings of an underlying general theory, Chomsky provides an essential interpretation of grammars and the theory of grammar that placed linguistics at the centre of the cognitive revolution of the 1950s, which began to study human cognition in terms of computational models. Rather than treat a generative grammar as essentially an arbitrary conventional way to describe language data (cf. Harris 1951), the first chapter of Chomsky (1965) explores the interpretation of a generative grammar as a system of knowledge in the mind of the speaker.

Clearly, a child who has learned a language has developed an internal representation of a system of rules that determine how sentences are to be formed, used, and understood. Using the term ‘grammar’ with a systematic ambiguity (to refer, first, to the native speaker's internally represented ‘theory of his language’ and, second, to the linguist's account of this), we can say that the child has developed and internally represented a generative grammar, in the sense described (p. 25).

Furthermore, he proposes to treat linguistic knowledge as a separate entity of the faculty of language (henceforth FL), distinct from linguistic behavior—i.e. how that knowledge is put to actual use—thereby formulating a competence/performance distinction (see p. 4).

The interpretation of a generative grammar as a system of knowledge in the mind of a speaker raises four fundamental questions:21

(15)

  a. What is the system of knowledge?

  b. How is it acquired?

  c. How is it put to use?

  d. What are the physical mechanisms that serve as the basis for this system and its use?

This interpretation connects generative grammar to epistemology and the philosophy of mind as well as cognitive psychology. The last question follows through on the assumption inherent in the first three that language is an intrinsic part of human biology and thus generative grammar should be understood as part of biolinguistics.22

Chomsky's answer to the first question has always been relatively straightforward: a computational system and a lexicon. What has changed significantly over the past half century are the formulations of the two components (see the next section for discussion). The second question can be interpreted in two ways: how is language acquisition possible versus how is it actually accomplished step by step. The possibility question involves what has been called the logical problem of language acquisition. The problem arises from the fact that what speakers come to know about their language cannot all be explained solely on the basis of the linguistic data they have encountered. Part of this knowledge involves distinguishing deviant from grammatical utterances. For example, English speakers know that (11b) is deviant and not a novel utterance,23 a legitimate sentence of their language that they have not encountered before. English speakers can also make systematic judgements about relative deviance, where some constructions are judged to be more deviant than other deviant constructions. Consider for example (11b) compared to (16), which appears to be significantly worse.

(16) *Whom did your interest in him seem to rather strange?

In both cases, the linguistic data provided by the environment is not sufficient to explain the linguistic knowledge attained, a situation referred to as the poverty of the stimulus.24 Chomsky's solution to the logical problem of language acquisition is to posit an innate component of the computational system, which would therefore be universal across the species, designated as Universal Grammar (UG). UG accounts for whatever linguistic knowledge cannot be explained solely on the basis of experience. Thus generative grammar bears on the debate about nature versus nurture. Given that the linguistic systems humans acquire are unique to the species, this innate component constitutes a core part of the definition of human nature.

Chomsky describes language acquisition as a transition between mental states of the FL, where the mind of a child starts out in a genetically determined initial state and on exposure to primary language data changes to a steady state involving a specific generative grammar—i.e. a specific formulation of the computational system and a specific lexicon. To the extent that the initial state places limits on the form and function of the generative grammar attained in the steady state, it provides an underlying theory of grammar—that is, a theory of UG. In the early work on generative grammar Chomsky had suggested that a theory of linguistic structure could be achieved by ‘determining the fundamental underlying properties of successful grammars’ (SS p. 11). Thus it appeared that the first goal of linguistic research was the construction of generative grammars of specific languages from which the fundamental underlying properties could be discovered.25 However, in the context of language acquisition, research into the initial state of the language faculty could proceed independently of attempts to characterize the steady state (i.e. construct generative grammars of specific languages) by solving specific poverty of the stimulus problems in a variety of languages using general constraints on the form and function of the computational system. As it turns out, this research path has yielded the greatest progress.

The cognitive interpretation of grammar has also had a fundamental effect on the concept of a language. Consider the definition of language in work that predates this interpretation. The term ‘language’ was not explicitly defined in LSLT, but in the 1975 introduction Chomsky notes that the 1955–6 manuscript takes a language to be ‘a set (in general infinite) of finite strings of symbols drawn from a finite “alphabet”’ (p. 5). SS defines a language as a set of sentences ‘each of finite length and constructed out of a finite set of elements’ (p. 13). In Chomsky (1986) such characterizations are designated ‘as instances of “externalized language” (E-language), in the sense that the construct is understood independently of the mind/brain’ (p. 20). If E-language is the object of investigation, then ‘grammar is a derivative notion; the linguist is free to select the grammar one way or another as long as it correctly identifies the E-language’ (p. 20). In contrast, the cognitive interpretation focuses on the representation of language in the mind of the speaker, thus the steady state of the language faculty attained on exposure to primary language data. This involves UG and the generative grammar of the language derived from it. Chomsky calls this concept ‘internalized language’ (I-language). Chomsky (1995b) takes ‘I’ to refer to ‘individual’ and ‘intensional’ as well as ‘internal.’ An I-language is a physical object in the world, a finite grammar consisting of a computational system and a lexicon. From this perspective the notion of a language as a set of sentences plays no role. Furthermore, the concept of I-language does not rely on a notion of ‘an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance’ (Chomsky 1965: 3). This notion had been assumed in previous work, including Chomsky (1965), as standard for modern general linguistics.

19.5 Generative Grammar in 2011: Theory and its Evolution

In the past sixty years the landscape of generative grammar has undergone a radical transformation. The two grammatical mechanisms for creating linguistic structure have been reduced to one, eliminating phrase structure rules (Chomsky 1995a) and reducing transformations to their simplest formulation as single elementary operations (see Chomsky 1976, 1980). This reduction unifies phrase structure and transformational rules under the simplest combinatorial operation. As discussed below, this reduction has been facilitated by the development of a system of constraints on the operation and output of transformations, formulated as general principles of grammar (and thus part of a substantive proposal about the content of UG). Since the 1990s Chomsky and others have been concerned with refining and reducing these constraints from the perspective of economy conditions on derivations and representations (see Chomsky 1991), ultimately as principles of efficient computation. The goal of this work is to show how the computational system for human language, a central part of a biological language faculty, incorporates properties of economy, simplicity, and efficiency—thereby revealing the optimal nature of language design.

As a concrete illustration, consider the derivation of (2) above. The structural analysis is derived via the elementary operation Merge,26 which combines constituents (lexical items and/or constituents formed from lexical items) to form a new syntactic object with a syntactic label that matches the label of one of the constituents of the combination. Like generalized transformations of the earliest theory, this operation maps pairs of syntactic objects onto a single object. Thus the and books will be merged to form a syntactic object labelled N and that construct will be merged with take to form a phrase labelled V. The lexical item in a phrase that determines its label constitutes the head of the phrase.27 Assuming that every phrase is endocentric (i.e. has a unique head), the derivation of (2) under Merge yields (17) where T stands for tense and contains tense and agreement features that are ultimately attached to the verb take (cf. Affix Hopping in §19.3).

(17) [Tree diagram of the structure of (2) derived by Merge; figure not reproduced]

The category labels D, N and V are inherent features of the lexical items indicated orthographically as the, man, take, and books. The designations NP, TP, and VP indicate the maximal phrasal projection of the categories N, T, and V. In contrast to phrase structure rules, which construct a set of strings whose hierarchical structure is derived by an interpretive procedure of comparing adjacent pairs of strings in a phrase marker, Merge constructs the hierarchical structure directly but not a set of strings as in (3). Furthermore, under this model the lexicon is separated from the computational system as an autonomous component of the grammar (cf. the lexical phrase structure rules in (1)).28
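The labelling behaviour of Merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Chomsky's formalization: the names `LexicalItem`, `Phrase`, `merge`, and the `projecting` argument are hypothetical, and the choice of which constituent projects its label is supplied by hand, since the text gives no labelling algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalItem:
    form: str      # orthographic form, e.g. "books"
    category: str  # inherent category feature, e.g. "N"


@dataclass(frozen=True)
class Phrase:
    label: str         # label projected from one of the merged constituents
    head: LexicalItem  # the lexical item that determines the label
    parts: tuple       # the pair of syntactic objects that were merged


def label_of(so):
    """Category of a lexical item, or projected label of a phrase."""
    return so.category if isinstance(so, LexicalItem) else so.label


def head_of(so):
    """A lexical item is its own head; a phrase records its head."""
    return so if isinstance(so, LexicalItem) else so.head


def merge(x, y, projecting):
    """Map the pair (x, y) onto a single object labelled by `projecting`.

    Assumption: the caller says which of x or y projects.
    """
    if projecting is not x and projecting is not y:
        raise ValueError("the projecting constituent must be x or y")
    return Phrase(label_of(projecting), head_of(projecting), (x, y))


the = LexicalItem("the", "D")
books = LexicalItem("books", "N")
take = LexicalItem("take", "V")

nominal = merge(the, books, projecting=books)   # labelled N
verbal = merge(take, nominal, projecting=take)  # labelled V
print(nominal.label, verbal.label, verbal.head.form)
```

Because the structure is built directly as nested pairs, the hierarchy is available without any interpretive procedure over strings, and the lexical items live outside the combinatorial function, mirroring the separation of lexicon and computational system.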

Under Merge the derivation of the passive counterpart to (2) (i.e. (5)) involves the intermediate structure (18).

(18) [T were [VP [V taken [NP the books] ] [PP by [NP the man]]]]

The NP the books is merged with the verb taken as its logical object because it is in this position that the NP is assigned its semantic function by the verb. The NP the man is merged as the object of the passive P by, in which position it is interpreted as the logical subject of the verb. To derive (5) from (18), the NP the books must be merged with the phrase (18) to create the TP (19).

(19) [TP [NP the books] [T were [VP [V taken [NP the books] ] [PP by [NP the man]]]]]

Chomsky distinguishes this application of Merge as ‘internal Merge’ (IM) as compared to ‘external Merge’ (EM), which applies in the derivation of (18). Like EM, IM joins two syntactic objects X and Y to form a new single syntactic object. In the case of IM a copy of X is contained in Y, whereas with EM X is not a part of Y. As Chomsky notes, ‘Unless there is some stipulation to the contrary, which would require sufficient empirical evidence, both kinds of Merge are available for FL and IM creates copies’ (2008: 140). Taking minimal computation as an overriding principle, the copy theory of IM is the null hypothesis.

The copy of the NP the books in the verbal object position is relevant to interpretation, but not pronunciation (i.e. phonetic form) and therefore is deleted (via another elementary operation Delete29) from the syntactic representation of phonetic form (PF) that interfaces with the sensory-motor components of the mind/brain. This leads to two distinct interface representations, PF and LF (which connects with the conceptual/intensional components of the mind/brain), yielding a derivational model (20) where a derivation bifurcates at some point, one part producing PF and the other LF.30

(20) [Diagram of the derivational model, in which a derivation bifurcates into PF and LF; figure not reproduced]

This model captures the phenomenon of displacement, where a constituent is interpreted in a different syntactic position than the one in which it is pronounced, a phenomenon that may be unique to natural language.
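The contrast between EM and IM, and the way a lower copy is interpreted but not pronounced, can be made concrete with a small Python sketch. All names here (`SO`, `external_merge`, `internal_merge`, `pf_words`, `lf_words`) are hypothetical, the labels are supplied by hand, and Delete is modelled simply as skipping any object already pronounced higher in the structure; it is an illustration of (18) and (19), not a definitive implementation.

```python
class SO:
    """A syntactic object: a word (no parts) or a labelled pair of parts."""

    def __init__(self, label, parts=(), word=None):
        self.label = label
        self.parts = tuple(parts)
        self.word = word


def contains(y, x):
    """True if x is y itself or occurs somewhere inside y."""
    return y is x or any(contains(p, x) for p in y.parts)


def external_merge(x, y, label):
    """EM: neither object is a part of the other."""
    if contains(y, x) or contains(x, y):
        raise ValueError("EM requires two independent objects")
    return SO(label, (x, y))


def internal_merge(x, y, label):
    """IM: x is already contained in y, so x now occurs twice (two copies)."""
    if x is y or not contains(y, x):
        raise ValueError("IM requires x to be properly contained in y")
    return SO(label, (x, y))


def lf_words(so):
    """Every copy is visible for interpretation."""
    if so.word is not None:
        return [so.word]
    return [w for p in so.parts for w in lf_words(p)]


def pf_words(so, seen=None):
    """Only the first (highest) occurrence of an object is pronounced."""
    seen = set() if seen is None else seen
    if id(so) in seen:
        return []  # lower copy: skipped, standing in for Delete
    seen.add(id(so))
    if so.word is not None:
        return [so.word]
    return [w for p in so.parts for w in pf_words(p, seen)]


# Build (18) by EM, then (19) by IM of the NP "the books".
np_books = external_merge(SO("D", word="the"), SO("N", word="books"), "NP")
v = external_merge(SO("V", word="taken"), np_books, "V")
np_man = external_merge(SO("D", word="the"), SO("N", word="man"), "NP")
pp = external_merge(SO("P", word="by"), np_man, "PP")
vp = external_merge(v, pp, "VP")
t = external_merge(SO("T", word="were"), vp, "T")
tp = internal_merge(np_books, t, "TP")

print(" ".join(pf_words(tp)))  # the books were taken by the man
print(" ".join(lf_words(tp)))  # the books were taken the books by the man
```

The two printed strings differ only in the lower copy of the NP, which is the displacement effect: one constituent, two positions, one pronunciation.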

Replacing the passive transformation (6) with Merge eliminates the language-specific and construction-specific character of the operation that generates passive constructions. Employing IM generally for displacement phenomena eliminates transformations that compound operations (e.g. the double lexical insertion and the inversion of two NPs in (6)). Furthermore, it solves the problem of derived constituent structure because every application yields a distinct structure.

Merge generalizes to inter-clausal NP displacement, as illustrated in (21).31


(21)

  a. The man was reported to have taken the books.

  b. The man is likely to have taken the books.

In (21a–b) the NP the man is interpreted as the subject of taken, its predicate-argument function, although it is pronounced as the main clause subject. Thus in LF representation the infinitival subordinate clause has a covert NP subject the man, as illustrated in (22).


(22)

  a. [TP [NP the man] was [VP reported [TP [NP the man] to have taken the books ]]]

  b. [TP [NP the man] is [AP likely [TP [NP the man] to have taken the books ]]]

The infinitival clause TP functions as an argument of the verb reported and the predicate adjective likely in the same way as the finite subordinate clauses in (23).


(23)

  a. It was reported that the man had taken the books.

  b. It was likely that the man had taken the books.

The main clause subject position in these constructions is not assigned a predicate-argument function by the main clause predicate, as demonstrated by the occurrence of pleonastic and semantically null non-referential it in (23). Therefore, NP displacement in (21) involves one position that is assigned a predicate-argument function by a predicate and another that is not. This analysis also applies to cases of wh-displacement, as illustrated in (24).


(24)

  a. Which books did the teacher give to the student?

  b. The books which the teacher gave to the student were on the list.

  c. To whom did the teacher give the books?

  d. The student to whom the teacher gave the books is in this class.

The interrogative phrase which books and the relative pronoun which are assigned a predicate-argument function as the direct object of give, but the clause initial position in which they are pronounced is assigned no such argument function. A similar analysis holds for to whom. In this way Merge replaces a special construction-specific wh-movement transformation (as actually formulated in SS) by generalizing across all displacement phenomena.

The massive reduction of the grammatical machinery of LSLT/SS to essentially two elementary operations, Merge and Delete,32 was made possible by the development of a framework of constraints on the formulation of grammatical rules and on their operation in derivations and their output, the representations they produce. This system of constraints was postulated at the level of grammatical theory as principles of grammar, hence as part of UG. The development of this system focuses on the initial state of the language faculty (UG) (as opposed to complete grammars of particular I-languages), leading to the formulation of the Principles and Parameters framework in the late 1970s.33 According to Chomsky (1981a: 61), ‘The goal of research into UG is to discover a general system of principles and parameters such that each permissible core grammar is determined by fixing the parameters of the system.’ The system of principles in Chomsky (1981b) includes constraints on (i) predicate/argument structure (the θ-Criterion, see below), (ii) the occurrence of NPs with phonetic content (the Case Filter, see below), (iii) the occurrence of silent copies resulting from IM (a) in terms of the syntactic distance from their antecedents (the Subjacency Condition) and (b) in terms of syntactic configuration characterized by a relation of government (the Empty Category Principle), and (iv) the syntactic relations between pairs of NPs construed in an anaphoric relation (principles of binding). In addition to the set of principles, UG also involves a set of parameters that account for cross-linguistic variation. For example, some languages (e.g. Spanish and Italian) allow finite indicative clauses with covert pronominal subjects, whereas others (e.g. French, English, and German) do not; and further, some languages (e.g. French, Spanish, Italian, and German) allow yes/no interrogative constructions where the finite main verb occurs clause-initially, whereas English does not. Chomsky (1981a: 38) summarizes as follows:

The theory of UG must be sufficiently rich and highly structured to provide descriptively adequate grammars. At the same time, it must be sufficiently open to allow for the variety of languages. Consideration of the nature of the problem at a qualitative level leads to the expectation that UG consists of a highly structured and restrictive system of principles with certain open parameters, to be fixed by experience. As these parameters are fixed, a grammar is determined, what we may call a ‘core grammar.’

UG provides a finite set of parameters (Chomsky 1981b: 11). Furthermore, ‘The grammar of a particular language can be regarded as simply the specification of values of parameters of UG, nothing more’ (p. 31).34
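The idea that a core grammar is nothing more than a specification of parameter values can be pictured with a short Python sketch. It uses only the two points of variation mentioned above; the names `CoreGrammar`, `null_subject`, `verb_initial_yes_no`, `SETTINGS`, and `languages_with` are hypothetical labels chosen for illustration, not terms from the literature, and two binary switches obviously fall far short of a real parameter inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreGrammar:
    # Finite indicative clauses may have covert pronominal subjects.
    null_subject: bool
    # The finite main verb may occur clause-initially in yes/no questions.
    verb_initial_yes_no: bool


# Parameter values as described in the text for the five languages it names.
SETTINGS = {
    "Spanish": CoreGrammar(null_subject=True, verb_initial_yes_no=True),
    "Italian": CoreGrammar(null_subject=True, verb_initial_yes_no=True),
    "French": CoreGrammar(null_subject=False, verb_initial_yes_no=True),
    "German": CoreGrammar(null_subject=False, verb_initial_yes_no=True),
    "English": CoreGrammar(null_subject=False, verb_initial_yes_no=False),
}


def languages_with(**values):
    """Languages whose core grammar fixes the given parameters as specified."""
    return sorted(
        name
        for name, grammar in SETTINGS.items()
        if all(getattr(grammar, key) == value for key, value in values.items())
    )


print(languages_with(null_subject=True))          # ['Italian', 'Spanish']
print(languages_with(verb_initial_yes_no=False))  # ['English']
print(len(set(SETTINGS.values())))                # 3 distinct core grammars
```

With n binary parameters there are at most 2^n core grammars, which is one way to see why a finite set of parameters yields a finite space of attainable grammars.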

The system of general principles that has developed from the early 1970s sufficiently limits the operation and output of grammatical operations so that we can keep to their maximally simple formulations. Given the free application of Merge, various deviant constructions could otherwise be generated if not prohibited by general principles. For example, the failure of IM in the derivation of (21) could yield (25), where the main clause subject contains non-referential it.


(25)

  a. *It was reported the man to have taken the books.

  b. *It is likely the man to have taken the books.

The examples in (25) violate the Case Filter, which prohibits NPs containing phonetic features from occurring in a position that is not licensed for structural Case. The subject of the infinitival clause in (25) is not in a construction that licenses structural Case and therefore the NP the man violates the Case Filter. The computational system thus prohibits the generation of deviant constructions like (25). Free Merge can also misgenerate constructions like (26), where instead of non-referential it, an NP with semantic content is merged as the main clause subject.


(26)

  a. *The woman was reported that the man had taken the books.

  b. *The woman was likely that the man had taken the books.

In (26), the NP the woman is not assigned a semantic function by any predicate and therefore violates the part of the θ-Criterion that prohibits NPs with semantic content that are assigned no semantic function by any predicate. Both the Case Filter and the θ-Criterion function as conditions on representations.
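Treating the Case Filter and the relevant part of the θ-Criterion as conditions on representations means that each can be stated as a check over a finished structure. The Python sketch below shows the shape of such checks. It is only a schematic illustration: the class `NP`, its fields, and the functions `case_filter` and `theta_criterion` are hypothetical, and the feature values for each example are assigned by hand from the discussion above rather than computed from a structure.

```python
from dataclasses import dataclass


@dataclass
class NP:
    text: str
    overt: bool          # has phonetic features
    case_licensed: bool  # occupies a position licensed for structural Case
    referential: bool    # has semantic content (False for pleonastic "it")
    theta_role: bool     # is assigned a semantic function by some predicate


def case_filter(nps):
    """Return the overt NPs that sit in a position not licensed for Case."""
    return [n.text for n in nps if n.overt and not n.case_licensed]


def theta_criterion(nps):
    """Return the contentful NPs that receive no semantic function.

    Only the part of the criterion discussed in the text is modelled.
    """
    return [n.text for n in nps if n.referential and not n.theta_role]


# (21a) The man was reported to have taken the books.
ex_21a = [NP("the man", True, True, True, True),
          NP("the books", True, True, True, True)]

# (25a) *It was reported the man to have taken the books.
ex_25a = [NP("it", True, True, False, False),
          NP("the man", True, False, True, True),
          NP("the books", True, True, True, True)]

# (26a) *The woman was reported that the man had taken the books.
ex_26a = [NP("the woman", True, True, True, False),
          NP("the man", True, True, True, True),
          NP("the books", True, True, True, True)]

for name, nps in [("21a", ex_21a), ("25a", ex_25a), ("26a", ex_26a)]:
    print(name, case_filter(nps), theta_criterion(nps))
```

Each filter returns the offending NPs, so an empty pair of lists corresponds to a representation that passes both conditions.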

Comparing the legitimate constructions in (21) (where IM applies to the infinitival subordinate clause subject by creating a copy as the subject of the finite main clause) to the deviant constructions in (25) (where IM does not apply) demonstrates how Case motivates NP displacement—i.e. IM applies when it must. In contrast, consider (27) where unconstrained free Merge, as indicated in the analyses (27a.ii) and (27b.ii), produces another deviant result.


(27)

  a.

    i. *The man was reported (that) has taken the books.

    ii. [TP [NP the man] was [VP reported [CP (that) [TP [NP the man] has taken the books ]]]]

  b.

    i. *The man is likely (that) has taken the books.

    ii. [TP [NP the man] is [AP likely [CP (that) [TP [NP the man] has taken the books]]]]

In (27), because the NP the man is licensed for Case as the subject of the finite subordinate clause, displacement of the NP to subject position of the finite clause via IM is not motivated by the Case Filter. If the application of IM in (27) is not required by any other UG principle, then the deviance of (27a.i) and (27b.i) would follow from a basic economy-of-derivation assumption ‘that operations are driven by necessity: they are “last resort,” applied if they must be, not otherwise’ (Chomsky 1993: 31). Thus economy of derivations supports the minimal formulation of the grammatical operation Merge.

In tandem with economy constraints on derivations, Chomsky proposes a complementary constraint on representations: Full Interpretation (FI), which prohibits superfluous symbols in PF and LF, the two interfaces of syntax with systems of language use (see Chomsky 1986: 98, 1991: 437). Given that phonetic features are superfluous for the conceptual–intensional interface and that, correspondingly, semantic features are superfluous for the sensory–motor interface, FI requires a derivational point where phonetic and semantic features are separated onto distinct sub-paths via Spell-Out, thereby motivating the derivational model (20). Furthermore, FI subsumes the empirical effects of the portion of the θ-Criterion that prohibits (26), given that the NP the woman, having semantic content but filling no semantic function for any predicate in the sentence, would be presumably uninterpretable and hence superfluous at LF. Note also that the Case Filter might also be replaced by FI given that Case features on nouns are also uninterpretable under the analysis of Chomsky (1995b: ch. 4). Assuming that these features exist in the lexical entries for nouns that also have phonetic features, they must be eliminated during a derivation via a process of feature checking so that they will not occur in interface representations. The erasure of uninterpretable features results from an operation Agree that matches these features to corresponding interpretable features of another element in a construction.35 In this way, FI requires minimal representations and at the same time provides a principled motivation for certain computational operations.

Another phenomenon that bears on the issue of minimal computation concerns the syntactic distance between pairs of adjacent copies created by IM, where shorter distances are preferred. Consider for example the following paradigm involving NP displacement in complex sentences.


(28)

  a. It seems that it has been reported that the student had taken the books.

  b. It seems to have been reported that the student had taken the books.

  c. It seems that the student has been reported to have taken the books.

  d. The student seems to have been reported to have taken the books.

  e. *The student seems that it has been reported to have taken the books.

In the derivation of (28a) IM does not apply. This construction involves two syntactic positions to which a semantic function is not assigned, i.e. the subject of seem in the main clause and the subject of the passive predicate reported. This is demonstrated in (28a), where both positions are filled with pleonastic non-referential it. Also both positions can take displaced arguments, as illustrated in (28c) and (28d), where the NP the student is interpreted as the logical subject of the verb taken. (28b) shows that pleonastic elements can apparently also be affected by IM. However, displacement is blocked in (28e) where the distance between the subject of taken and the syntactic subject of seems crosses another subject position that does not contain a copy of the displaced NP. In contrast, the derivation of (28d) creates a chain of three copies of the NP the student where each link, consisting of a pair of adjacent copies, conforms to the shortest distance criterion. Chomsky (1995b) formulates this constraint as the Minimal Link Condition (MLC), interpreted ‘as requiring that at a given stage of a derivation, a longer link from α to K cannot be formed if there is a shorter legitimate link from β to K’ (Chomsky 1995b: 295). It follows from the MLC that long distance displacement (e.g. across multiple clauses) requires a series of local steps.36 Thus the derivation of (28d) applies IM twice, as illustrated in (29) where NP indicates a copy of the displaced NP the student.

(29) [TP The student seems [TP NP to have been reported [TP NP to have taken the books]]]
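The locality effect in (28) can be approximated with a very small Python check. This is a deliberate simplification and not Chomsky's formulation of the MLC: it assumes that every clausal subject position counts as a potential landing site, lists those positions from the highest clause to the lowest, and requires that adjacent copies in a chain occupy adjacent subject positions. The names `chain_links` and `obeys_mlc` and the string `"copy"` are hypothetical.

```python
def chain_links(subject_positions):
    """Pair up adjacent copies of the displaced NP.

    `subject_positions` lists the filler of each clausal subject position,
    highest clause first; "copy" marks a copy of the displaced NP.
    """
    indices = [i for i, filler in enumerate(subject_positions) if filler == "copy"]
    return list(zip(indices, indices[1:]))


def obeys_mlc(subject_positions):
    """A link may not skip over an intervening subject position."""
    return all(lower - upper == 1 for upper, lower in chain_links(subject_positions))


# (28c) It seems that the student has been reported to have taken the books.
print(obeys_mlc(["it", "copy", "copy"]))    # True
# (28d) The student seems to have been reported to have taken the books.
print(obeys_mlc(["copy", "copy", "copy"]))  # True
# (28e) *The student seems that it has been reported to have taken the books.
print(obeys_mlc(["copy", "it", "copy"]))    # False
```

The three-copy chain of (28d) passes because it consists of two local links, matching the two applications of IM shown in (29), whereas (28e) would need a single link that crosses the position filled by it.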

IM (as well as EM) applies here to successively larger structures, hence cyclically. Cyclic computation also contributes to the overall minimal character of the computational system.37 In Chomsky's most recent work (2007, 2008) cyclic computation is enforced by the No Tampering Condition (NTC) that restricts Merge to the edges of the syntactic objects it combines.

(30) NTC: Merge of X and Y leaves the two S[yntactic] O[bject]s unchanged. (Chomsky 2008: 138)

The NTC also constitutes a very narrow constraint on derived constituent structure, adhering to the principle of minimal computation and thereby contributing to a minimal account of representations. Note that the NTC entails the copy theory of IM.

Chomsky (2000b) introduces the Phase Impenetrability Condition (PIC) as another constraint on derivations that enforces a stronger form of cyclic computation.38 The PIC is based on a conception of derivations as having multiple points where syntactic objects are transferred to the interfaces via Spell-Out. The syntactic objects transferred are called ‘phases’; and ‘Optimally, once a phase is transferred, it should be mapped directly to the interface and then “forgotten”; later operations should not have to refer back to what has already been mapped to the interface—again, a basic intuition behind cyclic operations’ (Chomsky 2005: 17). As a result, derivation by phase imposes a high degree of locality, especially for the application of IM, and in this way contributes significantly to the goal of minimal computation.39

This focus on minimal computation derives most directly from Chomsky's concern for notions of simplicity and economy, which he had expressed on the first page and in the first footnote of MMH (and repeated almost verbatim in the second footnote of chapter IV in LSLT, again citing Goodman 1943 and adding Quine 1953):

It is important, incidentally, to recognize that considerations of simplicity are not trivial or ‘merely esthetic.’ It has been remarked in the case of philosophical systems that the motives for the demand for economy are in many ways the same as those behind the demand that there be a system at all. (MMH p. 114)

The proposal of economy conditions on derivations and representations in Chomsky 1991 served as a prelude to the formulation of a minimalist program for linguistic theory (henceforth MP) in Chomsky (1993) (written a year earlier).40

Chomsky (1995b: 9) formulates the MP as a research program that addresses two interrelated questions.


(31)

  a. To what extent is the computational system for human language optimal?

  b. To what extent is human language a ‘perfect’ system?

These questions are interrelated to the extent that an optimal computational system is a reasonable prerequisite for establishing the perfection of human language as a system. Answers to these questions require precise substantive interpretations of the adjectives optimal and perfect. As discussed above, the characterization of the computational system under the Principles and Parameters framework does appear to be optimal to the extent that it focuses on minimal computation in terms of the minimal formulation of grammatical mechanisms, the minimal function of these mechanisms in derivations, and the minimal nature of the representations they produce. These formulations also conform to basic notions of simplicity, economy, and efficiency of computation.

If it turns out that the computational system for human language is in fact optimal, then it could also be true that human language is a ‘perfect’ system in some precise sense. Chomsky (1995b: 228) provides a single criterion.

A ‘perfect language’ should meet the condition of inclusiveness: any structure formed by the computation (in particular, PF and LF) is constituted of elements already present in the lexical items selected for [the numeration] N; no new objects are added in the course of computation apart from rearrangements of lexical properties (in particular, no indices, bar-levels in the sense of X-bar theory, etc.).

This inclusiveness condition employs the lexicon to place significant constraints on the application and output of the computational system and thus contributes as well to minimal computation.

Ultimately the MP is an attempt to study the question of how well FL is designed, a new question that arises within the Principles and Parameters framework. ‘The substantive thesis is that language design may really be optimal in some respects, approaching a “perfect solution” to minimal design specifications.’41 These specifications concern the crucial requirement that linguistic representations are ‘legible’ to the cognitive systems that interface with FL, a requirement ‘that must be satisfied for language to be usable at all’ (Chomsky 2001: 1). Chomsky 2000b designates this as the strong minimalist thesis (SMT) and formulates it as (32).42

(32) Language is an optimal solution to legibility conditions. (p. 96)

Thus the SMT narrows the focus of UG to interface conditions, which may result in a significant reduction and simplification of UG.43 Chomsky 2008 explains:

If SMT held fully, which no one expects, UG would be restricted to properties imposed by interface conditions. A primary task of the MP is to clarify the notions that enter into SMT and to determine how closely the ideal can be approached. Any departure from SMT—any postulation of descriptive technology that cannot be given a principled explanation—merits close examination, to see if it is really justified. (p. 135)

From the perspective of the SMT there are three factors that affect the growth of language in the individual: data external to the individual (the contribution of experience), UG (the genetic endowment of the species), and principles that are not specific to FL. Chomsky (2004) subdivides UG into interface conditions (‘the principled part’) vs ‘unexplained elements’ (p. 106) and contrasts UG with general properties (the third factor, principles that are not specific to FL44), the latter elaborated in Chomsky (2005) as falling into two subtypes:

(a) principles of data analysis that might be used in language acquisition and other domains; (b) principles of structural architecture and developmental constraints that enter into canalization, organic form, and action over a wide range, including principles of efficient computation, which would be expected to be of particular significance for computational systems such as language. It is the second of these subcategories that should be of particular significance in determining the nature of attainable languages. (p. 106)

Based on this three-way contrast, Chomsky (2004) proposes another formulation of the SMT where UG contains no ‘unexplained elements’ and therefore all parts of UG are principled.

We can regard an account of some linguistic phenomena as principled insofar as it derives them by efficient computation satisfying interface conditions. We can therefore formulate SMT as the thesis that all phenomena of language have a principled account in this sense, that language is a perfect solution to interface conditions, the conditions it must at least partially satisfy if it is to be usable at all. (2004: 5)

Incorporating third factor considerations into the study of language leads to a second approach to the formulation of UG, as discussed in Chomsky (2007):

Throughout the modern history of generative grammar, the problem of determining the character of FL has been approached ‘from top down’: How much must be attributed to UG to account for language acquisition? The MP seeks to approach the problem ‘from bottom up’: How little can be attributed to UG while still accounting for the variety of I-languages attained, relying on third factor principles? The two approaches should, of course, converge, and should interact in the course of pursuing a common goal. (p. 4)

Approaching UG from below shifts ‘the burden of explanation from the first factor, the genetic endowment, to the third factor, language-independent principles of data processing, structural architecture, and computational efficiency, thereby providing some answers to the fundamental questions of biology of language, its nature and use, and perhaps even its evolution’ (Chomsky 2005: 9). This third factor approach to language design opens the possibility that methodological considerations of simplicity and economy might be recast as empirical hypotheses about the world, thereby establishing more substantive connections between general biology and the biolinguistic perspective at the core of the MP within the Principles and Parameters framework.

19.6 Summing Up

The preceding sketch has attempted to convey the magnitude of Chomsky's contribution to linguistics by comparing his initial formulation of generative grammar with his structuralist predecessors' approach to syntax and then comparing that formulation to the current perspective. In the intervening six decades, Chomsky:

  (a) constructed a formal theory of grammar (leading to the discovery of abstract underlying linguistic structure) and explored its foundations;

  (b) developed a cognitive/epistemological interpretation of the theory, leading to an understanding of human language as a component of mind/brain with substantial innate content, hence a part of human biology;

  (c) contributed a series of major proposals for constraints on grammars (ongoing from the beginning) that resulted in a significant reduction in and simplification of the formal grammatical machinery;

  (d) re-evaluated the theory of grammar in terms of questions about language design, raising the possibility of empirical proposals about the language faculty as a biological entity with properties of economy, simplicity, and efficient computation.

From the beginning Chomsky's work placed linguistics at the centre of the cognitive revolution of the 1950s (see Miller 2003) and established the importance of the field for related fields concerned with the study of human language (e.g. philosophy, psychology, anthropology, computer science, and biology).45 In redefining the science of language, Chomsky has wrought a revolution without precedent in the history of linguistics.


(1) I am indebted to Howard Lasnik for discussions of some of the material developed in this chapter and to Noam Chomsky, Terje Lohndal, Katy McKinney-Bock, Carlos Otero, and Jon Sprouse for comments on a previous draft.

(2) See Chomsky's (1975a) preface to LSLT for details and discussion of the background of the manuscript.

(3) See esp. his discussions of the mind/body problem, which he points out cannot be properly formulated because since Newton's demolition of the mechanical philosophy there is as yet no coherent theory of body (Chomsky 1988, 2000c, 2009). For a graphic demonstration, see Hoffmann (1993).

(4) The second footnote to this passage cites §56.2 for a brief mention of the historical analogy, which Chomsky credits as the source of his own work in generative grammar.

(5) To quote Harris: ‘The basic operations are those of segmentation and classification. Segmentation is carried out at limits determined by the independence of the resulting segments in terms of some particular criterion. If X has a limited distribution in respect to Y, or if the occurrence of X depends upon (correlates completely with) the occurrence of a particular environment Z, we may therefore not have to recognize X as an independent segment at the level under discussion [footnote omitted, RF]. Classification is used to group together elements which substitute for or are complementary to one another [footnote omitted, RF]’ (Harris 1951: 367). Harris elaborates in a footnote that ‘the class of elements then becomes a new element of our description on the next higher level of inclusive representation.’ These operations yield a grammar of lists. ‘In one of its simplest forms of presentation, a synchronic description of a language can consist essentially of a number of lists’ (p. 376). These include a segment-phoneme list, a phoneme distribution list, several morphophonemic lists, lists dealing with type and sequences of morphemes, a component and construction list, and a sentence list—the list of utterance structures.

(6) For some detailed discussion of how the analysis in MMH departs from the strictures of Harris (1951), see Freidin (1994).

(7) In particular, Chomsky's 1975 preface to LSLT credits Zellig Harris: ‘While working on LSLT I discussed all aspects of this material frequently and in great detail with Zellig Harris, whose influence is obvious throughout’ and mentions several other teachers and friends in Cambridge to whom he is indebted ‘not only for their encouragement but also for many ideas and criticisms’ (1975a: 4–5).

(8) In effect, syntax is concerned with the output of morphological processes, a definition that echoes Bloomfield (1933), where syntactic constructions ‘are constructions in which none of the immediate constituents is a bound form’ (p. 184), and also Bloch and Trager's Outline of Linguistic Analysis (1942), where syntax is limited to the analysis of constructions that involve only free forms (p. 71). Bloomfield (1933) attempts to separate syntax from morphology with the distinction between words and phrases. ‘By the morphology of a language we mean the constructions in which bound forms appear among the constituents. By definition, the resultant forms are either bound forms or words, but never phrases. Accordingly, we may say that morphology includes the constructions of words and parts of words, while syntax includes the construction of phrases’ (p. 207). The approach is concerned with a strict separation of syntax and morphology, which is essentially abandoned in SS, e.g. in the syntactic analysis of the English verbal system. Generative grammar generally rejects the strict separation of levels; see, for example, the discussion of accent and juncture in Chomsky, Halle, and Lukoff (1956) of how higher levels of analysis can affect the choice of phonemic analysis.

(9) In American structuralist linguistics this focus on the relationships between words goes back to Bloomfield (1914: 167): ‘Syntax studies the interrelations of words in the sentence. These interrelations are primarily the discursive ones of predication and attribution, to which may be added the serial relation.’ The perspective carries forward to Hockett's 1958 introduction to the field, A Course in Modern Linguistics, which states that ‘syntax includes the ways in which words, and suprasegmental morphemes, are arranged relative to each other’ (p. 177).

(10) Gleason identifies immediate constituent as the most important concept. ‘The process of analyzing syntax is largely one of finding successive layers of ICs and of immediate constructions, the description of relationships which exist between ICs, and the description of those relationships which are not efficiently described in terms of ICs. The last is generally of subsidiary importance; most of the relationships of any great significance are between ICs’ (1955: 133).

(11) Gleason discusses government (as it relates to case) and concord (agreement) as two other syntactic devices, presumably not universal, that indicate structural relations, but does not go beyond citing a few examples.

(12) Phrase structure rules are first introduced in MMH, so Chomsky invented phrase structure grammar in his 1949 senior thesis at the University of Pennsylvania—including the rewrite mechanism utilized in their formulation. All the versions of MMH contained a rudimentary phrase structure grammar that was unaffected by changes in the later versions (Noam Chomsky, p.c.). It has been claimed that Chomsky's use of phrase structure rules comes from the work of Emil Post (see e.g. Scholz and Pullum 2007, which cites Post (1944) (as does Chomsky 1959a, see also Chomsky 2009b)). However, Chomsky's first exposure to Post was via Rosenbloom's Mathematical Logic, published in 1950, which Chomsky cites in chapter III of LSLT—referring to the second appendix ‘Algebraic Approach to Language: Church's Theorem.’ Post uses the rewrite notation to establish relations between strings and is not concerned with their hierarchical structure. The use of the notation for the analysis of phrase structure constitutes an intuitive leap.

(13) For ease of presentation the sequence of rules has been reorganized so that the rules that introduce single lexical items are listed as a block after the rules that specify phrases.

The phrase structure rules in SS are all formulated in context-free format, whereas LSLT also utilizes more complicated context-sensitive rules, where the application of the rewrite rule is restricted to a specific context or contexts. Chomsky (1959a) investigates the formal properties of grammars utilizing rewrite rules restricted to varying degrees and develops proofs concerning the equivalency between types of grammar and types of formal automata in terms of computational power, what is now referred to as the Chomsky Hierarchy. For example, grammars that allow only context-free phrase structure rules are equivalent to non-deterministic pushdown storage automata and grammars that allow context-sensitive phrase structure rules as well are equivalent to linear bounded automata, which are more powerful. For discussion of the significance of the Chomsky Hierarchy for transformational grammar, see Lasnik (1981) and Lasnik and Uriagereka (2011).
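The difference in computational power between these classes can be made concrete with a standard textbook example that is not taken from the chapter: the language a^n b^n (n ≥ 1), generated by the context-free rules S → a S b and S → a b. Recognizing it requires unbounded memory of the kind a pushdown store supplies, which no finite-state device has. The sketch below is a minimal illustration under these assumptions. The function name accepts_anbn is hypothetical, and a Python list stands in for the pushdown store. This particular language needs only a deterministic pushdown automaton, a special case of the non-deterministic automata mentioned above.

```python
def accepts_anbn(s: str) -> bool:
    """Recognize a^n b^n (n >= 1) using a single pushdown store."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:           # an 'a' after a 'b' is out of order
                return False
            stack.append("A")    # push one marker per 'a'
        elif ch == "b":
            if not stack:        # more b's than a's
                return False
            stack.pop()          # match this 'b' against one earlier 'a'
            seen_b = True
        else:
            return False         # symbol outside the alphabet {a, b}
    # Accept only if at least one pair was seen and every 'a' was matched.
    return seen_b and not stack

print(accepts_anbn("aaabbb"))  # True
print(accepts_anbn("aabbb"))   # False
print(accepts_anbn("abab"))    # False
```

The stack is what lets the recognizer count arbitrarily many a's before matching them against b's. A device with a fixed number of states can track only a bounded count.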

(14) Because the specification of C can depend on the prior application of a transformation (e.g. the rule deriving passive constructions), it cannot be handled by a phrase structure rule.

(15) This rejects the bottom-up approach to constituent structure of Harris (1946), endorsed in Wells (1947; cf. §48).

(16) While this analysis of sentence relatedness mirrors the transformational analysis proposed by Harris (see 1952 and 1957, esp. the first footnote), Chomsky's notion of transformation departs radically from Harris's in that it involves abstract underlying representations whereas in Harris transformations are formulated as relations between surface patterns. See Freidin (1994) for a detailed discussion.

(17) The A-over-A Principle states that ‘if a phrase X labeled as category A is embedded within a larger phrase ZXW which bears the same label A, then no rule applying to the category A applies to X (but only to ZXW)’ (Chomsky 1964b: 931). The A-over-A Principle was originally proposed to restrict the ambiguous application of transformations (see Chomsky 1964b for details). It was not proposed to solve the problem of constructions like (11b). Note that if it applies to the structure underlying (11b), then it should also apply to the structure underlying (12b), thereby blocking the derivation of a legitimate sentence.

(18) All quotes in this paragraph are from p. 57 of LSLT.

(19) For further discussion see various proposals from Generative Semantics, a line of work on generative grammar based on the hypothesis that underlying syntactic representations constituted the single level of semantic representation and therefore must be considerably more abstract than those Chomsky was positing. The disagreements between the two sides have been characterized somewhat hyperbolically as ‘linguistics wars’—for a history of the period see Newmeyer (1986), and for further commentary Harris (1993) and Huck and Goldsmith (1995); for critical review of the latter, see Freidin (1997).

(20) Chomsky comments, ‘Given particular grammars, we could generalize to an abstract theory. Given a sufficiently powerful abstract theory, we could automatically derive grammars for particular languages’ (LSLT p. 78). The challenge of course is how to determine which grammars to choose as a basis for the abstract theory. LSLT discusses ‘two factors involved in determining the validity of a grammar: the necessity to meet the external conditions of adequacy and to conform to the general theory. The first factor cannot be eliminated, or there are no constraints whatsoever on grammar construction; the simplest grammar for L will simply identify a grammatical sentence in L as any phone sequence. Elimination of the second factor leaves us free to choose at will among a vast number of mutually conflicting grammars’ (p. 81). In effect, constructing the general theory solely on the basis of particular grammars cannot succeed because the formulation of particular grammars is going to be determined in part on the basis of some theoretical ideas. ‘We can scarcely describe a language at all except in terms of some previously assumed theory of linguistic structure’ (LSLT p. 78). Again to quote Chomsky: ‘Actually, of course, neither goal can be achieved independently. In constructing particular grammars, the linguist leans heavily on a preconception of linguistic structure, and any general characterization of linguistic structure must show itself adequate to the description of each natural language. The circularity is not vicious, however. The fact is simply that linguistic theory has two interdependent aspects. At any given point in its development, we can present a noncircular account, giving the general theory as an abstract formal system, and showing how each grammar is a particular example of it. Change can come in two ways—either by refining the formalism and finding new and deeper underpinnings for the general theory, or by finding out new facts about languages and simpler ways of describing them’ (LSLT p. 79). The fundamental problem for linguistics as for any scientific inquiry is that observation statements are intrinsically theory-laden (see Hanson 1958 for discussion). See ch. 2 of LSLT for further discussion of the relationships between particular grammars and the general theory.

(21) Chomsky (1986) mentions only the first three, formulated in terms of ‘knowledge of language,’ though it is clear from the discussion that this refers to specific generative grammars. Chomsky (1988) introduces the fourth question and formulates these questions in terms of a system of knowledge. More recently, Chomsky has added a fifth question about the evolution of language—how did such systems arise in humans? See Hauser et al. (2002) for some discussion and Larson, Deprez and Yamakido (2010) for further commentary.

(22) See Chomsky (2000a) for discussion of the issues, especially the unification of linguistics and neuroscience. See also Poeppel and Embick (2005) on the disjointness of basic concepts in linguistics and neuroscience. Chomsky (2000a) notes, ‘the recursive procedure is somehow implemented at the cellular level, how no one knows’ (p. 19).

(23) Novel utterances result from one property of what Chomsky calls the creative aspect of language use, that language is innovative—i.e. that speakers are constantly creating sentences that are new to their experience and possibly to the history of the language. The other properties include that language is unbounded (there is in principle no longest sentence), free from stimulus control (speech is an act of free will), coherent, and appropriate to the situation. Chomsky notes that there has been no progress in finding an explanatory account for the creative aspect. ‘Honesty forces us to admit that we are as far today as Descartes was three centuries ago from understanding just what enables a human to speak in a way that is innovative, free from stimulus control, and also appropriate and coherent. This is a serious problem that the psychologist and biologist must ultimately face and that cannot be talked out of existence by invoking “habit” or “conditioning” or “natural selection.”’ (2006 [1968]: 22–3). See Chomsky (1966/2009, 1968/2006) for further discussion.

(24) For further references and recent discussion, see Berwick et al. (2011).

(25) In practice, research produced fragments of grammars, never a complete generative grammar of any specific language. See also n. 20 above.

(26) Merge is first proposed in Chomsky (1995a).

(27) This concept lies at the core of the X-bar theory of phrase structure first proposed in Chomsky (1970). Although X-bar theory is fundamentally a bottom-up analysis of syntactic structure, it was nonetheless formulated in terms of top-down phrase structure rule schema. Merge incorporates the fundamental insight, eliminating the top-down implementation. It follows from Merge that all syntactic constructions are endocentric.

(28) The separation of the lexicon occurs in Chomsky (1965), allowing for the elimination of context-sensitive phrase structure rules, a reduction in the descriptive power of phrase structure rules. Note also that finite T in (17) would constitute an abstract item in the lexicon.

(29) Delete accounts for ellipsis phenomena (e.g. VP-deletion in (i)) as well as eliminating multiple copies of a constituent at PF.

(i) John has bought a new computer and Mary has [bought a new computer] too.

The bracketed VP is interpreted in LF but not pronounced in PF.

(30) The original proposal occurs in Chomsky and Lasnik (1977) under a model that includes both phrase structure rules and a somewhat different formulation of transformations. Since Chomsky (1993) it has been assumed that the only levels of representation are the two interface levels, hence no level of deep structure, and no level of surface structure that is distinct from PF (see §19.3).

(31) In Chomsky's early work on this approach (see Chomsky 1976), NP displacement was handled by a rule called ‘Move NP’ where the operation was conceived as movement of an NP from one position to another, leaving behind an empty NP category called a trace. A separate rule ‘Move wh’ handled the displacement of wh-phrases in interrogatives and relative clauses. In Chomsky (1981b), these rules are replaced by the maximally general ‘Move α.’ The original formulation of Merge (Chomsky 1995a) distinguishes it from an operation Move. The distinction is eliminated in Chomsky (2004), where Move is recast as IM.

(32) More recently Chomsky has proposed distinguishing two types of Merge: pair Merge, which applies to adjuncts, and set Merge, which applies to non-adjuncts. See Chomsky (2004, 2008) for details. Delete appears to be restricted to the externalization of linguistic expressions (i.e. PF), perhaps as a consequence of ‘a principle of minimal computation (spell-out and pronounce as little as possible)’ (Noam Chomsky, p.c.).

(33) This line of research begins with Chomsky's A/A Principle and continues with Ross's critique of the principle in his 1967 MIT dissertation (published as Ross 1986). Ross proposes to replace the A/A Principle with a new set of general constraints that identify ‘syntactic islands’ from which constituents may not be extracted. These island conditions extend the scope of constraints on the application of transformations. Chomsky (1973) proposes another set of constraints that generalize to NP displacement (Ross 1967 focuses primarily on wh-movement), some of which also account for binding relations between anaphors and their antecedents (see also Chomsky 1976). Chomsky (1991) and (1995b) attempt to reformulate the set of grammatical principles in terms of notions of economy of derivation and representation (see below for some discussion). See Freidin (2011) and Lasnik and Lohndal (2013) for some discussion of this history. For a detailed overview of the Principles and Parameters framework see Chomsky (1981a, 1981b), Chomsky and Lasnik (1993), and Freidin (1996).

(34) Chomsky (1981a) discusses three cases of parameters that relate to the formulation of UG principles, hence the computational system. Chomsky (1991) raises the possibility that parameters of UG instead relate only to the lexicon, specifically to functional elements like T (as opposed to substantive elements like N, V, A, and P), citing Borer (1984) and Fukui (1986, 1988).

(35) For detailed discussion see Chomsky (2000b, 2001, 2004, 2005, 2008). Chomsky (2004: 114) states that IM requires Agree. Chomsky (2005: 17) proposes that uninterpretable features occur as unvalued in the lexicon and are valued and eliminated via Agree (see also Chomsky 2007, 2008). For an overview of Case theory, including its history, see Lasnik (2008).

(36) Regarding a more exact formulation of the MLC, note the comment in Chomsky (2008: 156): ‘Just how small these local steps are remains to be clarified.’

(37) The cyclic application of transformations is first proposed in Chomsky (1965) as a means of reducing the descriptive power of the theory of grammar by eliminating generalized transformations and the related construct of T-markers. See Freidin (1999) and Lasnik (2006) for more detailed discussion of the history of the syntactic cycle.

(38) See also Chomsky (2001, 2004) for more explicit formulations.

(39) However, as Chomsky (2004: 107) notes, ‘It remains to determine what the phases are, and exactly how the operations work’—for example, whether the PIC replaces the MLC as it applies to block the derivation of (28e). Chomsky (2005: 17) adds a further cautionary comment about establishing a PIC: ‘Whether that is feasible is a question only recently formulated, and barely explored. It raises many serious issues, but so far at least, no problems that seem insuperable.’ See Chomsky (2008) for his most recent views on phases.

(40) For discussion of the roots of the minimalist approach in the history of generative grammar, see Freidin and Lasnik (2011), and for commentary on the evolution of linguistic theory that led to the MP, see Freidin and Vergnaud (2001).

(41) Chomsky adds: ‘The conclusion would be surprising, hence interesting if true’ (2000b: 93).

(42) It is worth noting that Chomsky (2007) identifies SMT as holding ‘that FL is “perfectly designed.”’ This suggests that there may be no significant difference between optimal design and perfect design.

(43) See e.g. the discussion above about how FI might replace the Case Filter and a part of the θ-Criterion.

(44) Interface conditions might plausibly be considered part of the third factor given that they are imposed by cognitive systems external to the FL.

(45) See Fitch (in press) for an overview of the relevance of Chomsky's ideas for the biology of language.