PRINTED FROM OXFORD HANDBOOKS ONLINE ( © Oxford University Press, 2022. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

date: 05 July 2022


Abstract and Keywords

This introductory chapter tries to set the chapters to follow in a general background, and to link them together. The first part begins by clarifying what is meant by Universal Grammar (UG), first distinguishing grammar from logic and then UG from a group of related concepts (biolinguistics, the language faculty, competence, I-language, generative grammar, language universals, and metaphysical universals). This leads to a clearer definition of UG as the general theory of I-languages, taken to be constituted by a subset of the set of possible generative grammars, and as such characterizes the genetically determined aspect of the human capacity for grammatical knowledge. The remaining sections introduce each of the five parts of the volume: the philosophical background to UG, linguistic theory, language acquisition, comparative syntax, and a number of wider issues ranging from creoles to animal language.

Keywords: Universal Grammar, philosophy of mind, linguistic theory, language acquisition, language universals

1.1 Introduction

Birds sing, cats meow, dogs bark, horses neigh, and we talk. Most animals, or at least most higher mammals, have their own ways of making noises for their own purposes. This book is about the human noise-making capacity, or, to put it more accurately (since there’s much more to it than just noise), our faculty of language.

There are very good reasons to think that our language faculty is very different in kind and in consequences from birds’ song faculty, dogs’ barking faculty, etc. (see Hauser, Chomsky, and Fitch 2002, chapter 21 and the references given there). Above all, it is different in kind because it is unbounded in nature. Berwick and Chomsky (2016:1) introduce what they refer to as the Basic Property of human language in the following terms: ‘a language is a finite computational system yielding an infinity of expressions, each of which has a definite interpretation in semantic-pragmatic and sensorimotor systems (informally, thought and sound).’ Nothing of this sort seems to exist elsewhere in the animal kingdom (see again Hauser, Chomsky, and Fitch 2002). Its consequences are ubiquitous and momentous: Can it be an accident that the only creature with an unbounded vehicle of this kind for the storage, manipulation, and communication of complex thoughts is the only creature to dominate all others, the only creature with the capacity to annihilate itself, and the only creature capable of devising a means of leaving the planet? The link between the enhanced cognitive capacity brought about by our faculty for language and our advanced technological civilization, with all its consequences good and bad for us and the rest of the biosphere, is surely quite direct. Put simply, no language, then no spaceships, no nuclear weapons, no doughnuts, no art, no iPads, or iPods. In its broadest conception, then, this book is about the thing in our heads that brought all this about and got us—and the creatures we share the planet with, as well as perhaps the planet itself—where we are today.

1.1.1 Background

The concept of universal grammar has ancient pedigree, outlined by Maat (2013). The idea has its origins in Plato and Aristotle, and it was developed in a particular way by the medieval speculative grammarians (Seuren 1998:30–37; Campbell 2003:84; Law 2003:158–168; Maat 2013:401) and in the 17th century by the Port-Royal grammarians under the influence of Cartesian metaphysics and epistemology (Arnauld and Lancelot 1660/1676/1966). Chomsky (1964:16–25, 1965, chapter 1, 1966/2009, 1968/1972/2006, chapter 1) discusses his own view of some of the historical antecedents of his ideas about Universal Grammar (UG henceforth) and other matters, particularly in 17th-century Cartesian philosophy; see also chapter 4. Arnauld and Lancelot’s (1660/1676/1966) Grammaire générale et raisonnée is arguably the key text in this connection and is discussed—from slightly differing perspectives—in chapters 2 and 4, and briefly here.

The modern notion of UG derives almost entirely from one individual: Noam Chomsky. Chomsky founded the approach to linguistic description and analysis known as generative grammar in the 1950s and has developed that approach, along with the related but distinct idea of UG, ever since. Many others, among them the authors of several of the chapters to follow, have made significant contributions to generative grammar and UG, but Chomsky all along has been the founder, leader, and inspiration.

The concept of UG initiated by Chomsky can be defined as the scientific theory of the genetic component of the language faculty (I give a more detailed definition in (2) below). It is the theory of that feature of the genetically given human cognitive capacity which makes language possible, and at the same time defines a possible human language. UG can be thought of as providing an intensional definition of a possible human language, or more precisely a possible human grammar (from now on I will refer to the grammar as the device which defines the set of strings or structures that make up a language; ‘grammar’ is therefore a technical term, while ‘language’ remains a pre-theoretical notion for the present discussion). This definition clearly provides a characterization of a central and vitally important aspect of human cognition. All the evidence, above all the qualitative differences between human language and all known animal communication systems (see Hauser 1996; Hauser, Chomsky, and Fitch 2002; and chapter 21), points to this being a cognitive capacity that only humans (among terrestrials) possess. This capacity manifests itself in early life with little prompting as long as the human child has adequate nutrition and its other basic physical needs are met, and if it is exposed to other humans talking or signing (see chapters 5, 10, and, in particular, 12). The capacity is universal in the sense that no putatively human ethnic group has ever been encountered or described that does not have language (modalities other than speech or sign, e.g., writing and whistling, are known, but these modalities convey what is nonetheless recognizably language; see Busnel and Classe 1976, Asher and Simpson 1994 on whistled languages). Finally, there is evidence that certain parts of the brain (in particular Broca’s area, Brodmann areas 44 and 45) are centrally involved with language, but crucial aspects of the neurophysiological instantiation of language in the brain are poorly understood. More generally in this connection there is the problem of understanding how abstract computational representations and information flows can be in any way instantiated in brain tissue, which they must be, on pain of committing ourselves to dualism; see chapter 2, Berwick and Chomsky (2016:50) and the following discussion.

For all these reasons, UG is taken to be the theory of a biologically given capacity. In this respect, our capacity for grammatical knowledge is just like our capacity for upright bipedal motion (and our incapacity for ultraviolet vision, unaided flight, etc.). It is thus species-specific, although this does not imply that elements of this capacity are not found elsewhere in the animal kingdom; indeed, given the strange circumstances of the evolution of language (on which see Berwick and Chomsky 2016; and section 1.6), it would be surprising if this were not the case. Whether the human capacity for grammatical knowledge is domain-specific is another matter; see section 1.1.2 for discussion of how views on this matter have developed over the past thirty or forty years.

In order to avoid repeated reference to ‘the human capacity for linguistic knowledge,’ I will follow Chomsky’s practice in many of his writings and use the term ‘UG’ to designate both the biological human capacity for grammatical knowledge itself and the theory of that capacity that we are trying to construct. Defined in this way, UG is related to but distinct from a range of other notions: biolinguistics, the faculty of language (both broad and narrow as defined by Hauser, Chomsky, and Fitch 2002), competence, I-language, generative grammar, language universals and metaphysical universals. I will say more about each of these distinctions directly.

But first a distinct clarification, and one which sheds some light on the history of linguistics, and of generative grammar, as well as taking us back to the 17th-century French concept of grammaire générale et raisonnée.

UG is about grammar, not logic. Since antiquity, the two fields have been seen as closely related (forming, along with rhetoric, the trivium of the medieval seven liberal arts). The 17th-century French grammarians formulated a general grammar, an idea we can take to be very close to universal grammar in designating the elements of grammar common to all languages and all peoples; in this their enterprise was close in spirit to contemporary work on UG. But it was different in that it was also seen as rational (raisonnée); i.e., reason lay behind grammatical structure. To put this a slightly different—and maybe tendentious—way: the categories of grammar are directly connected to the categories of reason. Grammar (i.e., general or universal grammar) and reason are intimately connected; hence, the grammaire générale et raisonnée is divided into two principal sections, one dealing with grammar and one with logic.

The idea that grammar may be understood in terms of reason, or logic, is one which has reappeared in various guises since the 17th century. With the development of modern formal logic by Frege and Russell just over a century ago, and the formalization of grammatical theory begun by the American structuralists in the 1930s and developed by Chomsky in the 1950s (as well as the development of categorial grammars of various kinds, in particular by Ajdukiewicz 1935), the question of the relation between formal logic and formal grammar naturally arose. For example, Bar-Hillel (1954) suggested that techniques directly derived from formal logic, especially from Carnap (1937 [1934]), should be introduced into linguistic theory. In the 1960s, Montague took this kind of approach much further, using very powerful logical tools and giving rise to modern formal semantics; see the papers in Montague (1974) and the brief discussion in section 1.3.

Chomsky’s view is different. Logic is a fine tool for theory construction, but the question of the ultimate nature of grammatical categories, representations, and other constructs—the question of the basic content of UG as a biological object—is an empirical one. How similar grammar will turn out to be to logic is a matter for investigation, not decision. Chomsky made this clear in his response to Bar-Hillel, as the following quotation shows:

The correct way to use the insights and techniques of logic is in formulating a general theory of linguistic structure. But this does not tell us what sort of systems form the subject matter for linguistics, or how the linguist may find it profitable to describe them. To apply logic in constructing a clear and rigorous linguistic theory is different from expecting logic or any other formal system to be a model for linguistic behavior

(Chomsky 1955:45, cited in Tomalin 2008:137).

This attitude is also clear from the title of Chomsky’s Ph.D. dissertation, the foundational document of the field: The Logical Structure of Linguistic Theory (Chomsky 1955/1975). Tomalin (2008:125–139) provides a very illuminating discussion of these matters, noting that Chomsky’s views largely coincide with those of Zellig Harris in this respect.

The different approaches of Chomsky and Bar-Hillel resurfaced in yet another guise in the late 1960s in the debate between Chomsky and the generative semanticists, some of whom envisaged a way to once again reduce grammar to logic, this time with the technical apparatus of standard-theory deep structure and the transformational derivation of surface structure by means of extrinsically ordered transformations (see Lakoff 1971, Lakoff and Ross 1976, McCawley 1976, and section 1.1.2 for more on the standard theory of transformational grammar). In a nutshell, and exploiting our ambiguous use of the term ‘UG’: UG as the theory of human grammatical knowledge must depend on logic; just like any theory of anything, we don’t want it to contain contradictions. But UG as human grammatical knowledge may or may not be connected to any given formalization of our capacity for reason; that is an empirical question (to which recent linguistic theory provides some intriguing sketches of an answer; see chapters 2 and 9).

Let us now look at the cluster of related but largely distinct concepts which surround UG and sometimes lead to confusion. My goal here is above all to clarify the nature of UG, but at the same time the other concepts will be to varying degrees clarified.


Biolinguistics.

One could, innocently, take this term to be like ‘sociolinguistics’ or ‘psycholinguistics’ in simply designating where the concerns of linguistics overlap with those of another discipline. Biolinguistics in this sense is just those parts of linguistics (looking at it from the linguist’s perspective) that overlap with biology. This overlap area presumably includes those aspects of human physiology that are directly connected to language, most obviously the vocal tract and the structure of the ear (thinking of sign language, perhaps also the physiology of manual gestures and our visual capacity to apprehend them), as well as the neural substrate for language in the brain. It may also include whatever aspects of the human genome subserve language and its development, both before and after birth. Furthermore, it may study the phylogenetic development of language, i.e., language evolution.

In recent work, however, the term ‘biolinguistics’ has come to designate the approach to the study of the language faculty which, by supposing that human grammatical knowledge stems in part from some aspect of the human genome, directly grounds that study in biology. Approaches to UG (in both senses) of the kind assumed here are thus biolinguistic in this sense. But we can, for the sake of clarity, distinguish UG from biolinguistics. On the one hand, one could study biolinguistics without presupposing a common human basis for grammatical knowledge: basing the study of language in biology does not logically entail that the basic elements of grammar are invariant and shared by all humans, still less that they are innate. Language could have a uniquely human biological basis without any particular aspect of grammar being common to all humans; in that case, grammar would not be of any great interest for the concerns of biolinguistics. This scenario is perhaps unlikely, but not logically impossible; in fact, the position articulated in Evans and Levinson (2009) comes close to this, although these authors emphasize the role of culture rather than biology in giving rise to the general and uniquely human aspects of language. Conversely, one can formulate a theory of UG as an abstract Platonic object with no claim whatsoever regarding any physical instantiation it may have. This has been proposed by Katz (1981), for example. To the extent that the technical devices of the generative grammars that constitute UG are mathematical in nature, and that mathematical objects are abstract, perhaps Platonic, objects, this view is not at all incoherent. So we can see that biolinguistics and UG are closely related concepts, but they are logically and empirically distinct. Combining them, however, constitutes a strong hypothesis about human cognition and its relation to biology and therefore has important consequences for our view of both the ontogeny and phylogeny of language.


The faculty of language.

UG is also distinct from the more general notion of faculty of language. This distinction partly reflects the general difference between the notions of ‘grammar’ and ‘language,’ although the non-technical, pre-theoretical notion of ‘language’ is very hard to pin down, and as such not very useful for scientific purposes. UG in the sense of the innate capacity for grammatical knowledge is arguably necessary for human language, and thus central to any conception of the language faculty, but it is not sufficient to provide a theoretical basis for understanding all aspects of the language faculty. For example, one might take our capacity to make computations of a Gricean kind regarding our interlocutor’s intentions (see Grice 1975) as part of our language faculty, but it is debatable whether this is part of UG (although, interestingly, such inferences are recursive and so may be quite closely connected to UG).

In a very important article, much-cited in the chapters to follow, Hauser, Chomsky, and Fitch (2002) distinguish the Faculty of Language in the Narrow sense (FLN) from the Faculty of Language in the Broad sense (FLB). They propose that the FLB includes all aspects of the human linguistic capacity, including much that is shared with other species: ‘FLB includes an internal computational system (FLN, below) combined with at least two other organism-internal systems, which we call “sensory-motor” and “conceptual-intentional”’ (Hauser, Chomsky, and Fitch 2002:1570); ‘most, if not all, of FLB is based on mechanisms shared with nonhuman animals’ (Hauser, Chomsky, and Fitch 2002:1572). FLN, on the other hand, refers to ‘the abstract linguistic computational system alone, independent of the other systems with which it interacts and interfaces’ (Hauser, Chomsky, and Fitch 2002:1570). This consists in just the operation which creates recursive hierarchical structures over an unbounded domain, Merge, which ‘is recently evolved and unique to our species’ (Hauser, Chomsky, and Fitch 2002:1572). Indeed, they suggest that Merge may have developed from recursive computational systems used in other cognitive domains, for example, navigation, by a shift from domain-specificity to greater domain-generality in the course of human evolution (Hauser, Chomsky, and Fitch 2002:1579).

UG is arguably distinct from both FLB and FLN. FLB may be construed so as to include pragmatic competence, for example (depending on one’s exact view as to the nature of the conceptual-intentional interface) and so the point made earlier about pragmatic inferencing would hold. More generally, Chomsky’s (2005) three factors in language design relate to FLB. These are:

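(1) a. Factor One: the genetic endowment (UG).
    b. Factor Two: experience (the primary linguistic data, PLD).
    c. Factor Three: principles not specific to the faculty of language.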

(See in particular chapter 6 for further discussion.) In this context, it is clear that UG is just one factor that contributes to FLB (but see Rizzi’s (this volume) suggested distinction between ‘big’ and ‘small’ UG, discussed in section 1.1.2).

If FLN consists only of Merge, then presumably there is more to UG since more than just Merge makes up our genetically given capacity for language (e.g., the status of the formal features that participate in Agree relations in many minimalist approaches may be both domain-specific and genetically given; see section 1.5). Picking up on Hauser, Chomsky, and Fitch’s final suggestion regarding Merge/FLN, the conclusion would be that all aspects of the biologically given human linguistic capacity are shared with other creatures. What is specific to humans is either the fact that all these elements uniquely co-occur in humans, or that they are combined in a particular way (this last point may be important for understanding the evolution of language, as Hauser, Chomsky, and Fitch point out; see also Berwick and Chomsky 2016:157–164 and section 1.6 for a more specific proposal).


Competence.

The competence/performance distinction was introduced in Chomsky (1965:4) in the following well-known passage:

Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its [the speech community’s] language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of this language in actual performance.

So competence represents the tacit mental knowledge a normal adult has of their native language. As such, it is an instantiation of UG. We can think of UG along with the third factors as determining the initial state of language acquisition, S0, presumably the state it is in when the individual is born (although there may be some in utero language acquisition; see chapter 12). Language acquisition transits through a series of states S1 … Sn whose nature is in some respects now quite well understood (see again chapter 12). At some stage in childhood, grammatical knowledge seems to crystallize and a final state SS is reached (here some of the points made regarding non-native language acquisition in chapter 13 are relevant, and may complicate this picture). SS corresponds to adult competence. In a principles-and-parameters approach, we can think of the various transitions from S0 to SS as involving parameter-setting (see chapters 11, 14, and 18). These transitions certainly involve the interaction of all three factors in language design given in (1), and may be determined by them (perhaps in interaction with general and specific cognitive maturation). Adult competence, then, is partly determined by UG, but is something distinct from UG.


I-language.

This notion, which refers to the intensional, internal, individual knowledge of language (see chapter 3 for relevant discussion), is largely coextensive with the earlier notion of competence, although competence may be and sometimes has been interpreted as broader (e.g., Hymes’ 1966 notion of communicative competence). The principal difference between I-language and competence lies in the concepts they are opposed to: E-language and performance, respectively. E-language is really an ‘elsewhere’ concept, referring to all notions of language that are not I-language. Performance, on the other hand, is more specific in that it designates the actual use of competence in producing and understanding linguistic tokens: as such, performance relates to individual psychology, but directly implicates numerous aspects of non-linguistic cognition (short- and long-term memory, general reasoning, Theory of Mind, and numerous other capacities). UG is a more general notion than I-language; indeed, it can be defined as ‘the general theory of I-language’ (Berwick and Chomsky 2016:90).

Generative grammar.

A generative grammar is a mathematical construct, a particular kind of Turing machine, that generates (enumerates) a set of structural descriptions over an unbounded domain, creating the ability to make infinite use of finite means. Applying a generative grammar to the description and analysis of natural languages thereby provides a mathematically based account of language’s potentially unbounded expressive power. A generative grammar provides an intensional definition of a language (now construed in the technical sense as a set of strings or structures). Generative grammars are ideally suited, therefore, to capturing the Basic Property of the human linguistic capacity. A given generative grammar is ‘the theory of an I-language’ (Berwick and Chomsky 2016:90). As such, it is distinct from UG, which, as we have seen, is the theory of I-languages. A particular subset of the set of possible generative grammars is relevant for UG; in the late 1950s and early 1960s, important work was done by Chomsky and others in determining the classes of string-sets different kinds of generative grammar could generate (Chomsky 1956, 1959/1963; Chomsky and Miller 1963; Chomsky and Schützenberger 1963). This led to the ‘Chomsky hierarchy’ of grammars. The place of the set of generative grammars constituting UG on this hierarchy has not been easy to determine, although there is now some consensus that they are ‘mildly context-sensitive’ (Joshi 1985). Since, for mathematical, philosophical, or information-scientific purposes we can contemplate and formulate generative grammars that fall outside of UG, it is clear that the two concepts are not equivalent. UG includes a subset, probably a very small subset, of the class of generative grammars.
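The idea of a finite device enumerating an unbounded set can be made concrete with a minimal sketch. The toy grammar below is an assumption introduced purely for illustration (one recursive rewriting rule, S → a S b | a b) and makes no claim about any natural language; the function name `generate` and the `max_depth` bound are likewise hypothetical, the bound being there only so that the enumeration terminates.

```python
from typing import Iterator

# Toy rewriting rules (hypothetical, for illustration only):
#   S -> a S b
#   S -> a b
RULES = {"S": [["a", "S", "b"], ["a", "b"]]}


def generate(symbols: list[str], max_depth: int) -> Iterator[str]:
    """Yield every terminal string derivable from `symbols`
    using at most `max_depth` rule applications."""
    # Rewrite the leftmost nonterminal, if there is one.
    for i, sym in enumerate(symbols):
        if sym in RULES:
            if max_depth == 0:
                return  # budget exhausted: abandon this derivation
            for rhs in RULES[sym]:
                expanded = symbols[:i] + rhs + symbols[i + 1:]
                yield from generate(expanded, max_depth - 1)
            return
    # No nonterminal is left, so the derivation is complete.
    yield "".join(symbols)


if __name__ == "__main__":
    # Raising the bound always adds new strings: the rules are finite,
    # the set they define is not.
    print(sorted(generate(["S"], 3), key=len))
```

The two rules are the intensional definition; the strings are its extension, of which any run can only list a finite part.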

Language universals.

Postulating UG naturally invites consideration of language universals, and indeed, if the class of generative grammars permitted by UG is limited, then it follows that all known (and all possible) languages will conform to that class of generative grammars and so formal universals (in the sense of Chomsky 1965) of a particular kind must exist. An example is hierarchical structure. If binary Merge is the core operation generating syntactic structure in UG, then ‘flat,’ or non-configurational, languages of the kind discussed by Hale (1983), for example, are excluded, as indeed are ternary or quaternary branching structures (and the like), as well as non-branching structures.
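The consequence of binary Merge for hierarchical structure can be illustrated with a small sketch. It assumes one common formalization, not spelled out in this chapter, in which Merge is simple set formation over two objects; the names `merge` and `depth` and the example words are hypothetical, and labels and features are left out entirely.

```python
from typing import Union

# A syntactic object is either a lexical item (a string) or the
# output of Merge (a two-membered frozenset). Labels are omitted.
SyntacticObject = Union[str, frozenset]


def merge(x: SyntacticObject, y: SyntacticObject) -> frozenset:
    """Binary Merge modeled as set formation: Merge(x, y) = {x, y}."""
    if x == y:
        raise ValueError("this sketch does not model self-Merge")
    return frozenset((x, y))


def depth(obj: SyntacticObject) -> int:
    """Number of Merge layers above the most deeply embedded item."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(member) for member in obj)


# Each application takes exactly two objects, so every node that is
# built is binary-branching and the result is necessarily hierarchical.
vp = merge("read", merge("the", "book"))
clause = merge("she", vp)
```

Because `merge` accepts exactly two arguments and returns a single object, flat, ternary, and non-branching structures simply cannot be built in this sketch, which is the sense in which they are excluded in the text.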

The tradition of language typology, begun in recent times by Greenberg (1963/2007), has postulated many language universals (see, e.g., the Konstanz Universals Archive). However, there is something of a disconnect between much that has been observed in the Greenbergian tradition and UG. Greenbergian universals tend to be surface-oriented and tend to have counter-examples (see chapter 15). Universals determined by UG have neither of these properties: clearly they must be exceptionless by definition, and, as the example of hierarchical structure and the existence of apparently non-configurational languages shows, they may not manifest themselves very clearly in surface phenomena. Nonetheless, largely thanks to the emphasis on comparative work which stems from the principles-and-parameters model, there has been a rapprochement between UG-driven comparative studies and work in the Greenbergian tradition (see chapters 14 and 15). This has led to some interesting proposals for universals that may be of genuine interest in both contexts (a possible example is the Final over Final Constraint; see Sheehan 2013 and Biberauer, Holmberg, and Roberts 2014).

Metaphysical universals.

UG is not to be understood as a contribution to a philosophical doctrine of universals. Indeed, from a metaphysical perspective, UG may be somewhat parochial; it may merely constrain and empower human cognitive capacity. It appears not to be found in other currently existing terrestrial creatures, and there is no strong reason to expect it to be found in any extra-terrestrial life form we may discover (or which may discover us). This last point, although hardly an empirical one at present, is not quite as straightforward as it might at first appear, and I will return to it in section 1.3.

In addition to Merge, UG must contain some specification of the notion ‘possible grammatical category’ (or, perhaps equivalently, ‘possible grammatical feature’). Whatever this turns out to be, and its general outlines remain rather elusive, it is in principle quite distinct from any notion of metaphysical category. Modern linguistic theory concerns the structure of language, not the structure of the world. But this does not exclude the possibility that certain notions long held to be central to understanding semantics, such as truth and reference, may have correlates in linguistic structure or, indeed, may be ultimately reducible to structural notions (see chapters 2 and 9).

These remarks are intended to clarify the relations among these concepts, and in particular the concept of UG. No doubt further clarifications are in order, and some are given in the chapters to follow (see especially chapters 3 and 5). Many interesting and difficult empirical and theoretical issues underlie much of what I have said here, and of course these can and should be debated. My intention here, though, has been to clarify things at the start, so as to put the chapters to follow in as clear a setting as possible. To summarize our discussion so far, we can give the following definition of UG:

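(2) UG is the general theory of I-languages, taken to be constituted by a subset of the set of possible generative grammars, and as such characterizes the genetically determined aspect of the human capacity for grammatical knowledge.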

As mentioned, the term ‘UG’ is sometimes used, here and in the chapters to follow, to designate the capacity for grammatical knowledge itself rather than, or in addition to, the theory of that capacity.

1.1.2 Three Stages in the History of UG

At the beginning of the previous section I touched on some of the historical background to the modern concept of UG (and see chapter 4 for more, particularly on Cartesian thought in relation to UG). Here I want to sketch out the ways in which the notion of UG has developed since the 1950s.

It is possible to distinguish three main stages in the development of our conception of UG. The first, which was largely rule- or rule-system based, lasted from the earliest formulations until roughly 1980. The second corresponds to the period in which the dominant theory of syntax was Government–Binding (GB) theory: the 1980s and into the early 1990s (Chomsky and Lasnik 1993 was the last general, non-textbook overview of GB theory). The third period runs from 1993 to the present, although Hauser, Chomsky, and Fitch (2002) and Chomsky (2005) represent very important refinements of an emerging (and perhaps still not fully settled) view. Each of these stages is informed by, and informs, how both universal and variant properties of languages are viewed.

Taking UG as defined in (2) at the end of the previous section (the general theory of I-languages, taken to be constituted by a subset of the set of possible generative grammars, and as such characterizing the genetically determined aspect of the human capacity for grammatical knowledge), the development of the theory has been a series of attempts to establish and delimit the class of humanly attainable I-languages. The earliest formulations of generative grammars which were intended to do this, beginning with Chomsky (1955/1975), involved interacting rule systems. In the general formulation in Chomsky (1965), the basis of the Standard Theory, there were four principal rule systems. These were the phrase-structure (PS) or rewriting rules, which, together with the Lexicon and a procedure for lexical insertion, formed the ‘base component,’ or deep structure. Surface structures were derived from deep structures by a different type of rule, transformations. PS-rules and transformations were significantly different in both form and function. PS-rules built structures, while transformations manipulated them according to various permitted forms of permutation (deletion, copying, etc.). The other two classes of rules were, in a broad sense, interpretative: a semantic representation was derived from deep structure (by Projection Rules of the kind put forward by Katz and Postal 1964), and phonological and, ultimately, phonetic representations were derived from surface structure (these were not described in detail in Chomsky 1965, but Chomsky and Halle 1968 put forward a full-fledged theory of generative phonology).
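The division of labour between the two syntactic rule types can be sketched in a few lines. Everything in the block is a simplifying assumption made for illustration: the rules, the three-word lexicon (folded into the rewriting rules for brevity), and the `invert` operation, which stands in for a transformation by permuting two constituents. It is not a reconstruction of any analysis in Chomsky (1965).

```python
# Toy phrase-structure (rewriting) rules, one expansion per category.
# Lexical insertion is folded into the rules to keep the sketch short.
PS_RULES = {
    "S": ["NP", "Aux", "VP"],
    "NP": ["she"],
    "Aux": ["will"],
    "VP": ["V"],
    "V": ["leave"],
}


def build(category):
    """PS-rules build structure: expand a category top-down into a
    tree of the form (label, child, child, ...)."""
    if category not in PS_RULES:
        return category  # a lexical item, not rewritten further
    return (category, *[build(child) for child in PS_RULES[category]])


def invert(tree):
    """A toy transformation: it manipulates an existing structure by
    permuting the subject NP and Aux of an S."""
    label, np, aux, vp = tree
    if label != "S":
        raise ValueError("invert applies only to S in this sketch")
    return (label, aux, np, vp)


def terminals(tree):
    """Read the terminal string off a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in terminals(child)]


deep = build("S")       # output of the base component
surface = invert(deep)  # output of the transformational component
```

Here `terminals(deep)` gives the string underlying the declarative order and `terminals(surface)` the inverted order; the point is only that `build` creates structure while `invert` rearranges structure that already exists.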

It was recognized right from the start that complex interacting rule systems of this kind posed serious problems for understanding both the ontogeny and the phylogeny of UG (i.e., the biological capacity itself). For ontogeny, i.e., language acquisition, it was assumed that the child was equipped with an evaluation metric which facilitated the choice of the optimal grammar (i.e., system of rule systems) from among those available and compatible with the primary linguistic data (PLD; see chapter 11). A fairly well-developed evaluation metric for phonological rules is proposed in Chomsky and Halle (1968, ch. 9). Regarding phylogeny, the evolution of UG construed as in (2) remained quite mysterious; Berwick and Chomsky (2016:5) point out that Lenneberg (1967, ch. 6) ‘stands as a model of nuanced evolutionary thinking’ but that ‘in the 1950s and 1960s not much could be said about language evolution beyond what Lenneberg wrote.’

The development of the field since the late 1960s is well known and has been documented and discussed many times (see, e.g., Lasnik and Lohndal 2013). Starting with Ross (1967), efforts were made to simplify the rule systems. These culminated around 1980 with a theory of a somewhat different nature, bringing us to the second stage in the development of UG.

By 1980, the PS- and transformational rule systems had been simplified very significantly. Stowell (1981) showed that the PS-rules of the base could be reduced to a simple and general X′-theoretic template of essentially two very general rules. The transformational rules were reduced to the single rule ‘move-α’ (‘move any category anywhere’). These very simple rule systems massively overgenerated, and overgeneration was constrained by a number of conditions on representations and derivations, each characterizing independent but interacting modules (e.g., Case, thematic roles, binding, bounding, control, etc.). The theory was in general highly modular in character. Adding (p. 11) to this the idea that the modules and rule systems could differ slightly from language to language led to the formulation of the principles-and-parameters approach and thus to a very significant advance in understanding both cross-linguistic variation and what does not vary. It seemed possible to get a very direct view of UG through abstract language universals proposed in this way (see the discussion of UG and universals in the previous section, and chapter 14).

The architecture of the system also changed. Most significantly, work starting in the late 1960s and developing through the 1970s had demonstrated that many aspects of semantic interpretation (everything except argument structure) could not be read from deep structure. A more abstract version of surface structure, known as S-structure, was postulated, to which many of the aforementioned conditions on representations applied. There were still two interpretative components: Phonological Form (PF) and Logical Form (LF, intended as a syntactic representation able to be ‘read’ by semantic interpretative rules). The impoverished PS-rules generated D-structure (still the level of lexical insertion); move-α mapped D-structure to S-structure and S-structure to LF. The latter mapping was ‘covert’: since PF was independently derived from S-structure, LF was invisible/inaudible, and its nature had to be inferred from aspects of semantic interpretation.

The GB approach, combined with the idea that modular principles and rule systems could be parametrized at the UG level, led to a new conception of language acquisition. It was now possible to eliminate the idea of searching a space of rule systems guided by an evaluation metric. In place of that approach, language acquisition could be formulated as searching a finite list of possible parameter settings and choosing those compatible with the PLD. This appeared to make the task of language acquisition much more tractable; see chapter 11 for discussion of the advantages of and problems for parameter-setting approaches to language acquisition.
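The contrast with searching a space of rule systems can be made concrete with a small sketch. It is an illustration only, not a model proposed in the literature discussed here: the two binary parameters, the encoding of the PLD as ordered pairs of category labels, and the function names are all assumptions introduced for the example. The learner simply enumerates the finite list of parameter settings and keeps those compatible with every datum.

```python
from itertools import product

# Hypothetical toy parameters: each maps to a (head, dependent) pair of categories.
# True means "head precedes dependent"; False means the reverse order.
HEADS = {
    "verb_before_object": ("V", "O"),
    "adposition_before_noun": ("P", "N"),
}
PARAMETERS = tuple(HEADS)


def licenses(setting, datum):
    """Return True if the ordered pair `datum` is compatible with `setting`."""
    for name, (head, dependent) in HEADS.items():
        if set(datum) == {head, dependent}:
            expected = (head, dependent) if setting[name] else (dependent, head)
            return datum == expected
    # Data that bear on none of the toy parameters are compatible with any setting.
    return True


def compatible_settings(pld):
    """Enumerate every parameter setting and keep those compatible with all the data."""
    result = []
    for values in product([True, False], repeat=len(PARAMETERS)):
        setting = dict(zip(PARAMETERS, values))
        if all(licenses(setting, datum) for datum in pld):
            result.append(setting)
    return result


# A learner exposed to OV order and postpositions.
head_final = compatible_settings([("O", "V"), ("N", "P")])
```

With n binary parameters the space contains 2^n settings, which is finite but grows quickly; that is one reason why the tractability of parameter setting remains a matter for discussion.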

Whilst the GB version of UG seemed to give rise to real advantages for linguistic ontogeny, it did not represent progress with the question of phylogeny. The very richness of the parametrized modular UG that made language acquisition so tractable in comparison with earlier models posed a very serious problem for language evolution. The GB model took both the residual rule systems and the modules, both parametrized, as well as the overall architecture of the system, to be both species-specific and domain-specific. This seemed entirely reasonable as it was very difficult to see analogs to such an elaborate system either in other species or in other human cognitive domains. The implication for language evolution was that all of this must have evolved in the history of our species, although how this could have happened assuming the standard neo-Darwinian model of descent through modification of the genome was entirely unclear.

The GB model thus provided an excellent model for the newly-burgeoning fields of comparative syntax and language acquisition (and their intersection in comparative language acquisition; see chapter 12). However, its intrinsic complexity raised conceptual problems and the evolution question was simply not addressed. These two points in a sense reduce to one: GB theory seemed to give us, for the first time, a workable UG in that comparative and acquisition questions could be fruitfully addressed in a unified (p. 12) way. There is, however, a deeper question: why do we find this particular UG in this particular species? If we take seriously the biolinguistic aspect of UG, i.e., the idea that UG is in some non-trivial sense an aspect of human biology, then this question must be addressed, but the GB model did not seem to point to any answers, or indeed to any promising avenues of research.

Considerations of this kind began to be addressed in the early 1990s, leading to the proposal in Chomsky (1993) for a minimalist program for linguistic theory. The minimalist program (MP) differs from GB in being essentially an attempt to reduce GB to barest essentials. The motivations for this were in part methodological, essentially Occam’s razor: reduce theoretical postulates to a minimum. But there was a further conceptual motivation: if we can reduce UG to its barest essentials we can perhaps see it as optimized for the task of relating sound and meaning over an unbounded domain. This in turn may allow us to see why we have this particular UG and not some other one; UG conforms to a kind of conceptual necessity in that it only contains what it must contain. Simplifying UG also means that less is attributed to the genome, and correspondingly there is less to explain when it comes to phylogeny. The task then was to subject aspects of the GB architecture to a ‘minimalist critique.’ This approach was encapsulated by the Strong Minimalist Thesis (SMT), the idea that the computational system optimally meets conditions imposed by the interfaces (where ‘optimal’ should be understood in terms of maximally efficient computation). Developing and applying these ideas has been a central concern in linguistic theory for more than two decades.

Accordingly, where GB assumed four levels of representation for every sentence (D-Structure, S-Structure, Phonological Form (PF), and Logical Form (LF)), the MP assumes just the two ‘interpretative’ levels. It seems unavoidable to assume these, given the Basic Property. The core syntax is seen as a derivational mechanism that relates these two, i.e., it relates sound (PF) and meaning (LF) over an unbounded domain (and hence contains recursive operations).

One of the most influential versions of the MP is that put forward in Chomsky (2001a). This makes use of three basic operations: Merge, Move, and Agree. Merge combines two syntactic objects to form a third object, a set consisting of the set of the two merged elements and their label. Thus, for example, verb (V) and object (O) may be merged to form the complex element {V,{V,O}}. The label of this complex element is V, indicating that V and O combine to form a ‘big V,’ or a VP. In this version of the MP labeling was stipulated as an aspect of Merge; more recently, Chomsky (2013, 2015) has proposed that Merge merely forms two-member sets (here {V, O}), with a separate labeling algorithm determining the labels of the objects so formed. The use of set-theoretic notation implies that V and O are not ordered by Merge, merely combined; the relative ordering of V and O is parametrized, and so order is handled by some operation distinct from the one which combines the two elements, and is usually thought to be a (PF) ‘interpretative’ operation of linearization (linear order being required for phonology, but not for semantic interpretation). In general, syntactic structure is built up by the recursive application of Merge.
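The set-theoretic character of Merge can be illustrated with a minimal sketch. The encoding is an assumption made for the example: lexical items are represented as plain strings and syntactic objects as Python frozensets, and the function names are hypothetical. The first function corresponds to the more recent formulation, in which Merge only forms a two-member set; the second corresponds to the earlier formulation with a stipulated label.

```python
def merge(a, b):
    """Bare Merge: form the unordered two-member set {a, b}."""
    return frozenset([a, b])


def merge_labelled(head, complement):
    """Merge with a stipulated label: form {head, {head, complement}}."""
    return frozenset([head, merge(head, complement)])


# V and O are combined but not ordered: {V, O} and {O, V} are the same object.
vp = merge("V", "O")
same_vp = merge("O", "V")

# The earlier formulation, with V as the label of the complex element.
labelled_vp = merge_labelled("V", "O")

# Recursive application: the output of Merge can itself be merged again.
tp = merge("T", vp)
```

Because sets are unordered, nothing in these objects fixes whether V precedes O; in this sketch, as in the text, linear order would have to be supplied by a separate operation.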

Move is the descendant of the transformational component of earlier versions of generative syntax. Chomsky (2004a) proposed that Move is nothing more than a special (p. 13) case of Merge, where the two elements to be merged are an already constructed piece of syntactic structure S which non-exhaustively contains a given category C (i.e., [S … X … C … Y …] where it is not the case that both X = 0 and Y = 0), and C. This will create the new structure {L,{C,S}} (where L is the label of the resulting structure). Move, then, is a natural occurrence of Merge as long as Merge is not subjected to arbitrary constraints. We therefore expect generative grammars to have, in older terminology, a transformational component. The two cases of Merge, the one combining two distinct objects and the one combining an object from within a larger object to form a still larger one, are known as External and Internal Merge, respectively.
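The difference between the two cases of Merge can be sketched as follows. As before, the representation (strings for lexical items, frozensets for syntactic objects) and the function names are assumptions introduced for illustration, and labels are left out for simplicity. Internal Merge is the same set-forming operation as External Merge; the only added condition is that one of the two objects is already contained in the other.

```python
def contains(s, c):
    """Return True if c occurs properly inside the syntactic object s."""
    if not isinstance(s, frozenset):
        return False  # a lexical item has no internal structure
    return any(member == c or contains(member, c) for member in s)


def external_merge(a, b):
    """Combine two distinct syntactic objects into the set {a, b}."""
    return frozenset([a, b])


def internal_merge(s, c):
    """Merge s with a category c that s already contains, forming {c, s}."""
    if not contains(s, c):
        raise ValueError("c must be contained in s for Internal Merge")
    return frozenset([c, s])


# Build {T, {V, what}} by External Merge, then re-merge 'what' at the root.
vp = external_merge("V", "what")
tp = external_merge("T", vp)
moved = internal_merge(tp, "what")
```

In the resulting object, 'what' occurs both at the root and in its original position inside the verb phrase, so nothing beyond Merge itself is needed to describe the displacement.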

As its name suggests, Agree underlies a range of morphosyntactic phenomena related to agreement, case, and related matters. The essence of the Agree relation can be seen as a relation that copies ‘missing’ feature values onto certain positions, which intrinsically lack them but will fail to be properly interpreted (by PF or LF) if their values are not filled in. For example, in English a subject NP agrees in number with the verb (the boys leave/the boy leaves). Number is an intrinsic property of Nouns, and hence of NPs, and so we say that boy is singular and boys is plural. More precisely, let us say that (count) Nouns have the attribute [Number] with (in English) the values [{Singular, Plural}]. Verbs lack intrinsic number specification, but, as an idiosyncrasy of English (shared by many, but not all, languages) have the [Number] attribute with no value. Agree ensures that the value of the subject NP is copied into the feature-matrix of the verb (if singular, this is realized in PF as the -s ending on present-tense verbs). It should be clear that Agree is the locus of a great deal of cross-linguistic morphosyntactic variation. Sorting out the parameters associated with the Agree relation is a major topic of ongoing research.
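The feature-copying idea can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions: feature matrices are represented as Python dictionaries, an unvalued attribute is marked with None, and the function names are hypothetical. The spell-out function covers only the number contrast described above and ignores person and tense.

```python
def agree(probe, goal):
    """Return a copy of probe in which each unvalued attribute takes the goal's value."""
    valued = dict(probe)
    for attribute, value in probe.items():
        if value is None and goal.get(attribute) is not None:
            valued[attribute] = goal[attribute]
    return valued


def spell_out_present(stem, verb_features):
    """Toy PF rule: a singular present-tense verb is realized with the -s ending."""
    if verb_features["Number"] == "Singular":
        return stem + "s"
    return stem


boy = {"Number": "Singular"}   # number is intrinsic to the noun
boys = {"Number": "Plural"}
leave = {"Number": None}       # the verb has the attribute but no value

singular_verb = spell_out_present("leave", agree(leave, boy))
plural_verb = spell_out_present("leave", agree(leave, boys))
```

The verb's own entry is left unchanged by agree, which returns a new feature matrix; this keeps the lexical item distinct from its valued occurrence in a particular derivation.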

In addition to a lexicon, specifying the idiosyncratic properties of lexical items (including Saussurian arbitrariness), and an operation selecting lexical items for ‘use’ in a syntactic derivation, the operation of Merge (both External and Internal) and Agree form the core of minimalist syntax, as currently conceived. There is no doubt that this situation represents a simplification as compared to GB theory.

But there is more to the MP than just simplifying GB theory, as mentioned earlier. As we have seen, the question is: why do we find this UG, and not one of countless other imaginable possibilities? One approach to answering this question has been to attempt to bring to bear third-factor explanations of UG.

To see what this means, consider again the GB approach to fixing parameter values in language acquisition. A given syntactic structure is acquired on the basis of the combination of a UG specification (e.g., what a verb is, what an object is) and experience (exposure to OV or VO order). UG and experience are the first two factors making up adult competence then. But there remains the possibility that domain-general factors such as optimal design, computational efficiency, etc., play a role. In fact, it is a priori likely that such factors would play a role in UG; as Chomsky (2005:6) points out, principles of computational efficiency, for example, ‘would be expected to be of particular significance for computational systems such as language.’

Factors of this kind make up the third factor of the FLB in the sense of Hauser, Chomsky, and Fitch (2002); see chapter 6. In these terms, the MP can be viewed as asking (p. 14) the question ‘How far can we progress in showing that all … language-specific technology is reducible to principled explanation, thus isolating core processes that are essential to the language faculty’ (Chomsky 2005:11). The more we can bring third-factor properties into play, the less we have to attribute to pure, domain-specific, species-specific UG, and the MP leads us to attribute as little as possible to pure UG.

There has been some debate regarding the status of parameters of UG in the context of the MP (see Newmeyer 2005; Roberts and Holmberg 2010; Boeckx 2011a, 2014; and chapters 14 and 16). However, as I will suggest in section 1.4, it is possible to rethink the nature of parameters in such a way that they are naturally compatible with the MP (see again chapter 14). If so, then we can continue to regard language acquisition as involving parameter-setting as in the GB context; the difference in the MP must be that the parametric options are in some way less richly prespecified than was earlier thought; see section 1.4 and chapters 14 and 16 on this point.

The radical simplification of UG that the MP has brought about has potentially very significant advantages for our ability to understand language evolution. As Berwick and Chomsky (2016:40) point out, the MP boils down to a tripartite model of language, consisting of the core computational component (FLN in the sense of Hauser, Chomsky, and Fitch 2002), the sensorimotor interface (which ‘externalizes’ syntax) and the conceptual-intentional interface, linking the core computational component to thought. We can break down the question of evolution into three separate questions therefore, and each of these three components may well have a distinct evolutionary history. Furthermore, now echoing Hauser, Chomsky and Fitch’s FLB, it is highly likely that aspects of both interfaces are shared with other species; after all, it is clear that other species have vocal learning and production (and hearing) and that they have conceptual-intentional capacities (if perhaps rudimentary in comparison to humans). So the key to understanding the evolution of language may lie in understanding two things: the origin of Merge as the core operation of syntax, and the linking-up of the three components to form FLB. Berwick and Chomsky (2016:157–164) propose a sketch of an account of language evolution along these lines; see section 1.6. Whatever the merits and details of that account, it is clear that the MP offers greater possibilities for approaching the phylogeny of UG than the earlier approaches did.

This concludes our sketchy outline of the development of UG. Essentially we have a shift from complex rule systems to simplified rule systems interacting in a complex way with a series of conditions on representations and derivations, both in a multi-level architecture of grammar, followed by a radical simplification of both the architecture of the system and the conditions on representations and derivations (although the latter have not been entirely eliminated). The first shift moved in the direction of explanatory adequacy in the sense of Chomsky (1964), i.e., along with the principles and parameters formulation it gave us a clear way to approach both acquisition and cross-linguistic comparison (see chapter 5 for extensive discussion of the notion of explanatory adequacy). The second shift is intended to take us beyond explanatory adequacy, in part by shifting some of the explanatory burden away from UG (the first factor in (1)) towards third factors of various kinds (see chapter 6). How successfully (p. 15) this has been or is being achieved is an open question at present. One area where, at least conceptually, there is a real prospect of progress, is in language phylogeny; this question was entirely opaque in the first two stages of UG, but with the move to minimalist explanation it may be at least tractable (see again Berwick and Chomsky 2016 and section 1.6).

As a final remark, it is worth considering an interesting terminological proposal made by Rizzi in note 5 to chapter 5. He suggests that we may want to distinguish ‘UG in the narrow sense’ from ‘UG in the broad sense,’ deliberately echoing Hauser, Chomsky, and Fitch’s FLN–FLB distinction. UG in the narrow sense is just the first factor of (1), while UG in the broad sense includes third factors as well; Rizzi points out that:

As the term UG has come, for better or worse, to be indissolubly linked to the core of the program of generative grammar, I think it is legitimate and desirable to use the term ‘UG in a broad sense’ (perhaps alongside the term ‘UG in a narrow sense’), so that much important research in the cognitive study of language won’t be improperly perceived as falling outside the program of generative grammar.

Furthermore, Tsimpli, Kambanaros, and Grohmann find it useful to distinguish ‘big UG’ from ‘small UG’ in their discussion of certain language pathologies in chapter 19. The definition of UG given in (2) is in fact ambiguous between these two senses of UG, in that it states that UG ‘characterizes the genetically determined aspect of the human capacity for grammatical knowledge.’ This genetically determined property could refer narrowly to the first factor or more broadly to both the first and the third factors; the distinction is determined by domain-specificity in that the first factor is specific to language and the third more general. Henceforth, as necessary, I will refer to first-factor-only UG as ‘small’ UG and first-plus-third-factor UG as ‘big’ UG.

1.1.3 The Organization of this Book

The chapters to follow are grouped into five parts: the philosophical background, linguistic theory, language acquisition, comparative syntax, and wider issues. With the possible exception of the last, each of these headings is self-explanatory; the last heading is a catch-all intended to cover areas of linguistics for which UG is relevant but which do not fall under core linguistic theory, language acquisition, or comparative syntax. The topics covered in this part are by no means exhaustive, but include creoles (chapter 17), diachrony (chapter 18), language pathology (chapter 19), Sign Language (chapter 20), and the question of whether animals have UG (chapter 21). Language evolution is notably missing here, a matter I will comment on below.

The remainder of this introductory chapter deals with each part in turn. The goal here is not to provide chapter summaries, but to connect the chapters and the parts together, bringing out the central themes, and also to touch on areas that are not covered in the chapters. The latter goal is non-exhaustive, and I have selected issues which seem to be (p. 16) of particular importance and relevance to what is covered, although inevitably some topics are left out. The topic of universal grammar is so rich that even a book of this size is unable to do it full justice.

1.2 Philosophical Background

Part I tries to cover aspects of the philosophical background to UG. As already mentioned, the concept of universal grammar (in a general, non-technical sense) has a long pedigree. Like so many venerable ideas in the Western philosophical tradition it has its origins in Plato. Indeed, following Whitehead’s famous comment that ‘the safest general characterization of the European philosophical tradition is that it consists of a series of footnotes to Plato’ (Whitehead 1979:39), one could think of this entire book as a long, linguistic footnote. Certainly, the idea of universal grammar deserves a place in the European (i.e., Western) philosophical tradition (see again Itkonen 2013; Maat 2013).

The reason for this is that positing universal grammar, and its technical congener UG as defined in (2) in section 1.1.1, makes contact with several important philosophical issues and traditions. First and most obvious (and most discussed either explicitly or implicitly in the chapters to follow), postulating that human grammatical knowledge has a genetically determined component makes contact with various doctrines of innate ideas, all of which ultimately derive from Plato. The implication of this view is that the human newborn is not a blank slate or tabula rasa, but aspects of knowledge are determined in advance of experience. This entails a conception of learning that involves the interaction of experience with what is predetermined, broadly a ‘rationalist’ view of learning. It is clear that all work on language acquisition that is informed by UG, and the various models of learning discussed by Fodor and Sakas in chapter 11, are rationalist in this sense. The three-factors approach makes this explicit too, while bringing in non-domain-specific third factors as well (some of which, as we have already mentioned, may be ‘innate ideas’ too). Hence, positing UG brings us into contact with the rich tradition of rationalist philosophy, particularly that of Continental Europe of the early modern period.

Second, the fact that UG consists of a class of generative grammars, combined with the fact that such grammars make infinite use of finite means owing to their recursive nature, means that we have an account of what in his earlier writings Chomsky referred to as ‘the creative aspect of language use’ (see, e.g., Chomsky 1964:7–8). This relates to the fact that our use of language is stimulus-free, but nonetheless appropriate to situations. In any given situation we are intuitively aware of the fact that there is no external factor which forces us to utter a given word, phrase, or sentence. Watching a beautiful sunset with a beloved friend, one might well be inclined to say ‘What a beautiful sunset!’ but nothing at all forces this, and such linguistic behavior cannot be predicted; one might equally well say ‘Fancy a drink after this?’ or ‘I wonder who’ll win the match tomorrow’ or ‘Wunderbar’ or nothing at all, or any number of other things. (p. 17) The creative aspect of language use thus reflects our free will very directly, and positing UG with a generative grammar in the genetic endowment of all humans makes a very strong claim about human freedom (here there is a connection with Chomsky’s political thought), while at the same time connecting to venerable philosophical problems.

Third, if I-language is a biologically given capacity of every individual, it must be physically present somewhere. This raises the issue, touched on in the previous section, of how abstract mental capacities can be instantiated in brain tissue. On this point, Berwick and Chomsky (2016:50) say:

We understand very little about how even our most basic computational operations might be carried out in neural ‘wetware.’ For example, … the very first thing that any computer scientist would want to know about a computer is how it writes to memory and reads from memory—the essential operation of the Turing machine model and ultimately any computational device. Yet we do not really know how this most foundational element of computation is implemented in the brain….

Of course, future scientific discoveries may remove this issue from the realm of philosophy, but the problem is an extremely difficult one. One possibility is to deny the problem by assuming that higher cognitive functions, including I-language, do not in fact have a physical instantiation, but instead are objects of a metaphysically different kind: abstract, non-physical objects, or res cogitans (‘thinking stuff’) in Cartesian terms. This raises issues of metaphysical dualism that pose well-known philosophical problems, chief among them the ‘mind–body problem.’ (It is worth pointing out, though, that Kripke 1980 gives an interesting argument to the effect that we are all intuitively dualists). So here UG-based linguistics faces central questions in philosophy of mind. As Hinzen puts it in chapter 2, section 2.2:

The philosophy of mind as it began in the 1950s (see e.g. Place (1956); Putnam (1960)) and as standardly practiced in Anglo-American philosophical curricula has a basic metaphysical orientation, with the mind–body problem at its heart. Its basic concern is to determine what mental states are, how the mental differs from the physical, and how it really fits into physical nature.

These issues are developed at length in that chapter, particularly in relation to the doctrine of ‘functionalism,’ and so I will say no more about them here.

Positing a mentally-represented I-language also raises sceptical problems of the kind discussed in Kripke (1982). These are deep problems, and it is uncertain how they can be solved; these issues are dealt with in Ludlow’s discussion in chapter 3, section 3.4.

The possibility that some aspect of grammatical knowledge is domain-specific raises the further question of modularity. Fodor (1983) proposed that the mind consists of various ‘mental organs’ which are ontogenetically and phylogenetically distinct. I-language might be one such organ. According to Fodor, mental modules, which subserve the domain-general central information processing involved in constructing beliefs and intentions, have eight specific properties: (i) domain specificity; (ii) informational (p. 18) encapsulation (modules operate with an autonomous ‘machine language’ without reference to other modules or central processors); (iii) obligatory firing (modules operate independently of conscious will); (iv) they are fast; (v) they have shallow outputs, in that their output is very simple; (vi) they are of limited accessibility or totally inaccessible to consciousness; (vii) they have a characteristic ontogeny and, finally, (viii) they have a fixed neural architecture. UG, particularly in its second-stage GB conception, has many of these properties: (i) it is domain-specific by assumption; (ii) the symbols used in grammatical computation are encapsulated in that they appear to be distinct from all other aspects of cognition; (iii) linguistic processing is involuntary: if someone speaks to you in your native language under normal conditions you have no choice in understanding what is said; (iv) speed: linguistic processing takes place almost in real time, despite its evident complexity; (v) shallow outputs: the interpretative components seem to make do with somewhat impoverished representations; (vi) grammatical knowledge is clearly inaccessible to consciousness; (vii) first-language acquisition studies have shown us that I-languages have a highly characteristic ontogeny (see chapter 12); (viii) I-language may have a fixed neural architecture, as evidence regarding recovery from various kinds of aphasia in particular seems to show (see chapter 19).

So there appears to be a case for regarding I-language as a Fodorian mental module. But this conclusion does not comport well with certain leading ideas of the MP, notably the role of domain-general factors in determining both I-language and ‘UG in the broad sense’ as defined by Rizzi in chapter 5 and briefly discussed earlier. In particular, points (i), (ii), and (v) are questionable on a three-factor approach to I-language and UG. Further, the question of the phylogeny of modules is a difficult one; the difference between second-phase and third-phase UG with respect to the evolution of language was discussed in section 1.1.2 and essentially the same points hold here.

The three-factor minimalist view does not preclude the idea of a language module, but this module would, on this view, be less encapsulated and domain-specific than was previously thought and less so than a full-fledged Fodorian module (such as the visual-processing system, as Fodor argues at length). Pylyshyn (1999) argued that informational encapsulation was the real signature property of mental modules; even a minimalist I-language/UG has this property, to the extent that the computational system uses symbols such as ‘Noun,’ ‘Verb,’ etc., which seem to have no correlates outside language (and in fact raise non-trivial questions for phylogeny). So we see that positing UG raises interesting questions for this aspect of philosophy of mind.

Finally, if there is a ‘language organ’ or ‘language module’ of some kind in the mind a natural question arises by extension concerning what other modules there might be, and what they might have in common with language. Much discussion of modularity has focussed on visual processing, as already mentioned, but other areas spring to mind, notably music. The similarities between language and music are well-known and have often been commented on (indeed, Darwin 1871 suggested that language might have its origin in sexual selection for relatively good singing; Berwick and Chomsky 2016:3 call this his ‘Caruso’ theory of language evolution). First, both music and language are universal in human communities. Concerning music, Cross and Woodruff (2008:3) point (p. 19) out that ‘all cultures of which we have knowledge engage in something which, from a western perspective, seems to be music.’ They also observe that ‘the prevalence of music in native American and Australian societies in forms that are not directly relatable to recent Eurasian or African musics is a potent indicator that modern humans brought music with them out of Africa’ (2008:16). Mithen (2005:1) says ‘appreciation of music is a universal feature of humankind; music-making is found in all societies.’ According to Mithen, Blacking (1976) was the first to suggest that music is found in all human cultures (see also Bernstein 1976; Blacking 1995).

Second, music is unique to humans. Regarding music, Cross et al. (n.d.:7–8) conclude ‘Overall, current theory would suggest that the human capacities for musical, rhythmic, behaviour and entrainment may well be species-specific and apomorphic to the hominin clade, though … systematic observation of, and experiment on, non-human species’ capacities remains to be undertaken.’ They argue that entrainment (coordination of action around a commonly perceived, abstract pulse) is a uniquely human ability intimately related to music. Further, they point out that, although various species of great apes engage in drumming, they lack this form of group synchronization (12).

Third, music is readily acquired by children without explicit tuition. Hannon and Trainor (2007:466) say:

just as children come to understand their spoken language, most individuals acquire basic musical competence through everyday exposure to music during development … Such implicit musical knowledge enables listeners, regardless of formal music training, to tap and dance to music, detect wrong notes, remember and reproduce familiar tunes and rhythms, and feel the emotions expressed through music.

Fourth, both music and language, although universal and rooted in human cognition, diversify across the human population into culture-specific and culturally-sanctioned instantiations: ‘languages’ and ‘musical traditions/genres.’ As Hannon and Trainor (2007:466) say: ‘Just as there are different languages, there are many different musical systems, each with unique scales, categories and grammatical rules.’ Our everyday words for languages (‘French,’ ‘English,’ etc.) often designate socio-political entities. Musical genres, although generally less well defined and less connected to political borders than differences among languages, are also cultural constructs. This is particularly clear in the case of highly conventionalized forms such as Western classical music, but it is equally true of a ‘vernacular’ form such as jazz (in all its geographical and historical varieties).

So a question one might ask is: is there a ‘music module’? Another possibility, full discussion of which would take us too far afield here, is whether musical competence (or I-music) is in some way parasitic on language, given the very general similarities between music and language; see, for slightly differing perspectives on this question, Lerdahl and Jackendoff (1983); Katz and Pesetsky (2011); and the contributions to Rebuschat, Rohrmeier, Hawkins, and Cross (2012). Other areas to which the same kind of reasoning regarding the basis of apparently species-specific, possibly domain-specific, knowledge as applied to language in the UG tradition could be relevant include (p. 20) mathematics (Dehaene 1997); morality (Hauser 2008); and religious belief (Boyer 1994). Furthermore, Chomsky has frequently discussed the idea of a human science-forming capacity along similar lines (Chomsky 1975, 1980, 2000a), and this idea has been developed in the context of the philosophical problems of consciousness by McGinn (1991, 1993). The kind of thinking UG-based theory has brought to language could lead to a new view of many human capacities; the implications of this for philosophy of mind may be very significant.

As already pointed out, the postulation of UG situates linguistic theory in the rationalist tradition of philosophy. In the Early Modern era, the chief rationalist philosophers were Descartes, Kant, and Leibniz. The relation between Cartesian thought and generative linguistics is well known, having been the subject of a book by Chomsky (1966/2009), and is treated in detail by McGilvray in chapter 4. McGilvray also discusses the lesser-known Cambridge Neo-Platonists, notably Cudworth, whose ideas are in many ways connected to Chomsky’s. Accordingly, here I will leave the Cartesians aside and briefly discuss aspects of the thought of Kant and Leibniz in relation to UG.

Bryan Magee interviewed Chomsky on the BBC’s program Men of Ideas in 1978. In his remarkably lucid introduction to Chomsky’s thinking (see Magee 1978:174–5), Magee concludes by saying that Chomsky’s views on language, particularly language acquisition (essentially his arguments for UG construed as in (2)), sound ‘like a translation in linguistic terms of some of Kant’s basic ideas’ (Magee 1978:175). When put the question that ‘you seem to be redoing, in terms of modern linguistics, what Kant was doing. Do you accept any truth in that?’ (Magee 1978:191), Chomsky responds as follows:

I not only accept the truth in it, I’ve even tried to bring it out, in a certain way. However I haven’t myself referred specifically to Kant very often, but rather to the seventeenth-century tradition of the continental Cartesians and the British Neoplatonists, who developed many ideas that are now much more familiar through the writings of Kant: for example the idea of experience conforming to our mode of cognition. And, of course, very important work on the structure of language, on universal grammar, on the theory of mind, and even on liberty and human rights grew from the same soil.

(Magee 1978:191)

Chomsky goes on to say that ‘this tradition can be fleshed out and made explicit by the sorts of empirical inquiry that are now possible.’ At the same time, he points out that we now have ‘no reason to accept the metaphysics of much of that tradition, the belief in a dualism of mind and body’ (Magee 1978:191). (The interview can be found at

Aside from a generally rationalist perspective, there are possibly more specific connections between Chomsky and Kant. In chapter 2, Hinzen mentions that grammatical knowledge could be seen as a ‘prime instance’ of Kant’s synthetic a priori. Kant made two distinctions concerning types of judgments, one between analytic and synthetic judgments and one between a priori and a posteriori judgments. Analytic judgments are true in virtue of their formulation (a traditional way to say this is to say that the predicate is contained in the subject), and as such do not add to knowledge beyond explaining or defining terms or concepts. Synthetic judgments are true in virtue of some external justification, and as such they add to knowledge (when true). A priori judgments are not based on experience, while a posteriori judgments are. Given the two distinctions, there are four logical possibilities. Analytic a priori judgments include such things as logical truths (‘not-not-p’ is equivalent to ‘p’, for example). Analytic a posteriori judgments cannot arise, given the nature of analytic judgments. Synthetic a posteriori judgments are canonical judgments about experience of a straightforward kind. But synthetic a priori judgments are of great interest, because they provide new information but independently of experience. Our grammatical knowledge is of this kind; there is more to UG (however reduced in a minimalist conception) than analytic truths, in that the nature of Merge, Agree, and so forth could have been otherwise. On the other hand, this knowledge is available to us independent of experience in that the nature of UG is genetically determined. In particular given the SMT, the minimalist notion of UG having the form it has as a consequence of ‘(virtual) conceptual necessity’ may lead us to question this, as it would place grammatical knowledge in the category of the analytic a priori, a point I will return to in section 1.3.

A further connection between Kant and Chomsky lies in the notion of ‘condition of possibility.’ Kant held that we can only experience reality because reality conforms to our perceptual and cognitive capacities. Similarly, one could see UG as imposing conditions of possibility on grammars. A human cannot acquire a language that falls outside the conditions of possibility imposed by UG, so grammars are the way they are as a condition of possibility for human language.

Turning now to Leibniz, I will base these brief comments on the fuller exposition in Roberts and Watumull (2015). Chomsky (1965:49–52) quotes Leibniz at length as one of his precursors in adopting the rationalist doctrine of innate ideas. While this is not in doubt, in fact there are a number of aspects of Leibniz’s thought which anticipate generative grammar more specifically, including some features of current minimalist theory. Roberts and Watumull sketch out what these are, and here I will summarize the main points, concentrating on two things: Leibniz’s rationalism in relation to language acquisition and his formal system.

Roberts and Watumull draw attention to the following quotation from Chomsky (1967b):

In the traditional view a condition for … innate mechanisms to become activated is that appropriate stimulation must be presented…. For Leibniz, what is innate is certain principles (in general unconscious), that ‘enter into our thoughts, of which they form the soul and the connection.’ ‘Ideas and truths are for us innate as inclinations, dispositions, habits, or natural potentialities.’ Experience serves to elicit, not to form, these innate structures…. It seems to me that the conclusion regarding the nature of language acquisition [as reached in generative grammar] are fully in accord with the doctrine of innate ideas, so understood, and can be regarded as providing a kind of substantiation and further development of this doctrine.

(Chomsky 1967b:10)

Roberts and Watumull point out that the idea that language acquisition runs a form of minimum description length algorithm (see chapter 11) was implicit in Leibniz’s discussion of laws of nature. Leibniz runs a Gedankenexperiment in which points are randomly distributed on a sheet of paper and observes that for any such random distribution it would be possible to draw some ‘geometrical line whose concept shall be uniform and constant, that is, in accordance with a certain formula, and which line at the same time shall pass through all of those points’ (Leibniz, in Chaitin 2005:63). For any set of data, a general law can be constructed. In Leibniz’s words, the true theory is ‘the one which at the same time [is] the simplest in hypotheses and the richest in phenomena, as might be the case with a geometric line, whose construction was easy, but whose properties and effects were extremely remarkable and of great significance’ (Leibniz, in Chaitin 2005:63). This is the logic of program size complexity or minimum description length, as determined by an evaluation metric of the kind discussed earlier and in chapter 11.
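Leibniz’s two observations can be made concrete with a short Python sketch. It is an illustration only and is not taken from Roberts and Watumull or from Chaitin. The function names `lagrange` and `minimal_degree` are hypothetical, the points are assumed to have integer coordinates with distinct x-values, and polynomial degree is used as a crude stand-in for ‘simplicity’ rather than as the evaluation metric itself.

```python
from fractions import Fraction


def lagrange(points):
    """Return a polynomial function passing through every (x, y) in points.

    Assumes integer coordinates and pairwise distinct x-values.
    """
    def f(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return f


def minimal_degree(ys):
    """Degree of the lowest-degree polynomial through (0, ys[0]), (1, ys[1]), ...

    Uses repeated finite differences: a sequence generated by a polynomial
    of degree d becomes constant after d rounds of differencing.
    """
    degree = 0
    row = list(ys)
    while len(set(row)) > 1:
        row = [b - a for a, b in zip(row, row[1:])]
        degree += 1
    return degree


# Some formula passes through any scattered points ...
scattered = [(0, 3), (1, 1), (2, 4), (3, 1)]
curve = lagrange(scattered)

# ... but lawful data admit a much shorter description than arbitrary data.
lawful = [1, 3, 5, 7]      # y = 2x + 1
arbitrary = [3, 1, 4, 1]   # needs the full cubic
```

On this toy measure, the four lawful points are covered by a hypothesis of degree 1, whereas the four arbitrary points need the maximal degree 3: the first hypothesis is ‘simplest in hypotheses and richest in phenomena’ in a very modest sense.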

This point is fleshed out in relation to the following quotation from Berwick (1982:6, 7, 8):

A general, formal model for the complexity analysis of competing acquisition … demands [can be] based on the notion of program size complexity[—]the amount of information required to ‘fix’ a grammar on the basis of external evidence is identified with the size of the shortest program needed to ‘write down’ a grammar. [O]ne can formalize the usual linguistic approach of assuming that there is some kind of evaluation metric (implicitly defined by a notational system) that equates ‘shortest grammars’ with ‘simple grammars,’ and simple grammars with ‘easily acquired grammars.’ [Formally, this] model identifies a notational system with some partial recursive function Φi (a Turing machine program) and a rule system as a program p for generating an observed surface set of data…. On this analysis, the more information that must be supplied to fix a rule system, the more marked or more complicated that rule system is…. This account thus identifies simplest with requiring the least additional information for specification[—]simplest = minimum extra information…. The program [model] also provides some insight into the analysis of the ontogenesis of grammar acquisition…. Like any computer program, a program for a rule system will have a definite control flow, corresponding roughly to an augmented flowchart that describes the implicational structure of the program. The flow diagram specifies … a series of ‘decision points’ that actually carry out the job of building the rule system to output. [The] implicational structure in a developmental model corresponds rather directly to the existence of implicational clusters in the theory of grammar, regularities that admit short descriptions. [T]his same property holds more generally, in that all linguistic generalizations can be interpreted as implying specific developmental ‘programs.’

They further connect this conception of language acquisition as being driven by a form of program size complexity/minimum description length with precursors in Leibniz’ thought to the idea developed in Biberauer (2011, 2015) and Roberts (2012) that parameters organize themselves into hierarchies which are traversed in language acquisition following two general third-factor conditions, Feature Economy (posit as few features as possible, consistent with PLD) and Input Generalization (maximize available features); see section 1.4 and chapter 14 for more on these conditions and on parameter hierarchies in general. As they point out, ‘the complexities of language acquisition and linguistic generalizations are determined by simple programs—a Leibnizian conclusion evident in the theory of parameter hierarchies’ (Roberts and Watumull 2015:8).

Leibniz developed a formal system for combining symbols, as he believed that it was both possible and necessary to formalize the rules of reasoning. In this, he anticipated modern formal logic by well over a century. In the History of Western Philosophy, Russell (1946/2004:541) states that ‘Leibniz was a firm believer in the importance of logic, not only in its own sphere, but as the basis of metaphysics. He did work on mathematical logic which would have been enormously important if he had published it; he would, in that case, have been the founder of mathematical logic, which would have become known a century and a half sooner than it did in fact.’

In his formal system, Leibniz introduced an operator he wrote as ‘⊕’, which Roberts and Watumull dub ‘Lerge,’ owing to its formal similarities to Merge.

  (3) [formal definition not reproduced in this copy]

  (4) [formal definition not reproduced in this copy]

So ⊕ is a function that takes two arguments, α and β, and from them constructs the set {α,β}. In other words it is exactly like Merge. Roberts and Watumull then invoke Leibniz’s principle of the Identity of Indiscernibles, concluding that if Merge and Lerge are formally indiscernible, they are identical: Merge is Lerge. It thus appears that, in addition to developing central ideas of rationalist philosophy, Leibniz’s formal theory anticipated aspects of modern set theory and Merge.
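The set-forming character of the operation can be illustrated with a minimal Python sketch. It is not drawn from Roberts and Watumull (2015): the function names `merge` and `lerge`, the use of `frozenset` to model unordered sets, and the toy lexical items are all assumptions made for the example.

```python
def merge(alpha, beta):
    """Binary Merge, modelled as forming the unordered set {alpha, beta}."""
    return frozenset({alpha, beta})


def lerge(alpha, beta):
    """The operator written as a circled plus, modelled in the same way."""
    return frozenset({alpha, beta})


# Applying the operation to its own output yields nested (hierarchical) sets.
np_ = merge("the", "book")   # {the, book}
vp = merge("read", np_)      # {read, {the, book}}
```

Since the two functions return the same set for the same arguments, nothing in their input-output behaviour distinguishes them, which is the sense of indiscernibility at issue. One limitation of the sketch is that `merge(x, x)` collapses to a singleton set.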

In this section, I have sketched some of the issues that link UG to philosophy, primarily philosophy of mind and philosophy of language. These and many other issues are treated in more detail by Hinzen in chapter 2, on philosophy of mind, and by Ludlow in chapter 3, on philosophy of language. In chapter 4 McGilvray discusses the historical and conceptual links between Cartesian philosophy and generative grammar.

1.3 Linguistic Theory

Part II of this volume deals with general linguistic theory in relation to UG. Here the central concept is that of explanation, and how UG contributes to that. In chapter 5, Rizzi discusses the classical conception of explanatory adequacy as first put forward in Chomsky (1964) and developed extensively in the second stage of UG work. Chapter 6 focuses exclusively on third-factor explanations, concentrating on domain-general aspects of linguistic structure. In chapter 7, Newmeyer contrasts formal and functional approaches to explanation. Chapters 8 and 9 deal with two central areas of linguistic theory, phonology and semantics, discussing how the concept of UG contributes to explanations in these areas.

The classical notion of explanatory adequacy can be described as follows. Chomsky (1964) made the distinction between observational, descriptive, and explanatory adequacy. Regarding observational adequacy, he says:

Suppose that the sentences

  1. (i) John is easy to please.

  2. (ii) John is eager to please.

are observed and accepted as well-formed. A grammar that achieves only the level of observational adequacy would … merely note this fact one way or another (e.g., by setting up appropriate lists).

(Chomsky 1964:34)

But of course the natural question to ask at this point is why these sentences are different in the way we can observe. To answer this question, we need to move to the level of descriptive adequacy. On this, Chomsky says ‘To achieve the level of descriptive adequacy, however, a grammar would have to assign structural descriptions indicating that in (i) John is the direct object of please, while in (ii) it is the logical subject of please.’ In fact, we could assign the two examples the structural descriptions in (5):

  (5) [structural descriptions (5a) and (5b) not reproduced in this copy]

These representations capture the basic difference between the two sentences alluded to in the quotation from Chomsky just given, as well as a number of other facts (e.g., that the subject of the infinitive in (5a) is arbitrary in reference, while the notional object of please in (5b) is arbitrary in reference).

But linguistic theory must also tell us why these structural descriptions are the way they are. For example, the notations used in (5), CP, PRO, AP, etc., must be explicated. This brings us to explanatory adequacy: we have to explain how the structural descriptions are determined by UG. If we can do this, we also explain how, in principle, the grammatical properties indicated by representations like those in (5) are acquired (and hence how we, as competent adult native speakers, have the intuitions we have).

In the second stage of UG, the development of the principles-and-parameters approach made it possible to see the grammar of a given language as an instantiation of UG with parameters fixed. So the properties of the English easy-to-please construction can ultimately be explained in these terms. For example, among the parameters relevant to determining the representation of this construction in (5a) are the following: CP follows the head A (rather than preceding it, as in a head-final language; see chapter 14, section 14.3, and below on the head parameter); English has infinitives, and indeed infinitives of this type; arbitrary null pronouns can appear in this context with the set of properties that we observe them to have; the trace is a wh-trace; etc.

More generally, the second-stage UG approach offered, really for the first time, a plausible framework in which to capture the similarities and differences among languages within a rigorous formal theory. It also offered a way to approach language acquisition in terms of parameter-setting that was, at least in principle, very straightforward. This can be most clearly seen if we take the view that parametric variation exhausts the possible variation among languages and further assume that there is a finite set of binary parameters. In that case, a given language’s set of parameter settings can be reduced to a binary number n. Concomitantly, the task of the language acquirer is to extrapolate n from the PLD. Abstractly, then, the learner can be seen as a function from a set of language tokens (a text, in the sense of formal learning theory stemming from Gold 1967; see chapter 11) to n. In this way, we have the conceptual framework for the attainment of explanatory adequacy.
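The reduction of a grammar to a binary number, and of the learner to a function from a text to that number, can be sketched in Python as follows. This is a toy illustration under loudly stated assumptions: the three parameter names are hypothetical, and the ‘learner’ simply treats each token of the text as a cue that switches one parameter on, which is far cruder than any learning model discussed in chapter 11.

```python
from typing import Dict, Iterable

# Hypothetical inventory of binary parameters, in a fixed order.
PARAMETERS = ["head_initial", "null_subject", "wh_movement"]


def encode(settings: Dict[str, bool]) -> int:
    """Pack one value per parameter into a single binary number n."""
    n = 0
    for name in PARAMETERS:
        n = (n << 1) | int(settings[name])
    return n


def decode(n: int) -> Dict[str, bool]:
    """Recover the individual parameter settings from n."""
    k = len(PARAMETERS)
    return {name: bool((n >> (k - 1 - i)) & 1)
            for i, name in enumerate(PARAMETERS)}


def learn(text: Iterable[str]) -> int:
    """Toy learner: a function from a text (a sequence of tokens) to n.

    Assumption: each token is a cue naming the parameter it sets to 1;
    parameters for which no cue is observed keep the default value 0.
    """
    settings = {name: False for name in PARAMETERS}
    for cue in text:
        if cue in settings:
            settings[cue] = True
    return encode(settings)


# A hypothetical text whose cues set two of the three parameters.
n = learn(["head_initial", "wh_movement", "head_initial"])   # binary 101
```

With k binary parameters there are 2^k possible values of n, so the hypothesis space is finite; this is what makes the parameter-setting conception of acquisition straightforward in principle.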

As we have already seen, the adoption of the goals of the MP requires us to rethink the principles-and-parameters approach. First, what we might term ‘methodological minimalism’ requires us to reduce our theoretical machinery as much as possible. This has entailed the elimination of many UG principles, and hence the traditional area of parametrization. Second, there is the matter of ‘substantive minimalism.’ This goes beyond merely a severe application of Occam’s Razor by asking the question of why UG, with the properties we think it has, is that way and not some other way. Here, the third-factor driven approach offers intriguing directions towards an answer. We want to say that I-languages are the way they are because of the way the three factors—‘small’ UG, PLD and the acquirer’s characteristic mode of interaction with it, and third-factor strategies of computational optimization, as well as physical laws—interact. If this can be achieved, and Lohndal and Uriagereka point in some interesting directions in chapter 6, then, in an obvious sense, this takes us ‘beyond explanatory adequacy,’ to a potentially deeper level of understanding of the general nature of I-language and therefore of UG.

In this connection, Guardiano and Longobardi’s discussion of notions of explanation in chapter 16, section 16.2, is instructive. They begin their chapter by listing the questions that any parametrized theory of UG must answer, as follows (their (1)):

  (6) [list of questions (6a)–(6d) not reproduced in this copy]

As they point out, (6a) relates to Chomsky’s (1964) notion of explanatory adequacy, as just introduced, because it concerns the form of UG (on a standard construal of ‘where’ parameters are to be stated, see chapter 14 and section 14.1.5 on this). As they point out, and as both chapters 14 and 15 discuss, the principles-and-parameters approach has attained ‘an intermediate level that we can call crosslinguistic descriptive adequacy, i.e., in accounting for grammatical diversity across languages’ (Guardiano and Longobardi, chapter 16, p. 378), but they echo one of the conclusions of chapter 11 that there is at present no fully adequate account of how parameter values are set on the basis of PLD in language acquisition.

They further point out that (6b) and (6d) go ‘beyond explanatory adequacy,’ in that they are concerned with what Longobardi (2003) called ‘evolutionary adequacy’ and ‘historical adequacy,’ respectively. Somewhat different answers to question (6b) are discussed in chapters 14 and 16. Finally, as they point out, the Parametric Comparison Method (PCM), which is the main subject matter of chapter 16, is designed precisely to answer question (6d), a question not addressed in earlier work on principles and parameters.

Another way to approach the question of what it may mean to go beyond explanatory adequacy is to think in terms of conceptual necessity, again bearing in mind the SMT. For example, it seems conceptually necessary to posit the two interpretative components of the grammar, since the fundamental aspect of language is that it relates sound and meaning, and of course we need a device which actually generates the structures. So the tripartite architecture of the grammar is justified on these grounds, while ‘internal’ levels of representation such as the earlier deep/D-structure and surface/S-structure are not required on the same grounds. One way to think about conceptual necessity, and indeed substantive minimalism more generally, is to ask oneself the following question: How different from what it actually is could UG have been? Taking a cue from Stephen Jay Gould: if we could rerun the tape of language evolution, how differently could UG turn out? Note that both first- and second-stage UG would allow it to be dramatically different (different systems of rule systems; different parameters and conditions on representations/derivations). As already suggested, the tripartite organization seems conceptually necessary, and Merge seems conceptually necessary in that there must be some kind of combinatorial device and simply combining elements to form sets is about the simplest combinatorial operation possible (Watumull 2015 argues that binary Merge is optimally simple in that Merge must combine more than one thing and binary combination is the simplest form of combination). The grammatical categories and features, as well as operations like Agree, however, seem less obviously conceptually necessary. Here the key issue is arguably that the nature and inventory of grammatical categories and features is poorly understood; in the current state of knowledge we barely have an extensional definition of them, still less an intensional one. 
However, it is clear in principle that the criterion of conceptual necessity may take us to a higher level of explanation.

In this connection, it is worth reconsidering the Kantian synthetic a priori. If we could successfully reduce the whole of UG to actual conceptual necessity, presumably the system would no longer represent this category of knowledge, but would instead fall under the analytic a priori. This consequence may be of philosophical interest, and it does rather change the complexion of how we might think about grammatical knowledge.

In this context, a related question (thinking again of rerunning the tape of evolution) is: how universal is UG? Montague, as already briefly mentioned, had what at first sight seems to be a different concept from Chomsky’s. For Montague, ‘universal grammar’ refers to a general theory of syntax and semantics, which could apply to any kind of system capable of expressing true or false propositions. So, as Montague (1970) says: ‘There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians; indeed, I consider it possible to comprehend the syntax and semantics of both kinds of languages within a single natural and mathematically precise theory’ (Montague’s position echoes that of Bar-Hillel 1954 discussed in section 1.1.1, although it is significantly stronger). For Montague, natural languages could be described, and perhaps in some sense explained, as formal systems. This conception of universal grammar is mathematical, rather than psychological or biological; as such, it is universal, and the ontological question of what kind of thing universal grammar is becomes a case of the more general question about mathematical objects (which, as we have seen, may have a Platonic answer). Like any mathematical system, Montague’s universal grammar would have to be maximally simple and elegant. This approach appears to be very different from, and maybe incompatible with, the idea of a biologically given UG. But if, given the MP, the biological UG has the properties it has out of conceptual necessity then it might be closer to Montague’s conception than would at first sight appear. Could evolution give rise to a mathematically optimal system?

Here we approach the very general question of convergent evolution. Conway-Morris (2003) puts forward a very strongly convergentist position, arguing that the development of complex, intelligent life forms is all but inevitable given the right initial conditions (which may be cosmically very rare, hence the eerie silence of extraterrestrials). In his words:

Evolution is indeed constrained, if not bound. Despite the immensity of biological hyperspace I shall argue that nearly all of it must remain for ever empty, not because our chance drunken walk failed to wander into one domain rather than another but because the door could never open, the road was never there, the possibilities were from the beginning for ever unavailable.

(Conway-Morris 2003:12)

Conway-Morris hardly discusses language, but he does say that ‘a universal grammar may have evolved as a result of natural selection that optimizes the exploration of a ‘language space’ in terms of rule-based systems’ (Conway-Morris 2003:253). From an MP perspective, one could interpret this comment as supporting the role of third factors in language evolution. As Conway-Morris points out at length, a consequence of his view is that extra-terrestrial biology, where it exists at all, will be very similar to terrestrial biology. If language evolved in the same way as other biological systems (and this does not have to imply that natural selection is the only shaping force; see the discussion in Berwick and Chomsky 2016:16–49), then it would follow from Conway-Morris’ position, in a way that is fully consistent with the MP notion of conceptual necessity, that extraterrestrial UG would be very similar to terrestrial UG. UG may be more universal than we once thought.

1.4 Language Acquisition

The idea that we may learn something about universal grammar by observing children acquiring language is also an old one. Herodotus relates the story of the Pharaoh Psammetichus (or Psamtik) I, who sought to discover the origin of language by giving two newborn babies to a shepherd, with the instructions that no one should speak to them, but that the shepherd should feed and care for them while listening to determine their first words. When one of the children said ‘Bekos’ the shepherd concluded that the word was Phrygian because that was the Phrygian word for ‘bread.’ Hence it was concluded that Phrygian was the original language. It is not known whether the story is true, but it represents a valid line of thought regarding the determination of the origin of language and possibly of universal grammar, as well as a very strong version of the innateness hypothesis. More recently, children brought up without normal interaction with adults have also been alleged to provide evidence for some version of the idea that the language faculty is innate (see, e.g., Rymer 1994 on the ‘wild child’ Genie; as well as Kegl, Senghas, and Coppola 1999 on the development of a new sign language among a group of deaf children).

In the present context, the importance and relevance of the circumstances of language acquisition are self-evident. Accordingly, chapters 10 through 13, making up Part III of the volume, deal explicitly with this topic from different perspectives, and the topic is taken up in several other chapters too (see in particular chapters 5, 14, 17, 18, and 19). Chapter 10 deals with the most important argument for UG, the argument from the poverty of the stimulus. The poverty-of-the-stimulus argument is based on the observation that there is a significant ‘gap’ between what seems to be the experience facilitating first-language acquisition and the nature of the linguistic knowledge which results. An early statement of this argument is given by Chomsky (1965:58) as follows:

A consideration of the character of the grammar that is acquired, the degenerate quality and narrowly limited extent of the available data, the striking uniformity of the resulting grammars, and their independence of intelligence, motivation, and emotional state, over wide ranges of variation, leave little hope that much of the structure of the language can be learned by an organism initially uninformed as to its general character.

Similarly, in his very lucid introduction to the general question of the nature of the learning problem for natural languages, Niyogi (2006) points out that, although children’s learning resources are finite, the target is not completely ‘unknown’: it falls within a limited hypothesis space. As Niyogi points out, it is an accepted result of learning theory that this must be the case: the alternative of a tabula rasa imposes an insuperable learning problem on the child. This is the essence of the argument; it is discussed in detail, along with various counterarguments that have been put forward, by Lasnik and Lidz in chapter 10. In chapter 11, Fodor and Sakas discuss the question of learnability: what kind of evidence is needed for what kind of system such that a learner can converge on a grammatical system. This question turns out to be more difficult than was once thought, as the authors show. In chapter 12, Guasti summarizes what is known about first-language acquisition, and in chapter 13 Sprouse and Schwartz compare non-native acquisition with first-language acquisition, arguing that the two cases are more similar than is often thought.

The three stages of UG that we introduced in section 1.1.2 are, perhaps not surprisingly, mirrored in the development of acquisition studies. In the first stage, the emphasis was on acquisition of rule systems and the nature of the evaluation metric; in the former connection, Wexler and Culicover (1980) was a major learnability result, as Fodor and Sakas discuss in chapter 11. In the second stage, parameter-setting was the major concern. Naturally, in this connection comparative acquisition studies came to the fore, led by the pioneering work of Hyams (1983, 1986). In the third stage, there has been more discussion of third factors. In this connection, work by Biberauer (2011, 2015) and Roberts (2012) emphasizing the role of the third-factor conditions Feature Economy (FE) and Input Generalization (IG) has been important. FE and IG were introduced briefly earlier, and can be more fully stated as follows:

  (7) [statements of Feature Economy and Input Generalization not reproduced in this copy]

Together, these conditions form an optimal search/optimization strategy of a minimal kind. Roberts (2012) argues that these third factors interact with the other two factors, ‘small’ UG and the way in which the child interacts with the PLD, to give rise to parametric variation (expressed in the form of parameter hierarchies) as an emergent property, answering the otherwise difficult ‘where’ question regarding parameters in a minimalist architecture; see the discussion in chapter 14.

A potentially very important development of this idea is found in Biberauer (2011, 2015), Biberauer and Roberts (2015a,b, and in press). There it is proposed that not just parameters (seen as a subset of the set of formal features of UG, see chapter 14), but also the features themselves may be emergent in this way. If this conjecture turns out to be correct, then ‘small’ UG will be further emptied, and indeed the aspect of UG that perhaps most clearly represents the synthetic a priori will be removed from it, moving us further in the direction of conceptual necessity and a genuinely universal UG.

1.5 Comparative Syntax

Since UG is the general theory of I-languages, a major concern of linguistic theory is to develop a characterization of a possible human grammar (Chomsky 1957:50). Hence the question of language universals and comparative linguistics plays a central role (but see 1.1.1 on how the notions of ‘language universal’ and ‘UG’ are nonetheless distinct).

Part IV of the volume treats comparative syntax. Syntax has a whole part to itself since it is above all in this area that the principles-and-parameters approach to UG, both in its second and third stages, has been developed and applied; the details of this are given by Huang and Roberts in chapter 14. In chapter 15 Holmberg discusses language typology, and shows in detail how the Greenbergian cross-linguistic descriptive tradition is connected to and distinct from the Chomskyan tradition of looking for language universals as possible direct manifestations of UG (‘big’ or ‘small’). As already discussed, Guardiano and Longobardi introduce, defend and illustrate the Parametric Comparison Method (PCM) in chapter 16, a method which, as we saw in the previous section, may afford a new kind of historical adequacy for linguistic theory.

The idea that all languages are, as it were, cut from the same cloth, is not new. This is not the place, and I am not the person, to give even an approximate account of the development of ideas about language universals in the Western tradition of linguistic thought. Suffice it to say, however, that the tension between the easily observable arbitrary nature of linguistic form (famously emphasized by de Saussure 1916, but arguably traceable to the Platonic dialogue Cratylus; see Robins 1997:24; Seuren 1998:5–9; Law 2003:20–23; Graffi 2013; Itkonen 2013; Maat 2013; Magnus 2013; Mufwene 2013) and the widely accepted fact that there is a single independent non-linguistic reality ‘out there’ which we seem to be able to talk to each other about using language, has long been recognized in different ways in thought about language. To put it very simplistically, if there is a single reality and if we are able to speak about it in roughly the same way whether, like Plato, we speak Classical Greek, or like de Saussure, Genevan French, or, like me, Modern British English, then something must be universal, or at least very general, to language. More specifically, if logic reflects universal laws of thought, and if logical notions can be expressed through language, then we can observe something universal in language; negation may be a good example of this, in that negation is arguably a universal category of thought which receives arbitrarily differing formal expression in different languages.

If something is universal, then it becomes legitimate to ask how much is universal—this is one way of asking what is universal. And of course it is easy to observe that not everything is universal. It only takes the briefest perusal of a grammar or dictionary of a foreign language to recognize that there must be many features of English which distinguish it from some or all other languages. What these are is a matter for empirical investigation. It is here that we see how the study of how languages differ is the reverse of the coin of the study of universals (on this point, see also Kayne 2005a:3), and hence the importance of work, and theories of, comparative syntax for understanding UG (again, ‘big’ or ‘small’).

Another way of approaching the question of universals is to look at the origin of languages. If all languages developed from a common ancestor, or if they have all developed in the same way, or if we could show that language was somehow bestowed on humanity by some greater power (the name-giver of the Cratylus, God, Darwinian forces of selection and mutation, conceptually necessary laws governing the design of complex systems, inexorable cosmic forces of convergent evolution, etc.), then our question would surely be answered. We would be able to identify as universal what remains of the original proto-language in all currently existing languages, or what is created by universal processes of change, or the remnants of what was given to us by the higher power (on language change and UG, see chapter 18).

Just as we can isolate three stages in the development of UG since the 1950s, we can observe three stages in the development of comparative syntax. In the first stage, it was supposed that differences among languages could be accounted for in terms of differing rule systems. For example, a VO language might have the PS-rule VP → V NP, while an OV language would have VP → NP V. Clearly there is little in the way of a theory of cross-linguistic variation here. In the second phase of UG, the highly simplified X′-schema which regulates the generation of D-Structures was parametrized. In particular, the Head Parameter was proposed (see chapter 14, section 14.3.1, for discussion and references):

  (8) In X′, X {precedes/follows} its complement YP.

The grammar of each language selects between precede and follow, giving rise to VO (and general head-initiality) or OV (and general head-finality). (8) itself is part of UG. The goal of comparative syntax at this stage was to discover the class of parameters and thereby their form, and ultimately to enumerate their values for each language (see also chapter 16 and the discussion around (6)).
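The effect of a single head-directionality choice can be illustrated with a minimal sketch. The `Phrase` class, the `linearize` function, and the example words are all hypothetical and introduced only for illustration; the sketch assumes, much more simply than any actual proposal, that one Boolean setting applies uniformly to every head in the structure.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Phrase:
    """A head plus its complement: a toy stand-in for X' (hypothetical type)."""
    head: str
    complement: Union["Phrase", str]


def linearize(node: Union[Phrase, str], head_initial: bool) -> List[str]:
    """Spell out a structure as a word list under one toy Head Parameter setting."""
    if isinstance(node, str):
        return [node]
    comp = linearize(node.complement, head_initial)
    # Head-initial: head precedes its complement; head-final: head follows it.
    return [node.head] + comp if head_initial else comp + [node.head]


# Hypothetical structure: a verb taking a PP complement, whose P takes a noun.
vp = Phrase("go", Phrase("to", "Tokyo"))

print(linearize(vp, head_initial=True))   # ['go', 'to', 'Tokyo']
print(linearize(vp, head_initial=False))  # ['Tokyo', 'to', 'go']
```

The same hierarchical structure yields both orders; only the parameter setting differs, which is the sense in which the choice is made once for the whole grammar rather than rule by rule.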

In the third stage, the development of the MP led to some difficulties for this notion of parameter (see in particular Boeckx 2011a, 2014). Building on an earlier proposal in Borer (1984), Chomsky (1995b) proposed that parameters were lexically encoded as differing values of a subset of formal features of functional heads (e.g., ‘φ-features,’ i.e., person, number and gender, Case, categorial and other features; as already mentioned, no exhaustive list of these features has been proposed). This, combined with empirical results largely indicating that the large-scale clustering of abstract morphosyntactic properties predicted by the second-stage concept of parameter was not readily found, led to the development of ‘micro-parametric’ approaches to variation (see in particular Kayne 2005a, Baker 2008b and chapters 14 and 16). Although this approach is clearly descriptively adequate, it is unclear whether it has the explanatory value of the second-stage notion of parameter, notably in that it leads to a proliferation of parameters and therefore of parameter values and therefore of grammars. The range of cross-linguistic variation thus seems to be excessively unconstrained.
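The step from a proliferation of parameters to a proliferation of grammars can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that parameters are binary and fully independent of one another; neither assumption is made in the text, and the function name `grammar_count` is hypothetical.

```python
def grammar_count(n_parameters: int, values_per_parameter: int = 2) -> int:
    """Number of distinct settings, assuming fully independent parameters."""
    return values_per_parameter ** n_parameters


# Each added binary parameter doubles the number of possible grammars.
for n in (10, 30, 100):
    print(n, grammar_count(n))
```

Under these assumptions ten parameters already define 1,024 grammars and thirty define over a billion, which is why dependencies among parameters, of the kind hierarchies are meant to capture, matter for constraining the space.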

Beginning with Gianollo, Guardiano and Longobardi (2008), a different approach began to arise, which, following Baker (1996, 2008b), recognized the need for macroparameters in addition to microparameters. Roberts (2012) initiated a particular concept of parameter hierarchy (developing an idea first proposed in Baker 2001). As described earlier, this approach treats parameters and parameter hierarchies as emergent properties created by the interaction of (‘small’) UG, the PLD and the two third-factor conditions in (7): Feature Economy (FE) and Input Generalization (IG). For more details, see the discussion and references in chapter 14. Guardiano and Longobardi, in chapter 16, section 16.9, develop further the concept of Parameter Schemata, first proposed in Gianollo, Guardiano, and Longobardi (2008).

More recently, Biberauer and Roberts (2015c) and Roberts (2015) have suggested a rather different departure from a purely microparametric approach. Pursuing the long-noted analogy between parameters and genes (see the discussion of Jacob in Chomsky 1980:67), they suggest that, just as there are ‘master genes’ (genes which control other genes), so there are ‘master parameters.’ Taking a term from genetics, they call these pleiotropic parameters. Such parameters are ‘deep’ parameters which profoundly influence the overall shape of a grammatical system. They go on to identify a small number of features which have this parametric property, which they call pleiotropic formal features (PFFs): Person, Tense, Case, and Order. This is a very tentative proposal at an early stage of development, but it may represent an important development for the theory of parameters, and therefore of (‘big’) UG.

1.6 Wider Issues

Part V deals with a range of wider issues that are of importance for UG, and where UG can inform, and has informed, discussion and research. This includes creoles and creolization (Aboh and DeGraff, chapter 17), diachronic syntax (Fuß, chapter 18), language pathology (Tsimpli, Kambanaros, and Grohmann, chapter 19), Sign Language (Cecchetto, chapter 20) and the question of whether animals have UG or some correlate (Boeckx, Hauser, and Samuels, chapter 21). There are numerous interconnections among these chapters, and with the chapters in Parts I–IV, and they attest to the fecundity of the idea of UG across a wide range of domains.

Of course, as I already mentioned, the topics covered are not exhaustive. Two areas not included which spring to mind are music and neurolinguistics in relation to UG (but see the remarks on music in section 1.2, and neurolinguistics is discussed in relation to pathology in chapter 19). But the largest lacuna is language evolution, a question whose importance for linguistic theory and any notion of UG is self-evident.

Although there is no separate chapter on language evolution, the topic has been touched on several times in the foregoing discussion, and is also discussed, to varying levels of detail, in chapters 2, 4, 6, 14, 16, 18, and 21. Furthermore, there is an entire volume devoted to language evolution in this very series (Tallerman and Gibson 2012; see in particular chapters 1, 13, 15, 31, 52, 56, and 65 of that volume). Most relevant in the current context, however, is Berwick and Chomsky (2016).

Berwick and Chomsky begin by recognizing the difficulties that the question of language evolution has always posed. There are basically three reasons for this. First, language is unique to humans and so cross-species comparative work cannot tell us very much (although see Hauser, Chomsky, and Fitch 2002 and chapter 21 for some indications). Second, the parts of the human anatomy which subserve language (brains, larynxes, etc.) do not survive in the fossil record, meaning that inferences have to be made based on such things as surviving pieces of crania or hyoid bones, etc. Third, it seems that the evolution of language was a rather sudden event, and this goes against the traditional Darwinian idea of evolution proceeding through an accumulation of small changes.

Berwick and Chomsky critique the idea that evolution can only proceed incrementally, pointing out that many evolutionary developments (including the origin of complex cells, and eyes; Berwick and Chomsky 2016:37) are of a similarly non-micromutational nature, and that modern evolutionary theory can and does take account of this. The general account they give of language evolution can be summarized by the following extended quotation:

In some completely unknown way, our ancestors developed human concepts. At some time in the very recent past, apparently some time before 80,000 years ago if we can judge from associated symbolic proxies, individuals in a small group of hominids in East Africa underwent a minor biological change that provided the operation Merge—an operation that takes human concepts as computational atoms and yields structured expressions that, systematically interpreted by the conceptual system, provide a rich language of thought. These processes might be computationally perfect, or close to it, hence the result of physical laws independent of humans. The innovation had obvious advantages and took over the small group. At some later stage, the internal language of thought was connected to the sensorimotor system, a complex task that can be solved in many different ways and at different times.

(Berwick and Chomsky 2016:87)

From this quotation we see the role of third factors (‘physical laws independent of humans’), and the all-important point that the problem is broken down into three parts: the development of concepts (which remains unexplained), the development of Merge (‘a minor biological change’) and the development of ‘externalization,’ i.e., the link to the articulatory-perceptual interface (roughly speaking, in spoken language, morphology, phonology, and phonetics). The evolution of (‘small’) UG then amounts to the aforementioned ‘minor biological change.’ On this view, then, the three factors of language design all have separate histories: the first factor is that ‘minor biological change’ giving rise to Merge (essentially seen as the sole content of UG; see Hauser, Chomsky, and Fitch 2002); the second factor could not arise until the system was externalized (note that once PLD became possible through externalization, language change could start; see chapter 18); the third factors are, at this level of generality, rooted in physical law and hence not subject to biological evolution.
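To give a feel for how a single simple operation can yield unboundedly many structured expressions, here is a minimal sketch of Merge. It assumes the common set-formation formulation, on which Merge combines two objects into an unordered set; the quotation itself does not spell this out. The names `merge` and `depth` and the example words are hypothetical, and the sketch ignores labelling, features, and externalization entirely.

```python
from typing import FrozenSet, Union

SyntacticObject = Union[str, FrozenSet]


def merge(x: SyntacticObject, y: SyntacticObject) -> FrozenSet:
    """Combine two syntactic objects into an unordered set {x, y} (toy version)."""
    return frozenset([x, y])


def depth(obj: SyntacticObject) -> int:
    """Levels of embedding: 0 for an atom, else 1 + the deepest member."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(part) for part in obj)


# Atoms are plain strings standing in for lexical items.
the_book = merge("the", "book")
vp = merge("read", the_book)

print(merge("the", "book") == merge("book", "the"))  # True: no linear order
print(depth(vp))                                     # 2
```

Because the output of `merge` is itself a possible input, nothing bounds the depth of the structures that can be built from a finite stock of atoms. Linear order is absent from the result, in line with the idea that it arises only under externalization.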

In their fourth chapter, Berwick and Chomsky (2016:109ff) treat language evolution as a mystery, and thus the narrative is a whodunnit: what, who, where, when, how, and why. Their solution to the whodunnit is summarized as follows, and again I quote at length:

  • ‘What’ boils down to the Basic Property of human language—the ability to construct a digitally infinite array of hierarchically structured expressions with determinate interpretations at the interfaces with other organic systems [footnote omitted]

  • ‘Who’ is us—anatomically modern humans—neither chimpanzees nor gorillas nor songbirds

  • ‘Where’ and ‘When’ point to sometime between the first appearance of anatomically modern humans in southern Africa roughly 200,000 years ago, prior to the last African exodus approximately 60,000 years ago….

  • ‘How’ is the neural implementation of the Basic Property—little understood, but recent empirical evidence suggests that this could be compatible with some ‘slight rewiring of the brain,’ as we have put it elsewhere.

  • ‘Why’ is language’s use for internal thought, as the cognitive glue that binds together other perceptual and information-processing systems.

(Berwick and Chomsky 2016:110–111)

The remainder of the chapter fleshes out each of these points in detail, with a particularly interesting proposal regarding the ‘slight rewiring of the brain’ mentioned under ‘How?’ (see Berwick and Chomsky 2016:157–164).

If the assiduous and diligent reader of this volume, on reaching the end of chapter 21, feels a need to complete the experience by reading a chapter on UG and language evolution, I recommend they turn to chapter 4 of Berwick and Chomsky (2016).

1.7 Conclusion

One thing which should be apparent to the reader who has got this far is that the area of study that surrounds, supports, questions or elaborates the notion (or notions) of universal grammar is extremely rich. My remarks in this introductory chapter have been extremely sketchy for the most part. The chapters to follow treat their topics in more depth, but each is just an overview of a particular area. It is no exaggeration to say that one could spend a lifetime just reading about UG in its various guises.

And this is how it should be. The various ideas of UG and universal grammar entertained here represent serious attempts to understand arguably the most extraordinary feature of our species: our ability to produce and understand language (in a stimulus-free yet appropriate manner) in order to articulate, develop, communicate and store complex thoughts. Like any bold and interesting idea, UG has its critics. In recent years, some have been extremely vocal (see in particular Evans and Levinson 2009; Tomasello 2009). Again this is as it should be: all good ideas can and should be challenged. Even if these critics turn out to be correct (although I think it is fair to say that reports of the ‘death of UG’ are somewhat premature), at the very minimum the idea has served as an outstandingly useful heuristic for finding out more about language. But one is naturally led to wonder whether the idea can really be wholly wrong if it is able to yield so much, aside from any intrinsic explanatory value it may have. Both that fecundity and the explanatory force of the idea are amply attested in the chapters to follow.