
PRINTED FROM OXFORD HANDBOOKS ONLINE. © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

Word Grammar

Abstract and Keywords

Word Grammar (WG) combines elements from a wide range of other theories of language and cognition into a coherent theory of language as conceptual knowledge. The structure is a network built round an ‘isa’ hierarchy; the logic is multiple default inheritance; and the knowledge is learned and applied by two cognitive processes: spreading activation and node-creation.

Keywords: network, psycholinguistics, sociolinguistics, activation, morphology, syntax, dependency, semantics, default inheritance

40.1 A Brief History of Word Grammar (WG)

Among the questions that we have been asked to consider is question (n): “How does your model relate to alternative models?” Very few of the ideas in Word Grammar (WG) are original so it may be helpful to introduce the theory via the various theories from which the main ideas come.

We start with the name “Word Grammar” (WG), which is less informative now than it was in the early 1980s when I first used it (Hudson 1984). At that time, WG was primarily a theory of grammar in which words played a particularly important role (as the only units of syntax and the largest of morphology). At that time I had just learned about dependency grammar (Anderson 1971; Ágel and Fischer this volume), which gave me the idea that syntax is built around words rather than phrases (see section 40.8). But the earlier roots of WG lie in a theory that I had called “Daughter-Dependency Grammar” (Hudson 1976; Schachter 1978, 1981) in recognition of the combined roles of dependency and the “daughter” relations of phrase structure. This had in turn derived from the first theory that I learned and used, Systemic Grammar (which later turned into Systemic Functional Grammar—Halliday 1961; Hudson 1971; Caffarel-Cayron, this volume). Another WG idea that I derived from Systemic Grammar is that “realization” is different from “part”, though this distinction is also part of the more general European tradition embodied in the “word-and-paradigm” model of morphology (Hudson 1973; Robins 2001).

In several respects, therefore, early WG was a typical “European” theory of language based on dependency relations in syntax and realization relations in morphology. However, it also incorporated two important American innovations. One was the idea that a grammar could, and should, be generative (in the sense of a fully explicit grammar that can “generate” well-formed structures). This idea came (of course) from what was then called Transformational Grammar (Chomsky 1965), and my first book was also the first of a series of attempts to build generative versions of Systemic Grammar (Hudson 1971). This concern for theoretical and structural consistency and explicitness is still important in WG, as I explain in section 40.2. The second American import into WG is probably its most general and important idea: that language is a network (Hudson 1984: 1, 2007b: 1). Although the idea was already implicit in the “system networks” of Systemic Grammar, the main inspiration was Stratificational Grammar (Lamb 1966). I develop this idea in section 40.3.

By 1984, then, WG already incorporated four ideas about grammar in a fairly narrow sense: two European ideas (syntactic dependency and realization) and two American ones (generativity and networks). But even in 1984 the theory looked beyond grammar. Like most other contemporary theories of language structure, it included a serious concern for semantics as a separate level of analysis from syntax; so in Hudson (1984), the chapter on semantics has about the same length as the one on syntax. But more controversially, it rejected the claim that language is a unique mental organ in favor of the (to my mind) much more interesting claim that language shares the properties of other kinds of cognition (Hudson 1984: 36, where I refer to Lakoff 1977). One example of a shared property is the logic of classification, which I then described in terms of “models” and their “instances”, which “inherit” from the models (Hudson 1984: 14–21) in a way that allows exceptions and produces “prototype effects” (ibid. 39–41). These ideas came from my elementary reading in artificial intelligence and cognitive science (e.g., Quillian and Collins 1969; Winograd 1972; Schank and Abelson 1977); but nowadays I describe them in terms of the “isa” relation of cognitive science (Reisberg 2007) interpreted by the logic of multiple default inheritance (Luger and Stubblefield 1993: 387); section 40.4 expands these ideas.

The theory has developed in various ways since the 1980s. Apart from refinements in the elements mentioned above, it has been heavily influenced by the “cognitive linguistics” movement (Geeraerts and Cuyckens 2007; Bybee and Beckner, Fillmore, and Langacker, this volume). This influence has affected the WG theories of lexical semantics (section 40.9) and of learning (section 40.10), both of which presuppose that language structure is deeply embedded in other kinds of cognitive structures. Another development has been in the theory of processing, where I have tried to take account of elementary psycholinguistics (Harley 1995), as I explain in section 40.10. But perhaps the most surprising source of influence has been sociolinguistics, in which I have a long-standing interest (Hudson 1980, 1996). I describe this influence as surprising because sociolinguistics has otherwise had virtually no impact on theories of language structure. WG, in contrast, has always been able to provide a theoretically motivated place for sociolinguistically important properties of words such as their speaker and their time (Hudson 1984: 242; 1990: 63–6; 2007b: 236–48). I discuss sociolinguistics in section 40.11.

In short, WG has evolved over nearly three decades by borrowing ideas not only from a selection of other theories of language structure ranging from Systemic Functional Grammar to Generative Grammar but also from artificial intelligence, psycholinguistics, and sociolinguistics. I hope the result is not simply a mishmash of ideas but an integrated framework of ideas. On the negative side, the theory has research gaps including phonology, language change, metaphor, and typology. I hope others will be able to fill these gaps. However, I suspect the main gap is a methodological one: the lack of suitable computer software for holding and testing the complex systems that emerge from serious descriptive work.

40.2 The Aims of Analysis

This section addresses the following questions that the editors of this volume presented in chapter 1:

  (a) How can the main goals of your model be summarized?

  (b) What are the central questions that linguistic science should pursue in the study of language?

  (e) How is the interaction between cognition and grammar defined?

  (f) What counts as evidence in your model?

  (m) What kind of explanations does your model offer?

Each of the answers will revolve around the same notion: psychological reality.

Starting with question (a), the main goal of WG, as for many of the other theories described in this book, is to explain the structure of language. It asks what the elements of language are and how they are related to one another. One of the difficulties in answering these questions is that language is very complicated, but another is that we all have a number of different, and conflicting, mental models of language, including the models that Chomsky has called “E-language” and “I-language” (Chomsky 1986a). For example, if I learn (say) Portuguese from a book, what I learn is a set of words, rules, and so on which someone has codified as abstractions; in that case, it makes no sense to ask “Where is Portuguese?” or “Who does Portuguese belong to?” There is a long tradition of studying languages—especially dead languages—in precisely this way, and the tradition lives on in modern linguistics whenever we describe “a language”. This is “external” E-language, in contrast with the purely internal I-language of a given individual, the knowledge which they hold in their brain. As with most other linguistic theories (but not Systemic Functional Grammar), it is I-language rather than E-language that WG tries to explain.

This goal raises serious questions about evidence (question (f)) because in principle, each individual has a unique language, though since we learn our language from other people, individual languages tend to be so similar that we can often assume that they are identical. If each speaker has a unique I-language, evidence from one speaker is strictly speaking irrelevant to any other speaker; and, in fact, any detailed analysis is guaranteed eventually to reveal unsuspected differences between speakers. On the other hand, there are close limits to this variation set by the fact that speakers try extraordinarily hard to conform to their role-models (Hudson 1996: 10–14), and we now know, thanks to sociolinguistics, a great deal about the kinds of similarities and differences that are to be expected among individuals in a community. This being so, it is a fair assumption that any expert speaker (i.e., barring children and new arrivals) speaks for the whole community until there is evidence to the contrary. The assumption may be wrong in particular cases, but without it descriptive linguistics would grind to a halt. Moreover, taking individuals as representative speakers fits the cognitive assumptions of theories such as WG because it allows us also to take account of experimental and behavioral evidence from individual subjects. This is important if we want to decide, for example, whether regular forms are stored or computed (Bybee 1995), a question that makes no sense in terms of E-language. In contrast, it is much harder to use corpus data as evidence for I-language because it is so far removed from individual speakers or writers.

As far as the central questions for linguistic science—question (b)—are concerned, therefore, they all revolve around the structure of cognition. How is the “language” area of cognition structured? Why is it structured as it is? How does this area relate to other areas? How do we learn it, and how do we use it in speaking and listening (and writing and reading)? This is pure science, the pursuit of understanding for its own sake, but it clearly has important consequences for all sorts of practical activities. In education, for instance, how does language grow through the school years, and how does (or should) teaching affect this growth? In speech and language therapy, how do structural problems cause problems in speaking and listening, and what can be done about them? In natural-language processing by computer, what structures and processes would be needed in a system that worked just like a human mind?

What, then, of question (e), the interaction between cognition and grammar? If grammar is part of cognition, the question should perhaps be: How does grammar interact with the rest of cognition? According to WG, there are two kinds of interaction. On the one hand, grammar makes use of the same formal cognitive apparatus as the rest of cognition, such as the logic of default inheritance (section 40.4), so nothing prevents grammar from being linked directly to other cognitive areas. Most obviously, individual grammatical constructions may be linked to particular types of context (e.g., formal or informal) and even to the conceptual counterparts of particular emotions (e.g., the construction WH X, as in What on earth are you doing?, where X must express an emotion; cf. Kay and Fillmore 1999 on the What’s X doing Y construction). On the other hand, the intimate connection between grammar and the rest of cognition allows grammar to influence non-linguistic cognitive development as predicted by the Sapir–Whorf hypothesis (Lee 1996; Levinson 1996b). One possible consequence of this influence is a special area of cognition outside language which is only used when we process language: Slobin’s “thinking for speaking” (Slobin 1996). More generally, a network model predicts that some parts of cognition are “nearer” to language (i.e., more directly related to it) than others, and that the nearer language is, the more influence it has.

Finally, we have the question of explanations—question (m). The best way to explain some phenomenon is to show that it is a special case of some more general phenomenon, from which it inherits all its properties. This is why I find nativist explanations in terms of a unique “language module” deeply unsatisfying, in contrast with the research program of cognitive linguistics whose basic premise is that “knowledge of language is knowledge” (Goldberg 1995: 5). If this premise is true, then we should be able to explain all the characteristics of language either as characteristics shared by all knowledge, or as the result of structural pressures from the ways in which we learn and use language. So far I believe the results of this research program are very promising.

40.3 Categories in a Network

As already mentioned in section 40.1, the most general claim of WG is that language is a network, and more generally still, knowledge is a network. It is important to be clear about this claim because it may sound harmlessly similar to the structuralist idea that language is a system of interconnected units, which every linguist would accept. It is probably uncontroversial that vocabulary items are related in a network of phonological, syntactic, and semantic links, and networks play an important part in the grammatical structures of several other theories (notably system networks in Systemic Functional Grammar and directed acyclic graphs in Head-Driven Phrase Structure Grammar—Pollard and Sag 1994). In contrast with these theories where networks play just a limited part, WG makes a much bolder claim: in language there is nothing but a network—no rules or principles or parameters or processes, except those that are expressed in terms of the network. Moreover, it is not just the language itself that is a network; the same is true of sentence structure, and indeed the structure of a sentence is a temporary part of the permanent network of the language. As far as I know, the only other theory which shares the view that “it’s networks all the way down” is Neurocognitive Linguistics (Lamb 1998).

Figure 40.1 Two synonyms and two homonyms as a network of complex units

Figure 40.2 Two synonyms and two homonyms as a pure network

Moreover, the nodes of a WG network are atoms without any internal structure, so a language is not a network of complex information-packages such as lexical entries or constructions or schemas or signs. Instead, the information in each such package must be “unpacked” so that it can be integrated into the general network. The difference may seem small, involving little more than the metaphor we choose for talking about structures; but it makes a great difference to the theory. If internally complex nodes are permitted, then we need to allow for them in the theory by providing a typology of nodes and node-structures, and mechanisms for learning and exploiting these node-internal structures. But if nodes are atomic, there is some hope of providing a unified theory which applies to all structures and all nodes.

To make the discussion more concrete, consider the network-fragment containing the synonyms BEARverb and TOLERATE and the homonyms BEARverb and BEARnoun (as in I can’t bear the pain and The bear ate the honey). The analysis in Figure 40.1 is in the spirit of Cognitive Grammar (e.g., Langacker 1998b: 16), so it recognizes three “symbolic units” with an internal structure consisting of a meaning (in quotation marks) and a form (in curly brackets). Since symbolic units cannot overlap, the only way to relate these units to each other is to invoke separate links to other units in which the meanings and forms are specified on their own. In this case, the theory must distinguish the relations between units from those found within units, and must say what kinds of units (apart from symbolic units) are possible.

Figure 40.3 Two synonyms and two homonyms in WG notation

This analysis can be contrasted with the one in Figure 40.2, which is in the spirit of WG but does not use WG notation (for which see Figure 40.3 below). In this diagram there are no boxes because there are no complex units, just atomic linked nodes. The analysis still distinguishes different kinds of relations and elements, but does not do it in terms of boxes. The result is a very much simpler theory of cognitive structure in which the familiar complexes of language such as lexical items and constructions can be defined in terms of atomic units.

We can now turn to question (c): “What kinds of categories are distinguished?” WG recognises three basic kinds of elements in a network:

  • Primitive logical relations: “isa” (the basic relation of classification which Langacker calls “schematicity”; Tuggy 2007) and four others: “identity”, “argument”, “value”, and “quantity” (Hudson 2006: 47).

  • Relational concepts: all other relations whether linguistic (e.g., “meaning”, “realization”, “complement”) or not (e.g., “end”, “father”, “owner”).

  • Non-relational concepts, whether linguistic (e.g., “noun”, “{bear}”, “singular”) or not (e.g., “bear”, “tolerate”, “set”).

The “isa” relation plays a special role because every concept, whether relational or not, is part of an “isa hierarchy” which relates it upward to more general concepts and downward to more specific concepts. For example, “complement” is a “dependent”, and “object” is a “complement”, so the network includes a hierarchy with “complement” above “object” and below “dependent”. As I explain in section 40.4, “isa” also carries the basic logic of generalization, default inheritance.

Any network analysis needs a notation which distinguishes these basic types of element. The WG notation which does this can be seen in Figure 40.3:

  • Relational concepts are named inside an ellipse.

  • Non-relational concepts have labels with no ellipse.

  • Primitive logical relations have distinct types of line. The “isa” relation has a small triangle whose base rests on the super-category; “argument” and “value” are the arcs pointing into and out of the relational concept; and “quantity” is shown (without any line) by a digit which represents a non-relational concept.

In other words, the figure shows that the meaning of the noun BEAR (BEARnoun) is “bear”; and because “tolerate” may be the meaning of either TOLERATE or the verb BEAR, two different instances of “tolerate” are distinguished so that each is the meaning of a different verb. This apparently pointless complexity is required by the logic of WG, which otherwise cannot express the logical relation “or” (see section 40.4).

40.4 The Logic of Inheritance

As in any other theory, the linguist’s analysis tries to capture generalizations across words and sentences in the language concerned, so the mechanism for generalization plays a crucial role. Since the goal of the analysis is psychological reality, combined with the attempt to use general-purpose cognitive machinery wherever possible, the mechanism assumed in WG is that of everyday reasoning: default inheritance (Pelletier and Elio 2005). The same general principle is assumed in a number of other linguistic theories (Pollard and Sag 1994: 36; Jackendoff 2002: 184; Goldberg 2006: 171; Bouma 2006).

The general idea is obvious and probably uncontroversial when applied to commonsense examples. For example, a famous experiment found that people were willing to say that a robin has skin and a heart even though they did not know this as a fact about robins as such. What they did know, of course, was, first, that robins are birds and birds are living creatures (“animals” in the most general sense), and, second, that the typical animal (in this sense) has skin and a heart (Quillian and Collins 1969). In other words, the subjects had “inherited” information from a super-category onto the sub-category. We all engage in this kind of reasoning every minute of our lives, but we know that there are exceptions which may prove us wrong—and, indeed, it is the exceptions that make life both dangerous and interesting. If inheritance allows for exceptions, then it is called “default inheritance” because it only inherits properties “by default”, in the absence of any more specific information to the contrary. This is the kind of logic that we apply in dealing with familiar “prototype effects” in categorization (Rosch 1978); so if robins are more typical birds than penguins, this is because penguins have more exceptional characteristics than robins do. Somewhat more precisely, the logic that we use in everyday life allows one item to inherit from a number of super-categories; for example, a cat inherits some characteristics from “mammal” (e.g., having four legs) and others from “pet” (e.g., living indoors with humans). This extension of default inheritance is called “multiple default inheritance”.
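The commonsense examples above can be sketched in a few lines of code. This is a minimal illustration, assuming a toy network in which each concept lists its super-categories and the properties stated directly on it; the names `ISA`, `OWN`, `inherit`, and the property labels are hypothetical. The sketch does not address conflicts between super-categories, which are discussed below.

```python
# Hypothetical toy network: "isa" links and directly stated properties.
ISA = {
    "robin": ["bird"],
    "penguin": ["bird"],
    "bird": ["animal"],
    "cat": ["mammal", "pet"],      # multiple inheritance
    "mammal": ["animal"],
    "pet": [],
    "animal": [],
}
OWN = {
    "animal": {"has_skin": True, "has_heart": True},
    "bird": {"flies": True},
    "penguin": {"flies": False},   # exception that overrides the bird default
    "mammal": {"legs": 4},
    "pet": {"lives_indoors": True},
}

def inherit(concept):
    """Collect properties bottom-up; the first value found for a property wins."""
    props, seen, queue = {}, set(), [concept]
    while queue:
        node = queue.pop(0)        # breadth-first: nearer categories first
        if node in seen:
            continue
        seen.add(node)
        for name, value in OWN.get(node, {}).items():
            props.setdefault(name, value)   # never overwrite a nearer value
        queue.extend(ISA.get(node, []))
    return props

print(inherit("robin"))
print(inherit("penguin"))
print(inherit("cat"))
```

On this encoding the penguin is a less typical bird than the robin simply because it carries a stored exception to one of the defaults that “bird” supplies.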

It is reasonably obvious that something like this logic is also needed for language structure, where exceptions are all too familiar in irregular morphology, in “quirky” case selection, and so on, and where multiple inheritance is commonplace; for instance, a feminine, accusative, plural noun inherits independently from “feminine”, “accusative”, and “plural”. This logic is implied by the “Elsewhere condition” (Kiparsky 1982) in lexical phonology, and is implicit in many other approaches such as rule-ordering where later (more specific) rules can overturn earlier more general ones. Nevertheless, multiple default inheritance is considered problematic in linguistic theory, and much less widely invoked than one might expect. One reason for this situation is the difficulty of reconciling it with standard logic. Standardly, logic is “monotonic”, which means that, once an inference is drawn, it can be trusted. In contrast, default inheritance is non-monotonic because an inference may turn out to be invalid because of some exception that overrides it. Moreover, multiple inheritance raises special problems when conflicting properties can be inherited from different super-categories (Touretzky 1986). WG avoids these logical problems (and others) by a simple limitation: inheritance only applies to tokens (Hudson 2006: 25). How this works is explained below.

Figure 40.4 An irregular verb overrides the default past tense form

To take a simple linguistic example, how can we show that by default the past tense of a verb consists of that verb’s stem followed by the suffix {ed}, but that for TAKE the past tense form is not taked but took? The WG answer is shown in Figure 40.4. The default pattern is shown in the top right-hand section: “past” (the typical past tense verb) has a “fully inflected form” (fif) consisting of ‘1’ stem followed by {ed}. The entry for TAKE in the top left shows that its stem is {take}, so by default the fif of a word which inherits (by multiple inheritance) from both TAKE and “past” should be {{take}{ed}}. However, the fif is in fact specified as {took}, so this form overrides the default. Now suppose we apply this analysis to a particular token T which is being processed either in speaking or in listening. This is shown in the diagram with an isa link to TAKE:past, as explained in section 40.10. If inheritance applies to T, it will inherit all the properties above it in the hierarchy, including the specified fif; but the process inevitably starts at the bottom of the hierarchy so it will always find overriding exceptions before it finds the default. This being so, the logic is actually monotonic: once an inference is drawn, it can be trusted.
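The TAKE example can be sketched as a bottom-up search that starts from a token. This is an illustration under assumptions, not the WG formalism itself: the dictionaries `ISA`, `STEM`, and `STORED_FIF`, the functions `supers` and `fif`, the regular verb WALK, and the second token `T2` are all hypothetical additions made for contrast.

```python
# Hypothetical encoding loosely modelled on the description of Figure 40.4.
ISA = {
    "T": ["TAKE:past"],            # token of the irregular past
    "TAKE:past": ["TAKE", "past"],
    "T2": ["WALK:past"],           # assumed token of a regular verb
    "WALK:past": ["WALK", "past"],
    "TAKE": ["verb"],
    "WALK": ["verb"],
    "past": ["verb"],
    "verb": [],
}
STEM = {"TAKE": "{take}", "WALK": "{walk}"}
STORED_FIF = {"TAKE:past": "{took}"}   # the stored exception

def supers(node):
    """The node plus everything above it via isa, nearest first."""
    order, queue = [], [node]
    while queue:
        n = queue.pop(0)
        if n not in order:
            order.append(n)
            queue.extend(ISA.get(n, []))
    return order

def fif(token):
    """Fully inflected form of a token, searching from the bottom up."""
    chain = supers(token)
    for node in chain:             # exceptions sit lower, so they are met first
        if node in STORED_FIF:
            return STORED_FIF[node]
    stem = next(STEM[n] for n in chain if n in STEM)
    # Default pattern for "past": stem followed by {ed}.
    return "{" + stem + "{ed}}" if "past" in chain else stem

print(fif("T"))
print(fif("T2"))
```

Because the search always starts at the token, the stored form is found before the default is ever computed, so no inference has to be retracted later.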

Default inheritance is important in linguistic analysis because it captures the asymmetrical relation which is found between so many pairs of alternatives, and which in other theories is expressed as one of the alternatives being the “underlying” or “unmarked” one. For example, one word order can be specified as the default with more specific orders overriding it; so a dependent of an English word typically follows it, but exceptionally the subject of a verb typically precedes it, but exceptionally the subject of an “inverting” auxiliary verb typically follows it (see section 40.8 for word order). The same approach works well in explaining the complex ordering of extracted words in Zapotec, as well as a wide range of other asymmetrical patterns (Hudson 2003c).

Another role of default inheritance is to capture universal quantification. If X has property P, then “all X”, i.e., everything which isa X, also has property P. The main difference is that, unlike universal quantification, default inheritance allows exceptions. In contrast, the WG equivalent of the other kind of quantification, existential quantification, is simply separate “existence” in the network; so if “some X” has property P, there is a separate node Y in the network which isa X and has the property P. Other examples of X do not inherit P from Y because there is no “upward inheritance”. Similarly, inheritance makes the “and” relation easy to express: if X has two properties P and Q, then both are automatically inherited by any instance of X. In contrast, the relation “or” is much harder to capture in a network, as one might hope, given its relative complexity and rarity. The solution in WG is to recognize a separate sub-case for each of the alternatives; so if X has either P or Q among its properties, we assign each alternative to a different sub-case of X, X1 and X2; hence the two sub-cases of {bear} in Figure 40.3.
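The treatment of “and”, “or”, and existential quantification can be illustrated with a very small sketch. It assumes single “isa” links and property sets; the names `ISA`, `OWN`, `properties`, the extra node `Y`, and the shared property `R` are hypothetical and serve only to make the three cases visible.

```python
# Hypothetical network: X1 and X2 are the two sub-cases that encode "P or Q";
# Y is just another example of X with nothing stated on it.
ISA = {"X1": "X", "X2": "X", "Y": "X"}
OWN = {
    "X": {"R"},      # stated on X itself, so inherited by every instance
    "X1": {"P"},     # first alternative
    "X2": {"Q"},     # second alternative
}

def properties(node):
    """Union of the properties on a node and on everything above it."""
    props = set()
    while node is not None:
        props |= OWN.get(node, set())
        node = ISA.get(node)       # move upward only; None at the top
    return props

print(sorted(properties("X1")))
print(sorted(properties("X2")))
print(sorted(properties("Y")))
```

Since the walk only ever moves upward, P and Q stay on their own sub-cases: neither X itself nor the sibling node Y acquires them, which is the absence of “upward inheritance” described above.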

40.5 The Architecture of Language

The formal structure of WG networks described in section 40.3 already implies that they have a great deal of structure because every element is classified hierarchically. This allows us to distinguish the familiar levels of language according to the vocabulary of units that they recognize: words in syntax, morphs in morphology, and phones in phonology. Moreover, different relation-types are found on and between different levels, so levels of analysis are at least as clearly distinguished in WG as they are in any other theory. This allows us to consider question (d): “What is the relation between lexicon, morphology, syntax, semantics, pragmatics, and phonology?”

We start with the lexicon. WG (just like other cognitive theories—Croft 2007a: 471) recognizes no boundary between lexical and “grammatical” structures; instead, it simply recognizes more and less general word-types. For example, the verb BEARverb is a Transitive-verb, which is a Verb, which is a Word, and at no point do we find a qualitative difference between specific “lexical” and general “grammatical” concepts. Nor can we use length as a basis for distinguishing one-word lexical items from multi-word general constructions because we clearly memorize individual multi-word idioms, specific constructions, and clichés. Moreover, almost every theory nowadays recognizes that lexical items have a valency which defines virtual dependency links to other words, so all “the grammar” has to do is to “merge” lexical items so that these dependencies are satisfied (Chomsky 1995: 226; Ninio 2006: 6–10)—a process that involves nothing more specific than ensuring that the properties of a token (such as its dependents) match those of its type. In short, the syntactic part of the language network is just a highly structured and hierarchical lexicon which includes relatively general entries as well as relatively specific ones (Flickinger 1987)—what we might call a “super-lexicon”.

However, WG does not recognize just one super-lexicon specific to language but three: one for syntax (consisting of words), another for morphology, and a third for phonology. The morphological lexicon consists of what I call “forms”, that is, morphs such as {bear}, {bore}, and {s}, and morph-combinations extending up to complete word-forms such as {{un}{bear}{able}} and {{walk}{s}} (Hudson 2006: 72–81). In phonology, I assume the vocabulary of units includes segments and syllables, but in WG this is unexplored territory. This analysis gives a three-level analysis within language; for example, the word FARMER:plural (the plural of FARMER) is realized by the form {{farm}{er}{s}} which in turn is realized by a phonological structure such as /fɑːməz/. Each level is identified not only by the units that it recognizes but also by the units that realize them and those that they realize; so one of the characteristics of the typical word is that it is realized by a form, and by default inheritance this characteristic is inherited by any specific word. The overall architecture of WG in terms of levels is shown in Figure 40.5, where every word is realized by some form and every form is realized by some sound. (Not every form realizes a word by itself, nor does every sound realize a form by itself.) What units at all three levels share is the fact that they belong to some language (English, French, or whatever), so they are united as “linguistic units”.

Figure 40.5 The three linguistic levels in WG notation

This three-level analysis of language structure is controversial, of course, though by no means unprecedented (Sadock 1991; Aronoff 1994). It conflicts with any analysis in terms of bipolar “signs” which combine words (or even meanings) directly with phonology (Anderson 1992; Beard 1994; Pollard and Sag 1994; Chomsky 1995; Jackendoff 1997; Langacker 1998b), as well as with neo-Bloomfieldian analyses which treat morphemes as word-parts (Halle and Marantz 1993). The WG claim is that the intermediate level of “form” is psychologically real, so it is encouraging that the most widely accepted model of speech processing makes the same assumption (Levelt et al. 1999). The claim rests on a variety of evidence (Hudson 2006: 74–8) ranging from the invisibility of phonology in syntax to the clear recognition of morphs in popular etymology. It does not follow from any basic principles of WG, so if it is true it raises research questions. Do all languages have the same three-level organization? For those languages that do have it, why have they evolved in this way?

A particularly controversial aspect of this three-level analysis is the place of meaning. The simplest assumption is that only words have meaning, so morphs have no meaning. This seems right for morphs such as the English suffix {s}, which signals two completely different inflectional categories (plural in nouns and singular in verbs); and if the form {bear} realizes either the verb or the noun, then there is little point in looking for its meaning. On the other hand, it is quite possible (and compatible with WG principles) that some morphs do have a meaning; and, indeed, there is experimental evidence for “phonaesthemes”—purely phonological patterns such as initial /gl/ in English that correlate with meanings, though rather more loosely than forms and words do (Bergen 2004). Moreover, intonational and other prosodic patterns have a meaning which contributes to the overall semantic structure, for instance by distinguishing questions from statements. It seems quite likely, therefore, that units at all levels can have a meaning. On the other hand, this is a typical property of words, in contrast with forms and sounds which typically have no meaning, so there is still some truth in the earlier WG claim that meanings are expressed only by words.

The default logic of WG (section 40.4) allows exceptions in every area, including the basic architecture of the system. We have just considered one example, morphological and phonological patterns that have meanings; and it cannot be ruled out that words might be realized in some cases directly by sounds. Another kind of exception is found between syntax and morphology, where the typical word is realized by a word-form (a particular kind of form which is “complete” as far as the rules of morphology are concerned). The exception here is provided by clitics, which are words—i.e., units of syntax—which are realized by affixes so that they have to be attached to other forms for the sake of morphological completeness; for example, the English possessive ’s (as in John’s hat) is a determiner realized by a mere suffix. WG analyses are available for various complex clitic systems including French and Serbo-Croat pronouns (Hudson 2001, 2006: 104–15; Camdzic and Hudson 2007).

In short, WG analyzes a language as a combination of three super-lexicons for words, forms, and sounds (at different levels of generality). These lexicons are arranged hierarchically by default so that words have meanings and are typically realized by forms, and forms are typically realized by sounds, but exceptions exist. As for pragmatics, a great deal of so-called “pragmatic” information about context may be stored along with more purely linguistic properties (see sections 40.9 and 40.11), but a great deal more is computed during usage by the processes of understanding (section 40.10).

40.6 Words, Features, and Agreement

In the three-level analysis, the typical word stands between meaning and morphological form, so its properties include at least a meaning and a realization. However, it has other properties as well which we review briefly below.

Most words are classified in terms of the familiar super-categories traditionally described in terms of word classes (noun, verb, etc.), sub-classes (auxiliary verb, modal verb, etc.), and feature structures (tense, number, etc.). Many theories reduce all these kinds of classification to feature structures expressed as attribute-value matrices, so that a plural noun (for example) might have the value “plural” for the attribute “number” and the value “noun” for “part of speech” (or, in Chomskyan analysis, “+” for “noun” and “−” for “verb”). “Nearly all contemporary approaches use features and feature structures to describe and classify syntactic and morphological constructions” (Blevins 2006: 393). WG takes the opposite approach, using the isa hierarchy for all kinds of classification. We have already seen the effects of this principle in Figure 40.4, where both TAKE and “past” have an isa relation to “verb”. This fundamental theoretical difference follows from the adoption of “isa” as the mechanism for classification, which in turn follows from the aim of treating language wherever possible like other areas of cognition. Even if attribute-value matrices are helpful in linguistic analysis, they are surely not relevant in most kinds of classification. For example, if we classify both apples and pears as a kind of fruit, what might be the attribute that distinguishes them? The problems are the same as those of the “componential analysis” that was tried, and abandoned, in the early days of modern semantics (Bolinger 1965).

Figure 40.6 An isa hierarchy for words including classes, a sub-class, lexemes, a sub-lexeme, an inflection, and a token

Moreover, feature-based classification only works well for a very small part of language, where names such as “case” and “number” are already available for the attributes; we return to this minority of cases below. Distinctions such as the one between common and proper nouns or between auxiliary and full verbs have no traditional name, and for good reason: the “attribute” that contrasts them does no work in the grammar. Consequently, WG uses nothing but an isa hierarchy for classifying words. It should be borne in mind that multiple inheritance allows cross-classification, which is traditionally taken as evidence for cross-cutting attributes; for example, Figure 40.4 shows how the word TAKE:past can be classified simultaneously in terms of lexemes (TAKE) and in terms of morpho-syntactic contrasts such as tense (past). Similarly, Figure 40.6 shows how this analysis fits into a broader framework which includes:

  • the super-class “word”

  • very general word-types (lexeme, inflection)

  • word classes (verb, noun)

  • a sub-class (auxiliary)

  • individual lexemes (HELLO, TAKE)

  • sub-lexemes (TAKEintrans, the intransitive use of TAKE as in The glue wouldn’t take)

  • an inflection (past)

  • a word-token (T) which is analyzed as the past tense of TAKEintrans.

This unified treatment allows the same default inheritance logic to handle all kinds of generalizations, but it also brings other advantages. First, it allows us to avoid classification altogether where there is no generalization to be captured; this is illustrated by the word HELLO, which inherits no grammatical properties from any word class, so it is “syncategorematic”, belonging to no general category other than “word” (Pullum 1982). Second, default members of a category belong to that category itself, so subcategories are only needed for exceptions. Contrary to more traditional classification systems, this means that a category may have just one sub-category. The relevant example in the diagram is “auxiliary”, which does not contrast with any other word class because non-auxiliary verbs are simply default verbs. Similarly, “past” does not contrast with “present” because verbs are present tense by default; in traditional terminology, tense is a privative opposition, and “past” is marked relative to “present”. Third, sub-lexemes allow distinctions without losing the unifying notion of “lexeme”; so for example it is possible to recognize both the transitive and intransitive uses of TAKE as examples of the same lexeme (with the same irregular morphology) while also recognizing the differences. And lastly, the token (which is attached temporarily to the network as explained in section 40.10) can inherit from the entire hierarchy by inheriting recursively from each of the nodes above it.
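The classification in Figure 40.6 can be loosely illustrated in code. The following Python sketch is not part of the WG formalism: the `Node` class, the property names (`takes_subject`, `variant`, `stem`, `complement`), and the breadth-first search used to decide which stated value wins are all assumptions made for the illustration. It shows only the general idea that a token inherits recursively from every node above it, and that a value stated lower in the hierarchy overrides a default stated higher up.

```python
class Node:
    """A node in a hypothetical isa network (all names are illustrative)."""

    def __init__(self, name, isa=(), **props):
        self.name = name
        self.isa = list(isa)      # zero or more super-categories
        self.props = dict(props)  # properties stated directly on this node

    def inherit(self, attr):
        """Return the nearest stated value of attr, searching upward.

        The search is breadth-first (an assumption of this sketch), so a
        value stated lower in the hierarchy is found before, and therefore
        overrides, a default stated higher up. Returns None if no node on
        any path states the attribute.
        """
        queue, seen = [self], set()
        while queue:
            node = queue.pop(0)
            if id(node) in seen:
                continue
            seen.add(id(node))
            if attr in node.props:
                return node.props[attr]
            queue.extend(node.isa)
        return None


word = Node("word")
verb = Node("verb", [word], takes_subject=True)
past = Node("past", [verb], variant="ed-variant")
take = Node("TAKE", [verb], stem="{take}", complement="object")
take_intrans = Node("TAKEintrans", [take], complement="none")
hello = Node("HELLO", [word], stem="{hello}")
token = Node("T", [take_intrans, past])  # multiple inheritance

print(token.inherit("complement"))     # none  (the sub-lexeme overrides TAKE)
print(token.inherit("stem"))           # {take}
print(token.inherit("variant"))        # ed-variant
print(hello.inherit("takes_subject"))  # None  (HELLO isa "word" only)
```

Note that “auxiliary” or “present” would need no node of their own unless they carried exceptional properties, which mirrors the point that sub-categories are needed only for exceptions.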

Unlike many other contemporary theories, therefore, WG classifies words without using feature-structures because, in general, they are redundant. The exception is agreement, where one word is required to have the same value as some other word for some specified attribute such as gender or number; for example, in English a determiner has the same number as its complement noun (this book but these books), and in Latin an adjective agrees with the noun on which it depends in gender, number, and case. It is impossible to express this kind of rule in a psychologically plausible way without attributes and values, but this is not a theoretical problem for WG because attributes are found in general cognition; for example, when we say that two people are the same height or age, we are invoking an attribute. Consequently, attributes are available when needed, but they are not the basis of classification—and, indeed, their relation to basic classification in the isa hierarchy may be more or less complex rather than in a simple one-to-one relation. For example, one of the values may be assigned by default, allowing the asymmetrical relations between marked and unmarked values mentioned above, which is illustrated by the default “singular” number of nouns shown in Figure 40.7. The network on the right in this figure is the English agreement rule for determiners and their complement nouns. Other agreement rules may be more complex; for example, I have suggested elsewhere that subject–verb agreement in English involves three different attributes: number, agreement-number, and subject-number, which all agree by default but which allow exceptions such as the plural verb forms used with the pronouns I and you (Hudson 1999).
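The English determiner rule can be pictured with a very small Python sketch. The dictionary representation and the function names `number_of` and `determiner_agrees` are hypothetical and chosen only for the illustration; the two points taken from the text are that a noun is singular unless marked otherwise and that a determiner must share its complement noun's number.

```python
DEFAULT_NUMBER = "singular"  # nouns are singular unless marked otherwise


def number_of(noun):
    """Return the noun's number, falling back on the default value."""
    return noun.get("number", DEFAULT_NUMBER)


def determiner_agrees(determiner, noun):
    """A determiner must have the same number as its complement noun."""
    return determiner["number"] == number_of(noun)


this = {"form": "this", "number": "singular"}
these = {"form": "these", "number": "plural"}
book = {"form": "book"}                         # number left to the default
books = {"form": "books", "number": "plural"}   # marked value stated

print(determiner_agrees(this, book))    # True:  this book
print(determiner_agrees(these, books))  # True:  these books
print(determiner_agrees(this, books))   # False: *this books
```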

Figure 40.7 Nouns are singular by default, and a determiner agrees in number with its complement

40.7 Morphology

The three-level architecture explained in section 40.5 means that each word has a morphological structure defined in terms of morphs; this applies even to monomorphs such as CAT, realized by {cat}, which in turn is realized by /kat/. The task of morphology is to define possible morphological structures and to relate them upward to words and word classes (morpho-syntax) and downward to phonology (morpho-phonology).

In morpho-syntax, WG allows morphs to realize semantic and syntactic contrasts, but does not require this; so morphs may be purely formal objects such as the semantically opaque roots in DECEIVE and RECEIVE, where {ceive} is motivated only by the derived nouns DECEPTION and RECEPTION. In most cases, however, a word’s morphological structure indicates its relations to other words with partially similar structures. The distinction between lexemes and inflections (Figure 40.6) allows two logical possibilities for these relations:

  • lexical (“derivational”) morphology: the two words belong to different lexemes (e.g., FARM—FARMER).

  • inflectional morphology: they belong to the same lexeme (e.g., farm—farms).

In both cases, the partial morphological similarities may match similarities found between other lexemes.

Lexical morphology often builds on general lexical relations which exist independently of morphological structure; for example, many animal names have contrasting adult-young pairs without any morphological support (e.g., COW—CALF, SHEEP— LAMB), though in some cases the morphology is transparent (DUCK—DUCKLING, GOOSE—GOSLING). Where lexical morphology is productive, it must involve two relations: a semantically and syntactically specified lexical relation between two sets of words, and a morphologically specified relation between their structures. A simple example can be found in Figure 40.8, which shows that a typical verb has an “agent-noun” which defines the agent of the verb’s action and whose stem consists of the verb’s stem followed by {er}. (A few details in this diagram have been simplified.)

Figure 40.8 Lexical morphology: A verb is related to its agent-noun in both meaning and morphology
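The two relations involved in productive lexical morphology can be sketched as a single Python function. This is only an illustration under stated assumptions: the dictionary keys (`lexeme`, `isa`, `sense`, `stem`) and the function name `agent_noun` are invented here, and stems are represented as lists of morphs so that no claim is made about spelling or pronunciation.

```python
def agent_noun(verb):
    """Build the agent-noun of a verb (hypothetical representation).

    Lexical relation: the noun's sense is the agent of the verb's sense.
    Morphological relation: the noun's stem is the verb's stem plus {er}.
    """
    return {
        "lexeme": verb["lexeme"] + "ER",
        "isa": "noun",
        "sense": "agent of " + verb["sense"],
        "stem": verb["stem"] + ["{er}"],
    }


farm = {"lexeme": "FARM", "isa": "verb", "sense": "farming", "stem": ["{farm}"]}
farmer = agent_noun(farm)

print(farmer["lexeme"])  # FARMER
print(farmer["stem"])    # ['{farm}', '{er}']
print(farmer["sense"])   # agent of farming
```

Pairs such as COW and CALF would share only the lexical half of such a relation, with the stems stored separately rather than computed.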

Inflectional morphology, on the other hand, relates a word’s morphological structure to its inflections, the abstractions such as “past” which cut across lexical differences. As explained in section 40.1, WG follows the European “Word and Paradigm” approach to inflectional morphology by separating morphological structure from inflectional categories and avoiding the term “morpheme”, which tends to confuse the two. This allows all sorts of complex mappings between the two structures, including a mapping in which several inflections are realized by a single morph (as in Latin am-o, “I love”, where the suffix {o} realizes “first-person”, “singular”, “present”, and “indicative”).

This strict separation of morpho-syntax from morpho-phonology is not limited to inflectional morphology but runs through the entire WG approach to morphology. One consequence is that although the logical contrast between lexical and inflectional morphology applies to morpho-syntax, it is irrelevant to morpho-phonology. For example, the {er} suffix which is found in agent-nouns (Figure 40.8) is also used in the comparative inflection (as in bigger). In morpho-phonology the issues concern morphological structure—what kinds of structure are possible, and what kinds of generalization are needed in order to link them to sounds? The analysis deals in distinctions such as that between root morphs and affixes, and has to capture generalizations such as the fact that full morphs are typically realized by one or more complete syllables, whereas affixes are often single segments. Furthermore, it has to have enough flexibility to accommodate patterns in which one structure is related to another, not by containing an extra morph but in all the other familiar ways such as vowel change as in take—took. We already have a partial analysis for this pair (Figure 40.4), but this simply presents {took} as an unrelated alternative to {take}, without attempting either to recognize the similarities between them or to reveal that the vowel is the usual locus for replacive morphology. Both these goals are achieved in Figure 40.9, which recognizes “V” (the stressed vowel) as a special type of realization which varies in morphs such as {take}.

Figure 40.9 The alternation in take—took involves only the stressed vowel

This figure also illustrates another important facility in WG, the notion of a “variant”. This is the WG mechanism for capturing generalizable relations between morphological structures such as that between a form and its “ed-variant”—the structure which typically contains {ed} but which may exceptionally have other forms such as the one found in {took}. Typically, a form’s variant is a modification of the basic form, but in suppletion the basic form is replaced entirely by a different one. Variants have a number of uses in morpho-phonology. One is in building complex morphological structures step-wise, as when the future tense in Romance languages is said to be built on the infinitive (e.g., in French, port-er-ai “I will carry” but part-ir-ai “I will depart”). Another is in dealing with syncretism, where two or more distinct inflections systematically share the same realization; for example, in Slovene, dual and plural nouns are generally different in morphology, but exceptionally the genitive and locative are always the same, and this is true even in the most irregular suppletive paradigms (Evans et al. 2001). The question is how to explain the regularity of this irregularity. One popular solution is to use a “rule of referral” (Stump 1993) which treats one form as basic and derives the other from it; so in the Slovene example, if we treat the genitive plural as basic we might use this in a rule to predict the genitive dual and locative dual. But rules of referral are very hard to take seriously if the aim is psychological reality because they imply that when we understand one form we must first mis-analyze it as a different one; and in any case, the choice of a basic form is psychologically arbitrary. The WG solution is to separate the morpho-syntax from the morpho-phonology. In morpho-phonology, we recognize a single “variant” which acts as the realization for a number of different inflections; so, for example, in Slovene, the variant which we might call (arbitrarily) “P3”, and which has different morphophonological forms in different lexemes, is always the one used to realize dual as well as plural in the genitive and locative (Hudson 2006: 86).
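A rough Python sketch may help to separate the two halves of this proposal. Everything in it is an assumption made for illustration: the function and table names are invented, forms are written as morph labels, and the table shows only the genitive pair of inflections (the locative pair would be handled in the same way). The first part shows a variant as a default modification with stored exceptions; the second shows several inflections pointing at one variant, with no inflection treated as basic.

```python
# Morpho-phonology: a variant is a default modification of a form,
# overridden by stored exceptions such as {take} -> {took}.
ED_VARIANT_EXCEPTIONS = {"{take}": ["{took}"]}


def ed_variant(form):
    """Return the ed-variant of a form as a list of morphs."""
    if form in ED_VARIANT_EXCEPTIONS:
        return ED_VARIANT_EXCEPTIONS[form]
    return [form, "{ed}"]  # default: the form followed by {ed}


# Morpho-syntax: distinct inflections may name the same variant, so
# neither has to be derived from, or mis-analyzed as, the other.
REALIZED_BY = {
    ("genitive", "dual"): "P3",
    ("genitive", "plural"): "P3",
}


def variant_for(case, number):
    """Look up which variant realizes a given inflection."""
    return REALIZED_BY[(case, number)]


print(ed_variant("{walk}"))  # ['{walk}', '{ed}']
print(ed_variant("{take}"))  # ['{took}']
print(variant_for("genitive", "dual") == variant_for("genitive", "plural"))  # True
```

Because the shared label is just a name for a variant, each lexeme can supply its own form for “P3”, however irregular, and the syncretism still holds.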

The main tools in WG morphology are all abstract relations: lexical relations between lexemes, realization relations, and “variant” relations among formal structures. This is typical of a network analysis, and anticipates what we shall find in syntax.

40.8 Syntax

Syntax is the area of analysis where most work has been published in WG, and the one on which the theory’s name is based (as explained in section 40.1). By far the most controversial aspect of WG syntax is the use of dependency structure instead of the more familiar phrase structure. The reason for this departure from the mainstream is that the arguments for dependency structure are very strong—in fact, even adherents of phrase structure often present it as a tool for showing syntactic dependencies—and (contrary to what I once believed—Hudson 1976) once dependencies are recognized, there are no compelling reasons for recognizing phrases as well. In WG syntax, therefore, dependencies such as “subject” or “complement” are explicit and basic, whereas phrases are merely implicit in the dependency structure. This means, for example, that the subject of a verb is always a noun, rather than a noun phrase, and that a sentence can never have a “verb phrase” (in any of the various meanings of this term). The structure in Figure 40.10 is typical of dependency relations in WG, though it does not of course try to show how the words are classified or how the whole structure is related to the underlying grammar.

Figure 40.10 Dependency structure in an English sentence

WG dependency structures are much richer than those in other dependency grammars because their role is to reveal the sentence’s entire syntactic structure rather than just one part of it (say, just semantics or just word order); and in consequence each sentence has just one syntactic structure rather than the multi-layered structures found, for example, in Functional Generative Description (Sgall et al. 1986) or the Meaning-Text Model (Mel’cuk 1997). This richness can be seen in Figure 40.10 where the word syntax is the subject of two verbs at the same time: has and made. The justification for this “structure sharing” (where two “structures” share the same word) is the same as in other modern theories of syntax such as Head-Driven Phrase Structure Grammar (Pollard and Sag 1994: 2). However, some WG structures are impossible to translate into any alternative theory because they involve mutual dependency—two words each of which depends on the other. The clearest example of this is in wh-questions, where the verb depends (as complement) on the wh-word, while the wh-word depends (e.g., as subject) on the verb (Hudson 2003d), as in Figure 40.11. Such complex structures mean that a syntactic sentence structure is a network rather than a mere tree-structure, but this is hardly surprising given that the grammar itself is a network.

Word order is handled in current WG by means of a separate structure of “landmarks” which are predicted from the dependency structure. The notion of “landmark” is imported from Cognitive Grammar (e.g., Langacker 1990: 6), where it is applied to the semantics of spatial relations; for example, if X is in Y, then Y is the landmark for X. In WG it is generalized to syntax as well as semantics because in a syntactic structure each word takes its position from one or more other words, which therefore act as its “landmark”. In the WG analysis, “before” and “after” are sub-cases of the more general “landmark” relation. By default, a word’s landmark is the word it depends on, but exceptions are allowed because landmark relations are distinct from dependency relations. In particular, if a word depends on two other words, its landmark is the “higher” of them (in the obvious sense in which a word is “lower” than the word it depends on); so in Figure 40.10 the word syntax depends on both has and made, but only takes the former as its landmark. This is the WG equivalent of saying that syntax is “raised”. Similarly, the choice of order relative to the landmark (between “before” and “after”) can be set by default and then overridden in the way described at the end of section 40.4.
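The choice of landmark can be sketched in Python. The representation is hypothetical (a dictionary from each word to the words it depends on), the function names are invented, and the sketch assumes that made depends on has, which is implied by has being the “higher” of the two. It covers only the selection of the landmark, not the choice between “before” and “after”.

```python
def ancestors(word, parents):
    """All words that `word` depends on, directly or indirectly.

    The `seen` set keeps the search finite even when the structure
    contains mutual dependencies.
    """
    seen = set()
    stack = list(parents.get(word, []))
    while stack:
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(parents.get(w, []))
    return seen


def landmark(word, parents):
    """By default the landmark is the word's parent; with several
    parents it is the one that all the other parents depend on."""
    heads = parents.get(word, [])
    if not heads:
        return None  # the root word has no landmark
    for head in heads:
        others = [h for h in heads if h != head]
        if all(head in ancestors(o, parents) for o in others):
            return head
    return heads[0]  # fallback when no parent is clearly higher


# Fragment of Figure 40.10: "syntax" is the subject of both verbs.
parents = {"syntax": ["made", "has"], "made": ["has"], "has": []}

print(landmark("syntax", parents))  # has
print(landmark("made", parents))    # has
print(landmark("has", parents))     # None
```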

Figure 40.11 Mutual dependency in a wh-question

Published WG analyses of syntax have offered solutions to many of the familiar challenges of syntax such as extraction islands and coordination (see especially Hudson 1990: 354–421) and gerunds (Hudson 2003b). Although most analyses concern English, there are discussions of “empty categories” (in WG terms, unrealized words) in Icelandic, Russian, and Greek (Hudson 2003a; Creider and Hudson 2006) and of clitics in a number of languages, especially Serbo-Croatian (Camdzic and Hudson 2007; Hudson 2001).

40.9 Semantics

When WG principles are applied to a sentence’s semantics they reveal a much more complex structure than the same sentence’s syntactic structure. As in Frame Semantics (Fillmore, this volume), a word’s meaning needs to be defined by its “frame” of relations to a number of other concepts which in turn need to be defined in the same way, so ultimately the semantic analysis of the language is inseparable from the cognitive structures of the users. Because of space limitations, all I can do here is to offer the example in Figure 40.12 with some comments and refer interested readers to other published discussions (Hudson 1990: 123–66; Hudson and Holmes 2000; Hudson 2006: 211–36; Gisborne 2010).

The example gives the syntactic and semantic structure for the sentence The dog hid a bone for a week. The unlabeled syntactic dependency structure is drawn immediately above the words, and the dotted arrows link the words to relevant parts of the semantic structure; although this is greatly simplified, it still manages to illustrate some of the main achievements of WG semantics. The usual “1” labels (meaning a single token) have been distinguished by a following letter for ease of reference below.

Figure 40.12 Syntactic and semantic structure for a simple English sentence

The analysis provides a mentalist version of the familiar sense/referent distinction (Jackendoff 2002: 294) in two kinds of dotted lines: straight for the sense and curved for the referent. Perhaps the most important feature of the analysis is that it allows the same treatment for all kinds of words, including verbs (whose referent is the particular incident referred to), so it allows events and other situations to have properties like those of objects; this is the WG equivalent of Davidsonian semantics (Davidson 1967; Parsons 1990). For example, “1e” shows that there was just one incident of hiding, in just the same way that “1b” shows there was just one dog.

Definiteness is shown by the long “=” line which indicates the basic relation of identity (section 40.3). This line is the main part of the semantics of the, and indicates that the shared referent of the and its complement noun needs to be identified with some existing node in the network. This is an example of WG semantics incorporating a good deal of pragmatic information. The treatment of deictic categories such as tense illustrates the same feature; in the figure “1d”, the time of the hiding is before “1c”, the time of the word hid itself.

The decomposition of “hiding” into an action (not shown in the diagram) and a result (“invisible”) solves the problem of integrating time adverbials such as for a week which presuppose an event with extended duration. Hiding, in itself, is a punctual event so it cannot last for a week; what has the duration is the result of the hiding, so it is important for the semantic structure to distinguish the hiding from its result.

WG also offers solutions to a range of other problems of semantics; for example, it includes the non-standard version of quantification sketched in section 40.4 as well as a theory of sets and a way of distinguishing distributed and joint actions (Hudson 2006: 228–32); but this discussion can merely hint at the theory’s potential.

40.10 Learning and Using Language

Question (j) is: “How does your model relate to studies of acquisition and to learning theory?” A central tenet of WG is that the higher levels of language are learned rather than innate, and that they are learned with the help of the same mechanisms as are available for other kinds of knowledge-based behavior. (In contrast, WG makes no claims about how the acoustics and physiology of speech develop.) This tenet follows from the claim that language is part of the general cognitive network, but it is supported by a specific proposal for how such learning takes place (Hudson 2006: 52–9), which in turn is based on a general theory of processing. The theories of learning and processing build on the basic idea of WG that language is a network, so they also provide further support for this idea.

The main elements in the WG theory of processing are activation and node-creation. As in all network models of cognition, the network is “active” in two senses. First, activation—which is ultimately expressed in terms of physical energy—circulates around the network as so-called “spreading activation”, making some nodes and links temporarily active and leaving some of them permanently more easily re-activated than others. There is a great deal of evidence for both these effects. Temporary activation can be seen directly in brain imaging (Skipper and Small 2006), but also indirectly through the experimental technique of priming (Reisberg 2007: 257–62). Permanent effects come mainly from frequency of usage and emerge in experiments such as those which test the relative “availability” of words (Harley 1995: 146–8). The two kinds of change are related because temporary activation affects nodes differently according to their permanent activation level. Moreover, because there is no boundary around language, activation spreads freely between language and non-language, so the “pragmatic context” influences the way in which we interpret utterances (e.g., by guiding us to intended referents).
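Spreading activation of the temporary kind can be illustrated with a toy Python function. This is a generic textbook-style sketch rather than a WG algorithm: the network, the `decay` factor, and the number of `steps` are arbitrary assumptions, and permanent (resting) activation levels are left out.

```python
def spread(links, source, steps=2, decay=0.5):
    """Spread activation outward from `source` for a few steps.

    Each active node passes a decayed share of its activation to every
    neighbour; activation arriving from several nodes is summed.
    """
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(steps):
        incoming = {}
        for node, level in frontier.items():
            for neighbour in links.get(node, []):
                incoming[neighbour] = incoming.get(neighbour, 0.0) + level * decay
        for node, level in incoming.items():
            activation[node] = activation.get(node, 0.0) + level
        frontier = incoming
    return activation


# A hypothetical fragment of a network in which language and
# non-language concepts are linked without any boundary.
links = {
    "doctor": ["nurse", "hospital"],
    "nurse": ["doctor", "hospital"],
    "hospital": ["doctor", "nurse"],
    "bread": ["butter"],
    "butter": ["bread"],
}

result = spread(links, "doctor")
print(result["nurse"])           # 0.75
print(result.get("bread", 0.0))  # 0.0
```

In this toy network, activating “doctor” leaves “nurse” more active than “bread”, which is the pattern that priming experiments are taken to reveal.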

The second kind of activity in the network consists of constant changes in the fine details of the network’s structure through the addition (and subsequent loss) of nodes and links in response to temporary activation. Many of these new nodes deal with ongoing items of experience; so (for example) as you read this page you are creating a new node for each letter-token and word-token that you read. Token nodes must be kept separate from the permanent “type nodes” in the network because the main aim of processing is precisely to match each token with some type—in other words, to classify it. The two nodes must be distinct because the match may not be perfect, so when you read yelow, you match it mentally with the stored word YELLOW in spite of the mis-spelling.
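The matching of an imperfect token to a stored type can be imitated, very crudely, with Python's standard `difflib` module. String similarity is only a stand-in here for the activation-based matching described above, and the small lexicon and the function name `classify_token` are invented for the example.

```python
import difflib

# Hypothetical stored types, keyed by their usual spelling.
LEXICON = {"yellow": "YELLOW", "below": "BELOW", "fellow": "FELLOW"}


def classify_token(token, lexicon=LEXICON):
    """Return the stored type that best matches the token, or None.

    The token and the type are kept distinct, so the match does not
    have to be perfect.
    """
    best = difflib.get_close_matches(token, list(lexicon), n=1)
    return lexicon[best[0]] if best else None


print(classify_token("yelow"))
print(classify_token("xqzt"))
```

The first call pairs the mis-spelled token with the type YELLOW, while the second finds no sufficiently similar type and returns `None`.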

As for learning, WG offers two mechanisms. One is the preservation of temporary token nodes beyond their normal life expectancy of a few seconds; this might be triggered for example by the unusually high degree of activation attracted by an unfamiliar word or usage. Once preserved from oblivion, such a node would turn (logically) into a type node available for processing future token nodes. The other kind of learning is induction, which also involves the creation of new nodes. Induction is the process of spotting generalizations across nodes and creating a new super-node to express the generalization. For instance, if the network already contains several nodes which have similar links to the nodes for “wing”, “beak”, and “flying”, a generalization emerges: wings, beaks, and flying go together; and a new node can be created which also has the same links to these three other nodes, but none of the specifics of the original nodes. Such generalizations can be expressed as a statistical correlation between the shared properties, and in a network they can be found by looking for nodes which happen to receive activation from the same range of other nodes. Induction is very different from the processing of ongoing experience, and indeed it may require down time free of urgent experience such as the break we have during sleep.
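Induction of a super-node can be sketched as follows. The sketch is a deliberate simplification under stated assumptions: nodes are represented as sets of links, the exemplar names and the threshold `min_shared` are invented, and a plain set intersection stands in for the statistical correlation and shared activation mentioned in the text.

```python
def induce(exemplars, min_shared=2):
    """Create a super-node from links shared by all the exemplars.

    Returns None when too few links are shared to count as a
    generalization; otherwise returns the new node, which carries the
    shared links but none of the exemplars' specific ones.
    """
    link_sets = [set(links) for links in exemplars.values()]
    if not link_sets:
        return None
    shared = set.intersection(*link_sets)
    if len(shared) < min_shared:
        return None
    return {"links": shared, "members": sorted(exemplars)}


# Hypothetical stored nodes with similar links.
exemplars = {
    "robin": {"wing", "beak", "flying", "red breast"},
    "sparrow": {"wing", "beak", "flying", "brown"},
    "gull": {"wing", "beak", "flying", "sea"},
}

super_node = induce(exemplars)
print(sorted(super_node["links"]))  # ['beak', 'flying', 'wing']
print(super_node["members"])        # ['gull', 'robin', 'sparrow']
```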

In reply to question (l) “How does your model deal with usage data?”, therefore, the WG theory of learning fits comfortably in the “usage-based” paradigm of cognitive linguistics (Barlow and Kemmer 2000) in which language emerges in a rather messy and piecemeal way out of a child’s experience, and is heavily influenced by the properties of the “usage” experienced, and especially by its frequency patterns (Bybee 2006c).

40.11 The Social Context

Question (i) is: “Does your model take sociolinguistic phenomena into account?” The answer to this question is probably more positive for WG than for any other theory of language structure. As explained in section 40.1, sociolinguistics has long been one of my interests—indeed, this interest predates the start of WG—and I have always tried to build some of the more relevant findings of sociolinguistics into my ideas about language structure and cognition.

One of the most relevant conclusions of sociolinguistics is that the social structures to which language relates are extremely complex, and may not be very different in complexity from language itself. This strengthens the case, of course, for the WG claim that language uses the same cognitive resources as we use for other areas of life, including our social world—what we might call “I-society”, to match “I-language”. The complexity of I-society lies partly in our classification of people and their permanent relations (through kinship, friendship, work, and so on); and partly in our analysis of social interactions, where we negotiate subtle variations on the basic relations of power and solidarity. It is easy to find parallels with language; for example, our permanent classification of people is similar to the permanent classification of word-types, and the temporary classification of interactions is like our processing of word-tokens.

Another link to sociolinguistics lies in the structure of language itself. Given the three-level architecture (section 40.5), language consists of sounds, forms, and words, each of which has various properties including some “social” properties. Ignoring sounds, forms are seen as a kind of action and therefore inherit (inter alia) a time and an actor—two characteristics of social interaction. Words, on the other hand, are symbols, so they too inherit interactional properties including an addressee, a purpose, and (of course) a meaning (Hudson 2006: 218). These inherited properties provide important “hooks” for attaching sociolinguistic properties which otherwise have no place at all in a model of language. To take a very elementary example, the form {bonny} has the property of being typically used by a Scot—a fact which must be part of I-language if this includes an individual’s knowledge of language. Including this kind of information in a purely linguistic model is a problem for which most theories of language structure offer no solution at all, and cannot offer any solution because they assume that I-language is separate from other kinds of knowledge. In contrast, WG offers at least the foundations of a general solution as well as some reasonably well-developed analyses of particular cases (Hudson 1997a, 2007a, 2006: 246–8). To return to the example of {bonny}, the WG analysis in Figure 40.13 shows that its inherited “actor” (i.e., its speaker) is a Scot—an element in social structure (I-society), and not a mere uninterpreted diacritic.

Figure 40.13 The form {bonny} is typically used by a Scot

40.12 Similarities and Differences Across Space and Time

Since WG is primarily a theory of I-language (section 40.2) it might not seem relevant to question (g): “How does your model account for typological diversity and universal features of human languages?” or (h): “How is the distinction synchrony vs. diachrony dealt with?”. Typology and historical linguistics have traditionally been approached as studies of the E-language of texts and shared language systems. Nevertheless, it is individuals who change languages while learning, transmitting, and using them, so I-language holds the ultimate explanation for all variation within and between languages.

The answers to questions (g) and (h) rest on the answer to question (k): “How does your model generally relate to variation?” Variation is inherent in the WG model of I-language, partly because each individual has a different I-language but more importantly because each I-language allows alternatives to be linked to different social contexts (section 40.11). Such variation applies not only to lexical items like BONNY in relation to its synonyms, but also to phonological, morphological, and syntactic patterns—the full range of items that have been found to exhibit “inherent variability” (e.g., Labov 1969; Hudson 1996: 144–202). Moreover, variation may involve categories which range from the very specific (e.g., BONNY) to much more general patterns of inflectional morphology (e.g., uninflected 3rd-singular present verbs in English) or syntax (e.g., multiple negation). These more general patterns of social variation emerge in the network as correlations between social and linguistic properties, so learners can induce them by the same mechanisms as the rest of the grammar (section 40.10).

Returning to the two earlier questions, then, the distinction between synchrony and diachrony is made within a single I-language whenever the social variable of age is invoked, because language change by definition involves variation between the language of older and younger people and may be included in the I-language of either or both generations. However, this analysis will only reveal the ordinary speaker’s understanding of language change, which may not be accurate; for example, younger speakers may induce slightly different generalizations from older speakers without being at all aware of the difference. One of the major research questions in this area is whether this “restructuring” is gradual or abrupt, but usage-based learning (section 40.10) strongly predicts gradual change because each generation’s I-language is based closely on that of the previous generation. This does indeed appear to be the case with one of the test cases for the question, the development of the modern English auxiliary system (Hudson 1997b). As for the other question, diversity among languages must derive from the theory of change because anything which can change is a potential source of diversity. Conversely, anything which cannot change because it is essential for language must also be universal. These answers follow from the WG mechanisms for inducing generalizations.

Equally importantly, though, the same mechanisms used in such variation of individual features allow us to induce the large-scale categories that we call “languages” or “dialects”, which are ultimately based, just like all other linguistic categories, on correlations among linguistic items (e.g., “the” correlates with “cup” in contrast with “la” and “tasse”) and between these and social categories. These correlations give rise to general categories such as “English word” (or “English linguistic unit”, as in Figure 40.5) which allow generalizations about the language. These language-particular categories interact, thanks to multiple inheritance, with language-neutral categories such as word classes, so a typical English word such as cup inherits some of its properties from “English word” and others from “noun”—see Figure 40.14. The result is a model of bilingualism (Hudson 2006: 239–46) which accommodates any degree of separation or integration of the languages and any degree of proficiency, and which explains why code-mixing within a sentence is both possible and also constrained by the grammars of both languages (Eppler 2010). The same model also offers a basis for a theory about how one language can influence another within a single I-language (and indirectly, in the entire E-language).
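
The interaction between language-particular and language-neutral categories can be illustrated with a small sketch. It is not part of the WG formalism: the class name `Node`, the property names (`language`, `word_class`, `meaning`), and the breadth-first search are all assumptions made for illustration. The search simply returns the value found on the nearest node in the “isa” hierarchy, which is a simplified stand-in for default inheritance.

```python
from collections import deque


class Node:
    """A hypothetical network node with 'isa' links and local properties."""

    def __init__(self, name, isa=(), **props):
        self.name = name
        self.isa = list(isa)   # the node's supercategories
        self.props = props     # properties stored directly on this node

    def inherit(self, attr):
        """Return the nearest value for attr, searching up the isa links.

        Breadth-first order means a value stored on a closer node
        overrides one stored on a more general node (a default).
        """
        queue = deque([self])
        seen = set()
        while queue:
            node = queue.popleft()
            if id(node) in seen:
                continue
            seen.add(id(node))
            if attr in node.props:
                return node.props[attr]
            queue.extend(node.isa)
        return None


# Illustrative fragment of the network behind Figure 40.14.
word = Node("word")
noun = Node("noun", isa=[word], word_class="noun")
english_word = Node("English word", isa=[word], language="English")
french_word = Node("French word", isa=[word], language="French")

# Each lexeme has two supercategories: a language and a word class.
cup = Node("CUP", isa=[english_word, noun], meaning="cup")
tasse = Node("TASSE", isa=[french_word, noun], meaning="cup")

for lexeme in (cup, tasse):
    print(lexeme.name, lexeme.inherit("language"), lexeme.inherit("word_class"))
```

In this sketch CUP and TASSE receive the same word class and meaning but different languages, each property arriving by a different inheritance path.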

The one area of typological research where WG has already made a contribution is word order. Typological research has found a strong tendency for languages to minimize “dependency distance”—the distance between a word and the word on which it depends (e.g., Hawkins 2001), a tendency confirmed by research in psycholinguistics (Gibson 2002), and corpus linguistics (Collins 1996; Ferrer i Cancho 2004). The notion of “dependency distance” is easy to capture in a dependency-based syntactic theory such as WG, and the theory’s psychological orientation suggests a research program in psycholinguistic typology. For example, it is easy to explain the popularity of SVO and similar “mixed” orders in other phrase types as a way of reducing the number of dependents that are separated from the phrase’s head; thus in SVO order, both S and O are adjacent to V, whereas in both VSO and SOV one of these dependents is separated from V (Hudson 2006: s161). However, this explanation also implies that languages with different word orders may tend to make different demands on their users, when measured in terms of average dependency distances in comparable styles. Results so far suggest that this is in fact the case—for instance, average distances in Mandarin are much greater than those in English, and other languages have intermediate values (Liu et al. 2008).
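
The comparison of SVO with SOV and VSO can be made concrete with a short calculation. The sketch below rests on stated assumptions: the function name `mean_dependency_distance` is hypothetical, each clause is reduced to the three elements S, V, and O, and distance is counted as the difference in linear position, so that adjacent words are at distance 1. Other published measures count intervening words instead, which lowers every value by 1 without changing the ranking.

```python
def mean_dependency_distance(order, heads):
    """Average distance between each dependent and its head.

    order: the words in linear order, e.g. ["S", "V", "O"]
    heads: mapping from each dependent to the word it depends on
    Distance is the absolute difference in position (adjacent = 1),
    which is an assumption of this sketch.
    """
    position = {w: i for i, w in enumerate(order)}
    distances = [abs(position[dep] - position[head])
                 for dep, head in heads.items()]
    return sum(distances) / len(distances)


# Subject and object both depend on the verb.
heads = {"S": "V", "O": "V"}

for order in (["S", "V", "O"], ["S", "O", "V"], ["V", "S", "O"]):
    print("".join(order), mean_dependency_distance(order, heads))
```

Under these assumptions SVO yields a lower average than either SOV or VSO, because only in SVO are both dependents adjacent to the verb.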

Figure 40.14 French TASSE and English CUP share a word class and a meaning

What, then, does WG offer a working descriptive linguist? What it does not offer is a check-list of universal categories to be “found” in every language. The extent to which different languages require the same categories is an empirical research question, not a matter of basic theory. What it does offer is a way of understanding the structure of language in terms of general psychological principles. However, it is also important to stress that the theory has evolved over several decades of descriptive work, mostly but not exclusively on English, and dealing with a wide range of topics—in morphology, syntax, and semantics; concerning language structure, psycholinguistics, and sociolinguistics; and in bilingual as well as monolingual speech. I believe the theoretical basis provides a coherence, breadth, and flexibility which are essential in descriptive work.


I should like to thank Nik Gisborne for help with this chapter. Interested readers will find a great deal more information on the Word Grammar website at and many of the papers I refer to can be downloaded from