Computational Models of Second Language Sentence Processing
Abstract and Keywords
Sentence comprehension draws on multiple levels of linguistic knowledge, including the phonological, orthographic, lexical, syntactic, and discoursal. This article focuses on computational models of second language sentence processing. Understanding the computational mechanisms responsible for using this knowledge in real time provides basic insights into how language and the mind work. For a cognitive theory of second language acquisition, a better understanding of how the second language learner develops the capacity to process sentences fluently also has important implications for theories of acquisition and instruction. This article examines two perspectives on written sentence comprehension in the second language: the syntax-based and the constraint-based approaches. The approaches make fundamentally different assumptions concerning the nature of linguistic representation and how the human sentence processing mechanism uses this knowledge in online comprehension. The two perspectives also represent a basic division between formalist and functionalist/usage-based approaches to second language learning and use.
Sentence comprehension draws on multiple levels of linguistic knowledge, including the phonological, orthographic, lexical, syntactic, and discoursal. Understanding the computational mechanisms responsible for using this knowledge in real time provides basic insights into how language and the mind work. For a cognitive theory of second language acquisition, a better understanding of how the second language (L2) learner develops the capacity to process sentences fluently also has important implications for theories of acquisition and instruction (Segalowitz and Hulstijn, 2003).
This chapter examines two perspectives on written sentence comprehension in the L2. The two approaches considered are:
• Syntax-based approaches
• Constraint-based approaches
The approaches make fundamentally different assumptions concerning the nature of linguistic representation and how the human sentence processing mechanism uses this knowledge in online comprehension. The two perspectives also represent a basic division in cognitive theory in SLA between formalist and functionalist/usage-based approaches to L2 learning and use (Gregg, 2003).
Syntax-Based Approaches to Sentence Processing
The syntax-based approach ascribes a central role to syntactic processes in driving comprehension. Syntactic knowledge is characterized as a set of atomic syntactic units and rules that relate these units in a hierarchical structure. This structural knowledge incorporates an abstract level of representation whose features may not be explicitly represented in surface forms (R. Hawkins, 2001). Sentence processing, or parsing, is characterized as a process of structure building driven by the application of these syntactic rules. Parsing is an incremental process in which each incoming word is incorporated into the structure as it is built and possibly revised online (Pickering and Van Gompel, 2006). Syntactic knowledge is modular in that it is represented as discrete, language-specific entities, or symbols, that are applied automatically in the course of comprehension. Lexico-semantic, frequency, and contextual information can affect the parse in various ways, but all approaches assume that parsing is primarily a syntax-driven process. The perspective is strongly influenced by the generative grammar tradition (Chomsky, 1986) and related assumptions concerning the (un)learnability of key principles of language structure from exposure alone (Marcus, 1998). The symbolic rule-based view of language has traditionally been the dominant approach in cognitive science in general and in L1 sentence processing in particular (MacDonald and MacDonald, 1995). The syntax-based approach has also been the main approach in L2 sentence processing research over the past decade, with virtually all the major online comprehension studies carried out in this framework. In this section the basic assumptions of the syntax-based approach are described and studies applying the framework to L2 processing are presented.
Grammar Knowledge as a Rule-Based Symbol System
There is a long tradition in cognitive science that characterizes sentence comprehension as a symbol manipulation process. This symbolic approach assumes that linguistic knowledge is represented in the mind in the form of localist representations, which in natural language include phonemes, morphemes, syntactic rules, and so forth. Computations, specified in rules, are carried out directly on these representations to yield an interpretation. Syntactic rules are assumed to be independent of the semantics of the specific items in a manner similar to an algebraic equation (e.g., a + b = c). The computation is carried out in the same way regardless of the specific values of a and b. Syntactic knowledge is compositional in the sense that constituents can be plugged into hierarchical structures, which in turn determine how the constituents are interpreted. These knowledge structures are also recursive in that they allow embedding within larger structures (Stillings et al., 1995).
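The algebraic character of this view can be illustrated with a toy example. The sketch below is entirely illustrative: the grammar, category labels, and function names are assumptions for exposition, not drawn from any model discussed in this chapter. It builds a hierarchical structure by applying rules to categories, so the same computation runs whichever words fill the slots:

```python
# A minimal sketch of the symbolic view: rules operate on categories
# (NP, VP, ...), not on the particular words that instantiate them.

RULES = {
    "S": ["NP", "VP"],    # a sentence is a noun phrase plus a verb phrase
    "NP": ["Det", "N"],
    "VP": ["V", "NP"],    # the NP inside VP makes the system recursive
}

def build(category, words):
    """Recursively expand `category`, consuming words left to right.
    Returns a (tree, remaining_words) pair."""
    if category not in RULES:            # terminal category: attach next word
        return (category, words[0]), words[1:]
    children = []
    for child in RULES[category]:
        subtree, words = build(child, words)
        children.append(subtree)
    return (category, children), words

tree, rest = build("S", ["the", "spy", "saw", "the", "cop"])
# The same hierarchical structure is built whether the input is
# "the spy saw the cop" or "the cop saw the spy": computation over symbols.
```

The point of the sketch is the independence of structure from content: swapping the lexical items changes nothing about how the rules apply, just as a + b = c is computed the same way for any values of a and b.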
As an account of processing, the perspective has been less concerned with how these rules and representations are learned, although many of the syntax-based models below have been closely associated with generative views of language acquisition (e.g., Frazier and Fodor, 1978).
Sentence comprehension in the syntax-based perspective is a process of structure building in which syntactic rules play a primary role. Fundamental insights into how this structure building proceeds have come from examining how individuals process structures that make significant demands on comprehension. These include sentences that are temporarily ambiguous, as in garden path sentences like The horse raced past the barn fell … or nonlocal dependencies encountered in relative clause and wh- structures, for example, Which girl do you believe John loves ____ a lot? (Frazier, 1987). The latter are assumed to involve movement of elements out of underlying positions, leaving structural traces, or gaps, that do not appear in the surface form and thus presumably require an abstract level of representation (Stowe, 1986).
One of the earliest syntax-based models was the two-stage "sausage machine" model proposed by Frazier and Fodor (1978). Strictly modular, the model made a sharp distinction between the principles, or heuristics, responsible for the initial parse of the input string into constituent syntactic structure in the first stage, and a thematic processor that drew on lexical and contextual information to interpret the initial output in the second stage. If the output of the first parse was not appropriate for the context, the structure was reanalyzed. One type of ambiguity studied was prepositional phrase attachment preference, as in (1):
(1a) The spy saw the cop with the binoculars.
(1b) The spy saw the cop with a revolver.
In (1a) the phrase with the binoculars can modify either the NP the cop or the VP saw the cop. The minimal attachment heuristic was proposed to account for observed attachment preferences in processing. Minimal attachment stipulated that the parser attaches a new phrase to the preceding tree structure in the structurally simplest manner—that is, the one involving the fewest additional nodes. In (1a) this would yield the interpretation in which the spy used the binoculars, and this indeed appeared to be the preferred attachment for English readers (Frazier, 1987). Rayner, Carlson, and Frazier (1983) tested the psychological reality of the minimal attachment proposal by comparing reading times for sentences like (1a) with the unambiguous (1b), in which the VP attachment reading is not plausible. Eye-tracking results showed that readers found (1b) more difficult than (1a), the slowdown being attributed to the need to reanalyze the initial, and ultimately erroneous, parse in light of second stage thematic information—in other words, that revolvers cannot be used to see.
The two-stage approach accorded a primary role to syntactic principles, but it was also evident that the application of these principles was strongly influenced by nonsyntactic factors. These include lexico-semantic information, exposure (frequency), and prior context. All these factors have been shown to influence the course and outcome of the parsing operation in given settings, effects extensively demonstrated in the processing of temporarily ambiguous main verb/reduced relative structures as in (2) (MacDonald and Seidenberg, 2006: 592):
(2a) Temporary main verb/reduced relative ambiguity: The three men arrested …
(2b) Main verb interpretation: The three men arrested the burglary suspects …
(2c) Reduced relative ambiguity: The three men arrested by the local police …
When the parser first encounters the verb arrested in (2a) there are two possible interpretations. It could either be the main verb (2b) or part of a reduced relative clause (2c), with relatively longer reading times having been observed for (2c) in a number of studies. Early syntax-based accounts attributed the longer reading times to the need for the parser to reanalyze the original parse, which favored the simpler structural interpretation stipulated by the minimal attachment heuristic (Rayner, Carlson, and Frazier, 1983). However, alternative constraint-based accounts have shown that the predicted structural effects can be neutralized or even reversed by nonsyntactic factors that act together to account for the observed behavioral outcomes.
Factors affecting interpretation include the following:
• The animacy of the preverbal noun (Trueswell, Tanenhaus, and Garnsey, 1994)
• Verb frequency in the respective structures (MacDonald, Perlmutter, and Seidenberg, 1994)
• The agenthood plausibility of the preverbal noun (Tanenhaus and Trueswell, 1995)
• The thematic role of preverbal noun (Tabor, Juliano, and Tanenhaus, 1997)
• Broader discourse constraints (Altmann and Steedman, 1988)
These constraints and their interaction explain why the main verb/reduced relative clause ambiguities can be extremely difficult for some readers and easy for others. This variation cannot be easily explained by the application of strict grammatical principles alone (Pickering and Van Gompel, 2006). At issue, however, is not whether this nonsyntactic information can influence processing outcomes, but whether such knowledge can completely eliminate difficulties encountered in, for example, garden path structures and by doing so remove the need to posit a syntax-based processing mechanism (Gibson, 1998; McKoon and Ratcliff, 2003; Pickering, Traxler and Crocker, 2000).
Syntax-Based Accounts of L2 Processing
L1 sentence processing research has been primarily concerned with developing normative models of adult processing. Research on L2 processing shares this goal but also has a primary interest in the relationship between acquisition and processing (Harrington, 2001). As in L1 research, L2 researchers attempt to understand the mechanisms and knowledge bases responsible for successful L2 processing by examining performance on target linguistic structures that place significant demands on learners. This performance is typically compared with L1 baseline performance (Juffs and Harrington, 1995; Williams, Möbius, and Kim, 2001) and assessed as a reflection of the learner's L1 (Marinis et al., 2005) or as a function of L2 proficiency (Hopp, 2006). A central question implicit in the research discussed here concerns the degree to which observed L2 processing difficulties can be attributed to fundamental differences between the learner's L2 grammar knowledge and the L1 grammar—that is, a knowledge deficit, or whether the observed difficulties arise from online processing limitations that vary according to such learner-based factors as proficiency, working memory capacity, or automaticity.
An early syntax-based study of L2 sentence processing was Juffs and Harrington (1995, 1996). The study examined online reading of wh- question structures by advanced Chinese ESL learners, focusing on the asymmetry evident in L2 learner performance on grammatical judgments for wh- question structures (Schachter and Yip, 1990). The sentences in (3) are from a class of structures that are assumed to be governed by constraints on how constituents can be moved across structural elements (Chomsky, 1986). It is assumed that the wh- element is moved, or extracted, from the underlying subject position, as indicated by the empty gap in (3a), or from the object position in (3b):
(3a) Who did Ann believe____likes her friend? (subject extraction)
(3b) Who did Ann believe her friend likes____? (object extraction)
ESL learners experience more difficulty accepting grammatical subject extraction sentences (3a) than their object counterparts (3b). Poorer performance on the subject structures (3a) by, for example, Chinese ESL learners has been used as evidence for lack of access to UG principles by these learners, given the assumed absence of wh- movement constraints in Chinese (Schachter and Yip, 1990). However, although L2 learners had difficulties with the subject structures (3a), they were sensitive to movement constraints in other types of wh- structures, including (3b). This raised the possibility that knowledge of the movement constraints per se was not the source of difficulty. Juffs and Harrington (1995, 1996) compared reading times on the two structures by advanced Chinese ESL learners using a self-paced reading task. They found the participants produced significantly longer reading times in the target region in the subject structures, in line with the parsing predictions made by Pritchett's (1992) thematic role assignment model.1 This led the authors to conclude that the parser, rather than the underlying L2 grammar knowledge, was responsible for the observed subject-object asymmetry (Juffs and Harrington, 1995, 1996). The findings were replicated in Juffs (2005).
Structures involving these nonlocal dependencies are a testing ground for syntax-based accounts, as the accounts assume that abstract linguistic structures represented in the underlying grammar are used in syntactic processing. A question of interest is to what extent L1 processing preferences can influence L2 processing outcomes on these structures. L1 influence has been demonstrated in processing L2 phonological, lexical, and morphosyntactic properties (Juffs, 1998a; Marian and Spivey, 2003; Weber and Cutler, 2003), but there is little evidence for the transfer of L1 syntactic processing preferences in the online processing of the nonlocal dependencies in structures like (4). There is substantial evidence that native English readers reactivate the fronted wh- element at the structural gaps, even when the gap is not the target gap—that is, when it does not occur immediately after the subcategorizing verb (Gibson and Warren, 2004). Here is an example:
(4) The nurse who the doctor argued (i) that the rude patient had angered (ii) is refusing to work late.
L2 learners do not appear to process the gaps in the same manner as L1 readers, especially those who have learned the L2 as adults (Clahsen and Felser, 2006b; Dallas and Kaan, 2008; Marinis et al., 2005). Marinis et al. (2005) used a self-paced reading task to compare reading times by native English speakers and L2 learners on sentences like example (4). The structures contained intermediate (i) and target (ii) gaps, allowing the processing of the empty gap and any lexical effect of the adjoining verb to be examined separately. The L2 participants were from typologically distinct Chinese, German, Japanese, and Greek L1s.
Relatively faster reading times were evident for the native speakers at the target angered (ii) in structures containing an intermediate gap (i) compared to control sentences with no intermediate gap. In contrast, the L2 readers did not seem to be affected by the availability of the intermediate syntactic gap, regardless of L1 background. The authors attributed the results to differences in use of abstract syntactic features in the course of processing, with the L2 readers relying more on lexical information from the verb and other semantic information in the sentence for comprehension. Felser and Roberts (2007) reported similar results using a cross-modal priming technique to study sentences containing a single indirect object gap.
Papadopoulou and Clahsen (2003) also examined language-specific attachment preferences for relative clauses as a potential source of L1 transfer. Prior exposure has been shown to affect the initial parsing preference in complex noun phrases like (5) (Mitchell et al., 1995):
(5) The journalist interviewed the daughter of the colonel who had the accident.
The preference for attaching the relative clause who had the accident "low" to the colonel (the colonel had the accident), or "high" to the daughter (the daughter had the accident) appears to differ cross-linguistically according to frequency of occurrence. English readers tend to prefer a low reading, whereas French, German, and Dutch readers prefer the high reading (Mitchell et al., 1995), though other research (Mitchell and Brysbaert, 1998) did not support these preferences. Reading time performance of advanced learners of Greek from Spanish, German, and Russian L1s was compared (Papadopoulou and Clahsen, 2003). The participants showed native-like knowledge of the structures in an off-line task, but also showed online processing preferences that were different from the native Greek controls and that did not show any influence from L1 processing strategies. Other studies have also failed to demonstrate L1 effects on L2 parsing (Marinis et al., 2005; Williams, Möbius, and Kim, 2001).
However, the nonequivalence of L1 and L2 syntactic processing has been challenged. Hopp (2006) showed that highly proficient L2 learners can exhibit native-like processing outcomes on complex structures. The study compared performance on German subject and object relative clause structures by L1 Dutch and L1 English learners of low and near-native proficiency. Contrary to Clahsen and Felser (2006b), the results indicated that the near-native group for both L1s showed reliable use of syntactic features in phrase-structural reanalysis, especially in a condition in which lexical, semantic, pragmatic, and frequency-based cues to sentence interpretation were not available. Moreover, the near-natives showed an interaction of syntactic feature type and phrase-structural parsing principles in parsing the ambiguous grammatical sentences as well as the ungrammatical ones. The study suggests that the difference between L1 and L2 syntactic knowledge may be one of degree and not of kind. See Sabourin and Stowe (2008) for a similar discussion.
It remains to be established whether L2 readers are able to reach native-like processing fluency on complex syntactic structures in the L2. The weight of the evidence to date has led Clahsen and others to propose that the abstract syntactic representations available to native speakers are not used in the same way by adult L2 learners. Adult L2 processing in this view is characterized as being shallower in that it makes less use of syntactic or phrase-structural information in the course of parsing (Townsend and Bever, 2001). L2 parsing decisions seem to be guided to a greater extent by lexical-semantic cues and associative knowledge than those in the L1 (Clahsen and Felser, 2006a, 2006b; Dallas and Kaan, 2008).
Evidence that L2 learners do not access abstract linguistic features when processing the target L2 grammar indicates that knowledge shortcomings may be a factor in L2 processing difficulties. It also raises the question as to how abstract linguistic features are represented in the L2 grammar and how those features might interact with other types of knowledge. Although Clahsen and others maintain that L2 learners do not use this abstract knowledge in the same way as native speakers, presumably L2 grammar knowledge remains essentially a rule-based entity, though one that differs from the L1 in important respects (Hawkins, 2008; Williams and Kuribara, 2008).
Whatever differences and similarities may be evident between the L1 and L2 grammars, more studies are needed to assess how knowledge representations interact with processing limitations. Two key processing limitations are individual differences in working memory capacity and the degree of automaticity attained in L2 processing. Although individual differences in L2 working memory capacity have been related to L2 language performance in a number of domains (Harrington and Sawyer, 1992; Kormos and Safar, 2008; Leeser, 2007), the effect of capacity differences on L2 syntactic processing outcomes has yet to be shown. The few studies that have been done show little or no effect for differences in working memory on processing the type of structures examined here (Felser and Roberts, 2007; Juffs, 2005). Automaticity in accessing and using linguistic knowledge in real-time discourse is an essential aspect of fluent processing. Reduced automaticity in even advanced L2 learners in phonological and lexical processing has been widely observed, and the assumption is that automaticity in syntactic processing will be similar (Segalowitz and Hulstijn, 2003). Recent research using event-related potentials (ERPs) to track the time course of processing provides some evidence that L2 learners also display less automaticity in the processing of complex syntactic structures (Clahsen and Felser, 2006b). More research in both areas is needed.
Syntax-Based Models of L2 Processing and SLA Theory
Syntax-based approaches assume that syntactic knowledge is a primary driver of online comprehension. L1 accounts provide evidence for the need to posit an abstract level of syntactic representation to account for the processing of the complex structures discussed here (Traxler and Van Gompel, 2006; though see Van Valin, 1995, for a counter view). The L2 syntax-based accounts examined make the same assumptions about syntactic knowledge and processing as their L1 counterparts. However, a number of studies show that L2 processors do not use this knowledge in the same way. Whether this abstract knowledge is available at all to the L2 learner remains an open question, as does the extent to which the adult L2 reader can achieve native-like processing skill overall, or just in certain parts of the grammar (Dallas and Kaan, 2008; Dussias and Sagarra, 2007; Osterhout et al., 2006; Sabourin and Stowe, 2008).
Clahsen and colleagues propose a model of L2 processing that is shallower and is driven by lexicalist and associative knowledge to a greater extent than in the L1 (see Clahsen and Felser, 2006a). This proposal suggests that L2 grammatical knowledge is not a unitary construct in which one set of mechanisms can account for all the effects observed. It is also consistent with the long-standing view that adult SLA is fundamentally different from L1 acquisition—a difference evident here at the computational level (Bley-Vroman, 1989). The less important abstract linguistic features become in determining L2 processing outcomes, the more overlap the L2 accounts discussed in this section share with the constraint-based approaches considered in the next.
Constraint-Based Approaches to L2 Sentence Processing
The constraint-based approach characterizes sentence processing as an interactive process in which different sources of information compete and converge in the course of comprehension (MacDonald and Seidenberg, 2006). The approach is nonmodular, with grammatical knowledge being but one source of information influencing (or constraining) processing outcomes. Word order regularities, morphosyntax, number agreement, and so forth are not represented as abstract rule-based knowledge but rather as statistical form-function regularities that arise as the result of use. These knowledge representations emerge as the result of the interaction of linguistic, pragmatic, contextual, and physical features (or cues) available in the input that are stored in memory at the time of processing (Ellis, 1998). The relative weight of a cue depends on its distribution and frequency in the input and will determine its availability during processing. The associative mechanisms responsible for the learning of the cues are also directly implicated in the processing of a specific instance (MacWhinney, 1999).
The multiplicity of potential cues in the input and the interactive nature of their use make the approach less amenable to testing specific effects. As a result, online accounts of the processing of the complex syntax discussed above have yet to appear in the L2 literature. Constraint-based studies of L2 processing have focused on the cross-linguistic development of cue knowledge—particularly as a source of L1 transfer (Harrington, 1987; Rounds and Kanagy, 1998)—and on the development of these cues in connectionist-learning studies (Ellis and Schmidt, 1997, 1998). Recent work in usage-based construction grammar (P. Robinson and Ellis, 2008) shares basic assumptions with these approaches. In this section, these assumptions are described and L2 studies in the framework are discussed.
Grammar Knowledge as a Graded Probabilistic System
Grammar knowledge in the constraint-based approach is represented as interconnected units in memory and not as a set of abstract rules. In connectionist approaches, these units take the form of distributed features connected in complex associative networks. Structure-like patterns corresponding to traditional rule-based representations (phonemes, morphemes, syntactic rules) emerge from these networks (Ellis, 2006), producing rule-like behavior in production and comprehension. Crucially this behavior does not assume the abstract level of syntactic representation central to principle-based approaches (MacWhinney, 1999). In more recent grammar accounts these units take the form of individual grammatical constructions (Tomasello, 2003).
Grammar knowledge develops as the result of experience with the language; hence, the approach is termed usage-based (Bybee, 2006). The effect of this experience is graded, with representations undergoing constant modification and strengthening with use. The graded nature of the patterns contrasts with traditional symbolic approaches, in which these linguistic representations are assumed to be learned in a discrete, all-or-none manner. The graded nature of language knowledge also means proficiency can be characterized in probabilistic statistical terms (Seidenberg and MacDonald, 1999). The application of grammar knowledge in this view is not an either/or computation of a rule but rather an estimation of which alternative is the most probable. Probabilistic models have an advantage over rule-based models in being able to capture the variable nature of behavior, both across individuals and across time. The approach is particularly advantageous in SLA theory and research, in which variation itself is of primary theoretical and methodological interest. Although probabilistic approaches are not incompatible with discrete rule approaches, they are more consistent with the constraint-based view of cognition (Chater and Christiansen, 2008).
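The contrast with all-or-none rules can be made concrete. The sketch below uses hypothetical counts, not figures from any study cited here; it treats an attachment preference as a probability estimated from stored experience, one that shifts gradually with each new exposure:

```python
# Graded, usage-based knowledge as running frequency counts.
# The counts are hypothetical, purely for illustration.

counts = {"low_attachment": 60, "high_attachment": 40}   # prior experience

def preference(counts):
    """Estimate the probability of each alternative from stored counts."""
    total = sum(counts.values())
    return {alt: n / total for alt, n in counts.items()}

def observe(counts, alternative):
    """Each processed instance strengthens its own representation."""
    counts[alternative] += 1

print(preference(counts))   # {'low_attachment': 0.6, 'high_attachment': 0.4}
observe(counts, "high_attachment")   # one more high-attachment exposure
# The preference now shifts slightly toward high attachment:
# gradual strengthening with use, not an all-or-none rule change.
```

On this view there is no point at which a "rule" is acquired; the preference is simply the current state of the accumulated statistics, which is why proficiency and cross-linguistic variation can be described in probabilistic terms.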
The constraint-based approach assumes a direct and immediate relationship between processing and learning. Novel input is processed via previously stored experience, with the act of comprehension or production itself changing the strength of the existing knowledge representations. As a result, the mechanisms involved in processing existing forms are also responsible for learning new ones. Furthermore, unlike the modularity of the syntax-based approaches, these mechanisms draw on the same cognitive mechanisms as those used by other cognitive processes; that is, they are not specific to language. Language learning is thus characterized as an emergent property of the interaction between the learning environment and general learning capacities of the individual, and not as the result of some prespecified principles or capacities (Ellis, 1998).
Sentence Processing as Cue Competition and Convergence
The constraint-based perspective characterizes sentence processing as an interactive process in which multiple sources of information—semantic, syntactic, discourse, and referential—compete and converge to yield a particular interpretation. This process is exemplified in the competition model (MacWhinney and Bates, 1989), a paradigmatic constraint-based model. The model was developed to account for how readers or listeners assign agenthood in simple active sentences like Tom saw the sunset. The main English cues to agenthood in these sentences are word order (the first noun is usually the agent), preverbal position (the agent usually precedes the action), and animacy (animate entities can do things like “see”), and in certain contexts these cues all bias the assignment of agenthood to Tom in the canonical sentence above. But cue assignment is probabilistic, and the second noun could be the agent, as when the object is left-dislocated for effect (The sunset Tom saw). In this case, word order biases agent assignment to the sunset, whereas animacy and preverbal cues converge to bias the selection of Tom as the agent. Native English speakers will typically assign agenthood to the sunset in nonsensical but grammatically acceptable strings like The sunset saw Tom. This reflects the importance of word order as a cue in English—a condition not universally shared across languages; for example, Italians accord less weight to word order cues than English speakers (MacWhinney and Bates, 1989). Cross-linguistic differences in cue types and cue weights provide a means to quantify differences among languages and across levels of learner proficiency within a language.
The available cues are simultaneously activated in the course of processing, with the ultimate interpretation the result of the cooperation and competition among the varying cue strengths (MacWhinney, 2005). The competition model embodies the crucial characteristics of the constraint-based approach, and has, as a research program, had a significant influence on research on cross-linguistic differences and on how cues are learned by children. However, application of the model has been limited to the domain of agent assignment, not addressing how these multiple cues are integrated online.
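A minimal sketch of this cue competition might look like the following. The cue weights are hypothetical English-like values chosen for illustration, not estimates reported by MacWhinney and Bates (1989):

```python
# Cue competition in the spirit of the competition model: each noun
# accumulates the weights of the cues it carries, and the highest-scoring
# noun is selected as agent. Weights below are hypothetical.

WEIGHTS = {"first_noun": 0.6, "preverbal": 0.25, "animate": 0.15}

def agent_scores(nouns):
    """`nouns` is a list of dicts flagging which cues each noun carries.
    Returns one summed cue-weight score per noun."""
    return [sum(WEIGHTS[cue] for cue, on in noun.items()
                if cue != "word" and on)
            for noun in nouns]

# "The sunset saw Tom": word-order and preverbal cues favor "sunset",
# animacy favors "Tom"; with English-like weights, word order wins.
nouns = [
    {"word": "sunset", "first_noun": True, "preverbal": True, "animate": False},
    {"word": "Tom", "first_noun": False, "preverbal": False, "animate": True},
]
scores = agent_scores(nouns)
winner = nouns[scores.index(max(scores))]["word"]   # → "sunset"
```

Reweighting the cues (say, lowering first_noun and raising animate, as for an Italian-like cue hierarchy) flips the outcome, which is how the model captures cross-linguistic differences with a single mechanism.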
Constraint-Based Accounts of Sentence Processing
The constraint-based view of processing is best exemplified by connectionist models of cognition and language. A variety of constraint-based accounts of learning and processing have been modeled within the connectionist framework, with early ones focusing on phonological, lexical, and morphosyntactic phenomena (O'Reilly and Munakata, 2000). Connectionist models have been trained to learn the following:
• How to form English past tense verbs (Rumelhart and McClelland, 1986)
• How to assign verb tense (Plunkett and Marchman, 1993)
• How to resolve lexical ambiguity (Kawamoto, 1993)
These early studies were highly influential because they demonstrated the ability of the approach to learn and process language by a general associative learning mechanism that provided a uniform account of processing across linguistic levels, perceptual, lexical, and syntactic (MacDonald and Seidenberg, 2006). However, the adequacy of associative learning alone as the means to develop the capacity to process complex structural relations has been challenged (Marcus, 1998; Pinker and Prince, 1988). Such relations include the nonlocal dependencies discussed earlier as well as other relations like embedding and agreement. These domains have been assumed to require an abstract level of syntactic representation that entails a rule-based system of grammatical knowledge (Pinker and Prince, 1988).
A major advance was that of Elman (1993), who showed how a certain class of connectionist models could in principle capture nonlocal dependencies. Elman used a simple recurrent network (SRN) to predict word sequences in sentences generated by a small context-free grammar. The grammar contained key syntactic structures (including subject-verb agreement and subject-object relative clauses), and the model was able to extract these grammatical regularities from the input. Most notable was its ability to identify agreement relations across intervening words (Elman, 1993).
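The SRN architecture can be sketched in a few lines of NumPy. The toy vocabulary, network sizes, and two-item agreement "grammar" below are illustrative stand-ins for Elman's much richer training corpus; as in Elman's original procedure, error is backpropagated only through the current time step, with the hidden layer simply copied back as context.

```python
import numpy as np

# Minimal Elman-style simple recurrent network: hidden state is
# copied to a context layer that feeds back at the next time step.
rng = np.random.default_rng(0)
vocab = ["boy", "boys", "runs", "run"]
V, H = len(vocab), 8
Wx = rng.normal(0, 0.5, (H, V))   # input -> hidden
Wh = rng.normal(0, 0.5, (H, H))   # context -> hidden
Wy = rng.normal(0, 0.5, (V, H))   # hidden -> output

def one_hot(i):
    v = np.zeros(V); v[i] = 1.0; return v

def softmax(z):
    e = np.exp(z - z.max()); return e / e.sum()

# Toy agreement pattern: "boy" -> "runs", "boys" -> "run".
data = [(0, 2), (1, 3)]
lr = 0.1
for epoch in range(500):
    context = np.zeros(H)                  # reset state each sweep
    for inp, target in data:
        x = one_hot(inp)
        h = np.tanh(Wx @ x + Wh @ context)
        p = softmax(Wy @ h)
        dy = p - one_hot(target)           # cross-entropy gradient
        dh = (Wy.T @ dy) * (1 - h**2)      # backprop stops at context
        Wy -= lr * np.outer(dy, h)
        Wx -= lr * np.outer(dh, x)
        Wh -= lr * np.outer(dh, context)
        context = h                        # copy hidden -> context

context = np.zeros(H)
h = np.tanh(Wx @ one_hot(0) + Wh @ context)
pred = vocab[int(np.argmax(softmax(Wy @ h)))]
print(pred)  # after training, "boy" should predict "runs"
```

The crucial design feature is the copied-back context layer: it gives the network a memory of preceding material, which is what allows a full-scale SRN to track agreement across intervening words.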
Elman's findings were extended to show that connectionist models were able to capture the difficulties encountered by human readers processing complex syntax. Christiansen and Chater (1999) examined the processing of center-embedded structures by comparing the results of an SRN simulation with reading data from human subjects. They found that the connectionist simulation did a good job of predicting the structural complexity effects evident in the reading data. The resolution of garden path ambiguities was also examined in a two-component model consisting of an SRN and a weighting mechanism (Tabor, Juliano, and Tanenhaus, 1997). The lexical and syntactic interactions obtained from the simulation were fitted to human reading times from previous studies. These findings indicated that such structures can be handled in the connectionist framework. However, although interactive connectionist accounts have shown the potential to capture the complex syntactic relations needed in online comprehension, these structures remain a challenge for models of language processing that depend on statistical learning alone (MacDonald and Seidenberg, 2006).
Constraint-Based Accounts of L2 Processing
Constraint-based accounts of L2 processing include those done in the competition model framework (Bates and MacWhinney, 1989) and connectionist studies (Ellis and Schmidt, 1998). Unlike the syntax-based accounts, the focus in these studies has not been on online comprehension processes, but rather on identifying the linguistic factors that contribute to learning and processing outcomes.
The earliest constraint-based studies of L2 processing were applications of the competition model. These studies modeled L2 learning in terms of the development of cue strengths in the target L2. The focus was on relative differences in cue strengths across the learner's L1 and L2. These cue strengths were of interest as a source of cross-linguistic processing transfer, and as a measure of proficiency within the specific languages. Results from off-line agent-assignment tasks showed that cue weights provided a reliable and quantifiable way to describe L2 development in both areas (Harrington, 1987; Kilborn, 1989; Rounds and Kanagy, 1998; Sasaki, 1994). The competition model was limited in that it was applied to a single domain—agent assignment—and the off-line nature of the experimental task provided little insight into how these cues might be used in real-time processing. Recent studies have attempted to extend the model to argument structure (Dong and Cai, 2007).
Connectionist accounts of L2 development have focused on learning specific grammatical features. Early research examined the mapping of lexical items onto the following:
• Thematic roles (Gasser, 1990)
• The development of verb morphology (Broeder and Plunkett, 1994)
• Tense in English (Ellis and Schmidt, 1998)
Ellis and Schmidt (1998) investigated the frequency-by-regularity interaction observed in the production of English past tense verbs (Prasada and Pinker, 1993). English native speakers differ in the time taken to produce regular (play-played) and irregular (run-ran) past tense forms, the production of the latter being closely related to stem frequency. Irregular past tense forms of high-frequency stems are produced more quickly than those of low-frequency stems, whereas regular past tense verbs take approximately the same time to produce regardless of stem frequency. To account for this result, two separate mechanisms have been proposed:
1. Frequency-sensitive irregular verbs are stored as individual items in associative memory, with production based on retrieval from memory.
2. Regular verbs are produced by a rule-based mechanism that applies the regular past tense suffix to the stem, making production insensitive to stem frequency.
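The dual-mechanism account can be made concrete with a toy sketch. The verb list is illustrative, and a dictionary lookup plus a default rule stand in, respectively, for the associative memory and the symbolic rule that the account posits.

```python
# Toy dual-mechanism past tense producer: irregulars are retrieved
# from stored memory (route 1); anything else gets the default
# rule (route 2). Verb inventory is illustrative only.
IRREGULARS = {"run": "ran", "go": "went", "sing": "sang"}

def past_tense(stem):
    if stem in IRREGULARS:      # route 1: memory retrieval
        return IRREGULARS[stem]
    return stem + "ed"          # route 2: rule application

print(past_tense("run"), past_tense("play"))  # ran played
```

On this account, only route 1 is frequency-sensitive (retrieval speed depends on how often an item has been encountered), whereas route 2 applies uniformly; the connectionist claim discussed next is that a single associative network can reproduce the same interaction without the rule.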
Ellis and Schmidt (1998) tested this proposal in a study comparing human and connectionist simulation data on learning plural morphology.2 They found a frequency-by-regularity interaction similar to the earlier study and consistent with earlier connectionist models (Rumelhart and McClelland, 1986). The results demonstrated that, in principle, a separate symbolic rule-based mechanism was not required to account for learning in this domain. The researchers also attempted to show that a connectionist model could account for the learning of number agreement by L2 learners (Ellis and Schmidt, 1997). The study tested Ellis's (1996) bootstrapping account of SLA, in which the interaction of short-term and long-term memory processes allows the learner to extract nonlocal grammatical dependencies on the basis of exposure alone. The results were consistent with the account, as better memory performance correlated with more accurate performance on the structures (Ellis and Schmidt, 1997). However, the indirect nature of evidence from memory tasks, the fragmentary nature of the language used, and the off-line nature of the task leave open how generalizable these findings are to actual acquisition and online processing.
The accounts considered so far have not directly examined online comprehension processes in the same way as the syntax-based models discussed in the first part of this chapter. The formal linguistic theories that inform the latter provide the means to make specific processing predictions that can be tested in a controlled manner. To date, constraint-based accounts have been successful in identifying what knowledge is important for explaining L2 outcomes, but they have been less successful in explaining how that knowledge is integrated in real-time processing.
Complex syntax thus remains an issue for constraint-based approaches to processing. These domains are receiving increasing attention from researchers in construction grammar (Goldberg, 2006; Tomasello, 2003). The construction grammar approach shares basic assumptions of the constraint-based approach as it has been outlined here. Principal among these are the usage-based nature of grammar knowledge, its graded representation, and the emphasis on memory retrieval in lieu of the application and evaluation of syntactic principles in processing. The processing of nonlocal wh-dependencies has been addressed in recent work in the area (Ambridge and Goldberg, 2008; Dąbrowska, 2008). Although the current empirical data (consisting of questionnaire and production data) are off-line, the insights may be applicable to testing in online studies of L2 processing in the future (Goldberg and Casenhiser, 2008).
Constraint-Based Models of L2 Processing and SLA Theory
Constraint-based models of L2 sentence processing remain underspecified in terms of the online effects studied by the syntax-based approaches. However, although empirical findings are limited, the constraint-based perspective remains appealing, as it embodies key properties of L2 learning and performance. The graded, probabilistic nature of grammar knowledge that is characteristic of the approach is well suited to capturing the variation evident across learners, languages, and settings, an integral part of adult SLA. The unified approach to learning and processing also provides an explicit characterization of the transition mechanisms that must be specified as part of a complete theory of SLA (Gregg, 2003).
As more explicit processing predictions are developed, more online processing studies in the constraint-based framework can be expected. Although the discussion so far has contrasted syntax-based and constraint-based accounts, there may also be a middle ground between the two. The less that abstract linguistic knowledge is seen to play a role in L2 processing outcomes (Clahsen and Felser, 2006a), the more tenable constraint-based approaches become as an account of the computational basis of SLA. Adult L2 grammar learning and processing may turn out to be a hybrid process in which constraint learning processes combine in some manner with structural representations that are primarily rule-based (Mellow, 2004).
The two perspectives examined here differ significantly in how they approach L2 sentence processing and in the domains of L2 performance studied. Three questions emerge from the comparison of the two perspectives and from the research produced to date:
1. Are L1 and L2 sentence processing fundamentally different? The evidence from syntax-based research is mixed, though an increasing number of studies suggest that abstract syntactic relations are handled differently by the L2 processor. The constraint-based approach assumes the two are the same on theoretical grounds. The ability to answer this question will depend in part on being able to identify and isolate the effect of nonstructural factors (frequency, plausibility, etc.) on syntactic processing outcomes.
2. Is L2 sentence processing a unitary behavior? The evidence from the syntax-based studies suggests it is not. Some domains (e.g., nonlocal dependencies) appear to be governed by one set of mechanisms, whereas other aspects (e.g., verb subcategorization) may be governed by another; individuals may also be strategic in how they process a particular sentence at a particular time (Townsend and Bever, 2001). Constraint-based accounts, in contrast, assume that the same memory-based mechanisms are responsible for all processing outcomes in a uniform way.
3. Can adult L2 learners attain native-like processing proficiency? Again, the findings from the syntax-based accounts are mixed, but they suggest that overall native-like processing proficiency may not be attainable, at least in some domains. The usage-based nature of the constraint-based approach implies that native-like fluency in the L2 is possible, given the appropriate experience.
L2 sentence processing as a research area is still developing, and the answers given above are highly tentative. It has been only in the last decade that online studies have appeared, and the number of such studies will undoubtedly grow. The studies discussed in this chapter have used primarily behavioral data, examining online reading behavior through self-paced reading tasks (Papadopoulou, 2005) or eye movement monitoring (Frenck-Mestre, 2005). Recent neurolinguistic research is applying increasingly sophisticated tools to study online sentence processing in the L2. Techniques such as event-related brain potentials (ERPs; Mueller, 2005; Osterhout et al., 2006; Sabourin and Stowe, 2008) and brain imaging (Indefrey, 2006) will allow researchers to study online comprehension processes at a level of sensitivity not possible with the current behavioral methods.
(1.) The generalized theta attachment (GTA) model is a serial processing model that assigns thematic roles (that is, agent, theme, goal, etc.) to elements in the input string. Processing difficulties arise as the result of unfulfilled thematic role assignments. The key comparison was for reading times in the region following the word believe in the example sentences. In the subject extraction sentences this region follows the extraction gap, and in the object structures it precedes the gap. It was predicted that the parser would initially interpret the main verb believe in the subject structures as taking an NP complement and would posit a complete grammatical sentence with an object gap, that is, Who does Ann believe ____ (Pritchett, 1992). This is a version of the active filler strategy, in which the processor is assumed to favor any analysis that allows gap filling over one that does not (Stowe, 1986). The subsequent appearance of the word like then forces the object gap to be reanalyzed as a subject gap, resulting in longer reading times.
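The posit-then-reanalyze sequence this note describes can be caricatured in a few lines. The verb inventory, the sentence, and the event strings below are hypothetical; the function tracks only the two parse events at issue, not a full syntactic analysis.

```python
# Toy illustration of the active filler strategy: the parser posits
# a gap at the first legal position (object of the first verb) and
# reanalyzes when later input rules that analysis out.
VERBS = {"believe", "likes"}   # illustrative verb inventory

def parse(words):
    """Record gap positing and reanalysis for a fronted wh-filler."""
    events = []
    gap_posited = False
    for w in words:
        if w in VERBS and not gap_posited:
            # Eagerly fill the wh-filler in object position:
            # "Who does Ann believe ____".
            events.append(f"posit object gap after '{w}'")
            gap_posited = True
        elif w in VERBS:
            # A later verb needs a subject, so the object-gap
            # analysis fails and a subject gap is posited instead.
            events.append(f"reanalyze as subject gap before '{w}'")
    return events

print(parse("who does Ann believe likes Tom".split()))
```

The reanalysis event is where the account predicts elevated reading times: the eagerly posited object gap must be withdrawn when the second verb appears.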
(2.) Input units were singular word stems, and output units were plural prefixes, either regular or irregular. The network was trained by presenting the input units, each presentation activating one of the output units. At the outset, the mappings were random. Each time a given input unit activated a particular output unit, the model compared the activation weight of that mapping with that of the correct output unit for that input unit. A backpropagation learning mechanism was used to calculate the difference between the observed and the desired mappings and to make an incremental adjustment in the weight of the observed mapping. As a result, the next time the input unit was presented, its mapping was closer in weight to the correct output unit. All the input-output mappings were thus trained separately over thousands of trials. Subsequently, an untrained singular stem was used to test the model's ability to handle novel input; the result showed a small but statistically significant tendency for the novel unit to activate the appropriate output unit (regular plural; Ellis and Schmidt, 1998).
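The error-driven training regime described in this note can be sketched with a deliberately simplified network. A single-layer delta rule stands in for full backpropagation, and the numbers of stems and affixes and the training targets are invented; with the localist stem coding used here, the novel-stem generalization reported in the original (which depends on distributed representations) is not reproduced.

```python
import numpy as np

# Simplified error-driven stem -> plural-affix network: weights start
# random and are nudged toward the correct mapping on each trial.
rng = np.random.default_rng(1)
n_stems, n_affixes = 6, 2            # affix 0 = regular, 1 = irregular
W = rng.normal(0, 0.1, (n_affixes, n_stems))

# Invented data: most stems take the regular affix, stems 4-5 don't.
targets = [0, 0, 0, 0, 1, 1]
lr = 0.5
for _ in range(200):                 # many sweeps over the mappings
    for stem, affix in zip(range(n_stems), targets):
        x = np.zeros(n_stems); x[stem] = 1.0        # localist stem
        y = 1 / (1 + np.exp(-(W @ x)))              # output activations
        t = np.zeros(n_affixes); t[affix] = 1.0     # desired output
        W += lr * np.outer(t - y, x)                # delta-rule update

pred = int(np.argmax(W @ np.eye(n_stems)[0]))
print(pred)  # trained regular stem activates the regular affix (0)
```

Each update moves the observed mapping incrementally toward the desired one, so repeated presentations gradually strengthen the correct stem-affix association, matching the note's description of training.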