Bayesian Models of Attention
Abstract and Keywords
Traditionally, attentional selection has been thought of as arising naturally from resource limitations, with a focus on what might be the most apt metaphor, e.g. whether it is a ‘bottleneck’ or ‘spotlight’. However, these simple metaphors cannot account for the specificity, flexibility, and heterogeneity of the way attentional selection manifests itself in different behavioural contexts. A recent body of theoretical work has taken a different approach, focusing on the computational needs of selective processing, relative to environmental constraints and behavioural goals. This work typically adopts a normative computational framework, incorporating Bayes-optimal algorithms for information processing and action selection. This chapter reviews some of this recent modelling work, specifically in the context of attention for learning, covert spatial attention, and overt spatial attention.
Keywords: attentional selection, Bayesian inference, behavioural goals, information processing, action selection, attention for learning, covert attention, overt attention
Introduction
Our senses are constantly bombarded by a rich stream of complex and noisy inputs. Selectively filtering these sensory inputs and maintaining useful interpretations for them are important computational tasks faced by the brain. Traditionally, the process of attentional selection has commonly been associated with the metaphor of a ‘bottleneck’: limited processing resources giving rise to the exclusion or attenuation of certain aspects or components of sensory inputs (Broadbent 1958; Deutsch and Deutsch 1963; Treisman 1969; Norman 1968). This ongoing debate has focused on whether the ‘unattended’ stimuli are totally ignored (Broadbent 1958) or merely attenuated (Treisman 1969); whether the selection process happens early on, such that semantic processing is only applied to the attended stimuli (Broadbent 1958), or much later on, after semantic analysis has been applied and before reaching consciousness (Deutsch and Deutsch 1963); whether the filter operates only on physical features of the stimuli (Broadbent 1958), or depends on top-down contextual influences (Deutsch and Deutsch 1963); and whether a discrete bottleneck, through which only one item is selected, or a spatially embedded ‘spotlight’ (LaBerge 1983; Eriksen and St James 1986), privileging all stimuli within a spatial region, is the more apt analogy.
Instead of trying to characterize the exact capacity and limitation of attentional selection, a number of recent modelling papers have focused on the computational goals of selective processing, i.e. a normative framework for ‘why’ attentional selection behaves the way it does in different contexts, rather than a descriptive picture of ‘how’ it operates. For example, selective filtering may arise because certain aspects or components of the sensory landscape are more relevant for the observer’s current behavioural goals, and the remaining components, if included for processing, may be confusing or even detrimental to the task at hand. Thus, selective processing could be useful for computational optimality, beyond any resource limitation considerations. To motivate the selection-for-computation perspective on attentional selection, it is instructive to recall an early insight articulated by Hermann von Helmholtz. He was among the first to recognize that sensory processing involves an active inductive process of ‘unconscious inference’, which combines remembered ideas arising from past sensory experiences with fresh sense impressions, in order to arrive at ‘conclusions’ about the sensory world (Helmholtz 1878). From this perspective, the concept of selection-for-computation arises quite naturally: different pieces of information, whether from immediate sensory inputs or past knowledge, must be combined according to their respective relevance and informativeness, as part of the inductive process. In other words, selection-for-computation concretely implies selection for optimizing the inductive process. Notably, it has long been known that human sensory processing manifests many types of inductive biases. Indeed, many of the Gestalt laws of psychophysics formulated in the early twentieth century can also be interpreted this way (Elder and Goldberg 2002).
A number of recent theoretical models, based on Bayesian probability theory, have formalized the need of attentional selection to be shaped by computational desiderata (Dayan and Zemel 1999; Dayan and Yu 2002; Yu and Dayan 2005a, 2005b; Yu et al. 2009). Bayesian probability theory is a powerful and increasingly prevalent ideal observer (Green and Swets 1966) framework for understanding selective processing, as it provides a set of statistically optimal tools for the quantification and integration of imperfect information sources. It has been successfully applied to explain human and animal behaviour in a number of cognitive tasks, including perceptual inference (Grenander 1976–81; Bolle and Cooper 1984; Geman and Geman 1984; Marroquin et al. 1987; Szeliski 1989; Clark and Yuille 1990; Knill and Richards 1996), multi-modal sensory integration (Jacobs 1999; Ernst and Banks 2002; Battaglia et al. 2003; Dayan et al. 2000; Körding and Wolpert 2004; Shams et al. 2005; Körding et al. 2007a), reward learning (Behrens et al. 2007), and motor adaptation (Körding et al. 2007b).
In the following, we will review recently proposed Bayesian models of attention for learning, covert spatial attention, and overt spatial attention.
Attention for Learning
In the Bayesian framework, the two main computational tasks under conditions of uncertainty are inductive inference and learning. Inference refers to the computation of an ‘interpretation’ or ‘representation’ for sensory inputs based on an internal model of how events and properties of our external environment ‘generate’ these observations, while learning deals with a longer timescale process through which sensory experiences get incorporated into the internal representations of how entities in the environment interact and generate sensory observations. In the context of attentional selection, inference and learning both assign weight to a piece of information according to its associated uncertainty, but they require precisely the opposite pattern of prioritization as a function of uncertainty. For inference, sensory inputs associated with greater uncertainty are accorded relatively less weight so that more informative inputs are selectively processed; for learning, cues associated with greater uncertainty are accorded greater weights, so that learning focuses on less well-known aspects of the environment.
As an illustrative example of attention for learning, we first consider a simple classical conditioning learning scenario that we previously modelled as a Kalman filter (Dayan and Yu 2003). This example demonstrates that when multiple sources of noisy information are available, an ideal observer should ‘attend’ more to the more reliable sources of information. Moreover, such uncertainty, if reducible through experience, should encourage the observer to ‘attend’ to the most uncertain aspects of the environment in order to reduce uncertainty.
The field of classical conditioning probes the way that animals learn and utilize predictive relationships in the world, between initially neutral stimuli such as lights and tones, and reinforcers such as food, water, or small electric shocks (Dickinson 1980; Mackintosh 1983). In these experiments, the animals are thought to be ‘reverse-engineering’ the arbitrary predictive relationships set by the experimenter (Sutton 1992). Figure 39.1 graphically illustrates these ideas.
One statistical formulation of the ‘true’ underlying stimulus–reinforcer relationship, constituting the generative model, is to assume that the stimuli x_{t} (e.g. light, tone) on trial t stochastically determine the reward (or punishment) r_{t}, through a linear relationship:
$$r_{t} = \mathbf{x}_{t} \cdot \mathbf{w} + \eta_{t} \qquad (1)$$
Here, each $x_{t}^{i}$ in $\mathbf{x}_{t} = \{x_{t}^{1}, \ldots, x_{t}^{n}\}$ is a binary variable representing whether stimulus i is present or not on trial t, · denotes the dot product, w = {w_{1},..., w_{n}} are the weights that specify how each stimulus i contributes to the reward outcome, and η_{t} ~ N(0, τ^{2}) is a noise term following a Gaussian (normal) distribution with zero mean and variance τ^{2}. The task for the animal on each trial is to predict the amount of reinforcer given the stimuli, based on the learned relationship between stimuli and reinforcer, and to update those relationships according to the observation of a new pair of x_{t}, r_{t} on each trial.
Let us consider the simple case of there being two stimuli, i = 1, 2, and that it is known that only the first stimulus is present on trial 1, x_{1} = (1, 0), and both are present on trial 2, x_{2} = (1, 1).
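Before any observations, we assume that the prior distributions of w_{1} and w_{2} are independent and Gaussian: $w_{i} \sim \mathcal{N}(w_{0}, \sigma_{0}^{2})$, for i = 1, 2. After observing the first set of (x_{1}, r_{1}), the distribution over w_{2} is still just the prior distribution $\mathcal{N}(w_{0}, \sigma_{0}^{2})$, since stimulus 2 was not present. The distribution over w_{1} takes the following form:
$$p(w_{1} \mid \mathbf{x}_{1}, r_{1}) = \frac{p(r_{1}, \mathbf{x}_{1} \mid w_{1})\, p(w_{1})}{p(\mathbf{x}_{1}, r_{1})} \propto \mathcal{N}\!\left(\frac{\tau^{-2} r_{1} + \sigma_{0}^{-2} w_{0}}{\tau^{-2} + \sigma_{0}^{-2}},\ \frac{1}{\tau^{-2} + \sigma_{0}^{-2}}\right) \qquad (2)$$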
The first part of the equation is just an instantiation of Bayes’ Rule (Bayes 1763), which states that the posterior distribution of a variable (w_{1}) after observations (x_{1}, r_{1}) is proportional to the likelihood of the data (p(r_{1}, x_{1}|w)) times the prior (p(w)). The distribution is normalized by the constant p(x_{1}, r_{1}). The second part, where ∝ denotes ‘proportional to’, shows that the posterior distribution is also Gaussian. It has a mean estimate that is a linear combination of the observed reinforcer for cue 1 and the prior mean w_{0}, where the weight assigned to each is determined by the relative precision (1/variance) of each; thus, the information source that is less noisy/uncertain is given proportionally greater weight. The precision of the posterior distribution is the sum of the individual precisions of the prior and likelihood distributions, reflecting the fact that combining multiple sources of information ultimately results in greater precision (less uncertainty) than having only one of them.
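The precision-weighted combination described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original model code: the function name `gaussian_posterior` and the numerical values are assumptions chosen only to make the arithmetic easy to follow.

```python
def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian prior with one Gaussian observation of the same quantity.

    Hypothetical helper: prior_mean and prior_var play the roles of w_0 and
    sigma_0^2, obs is the observed reinforcer r_1, and obs_var is tau^2.
    """
    prior_prec = 1.0 / prior_var          # precision = 1 / variance
    obs_prec = 1.0 / obs_var
    post_prec = prior_prec + obs_prec     # precisions add
    # The mean is a precision-weighted average of prior mean and observation.
    post_mean = (prior_prec * prior_mean + obs_prec * obs) / post_prec
    return post_mean, 1.0 / post_prec


# Illustrative values (assumptions): w_0 = 0, sigma_0^2 = 1, r_1 = 1, tau^2 = 0.25.
mean, var = gaussian_posterior(0.0, 1.0, 1.0, 0.25)
print(mean, var)
```

With these values the observation is four times more precise than the prior, so the posterior mean lies four-fifths of the way from the prior mean to the observation, and the posterior variance is smaller than either source alone.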
On the second trial, when both stimuli are presented, we can again apply Bayes’ Rule to obtain a new posterior distribution over the weights, p(w_{1}, w_{2}|x_{1}, r_{1}, x_{2}, r_{2}), where now the prior distribution is the posterior from the previous trial. If we make the simplifying assumption that the correlation between w_{1} and w_{2} is 0, then it can be shown using a set of iterative Bayesian computations, known as the Kalman filter (Anderson and Moore 1979), that the posterior distribution is Gaussian with mean $\hat{\mathbf{w}}_{t} = \{\hat{w}_{t}^{1}, \hat{w}_{t}^{2}\}$ and diagonal variance $\{(\sigma_{t}^{1})^{2}, (\sigma_{t}^{2})^{2}\}$, where for i = 1, 2,
$$\hat{w}_{t}^{i} = \hat{w}_{t-1}^{i} + \frac{(\sigma_{t-1}^{i})^{2}\, x_{t}^{i}}{\sum_{j} (\sigma_{t-1}^{j})^{2}\, x_{t}^{j} + \tau^{2}} \left(r_{t} - \mathbf{x}_{t} \cdot \hat{\mathbf{w}}_{t-1}\right) \qquad (3)$$
$$(\sigma_{t}^{i})^{2} = (\sigma_{t-1}^{i})^{2} - \frac{(\sigma_{t-1}^{i})^{4}\, x_{t}^{i}}{\sum_{j} (\sigma_{t-1}^{j})^{2}\, x_{t}^{j} + \tau^{2}} \qquad (4)$$
Eq. 3 says that the new estimate $\hat{w}_{t}^{i}$ is just the old one plus the prediction error $r_{t} - \mathbf{x}_{t} \cdot \hat{\mathbf{w}}_{t-1}$ times a coefficient, called the Kalman gain, which depends on the uncertainty associated with each weight estimate relative to the observation noise. The Kalman gain indicates a competitive allocation between the stimuli, so that the stimulus associated with the larger uncertainty $\sigma^{i}$ gets the bigger share. On trial 2, because $(\sigma_{1}^{2})^{2} = \sigma_{0}^{2}$ and $(\sigma_{1}^{1})^{2} < \sigma_{0}^{2}$, $\hat{w}^{2}$ would be accorded relatively faster learning. In addition, large observation noise τ^{2} would result in slower learning for all weights, as the inputs are known to be unreliable indicators of the underlying weights, and small τ^{2} leads to faster learning. Eq. 4 indicates that the uncertainty associated with each stimulus is also reduced faster when the observation noise τ^{2} is relatively small.
In the computation of the new weight estimate, larger prior uncertainty $(\sigma_{t-1}^{i})^{2}$ (relative to observation noise) leads to greater weight placed on the observation, and lower prior uncertainty limits the impact of the observation. This again demonstrates the principle that in probabilistic inference, more uncertain information sources have less influence in the information integration process. Notice that the uncertainty $(\sigma_{t}^{i})^{2}$ in Eq. 4 gradually reduces as a function of the number of times that stimulus i has been observed, but does not depend on the actual observations. This is a quirk of the simple linear-Gaussian generative (Kalman filter) model that we consider here. Critically, learning rate is parcelled out among cues differentially in this normative framework, such that the cue whose predictive consequences are least well known is accorded the greatest ‘attention’ during the learning process.
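To make the competitive allocation of learning concrete, the following Python sketch runs the two-trial example (stimulus 1 alone, then both stimuli together). It assumes the zero-correlation simplification stated above, so only per-weight variances are tracked. The function name `kalman_step`, the prior values, and the reward values are illustrative assumptions, not values from the original simulations.

```python
import numpy as np


def kalman_step(w_hat, var, x, r, tau2):
    """One Kalman update for r = x . w + noise, keeping only diagonal variances.

    ASSUMPTION: correlations between weights are ignored (off-diagonal terms of
    the posterior covariance are dropped), as in the simplification in the text.
    """
    x = np.asarray(x, dtype=float)
    pred_err = r - x @ w_hat              # prediction error
    total = var @ (x * x) + tau2          # predictive variance of the reward
    gain = var * x / total                # per-stimulus Kalman gain
    w_new = w_hat + gain * pred_err       # old estimate + gain * error
    var_new = var - gain * x * var        # uncertainty shrinks only for present stimuli
    return w_new, var_new, gain


# Illustrative prior (assumption): w_0 = 0 and sigma_0^2 = 1 for both weights.
w_hat = np.array([0.0, 0.0])
var = np.array([1.0, 1.0])
tau2 = 0.25

# Trial 1: only stimulus 1 present. Trial 2: both present.
w_hat, var, gain1 = kalman_step(w_hat, var, [1, 0], 1.0, tau2)
w_hat, var, gain2 = kalman_step(w_hat, var, [1, 1], 2.0, tau2)
print("gains on trial 2:", gain2)
print("variances after trial 2:", var)
```

On trial 2 the gain for stimulus 2 exceeds that for stimulus 1, because stimulus 2 still carries its full prior variance while the variance for stimulus 1 was already reduced on trial 1. Note also that `var_new` does not depend on `r`, which mirrors the remark that the uncertainty depends on how often a stimulus was seen but not on what was observed.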
While this simple conditioning example demonstrates that uncertainty plays important roles in various aspects of selective attention in learning, it is overly simplistic in several respects. Firstly, it conflates the inference and learning problems by having a single kind of hidden variable, w. Under more realistic circumstances, there would be noise associated with any sensory inputs (both stimuli and reward), whether at the receptor level, generated in the cortex, or due to true stochasticity within the external environment itself. If x_{t} and r_{t} were not observed directly but induce noisy sensory inputs, then computing a distribution over potential values of x_{t} and r_{t} would be part of the inference problem, whereas the computation about the weights can be considered a learning problem. Secondly, the ‘hidden’ relationships in the world (parameterized by w) are assumed to be constant over time. What if these predictive relationships can actually fundamentally change at times, as for instance through the common experimental manipulations of reversal or extinction? Clearly, the simple linear-Gaussian model that we have proposed would be inadequate, since it cannot capture such discrete, abrupt changes. One consequence of dramatic changes in the parameters of an environmentally specified generative model is the need for a measure of unexpected uncertainty, which monitors gross discrepancy between predictions made by the internal model and actual observations. This unexpected uncertainty measures the amount of ‘surprise’ in addition to any expected stochasticity within the (learned) behavioural environment. We call this latter form of well-known stochasticity expected uncertainty, which should be encoded in the internal model for the current environment. Jumps in unexpected uncertainty would signal that there may have been dramatic changes in the statistical contingencies governing the behavioural environment, and should alert the system to a possible need to overhaul the internal model.
What should we expect of the neural realization of expected and unexpected uncertainty signals? First, both should have the effect of suppressing internal, expectation-driven information relative to external, sensory-induced signals, as well as promoting learning about lesser-known aspects of the environment. Second, they should be differentially involved in tasks engaging just one or the other form of uncertainty. A wide variety of experimental evidence suggests that the cholinergic (ACh) and noradrenergic (NE) neuromodulatory systems satisfy these conditions (Robbins and Everitt 1995; Posner and Petersen 1990; Sarter and Bruno 1997; Baxter and Chiba 1999; Gu 2002). We reviewed much of the supporting experimental evidence in Yu and Dayan (2002) and Yu and Dayan (2005b), and proposed a Bayesian model of expected and unexpected uncertainty, respectively signalled by ACh and NE (Yu and Dayan 2005b).
To understand the concrete roles of expected uncertainty and unexpected uncertainty in inference and learning, we examined their computational roles in a novel, hypothesized experimental task (Fig. 39.2) (Yu and Dayan 2005b) that generalizes a discrimination variant of the original Posner spatial cueing task (Posner 1980), and an attention-shifting task (Devauges and Sara 1990). In this generalized task, subjects observe a sequence of trials, each containing a set of cue stimuli (the coloured arrows, pointing left or right), preceding a target stimulus (the light bulb) after a variable delay, and must respond as soon as they detect the target. The directions of the coloured arrows are randomized independently of each other on every trial, but one of them, the cue, specified by its colour, predicts the location of the subsequent target with a significant probability (cue validity γ > 0.5); the rest of the arrows are irrelevant distractors. On each trial, the cue is correct (valid) with probability γ, and incorrect (invalid) with probability 1 − γ (cue invalidity). The colour of the cue arrow (the ‘relevant’ colour) and the cue validity persist over many trials, defining a relatively stable context. However, the experimenter can suddenly change the behavioural context by changing the relevant cue colour and cue validity, without informing the subject. The subject’s implicit probabilistic task on each trial is to predict the likelihood of the target appearing on the left versus on the right given the set of cue stimuli on that trial. Doing this correctly requires the subject to infer the identity (colour) of the currently relevant arrow and estimate its validity. In turn, the subject must accurately detect the infrequent and unsignalled switches in the cue identity (and the context).
This novel task generalizes probabilistic cueing tasks, which typically have a predictive cue with fixed identity, but whose validity is explicitly manipulated. The task also generalizes attention-shifting tasks, for which the identity of the relevant cue stimulus is experimentally manipulated, but whose validity is fixed at being perfectly correct. In this generalized task, unsignalled changes in the cue identity result in observations about the cue and target that are atypical for the learned behavioural context. They give rise to unexpected uncertainty, and should therefore engage NE. Within each context, the cue has a fixed invalidity, which would give rise to expected uncertainty, and should therefore engage ACh.
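The trial structure of the generalized task can be sketched as a small generative simulation. This is an illustrative sketch under stated assumptions, not the simulation code of Yu and Dayan (2005b): the function name `simulate_task`, the use of five colours, the 0-based colour indices, and the encoding of left/right as −1/+1 are all choices made here for illustration.

```python
import random


def simulate_task(n_trials, n_colours, contexts, seed=0):
    """Generate trials for a hypothetical version of the generalized cueing task.

    contexts: list of (start_trial, relevant_colour, validity) tuples, sorted
    by start_trial. Each trial is (arrows, target, relevant_colour, valid),
    with directions coded as -1 (left) or +1 (right).
    """
    rng = random.Random(seed)
    trials = []
    ctx = 0
    for t in range(n_trials):
        # Unsignalled context switch: move to the next context when it starts.
        while ctx + 1 < len(contexts) and t >= contexts[ctx + 1][0]:
            ctx += 1
        _, colour, gamma = contexts[ctx]
        # Arrow directions are independent of each other on every trial.
        arrows = [rng.choice((-1, 1)) for _ in range(n_colours)]
        # The relevant arrow predicts the target side with probability gamma.
        valid = rng.random() < gamma
        target = arrows[colour] if valid else -arrows[colour]
        trials.append((arrows, target, colour, valid))
    return trials


# Three contexts of 200 trials each (colour indices and validities are assumptions).
trials = simulate_task(600, 5, [(0, 0, 0.99), (200, 4, 0.70), (400, 2, 0.85)])
print(sum(valid for _, _, _, valid in trials[:200]), "valid trials in context 1")
```

An observer in this task sees only `arrows` and `target`; the relevant colour and the validity flag are hidden variables that must be inferred from the sequence of trials.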
Despite the apparent simplicity of the cue–target contingency in this novel task, achieving these computational goals is difficult due to the noise and non-stationarity underlying the cue–target relationship. The mathematically optimal ideal learner algorithm imposes rather high computational and representational costs, so that biological implementation of its exact form is unlikely. Nevertheless, the brain seems quite capable of solving similar, and much more difficult, problems. Therefore, we propose that the brain may be implementing an alternative algorithm (sketched in Fig. 39.3) that approximates the ideal one (Yu and Dayan 2005b). More detailed discussion of the model can be found in Yu and Dayan (2005b), along with a discussion of the circumstances under which the performance of the approximate algorithm closely tracks that of the ideal learner.
Specifically, the approximation we propose bases all estimates on just a single assumed relevant cue colour, rather than maintaining the full probability distribution over all potential cue colours. NE reports the estimated lack of confidence as to the particular colour that is currently believed to be relevant. This signal is driven by any unexpected cue–target observations on recent trials, and is the signal implicated in controlling learning following cue shift in the maze navigation task (Devauges and Sara 1990). ACh reports the estimated invalidity of the colour that is assumed to be relevant, and is the signal implicated in controlling the performance impact of cueing in the spatial cueing task, measured as the validity effect, i.e. the difference in reaction time or accuracy between validly and invalidly cued trials (Phillips et al. 2000). These two sources of uncertainty cooperate to determine how the subjects perform the trial-by-trial prediction task of estimating the likelihood that the target will appear on the left versus the right. Either form of uncertainty reduces the attention paid to the target location predicted by the assumed cue, since it reduces the degree to which that cue can be trusted. Validity effect in our model is therefore assumed to be proportional to (1 − ACh)(1 − NE), though other formulations inversely related to each type of the uncertainties signalled by ACh and NE would produce qualitatively similar results. This is consistent with the observed ability of both ACh and NE to suppress top-down, intracortical information (associated with the cue), relative to bottom-up, input-driven sensory processing (associated with the target) (Gil et al. 1997; Hasselmo et al. 1996; Hsieh et al. 2000; Kimura et al. 1999; Kobayashi 2000).
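The assumed dependence of the validity effect on the two uncertainty signals can be written as a one-line function. This is a minimal sketch: the function name `validity_effect`, the `scale` constant of proportionality, and the treatment of ACh and NE as numbers in [0, 1] are assumptions made for illustration.

```python
def validity_effect(ach, ne, scale=1.0):
    """Validity effect taken to be proportional to (1 - ACh)(1 - NE).

    ASSUMPTION: ach and ne are uncertainty levels in [0, 1]; scale is an
    arbitrary constant of proportionality.
    """
    if not (0.0 <= ach <= 1.0 and 0.0 <= ne <= 1.0):
        raise ValueError("ach and ne must lie in [0, 1]")
    return scale * (1.0 - ach) * (1.0 - ne)


# Raising either form of uncertainty shrinks the predicted validity effect.
baseline = validity_effect(0.2, 0.1)
more_ach = validity_effect(0.5, 0.1)
more_ne = validity_effect(0.2, 0.6)
print(baseline, more_ach, more_ne)
```

Because the two factors multiply, the signals act synergistically in prediction: if either uncertainty approaches 1, the cue is effectively ignored regardless of the other.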
In addition to the uncertainties signalled by ACh and NE, two other pieces of information are necessary for the appropriate updating of the internal model after each cue–target observation. One is the identity of the cue that is currently perceived to be relevant, which is critical for predicting target locations. The other is the estimate of the number of trials in the current context (since this colour first became relevant), which controls how much the estimated cue validity is influenced by the outcome of a single trial (analogous to the simple Kalman filter model discussed above). More details about these quantities, and their roles in the approximate algorithm, can be found in Yu and Dayan (2005b). We suggested that these two quantities are represented and updated in the prefrontal working memory (Miller and Cohen 2001). The prefrontal cortex has dense reciprocal connections with both the cholinergic (Sarter and Bruno 1997; Zaborszky et al. 1997; Hasselmo and Schnell 1994) and noradrenergic (Sara and Hervé-Minvielle 1995; Jodo et al. 1998) nuclei, in addition to the sensory processing areas, making it well suited to the integration and updating of the various quantities.
Figures 39.4 and 39.5 show comparisons between experimental and simulated data for the specific renditions of Posner’s task (Phillips et al. 2000) and the maze navigation task (Devauges and Sara 1990), which we discussed above. We model the Posner task (Phillips et al. 2000) as a restricted version of the general task, for which the identity of the relevant colour does not change and the cue validity is fixed. An iterative algorithm computes expected and unexpected uncertainties and thereby predicts the size of validity effect (Yu and Dayan 2005b). Since there is no unexpected uncertainty, NE is not explicitly involved, and so noradrenergic manipulation is incapable of interfering with performance in this task. This is consistent with experimental data (Witte and Marrocco 1997). However, ACh captures the invalidity of the cue, and so, as in the experimental data (Fig. 39.4a, b), the validity effect depends inversely on boosting (Fig. 39.4c) or suppressing (Fig. 39.4d) ACh.
In contrast to the Posner task, which involves no unexpected uncertainty, the attention-shifting task involves unexpected, but not expected, uncertainty. Within our theoretical framework, such a task explicitly manipulates the identity of the relevant cue, while the cue validity is kept constant (with high validity). Experimentally enhancing NE level (Devauges and Sara 1990) would result in greater unexpected uncertainty and therefore a greater readiness to abandon the current hypothesis and adopt a new model for environmental contingencies. As expected, simulations of our model showed an advantage for NE elevation to 10% above normal (Fig. 39.5b), similar to experimental data (Fig. 39.5a) (Devauges and Sara 1990). Our model also predicts a lack of ACh involvement, since the perfect reliability of the cues obviates a role for expected uncertainty, consistent with experimental data (McGaughy et al. 2008).
These results do not imply that increasing NE would create individuals that are generally ‘smarter’. In the model, control animals are relatively slow in switching to a new visual strategy because their performance embodies an assumption (which is normally correct) that task contingencies do not easily change. Pharmacologically increasing NE counteracts the conservative character of this internal model, allowing idazoxan-treated animals to learn faster than the control animals under these particular circumstances. The extra propensity of the NE group to consider that the task has changed based on relatively little evidence can also impair their performance in other circumstances, for instance when the underlying statistical contingencies are highly stable.
In the generalized task of Fig. 39.2, both cue identity and validity are explicitly manipulated, and therefore we expect both ACh and NE to play significant roles. The key to solving the full task is the timely and accurate detection of context changes in the face of invalidity. A trial perceived to be valid always increases confidence in the current context, as well as estimated cue validity. But when a trial is apparently invalid, subjects have to decide between maintaining the current context with an increased invalidity, or abandoning it altogether. This decision requires comparing the relative probability of having observed a chance invalid trial given the estimated cue validity, and the probability of the predictive cue identity having changed altogether. As ACh reports the first probability, and NE the second, we can expect there to be a rich interaction between these neuromodulators. In Yu and Dayan (2005b), we showed that near-optimal learning can be approximated by assuming the context to have changed whenever
This inequality points to an antagonistic relationship between ACh and NE: the threshold for NE which determines whether or not the context should be assumed to have changed is set monotonically by the level of ACh. Intuitively, when the estimated cue invalidity is low, a single observation of a mismatch between cue and target could signal a context switch. But when the estimated cue invalidity is high, indicating low correlation between cue and target, then a single mismatch would be more likely to be treated as an invalid trial rather than a context switch. This antagonistic relationship between ACh and NE in the learning of the cue–target relationship over trials contrasts with their chiefly synergistic relationship in the prediction of the target location on each trial.
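The qualitative behaviour of such a threshold rule can be illustrated with a short sketch. The inequality itself is not reproduced here, so the threshold function below is an assumed stand-in, chosen only because it increases monotonically with ACh; the names `assumed_threshold` and `context_changed` are likewise hypothetical.

```python
def assumed_threshold(ach):
    """Stand-in NE threshold that increases monotonically with ACh.

    ASSUMPTION: this particular form is illustrative only; any increasing
    function of ACh would show the same qualitative antagonism.
    """
    return ach / (0.5 + ach)


def context_changed(ne, ach, threshold=assumed_threshold):
    """Declare a context switch when NE exceeds the ACh-dependent threshold."""
    return ne > threshold(ach)


# Same NE level after a single cue-target mismatch, under two ACh levels.
ne_after_mismatch = 0.3
low_invalidity = context_changed(ne_after_mismatch, ach=0.05)   # cue believed reliable
high_invalidity = context_changed(ne_after_mismatch, ach=0.40)  # cue believed unreliable
print(low_invalidity, high_invalidity)
```

With low estimated invalidity the same NE signal crosses the threshold and a context switch is declared, whereas with high estimated invalidity the mismatch is absorbed as an ordinary invalid trial.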
Figure 39.6a shows a typical run in the full task that uses differently coloured cue stimuli. The predictive cue stimulus is μ = 1 for the first 200 trials, μ = 5 for the next 200, and μ = 3 for the final 200. The approximate algorithm does a good job of tracking the underlying contextual sequence from the noisy observations. The black dashed line (labelled 1 − γ) in Fig. 39.6b shows the cue invalidities of 1%, 30%, and 15% for the three contexts. Simulated ACh levels (dashed red trace in Fig. 39.6b) approach these values in each context. The corresponding simulated NE levels (solid green trace in Fig. 39.6b) show that NE generally correctly reports a contextual change when one occurs, though occasionally a false alarm can be triggered by a chance accumulation of unexpected observations, which takes place most frequently when the true cue validity is low. These traces directly give rise to physiological predictions regarding ACh and NE activations, which could be experimentally verified. Psychophysical predictions can also be derived from the model. The validity effect is predicted to exhibit the characteristic pattern shown in Fig. 39.6c, where large transients are mostly dependent on NE activities, while tonic values are more determined by ACh levels. During the task, there is a strong dip in the validity effect just after each contextual change, arising from a drop in model confidence. The asymptotic validity effect within a context, on the other hand, converges to a level that is proportional to the expected probability of valid cues.
It follows from Eq. 5 and the related discussion above, that ACh and NE interact critically to help construct appropriate cortical representations and make correct inferences. Thus, simulated experimental interference with one or both neuromodulatory systems should result in an intricate pattern of impairments. Using simulations, we showed that NE depletion results in the model having excessive confidence in the current cue–target relationship, leading to perseverative behaviour and an impairment in the ability to adapt to environmental changes (Yu and Dayan 2005b), which are also observed in animals with experimentally reduced NE levels (Sara 1998). In addition, the model makes the prediction that this reluctance to adapt to new environments would make the ACh level, which reports expected uncertainty, gradually rise to take into account all the accumulating evidence of deviation from the current model. Conversely, suppressing ACh leads the model to underestimate the amount of variation in a given context (Yu and Dayan 2005b). Consequently, the significance of deviations from the primary location is exaggerated, causing the NE system to overreact and lead to frequent and unnecessary alerts of context switches. Overall, the system exhibits symptoms of ‘hyper-distractibility’, reminiscent of empirical observations that anti-cholinergic drugs enhance distractibility (Jones and Higgins 1995) while agonists suppress it (Prendergast et al. 1998; Terry et al. 2002; O’Neill et al. 2003).
We also simulated combined ACh and NE depletion (Yu and Dayan 2005b). It is predicted to lead to inaccurate cholinergic tracking of cue invalidity and a significant increase in false alarms about contextual changes. However, intermediate values of NE depletion, combined with ACh depletion, induce impairments that are significantly less severe than either single manipulation (Yu and Dayan 2005b). Intuitively, since ACh sets the threshold for NE-dependent contextual change (Eq. 5), abnormal suppression of either system can be partially alleviated by directly inhibiting the other. Due to this antagonism, depleting the ACh level in the model has somewhat similar effects to enhancing NE; and depleting NE is similar to enhancing ACh. Intriguingly, Sara and colleagues have found similarly antagonistic interactions between ACh and NE in a series of learning and memory studies (Sara 1989; Ammassari-Teule et al. 1991; Sara et al. 1992; Dyon-Laurent et al. 1993, 1994). They demonstrated that learning and memory deficits caused by cholinergic lesions can be alleviated by the administration of clonidine (Sara 1989; Ammassari-Teule et al. 1991; Sara et al. 1992; Dyon-Laurent et al. 1993, 1994), a noradrenergic α-2 agonist that decreases the level of NE (Coull et al. 1997).
While we have illustrated our model using a hypothetical task that is a combination of a spatial cueing task and a maze learning task, the key concepts could be equivalently realized by modifying a number of other familiar attention tasks, such as allowing the cue validity to vary in an attention-shifting task. There is also a rich background of experimental data consistent with our uncertainty theory of ACh and NE, which lie outside traditional attentional tasks. For instance, the enhanced learning animals accord to stimuli with uncertain predictive consequences (Bucci et al. 1998), and the decreased learning they accord to stimuli with well-known consequences (Baxter et al. 1997), in conditioning tasks (Pearce and Hall 1980) are critically dependent on the ACh system. Also, recordings of neurons in the locus coeruleus, the source of cortical NE, indicate strong neural response to unexpected external changes such as novelty, the introduction of reinforcement pairing, and the extinction or reversal of these contingencies (Sara and Segal 1991; Vankov et al. 1995; Sara et al. 1994; Aston-Jones et al. 1997). NE has also been observed to modulate the P300 component of ERP (Pineda et al. 1997; Missonnier et al. 1999; Turetsky and Fein 2002), which has been associated with various types of violation of expectations: ‘surprise’ (Verleger et al. 1994), ‘novelty’ (Donchin et al. 1978), and ‘oddball’ (Pineda et al. 1997). These data provide additional evidence that NE reports unexpected global changes in the external environment, and thus serves as an alarm system for contextual switches. In addition, the well-documented ability of ACh and NE to control experience-dependent plasticity in the cortex (Gu 2002) is consistent with their proposed ability to alter sensory processing in a fundamental manner after the detection of a global contextual change.
Covert Attention in Perceptual Decision-making
In the previous section, we focused on the role of neuromodulatory systems in the representation of uncertainty in inference and learning tasks. In addition to this more macroscopic form of uncertainty, there is a separate body of work on the encoding of a more microscopic form of uncertainty by cortical neuronal populations. This spans a broad spectrum, from distributional codes that can encode mean and variance (Zemel et al. 1998), to more exotic codes that can represent complex distributions (Sahani and Dayan 2003; Barber et al. 2003; Rao 2004; Weiss and Fleet 2002). How microscopic and macroscopic neural representations of uncertainty work together is an important and relatively unexplored area. Another under-explored area of attention is the decision component. For simple binary perceptual decisions, an elegant theoretical framework has emerged: lateral intraparietal sulcus (LIP) neurons have been suggested to accumulate sensory information about relevant stimulus properties, such as motion direction in the case of a moving stimulus, until a fixed decision threshold is reached (Gold and Shadlen 2002; Ratcliff 2001; Smith and Ratcliff 2004), in a process computationally reminiscent of the Bayes-optimal statistical decision procedure, the sequential probability ratio test (Wald 1947; Wald and Wolfowitz 1948). However, how sensory information translates into a decision in more complex scenarios, such as when there are multiple (greater than two) perceptual outcomes, or when there is a computational need to attend more to some aspect of the sensory environment than to others, is less understood in terms of both the computational and neural principles underlying brain processing. Recently (Yu and Dayan 2005a), we tackled these challenges in the context of a spatial (covert) attention cueing task that we introduced earlier.
We used a temporally more detailed model to examine the way neuronal populations interact to filter and accumulate noisy information over time, the influence of attention and neuromodulation on the dynamics of cortical processing, and the process through which perceptual decisions are made (Yu and Dayan 2005a).
One empirically observed consequence of spatial attention is a multiplicative increase in the activities of visual cortical neurons (McAdams and Maunsell 1999, 2000). If cortical neuronal populations are coding for uncertainty in the underlying variable, it is of obvious importance to understand how attentional effects on neural response, such as multiplicative modulation, change the implied uncertainty, and what statistical characteristics of attention license this change. One early Bayesian model of spatial attention gave an abstract computational account of this effect (Dayan and Zemel 1999). It was argued that performing an orientation discrimination task on a spatially localized stimulus is equivalent to marginalizing out the spatial uncertainty in the joint posterior over orientation φ and spatial location y given inputs I:
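Written out with the integration range left implicit, the marginalization described here takes the following form. This is a reconstruction from the verbal description, using the symbols φ, y, and I defined above, and is not a quotation of the original formula:

$$
p(\varphi \mid I) = \int p(\varphi, y \mid I)\, dy
$$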
If that spatial integration is restricted to a smaller region that contains the visual stimulus, then less irrelevant input (i.e. noise) is integrated into the computation. This in turn leads to more accurate and less uncertain posterior estimates of $\widehat{\varphi}$. It was proposed that under encoding schemes such as the standard Poisson model, such decrease in posterior uncertainty is equivalent to a multiplicative modulation of the orientation tuning curve (Dayan and Zemel 1999), which may be implemented by, for example, a contrast gain mechanism (see also chapter by Maunsell).
Following up on that work, we proposed a more detailed model of neural coding to demonstrate how neuromodulator-mediated spatial attention interacts with cortical probabilistic information to influence the dynamics and semantics of perceptual inference and decision-making (Yu and Dayan 2005a). In this scheme, spatial attention also effects a multiplicative scaling of the orientation tuning function. Compared to the standard Poisson model (Dayan and Abbott 2001), however, this encoding scheme is able to represent a more diverse range of probability distributions over stimulus values. We also examined how information is accumulated over time in such a network, enabling the timely and accurate execution of perceptual decisions (Yu and Dayan 2005a). We review the model and its main implications here.
For concreteness, we again focus on a discrimination variant of the Posner task (Posner 1980), in which a cue predicts the location (left or right, y) of a subsequent target, on which the subject must perform a discrimination task (e.g. orientation, φ). The cue induces a prior distribution over the target location y. However, ‘robustness’ would require that a sensory stimulus, however improbable under the current top-down model, should get processed to some extent. We therefore model the prior distribution to be a mixture between a peaked cue-induced component and a ‘flat’ generic component, ${p}_{c}(y;c)=\gamma \mathcal{N}(\tilde{y},{\nu}^{2})+(1-\gamma )c$. γ parameterizes the relative probability of the cue-induced component being correct, and incorporates factors such as the validity of the cue. Consistent with our theory of neuromodulation outlined in the previous section, we suggest that 1 − γ should be signalled by ACh. The Gaussian component of the prior comes from a top-down source, perhaps a higher cortical area such as the parietal cortex, and its mean and width, possibly of high spatial precision, should be represented by a cortical population itself.
The neural computations under consideration here involve some intermediate level of processing in the visual pathway, which receives top-down attentional inputs embodied by the prior p(y; c) and noisy sensory inputs D_{t} = {x_{1},..., x_{t}} that are sampled independently and identically (iid) from a stimulus with true location and orientation y^{*} and φ^{*}. We model the pattern of activations x_{t} = {x_{ij}(t)} to the stimulus as independent and Gaussian, x_{ij}(t) ~ N(f_{ij}(y^{*}, φ^{*}), σ_{n}), with variance ${\sigma}_{n}^{2}$ around a mean tuning function that is bell-shaped and separable in space and orientation:
The task involves making explicit inferences about φ and implicit ones about y. The computational steps involved in the inference can be decomposed into the following:
Because the marginalization step is weighted by the priors, even though the task is ultimately about the orientation variable φ, the shape of the prior p(y) on the spatial variable can have dramatic effects on the marginalization and the subsequent computations. In particular, if the prior p(y) assigns high probability to the true y^{*}, then more ‘signal’ and less ‘noise’ is integrated into the posterior, whereas just the opposite happens if p(y) assigns low probability to the true y^{*}. This asymmetry is the computational source of the performance cost of invalid relative to valid cueing, a point we will return to later.
To implement the necessary probabilistic computations, we consider a hierarchical neural architecture in which top-down attentional priors are integrated with sequentially sampled sensory input in a sound Bayesian manner, using a direct log probability encoding (Weiss and Fleet 2002; Rao 2004). Fig. 39.7 shows the semantics and architecture of this hierarchical neural network, along with example activities of each layer at one moment in time. The first layer reports likelihood information and represents the activities of early stages in sensory processing (e.g. the retina in the visual system). The second layer represents the next stage of processing that incorporates top-down influence and bottom-up inputs (for instance, visual areas from LGN to MT/MST have all been shown to be significantly modulated by spatial attention (O’Connor et al. 2002); also see chapter by Beck and Kastner). Layer III represents neuronal populations that specialize in a particular aspect of featural processing, as it is well documented that higher visual cortical areas become increasingly specialized. Layer IV represents neuronal populations that integrate information over time, as for instance seen in the monkey LIP (Gold and Shadlen 2002). And finally, layer V neurons represent those involved in the actual decision-making, presumably in the frontal cortical areas (though some have argued that higher visual cortical areas such as LIP may be responsible for this stage as well (Gold and Shadlen 2002)). At any given time t, a decision is made based on the maximum of the posterior: if it is greater than a decision threshold q, then the observation process is terminated, and the most probable orientation is reported as the estimated $\widehat{\varphi}$ for the current trial; otherwise, the observation process continues for at least one more time step.
This is an n-hypothesis generalization of the sequential probability ratio test for binary decisions (Wald and Wolfowitz 1948), which has also been proposed to explain apparent evidence integration computations in the lateral intraparietal sulcus (Gold and Shadlen 2002).
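The five-layer computation and the threshold rule can be sketched in a few lines of NumPy. This is a minimal illustration of one possible reading of the layer ordering described above, not the published implementation: the two-location grid, the eight orientation units, the shape of the hypothetical `tuning` function, the sharply peaked cue component, and all parameter values are assumptions made for the example, and γ is assumed to be strictly below 1.

```python
import numpy as np

def run_trial(y_idx_true, phi_idx_true, cue_idx, gamma,
              q=0.9, sigma_n=1.0, max_steps=500, seed=0):
    """Simulate one trial; return (number of samples used, reported orientation index)."""
    rng = np.random.default_rng(seed)
    ys = np.array([-1.0, 1.0])                          # assumed: two candidate locations
    phis = np.linspace(0.0, np.pi, 8, endpoint=False)   # assumed: eight orientations

    def tuning(y, phi):
        # Hypothetical bell-shaped tuning, separable in space and orientation.
        space = np.exp(-(ys[:, None] - y) ** 2 / (2 * 0.5 ** 2))
        orient = np.exp(np.cos(2 * (phis[None, :] - phi)) - 1.0)
        return space * orient                           # shape (n_y, n_phi)

    # Mean population response under every hypothesis (y, phi).
    means = np.array([[tuning(y, p) for p in phis] for y in ys])

    # Mixture prior over location: peaked cue component plus flat component.
    cue = np.zeros(len(ys))
    cue[cue_idx] = 1.0
    prior_y = gamma * cue + (1.0 - gamma) / len(ys)
    log_prior = np.log(prior_y)[:, None]

    r4 = np.zeros(len(phis))                            # layer IV: accumulated evidence
    for t in range(1, max_steps + 1):
        x = means[y_idx_true, phi_idx_true] + sigma_n * rng.standard_normal(means.shape[2:])
        # Layer I: Gaussian log likelihood of the sample under each hypothesis.
        r1 = -((x - means) ** 2).sum(axis=(2, 3)) / (2 * sigma_n ** 2)
        # Layer II: combine with the top-down spatial prior.
        r2 = r1 + log_prior
        # Layer III: marginalize over location (log-sum-exp over y).
        m = r2.max(axis=0)
        r3 = m + np.log(np.exp(r2 - m).sum(axis=0))
        # Layer IV: integrate over time.
        r4 += r3
        # Layer V: normalize to a posterior over orientation and test the threshold.
        r5 = np.exp(r4 - r4.max())
        r5 /= r5.sum()
        if r5.max() > q:
            break
    return t, int(r5.argmax())
```

Calling `run_trial(0, 3, 0, 0.9)` simulates a valid-cue trial and `run_trial(0, 3, 1, 0.9)` an invalid-cue trial; averaging the returned sample counts over many seeds gives a model reaction time in the sense used in the text.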
Figure 39.8 demonstrates that the model indeed exhibits the cue-induced validity effect. That is, mean reaction time and error rates for invalid cue trials are greater than those for valid cue trials. The model ‘reaction time’ (RT) is the average number of iid samples (presumed to be generated with independent and identical noise conditioned on the stimulus state) necessary to reach the decision threshold q, and ‘error rate’ is the average angular distance between estimated $\widehat{\varphi}$ and the true φ^{*}.
Fig. 39.8 shows simulation results for 300 trials each of valid and invalid cue trials, for different values of γ which reflect the model’s belief of cue validity. The RT distribution for invalid-cue trials is broader and right-shifted compared to valid-cue trials, consistent with experimental data (Posner 1980; Bowman et al. 1993) (Fig. 39.8b). Fig. 39.8a shows a similar pattern in the distribution of RT obtained in the case of γ = 0.5. Fig. 39.8c shows that the validity effect increases with increasing perceived cue validity, as parameterized by γ, in both reaction times and error rates. The robust validity effect in both measures excludes the possibility of a simple speed–accuracy trade-off, instead reflecting a real cost of invalid cueing that depends on assumed cue validity.
Since we have an explicit model of not only the ‘behavioural responses’ on each trial, but the intermediate levels of neural machinery underlying the computations, we can look more closely at the activity patterns in the various neuronal layers and relate them to physiological phenomena. Electrophysiological and functional imaging studies have shown that spatial cueing to one side of the visual field increases stimulus-induced activities in the corresponding part of the visual cortex (Reynolds and Chelazzi 2004; Kastner and Ungerleider 2000). Fig. 39.9a shows that our model can qualitatively reproduce this effect: the cued side is more active than the uncued side. Moreover, the difference increases for increasing γ, the perceived cue validity. Electrophysiological experiments have also shown that spatial attention has an approximately multiplicative effect on orientation tuning responses in visual cortical neurons (McAdams and Maunsell 1999). We see a similar phenomenon in the layer III and IV neurons. Fig. 39.9b shows the layer IV responses averaged over 300 trials of each of the valid and invalid conditions; layer III effects are similar and not shown here. The shape of the average tuning curves and the effect of attentional modulation are qualitatively similar to those observed in spatial attention experiments (McAdams and Maunsell 1999) (also see Maunsell chapter). Fig. 39.9c is a scatter-plot of $\langle {r}_{j}^{4}\rangle_{t}$ for the valid condition versus the invalid condition, for various values of γ. The quality of the linear least-squares fits is fairly good, and the slope increases with increasing confidence in the cued location (i.e. larger γ). For comparison, the slope fit to the experimental data (McAdams and Maunsell 1999) is shown as a black dashed line. In the model, the slope depends not only on γ but also on the noise model, the discretization, and so on, so the comparison in Fig. 39.9c should be interpreted loosely.
In valid cases, the effect of attention is to increase the certainty (narrow the width) of the marginal posterior over φ, since the correct prior allows the relative suppression of noisy input from the irrelevant part of space. If the marginal posterior were Gaussian, the increased certainty would translate into a decreased variance. For Gaussian probability distributions, logarithmic coding amounts to something close to a quadratic (adjusted for the circularity of orientation), with the curvature determined by the variance. Decreasing the variance increases the curvature, and therefore has a multiplicative effect on the activities (as in Fig. 39.9). While it is difficult to show the multiplicative modulation rigorously, we proved it for the case where the spatial prior is very sharply peaked at its Gaussian mean $\tilde{y}$ (Yu and Dayan 2005a). The approximate Gaussianity of the marginal posterior comes from the accumulation of many independent samples over time and space, and is related to the central limit theorem. Readers are referred to a standard probability and statistics textbook for further reading on this point.
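To make the curvature argument concrete, consider an idealized Gaussian marginal posterior with mean $\mu$ and variance $\sigma^{2}$, ignoring the circularity of orientation. This is an illustrative simplification; the symbols $\mu$, $\sigma$, and $k$ are introduced here only for this sketch:

$$
\log p(\varphi \mid D_{t}) = -\frac{(\varphi-\mu)^{2}}{2\sigma^{2}} - \frac{1}{2}\log\left(2\pi\sigma^{2}\right)
$$

Reducing the variance from $\sigma^{2}$ to $\sigma^{2}/k$ with $k>1$ multiplies the $\varphi$-dependent quadratic term by $k$ and shifts the $\varphi$-independent offset by $\frac{1}{2}\log k$. Under a log probability code, the profile of activity across orientation units is therefore scaled by $k$, up to an additive constant.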
Another interesting aspect of the intermediate representation is the way attention modifies the evidence accumulation process over time. Fig. 39.10 shows the effect of cueing on the activities of neuron ${r}_{j*}^{5}(t)$, or P(φ^{*}|D_{t}), for all trials with correct responses: i.e. where neuron j^{*} representing the true underlying orientation φ^{*} reached the decision threshold before all other neurons in layer V. The mean activity trajectory is higher for the valid cue case than the invalid one: in this case, spatial attention mainly acts through increasing the rate of evidence accumulation after stimulus onset (steeper rise). This attentional effect is more pronounced when the system has more confidence about its prior information (A. γ = 0.5, B. γ = 0.75, C. γ = 0.99). It is interesting that changing the perceived validity of the cue affects the validity effect mainly by changing the cost of invalid cues, and not the benefit of the valid cue. This has also been experimentally observed in rat versions of the Posner task (Witte and Marrocco 1997). Crudely, as γ approaches 1, the evidence accumulation rate in the valid-cue case saturates due to input noise. But for the invalid-cue case, the near-complete withdrawal of weight on the ‘true’ signal coming from the uncued location leads to catastrophic consequences. The general effect of increasing γ is similar to increasing input noise in invalid trials.
Fig. 39.10d shows the average traces for invalid-cueing trials aligned to the stimulus onset, and Fig. 39.10e to the decision threshold crossing. These results bear remarkable similarities to the LIP neuronal activities recorded during monkey perceptual decision-making (Roitman and Shadlen 2002; Gold and Shadlen 2002). In the stimulus-aligned case, the traces rise linearly at first and then tail off somewhat, and the rate of rise increases for lower (effective) noise. In the decision-aligned case, the traces rise steeply and in sync. Roughly speaking, greater input noise leads to smaller average increase of ${r}_{j}^{5}$ at each time step, but greater variance. Because the threshold-crossing event is strongly determined by both the mean and the variance of the random walk, the two effects tend to counteract each other, resulting in similarly steep rise prior to threshold-crossing independent of the underlying noise process. All these characteristics can also be seen in the LIP neural response (Roitman and Shadlen 2002; Gold and Shadlen 2002), where the input noise level was explicitly varied.
This work has various theoretical and experimental implications. The model presents one possible reconciliation of cortical and neuromodulatory representations of uncertainty. The sensory-driven activities (layer I in this model) themselves encode bottom-up uncertainty, including sensory receptor noise and any processing noise that has occurred up until then. The top-down information, which specifies the Gaussian component of the spatial prior p(y), involves two kinds of uncertainty. One determines the locus and spatial extent of visual attention, the other specifies the relative importance of this top-down bias compared to the bottom-up stimulus-driven input. The first is highly specific in modality and featural dimension, presumably originating from higher visual cortical areas (e.g. parietal cortex for spatial attention, inferotemporal cortex for complex featural attention). The second is more generic and may affect different featural dimensions and maybe even different modalities simultaneously, and is thus more appropriately signalled by a diffusely projecting neuromodulator such as ACh. This characterization is also in keeping with our previous models of ACh (Yu and Dayan 2002, 2003) and experimental data showing that ACh selectively suppresses cortico-cortical transmission relative to bottom-up processing in primary sensory cortices (Kimura et al. 1999), as well as pharmacological studies showing an inverse relationship between the cue validity effect and the level of ACh (Phillips et al. 2000).
The results illustrate the important concept that prior belief about one dimension of a stimulus (location) can significantly alter the inferential performance in an independent stimulus dimension (orientation). Increasing γ leads to an increased mismatch between the assumed prior distribution (sharply peaked at the cued location) and the true generative distribution over space (bimodally peaked at the two locations ±y^{*}). Because the spatial prior affects the marginal posterior over φ by altering the relative importance of joint posterior terms in the marginalization process, overly large γ results in undue prominence of the noise samples in the cued location and neglect of samples in the uncued location. Thus, while a fixed posterior threshold would normally lead to a fixed accuracy level under the correct prior distribution, in this case larger γ induces larger mismatch and therefore poorer discrimination performance.
In addition to its theoretical implications, this work has interesting bearings on the experimental debate over the target of top-down attention. Earlier studies suggested that spatial attention acts mainly at higher visual areas, that attentional modulation of striate cortical activities is minimal, if at all significant (Moran and Desimone 1985). However, a recent study using more sensitive techniques (O’Connor et al. 2002) has demonstrated that spatial attention alters visual processing not only in primary visual cortex, but also in the lateral geniculate nucleus in the thalamus (see also Beck and Kastner, chapter 9, this volume, Saalmann and Kastner, chapter 14, this volume). In our neural architecture, even though attentional effects are prominent at higher processing layers (III–V), the prior actually comes into the integration process at a lower layer (II). This raises the intriguing possibility that attention directly acts on the lowest level that receives top-down input and is capable of representing the prior information. The attentional modulation observed in higher visual areas may be a consequence of differential bottom-up input rather than direct attentional modulation.
An important question that remains is how the quality of the input signal can be detected and encoded. If the stimulus onset time is not precisely known, then naive integration of bottom-up inputs is no longer optimal, because the effective signal/noise ratio of the input changes when the stimulus is turned on (or off). More generally, the signal strength (possibly 0) could be any one of several possibilities, as in the random-dot coherent motion task, in which subjects have to identify the primary direction of motion of a field of moving dots, only a fraction of which are moving coherently in the same direction, the rest flickering randomly (Gold and Shadlen 2002). Optimal discrimination under such conditions requires the inference of both the stimulus strength and its property (e.g. orientation or motion direction). There is some suggestive evidence that the neuromodulator norepinephrine may be involved in such computations. In a version of the Posner task in which cues are presented on both sides (so-called double cueing), and so provide only information about stimulus onset, there is experimental evidence that norepinephrine is involved in optimizing sensory processing (Witte and Marrocco 1997). Based on a slightly different task involving sustained attention or vigilance (Rajkowski et al. 1994), Brown et al. (2004) have recently made the interesting suggestion that one role for noradrenergic neuromodulation is to implement a change in the integration strategy when the stimulus is detected. We have also tackled this issue by ascribing to phasic norepinephrine a related but distinct role in signalling unexpected state uncertainty (Dayan and Yu 2006; Shenoy and Yu 2012).
Overt Spatial Attention
In experimental settings, spatial attention tends to be studied purely in the covert setting, with subjects instructed to resist eye movements. This is in part to allow stability to be established in visual cortical receptive field responses, and in part to simplify the visual processing itself. However, in naturalistic settings, covert and overt attention interact intimately to aid sensory processing (Yantis and Jonides 1984; Rizzolatti et al. 1987; Hoffman and Subramaniam 1995; Moore and Fallah 2001; Stigchel and Theeuwes 2007). Humans and other animals continuously use strategic self-motion to improve sensory information collection, in a manner sensitive to prior knowledge and task goals. Recently, we used a novel visual search task, combined with Bayesian ideal-observer modelling, to examine whether and how the brain internalizes spatial regularities in target location and uses such information to optimize saccadic search strategy for a noisy target (Huang and Yu 2010).
While it has long been known that eye movement patterns are strongly influenced by cognitive factors (Yarbus 1967), such as prior knowledge about target location (He and Kowler 1989), temporal onset (Oswal et al. 2007), and reward probability (Roesch and Olson 2003), it is poorly understood how such contextual knowledge is acquired and how it precisely modulates saccadic choices and perceptual decisions. We recently used a novel visual search task, in which the target stimulus appeared in different locations with different probabilities, to investigate how humans learn and use knowledge about the spatial distribution of targets to control eye movements (Huang and Yu 2010). We formulated several Bayesian ideal observer models (Huang and Yu 2010), based on different hypotheses about statistical learning and decision-making, and compared them to experimental data, in order to identify the learning procedure used by subjects to acquire spatial prior information, and the decision process that they employ to choose sequential fixation locations.
There has been a long history of debate over whether humans and animals match (Herrnstein 1961, 1970; Davison and McCarthy 1987) or maximize (Hall-Johnson and Poling 1984; Blakely et al. 1988; Poling et al. 2011) in their choice behaviour, where there are multiple options that yield different magnitudes or probabilities of reward. While the maximizing strategy would seem more optimal (greater expected reward), there is an intriguing body of studies suggesting that humans and animals often adopt a matching-like choice policy instead (Herrnstein 1961, 1970; Sugrue et al. 2004) or the related sampling policy (Vul et al. 2009). This debate is relevant for saccadic strategy, in understanding whether subjects first search the most probable location (maximizing), or allocate search fixations in proportion to the underlying probability of different locations containing the target (matching). We also explored a third alternative, an algorithmically simple but computationally suboptimal heuristic policy, in which subjects always start searching from the target location of the previous trial, which we call the ‘follow-last-target’ policy. This is related to the classic win-stay-lose-shift policy, first proposed in game theory for games like Prisoner’s Dilemma (Rapoport and Chammah 1965; Nowak and Sigmund 1993) and later used to characterize pigeon (Randall and Zentall 1997), monkey (Warren 1966), and human (Frank et al. 2008; Steyvers et al. 2009; Lee et al. 2011; Otto et al. 2011; Scheibehenne et al. 2011; Worthy and Maddox 2012) learning and choice behaviour.
From a different angle, we were also interested in characterizing the dynamic learning procedure humans adopt to learn about spatial regularities in the environment. Specifically, we investigated whether human subjects’ learning policy accumulates spatial statistics stably over a long timescale (across a block), or takes into account only the recent trial history. This was motivated by our previous work showing that in serial 2-alternative reaction time tasks, subjects’ tendency to exhibit a sequential effect (sensitivity to local runs of repetitions and alternations) (Laming 1968; Soetens et al. 1985; Cho et al. 2002) may arise from Bayesian stimulus expectancy computation based on statistical regularities in recent trial history (Yu and Cohen 2009). Ultimately, subjects’ behavioural choices reflect a combination of their internal knowledge about the world (reflecting the learning process) and their saccadic choice based on that knowledge (reflecting the decision policy). We adopted a Bayesian inference and decision modelling framework that allows us to consider different combinations of learning and decision strategies, and differentiate their relevance based on subjects’ behaviour (Huang and Yu 2010).
In our visual search task (Huang and Yu 2010), subjects must find a target motion stimulus in one of three possible locations, with the other two locations containing distractor motion stimuli (Fig. 39.11a). In the spatially ‘biased condition’ (1:3:9 condition), the target location is biased among the three options with 1:3:9 odds. In the ‘uniform condition’ (1:1:1 condition), the target is distributed uniformly across the three locations. The order of the eight blocks (six biased blocks and two uniform ones, 90 trials per block) is randomized for each subject. To eliminate the complications associated with the spatial dynamics of covert attention, the display is gaze-contingent: only the fixated stimulus is visible at any given time, the other two replaced by two small dots indicating available fixation alternatives. Subjects receive feedback about true target location on each trial after making their choice. Other than experiencing training blocks with similar statistics before the main experiments, subjects do not receive any explicit instructions on the spatial distribution of target location. They are only told to find the target quickly and accurately, as they receive performance-contingent pay at the end of the experiment, proportional to points they earn: +50 points for each correct trial, −50 points for each error trial, −12.5 points for each second spent searching, and −25 points for each switch of search location.
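As a small worked example of the incentive structure, the sketch below computes the points for a single trial and the chance that the first fixation lands on the target under two idealized policies. It assumes an observer who already knows the 1:3:9 odds exactly, which is a simplification for illustration; the function names are hypothetical.

```python
def trial_points(correct, search_seconds, n_switches):
    """Points for one trial under the payoff scheme stated in the task description."""
    outcome = 50.0 if correct else -50.0
    return outcome - 12.5 * search_seconds - 25.0 * n_switches


def first_fixation_hit_rate(target_odds, policy):
    """Probability that the first fixation is on the target (odds assumed known)."""
    total = sum(target_odds)
    p = [x / total for x in target_odds]
    if policy == "max":
        # Always fixate the most probable location first.
        return max(p)
    if policy == "match":
        # Fixate location k first with probability p[k]; hit with probability p[k].
        return sum(x * x for x in p)
    raise ValueError("policy must be 'max' or 'match'")


print(trial_points(True, 2.0, 1))                      # 50 - 25 - 25
print(first_fixation_hit_rate([1, 3, 9], "max"))       # 9/13
print(first_fixation_hit_rate([1, 3, 9], "match"))     # 91/169
```

Under these assumptions a maximizing observer finds the target on the first fixation in 9/13 of trials and a matching observer in 91/169, so the matching observer pays the 25-point switch cost more often.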
We found that human subjects indeed notice and take advantage of the spatial statistics to locate the target stimulus more accurately (Fig. 39.11b) and rapidly (Fig. 39.11c). Underlying this performance improvement is a prioritized search strategy that favours the ‘9’ location over the ‘3’ location, in turn over the ‘1’ location, as the first search location (Fig. 39.11d, black bars). If they first fixated 9 and did not find the target, they favoured the 3 location over the 1 location as the second search location (Fig. 39.11d, white bars). Moreover, the average distribution of subjects’ first fixations appeared close to a ‘matching’ strategy (Fig. 39.11d, dashed lines), allocating choices in proportions similar to the actual underlying statistical frequencies.
The observed ‘matching’ search strategy suggests that humans can internalize and utilize a graded probabilistic representation of potential target location. The most salient interpretation is that subjects stochastically select on each trial the first fixation location in proportion to that block’s target probabilities in the different locations. However, another possibility is that subjects base their beliefs about the configuration of target probabilities not on the entire history of experienced trials (we call this the Fixed Belief Model, or FBM), but instead give more weight to the more recent trials (we call this the Dynamic Belief Model, or DBM). This is motivated by our previous work (Yu and Cohen 2009) showing that subjects often (implicitly) assume that the world is potentially changeable, and therefore the most recent observations ought to be given greater emphasis than more distant ones in predicting future outcomes. Under such a non-stationarity assumption, an accumulation of unexpected outcomes (e.g. target appearing in unexpected locations) suggests a potential shift in underlying environmental statistics, and a rational observer should be willing to update his/her internal world model and recalibrate his/her interactions with the environment, e.g. by favouring a recently frequented target location, instead of choosing purely according to long-term averages. Thus, matching-like search behaviour may arise not from a stochastic choice policy, but rather from a limited-memory belief update process, in combination with a maximizing choice policy.
A third possibility for the matching-like search pattern is that subjects may adopt a follow-last-target heuristic strategy, related to the previously proposed win-stay-lose-shift strategy for binary choices (Rapoport and Chammah 1965; Nowak and Sigmund 1993; Randall and Zentall 1997; Warren 1966; Steyvers et al. 2009; Lee et al. 2011; Otto et al. 2011; Scheibehenne et al. 2011; Worthy and Maddox 2012). That is, subjects could simply be searching first in last trial’s target location, in which case the 1:3:9 empirical statistics would naturally result in matching-like behaviour with one-trial lag. This heuristic strategy would not require any explicit or implicit knowledge about spatial distribution of target locations. However, as shown in Fig. 39.14a, while subjects are swayed by last trial target location, their first fixation choice also reflects long-term statistics, thus favouring 9 over 1 and 3.
Figure 39.12a shows the generative model for the DBM; the FBM is just a special case of the dynamic model with α = 1. Fig. 39.12b shows a sample run of the DBM, in which the target (green x) mostly appears in location 9, but sometimes in location 3, and occasionally in location 1. The marginalized predictive belief that the DBM assigns to each of the potential target locations on each trial fluctuates with the prior history of experienced trials. The marginalized predictive belief that the target will appear in location 9 (blue) is highest for most of the sequence, but a few trials of location 1 lead to relatively high predictive probability assigned to location 1 (red) early on in the sequence, and several trials of location 3 in the latter half of the sequence sway the DBM to assign higher predictive probability to location 3 (green). Note that the DBM plus a maximizing strategy produces first fixation predictions that closely correspond to this subject’s actual choices (top panel). The rare discrepancies occur when there are unexpected observations and the underlying probabilities are close in magnitude: for example, trials 56, 72, 77, and 80. Other times, the model and the subject concur on switching (57, 78) or staying (64, 76).
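A grid-based sketch can make the difference between the two belief models concrete. It rests on assumptions that are not spelled out in the text: a flat prior over target-probability vectors on a discretized simplex, and a generative model in which the vector is kept with probability α and redrawn from that prior with probability 1 − α before each trial. The function names `simplex_grid` and `dbm_predictions` are hypothetical.

```python
import numpy as np

def simplex_grid(n=30):
    """All probability vectors (i, j, n-i-j)/n over three locations."""
    pts = [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]
    return np.array(pts, dtype=float) / n

def dbm_predictions(targets, alpha, n=30):
    """Predictive belief for each trial, computed before that trial's target is seen.

    targets: sequence of target location indices (0, 1, or 2).
    alpha:   assumed probability that the hidden probabilities stay unchanged;
             alpha = 1 corresponds to the Fixed Belief Model.
    """
    grid = simplex_grid(n)
    prior = np.full(len(grid), 1.0 / len(grid))   # flat prior (assumption)
    belief = prior.copy()
    preds = []
    for s in targets:
        preds.append(belief @ grid)               # marginal predictive probabilities
        post = belief * grid[:, s]                # Bayes update on observed location
        post /= post.sum()
        # Allow for a possible change in the hidden probabilities before next trial.
        belief = alpha * post + (1.0 - alpha) * prior
    return np.array(preds)

history = [2] * 20 + [0, 2]
fbm = dbm_predictions(history, alpha=1.0)
dbm = dbm_predictions(history, alpha=0.8)
# After a single surprising target at location 0, the dynamic model shifts
# more predictive probability to location 0 than the fixed model does.
print(fbm[-1], dbm[-1])
```

The value α = 0.8 is arbitrary here; in practice it would be fit to behaviour or chosen to reflect an assumed rate of environmental change.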
To investigate more closely whether subjects employ something akin to the FBM or DBM in learning spatial statistics, and whether they match or maximize in their fixation policy, we examined how subjects’ fixation choices evolve over the course of 1:3:9 blocks, and compared their behaviour to four different models: FBM+match (FBM for learning spatial statistics and a matching policy for generating first fixation location based on current beliefs), DBM+match (DBM for learning and a matching fixation policy), FBM+max (FBM for learning and a maximization policy for generating first fixation location based on current beliefs), and DBM+max (DBM for learning and a maximization fixation policy). As we expected, both FBM+match and DBM+max produce learning curves quite similar to the behavioural data (Fig. 39.13a, b), and also similar overall fixation distributions (Fig. 39.13c, d). In contrast, FBM+max over-matches (Fig. 39.13a, solid) and DBM+match under-matches (Fig. 39.13b, dashed) subjects’ behavioural distribution.
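The two fixation policies differ only in how a predictive belief is turned into a choice. A minimal sketch, with hypothetical function names and a fixed belief vector chosen purely for illustration:

```python
import numpy as np

def match_policy(belief, rng):
    """Sample a location with probability equal to its predictive belief."""
    return int(rng.choice(len(belief), p=belief))

def max_policy(belief):
    """Fixate the location with the highest predictive belief."""
    return int(np.argmax(belief))

rng = np.random.default_rng(1)          # fixed seed, illustration only
belief = np.array([1, 3, 9]) / 13.0     # a stable belief at the 1:3:9 statistics
n = 10000
match_choices = np.array([match_policy(belief, rng) for _ in range(n)])
max_choices = np.array([max_policy(belief) for _ in range(n)])

match_freq = np.bincount(match_choices, minlength=3) / n
max_freq = np.bincount(max_choices, minlength=3) / n
```

When the belief is stable, as it becomes under a fixed-belief learner, the maximizing policy chooses the most probable location on every trial, which is one way to read the over-matching of FBM+max. A maximizing policy can only spread its choices across locations if the belief itself fluctuates from trial to trial.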
Table 39.1 Accuracy of different models in predicting subjects’ trial-by-trial choice of first fixation. Average predictive accuracy computed for each 1:3:9 or 1:1:1 block, then averaged across all blocks of each condition and all subjects. DBM+max outperforms all other algorithms in both conditions significantly (*, p < 0.05) or very significantly (**, p < 0.001).
Condition | Model | Predictive Accuracy | SEM
---|---|---|---
1:3:9 | Follow-last-target | 0.7172^{**} | 0.0018
1:3:9 | FBM+match | 0.5770^{**} | 0.0250
1:3:9 | FBM+max | 0.7776^{*} | 0.0214
1:3:9 | DBM+match | 0.5435^{**} | 0.0200
1:3:9 | DBM+max | 0.8086 | 0.0187
1:1:1 | Follow-last-target | 0.5700^{**} | 0.0078
1:1:1 | FBM+match | 0.3508^{**} | 0.0129
1:1:1 | FBM+max | 0.4864^{**} | 0.0229
1:1:1 | DBM+match | 0.3586^{**} | 0.0140
1:1:1 | DBM+max | 0.6495 | 0.0211
While the average statistics cannot distinguish between FBM+match and DBM+max in their fit to human behaviour, the conditional distribution of fixation choices as a function of last trial target location reveals some dramatic differences. For both 1:3:9 (Fig. 39.14a) and 1:1:1 (Fig. 39.14d) conditions, DBM+max produced conditional distributions statistically indistinguishable from subjects’ first fixation distributions (p = 0.076, one-sided t-test of average Kullback–Leibler divergence between subjects’ conditional distributions and that of DBM+max), whereas FBM+match produced conditional distributions that are very significantly different from subjects’ distributions (p = 0.00075). Human subjects and DBM+max both adopt a first fixation policy that combines long-term (1:3:9) and short-term (last trial) influences, whereas FBM+match makes fixation choices with little regard to the most recent trial’s target location.
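The comparison described here rests on two quantities: the distribution of first fixations conditioned on the previous trial's target location, and the Kullback–Leibler divergence between a subject's and a model's conditional distributions. A minimal sketch follows. The pseudo-count used to avoid empty cells, the direction of the divergence (subject relative to model), the averaging over conditioning locations, and the toy sequences are assumptions, since the text does not specify them.

```python
import numpy as np

def conditional_choice_dist(choices, targets, n_loc=3, pseudo=0.5):
    """Row i holds P(first fixation on trial t | target at i on trial t-1)."""
    table = np.full((n_loc, n_loc), pseudo)   # pseudo-count avoids zeros
    for prev_target, choice in zip(targets[:-1], choices[1:]):
        table[prev_target, choice] += 1.0
    return table / table.sum(axis=1, keepdims=True)

def mean_kl(p, q):
    """Average over rows of KL(p_row || q_row), in nats."""
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

# Toy sequences (hypothetical, not data from the study).
targets = [2, 2, 1, 2, 0, 2, 2, 1, 2, 2]
subject = [2, 2, 2, 1, 2, 2, 2, 2, 1, 2]
model = [2, 2, 2, 1, 2, 0, 2, 2, 1, 2]

p_subject = conditional_choice_dist(subject, targets)
p_model = conditional_choice_dist(model, targets)
divergence = mean_kl(p_subject, p_model)
```

One divergence per subject and model would then feed the t-test reported in the text; that step is omitted here.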
We then examined how well the various models predict subjects’ first fixation choices on a trial-to-trial basis. We found that DBM+max significantly outperformed all four of the other models (DBM+match, FBM+max, FBM+match, Follow-last-target) in both 1:3:9 and 1:1:1 conditions (Table 39.1, p < 0.05 for all one-sided paired t-tests of DBM+max versus each of the other models). Note that we fit a value for α for each subject that maximized the DBM predictive accuracy of first fixation choice; we fit α separately for 1:3:9 and 1:1:1, and for DBM+match and DBM+max.
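Fitting α by maximizing predictive accuracy can be sketched as a grid search. The sketch is generic: `fit_alpha` accepts any function that maps a target sequence and an α to trial-by-trial predictive beliefs. The learner used in the example, `leaky_predictive`, is a simple stand-in based on exponentially discounted counts and is explicitly not the DBM; the grid values, function names, and synthetic "subject" are likewise assumptions.

```python
import numpy as np

def leaky_predictive(targets, alpha, n_loc=3):
    """Stand-in learner (NOT the DBM): exponentially discounted counts."""
    counts = np.zeros(n_loc)
    preds = np.zeros((len(targets), n_loc))
    for t, x in enumerate(targets):
        preds[t] = (counts + 1.0) / (counts.sum() + n_loc)
        counts = alpha * counts      # forget older trials
        counts[x] += 1.0             # add the newly observed target
    return preds

def max_accuracy(preds, choices):
    """Fraction of trials on which argmax(belief) equals the actual choice."""
    return float(np.mean(np.argmax(preds, axis=1) == np.asarray(choices)))

def fit_alpha(targets, choices, predictive_fn, grid):
    """Return the grid value of alpha with the highest predictive accuracy."""
    scores = [max_accuracy(predictive_fn(targets, a), choices) for a in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

# Synthetic subject: a maximizer driven by the stand-in learner at alpha = 0.6.
rng = np.random.default_rng(2)
targets = rng.choice(3, size=300, p=np.array([1, 3, 9]) / 13.0)
choices = np.argmax(leaky_predictive(targets, 0.6), axis=1)

grid = [0.2, 0.4, 0.6, 0.8, 1.0]
best_alpha, best_score = fit_alpha(targets, choices, leaky_predictive, grid)
```

As in the text, accuracy here is evaluated on the same trials used for fitting. For a matching policy the accuracy function would need to be replaced, for example by the mean belief assigned to the chosen location.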
To summarize, we showed in this work that humans readily internalize spatial statistics after just a handful of exemplars, and use that information to improve accuracy and efficiency in target search by biasing both saccadic planning and perceptual processing towards the more probable target locations. We found that a combination of optimal fixation decision policy (maximizing accuracy) and suboptimal learning procedure (overestimating the volatility of statistical regularities) gives rise to ‘matching’ choice behaviour on a longer timescale, a strategy known to be suboptimal but nevertheless often observed in experiments. While the non-stationarity assumption and matching-like behaviour seem suboptimal in our experimental context, they would be a valuable asset in natural environments where statistical regularities do change over time, such as financial and economic markets, seasonal weather patterns, rise and fall in predator and prey populations, and so on. Indeed, few things are entirely constant in life besides gravity, and even here astronauts have shown a large degree of adaptability in outer space. We hypothesize that the apparently irrational matching behaviour is an adaptive response to the inherent non-stationarity in natural environments, and that the variability in how closely subjects resemble a ‘matcher’ versus a ‘maximizer’ may arise from implicit assumptions about the stability of environmental statistics in a particular behavioural context.
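Why matching is suboptimal in a stationary environment can be seen from a short calculation. Assume, for illustration, that trials are independent, that the target appears at location k with fixed probability p_k in the ratio 1:3:9, and that the searcher's belief equals these true probabilities. The probability that the first fixation lands on the target is then:

```latex
\begin{align*}
P(\text{hit} \mid \text{matching})
  &= \sum_k p_k^2
   = \frac{1^2 + 3^2 + 9^2}{13^2}
   = \frac{91}{169} \approx 0.538, \\
P(\text{hit} \mid \text{maximizing})
  &= \max_k p_k
   = \frac{9}{13} \approx 0.692 .
\end{align*}
```

The gap between the two values is the cost of matching when the statistics really are fixed. It shrinks or reverses only when the probabilities change over time, which is the situation a non-stationarity assumption is suited to.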
The results demonstrate that overt attention, mediated by purposeful eye movements, complements covert attention to play a critical role in the brain’s selection and filtering process. While traditionally attentional selection was thought of as arising from limited neuronal resources at perceptual, decisional, and motor levels (Eriksen and Eriksen 1974), more recently formal Bayesian statistical models have suggested covert attentional selection to be computationally desirable beyond any resource limitation considerations (Dayan and Yu 2002; Dayan and Zemel 1999; Yu and Dayan 2005a; Yu and Cohen 2009). This work adds to this ‘selection-for-computation’ principle of attentional selection by demonstrating that overt attention also contributes to sensory processing efficiency by precisely favouring sensing locations in a manner that is sensitive to environmental statistics and task objectives. Future work is needed to clarify the precise manner in which covert and overt attention interact to mediate efficient sensory processing.
Conclusions and Discussion
In this chapter, we reviewed a number of Bayesian models that envision selective attention as selection-for-computation. From this normative viewpoint, we saw that attention for learning requires a greater amount of learning be accorded to aspects of the environment that are less well known (Dayan and Yu 2003), but attention for prediction and inference requires greater emphasis on the most precise sources of information in the environment (Yu and Dayan 2005b). We saw how both expected and unexpected uncertainties play a crucial role in these computations, and discussed their putative neural realization by the cholinergic and noradrenergic neuromodulatory systems (Yu and Dayan 2005b). We saw how cortical representations of uncertainty interact with neuromodulatory uncertainty to modulate visual cortical processing of sequentially processed sensory information (Yu and Dayan 2005a). And finally, we saw how overt attention (Huang and Yu 2010) can be understood within a similar Bayesian normative rubric as covert attention.
A great deal of work still remains in understanding attentional selection in a common theoretical framework. Clearly, more needs to be understood about how the different forms of attention discussed here interact with each other. Our understanding of the underlying neural mechanisms is still in its infancy. So far, there is a rather large disconnect between theoretical models of attentional mechanisms and neurophysiological data. Some of the neural hypotheses proposed here, such as those related to ACh, NE, and cortical neural populations, still need to be experimentally explored and verified. Finally, there are many other aspects of attention that have not yet received a Bayesian treatment, such as the large body of experimental results related to feature integration (Treisman and Gelade 1980) and the binding problem (Zeki 1978; Maunsell and Newsome 1987; Wade and Bruce 2001).
References
Ammassari-Teule, M., Maho, C., and Sara, S. J. (1991). Clonidine reverses spatial learning deficits and reinstates θ frequencies in rats with partial fornix section. Behavioural Brain Research 45: 1–8.
Anderson, B. D. and Moore, J. B. (1979). Optimal Filtering. Englewood Cliffs, N.J.: Prentice-Hall.
Aston-Jones, G., Rajkowski, J., and Kubiak, P. (1997). Conditioned responses of monkey locus coeruleus neurons anticipate acquisition of discriminative behavior in a vigilance task. Neuroscience 80(3): 697–715.
Barber, M. J., Clark, J. W., and Anderson, C. H. (2003). Neural representation of probabilistic information. Neural Computation 15(8): 1843–1864.
Battaglia, P. W., Jacobs, R. A., and Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America A: Optics, Image Science, and Vision 20(7): 1391–1397.
Baxter, M. G. and Chiba, A. A. (1999). Cognitive functions of the basal forebrain. Current Opinion in Neurobiology 9: 178–183.
Baxter, M. G., Holland, P. C., and Gallagher, M. (1997). Disruption of decrements in conditioned stimulus processing by selective removal of hippocampal cholinergic input. Journal of Neuroscience 17(13): 5230–5236.
Bayes, T. (1763). An essay toward solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society 53: 370–418.
Behrens, T. E. J., Woolrich, M. W., Walton, M. E., and Rushworth, M. F. S. (2007). Learning the value of information in an uncertain world. Nature Neuroscience 10(9): 1214–1221.
Blakely, E., Starin, S., and Poling, A. (1988). Human performance under sequences of fixed-ratio schedules: Effects of ratio size and magnitude of reinforcement. Psychological Record 38: 111–120.
Bolle, R. M. and Cooper, D. B. (1984). Bayesian recognition of local 3-D shape by approximating image intensity functions with quadric polynomials. IEEE Transactions on Pattern Analysis and Machine Intelligence 6(4): 418–429.
Bowman, E. M., Brown, V., Kertzman, C., Schwarz, U., and Robinson, D. L. (1993). Covert orienting of attention in Macaques. I: Effects of behavioral context. Journal of Neurophysiology 70(1): 431–443.
Broadbent, D. (1958). Perception and Communication. Elmsford, N.Y.: Pergamon.
Brown, E., Gilzenrat, M., and Cohen, J. D. (2004). The Locus Coeruleus, Adaptive Gain, and the Optimization of Simple Decision Tasks (Technical Report No. 04-01). Princeton, N.J.: Center for the Study of Mind, Brain, and Behavior, Princeton University.
Bucci, D. J., Holland, P. C., and Gallagher, M. (1998). Removal of cholinergic input to rat posterior parietal cortex disrupts incremental processing of conditioned stimuli. Journal of Neuroscience 18(19): 8038–8046.
Cho, R. Y., Nystrom, L. E., Brown, E. T., Jones, A. D., Braver, T. S., Holmes, P. J., and Cohen, J. D. (2002). Mechanisms underlying dependencies of performance on stimulus history in a two-alternative forced-choice task. Cognitive, Affective, & Behavioral Neuroscience 2(4): 283–299.
Clark, J. J. and Yuille, A. L. (1990). Data Fusion for Sensory Information Processing Systems. Boston, Dordrecht, and London: Kluwer Academic Press.
Coull, J. T., Frith, C. D., Dolan, R. J., Frackowiak, R. S., and Grasby, P. M. (1997). The neural correlates of the noradrenergic modulation of human attention, arousal and learning. European Journal of Neuroscience 9(3): 589–598.
Davison, M. and McCarthy, D. (1987). The Matching Law: A Research Review. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Dayan, P. and Abbott, L. F. (2001). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge, Mass.: MIT Press.
Dayan, P., Kakade, S., and Montague, P. R. (2000). Learning and selective attention. Nature Neuroscience 3: 1218–1223.
Dayan, P. and Yu, A. J. (2002). ACh, uncertainty, and cortical inference. In T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14 (pp. 189–196). Cambridge, Mass.: MIT Press.
Dayan, P. and Yu, A. J. (2003). Uncertainty and learning. IETE Journal of Research 49: 171–181.
Dayan, P. and Yu, A. J. (2006). Norepinephrine and neural interrupts. In Y. Weiss, B. Schölkopf, and J. Platt (eds.), Advances in Neural Information Processing Systems 18 (pp. 243–250). Cambridge, Mass.: MIT Press.
Dayan, P. and Zemel, R. S. (1999). Statistical models and sensory attention. In Proceedings of the 9th International Conference on Artificial Neural Networks (ICANN) (pp. 1017–1022). Edinburgh, UK.
Deutsch, J. A. and Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review 87: 272–300.
Devauges, V. and Sara, S. J. (1990). Activation of the noradrenergic system facilitates an attentional shift in the rat. Behavioural Brain Research 39(1): 19–28.
Dickinson, A. (1980). Contemporary Animal Learning Theory. Cambridge: Cambridge University Press.
Donchin, E., Ritter, W., and McCallum, W. C. (1978). Cognitive psychophysiology: The endogenous components of the ERP. In E. Callaway, P. Tueting, and S. Koslow (eds.), Event-Related Brain Potentials In Man (pp. 1–79). New York: Academic Press.
Dyon-Laurent, C., Hervé, A., and Sara, S. J. (1994). Noradrenergic hyperactivity in hippocampus after partial denervation: Pharmacological, behavioral, and electrophysiological studies. Experimental Brain Research 99: 259–266.
Dyon-Laurent, C., Romand, S., Biegon, A., and Sara, S. J. (1993). Functional reorganization of the noradrenergic system after partial fornix section: A behavioral and autoradiographic study. Experimental Brain Research 96: 203–211.
Elder, J. H. and Goldberg, R. M. (2002). Ecological statistics of gestalt laws for the perceptual organization of contours. Journal of Vision 2(4): 324–353.
Eriksen, B. A. and Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics 16: 143–149.
Eriksen, C. and St James, J. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception & Psychophysics 40(4): 225–240.
Ernst, M. O. and Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415(6870): 429–433.
Frank, M. J., O’Reilly, R. C., and Curran, T. (2008). When memory fails, intuition reigns: Midazolam enhances implicit inference in humans. Psychological Science 17: 700–707.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6: 721–741.
Gil, Z., Conners, B. W., and Amitai, Y. (1997). Differential regulation of neocortical synapses by neuromodulators and activity. Neuron 19: 679–686.
Gold, J. I. and Shadlen, M. N. (2002). Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions, and reward. Neuron 36: 299–308.
Green, D. M. and Swets, J. A. (1966). Signal Detection Theory and Psychophysics. Los Altos, Calif.: Peninsula Publishing.
Grenander, U. (1976–1981). Lectures in Pattern Theory. 3 vols.: Pattern Analysis, Pattern Synthesis, and Regular Structures. New York: Springer-Verlag.
Gu, Q. (2002). Neuromodulatory transmitter systems in the cortex and their role in cortical plasticity. Neuroscience 111: 815–835.
Hall-Johnson, E. and Poling, A. (1984). Preference in pigeons given a choice between sequences of fixed-ratio schedules: Effects of ratio values and duration of food delivery. Journal of the Experimental Analysis of Behavior 42: 127–135.
Hasselmo, M. E. and Schnell, E. (1994). Laminar selectivity of the cholinergic suppression of synaptic transmission in rat hippocampal region CA1: Computational modeling and brain slice physiology. Journal of Neuroscience 14(6): 3898–3914.
Hasselmo, M. E., Wyble, B. P., and Wallenstein, G. V. (1996). Encoding and retrieval of episodic memories: Role of cholinergic and GABAergic modulation in the hippocampus. Hippocampus 6: 693–708.
He, P. Y. and Kowler, E. (1989). The role of location probability in the programming of saccades: Implications for ‘center-of-gravity’ tendencies. Vision Research 29(9): 1165–1181.
Helmholtz, H. L. F. von (1878). The facts of perception. In R. Kahl (ed.), Selected Writings of Hermann von Helmholtz. Middletown, Conn.: Wesleyan University Press, 1971 (translated from German original Die Tatsachen in der Wahrnehmung).
Herrnstein, R. J. (1961). Relative and absolute strength of responses as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behaviour 4: 267–272.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behaviour 13: 243–266.
Hoffman, J. E. and Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics 57(6): 787–795.
Hsieh, C. Y., Cruikshank, S. J., and Metherate, R. (2000). Differential modulation of auditory thalamocortical and intracortical synaptic transmission by cholinergic agonist. Brain Research 800(1–2): 51–64.
Huang, H. and Yu, A. J. (2010). Statistical learning across trials and reward-driven decision-making within trials in an active visual search task: Comparison of human behavioral data to Bayes-optimal sensory processing and saccade planning. Paper presented at the Society for Neural Science Annual Meeting, 15 November.
Jacobs, R. A. (1999). Optimal integration of texture and motion cues in depth. Vision Research 39: 3621–3629.
Jodo, E., Chiang, C., and Aston-Jones, G. (1998). Potent excitatory influence of prefrontal cortex activity on noradrenergic locus coeruleus neurons. Neuroscience 83(1): 63–79.
Jones, D. N. and Higgins, G. A. (1995). Effect of scopolamine on visual attention in rats. Psychopharmacology 120(2): 142–149.
Kastner, S. and Ungerleider, L. G. (2000). Mechanisms of visual attention in the human cortex. Annual Review of Neuroscience 23: 315–341.
Kimura, F., Fukuada, M., and Tusomoto, T. (1999). Acetylcholine suppresses the spread of excitation in the visual cortex revealed by optical recording: Possible differential effect depending on the source of input. European Journal of Neuroscience 11: 3597–3609.
Knill, D. C. and Richards, W. (eds.) (1996). Perception as Bayesian Inference. Cambridge: Cambridge University Press.
Kobayashi, M. (2000). Selective suppression of horizontal propagation in rat visual cortex by norepinephrine. European Journal of Neuroscience 12(1): 264–272.
Körding, K. P., Beierholm, U., Ma, W., Quartz, S., Tenenbaum, J., and Shams, L. (2007a). Causal inference in cue combination. PLoS One 2(9): e943.
Körding, K. P., Tenenbaum, J. B., and Shadmehr, R. (2007b). The dynamics of memory as a consequence of optimal adaptation to a changing body. Nature Neuroscience 10(6): 779–786.
Körding, K. P. and Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature 427: 244–247.
LaBerge, D. (1983). Spatial extent of attention to letters and words. Journal of Experimental Psychology: Human Perception and Performance 9(3): 371–379.
Laming, D. R. J. (1968). Information Theory of Choice-Reaction Times. London: Academic Press.
Lee, M. D., Zhang, S., Munro, M., and Steyvers, M. (2011). Psychological models of human and optimal performance in bandit problems. Cognitive Systems Research 12: 164–174.
McAdams, C. J. and Maunsell, J. H. (2000). Attention to both space and feature modulates neuronal responses in macaque area V4. Journal of Neurophysiology 83(3): 1751–1755.
McAdams, C. J. and Maunsell, J. H. R. (1999). Effects of attention on orientation-tuning functions of single neurons in macaque cortical area V4. Journal of Neuroscience 19: 431–441.
McGaughy, J., Ross, R. S., and Eichenbaum, H. (2008). Noradrenergic, but not cholinergic, deafferentation of prefrontal cortex impairs attentional set-shifting. Neuroscience 153: 63–71.
Mackintosh, N. J. (1983). Conditioning and Associative Learning. Oxford: Oxford University Press.
Marroquin, J. L., Mitter, S., and Poggio, T. (1987). Probabilistic solution of ill-posed problems in computational vision. Journal of the American Statistical Association 82(397): 76–89.
Maunsell, J. H. and Newsome, W. T. (1987). Visual processing in monkey extrastriate cortex. Annual Review of Neuroscience 10: 363–401.
Miller, E. K. and Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience 24: 167–202.
Missonnier, P., Ragot, R., Derouesné, C., Guez, D., and Renault, B. (1999). Automatic attentional shifts induced by a noradrenergic drug in Alzheimer’s disease: Evidence from evoked potentials. International Journal of Psychophysiology 33: 243–251.
Moore, T. and Fallah, M. (2001). Control of eye movements and spatial attention. Proceedings of the National Academy of Sciences USA 98(3): 1273–1276.
Moran, J. and Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science 229: 782–784.
Norman, D. A. (1968). Toward a theory of memory and attention. Psychological Review 75(6): 522–536.
Nowak, M. and Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature 364: 56–58.
O’Connor, D. H., Fukui, M. M., Pinsk, M. A., and Kastner, S. (2002). Attention modulates responses in the human lateral geniculate nucleus. Nature Neuroscience 15(1): 31–45.
O’Neill, J., Siembieda, D. W., Crawford, K. C., Halgren, E., Fisher, A., and Fitten, L. J. (2003). Reduction in distractibility with AF102B and THA in the macaque. Pharmacology, Biochemistry, and Behavior 76(2): 301–306.
Oswal, A., Ogden, M., and Carpenter, R. H. S. (2007). The time course of stimulus expectation in a saccadic decision task. Journal of Neurophysiology 97: 2722–2730.
Otto, A. R., Taylor, E. G., and Markman, A. B. (2011). There are at least two kinds of probability matching: Evidence from a secondary task. Cognition 118: 274–279.
Pearce, J. M. and Hall, G. (1980). A model for Pavlovian learning: Variation in the effectiveness of conditioned but not unconditioned stimuli. Psychological Review 87: 532–552.
Phillips, J. M., McAlonan, K., Robb, W. G. K., and Brown, V. (2000). Cholinergic neurotransmission influences covert orientation of visuospatial attention in the rat. Psychopharmacology 150: 112–116.
Pineda, J. A., Westerfield, M., Kronenberg, B. M., and Kubrin, J. (1997). Human and monkey P3-like responses in a mixed modality paradigm: Effects of context and context-dependent noradrenergic influences. International Journal of Psychophysiology 27: 223–240.
Poling, A., Edwards, T., Weeden, M., and Foster, T. M. (2011). The matching law. Psychological Record 61(2): 313–322.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology 32: 3–25.
Posner, M. I. and Petersen, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience 13: 25–42.
Prendergast, M. A., Jackson, W. J., Terry, A. V. J., Decker, M. W., Arneric, S. P., and Buccafusco, J. J. (1998). Central nicotinic receptor agonists ABT-418, ABT-089, and (–)-nicotine reduce distractibility in adult monkeys. Psychopharmacology 136(1): 50–58.
Rajkowski, J., Kubiak, P., and Aston-Jones, P. (1994). Locus coeruleus activity in monkey: Phasic and tonic changes are associated with altered vigilance. Synapse 4: 162–164.
Randall, C. K. and Zentall, T. R. (1997). Win-stay/lose-shift and win-shift/lose-stay learning by pigeons in the absence of overt response mediation. Behavioural Processes 41(3): 227–236.
Rao, R. P. (2004). Bayesian computation in recurrent neural circuits. Neural Computation 16: 1–38.
Rapoport, A. and Chammah, A. M. (1965). Prisoner’s Dilemma. Ann Arbor, Mich.: University of Michigan Press.
Ratcliff, R. (2001). Putting noise into neurophysiological models of simple decision making. Nature Neuroscience 4: 336–337.
Reynolds, J. H. and Chelazzi, L. (2004). Attentional modulation of visual processing. Annual Review of Neuroscience 27: 611–647.
Rizzolatti, G., Riggio, L., Dascola, I., and Umiltá, C. (1987). Reorienting attention across the horizontal and vertical meridians: evidence in favor of a premotor theory of attention. Neuropsychologia 25: 31–40.
Robbins, T. W. and Everitt, B. J. (1995). Arousal systems and attention. In M. S. Gazzaniga (ed.), The Cognitive Neurosciences (pp. 703–720). Cambridge, Mass.: MIT Press.
Roesch, M. R. and Olson, C. R. (2003). Impact of expected reward on neuronal activity in prefrontal cortex, frontal and supplementary eye fields and premotor cortex. Journal of Neurophysiology 90(3): 1766–1789.
Roitman, J. D. and Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience 22(21): 9475–9489.
Sahani, M. and Dayan, P. (2003). Doubly distributional population codes: simultaneous representation of uncertainty and multiplicity. Neural Computation 15: 2255–2279.
Sara, S. J. (1989). Noradrenergic-cholinergic interaction: Its possible role in memory dysfunction associated with senile dementia. Archives of Gerontology and Geriatrics, Supplement 1: 99–108.
Sara, S. J. (1998). Learning by neurons: Role of attention, reinforcement and behavior. Comptes Rendus de l’Academie des Sciences Série III: Sciences de la Vie/Life Sciences 321: 193–198.
Sara, S. J., Dyon-Laurent, C., Guibert, B., and Leviel, V. (1992). Noradrenergic hyperactivity after fornix section: Role in cholinergic dependent memory performance. Experimental Brain Research 89: 125–132.
Sara, S. J. and Hervé-Minvielle, A. (1995). Inhibitory influence of frontal cortex on locus coeruleus neurons. Proceedings of the National Academy of Sciences USA 92: 6032–6036.
Sara, S. J. and Segal, M. (1991). Plasticity of sensory responses of LC neurons in the behaving rat: Implications for cognition. Progress in Brain Research 88: 571–585.
Sara, S. J., Vankov, A., and Hervé, A. (1994). Locus coeruleus-evoked responses in behaving rats: A clue to the role of noradrenaline in memory. Brain Research Bulletin 35: 457–465.
Sarter, M. and Bruno, J. P. (1997). Cognitive functions of cortical acetylcholine: Toward a unifying hypothesis. Brain Research Reviews 23: 28–46.
Scheibehenne, B., Wilke, A., and Todd, P. M. (2011). Expectations of clumpy resources influence predictions of sequential events. Evolution and Human Behavior 32(5): 326–333.
Shams, L., Ma, W. J., and Beierholm, U. (2005). Sound-induced flash illusion as an optimal percept. NeuroReport 16(17): 1923–1927.
Shenoy, P. and Yu, A. J. (2012). Stimulus expectancy, norepinephrine, and inhibitory control (Under review).
Smith, P. L. and Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences 27(3): 161–168.
Soetens, E., Boer, L. C., and Hueting, J. E. (1985). Expectancy or automatic facilitation? Separating sequential effects in two-choice reaction time. Journal of Experimental Psychology: Human Perception and Performance 11: 598–616.
Steyvers, M., Lee, M. D., and Wagenmakers, E. J. (2009). A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology 53: 168–179.
Stigchel, S. Van der and Theeuwes, J. (2007). The relationship between covert and overt attention in endogenous cuing. Perception & Psychophysics 69(5): 719–731.Find this resource:
Sugrue, L. P., Corrado, G. S., and Newsome, W. T. (2004). Matching behavior and the representation of value in the parietal cortex. Science 304(5678): 1782–1787.Find this resource:
Sutton, R. S. (1992). Gain adaptation beats least squares? In Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems (pp. 161–166).Find this resource:
Szeliski, R. M. (1989). Bayesian Modeling of Uncertainty in Low-Level Vision. Norwell, Mass.: Kluwer Academic Press.Find this resource:
Terry, A. V. J., Risbrough, V. B., Buccafusco, J. J., and Menzaghi, F. (2002). Effects of (+/−)-4-[[2-(1-methyl-2-pyrrolidinyl)ethyl]thio]phenol hydrochloride (SIB-1553A), a selective ligand for nicotinic acetylcholine receptors, in tests of visual attention and distractibility in rats and monkeys. Journal of Pharmacology and Experimental Therapeutics 301(1): 384–392.Find this resource:
Treisman, A. (1969). Strategies and models of selective attention. Psychological Review 76: 282–299.Find this resource:
Treisman, A. G. and Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology 12(1): 97–136.Find this resource:
Turetsky, B. I. and Fein, G. (2002). α2-noradrenergic effects on ERP and behavioral indices of auditory information processing. Psychophysiology 39: 147–157.Find this resource:
Vankov, A., Hervé-Minvielle, A., and Sara, S. J. (1995). Response to novelty and its rapid habituation in locus coeruleus neurons of freely exploring rat. European Journal of Neuroscience 109: 903–911.Find this resource:
Verleger, R., Jaskowski, P., and Wauschkuhn, B. (1994). Suspense and surprise: On the relationship between expectancies and P3. Psychophysiology 31(4): 359–369.Find this resource:
Vul, E., Goodman, N. D., Griffiths, T. L., and Tenenbaum, J. B. (2009). One and done? Optimal decisions from very few samples. In Proceedings of the 31st Annual Conference of the Cognitive Science Society. Amsterdam, Netherlands.Find this resource:
Wade, N. J. and Bruce, V. (2001). Surveying the seen: 100 years of British vision. British Journal of Psychology 92: 79–112.Find this resource:
Wald, A. (1947). Sequential Analysis. New York: John Wiley and Sons.Find this resource:
Wald, A. and Wolfowitz, J. (1948). Optimal character of the sequential probability ratio test. Annals of Mathematical Statistics 19: 326–339.Find this resource:
Warren, J. M. (1966). Reversal learning and the formation of learning sets by cats and rhesus monkeys. Journal of Comparative and Physiological Psychology 61(3): 421–428.Find this resource:
Weiss, Y. and Fleet, D. J. (2002). Velocity likelihoods in biological and machine vision. In R. P. N. Rao, B. A. Olshausen, and M. S. Lewicki (eds.), Probabilistic Models of the Brain: Perception and Neural Function (pp. 77–96). Cambridge, Mass.: MIT Press.Find this resource:
Witte, E. A. and Marrocco, R. T. (1997). Alteration of brain noradrenergic activity in rhesus monkeys affects the alerting component of covert orienting. Psychopharmacology 132: 315–323.
Worthy, D. A. and Maddox, W. T. (2012). Age-based differences in strategy use in choice tasks. Frontiers in Neuroscience. doi: 10.3389/fnins.2011.00145.
Yantis, S. and Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance 10(5): 601–621.
Yarbus, A. L. (1967). Eye Movements and Vision (B. Haigh, Trans.). New York: Plenum Press.
Yu, A. J., Bentley, P., Seymour, B., Driver, J., Dolan, R., and Dayan, P. (2004). Expected and unexpected uncertainties control allocation of attention in a novel attentional learning task. Society for Neuroscience Abstracts 30: 176.17.
Yu, A. J. and Cohen, J. D. (2009). Sequential effects: Superstition or rational behavior? Advances in Neural Information Processing Systems 21: 1873–1880.
Yu, A. J. and Dayan, P. (2002). Acetylcholine in cortical inference. Neural Networks 15(4/5/6): 719–730.
Yu, A. J. and Dayan, P. (2003). Expected and unexpected uncertainty: ACh and NE in the neocortex. In S. Becker, S. Thrun, and K. Obermayer (eds.), Advances in Neural Information Processing Systems 15 (pp. 157–164). Cambridge, Mass.: MIT Press.
Yu, A. J. and Dayan, P. (2005a). Inference, attention, and decision in a Bayesian neural architecture. In L. K. Saul, Y. Weiss, and L. Bottou (eds.), Advances in Neural Information Processing Systems 17 (pp. 1577–1584). Cambridge, Mass.: MIT Press.
Yu, A. J. and Dayan, P. (2005b). Uncertainty, neuromodulation, and attention. Neuron 46: 681–692.
Yu, A. J., Dayan, P., and Cohen, J. D. (2009). Dynamics of attentional selection under conflict: Toward a rational Bayesian account. Journal of Experimental Psychology: Human Perception and Performance 35: 700–717.
Zaborszky, L., Gaykema, R. P., Swanson, D. J., and Cullinan, W. E. (1997). Cortical input to the basal forebrain. Neuroscience 79(4): 1051–1078.
Zeki, S. M. (1978). Functional specialization in the visual cortex of the rhesus monkey. Nature 274: 423–428.
Zemel, R. S., Dayan, P., and Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation 10: 403–430.