# Quantum Models of Cognition and Decision

## Abstract and Keywords

*Quantum probability theory* provides a new formalism for constructing probabilistic and dynamic systems of cognition and decision. The purpose of this chapter is to introduce psychologists to this fascinating theory. This chapter is organized into six sections. First, some of the basic psychological principles supporting a quantum approach to cognition and decision are summarized; second, some notations and definitions needed to understand quantum probability theory are presented; third, a comparison of quantum and classical probability theories is presented; fourth, quantum probability theory is used to account for some paradoxical findings in the field of human probability judgments; fifth, a comparison of quantum and Markov dynamic theories is presented; and finally, a quantum dynamic model is used to account for some puzzling findings of decision-making research. The chapter concludes with a summary of advantages and disadvantages of a quantum probability theoretical framework for modeling cognition and decision.

Keywords: quantum probability, classical probability, Hilbert space, the law of total probability, conjunction fallacy, disjunction fallacy, dynamic models of decision, Markov models, quantum interference, two-stage gambles, dynamic consistency

# Reasons for a Quantum Approach to Cognition and Decision

This chapter is not about quantum physics per se.^{1} Instead, it explores the application of probabilistic dynamic systems derived from quantum theory to a new domain – cognition and decision making behavior. Applications of quantum theory have appeared in judgment (Aerts & Aerts, 1994; Busemeyer, Pothos, Franco, & Trueblood, 2011; Franco, 2009; Pothos, Busemeyer, & Trueblood, 2013; Wang & Busemeyer, 2013), decision making (Bordley & Kadane, 1999; Busemeyer, Wang, & Townsend, 2006; Khrennikov & Haven, 2009; Lambert-Mogiliansky, Zamir, & Zwirn, 2009; La Mura, 2009; Pothos & Busemeyer, 2009; Trueblood & Busemeyer, 2011; Yukalov & Sornette, 2011), conceptual combinations (Aerts, 2009; Aerts & Gabora, 2005; Blutner, 2009), memory (Brainerd, Wang, & Reyna, 2013; Bruza et al., 2009), and perception (Atmanspacher, Filk, & Romer, 2004; Conte et al., 2009). Several review articles (Pothos & Busemeyer, 2013; Wang, Busemeyer, Atmanspacher, & Pothos, 2013) and books (Busemeyer & Bruza, 2012; Ivancevic & Ivancevic, 2010; Khrennikov, 2010) provide a summary of this new program of research. Before presenting the formal ideas, let us first examine why quantum theory should be applicable to human cognition and decision behavior.

## Judgments Are Based upon Indefinite and Uncertain Cognitive States

Models commonly used in psychology assume the cognitive system changes from moment to moment, but at any specific moment it is in a definite state with respect to some judgment to be made. For example, suppose a juror has just heard conflicting evidence from the prosecutor and the defense and the juror has to consider two mutually exclusive and exhaustive hypotheses: guilty or not guilty. A Bayesian model would assign a probability distribution over the two hypotheses: a probability $p(G|\text{evidence})$ is assigned to guilt and a probability $1-p(G|\text{evidence})$ is assigned to not guilty. Therefore, the juror’s subjective probability with respect to the question of guilty or not boils down to a state represented by a point lying somewhere between zero and one on the probability scale at each moment. This probability may change from moment to moment to produce a definite trajectory of probability for guilt across time. However, at each moment, this subjective probability either favors guilt, $p(G|\text{evidence}) > .50$, or it favors not guilty, $p(G|\text{evidence}) < .50$, or it is exactly at $p(G|\text{evidence}) = .50$. At a single moment, the juror cannot be both favoring guilt, $p(G|\text{evidence}) > .50$, and at the same time favoring not guilty, $p(G|\text{evidence}) < .50$.

In contrast, quantum theory assumes that during deliberation the juror is in an *indefinite (superposition)* state at each moment. While in an indefinite state, the juror does not necessarily favor guilty and at the same time the juror does not necessarily favor not guilty. Instead, the juror is in a superposition state that leaves the juror conflicted, ambiguous, confused, or uncertain about the guilty status. The potential for saying guilty may be greater than the potential for saying not guilty at one moment, and these potentials may change from one moment to the next, but either hypothesis could potentially be chosen at *each* moment. In quantum theory, there is *no* single trajectory or sample path across time before making a decision. When asked to make a decision, the juror would be forced to commit to either guilty or not guilty.

## Judgments Create Rather Than Record a Cognitive State

Models commonly used in psychology assume that what we record at a particular moment reflects the state of the cognitive system as it existed immediately before we inquired about it. For example, if a person watches a scene of an exciting car chase and is asked “Are you afraid?” then the answer “Yes. I am afraid” reflects the person’s cognitive state with respect to that question just before we asked it.

In contrast, quantum theory assumes that taking a measurement of a system creates rather than records a property of the system (Wang, Busemeyer, Atmanspacher, & Pothos, 2013). For example, the person may be ambiguous about his or her feelings after watching the scene, but the answer “Yes. I am afraid” is constructed from the interaction of this indefinite state and the question, which results in a now definitely “afraid” state. This is, in fact, the basis for modern psychological theories of emotion (Schachter & Singer, 1962). Decision scientists also have shown evidence that beliefs and preferences are constructed online rather than simply being read straight out of memory (Payne, Bettman, & Johnson, 1992), and expressing choices and opinions can change preferences (Sharot, Velasquez, & Dolan, 2010).

## Judgments Disturb Each Other Producing Order Effects

According to quantum theory, the answer to a question can change a state from an indefinite to a definite state, and this change causes one to respond differently to subsequent questions. Intuitively, the answer to the first question sets up a context that changes the answer to the next question, and this produces order effects of the measurements. Order effects make it impossible to define a joint probability of answers to questions A and B (unless one conditionalizes the conjunction with an order parameter), and instead it is necessary to assign a probability to the sequence of answers to question A followed by question B. In quantum theory, if A and B are two measurements, and the probabilities of the outcomes depend on the order of the measurements, then the two measurements are *noncommutative.* Many of the mathematical properties of quantum theory, such as Heisenberg’s famous uncertainty principle (Heisenberg, 1958), arise from developing a probabilistic model for noncommutative measurements. Question order effects are a major concern for attitude researchers, who struggle for a theoretical understanding of these effects similar to that achieved in quantum theory (Feldman & Lynch, 1988). Of course, quantum theory is not the only theory to explain order effects. Markov models, for example, also can produce order effects. Quantum theory, however, provides a more natural, elegant, and built-in set of principles (as opposed to *ad hoc* assumptions) for explaining order effects (Wang, Solloway, Shiffrin, & Busemeyer, 2014).

## Judgments Do Not Always Obey Classical Logic

Probabilistic models commonly used in psychology are based on the Kolmogorov axioms (1933/1950), which define events as sets that obey the axioms of set theory and Boolean logic. One important axiom is the distributive axiom: If {*G*, *T*, *F*} are events then $G\cap (T\cup F)=(G\cap T)\cup (G\cap F)$. Consider, for example, the concept that a boy is good (*G*), and the pair of concepts that the boy told the truth (*T*) versus the boy did not tell the truth (*F*). According to classical Boolean logic, the event *G* can only occur in one of two mutually exclusive ways: either $(G\cap T)$ occurs or $(G\cap F)$ occurs. From this distributive axiom, one can derive the law of total probability, $p(G)=p(T)p(G|T)+p(F)p(G|F)$.

Quantum probability theory is derived from the von Neumann axioms (1932/1955), which define events as subspaces that obey different axioms from those of set theory. In particular, the distributive axiom does not always hold (Hughes, 1989). For example, according to quantum logic, when you try to decide whether a boy is good without knowing if he is truthful or not, you are *not* forced to have only two thoughts: he is good and he is truthful, or he is good and he is not truthful. You can remain ambiguous or indeterminate over the truthful or not truthful attributes, which can be represented by a superposition state. The fact that quantum logic does not always obey the distributive axiom implies that the quantum model does not always obey the law of total probability (Khrennikov, 2010).

# Preliminary Concepts, Definitions, and Notations

Quantum theory is based on geometry and linear algebra defined on a Hilbert space. (Hilbert spaces are complex vector spaces with certain convergence properties.) Paul Dirac developed an elegant notation for expressing the abstract elements of the theory, which is used in this chapter. This chapter is restricted to finite spaces for simplicity, but note that the theory is also applicable to infinite dimensional spaces. In fact, to keep examples simple, this section introduces the ideas using only a three-dimensional space so that they can be presented visually. Figure 17.1 shows a particular vector labeled S that lies within a three-dimensional space spanned by three basis vectors labeled A, B, and C. For example, a simple attitude model could interpret S as the state of opinion of a person with regard to the beauty of an artwork using three mutually exclusive evaluations “good,” “mediocre,” or “bad,” which are represented by the basis vectors A, B, and C, respectively.

A finite Hilbert space is an *N*-dimensional vector space defined on a field of complex numbers and endowed with an inner product.^{2} The space has a *basis,* which is a set of *N* orthonormal basis vectors $\chi =\{|X_{1}\rangle ,\dots ,|X_{N}\rangle \}$ that span the space. The symbol $|X\rangle$ represents an arbitrary vector in an *N*-dimensional vector space, which is called a “ket.” This vector can be expressed by its coordinates with respect to the basis *χ* as follows

$$|X\rangle = \sum_{i=1}^{N} x_{i}\cdot |X_{i}\rangle$$

The coordinates, $x_{i}$, are complex numbers. The *N* coordinates representing the ket $|X\rangle$ with respect to a basis *χ* form an $N\times 1$ column matrix

$$X = \begin{bmatrix} x_{1} \\ \vdots \\ x_{N} \end{bmatrix}$$

Referring to Figure 17.1, the coordinates for the specific vector $|S\rangle$ with respect to the $\{|A\rangle , |B\rangle , |C\rangle \}$ basis equal

$$S = \begin{bmatrix} 0.696 \\ 0.696 \\ 0.1765 \end{bmatrix}$$

Referring back to our simple attitude model, the coordinates of *S* represent the potentials for each of the opinions.

The symbol $\langle X|$ represents a linear functional in an *N*-dimensional (dual) vector space, which is called a “bra.” Each ket $|X\rangle$ has a corresponding bra $\langle X|$. The conjugate transpose operation, $(|X\rangle)^{\dagger}=\langle X|$, changes a ket into a bra. The *N* coordinates representing the bra $\langle X|$ with respect to a basis *χ* form a $1\times N$ row matrix

$$X^{\dagger} = \begin{bmatrix} x_{1}^{*} & \cdots & x_{N}^{*} \end{bmatrix}$$

The * symbol indicates complex conjugation. For example, the bra $\langle S|$ corresponding to the ket $|S\rangle$ has the matrix representation given below as

$$S^{\dagger} = \begin{bmatrix} 0.696 & 0.696 & 0.1765 \end{bmatrix}$$

(Here the numbers in the example are real, and so conjugation has no effect.)

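Hilbert spaces are endowed with an inner product. Psychologically, the inner product is a measure of similarity between two vectors. The inner product is a scalar formed by applying the bra to the ket to form a bra-ket

$$\langle Y|X\rangle = Y^{\dagger}\cdot X = \sum_{i=1}^{N} y_{i}^{*}\cdot x_{i}$$

For example

$$\langle S|S\rangle = 0.696^{2} + 0.696^{2} + 0.1765^{2} \approx 1$$

This shows that the ket $|S\rangle$ is unit length.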

The outer product, denoted by $|X\rangle \langle Y|$, is a linear operator, which is used to make transitions from one state to another. In particular, assuming that the kets are unit length, the outer product $|X\rangle \langle Y|$ maps the ket $|Y\rangle$ to the ket $|X\rangle$ as follows: $(|X\rangle \langle Y|)\cdot |Y\rangle =|X\rangle \langle Y|Y\rangle =|X\rangle$. Assuming $|X\rangle$ is unit length, the outer product $|X\rangle \langle X|$ projects $|X\rangle$ to itself, $|X\rangle \langle X|\cdot |X\rangle =|X\rangle \langle X|X\rangle =|X\rangle \cdot 1$, and $|X\rangle \langle X|$ projects any other ket $|Y\rangle$ onto the ray spanned by $|X\rangle$ as follows: $|X\rangle \langle X|Y\rangle =\langle X|Y\rangle \cdot |X\rangle$. For these reasons, $|X\rangle \langle X|$ is called the projector for the ray spanned by $|X\rangle$, which is also symbolized as $M_{X}=|X\rangle \langle X|$. Projectors correspond to subspaces that represent events in quantum theory. They are Hermitian ($M^{\dagger}=M$) and idempotent ($M\cdot M=M$). Referring to Figure 17.1, the coordinates for the basis vector $|A\rangle$ (with respect to the $\{|A\rangle , |B\rangle , |C\rangle \}$ basis) simply equal

$$A = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

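The projector $M_{A}=A\cdot A^{\dagger}$ corresponds to the subspace representing the event A. In our simple attitude model, $M_{A}$ would be used to represent the event that the person decides the artwork to be “good” (which corresponds to event A). The matrix representation of the projection of the ket $|S\rangle$ onto the ray spanned by the basis vector $|A\rangle$ then equals

$$M_{A}\cdot S = A\cdot A^{\dagger}\cdot S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0.696 \\ 0.696 \\ 0.1765 \end{bmatrix} = \begin{bmatrix} 0.696 \\ 0 \\ 0 \end{bmatrix}$$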

In our simple attitude model, this projection is used to determine the probability that the person decides the artwork is “good.” According to quantum theory, the squared length of this projection, $0.696^{2}=0.4844$, equals the probability that the person will decide that the artwork is “good.” Similarly, the coordinates for the basis vector $|B\rangle$ (with respect to the $\{|A\rangle , |B\rangle , |C\rangle \}$ basis) simply equal

$$B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

So the matrix representation (with respect to the $\{|A\rangle , |B\rangle , |C\rangle \}$ basis) for the projector for $|B\rangle$ equals

$$M_{B} = B\cdot B^{\dagger} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

In our simple attitude model, this projector is used to represent the event that the person decides the artwork to be “mediocre” (which corresponds to event B). In addition, the horizontal plane shown in Figure 17.1 is spanned by the $\{|A\rangle , |B\rangle \}$ basis vectors, and the projector that projects vectors onto this plane equals $M_{A}+M_{B}=|A\rangle \langle A|+|B\rangle \langle B|$, which has the matrix representation

$$M_{A}+M_{B} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

In our simple attitude example, this corresponds to the event that the person thinks the artwork is “good” or “mediocre.” The vector labeled T in Figure 17.1 is the projection of the vector $|S\rangle$ onto the plane spanned by the $\{|A\rangle , |B\rangle \}$ basis vectors, which has the matrix representation

$$T = (M_{A}+M_{B})\cdot S = \begin{bmatrix} 0.696 \\ 0.696 \\ 0 \end{bmatrix}$$

The squared length of this projection, $T^{\dagger}T=2\cdot 0.696^{2}=0.969$, equals the probability that the person decides the artwork to be “good” or “mediocre.” This is also the probability that the person thinks the artwork is not “bad” $(1-0.1765^{2}=0.969)$.
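The projections described above can be reproduced numerically. The following is a minimal sketch, assuming NumPy and the coordinates of S quoted in the text; the helper name `projector` is introduced here purely for illustration.

```python
import numpy as np

# Coordinates of the state S in the {A, B, C} basis, as quoted in the text.
S = np.array([0.696, 0.696, 0.1765])

# Coordinates of the basis kets A and B in the same basis.
A = np.array([1.0, 0.0, 0.0])
B = np.array([0.0, 1.0, 0.0])

def projector(*kets):
    """Illustrative helper: sum of outer products |X><X| over orthonormal kets."""
    return sum(np.outer(k, k.conj()) for k in kets)

M_A = projector(A)        # projector onto the ray spanned by A
M_AB = projector(A, B)    # projector onto the A-B plane, M_A + M_B

norm_S = np.vdot(S, S).real            # inner product <S|S>
p_A = np.linalg.norm(M_A @ S) ** 2     # squared length of the projection onto A
T = M_AB @ S                           # projection of S onto the A-B plane
p_A_or_B = np.vdot(T, T).real          # squared length of T

print(round(norm_S, 3), round(p_A, 4), round(p_A_or_B, 3))
```

The squared lengths agree with the probabilities 0.4844 and 0.969 stated in the text, up to rounding of the coordinates.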

Referring back to our simple attitude model, suppose that instead of asking whether the artwork is beautiful, we ask what kind of moral message it conveys, and once again there are three answers such as “good,” “neutral,” or “bad.” Now the person needs to evaluate the same artwork with respect to a new point of view. In quantum theory, this new point of view is represented as a change in the basis. Figure 17.2 illustrates three new orthonormal vectors within the same three-dimensional space labeled U, V, and W in the figure. Now the basis vectors U, V, and W represent a “good,” “neutral,” or “bad” moral message, respectively. The state S now represents the person’s opinion with respect to this new moral message point of view.

With respect to the $\{|A\rangle , |B\rangle , |C\rangle \}$ basis, the coordinates for these three vectors are as follows

These three vectors form another orthonormal basis for spanning the three-dimensional space. The projector $M_{V}=|V\rangle \langle V|$ projects vectors onto the ray spanned by the basis vector $|V\rangle$ as follows: $M_{V}|X\rangle =|V\rangle \langle V|X\rangle$. Using the coordinates defined above for $|V\rangle$ we obtain,

In sum, the same vector $|S\rangle$ can be expressed by the coordinates *X* using the $\{|A\rangle , |B\rangle , |C\rangle \}$ basis or by the coordinates *Y* using the $\{|U\rangle , |V\rangle , |W\rangle \}$ basis.

Note that the event “morally good” is represented by the vector U in Figure 17.2. This vector lies along the diagonal line of the A, B plane. Here we see an interesting feature of quantum theory. If a person is definite that the piece of artwork is “morally good” (represented by the vector U), then the person must be uncertain about whether its beauty is good versus mediocre (because U has a 45 degree angle with respect to each of the A, B vectors). However, if the person is certain that the artwork is “morally good” then the person is certain that its beauty is not “bad” (because U is contained in the A, B plane).
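This geometric argument can be checked with a short sketch. It assumes NumPy and takes U to be the unit vector on the diagonal of the A, B plane, as the paragraph describes; the dictionary `p_beauty` is an illustrative name, not part of the chapter.

```python
import numpy as np

A = np.array([1.0, 0.0, 0.0])
B = np.array([0.0, 1.0, 0.0])
C = np.array([0.0, 0.0, 1.0])

# U lies on the diagonal of the A-B plane, 45 degrees from A and from B.
U = (A + B) / np.sqrt(2)

# If the state is exactly U ("definitely morally good"), the probability of
# each beauty evaluation is the squared magnitude of the inner product with U.
p_beauty = {name: abs(np.vdot(ket, U)) ** 2
            for name, ket in zip("ABC", (A, B, C))}

# Angle between U and A, in degrees.
angle_deg = np.degrees(np.arccos(np.vdot(A, U).real))

print(p_beauty, angle_deg)
```

A person in state U is split evenly between “good” and “mediocre” beauty, while the probability of “bad” is zero.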

# Quantum Compared to Classical Probabilities

This section presents the quantum probability axioms formulated by Paul Dirac (1958) and John von Neumann (1932/1955), and compares them systematically with the axioms of classical Kolmogorov probability theory (1933/1950) (see Box 1 for a summary)^{3}. For simplicity, we restrict this presentation to finite spaces in this chapter. Although the space is finite, the number of dimensions can be very large. The general theory is applicable to infinite dimensional spaces. See Chapter 2 in Busemeyer and Bruza (2012) for a more comprehensive introduction.

## Events

Classical probability postulates a sample space *χ*, which we will assume contains a finite number of points, *N* (and *N* may be very large). The set of points in the sample space is defined as $\chi =\{{X}_{1},\dots ,{X}_{N}\}$. An event *A* is a subset of this sample space $A\subseteq \chi $. If $A\subseteq \chi $ is an event and $B\subseteq \chi $ is an event, then the intersection $A\cap B$ is an event; also the union $A\cup B$ is an event.

Quantum theory postulates a Hilbert space *χ*, which we will assume has a finite dimension, *N* (and again *N* may be very large). The space is spanned by an orthonormal set of basis vectors $\chi =\{|X_{1}\rangle ,\dots ,|X_{N}\rangle \}$ that form a basis for the space.^{4} An event *A* is a subspace spanned by a subset $\chi_{A}\subseteq \chi$ of basis vectors. This event corresponds to a projector $M_{A}=\sum_{i\in A}|X_{i}\rangle \langle X_{i}|$. If *A* is an event spanned by $\chi_{A}\subseteq \chi$ and *B* is an event spanned by $\chi_{B}\subseteq \chi$, then the meet (infimum) $A\wedge B$ is an event spanned by $\chi_{A}\cap \chi_{B}$; also the join (supremum) $A\vee B$ is an event spanned by $\chi_{A}\cup \chi_{B}$.

For example, in Figure 17.1, the event A is represented by the ray spanned by the vector $|A\rangle$, and “A or B” is represented by the horizontal plane spanned by the two vectors $\{|A\rangle , |B\rangle \}$ for the quantum model.

## System State

Classical probability postulates a probability function *p* that maps points in the sample space *χ* into nonnegative real numbers which sum to unity. The empty set is mapped into zero, and the sample space is mapped into one, and all other events are mapped into the interval [0,1]. If the pair of events $\{A\subseteq \chi , B\subseteq \chi \}$ are mutually exclusive, $A\cap B=\varnothing$, then $p(A\cup B)=p(A)+p(B)$. The probability of the event “not A,” denoted $\overline{A}$, equals $p(\overline{A})=1-p(A)$.

Quantum probability postulates a unit length state vector $|X\rangle$ in the Hilbert space. The probability of an event *A* spanned by $\chi_{A}\subseteq \chi$ is defined by $q(A)=\Vert M_{A}|X\rangle \Vert^{2}$. For later work, it will be convenient to express $\Vert M|X\rangle \Vert^{2}$ as the inner product

$$\Vert M|X\rangle \Vert^{2} = \langle X|M^{\dagger}M|X\rangle = \langle X|M\cdot M|X\rangle = \langle X|M|X\rangle$$

If the pair of events {*A*, *B*}, both spanned by subsets of basis vectors from *χ*, are mutually exclusive, $\chi_{A}\cap \chi_{B}=\varnothing$, then it follows from orthogonality that $q(A\vee B)=\Vert (M_{A}+M_{B})|X\rangle \Vert^{2}=\Vert M_{A}|X\rangle \Vert^{2}+\Vert M_{B}|X\rangle \Vert^{2}=q(A)+q(B)$. The event $\overline{A}$ is the subspace that is the orthogonal complement to the subspace for the event *A*, and its probability equals $q(\overline{A})=\Vert (I-M_{A})|X\rangle \Vert^{2}=1-q(A)$.

For example, in Figure 17.1, the probability of the event A equals $\Vert M_{A}|S\rangle \Vert^{2}=\Vert A\cdot A^{\dagger}\cdot S\Vert^{2}=|.696|^{2}$, and the probability of the event “A or B” equals $\Vert (M_{A}+M_{B})|S\rangle \Vert^{2}=\Vert (A\cdot A^{\dagger}+B\cdot B^{\dagger})\cdot S\Vert^{2}=\Vert T\Vert^{2}=2\cdot |.696|^{2}$.

## State Revision

According to classical probability, if an event $A\subseteq \chi $ is observed, then a new conditional probability function is defined by the mapping $p({X}_{i}|A)=\frac{p({X}_{i}\cap A)}{p(A)}$. The normalizing factor in the denominator is used to guarantee that the probability assigned to the entire sample space remains equal to one.

According to quantum probability, if an event *A* is observed, then the new revised state is defined by $|X_{A}\rangle =\frac{M_{A}|X\rangle}{\Vert M_{A}|X\rangle \Vert}$. The normalizing factor in the denominator is used to guarantee that the revised state remains unit length. The new revised state is then used (as described earlier) to compute probabilities for events. This is called Lüders’ rule.

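For example, in Figure 17.1, if the event “A or B” is observed, then the matrix representation of the revised state equals

$$S_{AorB} = \frac{(M_{A}+M_{B})\cdot S}{\Vert (M_{A}+M_{B})\cdot S\Vert} = \frac{1}{\sqrt{0.969}}\begin{bmatrix} 0.696 \\ 0.696 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.707 \\ 0.707 \\ 0 \end{bmatrix}$$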

The probability of event A given that “A or B” was observed equals $\Vert A\cdot A^{\dagger}\cdot S_{AorB}\Vert^{2}=.50$.
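The state revision step can be sketched in a few lines. The example below assumes NumPy and the coordinates of S used earlier; the function name `revise` is hypothetical and simply implements "project, then renormalize."

```python
import numpy as np

S = np.array([0.696, 0.696, 0.1765])   # state coordinates from the text
M_A = np.diag([1.0, 0.0, 0.0])         # projector for event A
M_AB = np.diag([1.0, 1.0, 0.0])        # projector for event "A or B"

def revise(M, state):
    """Illustrative helper: project the state with M, then renormalize."""
    projected = M @ state
    return projected / np.linalg.norm(projected)

S_AorB = revise(M_AB, S)                              # revised state
p_A_given_AorB = np.linalg.norm(M_A @ S_AorB) ** 2    # q(A | A or B)

print(np.round(S_AorB, 3), round(p_A_given_AorB, 2))
```

Because the A and B coordinates of S are equal, the revised state splits its probability evenly between A and B, which gives the value .50 stated in the text.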

## Commutativity

Classical probability assumes that for any given experiment, there is only one sample space, *χ*, and all events are contained in this single sample space. Consequently, the intersection of two events and the union of two events are always well defined. A single probability function *p* is sufficient to assign probabilities to all events of the experiment. This is called the principle of unicity (Griffiths, 2003).^{5} It follows from the commutative property of sets that joint probabilities are commutative, $p(A)\cdot p(B|A)=p(A\cap B)=p(B\cap A)=p(B)\cdot p(A|B)$.

Quantum probability assumes that there is only one Hilbert space and all events are contained in this single Hilbert space. For a single fixed basis, such as $\chi =\{|X_{i}\rangle , i=1,\dots ,N\}$, the meet and the join of two events spanned by a common set of basis vectors in *χ* are always well defined, and a probability function *q* can be used to assign probabilities to all the events defined with respect to the basis *χ*. When a common basis is used to define all the events, then the events are compatible.

The beauty of a Hilbert space is that there are many choices for the basis that one can use to describe the space. For example, in Figure 17.2, a new basis using vectors $\{|U\rangle , |V\rangle , |W\rangle \}$ was introduced to represent the state $|S\rangle$, which was obtained by rotating the original basis $\{|A\rangle , |B\rangle , |C\rangle \}$ used in Figure 17.1. Suppose $\chi^{\prime}=\{|Y_{i}\rangle , i=1,\dots ,N\}$ is another orthonormal basis for the Hilbert space. If event *A* is spanned by $\chi_{A}\subset \chi$, and event *B* is spanned by $\chi^{\prime}_{B}\subset \chi^{\prime}$, then the meet for these two events is not defined, and neither is the join (Griffiths, 2003). In this case, the events are not compatible. That is, the projectors for these two events do not commute, $M_{A}M_{B}\ne M_{B}M_{A}$, and the projectors for these two events do not share a common set of eigenvectors. In this case, it is not meaningful to assign a probability simultaneously to the pair of events {*A*, *B*} (Dirac, 1958). When the events are incompatible, the principle of unicity breaks down and the events cannot all be described within a single sample space. The events spanned by *χ*, which are all compatible with each other, form one sample space; and the events spanned by *χ*′, which are compatible with each other, form another sample space; but the events from *χ* are not compatible with the events from *χ*′. In this case, there are two stochastically unrelated sample spaces (Dzhafarov & Kujala, 2012), and quantum theory provides a single state $|S\rangle$ that can be used to assign probabilities to both sample spaces.

For noncommutative events, probabilities are assigned to histories or sequences of events using Lüders’ rule. Suppose *A* is an event spanned by $\chi_{A}\subseteq \chi$, and event *B* is spanned by $\chi^{\prime}_{B}\subseteq \chi^{\prime}$. Consider the probability for the sequence of events: *A* followed by *B*. The probability of the first event *A* equals $q(A)=\Vert M_{A}|X\rangle \Vert^{2}$; the revised state, conditioned on observing this event, equals $|X_{A}\rangle =\frac{M_{A}|X\rangle}{\Vert M_{A}|X\rangle \Vert}$; the probability of the second event, conditioned on the first event, equals $q(B|A)=\Vert M_{B}|X_{A}\rangle \Vert^{2}$; therefore, the probability of event *A* followed by event *B* equals

$$q(A)\cdot q(B|A) = \Vert M_{A}|X\rangle \Vert^{2}\cdot \Vert M_{B}|X_{A}\rangle \Vert^{2} = \Vert M_{A}|X\rangle \Vert^{2}\cdot \frac{\Vert M_{B}M_{A}|X\rangle \Vert^{2}}{\Vert M_{A}|X\rangle \Vert^{2}} = \Vert M_{B}M_{A}|X\rangle \Vert^{2}$$

If the projectors do not commute, then order matters because, in general,

$$\Vert M_{B}M_{A}|X\rangle \Vert^{2} \ne \Vert M_{A}M_{B}|X\rangle \Vert^{2}$$

For example, referring to Figures 17.1 and 17.2, the probability of event “A or B” and then event V equals

The probability of the event V and then the event “A or B” equals

If all events are compatible, then quantum probability theory reduces to classical probability theory. In this sense, quantum probability is a generalization of classical probability theory (Gudder, 1988).
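Order dependence for incompatible events can be illustrated with a small sketch. It assumes NumPy and the state S from Figure 17.1; the ket `R` is a hypothetical vector tilted out of the A, B plane, chosen only for illustration, and it is not one of the chapter's U, V, W vectors.

```python
import numpy as np

S = np.array([0.696, 0.696, 0.1765])   # state coordinates from the text
M_AB = np.diag([1.0, 1.0, 0.0])        # projector for event "A or B"

# Hypothetical ket (an assumption, not the chapter's V): tilted out of the A-B plane.
R = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
M_R = np.outer(R, R.conj())

def sequence_probability(first, second, state):
    """Probability of `first` then `second`: squared length of second * first * state."""
    return np.linalg.norm(second @ first @ state) ** 2

p_AB_then_R = sequence_probability(M_AB, M_R, S)
p_R_then_AB = sequence_probability(M_R, M_AB, S)

# The two projectors do not commute, so the two orders can differ.
commute = np.allclose(M_AB @ M_R, M_R @ M_AB)

print(commute, round(p_AB_then_R, 4), round(p_R_then_AB, 4))
```

With this hypothetical pair of events, the two orders give different sequence probabilities, which is the signature of noncommuting projectors.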

## Violations of the Law of Total Probability

The quantum axioms do not necessarily have to obey the classical law of total probability in the following manner. Consider an experiment with two different conditions. The first condition simply measures whether event *B* occurs. The second condition first measures whether *A* occurs, and then measures whether *B* occurs. For both conditions, we compute the probability of the event *B*. For the first condition, this is simply $q(B)=\Vert M_{B}|X\rangle \Vert^{2}$. For the second condition, we could observe the sequence with event *A* and then event *B* with probability $q(A)\cdot q(B|A)=\Vert M_{B}M_{A}|X\rangle \Vert^{2}$, or we could observe the sequence with event $\overline{A}$ and then event *B* with probability $q(\overline{A})\cdot q(B|\overline{A})=\Vert M_{B}M_{\overline{A}}|X\rangle \Vert^{2}$, and so the *total* probability for event *B* in the second condition equals the sum of these two ways: $q_{T}(B)=\Vert M_{B}M_{A}|X\rangle \Vert^{2}+\Vert M_{B}M_{\overline{A}}|X\rangle \Vert^{2}$. The *interference* produced in this experiment is defined as the probability of event *B* observed in the first condition minus the total probability of event *B* observed in the second condition. According to quantum probability theory, the interference equals $Int=q(B)-q_{T}(B)$. To analyze this more closely, let us decompose the probability from the first condition as follows:

$$\begin{aligned} q(B) &= \Vert M_{B}|X\rangle \Vert^{2} = \Vert M_{B}(M_{A}+M_{\overline{A}})|X\rangle \Vert^{2} \\ &= \Vert M_{B}M_{A}|X\rangle + M_{B}M_{\overline{A}}|X\rangle \Vert^{2} \\ &= \Vert M_{B}M_{A}|X\rangle \Vert^{2} + \Vert M_{B}M_{\overline{A}}|X\rangle \Vert^{2} + Int, \\ Int &= \langle X|M_{A}M_{B}M_{\overline{A}}|X\rangle + \langle X|M_{\overline{A}}M_{B}M_{A}|X\rangle = 2\cdot \mathrm{Re}\left[\langle X|M_{A}M_{B}M_{\overline{A}}|X\rangle \right]. \end{aligned}$$

An interference cross-product term, denoted *Int*, appears in the probability $q(B)$ from the first condition, which produces deviations from the total probability $q_{T}(B)$ computed from the second condition. This interference term can be positive (i.e., constructive interference) or negative (i.e., destructive interference) or zero (i.e., no interference). If the two projectors commute so that $M_{A}M_{B}=M_{B}M_{A}$, then the interference is zero. There is also an interference term associated with the complementary probability

$$q(\overline{B}) = \Vert M_{\overline{B}}M_{A}|X\rangle \Vert^{2} + \Vert M_{\overline{B}}M_{\overline{A}}|X\rangle \Vert^{2} + Int_{\overline{B}}$$

The interference term associated with $q(\overline{B})$ must be the negative of the interference term associated with $q(B)$ because we must finally obtain

$$q(B)+q(\overline{B}) = q_{T}(B)+q_{T}(\overline{B}) = 1$$

A skeptic might argue that the preceding rules for assigning probabilities to events defined as subspaces are *ad hoc*, and maybe there are many other rules that one could use. In fact, a famous theorem by Gleason (1957) proves that these are the only rules one can use to assign probabilities to events defined as subspaces using an additive measure (at least for vector spaces of dimension greater than 2).
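The interference term and the resulting violation of the law of total probability can be seen in a minimal two-dimensional sketch. It assumes NumPy; the state `X` and the ray `b` for event B are hypothetical values chosen for illustration and do not come from the chapter.

```python
import numpy as np

# Hypothetical two-dimensional example (values are assumptions for illustration).
X = np.array([np.sqrt(3) / 2, 0.5])          # unit-length state, 30 degrees from A
M_A = np.diag([1.0, 0.0])                    # projector for event A
M_notA = np.eye(2) - M_A                     # projector for the complement of A
b = np.array([1.0, 1.0]) / np.sqrt(2)        # ray for event B, 45 degrees from A
M_B = np.outer(b, b.conj())                  # projector for event B

def sq_len(v):
    """Squared length of a vector."""
    return np.vdot(v, v).real

q_B = sq_len(M_B @ X)                                     # first condition: B alone
q_T = sq_len(M_B @ M_A @ X) + sq_len(M_B @ M_notA @ X)    # second condition: A measured first
interference = q_B - q_T

# Cross-product form of the interference: 2 * Re <X| M_A M_B M_notA |X>
cross_term = 2 * np.real(np.vdot(X, M_A @ M_B @ M_notA @ X))

print(round(q_B, 4), round(q_T, 4), round(interference, 4), round(cross_term, 4))
```

In this example the interference is positive, so measuring A first lowers the probability of B relative to measuring B alone, and the difference matches the cross-product term.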

Now let us turn to a couple of example applications of this theory. Quantum cognition and decision has been applied to a wide range of findings in psychology (see Box 2). In this chapter we only have space to show two illustrations: an application to probability judgment errors, and another application to violations of rational decision-making.

# Application to Probability Judgment Errors

Quantum theory provides a unified and coherent account for a broad range of puzzling findings in the area of human judgments. The theory has provided accounts for order effects on attitude judgments (Wang & Busemeyer, 2013; Wang et al., 2014), inference (Trueblood & Busemeyer, 2010), and causal reasoning (Trueblood & Busemeyer, 2011). The same theory has also been used to account for conjunction and disjunction errors found with probability judgments (Franco, 2009), as well as overextension and underextension errors found in conceptual combinations (Aerts, 2009). This section briefly describes how the theory accounts for conjunction and disjunction errors in probabilistic judgments (Busemeyer et al., 2011).

Conjunction and disjunction probability judgment errors are very robust and they have been found with a wide variety of examples (Tversky & Kahneman, 1983). Here we consider an example, where a judge is provided with a brief story about a hypothetical woman named Linda (circa 1980s):

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.

Then the judge is asked to rank the likelihood of the following events: Linda is (a) active in the feminist movement, (b) a bank teller, (c) active in the feminist movement and a bank teller, (d) active in the feminist movement and not a bank teller, (e) active in the feminist movement or a bank teller. The conjunction fallacy occurs when option c is judged to be more likely than option b (even though it can be argued from the classical perspective that the latter contains the former), and the disjunction fallacy occurs when option a is judged to be more likely than option e (again, even though it can be argued that the latter contains the former). For example, in a study (Morier & Borgida, 1984), the mean probability judgments were ordered as follows: using *J*(A) to denote the mean probability judgment for event A, $J(\text{feminist})=.83 > J(\text{feminist or bank teller})=.60 > J(\text{feminist and bank teller})=.36 > J(\text{bank teller})=.26$ (*N* = 64 observations per mean, and all pairwise differences are statistically significant). These results violate classical probability theory, which is the reason that they are called fallacies. What follows is a simple yet general model for these types of findings.
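Before turning to the model, the two violations in the reported means can be verified directly. The sketch below uses only the four mean judgments quoted above; the dictionary `J` and the variable names are illustrative.

```python
# Mean probability judgments reported from Morier & Borgida (1984), as quoted above.
J = {
    "feminist": 0.83,
    "feminist or bank teller": 0.60,
    "feminist and bank teller": 0.36,
    "bank teller": 0.26,
}

# Classical probability requires p(F and B) <= p(B) and p(F or B) >= p(F).
conjunction_fallacy = J["feminist and bank teller"] > J["bank teller"]
disjunction_fallacy = J["feminist"] > J["feminist or bank teller"]

print(conjunction_fallacy, disjunction_fallacy)
```

Both flags are true for these means: the conjunction is judged more likely than one of its constituents, and a constituent is judged more likely than the disjunction.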

The first assumption is that after reading the story about Linda, a person forms an initial belief state $|S\rangle$ that represents the person’s beliefs about features or properties that may or may not be true about Linda. Formally, this belief state is a vector within an *N*-dimensional vector space. This belief state is used to answer *any* possible question that might be asked about Linda.

The second assumption is that a question such as “Is Linda a feminist?” is represented by a subspace of dimension $N_F < N$ within the *N*-dimensional vector space. This subspace corresponds to a projector $M_F$ that projects the state vector onto the subspace representing the feminist question. The question “Is Linda a bank teller?” is represented by another subspace of dimension $N_B < N$ with a corresponding projector $M_B$.

The third assumption is that the projectors $M_F$, $M_B$ do not commute so that $M_F M_B \ne M_B M_F$, and thus, the order of their applications matters. The reason why these two projectors do not commute is the following. The two concepts (feminist, bank teller) are rarely experienced together, and so the person has not formed a compatible representation of beliefs about combinations of both concepts using a common basis. A person may have formed one basis representing features related to feminists, but this basis differs from the basis used to represent features related to bank tellers. The concepts do not share the same basis and so they are incompatible, and the person needs to change from one basis to the other sequentially in order to answer questions about each concept.

The fourth assumption concerns the order in which the concepts are processed when asked “Is Linda a feminist and a bank teller?”. Given that the events are incompatible, the person has to pick an order to process them. It is assumed that the more likely event is processed first. It is quite easy to judge the order of each individual question, such as that $J(\text{feminist}) > J(\text{bank teller})$. But the question about “feminist bank teller” is subtler, and this is not as easy as the previous two questions. The judgment for the conjunction requires forming an additional and subtler judgment about the conditional probability *J*(bank teller given feminist).

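These assumptions are now used to derive the quantum predictions for the probability of bank teller (using Eq. 2):

$$q(B) = q(F)\cdot q(B|F) + q(\overline{F})\cdot q(B|\overline{F}) + \mathit{Int}.$$

According to the quantum model, a conjunction error occurs when $q(F)\cdot q(B|F) > q(B)$, that is, when

$$\mathit{Int} < -q(\overline{F})\cdot q(B|\overline{F}).$$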

Formally, the negative interference term produces the conjunction error. Intuitively, the Linda story produces a belief state that is almost orthogonal to the subspace for the bank teller event. However, if this state is first projected onto the feminist subspace (eliminating some details about Linda that make it impossible for her to be a bank teller), then it becomes a bit more likely to think that this feminist can be a bank teller too.

Figure 17.3 illustrates how this works using a simple two-dimensional example (though we stress that the specification of the model is general and not restricted to one-dimensional subspaces). The probability for the bank teller alone question is determined from the direct projection from the initial state Psi to the bank teller (BT) axis, which is shown as the shorter light grey vertical segment. The probability for the conjunction is represented by first projecting Psi onto feminist (F), and then projecting onto bank teller, which is shown as the longer dark grey vertical segment. Note that the projection for the conjunction (dark grey vertical segment) exceeds the projection for the bank teller alone (light grey vertical segment).
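The geometry described above can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a two-dimensional space with one-dimensional subspaces. The angles (15° for the belief state, 45° for the feminist axis, 90° for the bank teller axis) and the helper names `ray`, `projector`, and `sq_len` are hypothetical choices made only for illustration; they are not values taken from Figure 17.3.

```python
import numpy as np

def ray(degrees):
    """Unit vector in the plane at the given angle (illustrative helper)."""
    theta = np.deg2rad(degrees)
    return np.array([np.cos(theta), np.sin(theta)])

def projector(v):
    """Projector onto the one-dimensional subspace spanned by v."""
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

def sq_len(v):
    """Squared length of a real vector."""
    return float(v @ v)

# Hypothetical angles, chosen only for illustration.
S = ray(15.0)                  # belief state after reading the story
M_F = projector(ray(45.0))     # "feminist" subspace
M_B = projector(ray(90.0))     # "bank teller" subspace
M_notF = np.eye(2) - M_F       # "not feminist" subspace

q_F = sq_len(M_F @ S)                      # q(F)
q_B = sq_len(M_B @ S)                      # q(B), direct projection
q_F_then_B = sq_len(M_B @ M_F @ S)         # q(F) q(B|F), feminist first
q_B_then_F = sq_len(M_F @ M_B @ S)         # q(B) q(F|B), bank teller first
q_notF_then_B = sq_len(M_B @ M_notF @ S)   # q(not F) q(B|not F)

# Interference is what remains after subtracting the two path probabilities.
interference = q_B - (q_F_then_B + q_notF_then_B)

print(f"q(F) = {q_F:.3f}, q(B) = {q_B:.3f}")
print(f"q(F)q(B|F) = {q_F_then_B:.3f}, q(B)q(F|B) = {q_B_then_F:.3f}")
print(f"interference = {interference:.3f}")
```

With these assumed angles, the sequential projection through the feminist axis (.375) exceeds the direct projection onto the bank teller axis (about .067), the interference term is negative, and the conjunction still falls below $q(F) = .75$, so only a single conjunction error arises.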

The same theory can also account for the disjunction effect. The event that “Linda is a bank teller or a feminist” is the same as the denial of the event that “Linda is not a bank teller and not a feminist.” According to the quantum model, the probability of the event “not a bank teller and not a feminist” equals $q(\overline{B})\cdot q(\overline{F}|\overline{B})$, and so the probability of the denial equals $1-q(\overline{B})\cdot q(\overline{F}|\overline{B})$. The disjunction error is predicted to occur when the probability of “feminist” exceeds the disjunction, that is, when $q(F)>1-q(\overline{B})\cdot q(\overline{F}|\overline{B})$. Because $q(F) = 1 - q(\overline{F})$, a disjunction error occurs, according to the quantum model, when

$$q(\overline{F}) < q(\overline{B})\cdot q(\overline{F}|\overline{B}).$$

To account for both of these fallacies using the same principles and parameters, the model must predict the following order effect (see Busemeyer et al., 2011, appendix):

$$q(F)\cdot q(B|F) > q(B)\cdot q(F|B).$$

Intuitively, the probability obtained by first considering whether Linda is a feminist and then considering whether she is a bank teller must be greater than the probability obtained by the opposite order. Order effects in this direction have been reported—asking people to judge “is Linda a bank teller” before asking them to judge “is Linda a feminist and a bank teller” significantly reduces the size of the conjunction error as compared to the opposite order (Stolarz-Fantino, Fantino, Zizzo, & Wen, 2003).

The fact that the quantum model can account for both conjunction and disjunction errors using the same principles and same parameters is a definite advantage over other accounts, such as an averaging model. As described in Busemeyer et al. (2011), there are many other qualitative predictions that can be derived from this model. In particular, because $q(F) \ge q(F)\cdot q(B|F)$, this model cannot produce *double* conjunction errors, in which the conjunction is greater than both individual events.

Empirically, single conjunction errors are indeed much more common, and double conjunction errors are infrequent (Yates & Carlson, 1986). Another prediction from this model is that, assuming the conjunction error occurs so that $q(F)\cdot q(B|F) > q(B)$, then $q(B|F) \ge q(B)$ because $q(B|F) \ge q(F)\cdot q(B|F) > q(B)$. The intuition here is that, given the detailed knowledge about Linda, it is almost impossible for Linda to be a bank teller; but given that she is viewed more generally as a feminist, it is more likely to think that a feminist also can be a bank teller. This is an important prediction that needs further empirical testing. (See Tentori & Crupi, 2013, for arguments against this prediction.)

The model presented in this section can account for conjunction errors, disjunction errors, averaging effects, and order effects. It is, however, only one of many possible ways to build models of probability judgments using quantum principles. In particular, Aerts (2009) and his colleagues (Aerts & Gabora, 2005) have developed alternative quantum models that account for conjunction and disjunction errors in conceptual combinations. Importantly, their model can produce *double* conjunction errors; but unfortunately it must change parameters to account for differences between conjunction and disjunction errors. In summary, the quantum axioms provide a common set of general principles that can be implemented in different ways to construct more specific and competing quantum models of the same phenomena. Each of the specific quantum models can be compared with each other and with other classical models with respect to their ability to account for empirical results.

Quantum Dynamics

This section presents the quantum dynamical principles and compares them with Markov processes used in classical dynamical systems. Markov theory provides the basis for constructing a wide variety of classical probability models in cognitive science (e.g., random walk/diffusion models of decision-making). Once again, we restrict this presentation to finite dimensional systems in this chapter. Although finite, the number of dimensions can be very large, and both quantum and Markov processes can readily be extended to infinite dimensional systems. See Busemeyer et al. (2006) and Chapter 7 in Busemeyer and Bruza (2012) for a more comprehensive treatment.

## State Space

Both quantum and Markov models begin with a set of *N* states $\chi =\{|X_1\rangle,\dots ,|X_N\rangle\}$, where the number of states, *N*, can be very large. According to the Markov model, a state such as $|X_i\rangle$ represents all the information required to characterize the system at some moment, and *χ* represents the set of all the possible characterizations of the system across time. At any moment in time, the Markov system is exactly located at some specific state in *χ*, and across time the state changes from one element to another in *χ*. In comparison, according to the quantum model, a state such as $|X_i\rangle$ represents a basis vector used to describe the system, and the set *χ* is a set of basis vectors that span an *N*-dimensional vector space. At any moment in time, the system is in a superposition state, $|\psi\rangle$, which is a point within the vector space spanned by *χ*, and across time the point $|\psi\rangle$ moves around in the vector space (until a measurement occurs, which reduces the state to the observed basis vector).

## Initial State

According to the Markov model, the system starts at some particular element of *χ*. However, this initial state may be unknown to the investigator, in which case a probability, denoted $0\le \varphi_i(0)\le 1$, is assigned to each state $|X_i\rangle$. The *N* initial probabilities form an $N\times 1$ column matrix

$$\varphi(0) = \begin{bmatrix}\varphi_1(0)\\ \vdots\\ \varphi_N(0)\end{bmatrix}.$$

It will be convenient to define a $1\times N$ row matrix as $J=[1\cdots 1]$, which is used for summation. More generally, *ϕ*(*t*) represents the probability distribution across states in *χ* at time *t*. The Markov model requires this probability distribution to sum to unity: $J\cdot \varphi (t)=1$.

According to the quantum model, the system starts in a superposition state $|\psi(0)\rangle = \sum_i \psi_i(0)\cdot|X_i\rangle$, where $\psi_i$ is the coordinate (called amplitude) assigned to the basis vector $|X_i\rangle$. The *N* amplitudes for the initial state form an $N\times 1$ column matrix

$$\psi(0) = \begin{bmatrix}\psi_1(0)\\ \vdots\\ \psi_N(0)\end{bmatrix}.$$

More generally, *ψ*(*t*) represents the amplitude distribution across basis vectors in *χ* at time *t*. The quantum model requires the squared length of this amplitude distribution to equal unity: $\psi(t)^{\dagger}\psi(t)=1$.

## State Transitions

According to the Markov model, the probability distribution across states evolves across time according to the linear transition law

$$\varphi(t+\tau) = T(t+\tau,t)\cdot\varphi(t),$$

where $T(t+\tau,t)$ is a transition matrix with element $T_{ij}$ representing the probability of transiting to a state in row *i* from a state in column *j*. The transition matrix of a Markov model is called *stochastic* because the columns of $T(t+\tau,t)$ must sum to one to guarantee that the resulting probability distribution continues to sum to one, that is, $J\cdot \varphi (t+\tau )=1$ (recall that $J=[1\ 1\ 1\ \dots\ 1]$). The rows, however, are not required to sum to one. In many applications, it is assumed that the transition matrix is stationary so that $T(t_2+\tau, t_2)=T(t_1+\tau, t_1)=T(\tau)$ for all *t* and *τ*.

According to the quantum model, the amplitude distribution evolves across time according to the linear transition law

$$\psi(t+\tau) = U(t+\tau,t)\cdot\psi(t),$$

where $U(t+\tau,t)$ is a unitary matrix with element $U_{ij}$ representing the amplitude for transiting to row *i* from column *j*. The unitary matrix must satisfy the unitary property $U^{\dagger}\cdot U = I$ (*I* is the identity matrix) in order to guarantee that $\psi(t)^{\dagger}\psi(t) = 1$. That is, the columns are unit length and each pair of columns is orthogonal. A transition matrix can be formed from the unitary matrix by taking the squared modulus of each of the cell entries of $U(t+\tau, t)$. The transition matrix formed in this manner is *doubly stochastic*: both the rows and columns of this transition matrix must sum to unity. This is a more *restrictive* constraint on the transition matrix as compared to the Markov model. In many applications, it is assumed that the unitary matrix is stationary so that $U(t_2+\tau, t_2)=U(t_1+\tau, t_1)=U(\tau)$ for all *t* and *τ*.
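The contrast between a stochastic and a doubly stochastic transition matrix can be seen in a small numerical case. The sketch below uses Python with NumPy; the particular matrix `T`, the rotation angle `theta`, and the starting distribution are hypothetical values chosen only for illustration, and a real rotation is used as the simplest possible unitary matrix.

```python
import numpy as np

# Hypothetical column-stochastic Markov transition matrix: each column sums to one.
T = np.array([[0.9, 0.5],
              [0.1, 0.5]])
phi = np.array([0.3, 0.7])           # probability distribution, sums to one
phi_next = T @ phi                   # still sums to one

# Hypothetical unitary matrix (a real rotation is the simplest case).
theta = np.pi / 6
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)
psi = np.sqrt(phi).astype(complex)   # amplitudes with squared length one
psi_next = U @ psi                   # squared length is preserved

# Transition matrix implied by U: squared modulus of each entry.
T_from_U = np.abs(U) ** 2

print("Markov: column sums", T.sum(axis=0), "row sums", T.sum(axis=1))
print("From U: column sums", T_from_U.sum(axis=0), "row sums", T_from_U.sum(axis=1))
```

In this example the rows of `T` sum to 1.4 and 0.6, which is allowed for a Markov model, whereas both the rows and the columns of `T_from_U` sum to one.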

According to the Markov model, the stationary transition matrix obeys the Kolmogorov forward equation

$$\frac{d}{dt}T(t) = K\cdot T(t),$$

where *K* is the intensity matrix, with element $K_{ij}$, and $K_{ij}\ge 0$ for $i\ne j$, and $\sum_{i}K_{ij}=0$ (each column of *K* sums to zero) to guarantee that *T*(*t*) remains a transition matrix.

According to the quantum model, the stationary unitary matrix obeys the Schrödinger equation

$$\frac{d}{dt}U(t) = -i\cdot H\cdot U(t),$$

where $i=\sqrt{-1}$ and *H* is the Hamiltonian matrix, which is a Hermitian matrix, $H^{\dagger}=H$, to guarantee that *U*(*t*) is a unitary matrix. This is where complex numbers enter in a significant way into quantum models.

For the Markov model, the solution to the Kolmogorov forward equation is the following matrix exponential function

$$T(t) = e^{t\cdot K}.$$

For the quantum model, the solution to the Schrödinger equation is the following complex matrix exponential function

$$U(t) = e^{-i\cdot t\cdot H}.$$

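In summary, the probability distribution across states for the Markov model at time *t* equals

$$\varphi(t) = e^{t\cdot K}\cdot\varphi(0),$$

and the amplitude distribution across states for the quantum model at time *t* equals

$$\psi(t) = e^{-i\cdot t\cdot H}\cdot\psi(0).$$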
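Both solutions are matrix exponentials and can be evaluated numerically. The following is a minimal sketch in Python with NumPy that computes them by eigendecomposition; it assumes the intensity matrix is diagonalizable, and the two-state matrices `K` and `H` are hypothetical examples rather than matrices from any model in this chapter. In practice a general-purpose routine such as SciPy's `scipy.linalg.expm` could be used instead.

```python
import numpy as np

def markov_transition(K, t):
    """T(t) = exp(t K) via eigendecomposition (assumes K is diagonalizable)."""
    eigvals, V = np.linalg.eig(K)
    return np.real((V * np.exp(t * eigvals)) @ np.linalg.inv(V))

def unitary(H, t):
    """U(t) = exp(-i t H) for a Hermitian H, via its spectral decomposition."""
    eigvals, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * eigvals)) @ V.conj().T

# Hypothetical intensity matrix: off-diagonal entries >= 0, columns sum to zero.
K = np.array([[-1.0,  2.0],
              [ 1.0, -2.0]])
# Hypothetical Hermitian Hamiltonian.
H = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

phi0 = np.array([1.0, 0.0])              # Markov system starts in state 1
psi0 = np.array([1.0, 0.0], dtype=complex)

phi_t = markov_transition(K, 0.5) @ phi0  # probability distribution at t = 0.5
psi_t = unitary(H, 0.5) @ psi0            # amplitude distribution at t = 0.5

print("Markov probabilities:", phi_t, "sum =", phi_t.sum())
print("Quantum probabilities:", np.abs(psi_t) ** 2, "sum =", (np.abs(psi_t) ** 2).sum())
```

Under these assumptions the Markov distribution decays monotonically toward its stationary value (2/3, 1/3), whereas the quantum probabilities oscillate with time because the evolution is a rotation rather than a diffusion.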

The most important step for building a dynamic model is specifying the intensity matrix for the Markov model or specifying the Hamiltonian matrix for the quantum model. Here psychological science enters by developing a mapping from the psychological factors onto the parameters that define these matrices. An example is provided following this section to illustrate this model development process.

## Response Probabilities

Consider the probability of observing the response $R_k$ at time *t*, which is denoted $p(R(t)=R_k)$. In this section, we use the same choice probability notation for both the Markov and quantum models. Assume that *ϕ*(*t*) is the current probability distribution for the Markov model and *ψ*(*t*) is the current amplitude distribution for the quantum model at time *t*. Both the Markov and quantum models determine the probability of a response by evaluating the set of states that map onto that particular response. Suppose a subset of states, $\chi_k\subset \chi$, are mapped onto a response $R_k$. Define $M_k$ as an $N\times N$ indicator matrix, which is a diagonal matrix with ones on the diagonal corresponding to the states mapped onto the response $R_k$, and zeros everywhere else. Then according to the Markov model, the response probability equals (recall $J=[1\ 1\ 1\ \dots\ 1]$)

$$p(R(t)=R_k) = J\cdot M_k\cdot\varphi(t).$$

If, in fact, the response $R_k$ is observed at time *t*, then the new probability distribution, conditioned on this observation, equals

$$\varphi(t|R_k) = \frac{M_k\cdot\varphi(t)}{p(R(t)=R_k)}.$$

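According to the quantum model, the response probability equals

$$p(R(t)=R_k) = \|M_k\cdot\psi(t)\|^2.$$

If, in fact, the response $R_k$ is observed at time *t*, then the new amplitude distribution, conditioned on this observation, equals

$$\psi(t|R_k) = \frac{M_k\cdot\psi(t)}{\sqrt{p(R(t)=R_k)}}.$$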

The conditional states, $\varphi (t|{R}_{k})$ for the Markov model and $\psi (t|{R}_{k})$ for the quantum model, then become the “initial” states to be used for further evolution and future observations.
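The two response rules and their conditional states can be written as a pair of short functions. This is a minimal sketch in Python with NumPy; the function names `markov_response` and `quantum_response`, the four-state example, and the numerical values are hypothetical and serve only to show the arithmetic. Both functions assume the response has nonzero probability.

```python
import numpy as np

def markov_response(M_k, phi):
    """Return p(R_k) = J M_k phi and the conditional distribution phi(t | R_k)."""
    J = np.ones(len(phi))
    p = float(J @ M_k @ phi)
    return p, (M_k @ phi) / p            # renormalize by the probability

def quantum_response(M_k, psi):
    """Return p(R_k) = ||M_k psi||^2 and the conditional amplitudes psi(t | R_k)."""
    projected = M_k @ psi
    p = float(np.vdot(projected, projected).real)
    return p, projected / np.sqrt(p)     # renormalize by the square root

# Hypothetical four-state example: states 1 and 3 map onto response R_k.
M_k = np.diag([1.0, 0.0, 1.0, 0.0])
phi = np.array([0.1, 0.2, 0.3, 0.4])      # sums to one
psi = np.array([0.5, 0.5j, -0.5, 0.5])    # squared length one

p_markov, phi_cond = markov_response(M_k, phi)
p_quantum, psi_cond = quantum_response(M_k, psi)

print("Markov: ", p_markov, phi_cond)
print("Quantum:", p_quantum, psi_cond)
```

The only structural difference is the normalization: the Markov state is divided by the response probability so that it sums to one, and the quantum state is divided by its square root so that its squared length equals one.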

Application to Decision Making

This section examines two puzzling findings from decision research. One is the violation of the “sure thing” principle (Tversky & Shafir, 1992). Savage (1954) introduced the “sure thing” principle as a rational axiom for the foundation of decision theory. According to the sure thing principle, if under state of the world X, you prefer action A over B, and if under the complementary state of the world $\overline{X}$, you also prefer action A over B, then you should prefer action A over B even when you do not know the state of the world. A violation of the sure thing principle occurs when A is preferred over B for each known state of the world, but the opposite preference occurs when the state of the world is unknown.

The other puzzling finding is the violation of the principle of dynamic consistency, called dynamic inconsistency. Dynamic consistency is considered in standard theory to be a rational principle for making dynamic decisions involving a sequence of actions and events over time. According to the backward induction algorithm used to form optimal plans with dynamic decisions, a person works backward making plans at the end of the sequence in order to decide actions at the beginning of the sequence. To be dynamically consistent, when reaching the decisions at the end of the sequence, one should follow through on the plan that was used to make the decision at the beginning of the sequence. Violations of dynamic consistency occur when people change plans and fail to follow through on a plan once they arrive at the final decisions.

## Two-Stage Gambling Paradigm

Tversky and Shafir (1992) experimentally investigated the sure thing principle using a two-stage gamble. They presented 98 students with a target gamble that had an equal chance of winning \$200 or losing \$100 (they used hypothetical money). The students were asked to imagine that they already played the target gamble once, and now they were asked whether they wanted to play the same gamble a second time. Each person experienced three conditions that were separated by a week and mixed with other decision problems to produce independent decisions. They were asked if they wanted to play the gamble a second time, given that they won the first play (Condition 1: known win), given that they lost the first play (Condition 2: known loss), and when the outcome of the first play was unknown (Condition 3: unknown). If they thought they won the first gamble, the majority (69%) chose to play again; if they thought they lost the first gamble, then again the majority (59%) chose to play again; but if they didn’t know whether they won or lost, then the majority chose not to play (only 36% wanted to play again). Tversky and Shafir (1992) explained these findings by claiming that people fail to follow through on consequential reasoning. When a person knows she/he has won the first gamble, then a reason to play again arises from the fact that she/he has extra house money to play with. When the person knows she/he has lost the first gamble, then a reason to play again arises from the fact that she/he needs to recover her/his losses. When the person does not know the outcome of the first play, these reasons fail to arise. But why not?

Pothos and Busemeyer (2009) explained these and other results found by Shafir and Tversky (1992) using the concept of quantum interference. Referring back to the section *Violations of the Law of Total Probability,* define the event *B* as deciding to play the gamble on the second stage, define event *A* as winning the first play, and define event $\overline{A}$ as losing the first play. Then Eq. 2 expresses the probability of playing the gamble on the second stage for the unknown condition in terms of the total probability, $q_T(B)$, of playing the second stage on either of the two known conditions, plus the interference term *Int*. Given that the probability of winning the first stage equals .50, then a violation of the sure thing principle is predicted whenever $q_T(B) > .50$ and the interference term *Int* is sufficiently negative so that $q(B) < .50$. But what determines the interference term? To answer this question, Pothos and Busemeyer (2009) developed a dynamic quantum model to account for the violation of the sure thing principle. This model is described in detail later, but before presenting these modeling details, let us first examine the second puzzling finding regarding violations of dynamic consistency. The same model is used to explain both findings.

Barkan and Busemeyer (1999, 2003) used the same two-stage gambling paradigm to study another phenomenon called dynamic inconsistency, which occurs whenever a person changes plans during decision-making. Each study included a total of 100 people, and each person played a series of gambles twice. Each gamble had an equal chance of producing a win or a loss (e.g., equal chance to win 200 points or lose 100 points, where each point was worth \$0.01). Different gambles were formed by changing the amounts to win or lose. For each gamble, the person was forced to play the first round, and then contingent on the outcome of the first round, they were given a choice whether to take the same gamble on the second round. Choices were made under two conditions: a planned versus a final choice. For the planned choice, contingent on winning the first round, the person had to select a plan about whether to take or reject the gamble on the second round; contingent on losing the first round, the person had to make another plan about whether to take or reject the gamble on the second round. Then the first-stage gamble was actually played out and the actual win or loss was revealed. For the final choice, after actually experiencing the win on the first round, the person made a final decision to take or reject the second round. Likewise, after actually experiencing a loss on the first round, the person had to decide whether to take or reject the gamble on the second round. The plan and the final decisions were made equally valuable because the experimenter randomly selected either the planned action or the final action to determine the final payoff with real money at stake. The results showed that people violate the dynamic consistency principle: Following an actual win, they changed from planning to take to finally rejecting the second stage; following an actual loss, they changed from planning to reject to finally taking the second stage.
For example, Table 17.1 shows the results from the four gambles used by Barkan and Busemeyer (1999). The first two columns show the amounts to win or lose, the next two columns show the probability of taking the gamble under the plan (conditioned on a planned win or loss), and the last two columns show the probability of taking the gamble for the final decision (conditioned on an experienced win or loss). Similar results were found by Barkan and Busemeyer (2003) using 17 different gambles. For later reference, we will denote the amount of the win by $x_W$ and the amount of the loss by $x_L$. So, for example, in the first row, $x_W=80$ and $x_L=100$.

It is worth mentioning that the results shown in Table 17.1 once again demonstrate a violation of the classical law of total probability in the following way. If the law of total probability holds, then the probability of taking the gamble during the plan (denoted $p(T|P)$ and shown in the columns labeled “Plan Win” and “Plan Loss”) equals the probability of winning the first play (denoted *p*(*W*), which was stated to be equal to .50) times the probability of taking the gamble after a win (denoted $p(T|W)$ and shown under the column “Final Win” in Table 17.1) plus the probability of losing the first play (denoted $p(L)=1-p(W)$) times the probability of taking the gamble following a loss (denoted $p(T|L)$ and shown under the column “Final Loss” in Table 17.1), so that $p(T|P)=p(W)\cdot p(T|W)+p(L)\cdot p(T|L)$. All the gambles have the same probability of winning, and *p*(*W*) is fixed across gambles and is stated in the problem to be equal to .50. However, these assumptions fail to reproduce the findings shown in Table 17.1. For example, we require $p(W)=.64$ to reproduce the data in the first row, but we require $p(W)=.43$ to reproduce the third row, and we require $p(W)=.31$ to reproduce the data in the fourth row, and even worse, no legitimate value of *p*(*W*) can be found to reproduce the second row.

## Markov Dynamic Model for Two-Stage Gambles

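First let us construct a general Markov model for this two-stage gambling task. The Markov model uses a four-dimensional state space with states

$$\{|WT\rangle, |WR\rangle, |LT\rangle, |LR\rangle\},$$

where, for example, $|WT\rangle$ represents the state “believe the first play was a win and take the gamble,” and $|LR\rangle$ represents the state “believe the first play was a loss and reject the gamble.”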

Table 17.1. Barkan and Busemeyer (1999).

| Win | Lose | Plan Win | Plan Loss | Final Win | Final Loss |
|---|---|---|---|---|---|
| 80 | 100 | .25 | .26 | .20 | .35 |
| 80 | 40 | .76 | .72 | .69 | .73 |
| 200 | 100 | .68 | .68 | .60 | .75 |
| 200 | 40 | .84 | .86 | .76 | .89 |

Before evaluating the payoffs of the gamble, the decision maker has an initial state represented by *ϕ*(0). This initial state depends on information about the outcome of the first play. If the outcome of the first play is unknown (i.e., the planning stage), then the initial state is set equal to $\varphi (0)={\varphi}_{U}$, which has coordinates ${\varphi}_{WT}={\varphi}_{WR}={\varphi}_{LT}={\varphi}_{LR}=\frac{1}{4}$. If the first play is known to be a win, then the initial state is set equal to $\varphi (0)={\varphi}_{W}$ with coordinates ${\varphi}_{WT}={\varphi}_{WR}=\frac{1}{2}$, ${\varphi}_{LT}={\varphi}_{LR}=0$. If the first play is known to be a loss, then the initial state is set equal to $\varphi (0)={\varphi}_{L}$ with coordinates ${\varphi}_{WT}={\varphi}_{WR}=0$, ${\varphi}_{LT}={\varphi}_{LR}=\frac{1}{2}$.

The probabilities of taking the gamble, depending on the win or lose first game belief states, are then determined by a transition matrix

$$T = \begin{bmatrix} T_W & 0\\ 0 & T_L \end{bmatrix},$$

where $T_W$ is a $2\times 2$ transition matrix conditioned on winning, and $T_L$ is a $2\times 2$ transition matrix conditioned on losing.

The matrix that picks out the states corresponding to the action of “taking the gamble” is represented by

$$M = \begin{bmatrix} M_T & 0\\ 0 & M_T \end{bmatrix},\qquad M_T = \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}.$$

Recall that $J=[1\ 1\ 1\ 1]$ sums across states to obtain the probability of a response. Finally, the Markov model predicts:

$$\begin{aligned}
p(T|W) &= J\cdot M\cdot T\cdot\varphi_W,\\
p(T|L) &= J\cdot M\cdot T\cdot\varphi_L,\\
p(T|U) &= J\cdot M\cdot T\cdot\varphi_U = .50\cdot p(T|W) + .50\cdot p(T|L).
\end{aligned}$$

The last line shows that the Markov model must satisfy the law of total probability. Because the Markov model must always obey this law, it already qualitatively fails to account for the violation of the sure thing principle and the dynamic inconsistency results described earlier.
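That the Markov model cannot escape the law of total probability is easy to confirm numerically. The sketch below, in Python with NumPy, assumes the state order WT, WR, LT, LR and uses hypothetical entries for the two $2\times 2$ transition matrices; any column-stochastic choices would lead to the same conclusion.

```python
import numpy as np

# Assumed state order: [WT, WR, LT, LR]; in each block, row 1 = take, row 2 = reject.
T_W = np.array([[0.8, 0.6],
                [0.2, 0.4]])   # hypothetical transitions conditioned on a win
T_L = np.array([[0.7, 0.5],
                [0.3, 0.5]])   # hypothetical transitions conditioned on a loss
Z = np.zeros((2, 2))
T = np.block([[T_W, Z],
              [Z, T_L]])       # block-diagonal transition matrix

M = np.diag([1.0, 0.0, 1.0, 0.0])   # picks out the "take gamble" states
J = np.ones(4)                      # sums across states

phi_W = np.array([0.50, 0.50, 0.00, 0.00])   # known win
phi_L = np.array([0.00, 0.00, 0.50, 0.50])   # known loss
phi_U = np.array([0.25, 0.25, 0.25, 0.25])   # unknown (plan)

def p_take(phi):
    """Probability of taking the gamble from initial distribution phi."""
    return float(J @ M @ T @ phi)

p_W, p_L, p_U = p_take(phi_W), p_take(phi_L), p_take(phi_U)
print(p_W, p_L, p_U, 0.5 * p_W + 0.5 * p_L)
```

Because the unknown-condition distribution is an equal mixture of the two known-condition distributions and every operation is linear, the unknown-condition probability is always the same mixture of the two known-condition probabilities.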

## Quantum Dynamic Model for Two-Stage Gambles

Pothos and Busemeyer (2009) developed a quantum dynamic model that has been applied to the two-stage gambling task. The quantum model also uses a four-dimensional vector space spanned by four basis vectors

$$\{|WT\rangle, |WR\rangle, |LT\rangle, |LR\rangle\}.$$

The decision maker's state is a superposition of these basis vectors. The matrix representation of this superposition state is the $4\times 1$ column matrix (length equal to one) composed of two parts, the amplitudes for the two win states followed by the amplitudes for the two loss states:

$$\psi = \begin{bmatrix}\psi_{WT}\\ \psi_{WR}\\ \psi_{LT}\\ \psi_{LR}\end{bmatrix}.$$

Before evaluating the payoffs of the gamble, the decision maker has an initial state represented by *ψ*(0). This initial state depends on information about the outcome of the first play. If the outcome of the first play is unknown (i.e., the planning stage), then the initial state is set equal to $\psi(0)=\psi_U$, which has all four coordinates equal to .50. If the first play is known to be a win, then the initial state is set equal to $\psi(0)=\psi_W$ with coordinates $\psi_{WT}=\psi_{WR}=\sqrt{.50}$, $\psi_{LT}=\psi_{LR}=0$. If the first play is known to be a loss, then the initial state is set equal to $\psi(0)=\psi_L$ with coordinates $\psi_{WT}=\psi_{WR}=0$, $\psi_{LT}=\psi_{LR}=\sqrt{.50}$.

Evaluation of the gamble payoffs causes the initial state *ψ*(0) to evolve into a final state *ψ*(*t*) after a period of deliberation time *t*, and this final state is used to decide whether to take or reject the gamble at the second stage. The Hamiltonian *H* used for this evolution is $H=H_1+H_2$.

The matrix $H_W$ in the upper left corner of $H_1$ rotates the state toward taking or rejecting the gamble depending on the final payoffs $(x_W+x_W, x_W-x_L)$, given an initial win of the amount $x_W$ from the first play. The coefficients $h_W$ and $h_L$ in the Hamiltonian $H_1$ are supposed to range between −1 and +1, so we need to map the utility differences into this range. The hyperbolic tangent provides a smooth S-shaped mapping. We then define $h_W$ in terms of the utility difference following a win. The variable $u_W$ is the utility of playing the gamble after a win, which uses a risk-aversion parameter *a* and a loss-aversion parameter *b*. The variable $D_W$ is the difference between the utility of taking and rejecting the gamble after a win.

The matrix $H_L$ in the bottom right corner of $H_1$ rotates the state toward taking or rejecting the gamble depending on the final payoffs $(x_W-x_L, -x_L-x_L)$, given an initial loss of the amount $x_L$ from the first play. Once again, using the hyperbolic tangent, we map the utility differences following a loss into $h_L$. The variable $u_L$ is the utility of playing the gamble after a loss, which uses the same risk-aversion parameter *a* and the same loss-aversion parameter *b*. The variable $D_L$ is the difference between the utility of taking and rejecting the gamble after a loss.

The matrix $H_2$ is designed to align beliefs with actions. This produces a type of “hot hand” effect. The parameter *c* determines the extent that beliefs can change from their initial values during the evaluation process, and it is critical for producing interference effects. Critically, if the parameter *c* is set to zero, then the quantum model reduces to a special case of a Markov model, the law of total probability holds, and there are no interference effects. According to the quantum model hypothesis, a nonzero value of this parameter *c* is expected, which will reproduce the 17 different quantum interference terms for the 17 different gambles.

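The initial state evolves to the final state according to the unitary evolution

$$\psi(t) = e^{-i\cdot t\cdot H}\cdot\psi(0).$$

Following Pothos and Busemeyer (2009), the deliberation time was set equal to $t=\frac{\pi}{2}$, because at this time point, the evolution of preference first reaches an extreme point. The projector for choosing to gamble is represented by the indicator matrix that picks the “take gamble” action

$$M = \begin{bmatrix} M_T & 0\\ 0 & M_T \end{bmatrix},\qquad M_T = \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}.$$

The probability of taking the gamble for the known win, known loss, and unknown (plan) conditions then equals

$$p(T|W) = \|M\cdot e^{-itH}\cdot\psi_W\|^2,\qquad p(T|L) = \|M\cdot e^{-itH}\cdot\psi_L\|^2,\qquad p(T|U) = \|M\cdot e^{-itH}\cdot\psi_U\|^2.$$

The parameter *c* is critical for producing violations of the law of total probability. If we set the quantum-model parameter $c=0$, then the quantum model predicts

$$p(T|U) = .50\cdot p(T|W) + .50\cdot p(T|L).$$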
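The role of the parameter *c* can be explored with a small simulation. The following is a sketch in Python with NumPy under loudly stated assumptions: the specific entries used for the $2\times 2$ blocks of $H_1$ and for the coupling matrix $H_2$ are illustrative choices, not a transcription of the published model, and the values of `h_W`, `h_L`, and `c` are hypothetical. What the sketch is meant to show is structural: when $c=0$ the Hamiltonian is block diagonal and the law of total probability holds exactly.

```python
import numpy as np

def block(h):
    """Assumed 2x2 Hermitian block rotating between take and reject."""
    return np.array([[h, 1.0], [1.0, -h]]) / np.sqrt(1.0 + h * h)

def hamiltonian(h_W, h_L, c):
    """H = H1 + H2 with an assumed belief-action coupling scaled by c."""
    Z = np.zeros((2, 2))
    H1 = np.block([[block(h_W), Z],
                   [Z, block(h_L)]])
    H2 = -c / np.sqrt(2.0) * np.array([[1.0,  0.0,  1.0, 0.0],
                                       [0.0, -1.0,  0.0, 1.0],
                                       [1.0,  0.0, -1.0, 0.0],
                                       [0.0,  1.0,  0.0, 1.0]])
    return H1 + H2

def evolve(H, psi0, t):
    """psi(t) = exp(-i t H) psi(0), via the spectral decomposition of H."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ (V.conj().T @ psi0)

M = np.diag([1.0, 0.0, 1.0, 0.0])   # projector for "take gamble"
psi_W = np.array([np.sqrt(0.5), np.sqrt(0.5), 0.0, 0.0])
psi_L = np.array([0.0, 0.0, np.sqrt(0.5), np.sqrt(0.5)])
psi_U = np.array([0.5, 0.5, 0.5, 0.5])

def p_take(psi0, h_W, h_L, c, t=np.pi / 2):
    """Probability of taking the gamble after deliberation time t."""
    psi_t = evolve(hamiltonian(h_W, h_L, c), psi0, t)
    return float(np.linalg.norm(M @ psi_t) ** 2)

def total_probability_gap(h_W, h_L, c):
    """p(T|U) minus the equal mixture of p(T|W) and p(T|L)."""
    p_W = p_take(psi_W, h_W, h_L, c)
    p_L = p_take(psi_L, h_W, h_L, c)
    p_U = p_take(psi_U, h_W, h_L, c)
    return p_U - (0.5 * p_W + 0.5 * p_L)

# Hypothetical coefficients in (-1, +1).
print("gap with c = 0:", total_probability_gap(0.5, -0.3, 0.0))
print("gap with c = 1:", total_probability_gap(0.5, -0.3, 1.0))
```

With $c=0$ the gap is zero up to rounding error, because the win and loss blocks evolve independently and never interfere. With a nonzero *c* the coupling mixes the two blocks, so the gap is in general nonzero; its sign and size depend on the assumed matrices and coefficients.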

Therefore, if $c\ne 0$, the quantum model violates the law of total probability; but if $c=0$, the quantum model satisfies the law of total probability. In fact, when $c=0$, the Markov model can reproduce the predictions of the quantum model by setting each element of the first row of the transition matrix $T_W$ equal to $p(T|W)$ predicted by the quantum model, and by setting each element of the first row of the transition matrix $T_L$ equal to $p(T|L)$ predicted by the quantum model. In other words, we can obtain a Markov model from the quantum model by setting $c=0$.

## Model Comparisons

Next, we compare the Markov (obtained by setting $c=0$ in the quantum model) and the quantum model (allowing $c\ne 0$) with respect to their fits to the Barkan and Busemeyer (2003) results in three different ways. The first is to compare least-squares fits to the 17 (gambles with different payoff conditions) × 2 (plan versus final) = 34 mean choice proportions reported in Barkan and Busemeyer (2003). The second is to compare the models using maximum likelihood estimates at the individual level and using AIC and BIC methods. The third is to estimate the hierarchical Bayesian posterior distribution for the critical parameter *c* that distinguishes the two models.

The models are first compared using ${R}^{2}=1-\frac{SSE}{TSS}$ and *adjusted* ${R}^{2}=1-\frac{SSE}{TSS}\cdot \frac{34-1}{34-n}$. The latter index includes a penalty term for extra parameters. These statistics were computed with $SSE=\sum {\left({P}_{i}-{p}_{i}\right)}^{2}$ and $TSS=\sum {\left({P}_{i}-\overline{P}\right)}^{2}$, where ${P}_{i}$ is the observed mean proportion of trials to choose gamble $i=1,\dots ,34$, ${p}_{i}$ is the predicted mean proportion, $\overline{P}$ is the grand mean, and 34 = 17 (payoff conditions) × 2 (plan versus final stage choices) is the number of observed choice proportions being fit. The quantum model has $n=3$ parameters, and the best-fitting parameters (minimizing the sum of squared errors) are $a=0.71$ (risk aversion), $b=2.54$ (loss aversion), and $c=-4.40$. The risk aversion parameter is a bit below one, as expected, and the loss aversion parameter $b$ exceeds one, as it should. The quantum model produced an ${R}^{2}=0.8234$ and an *adjusted* ${R}^{2}=0.8120$. The Markov model, obtained by setting $c=0$ in the quantum model, has only two parameters, and it produced an ${R}^{2}=0.7854$ and an *adjusted* ${R}^{2}=0.7787$, both lower than those of the quantum model.
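The penalty built into the adjusted index can be checked directly from the reported values. The following is a minimal sketch, not the authors' code: the function names `r_squared` and `adjusted_r_squared` are hypothetical, and only the reported ${R}^{2}$ values, the 34 data points, and the parameter counts are taken from the text.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSE/TSS for paired observed and predicted proportions."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((P - p) ** 2 for P, p in zip(observed, predicted))
    tss = sum((P - mean_obs) ** 2 for P in observed)
    return 1.0 - sse / tss


def adjusted_r_squared(r2, n_obs, n_params):
    """Adjusted R^2 = 1 - (SSE/TSS) * (n_obs - 1) / (n_obs - n_params)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params)


# Reported R^2 values; 34 choice proportions; 3 versus 2 free parameters.
quantum_adj = adjusted_r_squared(0.8234, n_obs=34, n_params=3)
markov_adj = adjusted_r_squared(0.7854, n_obs=34, n_params=2)
print(f"{quantum_adj:.4f} {markov_adj:.4f}")  # prints 0.8120 0.7787
```

Both adjusted values agree with those reported above, so the quantum model keeps its advantage after the penalty for its extra parameter.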

Next, the models were compared using AIC and BIC methods based on maximum likelihood fits to individuals. For person *i* on trial *t* we observe a data pattern ${X}_{i}(t)=[{x}_{TT}(t),{x}_{TR}(t),{x}_{RT}(t),{x}_{RR}(t)]$ defined by ${x}_{jk}(t)=1$ if event (*j*, *k*) occurs and otherwise zero, where *TT* is the event “planned to take the gamble and finally took the gamble,” *TR* is the event “planned to take the gamble but finally rejected the gamble,” *RT* is the event “planned to reject the gamble but finally took the gamble” and *RR* is the event “planned to reject the gamble and finally rejected the gamble.” To allow for possible dependencies between a pair of choices within a single trial, an additional memory recall parameter, *m*, was included in each model. For both models, it was assumed that there is some probability $0\le m\le 1$ that the person simply recalls and repeats the planned choice during the final choice, and there is some probability $1-m$ that the person forgets or ignores the planned choice when making the final choice. After including this memory parameter, the prediction for each event becomes

Using these definitions for each model, the log likelihood function for the 33 trials^{6} (with a pair of plan and final choices on each trial) from a single person can be expressed as
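As a hedged illustration of the verbal description of the memory parameter, the sketch below mixes "repeat the plan" (probability *m*) with "choose afresh" (probability $1-m$) and sums the log probabilities of the observed events. The functional form is an assumption read off the wording above, not a reproduction of the chapter's equations; the names `event_probabilities` and `log_likelihood`, and all numbers in the usage lines, are hypothetical. The final-stage probability is supplied by the caller, since it depends on whether the first gamble was won or lost.

```python
import math


def event_probabilities(p_plan, p_final, m):
    """Joint probabilities of the four (plan, final) events.

    p_plan  : model probability of planning to take the gamble
    p_final : model probability of taking the gamble at the final stage
    m       : probability of recalling and repeating the planned choice
    """
    return {
        "TT": p_plan * (m + (1 - m) * p_final),
        "TR": p_plan * (1 - m) * (1 - p_final),
        "RT": (1 - p_plan) * (1 - m) * p_final,
        "RR": (1 - p_plan) * (m + (1 - m) * (1 - p_final)),
    }


def log_likelihood(trials, m):
    """Sum over trials of the log probability of the observed event.

    trials: iterable of (event, p_plan, p_final) tuples, event in TT/TR/RT/RR.
    Because x_jk(t) is an indicator, only the observed event contributes.
    Requires nonzero probability for every observed event.
    """
    return sum(
        math.log(event_probabilities(p_plan, p_final, m)[event])
        for event, p_plan, p_final in trials
    )


# Hypothetical data: three trials with made-up model probabilities.
trials = [("TT", 0.6, 0.7), ("TR", 0.6, 0.3), ("RR", 0.4, 0.5)]
g_squared = -2.0 * log_likelihood(trials, m=0.2)
print(g_squared)
```

Under this assumed form the four event probabilities sum to one for any *m*, and setting $m=0$ makes the plan and final choices independent.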

The log likelihood from each person was converted into ${G}_{i}^{2}=-2\cdot \ln\left({L}_{i}\right)$, which indexes the *lack* of fit, and the parameters that minimized ${G}_{i}^{2}$ were found for each person.^{7} The quantum model has one more parameter than the Markov model. In this case, the AIC badness of fit index is defined as ${G}_{i}^{2}+2$, where 2 is the penalty for the one extra parameter. Using AIC, 48 out of the 100 participants produced AIC indices favoring the quantum model over the Markov model. The BIC penalty depends on the number of observations, which is 33 for each person, and so for one extra parameter, the penalty equals $\ln(33)=3.4965$. Using the more conservative BIC index, 22 out of the 100 participants produced BIC indices favoring the quantum model over the Markov model. Thus a majority of participants were adequately fit by the Markov model, but a substantial percentage of participants were better fit by the quantum model.
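The per-participant comparison amounts to asking whether the drop in ${G}^{2}$ from adding *c* exceeds the penalty for one extra parameter. The sketch below illustrates that rule; the function name `prefers_quantum` and the example ${G}^{2}$ values are hypothetical, and only the penalties (2 for AIC, $\ln 33$ for BIC) come from the text.

```python
import math


def prefers_quantum(g2_markov, g2_quantum, n_obs=33, extra_params=1):
    """Return (aic_favors_quantum, bic_favors_quantum) for one participant.

    The quantum model is favored when its improvement in G^2 over the
    Markov model exceeds the penalty for its extra parameter(s).
    """
    improvement = g2_markov - g2_quantum
    aic_penalty = 2.0 * extra_params
    bic_penalty = math.log(n_obs) * extra_params
    return improvement > aic_penalty, improvement > bic_penalty


print(f"{math.log(33):.4f}")        # prints 3.4965
print(prefers_quantum(40.0, 37.0))  # prints (True, False)
```

An improvement of 3.0 clears the AIC penalty but not the larger BIC penalty, which is why fewer participants favor the quantum model under BIC than under AIC.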

One final method used to compare models is to examine the posterior distribution of the parameter *c* when estimated by hierarchical Bayesian methods. The details for this analysis are described in Busemeyer, Wang, and Trueblood (2012) and the results are only briefly summarized here. The hierarchical Bayesian estimation method starts by assuming a prior distribution over the individuals for each of the four quantum model parameters. Then, the likelihoods from the individual fits are used to update the prior distribution into a posterior distribution over the individuals for the four parameters. The posterior distribution of the critical quantum parameter *c* is shown in Figure 17.4. The entire distribution lies below zero, and the mean of the distribution equals −2.67.

This supports the hypothesis that the critical quantum parameter, *c*, is not zero, and the model does not reduce to the Markov model.

It is worth noting that the same quantum model also accounts for the violations of the sure-thing principle, whereas the Markov model cannot explain this violation. Furthermore, the same quantum model described here was used to explain two other puzzling findings (not reviewed here). One concerns order effects on inference (Trueblood & Busemeyer, 2010), and the other is the interference of categorization on decision making (Busemeyer et al., 2009). In sum, the same quantum model has been successfully applied to four distinct puzzling judgment and decision findings, which builds confidence in the broad applicability of the model.

Concluding Comments

This chapter provides a brief introduction to the basic principles of quantum theory and a few major paradoxical judgment and decision findings that the theory has been used to explain. The theory is new and needs further testing, but the initial successful applications demonstrate its viability and theoretical potential. Busemeyer and Bruza (2012) provide a more detailed presentation of the basic principles, and they also describe in detail a much larger number of empirical applications. Pothos and Busemeyer (2013) also summarize applications of quantum theory to cognitive science. Finally, special issues on quantum cognition have recently appeared in the *Journal of Mathematical Psychology* (Bruza, Busemeyer, & Gabora, 2009) and *Topics in Cognitive Science* (Wang, Busemeyer, Atmanspacher, & Pothos, 2013).

What are the advantages and disadvantages of the quantum approach as compared to traditional cognitive theories? First, let us consider some of the disadvantages. One is that the concepts and mathematics are very new and unfamiliar to psychologists, and learning how to use them requires an investment of time and effort. Second, because of the unfamiliarity, it may seem difficult at first to intuitively connect these ideas to traditional concepts of cognitive psychology, such as memory, attention, and information processing. Finally, applications of quantum theory to cognition have to overcome the skepticism that naturally arises when a revolutionary new scientific idea is introduced. However, now consider the advantages. First, the mathematics is not as difficult as it seems, and it only requires knowledge of linear algebra and differential equations. Second, once one does become familiar with the mathematics and concepts, it becomes apparent that quantum theory provides a conceptually elegant and innovative way to formalize and represent some of the major concepts from cognition. For example, the superposition principle provides a natural way to represent parallel processing of uncertain information and to capture deep ambiguous feelings. Moreover, the consideration of quantum models allows the (re)introduction of new and useful theoretical principles into psychology, such as incompatibility, interference, and entanglement. In all, the main advantage of quantum theory is that a small set of principles provides a coherent explanation for a wide variety of puzzling results that have never been connected before under a single theoretical framework (see Box 2).

## References

Aerts, D. (2009). Quantum structure in cognition. *Journal of Mathematical Psychology*, 53(5), 314–348.

Aerts, D., & Aerts, S. (1994). Applications of quantum statistics in psychological studies of decision processes. *Foundations of Science*, 1, 85–97.

Aerts, D., & Gabora, L. (2005). A theory of concepts and their combinations II: A Hilbert space representation. *Kybernetes*, 34, 192–221.

Aerts, D., Gabora, L., & Sozzo, S. (2013). Concepts and their dynamics: A quantum-theoretic modeling of human thought. *Topics in Cognitive Science*, 5, 737–773.

Atmanspacher, H., & Filk, T. (2010). A proposed test of temporal nonlocality in bistable perception. *Journal of Mathematical Psychology*, 54, 314–321.

Atmanspacher, H., Filk, T., & Romer, H. (2004). Quantum Zeno features of bistable perception. *Biological Cybernetics*, 90, 33–40.

Barkan, R., & Busemeyer, J. R. (1999). Changing plans: Dynamic inconsistency and the effect of experience on the reference point. *Psychological Bulletin and Review*, 10, 353–359.

Barkan, R., & Busemeyer, J. R. (2003). Modeling dynamic inconsistency with a changing reference point. *Journal of Behavioral Decision Making*, 16, 235–255.

Blutner, R. (2009). Concepts and bounded rationality: An application of Niestegge’s approach to conditional quantum probabilities. In L. Accardi et al. (Eds.), *Foundations of probability and physics-5* (Vol. 1101, pp. 302–310).

Blutner, R., Pothos, E. M., & Bruza, P. (2013). A quantum probability perspective on borderline vagueness. *Topics in Cognitive Science*, 5(4), 711–736.

Bordley, R. F., & Kadane, J. B. (1999). Experiment-dependent priors in psychology. *Theory and Decision*, 47(3), 213–227.

Brainerd, C. J., Wang, Z., & Reyna, V. (2013). Superposition of episodic memories: Overdistribution and quantum models. *Topics in Cognitive Science*, 5(4), 773–799.

Bruza, P., Kitto, K., Nelson, D., & McEvoy, C. (2009). Is there something quantum-like in the human mental lexicon? *Journal of Mathematical Psychology*, 53, 362–377.

Bruza, P. D., Busemeyer, J., & Gabora, L. (Eds.). (2009). Special issue on quantum cognition (Vol. 53). *Journal of Mathematical Psychology.*

Busemeyer, J. R., & Bruza, P. D. (2012). Quantum models of cognition and decision. Cambridge University Press.

Busemeyer, J. R., Pothos, E. M., Franco, R., & Trueblood, J. S. (2011). A quantum theoretical explanation for probability judgment errors. *Psychological Review*, 118(2), 193–218.

Busemeyer, J. R., Wang, Z., & Lambert-Mogiliansky, A. (2009). Empirical comparison of Markov and quantum models of decision making. *Journal of Mathematical Psychology*, 53(5), 423–433.

Busemeyer, J. R., Wang, Z., & Townsend, J. (2006). Quantum dynamics of human decision making. *Journal of Mathematical Psychology*, 50(3), 220–241.

Busemeyer, J. R., Wang, Z., & Trueblood, J. S. (2012). Hierarchical Bayesian estimation of quantum decision model parameters. In J. R. Busemeyer, F. DuBois, A. Lambert-Mogiliansky, & M. Melucci (Eds.), *Quantum interaction*. Lecture Notes in Computer Science, vol. 7620 (pp. 80–89). Springer.

Conte, E., Khrennikov, A. Y., Todarello, O., Federici, A., Mendolicchio, L., & Zbilut, J. P. (2009). Mental states follow quantum mechanics during perception and cognition of ambiguous figures. *Open Systems and Information Dynamics*, 16, 1–17.

Dirac, P. A. M. (1958). The principles of quantum mechanics. Oxford University Press.

Dzhafarov, E., & Kujala, J. V. (2012). Selectivity in probabilistic causality: Where psychology runs into quantum physics. *Journal of Mathematical Psychology*, 56, 54–63.

Feldman, J. M., & Lynch, J. G. (1988). Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. *Journal of Applied Psychology*, 73(3), 421–435.

Franco, R. (2009). Quantum amplitude amplification algorithm: An explanation of availability bias. In P. Bruza, D. Sofge, W. Lawless, K. van Rijsbergen, & M. Klusch (Eds.), Quantum interaction (pp. 84–96). Springer.

Fuss, I. G., & Navarro, D. J. (2013). Open parallel cooperative and competitive decision processes: A potential provenance for quantum probability decision models. *Topics in Cognitive Science*, 5(4), 818–843.

Gleason, A. M. (1957). Measures on the closed subspaces of a Hilbert space. *Journal of Mathematical Mechanics*, 6, 885–893.

Griffiths, R. B. (2003). Consistent quantum theory. Cambridge University Press.

Gudder, S. P. (1988). Quantum probability. Academic Press.

Hameroff, S. R. (1998). Quantum computation in brain microtubules? The Penrose-Hameroff “Orch OR” model of consciousness. *Philosophical Transactions Royal Society London (A)*, 356, 1869–1896.

Heisenberg, W. (1958). Physics and philosophy. Harper and Row.

Hughes, R. I. G. (1989). The structure and interpretation of quantum mechanics. Harvard University Press.

Ivancevic, V. G., & Ivancevic, T. T. (2010). Quantum neural computation. Springer.

Khrennikov, A. Y. (2010). Ubiquitous quantum structure: From psychology to finance. Springer.

Kolmogorov, A. N. (1933/1950). Foundations of the theory of probability. N.Y.: Chelsea Publishing Co.

Lambert-Mogiliansky, A., Zamir, S., & Zwirn, H. (2009). Type indeterminacy: A model of the “KT” (Kahneman-Tversky)-man. *Journal of Mathematical Psychology*, 53(5), 349–361.

La Mura, P. (2009). Projective expected utility. *Journal of Mathematical Psychology*, 53(5), 408–414.

Morier, D. M., & Borgida, E. (1984). The conjunction fallacy: A task specific phenomenon? *Personality and Social Psychology Bulletin*, 10, 243–252.

Payne, J., Bettman, J. R., & Johnson, E. J. (1992). Behavioral decision research: A constructive processing perspective. *Annual Review of Psychology*, 43, 87–131.

Peres, A. (1998). *Quantum theory: Concepts and methods*. Kluwer Academic.

Pothos, E. M., & Busemeyer, J. R. (2013). Can quantum probability provide a new direction for cognitive modeling? *Behavioral and Brain Sciences*, 36, 255–274.

Pothos, E. M., Busemeyer, J. R., & Trueblood, J. S. (2013). A quantum geometric model of similarity. *Psychological Review*, 120(3), 679–696.

Savage, L. J. (1954). The foundations of statistics. John Wiley & Sons.

Schachter, S., & Singer, J. E. (1962). Cognitive, social, and physiological determinants of emotional state. *Psychological Review*, 69(5), 379–399.

Shafir, E., & Tversky, A. (1992). Thinking through uncertainty: Nonconsequential reasoning and choice. *Cognitive Psychology*, 24, 449–474.

Stolarz-Fantino, S., Fantino, E., Zizzo, D. J., & Wen, J. (2003). The conjunction effect: New evidence for robustness. *American Journal of Psychology*, 116(1), 15–34.

Tentori, K., & Crupi, V. (2013). Why quantum probability does not explain the conjunction fallacy. *Behavioral and Brain Sciences*, 36(3), 308–310.

Trueblood, J. S., & Busemeyer, J. R. (2010). A quantum probability account for order effects on inference. *Cognitive Science*, 35, 1518–1552.

Trueblood, J. S., & Busemeyer, J. R. (2011). A quantum probability model of causal reasoning. *Frontiers in cognitive science*, 3, 138.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. *Psychological Review*, 90, 293–315.

Tversky, A., & Shafir, E. (1992). The disjunction effect in choice under uncertainty. *Psychological Science*, 3, 305–309.

Von Neumann, J. (1932/1955). *Mathematical foundations of quantum theory*. Princeton University Press.

Wang, Z., & Busemeyer, J. R. (2013). A quantum question order model supported by empirical tests of an a priori and precise prediction. *Topics in Cognitive Science*, 5, 689–710.

Wang, Z., Busemeyer, J. R., Atmanspacher, H., & Pothos, E. M. (2013). The potential of using quantum theory to build models of cognition. *Topics in Cognitive Science*, 5, 672–688.

Wang, Z., Solloway, T., Shiffrin, R. M., & Busemeyer, J. (2014). Context effects produced by question orders reveal quantum nature of human judgments. *Proceedings of the National Academy of Sciences*, 111(26), 9431–9436.

Yates, J. F., & Carlson, B. W. (1986). Conjunction errors: Evidence for multiple judgment procedures, including ‘signed summation’. *Organizational Behavior and Human Decision Processes*, 37, 230–253.

Yukalov, V. I., & Sornette, D. (2011). Decision theory with prospect interference and entanglement. *Theory and Decision*, 70, 283–328.

## Notes:

(1.) In particular, this chapter does not rely on the quantum brain hypothesis (Hameroff, 1998).

(2.) This section makes little use of complex numbers, but the section on dynamics requires their use.

(3.) This chapter follows the Dirac representation of the state as a vector rather than the more general von Neumann representation of the state as a density matrix.

(4.) The basis χ is an arbitrary choice and there are many other choices for a basis. Initially, we restrict ourselves to one arbitrarily chosen basis. Later we discuss issues arising from using different choices for the basis.

(5.) Kolmogorov assigned a single sample space to the outcomes of an experiment. This allows one to use a different sample space for each experiment. But then the problem is that these separate sample spaces are left stochastically unrelated.

(6.) Sixteen gambles were played twice; one other gamble was played only once.

(7.) A surprising feature was found with the log likelihood function of the quantum model as a function of the key quantum parameter *c*. The log likelihood function forms a damped oscillation that converges at a reasonably high log likelihood at the extremes, and this is true both for the average across participants as well as for individual participants.