
A more recent version of this content exists; this version was replaced on 5 October 2015. The version that replaced it can be found here.

This more in-depth online version of this article features the following changes from the print handbook version: expanded sections discussing topics in more detail such as chance in physical theories and the Dutch Book Argument.


PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

date: 28 May 2020

# Probability

## Abstract and Keywords

Rather than entailing that a particular outcome will occur, many scientific theories only entail that an outcome will occur with a certain probability. Because scientific evidence inevitably falls short of conclusive proof, when choosing between different theories it is standard to make reference to how probable the various options are in light of the evidence. A full understanding of probability in science needs to address both the role of probabilities in theories, or chances, and the role of probabilistic judgment in theory choice. In this chapter, the author introduces and distinguishes the two sorts of probability and attempts to offer a satisfactory characterization of how the different uses for probability in science are to be understood. A closing section turns to the question of how views about the chance of some outcome should guide our confidence in that outcome.

# 1. Introduction

When thinking about probability in the philosophy of science, we ought to distinguish between probabilities in (or according to) theories and probabilities of theories.1

The former category includes those probabilities that are assigned to events by particular theories. One example is the quantum mechanical assignment of a probability to a measurement outcome in accordance with the Born rule. Another is the assignment of a probability to a specific genotype appearing in an individual, randomly selected from a suitable population, in accordance with the Hardy-Weinberg principle. Whether or not orthodox quantum mechanics is correct and whether or not any natural populations meet the conditions for Hardy-Weinberg equilibrium, these theories still assign probabilities to events in their purview. I call the probabilities assigned to events by a particular theory t, t-chances. The chances are those t-chances assigned by any true theory t. The assignment of chances to outcomes is just one more factual issue on which different scientific theories can disagree; the question of whether the t-chances are the chances is just the question of whether t is a true theory.

Probabilities of theories are relevant to just this question. In many cases, it is quite clear what t-chances a particular theory assigns, while it is not clear how probable it is that those are the chances. Indeed, for any contentful theory, it can be wondered how probable it is that its content is correct, even if the theory itself assigns no chances to outcomes.

This second sort of probability does not seem to reduce to the first sort. Perhaps a theory Ω is so all-encompassing that the outcome Ω is correct is among the outcomes to which it assigns a chance; in which case, there will be an Ω-chance of the Ω-chances being the true chances. But what of it? What we generally want to know is not how probable Ω regards itself as being, but rather how probable we ought to regard Ω as being. Knowing that Ω rates itself highly is only helpful in answering this second question if we are already Ω partisans.

Perhaps these two seemingly distinct uses of probability in philosophy of science can ultimately be reduced to one underlying species of probability. But, at first glance, the prospects of such a reduction do not look promising, and I will approach the two topics separately in the following two sections of this chapter. In the last section, I will turn to the prospect of links between probabilities in theories and probabilities of theories.

# 2. Probabilities in Theories

## 2.1. Formal Features

Scientific theories have two aims: prediction and explanation. But different theories may predict and explain different things, and, generally, each theory will be in a position to address the truth of only a certain range of propositions. So, at least implicitly, each theory must be associated with its own space of (possible) outcomes, a collection of propositions describing the events potentially covered by the theory. We can assume that, for any theory, its space of outcomes forms an algebra. That is, (1) the trivially necessary outcome ⊤ is a member of any space of outcomes; (2) if p is an outcome, then so is its negation ¬p; and (3) if there exists some countable set of outcomes $\{p_1, p_2, \ldots\}$, then their countable disjunction $\bigvee_i p_i$ is also an outcome. Each of these properties is natural to impose on a space of outcomes, considered as a set of propositions the truth value of which a given theory could predict or explain.

Some have wished to impose only a weaker condition (3*): that if p and q are outcomes, so is p ∨ q (de Finetti 1937). The stronger condition (3) is often argued for on grounds of mathematical convenience but may be justified in the present context by observing, first, that if a theory can explain the truth or falsity of each proposition in some set, it is thereby able to explain the truth or falsity of the proposition that at least one member of the set is true. (This is so whether or not any person can grasp the explanation or the proposition explained.) Second, outcomes are propositions, not sentences, so there is no need for scruples about infinite sentences to extend to the propositions they would express.

A theory includes chances only if its principles or laws make use of a probability function P that assigns numbers to every proposition p in its space of outcomes O, in accordance with these three axioms first formulated explicitly by Kolmogorov (1933):

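1. Non-negativity: $P(p) \geq 0$ for every outcome $p \in O$.

2. Normalisation: $P(\top) = 1$.

3. Countable additivity: $P\left(\bigvee_i p_i\right) = \sum_i P(p_i)$ for any countable set of mutually exclusive outcomes $\{p_1, \ldots\} \subseteq O$.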

To these axioms is standardly added a stipulative definition of conditional probability. The notation P(p|q) is read “the probability of p given q” and is defined as a ratio of unconditional probabilities: $P(p|q) = \frac{P(p \wedge q)}{P(q)}$, when $P(q) > 0$.

A theory that involves such a function P meets a formal mathematical condition on being a chance theory. (Of course, a theory may involve a whole family of probability functions and involve laws specifying which probability function is to apply in particular circumstances.) As Kolmogorov noted, the formal axioms treat probability as a species of measure, and there can be formally similar measures that are not plausibly chances. Suppose some theory involved a function that assigned to each event a number corresponding to what proportion of the volume of the universe it occupied—such a function would meet the Kolmogorov axioms, but “normalised event volume” is not chance. The formal condition is necessary but insufficient for a theory to involve chances (Schaffer 2007, 116; but see Eagle 2011b, appendix A).
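The formal condition can be illustrated with a minimal Python sketch, assuming a finite outcome space (so that countable additivity reduces to additivity over pairs of exclusive outcomes). The helper names `powerset`, `make_probability`, `satisfies_axioms`, and `conditional`, and the fair-die weights, are illustrative assumptions rather than part of any theory discussed here.

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(atoms):
    """All subsets of the atoms: the algebra of outcomes over a finite space."""
    items = list(atoms)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def make_probability(weights):
    """Return P, mapping each outcome (a set of atoms) to the sum of its atoms' weights."""
    def P(outcome):
        return sum((weights[a] for a in outcome), Fraction(0))
    return P

def satisfies_axioms(P, atoms):
    """Check non-negativity, normalisation, and additivity on a finite space."""
    outcomes = powerset(atoms)
    non_negative = all(P(p) >= 0 for p in outcomes)
    normalised = P(frozenset(atoms)) == 1
    additive = all(
        P(p | q) == P(p) + P(q)
        for p in outcomes for q in outcomes
        if not (p & q)  # only mutually exclusive pairs are constrained
    )
    return non_negative and normalised and additive

def conditional(P, p, q):
    """Ratio definition: P(p | q) = P(p and q) / P(q), defined only when P(q) > 0."""
    if P(q) == 0:
        raise ValueError("conditional probability undefined when P(q) = 0")
    return P(p & q) / P(q)

# Hypothetical example: a fair six-sided die.
atoms = range(1, 7)
fair = make_probability({a: Fraction(1, 6) for a in atoms})
odd = frozenset({1, 3, 5})
three = frozenset({3})
```

Here `satisfies_axioms(fair, atoms)` holds and `conditional(fair, three, odd)` is 1/3. Passing these checks shows only that the formal condition is met; as the volume example indicates, it does not show that the function is a chance.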

## 2.2. The Modal Aspect of Chance

What is needed, in addition to a space of outcomes and a probability measure over that space, is that the probability measure play the right role in the theory. Because the outcomes are those propositions whose truth value is potentially predicted or explained by the theory, each outcome—or its negation, or both—is a possibility according to the theory. Because scientific theories concern what the world is like, these outcomes are objectively (physically) possible according to the associated theory. It is an old idea that probability measures the “degree of possibility” of outcomes, and chances in theories correspondingly provide an objective measure of how possible each outcome is, according to the theory. This is what makes a probability function in theory T a chance: that T uses it to quantify the possibility of its outcomes (Mellor 2005, 45–48; Schaffer 2007, 124; Eagle 2011b, §2).

Principles about chance discussed in the literature can be viewed as more precise versions of this basic idea about the modal character of chances. Consider the Basic Chance Principle (Bcp) of Bigelow, Collins, and Pargetter (1993), which states that when the T-chance of some outcome p is positive at w, there must be a possible world (assuming T is metaphysically possible) in which p occurs and in which the T-chance of p is the same as in w and which is relevantly like w in causal history prior to p:2

In general, if the chance of A is positive there must be a possible future in which A is true. Let us say that any such possible future grounds the positive chance of A. But what kinds of worlds can have futures that ground the fact that there is a positive present chance of A in the actual world? Not just any old worlds…. [T]he positive present chance of A in this world must be grounded by the future course of events in some A-world sharing the history of our world and in which the present chance of A has the same value as it has in our world. That is precisely the content of the Bcp.

So the existence of a positive chance of an outcome entails the possibility of that outcome, and indeed, the possibility of that outcome under the circumstances which supported the original chance assignment. Indeed, Bigelow et al. argue that “anything that failed to satisfy the Bcp would not deserve to be called chance.”

Does the Bcp merely precisify the informal platitude that p’s having some chance, in some given circumstances, entails that p is possible in those circumstances? The Bcp entails that when an outcome has some chance, a very specific metaphysical possibility must exist: one with the same chances supported by the same history in which the outcome occurs. This means that the Bcp is incompatible with some live possibilities about the nature of chance. For example, consider reductionism about chance, the view that the chances in a world supervene on the Humean mosaic of occurrent events in that world (Lewis 1994; Loewer 2004). Reductionism is motivated by the observation that chances and frequencies don’t drastically diverge and, indeed, as noted earlier, that chances predict frequencies. Reductionists explain this observation by proposing that there is a nonaccidental connection (supervenience) between the chances and the pattern of outcomes, including outcome frequencies: chances and long-run frequencies don’t diverge because they can’t diverge. Plausible versions of this view require supervenience of chances at some time t on the total mosaic, not merely history prior to t (otherwise chances at very early times won’t reflect the laws of nature but only the vagaries of early history). Consider, then, a world in which the chance of heads in a fair coin toss is 1/2; the chance of 1 million heads in a row is $1/2^{(10^6)}$. By the Bcp, since this chance is positive, there must be some world w in which the chance of heads remains 1/2, but where this extraordinary run of heads occurs. But if the prior history contains few enough heads, the long run of heads will mean that the overall frequency of heads in w is very high: high enough that any plausible reductionist view will deny that the chance of heads in w can remain 1/2 while the pattern of outcomes is so skewed toward heads. 
So reductionists will deny the Bcp for at least some outcomes: those that, were they to occur, would undermine the actual chances (Ismael 1996; Lewis 1994; Thau 1994). Reductionists will need to offer some other formal rendering of the informal platitude. One candidate is the claim—considerably weaker than Bcp—that when the actual chance of p is positive at t, there must be a world perfectly alike in history up to t in which p.3
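To give a sense of the magnitudes in the coin example, here is a rough Python sketch. It works in base-10 logarithms because the chance of a million heads in a row is far too small for ordinary floating-point numbers to represent. The function names and the sample prior history (100 tosses, 50 of them heads) are hypothetical.

```python
import math

def log10_chance_all_heads(n_tosses, chance_heads=0.5):
    """Base-10 logarithm of the chance of heads on every one of n independent tosses."""
    return n_tosses * math.log10(chance_heads)

def overall_heads_frequency(prior_heads, prior_tosses, run_length):
    """Frequency of heads once an unbroken run of heads is appended to a prior history."""
    return (prior_heads + run_length) / (prior_tosses + run_length)

million = 10**6
log_chance = log10_chance_all_heads(million)        # roughly -301030
freq = overall_heads_frequency(50, 100, million)    # very close to 1
```

The chance is positive but has an exponent of roughly minus three hundred thousand, while the overall frequency of heads in such a world would be very close to 1. That is the kind of mismatch between frequency and chance that a reductionist cannot accept.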

## 2.3. Chance and Frequency

Chancy theories do not make all-or-nothing predictions about contingent outcomes. But they will make predictions about outcome frequencies. It makes no sense to talk about “the frequency” of an outcome because a frequency is a measure of how often a certain type of outcome occurs in a given trial population; different populations will give rise to different frequencies. The frequencies that matter—those that are predicted by chance theories—are those relative to populations that consist of genuine repetitions of the same experimental setup. The population should consist of what von Mises called “mass phenomena or repetitive events … in which either the same event repeats itself again and again, or a great number of uniform elements are involved at the same time” (von Mises 1957, 10–11). Von Mises himself held the radical view that chance simply reduced to frequency, but one needn’t adopt that position in order to endorse the very plausible claim that chance theories make firm predictions only about “practically unlimited sequences of uniform observations.”

How can a chance theory make predictions about outcome frequencies? Not in virtue of principles we’ve discussed already: although a probability function over possible outcomes might tell us something about the distribution of outcomes in different possible alternate worlds, it doesn’t tell us anything about the actual distribution of outcomes (van Fraassen 1989, 81–86). Because chances do predict frequencies, some additional principle is satisfied by well-behaved chances. One suggestion is the Stable Trial Principle (STP):4

If (i) A concerns the outcome of an experimental setup E at t, and (ii) B concerns the same outcome of a perfect repetition of E at a later time t′, then $P_{tw}(A)=x=P_{t'w}(B)$. The Stp predicts, for instance, that if one repeats a coin flip, the chance of heads should be the same on both trials.

(Schaffer 2003, 37)

If chances obey the STP for all possible outcomes of a given experimental setup, they will be identically distributed. Moreover, satisfaction of the STP normally involves the trials being independent (dependent trials may not exhibit stable chances because the chances vary with the outcomes of previous trials). So if, as Schaffer and others argue, Stp (or something near enough) is a basic truth about the kinds of chances that appear in our best theories, then the probabilities that feature in the laws of our best theories meet the conditions for the strong law of large numbers: that almost all (i.e., with probability 1) infinite series of trials exhibit a limit frequency for each outcome type identical to the chance of that outcome type.5

If a theory entails that repeated trials are stable in their chances (that the causal structure of the trials precludes “memory” of previous trials and that the chance setup can be more or less insulated from environmental variations), then the chance, according to that theory, of an infinite sequence of trials not exhibiting outcome frequencies that reflect the chances of those outcomes is zero. The converse to the Bcp (namely, that there should be a possible world in which p only if p has some positive chance) is false: it is possible that a fair coin, tossed infinitely many times, lands heads every time, even though the chance of that sequence is zero. But something weaker is very plausible (see the principle Hcp, discussed below): that if the chance of p being the outcome of a chance process is 1, then we ought to expect that p would result from that process. Given this, an infinite sequence of stable trials would be expected to result in an outcome sequence reflecting the chances.

So some chance theories make definite predictions about what kinds of frequencies would appear in an infinite sequence of outcomes. They make no definite prediction about what would happen in a finite sequence of outcomes. They will entail that the chance of a reasonably long sequence of trials exhibiting frequencies that are close to the chances is high and that the chance increases with the length of the sequence. But because that, too, is a claim about the chances, it doesn’t help us understand how chance theories make predictions. If the goals of science are prediction and explanation, we remain unclear on how chancy theories achieve these goals.
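The claim about finite sequences can be made concrete with a short Python sketch, assuming stable and independent trials so that the number of successes is binomially distributed. The function name, the fair-coin chance, and the tolerance of 0.1 are illustrative choices only.

```python
from math import comb

def chance_frequency_near(n, chance, tolerance):
    """Chance that the frequency in n stable, independent trials lies within
    `tolerance` of `chance`, computed exactly from the binomial distribution."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - chance) <= tolerance:
            total += comb(n, k) * chance**k * (1 - chance)**(n - k)
    return total

# Hypothetical setup: a fair coin, frequencies within 0.1 of 1/2.
results = {n: chance_frequency_near(n, 0.5, 0.1) for n in (10, 100, 1000)}
```

The values in `results` rise toward 1 as the number of trials grows. Each value is itself a chance, though, so the computation restates the puzzle rather than resolving it: a further principle is needed to get from a high chance to a prediction.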

Our actual practice with regard to this issue suggests that we implicitly accept some further supplementary principles. Possibility interacts with the goals of prediction and explanation. In particular: the more possible an outcome is, the more frequently outcomes of that type occur. If a theory assigns to p a high chance, it predicts that events relevantly like p will frequently occur when the opportunity for p arises. Chance, formalizing degree of possibility, then obeys these principles:

• HCE: If the chance of p is high according to T, and p, then T explains why p.6

• Hcp: If the chance of p is high according to T, then T predicts that p, or at least counsels us to expect that p.

These principles reflect our tacit opinion on how chance theories predict and explain the possible outcomes in their purview. So, if a probability measure in some theory is used in frequency predictions, that suggests the measure is a chance function. Similarly, if an outcome has occurred, and theoretical explanations of the outcome cite the high value assigned to the outcome by some probability measure, then that indicates the measure is functioning as a chance because citing a high chance of some outcome is generally explanatory of that outcome. So, chance theories will predict and explain observed frequencies so long as the series of repeated stable trials is sufficiently long for the theory to entail a high chance that the frequencies match the theoretical chances.

Emery (2015) argues that a basic characteristic of chance is that it explains frequency: a probability function is a chance function if high probability explains high frequency. Assuming that high frequency of an outcome type can explain why an instance of that type occurred, HCE is a consequence of her explanatory criterion for chance.

## 2.4. Classical and Propensity Chances

Two issues should be addressed before we see some chance theories that illustrate the themes of this section. The first concerns the modal aspect of chances. I’ve suggested that chance theories involve a space of possible outcomes and that chances measure possibilities. These are both core commitments of the classical theory of chance. But that theory adds a further element: namely, that “equally possible” outcomes should be assigned the same chance:

The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought.

(Laplace 1951, 6–7)

This further element is not now widely accepted as part of the theory of chance. It is presupposed in many textbook probability problems that do not explicitly specify a probability function: for example, we might be told the numbers of red and black balls in an urn and asked a question about the probability of drawing two red balls in a row with replacement. The only way to answer such a question is to assume that drawing each ball is a case “equally possible,” so that the facts about the number of different kinds of ball determine the probability. The assumption seems right in this case, although that appears to have more to do with the pragmatics of mathematics textbooks than with the correctness of the classical theory (the pragmatic maxim being that one ought to assume a uniform distribution over elementary outcomes by default and, accordingly, mention the distribution explicitly only when it is nonuniform over such outcomes). But the classical theory fails to accommodate any chance distribution in which elementary outcomes, atoms of the outcome space, are not given equal probability, such as those needed to model weighted dice. Since the space of outcomes for a weighted die is the same as for a fair die, chance cannot supervene on the structure of the space of outcomes in the way that the classical theory envisages. Although chance has a modal dimension, it is unlike alethic modals, where the modal status of a proposition as necessary, or possible, can be read straight off the structure of the space of possible outcomes. And, for that reason, the classical theory of chance is untenable. That is not to say that symmetries play no role in the theory of chance; it is clearly plausible in many cases that empirically detected symmetries in the chance setup and the space of outcomes play a role in justifying a uniform distribution (North 2010; Strevens 1998). 
But the fact that symmetrical outcomes, where they exist, can justify equal probabilities for those outcomes falls well short of entailing that they always exist or that they always justify equal probabilities.
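The contrast between the textbook urn and the weighted die can be sketched in Python. The urn calculation builds in the classical assumption that each ball is an equally possible case, while the dice share one outcome space and differ only in the distribution supplied. The urn contents and the die weights are hypothetical numbers chosen for illustration.

```python
from fractions import Fraction

def chance_two_reds_with_replacement(red, black):
    """Chance of two reds in two draws with replacement, ASSUMING each ball
    is equally likely to be drawn (the classical 'equally possible' cases)."""
    p_red = Fraction(red, red + black)
    return p_red * p_red

def chance_of_event(weights, event):
    """Chance of an event (a set of faces) under an explicitly given distribution."""
    return sum((weights[face] for face in event), Fraction(0))

# Hypothetical urn: 3 red balls and 7 black balls.
urn = chance_two_reds_with_replacement(3, 7)

# Hypothetical dice: the same six-face outcome space, two different distributions.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
weighted_die = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
                4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
```

Both dictionaries have the same keys, so nothing in the structure of the outcome space fixes whether the chance of a six is 1/6 or 1/2. The distribution has to come from somewhere else.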

The other issue I wish to take up here concerns the formal aspect of chances. I’ve assumed that chances must meet the probability axioms, and the pressing question on that assumption was how to discriminate chances from other mathematically similar quantities. But having picked on the connections between chance, possibility, and frequency to characterize the functional role of chance, a further question now arises: must something that plays this functional role be a probability?

Given that chances predict and explain outcome frequencies in repeated trials, an attractive thought is that they manage to do so by being grounded in some feature of the trials constituting a tendency for the setup involved to cause a certain outcome. It is standard to call such a tendency a propensity (Giere 1973; Popper 1959). The idea that chances reflect the strength of a causal tendency both promises to explain the modal character of chances—because if a chance setup has some tendency to produce an outcome, it is not surprising that the outcome should be a possible one for that setup—and also to provide illumination concerning other aspects of the chance role. The Stp, for example, turns out to be a direct consequence of requiring that repeated trials be alike in their causal structure.

This invocation of tendencies or propensities is implicitly conditional: a propensity for a certain outcome, given a prior cause. So it is unsurprising that many propensity accounts of chance take the fundamental notion to be conditional, recovering unconditional chances (if needed) in some other way (Popper 1959; see also Hájek 2003). The chance of complications from a disease could be understood as the chance of the complication given that someone has the disease; or, to put it another way, the propensity for the complication to arise from the disease. The idea of conditional probability antedates the explicit stipulative definition given earlier, and causal propensities seem promising as a way of explicating the pretheoretical notion.

But as Humphreys pointed out (1985; see also Milne 1986), grounding conditional chances in conditional propensities makes it difficult to understand the existence of nontrivial chances without such underlying propensities. Yet such chances are ubiquitous, given that chances are probabilities. The following theorem, Bayes’ theorem, is an elementary consequence of the definition of conditional probability: $P(q|p)=\frac{P(p|q)P(q)}{P(p)}$. We say that p is probabilistically independent of q if P(p|q) = P(p). It follows from this definition and Bayes’ theorem that p is probabilistically independent of q if and only if q is independent of p. If p is causally prior to q, then although there may be a propensity for p to produce q, there will generally be no propensity for q to produce p. If there is no propensity for q to produce p, then the chance of p given q should be just the unconditional chance of p because the presence of q makes no difference: p should be probabilistically independent of q. But it will then follow that q is independent of p, too, even though there is a propensity for p to produce q. Humphreys offers an illustrative example: “heavy cigarette smoking increases the propensity for lung cancer, whereas the presence of (undiscovered) lung cancer has no effect on the propensity to smoke” (Humphreys 1985, 559). In this case, causal dependence is not symmetric. There is a causal dependence of lung cancer on smoking: a propensity for smoking to produce lung cancer. This propensity grounds a chancy dependence of lung cancer on smoking. But there is no causal dependence of smoking on lung cancer; so, on the propensity view, there should not be a dependence in the chances. These two results are not compatible if chances are probabilities because probabilistic dependence is symmetrical:

a necessary condition for probability theory to provide the correct answer for conditional propensities is that any influence on the propensity which is present in one direction must also be present in the other. Yet it is just this symmetry which is lacking in most propensities.

(Humphreys 1985, 559)
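The symmetry of probabilistic dependence can be checked on a toy case. The following Python sketch assumes a made-up joint distribution over smoking and lung cancer; the numbers are purely illustrative and carry no empirical weight. It computes both conditional probabilities from the ratio definition and confirms that they are related by Bayes’ theorem.

```python
from fractions import Fraction

# Hypothetical joint distribution over (smokes, has_cancer); numbers are illustrative only.
joint = {
    (True, True): Fraction(3, 100),
    (True, False): Fraction(17, 100),
    (False, True): Fraction(1, 100),
    (False, False): Fraction(79, 100),
}

def prob(event):
    """Probability of the cells (smokes, has_cancer) that satisfy `event`."""
    return sum((p for cell, p in joint.items() if event(*cell)), Fraction(0))

def cond(event, given):
    """Ratio definition of conditional probability; requires prob(given) > 0."""
    return prob(lambda s, c: event(s, c) and given(s, c)) / prob(given)

smokes = lambda s, c: s
cancer = lambda s, c: c

p_cancer_given_smokes = cond(cancer, smokes)   # 3/20, above prob(cancer) = 1/25
p_smokes_given_cancer = cond(smokes, cancer)   # 3/4, above prob(smokes) = 1/5
bayes_rhs = p_cancer_given_smokes * prob(smokes) / prob(cancer)
```

Once the distribution makes cancer depend on smoking, smoking depends on cancer as well, whatever the direction of the underlying causal influence.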

Humphreys uses this observation in arguing that chances aren’t probabilistic because conditional chances are grounded in conditional propensities. But even conceding that some chances are grounded in propensities, many are not. Consider the conditional chance of rolling a 3 with a fair die, given that an odd number was rolled. There is no causal influence between rolling an odd number and rolling a three—whatever dependence exists is constitutive, not causal. These conditional chances exist without any causal propensity to ground them. Given this, the lack of causal dependence of smoking on lung cancer, and the corresponding absence of a propensity, cannot entail that there is no chance dependence between those outcomes. We could retain the role of causal tendencies in explaining certain conditional chances without violating the probability calculus. To do so, we must abandon the idea that every conditional chance should be grounded in a causal tendency. Retaining the formal condition that chances be probabilities certainly puts us in a better position to understand why it is probability functions that appear in our best scientific theories rather than whatever mathematical theory of “quasi-probabilities” might emerge from executing Humphreys’s program of refounding the theory of chances on causal propensities. (For an example of what such a theory might look like, see Fetzer 1981, 59–67.) Many successful theories presuppose that chances are probabilities, and dropping that presupposition involves us in a heroic revisionary project that we might, for practical reasons, wish to avoid if at all possible.

## 2.5. Examples of Chance in Scientific Theory

I’ve suggested that theories involve chances only when they involve a probability assignment to a theoretically possible outcome that is used by that theory in prediction and explanation of outcome frequencies. So understood, a number of current scientific theories involve chances. I will give three examples, all from physics: radioactive decay, quantum mechanics, and classical statistical mechanics. Importantly, not all are from fundamental physics, and the issues raised by the discussion of chance in statistical mechanics carry over to chances in other nonfundamental theories, such as chances in population genetics, in mutagenesis, and even in the familiar gambling devices of coins and dice. Indeed, Ismael (2009) argues that every testable theory, deterministic or otherwise, comes equipped with a probability function that enables the theory to make predictions from partial information and that such probability functions are perfectly objective features of the content of the theory. Given the role of chance in prediction, it would follow that chances should be commonplace in the empirical sciences.

Perhaps the simplest chance theory in physics is the theory of radioactive decay (van Fraassen 1991, 79–81). Some isotopes have an unstable combination of neutrons and protons in their nuclei. The instability in any given atom manifests in spontaneous and unpredictable transmutation of the radioisotope into a more stable isotope of another element (perhaps via some intermediate decay products) and the emission of particles and electromagnetic radiation. Although the decay of any particular nucleus is unpredictable, the chance of decay is very well-behaved. We can consider the chance setup as simply observing a given nucleus of some radioisotope for a second. The outcome space consists of the two possible outcomes during that second, decay or survive, such that the probability of decay in each 1 second trial is equal to a constant λ (which depends on which radioisotope is involved). Collections of nuclei, over longer time scales, are treated as ensembles of these simpler systems; in particular, the chance of decay is constant over time and identical for each nucleus.7 Thus, chances in the theory of radioactive decay are formally probabilities, stable over trials, and with the right connections to modality to facilitate prediction and explanation.

Because probability is expected truth value, the expected number of decays from N nuclei in 1 second is λN, and the expected “increase” in N over time is $\frac{dN}{dt}=-\lambda N$. Rearranging and integrating, we see that the expected number of nuclei remaining after t seconds is $N(t)=Ne^{-\lambda t}$. This equation can be used to derive the time t at which $N(t)=N/2$, the so-called half-life of the substance, which is $t_{1/2}=\frac{\ln 2}{\lambda}$. Since macroscopic quantities of radioactive elements consist of vast numbers of individual atoms, the chance that the observed value for the half-life is close to this expected value is very high; so (by Hcp and HCE), a given theory of radioactive decay for some isotope predicts and explains the half-life. That, in turn, might permit empirical observation of half-lives for isotopes to determine their respective chances of decay.
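The decay formulas can be sketched numerically in Python. The decay constant and sample size below are hypothetical values chosen only for illustration, and the spread calculation assumes that each nucleus is an independent trial, so that the number of survivors is binomially distributed.

```python
import math

def half_life(decay_constant):
    """t_half = ln(2) / lambda."""
    return math.log(2) / decay_constant

def expected_remaining(n_initial, decay_constant, t):
    """Expected number of undecayed nuclei after time t: N * exp(-lambda * t)."""
    return n_initial * math.exp(-decay_constant * t)

def relative_spread_of_survivors(n_initial, survival_chance):
    """Standard deviation of the number of survivors divided by its expected value,
    treating each nucleus as an independent trial (binomial distribution)."""
    mean = n_initial * survival_chance
    std = math.sqrt(n_initial * survival_chance * (1 - survival_chance))
    return std / mean

lam = 1e-3    # hypothetical decay constant, per second
n0 = 1e20     # hypothetical macroscopic sample size
t_half = half_life(lam)
spread = relative_spread_of_survivors(n0, 0.5)
```

With these numbers the half-life is about 693 seconds, and the relative spread in the number of survivors at that time is of order 1e-10. That tiny spread is what underwrites the claim that the observed half-life of a macroscopic sample has a very high chance of being close to the expected value.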

The theory of radioactive decay is a high-level theory: it describes the consequences of nuclear instability while remaining silent on the underlying mechanism. In fact, very different mechanisms give rise to the same exponential decay laws. In α decay, there is an underlying quantum mechanical explanation of why the decay chances take the form they do, although it is in turn irreducibly chancy.

One way of thinking about quantum mechanics is that the theory attributes to each system of particles a configuration space, each point of which corresponds to a complete specification of some possible spatial arrangement of those particles and such that every possible arrangement of particles corresponds to some point of the space (Albert 1992, 278; Ney 2013, 10ff; Rickles this volume, §3.1). Over this configuration space is defined a wave function, which behaves like a field, taking (complex) values at every point of the space. The fundamental equation of quantum mechanics, the Schrödinger equation, determines how this wave function evolves over time. The Schrödinger equation is deterministic; no probabilities are involved. But the wave function determines a probability function by the Born rule: the probability that the system will be found, when measured, in a particular configuration is equal to the square of the amplitude of the wave function at the corresponding point in configuration space. The probability that the system will be in a particular region of configuration space is obtained by integrating the squared amplitude of the wave function over every configuration in the region.8
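As a rough numerical illustration of the Born rule, the following Python sketch discretises a one-dimensional configuration space and approximates the integral of the squared amplitude over a region by a sum over grid cells. The Gaussian wave packet, the grid spacing, and the function names are assumptions made for the example, not features of any particular physical system.

```python
import cmath
import math

def normalise(amplitudes, dx):
    """Rescale complex amplitudes so that the squared magnitudes integrate to 1."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes) * dx)
    return [a / norm for a in amplitudes]

def born_probability(amplitudes, xs, dx, in_region):
    """Approximate the integral of |psi|^2 over the configurations in the region."""
    return sum(abs(a) ** 2 * dx for a, x in zip(amplitudes, xs) if in_region(x))

# Hypothetical one-particle, one-dimensional system:
# a Gaussian wave packet with a position-dependent complex phase.
dx = 0.01
xs = [(i + 0.5) * dx for i in range(-1000, 1000)]   # cell midpoints on [-10, 10]
psi = normalise([cmath.exp(-x * x / 2 + 2j * x) for x in xs], dx)

p_right = born_probability(psi, xs, dx, lambda x: x > 0)
p_total = born_probability(psi, xs, dx, lambda x: True)
```

The complex phase makes no difference to the result, since only the squared magnitude enters. The probability of the whole space comes out as 1, and by symmetry the probability of finding the particle to the right of the origin is 1/2.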

There is considerable controversy over how to interpret these outcomes: finding the system in a particular sort of configuration on measurement. One intuitively tempting picture is that the system really is in a particular configuration before measurement, and measurement simply reveals that configuration, so that the wave function is something like a measure of ignorance over so-called hidden variables. But hidden variable theories, at least those with orthodox causality, are not consistent with the experimental evidence.9 Any hidden variables theory in which the prior configuration is causally responsible for the observed outcomes will predict that the observed frequencies will reflect certain inequalities among probabilities: the Bell inequalities (Bell 1964). Orthodox quantum mechanics entails that the Bell inequalities are false for quantum chances and thus predicts that they will be violated in the observed frequencies. And this violation of the Bell inequalities is what we observe (van Fraassen 1991, 79–105; Shimony 2013).
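One widely used member of the Bell family is the CHSH inequality, which is not discussed in the text above and is introduced here only as an illustration. The Python sketch below assumes the standard textbook setup: two parties each choose between two measurement settings, outcomes are +1 or -1, and the quantum prediction for a singlet pair is a correlation of minus the cosine of the angle between the settings. It compares the largest value of the CHSH quantity attainable when outcomes are fixed in advance with the quantum value.

```python
import math
from itertools import product

def chsh(E):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b'),
    with the two settings on each side indexed 0 and 1."""
    return E(0, 0) - E(0, 1) + E(1, 0) + E(1, 1)

def max_local_deterministic_chsh():
    """Largest |S| over all assignments of fixed outcomes (+1 or -1) to each setting,
    i.e. hidden variables that settle every measurement result in advance."""
    best = 0
    for a0, a1, b0, b1 in product((-1, 1), repeat=4):
        A, B = (a0, a1), (b0, b1)
        best = max(best, abs(chsh(lambda i, j: A[i] * B[j])))
    return best

def singlet_correlation(angle_a, angle_b):
    """Quantum prediction for spin measurements on a singlet pair: E = -cos(a - b)."""
    return -math.cos(angle_a - angle_b)

# Standard textbook measurement angles (in radians) that maximise the violation.
alice = (0.0, math.pi / 2)
bob = (math.pi / 4, 3 * math.pi / 4)
quantum_s = abs(chsh(lambda i, j: singlet_correlation(alice[i], bob[j])))
```

Fixed assignments never exceed 2, and probabilistic mixtures of them cannot do better than the best assignment. The quantum value is 2√2, which is the sense in which the quantum chances violate the inequality.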

These results suggest that the complete state of a given quantum mechanical system at a time is given by its wave function in configuration space, not by the wave function plus a hidden variable. The relationship of the wave function to observable measurement outcomes is irreducibly probabilistic. Since the events we are interested in—those that constitute the macroscopic world of ordinary experience—are all measurement outcomes, the quantum theory is one that assigns chances to ordinary events and predicts outcome frequencies for repeated trials of measurement preparations.

This leaves many interpretative questions unanswered. Ought we to take the formalism literally and think of quantum mechanics as telling us that, ultimately, 3D space is illusory and only configuration space is real (Albert 1996; see also Wallace and Timpson 2010)? Or should we take the quantum state as basic and the wave function story as just a representation? As an example of this second approach, consider Wallace’s development of the Everett interpretation, on which every one of the multiplicity of configurations that are assigned a non-zero amplitude by the wave function is treated as a real way that macroscopic observables are arranged, so that the underlying quantum state grounds a derivative ontology of multiple observable “branches” (Wallace 2011). Both of these approaches treat the emergence of quantum chance as a byproduct of the mismatch between the quantum state and familiar macroscopic observables. Other approaches take these familiar observable quantities as fundamental and add new fundamental probabilistic laws to link them with the wave function. The most sober of such approaches, the GRW theory (Ghirardi, Rimini, and Weber 1986), postulates additional dynamic laws of the evolution of the wave function over time (in addition to the Schrödinger equation), so that—sometimes, at random—the wave function spontaneously “collapses” so that it takes the value zero everywhere outside of a very small region of configuration space, where the probability of collapse onto a particular region is proportionate to the squared amplitude of the wave function at that region. These collapses happen often enough that, when we do measure observable quantities, we are almost certain to find the system in some specific configuration (see also Bell 1987). (Less sober versions of the theory take the imposition of human consciousness in measurement to prompt collapses.)

Whatever the right approach to quantum mechanics, all involve the macroscopic character of the space of possible outcomes and provide a probability function over such outcomes dependent on the generating quantum system, which exhibits the appropriate connections with possibility and frequency that are hallmarks of chance.

Return now for a moment to radioactive decay. In α decay, an α particle is emitted from the decaying nucleus. The standard picture is that the nucleus decays first but that the α particle that is the product of that decay remains bound near the nucleus by the strong nuclear force. This gives rise to a wave function that has most of the amplitude concentrated on configurations in which the α particle remains close to the nucleus, but some amplitude on configurations that involve the α particle escaping the vicinity of the nucleus—being emitted, despite intuitively having too little energy to escape the nucleus, by “quantum tunneling.” It turns out that, given the right values for the energy of the particle and the strength of the strong nuclear force, this wave function determines the probability of escape to follow the exponential decay law given by the high-level theory (Greenstein and Zajonc 1997, 163–165).

The final chance theory I wish to discuss is classical statistical mechanics (Albert 2000, 35–70; Sklar 1993). This theory provides a probabilistic foundation for thermodynamic phenomena—ice melting, the behavior of gases in containers under varying temperature and pressure; in general, the movement of heat from one system to another. The basic idea is to treat thermodynamic systems as aggregates of particles governed by classical Newtonian physics. The state of a system at a time is thus given by specifying the position and momentum of each particle; accordingly, we can, as in quantum mechanical configuration space, treat a state of an N particle system as a point in a 6N dimensional state space called phase space. The space of possible outcomes is the algebra based on phase space; in effect, a possible outcome is an arbitrary subset of phase space. Importantly, the thermodynamic properties of a system—its temperature, pressure, and the like—correspond to regions of phase space that include many specific microscopic states. (If temperature is something like mean kinetic energy of the aggregate of particles, there are obviously lots of ways of positioning the particles and attributing momentum to them resulting in the same mean energy.)

To get probabilities involved, we add a measure over this space of possible outcomes: the Liouville measure μ, which is effectively just an ordinary volume measure. This is not a probability measure because (given that there is no upper bound on velocity in classical physics) the phase space volume is infinite. But it determines a probability measure for each measurable subset K of the phase space: if p is some possible outcome, then the probability of p relative to K, $P_K(p)$, is the ratio $\mu(p \cap K)/\mu(K)$, assuming $0 < \mu(K) < \infty$. This measure is uniform for each K, assigning equal probabilities to subregions of K of equal volume (Meacham 2005, 285).
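The relative probability just defined can be sketched numerically for a toy case. The example below assumes a single particle in one dimension, so that phase space is a plane of position and momentum pairs, and approximates the Liouville measure by counting equal-sized cells. The harmonic oscillator energy function, the energy band chosen for K, and all names are illustrative assumptions.

```python
# Toy phase space for one particle in one dimension: points (q, mom),
# represented by the centres of equal-volume cells covering [-3, 3] x [-3, 3].
STEPS = 300
CELL = 3.0 / STEPS
cells = [((i + 0.5) * CELL, (j + 0.5) * CELL)
         for i in range(-STEPS, STEPS)
         for j in range(-STEPS, STEPS)]

def mu(region):
    """Approximate volume measure of a region given as a predicate on (q, mom)."""
    return sum(1 for q, mom in cells if region(q, mom)) * CELL * CELL

def relative_probability(outcome, K):
    """P_K(outcome) = mu(outcome ∩ K) / mu(K), defined only when mu(K) > 0."""
    mu_K = mu(K)
    if mu_K <= 0:
        raise ValueError("K must have positive volume")
    return mu(lambda q, mom: outcome(q, mom) and K(q, mom)) / mu_K

def energy(q, mom):
    """Assumed Hamiltonian: a unit-mass, unit-frequency harmonic oscillator."""
    return 0.5 * (mom * mom + q * q)

def K(q, mom):
    """Background condition: energy lies in a narrow band."""
    return 1.0 <= energy(q, mom) <= 1.5

def right_half(q, mom):
    return q > 0

def first_quadrant(q, mom):
    return q > 0 and mom > 0

def third_quadrant(q, mom):
    return q < 0 and mom < 0

p_right = relative_probability(right_half, K)
p_first = relative_probability(first_quadrant, K)
p_third = relative_probability(third_quadrant, K)
```

Because the two quadrants cut out subregions of K of equal volume, they receive equal relative probability, which is the uniformity property noted above.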

A statistical mechanical system has a state that evolves over time in accordance with classical Newtonian laws and thus traces a certain trajectory through phase space. One consequence of those laws is that the energy of an isolated system is conserved. So, whatever the trajectory of an isolated system through phase space looks like, it will remain confined to a region where the total energy for the system is constant. Let’s call this constant energy region K, and we can accordingly assign probabilities to the possible outcomes for this isolated system relative to K. The theory makes predictions about possible outcomes by making the following assumption: the probability that a system in K will give rise to a certain outcome p is just $P_K(p)$; that is, the probability of finding the system in a certain region of phase space is just the relative volume of that region relative to the background assumption that the system satisfies K.10

The appeal of the theory comes from its treatment of thermodynamics and, in particular, the second law of thermodynamics: that thermodynamic systems in isolation become more macroscopically disorderly. The explanation of this law offered by statistical mechanics is this: any macrocondition of a system corresponds to a region of phase space. A disorderly, or random, macrocondition is one that is compatible with more underlying states: intuitively, there are more random ways to distribute particle position and velocity than orderly ones, consistent with a given total energy (Albert 2000, 50–51).11 Given the assumptions about probability, it then turns out that the probability for a closed system being in a disorderly state—a state of high entropy—is much greater than the probability that the system will be in an orderly state. Interpreted dynamically, a system started in any state, confined to a region of given energy, is vastly more likely to eventually be in a disorderly macrocondition than an orderly one. And so the observable evidence confirming the thermodynamic law is grounded in the near certainty that it obtains at any given moment (Albert 2000, 55–60).12
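The claim that disorderly macroconditions are compatible with vastly more underlying states can be illustrated with a deliberately simplified counting model. The sketch assumes n distinguishable particles, each equally likely to sit in the left or right half of a box, and takes a macrocondition to be the number of particles on the left. This coarse model, and the names used, are assumptions for illustration and not the phase space treatment of the works cited.

```python
from fractions import Fraction
from math import comb

def macrostate_probability(n, k):
    """Probability that exactly k of n particles are in the left half."""
    return Fraction(comb(n, k), 2 ** n)

def prob_near_even(n, tolerance_percent=10):
    """Probability that the left-half count is within the tolerance of n/2."""
    favourable = sum(
        comb(n, k)
        for k in range(n + 1)
        if 100 * abs(2 * k - n) <= 2 * tolerance_percent * n
    )
    return Fraction(favourable, 2 ** n)

p_all_left_10 = macrostate_probability(10, 10)   # a highly ordered macrocondition
p_even_10 = prob_near_even(10)                   # roughly even split, 10 particles
p_even_1000 = prob_near_even(1000)               # roughly even split, 1000 particles
```

Even at a thousand particles the roughly even split is all but certain, and realistic systems contain enormously more particles than that.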

So far, these relative probabilities look a lot like chances. They are relativized to a particular specification of the system, of course, but all of the chance functions I’ve discussed have had that feature, although not as explicitly (Hájek 2007). But they are defined over a class of possible outcomes and linked to possibility and frequency. The link to frequency is in fact built in; it is stipulated to be part of the application of the theory that relative probabilities are a good guide to frequency or at least expected macrocondition. And the assumption that at least some relative probabilities are chances—those relative to macroconditions that specify a well-behaved chance setup (Eagle 2014, 149–154) or those relative to macroconditions that explain macrofrequencies (Emery 2015)—enables us to understand why statistical mechanics explains and predicts thermodynamic phenomena in the same way that quantum mechanics explains α decay. The explanatory role of these probabilities ensures that they are chances rather than some other sort of probability. The chance status of relative probabilities is particularly clear because, as just noted, statistical mechanics doesn’t just explain outcome frequencies: it explains thermodynamic laws. And while thermodynamic “generalisations may not be fundamental … nonetheless they satisfy the usual requirements for being laws: they support counterfactuals, are used for reliable predictions, and are confirmed by and explain their instances” (Loewer 2001, 611–612). The statistical mechanical explanation of thermodynamics doesn’t succeed without invoking probability, and those probabilities must therefore be chances.

## 2.6. The Question of Determinism

In the previous section, we saw chances arise in two different ways: either from fundamental indeterministic dynamics, as in the GRW approach to quantum mechanics, or derived from an irreducibly probabilistic relationship between the underlying physical state and the observable outcomes, as in statistical mechanics and the Everett interpretation. But many have argued that this second route does not produce genuine chances and that indeterministic dynamical laws are the only way for chance to get into physics:

To the question of how chance can be reconciled with determinism … my answer is: it can’t be done…. There is no chance without chance. If our world is deterministic there are no chances in it, save chances of zero and one. Likewise if our world somehow contains deterministic enclaves, there are no chances in those enclaves.

(Lewis 1986, 118–120)

There is a puzzle here (Loewer 2001, §1). For it does seem plausible that if a system is, fundamentally, in some particular state, and it is determined by the laws to end up in some future state, then it really has no chance of ending up in any other state. And this supports the idea that nontrivial chances require indeterministic laws. On the other hand, the predictive and explanatory success of classical statistical mechanics, and indeed of deterministic versions of quantum mechanics, suggests that the probabilities featuring in those theories are chances.

This puzzle has recently been taken up by a number of philosophers. Some have tried to bolster Lewis’s incompatibilist position (Schaffer 2007), but more have defended the possibility of deterministic chance. A large number have picked up on the sort of considerations offered in the last section and argue that because probabilities in theories like statistical mechanics behave in the right sort of way, they are chances (Emery 2015; Loewer 2001). Often, these sorts of views depend on a sort of “level autonomy” thesis, that the probabilities of nonfundamental theories are nevertheless chances because the nonfundamental theories are themselves, to a certain degree, independent of the underlying fundamental physics and so cannot be trumped by underlying determinism (Glynn 2010; List and Pivato 2015; Sober 2010).

This doesn’t seem like it will help very much with the “probability problem” for Everettian quantum mechanics (Greaves 2007), where we are not dealing with theories at different levels but with the probabilistic predictions of the fundamental theory itself. However, most approaches to Everettian quantum mechanics do treat branches—observable histories of determinate measurement outcomes—as emergent from the underlying dynamics. Since it is events in branches that are the bearers of probability, these theories do, in effect, claim that branch outcomes are autonomous from the underlying quantum state. What this autonomy consists in is rather unclear. Some argue that since what observers have access to is the occurrences on their local branch, but that information isn’t sufficient to determine future outcomes on that branch, there is local chance despite the global determinism of the underlying state (Ismael 2003). This makes Everettian chance rather like statistical mechanical chance, in which the macroinformation doesn’t suffice to determine future macro-outcomes. Others argue that the probabilities don’t attach to ordinary outcomes but rather to a certain sort of self-locating proposition, such as that I see a certain measurement outcome, which is evaluable only at a branch (Saunders and Wallace 2008), but which is essential because it is the kind of proposition that opens the theory up to empirical test in the first place (recall Ismael 2009).

It is difficult, however, to justify the autonomy of higher level theories because every occurrence supposedly confirming them supervenes on occurrences completely explicable in fundamental theories. If so, the underlying determinism will apparently ensure that only one course of events can happen consistent with the truth of the fundamental laws and that therefore nontrivial probabilities in higher level theories won’t correspond to genuine possibilities. If chance is connected with real possibility, these probabilities are not chances: they are merely linked to epistemic possibility, consistent with what we know of a system. In responding to this argument, the crucial question is: is it possible for determinism to be true and yet more than one outcome be genuinely possible? The context-sensitivity of English modals like “can” and “possibly” allows for a sentence like “the coin can land heads” to express a true proposition even while the coin is determined to land tails, as long as the latter fact isn’t contextually salient to the speaker (Kratzer 1977). We may use this observation, and exploit the connection between chance ascriptions and possibility ascriptions, to resist this sort of argument for incompatibilism (Eagle 2011b) and continue to regard objective and explanatory probabilities in deterministic physics as chances.

# 3. Probabilities of Theories

Many claims in science concern the evaluation, in light of the evidence, of different theories or hypotheses: that general relativity is better supported by the evidence than its nonrelativistic rivals, that relative rates of lung cancer in smokers and nonsmokers confirm the hypothesis that smoking causes cancer, or that the available evidence isn’t decisive between theories that propose natural selection to be the main driver of population genetics at the molecular level and those that propose that random drift is more important. Claims about whether evidence confirms, supports, or is decisive between hypotheses are apparently synonymous with claims about how probable these hypotheses are in light of the evidence.13 How should we understand this use of probability?

One proposal, due to Carnap, is to invoke a special notion of “inductive probability” in addition to chance to explain the probabilities of theories (Carnap 1955, 318). His inductive probability has a “purely logical nature”—the inductive probability ascribed to a hypothesis with respect to some evidence is entirely fixed by the formal syntax of the hypothesis and the evidence. Two difficulties with Carnap’s suggestion present themselves. First, there are many ways of assigning probabilities to hypotheses relative to evidence in virtue of their form that meet Carnap’s desiderata for inductive probabilities, and no one way distinguishes itself as uniquely appropriate for understanding evidential support. Second, and more importantly, Carnap’s proposal leaves the cognitive role of confirmation obscure. Why should we care that a theory has high inductive probability in light of the evidence, unless having high inductive probability is linked in some way to the credibility of the hypothesis?

Particularly in light of this second difficulty, it is natural to propose that the kind of probability involved in confirmation must be a sort that is directly implicated in how credible or believable the hypothesis is relative to the evidence. Indeed, we may go further: the hypothesis we ought to accept outright—the one we ought to make use of in prediction and explanation—is the one that, given the evidence we now possess, we are most confident in. Confirmation should therefore reflect confidence, and we ought to understand confirmation as implicitly invoking probabilistic levels of confidence: what we might call credences.

## 3.1. Credences

A credence function is a probability function defined over a space of propositions: the belief space. Unlike the case of chance, propositions in the belief space are not possible outcomes of some experimental trial. They are rather the possible objects of belief; propositions that, prior to inquiry, might turn out to be true. An agent’s credence at a time reflects their level of confidence then in the truth of each proposition in the belief space.

There is no reason to think that every proposition is in anyone’s belief space. But since a belief space has a probability function defined over it, it must meet the algebraic conditions on outcome spaces and must therefore be closed under negation and disjunction. This ensures that many unusual propositions—arbitrary logical compounds of propositions the agent has some confidence in—are the objects of belief. It also follows from the logical structure of the underlying algebra that logically equivalent propositions are the same proposition. The laws of probability ensure that an agent must be maximally confident in that trivial proposition which is entailed by every other. Since every propositional tautology expresses a proposition entailed by every other, every agent with a credence function is certain of all logical truths. (Note: they need not be certain of a given sentence that it expresses a tautology, but they must be certain of the tautology expressed.) These features make having a credence function quite demanding. Is there any reason to think that ordinary scientists have them, that they may be involved in confirmation?

The standard approach to this question certainly accepts that it is psychologically implausible to think that agents explicitly represent an entire credence function over an algebra of propositions. But a credence function need not be explicitly represented in the brain to be the right way to characterize an agent’s belief state. If the agent behaves in a way that can be best rationalized on the basis of particular credences, that is reason to attribute those credences to her. Two sorts of argument in the literature have this sort of structure. One, given mostly by psychologists, involves constructing empirical psychological models of cognitive phenomena, which involves assigning probabilistic degrees of belief to agents. The success of those models then provides evidence that thinkers really do have credences as part of their psychological state (Perfors 2012; Perfors et al. 2011).

## 3.2. Credences from Practical Rationality

The second sort of argument, given mostly by philosophers and decision theorists, proposes that certain assumptions about rationality entail that, if an agent has degrees of confidence at all, these must be credences. The rationality assumptions usually invoked are those about rational preference (although recent arguments exist that aim to avoid appealing to practical rationality; see Joyce 1998). If an agent has preferences between options that satisfy certain conditions, then these preferences can be represented as maximizing subjective expected utility, a quantity derived from a credence function and an assignment of values to outcomes. Any such representation theorem needs to be supplemented by an additional argument to the effect that alternative representations are unavailable or inferior, thus making it plausible that an agent really has a credence function when they can be represented as having a credence function. These additional arguments are controversial (Hájek 2008; Zynda 2000).
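To make the notion of subjective expected utility concrete, here is a minimal sketch in which the expected utility of an act is the credence-weighted sum of the values of its outcomes. The states, acts, credences, and utilities are invented for illustration, and the function name is hypothetical.

```python
def expected_utility(credence, utilities):
    """Credence-weighted sum of the utilities an act yields in each state."""
    return sum(credence[state] * utilities[state] for state in credence)

# Assumed credences over two mutually exclusive, jointly exhaustive states.
credence = {"rain": 0.3, "dry": 0.7}

# Assumed utilities of each act in each state.
acts = {
    "take umbrella": {"rain": 5, "dry": -1},
    "leave umbrella": {"rain": -10, "dry": 0},
}

scores = {act: expected_utility(credence, utils) for act, utils in acts.items()}
best_act = max(scores, key=scores.get)
```

A representation theorem runs in the opposite direction: it starts from an agent's preferences among acts and shows that, if those preferences satisfy the relevant conditions, some credence and utility assignment of this kind reproduces them.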

There are many representation theorems in the literature, each differing in the conditions it imposes on rational preference (Buchak 2013; Jeffrey 1983; Maher 1993; Savage 1954). Perhaps the simplest is the Dutch book argument (de Finetti 1937; Ramsey 1926). The argument has two stages: first, show that one can use an agent’s preferences between options of a very special kind to assign numbers to propositions that reflect degrees of confidence; and, second, show that those numbers must have the structure of probabilities on pain of practical irrationality.

The first stage of the argument involves showing that betting preferences can be used to assign numbers to credences. A bet exists between A and B over p when A pays $S_A$ to win S if p, and B pays $S_B$ to win S if ¬p, where $S = S_A + S_B$. A bet is fair for A if A has no preference between wagering $S_A$ on p and wagering $S - S_A$ against p; that is, a fair bet is one in which the agent doesn’t care which side of the bet they are on. A fair bet is the least non-unfavorable bet. Ramsey notes that an agent

will take a bet at any better odds than those corresponding to his state of belief; in fact, his state of belief is measured by the odds he will just take.

(Ramsey 1926, 72)

That is, A’s credence in a proposition is their fair betting rate, $S_A/S$. We can evaluate fair betting rates for hypothetical bets by looking at A’s dispositions to bet and use them to establish credences in arbitrary propositions in the belief space.

The second stage of the argument is that fair betting rates—for agents whose preferences between bets are rational—are probabilities. This is shown by demonstrating that nonprobabilistic fair betting rates commit agents to the acceptability of packages of simultaneous bets that collectively guarantee a sure loss—a Dutch book—which is patently irrational assuming that the agent desires money and that the bets are small enough that risk aversion plays no role.14 Here’s an example. Suppose A’s credence—their fair betting rate—in a tautology ⊤ was less than 1. So A is indifferent between paying 1−ε to receive $1 if ⊤ and paying ε>0 to receive $1 if ¬⊤. Since ⊤ will eventuate, A in fact regards a bet that is guaranteed to ensure that A is worse off by ε in the end as fair, so A is irrational. Only if A’s fair betting rate in ⊤ is 1 do they avoid a sure loss, just as the axioms of probability require.
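The same style of reasoning extends beyond tautologies. The sketch below considers a different, equally standard case: betting rates on p and on ¬p that do not sum to 1. The function names and the particular rates are assumptions for illustration. Exact fractions are used so that the sure loss is not blurred by rounding.

```python
from fractions import Fraction

def bet_payoff(rate, stake, wins, buying):
    """Agent's net payoff from one bet made at the agent's own fair betting rate.

    Buying: pay rate * stake up front and receive stake if the proposition is true.
    Selling: the mirror image of buying.
    """
    payoff = (stake if wins else 0) - rate * stake
    return payoff if buying else -payoff

def book_against(rate_p, rate_not_p, stake=1):
    """Agent's net payoff in each possible world from a book on p and not-p."""
    # If the rates sum to more than 1, the agent is offered both bets to buy;
    # otherwise the agent is asked to sell both.
    buying = rate_p + rate_not_p > 1
    results = {}
    for world, p_true in (("p", True), ("not p", False)):
        results[world] = (bet_payoff(rate_p, stake, p_true, buying)
                          + bet_payoff(rate_not_p, stake, not p_true, buying))
    return results

incoherent_high = book_against(Fraction(3, 5), Fraction(3, 5))   # rates sum to 6/5
incoherent_low = book_against(Fraction(3, 10), Fraction(1, 2))   # rates sum to 4/5
coherent = book_against(Fraction(3, 10), Fraction(7, 10))        # rates sum to 1
```

In both incoherent cases the agent loses the same amount whichever way p turns out, while the coherent rates leave no such guaranteed loss from this pair of bets.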

The actual bets are irrelevant. Ramsey suggests the fundamental problem is the irrationality of inconsistent valuation of equivalent options. Having a degree of belief susceptible to a Dutch book

would be inconsistent in the sense that it violated the laws of [rational] preference between options … if anyone’s mental condition violated these laws, his choice would depend on the precise form in which the options were offered him, which would be absurd.

(Ramsey 1926, 78)

The idea is this: no one regards as fair a bet that cost x and paid back x−ε for certain. This is simply to value x at less than x, and doing that would be irrational. But the Dutch book argument shows that probability-violating degrees of belief generate books of subjectively fair bets tantamount to this irrational bet. If someone takes the book, but not the irrational bet, they are taking the presentation of equivalent options to make a relevant difference, which is also irrational.

The Dutch book argument, like other representation theorems, is suggestive but far from conclusive. But even philosophers can avail themselves of the psychological style of argument for credences, arguing that because approaching epistemic rationality via credences allows the best formal systematization of epistemology and its relation to rational action, that is reason to suppose that rational agents have credences:

A remarkably simple theory—in essence, three axioms that you can teach a child—achieves tremendous strength in unifying our epistemological intuitions. Rather than cobbling together a series of local theories tailored for a series of local problems—say, one for the grue paradox, one for the raven paradox, and so on—a single theory in one fell swoop addresses them all. While we’re at it, the same theory also undergirds our best account of rational decision-making. These very successes, in turn, provide us with an argument for probabilism: our best theory of rational credences says that they obey the probability calculus, and that is a reason to think that they do.

(Eriksson and Hájek 2007, 211)

## 3.3. Bayesian Confirmation Theory

Suppose this all works and that scientists have credences. The basic postulate of Bayesian confirmation theory is that evidence e confirms hypothesis h for A if A’s credence in h given e is higher than their unconditional credence in h: $C_A(h \mid e) > C_A(h)$. That is, some evidence confirms a hypothesis just in case the scientist’s confidence in the hypothesis, given the evidence, is higher than in the absence of the evidence. Note that, by Bayes’ theorem, e confirms h if $C_A(e \mid h) > C_A(e)$: if the evidence is more likely given the truth of the hypothesis than otherwise. In answering the question of what scientists should believe about hypotheses and how those beliefs would look given various pieces of evidence, we do not need a separate “logic of confirmation”: we should look to conditional credences in hypotheses on evidence.
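The two formulations of the confirmation condition can be checked against one another on a small example. The sketch below represents a credence function by its values on the four combinations of h and e, with numbers invented for illustration, and then verifies over a grid of such assignments that the two comparisons always agree. The names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def confirmation_tests(joint):
    """Return (C(h|e) > C(h), C(e|h) > C(e)) for a joint credence over h and e.

    `joint` maps (h_true, e_true) to a credence; all four values are assumed
    positive and to sum to 1.
    """
    c_h = joint[(True, True)] + joint[(True, False)]
    c_e = joint[(True, True)] + joint[(False, True)]
    c_h_given_e = joint[(True, True)] / c_e
    c_e_given_h = joint[(True, True)] / c_h
    return c_h_given_e > c_h, c_e_given_h > c_e

# Illustrative credences: C(h) = 2/5, C(e) = 1/2, C(h and e) = 3/10.
example = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}
posterior_test, likelihood_test = confirmation_tests(example)

# Check that the two tests agree for every assignment in tenths.
always_agree = True
for a, b, c in product(range(1, 8), repeat=3):
    d = 10 - a - b - c
    if d < 1:
        continue
    joint = {
        (True, True): Fraction(a, 10),
        (True, False): Fraction(b, 10),
        (False, True): Fraction(c, 10),
        (False, False): Fraction(d, 10),
    }
    first, second = confirmation_tests(joint)
    always_agree = always_agree and (first == second)
```

Both comparisons reduce to the single condition that the credence in the conjunction of h and e exceeds the product of the credences in h and in e, which is why they cannot come apart.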

This is overtly agent-sensitive. Insofar as confirmation is about the regulation of individual belief, that seems right. But although it seems plausible as a sufficient condition on confirmation—one should be more confident in the hypothesis given evidence confirming it—many deny its plausibility as a necessary condition. Why should it be that evidence can confirm only those hypotheses that have their credence level boosted by the evidence? For example, couldn’t it be that evidence confirms a hypothesis of which I am already certain (Glymour 1981)?

Some Bayesians respond by adding additional constraints of rationality to the framework. Credences of a rational agent are not merely probabilities: they are probabilities that also meet some further condition. It is fair to say that explicit constructions of such further conditions have not persuaded many. Most have made use of the classical principle of indifference (see Section 2.4), but such principles are overly sensitive to how the hypothesis is presented, delivering different verdicts for equivalent problems (Milne 1983; van Fraassen 1989). It does not follow from the failure of these explicit constructions that anything goes: it just might be that, even though only some credences really reflect rational evaluations of the bearing of evidence on hypotheses, there is no recipe for constructing such credences without reference to the content of the hypotheses under consideration. For example, it might be that the rational credence function to use in confirmation theory is one that assigns to each hypothesis a number that reflects its “intrinsic plausibility … prior to investigation” (Williamson 2000). And it might also be that, when the principle of indifference is conceived as involving epistemic judgments in the evaluation of which cases are “equally possible,” which give rise to further epistemic judgments about equal credences, it is far weaker and less objectionable (White 2009).

Moreover, there are problems faced by “anything goes” Bayesians in explaining why agents who disagree in credences despite sharing all their evidence shouldn’t just suspend judgment about the credences they ought to have (White 2005). The reasonable thing to do in a case like that, it’s suggested, is for the disagreeing agents to converge on the same credences, adopt indeterminate credences, or do something like that—something that involves all rational agents responding in the same way to a given piece of evidence. But how can Bayesians who permit many rational responses to evidence explain this? This takes a more pressing form, too, given the flexibility of Bayesian methods in encompassing arbitrary credences: why is it uniquely rational to follow the scientific method (Glymour 1981)? The scientific method is a body of recommended practices designed to ensure reliable hypothesis acceptance (e.g., prefer evidence from diverse sources as more confirmatory or avoid ad hoc hypotheses as less confirmatory). The maxims of the scientific method summarize techniques for ensuring good confirmation, but if confirmation is dependent on individual credence, how can there be one single scientific method? The most plausible response for the subjective Bayesian is to accept the theoretical possibility of a plurality of rational methods but to argue that current scientific training and enculturation in effect ensure that the credences of individual scientists do by and large respond to evidence in the same way. If the scientific method can be captured by some constraints on credences, and those constraints are widely endorsed, and there is considerable benefit to being in line with community opinion on confirmation (as there is in actual scientific communities), that is a prudential reason for budding scientists to adopt credences meeting those constraints.

## 3.4. Successes of the Bayesian Approach

Whether we think that scientists have common views about how to respond to evidence as a matter of a priori rationality or peer pressure doesn’t matter. What ultimately matters is whether the Bayesian story about confirmation delivers plausible cases where, either from the structure of credences or plausible assumptions about “natural” or widely shared priors, we can derive that the conditional credence in h given e will exceed the unconditional credence if and only if e intuitively confirms h. Howson and Urbach (1993, 117–164) go through a number of examples showing that, in each case, there is a natural Bayesian motivation for standard principles of scientific methodology. I will consider two cases: the evidential value of diverse evidence and the so-called “paradox of confirmation.” The upshot is that Bayesian confirmation theory, based on the theory of credences, provides a systematic framework in which proposed norms of scientific reason can be formulated and evaluated and that vindicates just those norms that do seem to govern actual scientific practice when supplied with the kinds of priors it is plausible to suppose working scientists have.

### 3.4.1. Diverse Evidence

It appears to be a methodological rule that, other things being equal, the more diverse the sources of evidence for one’s theory, the more strongly confirmed that theory is. This can be captured in this maxim: A theory that makes predictions in a number of disparate and seemingly unconnected areas is more confirmed by that evidence than is a theory that is confirmed by predictions only about a narrow and circumscribed range. There is a Bayesian rationale for this rule (Earman 1992, §3.5; Horwich 1982, 118–122). The Bayesian insight is that diverse evidence is not internally correlated, which will be reflected in any reasonable prior C. If the evidence is diverse, it consists of at least two propositions, $e_1$ and $e_2$, such that the truth of one is not positively relevant to the truth of the other if the hypothesis in question is false. So $e_1$ and $e_2$ are diverse relative to h if and only if the likelihood $C(e_1 \wedge e_2 \mid \neg h)$ is low or at least if it is not greater than the product of the individual credences $C(e_1 \mid \neg h)\,C(e_2 \mid \neg h)$. If, for example, the hypothesis is that all swans are white, then swans collected from different countries would, if white, provide better evidence for the hypothesis than would swans collected from the same pond, because we know that a swan on a pond is likely to be related to the other swans on that pond, so that if one of them is white, the others are more likely to be white too.

Define the Bayes’ factor of ¬h against h: $\beta(\neg h : h) =_{\mathrm{df}} \frac{C(e \mid \neg h)}{C(e \mid h)}$. Using this notion, we can express Bayes’ theorem (see Section 2.4) in this useful form: $C(h \mid e) = \frac{C(h)}{C(h) + \beta(\neg h : h)\,C(\neg h)}$. If the hypothesis h predicts both $e_1$ and $e_2$, then the likelihood $C(e_1 \wedge e_2 \mid h)$ is high (close to 1). So the Bayes’ factor $\beta(\neg h : h)$ is approximately equal to its numerator, $C(e_1 \wedge e_2 \mid \neg h)$. Substituting this into the reformulated Bayes’ theorem, we get

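$$C(h \mid e_1 \wedge e_2) \approx \frac{C(h)}{C(h) + C(e_1 \wedge e_2 \mid \neg h)\,C(\neg h)} \geq \frac{C(h)}{C(h) + C(e_1 \mid \neg h)\,C(e_2 \mid \neg h)\,C(\neg h)}.$$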

The last inequality follows from the diversity of the evidence. So much is just the probability calculus. In practical cases, we are interested in evidence that is diverse in such a way as to ensure that, if the hypothesis is false, at least one of the pieces of evidence is unlikely. It would be surprising, given that not all swans are white, if arbitrary swans taken from diverse locations were all likely to be white. There is an additional assumption on the priors in this case: that at least one $C(e_i \mid \neg h)$ is low and perhaps both are low. Given that, $C(h) + C(e_1 \mid \neg h)\,C(e_2 \mid \neg h)\,C(\neg h) \approx C(h)$, so that $C(h \mid e) \approx 1$. Even if h was not antecedently plausible, diverse evidence of this form strongly confirms it.
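A minimal numerical sketch of this argument may help. The prior and the likelihoods below are illustrative assumptions, not figures from the text, and the function name `posterior` is hypothetical. The sketch compares evidence that is independent given ¬h with evidence that is strongly correlated given ¬h.

```python
def posterior(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Bayes' theorem in Bayes-factor form: C(h|e) = C(h) / (C(h) + beta * C(not-h))."""
    beta = lik_e_given_not_h / lik_e_given_h  # Bayes factor of not-h against h
    return prior_h / (prior_h + beta * (1 - prior_h))

prior = 0.1   # assumed: h is not antecedently plausible
lik_h = 1.0   # h predicts both e1 and e2

# Diverse evidence: given not-h, e1 and e2 are independent and each is unlikely,
# so C(e1 & e2 | not-h) = C(e1 | not-h) * C(e2 | not-h).
diverse = posterior(prior, lik_h, 0.05 * 0.05)

# Narrow evidence: given not-h, e2 is very likely once e1 holds,
# so the joint likelihood is close to C(e1 | not-h) alone.
narrow = posterior(prior, lik_h, 0.05 * 0.9)

print(round(diverse, 3), round(narrow, 3))
```

With these assumed numbers the diverse evidence raises the credence in h from 0.1 to roughly 0.98, whereas the correlated evidence raises it only to roughly 0.71.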

### 3.4.2. The Paradox of Confirmation

Hempel’s (1945) paradox of confirmation starts with two intuitive principles about confirmation:

1. Nicod’s Condition: A generalization of the form $\forall x\,\varphi$ is confirmed by any instance $\varphi[a/x]$.

2. Equivalence: Logically equivalent hypotheses are confirmed by the same evidence.

The paradox is this: the observation of a particular white piece of paper yields evidence amounting to an instance of the generalization all non-black things are non-ravens. By Nicod’s condition, that evidence confirms the generalization. This generalization is logically equivalent to the hypothesis that all ravens are black; hence, that latter hypothesis is also confirmed by the observation by the Equivalence principle. But—intuitively—one cannot confirm the ornithological hypotheses by armchair investigation of the contents of one’s filing cabinet. It’s a paradox because the conditions seem true, but the conclusion that validly follows from them does not.

The paradox can be resolved without recourse to Bayesian technology. Nicod’s condition is not always true. The neatest counterexample is Rosenkrantz’s:

Three people leave a party, each with a hat. The hypothesis that none of the three has his own hat is confirmed, according to Nicod, by the observation that person 1 has person 2’s hat and by the observation that person 2 has person 1’s hat. But since there are only three people, the second observation must refute the hypothesis, not confirm it.

(Howson and Urbach 1993, 129)

This blocks the objectionable reasoning in general but gives no account of why Nicod’s condition should seem plausible in Hempel’s original case. A Bayesian treatment yields fuller understanding (Good 1961; Horwich 1982, 54–63). A more illuminating counterexample to Nicod’s condition is Good’s:

Suppose that we know we are in one or other of two worlds, and the hypothesis, H, under consideration is that all the crows in our world are black. We know in advance that in one world there are a hundred black crows, no crows that are not black, and a million other birds; and that in the other world there are a thousand black crows, one white one, and a million other birds. A bird is selected equiprobably at random from all the birds in our world. It turns out to be a black crow. This is strong evidence … that we are in the second world, wherein not all crows are black. Thus the observation of a black crow, in the circumstances described, undermines the hypothesis that all the crows in our world are black.

(Good 1967, 322)
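Good’s verdict can be checked with a short calculation. The bird counts come from the quoted passage; the equal prior credence in the two worlds is an added assumption, since the passage does not state one.

```python
from fractions import Fraction

# World 1 (H true): 100 black crows, no non-black crows, a million other birds.
# World 2 (H false): 1000 black crows, 1 white crow, a million other birds.
lik_w1 = Fraction(100, 100 + 1_000_000)         # chance of drawing a black crow in world 1
lik_w2 = Fraction(1000, 1000 + 1 + 1_000_000)   # chance of drawing a black crow in world 2

prior_w1 = Fraction(1, 2)  # assumption: equal prior credence in the two worlds

# Bayes' theorem: credence that we are in world 1 (where H holds) after seeing a black crow.
post_w1 = (lik_w1 * prior_w1) / (lik_w1 * prior_w1 + lik_w2 * (1 - prior_w1))
print(float(post_w1))
```

On this assumption the credence in H falls from 0.5 to about 0.09, so the black crow disconfirms the hypothesis that all crows are black, as Good says.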

Good’s example shows that Nicod’s condition can fail given the specific background credences. But the condition can hold in many cases without such “rigged” background credences. In such cases, the paradoxical reasoning goes through. What the Bayesian argues, however, is that, in such cases, the observation of a white sheet of paper gives only a negligible credential boost to the hypothesis that all ravens are black. The argument that Howson and Urbach give, relying as usual on a number of assumptions that they say “seem plausible to us” (1993, 127), is that the conditional credence in all ravens being black (h) given evidence of a white sheet of paper (a non-black non-raven) turns out to be $\frac{C(h)}{C(\neg r \mid \neg b)} \approx C(h)$, where $C(\neg r \mid \neg b)$ is the credence that an object is a non-raven given that it is non-black, as

presumably there are vastly more non-black things in the universe than ravens. So even if no ravens are black, the probability of some object about which we know nothing, except that it is not black, being a non-raven, must be very high, indeed, practically 1.

(Howson and Urbach 1993, 128)

On the other hand, the conditional credence in h given a black raven is, given assumptions including the Principal Principle (see Section 4), $\frac{C(h)}{\sum_{h_\theta \in \Theta} C(h_\theta)\cdot\theta}$, where Θ is a family of parameterized hypotheses about the proportion of ravens that are black (h is equivalent to $h_1$; i.e., “θ = 1”). If C(h) is initially low, then the credence given the evidence could be high because most of the prior probability could go to hypotheses on which θ ≈ 0. So, although the observation of a white sheet of paper never much confirms h, the observation of a black raven can, given appropriate prior credences.
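The contrast between the two observations can be illustrated numerically. Everything in the sketch below is an assumption made for illustration: a two-member family Θ, the prior credences over it, and the value taken for the credence that a non-black object is a non-raven.

```python
# Assumed prior over hypotheses h_theta: "a proportion theta of ravens are black".
# Most of the prior goes to a hypothesis on which theta is close to 0.
prior = {0.01: 0.9, 1.0: 0.1}
c_h = prior[1.0]  # h is the theta = 1 hypothesis, initially implausible

# Black raven: by the Principal Principle, C(black | h_theta) = theta,
# so the credence that a given raven is black is the sum of C(h_theta) * theta.
c_black = sum(theta * c for theta, c in prior.items())
post_black_raven = c_h / c_black

# White paper: C(non-raven | non-black) is assumed to be practically 1.
c_nonraven_given_nonblack = 0.999999
post_white_paper = c_h / c_nonraven_given_nonblack

print(round(post_black_raven, 3), round(post_white_paper, 3))
```

With these assumed priors a single black raven lifts the credence in h from 0.1 to above 0.9, while the white sheet of paper leaves it at 0.1 to within one part in a million.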

## 3.5. Further Problems for Bayesianism

I will mention two challenges for the probabilistic approach to scientific reasoning: justifying abductive reasoning and the problem of “old evidence.”

Abduction, or inference to the best explanation, seems to be common in scientific practice. Unfortunately, it appears that Bayesians cannot explain why abduction should be rational.

This may be surprising because of the flexibility of the Bayesian approach. Can’t an agent who favors explanatory theories simply adopt a credence that takes explanatory theories to have the bulk of initial plausibility? This is the usual sort of Bayesian “solution” to the problem of induction: if the unconditional credences privilege inductive (greenish) hypotheses over counterinductive (gruesome) hypotheses, then the evidence of many observed green emeralds will tend to confirm the hypothesis that all emeralds are green (Howson 2000). Some find even this flexibility objectionable:

This suggests that the Bayesian vindication of inductive norms is less a matter of extracting them from the probability calculus and more one of our introducing them as independent assumptions that can be expressed in probabilistic language.

(Norton 2011, 396)

But others have argued that abduction is, at least in many uses, impossible to reconcile with the Bayesian model. If we can (even in a rough and ready way) separate propositions into those directly about observation and those that are theoretical, then we can construct a “rival” h* to any theory h, which is committed just to the observational consequences of h (Glymour 1981). These two theories are equally consistent with the evidence because they predict the same appearances; but since h entails h*, credence in h* cannot be lower than credence in h, and hence h* is, and always will be, more credible. Van Fraassen (1989) uses a similar argument in defense of constructive empiricism.

But this doesn’t seem right: this kind of rival is never seriously considered in science and certainly is not regarded as a more credible alternative. The theoretical virtues of explanatory power, and the like, which h* lacks and h may possess, play a significant role in scientific inference, one that must be being overlooked in the Bayesian model. So the Bayesian, at least in this respect, seems to flout standard scientific practice. In short, the simplest Bayesian reconstruction of abduction is: infer to the most likely hypothesis given the evidence. But that could be a very unappealing and gerrymandered hypothesis, not the hypothesis that abductive inference does in practice favor.

This is concerning if the test of Bayesianism is its ability to encompass and explain the appeal of standard aspects of scientific reasoning. Yet it is another worry, the problem of old evidence, that has been more widely discussed. The problem stems from the observation that “scientists commonly argue for their theories from evidence known long before the theories were introduced” (Glymour 1981). In these cases, there can be striking confirmation of a new theory precisely because it explains some well-known anomalous piece of evidence. The Bayesian framework doesn’t seem to permit this: if e is old evidence, known to be the case already, then any scientist who knows it assigns it credence 1. Why? Because credence is a measure of epistemic possibility. If, for all A knows, it might be that ¬e, then A does not know that e. Contraposing, A knows e only if there is no epistemic possibility of ¬e. And if there is no epistemic possibility of ¬e, then $C_A(\neg e)=0$, and $C_A(e)=1$.15 If $C_A(e)=1$, however, it follows from Bayes’ theorem that $C_A(h \mid e)=C_A(h)$, and, therefore, e does not confirm h for anyone who already knows e.
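The step from certainty in e to non-confirmation can be spelled out in a few lines. This is a reconstruction for illustration, assuming only the probability axioms and the ratio definition of conditional credence; it is not presented as the author’s own derivation.

```latex
\begin{align*}
C_A(e) = 1 &\;\Rightarrow\; C_A(\neg e) = 0 \\
&\;\Rightarrow\; C_A(h \wedge \neg e) = 0
   \quad \text{(since } C_A(h \wedge \neg e) \le C_A(\neg e)\text{)} \\
&\;\Rightarrow\; C_A(h \wedge e) = C_A(h) - C_A(h \wedge \neg e) = C_A(h) \\
&\;\Rightarrow\; C_A(h \mid e) = \frac{C_A(h \wedge e)}{C_A(e)} = C_A(h).
\end{align*}
```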

The most promising line of response might be to revise orthodox Bayesianism and deny that known evidence receives credence 1.16 But can we respond to this objection without revising the orthodox picture? Perhaps by arguing that old evidence should not confirm because confirmation of a theory should involve an increase in confidence. But it would be irrational to increase one’s confidence in a theory based on evidence one already had: either you have already factored in the old evidence, in which case it would be unreasonable to count its significance twice, or you have not factored in the old evidence, in which case you were unreasonable before noticing the confirmation. Either way, the only reasonable position is that confirmatory increases in confidence only arise with new evidence. If there seems to be confirmation by old evidence, that can only be an illusion, perhaps prompted by imperfect access to our own credences.

## 3.6. Believing Theories

The orthodox response to the problem of old evidence, as well as the problem itself, presupposes a certain view about how the acquisition of confirmatory evidence interacts with credences. Namely: if h is confirmed by e, and one acquires the evidence that e, one should come to be more confident in h. The simplest story that implements this idea is that if A’s credence is C, and e is the strongest piece of evidence A receives, their new credential state should be represented by that function $C^+$ such that, for every proposition p in the belief space, $C^+(p)=C(p \mid e)$. This procedure is known as conditionalization because it involves taking one’s old conditional credences given e to be one’s new credences after finding out that e. If conditionalization describes Bayesian learning, then the Bayesian story about confirmation can be described as follows: e confirms h for A just in case A will become more confident in h if they learn e.

There are a number of controversial issues surrounding conditionalization when it is treated as the one true principle for updating belief in light of evidence. For example, since it is a truth of probability theory that if $C(p) = 1$, then for every q with $C(q) > 0$, $C(p \mid q) = 1$, conditionalization can never lower the credence of any proposition of which an agent was ever certain. If conditionalization is the only rational update rule, then forgetting or revision of prior certainty in light of new evidence is never rational, and this is controversial (Arntzenius 2003; Talbott 1991).
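Conditionalization and its preservation of certainty can be sketched on a small finite space. The four-world space, its credence values, and the helper names `conditionalize` and `prob` are all hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def conditionalize(credence, evidence):
    """Return C+ with C+(w) = C(w | evidence).

    credence maps worlds to probabilities; evidence is the set of worlds where e is true.
    """
    total = sum(p for w, p in credence.items() if w in evidence)
    if total == 0:
        raise ValueError("cannot conditionalize on evidence with credence 0")
    return {w: (p / total if w in evidence else Fraction(0))
            for w, p in credence.items()}

def prob(credence, proposition):
    """Credence in a proposition, represented as the set of worlds where it is true."""
    return sum(p for w, p in credence.items() if w in proposition)

# Hypothetical four-world space; world w4 has already been ruled out (credence 0).
C = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4), "w4": Fraction(0)}
p = {"w1", "w2", "w3"}   # a proposition the agent is certain of: C(p) = 1
e = {"w2", "w3", "w4"}   # newly learned evidence

C_plus = conditionalize(C, e)
print(prob(C, p), prob(C_plus, p))
```

Because a world with credence 0 keeps credence 0 under this update, no choice of evidence with positive credence can bring the credence in p below 1.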

But, regardless of whether conditionalization is always rational, it is sometimes rational. It will often be rational in the scientific case, where evidence is collected carefully and the true propositional significance of observation is painstakingly evaluated so that certainty is harder to obtain and occasions demanding the revision of prior certainty correspondingly rarer. In scientific contexts, then, the confirmation of theory by evidence in hand will often lead to increased confidence in the theory, to the degree governed by prior conditional credence. This is how Bayesian confirmation theory (and the theory of credence) proposes to explain probabilities of scientific hypotheses.

# 4. The Principal Principle

## 4.1. Coordinating Chance and Credence

We’ve discussed chance in theories and credibility of theories. Is there any way to connect the two? Lewis (1986) offered an answer: what he called the Principal Principle (PP) because “it seem[ed] to [him] to capture all we know about chance.” The discussion in Sections 2.2–2.3 already shows that we know more about chance than this; nevertheless, the PP is a central truth about the distinctive role of credence about chance in our cognitive economy.

We need some notation. Suppose that C denotes some reasonable initial credence function prior to updating in the light of evidence. Suppose that $〚P(p)=x〛$ denotes the proposition that the real chance of p is x. That is, it is the proposition that is true if the true theory t is such that the t-chance of p is x. Suppose that e is some admissible evidence, evidence

whose impact on credence about outcomes comes entirely by way of credence about the chances of those outcomes.

Using that notation, the PP can be expressed: $C(p \mid 〚P(p)=x〛 \wedge e)=x$. Given a proposition about chance and further evidence that doesn’t trump the chances, any reasonable conditional credence in p should equal the chance.17

The PP doesn’t say that one should set one’s credences equal to the chances; if the chances are unknown, one cannot follow that recommendation; and yet, one can still have conditional credences recommended by PP. It does follow from the PP that (1) if you were to come to know the chances and nothing stronger and (2) you update by conditionalization from a reasonable prior credence, then your credences will match the chances.18

But even before knowing the chances, the PP allows us to assign credences to arbitrary outcomes, informed by the credences we assign to the candidate scientific hypotheses. For (assuming our potential evidence is admissible and that we may suppress mention of e), the theorem of total probability states that, where Q is a set of mutually exclusive and jointly exhaustive propositions, each with non-zero credence, then for arbitrary p, $C(p)=\sum_{q_i \in Q} C(p \mid q_i)\,C(q_i)$. If Q is the set of rival scientific hypotheses about the chances (i.e., each $P_x=〚P(p)=x〛$), then the PP entails that $C(p)=\sum_{P_x \in Q} C(p \mid P_x)\,C(P_x)=\sum_{P_x \in Q} x \cdot C(P_x).$

That is, the PP entails that one’s current credence in p is equal to one’s subjective expectation of the chance of p, weighted by one’s confidence in various hypotheses about the chance. You may not know which chance theory is right, but if you have a credence distribution over those theories, then the chances in those theories are already influencing your judgments about possible outcomes. Suppose, for example, you have an unexamined coin that you are 0.8 sure is fair and 0.2 sure is a double-headed trick coin. On the former hypothesis, $P(\text{heads}) = 0.5$; on the latter, $P(\text{heads}) = 1$. Applying the result just derived, $C(\text{heads}) = 1 \times 0.2 + 0.5 \times 0.8 = 0.6$, even while you remain in ignorance of which chance hypothesis is correct.
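The coin calculation is an instance of the expectation formula and can be written out directly. The dictionary layout and variable names are illustrative choices; the numbers are those of the example.

```python
# Credence distribution over chance hypotheses:
# each key is a hypothesized chance of heads, each value the credence in that hypothesis.
chance_hypotheses = {0.5: 0.8,   # fair coin
                     1.0: 0.2}   # double-headed trick coin

# PP plus total probability: C(heads) is the credence-weighted expectation of the chance.
credence_heads = sum(x * c for x, c in chance_hypotheses.items())
print(round(credence_heads, 10))
```

The same one-line sum handles any finite set of rival chance hypotheses, provided the credences over them sum to 1.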

The PP is a deference principle, one claiming that rational credence defers to chances (Gaifman 1988; Hall 2004). It has the same form as other deference principles, such as deference to expert judgment (e.g., adopt conditional credences in rain tomorrow equal to the meteorologist’s credences) or to more knowledgeable agents (e.g., adopt conditional credences given e equal to your future credences if you were to learn e; the Reflection Principle of van Fraassen [1984]). It is worth noting that the proposition about chance in the PP is not a conditional credence; this seems to reflect something about chance, namely, that the true unconditional chances are worth deferring to regardless of the (admissible) information that the agent is given (Joyce 2007).

Lewis showed that, from the PP, much of what we know about chance follows. For example, if it is accepted, we needn’t add as a separate formal constraint that chances are probabilities (Section 2.1). Suppose one came to know the chances, had no inadmissible evidence, and began with rational credences. Then, in accordance with the PP, one’s new unconditional credences in possible outcomes are everywhere equal to the chances of those outcomes. Since rational credences are probabilities, so too must chances be (Lewis 1986, 98). This and other successes of the PP in capturing truths about chance led Lewis to claim that

A feature of Reality deserves the name of chance to the extent that it occupies the definitive role of chance; and occupying the role means obeying the [PP].

(Lewis 1994, 489)

## 4.2. Credence, Confirmation, and Chance

One major issue remains outstanding. How can we link up probabilities in and probabilities of theories? In particular: can we offer an explanation of how, when a theory makes predictions about frequency, the observation of frequencies in line with prediction is confirmatory? Of course it is confirmatory, as encapsulated in the direct inference principle HCP from Section 2.3. But it would be nice to offer an explanation.

A chance theory t, like the theory of radioactive decay, makes a prediction f that certain frequencies have a high chance according to the theory. Let us make a simplifying assumption that we are dealing with a collection of rival hypotheses that share an outcome space. Then, we may say that t predicts f if and only if $P_t(f)$ is high. Notice that such a prediction doesn’t yet permit the machinery of Bayesian confirmation theory to show that the observation of f would confirm t. While the t-chance of f is high, there is no guarantee yet that anyone’s credence in f given t is correspondingly high. So, even if $P_t(f)$ is considerably greater than $C_A(f)$, that doesn’t yield confirmation of t by f for A unless $C_A(f \mid t) \approx P_t(f)$.

Suppose t is some theory that entails a claim about the chance of a certain frequency outcome f. Presumably, for any plausible t, the rest of t is compatible with that part of t that is about the chance of f; so factorize t into a proposition $〚P(f)=x〛$ and a remainder t′. Suppose that one doesn’t have other information e that trumps the chances. Then, the PP applies to one’s initial credence function (assuming one’s current credence C was obtained by conditionalizing on e), so that $C(f \mid t)=C_{\text{initial}}(f \mid t \wedge e)=C_{\text{initial}}(f \mid 〚P(f)=x〛 \wedge t' \wedge e)=x$. But since $〚P(f)=x〛$ is a consequence of t, $P_t(f)=x$, and we thus have the desired equation, $C(f \mid t)=P_t(f)$, which allows the frequency predictions of a theory to confirm it, or disconfirm it (Howson and Urbach 1993, 342–347; Lewis 1986, 106–108).
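Once $C(f \mid t)$ is identified with the t-chance of f, confirmation by observed frequencies is ordinary Bayesian updating. The sketch below assumes two hypothetical rival chance theories about a coin, equal priors, independent tosses, and an observed frequency of 8 heads in 10 tosses; none of these figures comes from the text.

```python
from math import comb

def binomial_chance(k, n, x):
    """t-chance of exactly k successes in n independent trials, each with chance x."""
    return comb(n, k) * x**k * (1 - x)**(n - k)

# Hypothetical rival chance theories, with assumed prior credences.
priors = {"fair": 0.5, "biased": 0.5}
chance_of_heads = {"fair": 0.5, "biased": 0.8}

k, n = 8, 10  # observed frequency f: 8 heads in 10 tosses

# PP step: the likelihood C(f | t) is set equal to the t-chance P_t(f).
likelihood = {t: binomial_chance(k, n, x) for t, x in chance_of_heads.items()}

# Total probability gives C(f); Bayes' theorem gives C(t | f).
c_f = sum(likelihood[t] * priors[t] for t in priors)
posterior = {t: likelihood[t] * priors[t] / c_f for t in priors}

print({t: round(v, 3) for t, v in posterior.items()})
```

The theory that gave the observed frequency the higher chance gains credence and its rival loses it, which is the confirmation and disconfirmation described above.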

## 4.3. Reductionism and Undermining

Not all is smooth sailing for the PP. Consider the case of undermining discussed in Section 2.2. There, a reductionist chance theory assigned some positive chance to a subsequent pattern of outcomes that, if it occurred, would undermine the chance theory. Since the actual chance theory t is logically incompatible with the chance theory that holds in the possibility where the undermining pattern exists, it is not possible for t to be true and for the undermining future u to occur. Given logical omniscience, $C(u \mid t)=0$. But, by the PP, $C(u \mid t)=C(u \mid 〚P_t(u)=x〛 \wedge e)=x>0$. Contradiction: the existence of undermining futures and the PP are incompatible.

Anti-reductionists don’t think undermining futures are possible because (for them) the correct chance theory does not supervene on the pattern of outcomes. So this poses no problem for the PP on such views. But anti-reductionist views may have other issues around justifying the PP:

Don’t call any alleged feature of reality “chance” unless you’ve already shown that you have something, knowledge of which could constrain rational credence. I think I see, dimly but well enough, how knowledge of [actual outcome patterns] could constrain rational credence. I don’t begin to see, for instance, how knowledge that two universals stand in a certain special relation N* could constrain rational credence about the future coinstantiation of those universals.

(Lewis 1994, 484)

Here, Lewis explicitly attacks Armstrong’s anti-reductive view of chance, but the objection generalizes. (That said, why can’t such views simply take the obviousness of the PP at face value and adopt it as a primitive without further explanation of why PP holds?)

One could try to be a reductionist without undermining, perhaps offering a reduction in which the chance-grounding pattern of outcomes lies wholly in the past. Such a proposal would end up elevating highly contingent features of the early universe to an authoritative position with respect to the chances or entail that the laws of nature change over time as history accumulates—neither option appeals. So Lewis and others argue that PP needs to be revised (Hall 1994; Thau 1994).

A number of revisions have been proposed. Let “chance” abbreviate the nonrigid description “the actual chance function” (varying from world to world), and let P rigidly designate some particular probability function. Then, Joyce proposes this revised deference principle:

Let C be the credence function for someone whose evidence is limited to the past and present. Then, if the chances are given by probability function P, then $C(p|〚chance=P〛)=P(p|〚chance=P〛)$.

(Joyce 2007, 198)

Note that this formulation prevents the derivation of the contradiction. For the P-undermining future u is incompatible with the actual chance distribution being P, so that $P(u|〚chance=P〛)=0$. By Joyce’s principle, then, $C(u|〚chance=P〛)=0$, as required.

One other approach deserves discussion. Ismael (2008) argues that the PP ought to be rejected and that the real principle is the following, where h is the admissible historical evidence (excluding any propositions about the correct theory of chance t): if C is a rational credence function, then $C(p \mid h)=P_t(p)$. That is: one’s credence, given admissible information, ought to equal the chance. Note that, unlike in the PP itself, the chance hypotheses are not themselves part of what the agent’s conditional credence is conditioned on. This principle is not susceptible to undermining because it says nothing about how the chances bear on conditional credences like $C(u \mid t)$, and, a fortiori, doesn’t say enough to enable the derivation of a contradiction.

This norm is hard to follow because one is typically ignorant of what the chances are. But if one has a credence distribution over chance hypotheses, one can follow this norm: set one’s credence equal to one’s best estimate of the chances. That is, for a rational credence C, in our notation, $C(p)=\sum_{P_x \in Q} P_{P_x}(p)\cdot C(P_x)$ (Ismael 2008, 298) where, of course, $P_{P_x}(p)=x$, which gives us back Lewis’s original expectation calculation without his detour through PP.

Unfortunately, there is a problem with both these responses to the problem with PP. If f is an undermining frequency outcome, then $C(f \mid t) \neq P_t(f)$ in general, but the approximate equality of these quantities was needed to justify the principle of direct inference in Section 4.2.19 So it is hard to see how reductionists about chance who accept the possibility of undermining can explain, in complete generality, how theoretically predicted chances of frequency should link up with likelihoods.

If undermining futures are possible, then Joyce’s quantity $P(p|〚chance=P〛)$ is not in general equal to P(p), and only the former is linked to likelihoods. Similarly, Ismael’s principle is simply silent on how chance constrains likelihood in a way that PP was not (Briggs 2009, §3.4). That reticence allows these approaches to avoid the problematic derivation but also prevents us from explaining direct inference. That might be okay because HCP is independently plausible. But anti-reductionists might conclude that the ability to use the stronger principle PP is a mark in favor of their position.

Briggs (2010) covers material similar to that found in this chapter, and some readers may find a second opinion useful. Implicit in much of the chapter were references to various substantive theories of the truth conditions for probability claims, historically known as “interpretations” of probability: Hájek (2012) contains a much fuller and more explicit account of the various positions on this issue. Lewis (1986) is the classic account of the Principal Principle, the relationship between credence and chance. The foundations of the theory of credences can be found in a wonderful paper (Ramsey 1926); the best account of the application of the theory of credences to confirmation is Howson and Urbach (1993). There are many textbooks on the philosophy of probability; none is bad, but one that is particularly good on the metaphysical issues around chance is Handfield (2012). But perhaps the best place to start further reading is with Eagle (2011c), an anthology of classic articles with editorial context: it includes the items by Ramsey, Lewis, and Howson and Urbach just recommended.

## References

Albert, David Z. (1992). Quantum Mechanics and Experience. (Cambridge, MA: Harvard University Press).Find this resource:

Albert, David Z. (1996). “Elementary Quantum Metaphysics.” In J. T. Cushing, A. Fine, and S. Goldstein (eds.), Bohmian Mechanics and Quantum Theory (Dordrecht: Kluwer), 277–284.Find this resource:

Albert, David Z. (2000). Time and Chance. (Cambridge, MA: Harvard University Press).Find this resource:

Arntzenius, Frank. (2003). “Some Problems for Conditionalization and Reflection.” Journal of Philosophy 100: 356–370.Find this resource:

Bell, John S. (1964). “On the Einstein-Podolsky-Rosen Paradox.” Physics 1: 195–200.Find this resource:

Bell, John S. (1987). “Are There Quantum Jumps?.” In his Speakable and Unspeakable in Quantum Mechanics (Cambridge: Cambridge University Press), 201–212.Find this resource:

Bigelow, John, Collins, John, and Pargetter, Robert. (1993). “The Big Bad Bug: What Are the Humean’s Chances?” British Journal for the Philosophy of Science 44: 443–462.Find this resource:

Briggs, Rachael. (2009). “The Anatomy of the Big Bad Bug.” Noûs 43: 428–449.Find this resource:

Briggs, Rachael. (2010). “The Metaphysics of Chance.” Philosophy Compass 5: 938–952.Find this resource:

Buchak, Lara. (2013). Risk and Rationality. (Oxford: Oxford University Press).Find this resource:

Carnap, Rudolf. (1955). Statistical and Inductive Probability (Brooklyn: Galois Institute of Mathematics and Art). Reprinted in Eagle (2011c), 317–326; references are to this reprinting.Find this resource:

de Finetti, Bruno. (1937). “Foresight: Its Logical Laws, Its Subjective Sources.” In Henry E. Kyburg, Jr. and Howard E. Smokler (eds.), Studies in Subjective Probability, [1964]. (New York: Wiley), 93–158.Find this resource:

Eagle, Antony. (2011a). “Chance versus Randomness.” In Edward N. Zalta (ed.), Stanford Encyclopedia of Philosophy (Spring 2014 edition) http://plato.stanford.edu/archives/spr2014/entries/chance-randomness/.Find this resource:

Eagle, Antony. (2011b) “Deterministic Chance.” Noûs 45: 269–299.Find this resource:

Eagle, Antony (ed.). (2011c). Philosophy of Probability: Contemporary Readings (London: Routledge).Find this resource:

Eagle, Antony. (2014). “Is the Past a Matter of Chance?” In Alastair Wilson (ed.), Chance and Temporal Asymmetry (Oxford: Oxford University Press), 126–158.Find this resource:

Earman, John. (1992). Bayes or Bust? (Cambridge, MA: MIT Press).Find this resource:

Emery, Nina. (2015). “Chance, Possibility, and Explanation.” British Journal for the Philosophy of Science 66: 95–120.Find this resource:

Eriksson, Lina, and Hájek, Alan. (2007). “What Are Degrees of Belief?” Studia Logica 86: 183–213.Find this resource:

Fetzer, James H. (1981). Scientific Knowledge: Causation, Explanation, and Corroboration. (Dordrecht: D. Reidel).Find this resource:

Gaifman, Haim. (1988). “A Theory of Higher Order Probabilities.” In Brian Skyrms and William Harper (eds.), Causation, Chance and Credence, vol. 1. (Dordrecht: Kluwer), 191–219.Find this resource:

Garber, Daniel. (1983). “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory.” In John Earman (ed.), Minnesota Studies in Philosophy of Science, vol.10: Testing Scientific Theories. (Minneapolis: University of Minnesota Press), 99–132.Find this resource:

Ghirardi, G. C., Rimini, A., and Weber, T. (1986). “Unified Dynamics for Microscopic and Macroscopic Systems.” Physical Review D 34: 470.Find this resource:

Giere, Ronald N. (1973). “Objective Single-Case Probabilities and the Foundations of Statistics.” In P. Suppes, L. Henkin, G. C. Moisil, et al. (eds.), Logic, Methodology and Philosophy of Science IV. (Amsterdam: North-Holland), 467–483.Find this resource:

Glymour, Clark. (1981). “Why I Am Not a Bayesian.” In C. Glymour (ed.), Theory and Evidence. (Chicago: University of Chicago Press), 63–93.Find this resource:

Glynn, Luke. (2010). “Deterministic Chance.” British Journal for the Philosophy of Science 61: 51–80.Find this resource:

Goldstein, Sheldon. (2013). “Bohmian Mechanics.” In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2013 edition) http://plato.stanford.edu/archives/spr2013/entries/qm-bohm/.Find this resource:

Good, I. J. (1961). “The Paradox of Confirmation (III).” British Journal for the Philosophy of Science 11: 63–64.Find this resource:

Good, I. J. (1967). “The White Shoe Is a Red Herring.” British Journal for the Philosophy of Science 17: 322.Find this resource:

Greaves, H. (2007). “Probability in the Everett Interpretation.” Philosophy Compass 2: 109–128.Find this resource:

Greenstein, George, and Zajonc, Arthur G. (1997). The Quantum Challenge. (Sudbury, MA: Jones and Bartlett).Find this resource:

Hájek, Alan. (2003). “What Conditional Probability Could Not Be.” Synthese 137: 273–323.Find this resource:

Hájek, Alan. (2007). “The Reference Class Problem Is Your Problem Too.” Synthese 156(3): 563–585.Find this resource:

Hájek, Alan. (2008). “Arguments for—or Against—Probabilism?” British Journal for the Philosophy of Science 59(4): 793–819.Find this resource:

Hájek, Alan. (2012). “Interpretations of Probability.” In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Winter 2012 Edition) http://plato.stanford.edu/archives/win2012/entries/probability-interpret/.Find this resource:

Hall, Ned. (1994).“Correcting the Guide to Objective Chance.” Mind 103: 505–518.Find this resource:

Hall, Ned. (2004). “Two Mistakes About Credence and Chance.” In Frank Jackson and Graham Priest (eds.), Lewisian Themes. (Oxford: Oxford University Press), 94–112.

Handfield, Toby. (2012). A Philosophical Guide to Chance. (Cambridge: Cambridge University Press).

Hempel, Carl. (1945). “Studies in the Logic of Confirmation.” Mind 54: 1–26, 97–121.

Hoefer, Carl. (2007). “The Third Way on Objective Probability: A Sceptic’s Guide to Objective Chance.” Mind 116: 549–596.

Horwich, Paul. (1982). Probability and Evidence. (Cambridge: Cambridge University Press).

Howson, Colin. (2000). Hume’s Problem: Induction and the Justification of Belief. (Oxford: Oxford University Press).

Howson, Colin, and Urbach, Peter. (1993). Scientific Reasoning: The Bayesian Approach, 2nd ed. (Chicago: Open Court).

Humphreys, Paul. (1985). “Why Propensities Cannot Be Probabilities.” Philosophical Review 94: 557–570.

Ismael, Jenann. (1996). “What Chances Could Not Be.” British Journal for the Philosophy of Science 47: 79–91.

Ismael, Jenann. (2003). “How to Combine Chance and Determinism: Thinking About the Future in an Everett Universe.” Philosophy of Science 70: 776–790.

Ismael, Jenann. (2008). “Raid! Dissolving the Big, Bad Bug.” Noûs 42: 292–307.

Ismael, Jenann. (2009). “Probability in Deterministic Physics.” Journal of Philosophy 106: 89–108.

Jeffrey, Richard C. (1983). The Logic of Decision, 2nd ed. (Chicago: University of Chicago Press).

Joyce, James M. (1998). “A Nonpragmatic Vindication of Probabilism.” Philosophy of Science 65: 575–603.

Joyce, James M. (2007). “Epistemic Deference: The Case of Chance.” Proceedings of the Aristotelian Society 107: 187–206.

Kemeny, J. (1955). “Fair Bets and Inductive Probabilities.” Journal of Symbolic Logic 20: 263–273.

Kolmogorov, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik und ihrer Grenzgebiete, no. 3 (Springer, Berlin); translated as (1956) Foundations of the Theory of Probability, 2nd ed. (New York: Chelsea).

Kratzer, Angelika. (1977). “What ‘Must’ and ‘Can’ Must and Can Mean.” Linguistics and Philosophy 1: 337–355.

Laplace, Pierre-Simon. (1951). Philosophical Essay on Probabilities. (New York: Dover).

Lehman, R. Sherman. (1955). “On Confirmation and Rational Betting.” Journal of Symbolic Logic 20: 251–262.

Lewis, David. (1986). “A Subjectivist’s Guide to Objective Chance.” In his Philosophical Papers, vol. 2. (Oxford: Oxford University Press), 83–132.

Lewis, David. (1994). “Humean Supervenience Debugged.” Mind 103: 473–490.

List, Christian, and Pivato, Marcus. (2015). “Emergent Chance.” Philosophical Review 124: 59–117.

Loewer, Barry. (2001). “Determinism and Chance.” Studies in History and Philosophy of Modern Physics 32: 609–620.

Loewer, Barry. (2004). “David Lewis’s Humean Theory of Objective Chance.” Philosophy of Science 71: 1115–1125.

Maher, Patrick. (1993). Betting on Theories. (Cambridge: Cambridge University Press).

Meacham, Christopher J. G. (2005). “Three Proposals Regarding a Theory of Chance.” Philosophical Perspectives 19(1): 281–307.

Mellor, D. H. (2005). Probability: A Philosophical Introduction. (London: Routledge).

Milne, P. (1986). “Can There Be a Realist Single-Case Interpretation of Probability?” Erkenntnis 25: 129–132.

Milne, Peter. (1983). “A Note on Scale Invariance.” British Journal for the Philosophy of Science 34: 49–55.

Ney, Alyssa. (2013). “Introduction.” In Alyssa Ney and David Z. Albert (eds.), The Wave Function. (New York: Oxford University Press), 1–51.

North, Jill. (2010). “An Empirical Approach to Symmetry and Probability.” Studies in History and Philosophy of Modern Physics 41: 27–40.

Norton, John. (2011). “Challenges to Bayesian Confirmation Theory.” In P. S. Bandyopadhyay and M. R. Forster (eds.), Handbook of the Philosophy of Science, vol. 7: Philosophy of Statistics. (Amsterdam: North-Holland), 391–439.

Perfors, Amy. (2012). “Bayesian Models of Cognition: What’s Built in After All?” Philosophy Compass 7(2): 127–138.

Perfors, Amy, Tenenbaum, Joshua B., Griffiths, Thomas L., et al. (2011). “A Tutorial Introduction to Bayesian Models of Cognitive Development.” Cognition 120: 302–321.

Popper, Karl. (1959). “A Propensity Interpretation of Probability.” British Journal for the Philosophy of Science 10: 25–42.

Popper, Karl. (1963). Conjectures and Refutations. (New York: Routledge).

Price, Huw. (1994). “A Neglected Route to Realism About Quantum Mechanics.” Mind 103: 303–336.

Ramsey, F. P. (1926). “Truth and Probability.” In D. H. Mellor (ed.), Philosophical Papers [1990]. (Cambridge: Cambridge University Press), 52–94.

Saunders, Simon, and Wallace, David. (2008). “Branching and Uncertainty.” British Journal for the Philosophy of Science 59: 293–305.

Savage, Leonard J. (1954). The Foundations of Statistics. (New York: Wiley).

Schaffer, Jonathan. (2003). “Principled Chances.” British Journal for the Philosophy of Science 54: 27–41.

Schaffer, Jonathan. (2007). “Deterministic Chance?” British Journal for the Philosophy of Science 58: 113–140.

Shimony, Abner. (2013). “Bell’s Theorem.” In Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Winter 2013 Edition) http://plato.stanford.edu/archives/win2013/entries/bell-theorem/.

Sklar, Lawrence. (1993). Physics and Chance. (Cambridge: Cambridge University Press).

Sober, Elliott. (2010). “Evolutionary Theory and the Reality of Macro Probabilities.” In E. Eells and J. H. Fetzer (eds.), The Place of Probability in Science. (Dordrecht: Springer), 133–161.

Strevens, Michael. (1998). “Inferring Probabilities from Symmetries.” Noûs 32: 231–246.

Talbott, W. J. (1991). “Two Principles of Bayesian Epistemology.” Philosophical Studies 62: 135–150.

Thau, Michael. (1994). “Undermining and Admissibility.” Mind 103: 491–503.

van Fraassen, Bas C. (1984). “Belief and the Will.” Journal of Philosophy 81: 235–256.

van Fraassen, Bas C. (1989). Laws and Symmetry. (Oxford: Oxford University Press).

van Fraassen, Bas C. (1991). Quantum Mechanics: An Empiricist View. (Oxford: Oxford University Press).

von Mises, Richard. (1957). Probability, Statistics and Truth. (New York: Dover).

Wallace, David. (2011). The Emergent Multiverse. (Oxford and New York: Oxford University Press).

Wallace, David, and Timpson, Christopher G. (2010). “Quantum Mechanics on Spacetime I: Spacetime State Realism.” British Journal for the Philosophy of Science 61: 697–727.

White, Roger. (2005). “Epistemic Permissiveness.” Philosophical Perspectives 19: 445–459.

White, Roger. (2009). “Evidential Symmetry and Mushy Credence.” In T. S. Gendler and J. Hawthorne (eds.), Oxford Studies in Epistemology. (Oxford: Oxford University Press), 161–186.

Williamson, Timothy. (2000). Knowledge and Its Limits. (Oxford: Oxford University Press).

Zynda, Lyle. (2000). “Representation Theorems and Realism About Degrees of Belief.” Philosophy of Science 67: 45–69.

## Notes:

(1) I use theory as a catchall term for scientific hypotheses, both particular and all-encompassing; I hereby cancel any implication that theories are mere theories, not yet adequately supported by evidence. Such well-confirmed theories as general relativity are still theories in my anodyne sense.

(2) In the same vein is the stronger (assuming that the laws entail the chances) “Realization Principle” offered by Schaffer (2007, 124): that when the chance of p is positive according to T, there must be another world alike in history, sharing the laws of T, in which p.

(3) Since reductionists don’t accept that chance can float free from the pattern of occurrences, independent restriction to a world with the same chances is redundant where it is not—as in the earlier case—impossible.

(4) Here, “$P_{tw}$” denotes the probability function at time t derived from the laws of w.

(5) Stepping back from the earlier remarks, the STP may be satisfied by processes that produce merely exchangeable sequences of outcomes (de Finetti 1937). In these, the probability of an outcome sequence of a given length depends only on the frequency of outcome types in that sequence and not their order. For example, sampling without replacement from an urn gives rise to an exchangeable sequence, although the chances change as the constitution of the urn changes. If a “perfect repetition” involves drawing from an urn of the same prior constitution, then this may be a stable trial even though successive draws from the urn have different probabilities. It is possible to prove a law of large numbers for exchangeable sequences to the effect that almost all of them involve outcome frequencies equal to the chances of those outcomes.
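The urn case can be checked with a short calculation. The following is a minimal sketch, assuming a hypothetical urn of 3 red and 2 blue balls; the urn constitution and the function name `sequence_probability` are illustrative and not drawn from the text. It computes the exact probability of each ordered sequence of draws without replacement and shows that orderings with the same outcome frequencies receive the same probability, even though the chance of red changes from one draw to the next.

```python
from fractions import Fraction
from itertools import permutations


def sequence_probability(seq, red=3, blue=2):
    """Exact probability of drawing `seq` (a string of 'R'/'B') without replacement.

    The default urn (3 red, 2 blue) is an illustrative assumption.
    """
    prob = Fraction(1)
    for ball in seq:
        count = red if ball == "R" else blue
        if count == 0:
            # No ball of this colour is left, so the sequence is impossible.
            return Fraction(0)
        prob *= Fraction(count, red + blue)
        if ball == "R":
            red -= 1
        else:
            blue -= 1
    return prob


# Every ordering of two reds and one blue.
orderings = sorted(set("".join(p) for p in permutations("RRB")))
probs = {s: sequence_probability(s) for s in orderings}
print(probs)

# The chance of red is not constant across draws.
first_red = sequence_probability("R")
second_red_given_first_red = sequence_probability("RR") / first_red
print(first_red, second_red_given_first_red)
```

Exchangeability shows up as the equality of the three values in `probs`; the last line shows that the draws are nevertheless not identically distributed conditional on what has already been drawn.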

(6) In the sense of “explains” on which “T explains why p” doesn’t entail or presuppose T.

(7) Rather like a large collection of simultaneous coin tosses, repeated every second, with the exception that an outcome of decay removes the coin from the ensemble.

(8) Note that, in this presentation, spatial configurations are basic, and so the basic outcomes are all outcomes of finding the system in a particular spatial configuration. It turns out that an equivalent formulation can be obtained by constructing a wave function in momentum space, in which the points are possible assignments of momentum to each particle in the system; the wave function in momentum space is related to the wave function in configuration space by an invertible transformation, so either presentation is sufficient to characterize the system and its probabilities.

(9) It is possible to have viable hidden variable theories if the causal relationship between hidden prior state and measurement outcome is unorthodox (e.g., nonlocal causation, as in Bohmian mechanics [Goldstein 2013] or backward causation [Price 1994]).

(10) This standard assumption is substantive: it tells us that the probability of a certain outcome over time—a dynamic probability—is equal to the volume of a region of phase space at a time. This is just to assume the link between probability and frequency discussed at the beginning of Section 2.3. Technically, the assumption here is that of ergodicity: that, in the limit, sojourn time (the amount of time a system spends in a given macrocondition) corresponds to relative probability. This is much weaker than the Bernoulli assumption that classical statistical mechanical systems are like i.i.d. trials.

(11) Consider a sequence of 100 coin tosses with even numbers of heads and tails; there are more sequences where the pattern of H and T is irregular than there are sequences where the pattern follows some strict law, such as alternating H and T, or fifty heads followed by fifty tails (Eagle 2011a, §2.2).
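The counting behind this remark can be sketched as follows. The code counts all 100-toss sequences with exactly fifty heads and compares that number with a handful of strictly patterned sequences. Treating these four patterns as the "regular" ones is an illustrative assumption, not a definition of irregularity taken from the text.

```python
from math import comb

n = 100
half = n // 2

# Number of length-100 sequences with exactly 50 heads and 50 tails.
balanced = comb(n, half)

# Four strictly patterned balanced sequences (an illustrative, non-exhaustive choice).
regular = {
    "HT" * half,             # alternating, starting with heads
    "TH" * half,             # alternating, starting with tails
    "H" * half + "T" * half, # fifty heads followed by fifty tails
    "T" * half + "H" * half, # fifty tails followed by fifty heads
}

share = len(regular) / balanced
print(f"balanced sequences: {balanced:.3e}")
print(f"share taken by the four patterned sequences: {share:.3e}")
```

Even a far more generous list of law-governed patterns would make up a vanishing share of the balanced sequences, which is the point of the comparison.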

(12) This story, however, seems also to predict that a closed system is vastly more likely to have evolved from a disorderly macrocondition, and it takes quite a bit of work to explain why the observed thermodynamic asymmetries should exist (Albert 2000, 71–96).

(13) Given the focus of this chapter, I am setting aside those philosophical views that deny that claims about support or confirmation can be helpfully understood using probability (Popper 1963). See Sprenger (this volume) for more on confirmation.

(14) We need also the premise that probabilistic fair betting rates are not susceptible to a Dutch book (Kemeny 1955; Lehman 1955).

(15) This argument is controversial. Because someone can reasonably have different fair betting rates on p and q, even when they know both, the betting dispositions interpretation of credence does not require that each known proposition has equal credence, so they needn’t each have credence 1. I’ll briefly return to this issue in Section 3.6.

(16) Other revisionary responses include claiming that something is learned and confirmatory: namely, that h entails e (Garber 1983). Note that an agent who genuinely learns this is subject to Dutch book before learning it (since they are uncertain of a trivial proposition) and so irrational; it cannot be a general account of confirmation by old evidence for rational agents.

(17) Lewis’s own formulation involves reference to time dependency of chance, a reference that, in my view, is a derivative feature of the dependence of chance on the physical trial and that I suppress here for simplicity (Eagle 2014).

(18) The PP is a schema that holds for any proposition of the form $〚P(p)=x〛$, not just the true one. So, whatever the chances might be, there is some instance of PP that allows updating on evidence about the actual chances.

(19) These quantities can come quite far apart. Here’s a toy example. Suppose a coin is to be tossed; if it lands heads, it will be destroyed; if it lands tails, it will be tossed once more. Suppose that the actual pattern of outcomes will be TH, and, on the basis of this pattern (and other symmetries in the coin), the chance of heads is in fact 0.5, matching the actual frequency. So, the correct theory of chance ch assigns a chance of 0.5 to H, 0.25 to TH, and 0.25 to TT. But if the history is H, let’s assume that frequency undermines the theory of chance; if history H had occurred, the chance of H would be 1. (The coin is only ever tossed once and lands heads on that toss.) So $C(H|ch)=0$, because H being the history is incompatible with ch being the correct reductionist chance theory, but $P_{ch}(H)=0.5$; and, indeed, $C(TH|ch)=1$ (since that is the only pattern of outcomes consistent with ch), while $P_{ch}(TH)=0.25$. The two quantities are not even approximately equal.
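The arithmetic of the toy example can be laid out explicitly. This is a minimal sketch under the note's own assumptions: a chance of heads of 0.5 on each toss, and TH as the only history compatible with ch being the correct reductionist theory. The variable names `chance`, `credence_given_ch`, and `gaps` are hypothetical labels introduced for illustration.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Chances that ch assigns to each possible history:
# toss once; heads ends the process, tails leads to one more toss.
chance = {"H": half, "TH": half * half, "TT": half * half}

# Assumption from the example: only the history TH is compatible with ch
# being the correct reductionist chance theory.
compatible_with_ch = {"TH"}

# Rational credence in each history conditional on ch being correct.
credence_given_ch = {
    history: Fraction(1 if history in compatible_with_ch else 0)
    for history in chance
}

# Gap between conditional credence and the chance assigned by ch.
gaps = {h: abs(credence_given_ch[h] - chance[h]) for h in chance}

for h in chance:
    print(h, "chance:", chance[h], "credence given ch:", credence_given_ch[h], "gap:", gaps[h])
```

The gaps are large for every history, which is the sense in which the two quantities come apart in this case.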