Confirmation and Induction
Abstract and Keywords
Scientific knowledge is based on induction, ampliative inferences from experience. The chapter gives an overview of the problem of induction and the responses that philosophers of science have developed over time, focusing on attempts to spell out rules of inductive inference, and to balance attractive theoretical principles with judgments and intuitions in particular cases. That this is not always easy is demonstrated by challenges such as the paradox of the ravens, the problem of irrelevant conjunctions, and Goodman's new riddle of induction. The chapter then focuses on explications of the degree of confirmation of a hypothesis and compares various Bayesian measures of confirmation, as well as the Bayesian and frequentist approaches to statistical inference.
Keywords: confirmation, degree of confirmation, induction, probability, inductive logic, Bayesianism, statistical inference
1 The Problems of Induction
Induction is a method of inference that aims at gaining empirical knowledge. It has two main characteristics: first, it is based on experience. (The term “experience” is used interchangeably with “observations” and “evidence.”) Second, induction is ampliative, that is, the conclusions of an inductive inference are not necessary but contingent. The first feature makes sure that induction targets empirical knowledge; the second feature distinguishes induction from other modes of inference, such as deduction, where the truth of the premises guarantees the truth of the conclusion.
Induction can have many forms. The simplest one is enumerative induction: inferring a general principle or making a prediction based on the observation of particular instances. For example, if we have observed 100 black ravens and no nonblack ravens, we may predict that raven 101 will also be black. We may also infer the general principle that all ravens are black. But induction is not tied to the enumerative form and comprises all ampliative inferences from experience. For example, making weather forecasts or predicting economic growth rates are highly complex inductive inferences that amalgamate diverse bodies of evidence.
The first proper canon for inductive reasoning in science was set up by Francis Bacon in his Novum Organum (Bacon 1620). Bacon’s emphasis is on learning the cause of a scientifically interesting phenomenon. He proposes a method of eliminative induction: eliminating candidate causes by coming up with cases where the effect, but not the supposed cause, is present. For example, if the common flu occurs in a hot summer period, then cold cannot be its (sole) cause. A similar method, although with less meticulous devotion to the details, was outlined by René Descartes (1637). In his Discours de la Méthode, he explains how scientific problems should be divided into tractable subproblems and how their solutions should be combined.
Both philosophers realized that without induction, science would be blind to experience and unable to make progress; hence, their interest in spelling out the inductive method in detail. However, they do not provide a foundational justification of inductive inference. For this reason, C. D. Broad (1952, 142–143) stated that “inductive reasoning … has long been the glory of science” but a “scandal of philosophy.” This quote brings us directly to the notorious problem of induction (for a survey, see Vickers 2010).
Two problems of induction should be distinguished. The first, fundamental problem concerns why we are justified in making inductive inferences at all; that is, why the method of induction works. The second problem concerns telling good from bad inductions and developing rules of inductive inference. How do we learn from experience? Which inferences about future predictions or general theories are justified by our observations? And so on.
About 150 years after Bacon, David Hume (1739, 1748) was the first philosopher to clearly point out how hard the first problem of induction is (A Treatise of Human Nature, 1739, Book I; An Enquiry Concerning Human Understanding, 1748, Sections IV–V). Like Bacon, Hume was interested in learning the causes of an event as a primary means of acquiring scientific knowledge. Because causal relations cannot be inferred a priori, we have to learn them from experience; that is, we must use induction.
Hume divides all human reasoning into demonstrative and probabilistic reasoning. He notes that learning from experience falls into the latter category: no amount of observations can logically guarantee that the sun will rise tomorrow, that lightning is followed by thunder, that England will continue to lose penalty shootouts, and the like. In fact, regularities of the latter sort sometimes cease to be true. Inductive inferences cannot demonstrate the truth of the conclusion, but only make it probable.
This implies that inductive inferences have to be justified by nondemonstrative principles. Imagine that we examine the effect of heat on liquids. We observe in a number of experiments that water expands when heated. We predict that, upon repetition of the experiment, the same effect will occur. However, this is probable only if nature does not change its laws suddenly: “all inferences from experience suppose, as their foundation, that the future will resemble the past” (Hume 1748, 32). We are caught in a vicious circle: the justification of our inductive inferences invokes the principle of induction itself. This undermines the rationality of our preference for induction over other modes of inference (e.g., counterinduction).
The problem is that assuming the uniformity of nature in time can only be justified by inductive reasoning; namely, our past observations to that effect. Notably, pragmatic justifications of induction, by reference to past successes, do not fly because inferring from past to future reliability of induction also obeys the scheme of an inductive inference (Hume 1748).
Hume therefore draws the skeptical conclusion that we lack a rational basis for believing that causal relations inferred from experience are necessary or even probable. Instead, what makes us associate causes and effects are the irresistible psychological forces of custom and habit. The connection between cause and effect is in the mind rather than in the world, as witnessed by our inability to give a noncircular justification of induction (Hume 1748, 35–38).
Hume’s skeptical argument seems to undermine a lot of accepted scientific method. If induction does not have a rational basis, why perform experiments, predict future events, and infer to general theories? Why science at all? Note that Hume’s challenge also affects the second problem: if inductive inferences cannot be justified in an objective way, how are we going to tell which rules of induction are good and which are bad?
Influenced by Hume, Karl Popper (1959, 1983) developed a radical response to the problem of induction. For him, scientific reasoning is essentially a deductive and not an inductive exercise. A proper account of scientific method neither affords nor requires inductive inference—it is about testing hypotheses on the basis of their predictions:
The best we can say of a hypothesis is that up to now it has been able to show its worth … although, in principle, it can never be justified, verified, or even shown to be probable. This appraisal of the hypothesis relies solely upon deductive consequences (predictions) which may be drawn from the hypothesis: There is no need even to mention “induction.”
(Popper 1959, 346)
For Popper, the merits of a hypothesis are not determined by the degree to which past observations support it but by its performance in severe tests; that is, sincere attempts to overthrow it. Famous examples from science include the Michelson-Morley experiment as a test of the ether theory in physics and the Allais and Ellsberg experiments as tests of subjective expected utility theory in economics. Popper’s account also fits well with some aspects of statistical reasoning, such as the common use of null hypothesis significance tests (NHST): a hypothesis of interest is tested against a body of observations and “rejected” if the result is particularly unexpected. Such experiments do not warrant inferring or accepting a hypothesis; they are exclusively designed to disprove the null hypothesis and to collect evidence against it. More on NHST will be said in Section 6.
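The logic of such a significance test can be sketched in a few lines. The coin-flip scenario and the numbers below are illustrative assumptions, not taken from the chapter: we compute the probability, under the null hypothesis of a fair coin, of a result at least as extreme as the one observed, and "reject" the null only if that probability is small.

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the probability of a result
    at least as extreme as k successes, computed under the null."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Null hypothesis: the coin is fair. Observation: 58 heads in 100 tosses.
p_value = binom_tail(100, 58)
print(round(p_value, 3))  # not below the conventional 0.05 level,
                          # so the null hypothesis is not rejected
```

Note that, in Popper's spirit, a large tail probability does not confirm the null hypothesis; the test is only designed to collect evidence against it.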
According to Popper’s view of scientific method, induction in the narrow sense of inferring a theory from a body of data is not only unjustified, but even superfluous. Science, our best source of knowledge, assesses theories on the basis of whether their predictions obtain. Those predictions are deduced from the theory. Hypotheses are corroborated when they survive a genuine refutation attempt, when their predictions were correct. Degrees of corroboration may be used to form practical preferences over hypotheses. Of course, this also amounts to learning from experience and to a form of induction, broadly conceived—but Popper clearly speaks out against the view that scientific hypotheses with universal scope are ever guaranteed or made probable by observations.
Popper’s stance proved to be influential in statistical methodology. In recent years, philosopher Deborah Mayo and econometrician Aris Spanos have worked intensively on this topic (e.g., Mayo 1996; Mayo and Spanos 2006). Their main idea is that our preferences among hypotheses are based on the degree of severity with which they have been tested. Informally stated, they propose that a hypothesis has been severely tested if (1) it fits well with the data, for some appropriate notion of fit; and (2) if the hypothesis were false, it would have been very likely to obtain data that favor the relevant alternative(s) much more than the actual data do.
We shall not, however, go into the details of their approach. Instead, we return to the second problem of induction: how should we tell good from bad inductions?
2 Logical Rules for Inductive Inference
Hume’s skeptical arguments show how difficult it is to argue for the reliability and truth-conduciveness of inductive inference. However, this conclusion sounds more devastating than it really is. For example, on a reliabilist view of justification (Goldman 1986), beliefs are justified if generated by reliable processes that usually lead to true conclusions. If induction is factually reliable, our inductive inferences are justified even if we cannot access the reasons for why the method works. In a similar vein, John Norton (2003) has discarded formal theories of induction (e.g., those based on the enumerative scheme) and endorsed a material theory of induction: inductive inferences are justified by their conformity to facts.
Let us now return to the second problem of induction; that is, developing (possibly domain-sensitive) rules of induction—principles that tell good from bad inductive inferences. In developing these principles, we will make use of the method of reflective equilibrium (Goodman 1955): we balance scientific practice with normative considerations (e.g., which methods track truth in the idealized circumstances of formal models). Good rules of induction are those that explain the success of science and that have at the same time favorable theoretical properties. The entire project is motivated by the analogy to deductive logic, in which rules of inference have been useful at guiding our logical reasoning. So why not generalize the project to inductive logic, to rules of reasoning under uncertainty and ampliative inferences from experience?
Inductive logic was a central project of twentieth-century philosophy of science. Often it figures under the heading of finding criteria for when evidence confirms (or supports) a scientific hypothesis, that is, of explicating the concept of confirmation: replacing our vague pretheoretical concept, the explicandum, with a simple, exact, and fruitful concept that still resembles it—the explicatum (Carnap 1950, 3–7). The explication can proceed quantitatively, specifying degrees of confirmation, or qualitatively, as an all-or-nothing relation between hypothesis and evidence. We will first look at qualitative analyses in first-order logic since they outline the logical grammar of the concept. Several features and problems of qualitative accounts carry over to, and motivate, particular quantitative explications (Hempel 1945a).
Scientific laws often take the logical form ∀x: Fx → Gx; that is, all F’s are also G’s. For instance, take Kepler’s First Law that all planets travel in an elliptical orbit around the sun. It is then natural to distinguish two kinds of confirmation of such laws, as proposed by Jean Nicod (1961, 23–25): L’induction par l’infirmation proceeds by refuting and eliminating other candidate hypotheses (e.g., the hypothesis that planets revolve around the Earth). This is basically the method of eliminative induction that Bacon applied to causal inference. L’induction par la confirmation, by contrast, supports a hypothesis by citing its instances (e.g., a particular planet that has an elliptical orbit around the sun). This is perhaps the simplest and most natural account of scientific theory confirmation. It can be expressed as follows:
Nicod condition (NC): For a hypothesis of the form H = ∀x : Fx → Gx and an individual constant a, an observation report of the form Fa ∧ Ga confirms H.
However, NC fails to capture some essentials of scientific confirmation—see Sprenger (2010) for details. Consider the following highly plausible adequacy condition, due to Carl G. Hempel (1945a, 1945b):
Equivalence condition (EC): If H and H′ are logically equivalent sentences, then E confirms H if and only if E confirms H′.
EC should be satisfied by any logic of confirmation because, otherwise, the establishment of a confirmation relation would depend on the peculiar formulation of the hypothesis, which would contradict our goal of finding a logic of inductive inference.
Combining EC with NC leads, however, to paradoxical results. Let H = ∀x: Rx → Bx stand for the hypothesis that all ravens are black. H is equivalent to the hypothesis H′ = ∀x: ¬Bx → ¬Rx that no nonblack object is a raven. A white shoe is an instance of this hypothesis H′. By NC, observing a white shoe confirms H′, and, by EC, it also confirms H. Hence, observing a white shoe confirms the hypothesis that all ravens are black! But a white shoe appears to be an utterly irrelevant piece of evidence for assessing the hypothesis that all ravens are black. This result is often called the paradox of the ravens (Hempel 1945a, 13–15) or, after its inventor, Hempel’s paradox.
How should we deal with this problem? Hempel suggests biting the bullet and accepting that the observation of a white shoe confirms the raven hypothesis. After all, the observation eliminates a potential falsifier. To push this intuition further, imagine that we observe a gray, raven-like bird and, only after extended scrutiny, find out that it is a crow. There is certainly a sense in which the crowness of that bird confirms the raven hypothesis, which was already close to refutation.
Hempel (1945a, 1945b) implements this strategy by developing a more sophisticated version of Nicod’s instance confirmation criterion in which background knowledge plays a distinct role, the so-called satisfaction criterion. We begin with the formulation of direct confirmation, which also captures the main idea of Hempel’s proposal:
Direct confirmation (Hempel): A piece of evidence E directly Hempel-confirms a hypothesis H relative to background knowledge K if and only if E and K jointly entail the development of H to the domain of E—that is, the restriction of H to the set of individual constants that figure in E. In other words, $E \wedge K \models H_{dom(E)}$.
The idea of this criterion is that our observations verify a general hypothesis as restricted to the actually observed objects. Hempel’s satisfaction criterion generalizes this intuition by demanding that a hypothesis be confirmed whenever it is entailed by a set of directly confirmed sentences. Notably, Clark Glymour’s account of bootstrap confirmation is also based on Hempel’s satisfaction criterion (Glymour 1980b).
However, Hempel did not notice that the satisfaction criterion does not resolve the raven paradox: E = ¬Ba directly confirms the raven hypothesis H relative to K = ¬Ra (because $E \wedge K \models H_{\{a\}}$). Thus, even objects that are known not to be ravens can confirm the hypothesis that all ravens are black. This is clearly an unacceptable conclusion and invalidates the satisfaction criterion as an acceptable account of qualitative confirmation, whatever its other merits may be (Fitelson and Hawthorne 2011).
Hempel also developed several adequacy criteria for confirmation, intended to narrow down the set of admissible explications. We have already encountered one of them, the EC. Another one, the special consequence condition, claims that consequences of a confirmed hypothesis are confirmed as well. Hypotheses confirmed by a particular piece of evidence form a deductively closed set of sentences. It is evident from the definition that the satisfaction criterion conforms to this condition. It also satisfies the consistency condition that demands (inter alia) that no contingent evidence supports two hypotheses that are inconsistent with each other. This sounds very plausible, but, as noted by Nelson Goodman (1955) in his book Fact, Fiction and Forecast, that condition conflicts with powerful inductive intuitions. Consider the following inference:
Observation: emerald $e_1$ is green.
Observation: emerald $e_2$ is green.
…
Generalization: All emeralds are green.
This seems to be a perfect example of a valid inductive inference. Now define the predicate “grue,” which applies to all green objects if they were observed for the first time prior to time t = “now” and to all blue objects if observed later. (This is just a description of the extension of the predicate—no object is supposed to change color.) The following inductive inference satisfies the same logical scheme as the previous one:
Observation: emerald $e_1$ is grue.
Observation: emerald $e_2$ is grue.
…
Generalization: All emeralds are grue.
In spite of the gerrymandered nature of the “grue” predicate, the inference is sound: it satisfies the basic scheme of enumerative induction, and the premises are undoubtedly true. But then, it is paradoxical that two valid inductive inferences support flatly opposite conclusions. The first generalization predicts emeralds observed in the future to be green; the second generalization predicts them to be blue. How do we escape from this dilemma?
Goodman considers the option that, in virtue of its gerrymandered nature, the predicate “grue” should not enter inductive inferences. He notes, however, that it is perfectly possible to redefine the standard predicates “green” and “blue” in terms of “grue” and its conjugate predicate “bleen” (= blue if observed prior to t, else green). Hence, any preference for the “natural” predicates and the “natural” inductive inference seems to be arbitrary. Unless we want to give up on the scheme of enumerative induction, we are forced into dropping Hempel’s consistency condition and accepting the paradoxical conclusion that both conclusions (all emeralds are green/grue) are, at least to a certain extent, confirmed by past observations. The general moral is that conclusions of an inductive inference need not be consistent with each other, unlike in deductive logic.
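The clash between the two generalizations can be made concrete in a small sketch. The cutoff time and observation records below are illustrative assumptions; the point is only that every past observation satisfies both predicates, while their predictions diverge.

```python
T = 100  # the cutoff time t ("now") in Goodman's definition; illustrative

def grue(color, first_observed):
    """Grue: green if first observed before t, blue if first observed later.
    No object changes color; grue only redescribes the extensions."""
    return color == "green" if first_observed < T else color == "blue"

# All emeralds observed so far (at times 0..99) are green ...
past = [("green", time) for time in range(T)]

# ... so the evidence fits both generalizations equally well:
all_green = all(color == "green" for color, _ in past)
all_grue = all(grue(color, time) for color, time in past)
print(all_green, all_grue)

# Yet the generalizations disagree about a future emerald:
print(grue("green", T + 1))  # "all emeralds are grue" predicts blue, not green
```

The enumerative scheme alone cannot distinguish the two inferences; any asymmetry must come from outside the scheme, which is exactly Goodman's point.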
Goodman’s example, often called the new riddle of induction, illustrates that establishing rules of induction and adequacy criteria for confirmation is not a simple business. From a normative point of view, the consistency condition looks appealing, yet it clashes with intuitions about paradigmatic cases of enumerative inductive inference. The rest of this chapter will therefore focus on accounts of confirmation where inconsistent hypotheses can be confirmed simultaneously by the same piece of evidence.
A prominent representative of these accounts is hypothetico-deductive (HD) confirmation. HD confirmation considers a hypothesis to be confirmed if empirical predictions deduced from that hypothesis turn out to be successful (Gemes 1998; Sprenger 2011). An early description of HD confirmation was given by William Whewell:
Our hypotheses ought to foretel [sic] phenomena which have not yet been observed … the truth and accuracy of these predictions were a proof that the hypothesis was valuable and, at least to a great extent, true.
(Whewell 1847, 62–63)
Indeed, science often proceeds that way: our best theories about the atmospheric system suggest that emissions of greenhouse gases such as carbon dioxide and methane lead to global warming. That hypothesis has been confirmed by its successful predictions, such as shrinking Arctic ice sheets and increasing global temperatures, and by its ability to backtrack temperature variations in the past. The hypothetico-deductive concept of confirmation explicates the common idea of these and similar examples by stating that evidence confirms a hypothesis if we can derive it from the tested hypothesis together with suitable background assumptions. HD confirmation thus naturally aligns with the Popperian method for scientific inquiry, which emphasizes the value of risky predictions and the need to test our scientific hypotheses as severely as possible, to derive precise predictions, and to check them against reality.
An elementary account of HD confirmation is defined as follows:
Hypothetico-Deductive (HD) Confirmation: E HD-confirms H relative to background knowledge K if and only if
1. H ∧ K is consistent,
2. H ∧ K entails E ($H \wedge K \models E$),
3. K alone does not entail E.
The explicit role of background knowledge can be used to circumvent the raven paradox along the lines that Hempel suggested. Neither Ra ∧ Ba nor ¬Ba ∧ ¬Ra confirms the hypothesis H = ∀x: Rx → Bx, but Ba (“a is black”) does so relative to the background knowledge Ra, and ¬Ra (“a is not a raven”) does so relative to the background knowledge ¬Ba. This makes intuitive sense: only if we know a to be a raven is the observation of its color evidentially relevant; and only if a is known to be nonblack does the observation that it is not a raven support the hypothesis that all ravens are black, in the sense of eliminating a potential falsifier.
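The three clauses of the definition can be checked mechanically for the raven example. The sketch below is my own illustration, not the chapter's: it uses the development of the hypothesis for a single object a (the propositional surrogate Ra → Ba) and decides entailment by brute force over truth-value assignments.

```python
from itertools import product

ATOMS = ["Ra", "Ba"]  # "a is a raven", "a is black"

def valuations():
    """All truth-value assignments to the atomic sentences."""
    for bits in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))

def entails(premises, conclusion):
    """premises |= conclusion: the conclusion holds in every valuation
    that satisfies all premises."""
    return all(conclusion(v) for v in valuations()
               if all(p(v) for p in premises))

def consistent(sentences):
    return any(all(s(v) for s in sentences) for v in valuations())

def hd_confirms(E, H, K):
    """The elementary HD account: H ∧ K consistent, H ∧ K |= E, K |/= E."""
    return consistent([H, K]) and entails([H, K], E) and not entails([K], E)

H = lambda v: (not v["Ra"]) or v["Ba"]  # development of the hypothesis: Ra -> Ba
Ra = lambda v: v["Ra"]
Ba = lambda v: v["Ba"]
notRa = lambda v: not v["Ra"]
notBa = lambda v: not v["Ba"]

print(hd_confirms(Ba, H, Ra))        # a's blackness confirms H given Ra
print(hd_confirms(notRa, H, notBa))  # a's non-ravenhood confirms H given ¬Ba
print(hd_confirms(Ba, H, notRa))     # but not given that a is not a raven
```

The first two checks succeed and the third fails, matching the informal diagnosis in the paragraph above.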
Although the HD account of confirmation fares well with respect to the raven paradox, it has a major problem. Irrelevant conjunctions can be tacked to the hypothesis H while preserving the confirmation relation (Glymour 1980a).
Tacking by conjunction problem: If H is confirmed by a piece of evidence E (relative to any K), H ∧ X is confirmed by the same E for an arbitrary X that is consistent with H and K.
It is easy to see that this phenomenon is highly unsatisfactory: assume that the wave nature of light is confirmed by Young’s double-slit experiment. According to the HD account of confirmation, this implies that the following hypothesis is confirmed: “Light is an electromagnetic wave and the star Sirius is a giant bulb.” This sounds completely absurd.
To see that HD confirmation suffers from the tacking problem, let us just check the three conditions for HD confirmation: assume that some hypothesis X is irrelevant to E and that H ∧ X ∧ K is consistent. Let us also assume $H \wedge K \models E$ and that K alone does not entail E. Then, E confirms not only H but also H ∧ X (because $H \wedge K \models E$ implies $H \wedge K \wedge X \models E$).
Thus, tacking an arbitrary irrelevant conjunct to a confirmed hypothesis preserves the confirmation relation. This is very unsatisfactory. More generally, HD confirmation needs an answer to why a piece of evidence does not confirm every theory that implies it. Solving this problem is perhaps not impossible (Gemes 1993; Schurz 1991; Sprenger 2013), but it comes at the expense of major technical complications that compromise the simplicity and intuitive appeal of the hypotheticodeductive approach of confirmation.
In our discussion, several problems of qualitative confirmation have surfaced. First, qualitative confirmation is grounded on deductive relations between theory and evidence, which are quite the exception in modern, statistics-based science, where messy bodies of evidence are the norm. Second, we saw that few adequacy conditions have withstood the test of time, which makes it hard to develop a qualitative logic of induction. Third, no qualitative account measures degree of confirmation and distinguishes strongly from weakly confirmed hypotheses, although this is essential for a great deal of scientific reasoning. Therefore, we now turn to quantitative explications of confirmation.
3 Probability as Degree of Confirmation
The use of probability as a tool for describing degree of confirmation can be motivated in various ways. Here are some major reasons.
First, probability is, as quipped by Cicero, “the guide to life.” Judgments of probability motivate our actions: for example, the train I want to catch will probably be on time, so I have to run to catch it. Probability is used for expressing forecasts about events that affect our lives in manifold ways, from tomorrow’s weather to global climate, from economic developments to the probability of a new Middle East crisis. This paradigm was elaborated by philosophers and scientists such as Ramsey (1926), De Finetti (1937), and Jeffrey (1965).
Second, probability is the preferred tool for uncertain reasoning in science. Probability distributions are used for characterizing the value of a particular physical quantity or for describing measurement error. Theories are assessed on the basis of probabilistic hypothesis tests. By phrasing confirmation in terms of probability, we hope to connect philosophical analysis of inductive inference to scientific practice and integrate the goals of normative and descriptive adequacy (Howson and Urbach 2006).
Third, statistics, the science of analyzing and interpreting data, is couched in probability theory. Statisticians have proved powerful mathematical results on the foundations of probability and inductive learning. Analyses of confirmation may benefit from them and have done so in the past (e.g., Good 2009). Consider, for example, the famous De Finetti (1974) representation theorem for subjective probability or the convergence results for prior probability distributions by Gaifman and Snir (1982).
Fourth and last, increasing the probability of a conclusion seems to be the hallmark of a sound inductive inference, as already noted by Hume. Probability theory, and the Bayesian framework in particular, is especially well-suited for capturing this intuition. The basic idea is to explicate degree of confirmation in terms of degrees of belief, which satisfy the axioms of probability. Degrees of belief are changed by conditionalization (if E is learned, $p_{new}(H) = p(H|E)$), and the posterior probability p(H|E) stands as the basis of inference and decision-making. This quantity can be calculated via Bayes’ theorem:

$$p(H|E) = \frac{p(H)\,p(E|H)}{p(E)} = \frac{p(H)\,p(E|H)}{p(H)\,p(E|H) + p(\neg H)\,p(E|\neg H)}$$

Chapter 20 of this handbook, concerning probability, provides more detail on the foundations of Bayesianism.
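As a simple numerical illustration (the numbers are invented for the example, not taken from the chapter), conditionalizing on evidence that is much more likely under H than under ¬H raises the probability of H:

```python
def posterior(prior, likelihood_H, likelihood_notH):
    """Bayes' theorem: p(H|E) = p(H)p(E|H) / [p(H)p(E|H) + p(¬H)p(E|¬H)]."""
    return (prior * likelihood_H) / (
        prior * likelihood_H + (1 - prior) * likelihood_notH)

# Illustrative numbers: a hypothesis with prior 0.3 whose prediction E
# is very likely if H is true and rather unlikely otherwise.
p_H = 0.3
p_E_given_H = 0.9
p_E_given_notH = 0.2

p_H_given_E = posterior(p_H, p_E_given_H, p_E_given_notH)
print(round(p_H_given_E, 3))  # → 0.659
```

Learning E lifts the degree of belief in H from 0.3 to roughly 0.66; on the Bayesian picture, this boost is the raw material for judgments of confirmation.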
We now assume that degree of confirmation depends only on the joint probability distribution of the hypothesis H, the evidence E, and the background assumptions K. More precisely, we assume that E, H, and K are among the closed sentences of a language $\mathfrak{L}$ that describes our domain of interest. A Bayesian theory of confirmation can be explicated by a function ${\mathfrak{L}}^{3} \times \mathfrak{B} \to \mathbb{R}$, where $\mathfrak{B}$ is the set of probability measures on the algebra generated by $\mathfrak{L}$. This function assigns a real-valued degree of confirmation to any triple of sentences together with a probability (degree of belief) function. For the sake of simplicity, we will omit explicit reference to background knowledge since most accounts incorporate it by using the probability function $p(\cdot|K)$ instead of $p(\cdot)$.
A classical method for explicating degree of confirmation is to specify adequacy conditions on the concept and to derive a representation theorem for a confirmation measure. This means that one characterizes the set of measures (and possibly the unique measure) that satisfies these constraints. This approach allows for a sharp demarcation and mathematically rigorous characterization of the explicandum and, at the same time, for critical discussion of the explicatum by means of defending and criticizing the properties that are encapsulated in the adequacy conditions.
The first constraint is mainly of a formal nature and serves as a tool for making further constraints more precise and facilitating proofs (Crupi 2013):
Formality: For any sentences $H, E \in \mathfrak{L}$ with probability measure p(·), c(H, E) is a measurable function from the joint probability distribution of H and E to a real number $c(H, E) \in \mathbb{R}$. In particular, there exists a function $f: [0, 1]^{3} \to \mathbb{R}$ such that c(H, E) = f(p(H ∧ E), p(H), p(E)).
Since the three probabilities p(H ∧ E), p(H), p(E) suffice to determine the joint probability distribution of H and E, we can express c(H, E) as a function of these three arguments.
Another cornerstone for Bayesian explications of confirmation is the following principle:
Final probability incrementality: For any sentences H, E, and $E' \in \mathfrak{L}$ with probability measure p(·),
c(H, E) > c(H, E′) if and only if p(H|E) > p(H|E′), and
c(H, E) < c(H, E′) if and only if p(H|E) < p(H|E′).
According to this principle, E confirms H more than E′ does if and only if it raises the probability of H to a higher level. Degree of confirmation covaries with boost in degree of belief, and satisfactory Bayesian explications of degree of confirmation should satisfy this condition.
There are now two main roads for adding more conditions, which will ultimately lead us to two different explications of confirmation: as firmness and as increase in firmness (or evidential support). They are also called the absolute and the incremental concept of confirmation.
Consider the following condition:
Local equivalence: For any sentences H, H′, and $E \in \mathfrak{L}$ with probability measure p(·), if H and H′ are logically equivalent given E (i.e., $E \wedge H \models H'$ and $E \wedge H' \models H$), then c(H, E) = c(H′, E).
The plausible idea behind local equivalence is that E confirms the hypotheses H and H′ to an equal degree if they are logically equivalent conditional on E. If we buy into this intuition, local equivalence allows for a powerful (yet unpublished) representation theorem by Michael Schippers (see Crupi 2013):
Theorem 1: Formality, final probability incrementality, and local equivalence hold if and only if there is a nondecreasing function $g: [0, 1] \to \mathbb{R}$ such that for any $H, E \in \mathfrak{L}$ and any p(·), c(H, E) = g(p(H|E)).
On this account, scientific hypotheses count as well-confirmed whenever they are sufficiently probable; that is, when p(H|E) exceeds a certain (possibly context-relative) threshold. Hence, all confirmation measures that satisfy the three given constraints are ordinally equivalent; that is, they can be mapped onto each other by means of a nondecreasing function. In particular, their confirmation rankings agree: if there are two functions g and g′ that satisfy Theorem 1, with associated confirmation measures c and c′, then c(H, E) ≥ c(H′, E′) if and only if c′(H, E) ≥ c′(H′, E′). Since confirmation as firmness is a monotonically increasing function of p(H|E), it is natural to set up the qualitative criterion that E confirms H (in the absolute sense) if and only if p(H|E) ≥ t for some t ∈ [0, 1].
A nice consequence of the view of confirmation as firmness is that some long-standing problems of confirmation theory, such as the paradox of irrelevant conjunctions, dissolve. Remember that, on the HD account of confirmation, it was hard to avoid the conclusion that if E confirmed H, then it also confirmed H ∧ H′ for an arbitrary H′. On the view of confirmation as firmness, we automatically obtain c(H ∧ H′, E) ≤ c(H, E). These quantities are nondecreasing functions of p(H ∧ H′|E) and p(H|E), respectively, and their difference grows the less plausible H′ is and the less it coheres with H. Confirmation as firmness thus gives the intuitively correct response to the tacking by conjunction paradox.
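The inequality c(H ∧ H′, E) ≤ c(H, E) is easy to verify numerically. The following sketch uses made-up probabilities (p(H) = 0.3, p(E|H) = 0.9, p(E|¬H) = 0.2) and, for simplicity, treats the tacked-on conjunct H′ as probabilistically independent of H and E:

```python
def posterior(p_h, p_e_given_h, p_e_given_noth):
    """Posterior p(H|E) via Bayes' theorem."""
    p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_noth
    return p_h * p_e_given_h / p_e

# Hypothetical numbers, for illustration only
p_H_given_E = posterior(0.3, 0.9, 0.2)      # p(H|E), roughly 0.66
p_Hprime = 0.5                              # irrelevant conjunct H'
p_conj_given_E = p_H_given_E * p_Hprime     # p(H & H'|E) under independence

# The conjunction is never more probable than H itself on the evidence
assert p_conj_given_E <= p_H_given_E
```

Since any firmness measure is a nondecreasing function of the posterior, the ordering of the posteriors carries over directly to degrees of confirmation.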
It should also be noted that the idea of confirmation as firmness corresponds to Carnap’s concept of probability_{1} or “degree of confirmation” in his inductive logic. Carnap (1950) defines the degree of confirmation of a theory H relative to total evidence E as its probability conditional on E:
(p. 196) $c(H, E) = p(H|E) = \frac{m(H \wedge E)}{m(E)}$, where this probability is in turn defined by the measure m that state descriptions of the (logical) universe receive. By the choice of the measure m and a learning parameter λ, Carnap (1952) characterizes an entire continuum of inductive methods from which three prominent special cases can be derived. First, inductive skepticism: the degree of confirmation of a hypothesis is not changed by incoming evidence. Second, the rule of direct inference: the degree of confirmation of the hypothesis equals the proportion of observations in the sample for which it is true. Third, the rule of succession (de Laplace 1814): a prediction principle that corresponds to Bayesian inference with a uniform prior distribution. Carnap thus ends up with various inductive logics that characterize different attitudes toward ampliative inference.
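The third special case can be made concrete in a few lines. What follows is a minimal sketch of Laplace's rule of succession only (not Carnap's general λ-continuum): the predictive probability of a further positive instance after k positive outcomes in n observed trials.

```python
from fractions import Fraction

def rule_of_succession(k, n):
    """Laplace's rule of succession: probability that the next observation
    is positive, given k positive outcomes in n trials. It corresponds to
    Bayesian updating of a uniform prior over the unknown chance."""
    return Fraction(k + 1, n + 2)

# After 100 black ravens and no nonblack ones, the predicted probability
# that raven 101 is black:
print(rule_of_succession(100, 100))   # 101/102
```

With no observations at all, the rule returns 1/2, reflecting the uniform prior.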
Carnap’s characterization of degree of confirmation does not always agree with the use of that concept in scientific reasoning. Above all, a confirmatory piece of evidence often provides a good argument for a theory even if the latter is unlikely. For instance, in the first years after Einstein proposed the general theory of relativity (GTR), many scientists did not have a particularly high degree of belief in GTR because of its counterintuitive nature. However, it was agreed that GTR was well-confirmed by its predictive and explanatory successes, such as the bending of starlight by the sun and the explanation of the Mercury perihelion shift (Earman 1992). The account of confirmation as firmness fails to capture this intuition. The same holds for experiments in present-day science whose confirmatory strength is not evaluated on the basis of the posterior probability of the tested hypothesis H but by whether the results provide significant evidence in favor of H; that is, whether they are more expected under H than under ¬H.
This last point brings us to a particularly unintuitive consequence of confirmation as firmness: E could confirm H even if it lowers the probability of H, as long as p(H|E) is still large enough. But nobody would call an experiment whose results E are negatively statistically relevant to H a confirmation of H. This observation motivates the following natural definition:
Confirmation as increase in firmness: For any sentences H, $E \in \mathfrak{L}$ with probability measure p(·),
1. Evidence E confirms/supports hypothesis H (in the incremental sense) if and only if p(H|E) > p(H).
2. Evidence E disconfirms/undermines hypothesis H if and only if p(H|E) < p(H).
3. Evidence E is neutral with respect to H if and only if p(H|E) = p(H).
In other words, E confirms H if and only if E raises our degree of belief in H. Such explications of confirmation are also called statistical relevance accounts of confirmation because the neutral point is determined by the statistical independence of H and E. The analysis of confirmation as increase in firmness is the core business of Bayesian confirmation theory. This approach receives empirical support from findings by Tentori, Crupi, Bonini, and Osherson (2007): ordinary people use the concept of confirmation in a way that can be dissociated from posterior probability and that is strongly correlated with measures of confirmation as increase in firmness. (p. 197)
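The three clauses of the definition can be expressed in a small function; the numbers below are hypothetical and serve only to exercise each clause (exact fractions avoid rounding artifacts in the neutrality case):

```python
from fractions import Fraction as F

def verdict(p_h, p_e_given_h, p_e_given_noth):
    """Qualitative verdict of the statistical relevance account: compare the
    posterior p(H|E), obtained via Bayes' theorem, with the prior p(H)."""
    p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_noth
    p_h_given_e = p_h * p_e_given_h / p_e
    if p_h_given_e > p_h:
        return "confirms"
    if p_h_given_e < p_h:
        return "disconfirms"
    return "neutral"

print(verdict(F(3, 10), F(9, 10), F(1, 5)))   # E more expected under H: confirms
print(verdict(F(3, 10), F(1, 5), F(9, 10)))   # E less expected under H: disconfirms
print(verdict(F(3, 10), F(1, 2), F(1, 2)))    # E independent of H: neutral
```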
Table 9.1 I. J. Good’s (1967) counterexample to the paradox of the ravens

                    W_1          W_2
Black ravens        100          1,000
Nonblack ravens     0            1
Other birds         1,000,000    1,000,000
Confirmation as increase in firmness has interesting relations to qualitative accounts of confirmation and the paradoxes we have encountered. For instance, HD confirmation now emerges as a special case: if H entails E, then p(E|H) = 1, and, by Bayes’ theorem, p(H|E) > p(H) (unless p(E) was equal to one in the first place). We can also spot what is wrong with the idea of instance confirmation. Remember Nicod’s (and Hempel’s) original idea; namely, that universal generalizations such as H = ∀x: Rx → Bx are confirmed by their instances. This is certainly true relative to some background knowledge. However, it is not true under all circumstances. I. J. Good (1967) constructed a simple counterexample in a note for the British Journal for the Philosophy of Science: there are only two possible worlds, W_1 and W_2, whose properties are described by Table 9.1.
The raven hypothesis H is true whenever W_1 is the case and false whenever W_2 is the case. Conditional on these peculiar background assumptions, the observation of a black raven is evidence that W_2 is the case and therefore evidence that not all ravens are black: the probability of sampling a black raven is much higher in W_2 than in W_1, namely p(Ra ∧ Ba|W_2) = 1,000/1,001,001 as opposed to p(Ra ∧ Ba|W_1) = 100/1,000,100.
By an application of Bayes’ theorem, we infer p(W_1|Ra ∧ Ba) < p(W_1), and, given W_1 ≡ H, this amounts to a counterexample to NC. Universal conditionals are not always confirmed by their positive instances. We see how confirmation as increase in firmness elucidates our pretheoretic intuitions regarding the theory–evidence relation: the relevant background assumptions make a huge difference as to when a hypothesis is confirmed.
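Good's counterexample can be checked directly against the numbers in Table 9.1. The sketch below assumes, purely for illustration, equal prior probabilities for the two worlds and a bird sampled uniformly at random:

```python
from fractions import Fraction

# Bird populations from Table 9.1 (Good 1967)
worlds = {
    "W1": {"black ravens": 100, "nonblack ravens": 0, "other birds": 1_000_000},
    "W2": {"black ravens": 1_000, "nonblack ravens": 1, "other birds": 1_000_000},
}

def lik_black_raven(world):
    """Probability of sampling a black raven from the given world."""
    return Fraction(world["black ravens"], sum(world.values()))

prior_W1 = Fraction(1, 2)   # assumption: equal priors on the two worlds
l1, l2 = lik_black_raven(worlds["W1"]), lik_black_raven(worlds["W2"])
posterior_W1 = prior_W1 * l1 / (prior_W1 * l1 + (1 - prior_W1) * l2)

# Observing a black raven lowers the probability of W1, i.e., of the raven
# hypothesis H -- a violation of Nicod's condition
assert posterior_W1 < prior_W1
print(float(posterior_W1))   # roughly 0.09
```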
Confirmation as increase in firmness also allows for a solution of the comparative paradox of the ravens. That is, we can show that, relative to weak and plausible background assumptions, p(H|Ra ∧ Ba) > p(H|¬Ra ∧ ¬Ba) (Fitelson and Hawthorne 2011, Theorem 2). By final probability incrementality, this implies that Ra ∧ Ba confirms H more than ¬Ra ∧ ¬Ba does. This shows, ultimately, why we consider a black raven to be more important evidence for the raven hypothesis than a white shoe.
Looking back to qualitative accounts once more, we see that Hempel’s original adequacy criteria are mirrored in the logical properties of confirmation as firmness and increase in firmness. According to the view of confirmation as firmness, every consequence H′ of a confirmed hypothesis H is confirmed, too (because p(H′|E) ≥ p(H|E)). This conforms to (p. 198) Hempel’s special consequence condition. The view of confirmation as increase in firmness relinquishes this condition, however, and obtains a number of attractive results in return.
4 Degree of Confirmation: Monism or Pluralism?
So far, we have not yet answered the question of how degree of confirmation (or evidential support) should be quantified. For scientists who want to report the results of their experiments and quantify the strength of the observed evidence, this is certainly the most interesting question. It is also crucial for giving a Bayesian answer to the Duhem-Quine problem (Duhem 1914). If an experiment fails and we ask ourselves which hypothesis to reject, the degree of (dis)confirmation of the involved hypotheses can be used to evaluate their standing. Unlike purely qualitative accounts of confirmation, a measure of degree of confirmation can indicate which hypothesis we should discard. For this reason, the search for a proper confirmation measure is more than a technical exercise: it is of vital importance for distributing praise and blame among the hypotheses involved in an experiment. The question, however, is which measure should be used. This is the question separating monists and pluralists in confirmation theory: monists believe that there is a single adequate or superior measure—a view that can be supported by theoretical reasons (Crupi, Tentori, and Gonzalez 2007; Milne 1996) and empirical research (e.g., coherence with folk confirmation judgments; see Tentori et al. 2007). Pluralists think that such arguments do not single out one adequate measure and that there are several valuable and irreducible confirmation measures (e.g., Eells and Fitelson 2000; Fitelson 1999, 2001).
Table 9.2 provides a rough survey of the measures that are frequently discussed in the literature. We have normalized them such that for each measure c(H, E), confirmation amounts to c(H, E) > 0, neutrality to c(H, E) = 0, and disconfirmation to c(H, E) < 0. This allows for a better comparison of the measures and their properties.
Evidently, these measures all have quite distinct properties. We shall now transfer the methodology from our analysis of confirmation as firmness and characterize them in terms of representation results. As before, formality and final probability incrementality will serve as minimal reasonable constraints on any measure of evidential support. Notably, two measures in the list, namely c′ and s, are incompatible with final probability incrementality, and objections based on allegedly vicious symmetries have been raised against c′ and r (Eells and Fitelson 2002; Fitelson 2001).
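To make the comparison concrete, the sketch below computes the measures of Table 9.2 from the three inputs p(H), p(E|H), and p(E|¬H), using hypothetical numbers; the Rips measure r′ is omitted since it coincides with z whenever E confirms H.

```python
import math

def confirmation_measures(p_h, p_e_given_h, p_e_given_noth):
    """Measures of evidential support from Table 9.2, normalized so that
    positive values indicate confirmation and zero indicates neutrality."""
    p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_noth
    p_h_e = p_h * p_e_given_h / p_e                     # p(H|E)
    p_h_note = p_h * (1 - p_e_given_h) / (1 - p_e)      # p(H|~E)
    d = p_h_e - p_h
    return {
        "d": d,                                         # difference
        "r": math.log(p_h_e / p_h),                     # log-ratio
        "l": math.log(p_e_given_h / p_e_given_noth),    # log-likelihood
        "k": (p_e_given_h - p_e_given_noth)
             / (p_e_given_h + p_e_given_noth),          # Kemeny-Oppenheim
        "z": d / (1 - p_h) if d >= 0 else d / p_h,      # Crupi-Tentori
        "s": p_h_e - p_h_note,                          # Christensen-Joyce
        "c'": p_h * p_e_given_h - p_h * p_e,            # Carnap: p(H&E)-p(H)p(E)
    }

m = confirmation_measures(0.3, 0.9, 0.2)
assert all(v > 0 for v in m.values())   # E confirms H on every measure
```

Although all measures agree on the qualitative verdict, their numerical values and, for suitable pairs of evidence items, their rankings can diverge.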
Here are further constraints on measures of evidential support that exploit the increase of firmness intuition in different ways:
(p. 199) Disjunction of alternatives: If H and H′ are mutually exclusive, then
c(H ∨ H′, E) > c(H, E) if and only if p(H′|E) > p(H′),
Table 9.2 A list of popular measures of evidential support

Difference Measure: $d(H, E) = p(H|E) - p(H)$
Log-Ratio Measure: $r(H, E) = \log \frac{p(H|E)}{p(H)}$
Log-Likelihood Measure: $l(H, E) = \log \frac{p(E|H)}{p(E|\neg H)}$
Kemeny-Oppenheim Measure: $k(H, E) = \frac{p(E|H) - p(E|\neg H)}{p(E|H) + p(E|\neg H)}$
Rips Measure: $r'(H, E) = \frac{p(H|E) - p(H)}{1 - p(H)}$
Crupi-Tentori Measure: $z(H, E) = \frac{p(H|E) - p(H)}{1 - p(H)}$ if $p(H|E) \ge p(H)$, and $z(H, E) = \frac{p(H|E) - p(H)}{p(H)}$ if $p(H|E) < p(H)$
Christensen-Joyce Measure: $s(H, E) = p(H|E) - p(H|\neg E)$
Carnap’s Relevance Measure: $c'(H, E) = p(H \wedge E) - p(H)\,p(E)$
with corresponding conditions for c(H ∨ H′, E) = c(H, E) and c(H ∨ H′, E) < c(H, E).
That is, E confirms H ∨ H′ more than H if and only if E is statistically relevant to H′. The idea behind this condition is that the sum (H ∨ H′) is confirmed to a greater degree than each of the parts (H, H′) when each part is individually confirmed by E.
Law of Likelihood:
c(H, E) > c(H′, E) if and only if p(E|H) > p(E|H′),
with corresponding conditions for c(H, E) = c(H′, E) and c(H, E) < c(H′, E).
This condition has a long history of discussion in philosophy and statistics (e.g., Edwards 1972; Hacking 1965). The idea is that E favors H over H′ if and only if the likelihood of H on E is greater than the likelihood of H′ on E; in other words, if E is more expected under H than under H′. The Law of Likelihood also forms the basis of the likelihoodist theory of confirmation, which analyzes confirmation as a comparative relation between two (p. 200) competing hypotheses (Royall 1997; Sober 2008). Likelihoodists eschew judgments on how much E confirms H without reference to specific alternatives.
Modularity: If $p(E|H \wedge E') = p(E|H)$ and $p(E|\neg H \wedge E') = p(E|\neg H)$, then $c(H, E) = c_{E'}(H, E)$, where $c_{E'}$ denotes confirmation relative to the probability distribution conditional on E′.
This constraint screens off irrelevant evidence. If E′ does not affect the likelihoods of H and ¬H on E, then conditioning on E′—now supposedly irrelevant evidence—does not alter the degree of confirmation.
Contraposition/Commutativity: If E confirms H, then c(H, E) = c(¬E, ¬H); and if E disconfirms H, then c(H, E) = c(E, H).
These constraints are motivated by the analogy of confirmation to partial deductive entailment. If $H \models E$, then also $\neg E \models \neg H$; and if E refutes H, then H also refutes E. If confirmation is thought of as a generalization of deductive entailment to uncertain inference, then these conditions are very natural and reasonable (Tentori et al. 2007).
Combined with formality and final probability incrementality, each of these four principles singles out a specific measure of confirmation up to ordinal equivalence (Crupi 2013; Crupi, Chater, and Tentori 2013; Heckerman 1988):
Theorem 2 (representation results for confirmation measures):
1. If formality, final probability incrementality, and disjunction of alternatives hold, then there is a nondecreasing function g such that c(H, E) = g(d(H, E)).
2. If formality, final probability incrementality, and Law of Likelihood hold, then there is a nondecreasing function g such that c(H, E) = g(r(H, E)).
3. If formality, final probability incrementality, and modularity hold, then there are nondecreasing functions g and g′ such that c(H, E) = g(l(H, E)) and c(H, E) = g′ (k(H, E)). Note that k and l are ordinally equivalent.
4. If formality, final probability incrementality, and commutativity hold, then there is a nondecreasing function g such that c(H, E) = g(z(H, E)).
That is, many confirmation measures can be characterized by means of a small set of adequacy conditions. It should also be noted that the Bayes factor, a popular measure of evidence in Bayesian statistics (Kass and Raftery 1995), falls under the scope of the theorem since it is ordinally equivalent to the loglikelihood measure l and the Kemeny and Oppenheim (1952) measure k. This is also evident from its mathematical form
(p. 201) $B_{01}(E) = \frac{p(E|H_0)}{p(E|H_1)}$ for mutually exclusive hypotheses H_0 and H_1 (for which H and ¬H may be substituted).
To show that the difference between these measures has substantial philosophical ramifications, let us go back to the problem of irrelevant conjunctions. If we analyze this problem in terms of the ratio measure r, then we obtain, assuming $H \models E$, that for an “irrelevant” conjunct H′,
$r(H \wedge H', E) = \log \frac{p(H \wedge H'|E)}{p(H \wedge H')} = \log \frac{p(E|H \wedge H')}{p(E)} = \log \frac{p(E|H)}{p(E)} = r(H, E),$ such that the irrelevant conjunction is supported to the same degree as the original hypothesis. This consequence is certainly unacceptable as a judgment of evidential support since H′ could literally be any hypothesis unrelated to the evidence (e.g., “the star Sirius is a giant light bulb”). In addition, the result does not only hold for the special case of deductive entailment: it holds whenever the likelihoods of H and H ∧ H′ on E are the same; that is, whenever p(E|H ∧ H′) = p(E|H).
The other measures fare better in this respect: whenever p(E|H ∧ H′) = p(E|H), all other measures in Theorem 2 reach the conclusion that c(H ∧ H′, E) < c(H, E) (Hawthorne and Fitelson 2004). In this way, we can see how Bayesian confirmation theory improves on HD confirmation and other qualitative accounts of confirmation: the paradox is acknowledged, but, at the same time, it is demonstrated how it can be mitigated.
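This contrast between r and the other measures can be verified numerically. The sketch below uses hypothetical numbers in which H entails E (so p(E|H) = 1) and the tacked-on conjunct H′ is probabilistically independent of both H and E:

```python
import math

# Hypothetical numbers: H entails E (p(E|H) = 1); H' independent of H and E
p_H, p_Hprime = 0.3, 0.5
p_E = p_H * 1.0 + (1 - p_H) * 0.2      # total probability, with p(E|~H) = 0.2

def r(p_h):
    """Log-ratio measure for a hypothesis h entailing E: log p(h|E)/p(h)."""
    return math.log((p_h * 1.0 / p_E) / p_h)

def l(p_h):
    """Log-likelihood measure log p(E|h)/p(E|~h) for h entailing E,
    with p(E|~h) recovered from the law of total probability."""
    p_e_given_noth = (p_E - p_h * 1.0) / (1 - p_h)
    return math.log(1.0 / p_e_given_noth)

assert math.isclose(r(p_H), r(p_H * p_Hprime))   # r ignores the tacked conjunct
assert l(p_H * p_Hprime) < l(p_H)                # l penalizes it
```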
That said, it is difficult to form preferences over the remaining measures. Comparing the adequacy conditions might not lead to conclusive results due to the divergent motivations that support them. Moreover, it has been shown that none of the remaining measures satisfies the following two conditions: (1) degree of confirmation is maximal if E implies H; (2) the a priori informativity (cashed out in terms of predictive content and improbability) of a hypothesis contributes to degree of confirmation (Brössel 2013, 389–390). This means that the idea of confirmation as a generalization of partial entailment and as a reward for risky predictions cannot be reconciled with each other, thus posing a further dilemma for confirmation monism. One may therefore go for pluralism and accept that there are different senses of degree of confirmation that correspond to different explications. For example, d strikes us as a natural explication of increase in subjective confidence, z generalizes deductive entailment, and l and k measure the discriminatory force of the evidence regarding H and ¬H.
Although Bayesian confirmation theory yields many interesting results and has sparked interest among experimental psychologists, too, one main criticism has been leveled again and again: that it misrepresents actual scientific reasoning. In the remaining sections, we present two major challenges for Bayesian confirmation theory fed by that criticism: the problem of old evidence (Glymour 1980b) and the rivaling frequentist approach to learning from experience (Mayo 1996).
(p. 202) 5 The Problem of Old Evidence
In this brief section, we expose one of the most troubling and persistent challenges for confirmation as increase in firmness: the problem of old evidence. Consider a phenomenon E that is unexplained by the available scientific theories. At some point, a theory H is discovered that accounts for E. Then, E is “old evidence”: at the time when H is developed, the scientist is already certain or close to certain that the phenomenon E is real. Nevertheless, E apparently confirms H—at least if H was invented on independent grounds. After all, it resolves a wellknown and persistent observational anomaly.
A famous case of old evidence in science is the Mercury perihelion anomaly (Earman 1992; Glymour 1980b). For a long time, the shift of the Mercury perihelion could not be explained by Newtonian mechanics or any other reputable physical theory. Then, Einstein realized that his GTR explained the perihelion shift. This discovery conferred a substantial degree of confirmation on GTR, much more than some pieces of novel evidence. Similar reasoning patterns apply in other scientific disciplines where new theories explain away wellknown anomalies.
The reasoning of these scientists is hard to capture in the Bayesian account of confirmation as increase in firmness. E confirms H if and only if the posterior degree of belief in H, p(H|E), exceeds the prior degree of belief in H, p(H). When E is old evidence and already known to the scientist, the prior degree of belief in E is maximal: p(E) = 1. But with that assumption, it follows that the posterior probability of H cannot be greater than the prior probability: p(H|E) = p(H) · p(E|H) ≤ p(H). Hence, E does not confirm H. The very idea of confirmation by old evidence, or, equivalently, confirmation by accounting for well-known observational anomalies, seems impossible to describe in Bayesian belief kinematics. Some critics, like Clark Glymour, have gone so far as to claim that Bayesian confirmation only describes epiphenomena of genuine confirmation because it misses the relevant structural relations between theory and evidence.
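The formal point is a one-liner. In the sketch below (with an illustrative prior), certainty in E makes E equally likely under H and ¬H, so Bayes' theorem returns the prior unchanged:

```python
def posterior(p_h, p_e_given_h, p_e_given_noth):
    """Posterior p(H|E) via Bayes' theorem."""
    p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_noth
    return p_h * p_e_given_h / p_e

# Old evidence: the agent is already certain of E, so p(E|H) = p(E|~H) = 1
assert abs(posterior(0.2, 1.0, 1.0) - 0.2) < 1e-12   # no confirmation possible
```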
There are various solution proposals to the problem of old evidence. One approach, adopted by Howson (1984), interprets the confirmation relation with respect to counterfactual degrees of belief, where E is subtracted from the agent’s actual background knowledge. Another approach is to claim that confirmation by old evidence is not about learning the actual evidence, but about learning a logical or explanatory relation between theory and evidence. It seems intuitive that Einstein’s confidence in GTR increased on learning that it implied the perihelion shift of Mercury and that this discovery was the real confirming event.
Indeed, confirmation theorists have set up Bayesian models where learning $H \models E$ increases the probability of H (e.g., Jeffrey 1983) under certain assumptions. The question is, of course, whether these assumptions are sufficiently plausible and realistic. For critical discussion and further constructive proposals, see Earman (1992) and Sprenger (2015a).
(p. 203) 6 Bayesianism and Frequentism
A major alternative to Bayesian confirmation theory is frequentist inference. Many of its principles have been developed by the geneticist and statistician R. A. Fisher (see Neyman and Pearson [1933] for a more behavioral account). According to frequentism, inductive inference does not concern our degrees of belief. That concept is part of individual psychology and not suitable for quantifying scientific evidence. Instead of expressing degrees of belief, probability is interpreted as the limiting frequency of an event in a large number of trials. It enters inductive inference via the concept of a sampling distribution; that is, the probability distribution of an observable in a random sample.
The basic method of frequentist inference is hypothesis testing and, more precisely, null hypothesis significance testing (NHST). For Fisher, the purpose of statistical analysis consists in assessing the relation of a hypothesis to a body of observed data. The tested hypothesis usually stands for the absence of an interesting phenomenon (e.g., no causal relationship between two variables, no observable difference between two treatments, etc.). Therefore it is often called the default or null hypothesis (or null). In remarkable agreement with Popper, Fisher states that the only purpose of an experiment is to “give the facts a chance of disproving the null hypothesis” (Fisher 1925, 16): the purpose of a test is to find evidence against the null. Conversely, failure to reject the null hypothesis does not imply positive evidence for the null (on this problem, see Popper 1954; Sprenger 2015b).
Unlike Popper (1959), Fisher aims at experimental and statistical demonstrations of a phenomenon. Thus, he needs a criterion for when an effect is real and not an experimental fabrication. He suggests that we should infer to such an effect when the observed data are too improbable under the null hypothesis:
“either an exceptionally rare chance has occurred, or the theory [= the null hypothesis] is not true” (Fisher 1956, 39).
This basic scheme of inference is called Fisher’s disjunction by Hacking (1965), and it stands at the heart of significance testing. It infers to the falsity of the null hypothesis as the best explanation of an unexpected result (for criticism, see Royall 1997; Spielman 1974).
Evidence against the null is measured by means of the p-value. Here is an illustration. Suppose that we want to test whether the real-valued parameter θ, our quantity of interest, diverges “significantly” from H_0: θ = θ_0. We collect independent and identically distributed (i.i.d.) data x := (x_1, …, x_N) whose distribution is Gaussian and centered around θ. Assume now that the population variance σ² is known, so $x_i \sim N(\theta, \sigma^2)$ for each x_i. Then, the discrepancy in the data x with respect to the postulated mean value θ_0 is measured by means of the statistic $z = \sqrt{N}\,\frac{\bar{x} - \theta_0}{\sigma}$, where $\bar{x}$ denotes the sample mean.
We may reinterpret this equation by noting that, if the null hypothesis is true, z follows a standard Normal distribution: $z \sim N(0, 1)$.
Determining whether a result is significant or not depends on the p-value or observed significance level; that is, the “tail area” of the null under the observed data. This value depends on z and can be computed as $p := p_{H_0}\left(|z(X)| \ge |z(x)|\right)$;
that is, as the probability of observing a more extreme discrepancy under the null than the one which is actually observed. Figure 9.1 displays an observed significance level p = 0.04 as the integral under the probability density function—a result that would typically count as substantial evidence against the null hypothesis (“p < .05”).
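The computation behind this example can be reproduced with a few lines of standard-library Python; the z-value below is invented so as to land near the p ≈ 0.04 level mentioned above:

```python
import math

def z_statistic(xs, theta0, sigma):
    """Discrepancy statistic z = sqrt(N) * (sample mean - theta0) / sigma."""
    n = len(xs)
    return math.sqrt(n) * (sum(xs) / n - theta0) / sigma

def two_sided_p(z):
    """Observed significance level: probability, under H0, of a discrepancy
    at least as extreme as |z|, computed from the standard Normal CDF."""
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# |z| around 2.05 corresponds to an observed significance level of about 0.04
print(round(two_sided_p(2.054), 3))   # 0.04
```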
For the frequentist practitioner, p-values are practical, replicable, and objective measures of evidence against the null: they can be computed automatically once the statistical model is specified, and they only depend on the sampling distribution of the data under H_0. Fisher interpreted them as “a measure of the rational grounds for the disbelief [in the null hypothesis] it augments” (Fisher 1956, 43).
The virtues and vices of significance testing and p-values have been discussed at length in the literature (e.g., Cohen 1994; Harlow, Mulaik, and Steiger 1997), and it would go beyond the scope of this chapter to deliver a comprehensive critique. By now, there is (p. 205) consensus that inductive inference based on p-values leads to severe epistemic and practical problems. Several alternatives, such as confidence intervals at a predefined level $\alpha$, have been promoted in recent years (Cumming 2014; Cumming and Finch 2005). They are interval estimators defined as follows: for each possible value $\theta'$ of the unknown parameter $\theta$, we select the interval of data points x that will not lead to a statistical rejection of the null hypothesis $\theta = \theta'$ in a significance test at level $\alpha$. Conversely, the confidence interval for $\theta$, given an observation x, comprises all values of $\theta$ that are consistent with x in the sense of surviving an NHST at level $\alpha$.
We conclude by highlighting the principal philosophical difference between Bayesian and frequentist inference. The following principle is typically accepted by Bayesian statisticians and confirmation theorists alike:
Likelihood Principle (LP): Consider a statistical model $\mathcal{M}$ with a set of probability measures $p(\cdot|\theta)$ parametrized by a parameter of interest $\theta \in \Theta$. Assume we conduct an experiment Ɛ in $\mathcal{M}$. Then, all evidence about $\theta$ generated by Ɛ is contained in the likelihood function $p(x|\theta)$, where the observed data x are treated as a constant.
Indeed, in the simple case of only two hypotheses (H and ¬H), the posterior probabilities are only a function of p(E|H) and p(E|¬H), given the prior probabilities. This is evident from writing the well-known Bayes’ theorem as $p(H|E) = \frac{p(H)\,p(E|H)}{p(H)\,p(E|H) + p(\neg H)\,p(E|\neg H)}.$
So Bayesians typically accept the LP, as is also evident from the use of Bayes factors as a measure of statistical evidence.
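In odds form, the point is transparent: given the priors, the posterior odds depend on the data only through the likelihood ratio, i.e., the Bayes factor. A minimal sketch with invented likelihoods:

```python
def posterior_odds(prior_h, lik_h, lik_noth):
    """Posterior odds p(H|E):p(~H|E) = prior odds x Bayes factor."""
    bayes_factor = lik_h / lik_noth          # p(E|H) / p(E|~H)
    return (prior_h / (1 - prior_h)) * bayes_factor

# Even prior odds; the data are four times more expected under H than under ~H
odds = posterior_odds(0.5, 0.8, 0.2)
assert abs(odds - 4.0) < 1e-9
```

Nothing in this computation refers to unobserved outcomes or to the sampling protocol, which is exactly what the LP demands.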
Frequentists reject the LP: their measures of evidence, such as p-values, are based on the probability of results that could have occurred but did not actually occur. The evidence depends on whether the actual data fit the null better or worse than most other possible data (see Figure 9.1). By contrast, Bayesian induction is “actualist”: the only thing that matters for evaluating the evidence and making decisions is the predictive performance of the competing hypotheses on the actually observed evidence. Factors that determine the probability of possible but unobserved outcomes, such as the experimental protocol, the intentions of the experimenter, the risk of early termination, and the like, may have a role in experimental design, but they do not matter for measuring evidence post hoc (Edwards, Lindman, and Savage 1963; Sprenger 2009).
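A standard illustration of this difference concerns stopping rules (the numbers below are chosen for convenience). The same data, 9 heads and 3 tails with the last toss a tail, yield different p-values under two sampling protocols, even though the likelihood function for the coin's bias is proportional in both cases:

```python
from math import comb
from fractions import Fraction

# H0: fair coin. Data: 9 heads and 3 tails in 12 tosses, the 12th toss a tail.

# Protocol A (binomial): toss exactly 12 times; "extreme" = 9 or more heads
p_binomial = sum(Fraction(comb(12, k), 2**12) for k in range(9, 13))

# Protocol B (negative binomial): toss until the 3rd tail;
# "extreme" = needing 12 or more tosses = at most 2 tails in the first 11
p_negbinomial = sum(Fraction(comb(11, k), 2**11) for k in range(0, 3))

print(float(p_binomial), float(p_negbinomial))        # ~0.073 vs ~0.033
assert p_binomial > Fraction(1, 20) > p_negbinomial   # only B is "significant"
```

For the Bayesian, both protocols deliver the same evidence, since the likelihood ratio for any two values of the bias is identical across them.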
The likelihood principle is often seen as a strong argument for preferring Bayesian to frequentist inference (e.g., Berger and Wolpert 1984). In practice, statistical data analysis still follows frequentist principles more often than not: mainly because, in many applied problems, it is difficult to elicit subjective degrees of belief and to model prior probability distributions.
(p. 206) 7 Conclusion
This chapter has given an overview of the problem of induction and the responses that philosophers of science have developed over time. These days, the focus is not so much on providing an answer to Hume’s challenge: it is widely acknowledged that no purely epistemic, noncircular justification of induction can be given. Instead, focus has shifted to characterizing valid inductive inferences, carefully balancing attractive theoretical principles with judgments and intuitions in concrete cases. That this is not always easy has been demonstrated by challenges such as the paradox of the ravens, the problem of irrelevant conjunctions, and Goodman’s new riddle of induction.
In the context of this project, degree of confirmation becomes especially important: it indicates to what extent an inductive inference is justified. Explications of confirmation can be distinguished into two groups: qualitative and quantitative ones. The first serve well to illustrate the “grammar” of the concept, but they have limited applicability.
We motivated why probability is an adequate tool for explicating degree of confirmation and investigated probabilistic (Bayesian) confirmation measures. We distinguished two senses of confirmation—confirmation as firmness and confirmation as increase in firmness—and compared various confirmation measures. That said, there are also alternative accounts of inductive reasoning, some of which are nonprobabilistic, such as objective Bayesianism (Williamson 2010), ranking functions (Spohn 1990), evidential probability (Kyburg 1961), and the Dempster-Shafer theory of evidence (Shafer 1976; see also Haenni, Romeijn, Wheeler, and Williamson 2011).
Finally, we provided a short glimpse of the methodological debate between Bayesians and frequentists in statistical inference. Confirmation theory will have to engage increasingly more often with debates in statistical methodology if it does not want to lose contact with inductive inference in science—which was Bacon’s target in the first place.
Acknowledgments
I would like to thank Matteo Colombo, Vincenzo Crupi, Raoul Gervais, Paul Humphreys, and Michael Schippers for their valuable feedback on this chapter. Research on this chapter was supported through the Vidi project “Making Scientific Inferences More Objective” (grant no. 27620023) by the Netherlands Organisation for Scientific Research (NWO) and ERC Starting Investigator Grant No. 640638.
Suggested Readings
For qualitative confirmation theory, the classical texts are Hempel (1945a, 1945b). For an overview of various logics of inductive inference with scientific applications, see Haenni et al. (2011). A classical introduction to Bayesian reasoning, with comparison to (p. 207) frequentism, is given by Howson and Urbach (2006). Earman (1992) and Crupi (2013) offer comprehensive reviews of Bayesian confirmation theory, and Good (2009) is an exciting collection of essays on induction, probability, and statistical inference.
References
Bacon, F. (1620). Novum Organum; Or, True Suggestions for the Interpretation of Nature (London: William Pickering).Find this resource:
Berger, J., and Wolpert, R. (1984). The Likelihood Principle (Hayward, CA: Institute of Mathematical Statistics).Find this resource:
Birnbaum, A. (1962). “On the Foundations of Statistical Inference.” Journal of the American Statistical Association 57(298): 269–306.Find this resource:
Broad, C. D. (1952). Ethics and the History of Philosophy (London: Routledge).Find this resource:
Brössel, P. (2013). “The Problem of Measure Sensitivity Redux.” Philosophy of Science 80(3): 378–397.Find this resource:
Carnap, R. (1950). Logical Foundations of Probability (Chicago: University of Chicago Press).Find this resource:
Carnap, R. (1952). Continuum of Inductive Methods (Chicago: University of Chicago Press).Find this resource:
Cohen, J. (1994). “The Earth Is Round (p<.05).” Psychological Review 49: 997–1001.Find this resource:
Crupi, V. (2013). “Confirmation.” In E. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Retrieved on December 28, 2015, at http://plato.stanford.edu/entries/confirmation/.
Crupi, V., Chater, N., and Tentori, K. (2013). “New Axioms for Probability and Likelihood Ratio Measures.” British Journal for the Philosophy of Science 64(1): 189–204.
Crupi, V., Tentori, K., and Gonzalez, M. (2007). “On Bayesian Measures of Evidential Support: Theoretical and Empirical Issues.” Philosophy of Science 74: 229–252.
Cumming, G. (2014). “The New Statistics: Why and How.” Psychological Science 25: 7–29.
Cumming, G., and Finch, S. (2005). “Inference by Eye: Confidence Intervals and How to Read Pictures of Data.” American Psychologist 60(2): 170–180.
De Finetti, B. (1937). “La Prévision: ses Lois Logiques, ses Sources Subjectives.” Annales de l’institut Henri Poincaré 7: 1–68.
De Finetti, B. (1974). Theory of Probability. Vol. 1 (New York: John Wiley & Sons).
de Laplace, P. S. (1814). A Philosophical Essay on Probabilities (Mineola, NY: Dover).
Descartes, R. (1637). Discours de la méthode (Leiden: Jan Maire).
Duhem, P. (1914). La Théorie Physique: Son Objet, Sa Structure (Paris: Vrin).
Earman, J. (1992). Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory (Cambridge, MA: MIT Press).
Edwards, A. (1972). Likelihood (Cambridge: Cambridge University Press).
Edwards, W., Lindman, H., and Savage, L. J. (1963). “Bayesian Statistical Inference for Psychological Research.” Psychological Review 70: 193–242.
Eells, E., and Fitelson, B. (2000). “Measuring Confirmation and Evidence.” Journal of Philosophy 97(12): 663–672.
Eells, E., and Fitelson, B. (2002). “Symmetries and Asymmetries in Evidential Support.” Philosophical Studies 107(2): 129–142.
Fisher, R. (1956). Statistical Methods and Scientific Inference (New York: Hafner).
Fisher, R. A. (1925). Statistical Methods for Research Workers (Edinburgh: Oliver & Boyd).
Fitelson, B. (1999). “The Plurality of Bayesian Measures of Confirmation and the Problem of Measure Sensitivity.” Philosophy of Science 66: S362–S378.
Fitelson, B. (2001). Studies in Bayesian Confirmation Theory. PhD thesis, University of Wisconsin–Madison.
Fitelson, B., and Hawthorne, J. (2011). “How Bayesian Confirmation Theory Handles the Paradox of the Ravens.” In J. H. Fetzer and E. Eells (eds.), The Place of Probability in Science (New York: Springer), 247–275.
Gaifman, H., and Snir, M. (1982). “Probabilities Over Rich Languages, Testing and Randomness.” Journal of Symbolic Logic 47(3): 495–548.
Gemes, K. (1993). “Hypothetico-Deductivism, Content and the Natural Axiomatisation of Theories.” Philosophy of Science 60: 477–487.
Gemes, K. (1998). “Hypothetico-Deductivism: The Current State of Play; the Criterion of Empirical Significance: Endgame.” Erkenntnis 49(1): 1–20.
Glymour, C. (1980a). “Hypothetico-Deductivism Is Hopeless.” Philosophy of Science 47(2): 322–325.
Glymour, C. (1980b). Theory and Evidence (Princeton, NJ: Princeton University Press).
Goldman, A. I. (1986). Epistemology and Cognition (Cambridge, MA: Harvard University Press).
Good, I. (2009). Good Thinking (Mineola, NY: Dover).
Good, I. J. (1967). “The White Shoe Is a Red Herring.” British Journal for the Philosophy of Science 17(4): 322.
Goodman, N. (1955). Fact, Fiction and Forecast (Cambridge, MA: Harvard University Press).
Hacking, I. (1965). Logic of Statistical Inference (Cambridge: Cambridge University Press).
Haenni, R., Romeijn, J. W., Wheeler, G., and Williamson, J. (2011). Probabilistic Logic and Probabilistic Networks (Berlin: Springer).
Harlow, L. L., Mulaik, S. A., and Steiger, J. H. (1997). What If There Were No Significance Tests? (Mahwah, NJ: Erlbaum).
Hawthorne, J., and Fitelson, B. (2004). “Re-solving Irrelevant Conjunction with Probabilistic Independence.” Philosophy of Science 71: 505–514.
Heckerman, D. (1988). “An Axiomatic Framework for Belief Updates.” In J. F. Lemmer and L. N. Kanal (eds.), Uncertainty in Artificial Intelligence 2 (Amsterdam: North-Holland), 11–22.
Hempel, C. G. (1945a). “Studies in the Logic of Confirmation [I].” Mind 54(213): 1–26.
Hempel, C. G. (1945b). “Studies in the Logic of Confirmation [II].” Mind 54(214): 97–121.
Howson, C. (1984). “Bayesianism and Support by Novel Facts.” British Journal for the Philosophy of Science 34: 245–251.
Howson, C., and Urbach, P. (2006). Scientific Reasoning: The Bayesian Approach, 3rd ed. (La Salle, IL: Open Court).
Hume, D. (1739). A Treatise of Human Nature (Oxford: Clarendon Press).
Hume, D. (1748). Enquiry Concerning Human Understanding (Oxford: Clarendon Press).
Jeffrey, R. C. (1965). The Logic of Decision, 2nd ed. (Chicago: University of Chicago Press).
Jeffrey, R. C. (1983). “Bayesianism with a Human Face.” In J. Earman (ed.), Testing Scientific Theories (Minneapolis: University of Minnesota Press), 133–156.
Kass, R. E., and Raftery, A. E. (1995). “Bayes Factors.” Journal of the American Statistical Association 90: 773–795.
Kemeny, J. G., and Oppenheim, P. (1952). “Degree of Factual Support.” Philosophy of Science 19: 307–324.
Kyburg, H. E. (1961). Probability and the Logic of Rational Belief (Middletown, CT: Wesleyan University Press).
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge (Chicago: University of Chicago Press).
Mayo, D. G., and Spanos, A. (2006). “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction.” British Journal for the Philosophy of Science 57: 323–357.
Milne, P. (1996). “log[P(h/eb)/P(h/b)] Is the One True Measure of Confirmation.” Philosophy of Science 63: 21–26.
Neyman, J., and Pearson, E. S. (1933). “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society A 231: 289–337.
Nicod, J. (1961). Le problème logique de l’induction (Paris: Presses Universitaires de France).
Norton, J. D. (2003). “A Material Theory of Induction.” Philosophy of Science 70(4): 647–670.
Popper, K. (1954). “Degree of Confirmation.” British Journal for the Philosophy of Science 5: 143–149.
Popper, K. R. (1959). The Logic of Scientific Discovery (London: Hutchinson).
Popper, K. R. (1983). Realism and the Aim of Science (Totowa, NJ: Rowman & Littlefield).
Ramsey, F. P. (1926). “Truth and Probability.” In D. H. Mellor (ed.), Philosophical Papers (Cambridge: Cambridge University Press), 52–94.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm (London: Chapman & Hall).
Schurz, G. (1991). “Relevant Deduction.” Erkenntnis 35: 391–437.
Shafer, G. (1976). A Mathematical Theory of Evidence (Princeton, NJ: Princeton University Press).
Sober, E. (2008). Evidence and Evolution: The Logic Behind the Science (Cambridge: Cambridge University Press).
Spielman, S. (1974). “The Logic of Tests of Significance.” Philosophy of Science 41(3): 211–226.
Spohn, W. (1990). “A General Non-Probabilistic Theory of Inductive Reasoning.” In R. D. Shachter, T. S. Levitt, J. Lemmer, and L. N. Kanal (eds.), Uncertainty in Artificial Intelligence 4 (Amsterdam: Elsevier), 149–158.
Sprenger, J. (2009). “Evidence and Experimental Design in Sequential Trials.” Philosophy of Science 76: 637–649.
Sprenger, J. (2010). “Hempel and the Paradoxes of Confirmation.” In D. M. Gabbay, S. Hartmann, and J. Woods (eds.), Handbook of the History of Logic, Vol. 10 (Amsterdam: North-Holland), 235–263.
Sprenger, J. (2011). “Hypothetico-Deductive Confirmation.” Philosophy Compass 6(7): 497–508.
Sprenger, J. (2013). “A Synthesis of Hempelian and Hypothetico-Deductive Confirmation.” Erkenntnis 78: 727–738.
Sprenger, J. (2015a). “A Novel Solution of the Problem of Old Evidence.” Philosophy of Science 82: 383–401.
Sprenger, J. (2015b). “Two Impossibility Results for Measures of Corroboration.” Forthcoming in the British Journal for the Philosophy of Science.
Tentori, K., Crupi, V., Bonini, N., and Osherson, D. (2007). “Comparison of Confirmation Measures.” Cognition 103: 107–119.
Vickers, J. (2010). “The Problem of Induction.” In E. N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. Fall 2010 edition. Retrieved on December 28, 2015, at http://plato.stanford.edu/entries/induction-problem/.
Whewell, W. (1847). Philosophy of the Inductive Sciences, Founded Upon Their History (London: Parker).
Williamson, J. (2010). In Defence of Objective Bayesianism (Oxford: Oxford University Press).