Causation in Science
Abstract and Keywords
This article discusses some philosophical theories of causation and their application to several areas of science. Topics addressed include regularity, counterfactual, and causal process theories of causation; the causal interpretation of structural equation models and directed graphs; independence assumptions in causal reasoning; and the role of causal concepts in physics. In connection with this last topic, this article focuses on the relationship between causal asymmetries, the time-reversal invariance of most fundamental physical laws, and the significance of differences among varieties of differential equations (e.g., hyperbolic versus nonhyperbolic) in causal interpretation. It concludes with some remarks about “grounding” special science causal generalizations in physics.
The subject of causation in science is vast, and any article-length treatment must necessarily be very selective. In what follows I have attempted, insofar as possible, to avoid producing yet another survey of the standard philosophical “theories” of causation and their vicissitudes. (I have nonetheless found some surveying inescapable—this is mainly in Section 3.) Instead, I have tried to discuss some aspects of this topic that tend not to make it into survey articles and to describe some new developments and directions for future research. My focus throughout is on epistemic and methodological issues as they arise in science, rather than on the “underlying metaphysics” of the causal relation.
The remainder of this article is organized as follows. After some orienting remarks (Section 2), Section 3 describes some alternative approaches to understanding causation. I then move on to a discussion of more specific ideas about causation and causal reasoning found in several areas of science, including causal modeling procedures (Sections 4 and 5), and causation in physics (Section 6).
There are controversies in philosophy and philosophy of science not only about which (if any) account of causation is correct but also about the role of causation (and, relatedly, causal explanation) in various areas of science. For example, an influential strain of thought maintains that causal notions play little or no legitimate role in physics (Section 6). There has also been a recent upsurge of interest in (what are taken to be) noncausal forms of explanation, not just in physics but also in sciences like biology. A common theme (or at least undercurrent) in this literature is that causation (and causal reasoning) is less central to much of science than many have supposed. I touch briefly on this issue later, but for purposes of this article I baldly assert that this general assessment is wrong-headed, at least for areas of science outside of physics. There are indeed noncausal forms of explanation, but causal reasoning plays a central role in many areas of science, including the social, behavioral, and biological sciences, as well as in portions of statistics, artificial intelligence, and machine learning. Philosophers of science should engage with this literature rather than ignore it or attempt to downplay its significance.
3 Theories of Causation
3.1 Regularity Theories
The guiding idea is that causal claims assert the existence of (or at least are “made true” by) a regularity linking cause and effect. Mackie’s (1974) INUS condition account is an influential example: C causes E if and only if C is a nonredundant part (where C is typically but not always by itself insufficient for E) of a sufficient (but typically not necessary) condition for E. The relevant notions of sufficiency, necessity, and nonredundancy are explicated in terms of regularities: short circuits S cause fires F because S is a nonredundant part or conjunct in a complex of conditions (which might also include the presence of oxygen O) that is sufficient for F in the sense that S.O is regularly followed by F. S is nonredundant in the sense that if one were to remove S from the conjunction S.O, F would not regularly follow, even though S is not strictly necessary for F, since F may be caused in some other way, for example through the occurrence of a lighted match L and O, which may also be jointly sufficient for F. In the version just described, Mackie’s account is an example of a reductive (sometimes called “Humean”) theory of causation in the sense that it purports to reduce causal claims to claims (involving regularities, understood just as patterns of co-occurrence) that apparently do not make use of causal or modal language. Many philosophers hold that reductive accounts of causation are desirable or perhaps even required, a viewpoint that many nonphilosophers do not share.
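The INUS conditions can be checked mechanically on a toy case. The following Python sketch is illustrative only: it assumes a made-up world in which fire occurs exactly when oxygen is present together with either a short circuit or a lighted match, and it reads “regularly followed by” as holding across every combination of the three factors. The function names are hypothetical and are not drawn from Mackie.

```python
from itertools import product

# Toy world (an assumption for illustration): fire occurs exactly when
# (short circuit and oxygen) or (lighted match and oxygen).
FACTORS = ("S", "O", "L")

def fire(state):
    return (state["S"] and state["O"]) or (state["L"] and state["O"])

ALL_STATES = [dict(zip(FACTORS, values))
              for values in product([False, True], repeat=len(FACTORS))]

def is_sufficient(conjunction):
    """True if every state in which all listed factors hold is a fire state."""
    return all(fire(s) for s in ALL_STATES if all(s[f] for f in conjunction))

def is_necessary(factor):
    """True if fire never occurs without the factor."""
    return all(s[factor] for s in ALL_STATES if fire(s))

def is_inus(factor, conjunction):
    """factor is an Insufficient but Nonredundant part of an Unnecessary
    but Sufficient conjunction of factors."""
    rest = [f for f in conjunction if f != factor]
    complex_is_necessary = all(
        all(s[f] for f in conjunction) for s in ALL_STATES if fire(s)
    )
    return (factor in conjunction
            and is_sufficient(conjunction)      # the whole complex suffices
            and not is_sufficient(rest)         # dropping the factor breaks sufficiency
            and not is_sufficient([factor])     # the factor alone is insufficient
            and not complex_is_necessary)       # fire can arise in another way

s_is_inus = is_inus("S", ["S", "O"])            # short circuit within S.O
l_is_inus_in_slo = is_inus("L", ["S", "O", "L"])  # L is redundant within S.O.L
```

In this toy world S qualifies as an INUS condition within S.O, whereas L does not qualify within S.O.L, because S.O is already sufficient without it.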
As described, Mackie’s account assumes that the regularities associated with causal claims are deterministic. It is possible to construct theories that are similar in spirit to Mackie’s but that assume that causes act probabilistically. Theories of this sort, commonly called probabilistic theories of causation (e.g., Eells 1991), are usually formulated in terms of the idea that C causes E if and only if C raises the probability of E in comparison with some alternative situation in which C is absent:
(3.1) $\Pr(E \mid C.K) > \Pr(E \mid \neg C.K)$ for some appropriate K.
(It is far from obvious how to characterize the appropriate K, particularly in nonreductive terms, but I put this consideration aside in what follows.) Provided that the notion of probability is itself understood nonmodally—that is, in terms of relative frequencies—(3.1) is a probabilistic version of a regularity theory. Here what (3.1) attempts to capture is the notion of a positive or promoting cause; the notion of a preventing cause might be captured by reversing the inequality in (3.1).
A general problem with regularity theories, both in their deterministic and probabilistic versions, is that they seem, prima facie, to fail to distinguish between causation and noncausal correlational relationships. For example, in a case in which C acts as a common cause of two joint effects X and Y, with no direct causal connection between X and Y, X may be an INUS condition for Y and conversely, even though, by hypothesis, neither causes the other. Parallel problems arise for probabilistic versions of regularity theories.
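The common-cause difficulty for probability-raising accounts can be made concrete with exact arithmetic. The sketch below is only an illustration under assumed numbers: C is a common cause of X and Y, the conditional probabilities are invented, and X and Y are stipulated to be independent once C is fixed. It computes the two sides of an inequality of the form (3.1) with X in the role of the putative cause and Y in the role of the effect, first with no background factors and then with C held fixed.

```python
from fractions import Fraction
from itertools import product

# Hypothetical parameters: C is a common cause of X and Y; X and Y have no
# direct causal connection and are independent once C is fixed.
P_C = Fraction(1, 2)
P_X_GIVEN_C = {True: Fraction(9, 10), False: Fraction(1, 10)}
P_Y_GIVEN_C = {True: Fraction(8, 10), False: Fraction(2, 10)}

def bern(p, value):
    """Probability that a binary variable with success chance p takes `value`."""
    return p if value else 1 - p

# Exact joint distribution over outcomes (c, x, y).
joint = {
    (c, x, y): bern(P_C, c) * bern(P_X_GIVEN_C[c], x) * bern(P_Y_GIVEN_C[c], y)
    for c, x, y in product([False, True], repeat=3)
}

def prob(event):
    return sum(p for outcome, p in joint.items() if event(*outcome))

def cond(event, given):
    return prob(lambda c, x, y: event(c, x, y) and given(c, x, y)) / prob(given)

# With no background factors, X "raises the probability" of Y ...
p_y_given_x = cond(lambda c, x, y: y, lambda c, x, y: x)
p_y_given_not_x = cond(lambda c, x, y: y, lambda c, x, y: not x)

# ... but the inequality disappears once the common cause C is held fixed.
p_y_given_x_c = cond(lambda c, x, y: y, lambda c, x, y: x and c)
p_y_given_not_x_c = cond(lambda c, x, y: y, lambda c, x, y: (not x) and c)
```

With these numbers the unconditional probabilities are 37/50 and 13/50, while conditional on C both are 4/5. Whether X counts as a cause of Y therefore turns entirely on what is included in K.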
These “counterexamples” point to an accompanying methodological issue: causal claims are (at least apparently) often underdetermined by evidence (at least evidence of a sort we can obtain) having to do just with correlations or regularities—there may be a number of different incompatible causal claims that are not only consistent with but even imply the same body of correlational evidence. Scientists in many disciplines recognize that such underdetermination exists and devise ways of addressing it—indeed, this is the primary methodological focus of many of the accounts of causal inference and learning in the nonphilosophical literature. Pure regularity or correlational theories of causation do not seem to address (or perhaps even to recognize) these underdetermination issues and in this respect fail to make contact with much of the methodology of causal inference and reasoning.
One possible response is that causal relationships are just regularities satisfying additional conditions—that is, regularities that are pervasive or “simple” in contrast to those that are not. Pursuing this line of thought, one comes naturally to the view that causal regularities either are or (at least) are “backed” or “instantiated” by laws where laws are understood as regularities meeting further conditions, as in the best systems analysis of law (Lewis 1999).
Quite apart from other difficulties, this proposal faces the following problem from a philosophy of science viewpoint: the procedures actually used in the various sciences to infer causal relationships from other sorts of information (including correlations) do not seem to have much connection with the strengthened version of the regularity theory just described. As illustrated later, rather than identifying causal relationships with some subspecies of regularity satisfying very general conditions concerning pervasiveness and so on, these inference techniques instead make use of much more specific assumptions linking causal claims to information about statistical and other kinds of independence relations, to experimentation, and to other sorts of constraints. Moreover, these assumptions connecting correlational information with causal conclusions are not formulated in terms of purely “Humean” constraints. Assuming (as I do in what follows) that one task of the philosopher of science is to elucidate and possibly suggest improvements in the forms that causal reasoning actually takes in the various sciences, regularity theories seem to neglect too many features of how such reasoning is actually conducted to be illuminating.
3.2 Counterfactual Theories
Another natural idea, incorporated into many theories of causation, both within and outside of philosophy, is that causal claims are connected to (and perhaps even reduce to) claims about counterfactual relationships. Within philosophy a very influential version of this approach is Lewis (1973). Lewis begins by formulating a notion of counterfactual dependence between individual events: event e counterfactually depends on event c if and only if (3.2) if c were to occur, e would occur; and (3.3) if c were not to occur, e would not occur. Lewis then claims that c causes e if and only if there is a causal chain from c to e: a finite sequence of events c, d, f, …, e such that d causally (that is, counterfactually) depends on c, f on d, and so on up to e. (Lewis claims that this appeal to causal chains allows him to deal with certain difficulties involving causal preemption that arise for simpler versions of a counterfactual theory.) The counterfactuals (3.2) and (3.3) are in turn understood in terms of Lewis’s account of possible worlds: roughly, “if c were the case, e would be the case” is true if and only if some possible world in which c and e are the case is “closer” or more similar to the actual world than any possible world in which c is the case and e is not. Closeness of worlds is understood in terms of a complex similarity metric in which, for example, two worlds that exhibit a perfect match of matters of fact over most of their history and then diverge because of a “small miracle” (a local violation of the laws of nature) are more similar than worlds that do not involve any such miracle but exhibit a less perfect match. Since Lewis’s account, like the INUS condition account, aspires to be a reductive theory, this similarity metric must not itself incorporate causal information, on pain of circularity.
Using this metric, Lewis argues that, for example, the joint effects of a common cause are not, in the relevant sense, counterfactually dependent on one another and that while effects can be counterfactually dependent on their causes, the converse is not true. Counterfactual dependence understood in this “non-backtracking” way thus tracks our intuitive judgments about causal dependence.
Lewis’s similarity metric functions to specify what should be changed and what should be “held fixed” in assessing the truth of counterfactuals. For example, when I claim that if (contrary to actual fact) I were to drop this wine glass, it would fall to the ground, we naturally consider a situation s (a “possible world”) in which I release the glass but in which much else remains just as it is in the actual world: gravity still operates; if there are no barriers between the glass and the ground in the actual world, this is also the case in s; and so on.
As is usual in philosophy, many purported counterexamples have been directed at Lewis’s theory. However, the core difficulty from a philosophy of science perspective is this: the various criteria that go into the similarity metric and the way in which these trade off with one another are far too vague and unclear to provide useful guidance for the assessment of counterfactuals in most scientific contexts. As a consequence, although one occasionally sees references in passing to Lewis’s theory in nonphilosophical discussions of causal inference problems (usually when the researcher is attempting to legitimate appeals to counterfactuals), the theory is rarely if ever actually used in problems of causal analysis and inference.
Awareness of this has encouraged some philosophers to conclude that counterfactuals play no interesting role in understanding causation or perhaps in science more generally. Caricaturing only slightly, the inference goes like this: counterfactuals can only be understood in terms of claims about similarity relations among Lewisian possible worlds, but these are too unclear, epistemically inaccessible, and metaphysically extravagant for scientific use. This inference should be resisted. Science is full of counterfactual claims, and there is a great deal of useful theorizing in statistics and other disciplines that explicitly understands causation in counterfactual terms but where the counterfactuals themselves are not explicated in terms of a Lewisian semantics. Roughly speaking, such scientific counterfactuals are instead represented by (or explicated in terms of) devices like equations and directed graphs, with explicit rules governing the allowable manipulations of contrary to fact antecedents and what follows from these. Unlike the Lewisian framework, these can be made precise and applicable to real scientific problems.
One approach, not without its problems but providing a convenient illustration of how counterfactuals are used in a statistical context, is the potential outcomes framework for understanding causal claims developed by Rubin (1974) and Holland (1986) and now widely employed in econometrics and elsewhere. In a simple version, causation is conceptualized in terms of the responses to possible treatments imposed on different “units” $u_i$. The causal effect of treatment $t$ with respect to an alternative treatment $t'$ for $u_i$ is defined as $Y_t(u_i) - Y_{t'}(u_i)$, where $Y_t(u_i)$ is the value Y would have assumed for $u_i$ if it had been assigned treatment $t$ and $Y_{t'}(u_i)$ is the value Y would have assumed had $u_i$ instead been assigned treatment $t'$. (The definition of causal effect is thus explicitly given in terms of counterfactuals.) When dealing with a population of such units, and thinking of $Y_t(u)$ and $Y_{t'}(u)$ as random variables ranging over the units, the average or expected effect of the treatment can then be defined as $E[Y_t(u) - Y_{t'}(u)]$. No semantics of a Lewisian sort is provided for these counterfactuals, but the intuitive idea is that we are to think of $Y_t(u)$ and so on as measuring the response of $u$ in a well-conducted experiment in which $u$ is assigned $t$. This in turn (in my view) can be naturally explicated by appeal to the notion of an intervention, discussed later.
The absence of a semantics or truth conditions (at least of a sort philosophers expect) for the counterfactuals employed may seem unsatisfying, but in fact the previous characterization is judged by many researchers to be methodologically useful in several ways. For example, the “definitions” given for the various notions of causal effect, even if not reductive, can be thought of as characterizing the target to which we are trying to infer when we engage in causal inference. They also draw attention to what Rubin (1974) and Holland (1986) describe as the “fundamental problem” of causal inference, which is that for any given unit one can observe (at most) either $Y_t(u_i)$ or $Y_{t'}(u_i)$ but not both. If, for example, $Y_t(u_i)$ but not $Y_{t'}(u_i)$ is observed, then it is obvious that for reliable causal inference one requires additional assumptions that allow one to make inferences about $Y_{t'}(u_i)$ from other sorts of information, for example from the responses $Y_{t'}(u_j)$ for other units $u_j \neq u_i$. One can use the potential outcomes framework to characterize more exactly the additional assumptions and information required for reliable inference to causal conclusions involving quantities like $Y_t(u_i) - Y_{t'}(u_i)$ or $E[Y_t(u) - Y_{t'}(u)]$. For example, a sufficient (but not necessary) condition for reliable (unbiased) estimation of $E[Y_t(u) - Y_{t'}(u)]$ is that the treatment variable be independent of the counterfactual responses $Y_t(u)$ and $Y_{t'}(u)$. Another feature built into the Rubin-Holland framework is that (as is obvious from the previous definition of individual-level causal effect) causal claims are always understood as having a comparative or contrastive structure (i.e., as claims that the cause variable C taking one value in comparison or contrast to C taking some different value causes the difference or contrast between the effect variable’s taking one value rather than another). A similar claim is endorsed by a number of philosophers.
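The role of the independence condition can be illustrated with a small worked example. The following Python sketch rests on invented numbers: it stipulates both potential outcomes for six hypothetical units (something no real investigator could observe), and then compares what a difference-in-means estimate recovers when treatment is assigned at random with what it recovers when assignment depends on the units’ untreated responses. The names are illustrative and are not taken from Rubin or Holland.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes for six units. In practice only one of the
# two values is ever observed for a given unit.
y_t = [5, 5, 7, 9, 8, 11]    # response of each unit under treatment t
y_tp = [3, 4, 5, 6, 7, 8]    # response of each unit under treatment t'
n = len(y_t)

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

# The target of inference: the average of the unit-level effects.
true_average_effect = mean(a - b for a, b in zip(y_t, y_tp))

def difference_in_means(treated):
    """Estimate built only from observable outcomes: the response under t for
    treated units and the response under t' for the remaining units."""
    treated = set(treated)
    control = [i for i in range(n) if i not in treated]
    return mean(y_t[i] for i in treated) - mean(y_tp[i] for i in control)

# Randomized assignment: average the estimate over every equally likely way
# of choosing three of the six units for treatment.
randomized_expectation = mean(
    difference_in_means(group) for group in combinations(range(n), 3)
)

# Confounded assignment: the three units with the highest untreated
# responses are the ones that receive the treatment.
confounded_estimate = difference_in_means([3, 4, 5])
```

Averaged over all randomized assignments the estimate equals the true average effect of 2, whereas the confounded assignment yields 16/3, because treatment is then correlated with the counterfactual responses.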
As Rubin and Holland argue, we can often clarify the content of causal claims by making this contrastive structure explicit.
Interventionist or manipulationist accounts of causation can be thought of as one particular version of a nonreductive counterfactual theory, in some respects similar in spirit to the Rubin-Holland theory. The basic idea is that C (a variable representing the putative cause) causes E (a variable representing the effect) if and only if there is some possible intervention on C such that if that intervention were to occur, the value of E would change. Causal relationships are in this sense relationships that are potentially exploitable for purposes of manipulation. Heuristically, one may think of an intervention as a sort of idealized experimental manipulation that is unconfounded for the purposes of determining whether C causes E. (“Unconfounded” means, roughly, that the intervention is not related to C or E in a way that suggests C causes E when they are merely correlated.) This notion can be given a more precise technical definition that makes no reference to notions like human agency (Woodward 2003). An attraction of this account is that it makes it transparent why experimentation can be a particularly good way of discovering causal relationships.
One obvious question raised by the interventionist framework concerns what it means for an intervention to be “possible.” There are many subtle issues here that I lack space to address, but the basic idea is that the intervention operation must be well defined in the sense that there are rules specifying which interventions the structure of the system of interest permits and how these are to be modeled. Which interventions are possible in this sense in turn affects the counterfactual and causal judgments we make. Consider a gas enclosed in a cylinder with a piston that can either be locked in position (so that the volume of the gas is fixed) or allowed to move in a vertical direction. A weight rests on the piston. Suppose first (i) that the piston is locked and the gas is placed in a heat bath of higher temperature and allowed to reach equilibrium. It seems natural to say that the external heat source causes the temperature of the gas, and the temperature and volume together cause the pressure. Correspondingly, if the temperature of the heat bath or the volume had been different, the pressure would have been different. Contrast this with a situation (ii) in which the heat source is still present, the weight is on the piston, the piston is no longer fixed, and the gas is allowed to expand until it reaches equilibrium. Now it seems natural to say that the weight influences the pressure, and the pressure and temperature cause the new volume. Correspondingly, if the weight had been different, the volume would have been different. Thus the way in which various changes come about, and what is regarded as fixed or directly controlled and what is free to vary, (legitimately) influence causal and counterfactual judgment. The interventionist invocation of “possible” interventions reflects the need to make the considerations just described explicit, rather than relying indiscriminately on the postulation of miracles.
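The contrast between cases (i) and (ii) can be put in a few lines of Python. This is a rough sketch under assumptions the text does not itself make: the gas is treated as ideal (PV = nRT), atmospheric pressure on the piston is ignored, and the numerical values are invented. The point is only that the same equation is solved for different variables depending on what the setup allows to be fixed from outside.

```python
R = 8.314  # molar gas constant, J/(mol K)

def pressure_locked_piston(temperature, volume, moles=1.0):
    """Case (i): the locked piston fixes the volume and the heat bath fixes
    the temperature, so pressure is the dependent quantity."""
    return moles * R * temperature / volume

def volume_free_piston(temperature, weight, piston_area, moles=1.0):
    """Case (ii): the weight fixes the pressure and the heat bath fixes the
    temperature, so volume is the dependent quantity.
    Simplifying assumption: atmospheric pressure is ignored."""
    pressure = weight / piston_area
    return moles * R * temperature / pressure

# Case (i): a hotter bath at fixed volume gives a higher pressure.
p_cool = pressure_locked_piston(temperature=300.0, volume=0.025)
p_hot = pressure_locked_piston(temperature=330.0, volume=0.025)

# Case (ii): doubling the weight at fixed temperature halves the volume.
v_light = volume_free_piston(temperature=300.0, weight=1000.0, piston_area=0.01)
v_heavy = volume_free_piston(temperature=300.0, weight=2000.0, piston_area=0.01)
```

In the first function volume appears as an input and pressure as the output; in the second the roles are reversed, mirroring the shift in which counterfactuals seem natural to assert.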
3.3 Causal Process Theories
The theories considered so far are all “difference-making” accounts of causation: they embody the idea that causes “make a difference” to their effects, although they explicate the notion of difference-making in different ways. In this sense, causal claims involve a comparison between two different possible states of the cause and effect, with the state of the effect depending on the state of the cause.
By contrast, causal process theories, at least on a natural interpretation, do not take difference-making to be central to causation. In the versions of this theory developed by Salmon (1984) and Dowe (2000), the key elements are causal processes and causal interactions. In Salmon’s version, a causal process is a physical process, such as the movement of a baseball through space, that transmits energy and momentum or some other conserved quantity in a spatiotemporally continuous way. Causal processes are “carriers” of causal influence. A causal interaction is the spatiotemporal intersection of two or more causal processes that involves exchange of a conserved quantity such as energy/momentum, as when two billiard balls collide. Causal processes have a limiting velocity of propagation, which is the velocity of light. Causal processes contrast with pseudo-processes such as the movement of the bright spot cast by a rotating search light inside a dome on the interior surface of the dome. If the rate of rotation and radius of the dome are large enough, the velocity of the spot can exceed that of light, but the pseudo-process is not a carrier of causal influence: the position of the spot at one point does not cause it to appear at another point. Rather, the causal processes involved in this example involve the propagation of light from the source, which of course respects the limiting velocity c.
Causal process theories are often described as “empirical” or “physical” theories of causation. According to proponents, they are not intended as conceptual analyses of the notion of causation; instead the idea is that they capture what causation involves in the actual world. It may be consistent with our concept of causation that instantaneous action at a distance is conceptually possible, but as a matter of empirical fact we do not find this in our world.
According to advocates, the notions of transmission and exchange/transfer of conserved quantities used to characterize causal processes and intersections can be elucidated without reference to counterfactuals or other difference-making notions: whether some process is a causal process or undergoes a causal interaction is a matter that can be read off just from the characteristics of the processes themselves and involves no comparisons with alternative situations. We find this idea in Ney (2009), who contrasts “physical causation,” which she takes to be captured by something like the Salmon/Dowe approach and which she thinks characterizes the causal relationships to be found in “fundamental physics” with difference-making treatments of causation, which she thinks are characteristic of the special sciences and folk thinking about causation.
One obvious limitation of causal process theories is that it is unclear how to apply them to contexts outside of (some parts of) physics in which there appear to be no analogues to conservation laws. (Consider “increases in the money supply cause inflation.”) One possible response, reflected in Ney (2009), is to embrace a kind of dualism about causation, holding that causal process theories are the right story about causation in physics, while some other, presumably difference-making approach is more appropriate in other domains. Of course this raises the question of the relationship, if any, between these different treatments of causation. Following several other writers (Earman 2014, Wilson forthcoming), I suggest in Section 6 that intuitions underlying causal process theories are best captured by means of the contrast between systems governed by hyperbolic and other sorts (elliptic, parabolic) of differential equations. Since all such equations describe difference-making relationships, causal processes should be understood as involving one particular kind of difference-making relationship.
3.4 Causal Principles
So far we have been considering theories that attempt to provide “elucidations” or “interpretations” or even “definitions” of causal concepts. However, a distinguishable feature of many treatments of causation is that they make assumptions about how the notion of causation connects with other notions of interest: assumptions embodying constraints or conditions of some generality about how causes behave (at least typically), which may or may not be accompanied by the claim that these are part of “our concept” of causation. I call these causal principles. Examples include constraints on the speed of propagation of causal influence (e.g., no superluminal signaling) and conditions connecting causal and probabilistic relationships, such as the Causal Markov condition described later. Some accounts of causation are organized around a commitment to one or more of these principles: a prohibition on superluminal propagation is assumed in causal process theories, and the Causal Markov condition is assumed in many versions of probabilistic theories. On the other hand, several different interpretive accounts may be consistent with (or even fit naturally with) the same causal principle, so that the same causal principles can “live” within different interpretive accounts. For example, one might adopt an interventionist characterization of causation and also hold that, in contexts in which talk of causal propagation makes sense, causes understood on interventionist lines will obey a prohibition on superluminal signaling. Similarly, one might argue that causes understood along interventionist lines will, under suitably circumscribed conditions, obey the Causal Markov condition (cf. Hausman and Woodward 1999). Philosophers who discuss causation often focus more on interpretive issues than on causal principles, but the latter are centrally important in scientific contexts.
4 Causal Modeling, Structural Equations, and Statistics
Inference (involving so-called causal modeling techniques) to causal relationships from statistical information is common to many areas of science. Techniques are used for this purpose throughout the social sciences and are increasingly common in contemporary biology, especially when dealing with large data sets, as when researchers attempt to infer to causal relationships in the brain from fMRI data or to genetic regulatory relationships from statistical data involving gene expression.
Contrary to what is commonly supposed, such techniques do not, in most cases, embody a conception of causation that is itself “probabilistic” and do not provide a straightforward reduction of causation to facts about probabilistic relationships. Instead, we gain insight into these techniques by noting that they employ two distinct sorts of representational structures. One has to do with information P about probabilistic or statistical relationships concerning some set of variables V. The other involves devices for representing causal relationships (call these C) among variables in V, most commonly by means of equations or directed graphs. The causal relationships C so represented are not defined in terms of the probabilistic relationships in P; indeed, these causal relationships are typically assumed to be deterministic. Instead, the problem of causal inference is conceptualized as a problem of inferring from P to C. Usually this requires additional assumptions A of various sorts that go beyond the information in P; examples include the Causal Markov and Faithfulness conditions discussed later. The role of these additional assumptions is one reason why the causal relationships in C should not be thought of as definable just in terms of the relationships in P. Depending on the details of the case, P in conjunction with A may allow for inference to a unique causal structure (the causal structure may be identifiable from P, given A) or (the more typical case) the inference may only yield an equivalence class of distinct causal structures.
As a simple illustration, consider a linear regression equation
(4.1) $Y = aX + U$
Equation (4.1) may be used merely to describe the presence of a correlation between X and Y, but it may also be used to represent the existence of a causal relationship: that a change in X of amount $dX$ causes a change in Y of amount $a\,dX$, in which case (4.1) is one of the vehicles for representing causal relationships C referred to earlier. (The convention is that the cause variable X is written on the right-hand side of the equation and the effect variable Y on the left-hand side.) U is a so-called error term, commonly taken to represent other unknown causes of Y besides X (or, more precisely, other causes of Y that do not cause X and that are not caused by X). U is assumed to be a random variable, governed by some probability distribution. When (4.1) correctly represents a causal relationship, individual values or realizations $u_i$ of U combine with realizations $x_i$ of X to yield values $y_i$ of Y in accord with (4.1). Note that the stochastic or probabilistic element in (4.1) derives from the fact that U is a random variable and not from the relationship between X and Y or between U and Y being itself chancy.
A simple inference problem arising in connection with (4.1) is to infer the value of a from observations of the values of X and Y as these appear in the form of an observed joint probability distribution Pr(X,Y), with Pr(X,Y) playing the role of P in the previous schema. A simple result is that if (4.2) the general functional form (4.1) is correct, and (4.3) the distribution of U satisfies certain conditions, the most important of which is that U is uncorrelated with X, then one may reliably estimate a from Pr(X,Y) via a procedure called ordinary least squares. Here (4.2) and (4.3) play the role of the additional assumptions A, which in conjunction with the information in Pr(X,Y) are used to reach a (causal) conclusion about the value of the coefficient a.
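To see why the condition on U matters, consider the following minimal Python sketch. It assumes that (4.1) has the linear form Y = aX + U with a = 3, and it uses a tiny hand-built data set (not real data) in which the error values are exactly uncorrelated with X in the sample. It then repeats the estimate after contaminating the errors with a component that depends on X. The helper `ols_slope` is a plain implementation of the least squares slope for one regressor, written out here rather than taken from a library.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope for a single regressor (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

TRUE_A = 3.0  # assumed value of the coefficient a in Y = aX + U

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
us = [1.0, -2.0, 2.0, -2.0, 1.0]   # mean zero and zero sample covariance with xs

# Assumption (4.3) holds: U is uncorrelated with X.
ys = [TRUE_A * x + u for x, u in zip(xs, us)]
slope_uncorrelated = ols_slope(xs, ys)

# Assumption (4.3) fails: the error now contains a part that varies with X,
# as it would if some omitted cause of Y were correlated with X.
us_bad = [u + 0.5 * x for x, u in zip(xs, us)]
ys_bad = [TRUE_A * x + u for x, u in zip(xs, us_bad)]
slope_correlated = ols_slope(xs, ys_bad)
```

In the first case the estimate recovers 3.0; in the second it is 3.5, although the same coefficient generated the data. The probabilistic information alone does not reveal which case one is in, which is why the assumption about U belongs to A rather than to P.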
The example just described involves bivariate regression. In multivariate linear regression a number of different variables $X_i$ causally relevant to some effect Y are represented on the right-hand side of an equation
$Y = a_1 X_1 + a_2 X_2 + \dots + a_n X_n + U$
This allows for the representation of causal relationships between each of the $X_i$ and Y but still does not allow us to represent causal relationships among the $X_i$ themselves. The latter may be accomplished by means of systems of equations, where the convention is that one represents that variables $X_j$ are direct causes of some other variable $X_k$ by writing an equation in which the $X_j$ occur on the right-hand side and $X_k$ on the left-hand side. Every variable that is caused by other variables in the system of interest is represented on the left-hand side of a distinct equation. For example, a structure in which exogenous $X_1$ (directly) causes $X_2$ and $X_1$ and $X_2$ (directly) cause Y might be represented as
(4.4)
$X_2 = aX_1 + U_1$
$Y = bX_1 + cX_2 + U_2$
Here $X_1$ affects Y via two different routes or paths, one of which is directly from $X_1$ to Y and the other of which is indirect and goes through $X_2$. As with (4.1), one problem is that of estimating values of the coefficients in each of the equations from information about the joint probability distribution $\Pr(Y, X_1, X_2)$ and other assumptions, including assumptions about the distribution of the errors $U_1$ and $U_2$. However, there is also the more challenging and interesting problem of causal structure learning. Suppose one is given information about $\Pr(Y, X_1, X_2)$ but does not know what the causal relationships are among these variables. The problem of structure learning is that of learning these causal relations from the associated probability distribution and other assumptions.
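The two routes can be separated by a small calculation. The sketch below assumes a linear system of the kind just described, with one equation making X2 depend on X1 and another making Y depend on both; the coefficient names a, b, c and their values are hypothetical. Setting X2 by hand, in place of its own equation, stands in for an idealized intervention that holds X2 fixed.

```python
# Hypothetical coefficients for a linear system of the assumed form
#   X2 = a*X1 + U1
#   Y  = b*X1 + c*X2 + U2
a, b, c = 2, 1, 3

def solve(x1, u1=0, u2=0, set_x2=None):
    """Solve the system for given exogenous values.

    If set_x2 is supplied, the equation for X2 is replaced by X2 = set_x2,
    which breaks the dependence of X2 on X1."""
    x2 = a * x1 + u1 if set_x2 is None else set_x2
    y = b * x1 + c * x2 + u2
    return {"X1": x1, "X2": x2, "Y": y}

# Change in Y for a unit change in X1 when X2 is free to respond.
total_effect = solve(x1=1)["Y"] - solve(x1=0)["Y"]

# Change in Y for a unit change in X1 when X2 is held fixed at 0.
direct_effect = solve(x1=1, set_x2=0)["Y"] - solve(x1=0, set_x2=0)["Y"]

# What is transmitted along the route through X2.
indirect_effect = total_effect - direct_effect
```

With these values the total effect is 7, of which 1 is carried by the direct route (the coefficient b) and 6 by the route through X2 (the product a times c).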
To describe some contemporary approaches to this problem, it will be useful to introduce an alternative device for the representation of causal relationships: directed graphs. The basic convention is that an arrow from one variable to another (X→Y) represents that X is a direct cause (also called a parent [par]) of Y. (It is assumed that this implies that Y is some nontrivial function of X and perhaps other variables, nontrivial in that there are at least two different values of X that are mapped into different values of Y.) For example, the system (4.4) can be represented by the directed graph in Figure 8.1.
One reason for employing graphical representations is that it is sometimes reasonable to assume that there are systematic relationships between the graph and dependence/independence relationships in an associated probability distribution over the variables corresponding to the vertices of the graph. Suppose we have a directed graph G with a probability distribution P over the vertices V in G. Then G and P satisfy the Causal Markov condition (CM) if and only if
(CM) For every subset W of the variables in V, W is independent of every other subset in V that does not contain the parents of W or descendants (effects) of W, conditional on the parents of W.
CM is a generalization of the familiar “screening off” or conditional independence relationships that a number of philosophers (and statisticians) have taken to characterize the relationship between causation and probability. CM implies, for example, that if two joint effects have a single common cause, then conditionalizing on this common cause renders those effects conditionally independent of each other. It also implies that if X does not cause Y and Y does not cause X and X and Y do not share a common cause, then X and Y are unconditionally independent—sometimes called the principle of the common cause.
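The screening-off implication can be checked exactly on a small example. The sketch below builds a joint distribution for a common cause C with two binary effects A and B, then compares unconditional and conditional dependence. The variable names and all probabilities are assumptions chosen for illustration.

```python
from itertools import product

# Assumed illustrative probabilities for a common cause C of effects A and B.
p_c = {1: 0.5, 0: 0.5}
p_a_given_c = {1: 0.9, 0: 0.1}   # Pr(A=1 | C=c)
p_b_given_c = {1: 0.9, 0: 0.1}   # Pr(B=1 | C=c)

def bern(p_one, value):
    """Probability that a binary variable takes `value` when Pr(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

# Joint distribution built from the graph A <- C -> B.
joint = {
    (c, a, b): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for c, a, b in product((0, 1), repeat=3)
}

def prob(**fixed):
    """Probability of the event that fixes some of the variables c, a, b."""
    total = 0.0
    for (c, a, b), p in joint.items():
        values = {"c": c, "a": a, "b": b}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total

# Unconditional dependence: Pr(A=1, B=1) - Pr(A=1) Pr(B=1)
marginal_gap = prob(a=1, b=1) - prob(a=1) * prob(b=1)

# Conditional on C=1: Pr(A=1, B=1 | C=1) - Pr(A=1 | C=1) Pr(B=1 | C=1)
pc1 = prob(c=1)
conditional_gap = (prob(a=1, b=1, c=1) / pc1
                   - (prob(a=1, c=1) / pc1) * (prob(b=1, c=1) / pc1))

print(marginal_gap, conditional_gap)
```

The two effects are correlated unconditionally (the first gap is 0.16 up to rounding), but the gap vanishes once one conditions on the common cause.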
(F) Graph G and associated probability distribution P satisfy the faithfulness condition if and only if every conditional independence relation true in P is entailed by the application of CM to G.
(F) says that independence relationships in P arise only because of the structure of the associated graph G (as these are entailed by CM) and not for other reasons. This rules out, for example, a causal structure in which X affects Z by two different paths or routes, one of which is direct and the other of which is indirect, going through a third variable Y, but such that the effects along the two routes just happen to exactly cancel each other, so that X is independent of Z.
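A small simulation may make the cancellation case concrete. The sketch below assumes a linear system in which X affects Z directly and also through an intermediate variable (labeled Y here), with the direct coefficient chosen to cancel the indirect route exactly. The coefficients, Gaussian noise, and helper names are illustrative assumptions.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def corr(xs, ys):
    """Sample correlation coefficient."""
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

rng = random.Random(1)
n = 20000
a, b = 2.0, 1.5   # assumed strengths of X -> Y and Y -> Z
c = -a * b        # direct X -> Z effect chosen so the two routes cancel

xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [a * x + rng.gauss(0, 1) for x in xs]
zs = [c * x + b * y + rng.gauss(0, 1) for x, y in zip(xs, ys)]
corr_xz = corr(xs, zs)   # close to zero although X is a cause of Z

# Intervening to hold Y fixed at k breaks the indirect route,
# and the direct dependence of Z on X reappears.
k = 0.0
zs_do = [c * x + b * k + rng.gauss(0, 1) for x in xs]
slope_do = cov(xs, zs_do) / cov(xs, xs)   # close to c = -3

print(round(corr_xz, 3), round(slope_do, 2))
```

An algorithm that assumes faithfulness would read the near-zero correlation as absence of a causal connection between X and Z, which is the mistake the condition is meant to exclude.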
Although I do not discuss details here, these two assumptions can be combined to create algorithms that allow one to infer facts about causal structure (as represented by G) from the associated probability distribution P. In some cases, the assumptions will allow for the identification of a unique graph consistent with P; in other cases, they will allow the identification of an equivalence class of graphs that may share some important structural features. These conditions—particularly F—can be weakened in various ways so that they fit a wider variety of circumstances, and one can then explore which inferences they justify.
Both CM and F are examples of what I called causal principles. Faithfulness is clearly a contingent empirical assumption that will be true of some systems and not others. For some sorts of systems, violations of faithfulness may be very common. Examples include systems involving feedback or regulatory mechanisms that are designed so that, when a variable deviates from some value via one route or mechanism, some other part of the system acts to restore the original value.
The status of CM raises deeper issues. There are a variety of conditions under which systems, deterministic or otherwise, may fail to satisfy CM, but these exceptions can be delimited and arguably are well understood.5 For systems that are deterministic with additive independent errors (i.e., systems for which the governing equations take the form $X_i = f_i(\mathrm{par}(X_i)) + U_i$, with the Ui independent of each other and each Ui independent of par(Xi)), CM follows as a theorem (cf. Pearl 2000). On the other hand, our basis for adopting these independence assumptions may seem in many cases to rely on assuming part of CM: we seem willing to assume the errors are independent to the extent that the Ui do not cause one another or par(Xi) and do not have common causes (a version of the common cause principle mentioned earlier). Nonetheless, CM is often a natural assumption, which is adopted in contexts outside of social science including physics, as we note later. In effect, it amounts to the idea that correlations among causally independent (“incoming” or exogenous) variables do not arise “spontaneously” or at least that such correlations are rare and that we should try to avoid positing them insofar as this is consistent with what is observed.
5 Stability, Independence, and Causal Interpretation
So far we have not directly addressed the question of what must be the case for a set of equations or a directed graph to describe a causal relationship. My own answer is interventionist: an equation like (4.1) correctly describes a causal relationship if and only if for some range of values of X, interventions that change the value of X result in changes in Y in accord with (4.1). Causal relationships involving systems of equations or more complicated graphical structures can also be understood in interventionist terms. The basic idea is that when a system like (4.4) is fully causally correct, an intervention on any one of the variables Xj that sets it to some value k may be represented by replacing the equation in which Xj occurs as a dependent variable with the equation $X_j = k$, while the other equations in the system (some of which will have Xj as an independent variable) remain undisturbed, continuing to describe how the system will respond to setting $X_j = k$. Each of the equations thus continues to hold according to its usual interventionist interpretation under changes in others; the system is in this sense modular.
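The equation-replacement idea can be sketched in a few lines of Python. The example assumes a two-equation linear system of the kind discussed in Section 4, in which X1 causes X2 and both cause Y. The coefficient values, the function name `solve`, and the `do_x2` parameter are hypothetical choices made for illustration.

```python
def solve(x1, u1=0.0, u2=0.0, do_x2=None, a=0.5, b=1.0, c=2.0):
    """Solve the assumed system X2 = a*X1 + U1, Y = b*X1 + c*X2 + U2.

    If do_x2 is given, the equation for X2 is replaced by X2 = do_x2,
    while the equation for Y is left exactly as it was (modularity).
    """
    x2 = a * x1 + u1 if do_x2 is None else do_x2
    y = b * x1 + c * x2 + u2
    return {"X1": x1, "X2": x2, "Y": y}

observed = solve(x1=1.0)                 # X2 follows its own equation
intervened = solve(x1=1.0, do_x2=3.0)    # X2 is set from outside the system

print(observed)
print(intervened)
```

Under the intervention, X1 keeps its value and Y responds to the new value of X2 through the unchanged equation for Y. That is the sense in which the remaining equations continue to describe how the system responds.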
Putting aside the question of the merits of this specifically interventionist interpretation, it is useful to step back and ask in a more general way what is being assumed about causal relationships when one employs structures like (4.1) and (4.4). A natural thought is that causal interpretation requires assumptions of some kind (however these may be fleshed out in detail) about the stability or independence of certain relationships under changes in other kinds of facts. At a minimum, the use of (4.1) to represent a causal relationship rests on the assumption that, at least for some range of circumstances, we can plug different values of X into (4.1) and it will continue to hold for (be stable across or independent of) these different values in the sense that we can use (4.1) to determine what the resulting value of Y will be. Put differently, we are assuming that there is some way of fixing or setting the value of X that is sufficiently independent of the relationship (4.1) that setting X in this way does not upset whether (4.1) holds. The modularity condition referred to earlier similarly embodies stability or independence conditions holding across equations.
It seems plausible that we rely on such stability assumptions whenever, for example, we interpret (4.1) in terms of a relation of non-backtracking counterfactual dependence of Y on X: what one needs for such an interpretation is some specification of what will remain unchanged (the relationship [4.1], the values of variables that are not effects of X) under changes in X. As suggested earlier, in the contexts under discussion such information is provided by the equational or graphical model employed (and the rules for interpreting and manipulating such models, like those just described), rather than by a Lewis-style similarity metric over possible worlds. Similarly, if one wants to interpret equations like (4.1) in terms of an INUS-style regularity account, one also needs to somehow fashion representational resources and rules for manipulating representations that allow one to capture the distinction between those features of the represented system that remain stable under various sorts of changes and those that do not.
These remarks suggest a more general theme, which is that in many areas of science claims about causal relationships are closely connected to (or are naturally expressed in terms of) independence claims, including claims about statistical independence, as in the case of CM, and claims about the independence of functional relationships under various changes, the latter being a sort of independence that is distinct from statistical independence. A closely related point is that independence claims (and hence causal notions) are also often represented in science in terms of factorizability claims, since factorizability claims can be used to express the idea that certain relationships will be stable under (or independent of) changes in others; the “causally correct” factorization is one that captures these stability facts. As an illustration, CM is equivalent to the following factorization condition:
(5.1) $\Pr(X_1, \dots, X_n) = \prod_{j=1}^{n} \Pr(X_j \mid \mathrm{par}(X_j))$
There are many different ways of factoring a joint probability distribution, but one can think of each of these as representing a particular claim about causal structure: in (5.1) each term $\Pr(X_j \mid \mathrm{par}(X_j))$ can be thought of as representing a distinct or independent causal relationship linking Xj and its parents. This interpretation will be appropriate to the extent that, at least within a certain range, each $\Pr(X_j \mid \mathrm{par}(X_j))$ will continue to hold over changes in the frequency of $\mathrm{par}(X_j)$, and each of the $\Pr(X_j \mid \mathrm{par}(X_j))$ can be changed independently of the others. (The causally correct factorization is the one satisfying this condition.) Physics contains similar examples in which factorization conditions are used to represent causal independence. One example is the condition (clustering decomposition) that the S-matrix giving scattering amplitudes in quantum field theory factorizes as the product of terms representing far separated (and hence causally independent) subprocesses (cf. Duncan 2012, 58).
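The stability reading of the causally correct factorization can be illustrated with two binary variables. The sketch assumes that X causes Y, holds the conditional distribution of Y given X fixed, and changes only the distribution of X. It then inspects the terms of the reverse factorization, Pr(Y) and Pr(X | Y). All probabilities and function names are illustrative assumptions.

```python
def joint(p_x1, p_y1_given_x):
    """Joint Pr(X, Y) for binary X, Y built from the factors Pr(X) and Pr(Y | X)."""
    table = {}
    for x in (0, 1):
        px = p_x1 if x == 1 else 1.0 - p_x1
        for y in (0, 1):
            py = p_y1_given_x[x] if y == 1 else 1.0 - p_y1_given_x[x]
            table[(x, y)] = px * py
    return table

def reverse_factors(table):
    """Return Pr(Y=1) and Pr(X=1 | Y=1), terms of the factorization Pr(Y) Pr(X | Y)."""
    p_y1 = table[(0, 1)] + table[(1, 1)]
    return p_y1, table[(1, 1)] / p_y1

mechanism = {0: 0.2, 1: 0.9}   # assumed Pr(Y=1 | X=x), held fixed throughout

before = reverse_factors(joint(0.5, mechanism))   # Pr(X=1) = 0.5
after = reverse_factors(joint(0.8, mechanism))    # Pr(X=1) changed to 0.8

print(before, after)
```

Changing Pr(X) leaves Pr(Y | X) untouched by construction, whereas both terms of the reverse factorization shift. On the view described above, that asymmetry is what marks Pr(X) Pr(Y | X) as the causally correct factorization in this toy case.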
A further illustration of the general theme of the relationship between causal and independence assumptions is provided by some recent work on the direction of causation in machine learning (Janzing et al. 2012). Suppose (i) Y can be written as a function of X plus an additive error term that is independent of X: $Y = f(X) + U$ with X _|_ U (where _|_ means probabilistic independence). Then if the distribution is non-Gaussian, there is no such additive error model from Y to X, that is, no model in which (ii) X can be written as $X = g(Y) + V$ with Y _|_ V. A natural suggestion is that to determine the correct causal direction, one should proceed by seeing which of (i) or (ii) holds, with the correct causal direction being one in which the cause is independent of the error. One can think of this as a natural expression of the ideas about the independence of “incoming” influences embodied in CM. Expressed slightly differently, if X→Y is the correct model, but not of additive error form, it would require a very contrived, special (and hence in some sense unlikely) relationship between P(X) and P(Y|X) to yield an additive error model of form Y→X. Again, note the close connection between assumptions about statistical independence, the avoidance of “contrived” coincidences, and causal assumptions (in this case about causal direction).
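A toy version of this direction test can be sketched as follows. It is not the procedure of the cited work: it assumes the special case of a linear relationship with uniform (hence non-Gaussian) cause and error, and it replaces a proper independence test with a crude proxy, the correlation between the squared regressor and the squared residual. The function names and distributions are illustrative assumptions.

```python
import random

def mean(values):
    return sum(values) / len(values)

def corr(us, vs):
    """Sample correlation coefficient."""
    mu, mv = mean(us), mean(vs)
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var_u = sum((u - mu) ** 2 for u in us)
    var_v = sum((v - mv) ** 2 for v in vs)
    return cov / (var_u * var_v) ** 0.5

def residuals(cause, effect):
    """Residuals of the least-squares line predicting `effect` from `cause`."""
    mc, me = mean(cause), mean(effect)
    slope = (sum((c - mc) * (e - me) for c, e in zip(cause, effect))
             / sum((c - mc) ** 2 for c in cause))
    return [(e - me) - slope * (c - mc) for c, e in zip(cause, effect)]

def dependence_score(cause, effect):
    """Crude dependence proxy between regressor and residual: the absolute
    correlation of their squares (near zero in large samples if independent)."""
    mc = mean(cause)
    res = residuals(cause, effect)
    return abs(corr([(c - mc) ** 2 for c in cause], [r * r for r in res]))

rng = random.Random(2)
n = 20000
xs = [rng.uniform(-1, 1) for _ in range(n)]
us = [rng.uniform(-1, 1) for _ in range(n)]   # non-Gaussian, independent of X
ys = [x + u for x, u in zip(xs, us)]          # assumed true model: X -> Y

score_x_to_y = dependence_score(xs, ys)   # residual looks independent of X
score_y_to_x = dependence_score(ys, xs)   # residual visibly depends on Y

print(round(score_x_to_y, 3), round(score_y_to_x, 3))
```

In this setup the residual is uncorrelated with the regressor in both directions, since least squares guarantees that. Only the higher-order check separates the two directions, which is why non-Gaussianity matters in the linear case.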
In the case just described, we have exploited the fact that at least three causal factors are present in the system of interest: Y, X, and U. What about cases in which there are just two variables, X and Y, which are deterministically related, with each being a function of the other? Janzing et al. (2012) also show that even in this case (and even if the function from X to Y is invertible) it is sometimes possible to determine causal direction by means of a further generalization of the connection between causal direction and facts about independence, with the relevant kind of independence now being a kind of informational independence rather than statistical independence. Very roughly, if the causal direction runs as X→Y, then we should expect that the function f describing this relationship will be informationally independent of the description one gives of the (marginal) distribution of X. Here “independent” means that knowing this distribution will provide no information about the functional relationship between X and Y and vice versa. Remarkably, applications of this idea in both simulations and real-world examples in which the causal direction is independently known show that the method yields correct conclusions in many cases. If one supposes that the informational independence of Pr(X) and X→Y functions as a sort of proxy for the invariance or stability of X→Y under changes in the distribution of X, this method makes normative sense, given the ideas about causal interpretation described previously.
6 Causation in Physics
Issues surrounding the role of causation/causal reasoning in physics have recently been the subject of considerable philosophical discussion. There is a spectrum (or perhaps, more accurately, a multidimensional space) of different possible positions on this issue. Some take the view that features of fundamental physical laws or the contexts in which these laws are applied imply that causal notions play little or no legitimate role in physics. Others take the even stronger position that because causal notions are fundamentally unclear in general, they are simply a source of confusion when we attempt to apply them to physics contexts (and presumably elsewhere as well). A more moderate position is that while causal notions are sometimes legitimate in physics, they are unnecessary in the sense that whatever scientifically respectable content they have can be expressed without reference to causality. Still others (e.g., Frisch 2014) defend the legitimacy and even centrality of causal notions in the interpretation of physical theories. Those advocating this last position observe that even a casual look at the physics literature turns up plenty of references to “causality” and “causality conditions”. Examples include a micro-causality condition in quantum field theory that says that operators at spacelike separation commute and which is commonly motivated by the claim that events at such separation do not interact causally, and the clustering decomposition assumption referred to in Section 5, which is also often motivated as a causality condition. Other examples are provided by the common preference for “retarded” over “advanced” solutions to the equations of classical electromagnetism and the use of retarded rather than advanced Green’s functions in modeling dispersion relations.
These preferences are often motivated by the claim that the advanced solutions represent “noncausal” behavior in which effects temporally precede their causes (violation of another causality condition) and hence should be discarded. Similarly, there is a hierarchy of causality conditions often imposed in models of general relativity, with, for example, solutions involving closed causal or timelike curves being rejected by some on the grounds they violate “causality.” Causal skeptics respond, however, that these conditions are either unmotivated (e.g., vague or unreasonably aprioristic) or superfluous in the sense that what is defensible in them can be restated without reference to any notion of causation.
A related issue concerns whether causal claims occurring in the special sciences (insofar as they are true or legitimate) require “grounding” in fundamental physical laws, with these laws providing the “truth-makers” for such claims. A simple version of such a view might claim that C causes E if and only if there is a fundamental physical law L linking Cs to Es, with C and E “instantiating” L. Although there is arguably no logical inconsistency between this grounding claim, the contention that causation plays no role in physics and the claim that causal notions are sometimes legitimately employed in the special sciences, there is at least a tension—someone holding all these claims needs to explain how they can be true together. Doing this in a plausible way is nontrivial and represents another issue that is worthy of additional philosophical attention.
6.1 Causal Asymmetry and Time-Reversal Invariance
Turning now to some arguments for little or no role for causation in physics, I begin with the implications of time-reversal invariance. With the exception of laws governing the weak force, fundamental physical laws in classical mechanics, electromagnetism, special and general relativity, and quantum field theory are time-reversal invariant: if a physical process is consistent with such laws, the “time-reverse” of this process is also permitted by these laws. For example, according to classical electromagnetism, an accelerating charge will be associated with electromagnetic radiation radiating outward from the charge. These laws also permit the time-reversed process according to which a spherically symmetric wave of electromagnetic radiation converges on a single charge which then accelerates, a process that appears to be rare, absent some special contrivances. A widespread philosophical view is that this time-symmetric feature of the fundamental laws is in some way inconsistent with the asymmetry of causal relationships or at least that the latter lacks any ground or basis, given the former. Assuming that an asymmetry between cause and effect is central to the whole notion of causation, this is in turn taken to show that there is no basis in fundamental physics for the application of causal notions.
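Time-reversal invariance can be illustrated numerically with a much simpler system than the electromagnetic one discussed here. The sketch assumes a classical harmonic oscillator integrated with the velocity-Verlet scheme, which shares the time symmetry of the underlying equation of motion. The system, the step size, and the function name `verlet` are illustrative assumptions.

```python
def verlet(x, v, steps, dt=0.01, k=1.0, m=1.0):
    """Velocity-Verlet integration of a harmonic oscillator, m x'' = -k x."""
    acc = -k * x / m
    for _ in range(steps):
        v_half = v + 0.5 * dt * acc
        x = x + dt * v_half
        acc = -k * x / m
        v = v_half + 0.5 * dt * acc
    return x, v

x0, v0 = 1.0, 0.0

# Evolve forward under the law of motion.
x1, v1 = verlet(x0, v0, steps=1000)

# Time reversal: keep the position, flip the velocity, apply the SAME law.
x2, v2 = verlet(x1, -v1, steps=1000)

print(x1, v1)
print(x2, v2)
```

The reversed run retraces the original motion and returns to the starting position with the velocity reversed, up to floating-point rounding. Both runs are permitted by the same law, so any asymmetry between them has to come from the initial conditions one supplies.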
This inference is more problematic than usually recognized. As emphasized by Earman (2011), the time-reversal invariant character of most of the fundamental equations of physics is consistent with particular solutions to those equations exhibiting various asymmetries—indeed, studies of many such equations show that “most” of their solutions are asymmetric (Earman 2011). An obvious possibility, then, is that to the extent that a causal interpretation of a solution to a differential equation is appropriate at all, it is asymmetries in these solutions that provide a basis for the application of an asymmetric notion of causation. In other words, causal asymmetries are to be found in particular solutions to the fundamental equations that arise when we impose particular initial and boundary conditions rather than in the equations themselves.
To expand on this idea, consider why the convergent wave in the previous example rarely occurs. The obvious explanation is that such a convergent process would require a very precise co-ordination of the various factors that combine to produce a coherent incoming wave (cf. Earman 2011). On this understanding, the origin of this asymmetry is similar to the origin of the more familiar thermodynamic asymmetries: it would require a combination of circumstances that we think are unlikely to occur “spontaneously,” just as the positions and momenta of the molecules making up a gas that has diffused to fill a container are unlikely to be arranged in such a way that the gas spontaneously occupies only the right half of the container at some later time. Of course “unlikely” does not mean “impossible” in the sense of being contrary to laws of nature in either case, and in fact in the electromagnetic case one can readily imagine some contrivance (for example, a precisely arranged system of mirrors and lenses) that produces such an incoming wave.
The important point for our purposes is that the asymmetries in the frequencies of occurrence of outgoing, diverging waves in comparison with incoming, converging waves are not, so to speak, to be found in Maxwell’s equations themselves but rather arise when we impose initial and boundary conditions on those equations to arrive at solutions describing the behavior of particular sorts of systems. These initial and boundary conditions are justified empirically by the nature of the systems involved: if we are dealing with a system that involves a single accelerating point charge and no other fields are present, this leads to a solution to the equations in which outgoing radiation is present; a system in which a coherent wave collapses on a point charge is modeled by a different choice of initial and boundary conditions, which leads to different behavior of the electromagnetic field. If we ask why the first situation occurs more often than the second, the answer appears to relate to the broadly statistical considerations described earlier. Thus the time-reversal invariant character of the fundamental equations is consistent with particular solutions to those equations exhibiting various asymmetries, and although there may be other reasons for not interpreting these solutions causally, the time-reversal invariance of the fundamental equations by itself is no barrier to doing so. I suggest that what is true in this particular case is true more generally.
We may contrast this view with the implicit picture many philosophers seem to have about the relationship between causation and laws. According to this picture, the fundamental laws, taken by themselves, have rich causal content and directly describe causal relationships: the “logical form” of a fundamental law is something like (6.1).6
It is indeed a puzzle to see how to reconcile this picture with the time-reversal invariant character of physical laws. Part of the solution to this puzzle is to recognize that the laws by themselves, taken just as differential equations, often do not make causal claims in the manner of (6.1); again, causal description, when appropriate at all, derives from the way in which the equations and the choice of particular initial and boundary conditions interact when we find particular solutions to those equations.
If this general line of thought is correct, several additional issues arise. On the one hand, it might seem that the choice of initial and boundary conditions to model a particular system is a purely empirical question (either one is faced with a coherent incoming wave or not) and distinctively causal considerations need not be thought of as playing any role in the choice of model. One might argue on this basis that once one chooses an empirically warranted description of the system, one may then place a causal gloss on the result and say the incoming wave causes the acceleration but that the use of causal language here is unnecessary and does no independent work. On the other hand, it is hard not to be struck by the similarity between the improbability of a precisely coordinated incoming wave arising spontaneously and principles like CM described in Section 4. When distinct segments of the wave front of an outgoing wave are correlated, this strikes us as unmysterious because this can be traced to a common cause: the accelerating charge. By contrast, the sort of precise coordination of incoming radiation that is associated with a wave converging on a source strikes us as unlikely (again in accord with CM) in the absence of direct causal relations among the factors responsible for the wave or some common cause, like a system of mirrors. On this view, causal considerations of the sort associated with CM play a role in justifying one choice of initial and boundary conditions over another or at least in making sense of why as an empirical matter we find certain sets of these conditions occurring more frequently than others.7
6.2 Kinds of Differential Equations
As noted previously, known fundamental physical laws are typically stated as differential equations. Differences among such equations matter when we come to interpret them causally. From the point of view of causal representation, one of the most important is the distinction between hyperbolic and other sorts (parabolic, elliptical) of partial differential equations (PDEs). Consider a second-order nonhomogeneous partial differential equation in two independent variables
$$A f_{xx} + B f_{xy} + C f_{yy} + D f_x + E f_y + F f = G,$$
where the subscripts denote partial differentiation: $f_{xx} = \partial^2 f / \partial x^2$, and so on. A hyperbolic differential equation of this form is characterized by $B^2 - 4AC > 0$, while parabolic (respectively, elliptical) equations are characterized by $B^2 - 4AC = 0$ (respectively, $B^2 - 4AC < 0$). Hyperbolic PDEs are the natural way of representing the propagation of a causal process in time. For example, the wave equation in one spatial dimension
$$\frac{\partial^2 f}{\partial t^2} = c^2 \frac{\partial^2 f}{\partial x^2}$$
is a paradigmatic hyperbolic PDE that describes the propagation of a wave through a medium.
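The classification can be stated compactly in code. The sketch below assumes the standard convention in which a second-order equation with leading terms A f_xx + B f_xy + C f_yy is classified by the sign of B² − 4AC. The function name `classify_pde` and the three sample equations are illustrative choices.

```python
def classify_pde(A, B, C):
    """Classify A f_xx + B f_xy + C f_yy + (lower-order terms) = G
    by the sign of the discriminant B^2 - 4AC."""
    discriminant = B * B - 4.0 * A * C
    if discriminant > 0:
        return "hyperbolic"
    if discriminant == 0:
        return "parabolic"
    return "elliptical"

c = 2.0  # assumed wave speed, for illustration only

# Wave equation, written as c^2 f_xx - f_tt = 0 (t plays the role of y).
wave = classify_pde(A=c * c, B=0.0, C=-1.0)

# Heat (diffusion) equation, f_xx - f_t = 0: no second derivative in t.
heat = classify_pde(A=1.0, B=0.0, C=0.0)

# Laplace equation, f_xx + f_yy = 0.
laplace = classify_pde(A=1.0, B=0.0, C=1.0)

print(wave, heat, laplace)
```

Only the second-order coefficients enter the classification. The lower-order terms and the right-hand side affect the solutions but not the type of the equation.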
The solution domains for hyperbolic PDEs have “characteristic surfaces” or cone-like structures that characterize an upper limit on how fast disturbances or signals can propagate—in the case of the equations of classical electromagnetism these correspond to the familiar light-cone structure. In contrast, elliptical and parabolic PDEs have solution domains in which there is no such limit on the speed of propagation of disturbances. A related difference is that hyperbolic equations are associated with specific domains of dependence and influence: there is a specific region in the solution domain on which the solution at point P depends, in the sense that what happens outside of that region does not make a difference to what happens at P—thus in electromagnetism, what happens at a point depends only on what happens in the backwards light cone of that point. By contrast, elliptical PDEs such as the Laplace equation do not have specific domains of dependence. Instead, the domain of dependence for every point is the entire solution domain. Given this feature and the absence of limiting velocity of disturbance propagation there is, intuitively, no well-defined notion of causal propagation for systems governed by such equations.
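The notion of a domain of dependence can be made concrete for the one-dimensional wave equation, whose solution is given by d'Alembert's formula. The sketch below evaluates that formula at a single point and checks how the result responds to changes in the initial data inside and outside the interval [x − ct, x + ct]. The initial profiles, the triangular perturbation, and the function names are illustrative assumptions.

```python
import math

def dalembert(g, h, x, t, c=1.0, n=2000):
    """d'Alembert solution of f_tt = c^2 f_xx with f(x, 0) = g(x) and
    f_t(x, 0) = h(x). The integral of h over [x - ct, x + ct] is
    approximated with the midpoint rule."""
    lo, hi = x - c * t, x + c * t
    dx = (hi - lo) / n
    integral = sum(h(lo + (i + 0.5) * dx) for i in range(n)) * dx
    return 0.5 * (g(lo) + g(hi)) + integral / (2.0 * c)

def bump(center, half_width):
    """Triangular bump of height 1 supported on center +/- half_width."""
    return lambda s: max(0.0, 1.0 - abs(s - center) / half_width)

def g(s):
    return math.exp(-s * s)   # assumed initial displacement

def h(s):
    return 0.0                # assumed initial velocity

x, t = 0.0, 1.0               # with c = 1 the domain of dependence is [-1, 1]
base = dalembert(g, h, x, t)

far = bump(3.0, 0.5)          # supported on [2.5, 3.5], outside [-1, 1]
near = bump(0.2, 0.5)         # supported on [-0.3, 0.7], inside [-1, 1]

# Perturb both initial displacement and velocity far away: no effect at (x, t).
outside = dalembert(lambda s: g(s) + far(s), lambda s: h(s) + far(s), x, t)

# Perturb the initial velocity inside the interval: the solution changes.
inside = dalembert(g, lambda s: h(s) + near(s), x, t)

print(base, outside, inside)
```

The far perturbation leaves the value at the chosen point exactly unchanged, while the near perturbation shifts it by about 0.25, half the area of the bump. No comparable restriction holds for the Laplace equation, where changing the boundary data anywhere generally changes the solution everywhere.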
Both Earman (2014) and Wilson (forthcoming) suggest that the appropriate way to characterize a notion of causal propagation (and related to this, a Salmon/Dowe–like notion of causal process) is in terms of systems whose behavior is governed by hyperbolic differential equations and that admit of a well-posed initial value formulation. This allows one to avoid having to make use of unclear notions such as “intersection,” “possession” of a conserved quantity, and so on. A pseudo-process can then be characterized just by the fact that its behavior is not described by a relevant hyperbolic PDE.
If this is correct, several other conclusions follow. First, recall the supposed distinction between difference-making and the more “scientific” or “physical” notion of causation associated with the Salmon/Dowe account. The hyperbolic PDEs used to characterize the notion of a causal process clearly describe difference-making relationships: one may think of them as characterizing how, for example, variations or differences in initial conditions will lead to different outcomes. So the difference-making/causal process contrast seems ill founded; causal processes involve just one particular kind of difference-making relation, one that allows for a notion of causal propagation.
Second, and relatedly, there are many systems (both those treated in physics and in other sciences) whose behavior (at least in the current state of scientific understanding) is not characterized by hyperbolic PDEs but rather by other sorts of equations, differential and otherwise. Unless one is prepared to argue that only hyperbolic PDEs and none of these other structures can represent causal relationships—a claim that seems difficult to defend—the appropriate conclusion seems to be that some representations of some situations are interpretable in terms of the notion of a causal process but that other representations are not so interpretable.8 In this sense the notion of a causal process (or at least representations in terms of causal processes) seems less general than the notion of a causal relationship. Even in physics, many situations are naturally described in causal terms, even though the governing equations are not of the hyperbolic sort.
I conclude with a final issue that deserves more philosophical attention and that has to do with the relationships among the mathematical structures discussed in this article. On a sufficiently generous conception of “based,” it does not seem controversial that the causal generalizations found in the special sciences (or, for that matter, nonfundamental physics) are in some way or other “based” or “grounded” in fundamental physics. However, this observation (and similar claims about “supervenience,” etc.) does not take us very far in understanding the forms that such basing relations might take or how to conceive of the relations between physics and the special sciences. To mention just one problem, many special science generalizations describe the equilibrium behavior of systems. The mathematics used to describe them may involve some variant of the structural equations of Section 4 (which are most naturally interpretable as describing relationships holding at some equilibrium; see Mooij et al. 2013) or else nonhyperbolic differential equations. These generalizations do not describe the dynamical or evolutionary processes that lead to equilibrium. By contrast, most fundamental physical laws, at least as formulated at present, take the form of hyperbolic differential equations that describe the dynamical evolution of systems over time. Understanding how these very different forms of causal representation fit together is highly nontrivial, and for this reason, along with others, the correct story about the “grounding” of special science causal generalizations in physics is (when it is possible to produce it at all) likely to be subtle and complicated, far from the simple “instantiation” story gestured at earlier. This is just one aspect of the much bigger question of how causal representations and relationships in science at different levels and scales relate to one another.
Acknowledgements
I would like to thank Bob Batterman, John Earman, Mathias Frisch, Paul Humphreys, Wayne Myrvold, John Norton, and Mark Wilson for helpful comments and discussion.
References
Davidson, D. (1967). “Causal Relations.” Journal of Philosophy 64: 691–703.
Dowe, P. (2000). Physical Causation (Cambridge, UK: Cambridge University Press).
Duncan, A. (2012). The Conceptual Framework of Quantum Field Theory (Oxford: Oxford University Press).
Earman, J. (2011). “Sharpening the Electromagnetic Arrow(s) of Time.” In C. Callender (ed.), Oxford Handbook of the Philosophy of Time (Oxford: Oxford University Press), 485–527.
Earman, J. (2014). “No Superluminal Propagation for Classical Relativistic and Relativistic Quantum Fields.” Studies in History and Philosophy of Modern Physics 48: 102–108.
Eells, E. (1991). Probabilistic Causality (Cambridge, UK: Cambridge University Press).
Frisch, M. (2014). Causal Reasoning in Physics (Cambridge, UK: Cambridge University Press).
Hausman, D., Stern, R., and Weinberger, W. (forthcoming). “Systems Without a Graphical Representation.” Synthese.
Hausman, D., and Woodward, J. (1999). “Independence, Invariance and the Causal Markov Condition.” The British Journal for the Philosophy of Science 50: 521–583.
Heckman, J. (2005). “The Scientific Model of Causality.” Sociological Methodology 35: 1–97.
Holland, P. (1986). “Statistics and Causal Inference.” Journal of the American Statistical Association 81: 945–960.
Janzing, D., Mooij, J., Zhang, K., Lemeire, J., Zscheischler, J., Daniusis, D., Steudel, B., and Schölkopf, B. (2012). “Information-Geometric Approach to Inferring Causal Directions.” Artificial Intelligence 182–183: 1–31.
Lewis, D. (1973). “‘Causation’ Reprinted with Postscripts.” In D. Lewis, Philosophical Papers, vol. 2 (Oxford: Oxford University Press), 32–66.
Lewis, D. (1999). Papers in Metaphysics and Epistemology (Cambridge, UK: Cambridge University Press).
Mackie, J. (1974). The Cement of the Universe (Oxford: Oxford University Press).
Mooij, J., Janzing, D., and Schölkopf, B. (2013). “From Ordinary Differential Equations to Structural Causal Models: The Deterministic Case.” In A. Nicholson and P. Smyth (eds.), Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (Corvallis: AUAI Press), 440–448.
Ney, A. (2009). “Physical Causation and Difference-Making.” The British Journal for the Philosophy of Science 60(4): 737–764.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference (Cambridge, UK: Cambridge University Press).
Rubin, D. (1974). “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66: 688–701.
Salmon, W. (1984). Scientific Explanation and the Causal Structure of the World (Princeton, NJ: Princeton University Press).
Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction and Search (Cambridge, MA: MIT Press).
Wilson, M. (forthcoming). Physics Avoidance (Oxford: Oxford University Press).
Woodward, J. (2003). Making Things Happen: A Theory of Causal Explanation (New York: Oxford University Press).
(1) In what follows I do not sharply distinguish between reasoning involving causation and causal explanation. Think of a causal explanation as just an assembly of information about the causes of some explanandum.
(2) I use uppercase letters to describe repeatable types of events (or whatever one thinks the relata of causal relationships are) and lowercase letters to describe individual instances or tokens of such relata.
(4) An example of this sort is also described in Hausman, Stern, and Weinberger (forthcoming). However, I would put the emphasis a bit differently from these authors: it is not that the system lacks a graphical representation but rather that the causal relationships in the system (and the appropriate graphical representation) differ depending on what is held fixed, what is allowed to vary, and so on.