PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

# Non-Market Valuation: Stated Preference Methods

## Abstract and Keywords

The purpose of this article is to give a detailed description of the steps involved in designing a choice experiment and analyzing the responses. It also discusses a number of behavioral aspects of stated preference surveys, with an emphasis on hypothetical bias, and briefly presents the underlying economic model that is used to analyze discrete choices. The main idea of a choice experiment is often to estimate the welfare effects of changes in attributes. The article discusses three important parts of the design of a stated preference survey: (1) definition of attributes and attribute levels, (2) experimental design, and (3) survey context, behavioral aspects, and validity tests. It then discusses the incentive properties of different choice formats, the empirical evidence on hypothetical bias, and methods for reducing hypothetical bias. Finally, it notes the importance of social context where the decision maker is not a single individual.

# 1 Introduction

Stated preference methods assess the value of goods and characteristics of goods by using individuals' stated behavior in a hypothetical setting. The method includes a number of different approaches such as conjoint analysis, contingent valuation, and choice experiments. There are two broad areas within food economics where stated preference methods are, and should be, used. The first relates to the public good aspects of the production and consumption of food. The second pertains to the demand and willingness to pay (WTP) for particular alternatives of food, or particular characteristics of food products. The main reasons for using a stated preference method instead of revealed preference methods such as actual market data are: (1) there are public good aspects (e.g., people care about other people's consumption), (2) the difficulty of disentangling preferences for different characteristics of goods using market data, and (3) not all levels of the characteristics exist in the market today. Stated preference methods have been used extensively in the area of food economics. Examples of study areas are preferences for genetically modified food products (e.g., Loureiro and Hine 2002; Lusk, Roosen, and Fox 2003; Lusk, Jamal, et al. 2005; Carlsson, Frykblom, and Lagerkvist 2007a); animal welfare (e.g., Carlsson, Frykblom, and Lagerkvist 2007b; Liljenstolpe 2008); and food safety (e.g., Hamilton, Sunding, and Zilberman 2003; Canavari, Novella, and Scarpa 2005).

The most established stated preference method is the contingent valuation method (CVM). In CVM studies, respondents are asked whether or not they would be willing to pay a certain amount of money for realizing a change in the level of a good, where most often the good is a public or quasi-public good (see, e.g., Mitchell and Carson 1989; Bateman and Willis 1999). However, in this chapter I shall focus the discussion on the choice experiment method, or stated choice method. In a choice experiment, individuals are given a hypothetical setting and asked to choose their preferred alternative among several alternatives in a choice set, and are usually asked to perform a sequence of such choices. Each alternative is described by a number of attributes or characteristics, and the levels of the attributes vary between choice sets. The reasons for focusing on the choice experiment method here are that it is a generalization of CVM, and that it is the stated preference method that is most extensively used in food economics. Most studies look at the influence of various characteristics of the food product on consumer behavior, and in that case choice experiments are more suitable. The major exception where CVM is used, and perhaps is more suitable, is in the case of bans on certain characteristics, for example genetically modified food products and labeling (see, e.g., Hamilton, Sunding, and Zilberman 2003).

The purpose of this chapter is to give a detailed description of the steps involved in designing a choice experiment and analyzing the responses. However, the econometric analysis is covered in detail by Adamowicz and Swait (Chapter 5 in this volume). I shall also discuss a number of behavioral aspects of stated preference surveys, with an emphasis on hypothetical bias.

# 2 An Economic Model of Behavior and Estimation Issues

## 2.1 The Economic Model

In this section I briefly present the underlying economic model that is used to analyze discrete choices. A more detailed discussion can be found in Adamowicz and Swait (Chapter 5 in this volume). Although most stated preference studies only deal with the discrete choice between several alternatives, the underlying economic model can actually deal with both the decision about which good to choose and how much to consume of the chosen good. Hanemann (1984) calls this a discrete/continuous choice. For example, a consumer decides first which type of meat to buy, and then how many kilograms to buy. However, the model I present here deals only with the discrete choice. Suppose that an individual k is faced with a choice between N mutually exclusive alternatives. Each alternative is described by a vector of attributes ai, the price of each alternative is pi, and the exogenous income is Y. We assume that the individual wishes to maximize utility by choosing one of the available alternatives. Given a number of restrictions (see Adamowicz and Swait, Chapter 5 in this volume) the maximization problem can be expressed as

(1)

$$\max_{i \in \{1,\ldots,N\}} V_k\!\left(a_i,\; Y - p_i\right)$$
where Vk is the indirect utility function. Thus, the individual chooses alternative i if and only if (2)
$$V_k\!\left(a_i,\; Y - p_i\right) > V_k\!\left(a_j,\; Y - p_j\right) \quad \text{for all } j \neq i$$

In order to make the model operational, there are two additional things that have to be done. The first is to make an assumption about the functional form of the utility function. The second is to allow for unobservable (for the researcher) effects that could be due to unobserved characteristics of the individual, attributes that are not included, measurement error, and/or heterogeneity of preferences (Hanemann and Kanninen 1999). In order to allow for these effects, the Random Utility approach (McFadden 1974) is used to link the deterministic model with a statistical model of human behavior. We simply introduce an additive error term, so the conditional utility function is written as (3)

$$U_{ik} = V_k\!\left(a_i,\; Y - p_i\right) + \varepsilon_{ik}$$

Alternative i is chosen if and only if (4)

$$V_k\!\left(a_i,\; Y - p_i\right) + \varepsilon_{ik} > V_k\!\left(a_j,\; Y - p_j\right) + \varepsilon_{jk} \quad \text{for all } j \neq i$$

Since the utility functions have a random element, we can rewrite the above condition in probability terms: (5)

$$P_i = \Pr\!\left[\,V_k\!\left(a_i,\; Y - p_i\right) + \varepsilon_{ik} > V_k\!\left(a_j,\; Y - p_j\right) + \varepsilon_{jk} \;\; \text{for all } j \neq i\,\right]$$
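Under the common assumption that the error terms are i.i.d. type-I extreme value (Gumbel), the choice probability in (5) has the familiar closed-form conditional logit expression, Pi = exp(Vi) / Σj exp(Vj). The sketch below (with purely illustrative utility values) checks the closed form against a brute-force simulation of the random utility model:

```python
import numpy as np

def logit_probs(v):
    """Closed-form choice probabilities when the errors in (5) are
    i.i.d. type-I extreme value: P_i = exp(V_i) / sum_j exp(V_j)."""
    e = np.exp(v - v.max())           # subtract max for numerical stability
    return e / e.sum()

def simulated_probs(v, draws=200_000, seed=1):
    """Approximate (5) directly: draw Gumbel errors and count how often
    each alternative has the highest utility V_i + eps_i."""
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(size=(draws, len(v)))
    choices = np.argmax(v + eps, axis=1)
    return np.bincount(choices, minlength=len(v)) / draws

v = np.array([1.0, 0.5, 0.0])         # deterministic utilities (illustrative)
print(logit_probs(v))                 # analytical probabilities
print(simulated_probs(v))             # should be close to the analytical ones
```

The simulated choice frequencies converge to the analytical logit probabilities as the number of draws grows, which is exactly the content of (5) under the extreme value assumption.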

Different specifications of the error terms are discussed thoroughly in Adamowicz and Swait (Chapter 5 in this volume). What we shall discuss in more detail here is a number of estimation issues.

## 2.2 Estimation Issues

The first issue is the functional form of the utility function. The most common assumption is a utility function that is linear in the parameters. This is not as restrictive as it seems since a linear in parameters function can allow for non-linear effects of attributes and interaction terms with observable socioeconomic characteristics. One important property of discrete choice models is that only the differences in utility between alternatives affect the choice probabilities, not the absolute levels of utility. This means that not all parameters of the utility function can be estimated. For example, if there is no difference in the levels of an attribute between the alternatives, then the parameter for that attribute cannot be estimated. In other words, there must be a difference between the alternatives in order to estimate the parameter. Note that in a choice experiment the levels of an attribute could be equal in one or several of the choice sets. This would not mean that we could not estimate the parameter; i.e., there need not be a difference in all of the choice sets.

The property that there must be a difference between the alternatives also has implications for the possibility of including alternative-specific constants. An alternative-specific constant would capture the average effect on utility of attributes or factors that are not included. However, since only differences in utility matter, only differences in alternative-specific constants matter. A standard way of accounting for this is to normalize one of the constants to zero. In that case, the other constants would be interpreted as relative to the normalized constant.

The fact that only utility difference matters also has implications for how socioeconomic characteristics can enter the model. Socioeconomic characteristics are supposed to capture taste variation. One way of including them is to normalize the parameter of one of the alternatives to zero, just as in the case of the constant. The parameters that are estimated should then be interpreted as relative to the normalized parameter. Another interpretation of this approach is to see the socioeconomic characteristics as interacting with the alternative-specific constants. Finally, these characteristics could be made to interact with the attributes of the alternatives.

Note that it is not necessary to include alternative-specific constants. In particular, if the choice experiment has a generic design, it would actually not be reasonable to include constants.1 If we include constants in that case and they are significant, that would indicate that there is something else affecting respondents' choices that we have not been able to capture. With an alternative-specific design, constants should in general be included.

That only the difference in utility matters also has some important implications for how income should enter the utility function. The most common assumption is that utility is a linear function of income, so that the utility of alternative i is (6)

$$V_{ik} = \beta' a_i + \lambda\left(Y - p_i\right)$$
where ai is the vector of attributes of alternative i, β is the corresponding parameter vector, and λ is the marginal utility of income. Since only the differences in utility affect the choice probabilities, income would not be included as an explanatory variable. The marginal utility of income, λ, is still estimable since each alternative implies a certain cost. If we want to include income, there are two alternatives. The first is to specify another functional form of the utility function, for example a quadratic income term. The second is to include the income variable by interacting the variable with another characteristic or alternative-specific constant. The reason for choosing the simple functional form where income enters linearly has mainly to do with calculation of welfare measures, which we will come back to.
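To see explicitly why income drops out, take the utility difference between two alternatives under the linear specification in (6):

```latex
V_{ik} - V_{jk} = \beta' a_i + \lambda (Y - p_i) - \beta' a_j - \lambda (Y - p_j)
                = \beta' (a_i - a_j) - \lambda (p_i - p_j)
```

Income Y cancels, but λ remains attached to the price difference, which is why the marginal utility of income is identified even though income itself cannot enter as a separate explanatory variable.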

In the literature, it has become increasingly common also to allow for unobserved heterogeneity through Mixed Logit and Latent-Class models (see, e.g., Train 2003; Adamowicz and Swait, Chapter 5 in this volume). In a Mixed Logit model, the researcher has to decide which parameters should be random and which fixed, and the distribution of the random parameters. The choice of distribution is not a straightforward task. In principle, any distribution could be used, but in previous applications the most common ones have been the normal and the lognormal distribution. Other distributions that have been applied are the uniform, triangular, and Rayleigh distributions. However, before we make the choice about the distribution we must also determine which parameters should be randomly distributed and which parameters should be fixed. This choice can depend on several factors. For example, in many choice experiments it has been common to assume that the cost parameter is fixed. One reason for this is that the distribution of the marginal willingness to pay (MWTP) is then the distribution of the attribute coefficient. The other alternative would, of course, be to assume that only the parameter of the cost attribute is randomly distributed. This has been called a random marginal utility model (see von Haefen 2003). An alternative is to use a test procedure suggested by McFadden and Train (2000). With this test, artificial variables are constructed from a standard logit estimation (7)

$$z_{it} = \left(x_{it} - \sum_{j} P_{jt}\, x_{jt}\right)^{2}$$
where Pjt is the conditional logit probability for alternative j in choice situation t. The logit model is then re-estimated with these artificial variables, and the test of whether a coefficient should be fixed or not is based on the significance of the coefficient of the artificial variable (see McFadden and Train 2000 for details).
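As a sketch of how these artificial variables can be constructed (illustrative numbers only; some presentations scale the variables by 1/2, which only rescales the test coefficient):

```python
import numpy as np

def artificial_variables(x, p):
    """McFadden-Train artificial variables for one choice situation.
    x[j, k] is attribute k of alternative j, p[j] the fitted conditional
    logit probability of alternative j.  z[j, k] = (x[j, k] - xbar[k])**2,
    with xbar the probability-weighted attribute mean."""
    xbar = p @ x                      # probability-weighted mean of each attribute
    return (x - xbar) ** 2

# illustrative numbers: 3 alternatives, 2 attributes
x = np.array([[1.0, 10.0],
              [0.0, 20.0],
              [1.0, 30.0]])
p = np.array([0.5, 0.3, 0.2])         # fitted logit probabilities, sum to 1
z = artificial_variables(x, p)
# z is appended to the attributes and the logit model re-estimated; a
# significant coefficient on z[:, k] suggests attribute k should be random.
print(z)
```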

Suppose now that we have determined the set of coefficients that should be randomly distributed. The next step is to specify a particular distribution for each of the random parameters. One might consider several aspects. For example, we might want to impose certain restrictions. The most natural one might be that all respondents should have the same sign of the coefficients. Of the previously discussed distributions, it is only the lognormal distribution that has this property. For example, if we assume that the cost coefficient is lognormally distributed, we ensure that all individuals have a nonpositive price coefficient.2 In this case, the lognormal coefficients have the following form: (8)

$$\beta_{ik} = \pm \exp\!\left(b_k + v_{ik}\right)$$
where the sign of coefficient βk is determined by the researcher according to expectations, bk is constant and the same for all individuals, and vik is normally distributed across individuals with mean 0 and variance σk². This means that the coefficient has the following properties: (a) median = exp(bk); (b) mean = exp(bk + σk²/2); and (c) standard deviation = exp(bk + σk²/2)(exp(σk²) − 1)^0.5. While the lognormal distribution seems like a reasonable assumption, there may be some problems with applying this distribution. First, this is the distribution that causes the most problems with convergence in model estimation. One reason for this is most likely that it puts a restriction on the preferences in terms of all respondents having the same sign of the coefficient. Another problem with the lognormal distribution is that the estimated welfare measures could be extremely high since values of the cost coefficient close to zero are possible (see, e.g., Revelt and Train 1998). Therefore, the most common distribution assumption has been the normal distribution, i.e., that the coefficient for the kth attribute is given by βk ~ N[bk, wk]. However, there is an increasing interest in other distributions, in particular, distributions where marginal WTP can be constrained to be nonnegative (Hensher and Greene 2003). One such distribution is a triangular distribution where the standard deviation parameter is constrained to be equal to the mean of the parameter.
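The moment formulas for the lognormal coefficient are easy to verify by simulation; the sketch below uses arbitrary illustrative values of bk and σk:

```python
import numpy as np

# Check the lognormal moment formulas by simulation.  With v ~ N(0, sigma_k**2),
# the coefficient beta = exp(b_k + v) should have median exp(b_k),
# mean exp(b_k + sigma_k**2 / 2), and standard deviation
# exp(b_k + sigma_k**2 / 2) * (exp(sigma_k**2) - 1) ** 0.5.
rng = np.random.default_rng(0)
b_k, sigma_k = -1.0, 0.8              # illustrative values
beta = np.exp(b_k + rng.normal(0.0, sigma_k, size=1_000_000))

print(np.median(beta), np.exp(b_k))                        # medians agree
print(beta.mean(), np.exp(b_k + sigma_k**2 / 2))           # means agree
print(beta.std(), np.exp(b_k + sigma_k**2 / 2) * (np.exp(sigma_k**2) - 1) ** 0.5)
```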

There has recently been discussion in the literature over whether the random parameter model implies restrictive assumptions about the scale parameter, i.e., the standard deviation of the error term of the utility function (Train and Weeks 2005; Louviere 2006; Scarpa, Thiene, and Train 2008). For example, it has been almost standard to assume that the cost coefficient is fixed since this facilitates estimation and since the distribution of the MWTP is the distribution of the corresponding attribute. This implies an assumption that the scale parameter is the same for all respondents, which is questionable. As discussed by Train and Weeks (2005), there are, however, two problems with allowing the price coefficient to be randomly distributed. First, the distribution of the marginal WTP of the attribute could be intractable; for example, there could be a normal distribution of the attribute coefficient and a lognormal distribution of the cost coefficient. Second, uncorrelated utility coefficients could translate into marginal WTPs that are correlated. Therefore, Train and Weeks (2005) suggest a modeling strategy where the model is parameterized in terms of WTP instead of utility.3 This means that assumptions about the distribution of WTP are made. However, whether this specification of the model results in different WTPs than the standard model formulation is an empirical question (Scarpa 2008).

# 3 Welfare Measures

The main purpose of a choice experiment is often to estimate the welfare effects of changes in the attributes. In order to obtain these, researchers have generally assumed a simple functional form for the utility function by imposing a constant marginal utility of income, as described above. Again remember that we focus on purely discrete choices; this means that in some cases the welfare measures have to be interpreted with care. For example, in the case of a food product choice experiment, the welfare measures are per package or per kilogram, depending on what has been defined in the survey. Two types of welfare measure are commonly reported: (1) marginal WTP and (2) total WTP (Hanemann 1999; Louviere, Hensher, and Swait 2000).

## 3.1 Marginal Willingness to Pay

The simplest welfare measure that can be obtained from a choice experiment is marginal WTP (MWTP); this is the marginal rate of substitution between the attribute and money. Let us assume a simple linear utility function (9)

$$V_{ik} = \beta' a_i + \lambda\left(Y - p_i\right)$$

The marginal rate of substitution between any of the attributes and money is then simply the ratio of the coefficient of the attribute and the marginal utility of income, found by totally differentiating (9) and rearranging: (10)

$$\mathrm{MWTP}_{a_j} = \frac{\beta_j}{\lambda}$$

MWTP shows how much money an individual is willing to sacrifice for a marginal change in the attribute. However, in many instances the attributes are not continuous; for example, the attribute could be a dummy variable indicating if the attribute is present or not. In that case, the ratio of the attribute coefficient and the marginal utility of money is strictly not a MWTP since we cannot talk about a marginal change of the discrete attribute. The interpretation of this WTP measure is instead the amount of money a respondent is willing to pay for a change in the attribute levels from, say, zero to one.

I shall now discuss two extensions of the estimation of MWTP. The first is non-linear utility functions. It is straightforward to allow for a non-linear utility function, but then the MWTP would have to be evaluated at a certain level of the attribute. Suppose we include a quadratic term of attribute a1, so that the utility function is (11)

$$V = \beta_1 a_1 + \beta_2 a_1^2 + \cdots + \lambda\left(Y - p\right)$$

The marginal WTP for attribute a1 is then (12)

$$\mathrm{MWTP}_{a_1} = \frac{\beta_1 + 2\beta_2 a_1}{\lambda}$$

The MWTP thus depends on the level of the attribute, and we would have to decide at what values to calculate the WTP.
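A minimal numerical sketch of (12), with purely illustrative coefficients, shows how the implied MWTP changes with the attribute level:

```python
# MWTP when the utility function includes a quadratic term, as in (11)-(12):
# V = beta1*a1 + beta2*a1**2 + ... + lam*(Y - p), so
# MWTP(a1) = (beta1 + 2*beta2*a1) / lam depends on the level of a1.
def mwtp_quadratic(beta1, beta2, lam, a1):
    return (beta1 + 2.0 * beta2 * a1) / lam

# purely illustrative coefficients, not from any estimated model
beta1, beta2, lam = 0.6, -0.05, 0.2
for a1 in (0.0, 2.0, 4.0):
    print(a1, mwtp_quadratic(beta1, beta2, lam, a1))
```

With a negative quadratic coefficient, the willingness to pay for a further unit of the attribute declines as its level rises, which is why the evaluation point must be chosen explicitly.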

The second extension is to allow for observed heterogeneity in WTP. This can be done by interacting the attributes of the choice experiment with a set of socioeconomic characteristics. This way we would obtain the MWTP for different groups of people with a certain set of socioeconomic characteristics.4 Of particular interest is perhaps the assumption of a constant marginal utility of money. Morey, Sharma, and Karlström (2003) suggest a simple approach where the utility is a piecewise linear spline function of income. For example, we could divide the sample into three groups: low-income, medium-income, and high-income respondents. For each of these groups we estimate a separate coefficient of marginal utility. This allows us to estimate separate MWTP expressions for each of these groups.

## 3.2 Total Willingness to Pay

By total WTP we mean the willingness to pay to go from one alternative to another, for a change in several attributes, or for one alternative compared with all other alternatives. The best examples in the case of food products are perhaps the evaluation of bans or labels, or the introduction of a new product. These are also the cases where the CVM is perhaps more suitable: if we are primarily interested in the total WTP for a particular scenario or, say, a ban, then it is advisable to conduct a CVM survey instead of a choice experiment. However, in many cases we could be interested in obtaining both total WTP and MWTP.

When we discuss total WTP, it is useful to distinguish between generic and alternative-specific experiments. In a generic experiment, the alternatives are simply labeled A, B, C, or 1, 2, 3. In an alternative-specific experiment the alternatives could be brand names, shops, or different national parks. With an alternative-specific experiment, the choice between the alternatives potentially implies something more than a trade-off between the attributes we present. This will be revealed through the inclusion of alternative-specific constants. This in turn means that if a respondent is forced to make a choice between alternatives, she is not only sacrificing the attribute levels of the non-chosen alternatives but also potentially other aspects, as measured by the alternative-specific constants. If the experiment is generic, this is not an issue, and strictly we would not need to include alternative-specific constants in a generic experiment.

However, let us begin with the alternative-specific case. We assume a simple linear utility function (13)

$$V_i = \beta' a_i + \lambda\left(Y - p_i\right)$$
where the attribute vector includes an alternative-specific constant. Since marginal utility of income is constant, the ordinary and compensated demand functions coincide. The compensating variation (CV) is in general obtained by solving the equality (14)
$$V\!\left(p^0, a^0, Y\right) = V\!\left(p^1, a^1, Y - CV\right)$$
where a0 is the attribute vector before and a1 is the attribute vector after the change, and V is the unconditional utility function. The unconditional utility function can be written as (15)
$$V = \max_{i = 1,\ldots,N}\left(V_i + \varepsilon_i\right)$$
where N is the number of alternatives. Inserting this into the equality for the compensating variation we have: (16)
$$E\!\left[\max_i\left(V_i^0 + \varepsilon_i\right)\right] = E\!\left[\max_i\left(V_i^1 + \varepsilon_i\right)\right] - \lambda\, CV$$

Solving for the compensating variation, (17)

$$CV = \frac{1}{\lambda}\left(E\!\left[\max_i\left(V_i^1 + \varepsilon_i\right)\right] - E\!\left[\max_i\left(V_i^0 + \varepsilon_i\right)\right]\right)$$

Thus, the compensating variation is the difference between the utility after the change and that before the change, normalized by the marginal utility of income. However, since we do not know what choice the individual would make, we have the expected indirect utility in the expression. What remains is to find expressions for the expected value of the maximum indirect utility of the alternatives. In order to do this, we need to make an assumption about the error terms (see, e.g., Small and Rosen 1981; Hanemann 1984). If the error terms have an extreme value distribution, the expected value of the maximum value is the so-called log-sum value (or the inclusive value): (18)

$$E\!\left[\max_i\left(V_i + \varepsilon_i\right)\right] = \ln\!\left(\sum_{i=1}^{N} e^{V_i}\right) + C$$

where C is a constant (the Euler constant) that cancels when taking differences in expected utility.

Therefore, the compensating variation is (19)

$$CV = \frac{1}{\lambda}\left[\ln\!\left(\sum_{i=1}^{N} e^{V_i^1}\right) - \ln\!\left(\sum_{i=1}^{N} e^{V_i^0}\right)\right]$$

Thus, the compensating variation is the difference in expected utility before and after the change, normalized by the marginal utility of income, where the expected utility is obtained using the log-sum formula.
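Equations (18)-(19) are straightforward to compute. The sketch below (with illustrative utilities and an illustrative marginal utility of income) evaluates the compensating variation for a quality improvement in one alternative:

```python
import numpy as np

def logsum(v):
    """Expected maximum utility under extreme-value errors, eq. (18),
    up to the Euler constant, which cancels in the CV difference."""
    return np.log(np.exp(v).sum())

def compensating_variation(v0, v1, lam):
    """Eq. (19): difference in expected utility after vs. before the
    change, normalized by the marginal utility of income lam."""
    return (logsum(v1) - logsum(v0)) / lam

# illustrative deterministic utilities for three alternatives
v0 = np.array([0.5, 0.2, 0.0])        # before the change
v1 = np.array([0.9, 0.2, 0.0])        # quality of alternative 1 improves
lam = 0.1                             # marginal utility of income
print(compensating_variation(v0, v1, lam))   # positive: the change is a welfare gain
```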

For a generic experiment it does not make sense to think of a choice between several alternatives. Instead, the total WTP for a generic experiment is simply the sum of willingness to pay for each attribute change. Suppose we conduct a generic choice experiment with four attributes, including the cost attribute. We estimate the following utility function: (20)

$$V = \beta_1 a_1 + \beta_2 a_2 + \beta_3 a_3 + \lambda\left(Y - p\right)$$

Based on the estimated model we wish to calculate the WTP for a change in all three attributes: Δa1, Δa2, and Δa3. The total WTP would then be (21)

$$WTP = \frac{\beta_1 \Delta a_1 + \beta_2 \Delta a_2 + \beta_3 \Delta a_3}{\lambda}$$

However, a generic choice experiment could also include an opt-out alternative, for example a no-purchase option. In that case, an alternative-specific constant for the opt-out should be included in the model. Let us denote the opt-out with the subscript 0. The utility function would then be (22)

$$V_i = \beta_1 a_{1i} + \beta_2 a_{2i} + \beta_3 a_{3i} + \lambda\left(Y - p_i\right), \qquad V_0 = \alpha_0 + \lambda Y$$

Whether or not to include the opt-out constant in the WTP depends on what we want to measure. If we want to know the WTP for a subject that is currently not choosing the opt-out, we would calculate the WTP as (23)

$$WTP = \frac{\beta_1 \Delta a_1 + \beta_2 \Delta a_2 + \beta_3 \Delta a_3}{\lambda}$$

If we want to know the WTP of a particular attribute combination for a subject that is currently choosing the opt-out, the WTP would be (24)

$$WTP = \frac{\beta_1 a_1 + \beta_2 a_2 + \beta_3 a_3 - \alpha_0}{\lambda}$$

This measure is thus the price that would make someone indifferent to buying.
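The WTP measures in (21), (23), and (24) can be sketched as follows, with purely illustrative coefficients (alpha0 denotes the opt-out alternative-specific constant):

```python
# Total WTP measures from a generic choice experiment, following (21),
# (23), and (24).  All coefficients are purely illustrative.

def total_wtp(betas, deltas, lam):
    """Eq. (21)/(23): total WTP for changes `deltas` in the attributes,
    for a subject currently not choosing the opt-out."""
    return sum(b * d for b, d in zip(betas, deltas)) / lam

def wtp_from_opt_out(betas, levels, alpha0, lam):
    """Eq. (24): the price that makes a subject currently choosing the
    opt-out indifferent to buying the bundle `levels`; alpha0 is the
    opt-out alternative-specific constant."""
    return (sum(b * a for b, a in zip(betas, levels)) - alpha0) / lam

betas = [0.4, 0.3, 0.2]   # attribute coefficients (illustrative)
lam = 0.1                 # marginal utility of income (illustrative)

print(total_wtp(betas, [1, 1, 0], lam))             # raise a1 and a2 by one unit
print(wtp_from_opt_out(betas, [1, 1, 1], 0.3, lam))
```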

# 4 Design of Stated Preference Survey

The actual design of a stated preference survey is in many respects the most crucial part of the whole exercise. If the study is not well designed, it does not matter what econometrics you apply or how well the interviews are done; the results will still be useless. I shall discuss three important parts of the design of a stated preference survey: (1) definition of attributes and attribute levels, (2) experimental design, and (3) survey context, behavioral aspects, and validity tests.

## 4.1 Definition of Attributes and Levels

The first step in the development of a choice experiment is to conduct a series of focus group studies aimed at selecting the relevant attributes. The focus group studies could be in terms of verbal protocols, group discussion, and actual surveys; see, e.g., Layton and Brown 1998 for a discussion of how to use focus groups for pretesting the question format and attributes. A starting point involves studying the attributes and attribute levels used in previous studies and their importance in the choice decisions. Additionally, the selection of attributes should be guided by the attributes that are expected to affect respondents' choices, as well as those attributes that are policy-relevant. This information forms the base for which attributes and relevant attribute levels should be included in the first round of focus group studies.

The task in a focus group is to determine the number of attributes and attribute levels. As a first step, the focus group studies should provide information about credible minimum and maximum attribute levels. Additionally, it is important to identify possible interaction effects between the attributes, for example whether the preference for a particular attribute depends on the level of another attribute. If we want to calculate welfare measures, it is necessary to include a monetary attribute such as a price or a cost. In such a case, the focus group studies will indicate the best way to present a monetary attribute. Credibility plays a crucial role; the researcher must ensure that the attributes selected and their levels can be combined in a credible manner. Hence, proper restrictions may have to be imposed (see, e.g., Layton and Brown 1998).

The focus group sessions should shed some light on the best way to introduce and explain the task of making a succession of choices from a series of choice sets. As Layton and Brown (1998) explain, choosing repeatedly is not necessarily a behavior that could be regarded as obvious for all goods. When it comes to recreational choices, for example, it is clear that choosing a site in a choice set does not preclude choosing another site given different circumstances. Furthermore, it must be made clear to the respondent that the choices are independent of each other. This is particularly important when repeated choices are a natural part of the decision, such as with food purchases; in the case of public goods, such repeated choices might require further justification in the experiment.

A general problem with applying a stated preference survey is the trade-off between the amount of information given and obtained and the complexity of the task. The ultimate goal is to obtain high-quality information about people's preferences. This means that we should ask a sufficient number of well-designed questions, but not so many that the quality of the responses decreases. How difficult the survey is will clearly depend on the context and the subject pool. I shall discuss a number of these issues in Section 4.3, but I think one should not overstate the problem of complexity. Most respondents can handle a fair number of attributes and complete a number of choice tasks. Many experiments that obtain plausible results have included four to eight attributes, and four to sixteen choice tasks.

## 4.2 Statistical Design

The statistical design has to do with the construction of the actual choice sets that respondents will face, or, in other words, how attribute levels will be combined. The reason why we have to care at all about this is that for most, if not all, choice experiments it is impossible to let all respondents answer all possible choice sets since the number of choice sets becomes too large. It is therefore necessary to reduce the number of choice sets that is included in the final design. However, this is not the only reason why we have to care about the statistical design.

There are a number of aspects that one should consider when performing the statistical design. The most important aspect is, of course, to assure that the effects that one wishes to estimate are identified, i.e., that the variation of the attribute levels allows us to estimate the parameters of the utility function. In practice we have to specify in advance which effects we want to be able to estimate. Since the design determines what effects are estimable, this is clearly one very important part of the design, and a part that is often overlooked. One common assumption is to estimate so-called main effects; this means that we assume that there are no interaction effects between the attributes. To make this important point more concrete, suppose we have an experiment with two non-monetary attributes and a cost attribute. With a main effects design we would then rule out that the preferences for one attribute depend on the levels of the other attribute.

The objective of an optimal statistical design is to extract the maximum amount of information from the respondents, subject to the number of attributes, attribute levels, and other characteristics of the survey such as cost and length of the survey. The central question is then how to select the attribute levels to be included in the stated preference experiment in order to extract maximum information from each individual. However, note that the following discussion will take the number of attribute levels and the actual levels as given. As we have seen, the choices regarding these two aspects are, of course, also important, not least with respect to what we can learn from the responses. For binary attributes, this is not really an issue. This could, for example, be an attribute describing whether genetically modified fodder has been used or not. For attributes with several levels, and in particular for continuous attributes such as the cost attribute, the choice of number of levels and the levels themselves is more important. The necessary number of levels for a continuous attribute depends on what we want to be able to estimate. For example, if we want to be able to estimate non-linear effects, we need to include more than two levels. On the other hand, if we only want to estimate a main effect, strictly what is needed is two levels. At the same time, it is not advisable, in my opinion, to include only two levels of a cost attribute. It is a risky strategy, since if the chosen levels are such that the respondent only cares about the cost (or totally disregards the cost), the responses will not provide much information.

A number of design principles have been discussed in the literature. The statistical aspect of the design concerns how the chosen attribute combinations affect the variance-covariance matrix of the estimator. From a statistical point of view, the purpose of the design is to minimize the “size” of the covariance matrix of the estimator, implying precise estimates of the parameters. One common measure of efficiency, which relates to the covariance matrix, is D-efficiency: (25)

$$D\text{-efficiency} = \left[\det(\Omega)\right]^{-1/K}$$
where K is the number of parameters to estimate and Ω is the covariance matrix of the parameter estimates. Although there are several other criteria of efficiency such as A- and G-efficiency, which are all highly correlated, the main reason for choosing D-efficiency is that it is less computationally burdensome (see, e.g., Kuhfeld, Tobias, and Garratt 1994). Huber and Zwerina 1996 identify four principles for an efficient design of a choice experiment based on a non-linear model: (1) level balance, (2) orthogonality (the variation in attribute levels is uncorrelated), (3) minimal overlap, and (4) utility balance. A design that satisfies these principles has a maximum D-efficiency. The level balance criterion requires that the levels of each attribute occur with equal frequency in the design. A design has minimal overlap when an attribute level does not repeat itself in a choice set. Utility balance requires that the utility of each alternative in a choice set is equal.
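To make the criterion concrete, the D-error (the inverse of the D-efficiency defined above) can be computed directly from a design's covariance matrix. A minimal sketch in Python; the two diagonal covariance matrices are made-up stand-ins for those implied by two candidate designs, not values from the text:

```python
import numpy as np

def d_error(omega):
    """D-error of a design: det(Omega)**(1/K) for the K x K covariance
    matrix Omega of the parameter estimates. D-efficiency is its inverse,
    so a lower D-error means a more efficient design."""
    k = omega.shape[0]
    return np.linalg.det(omega) ** (1.0 / k)

# Hypothetical covariance matrices for two candidate designs with K = 3
# parameters (illustrative numbers only).
omega_a = np.diag([0.04, 0.09, 0.25])   # tighter variances
omega_b = np.diag([0.16, 0.36, 1.00])   # looser variances

# Design A has the smaller D-error, i.e., the higher D-efficiency.
print(d_error(omega_a) < d_error(omega_b))  # True
```

Taking the determinant to the power 1/K makes designs with different numbers of parameters comparable, since the determinant scales with the dimension of the matrix.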

In order to understand these principles and to show different design strategies that can be applied, I shall work with a small design example. Suppose we have three attributes: one with two levels and two with four levels: x1 = {0, 1}, x2 = {0, 1, 2, 3}, x3 = {0, 1, 2, 3}. The choice experiment is a binary choice experiment, i.e., each choice set consists of two alternatives. The total number of possible combinations, called the full factorial design, is 2¹ × 4² = 32. Note that this is the number of combinations for one alternative. In practice it would be possible to include all these combinations, but let us suppose that we wish to reduce the number of possible alternatives. We can then use an econometric package such as SAS or SPSS, or a special design program such as Ngene, to generate a fractional factorial design. A fractional design is a subset of the full factorial design. Suppose that for this particular problem we seek to generate a main effects fractional factorial design with eight combinations and that we do this by maximizing the D-efficiency (D-optimality), assuming that we have a linear design problem (I shall explain later on what I mean by this). This would result in the fractional design shown in Table 7.1.
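The full factorial for this running example can be enumerated directly. A short sketch, assuming the attribute coding used above:

```python
from itertools import product

# Attribute levels from the running example: one two-level attribute and
# two four-level attributes.
x1_levels = [0, 1]
x2_levels = [0, 1, 2, 3]
x3_levels = [0, 1, 2, 3]

# The full factorial design is every possible attribute combination
# for a single alternative.
full_factorial = list(product(x1_levels, x2_levels, x3_levels))

print(len(full_factorial))  # 2 * 4 * 4 = 32
```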

This design fulfills two criteria: it is orthogonal and it has level balance. However, we have not designed the choice sets yet; all we have so far is a fractional design. A number of approaches can be used to create the choice sets, and I shall discuss only a few. One simple approach is to use a cyclical design (Bunch, Louviere, and Andersson 1996). A cyclical design is a simple extension of the orthogonal approach. First, each of the (p. 194) combinations in the orthogonal design is allocated to a choice set, and represents the first alternative in that set (so there are now eight choice sets). The additional alternatives are then constructed by cycling through the attribute levels: the attribute level in the new alternative is the next-higher level to the one used in the previous alternative, and if the highest level has been attained, the attribute level wraps around to its lowest level. By construction, this design has level balance, orthogonality, and minimal overlap. The cyclical design for this example is presented in Table 7.2. Let us look at choice set number 4: attribute x1 is equal to one for the first alternative, so the level of attribute x1 for alternative 2 is zero, since this is a two-level attribute. Attribute x2 is equal to zero for alternative 1, which means that attribute x2 is equal to one in alternative 2. Finally, attribute x3 is also zero in the first alternative, and is thus equal to one in alternative 2.

Table 7.1 An orthogonal main effects design

| Combination | x1 | x2 | x3 |
| --- | --- | --- | --- |
| 1 | 1 | 3 | 3 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 1 | 1 |
| 4 | 1 | 0 | 0 |
| 5 | 0 | 3 | 0 |
| 6 | 0 | 2 | 1 |
| 7 | 0 | 1 | 3 |
| 8 | 0 | 0 | 2 |

Table 7.2 A cyclical design

| Choice set | Alt. 1: x1 | Alt. 1: x2 | Alt. 1: x3 | Alt. 2: x1 | Alt. 2: x2 | Alt. 2: x3 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 3 | 3 | 0 | 0 | 0 |
| 2 | 1 | 2 | 2 | 0 | 3 | 3 |
| 3 | 1 | 1 | 1 | 0 | 2 | 2 |
| 4 | 1 | 0 | 0 | 0 | 1 | 1 |
| 5 | 0 | 3 | 0 | 1 | 0 | 1 |
| 6 | 0 | 2 | 1 | 1 | 3 | 2 |
| 7 | 0 | 1 | 3 | 1 | 2 | 0 |
| 8 | 0 | 0 | 2 | 1 | 1 | 3 |
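The cyclical construction described above can be sketched in a few lines of Python. The eight seed combinations are those of the orthogonal main effects design in Table 7.1, and the wrap-around from the highest to the lowest level is handled with modular arithmetic:

```python
# The eight combinations of the orthogonal main effects design (Table 7.1),
# each given as (x1, x2, x3).
fractional = [
    (1, 3, 3), (1, 2, 2), (1, 1, 1), (1, 0, 0),
    (0, 3, 0), (0, 2, 1), (0, 1, 3), (0, 0, 2),
]

# Number of levels per attribute: x1 has two, x2 and x3 have four.
n_levels = (2, 4, 4)

def cycle_up(alt, n_levels):
    """Shift every attribute to its next-higher level, wrapping the
    highest level around to the lowest."""
    return tuple((lvl + 1) % n for lvl, n in zip(alt, n_levels))

# Each fractional-design combination seeds a choice set; the second
# alternative is obtained by cycling the first.
choice_sets = [(alt, cycle_up(alt, n_levels)) for alt in fractional]

# Choice set 4 of Table 7.2: (1, 0, 0) is paired with (0, 1, 1).
print(choice_sets[3])  # ((1, 0, 0), (0, 1, 1))
```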

An alternative is to create the choice sets simultaneously (see, e.g., Louviere 1988). Creating the choice sets simultaneously means that the design is selected from the collective factorial. The collective factorial is an L^(C×A) factorial, where C is the number of alternatives and each alternative has A attributes with L levels; in our case, with two alternatives, the collective factorial is 2^(2×1) × 4^(2×2) = 2² × 4⁴. (p. 195) Therefore, when we create the full factorial we specify, in our case, six attributes instead of three. This design strategy is also very simple. One problem with this simultaneous design approach is that both the full and the fractional factorial can become very large. For our small design this is, however, not a problem.

A third alternative is to use a random design principle (see, e.g., Lusk and Norwood 2005). With this approach we would simply draw design combinations at random and combine them into choice sets. The set of design combinations that we draw from could be the full factorial design or a fractional factorial design such as the one presented in Table 7.1. Since it is practically difficult to let each individual face a unique combination of randomly drawn choice sets, it is often necessary to create a number of questionnaire versions, each containing a particular set of choice sets.

One thing that should be mentioned about designs is that often a final design is split into different versions, in particular if the number of alternatives and attributes is large; this is called blocking the design. The blocking could be done by random drawing or by applying exactly the same criteria again, i.e., orthogonality, level balance, and minimum overlap.

So far I have only touched upon the statistical aspects of the design, but have not talked at all about the fourth criterion of an optimal design: utility balance. The design I have used so far does not consider utility balance when generating the design. The reason for this is that for linear models utility balance does not matter for statistical efficiency. However, for non-linear discrete choice models it turns out that utility balance does matter for statistical efficiency, since the covariance matrix depends on the true parameters in the utility function. A series of papers have shown that designs that take utility balance into consideration produce lower standard errors than the orthogonal designs; see, e.g., Huber and Zwerina 1996, Zwerina, Huber, and Kuhfeld (1996), Sandor and Wedel 2001, Kanninen 2002, and Carlsson and Martinsson 2003. In order to illustrate their design approach it is necessary to return to the Multinomial Logit model. McFadden (1974) showed that the maximum likelihood estimator for the Conditional Logit model is consistent and asymptotically normally distributed with the mean equal to β and a covariance matrix given by (26)

$$\Omega = \left[ \sum_{n=1}^{N} \sum_{i=1}^{C} P_{in} \left( z_{in} - \bar{z}_n \right)\left( z_{in} - \bar{z}_n \right)' \right]^{-1}, \qquad \bar{z}_n = \sum_{j=1}^{C} P_{jn} z_{jn},$$

where $z_{in}$ is the attribute vector of alternative $i$ in choice set $n$, $P_{in}$ is the corresponding choice probability, $C$ is the number of alternatives, and $N$ is the number of choice sets.

This covariance matrix, which is the main component of the D-optimality criterion, depends on the true parameters of the utility function, since the choice probabilities P_in depend on these parameters.
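Given prior values for β, the covariance matrix in (26) can be evaluated for any candidate design. The sketch below is an illustration only: the three binary choice sets, the two attributes, and the prior parameter values are all made up, and the function name is mine, not from the text:

```python
import numpy as np

def mnl_covariance(Z, beta):
    """Asymptotic covariance of the conditional logit MLE.

    Z    : array of shape (N, C, K) -- attributes of C alternatives in
           each of N choice sets, K attributes per alternative.
    beta : assumed (prior) utility parameters, shape (K,).

    Returns the inverse of the information matrix
    sum_n sum_i P_in (z_in - zbar_n)(z_in - zbar_n)'.
    """
    N, C, K = Z.shape
    info = np.zeros((K, K))
    for n in range(N):
        v = Z[n] @ beta                  # deterministic utilities
        p = np.exp(v - v.max())
        p /= p.sum()                     # logit choice probabilities P_in
        zbar = p @ Z[n]                  # probability-weighted mean z
        dev = Z[n] - zbar
        info += (dev * p[:, None]).T @ dev
    return np.linalg.inv(info)

# Hypothetical design: three binary choice sets, two attributes,
# and illustrative prior parameter values.
Z = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.0, 0.0]],
              [[0.0, 1.0], [1.0, 1.0]]])
omega = mnl_covariance(Z, beta=np.array([0.5, -0.5]))
```

Because the choice probabilities enter the information matrix, different priors on β yield different covariance matrices, which is exactly why utility-balanced (efficient) designs require prior information.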

The utility balance aspect of the design problem closely resembles the problem of optimal design of the bid vector for closed-ended compensating variation surveys. A good statistical design is a function of the parameter vector, but we conduct the experiment to find the parameters: a Catch-22 problem. One solution is some form of sequential design, where a number of pilot studies are conducted before (p. 196) the main study. This, of course, is costly and not without problems. However, it is important to understand that, irrespective of the choice of statistical design, some information about respondents' preferences for the attributes is needed. For example, if the attribute levels are assigned in such a way that the choice depends only on the level of a certain attribute in the choice set, not much information is extracted. Another example would be choice sets where one alternative completely dominates the rest. Some prior information is needed to avoid situations of these types; the design strategies differ not in whether such information is required, but in how it enters the creation of the design.

However, there has also been criticism of utility-balanced design principles; see, e.g., Kanninen 2002 and Louviere et al. (2008). Think of the extreme case where two alternatives are almost identical in utility terms, which means that it is a difficult choice to make if we really try to assess and compare their utility. It also means that the choice we make is not that important to us, since the utility loss is small if we choose the “wrong” alternative. Consequently, there is a high risk that the responses to such a choice would be almost random and would thus not provide us with much information. Thus, there is a clear potential drawback to utility-balanced designs. On the other hand, we could argue that asking respondents to make choices between dominating or almost dominating alternatives would also give us very little information about their preferences. Consequently, the second alternative is also not very appealing (and, of course, no one has argued that one should strive for dominating alternatives).

One alternative approach would be to eliminate beforehand choice sets that contain strictly dominating alternatives and, more importantly, choice sets that contain alternatives that are close to dominating without strictly dominating. This raises the question: what counts as a highly dominating alternative? There is, of course, no simple rule. One strategy could be to collect prior information about the parameters of the utility function and then use it to compute choice probabilities and, in the case of a binary experiment for example, exclude all choice sets where the choice probability of one alternative exceeds 0.9. Clearly, the choice of cutoff point is arbitrary, and this approach still requires actual priors on the parameters of the utility function.
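This screening strategy can be sketched as follows. The 0.9 cutoff comes from the text; the prior parameter values and the two example choice sets are illustrative assumptions:

```python
import numpy as np

def prune_dominant_sets(choice_sets, beta, cutoff=0.9):
    """Drop binary choice sets where the prior parameters imply a choice
    probability above `cutoff` for either alternative.

    choice_sets : list of (z1, z2) attribute tuples
    beta        : prior utility parameters (e.g., from a pilot study)
    """
    kept = []
    for z1, z2 in choice_sets:
        dv = np.dot(beta, np.subtract(z1, z2))   # utility difference
        p1 = 1.0 / (1.0 + np.exp(-dv))           # binary logit probability
        if max(p1, 1.0 - p1) <= cutoff:
            kept.append((z1, z2))
    return kept

# With a strong prior on the first attribute, a set where that attribute
# alone separates the alternatives is nearly dominated and gets dropped,
# while a more balanced set survives.
sets = [((1, 0), (0, 0)), ((1, 1), (1, 0))]
kept = prune_dominant_sets(sets, beta=np.array([4.0, 0.5]))
print(kept)  # [((1, 1), (1, 0))]
```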

A final problem with the design has to do with how the levels of the attributes can be combined in practice. Some attributes may be correlated in the real world, but we have to remember that one purpose of the statistical design is precisely to remove that correlation; indeed, real-world correlation among attributes is itself an argument for using stated preference data instead of revealed preference data.

## 4.3 Survey Context, Behavioral Aspects, and Validity Tests

In the previous section, I addressed optimal design of a choice experiment from a statistical perspective. However, in empirical applications there may be other issues to (p. 197) consider in order to extract the maximum amount of information from respondents. The first issue is the context of the choice experiment. Most stated preference surveys relating to food products put the respondent in a situation where he or she can choose to buy only one alternative, most often in a specific quantity. With this setting, the respondent cannot express any preference for the consumption choices of others, since his or her choices will not affect the choice sets of the others. An example where respondents might care about the choices and consumption of others is genetically modified (GM) food products. A consumer may prefer GM-free products owing to what Antle 1999 calls extrinsic, or public good, quality. Thus, the consumer cares about the production process even if it does not affect product quality, for example for animal welfare, environmental, ethical, or religious reasons. This in turn implies that a consumer might prefer a ban rather than mandatory labeling (Carlsson, Frykblom, and Lagerkvist 2007a). In order to measure this type of preference, the choice experiment would need to be framed somewhat differently. One alternative is to put the respondent in a situation where he or she can choose to buy only one alternative, but explain that the choice restricts the alternatives available for others as well. Another alternative is to frame the experiment such that the respondent is to choose between different “states of the world.”

One issue to consider in the development of the questionnaire is whether to include a base case scenario or an opt-out alternative. This is particularly important if the purpose of the experiment is to calculate welfare measures. If we do not allow individuals to opt for a status quo alternative, this may distort the welfare measure for non-marginal changes. This decision should, however, be guided by whether or not the current situation and/or non-participation is a relevant alternative. A non-participation decision can be econometrically analyzed by, for example, a Nested Logit model with participants and non-participants in different branches (see, e.g., Blamey et al. 2000). A simpler alternative is to model non-participation as an alternative where the levels of the attributes are set to the current attribute levels.

Another issue is whether to present the alternatives in the choice sets in a generic (alternatives A, B, C) or alternative-specific form (Coca Cola, Pepsi, etc.). Blamey et al. (2000) discuss advantages of these two approaches and compare them in an empirical study. An advantage of using alternative-specific labels is familiarity with the context and hence the cognitive burden is reduced. However, the risk is that the respondent may not consider trade-offs between attributes. This approach is preferred when the emphasis is on valuation of the labeled alternatives. An advantage of the generic model is that the respondent is less inclined to consider only the label and thereby will focus more on the attributes. Therefore, this approach is preferred when the emphasis is on the marginal rates of substitution between attributes.

Many decisions regarding food consumption affect several members of the household, and presumably many decisions are made jointly. Surprisingly few stated preference studies have looked at the household valuation of products and public goods (see, e.g., Arora and Allenby 1999; Dosman and Adamowicz 2006; Bateman and Munro 2009; Beharry-Borg, Hensher, and Scarpa 2009). Even if the survey is not designed to (p. 198) investigate household decision-making, it is necessary to describe the intended setting in the scenario; for example, are the decisions made for the household or not? However, stated preference surveys could also be used to explore the relationship between individual and household decision-making. Beharry-Borg, Hensher, and Scarpa (2009) present an analytical framework for analyzing joint and separate decisions made by couples. The advantage of using a survey, instead of real purchase data, is that we can obtain data for both individual and joint choices by interviewing couples both individually and jointly.

There are a number of aspects of respondent behavior that affect the design of the survey and the choice of internal validity tests. The broadest issue is perhaps task complexity. This is determined by factors such as the number of choice sets presented to the individual, the number of alternatives in each choice set, the number of attributes describing those alternatives, and the correlation between attributes for each alternative (Swait and Adamowicz 1996). In complex cases, respondents may simply answer carelessly or use some simplified lexicographic decision rule. This could also arise, for example, if the levels of the attributes are not sufficiently differentiated to ensure trade-offs. In practice, it is difficult to separate this behavior from preferences that are genuinely lexicographic, in which case the respondents have a ranking of the attributes but the choice of an alternative is based solely on the level of their most important attribute. Genuine lexicographic preferences in a choice experiment are not a problem, although such respondents provide us with less information in the analysis than other respondents. However, if a respondent chooses to use a lexicographic strategy because of its simplicity, systematic errors are introduced, which may bias the results. One strategy for distinguishing between different types of lexicographic behavior is to use debriefing questions, where respondents are asked to give reasons why they, for example, focused on only one or two of the attributes in the choice experiment (DeShazo and Fermo 2002; Hensher, Rose, and Greene 2005).

One interesting aspect of choice experiments is the possibility of building in internal and external consistency tests. By internal consistency tests I mean within-subject tests, and by external consistency tests I mean between-subject tests. There are both advantages and disadvantages to internal and external tests. For example, with an internal test we do not have to control for potential differences in the samples, but at the same time internal tests could be seen as a weaker form of test since they could be perceived as simple “rationality” tests of the respondent. However, as we shall see, some of these tests are still better suited to internal tests. I shall look briefly at two validity tests: transitivity and stability of preferences.

A test for transitive preferences has to be built into the design. In the case of a pairwise choice experiment, for example, we have to include three specific choice sets: (1) Alt. 1 versus Alt. 2, (2) Alt. 2 versus Alt. 3, and (3) Alt. 1 versus Alt. 3. If the respondent chooses Alt. 1 in the first choice set and Alt. 2 in the second, then Alt. 1 must be chosen in the third choice set if the respondent has transitive preferences. Carlsson and Martinsson 2001 conducted tests of transitivity and did not find any strong indications of violations.
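With the three choice sets above, checking a respondent's answers for transitivity reduces to detecting the two possible choice cycles. A minimal sketch, assuming choices are coded by the index of the chosen alternative:

```python
def transitivity_violated(choice_12, choice_23, choice_13):
    """Check the three-set transitivity test.

    Each argument is the alternative chosen (1, 2, or 3) in the pairwise
    sets {1 vs 2}, {2 vs 3}, and {1 vs 3}. Returns True if the response
    pattern is intransitive.
    """
    # The only intransitive patterns are the two cycles:
    # 1 > 2, 2 > 3, 3 > 1  and  2 > 1, 3 > 2, 1 > 3.
    return (choice_12, choice_23, choice_13) in {(1, 2, 3), (2, 3, 1)}

# A respondent choosing Alt. 1 over 2 and Alt. 2 over 3 must choose
# Alt. 1 over 3 to be transitive.
print(transitivity_violated(1, 2, 1))  # False (transitive)
print(transitivity_violated(1, 2, 3))  # True (a cycle)
```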

The standard assumption in stated preference surveys is that the utility function of each individual is stable throughout the experiment. The complexity of the exercise might cause violations of this assumption, arising from learning and fatigue effects: “learning” in the sense of learning the preferences (Plott 1996) or learning the institutional setup (Bateman et al. 2008). The issue of learning and stability of preferences is not as simple as it seems. There is a lot of evidence that people's preferences are formed through repeated interactions (List 2003). If we conduct a stated preference survey on a good involving attributes that are not that familiar to the respondent, there is a risk that the preferences are not really formed before the survey situation. A test of the stability of the preferences could then reveal that the preferences are not stable. It is then not obvious that this is related to problems with the method itself; instead it could be that preferences are indeed constructed as the respondent goes through the survey. However, there is a potential counteracting effect to learning, and that is coherent arbitrariness (Ariely, Loewenstein, and Prelec 2008). This means that individuals' choices are often internally coherent, but at the same time they can be strongly anchored to some initial starting point. This is equivalent to starting point bias in stated preference surveys (Herriges and Shogren 1996; Ladenburg and Olsen 2008). The empirical evidence on stability of preferences is mixed. Johnson, Matthews, and Bingham (2000) test for stability by comparing responses to the same choice sets included (p. 200) both at the beginning and at the end of the experiment. They find a strong indication of instability of preferences. However, there is a potential problem of confounding effects of the sequencing of the choice sets and the stability of the preferences. 
An alternative approach, without the confounding effect, is applied in Carlsson and Martinsson 2001 in a choice experiment on donations to environmental projects. In their exercise, half of the respondents received the choice sets in the order {A, B} and the other half in the order {B, A}. A test for stability was then performed by comparing the preferences obtained for the choices in subset A, when it was given in the sequence {A, B}, with the preferences obtained when the choices in subset A were given in the sequence {B, A}. This could then be formally tested in a likelihood ratio test between the pooled model of the choices in subset A and the separate groups. A similar test could be performed for subset B. By using this method Carlsson and Martinsson 2001 found only a minor problem with instability of preferences. Layton and Brown 2000 conducted a similar test of stability in a choice experiment on policies for mitigating impacts of global climate change; they did not reject the hypothesis of stable preferences. Bryan et al. (2000) compared responses in the same way, but with the objective of testing for reliability, and found that 57 percent of the respondents did not change their responses when given the same choice set in a two-part choice experiment. Furthermore, in an identical follow-up experiment two weeks after the original experiment, 54 percent of the respondents made the same choices on at least eleven out of twelve choice situations.
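The likelihood ratio test for pooling described above can be sketched as follows. The log-likelihood values are made up for illustration and are not taken from the cited studies:

```python
def lr_statistic(ll_pooled, ll_separate):
    """Likelihood ratio statistic for pooling: -2 * (LL_pooled - sum of
    the separately estimated log-likelihoods). Under the null of stable
    (poolable) preferences it is chi-squared distributed, with degrees of
    freedom equal to the number of extra parameters in the separate models.
    """
    return -2.0 * (ll_pooled - sum(ll_separate))

# Hypothetical log-likelihoods: a pooled model of the subset-A choices
# versus the two order-specific models ({A, B} first vs {B, A} first).
lr = lr_statistic(-1204.6, [-600.1, -602.3])
print(round(lr, 1))  # 4.4 -- below the 5% chi-squared critical value
                     # with 3 df (7.81), so stability is not rejected
```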

# 5 Hypothetical Bias

One important issue is whether individuals would actually do what they state they would do if the situation were real. There has been an extensive discussion about the possibility of eliciting preferences, both for private and for public goods, with stated preference methods, and about the extent of hypothetical bias. By hypothetical bias we mean the bias introduced by asking a hypothetical question rather than confronting the respondent with a real situation. I shall begin this section with a discussion of the incentive properties of different choice formats, then look at the empirical evidence on hypothetical bias, and finally at methods for reducing it.

In assessing this bias, it is important to distinguish between private and public goods, as well as between goods with essentially no nonuse values and goods with a non-negligible proportion of nonuse values. As we shall see, there are reasons to believe that there are particular problems with measuring nonuse values in hypothetical surveys. This is not, of course, saying that nonuse values should not be measured; rather, that there are some inherent problems with measuring these values. The reason for this is that nonuse values are to varying extents motivated by “purchase of moral satisfaction” (Kahneman and Knetsch 1992) and “warm glow” (Andreoni 1989), and that they often involve an “important perceived ethical dimension” (p. 201) (Johansson-Stenman and Svedsäter 2008). Note that we are not saying that nonuse values are a result of stated preference surveys, but that it is particularly difficult to measure nonuse values.

## 5.1 Incentives for Truthful Revelation of Preferences

I shall begin by outlining the discussion and arguments made by Carson et al. (1999) and Carson and Groves 2007. It is important to be aware of their assumptions and of which aspects of individual behavior they are not investigating. The basic premise behind their work is the assumption that individuals behave strategically when responding to a survey. Furthermore, they start from the assumption of a so-called consequential survey, defined as one that the respondent perceives as potentially influencing agency decisions, and whose outcome the respondent cares about. This means that they rule out purely hypothetical surveys; indeed, they argue that economics has essentially nothing to say about purely hypothetical surveys. Note that one core result of their research is that the probability that the survey is consequential does not affect the incentive properties; all that is required is that this probability is positive. Finally, the incentive properties of any type of stated preference question also depend on (1) the type of good, i.e., whether it is a private or a public good, and (2) the payment mechanism, i.e., whether the payment is coercive or voluntary.

The result of Carson and Groves 2007 is essentially negative: most question formats that we use are not incentive-compatible. They argue that essentially the only incentive-compatible format is a binary discrete choice, and even this is incentive-compatible only in the cases of (1) a new public good with coercive payments, (2) the choice between two public goods, and (3) a change in an existing private or quasi-public good. I shall not go through all the results of their research; instead I shall focus on the intuition behind their results and the implications of them. The best way to understand this is to work with a simple example, which is not intended to cover all the different possibilities covered by Carson and Groves. I shall assume that the survey is consequential. Suppose we wish to value a public good that can be described by a set of attributes. The utility of a particular attribute combination is V(p, q_i, Y). Suppose that an individual is confronted with a binary choice between two levels of a public good, q_0 and q_1, where the initial level q_0 is the status quo. The new level of the public good is associated with a bid level t_k. The respondent is asked to say yes or no to the bid and the new level of the public good. Given no uncertainty, and assuming that the proposed bid is the actual cost, the optimal responses are (p. 202) (27)

$$\text{yes if } V(p, q_1, Y - t_k) \geq V(p, q_0, Y), \qquad \text{no otherwise.}$$

The question is whether this mechanism is incentive-compatible. To keep things as simple as possible, suppose that the actual cost the respondent would face is equal to the bid and that the respondent believes this.5 Furthermore, assume that there are no effects on utility, either from the outcome or from the responses in the survey, other than those described. The respondent thus cares only about the attributes of the good, although this, of course, could involve nonuse values. What is ruled out is, for example, utility derived from the act of responding itself: the respondent receives no utility from pleasing the interviewer, nor from feeling good by responding in a certain, perhaps ethical, way. We shall come back to these issues later on. Given these restrictions, would a respondent gain anything from not telling the truth? The answer is no. Suppose that the respondent prefers no change; then the best response for that individual is no, and vice versa. The easiest way to see this is perhaps to ask what the person would gain from not answering truthfully. Take the case of a respondent who prefers no change: what would that person gain from answering yes to the valuation question? Answering yes increases the probability that the public good is provided, but the respondent does not want that, at least not at the stated cost. Of course, this argument depends critically on all the assumptions we have discussed.6

Let us now look at a simple choice experiment. Suppose we ask respondents to choose between three alternatives, with the following utilities: (28)

$$V_j = V(p, q_j, Y - t_j), \qquad j = 1, 2, 3.$$

Let us assume that V_1 > V_2 > V_3 for a particular respondent. The question is, will this respondent always choose alternative 1? The answer is no. This can be seen as a voting situation between three candidates where the winner is the candidate that receives the highest number of votes. From the literature we know that this type of situation can create voting cycles and paradoxes. For example, suppose that only a small fraction of the population prefers alternative 1, and that this is common knowledge. A respondent who prefers this alternative would then have an incentive to choose alternative 2 instead, since that would increase the probability that alternative 2 wins (p. 203) over alternative 3. This is the basic reasoning behind why choice experiments are not incentive-compatible for a public good.

The incentives are similar for a private good. The main difference is that it is possible that more than one alternative is provided. This does not change the result to any large extent unless all but one alternative are provided. In an actual experiment, respondents would not know the number of alternatives that will be provided. Carson and Groves 2007 argue that this will result in respondents choosing alternatives that are the most preferred, or close to the most preferred, as long as they believe many alternatives will be provided, the difference in utility between the alternatives is small, or they have little information about other respondents' preferences. Carson and Groves further argue that this is mainly a problem when we want to estimate the total WTP. If the main interest is MWTP, or marginal trade-offs between attributes in general, this is less problematic. The reason is that the scale parameter cancels out when we make marginal comparisons.

The above discussion is, of course, very simplified. Three things are worth mentioning, though none of them changes the basic results on incentive compatibility or incompatibility; they will, however, put the issue of incentive compatibility into perspective. The first aspect has to do with uncertainty. For example, what if the respondent is uncertain about whether the proposed bid in a closed-ended question is the cost he or she would face? As mentioned, that would clearly affect the incentive properties, since the respondent would base the decision on the expected cost and not on the proposed bid. In the case of a choice experiment, the situation is even more complex, since here the respondent also needs information about others' preferences and how they would choose. Alpizar, Carlsson, and Martinsson (2003) discuss in detail how uncertainty about others' preferences affects the behavior of respondents in a choice experiment. Three straightforward and important conclusions can be drawn from their analysis. First, introducing imperfect information does not ensure that the degree of strategic behavior is reduced. It may well be the case that respondents form expectations such that they act strategically even if they would not have done so with perfect information. Second, using a generic (no labels) presentation of the alternatives instead of an alternative-specific (labels) form probably reduces the risk of strategic behavior, since it increases the complexity of forming expectations regarding other respondents' preferences. Third, it is generally advisable to introduce uncertainty explicitly into the choice experiment. This can be done by saying that there is uncertainty regarding individuals' preferences for the alternatives and the attributes.

The second aspect that needs to be mentioned when discussing incentive compatibility is that there can be other effects on the respondent's utility than those specified in the scenario. Perhaps the most classic example is pleasing the interviewer: by giving the answer that one thinks the interviewer wants, the respondent is better off. Similarly, by acting in a certain way, for example expressing environmentally friendly preferences or certain ethical preferences, the respondent is better off. This is also, of course, an underlying reason for actual behavior. However, the survey (p. 204) situation could be seen as a cheap way of “purchasing moral satisfaction” (Kahneman and Knetsch 1992) or receiving a “warm glow” (Andreoni 1989), and the fact that the survey is perceived as consequential does not change this. Values of this type are more likely to occur for goods with nonuse values, but at the same time the very reason why we conduct stated preference surveys is often that the goods have nonuse values. There are, of course, other reasons for conducting stated preference surveys, but if we want to measure nonuse values, stated preference methods are the only alternative.

## 5.2 The Empirical Evidence

There is a vast literature testing the performance of stated preference methods with respect to how well they mimic real behavior. As we have discussed, it is important to distinguish between tests involving private goods, such as choice of mode of transport, and tests involving public goods, such as environmental goods. In transportation economics there have been a number of tests of the external validity of stated preference methods. These tests largely concern choice experiments or similar methods such as conjoint analysis and different ranking and rating formats. The validity tests are either comparative studies with both hypothetical choice/ranking data and revealed preference data (e.g., Benjamin and Sen 1982) or comparisons of predicted market shares from hypothetical choice/ranking studies with observed market shares (e.g., Wardman 1988). The evidence from a large proportion of studies is that choice experiments generally pass external tests of validity. However, as we have discussed, it is not obvious that these results carry over to hypothetical experiments on non-market goods.

Marketable private goods have been used frequently to test the external validity of the contingent valuation method, and some of these studies have indicated that individuals overstate their WTP in a hypothetical setting (see, e.g., Bishop and Heberlein 1979; Cummings, Harrison, and Rutström 1995; Frykblom 1997). Some previous studies on donations to environmental projects using the contingent valuation method have also indicated overstatement of hypothetical WTP (see, e.g., Seip and Strand 1992; Brown et al. 1996; and see List and Gallet 2001 for a meta-analysis of hypothetical bias). Another way to test external validity is to compare the results of CVM studies with revealed preference studies. Carson et al. (1996) performed a meta-analysis covering eighty-three studies and 616 comparisons, and found that CVM estimates of WTP were slightly lower than their revealed preference counterparts, with a mean ratio between CVM and revealed preference estimates of 0.89.

There are a number of cases where laboratory and natural field experiments can be used as an alternative to stated preference surveys. The main application is within food economics, where both stated preference surveys and experiments have been used to elicit preferences for food attributes and the effects of information on consumer choices. A number of studies compare results from lab experiments with stated preference (p. 205) surveys on the same topic. Most of these are done with the purpose of testing for hypothetical bias. The results for choice experiments are clearly mixed. Carlsson and Martinsson 2001 failed to reject a hypothesis of equal marginal WTP in a real and a hypothetical setting (both conducted in a lab), while Johansson-Stenman and Svedsäter 2008 did reject the equality of marginal WTPs. Lusk and Schroeder 2004 found that hypothetical choices overestimate total WTP, but did not reject the equality of marginal WTPs for changes in individual attributes.
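In such comparisons, the quantity being tested is the marginal WTP implied by the estimated choice model: in a linear random-utility specification it is the negative ratio of an attribute's coefficient to the price coefficient. A minimal sketch of that calculation, with purely hypothetical coefficient values (not taken from any of the cited studies), including a delta-method standard error for the ratio:

```python
import math

# Marginal WTP in a linear random-utility model: -beta_attr / beta_price.
# All numbers below are illustrative, not estimates from any cited study.

def marginal_wtp(beta_attr, beta_price):
    """Marginal willingness to pay for a one-unit attribute change."""
    return -beta_attr / beta_price

def wtp_se_delta(beta_attr, beta_price, var_attr, var_price, cov=0.0):
    """Delta-method standard error of the ratio -beta_attr / beta_price."""
    grad_attr = -1.0 / beta_price          # d(wtp)/d(beta_attr)
    grad_price = beta_attr / beta_price ** 2  # d(wtp)/d(beta_price)
    var = (grad_attr ** 2 * var_attr
           + grad_price ** 2 * var_price
           + 2 * grad_attr * grad_price * cov)
    return math.sqrt(var)

# Hypothetical treatment vs. real-payment treatment (illustrative coefficients)
wtp_hypothetical = marginal_wtp(0.60, -0.030)
wtp_real = marginal_wtp(0.45, -0.030)
print(wtp_hypothetical, wtp_real)
```

A formal test of equal marginal WTPs, of the kind reported by Lusk and Schroeder 2004, would then compare such ratios (together with their standard errors) across the real and hypothetical treatments.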

There are also studies that compare lab experiments, stated preference surveys, and behavior outside the lab. Shogren et al. (1999) conducted a hypothetical mail survey and a lab experiment concerning irradiated food, and compared the results with actual store purchases. They found that both the survey and the lab experiment resulted in a larger market share prediction of irradiated chicken than the grocery store prediction. Chang, Lusk, and Norwood (2009) found that both a stated preference survey and an actual lab experiment predicted actual retail sales fairly well, although the non-hypothetical experiment performed better than the hypothetical choice experiment. The most interesting finding is perhaps the one in Lusk, Pruitt, and Norwood 2006, where they compared a framed field experiment with actual retail sales. They found that the results of the framed field experiment predicted consumer behavior in the store, although there is some evidence of more prosocial behavior in the framed field experiment.

What is the interpretation of these results? I think there are two important take-home messages. First, we should be careful when comparing results obtained from either a laboratory experiment or a stated preference survey with actual behavior. Clearly, the difference cannot be explained only by hypothetical bias in the strictest sense, since both actual lab experiments and stated preference surveys have some problems with predicting actual retail behavior. Second, although the results are mixed, actual lab experiments seem to perform better than stated preference experiments. As discussed by Levitt and List 2007, a number of factors can explain behavioral differences between the laboratory and the real world: scrutiny, context, stakes, selection of subjects, and restrictions on time horizons and choice sets. Some of these factors are, of course, important in explaining the difference even between lab experiments and retail sales. There are at least three important differences. The first is the degree of scrutiny: in both the lab and the survey situation, subjects/respondents know that they are taking part in a study where someone is interested in their behavior. The second is the restriction of the choice set: in the lab and the survey, the choice sets are clearly defined and restricted, while in a store the choice sets are larger and perhaps less clear. The third is the context in which the choices are made, i.e., the store versus the lab. There is a vast literature comparing behavior in the lab and the field showing that these could be important factors explaining the difference. The degree of anonymity in an experiment is one potential measure of scrutiny, and a number of experiments show that the extent of prosocial behavior increases the less anonymous the decisions are (see, e.g., List et al. 2004; Soetevent 2005). The effects of choice set restrictions can also be important.
For example, Bardsley 2008 and List 2007 find that subjects' behavior (p. 206) in traditional dictator games changes when they are faced with the possibility of taking money from the recipient's endowment. Consequently, when comparing behavior in a survey situation and in an actual situation, many of these factors could also be different, and explain the potential differences. This is particularly important for tests of external validity.

## 5.3 Methods for Reducing Hypothetical Bias

A number of measures to reduce and/or correct for hypothetical bias have been suggested in the literature. In this section I shall look at three approaches: (1) using follow-up certainty questions, (2) using cheap talk and consequential scripts, and (3) using a time-to-think protocol.

With a follow-up certainty question, respondents are asked to rate how certain they are that they would actually pay. Usually a ten-point Likert scale is used (10 = very certain and 1 = very uncertain), and only sufficiently certain responses are then treated as yes-responses when estimating WTP.7 A series of studies have shown that using only certain responses results in a hypothetical WTP that is insignificantly different from actual WTP (see, e.g., Champ et al. 1997; Champ and Bishop 2001; Vossler et al. 2003; Blumenschein et al. 2008). While this method indeed seems to reduce hypothetical bias, the major problem is that there does not seem to be any consistency in the threshold: sometimes a threshold of eight works best, and sometimes a threshold of ten. It is not evident how certainty follow-ups can be implemented in a choice experiment, and few studies have used this approach. Lundhede et al. (2009) designed two experiments where respondents were asked to assess how certain they were of their response after each choice set. However, using various recoding approaches, they did not find any significant or consistent effects on WTP.
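The recoding step behind this approach is mechanical: a hypothetical "yes" is kept only if the stated certainty meets a chosen threshold, and the certainty-adjusted acceptance rates are then used for estimation. A minimal sketch with made-up data (the function names, the data, and the threshold of eight are illustrative, not from any of the cited studies):

```python
# Illustrative sketch of certainty-based recoding of dichotomous-choice
# responses. All data below are hypothetical.

def recode_certain(responses, threshold=8):
    """Keep a hypothetical 'yes' as 'yes' only if the certainty rating
    (1-10 Likert scale) meets the threshold; otherwise recode to 'no'."""
    return [
        (bid, said_yes and certainty >= threshold)
        for bid, said_yes, certainty in responses
    ]

def acceptance_rate(responses, bid):
    """Share of yes-responses at a given bid level."""
    votes = [yes for b, yes in responses if b == bid]
    return sum(votes) / len(votes)

# (bid, said_yes, certainty) -- hypothetical survey responses
raw = [(10, True, 9), (10, True, 5), (10, False, 10),
       (20, True, 8), (20, True, 3), (20, False, 7)]

recoded = recode_certain(raw, threshold=8)
print(acceptance_rate([(b, y) for b, y, _ in raw], 10))  # raw yes-share at bid 10
print(acceptance_rate(recoded, 10))                      # certainty-adjusted share
```

The adjusted acceptance rates at each bid can then be fed into a standard dichotomous-choice WTP estimator; as noted above, the awkward part is that the best-performing threshold varies from study to study.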

Cheap talk scripts were initially suggested by Cummings and Taylor 1999; they are an attempt to reduce hypothetical bias by explicitly describing and discussing the propensity of respondents to exaggerate stated WTP. The success of cheap talk scripts has varied. With private goods, classroom experiments, or closely controlled field settings, the use of cheap talk has proven potentially successful (Cummings and Taylor 1999; List 2001). Similarly, short cheap talk scripts have been effective in reducing marginal WTP in choice experiments (Carlsson, Frykblom, and Lagerkvist 2005). Mixed results have been found when incorporating a public good with private good attributes (Aadland and Caplan 2003, 2006); one possible explanation for the difference is that the length and structure of the cheap talk script matter. A somewhat different approach, which can be called a consequential script, has (p. 207) been suggested by Bulte et al. (2005). With this script the respondents are told explicitly that the results of the study could have an actual effect and that they should consider this when answering. This type of script seems to have a similar effect on WTP as a cheap talk script. However, whether the effect of this script is desirable or not would partly depend on the incentive properties of the valuation question. Furthermore, this script is perhaps less likely to reduce effects such as the purchase of moral satisfaction and warm glow.

Finally, giving respondents time to consider their responses has been shown in a number of studies to result in more consistent responses and lower WTP (Whittington et al. 1992; Cook et al. 2007). There are two potential effects of giving respondents time to think: respondents may consider the budget constraint more carefully and become less narrowly focused on the issue at hand, and the influence of the interviewer is reduced. Both effects are likely to reduce WTP, and hence the hypothetical bias.

# 6 Discussion

This chapter has provided an overview of stated preference methods for food demand analysis, with a focus on the choice experiment method. To conclude, I provide a brief discussion of some issues for ongoing and future research.

I shall actually start where I ended. Hypothetical bias remains one of the most important problems with stated preference methods, although one sometimes gets the sense that people not working with stated preference methods exaggerate the problem. There is some recent work on implementing methods to reduce hypothetical bias. One suggestion is to use a so-called oath script. In an incentive-compatible second-price auction, Jacquemet et al. (2009) asked bidders to swear on their honor to give honest answers prior to participating. They found that in treatments with an oath script, responses were more sincere than in treatments without one. Another recent suggestion is to use a third-person perception approach, or inferred valuation. With this approach, subjects are asked what they believe an average person would do (Carlsson, Daruvala, and Jaldell 2008; Lusk and Norwood 2009a, b). Lusk and Norwood (2009a) found that predictions of others' voting were similar to actual voting behavior. Lusk and Norwood (2009b) found that for goods with high normative consequences, own stated WTP was higher than the predicted WTP of others. Carlsson, Daruvala, and Jaldell (2008) found that subjects stated a lower WTP when asked about others' behavior than when asked about their own. For these two approaches, more empirical evidence is clearly needed, and it remains to be seen whether they can be implemented in a traditional stated preference survey.

Another area for future research is the role of context dependence, which we have also discussed in this chapter. It is clear that context matters in the survey situation, but context also matters in other situations, such as contributions to public goods and donations (see, e.g., Landry et al. 2006; (p. 208) Alpizar, Carlsson, and Johansson-Stenman 2008a). One interesting question is whether context is more or less important in hypothetical settings (see, e.g., Hanemann 1994; Bertrand and Mullainathan 2001; Alpizar, Carlsson, and Johansson-Stenman 2008b). Furthermore, the analytical tools of stated preference methods are highly suitable for analyzing and incorporating context dependence (Swait et al. 2002). An excellent example of such a study is Hu, Adamowicz, and Veeman (2006), in which the effect of labels and reference points on food attribute demand is studied. However, more empirical studies are clearly needed.

Finally, surprisingly few stated preference studies consider the social context. Within the area of food economics, the social context is likely to be very important in many circumstances, in particular since many decisions are made within the household, where the decision maker is not one single individual. Stated preference methods are actually highly suitable for analyzing household decisions and relating household decisions to individual preferences since it is possible to conduct surveys at both the household level and the individual level.

## References

Aadland, D., and A. Caplan. 2003. “Willingness to Pay for Curbside Recycling with Detection and Mitigation of Hypothetical Bias.” American Journal Agricultural Economics, 85/2: 492–502.Find this resource:

———.2006. “Cheap Talk Revisited: New Evidence from CVM.” Journal Economic Behavior and Organization 60/4: 562–78.Find this resource:

Adamowicz, W., P. Boxall, M. Williams, and J. Louviere. 1998. “Stated Preferences Approaches to Measuring Passive Use Values.” American Journal of Agricultural Economics 80/1: 64–75.Find this resource:

Alpizar, F., F. Carlsson, and O. Johansson-Stenman. 2008a. “Anonymity, Reciprocity, and Conformity: Evidence from Voluntary Contributions to a National Park in Costa Rica.” Journal of Public Economics 92/5–6: 1047–60.Find this resource:

Alpizar, F., F. Carlsson, and O. Johansson-Stenman. 2008b. “***Does Context Matter More for Hypothetical than for Actual Contributions? Evidence from a Natural Field Experiment.” Experimental Economics 11/3: 299–314.Find this resource:

———, and P. Martinsson. 2003. “Using Choice Experiments for Non-Market Valuation.” Economic Issues 8/1: 83–110.Find this resource:

Andreoni, J. 1989. “Giving with Impure Altruism: Applications to Charity and Ricardian Equivalence.” Journal of Political Economy 97/6: 1447–58.Find this resource:

Antle, J. M. 1999. “The New Economics for Agriculture.” American Journal of Agricultural Economics 81/5: 993–1010.Find this resource:

Ariely, D., G. Loewenstein, and D. Prelec. 2008. “Coherent Arbitrariness”: Stable Demand Curves without Stable Preferences. Working Paper. Cambridge, MA: Massachusetts Institute of Technology.Find this resource:

Arora, N., and G. Allenby. 1999. “Measuring the Influence of Individual Preference Structures in Group Decision Making.” Journal of Marketing Research 36/4: 476–87.Find this resource:

Bardsley, N. 2008. “Dictator Game Giving: Altruism or Artifact?” Experimental Economics 11/2: 122–33.Find this resource:

(p. 209) Bateman, I., and A. Munro. 2009. “Household versus Individual Valuation: What's the Difference?” Environmental and Resource Economics 43/1: 119–35.Find this resource:

——and K. Willis. 1999. Valuing Environmental Preferences. Oxford: Oxford University Press.Find this resource:

——, D. Burgess, G. Hutchinson, and D. Matthews. 2008. “Learning Design Contingent Valuation (LDCV): NOAA Guidelines, Preference Learning and Coherent Arbitrariness.” Journal of Environmental Economics and Management 55/2: 127–41.Find this resource:

Beharry-Borg, N., D. Hensher, and R. Scarpa. 2009. “An Analytical Framework for Joint vs Separate Decisions by Couples in Choice Experiments: The Case of Coastal Water Quality in Tobago.” Environmental and Resource Economics 45/1: 95–117.Find this resource:

Benjamin, J., and L. Sen. 1982. “Comparison of the Predictive Ability of Four Multiattribute Approaches to Attitudinal Measurement.” Transportation Research Record 890: 1–6.Find this resource:

Bertrand, M., and S. Mullainathan. 2001. “Do People Mean What They Say? Implications for Subjective Survey Data.” American Economic Review 91/2: 67–72.Find this resource:

Bishop, R., and T. Heberlein. 1979. “Measuring Values of Extra-Market Goods: Are Indirect Measures of Value Biased?” American Journal Agricultural Economics 61/5: 926–30.Find this resource:

Blamey, R., J. Bennett, J. Louviere, M. Morrison, and J. Rolfe. 2000. “A Test of Policy Labels in Environmental Choice Modeling Studies.” Ecological Economics 32/2: 269–86.Find this resource:

Blumenschein, K., G. Blomquist, M. Johannesson, N. Horn, and P. Freeman. 2008. “Eliciting Willingness to Pay without Bias: Evidence from a Field Experiment.” Economic Journal 118/525: 114–37.Find this resource:

Bradley, M. 1988. “Realism and Adaptation in Designing Hypothetical Travel Choice Concepts.” Journal of Transport Economics and Policy 22/1: 121–37.Find this resource:

Brown, T., P. Champ, R. Bishop, and D. McCollum. 1996. “Which Response Format Reveals the Truth about Donations to a Public Good?” Land Economics 72/2: 152–66.Find this resource:

Bryan, S., L. Gold, R. Sheldon, and M. Buxton. 2000. “Preference Measurement Using Conjoint Methods: An Empirical Investigation of Reliability.” Health Economics 9/5: 385–95.Find this resource:

Bulte, E., S. Gerking, J. List, and A. de Zeeuw. 2005. “The Effect of Varying the Causes of Environmental Problems on Stated Values: Evidence from a Field Study.” Journal of Environmental Economics and Management 49/2: 330–42.Find this resource:

Bunch, D., J. Louviere, and D. Andersson. 1996. A Comparison of Experimental Design Strategies for Choice-Based Conjoint Analysis with Generic-Attribute Multinomial Logit Models. Working Paper. Davis: Graduate School of Management, University of California, Davis.Find this resource:

Cameron, T. 1988. “A New Paradigm for Valuing Non-Market Goods using Referendum Data: Maximum Likelihood Estimation by Censored Logistic Regression.” Journal of Environmental Economics and Management 15/3: 355–79.Find this resource:

Campbell, D., G. Hutchinson, and R. Scarpa. 2008. “Incorporating Discontinuous Preferences into the Analysis of Discrete Choice Experiments.” Environmental and Resource Economics 41/3: 101–17.Find this resource:

Canavari, M., G. Nocella, and R. Scarpa. 2005. “Stated Willingness-to-Pay for Organic Fruit and Pesticide Ban: An Evaluation Using Both Web-Based and Face-to-Face Interviewing.” Journal of Food Products Marketing 11/3: 107–34.Find this resource:

Carlsson, F., and P. Martinsson. 2001. “Do Hypothetical and Actual Marginal Willingness to Pay Differ in Choice Experiments? Application to the Valuation of the Environment.” Journal of Environmental Economics and Management 41/2: 179–92.Find this resource:

(p. 210) Carlsson, F., and P. Martinsson. 2003. “Design Techniques for Stated Preference Methods in Health Economics.” Health Economics 12/4: 281–94.Find this resource:

———. 2008. “How Much Is Too Much? An Investigation of the Effect of the Number of Choice Sets, Starting Point and the Choice of Bid Vectors in Choice Experiments.” Environmental and Resource Economics 40/2: 165–76.Find this resource:

——, D. Daruvala, and H. Jaldell. 2008. Do You Do What You Say or Do You Do What You Say Others Do? Working Papers in Economics No. 309. Gothenburg: Department of Economics, University of Gothenburg.Find this resource:

——, P. Frykblom, and C. J. Lagerkvist. 2005. “Using Cheap-Talk as a Test of Validity in Choice Experiments.” Economics Letters 89/2: 147–52.Find this resource:

Carlsson, F., P. Frykblom, and C. J. Lagerkvist. 2007a. “***Consumer Benefits of Labels and Bans on GM Foods: Choice Experiments with Swedish Consumers.” American Journal of Agricultural Economics 89/1: 152–61.Find this resource:

Carlsson, F., P. Frykblom, and C. J. Lagerkvist. 2007b. “***Consumer Willingness to Pay for Farm Animal Welfare: Transportation of Farm Animals to Slaughter versus the Use of Mobile Abattoirs.” European Review of Agricultural Economics 34/3: 321–44.Find this resource:

Carson, R., and T. Groves. 2007. “Incentive and Informational Properties of Preference Questions.” Environmental and Resource Economics 37/1: 181–210.Find this resource:

——, N. Flores, K. Martin, and J. Wright. 1996. “Contingent Valuation and Revealed Preference Methodologies: Comparing the Estimates for Quasi-Public Goods.” Land Economics 72/1: 80–99.Find this resource:

——, R. Groves, and M. Machina. 1999. “Incentive and Informational Properties of Preference Questions.” Paper presented at the European Association of Environmental and Resource Economists Ninth Annual Conference, Oslo.Find this resource:

Champ, P., and R. Bishop. 2001. “Donation Payment Mechanisms and Contingent Valuation: An Empirical Study of Hypothetical Bias.” Environmental and Resource Economics 19/4: 383–402.Find this resource:

——, R. Bishop, T. Brown, and D. McCollum. 1997. “Using Donation Mechanisms to Value Non-use Benefits from Public Goods.” Journal of Environmental Economics and Management 33/2: 151–62.Find this resource:

Chang, J. B., J. Lusk, and F. B. Norwood. 2009. “How Closely Do Hypothetical Surveys and Laboratory Experiments Predict Field Behavior?” American Journal of Agricultural Economics 91/2: 518–34.Find this resource:

Cook, J., D. Whittington, D. Canh, F. R. Johnson, and A. Nyamete. 2007. “Reliability of Stated Preferences for Cholera and Typhoid Vaccines with Time to Think in Hue, Vietnam.” Economic Inquiry 45/1: 100–14.Find this resource:

Cummings, R., and L. Taylor. 1999. “Unbiased Value Estimates for Environmental Goods: A Cheap Talk Design for the Contingent Valuation Method.” American Economic Review 89/3: 649–65.Find this resource:

——, G. Harrison, and E. Rutström. 1995. “Home-Grown Values and Hypothetical Surveys: Is the Dichotomous Choice Approach Incentive Compatible?” American Economic Review 85/1: 260–6.Find this resource:

DeShazo, J. R., and G. Fermo. 2002. “Designing Choice Sets for Stated Preference Methods: The Effects of Complexity on Choice Consistency.” Journal of Environmental Economics and Management 44/1: 123–43.Find this resource:

(p. 211) Dosman, D., and W. Adamowicz. 2006. “Combining Stated and Revealed Preference Data to Construct an Empirical Examination of Intra-Household Bargaining.” Review of the Economics of the Household 4/1: 15–34.Find this resource:

Frykblom, P. 1997. “Hypothetical Question Modes and Real Willingness to Pay.” Journal Environmental Economics and Management 34/3: 275–87.Find this resource:

Hamilton, S. F., D. L. Sunding, and D. Zilberman. 2003. “Public Goods and the Value of Product Quality Regulations: The Case of Food Safety.” Journal of Public Economics 87/3–4: 799–817.Find this resource:

Hanemann, M. 1984. “Discrete/Continuous Models of Consumer Demand.” Econometrica 52/3: 541–61.Find this resource:

——1994. “Valuing the Environment through Contingent Valuation.” Journal of Economic Perspectives 8/5: 19–43.Find this resource:

——1999. “Welfare Analysis with Discrete Choice Models.” In J. Herriges and C. Kling, eds, Valuing Recreation and the Environment. Cheltenham: Edward Elgar.Find this resource:

——and B. Kanninen. 1999. “The Statistical Analysis of Discrete-Response CV Data.” In I. Bateman and K. Willies, eds, Valuing Environmental Preferences. Oxford: Oxford University Press.Find this resource:

Hensher, D. 2006. “Revealing Differences in Willingness to Pay Due to Dimensionality of Stated Choice Designs: An Initial Assessment.” Environmental and Resource Economics 34/1: 7–44.Find this resource:

——and W. Greene. 2003. “The Mixed Logit: The State of Practice.” Transportation 30/2: 133–76.Find this resource:

——, J. Rose, and W. Greene. 2005. “The Implications on Willingness to Pay of Respondents Ignoring Specific Attributes.” Transportation 32/3: 203–22.Find this resource:

——, P. Stopher, and J. Louviere. 2001. “An Exploratory Analysis of the Effect of Number of Choice Sets in Designed Choice Experiments: An Airline Choice Application.” Journal of Air Transport Management 7/6: 373–9.Find this resource:

Herriges, J., and J. Shogren. 1996. “Starting Point Bias in Dichotomous Choice Valuation with Follow-Up Questioning.” Journal of Environmental Economics and Management 30/1: 112–31.Find this resource:

Hu, W., W. Adamowicz, and M. Veeman. 2006. “Labeling Context and Reference Point Effects in Models of Food Attribute Demand.” American Journal of Agricultural Economics 88/4: 1034–49.Find this resource:

Huber, J., and K. Zwerina. 1996. “The Importance of Utility Balance in Efficient Choice Designs.” Journal of Marketing Research 33/3: 307–17.Find this resource:

Jacquemet, N., R.-V. Joule, S. Luchini, and J. F. Shogren. 2009. Preference Elicitation under Oath. Working Papers No. 43. Paris: Centre d'Économie de la Sorbonne.Find this resource:

Johansson-Stenman, O., and H. Svedsäter. 2008. “Measuring Hypothetical Bias in Choice Experiments: The Importance of Cognitive Consistency.” B. E. Journal of Economic Analysis and Policy 8, art. 41.Find this resource:

Johnson, R., W. Matthews, and M. Bingham. 2000. “Evaluating Welfare-Theoretic Consistency in Multiple Response Stated-Preference Surveys.” Working Paper T-0003. Durham, NC: Triangle Economic Research.Find this resource:

Kahneman, D., and J. Knetsch. 1992. “Valuing Public Goods: The Purchase of Moral Satisfaction.” Journal of Environmental Economics and Management 22/1: 57–70.Find this resource:

Kanninen, B. 2002. “Optimal Design for Multinomial Choice Experiments.” Journal of Marketing Research 39/2: 214–17.Find this resource:

(p. 212) Kuhfeld, W., R. Tobias, and M. Garratt. 1994. “Efficient Experimental Design with Marketing Research Applications.” Journal of Marketing Research 31/4: 545–57.Find this resource:

Ladenburg, J., and S. Olsen. 2008. “Gender Specific Starting Point Bias in Choice Experiments: Evidence from an Empirical Study.” Journal of Environmental Economics and Management 56/3: 275–85.Find this resource:

Landry, C., A. Lange, J. List, M. Price, and N. Rupp. 2006. “Toward an Understanding of the Economics of Charity: Evidence from a Field Experiment.” Quarterly Journal of Economics 121/2: 747–82.Find this resource:

Layton, D., and G. Brown. 1998. “Application of Stated Preference Methods to a Public Good: Issues for Discussion.” Paper presented at the NOAA Workshop on the Application of Stated Preference Methods to Resource Compensation, Washington, DC, June 1–2.Find this resource:

———. 2000. “Heterogeneous Preferences Regarding Global Climate Change.” Review of Economics and Statistics 82/4: 616–24.Find this resource:

Levitt, S. D., and J. A. List. 2007. “What Do Laboratory Experiments Measuring Social Preferences Reveal about the Real World?” Journal of Economic Perspectives 21/2: 153–74.Find this resource:

Liljenstolpe, C. 2008. “Evaluating Animal Welfare with Choice Experiments: An Application to Swedish Pig Production.” Agribusiness 24/1: 67–84.Find this resource:

List, J. 2001. “Do Explicit Warnings Eliminate the Hypothetical Bias in Elicitation Procedures? Evidence from Field Auction Experiments.” American Economic Review 91/5: 1498–1507.Find this resource:

——2003. “Does Market Experience Eliminate Market Anomalies?” Quarterly Journal of Economics 118/1: 41–72.Find this resource:

——2007. “On the Interpretation of Giving in Dictator Games.” Journal of Political Economy 115/3: 482–93.Find this resource:

——and C. Gallet. 2001. “What Experimental Protocol Influence Disparities between Actual and Hypothetical Stated Values?” Environmental and Resource Economics 20/3: 241–54.Find this resource:

——, R. Berrens, A. Bohara, and J. Kerkvliet. 2004. “Examining the Role of Social Isolation on Stated Preferences.” American Economic Review 94/3: 741–52.Find this resource:

Loureiro, M., and S. Hine. 2002. “Discovering Niche Markets: A Comparison of Consumer Willingness to Pay for Local (Colorado-Grown), Organic, and GMO-Free Products.” Journal of Agricultural and Applied Economics 34/3: 477–87.Find this resource:

Louviere, J. 1988. Analyzing Decision Making: Metric Conjoint Analysis. Newbury Park, CA: Sage.Find this resource:

——2006. “What You Don't Know Might Hurt You: Some Unresolved Issues in the Design and Analysis of Discrete Choice Experiments.” Environmental and Resource Economics 34/ 1: 173–88.Find this resource:

——, D. Hensher, and J. Swait. 2000. Stated Choice Methods: Analysis and Application. Cambridge: Cambridge University Press.Find this resource:

——, T. Islam, N. Wasi, D. Street, and L. Burgess. 2008. “Designing Discrete Choice Experiments: Do Optimal Designs Come at a Price.” Journal of Consumer Research 35 (Aug.), 360–75.Find this resource:

Lundhede, T., S. Olsen, J. Jacobsen, and B. J. Thorsen. 2009. “Handling Respondent Uncertainty in Choice Experiments: Evaluating Recoding Approaches against Explicit Modeling of Uncertainty.” Journal of Choice Modelling 2/2: 118–47.Find this resource:

Lusk, J., and B. Norwood. 2005. “Effect of Experimental Design on Choice-Based Conjoint Valuation Estimates.” American Journal of Agricultural Economics 87/3: 771–85.Find this resource:

———. 2009a. “An Inferred Valuation Method.” Land Economics 85/3: 500–14.Find this resource:

(p. 213) ———. 2009b. “Bridging the Gap between Laboratory Experiments and Naturally Occurring Markets: An Inferred Valuation Method.” Journal of Environmental Economics and Management 58/2: 236–50.Find this resource:

——and T. Schroeder. 2004. “Are Choice Experiments Incentive Compatible? A Test with Quality Differentiated Beef Steaks.” American Journal Agricultural Economics 86/2: 467–82.Find this resource:

——, M. Jamal, L. Kurlander, M. Roucan, and L. Taulman. 2005. “A Meta Analysis of Genetically Modified Food Valuation Studies.” Journal of Agricultural and Resource Economics 30/1: 28–44.Find this resource:

——, J. R. Pruitt, and B. Norwood. 2006. “External Validity of a Framed Field Experiment.” Economics Letters 93/2: 285–90.Find this resource:

——, J. Roosen, and J. A. Fox. 2003. “Demand for Beef from Cattle Administered Growth Hormones or Fed Genetically Modified Corn: A Comparison of Consumers in France, Germany, the United Kingdom, and the United States.” American Journal Agricultural Economics 85/1: 16–29.Find this resource:

McFadden, D. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior.” In P. Zarembka, ed., Frontiers in Econometrics. New York: Academic Press.Find this resource:

——and K. Train. 2000. “Mixed MNL Models for Discrete Response.” Journal of Applied Econometrics 15/5: 447–70.Find this resource:

Mazotta, M., and J. Opaluch. 1995. “Decision Making when Choices Are Complex: A Test of Heiner's Hypothesis.” Land Economics 71/4: 500–15.Find this resource:

Mitchell, R., and R. Carson. 1989. Using Surveys to Value Public Goods: The Contingent Valuation Method. Washington, DC: Resources for the Future.Find this resource:

Morey, E., V. Sharma, and A. Karlström. 2003. “A Simple Method of Incorporating Income Effects into Logit and Nested-Logit Models: Theory and Application.” American Journal of Agricultural Economics 85/1: 248–53.

Plott, C. 1996. “Rational Individual Behavior in Markets and Social Choice Processes: The Discovered Preference Hypothesis.” In K. Arrow, E. Colombatto, M. Perlman, and C. Schmidt, eds, Rational Foundations of Economic Behavior. London: Macmillan.

Revelt, D., and K. Train. 1998. “Mixed Logit with Repeated Choices: Households' Choices of Appliance Efficiency Level.” Review of Economics and Statistics 80/4: 647–57.

Sandor, Z., and M. Wedel. 2001. “Designing Conjoint Choice Experiments Using Managers' Prior Beliefs.” Journal of Marketing Research 38/4: 430–44.

Scarpa, R., M. Thiene, and D. Hensher. 2010. “Monitoring Choice Task Attribute Attendance in Non-Market Valuation of Multiple Park Management Services: Does it Matter?” Land Economics 86/4: 817–39.

———, and K. Train. 2008. “Utility in WTP Space: A Tool to Address Confounding Random Scale Effects in Destination Choice to the Alps.” American Journal of Agricultural Economics 90/5: 994–1010.

Seip, K., and J. Strand. 1992. “Willingness to Pay for Environmental Goods in Norway: A Contingent Valuation Study with Real Payments.” Environmental and Resource Economics 2/1: 91–106.

Shogren, J. F., J. A. Fox, D. J. Hayes, and J. Roosen. 1999. “Observed Choices for Food Safety in Retail, Survey, and Auction Markets.” American Journal of Agricultural Economics 81/5: 1192–9.

Small, K., and H. S. Rosen. 1981. “Applied Welfare Economics with Discrete Choice Models.” Econometrica 49/1: 105–30.

Soetevent, A. R. 2005. “Anonymity in Giving in a Natural Context: An Economic Field Experiment in Thirty Churches.” Journal of Public Economics 89/11–12: 2301–23.

Swait, J., and W. Adamowicz. 1996. The Effect of Choice Environment and Task Demands on Consumer Behavior: Discriminating between Contribution and Confusion. Working Paper. Edmonton: Department of Rural Economy, University of Alberta.

———, M. Hanemann, A. Diederich, J. Krosnick, D. Layton, W. Provencher, D. Schkade, and R. Tourangeau. 2002. “Context Dependence and Aggregation in Disaggregate Choice Analysis.” Marketing Letters 13/3: 195–205.

Train, K. 2003. Discrete Choice Methods with Simulation. New York: Cambridge University Press.

—— and M. Weeks. 2005. “Discrete Choice Models in Preference Space and Willingness-to-Pay Space.” In A. Alberini and R. Scarpa, eds, Applications of Simulation Methods in Environmental Resource Economics. Dordrecht: Springer.

von Haefen, R. 2003. “Incorporating Observed Choices into the Construction of Welfare Measures from Random Utility Models.” Journal of Environmental Economics and Management 45/2: 145–64.

Vossler, C., R. Ethier, G. L. Poe, and M. Welsh. 2003. “Payment Certainty in Discrete Choice Contingent Valuation Responses: Results from a Field Validity Test.” Southern Economic Journal 69/4: 886–902.

Wardman, M. 1988. “A Comparison of Revealed and Stated Preference Models of Travel Behaviour.” Journal of Transport Economics and Policy 22/1: 71–91.

Whittington, D., V. K. Smith, A. Okorafor, A. Okore, J. L. Liu, and A. McPhail. 1992. “Giving Respondents Time to Think in Contingent Valuation Studies: A Developing Country Application.” Journal of Environmental Economics and Management 22/3: 205–25.

Zwerina, K., J. Huber, and W. Kuhfeld. 1996. A General Method for Constructing Efficient Choice Designs. Working Paper. Durham, NC: Fuqua School of Business, Duke University.

## Notes:

(1) In a generic experiment, the alternatives are simply labeled A, B, C or 1, 2, 3. In an alternative-specific experiment, the alternatives could be brand names, shops, or different national parks.

(2) If we want to estimate a model with a lognormally distributed coefficient and we expect the coefficient to be negative, we must estimate the model using the negative of the corresponding variable. The reason is that the lognormal distribution by definition forces the coefficient to be positive.
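A minimal NumPy sketch of the point (the distribution parameters and price values are invented for illustration): lognormal draws are strictly positive, so a coefficient expected to be negative must multiply the negated variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# A lognormal coefficient is exp(normal), hence strictly positive by construction.
beta = np.exp(rng.normal(loc=-1.0, scale=0.5, size=10_000))

# Expecting a negative price effect, we enter the NEGATED price in the model:
# beta * (-price) is equivalent to (-beta) * price, so the implied effect of
# price on utility is negative for every draw of the random coefficient.
price = np.array([1.0, 2.5, 4.0])
utility_contribution = np.outer(beta, -price)

assert (beta > 0).all()
assert (utility_contribution < 0).all()
```

The same logic is why mixed logit software typically asks for the sign-flipped variable when a lognormal distribution is specified for a cost or price coefficient.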

(3) This is similar to the approach in contingent valuation where the WTP distribution is analyzed directly (Cameron 1988), but extended to multinomial choices and random coefficients.

(4) Note that in many choice experiments, the socioeconomic characteristics are interacted with the alternative-specific constants. In that case they will not affect the marginal WTP.
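A one-line illustration of why (the notation here is mine, not the chapter's): if a socioeconomic characteristic s_i enters utility only through an interaction with the alternative-specific constant, the marginal WTP for attribute k is a ratio of coefficients from which that term drops out:

```latex
U_{ij} = \alpha_j + \gamma_j s_i + \beta_k x_{jk} + \beta_p p_j + \varepsilon_{ij},
\qquad
\mathrm{MWTP}_k
  = -\frac{\partial U_{ij}/\partial x_{jk}}{\partial U_{ij}/\partial p_j}
  = -\frac{\beta_k}{\beta_p}.
```

Only interactions with the attribute x_{jk} or with the price variable itself would shift the marginal WTP.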

(5) We know that, in reality, the bid a respondent faces is not necessarily equal to the cost, and we cannot be sure that the respondent believes the bid equals the actual cost even if we say so.

(6) What if, for example, the respondent does not interpret the proposed bid as the actual cost he or she would face? The respondent would then make a similar evaluation as before, but based on an expectation about the cost. Consequently, not even the binary question format would be incentive-compatible in this case, at least not for all respondents.

(7) In some early applications the certainty information was used to recode uncertain yes responses as no responses. By definition this means that the estimated mean WTP will be lower than with no recoding.
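A toy illustration of why the estimate falls by definition (the responses, bid, and certainty cutoff below are invented for the example): recoding can only turn yes into no, never the reverse, so the lower-bound mean WTP weakly decreases.

```python
import numpy as np

# Invented single-bid data: each respondent saw a bid of 10 and answered
# yes/no; yes-sayers also gave a 0-10 certainty score (0 here for no-sayers).
yes = np.array([1, 1, 1, 0, 1, 0, 1, 0, 1, 1], dtype=bool)
certainty = np.array([9, 4, 8, 0, 10, 0, 3, 0, 7, 6])
bid = 10.0

# Lower-bound mean WTP: the bid times the share of yes responses.
wtp_raw = bid * yes.sum() / yes.size              # 10 * 7/10 = 7.0

# Recode: yes responses with certainty below 7 become no responses.
yes_recoded = yes & (certainty >= 7)
wtp_recoded = bid * yes_recoded.sum() / yes.size  # 10 * 4/10 = 4.0

# Recoding removes yes responses but never adds any, so the estimate
# can only fall or stay the same.
assert wtp_recoded <= wtp_raw
```

Any certainty cutoff gives the same one-directional result; only the size of the drop depends on where the cutoff is set.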