
Experimental Methods, Agency Incentives, and the Study of Bureaucratic Behavior

Abstract and Keywords

This article explores the key attributes of applying experimental methods to the study of bureaucracy. It draws on experimental research on incentives, structure, and other fundamental questions about bureaucracy. Next, it addresses two of the most prominent criticisms of experimental research on American bureaucracies: they lack external validity, and they cannot create laboratory environments that replicate organizational settings. While both pose knotty problems for any experimental research, each has special wrinkles in the case of experimental research on bureaucracy at both the individual and organizational levels. It is also important to note that field experiments suffer from a type of effect that is addressed most concretely in the case of laboratory experiments: the experimenter effect. The experimental approach applied to surveys may be less useful in the case of surveying real bureaucrats. The article finally covers several promising future possibilities for experimental research on American bureaucracy.

Keywords: American bureaucracy, agency incentives, experimental research, field experiments, laboratory experiments

Do incentives motivate bureaucrats? The belief that bureaucrats (like most people) respond to incentives is central to a number of important theories that frame the evolution of how we think about public agencies in general and American bureaucracy in particular. In 1912, Frederick Taylor created a view of “scientific management” that was based on the belief that “piecework” was the primary means for ensuring efficiency and effectiveness in organizations. That same belief was incorporated in what has become the “new scientific management”—the theory that principals control agents by relying on mechanisms that get agents to reveal their capabilities honestly, or at least to act in the principal's interest when they act without the principal's direct supervision. When Barry Weingast and Mark Moran used principal–agent theory to study how politicians (e.g., members of Congress) (p. 787) oversee public agencies (“the bureaucracy”), they argued that politicians empower congressional committees, which “possess sufficient rewards” to use “sanctions to create an incentive system for agencies” (1983, 768).

This trajectory in the evolution of our thought about public agencies in a democracy—from Taylor to Weingast and Moran—has come to permeate other literatures by again emphasizing the role of incentives. Reforms in public administration around the world—such as the new public management (NPM) movement discussed both positively and negatively throughout this handbook—embrace its precepts. For example, NPM proponents see “pay‐for‐performance” as a central mechanism for improving the performance of the public sector and changing how government “does business” (see Kaboolian 1998).

While clearly this position has become important to the study of bureaucracy, a significant and opposing perspective has emerged in recent years (see especially Frederickson and Stazyk, and Workman, Jones, and Jochim in this volume). Numerous scholars today challenge the “incentives” perspective and offer evidence that bureaucrats see themselves more as conveyors of public power for the purpose of improving society and solving public problems. Such bureaucrats, likely motivated by an intrinsic interest in their work (a “public service motivation”), may act in fundamentally different ways from those predicted by incentive‐based theories when they act on behalf of the state, wielding its power and resources, without the active and direct oversight of politicians (Perry, Mesch, and Paarlberg 2006).

Public service motivation (henceforth, PSM) provides a unique and compelling counterpoint to the long‐term development of theoretical and empirical perspectives on the use of incentives to solve principal–agent problems (e.g., Perry 1996; Rainey 1983). For example, subsequent studies have sought empirically to see whether financial incentives actually diminish the effectiveness of internal motivators such as PSM (Deci and Ryan 2000; Frey 1997; Lee and Whitford 2008; Miller and Whitford 2002; Weibel, Rost, and Osterloh 2010). Readers also will note how this debate has been marbled throughout various chapters in this book. These include discussions of trends in human resources management (Riccucci), contracting and competitive sourcing (Johnston and Romzek), decision making in public organizations (Workman, Jones, and Jochim), delegation of authority and applications of principal–agent theory (Krause; Wood; Zegart), and critiques of technocratic models of bureaucracy (Hummel and Stivers; Maynard‐Moody and Portillo; Schachter).

In some ways, the question of incentives has come to be a central divide in competing literatures on public agencies: one side rooted in political science and economics, the other rooted in public administration and management. Despite its centrality and the controversy around it, however, the question has proven difficult to resolve by pursuing what students of American bureaucracy tend to do best: empirical research in the field. This is the case because it is difficult or impossible in many organizational settings to obtain observational data on individual effort, which rules out estimating how responsive individual effort is to compensation.

As Beth Asch writes, as a rule, “data are unavailable on worker output and company incentive plans [and] worker output is imperfectly measured” (1990, (p. 788) 89S). Researchers certainly would prefer to observe individual productivity figures that serve as a reasonable proxy for individual effort. However, the effect of individual effort often is hopelessly confounded with organizational or environmental variables that jointly determine output. Thus, as James Perry, Debra Mesch, and Laurie Paarlberg recently put the dilemma: “We know more about human performance than we have historically recognized…[but] we must understand that intervening to change human performance will always be an uncertain and indeterminate process” (2006, 508).

This observation is, of course, a recognition of a longstanding problem in organizational research, one dating to the famous Hawthorne Studies of the 1930s. Researchers in that instance found that group performance norms could swamp the effects of the formal reward‐and‐punishment scheme (Roethlisberger and Dickson 1939). Likewise, in his study of piece rates, William Whyte (1955) found that peer pressure to protect jobs completely undermined the efficacy of the piece‐rate mechanism as a solution to the problem of subordinate response to the direction of supervisors. Nor are the effects of incentives any clearer at the top of agencies. Studies of congressional oversight, for example, have shown that bureaucratic behavior is in part responsive to the changing ideologies of Congress (Weingast and Moran 1983) or the president (Moe 1987). However, the evidence is always less than satisfactory: bureaucrats may be responding to the same factors that move the preferences of members of Congress or the president—new information, changing public opinion, new laws, technological change, or the winds of political change that sweep some people into office and others out of office. No study has yet demonstrated the instrumentality of incentives in guiding the responsiveness of upper‐level bureaucrats to political officials (also see Durant and Resh in this volume on presidentializing the bureaucracy).

In the face of these difficulties, our arguments are threefold and apply to a variety of enduring questions facing researchers of American bureaucracy. First, we argue that tensions in the literature such as those we have discussed in regard to incentives illustrate how little we actually know about individual behavior in public organizations. Second, we argue that students of political science, public administration, and public management could turn to laboratory experiments for providing answers about public bureaucracy that are largely unavailable through nonexperimental empirical research. Third, we argue that even though conventional aspects of the bureaucratic process (e.g., decision making, cooperation, deliberation, and response to incentives) are studied in the laboratory with relative ease, scholars of public bureaucracy are nowhere near their colleagues in the field of organizational studies in reaping the potential of experimental designs. Consequently, a robust and important research agenda awaits scholars willing and suitably skilled to incorporate experimental designs into their work. The difficulty in pursuing such an agenda lies less in inherent problems with the experimental method itself than in the present state of theorizing about bureaucracy.

We support these arguments, first, by examining key attributes of, and considerations involved in, applying experimental methods to the study of bureaucracy. We (p. 789) inform that discussion with examples of how we have improved our knowledge base by engaging in experimental research on incentives, structure, and other fundamental questions about bureaucracy. We next identify and address two of the most prominent criticisms of experimental research on American bureaucracies: (1) they lack external validity, and (2) they cannot create laboratory environments that replicate organizational settings. While both pose knotty problems for any experimental research, each has special wrinkles in the case of experimental research on bureaucracy at both the individual and organizational levels. Our main assessment, however, is that these concerns are real but not insurmountable and that greater use of experimental designs is worthwhile in light of the considerable contributions they can make to theory building and practice. Accordingly, we conclude the chapter by discussing several promising future possibilities for experimental research on American bureaucracy, a discussion that is not difficult given how little is done these days in this field of study.

The Problem of Causal Inference and the Solution of Experimental Design

Understanding causality (i.e., did “X” cause “Y,” controlling for other factors) is important not just because it is “scientific” but because organizational researchers recognize that it leads to “practical, relevant knowledge” (Bryman 1989, 71). Yet causal inference is one of the fundamental problems in the social sciences (e.g., Holland 1986). Statisticians often claim that it is difficult (if not impossible) to use their tools to address causation (e.g., Barnard 1982). In practice, the statistical techniques that can be used to discuss causation are different from techniques that we use to make inferences about associations, although social science researchers are not always careful about the distinction (Holland 1986, 945). In recent decades, statisticians and social science methodologists have sought to build a research platform that can be used to assess causation in observational data (e.g., Rubin 1974). These techniques are now standard in studies that seek to identify “what causes what” in policy evaluation (e.g., King et al. 2007).

It has proven much more difficult to make causal inferences from naturally occurring data. For example, consider the subject of incentives in bureaucracy and imagine that the U.S. Navy institutes a bonus system for recruiters in district X but not for recruiters in district Y. Suppose we observe that productivity (in the form of new recruits) immediately goes up in district X but not in district Y. It is tempting to conclude that the bonus caused the increase. In practice, however, this degree of research control is nearly impossible to obtain: it would take an act of Congress to conduct such a project, and we know that Congress and the president rarely randomize when imposing policy interventions (e.g., Cawley and Whitford 2007).

(p. 790) However, making inferences from the data is risky even in such an ideal research environment and especially so when the designer has such strong incentives. Were the two districts different prior to the research in terms of morale, turnover, management style, and/or productivity? These variables could contribute to the measured effect, either in isolation or in interaction with the treatment variable. What other things changed in the district at the same time as the bonus system was introduced? Did top management place a new district manager in district X? Was turnover the same at the two districts? Was one type of recruiter more likely to leave? Thus, even when the evidence seems clear that incentives are strongly correlated with effort or performance, some third variable may cause a spurious correlation.

In classical experimental design, the goal is to rule out certain causal hypotheses and, ultimately, to support other causal inferences. Generally, there are two broad schools of experimental research on bureaucracy: behavioral and political‐economic. The former dates from the classic Hawthorne experiments and relies on insights and concepts from organizational behavior, social psychology, and sociology. In this approach, theory tends to be less formal, and concern with external validity often results in more lifelike experimental settings. Research in this tradition has improved our understanding of decision making, group deliberation, and how those subjects impact policymaking. In the political‐economic tradition, research has focused on the effects of (and limitations on) incentives in a variety of social settings.

In several important ways, the study of public bureaucracy with the use of experiments benefits from having access to both schools of research. Some questions—such as those centering on coordination and incentives—may benefit from the political‐economic approach since it brings formal rigor and clarity about the mechanisms in play. In contrast, other questions—such as those centering on decision making in structured environments—probably benefit from access to techniques drawn from organizational behavior. But regardless of the approach taken, ensuring experimental control and assuring random assignment are critical considerations in any application of experimental approaches to the study of bureaucracy.

Experimental Control

Faced with the difficulty of making causal inferences from naturally occurring data, social scientists from a variety of disciplines have designed experiments that eliminate some causal hypotheses in favor of others. The key difference here is that experimental data are created by scientists under controlled conditions for the purpose of making causal inferences, while observational data are the result of uncontrolled processes (Friedman and Sunder 1994, 3). An effective experimental design begins by specifying an environment that controls for variation in some variables.

Experiments can take place in the laboratory or the field, just as observational data can come from the field or the laboratory (as when Oersted unintentionally discovered electromagnetism) (e.g., Kipnis 2005). Laboratory experiments achieve the greatest (p. 791) possible control of variables that may have an effect on the outcome. When studying the effect of incentives on individual motivation in a laboratory, the nature of the task will be fixed and identical across all treatments. This is not the case, however, in the hypothetical Navy recruiting project discussed above. In that case, the difficulty of the task depended on the willingness of different recruiting prospects, the socioeconomic backgrounds at different Navy recruiting stations, and the degree to which supervisors emphasized high‐quality versus low‐quality recruits (Asch 1990).

The primary source of experimental control is found in the scientist's ability to choose whether specific variables are constants (conditions that are the same for each person) or treatments (which the scientist changes to see whether variation produces an expected result). Control by setting a constant limits the range of responses that can occur when subjects encounter potential causes of their behavior. Control by setting a treatment condition allows the scientist to assess the relationship between a proposed cause and the observed responses of the subject after they encounter the intervention. Generally, the clearest view of the impact of those proposed causes comes when treatment variables vary independently (Friedman and Sunder 1994, 20). But, again, even in that case, the researcher is able to choose how and when subjects encounter the proposed causes individually or in combination.
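
To make the distinction concrete, here is a minimal sketch (in Python; the task, pay schemes, and monitoring conditions are hypothetical illustrations, not drawn from any study cited here) of an experimental design in which one condition is held constant and two treatment variables are crossed so that they vary independently.

```python
# A minimal sketch of control by constants versus treatments; the task, pay
# schemes, and monitoring conditions are all hypothetical illustrations.
from itertools import product

# Constants: conditions held identical for every subject in every session.
CONSTANTS = {"task": "sort simulated case files", "session_minutes": 60}

# Treatments: conditions the experimenter varies to test proposed causes.
pay_scheme = ["flat_wage", "piece_rate"]
monitoring = ["supervised", "unsupervised"]

# A full factorial crossing lets the two treatment variables vary independently,
# so their separate and joint effects can be distinguished.
design = [dict(CONSTANTS, pay=p, monitor=m) for p, m in product(pay_scheme, monitoring)]
for cell in design:
    print(cell)
```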

Random Assignment

For those variables that cannot be controlled—or, even more importantly, cannot be observed—randomization is used to rule out other possible sources of variation. This is critical to an effective experimental design. In our Navy example, for instance, some subjects may well care more than others about financial rewards. The experimenter cannot set the level of intrinsic motivation, interest, or attention. However, the random assignment of subjects across treatments makes it possible to rule out such systematic differences as a cause of observed treatment effects if the uncontrolled aspects are independent of the treatments.
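
The logic is easy to simulate. In the sketch below, each simulated subject has an unobserved level of motivation that the experimenter can neither set nor measure; the values and sample size are assumptions chosen only for illustration.

```python
# A minimal simulation of why random assignment works; the trait distribution and
# sample size are assumed. Motivation is unobserved by the experimenter.
import random

random.seed(42)
n = 10_000
subjects = [{"motivation": random.gauss(0, 1)} for _ in range(n)]

# Random assignment: each subject is equally likely to receive the treatment.
for s in subjects:
    s["treated"] = random.random() < 0.5

def mean(values):
    return sum(values) / len(values)

treated = [s["motivation"] for s in subjects if s["treated"]]
control = [s["motivation"] for s in subjects if not s["treated"]]

# With enough subjects, average (unobserved) motivation is nearly identical across
# the two groups, so it cannot explain a difference in their outcomes.
print(round(mean(treated), 3), round(mean(control), 3))
```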

An example of the importance of random assignment is Thomas Dee and Benjamin Keys's (2004) research on incentives for teachers. As the authors point out, one of the controversies surrounding the use of incentives for bureaucrats is whether it is possible to link incentives to any measurable output that one would want to encourage. It is possible to reward student accomplishments as measured on tests, but those accomplishments may result from the cooperative efforts of teachers, administrators, and family. Consequently, rewarding the teacher may be capricious and imprecise.

Some research has indicated that it is possible to use merit pay to encourage gains in test scores. Elchanan Cohn and Sandra Teel (1992) indicate that students learn more when they take classes from teachers who are rewarded with merit pay than when they are taught by teachers who are not. However, the internal validity (the validity of the causal inference) of this research is in question. As Dee and Keys point out, the Cohn and Teel results “are also consistent with the plausible hypothesis that (p. 792) the teachers who receive merit pay tend to select schools and classes whose students have unobserved propensities for high achievement (e.g., better socioeconomic priors, higher resource levels, increased parental involvement). If this were so, then regression results based on conventional data would overstate the success of the merit‐pay programs in rewarding effective teachers” (2004, 474).

Indeed, it is virtually impossible to imagine a study based on conventional data that could eliminate this ambiguity about the effect of merit pay, which is why Dee and Keys chose an experimental design. Two overlapping programs in Tennessee made a controlled field experiment possible: the Career Ladder Evaluation System (which provided merit pay for better trained and professional teachers) and Project STAR (an experiment to find out about the effect of class size on performance). Under Project STAR, both students and teachers were randomly assigned to large and small class sizes. The teachers who were randomly chosen for Project STAR had various levels of merit pay. This randomization helped rule out the alternative hypothesis in the Cohn and Teel study: that teachers with merit pay were better able to obtain assignment to better performing students. The results showed that students in classrooms with merit‐pay instructors demonstrated significantly greater improvement in math scores. Moreover, this result was robust to statistical controls for teacher experience, teacher education, and small class assignment. The effect on reading scores was smaller and not statistically significant. Thus, unlike the Cohn and Teel research, this experimental finding supports a causal inference about the role of incentives.

As a consequence, the starting point in experimental design is to randomize everything that is not controlled in each trial of the experiment. As the number of trials gets large, the effect of unobserved variables on the measurement of the treatment effect gets small when treatments are randomized (Friedman and Sunder 1994, 24). Of course, experimenters have found many ways to improve on this basic design if one expects unobserved causes to affect the data collection effort in different settings. These include a range of complicated designs (e.g., randomized block designs, within‐ and between‐subjects designs, and factorial designs) that are all meant to tease out the independent contribution of specific treatments under study. The design principles that flow from each of these are offered to help researchers meet their goals of reducing the impact of outside forces and clarifying inferences about the causal effects.
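
The claim about trials can be illustrated directly. In the sketch below, the true treatment effect and the noise from unobserved causes are assumed values; the difference-in-means estimate from a randomized design moves toward the true effect as the number of trials grows.

```python
# A minimal sketch of how randomization plus many trials shrinks the influence of
# unobserved variables on the measured treatment effect; all numbers are assumed.
import random

random.seed(7)
TRUE_EFFECT = 2.0

def estimate_effect(n_trials):
    treated_outcomes, control_outcomes = [], []
    for _ in range(n_trials):
        treated = random.random() < 0.5      # randomized treatment
        unobserved = random.gauss(0, 3)      # uncontrolled, unobserved cause
        outcome = TRUE_EFFECT * treated + unobserved
        (treated_outcomes if treated else control_outcomes).append(outcome)
    # Difference-in-means estimate of the treatment effect.
    return (sum(treated_outcomes) / len(treated_outcomes)
            - sum(control_outcomes) / len(control_outcomes))

for n in (50, 500, 5_000, 50_000):
    print(n, round(estimate_effect(n), 2))
```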

Putting the Individual in the Laboratory: Addressing the Problem of External Validity

Despite the significant advantages of using experimental research designs to address important and enduring questions about bureaucratic behavior, this approach is not (p. 793) without its critics. In this section, we discuss the most widely cited critique of these methods when applied to the study of individual behavior in public agencies (e.g., superior–subordinate relationships): the problem of external validity. In doing so, we also offer “defenses” against this critique in light of the potential for experimental designs to advance practice and theory building on administrative behavior.

The Problem of External Validity

The claim that causal inferences can be drawn about the effect of treatment variables in a laboratory experiment is of limited value if, of course, the inference only holds for the subjects in the experiment. Certainly, the differences people experience between treatment and control cases may indicate a true causal relationship, but that causal relationship may not be generalizable to other settings. Consequently, concerns about external validity are among the most obvious bases for a fundamental critique of experiments on public bureaucracy. That is, is there any confidence that the same causal relationship would apply outside the laboratory?

Ironically, the characteristics of experimental design that allow the most confidence in causal inference also may provide an air of unreality to laboratory experiments. Very often, for example, the subjects selected are college undergraduates. Perhaps causal variables have the impact shown on these subjects only, and a similar experiment with senior bureaucrats and larger amounts of money would have quite different impacts. Likewise, the question of validity across subpopulations is always a concern. Who should the subjects in experiments be in order to maximize the probability that the causal effect is tested? The key decisions in this vein tend to involve whether to use beginning or advanced students (since there may be education effects or even minimally necessary computational skills), people who know each other versus those who do not (since expectations can shape decisions), and persons the experimenter drafts or volunteers (since people may “self‐select” based on interest and experience). These decisions are, of course, accompanied by central concerns about the effects of gender, race, religious background, or ethnic origin (e.g., Ball and Cech 1991).

Nor do threats to external validity end here. Some novel experiments stretch the bounds of our understanding of validity by taking into account specific cognitive abilities (e.g., Camerer, Loewenstein, and Prelec 2004) or by conducting experiments involving nontraditional societies (e.g., tribal groups in Africa) (Henrich et al. 2004). For instance, there is great concern in one of the most active areas of social science experimentation—experiments on the performance of markets—that using professionals may complicate rather than simplify the experimental task. While numerous studies show that businesspeople act differently in experiments than do students, precisely how they act differently cannot always be reduced to the impact of their specialized experience and/or knowledge (Friedman and Sunder 1994, 42). In addition, even when avoiding the use of students, the “use of businessmen experienced in one set of rules introduces many unknowns which may confuse the issue and make (p. 794) interpretation impossible. [This means that] the major role that experienced businessmen or traders can play is in model development, or the design of the experimental market itself” (Burns 1985, 152).

We naturally agree that caution must be taken in generalizing experimental results to other settings. However, external validity is a problem that can itself be studied with experimental techniques to reveal how much variation in settings (i.e., more or less realistic) can influence the results. Beyond this important point, there are several additional defenses that we feel are important to marshal in support of the utility of experimental designs for studying American bureaucracy. For convenience, we label these the complementarity, equivalency, and pooling defenses.

The Complementarity Defense

One response to the concern with external validity centers on the philosophy of science and the actual goal of the experiment. A theory may specify a particular causal link: “people produce more when compensation is contingent on productivity.” The theory does not qualify this by adding, “except for college undergraduates.” If our theories are simple and parsimonious, then they also should be universally valid; as such, evidence against a theory should be admissible from any subjects capable of understanding the instructions of the experiment. Reliable data that contradict the theory falsify it, at least in its simplest or most general form.

So the theoretical “goalposts” have to move for a theory to be credible for use in studying bureaucracy. Such a modified theory should lead to a more complex conclusion: “people produce more when compensation is contingent on productivity, unless they have not yet achieved true financial independence,” or “except for those people who are in a highly socialized peer group environment, such as a college campus.” Notice that these modified, more complex versions of the theory have incorporated different implicit explanations for the failure of the theory to work on college students. The first is that college students, who are mostly in a state of financial dependence on parents or financial assistance, are not as motivated by contingent compensation. The second modification accounts for peer group pressure that may diminish the force of financial incentives.

Both of these more complex theories are potentially testable—even in the laboratory. Experimenters could conduct identical experiments with college undergraduates and with masters of public administration (MPA) students who are coming back to school after several years of work experience and full financial autonomy. One could do experiments on college undergraduates at a small liberal arts college where everyone knows each other, compared to an anonymous urban evening community college. Evidence that either comparison group is more responsive to financial incentives than the original definition of college student subjects would advance one or the other of the more complex theories. Doubt is cast on the more complex theory if neither the returning MPAs nor the urban community college students respond differently than the original group of subjects. Consequently, a concern with (p. 795) external validity need not be seen as a complaint that makes laboratory experiments irrelevant; rather, the complaint is an invitation to a more complete and better‐specified theory—which is likely more easily tested in the laboratory than with field data. As noted above, the recent wave of experiments with atypical subject pools attests to the power of this way of thinking about falsification.

Seen from this perspective, the primary purpose of an experiment may not be to produce results that “generalize” but rather to provide an opportunity for a critical test of a particular theory. A laboratory experiment should give a theory “its best shot”—that is, it should realize all the conditions that match the assumptions of the theory. A theory that does not stand up to a rigorous experimental test may have little to commend it for other settings. On the other hand, rigorous experimental support for a theory may well motivate and inform further research outside the laboratory.

Taking the larger picture, then, negative results in a laboratory experiment could be the most valuable possible outcome. Negative experimental results are not just grounds for falsifying a theory but an invitation for further theorizing. In recent decades, experiments have had exactly this kind of impact. Two striking examples are experiments on two common organizational dilemmas: collective‐action problems and ultimatum games. In the first case, experiments on collective‐action problems have systematically falsified the main and original view that individuals inevitably are led to “free‐riding” behavior, which is an inefficient equilibrium. By and large, these experiments (which have great importance for, but have been largely ignored by, students of bureaucracy) have laid the groundwork for a new generation of “modified rational choice” theories. Taking on the conventional wisdom, these modified theories are built on the assumption that individuals have an intrinsic or contingent motivation to cooperate with others.

Likewise, the founders of game theory originally saw experiments on ultimatum bargaining games (especially the concept of a subgame perfect Nash equilibrium) as critical tests for the entire paradigm. In an ultimatum game, a person (an offerer) given a sum of money makes an offer to divide the money (a division) with another person (a receiver), who can then reject the offer or accept it. If the receiver rejects the offer, both players receive nothing; if the receiver accepts, the receiver gets the proposed division and the offerer keeps the remainder. Theory dictates that a rational receiver will accept any non‐zero division (this equilibrium is called “subgame perfect Nash”). The core result has been that the subgame perfect Nash equilibrium is one of the most robustly falsified hypotheses in social science, having been tested and found wanting on six continents (Henrich et al. 2004). This approach to falsification—which has largely been carried out by proponents and not adversaries of these theories—has supported the emergence of whole new families of theories built on concepts such as reciprocity (Fehr and Gächter 1998), fairness (Rabin 1993), and culture (Henrich 2000). Essentially, the falsification of early theories by experimentation, which gave researchers some certainty about the veracity of the tests, has advanced theory building about human behavior.
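
For readers unfamiliar with the game's structure, the sketch below contrasts the subgame perfect prediction with a simple fairness-minded receiver of the kind the experiments document; the pie size and rejection threshold are illustrative assumptions, not parameters from any particular study.

```python
# A minimal sketch of the ultimatum game; the pie size and fairness threshold are
# illustrative assumptions, not figures from any cited experiment.
PIE = 10.0  # sum of money the offerer proposes to divide

def rational_receiver_accepts(offer):
    # Subgame-perfect Nash logic: any positive amount beats the zero from rejecting.
    return offer > 0

def fairness_minded_receiver_accepts(offer, threshold=0.3):
    # Rejects "insulting" offers below a share of the pie, as many subjects do,
    # even though rejection leaves both players with nothing.
    return offer >= threshold * PIE

for offer in (0.5, 2.0, 5.0):
    print(f"offer {offer}: rational accepts={rational_receiver_accepts(offer)}, "
          f"fairness-minded accepts={fairness_minded_receiver_accepts(offer)}")
```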

Finally, consider how a number of recent studies have sought to use experiments to understand how principals and agents “solve” the problems associated with the use (p. 796) of incentives in organizations. Since Dan Wood offers a spirited defense of the logic of principal–agent theory in his chapter, it suffices presently to note a few of the key questions that principal–agent theory addresses and how experimental designs might help refine them. How, for example, do principals use incentives to shape the behavior of agents? How do agents respond to the use of incentives? Do other motivators—such as those documented in the burgeoning literature on PSM noted earlier—complement or compete with the use of incentives? These experimental approaches still may have their limitations. However, their findings complement those of others using different methodologies who either support, qualify, or reject key propositions about organizational behavior in American bureaucracies.

Consider, in this vein, what experimental designs can add to testing propositions from principal–agent theory as it applies to compensation contracts in public organizations. Generally speaking, just like the quantitative field data on these mechanisms, experiments on compensation contracts also reveal deviations from what theory says principals and agents should do when using incentives to solve coordination problems between supervisors and their subordinates. Suppose that supervisors (principals) and subordinates (agents) can trade off between the use of two kinds of wages. One type is contingent on a (noisy) estimate of the agent's effort level. A second type is noncontingent: the agent receives the compensation regardless of her actions and regardless of whether her actions (in combination with her environment) produce positive outcomes. Ernst Fehr and Simon Gächter (2008) summarize experiments showing that principals typically offer considerable noncontingent wages and in return receive higher‐than‐equilibrium levels of effort from agents. An example of a noncontingent wage would be a $12 flat wage (received regardless of one's effort level or the social outcome) when theory says a bonus of $11 should be necessary and sufficient (the equilibrium level).
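
A stylized calculation helps show why such behavior is puzzling for the standard model. In the sketch below, the effort levels and effort costs are assumptions chosen for illustration (the $12 wage and $11 bonus echo the example above); a purely self-interested agent's best response to a flat wage is minimal effort, which is why the high effort observed under noncontingent pay is surprising.

```python
# A minimal sketch of a self-interested agent's best response under the two wage
# types discussed above. Effort levels and costs are assumed for illustration;
# the $12 flat wage and $11 bonus echo the example in the text.
EFFORT_COST = {0: 0.0, 1: 2.0, 2: 5.0, 3: 9.0}   # higher effort is costlier

def payoff(effort, flat_wage=0.0, bonus=0.0, bonus_threshold=2):
    # The bonus is contingent on (an estimate of) effort; the flat wage is not.
    pay = flat_wage + (bonus if effort >= bonus_threshold else 0.0)
    return pay - EFFORT_COST[effort]

def best_response(**contract):
    return max(EFFORT_COST, key=lambda e: payoff(e, **contract))

print("noncontingent $12 wage:", best_response(flat_wage=12.0))   # chooses effort 0
print("$11 contingent bonus:  ", best_response(bonus=11.0))       # chooses effort 2
```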

In experiments coupling a wage and a contingent bonus (Bottom et al. 2006; Conlon and McLean Parks 1990; McLean Parks and Conlon 1995; Miller and Whitford 2002), principals consistently offered agents high fixed wages and bonuses that were, in theory, insufficient to motivate a high effort from a rational, self‐interested agent. By and large, agents responded with high levels of costly effort. For example, Judy McLean Parks and Edward Conlon (1995; see also Conlon and McLean Parks 1990) show that agent effort (indexed by the amount invested in costly information to improve performance) is greater than predicted even when principals rely on contingent compensation. Contingent pay was used more when monitoring was impossible, but the average amount was still less than half that of the fixed wage. Overall, then, there is limited experimental support for the use of bonuses to induce effort (Perry, Mesch, and Paarlberg 2006 show similar results for studies not using experimental designs).

The Equivalency Defense

While experimental designs frequently are critiqued on generalizability grounds, concerns about external validity are not all that different for laboratory experiments (p. 797) and field research. In terms of equivalency, consider the excellent fieldwork research on piece rates done by Edward Lazear (2000). Lazear used sophisticated statistical controls to enhance the validity of his causal inference that a switch to piece rates at a windshield installation company increased productivity. To control for the Hawthorne effect (i.e., that productivity might increase as a result of research observation) or other historical effects, different employees were switched to piece rates at different times after the research started. The results indicated that productivity tended to go up exactly when the employee was switched to piece rates. By examining fixed effects for individuals under two compensation regimes, Lazear also was able to rule out selection for ability as an alternative explanation: variance in fixed effects presumably due to talent was just as high in both regimes. The internal validity of the result seems as strong as it can be, short of a full‐scale randomized experiment. However, Lazear readily acknowledged that the windshield installation firm is quite different from a bank, the U.S. Food and Drug Administration, or a state education agency. Its task was unique, the amount of theoretical training was minimal, the size of the team necessary to do the job was low (one person), and the relationship between employees and management was positive, in that the workers evidently were confident that their piece rates would not be lowered once the upper limits of their productivity were found.
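
The fixed-effects logic is simple enough to sketch: each worker is compared with himself under the two compensation regimes, so stable differences in ability drop out of the comparison. The productivity figures below are invented for illustration and are not Lazear's data.

```python
# A minimal sketch of the within-worker (fixed-effects) comparison described above.
# The productivity figures are invented for illustration, not Lazear's data.
units_per_day = {
    # worker: mean output under hourly pay and under piece rates
    "worker_1": {"hourly": 2.9, "piece": 3.8},
    "worker_2": {"hourly": 4.1, "piece": 5.2},
    "worker_3": {"hourly": 3.3, "piece": 4.0},
}

# Each worker serves as his own control: stable ability differences cancel out of
# the within-worker change, leaving the effect of the compensation regime (plus noise).
changes = [w["piece"] - w["hourly"] for w in units_per_day.values()]
print("average within-worker change:", round(sum(changes) / len(changes), 2))
```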

For any one of dozens of variables, one consequently could claim that the positive results that Lazear found for piece rates were not “externally valid.” The subject pool was no more universally representative than college students. Indeed, Lazear notes that several variables constrain generalization of the results. One was monitoring costs: the costs of counting windshields installed were small. He also observes that the company had a supplementary motivation regime that evidently overcame any piece‐rate incentive for employees to lower quality in the interest of quantity. The company allocated rework to peers, who had to redo an installation without compensation and with knowledge of which employee was responsible for the quality error. The peer pressure that resulted kept quality high and made it possible for piece rates to have the effect that they apparently did. One conclusion is that positive relationships with management and strong peer groups are necessary for piece rates to “work.”

Thus, just as with laboratory experiments, speculation about other variables that may limit generalization is an invitation for important development of any underlying theory. And, again, laboratory experiments are not in a particularly disadvantageous position vis‐à‐vis external validity. In fact, a meta‐analysis (discussed below in greater detail) by Antoinette Weibel, Katja Rost, and Margit Osterloh (2010) provides evidence that Lazear's result is not generalizable: financial incentives have a positive impact on effort only when intrinsic task interest is low. If installing windshields has little intrinsic interest, then Lazear's results, excellent as they are, do not generalize to tasks with high intrinsic motivation (recall Perry's concept of PSM) that are found in many public bureaucracies (e.g., teaching, police investigation, and environmental science).

(p. 798) Moreover, this problem of external validity is probably even more relevant in the case of research data collected by public agencies. Field data are “collected by government or private agencies for non‐scientific purposes…” (Smith 1987, 242). This reality plagues many of the debates on American bureaucracy—especially the debate between those who see a “top‐down” perspective on control versus those who hold a “bottom‐up” view. For instance, are observational data collected by agencies that support a principal–agent perspective underspecified because they do not account for the “black box” of the bureaucracy? When data reveal differences between the motives of bureaucrats and their counterparts in the private sector or in other nations, is it because we have ignored the differences between different governance mechanisms? In fact, the problem is simple yet frustrating: the most useful data for adjudicating these competing claims are unlikely to be produced by the targets of our inquiry—governments.

The Pooling Defense

Yet another defense against the problem of external validity in experimental designs is the amenability of these designs to meta‐analysis of results. In fact, researchers can subject them to the same types of meta‐analysis used to understand the variety of inferences drawn from different observational studies. In the case of observational data, we conduct meta‐analyses because we do not trust our ability to control for unobservables and, thus, to isolate the causal effect in any given data application. Similarly, we might worry that design choices made by experimenters when trying to understand a theory in a given context also affect the outcomes of laboratory experiments. Thus, as with observational data, the meta‐analysis of laboratory experiments provides leverage that is beyond the reach of any one experiment and may reveal patterns that are not apparent from individual studies.

For instance, as we alluded to above, Weibel, Rost, and Osterloh (2010) present a meta‐analysis of forty‐six laboratory experiments that examined the effect of contingent pay on performance. They coded the results in terms of a moderator variable—task type—labeled as “challenging” or “non‐challenging.” Twenty‐seven of the experiments assigned subjects a simple and non‐challenging task, fifteen assigned a challenging task, and four studies did both. Their results show that the effect of contingent pay depended on type of task. Over the forty‐six studies, contingent pay had a strong positive effect on performance for non‐challenging tasks, but monetary rewards reduced performance in the case of challenging tasks. So while it is difficult in any given study to test for financial incentives “crowding out” intrinsic motivation in a task (e.g., Miller and Whitford 2002), “pooling” the results of multiple studies in which task difficulty is held constant in each study allows one to approximate the effect.
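
The pooling logic can be sketched in a few lines. The effect sizes and sample sizes below are hypothetical stand-ins rather than the Weibel, Rost, and Osterloh estimates; the point is only how a moderator variable such as task type is used to split and pool study-level results.

```python
# A minimal sketch of subgroup pooling with a moderator variable (task type).
# Effect sizes and sample sizes are hypothetical, not those of the cited meta-analysis.
studies = [
    # (task type, study-level effect of contingent pay on performance, sample size)
    ("non-challenging",  0.45, 60),
    ("non-challenging",  0.30, 80),
    ("challenging",     -0.25, 50),
    ("challenging",     -0.10, 70),
]

def pooled_effect(task_type):
    # Sample-size-weighted average of effects within one level of the moderator.
    subset = [(effect, n) for task, effect, n in studies if task == task_type]
    return sum(effect * n for effect, n in subset) / sum(n for _, n in subset)

for task in ("non-challenging", "challenging"):
    print(task, round(pooled_effect(task), 2))
```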

In essence, those results speak to the subject of validity. Anyone could have criticized the twenty‐seven experiments with a non‐challenging task as not being generalizable to real‐world bureaucracies. In that world, bureaucrats typically are (p. 799) charged with some complex and absorbing tasks such as teaching, criminal investigation, budget accounting, or forestry. And in light of the meta‐analytic results, the criticism would have been correct. But in specifying the particular reasons why the experiments were not generalizable, other experiments using a more challenging task in fact demonstrated support for a hypothesis (the task‐contingent effect of rewards) that would seem to offer remarkable guidance for empiricists in the world of natural bureaucracy. The fact that some experiments used an (overly) simple task while others did not actually provided significant leverage over the problem of incentives.

Putting the Bureaucracy in the Laboratory: Addressing the Problem of Scale

As the chapters in this handbook attest, there is much more to understanding organizational behavior in American bureaucracy than merely studying incentive structures and their effects on employees. These include factors such as organizational structure (Bendor and Hammond), goal ambiguity (Rainey), leadership and culture (Khademian), interest group interventions (Kerwin, Furlong, and West), and the courts (Mashaw). What is more, as Carolyn Heinrich and Carolyn Hill discuss in great detail in their chapter on hierarchical modeling, behaviors and outcomes are best seen as “nested” within different levels of an organization and across networks.

Within this context, moreover, bureaucracies obviously mobilize the efforts of tens of thousands of individuals, organized in towering hierarchies of fifteen or more levels of authority (or increasingly organized in fluid, nonhierarchical organizations), and, most notably, persisting in time for centuries. As Robert Durant (2009) notes, bureaucratic behavior is typically a product of history, immediate context, and contingency. As such, it is especially challenging to design experiments on questions integral to understanding public bureaucracies. Thus, critics charge that small markets and committees can be created in a laboratory setting, but nothing like a large‐scale bureaucracy—with its attendant human scale, complexity, and temporal duration—can be created in a controlled laboratory setting.

In what ways can organizations possibly be studied in a laboratory with a dozen individuals, with no real authority over one another, and with observations lasting only an hour or two? We see one answer to this question in a set of experiments carried out largely in the 1960s by sociologists who were reconsidering the lessons of Max Weber (1947, 1958). As discussed in the introduction to this handbook and more fully in the chapters by David Rosenbloom, Hindy Schachter, and Guy Adams and Dan Balfour, Weber saw bureaucracy in terms of hierarchical relations, authority invested in specialized offices, formal rules, and operation by technically qualified personnel. But not all organizations have all of these characteristics of the ideal (p. 800) type in exactly the same degree. For example, Peter Blau (1955) studied two offices in the same state employment agency in order to see the effects of one of Weber's ideal characteristics of bureaucracy: rule‐following behavior. He observed that the office that was more rule‐bound experienced lower overall productivity. However, the observational results were filled with the normal kinds of inferential uncertainty. Did rigid attention to rules cause lowered productivity or did a third variable cause both? Furthermore, the results were questionable in terms of external validity. Did the finding generalize to other employment agencies, to other state and local bureaucracies, or to street‐level bureaucracies?

This raises an important general question: What effects might variations in these or other key variables in Weber's approach have on organizational behavior and performance? Answering this question helped to put the study of Weberian bureaucracy on a scientific footing but also posed special problems for empirical research. The difficulty in answering this question was in gaining access to a large number of bureaucracies and constructing instruments for gathering comparable data. To try to sort out these effects, a number of key researchers felt Weber's key structural variables could be constructed in a laboratory setting, even if other characteristics (size and duration, for example) were not met. An early example of this approach was William Evan and Morris Zelditch's (1961) study of “bureaucratic authority.” Specifically, they saw a mistake in Weber's implicit assumption that hierarchical office and appointment to office based on merit are “perfectly correlated.” Put differently, they questioned whether “authority of knowledge” is always reinforced by the “authority of office” (884). In fact, of course, these are distinguishable: sometimes those with hierarchical authority have no specialized knowledge and vice versa.

In Evan and Zelditch's laboratory experiment, subjects were “hired” as part‐time employees at an hourly rate to code the face sheet of a new questionnaire. This was a realistic task because the authors worked for a research organization that hired employees for this purpose. Subjects (as coders) were given deliberately ambiguous and fragmentary instructions that involved identical “traps.” Subjects also were encouraged to contact by telephone their “coding supervisor” when the inevitable questions arose. Coding supervisors were confederates who supplied answers that were expert and helpful, neutral, or obviously deficient (e.g., an admission of ignorance and a recommendation to code “no answer”). Subjects were exposed to competent supervisors in the first time period and then randomly assigned to different quality supervisors in the second time period.

Questionnaires established that the subjects saw clear differences in the quality of their second‐period supervisors. Moreover, a perception of having “non‐expert” supervisors caused the coders to behave differently. Subjects were somewhat less likely to call for help from non‐expert supervisors, were more likely to make “trap errors” in the programmed deficiencies in the coding instructions, and were more likely to ignore the incompetent supervisors' advice to code a “no answer.” They even took uninformed guesses, a result consistent with a significant difference in the employee's evaluation of the supervisor's right to hold the job. There were no (p. 801) treatment differences in the reported right of the supervisor to be obeyed, but subjects clearly differentiated between bureaucratic (“It's his job to give orders”) and professional (“The supervisor knows more about it”) grounds for obeying the supervisor.

These experiments show once again how controlled experimentation allows causal inferences to be made: different bases for bureaucratic authority induce different behavioral responses. Control and randomization support the validity of this inference. Of course, the experiments again offer concerns about external validity. Evan and Zelditch note, “The main operative unit of the organization was the supervisor–coder relationship.…The organization was ‘professional’ in its goals, but clearly the subjects were not themselves professionals nor oriented to careers in this organization, despite efforts to induce a professional attitude in the instruction period” (1961, 892).

However, Weber says that bureaucracies are organizations that capture the characteristics of hierarchy, rules, expertise (in varying quantities), and authority in the office (or position). Consequently, the organizations built for these laboratory studies were bureaucracies, however temporary. Moreover, we contend that articulating useful concerns about external validity forces researchers to think theoretically. What theories help us to understand why the experiment's results would not hold in other organizations? Can we test those theoretically driven modifications in the laboratory as well? What field data would help us to test whether the differences between professional and hierarchical types of authority do not have the same causal effects as those found in the laboratory?

Blurring the Boundary?: Synthesizing Policy Experiments and Experimental Survey Research

Up to this point, we have treated experimental research on bureaucracy as being sharply differentiated from other empirical research. However, two areas of active research are presently blurring that boundary and, in the process, addressing traditional concerns about external validity. One is field experimental research on policy and the other is experimental survey research. Neither of these approaches looks much like traditional laboratory experiments, although each has borrowed ideas from experimental design in order to sharpen the reliability of the results. Moreover, while in some senses promising, these techniques are not without problems of their own.

The purpose of so‐called “natural” or “constructed” field experiments is to join two traditions in data analysis: the use of experimental method in designing studies (p. 802) and the drawing of inferences from auxiliary evidence found in the “real world” (Harrison and List 2004). For economists, field experiments such as these are a unique approach to assessing the counterfactual implicit in any theory of human behavior. This is the case because: (1) the subjects are from the real world; (2) they hold information specific to the transaction studied; (3) the commodities they manipulate are real, as are the tasks they complete or the trading rules they employ; (4) they recognize the stakes as salient; and (5) they interact in a nonsterile environment (i.e., the environment is not fully controlled by the researcher, so other sources of causation cannot be easily ruled out). Advocates of the approach argue that field experiments allow for a finer understanding of “control” in experimental design, that they are a natural analogue to the use of laboratory experiments, and that they help us better understand results from the lab (and how to design experiments in the lab).

Studies such as those done by Gary King and his associates (2007) are excellent examples of this technique. They show the power of leveraging policy experiments for better understanding the consequences of bureaucratic choices, either on the design side or in the process of implementation. Essentially, King's team showed that one can use field experiment data (in combination with advanced statistical tools) to evaluate important programs such as the Mexican Seguro Popular de Salud (Universal Health Insurance) program. Seguro Popular provides medical care, drugs, preventative services, and financial health protection to citizens and has been a central national health reform in Mexico for the last two decades.

Clearly, natural or constructed experiments such as this one help us better understand economic behavior generally or specifically, as well as the consequences of these choices and behaviors (e.g., the 1971 Rand Health Insurance Experiment; see Lohr et al. 1986). Moreover, they do so while limiting some of the concerns about external validity that we have discussed. Yet they also raise normative issues of their own. Would the most useful natural experiments actually take place in representative democracies if they involved large numbers of “losers”? If experiments take place in which many people lose, what does it say about the kinds of governments that allow them to be carried out? Either “experiments that are bad for people” will not occur (a selection problem) or the country that allows them is not democratic (which means that people do not have a choice about being involved).

It is also important to note that field experiments suffer from a type of effect that is addressed most concretely in the case of laboratory experiments: the experimenter effect. Treatises have been written about the optimal construction of experiments to limit the chances that the observer is influencing the observed behavior. Of course, the Heisenberg uncertainty principle shows us that even in benchmark sciences such as physics, it is impossible to remove these effects totally. Our claim, however, is that knowledge about core bureaucratic politics concerns such as decision making, structure, and policy impact improves in a laboratory setting because the experimenter effect is calculable. In the field, the data are chosen in part by the organization that gathers them, leaving the researcher (as noted earlier) with data that are partially limited by how they were gathered and for what purposes. Field experiments are less affected by this problem because they are partially experimental. Still, when (p. 803) governments or other outsiders (such as foundations or other nongovernmental organizations) fund high‐profile experiments, the experimenter effect remains.

Yet another concern is expressed by Kosuke Imai (2005), who shows the necessity of accounting for discrepancies that creep in between the design of a field experiment and its implementation outside the laboratory. This is especially problematic in recent experimental designs involving survey research and is well recognized among researchers using these techniques. Survey researchers, for example, have long recognized the usefulness of embedding experiments in surveys to observe response variations when subjects encounter different versions of the same question (Schuman and Presser 1981). Consequently, researchers now use computer‐assisted telephone interviewing (CATI) systems to implement experimental designs in which respondents are randomly assigned to different conditions. In this fashion, differences can be observed with regard to the subject of interest: in these choice structures, respondents are “treated” under the different conditions. Such an approach combines random selection from a given population to guarantee generalizability with random assignment of subjects to treatments in order to guarantee internal validity.
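
In schematic form, the design works as in the sketch below; the question wordings and the simulated response probabilities are hypothetical and serve only to show where random assignment enters.

```python
# A minimal sketch of a survey experiment with randomized question wording.
# The wordings and the simulated response probabilities are hypothetical.
import random

random.seed(1)
WORDINGS = {
    "A": "Should the agency spend more on enforcement?",
    "B": "Should the agency spend more on making sure rules are followed?",
}

responses = {"A": [], "B": []}
for _ in range(5_000):
    version = random.choice(sorted(WORDINGS))              # random assignment at interview time
    agree_probability = 0.50 if version == "A" else 0.58   # stand-in for real answers
    responses[version].append(random.random() < agree_probability)

# Because assignment was random, the gap in agreement rates can be attributed
# to the wording rather than to who happened to receive which version.
for version, answers in responses.items():
    print(version, round(sum(answers) / len(answers), 3))
```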

To date, widespread use of CATI has been made in political science, with research groups at Berkeley and Michigan (through the Time‐Sharing Experiments for Social Sciences survey platform) helping to further this research agenda. But while it has had great success, this approach in some ways has limited usefulness for the study of public bureaucracies. To be sure, the availability of large‐scale survey evidence in the form of the Federal Human Capital Survey or the Merit Principles Survey has brought about a mini‐revolution in the study of bureaucrats. However, once again, the data are fundamentally limited in their usefulness for academic research because the people who construct such surveys want to use them for management instead of research. Moreover, the kinds of techniques that enhance causal statements that researchers desire in such surveys would come from the literature on experimental survey research. Would we really expect governments to use such techniques just so we as scholars might enhance our understanding of causal effects? Still, the technique holds promise in some instances.

Finally, and relatedly, Brian Gaines, James Kuklinski, and Paul Quirk (2007) show that while some of the problems of experimental survey research can be fixed in the survey itself, others can only be corrected when interpreting the results. Specifically, treatment effects are often so transitory that they can only be trusted if observed over time, through repeated interaction, and usually in some sort of panel study. These are exactly the kinds of studies unlikely to be carried out in the federal government in such a way that the responses of bureaucrats can be traced through time. Indeed, one concern we have heard about the current survey strategy involving federal bureaucrats is that respondents believe their responses are being tracked. This may explain some of the upward trend in positive responses to survey questions centering on quality‐of‐work issues. Overall, then, while promising in some ways, the experimental approach applied to surveys may be less useful in the case of surveying real bureaucrats.

(p. 804) An Agenda for Experimental Research on American Bureaucracy

We have mentioned at several points in this chapter the differences between the political‐economic and social‐psychological approaches to bureaucratic experiments. Some of the most intriguing possibilities for future experimental research may well span the accepted boundaries dividing behavioral and political‐economic research or respond to external validity critiques. For example, political‐economic theorists have traditionally assumed that individual decision makers were essentially interchangeable and equally motivated by financial incentives. In fact, experimental results have driven them to the conclusion that much of the variation in outcomes is due to differences among the individual subjects coming in the door. These differences include striking variation in how much subjects attend to social cues as a supplement to financial incentives. In our view, the convergence of behavioralist and “rational actor” experimentalists on such concepts as “social norms” and “signals as to type” (e.g., whether one is a “hard worker” or not) is one of the most promising developments for the future.

Beyond this, we forecast robust opportunities for greater use of laboratory experiments to better understand American bureaucracy. Those opportunities include the following questions (several of which we discuss further below):

  • What types of people populate bureaucracies and what are the impacts of those types? A simple example of the need to address this issue is the current disconnect between the literatures on PSM and those on group decision‐making experiments. Also making this question salient are the issues related to political appointee–careerist relationships and decision making discussed in this handbook by, respectively, Durant and Resh and Workman, Jones, and Jochim.

  • Do neurological and physical attributes shape the behaviors of bureaucrats? For example, are differences in how men and women perceive fairness rooted in experiences or cognition? As Norma Riccucci discusses in her chapter, fairness at this level is a perennial issue in public bureaucracies in the United States, just as it is at the organizational level when grievances become court issues.

  • Do different forms of information aggregation produce different outcomes in bureaucracies (see Workman, Jones, and Jochim, as well as Bendor and Hammond, for insights into this question)? We easily see opportunities for meshing models of information aggregation (e.g., the Condorcet jury theorem) with the longstanding concern over sources of information in agencies. For example, the jury theorem shows that when each member is independently more likely than not to be correct, majority decisions outperform individual decisions in ascertaining diffused “truth” in the environment, and they improve as the group grows (see the sketch following this list). Is the same true in bureaucracies?

  • (p. 805) What are the next steps in bridging the study of organizational behavior and the formal models of bureaucracy discussed in this handbook? For instance, valence‐based models of politics say that people “care” a great deal about some parts of the rewards they are offered and not much about others. Consequently, these models may help us better understand models of multiple principals in America's separation‐of‐powers context.

  • What are the implications of “small‐b” bureaucracy and “big‐b” bureaucracy in the American system? Can experiments help us understand how the American people and their elected representatives differentiate between the organization of government and their individual experiences with bureaucrats?
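To anchor the information‐aggregation question above, the following sketch computes the Condorcet logic directly. The competence level p and the group sizes are arbitrary assumptions chosen only to display the pattern, not estimates from any agency setting.

```python
from math import comb

# Illustrative check of the Condorcet jury theorem: if each member of a group
# is independently correct with probability p > 0.5, the probability that a
# simple majority is correct rises with group size. The values of p and n
# below are arbitrary choices for the illustration.

def majority_correct(n, p):
    """Probability that a strict majority of n independent members is correct."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for n in (1, 5, 15, 51):
    print(n, round(majority_correct(n, 0.6), 3))
# Expected pattern: 0.6 for a single decision maker, rising toward 1.0 as the
# group grows -- the aggregation logic invoked in the bullet point above.
```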

To illustrate, consider how experimental designs informed by game‐theoretic insights and refinements might examine the question of how “type” affects the behavior of individuals in groups. In game theory terms, our individual “type” provides us with beliefs about the decision environment, beliefs about others' beliefs about the environment, and so on (e.g., Harsanyi 1967–8). In fact, John Harsanyi's (1967–8) early work building a theory of types (e.g., types of people) came out of research for, and about, bureaucratic behavior, a research project done for the U.S. Arms Control and Disarmament Agency (Myerson 2004). As such, types really represent the whole of the private information available to any given player, including beliefs about how others are expected to act in collective‐action problems—a common bureaucratic dilemma (Croson 2000; Gächter and Renner 2006).
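To show how beliefs about type enter a decision, the sketch below works through a minimal incomplete‐information choice in the spirit of Harsanyi's framework: an agent decides whether to exert effort on a joint task without knowing whether its partner is a “hard worker” or a shirker. The payoff numbers, and the belief threshold they imply, are our own illustrative assumptions.

```python
# Minimal sketch of a "type"-based decision under incomplete information:
# the deciding agent holds a belief about the partner's type and chooses the
# action with the higher expected payoff. All payoff values are assumptions
# chosen for illustration only.

# Payoffs to the deciding agent: (my_action, partner_type) -> payoff
PAYOFFS = {
    ("work", "hard_worker"): 4,   # joint task succeeds, effort cost paid
    ("work", "shirker"): -1,      # effort wasted on a failing task
    ("shirk", "hard_worker"): 2,  # free ride on the partner's effort
    ("shirk", "shirker"): 0,      # nothing happens
}

def best_response(belief_partner_is_hard_worker):
    """Choose the action with the higher expected payoff given a belief."""
    p = belief_partner_is_hard_worker
    expected = {
        action: p * PAYOFFS[(action, "hard_worker")]
        + (1 - p) * PAYOFFS[(action, "shirker")]
        for action in ("work", "shirk")
    }
    return max(expected, key=expected.get), expected

for belief in (0.2, 0.5, 0.8):
    action, expected = best_response(belief)
    print(belief, action, {a: round(v, 2) for a, v in expected.items()})
# With these payoffs, "work" becomes the best response only when the agent is
# sufficiently confident the partner is a hard worker (here, p > 1/3).
```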

Likewise, the literature on PSM argues that agencies perform better when populated by bureaucrats with high public service motivation. In game theory terms, the problem for those bureaucrats is whether they should believe that others are also doing all they can to pursue the agency's best interests (as opposed to, say, shirking). Yet game theorists approach the problem of beliefs by thinking like, well, game theorists. In contrast, organization theorists see PSM from the perspective of the agencies they study. They see PSM in survey terms because they observe differences across the public, private, and nonprofit sectors. In essence, the real‐world selection of beliefs (of bureaucratic types) occurs through how agencies constitute themselves and manage their collective behavior. In experimental terms, agents with higher PSM may be more likely to contribute to the “public good” in the laboratory. Unlike the random‐sample surveys most often used to examine PSM, though, the laboratory also lets us test a further prediction: subjects with lower PSM should be more likely to contribute when they interact with more high‐PSM agents, as in the sketch below.
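A minimal simulation of that conjecture, in the form of a repeated linear public goods game, appears below. The behavioral rule (conditional cooperation weighted by a PSM parameter), the endowment, and the group compositions are our own assumptions for illustration, not findings from any PSM experiment.

```python
# Illustrative sketch of the prediction in the text: high-PSM agents contribute
# more in a public goods game, and low-PSM agents raise their contributions
# when grouped with high-PSM partners. All parameters are assumptions chosen
# only to display the pattern.

ENDOWMENT = 20  # tokens each agent can keep or contribute each round

def final_contributions(psm_levels, rounds=10):
    # Round-1 contributions are proportional to each agent's PSM.
    contributions = [psm * ENDOWMENT for psm in psm_levels]
    for _ in range(rounds):
        group_avg = sum(contributions) / len(contributions)
        # Conditional cooperation: each agent mixes its own PSM-driven
        # contribution with the contribution it observes from the group.
        contributions = [
            min(ENDOWMENT, 0.5 * psm * ENDOWMENT + 0.5 * group_avg)
            for psm in psm_levels
        ]
    return contributions

low_group = final_contributions([0.2, 0.2, 0.3, 0.3])
mixed_group = final_contributions([0.2, 0.2, 0.8, 0.9])

print("All low-PSM group:", [round(c, 1) for c in low_group])
print("Mixed PSM group:  ", [round(c, 1) for c in mixed_group])
# The low-PSM members of the mixed group end up contributing more than the
# same types do in the all-low-PSM group, mirroring the prediction above.
```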

We would like to close with a broader, more philosophical point that goes beyond particular questions for a future research agenda. In the world of “big science,” some of the implications of the physics of small particles can only be tested in giant particle accelerators that collide particles moving at close to the speed of light. In order to test other hypotheses, scientists often cool matter down nearly to absolute zero. Big science does not justify these experiments by referring to the realistic nature of the setting or to the generalizability of the results to an everyday world in which matter is at a more conventional temperature or moving at a more conventional speed. In both (p. 806) cases, they justify the experiments because theory predicts unusual effects from unusual conditions. In these cases, the value of the experiment is clearly linked to a critical test of a theory, and the uniqueness of the experimental setting is therefore an advantage, not a weakness.

In our view, the problem with experimental work as an approach for better understanding the American bureaucracy is not experimentation itself but rather the state of theorizing. We see the necessity for public administration to become theoretically grounded or else lose legitimacy as a science (see Wood for a similar sentiment, albeit with a different methodological and theoretical solution). Admittedly, theories about bureaucracies are not at the same stage of development as those in physics. However, increasingly elaborate theories in our field are ever more precise about the conditions that constitute a viable test.

As with the critiques leveled at experimental designs for understanding bureaucratic behavior, games performed in the laboratory on other topics do not strike observers as particularly realistic. For instance, in the “ultimatum game” experiments, many realistic variations are controlled away so that the subject sees only a computer terminal, a stark choice, and information about the financial consequences of that choice. And yet the unexpected results of the ultimatum game experiments have been used to interpret and theorize about labor markets, the presidential veto, agenda control in legislatures, international conflict, and terrorism. The reason for the impact of the ultimatum game experiments is not that they “generalize” directly to the real world, the standard we continue to apply ruthlessly to experimental designs in the study of American bureaucracy. Rather, like the “big science” experiments noted above, they showed some theories to be wrong and others to be potentially right.
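For readers unfamiliar with the design, a minimal sketch of the ultimatum game's structure follows; the fairness threshold is an assumption chosen for illustration, since observed rejection behavior varies across subject pools.

```python
# Minimal sketch of the ultimatum game: a proposer offers a split of a fixed
# pie; the responder either accepts (both receive their shares) or rejects
# (both receive nothing). The fairness threshold is an illustrative assumption.

PIE = 10

def responder_accepts(offer, fairness_threshold=0.3):
    # A fairness-minded responder rejects offers below a share of the pie,
    # even though rejection leaves both players with nothing.
    return offer >= fairness_threshold * PIE

def payoffs(offer):
    if responder_accepts(offer):
        return PIE - offer, offer   # (proposer, responder)
    return 0, 0

# With purely self-interested players, theory predicts the smallest positive
# offer is accepted. With fairness-minded responders, that offer is rejected,
# and a more generous offer does strictly better for the proposer.
print("Offer of 1:", payoffs(1))   # rejected under the assumed threshold
print("Offer of 4:", payoffs(4))   # accepted; proposer keeps 6
```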

Laboratory experiments do not and cannot reproduce the real world. However, the most powerful justification for laboratory experiments is that they are designed to provide the most rigorous test possible of a theory. As such, the positive trajectory of theorizing about bureaucracies leads us confidently to predict a continued opportunity for applying laboratory experiments to the study of American bureaucracy. Experimental research is today considered a central, if not the foundational, approach for understanding the organizational behavior of private firms (e.g., Bryman 1989). The question in our minds is whether researchers will, in turn, take up the method when studying public bureaucracies. If they do, a robust research agenda awaits that can complement, elaborate, and extend theory and practice related to American bureaucracy in the twenty‐first century.

References

Asch, B. J. 1990. Do Incentives Matter? The Case of Navy Recruiters. Industrial and Labor Relations Review, 43/3: 89S–106S.

Ball, S. B., and Cech, P. A. 1991. The What, When, and Why of Picking a Subject Pool. Discussion Paper, Virginia Polytechnic University.

(p. 807) Barnard, G. A. 1982. Causation. In Encyclopedia of Statistical Sciences, ed. S. Kotz, N. L. Johnson, and C. B. Read. New York: John Wiley & Sons.

Blau, P. M. 1955. The Dynamics of Bureaucracy: A Study of Interpersonal Relationships in Two Government Agencies. Chicago, IL: University of Chicago Press.

Bottom, W. P., Holloway, J., Miller, G. J., Mislin, A., and Whitford, A. B. 2006. Building a Pathway to Cooperation: Negotiation and Social Exchange between Principal and Agent. Administrative Science Quarterly, 51/1: 29–58.

Bryman, A. 1989. Research Methods and Organization Studies. London: Unwin Hyman.

Burns, P. 1985. Experience and Decision‐Making: A Comparison of Students and Businessmen in a Simulated Progressive Auction. In Research in Experimental Economics, ed. V. L. Smith. Greenwich, CT: JAI Press.

Camerer, C. F., Loewenstein, G., and Prelec, D. 2004. Neuroeconomics: Why Economics Needs Brains. Scandinavian Journal of Economics, 106/3: 555–79.

Cawley, J. H., and Whitford, A. B. 2007. Improving the Design of Competitive Bidding for Medicare Advantage. Journal of Health Politics, Policy and Law, 32/2: 317–47.

Cohn, E., and Teel, S. J. 1992. Participation in a Teacher Incentive Program and Student Achievement in Reading and Math. 1991 Proceedings of the Business and Economic Statistics Section, American Statistical Association, Alexandria, VA.

Conlon, E. J., and McLean Parks, J. 1990. The Effects of Monitoring and Tradition on Compensation Arrangements: An Experiment on Principal–Agent Dyads. Academy of Management Journal, 33: 603–22.

Croson, R. T. A. 2000. Thinking Like a Game Theorist: Factors Affecting the Frequency of Equilibrium Play. Journal of Economic Behavior and Organization, 41/3: 299–314.

Deci, E. L., and Ryan, R. M. 2000. The “What” and “Why” of Goal Pursuits: Human Needs and the Self‐Determination of Behavior. Psychological Inquiry, 11: 227–68.

Dee, T. S., and Keys, B. J. 2004. Does Merit Pay Reward Good Teachers? Evidence from a Randomized Experiment. Journal of Policy Analysis and Management, 23/3: 471–88.

Durant, R. F. 2009. Theory Building, Administrative Reform Movements, and the Perdurability of Herbert Hoover. American Review of Public Administration, 39/4: 327–51.

Evan, W. M., and Zelditch, M. 1961. A Laboratory Experiment on Bureaucratic Authority. American Sociological Review, 26/6: 883–93.

Fehr, E., and Gächter, S. 1998. Reciprocity and Economics: The Economic Implications of Homo Reciprocans. European Economic Review, 42: 845–59.

—— —— 2008. Wage Differentials in Experimental Efficiency Wage Markets. In Handbook of Experimental Economics Results, volume 1, ed. C. R. Plott and V. L. Smith. Amsterdam: North Holland.

Frey, B. S. 1997. On the Relationship between Intrinsic and Extrinsic Work Motivation. International Journal of Industrial Organization, 15/4: 427–39.

Friedman, D., and Sunder, S. 1994. Experimental Methods: A Primer for Economists. New York: Cambridge University Press.

Gächter, S., and Renner, E. 2006. The Effects of (Incentivized) Belief Elicitation in Public Good Experiments. CeDEx Discussion Paper No. 2006‐16, University of Nottingham, September.

Gaines, B. J., Kuklinski, J. H., and Quirk, P. J. 2007. The Logic of the Survey Experiment Reexamined. Political Analysis, 15/1: 1–20.

Harrison, G. W., and List, J. A. 2004. Field Experiments. Journal of Economic Literature, 42/4: 1009–55.

Harsanyi, J. C. 1967–8. Games with Incomplete Information Played by Bayesian Players. Management Science, 14/3: 159–82, 320–34, 486–502.

(p. 808) Henrich, J. P. 2000. Does Culture Matter in Economic Behavior? American Economic Review, 90/4: 973–9.

—— Boyd, R., Bowles, S., Camerer, C. F., Fehr, E., and Gintis, H. 2004. Foundations of Human Sociality: Economic Experiments and Ethnographic Evidence from Fifteen Small‐Scale Societies. New York: Oxford University Press.

Holland, P. W. 1986. Statistics and Causal Inference. Journal of the American Statistical Association, 81/396: 945–60.

Imai, K. 2005. Do Get‐Out‐the‐Vote Calls Reduce Turnout? The Importance of Statistical Methods for Field Experiments. American Political Science Review, 99/2: 283–300.

Kaboolian, L. 1998. The New Public Management: Challenging the Boundaries of the Management vs. Administration Debate. Public Administration Review, 58/3: 189–93.

King, G., Gakidou, E., Ravishankar, N., Moore, R. T., Lakin, J., Vargas, M., Téllez‐Rojo, M. M., Hernández Ávila, J. E., Hernández Ávila, M., and Llamas, H. H. 2007. A “Politically Robust” Experimental Design for Public Policy Evaluation, with Application to the Mexican Universal Health Insurance Program. Journal of Policy Analysis and Management, 26/3: 479–506.

Kipnis, N. 2005. Chance in Science: The Discovery of Electromagnetism by H. C. Oersted. Science & Education, 14/1: 1–28.

Lazear, E. P. 2000. Performance Pay and Productivity. American Economic Review, 90/5: 1346–61.

Lee, S.‐Y., and Whitford, A. B. 2008. Exit, Voice, Loyalty, and Pay: Evidence from the Public Workforce. Journal of Public Administration Research and Theory, 18/4: 647–71.

Lohr, K. N., Brook, R. H., Kamberg, C. J., Goldberg, G. A., Leibowitz, A., Keesey, J., Reboussin, D., and Newhouse, J. P. 1986. Use of Medical Care in the Rand Health Insurance Experiment: Diagnosis‐ and Service‐Specific Analyses in a Randomized Controlled Trial. Medical Care, 24/9: S1–S87.

McLean Parks, J., and Conlon, E. J. 1995. Compensation Contracts: Do the Agency Theory Assumptions Predict Negotiated Agreements? Academy of Management Journal, 38: 821–38.

Miller, G. J., and Whitford, A. B. 2002. Trust and Incentives in Principal‐Agent Negotiations: The “Insurance/Incentive Trade‐Off.” Journal of Theoretical Politics, 14/2: 231–67.

Moe, T. M. 1987. An Assessment of the Positive Theory of Congressional Dominance. Legislative Studies Quarterly, 12/4: 475–520.

Myerson, R. B. 2004. Harsanyi's Games with Incomplete Information. Management Science, 50: 1818–24.

Perry, J. L. 1996. Measuring Public Service Motivation: An Assessment of Construct Validity. Journal of Public Administration Research and Theory, 6/1: 5–22.

—— Mesch, D., and Paarlberg, L. 2006. Motivating Employees in a New Governance Era: The Performance Paradigm Revisited. Public Administration Review, 66/4: 505–14.

Rabin, M. 1993. Incorporating Fairness into Game Theory. American Economic Review, 83/5: 1281–1302.

Rainey, H. G. 1983. Public Agencies and Private Firms: Incentive Structures, Goals, and Individual Roles. Administration and Society, 15/2: 207–42.

Roethlisberger, F. J., and Dickson, W. J. 1939. Management and the Worker. Cambridge, MA: Harvard University Press.

Rubin, D. B. 1974. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology, 66/5: 688–701.

Schuman, H., and Presser, S. 1981. Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. New York: Academic Press.

(p. 809) Smith, V. L. 1987. Experimental Methods in Economics. In The New Palgrave: A Dictionary of Economics, ed. J. Eatwell, M. Milgate, and P. Newman. New York: Stockton Press.

Taylor, F. 1912. The Principles of Scientific Management. In Scientific Management: Tuck School Conference, Dartmouth College. Hanover, NH: Amos Tuck School.

Weber, M. 1947. The Theory of Social and Economic Organization, trans. and ed. A. M. Henderson and T. Parsons. New York: Oxford University Press.

—— 1958. From Max Weber: Essays in Sociology, trans. and ed. H. H. Gerth and C. W. Mills. New York: Oxford University Press.

Weibel, A., Rost, K., and Osterloh, M. 2010. Pay for Performance in the Public Sector—Benefits and (Hidden) Costs. Journal of Public Administration Research and Theory, 20/2: 387–412.

Weingast, B., and Moran, M. 1983. Bureaucratic Discretion or Congressional Control? Regulatory Policymaking by the Federal Trade Commission. Journal of Political Economy, 91: 765–800.

Whyte, W. F. 1955. Money and Motivation: An Analysis of Incentives in Industry. Westport, CT: Greenwood Press. (p. 810)