Weighing the Costs and Benefits of Public Policy: On the Dangers of Single Metric Accounting

This article presents two related challenges to the idea that, to ensure policy evaluation is comprehensive, all costs and benefits should be aggregated into a single, equity-weighted wellbeing metric. The first is to point out how, even allowing for equity-weighting, the use of a single metric limits the extent to which we can take distributional concerns into account. The second challenge starts from the observation that in this and many other ways, aggregating diverse effects into a single metric of evaluation necessarily involves settling many moral questions that reasonable people disagree about. This raises serious questions as to what role such a method of policy evaluation can and should play in informing policy-making in liberal democracies. Ultimately, to ensure comprehensiveness of policy evaluation in a wider sense, namely, that all the diverse effects that reasonable people might think matter are kept score of, we need multiple metrics as inputs to public deliberation.

supposed to be based, as much as possible, on direct reports of how different outcomes are actually experienced when lived through. Here Dolan puts his cards on the table [1 p5]: What WELLBYs aim to capture is a hedonistic notion of wellbeing; a good life is one of good (pleasurable, purposeful) experiences.
Relative to a hedonistic standard of what a good life is, people's own hypothetical comparative judgements often come out biased. Using a hedonistic metric as a measure of costs and benefits in a standard cost-benefit analysis is a way of implementing a hedonistic form of utilitarianism, according to which the sum-total of subjective wellbeing should be maximised. Equity-weighting one's cost-benefit analysis, on the other hand, implements a hedonistic form of prioritarianism, according to which the experiences of the worst-off count for more.
The philosophical debate on hedonism as a theory of wellbeing, and on utilitarianism and prioritarianism, is vast and, it is safe to say, will never reach a consensus on the central issues. Instead of offering a critical discussion of these particular moral views and their application in public policy evaluation, I would like to take a step back and consider the more general idea that a single metric should be used in order to aggregate the many effects of public policies and to articulate the trade-offs involved in making a decision. I will offer two related challenges to this idea.
The first is to point out how the use of a single metric limits the extent to which we can take "distributional concerns" into account, making the moral commitments embodied in such approaches narrower than it might initially seem. The possibility of equity-weighting is often presented as a way to flexibly accommodate those with more egalitarian leanings, giving the appearance of a "broad church". But equity-weighting can only accommodate concerns about the distribution of the single metric that is being equity-weighted. And those with egalitarian leanings often also care about the distribution of other things (or indeed equality of a non-distributional kind).
The second challenge starts from the observation that in this, and in many other ways, the choice of a single metric and its implementation in the aggregation of diverse effects necessarily involves settling many moral questions that reasonable people (including even, sometimes, philosophers) disagree on. This raises serious questions as to what role such a method of policy evaluation can and should play in informing policy-making in liberal democracies. Ultimately, to ensure comprehensiveness of policy evaluation in a wider sense, namely, that all the diverse effects that reasonable people might think matter are kept score of, we need multiple metrics as an input into public deliberation.

What Else Might Matter
The hedonistic theory of individual wellbeing is only one of several major kinds of theories of what is good for individuals. Some have argued that what is good for you is to get the things you desire, or would desire under some ideal conditions, whether you desire pleasurable experiences or not. The pandemic and the policy responses to it around the world have massively frustrated people's life plans. According to desire-fulfilment theories, this would matter a great deal whether or not going through with those plans would have made people happy. Others have argued that there are some things that are good for people whether they enjoy or desire them or not. If education, engagement with the arts, regular gatherings with family, or direct contact with different ways of life are such things, again the pandemic and policy responses have undermined these in ways that a hedonistic wellbeing measure might not capture.
Using a metric for policy evaluation based on any one of the available theories of wellbeing means using a metric that risks leaving out or not doing full justice to some of the things at least some reasonable people take to be morally relevant. Moreover, many people believe that there are things other than wellbeing that matter, such as freedom (as is again very relevant during the pandemic), or the preservation of the environment for its own sake. In this respect, too, equity-weighted cost-benefit analysis using a broadly hedonistic wellbeing measure cannot capture everything deemed to be morally relevant by at least some parts of the population. Perhaps less obviously, I want to focus here on how even those who agree that the hedonistic theory is the correct theory of individual wellbeing, and that all else that matters is "distributional concerns", may not be satisfied that equity-weighted subjective wellbeing-based cost-benefit analysis captures everything that is morally relevant. This is because the framework can accommodate some, but not all, distributional concerns.
Just as there is lively debate on what the correct theory of individual wellbeing is, there is debate on the metric(s) of equality or priority: Equality of what or priority in access to what matters? Having a single metric for the purposes of policy evaluation, and accommodating distributional concerns through equity-weighting commits us to using that same single metric as our metric of equality or priority. In the proposal under discussion, what matters is an equitable distribution of subjective wellbeing, or giving greater weight to the subjective wellbeing of those who are worse-off in terms of subjective wellbeing. As Dolan puts the idea of equity-weighting, the claims individuals have to resources depend both on the gain in WELLBYs they can expect as a result of those resources, and on their current and expected lifetime suffering or wellbeing compared to others (as well as the WELLBY effects they have on others) [1].
But those who defend some form of distributional equality or priority often care about the distributions of things other than wellbeing (be it hedonic or not). For instance, resources, capabilities, or opportunities for wellbeing are alternative potential metrics of equality or priority. Again, it is easy to think of ways in which distributions of these alternative metrics may have been affected in ways not perfectly correlated with wellbeing itself during the ongoing pandemic, which would not be captured by the method of policy evaluation under discussion. For instance, the pandemic and policy response have affected, and have most likely diminished, the ways in which resources can be translated into wellbeing, but despite that, you might think it still matters that some simply have more than others. And the pandemic response has taken away many opportunities for welfare that people may not actually have made use of, but which you might think were nevertheless important for them to have as much of as those who did use them.
So, equity-weighted WELLBY-based cost-benefit analysis is narrower in the ways in which it can accommodate distributional concerns than it might initially seem. And in large part, this is down to the ambition of using a single metric to both capture expected harms and benefits, and to capture distributional concerns. Whatever your single measure of costs and benefits, an equity-weighted cost-benefit analysis can only take into account the distribution of that single metric, and will miss other distributional concerns that have been defended by prioritarians and egalitarians. A prominent branch of egalitarianism, social egalitarianism, is not even concerned directly with distribution at all, but rather with making sure that people can engage with each other as equals.
Finally, even granting a single metric for the purposes of capturing harms and benefits and for distributional concerns, there remains an important ambiguity about what kind of distributional concern equity-weighting involves in the context of risk; that is, when policies impose probabilities of harms and benefits on people, rather than certainties. In such contexts, is what matters the ex ante distribution of risks of harm and chances of benefit, or is it the ex post distribution of harms and benefits? Given how central risk is to the evaluation of policy responses to the COVID-19 pandemic, the next section will explain this ambiguity in some more detail, and argue that talk of a single metric of policy evaluation risks obfuscating the issue.

Distributing Harms, Distributing Risks
Consider the following choice problem loosely based on an example by Peter Diamond [16]. You are in charge of making sure one of your equally well-off flatmates, Amal or Bella, moves out, and this outcome would be equally bad for each. Do you: A. Choose Amal, B. Choose Bella, or C. Throw a fair coin, giving each a 50% chance of staying?
There seems to be an intuitive equity case for C, even though ex post, the outcomes of all three choices have the same wellbeing distribution. If you agree, this is likely because you think the distribution of chances of harms and benefits matters. To show how such an intuition might extend to a stylised policy case, suppose that as a policy-maker, you have to choose between the following two prospects for a population of ten million:
D. Everybody faces an additional 0.002% risk each of a loss of 30 WELLBYs.
E. One thousand people who are currently at welfare levels that are representative of the population at large face an additional 10% chance each of a loss of 30 WELLBYs.
Here, there seems to be an intuitive equity case for D, even though it is virtually certain that the loss of WELLBYs is larger in D and no more equally distributed ex post. If you agree, then again this seems to express concern for the distribution of risks of harm, which are much more concentrated on a few individuals in E.
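The arithmetic behind this comparison can be checked with a short sketch. The population size, risks and the 30-WELLBY loss are taken from the example; treating individual risks as statistically independent, and using a normal approximation to compare the numbers of people harmed, are assumptions added here purely for illustration.

```python
import math

HARM_WELLBYS = 30  # loss suffered by each person who is harmed (from the example)

# Option D: all ten million people face an additional 0.002% risk.
N_D, P_D = 10_000_000, 0.00002
# Option E: one thousand people face an additional 10% risk.
N_E, P_E = 1_000, 0.10


def expected_harmed(n_exposed, risk):
    """Expected number of people who end up suffering the harm."""
    return n_exposed * risk


def variance_harmed(n_exposed, risk):
    """Binomial variance; assumes individual risks are independent (an assumption)."""
    return n_exposed * risk * (1.0 - risk)


# Expected aggregate losses: roughly 6,000 WELLBYs under D and 3,000 under E.
expected_loss_d = expected_harmed(N_D, P_D) * HARM_WELLBYS
expected_loss_e = expected_harmed(N_E, P_E) * HARM_WELLBYS

# How much larger is each exposed person's risk in E than in D? Roughly 5,000 times.
risk_concentration = P_E / P_D

# Normal approximation to the chance that more people are harmed under D than under E.
mean_gap = expected_harmed(N_D, P_D) - expected_harmed(N_E, P_E)
sd_gap = math.sqrt(variance_harmed(N_D, P_D) + variance_harmed(N_E, P_E))
prob_d_harms_more = 0.5 * (1.0 + math.erf(mean_gap / (sd_gap * math.sqrt(2.0))))

print(f"Expected loss under D: {expected_loss_d:.0f} WELLBYs")
print(f"Expected loss under E: {expected_loss_e:.0f} WELLBYs")
print(f"Individual risk in E relative to D: {risk_concentration:.0f}x")
print(f"Approximate chance that D harms more people than E: {prob_d_harms_more:.6f}")
```

On these assumptions, D carries about twice the expected aggregate loss of E, while each exposed person in E bears a risk several thousand times larger than anyone does in D. That is the sense in which the intuitive case for D reflects a concern with how risks, rather than realised harms, are distributed.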
Cost-benefit analysis in the social welfare function tradition can implement equity-weighting in two main ways in the context of risk: It can either introduce equity weights on the ex ante expectations of harms and benefits a proposed policy imposes on individuals; or it can equity-weight the ex post distributions of harms and benefits in the population for each potential policy outcome, and recommend the policy option with the best expectation of equity-weighted outcomes. The recommendations of the two approaches can come apart, as they (likely) would in the two examples just described. The first strategy is sensitive to the distribution of risks, would recommend option C in the first case, and, with the right parameter choices, could recommend option D in the second case. The second strategy is insensitive to ex ante distributions of risks, and would be indifferent between the options in the first case, and recommend E in the second case.
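To make the contrast between the two strategies concrete, here is a minimal sketch applied to both examples. Everything beyond the numbers in the examples is a hypothetical choice made for illustration: the exponential form of the equity-weighting transform, the `concavity` parameter, the wellbeing levels assigned to staying and moving out, and the normalisation of baseline wellbeing to zero in the policy case. It is not meant to represent any particular author's preferred specification.

```python
import math


def priority_transform(wellbeing, concavity):
    """Strictly concave transform, so that the worse-off count for more.
    The exponential form and `concavity` are illustrative assumptions."""
    return -math.exp(-concavity * wellbeing)


def ex_ante_value(prospect, concavity):
    """Equity-weight each person's EXPECTED wellbeing, then sum over people."""
    n_people = len(prospect[0][1])
    expected = [sum(p * outcome[i] for p, outcome in prospect) for i in range(n_people)]
    return sum(priority_transform(w, concavity) for w in expected)


def ex_post_value(prospect, concavity):
    """Equity-weight each possible outcome, then take the expectation over outcomes."""
    return sum(p * sum(priority_transform(w, concavity) for w in outcome)
               for p, outcome in prospect)


# Flatmate case. Outcomes list wellbeing as (Amal, Bella); the levels are hypothetical.
STAY, OUT = 1.0, 0.0
option_a = [(1.0, (OUT, STAY))]                          # Amal moves out for sure
option_b = [(1.0, (STAY, OUT))]                          # Bella moves out for sure
option_c = [(0.5, (OUT, STAY)), (0.5, (STAY, OUT))]      # fair coin


# Policy case in closed form, with baseline wellbeing normalised to 0 (an assumption).
def ex_ante_loss(n_exposed, risk, harm, concavity):
    """Equity-weighted loss when weights apply to each person's expected wellbeing."""
    expected_harm = risk * harm
    per_person = priority_transform(0.0, concavity) - priority_transform(-expected_harm, concavity)
    return n_exposed * per_person


def ex_post_loss(n_exposed, risk, harm, concavity):
    """Expected equity-weighted loss when weights apply to realised wellbeing."""
    per_harmed = priority_transform(0.0, concavity) - priority_transform(-harm, concavity)
    return n_exposed * risk * per_harmed


for a in (0.1, 0.5):
    d = ex_ante_loss(10_000_000, 0.00002, 30, a)
    e = ex_ante_loss(1_000, 0.10, 30, a)
    print(f"concavity={a}: ex ante approach prefers {'D' if d < e else 'E'}")
print("ex post approach prefers",
      "D" if ex_post_loss(10_000_000, 0.00002, 30, 0.5) < ex_post_loss(1_000, 0.10, 30, 0.5) else "E")
print("flatmates, ex ante: C better than A?",
      ex_ante_value(option_c, 0.5) > ex_ante_value(option_a, 0.5))
```

In this sketch the ex post calculation cannot distinguish A, B and C, whereas the ex ante calculation ranks the coin toss highest. In the policy case, the ex ante ranking of D and E flips with the assumed degree of concavity, which illustrates why the recommendation of D depends on parameter choices.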
There is a lively debate about which of these two strategies is better. There are also ways to combine them. The point I want to raise here is not only that defence of an equity-weighted cost-benefit analysis is ambiguous on this morally important question. It is also that talk of a single metric may obfuscate the issue. Both the ex ante and the ex post approaches (and any combination between them) use a single wellbeing metric, and then merely proceed to combine this with probabilities and equity-weights in different ways. But in so doing, the ex ante approach implements a distributive concern for a currency different from wellbeing, namely chances of wellbeing, or conversely, risks of harm. If we think that the distribution of such chances and risks matters, that is, if we favour the ex ante or a mixed approach, then there are really two things we need to keep track of: The expected wellbeing distribution in the population as a consequence of policies, and how the risks of harms and chances of benefits are distributed in the population. If we don't, there is something else that our approach to policy evaluation does not keep track of that at least some people find morally relevant.
This issue is especially relevant in the pandemic given how central risk is in the management of the pandemic response. The virus itself poses a risk of death or serious adverse health outcomes that is much higher for some parts of the population than for others. And different potential policy responses differ in the extent to which they concentrate or spread risks of harmful outcomes, such as unemployment, within the population. If avoiding an unequal spread of risks of harm is acknowledged as a distinct policy goal, this may in some cases lead us to accept lower and no more equitably distributed expected aggregate wellbeing in the population ex post, as in the stylised policy example above.

The Problem of Reasonable Disagreement
The last two sections canvassed a number of things that matter morally, according to at least some of the people who have thought about them, but that the framework of policy evaluation proposed by Dolan will not account for, and that indeed any framework insisting on a single metric to capture both costs and benefits, as well as distributional concerns, will fail to account for. There are many more ways in which implementing an equity-weighted WELLBY-based cost-benefit analysis involves settling on specific answers to a number of contentious moral questions: What is the right theory of wellbeing? If it is hedonistic in general, which experiences count as bad, which as good? Is wellbeing all that matters? Whose wellbeing matters? What should the equity weights be? Should they be applied ex ante or ex post (or both)? And so on. These are all questions that reasonable, thoughtful and well-informed people disagree about. So, policy evaluation using this framework can only hope to be comprehensive in the sense that all effects of a policy are accounted for against the background of specific answers to these questions. It can't be comprehensive in the sense that it accounts for everything that at least some reasonable people take to be morally relevant.
This problem is not specific to Dolan's proposal, of course. It arises especially starkly for proposals that aspire to aggregate all potential effects of policies into a single metric, and output specific policy recommendations (rather than, say, simply present an array of potentially relevant considerations to policy-makers). Doing so must involve making judgements about what matters, how it matters, and how trade-offs between the things that matter are to be made. Of course, public decision-makers cannot get around making such judgements eventually. They have to choose, after all, and they must do so against a backdrop of reasonable disagreement in the population they aim to serve. The question is what role comprehensive frameworks of policy evaluation presented and advocated for by social scientists (such as the one under discussion) can and should play in this eventual decision-making process.
The danger, as I see it, is illustrated by this caricature: If social scientists were to simply present policy recommendations based on evaluations in terms of a single metric capturing many different effects to policy-makers and to the public without further context and qualification, this would not only mask all of the contentious moral decisions that went into the construction of that metric, but it would also endow the recommendation with the authority of scientific expertise, making it hard for public decision-makers to diverge from the recommendation. And that would be a threat to the liberal democratic ideal of how public decision-making in the face of reasonable disagreement should be done: Value conflicts should be resolved by democratically elected officials in a way that is open to public scrutiny. Of course, policy-making needs social scientific input, and recent philosophy of science is also rich in demonstrations that social science, just like any science, can't help but be value-laden. But there are clearly ways in which social scientists can make sure to help, rather than undermine, democratic decision-making. When it comes to comprehensive frameworks for policy evaluation like the one advocated by Dolan, there seem to be two main strategies for doing so.
One strategy is to work closely with the public and democratically elected officials to devolve as much as possible all important value judgements, so that the resulting recommendations would have democratic legitimacy. There are some suggestions in Dolan's text pointing towards this kind of strategy. For instance, he writes "[e]ven if we never end using a single metric as the final arbiter on what to do, the processes by which we discuss the data required to generate one, and debates about how to make the diverse array of human experiences commensurable with one another, will lead to policy decisions that better account for the myriad of ripple effects they generate" [1 p3]. Moreover, he suggests drawing on, and generating more evidence on, what the public thinks about various kinds of trade-off to inform, for instance, how equity-weights are set, while at the same time conceding that "[e]mpirical investigation of these issues can only get us so far. We need to ensure that the policymaking processes better reflect the myriad of concerns and impacts" [1 p11].
To assuage worries about a lack of democratic legitimacy of methods of evaluation that take sides on morally contentious questions, however, there would need to be democratic input on all the contentious value assumptions. And as the foregoing aimed to illustrate, these go far beyond the ways in which equity-weights are set, and also concern what the appropriate currency of distributive justice is to begin with, and much more. Ensuring the democratic legitimacy of every element of the analysis would be a large undertaking. And it is, moreover, not clear to me that such an undertaking would result in anything like an equity-weighted WELLBY-based cost-benefit analysis. For instance, there is some evidence that many people are reluctant to trade off especially large burdens against any number of smaller burdens, which is antithetical to this framework.
The alternative strategy involves presenting one's preferred framework of policy evaluation (or its specific applications) to policy-makers as only one of several reasonable ways of evaluating policy options. Rather than a way of settling what policy-makers should do, the analysis would be an input into public decision-making. To serve as a good basis for public discussion and eventual policy choice, there has to be transparency about all the value judgements that went into the assessment and that are inherent in the general framework. But if we care about comprehensiveness in the wider sense, that is, that there is proper accounting of all the things that reasonable people might find morally relevant, we also have to make sure that alternative frameworks and metrics are presented to the public, to enable there to be an informed public debate amongst people with different values, to reveal whether there are options that can be endorsed from any or most moral perspectives, and ultimately to give policy-makers informed options as to which values to pursue.
Of course, ensuring such wider comprehensiveness is not the responsibility of any one scientist or research team or even subfield, but rather of the scientific community at large and the science policy that sets its parameters. It is from this wider perspective that I think the call for a single metric of policy evaluation is problematic. From within some particular value frameworks (for instance, a hedonistic ex post prioritarian one), the call for a single metric of policy evaluation makes sense (assuming there is transparency about what goes into the metric) and can be a means of comprehensively aggregating everything morally relevant within that moral framework. But from the wider political perspective, where the goal should be to ensure that the outputs of policy-relevant social science enable and inform public discourse in the context of reasonable disagreement, what we need are multiple metrics and frameworks.

Conclusion
I have presented two related challenges to the idea that, to ensure policy evaluation is comprehensive, all costs and benefits should be aggregated into a single, equity-weighted metric. Firstly, the only distributional concerns such equity-weighting can accommodate concern the distribution of that single metric. But those with prioritarian or egalitarian leanings often care about the distributions of things other than our chosen metric of costs and benefits (subjective wellbeing in Dolan's case). Moreover, this is just one of many ways in which aggregating diverse effects into a single metric of policy evaluation involves settling on specific answers to controversial moral questions that reasonable people disagree on. The second challenge is that this raises serious questions as to what role such a method of policy evaluation, and advocacy for it by social scientists, can and should play in informing policy-making in liberal democracies. There is a wider sense of comprehensiveness of policy analysis, where the ideal is that everything that reasonable people might think is morally relevant is kept proper score of as an input into public deliberation and choice. Given that reasonable people disagree on many important questions of value, achieving such wider comprehensiveness requires the use of multiple metrics.