Reasons to doubt new health aid study on fungibility

This post is by David Roodman, a research fellow at the Center for Global Development (CGD) in Washington, DC. A couple of weeks ago, researchers at the Institute for Health Metrics and Evaluation triggered a Richter-7 media quake with the release of a new study in the Lancet.

Here’s how the Washington Post cast the findings:

After getting millions of dollars to fight AIDS, some African countries responded by slashing their health budgets.

Laura Freschi at Aid Watch blogged it too.

I am not a global health policy wonk, and I don’t play one on this blog, but it may well be that I wrote the program that produced the headline numbers (for every dollar donors gave to governments to spend on health, governments cut their own health spending by $0.43–1.17).

I find the results generally plausible. I also don’t particularly believe them. Let me explain.

The results are plausible because it is easy to imagine that health aid is partly fungible: governments can take advantage of outside finance for health by shifting their own budget to guns, gyms, and schools. I would. Wouldn’t you? Well, maybe except for the guns part.

The results are dubious because it is an extremely hairy business to infer causation from correlations in cross-country data. That’s why Bill once sighed about the:

1 millionth attempt to resolve the relationship in a cross-country growth regression literature that is now largely discredited in academia.

The variable being explained here is not growth but recipient governments’ aid spending, which is admittedly less mysterious. But skepticism is still warranted. Consider:

  • The model may be wrong. The study assumes that aid received in a given year directly affects government spending only in that same year, even though the money could take longer to work its way through the pipeline, especially if recipients bank the aid to smooth its notorious volatility (hat tip to Mead Over; also see the Ooms et al. Lancet commentary).
  • The quantities of interest are health-aid-to-governments and government-health-spending-from-own-resources, the latter calculated as total government-health-spending minus health-aid-to-governments (yes, the variable I just mentioned). So if health-aid-to-governments were systematically overestimated for some countries and years, government-health-spending-from-own-resources would automatically be underestimated. For example, suppose the study is wrong and there is no relationship between health aid and governments’ health spending from their own resources. Suppose too that health aid to some countries, as measured, includes payments to expensive western consultants. That money would never reach the receiving government, resulting in an overestimate of actual aid receipts and an underestimate of how much governments contribute to their own health budgets. The analysis would then spuriously show higher health aid causing governments to slash their own health spending (see the sketch after this list). In another Lancet commentary, Sridhar and Woods list four possible sources of mismeasurement of this sort.
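To make the second point concrete, here is a minimal simulation. It is purely illustrative, not the study’s model or data, and every number and variable name in it is made up. Governments’ own health spending is constructed to be completely unrelated to aid, yet overstated aid receipts (think consultant fees booked as aid that never reaches the government) mechanically produce an apparent negative effect of aid on own spending.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # hypothetical country-year observations

true_aid = rng.gamma(2.0, 5.0, size=n)           # aid that actually reaches governments
own_spending = rng.gamma(4.0, 10.0, size=n)      # independent of aid by construction
overstatement = 0.3 * true_aid * rng.uniform(0.0, 1.0, size=n)  # e.g., consultant fees counted as aid

measured_aid = true_aid + overstatement          # what donor-side data report
total_spending = own_spending + true_aid         # what government budgets actually contain
measured_own = total_spending - measured_aid     # derived quantity: total minus measured aid

# Simple OLS slope of measured own spending on measured aid
slope = np.cov(measured_aid, measured_own)[0, 1] / np.var(measured_aid, ddof=1)
print(f"apparent effect of $1 of aid on own spending: {slope:+.2f} (true effect is 0)")
```

The slope comes out negative even though, by construction, aid has no effect on own spending: the overstatement enters measured aid with a positive sign and measured own spending with a negative sign, so the two move in opposite directions for a purely mechanical reason.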

Both of these problems are almost certainly present to some extent, and either one can create mirages of fungibility.

Understanding at least the latter problem, which undermines causal inference, the authors feed their data into a black box called “System GMM.” (They call it “ABBB,” using the initials of the people who invented it.) I am in an intimate, long-term relationship with System GMM, having implemented it in a popular computer program. I have worked to demystify System GMM and documented how, just by accepting the standard default choices when running the program, you can easily fail to prove causality while appearing to succeed. I can’t explain why without getting technical. That is not to say that only I know about the problem; it is well known among economists with a minimum of econometric competence, but not to everyone who actually uses the technique. Suffice it to say that I sometimes feel like this black box is a small time bomb that I have left ticking on the landscape of applied statistical work.
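To give one concrete flavor of those default choices, without claiming this is what the Lancet authors did: the standard, untruncated instrument set in difference and system GMM grows roughly quadratically with the number of time periods, and a bloated instrument collection is one way the estimator can appear to pass its own specification tests. The counting exercise below is a back-of-the-envelope sketch; exact counts depend on the software and the options chosen, and the function name is mine.

```python
def gmm_style_instrument_count(T: int, collapse: bool = False) -> int:
    """Count instrument columns for the differenced equation of a dynamic panel,
    for one predetermined variable, assuming every available lag (dated t-2 and
    earlier) is used for each period t. With `collapse`, one column is kept per
    lag distance instead of one per (lag, period) pair."""
    if collapse:
        return max(T - 2, 0)
    return sum(t - 2 for t in range(3, T + 1))  # = (T-1)(T-2)/2

for T in (5, 10, 20, 30):
    print(f"T={T:2d}: full set {gmm_style_instrument_count(T):3d} columns, "
          f"collapsed {gmm_style_instrument_count(T, collapse=True):2d}")
```

With only a few dozen countries in a typical cross-country panel, an instrument count that outruns the number of countries can weaken the very overidentification tests that are supposed to certify the results.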

Responsible use of this black box involves telling your readers how you set all the switches and dials on it, as well as running certain statistical tests of validity. The Lancet writers have not done these things (yet). Nor have they shared their full data set. So it is impossible to judge how well their claims about cause and effect are rooted in the data. If replicability is a sine qua non of science, then this study is not yet science.