Development Experiments: Ethical? Feasible? Useful?

A new kind of development research in recent years involves experiments: there is a “treatment group” that gets an aid intervention (such as a deworming drug for school children), and a “control group” that does not. People are assigned randomly to the two groups, so there is no systematic difference between the two groups except the treatment. The difference in average outcomes (such as school attendance by those who get deworming vs. those who do not) is therefore a rigorous estimate of the effect of treatment. These Randomized Controlled Trials (RCTs) have been advocated by leading development economists like Esther Duflo and Abhijit Banerjee at MIT and Michael Kremer at Harvard. Others have criticized RCTs. The most prominent critic is the widely respected dean of development research and current President of the American Economic Association, Angus Deaton of Princeton, who released his Keynes lecture on this topic earlier this year, “Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development.” Dani Rodrik and Lant Pritchett have also criticized RCTs.
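For readers who like to see the mechanics, here is a minimal sketch of the logic: assign people to treatment by lottery, then compare average outcomes. Everything in it is made up for illustration (the function names, the sample size, the "attendance days" scale, and the assumed true effect of 5 days); it is not drawn from any actual deworming study.

```python
import random
import statistics


def difference_in_means(outcomes, treated):
    """Estimate the treatment effect as mean(treatment) - mean(control)."""
    treat = [y for y, t in zip(outcomes, treated) if t]
    control = [y for y, t in zip(outcomes, treated) if not t]
    return statistics.mean(treat) - statistics.mean(control)


def simulate_trial(n=2000, true_effect=5.0, seed=0):
    """Simulate one hypothetical trial with a known (assumed) true effect."""
    rng = random.Random(seed)
    # Random assignment: half the sample is treated, chosen by shuffling.
    treated = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(treated)
    outcomes = []
    for t in treated:
        # Hypothetical baseline attendance (days per year), same
        # distribution in both groups because assignment is random.
        baseline = rng.gauss(100.0, 10.0)
        outcomes.append(baseline + (true_effect if t else 0.0))
    return outcomes, treated


outcomes, treated = simulate_trial()
estimate = difference_in_means(outcomes, treated)
print(f"estimated effect: {estimate:.2f} (assumed true effect: 5.0)")
```

Because the lottery decides who is treated, the baseline is the same on average in both groups, so the gap in means recovers the assumed effect up to sampling noise.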

To drastically oversimplify and boil down the debate, here are some criticisms and responses:

1. What are you doing experimenting on humans?

Denying something beneficial to some people for research purposes seems wildly unethical at first. RCT defenders point out that there is never enough money to treat everyone and drawing lots is a widely recognized method for fairly allocating scarce resources. And when you find that something works, it can then be adopted and benefit everyone who participated in the experiment. The same issues arise in medical drug testing, where RCTs are widely accepted. Still, RCTs can cause hard feelings between treatment and control groups within a community or across communities. Given the concerns of this blog with the human dignity of the poor, the researchers should at least be careful to communicate to the individuals involved what they are up to and always get their permission.

2. Can you really generalize from one small experiment to conclude that something “works”?

This is the single biggest concern about what RCTs teach us. If you find that using flipcharts in classrooms raises test scores in one experiment, does that mean that aid agencies should buy flipcharts for every school in the world? Context matters – the effect of flipcharts depends on the existing educational level of students and teachers, availability of other educational methods, and about a thousand other things. Plus, implementing something in an experimental setting is a lot easier than having it implemented well on a large scale. Defenders of RCTs say you can run many experiments in many different settings to validate that something “works.” Critics worry about the feeble incentives for academics to do replications, and say we have little idea how many or what kind of replications would be sufficient to establish that something “works.”

3. Can you find out “what works” without a theory to guide you?

Critics say this is the real problem with issue #2. The dream of getting pure evidence without theory is usually unattainable. For example, you need a theory of what determines the effect of flipcharts to have any hope of narrowing down the testing and replications to something manageable. The most useful RCT results are those that confirm or reject a theory of human behavior. For example, a general finding across many RCTs in Africa is that demand for life-saving products that people take up when they are free collapses once you charge a price for them (even a low subsidized price). This refutes the theory that fully informed people are rationally purchasing low cost medical inputs to improve their health and working capacity. This usefully points to further testing of whether the problem is lack of information or the assumption of perfect rationality (the latter is increasingly questioned for rich as well as poor people).

4. Can RCTs be manipulated to get the “right” results?

Yes. One could search among many outcome variables and many slices of the sample for results. One could investigate in advance which settings were more likely to give good results. Of course, scientific ethics prohibit these practices, but they are difficult to enforce. These problems become more severe when the implementing agency has a stake in the outcome of the evaluation, as could happen with an agency whose program will receive more funding when the results are positive.
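To see why searching among many outcome variables is a problem, consider a program with no effect at all. The sketch below computes the chance that at least one of several outcomes looks "significant" at the 5% level purely by luck. It assumes the outcomes are tested independently, which is a simplification, and the function names and the choice of 20 outcomes are hypothetical.

```python
import random


def chance_of_false_positive(n_outcomes, alpha=0.05):
    """P(at least one 'significant' result) when no outcome truly responds.

    Assumes the tests are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - alpha) ** n_outcomes


def simulate_false_positive_rate(n_outcomes, alpha=0.05, n_trials=10000, seed=0):
    """Check the formula by simulation.

    Under the null hypothesis a p-value from a continuous test is uniform
    on [0, 1], so each test is 'significant' with probability alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if any(rng.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / n_trials


analytic = chance_of_false_positive(20)
simulated = simulate_false_positive_rate(20)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")

# One common safeguard (Bonferroni): test each outcome at alpha / n_outcomes.
corrected = chance_of_false_positive(20, alpha=0.05 / 20)
print(f"with Bonferroni correction: {corrected:.3f}")
```

With 20 outcomes the chance of at least one spurious "result" is roughly two in three, which is why choosing the outcomes before seeing the data, or correcting for the number of tests, matters so much.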

5. Are RCTs limited to small questions?

Yes. Even if problems 1 through 4 are resolved, RCTs are infeasible for many of the big questions in development, like the economy-wide effects of good institutions or good macroeconomic policies. Some RCT proponents have (rather naively) claimed RCTs could revolutionize social policy, making it dramatically more effective. Ironically, this claim itself cannot be tested with RCTs. Otherwise, embracing RCTs has led development researchers to lower their ambitions. This is probably a GOOD thing in foreign aid, where outsiders cannot hope to induce social transformation anyway and just finding some things that work for poor people is a reasonable outcome. But RCTs are usually less relevant for understanding overall economic development.

Overall, RCTs have contributed positive things to development research, but RCT proponents seem to oversell them and seem to be overly dogmatic about this kind of evidence being superior to all other kinds.