Econometric methodology for human mating

I recently helped one of my single male graduate students in his search for a spouse. First, I suggested he conduct a randomized controlled trial of potential mates to identify the one with the best benefit/cost ratio. Unfortunately, all the women randomly selected for the study refused assignment to either the treatment or control groups, using language that does not usually enter academic discourse.

With the “gold standard” methods unavailable, I next recommended an econometric regression approach. He looked for data on a large sample of married women on various inputs (intelligence, beauty, education, family background, did they take a bath every day), as well as on output: marital happiness. Then he ran an econometric regression of output on inputs. Finally, he gathered data on available single women on all the characteristics in the econometric study and used the estimated regression to make an out-of-sample prediction of each woman’s marital happiness. He visited the lucky woman who had the best predicted value in the entire singles sample, explained to her how he calculated her nuptial fitness, and suggested they get married. She called the police.

After I bailed him out of jail, he seemed much more reluctant than before to follow my best practice techniques to find out “who works” in the marriage market. Much later, I heard that he had gotten married. Reluctantly agreeing to talk to me, he described an unusual methodology. He had met various women relying on pure chance, used unconscious instincts to identify one woman as a promising mate, she reciprocated this gut feeling, and without any further rigorous testing they got married.

OK, all of us would admit love is not a science. But there are many other areas where we don’t follow rational decision-making models, and instead skip right to a decision for reasons that we cannot articulate. A great book on this is by Gerd Gigerenzer, Gut Feelings: The Intelligence of the Unconscious. There is also the old idea that not all useful knowledge can be explicitly written down, but some of it is “tacit knowledge” (see any writings by Michael Polanyi).

Is the aid world more like love or science? Probably somewhere in between. Obviously, there is a BIG role for rigorous research to evaluate aid interventions. Yet going from research to implementation must also involve a lot of gut instincts and tacit knowledge. I know experienced aid workers who say that they can tell right away from a site visit whether the project is working or not.

I don’t know if this is true, but certainly implementation involves non-quantifiable factors like people who have complicated motivations and interactions. A manager of an aid project must figure out how to get these people to do what is necessary to get the desired results. The manager (who also has complicated motivations) must adjust when the original blueprint runs into unexpected problems, which again relies more on acquired tacit knowledge than on science. (How to keep the bed net project going when the nets were first impounded and delayed at customs, the truck driver transporting the nets got drunk and didn’t make the trip, the clinic workers are off at a funeral for one of their coworkers, the foreign volunteer is too busy writing a blog and smoking pot, and the local village head is insulted that he was not consulted on the bed net distribution.) Certainly something similar is true also in running a private business or starting a new one – there is no owner’s manual for entrepreneurship.

So for donors and managers of aid funds, is finding the right project to fund more like econometrics or is it more like falling in love? How about a bit of both?

Read More & Discuss

Why does aid hate critics, while medicine appreciates them?

Two stories ran today in the New York Times that showed the important role of critics in medicine. In the first, medical researchers found that the usual methods of screening for prostate and breast cancer were not as effective as previously advertised. Screening successfully identifies small tumors, and the rate of operating to remove such tumors has skyrocketed. But the screening regimen has failed to make much of a dent in the prevalence of large prostate and breast tumors, so its preventive value is not as great as previously thought. Many other researchers had already pointed out that there is no evidence that the relatively new PSA prostate screening test has reduced prostate cancer deaths (a message that failed to make it to my own doctor, who tells me I am definitely OK once the PSA comes back normal). To make things even worse, some of the operations on small tumors were unnecessary and even harmful: “They are finding cancers that do not need to be found because they would never spread and kill or even be noticed if left alone.” The American Cancer Society concluded that too much emphasis on screening “can come with a real risk of overtreating many small cancers while missing cancers that are deadly.”

In the second story, earlier reports of positive results of an AIDS vaccine trial are coming under more and more doubt. The issue is one very familiar to any statistical researcher – did the apparently positive results from the vaccine trial come from random fluctuations in noisy data, or were the positive outcomes definitely more than could have happened by chance? We have the arcane concept of “statistical significance” to answer this. The NYT ran a story a month ago on the same vaccine trial that suggested definite positive outcomes (“statistically significant”), while today’s story features critics of the original trial results who fear the results were just due to random noise (“not statistically significant”).
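For readers who want a concrete sense of what the statistical significance fight is about, here is a minimal sketch in Python (with invented numbers, not the actual trial data): a two-proportion test asking whether the gap in infection rates between the vaccine and placebo arms is bigger than random noise alone would plausibly produce.

```python
# Illustrative two-proportion z-test with invented numbers (NOT the actual
# trial data): did the vaccine arm really do better than placebo, or could
# the gap be random noise?
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for the difference in two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# e.g. 40 infections among 8,000 vaccinated vs. 60 among 8,000 on placebo
z, p = two_proportion_z(40, 8000, 60, 8000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p < 0.05 is the usual "significant" cutoff
```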

Suppose these critics were operating in the aid world. Aid defenders would accuse the critics of not being constructive – these studies were 100 percent negative (so what’s your plan for eliminating prostate cancer deaths, you fancy-pants researcher, if you don’t like ours?). They would accuse them of hurting the cause of financing cancer and AIDS treatment. The attacks on the critics might even get personal.

If this were the aid world, the mainstreamers would dismiss the arguments over statistical significance as some obscure academic quarrel that needn’t concern them. How do I know this? I have criticized Paul Collier on numerous occasions for failing to establish statistical significance for many of his aid & military intervention results. I have argued that he is doing “data mining,” which is pretty much the equivalent of producing lots of results on the AIDS vaccine and reporting only the positive results. But I have yet to find anyone who cares about these critiques – on the contrary, the whole American and British armies seem to base their strategies on Collier’s statistical results. In contrast, it’s almost comical to see the heroic lengths to which the writer Donald McNeil Jr. goes in the latest NYT AIDS vaccine story to explain statistical significance to NYT readers. He is saying, hey, you really have to get this if you want to know: Did the vaccine in the trial Work – or – Not.

The other feature of both stories is that both throw doubt on excessive confidence in simple panaceas – screening and vaccines. They suggest reality is more complex and that we need to think of new ways of attacking difficult problems like cancer and AIDS. If you are familiar with the aid world, you will recognize the exact analogy to how we discuss solving difficult problems like poverty.

So why does medicine welcome critics while aid hates them? Perhaps we aid critics are just not as good as the medical critics. Or perhaps it is because we care so much more whether medicine really works than whether aid or military intervention really works?

Read More & Discuss

The Perils of Not Knowing that You Don’t Know

Suppose you needed to get from New York to Washington for a personal emergency. An airline told you that the projected time of departure for your one-hour flight was 2pm this afternoon. Of course, there is some uncertainty about this. The unstated promise of the airline is that this uncertainty will be kept within familiar bounds, with delays up to an hour not unusual, and delays of several hours possible with unexpected weather or mechanical failure. But suppose the airline secretly knew that, because of a large set of unreliable links in the chain (a flight attendants’ union that might strike, a large risk of major engine failures throughout its fleet, and problems with the mechanics’ union that might prevent repairs), the possible delays were not in HOURS, but in DAYS. You would be angry at the airline for not telling you this, and once you knew how much they didn’t know, you would quickly drop the airline and take the train instead.

The moral of the story is that knowing how much uncertainty there is about a projection – that is, knowing how much the projector DOESN’T KNOW – is often more important than the projection. An “estimated departure time based on the best available information” is meaningless to an airline customer if the uncertainty is about DAYS rather than HOURS.

Not knowing that you don’t know is at the root of many recent disasters, starting with the crisis itself. Holders of opaque derivatives apparently didn’t know how leveraged and exposed they were to shocks like falling housing prices, nor did they know how uncertain the housing prices were. Outside of economics, not knowing what we didn’t know is one of many causes for bad outcomes in Iraq and Afghanistan.

Not knowing that you don't know was our concern about the World Bank's global poverty projections. In his response on this blog and on the New York Times blog Economix, Ravallion misses our point. It is the degree of uncertainty of his poverty projection that is unacceptable. He said: “Faced with all these perceived ‘impossibilities,’ Easterly and Freschi would apparently prefer to wait and see rather than take action when it is needed, based on the information available at the time.” Ravallion called our stance “analytic paralysis in the face of uncertainty.” He portrays us as unwilling to live with ANY uncertainty, which is ridiculous. Economics is filled with uncertainty.

But when the uncertainty is as large as it is with the poverty projections (for all the MAJOR reasons we pointed out, which Ravallion does not address*), then the implication is not “paralysis” but choosing actions that take into account the uncertainty. For example, you DON’T want to have a centralized bureaucracy like the World Bank allocate global poverty relief based on such wild uncertainty. It would be better to support local coping mechanisms (public or private) that flexibly respond at the local level using local knowledge about crisis effects like poverty, hunger, and dropping out of school.

An article in the Wall Street Journal illustrated well how uncertain theoretical economic predictions can be used in the very un-theoretical realm of influencing how scarce aid resources are directed, and to whom:

A joint development committee of the World Bank and the International Monetary Fund estimates the lingering financial crisis could drive an additional 90 million people around the world into "extreme poverty." To combat the staggering statistic, the World Bank is aggressively lobbying for its first capital increase in two decades.

Not only would it be the wrong response to channel poverty relief through the centralized bureaucracy, but confusion is here piled on uncertainty. The capital increase is NOT for lending to the poor countries, but to the middle-income ones, as was made clear in a number of other news stories which explained that the money would be shared between the IBRD, the branch of the Bank which lends to middle-income countries on near-commercial terms, and the IFC, which lends to companies.

This only strengthens our original argument that the poverty projection was a political exercise. We respect Ravallion’s academic work, but this exercise seems to be in a different category. Not knowing what you don’t know is indeed dangerous.

*The closest Ravallion comes is citing his paper that tests the assumption of distribution neutrality of growth ON AVERAGE, which still misses our point about variance around the average. His claim that what people thought would happen before the crisis is a good benchmark for what would have happened without the crisis simply begs the question of the accuracy of country growth forecasts – it does not respond to our concern that they are radically unreliable (with or without crises). Since his point is invalid, it remains true that his projection of the effect of the crisis on growth is based on no meaningful evidence.

Read More & Discuss

Martin Ravallion comments on "We must know how many are suffering, so let’s make up numbers"

The following is a response from Martin Ravallion, director of the Development Research Group of the World Bank, on last week's Aid Watch post, We must know how many are suffering, so let’s make up numbers. Pull your head out of the sand Bill Easterly!

Faced with all these perceived “impossibilities,” Easterly and Freschi would apparently prefer to wait and see rather than take action when it is needed, based on the information available at the time. Forecasting is impossible in their eyes. What then is possible? The crisis will probably be over before we will no longer need to make forecasts or estimates to fill in for missing data. Counterfactual analysis of the impact of a crisis is also deemed to be “impossible,” even though the pre-crisis expectations for growth in developing countries are a matter of public record—hardly impossible to know! My Economix article last week defended forecasting against this type of analytic paralysis in the face of uncertainty.

Easterly and Freschi also suggest that the numbers coming from the international agencies are a muddle. Granted there are differences, but Easterly and Freschi have manufactured a good deal of the perceived muddle by mixing forecasts of different things made at different times (and hence with different information). As they could have readily verified, the 89 million figure quoted in the World Bank’s G20 paper is the estimated impact of the crisis on the number of people living below $1.25 a day by the end of 2010 based on our latest growth forecasts, as of mid 2009. “Impact” is assessed relative to the pre-crisis trajectories, as expected at the beginning of 2008.

The uncertainty about these numbers is, of course, acknowledged. But they appear to be the best estimates we can currently make given the information available.

Read More & Discuss

We must know how many are suffering, so let’s make up numbers

As major world leaders jet from the UN General Assembly yesterday to the Pittsburgh G-20 today, the UN and World Bank have bombarded them with messages and statistics about the effect of the crisis on the global poor:

(1) We need to know how many are suffering where, so that help can be targeted to those in most need,

(2) Here are our precise numbers of how many additional poor have been created by the crisis,

(3) Since we based the numbers in (2) on thin evidence or no evidence whatsoever, you should also give us more money to expand our abuse of statistics.

Here are some examples to illustrate these three points:

(1) Secretary General Ban Ki-Moon announced yesterday, “We need to know who is being hurt, and where, so we can best respond.” He handed out a new UN report, “Voices of the Vulnerable” with lots of these numbers.

The need to know precise numbers is not so obvious, since the international aid system lacks any central authority that has the skill or power to redeploy aid resources from areas of less poverty to areas of increased poverty. Even in normal times, the relationship between level of poverty by country and aid given to that country (even correcting for quality of government) is not that strong.

(2) “Voices of the Vulnerable” says “in 2009 about 100 million more people will be trapped in extreme poverty ... than was anticipated before the ...crisis.” This figure is based on a World Bank paper prepared for the G20, which actually said “the crisis will leave an additional 89 million people in extreme poverty ... at the end of 2010.” This number is similar to the number in today’s FT op-ed by WB President Zoellick, except that he said that 90 million had already been pushed into poverty by “food, fuel and now financial crises” (i.e. 2007-2009).

The Bank’s 89 million claim, in turn, is based on a paper by Chen and Ravallion, which actually predicted “the crisis will add 53 million people to the 2009 count of the number of people living below $1.25 a day.” The “Voices of the Vulnerable” report also cites figures from the ILO that “as many as 222 million additional workers worldwide run the risk of joining the ranks of the extreme working poor over the period 2007–2009.”

So precise estimates guide us to redeploy resources to the 100 million, or 89 million, or 53 million, or 222 million that were driven into poverty either in 2009, or 2009-2010, or 2007-2009, or 2008-2009.

There is an obscure piece of theoretical statistics called “garbage in, garbage out.” Calculating “additional poor in poverty due to crisis” requires (a) knowing what growth would have been in absence of crisis in every country, (b) knowing what growth will actually turn out to be in 2009 or 2010 in every country, not to mention in 2008, since the World Bank’s World Development Indicators do not yet have estimates for that year, (c) having good data on the current level of world poverty, (d) knowing the effect of growth on poverty, (e) projecting the effect of food and fuel prices on poverty, not to mention projecting food and fuel prices.

The reality: (a) is impossible, (b) is almost impossible, (c) Voices of the Vulnerable says the last real global poverty numbers were in 2005, which themselves reflected an upward revision of 40 percent, (d) is unreliable and volatile, and (e) is impossible.

Economists can do useful projections sometimes, but the castles in the air implied by (a) through (e) should have caused a responsible analyst to NOT invent such a number.
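To see why the compounding matters, here is a toy Monte Carlo sketch (all parameter values are invented for illustration; this is not the Bank’s actual model) that pushes modest uncertainty in the counterfactual growth rate, the actual growth rate, the growth-poverty elasticity, and the baseline poverty count through to the headline “additional poor” number.

```python
# Toy Monte Carlo: propagate uncertainty in (a) counterfactual growth,
# (b) actual growth, (c) the baseline poverty count, and (d) the
# growth-poverty elasticity. All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
draws = 100_000

baseline_poor = rng.normal(1.2e9, 0.15e9, draws)     # (c) baseline count (itself revised ~40% in the past)
growth_no_crisis = rng.normal(0.06, 0.02, draws)     # (a) counterfactual growth
growth_with_crisis = rng.normal(0.02, 0.02, draws)   # (b) actual/forecast growth
elasticity = rng.normal(-2.0, 0.7, draws)            # (d) % change in poverty per % of growth

extra_poor = baseline_poor * elasticity * (growth_with_crisis - growth_no_crisis)

lo, mid, hi = np.percentile(extra_poor, [5, 50, 95]) / 1e6
print(f"'Additional poor': {mid:.0f} million (90% range: {lo:.0f} to {hi:.0f} million)")
```

Under these made-up inputs, the range around the headline number is enormous, which is the whole point.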

Unfortunately, the made-up poverty numbers look positively respectable compared to other claims in the UN Voices of the Vulnerable that are based on no known statistics whatsoever:

“Women and children are likely to bear the brunt of the crisis … depression and drug and alcohol abuse could be on the rise … [including] consumption of strong local brews … [There are] rises in domestic violence … [There are] increased social tensions within communities.”

(3) In the same report, the Secretary-General admits: “More than a year in, what we do not know overshadows what we know.” But he offers to produce: “a networked 21st Century global system for real-time monitoring of the impacts of this and future global crises on the most vulnerable and poor: a Global Impact and Vulnerability Alert System (GIVAS). This system will require … resources.”

So ... give more “resources” to serial inventers of numbers to invent more numbers.

Why does all this matter? Because serious numbers are useful in analyzing how best to help alleviate poverty. The onslaught of imaginary numbers weakens that cause while accomplishing nothing for the poor.

Read More & Discuss

Institutions are the secret to development, if only we knew what they were

[Image: a sign reading “No shoes, no shirt, no service.”]

Here’s an example of a simple rule. But is it as simple as it seems? A literal reading of the rule would ban a woman wearing a dress and sandals from entering the store, while it would allow either gender to wear a shirt, shoes, and nothing else. In a Northern beach town in the winter, this rule would be irrelevant. In the same beach town during the summer, if it were particularly carefree, the rule might be ignored.

This example gives a private rule, but it’s a good metaphor for official legal rules. All rules (such as those that make up a legal system) interact with non-rule factors, in this case the population’s clothing habits, the climate, an understanding of the intent of the rule-makers, and the degree of compliance by the population. So when we measure an “institutional” variable such as “rule of law,” we are really measuring some complicated mix of the legal and non-legal. An econometric finding that “rule of law” causes higher per capita income (a) gives little confidence that we have identified a clear relationship between the legal system and development, and (b) gives no guidance on how to modify the legal system to make development more likely.

If I want to understand law and development, maybe I should get a good lawyer. Fortunately, I have one at NYU’s law school, Kevin Davis. He inspired all the thoughts above in a paper that mocks the “rule of law” concept used by economists (available in a preliminary ungated version here). You can also hear Kevin give a fantastic lecture on law and development on the occasion of his getting the high honor of being named the Beller Family Professor of Business Law.

Kevin points out that two current measures of “rule of law” used by economists in “institutions cause development” econometric research are by their own description a mixture of some characteristics of the legal system with a long list of non-legalistic factors such as “popular observance of the law,” “a very high crime rate or if the law is routinely ignored without effective sanction (for example, widespread illegal strikes),” “losses and costs of crime,” “corruption in banking,” “crime,” “theft and crime,” “crime and theft as obstacles to business,” “extent of tax evasion,” “costs of organized crime for business” and “kidnapping of foreigners.” Showing that this mishmash is correlated with achieving development tells you what exactly? Hire bodyguards for foreigners?

What if “institutions” are yet another item in the long list of panaceas offered by development economists that don’t actually help anyone develop?

Back to constructive thoughts on the next blog post!

Read More & Discuss

Official Global Development Network Management Response to our post on GDN Research

The following entry was written by the management of the Global Development Network in response to our June 15, 2009 post, A $3 million book with 8 readers? The impact of donor-driven research. The main point of our post was that GDN’s annual budget of $9 million had produced paltry results in publications, readers, or citations. We attributed this partly to a decision made early on to administer research through a bureaucracy rather than follow the meritocratic competition model of academic research.

The GDN response, which we received by email this week, is reproduced here verbatim:

A series of points raised in the article merit clarification for factual accuracy.

Objectives of GDN:

  • generation and sharing of multidisciplinary knowledge for the purpose of development
  • capacity building
  • networking (particularly South-South)
  • policy outreach

    In this context, the number of publications in international development journals measures outcomes in just one of the objectives.

    Knowledge Creation, Publications and Career Development:

    The GDN Independent Evaluation 2007 reported that:

    • a median of two publication types per grant for Regional Research Competitions (RRCs) and for Global Research Projects (GRPs)
  • on average one of every two grantees published in an international journal, two of three in a national journal, two of three as a book chapter, and just over one working paper per person
  • these figures represent upper bounds on GDN’s effect on knowledge creation, as GDN cannot take credit in every case due to attribution
  • researchers surveyed also report enhanced knowledge of their subject, increased visibility, positive impact on careers and valuable networking opportunities through their affiliation with GDN

    A recent review of GDN conducted by the World Bank’s Independent Evaluation Group in 2008 stated that:
    • GDN is supporting increased amounts of development research from within developing and transition countries
  • GDN-funded research has led to an increase in the dissemination of this research through papers, journal articles and books. (1)
  • In the 2006 capacity building pilot evaluation, external reviewers assessed much of GDN-supported research to be of publishable quality: (2)
  • with revision, 66% of papers in FY02 were judged to be publishable in refereed journals rising to almost 80% in FY05
  • a decline in number of papers not considered worthy of publication in current form from 36% in FY02 to 15% in FY05
  • demonstrated value of mentoring: if appropriately revised, 79% of papers were worthy of publication in refereed journals in FY05, a 30% increase from FY02.

    Citations:

    Several past studies noted evidence of higher numbers of citations in open access publications as opposed to expensive journals, although the evidence is not conclusive.

    Nonetheless, there is a problem of access to the scientific literature in developing countries. In Gaule (2009), we find, controlling for the quality and field of research, that the reference lists of Indian scientists are shorter, contain fewer references to expensive journals, and contain more references to open access journals than the reference lists of Swiss scientists. (…) The goal of open access advocates to have all scientific publications freely available to the world from the day of publication is laudable. But in the short run, it is more important to make scientific publications freely available for developing countries, because this is where the problem really is. (Source)

    GDN is committed to facilitating access, providing free of cost access to datasets and journals (J-STOR, Eldis); to all the GDN-funded research; as well as to almost 14,000 additional papers posted by researchers from developing and transition countries online in GDN’s Knowledge Base.

    Cost-Effective Core Activities:

    All GDN evaluations highlighted the cost effectiveness of its activities and the low overhead charged by the Secretariat (8-12%) compared to other similar organizations. The average cost per study (2004-2007) in three noted GRPs has been:

    • Bridging Research and Policy = $ 92,243
  • Impact of Rich Countries Policies on Poverty = $ 57,830
  • Multidisciplinary and Intermediation Research Initiative = $ 54,211

    Budget:

    The budget for the 3 year Explaining Growth Global Research Project was, according to GDN’s financial records, less than $2 million, including all thematic papers, case studies, capacity building workshops, mentoring fees (including Bill Easterly’s fee), publications and presentations at regional and global conferences.

    Funding:

    GDN is “not a World Bank-supported effort to promote development research.” GDN is only partly funded by the World Bank, with a declining share over time.

    Process

    GDN used regional networks to administer the project titled Explaining Growth (GRP) but grant allocation to researchers was competitive. Moreover, the Awards and Medals competition is an open competition with roughly 600-700 submissions annually. All RRCs and GRPs since Explaining Growth (the first one) have not been carried out in a “bureaucratic manner” but through open competitions, the modus operandi at GDN.

    Footnotes:

    (1) For example, GDN’s publication series, designed to give voice to researchers from developing and transition countries, has released 13 books to-date, with several edited volumes from GDN’s first Global Research Project on “Explaining Growth.” In partnership with the series’ publisher, Edward Elgar, GDN is able to make these books available for half price to individuals from developing and transition countries registered on its Knowledge Base, and copies of all publications under the series will soon be available electronically free of cost to registered users of GDN’s website.

    (2) Papers reviewed included all RRC papers available for FY02 and FY05 as well as papers randomly selected by the World Bank, produced through the other major GDN activities (GRPs, Awards and Medals Competition and Annual Conferences) in FY02 and FY05.

Read More & Discuss

    Development Experiments: Ethical? Feasible? Useful?

    A new kind of development research in recent years involves experiments: there is a “treatment group” that gets an aid intervention (such as a de-worming drug for school children), and a “control group” that does not. People are assigned randomly to the two groups, so there is no systematic difference between the two groups except the treatment. The difference in outcomes (such as school attendance by those who get deworming vs. those who do not) is a rigorous estimate of the effect of treatment. These Randomized Controlled Trials (RCTs) have been advocated by leading development economists like Esther Duflo and Abhijit Banerjee at MIT and Michael Kremer at Harvard. Others have criticized RCTs. The most prominent critic is the widely respected dean of development research and current President of the American Economic Association, Angus Deaton of Princeton, who released his Keynes lecture on this topic earlier this year, “Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development.” Dani Rodrik and Lant Pritchett have also criticized RCTs.
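For readers who have not seen the mechanics, here is a minimal simulated example (with invented numbers, not data from any actual study) of the design just described: randomize individuals to treatment or control, compare average outcomes, and attach a standard error to the difference.

```python
# Minimal simulated RCT (invented numbers): randomize, compare means,
# and report the difference in outcomes with a standard error.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
treated = rng.random(n) < 0.5                        # random assignment to treatment

# invented data-generating process: treatment raises attendance by 5 points
attendance = 70 + 5 * treated + rng.normal(0, 15, n)

diff = attendance[treated].mean() - attendance[~treated].mean()
se = np.sqrt(attendance[treated].var(ddof=1) / treated.sum()
             + attendance[~treated].var(ddof=1) / (~treated).sum())
print(f"estimated treatment effect = {diff:.1f} points (SE {se:.1f})")
```

Because assignment is random, the difference in means is an unbiased estimate of the treatment effect; the debate below is about everything that comes after that.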

    To drastically oversimplify and boil down the debate, here are some criticisms and responses:

    1. What are you doing experimenting on humans?

    Denying something beneficial to some people for research purposes seems wildly unethical at first. RCT defenders point out that there is never enough money to treat everyone and drawing lots is a widely recognized method for fairly allocating scarce resources. And when you find something works, it will then be adopted and be beneficial to everyone who participated in the experiment. The same issues arise in medical drug testing, where RCTs are mainly accepted. Still, RCTs can cause hard feelings between treatment and control groups within a community or across communities. Given the concerns of this blog with the human dignity of the poor, the researchers should at least be careful to communicate to the individuals involved what they are up to and always get their permission.

    2. Can you really generalize from one small experiment to conclude that something “works”?

    This is the single biggest concern about what RCTs teach us. If you find that using flipcharts in classrooms raises test scores in one experiment, does that mean that aid agencies should buy flipcharts for every school in the world? Context matters – the effect of flipcharts depends on the existing educational level of students and teachers, availability of other educational methods, and about a thousand other things. Plus, implementing something in an experimental setting is a lot easier than having it implemented well on a large scale. Defenders of RCTs say you can run many experiments in many different settings to validate that something “works.” Critics worry about the feeble incentives for academics to do replications, and say we have little idea how many or what kind of replications would be sufficient to establish that something “works.”

    3. Can you find out “what works” without a theory to guide you?

    Critics say this is the real problem with issue #2. The dream of getting pure evidence without theory is usually unattainable. For example, you need a theory to guide you as to what determines the effect of flipcharts to have any hope of narrowing down the testing and replications to something manageable. The most useful RCT results are those that confirm or reject a theory of human behavior. For example, a general finding across many RCTs in Africa is that demand for free life-saving products collapses once you charge a price for them (even a low subsidized price). This refutes the theory that fully informed people are rationally purchasing low cost medical inputs to improve their health and working capacity. This would usefully lead to further testing of whether the problem is lack of information or the assumption of perfect rationality (the latter is increasingly questioned for rich as well as poor people).

    4. Can RCTs be manipulated to get the “right” results?

    Yes. One could search among many outcome variables and many slices of the sample for results. One could investigate in advance which settings were more likely to give good results. Of course, scientific ethics prohibit these practices, but they are difficult to enforce. These problems become more severe when the implementing agency has a stake in the outcome of the evaluation, as could happen with an agency whose program will receive more funding when the results are positive.

    5. Are RCTs limited to small questions?

    Yes. Even if problems 1 through 4 are resolved, RCTs are infeasible for many of the big questions in development, like the economy-wide effects of good institutions or good macroeconomic policies. Some RCT proponents have (rather naively) claimed RCTs could revolutionize social policy, making it dramatically more effective – a claim that itself, ironically, cannot be tested with RCTs. Otherwise, embracing RCTs has led development researchers to lower their ambitions. This is probably a GOOD thing in foreign aid, where outsiders cannot hope to induce social transformation anyway and just finding some things that work for poor people is a reasonable outcome. But RCTs are usually less relevant for understanding overall economic development.

    Overall, RCTs have contributed positive things to development research, but RCT proponents seem to oversell them and seem to be overly dogmatic about this kind of evidence being superior to all other kinds.

    Read More & Discuss

    A $3 million book with 8 readers? The impact of donor-driven research

    One of aid donors’ less discussed activities is financing research. The Global Development Network (GDN) is probably the best known example: a World Bank-sponsored effort to promote development research by researchers in the developing world, founded in December 1999, with an annual budget now of over $9 million (roughly the same as the entire annual budget for all National Science Foundation (NSF) funding of economics research). In GDN’s own words, it exists to “promote research excellence in developing countries.” It has attracted contributions also from many bilateral aid agencies and from the Gates Foundation. (Meanwhile, scholarship programs for Africans and individuals from other under-represented regions to achieve their own academic success are chronically under-funded.) How excellent is GDN research? It is difficult to measure, but there are two common measures of research quality in academia (which affect big things like tenure decisions, as in “publish or perish”): publication in peer-reviewed journals and citations by other publications. Because it takes a while to accumulate citations and publications, we thought we would look at papers and books produced in the early years of the GDN and see what happened with subsequent publications and citations.

    Surprisingly, GDN was unable to provide us with a list of publications that had resulted from GDN-sponsored research, nor did any of several outside evaluations put together such a list. So we unleashed Aid Watch’s crack one-woman investigative team, who assembled the record on publications and citations from two types of GDN outputs: (1) the GDN’s first Global Research Project “Explaining Growth”, and (2) papers that won Global Development Awards & Medals Competitions from GDN during the three years 2000-2002.

    The Explaining Growth project involved over 200 researchers from 2000 to 2005. Its 2002-2003 budget was $3 million. The publications from this project are a direct measure of GDN impact in this effort, since these publications would not have happened otherwise. The main publication was the 2003 book, also called Explaining Growth, which as of June 14, 2009 had received 8 citations in Google Scholar. There were other later books on explaining growth in the Commonwealth of Independent States (6 citations), Latin America (5 citations), South Asia (6 citations), and the Middle East (1 citation) – curiously, there was no book on Africa. The editors of these volumes typically had distinguished academic careers outside of GDN, with many more citations for their personal publications.

    There were 51 papers that won Global Development Awards & Medals from 2000-2002, resulting from a competition involving $1 million and about 2000 researchers. We chose to focus on these to make the number of publications manageable to follow, and to focus on the “cream of the crop.” This procedure is biased towards finding the highest quality publications. It also establishes only an upper bound on the GDN impact from this set of papers, since journal publication may have happened anyway. We tracked all 51 papers and found that they resulted in 5 tenure-quality journal publications (publication in a top general interest journal or field journal). Four of the five publications were by Latin American economists, a group that had already achieved plenty of academic success before GDN came along.

    Why was there so little academic success that seemed to result from GDN efforts? Perhaps a decision taken early on was partly to blame. As a 2004 World Bank evaluation put it:

    One model was to promote open competition among various institutions, with funding going to the most qualified institutions that had submitted research proposals. A second model centered on pre-selected institutions within regions, which would serve as regional hubs and nominate members to the board. The second model prevailed.

    Translation: they had to decide between competition on merit and a research bureaucracy, and they chose bureaucracy. As the GDN results illustrate, academic research cannot be planned by bureaucrats.

    Read More & Discuss

    Top 10 reasons to test “War, Guns, and Votes” for data mining

    Following up on a previous post on data mining, let’s examine one recent book as a possible candidate for tests of whether data mining could be a problem. Here are the top 10 reasons I chose this book:

    10. Oodles of regressions were run

    The author describes himself each morning

    wondering whether, during the previous evening, Pedro, or Anke, or Dominic, or Lisa, or Benedikt, or Marguerite has cracked whatever problem we had crashed into by the time I left for home.

    9. Oodles of control variables were tried

    ...range of possible causes drawn from across the social sciences. In addition to various characteristics of the economy, these include aspects of the country’s history, its geography, its social composition, and its polity.

    8. Weird conclusions about war

    mountains are dangerous...

    7. Sample was sliced up to get results

    Globally we find no effect of ethnic polarization. But in Africa ethnic polarization sharply increases risk.

    6. Very flexible specifications to get results

    If coup risk is high, military spending reduces risk…if coup risk is low, military spending increases risk...

    5. Previous results with same methodology didn’t pass the new data test

    Our previous results got overturned by the new data.

    4. Reverse causality makes every interpretation questionable

    I’ll let Nathan Fiala handle this one.

    3. Overconfidence in such statistical research as definitive

    The ideas in this book are all founded on statistical research.

    2. Author previously announced he was data mining:

    Table 1 presents the preferred reference model of conflict duration with eight variations. The reference model is reached after a series of iterations in which insignificant variables are deleted and variants of economic, social, geographic and historical explanatory variables are then tested in turn.

    1. A lot depends on the results

    The book often won’t let Africans vote, but it will let them experience Western military intervention.

    Read More & Discuss

    Maybe we should put rats in charge of foreign aid research

    Laboratory experiments show that rats outperform humans in interpreting data, which is why we have today the US aid agency, the Millennium Challenge Corporation. Wait, I am getting ahead of myself; let me explain.

    The amazing finding on rats is described in an equally amazing book by Leonard Mlodinow. The experiment consists of drawing green and red balls at random, with the probabilities rigged so that greens occur 75 percent of the time. The subject is asked to watch for a while and then predict whether the next ball will be green or red. The rats followed the optimal strategy of always predicting green (I am a little unclear how the rats communicated, but never mind). But the human subjects did not always predict green; they usually wanted to do better and predict when red would come up too, engaging in reasoning like “after three straight greens, we are due for a red.” As Mlodinow says, “humans usually try to guess the pattern, and in the process we allow ourselves to be outperformed by a rat.”
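A quick simulation (my own illustration of the experiment’s logic, not Mlodinow’s actual setup) shows why the rat’s strategy wins: always guessing the majority color beats matching your guesses to the observed frequencies.

```python
# The green/red experiment: always predicting green (the rat strategy)
# versus "probability matching" (guessing green 75% of the time).
import numpy as np

rng = np.random.default_rng(7)
trials = 100_000
ball_is_green = rng.random(trials) < 0.75

rat_correct = ball_is_green.mean()                    # always guess green
human_guess_green = rng.random(trials) < 0.75         # match the observed frequency
human_correct = (human_guess_green == ball_is_green).mean()

print(f"rat strategy:         {rat_correct:.1%} correct")    # about 75%
print(f"probability matching: {human_correct:.1%} correct")  # about 62.5% (= 0.75^2 + 0.25^2)
```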

    Unfortunately, spurious patterns show up in some important real world settings, like research on the effect of foreign aid on growth. Without going into any unnecessary technical detail, research looks for an association between economic growth and some measure of foreign aid, controlling for other likely determinants of economic growth. Of course, since there is some random variation in both growth and aid, there is always the possibility that an association appears by pure chance. The usual statistical procedures are designed to keep this possibility small. The convention is that we believe a result if there is only a 1 in 20 chance that the result arose at random. So if a researcher does a study that finds a positive effect of aid on growth and it passes this “1 in 20” test (referred to as a “statistically significant” result), we are fine, right?

    Alas, not so fast. A researcher is very eager to find a result, and such eagerness usually involves running many statistical exercises (known as “regressions”). But the 1 in 20 safeguard only applies if you only did ONE regression. What if you did 20 regressions? Even if there is no relationship between growth and aid whatsoever, on average you will get one “significant result” out of 20 by design. Suppose you only report the one significant result and don’t mention the other 19 unsuccessful attempts. You can do twenty different regressions by varying the definition of aid, the time periods, and the control variables. In aid research, the aid variable has been tried, among other ways, as aid per capita, logarithm of aid per capita, aid/GDP, logarithm of aid/GDP, aid/GDP squared, [log(aid/GDP) - aid loan repayments], aid/GDP*[average of indexes of budget deficit/GDP, inflation, and free trade], aid/GDP squared *[average of indexes of budget deficit/GDP, inflation, and free trade], aid/GDP*[quality of institutions], etc. Time periods have varied from averages over 24 years to 12 years to 8 years to 4 years. The list of possible control variables is endless. One of the most exotic I ever saw was: the probability that two individuals in a country belonged to different ethnic groups TIMES the number of political assassinations in that country. So it’s not so hard to run many different aid and growth regressions and report only the one that is “significant.”

    This practice is known as “data mining.” It is NOT acceptable practice, but this is very hard to enforce since nobody is watching when a researcher runs multiple regressions. It is seldom intentional dishonesty by the researcher. Because of our non-rat-like propensity to see patterns everywhere, it is easy for researchers to convince themselves that the failed exercises were just done incorrectly, and that they finally found the “real result” when they get the “significant” one. Even more insidious, the 20 regressions could be spread across 20 different researchers. Each of them obediently runs only one pre-specified regression; 19 of them do not publish a paper since they had no significant results, but the 20th one does publish a spuriously “significant” finding (this is known as “publication bias”).
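A small pure-noise simulation (invented data; the variable names are only placeholders) makes the arithmetic concrete: regress “growth” on 20 unrelated “aid” variables and, on average, roughly one regression will clear the 5 percent significance bar by chance alone.

```python
# Pure-noise illustration of data mining: "growth" and 20 "aid" variables
# are random and unrelated by construction, yet on average about 1 in 20
# regressions comes out "statistically significant" at the 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
countries, specs = 80, 20
growth = rng.normal(size=countries)

false_positives = 0
for _ in range(specs):
    aid = rng.normal(size=countries)                  # unrelated to growth by construction
    result = stats.linregress(aid, growth)
    if result.pvalue < 0.05:
        false_positives += 1

print(f"{false_positives} of {specs} noise regressions came out 'significant'")
```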

    But don’t give up on all damned lies and statistics; there ARE ways to catch data mining. A “significant result” that is really spurious will only hold in the original data sample, with the original time periods, and with the original specification. If new data becomes available as time passes, you can test the result with the new data, where it will vanish if it was spurious “data mining.” You can also try different time periods, or slightly different but equally plausible definitions of aid and the control variables.
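Continuing the toy example above (still pure noise, still invented data), here is roughly what the “new data” test looks like: take the specification that looked best in the original sample, pre-specify it, and re-run it on fresh data, where a spurious result will usually evaporate.

```python
# The "new data" test: the best-looking specification from a search over
# pure noise is re-estimated on fresh data. Invented data throughout.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
countries, specs = 80, 20
growth = rng.normal(size=countries)
aid_variants = [rng.normal(size=countries) for _ in range(specs)]

# "discover" the best-looking specification in the original sample
p_values = [stats.linregress(aid, growth).pvalue for aid in aid_variants]
best = int(np.argmin(p_values))
print(f"original sample: best p-value = {p_values[best]:.3f}")

# collect new data and re-run only that one pre-specified regression
new_growth = rng.normal(size=countries)
new_aid = rng.normal(size=countries)
retest = stats.linregress(new_aid, new_growth)
print(f"new data:        p-value = {retest.pvalue:.3f}")
```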

    So a few years ago, some World Bank research found that “aid works [raises economic growth] in a good policy environment.” This study got published in a premier journal, got huge publicity, and eventually led President George W. Bush (in his only known use of econometric research) to create the Millennium Challenge Corporation, which he set up precisely to direct aid to countries with “good policy environments.”

    Unfortunately, this result later turned out to fail the data mining tests. Subsequent published studies found that it failed the “new data” test, the different time periods test, and the slightly different specifications test.

    The original result that “aid works in a good policy environment” was a spurious association. Of course, the MCC is still operating; it may be good or bad for other reasons.

    Moral of the story: beware of these kinds of statistical “results” that are used to determine aid policy! Unfortunately, the media and policy community don’t really get this, and they take the original studies at face value (not only on aid and growth, but also in stuff on determinants of civil war, fixing failed states, peacekeeping, democracy, etc., etc.). At the very least, make sure the finding is replicated by other researchers and passes the “data mining” tests.

    In other news, anti-gay topless Christian Miss California could be a possible candidate for a new STD prevention campaign telling all right-wing values advocates: “abstain, or the left-wing media will catch you.”

    Read More & Discuss

    Can We Trust the World Bank to be a Knowledge Bank?

    Unfortunately, not really, according to Brown University Professor Ross Levine’s presentation at the Aid Watch conference “What would the poor say?” on February 6, 2009. Professor Levine argues that the Bank’s incentives to keep the lending money flowing trump any incentive to get the advice right on important issues.

    The World Bank acted to suppress data collection on bank regulation because it didn’t like the findings that were emerging (not that an obscure topic like financial regulation is that important). (2 minutes, 6 seconds):


    Ross Levine on a Bank Regulation Study at the World Bank from DRI on Vimeo.

    Was it that the Bank could not afford to spend $50,000 to collect data on what works in bank regulation? Well, they did manage to spend $4 million on a “Growth Report” whose usefulness Professor Levine regards as somewhere between that of a pet rock and a clueless bank regulator (54 seconds):


    Ross Levine on How Much is A Lot of Money for the World Bank? from DRI on Vimeo.

    Read More & Discuss

    MADE-UP MALARIA DATA ROUND 2: Gates Foundation responds, WHO graciously offers not to respond

    The modest aim of an initiative like Aid Watch is to be one more small voice holding aid agencies and foundations accountable for doing good things for poor people. The aim of more accountability is to induce improved behavior by those guys, so that aid will work better. The Aid Watch blog has already had its first small test of trying to induce accountability. This post took Bill and Melinda Gates to task for claiming in the Financial Times that foreign aid had big victories over malaria in countries like Rwanda and Ethiopia, because the WHO country data they based it on was made up and later contradicted by the WHO itself.

    The Gates Foundation did respond to this criticism, to their great credit (not directly, but that’s OK, it was visible enough in a response to the Chronicle of Philanthropy’s coverage of this controversy).

    What was their response to criticism for using invalid country data? Oops, they offered more invalid country data. The Gates Foundation spokesman offered the country data on Rwanda and Ethiopia from this journal article as defense for the Gateses’ claims on those countries' victories over malaria.

    What does the cited article actually say? “Districts and health facilities were not randomly selected, but constituted a (stratified) convenience sample, selecting those sites where intervention scale-up had been relatively rapid and successful … Therefore, estimated impacts cannot be extrapolated to the countries nation-wide.”

    Still, the Gates Foundation was a tad more responsive than the WHO, whose malaria chief first led the Gateses and the New York Times astray with false reports of victories over malaria based on made-up country data. The WHO then issued totally different data in its official 2008 Malaria Report a few months later, without ever retracting the claims behind the New York Times story.

    When Aid Watch’s intrepid investigator Laura Freschi approached the WHO for comment, she got the following response from the WHO Project Leader for Information Management & Communications, Epidemic and Pandemic Alert and Response (EPR):

    “Hello. I have received your emails and phone call. However, WHO does not participate in blog discussions.

    Thank you.”

    It may seem obsessive to insist on good data, but bad data costs lives. The sad thing is that there have been SOME victories against malaria, and that solid data on WHAT is working WHERE is vital to guide the campaign against this tragic disease. Would Americans put up with the CDC using made-up data to respond to a salmonella outbreak?

    I guess Aid Watch is going to have to work a LOT harder to do our part to get a bit more accountability.

    Read More & Discuss

    Did Bill and Melinda Gates Claim Malaria Victories Based on Phony Numbers?

    Tuesday’s Financial Times printed a Martin Wolf interview with the Gateses from Davos, available as a video on the FT web site. A sample quote from the interview:

    We’re trying to make sure that people understand this: aid is effective…So, for instance, malaria incidence is down in countries such as Zambia, Ethiopia, and Rwanda. It’s down in some countries by over 50 percent and some by 60 percent…[if we and other donors] come in and distribute mosquito nets – 60m to date – that is how we have achieved these declines. So we are able to say, “Look, aid is making a huge difference, we are literally saving people’s lives."

    Real victories against malaria would be great, but false victories can mislead and distract critical malaria efforts. Alas, Mr. and Mrs. Gates are repeating numbers that have already been discredited. This story of irresponsible claims goes back to a big New York Times headline on February 1, 2008: “Nets and New Drug Make Inroads Against Malaria,” which quoted Dr. Arata Kochi, chief of malaria for the WHO, as reporting 50-60 percent reductions in deaths of children in Zambia, Ethiopia, and Rwanda, and so celebrated the victories of the anti-malaria campaign. Alas, Dr. Kochi had rushed to the press a dubious report. The report was never finalized by WHO, it promptly disappeared, and its specific claims were contradicted by WHO’s own September 2008 World Malaria Report, by which time Dr. Kochi was no longer WHO chief of malaria.

    (There was never a retraction in the New York Times, so perhaps Mr. and Mrs. Gates can be forgiven for being confused – although with most of the world’s public health professionals on Mr. and Mrs. Gates’ payroll you would think their briefers would have access to the most accurate information.)

    The September 2008 WHO Malaria Report keeps Rwanda as a success story (along with some other new success stories – not mentioned in the New York Times – like Sao Tome & Principe and Zanzibar), but Zambia and Ethiopia are gone: the effects of malaria control in Zambia were “less clear,” and in Ethiopia, “the expected effects” of malaria control are “not yet visible.”

    Digging deeper into the WHO Malaria Report, the standards for data on malaria are set so low, it is even more striking how the Kochi numbers – those numbers that fueled a February 2008 New York Times story and a February 2009 Gates claim – failed to meet even these low standards. The WHO says (in a small print footnote): “in most countries of Africa, where 86% cases occur, reliable data on malaria are scarce. In these countries estimates were developed based on local climate conditions, which correlate with malaria risk, and the average rate at which people become ill with the disease in the area.” Another stab at explaining their malaria numbers was: “From an empirical relationship between measures of malaria transmission risk and case incidence; this procedure was used for countries in the African Region where a convincing estimate from reported cases could not be made.” (Possible translation: we make the numbers up.)

    The shakiness of the numbers is visible when you look at them by country in the WHO Malaria Report. For the “success story” of Rwanda, there is an estimate of 3.3 million malaria cases in 2006, with an upper bound of 4.1 million and a lower bound of 2.5 million. But wait – another way to estimate cases, which is the one used to estimate trends, shows 1.4 million cases in 2006 (and this was an increase over the 2001-2003 average). Estimates of child malaria deaths in Rwanda are similarly all over the place – they do show a drop from 2001 to 2006, but the change is dwarfed by the vast imprecision conveyed by the lower and upper bounds.

    In another WHO success, Zanzibar (which, to be fair, Mrs. Gates also mentioned as a success in the interview), there seems to be more consensus on success from a combination campaign featuring indoor spraying of homes, insecticide-treated bed nets, and treatment of malaria patients with advanced drugs. It seems to be easier to make inroads against malaria on small islands. The American Journal of Tropical Medicine and Hygiene has published two articles, apparently using more rigorous data methods, suggesting successful malaria control in Sao Tome (also an island) and in a corridor spanning South Africa, Mozambique, and Swaziland.

    As for the country claims by the WHO and Mr. and Mrs. Gates, however, there seems to be mass confusion, and data that ranges from phony to made-up to shaky, about which interventions are responsible for which trends where. The WHO Malaria Report offers this ringing conclusion in its “Key Points” summary on how to control malaria:

    In general, however, the links between interventions and trends remain ambiguous, and more careful investigations of the effects of control are needed in most countries.

    Maybe the Gates Foundation should be funding more rigorous data collection. With all this effort to fight the tragedy of malaria, it’s even more tragic that the malaria warriors can’t even get accurate reports of who is sick and dying when and where.

    Read More & Discuss