Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Nevins-Potti, Reinhart-Rogoff

There’s an interesting parallel between the Nevins-Potti debacle (a true debacle, in my mind) and the recent Reinhart-Rogoff kerfuffle. Both were exposed via some essentially small detail that had nothing to do with the real problem.

In the case of Reinhart-Rogoff, the Excel error was what made them look ridiculous, but it was in fact the “unconventional weighting” of the data that had the most dramatic effect. Academic economists had been debating and challenging the paper’s conclusions from the moment it came out, yet even when legitimate scientific concerns were raised, policy-makers and other academics were not convinced. As soon as the Excel error was revealed, though, everything needed to be re-examined.

In the Nevins-Potti debacle, Baggerly and Coombes wrote article after article pointing out all the problems and, for the most part, no one in a position of power really cared. The Nevins-Potti errors were real zingers too (switching the labels between people with disease and people without disease, for example), not some trivial Excel error. But in the end, it took Potti’s false claim of being a Rhodes Scholar to bring him down. Clearly, the years of academic debate beforehand were meaningless compared to lying on a CV.

In the Reinhart-Rogoff case, reproducibility was an issue and if the data had been made available earlier, the problems would have been discovered earlier and perhaps that would have headed off years of academic debate (for better or for worse). In the Nevins-Potti example, reproducibility was not an issue–the original Nature Medicine study was done using public data and so was reproducible (although it would have been easier if code had been made available). The problem there is that no one listened.

One has to wonder if the academic system is working in this regard. In both cases, it took a minor but _personal_ failing to bring down the entire edifice. But the protestations of reputable academics, challenging the research on the merits, were ignored. I’d say in both cases the original research conveniently said what people wanted to hear (debt slows growth, personalized gene signatures can predict response to chemotherapy), and so no amount of research would convince people to question the original findings.

One also has to wonder whether reproducibility is of any help here. I certainly don’t think it hurts, but in the case of Nevins-Potti, where the errors were shockingly obvious to anyone paying attention, the problems were deemed merely technical (i.e. statistical). The truth is, reproducibility will be most necessary in highly technical and complex analyses, where it’s often not obvious how the analysis was done. But if you do show a flaw in a complicated analysis, what good is it when your work will be written off as merely concerned with technical details (as if those weren’t important)? Most of the news articles surrounding Reinhart-Rogoff characterized the problems as complex and statistical (i.e. not important) rather than concerned with fundamental questions of interest.

In both cases, I think science was used to push an external agenda, and when the science was called into question, it was difficult to back down. I’ll write more in a future post about these kinds of situations and what, if anything, we can do to improve matters.

Podcast #7: Reinhart, Rogoff, Reproducibility

Jeff and I talk about the recent Reinhart-Rogoff reproducibility kerfuffle and how it turns out that data analysis is really hard no matter how big the dataset.

I wish economists made better plots

I’m seeing lots of traffic about a big-time economics article that failed to reproduce, so here are my quick thoughts. You can read a pretty good summary here by Mike Konczal.

Quick background: Carmen Reinhart and Kenneth Rogoff wrote an influential paper that was used by many to justify the need for austerity measures taken by governments to reduce debts relative to GDP. Yesterday, Thomas Herndon, Michael Ash, and Robert Pollin (HAP) released a paper in which they reproduced the Reinhart-Rogoff (RR) analysis and noted a few irregularities or errors. In their abstract, HAP claim that they “find that coding errors, selective exclusion of available data, and unconventional weighting of summary statistics [in the RR analysis] lead to serious errors that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies in the post-war period.”

It appears there were three points made by HAP: (1) RR excluded some important data from their final analysis; (2) RR weighted countries in a manner that was not proportional to the number of years they contributed to the dataset (RR used equal weighting of countries); and (3) there was an error in RR’s Excel formula which resulted in them inadvertently leaving out five countries from their final analysis.
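To get a feel for why point (2) can matter so much, here is a minimal sketch in R, using made-up numbers rather than the actual RR or HAP data. When one country contributes only a single unusual year to a debt category, equal country weights give that single year as much influence as another country’s twenty years.

```r
# Made-up illustration of equal country weights vs. country-year weights.
# Country A contributes 20 years of roughly 3% growth; country B contributes
# a single year of -7% growth in the same debt/GDP category.
set.seed(42)
growth <- data.frame(
  country = c(rep("A", 20), "B"),
  growth  = c(rnorm(20, mean = 3, sd = 1), -7)
)

# Equal weighting of countries (as described above for RR): average the
# per-country means, so B's one bad year counts as much as A's 20 years
country_means <- tapply(growth$growth, growth$country, mean)
mean(country_means)

# Weighting by country-years (closer to what HAP propose): pool all
# observations, so B's one year is just 1 of 21
mean(growth$growth)
```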

The bottom line is shown in HAP’s Figure 1, which I reproduce below (on the basis of fair use):

HAP Analysis

From the plot you can see that HAP’s adjusted analysis (circles) more or less coincides with RR’s analysis (diamonds) except for the last category, countries with debt/GDP ratios over 90%. In that category RR’s analysis shows a large drop in growth, whereas HAP’s analysis shows a more or less smooth decline (but still positive growth).

To me, it seems that the incorrect Excel formula is a real error, but an easily fixed one, and it appears to have had the least impact on the final analysis. The other two problems, which had far bigger impacts, might have some explanation that I’m not aware of; I am not an economist, so I’ll wait for others to weigh in. RR apparently do not comment on the exclusion of certain data points or on the weighting scheme, so it’s difficult to say what the thinking was, whether it was inadvertent or purposeful.

In summary, so what? Here’s what I think:

  1. Is there some fishiness? Sure, but this is not the Potti-Nevins scandal a la economics. I suppose it’s possible RR manipulated the analysis to get the answer austerity hawks were looking for, but we don’t have the evidence yet and this just doesn’t feel like that kind of thing.
  2. What’s the counterfactual? Or, what would have happened if the analysis had been done the way HAP propose? Would the world have embraced pro-growth policies by taking on a greater debt burden? My guess is no. Austerity hawks would have found some other study that supported their claims (and in fact there was at least one other).
  3. RR’s original analysis did not contain a plot like Figure 1 in HAP’s analysis, which I personally find very illuminating. From HAP’s figure, you can see that there’s quite a bit of variation across countries and perhaps an overall downward trend. I’m not sure I would have dramatically changed my conclusion if I had done the HAP analysis instead of the RR analysis. My point is that plots like this, which show the variability, are very important (see the sketch after this list).
  4. People see what they want to see. I would not be surprised to see some claim that HAP’s analysis supports the austerity conclusion because growth under high debt loads is much lower (almost 50%!) than under low debt loads.
  5. If RR’s analysis had been correct, should they have even made the conclusions they made? RR indicated that there was a “threshold” at 90% debt/GDP. My experience is that statements about thresholds are generally very hard to make, even with good data. I wonder what other more knowledgeable people think of the original conclusions.
  6. If the data had been made available sooner, this problem would have been fixed sooner. But in my opinion, that’s all that would have happened.
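Since I keep complaining about the plots, here is a rough sketch in R of the kind of figure I have in mind, using simulated data rather than the actual RR dataset: show the individual country-year observations within each debt/GDP category, not just the category averages.

```r
# Simulated data (not the RR dataset): growth vs. debt/GDP category,
# plotting the individual country-years along with the category summaries.
set.seed(1)
debt_cat <- factor(sample(c("0-30%", "30-60%", "60-90%", ">90%"),
                          500, replace = TRUE),
                   levels = c("0-30%", "30-60%", "60-90%", ">90%"))
growth <- rnorm(500, mean = c(4, 3, 3, 2.5)[as.integer(debt_cat)], sd = 3)

boxplot(growth ~ debt_cat,
        xlab = "Public debt / GDP category",
        ylab = "Real GDP growth (%)",
        main = "Growth vs. debt category (simulated)")
points(jitter(as.integer(debt_cat)), growth, pch = 16,
       col = rgb(0, 0, 0, 0.15))  # individual country-years, semi-transparent
```

A plot like this makes it obvious how much the country-year observations spread out within each category, which is exactly the variability that a table of category means hides.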

The vibe on the Internets seems to be that if only this problem had been identified sooner, the world would be a better place. But my cynical mind says, uh, no. You can toss this incident in the very large bucket of papers with some technical errors that are easily fixed. Thankfully, someone found these errors and fixed them, and that’s a good thing. Science moves on.

UPDATE: Reinhart-Rogoff respond.

UPDATE 2: Reinhart-Rogoff more detailed response.

Data science only poses a threat to (bio)statistics if we don't adapt

We have previously mentioned on this blog how statistics needs better marketing. Recently, Karl B. has suggested that “Data science is statistics” and Larry W. has wondered if “Data science is the end of statistics?” I think there are a few distinct types of data science, and each has a different relationship to the discipline of academic statistics:

  1. Data science as a marketing tool. Data analytics, data science, big data, etc. are terms that companies that already do something (IT infrastructure, consulting, database management, etc.) throw around to make themselves sound like they are doing the latest and greatest thing. These marketers are dabblers in what I would call the real “science of data,” or maybe deal with just one part of the data pipeline. I think they pose no threat to the statistics community other than by generating backlash, either by overpromising on the potential of data science or by diluting the term to the point of being almost nonsensical.
  2. Data science as business analytics. Another common use of “data science” is to describe the exact same set of activities that used to be performed by business analytics people, perhaps allowing for some growth in the size of the data sets. This might be a threat to folks who do statistics in business schools - although more likely it will benefit those programs as the need for business-oriented statisticians grows.
  3. Data science as big data engineering. Sometimes data science refers to people who do stuff with huge amounts of data. Larry refers to this in his post when he talks about people working on billions of data points. Most classically trained statisticians aren’t comfortable with data of this size. But at places like Google - where big data sets are routine - the infrastructure is built so that statisticians can access and compress the parts of the data that they need to do their jobs. I don’t think this is necessarily a threat to statistics, but we should definitely be integrating data access into our curriculum.
  4. Data science as a replacement for statistics. Some people (and I think it is a minority) mean exactly what statisticians do when they talk about data science: collecting, manipulating, and analyzing data, then making inferences about a population or predictions about what will happen next. This is, of course, a threat to statisticians. Some places, like NC State and Columbia, are tackling this by developing centers/institutes/programs with data science in the name. But I think that is a little dangerous. The data don’t matter - it is the problem you can solve with the data. So the key thing is that these institutes need to focus on solving real problems, not just churning out people who know a little R, a little SQL, and a little Python.

So why is #4 happening? I think one reason is reputation. Larry mentions that a statistician produces an estimate and a confidence interval, and maybe the confidence interval is too wide. I think he is on to something there, but the problem is bigger than that. As Roger has pointed out, statisticians often see themselves as referees rather than as scientists/business people. So a lot of people have had the experience of going to a statistician and feeling like they were criticized for bad experimental design, too small a sample size, etc. These issues are hugely important - but sometimes you have to make do with what you have. I think data scientists in category 4 are taking advantage of a cultural tendency of statisticians to avoid making concrete decisions.

A second reason is that some statisticians have avoided getting their hands dirty. “Hands clean” statisticians don’t get the data from the database, or worry about the data munging, or match identifiers, etc. They wait until the data are nicely formatted in a matrix to apply their methods. To stay competitive, we need to produce more “hands dirty” statisticians who are willing to go beyond schlep blindness and handle all aspects of a data analysis. In academia, we can encourage this by incorporating more of those issues into our curriculum.
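As a small, hypothetical example of what I mean by “hands dirty” work (the table and column names here are made up), pulling data out of a database and matching identifiers in R might look something like this:

```r
# Hypothetical sketch: get data from a database and match identifiers,
# rather than waiting for a nicely formatted matrix.
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# stand-ins for tables that would normally already live in the database
dbWriteTable(con, "measurements",
             data.frame(subject_id = c("S01", "S02", "S03"),
                        value = c(1.2, 3.4, 2.8)))
dbWriteTable(con, "demographics",
             data.frame(id = c("S01", "S02", "S04"),
                        age = c(34, 51, 29)))

measurements <- dbGetQuery(con, "SELECT * FROM measurements")
demographics <- dbGetQuery(con, "SELECT * FROM demographics")
dbDisconnect(con)

# match identifiers across the two tables; the mismatches (S03, S04)
# surface immediately instead of silently disappearing downstream
merged <- merge(measurements, demographics,
                by.x = "subject_id", by.y = "id", all = TRUE)
merged
```

None of this is statistically deep, but it is exactly the part of an analysis that “hands clean” statisticians never see.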

Finally, I think statisticians’ focus on optimality hurts us. Our field grew up in an era when data were scarce and we had to squeeze every last ounce of information out of what little data we had. Those constraints led to a cultural focus on optimality to a degree that is no longer necessary when data are abundant. In fact, an abundance of data is often unreasonably effective even with suboptimal methods. “Data scientists” understand this and shoot for the 80% solution that is good enough in most cases.

In summary, I don’t think statistics will be killed off by data science. Most of the hype around data science is actually somewhat removed from our field (see above). But I do think it is worth considering some changes that reposition our discipline as the most useful one for answering questions with data. Here are some concrete proposals:

  1. Remove some theoretical requirements and add computing requirements to statistics curricula.
  2. Focus on statistical writing, presentation, and communication as a main part of the curriculum.
  3. Focus on positive interactions with collaborators (being a scientist) rather than immediately going to the referee attitude.
  4. Add a unit on translating scientific problems to statistical problems.
  5. Add a unit on data munging and getting data from databases.
  6. Integrate real and live data analyses into our curricula.
  7. Make all our students create an R package (a data product) before they graduate.
  8. Most important of all, have a “big tent” attitude about what constitutes statistics.

 

Sunday data/statistics link roundup (4/14/2013)

  1. The most influential data scientists on Twitter, featuring Amy Heineike, Hilary Mason, and a few other names familiar to readers of this blog. In other news, I love reading lists of the “Top K ___” as much as the next person. I love them even more when they are quantitative (the list above isn’t) - even when the quantification is totally bogus. (via John M.)
  2. Rod Little and our own Tom Louis over at the Huffingtonpost talking about the ways in which the U.S. Census supports our democracy. It is a very good piece and I think highlights the critical importance that statistics and data play in keeping government open and honest.
  3. An article about the growing number of fake academic journals and their potentially predatory practices. I think I’ve been able to filter out the fake journals/conferences pretty well (if they’ve invited 30 Nobel Laureates, it’s probably fake). But this poses a big societal problem: how do we tell real science from fake science if we don’t have inside knowledge about which journals are real? (via John H.)
  4. [Trip history data](https://www.capitalbikeshare.com/trip-history-data) on the DC Capital Bikeshare. One of my favorite things is when a government organization just opens up its data. The best part is that the files are formatted as csv’s. Clearly someone who knows that the best data formats are open, free, and easy to read into statistical software. In other news, I think one of the most important classes that could be taught is “How to share data 101” (via David B.)
  5. A slightly belated link to a remembrance of George Box. He was the one who said, “All models are wrong, but some are useful.” An absolute titan of our field.
  6. Check out these cool logotypes for famous scientists. I want one! Also, see the article on these awesome minimalist posters celebrating legendary women in science. I want the Sally Ride poster on a t-shirt.
  7. As an advisor, I aspire to treat my students/postdocs like this. (@hunterwalk). I’m not always so good at it, but those are some good ideals to try to live up to.