Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday data/statistics link roundup (1/27/2013)

  1. Wisconsin is [decoupling the education and degree granting components](http://marginalrevolution.com/marginalrevolution/2013/01/the-wisconsin-revolution.html) of education. This means if you take a MOOC like [mine](https://www.coursera.org/course/dataanalysis), [Brian’s](https://www.coursera.org/course/biostats) or [Roger’s](https://www.coursera.org/course/compdata) and there is an equivalent class to pass at Wisconsin, you can take the exam and get credit. This is big. (via Rafa)
  2. A really cool MLB visualisation done with d3.js and Crossfilter. It was also prototyped in R, which makes it even cooler. (via Rafa via Chris V.)
  3. Harvard is encouraging their professors to only publish in open access journals and to resign from closed access journals. This is another major change and bodes well for the future of open science (again via Rafa - noticing a theme this week?).
  4. This deserves a post all to itself, but Greece is prosecuting a statistician for analyzing data in a way that changed their deficit figure. I wonder what the folks at the International Year of Statistics think about that? (via Alex N.)
  5. Be on the twitters at 10:30AM Tuesday and follow the hashtag #jhsph753 if you want to hear all the crazy stuff I tell my students when I’m running on no sleep.
  6. Thomas at StatsChat is fed up with Nobel correlations. Although I’m still partial to the length of country name association.

My advanced methods class is now being live-tweeted

A student in my class is going to be live-tweeting my (often silly/controversial) comments in the advanced/Ph.D. data analysis and methods class I’m teaching here at Hopkins. The hashtag is #jhsph753 and the class runs from 10:30am to 12:00PM EST. Check it out here.

Why I disagree with Andrew Gelman's critique of my paper about the rate of false discoveries in the medical literature

With a colleague, I wrote a paper titled “Empirical estimates suggest most published medical research is true”, which we quietly posted to the arXiv a few days ago. I posted it to the arXiv in the interest of open science and because we didn’t want to delay the dissemination of our approach during the long review process. I didn’t email anyone about the paper or talk to anyone about it, except my friends here locally.

I underestimated the internet. Yesterday, the paper was covered in this piece on the MIT Tech Review. That exposure was enough for the paper to appear in a few different outlets. I’m totally comfortable with the paper, but was not anticipating all of the attention so quickly.

In particular, I was a little surprised to see it appear on Andrew Gelman’s blog with the disheartening title, “I don’t believe the paper, “Empirical estimates suggest most published medical research is true.” That is, most published medical research may well be true, but I’m not at all convinced by the analysis being used to support this claim.” I responded briefly this morning to his post, but then had to run off to teach class. After thinking about it a little more, I realized I have some objections to his critique.

His main criticisms of our paper are: (1) that we worked with type I/type II errors instead of type S versus type M errors (paragraph 2), (2) that we performed inference rather than looking at replication (paragraph 4), (3) that there is p-value hacking going on (paragraph 4), and (4) that our model may not apply because p-value hacking may change the assumptions underlying this model, which was developed in genomics.

I will handle each of these individually:

(1) This is primarily semantics. Andrew is concerned with interesting versus uninteresting effects, which his Type S and Type M errors capture. We are concerned with true/false positives as defined by type I and type II errors (and a null hypothesis). You might believe that the null is never true - but then by the standards of the original paper all published research is true. Or you might say that a non-null result might have an effect size too small to be interesting - but the framework being used here is hypothesis testing, and we have stated explicitly how we define a true positive in that framework. We define the error rate as the rate of classifying things as null when they should be classified as alternative, and vice versa. We then estimate the false discovery rate under the framework used to calculate those p-values. So this is not a criticism of our work backed by evidence; rather, it is a stated difference of opinion about the philosophy of statistics, not supported by conclusive data.

(2) Gelman says he originally thought we would follow up specific p-values to see whether the results replicated, and he makes that a critique of our paper. That would definitely be another approach to the problem. Instead, we chose to perform statistical inference using justified and widely used statistical techniques. Others have taken the replication route, but of course that approach too would be fraught with difficulty - are the exact conditions replicable (e.g. for a clinical trial), can we sample from the same population (if it has changed or is hard to sample), and what do we mean by replicates (would two p-values less than 0.05 be convincing?). This again is not a criticism of our approach, but a statement of a different analysis Gelman would have liked to see.

(3)-(4) Gelman states, “You don’t have to be Uri Simonsohn to know that there’s a lot of p-hacking going on.” Indeed, Uri Simonsohn wrote a paper where he talks about the potential for p-value hacking. He does not collect data from real experiments/analyses, but uses simulations, theoretical arguments, and prospective experiments designed to show specific problems. While these arguments are useful and informative, they give no indication of the extent of p-value hacking in the medical literature. So this argument is made on the basis of Gelman’s supposition that this happens broadly, rather than on data.

My objection is that his critiques are based primarily on philosophy (1), a wish that we had done the study a different way (2), and assumptions about the way science works supported only by anecdotal evidence (3-4).

One thing you could very reasonably argue is how sensitive our approach is to violations of our assumptions (which Gelman implied with criticisms 3-4). To address this, my co-author and I have now performed a simulation analysis. In the first simulation, we considered a case where every p-value less than 0.05 was reported and the null p-values were uniformly distributed, just as our assumptions would state. We then plot our estimates of the science-wise false discovery rate (swfdr) versus the truth. Here our estimator works pretty well.

[Figure “all-significant”: estimated swfdr versus the truth when every p-value less than 0.05 is reported]
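To make that first scenario concrete, here is a minimal R sketch of the data-generating setup. This is not the actual simulation code from the paper; the number of studies, the fraction of nulls, the sample size, and the effect size below are illustrative assumptions, not values we used.

```r
set.seed(1)

m    <- 10000           # number of simulated studies (illustrative)
pi0  <- 0.8             # assumed fraction of truly null studies (illustrative)
null <- runif(m) < pi0  # TRUE for studies that are truly null

pvals <- numeric(m)
for (i in 1:m) {
  if (null[i]) {
    pvals[i] <- runif(1)              # null p-values are uniform on (0, 1)
  } else {
    x <- rnorm(20, mean = 0.5)        # a modest true effect (sd = 1)
    pvals[i] <- t.test(x)$p.value     # one-sample t-test of mean = 0
  }
}

reported <- pvals < 0.05              # every significant result gets "published"
mean(null[reported])                  # true fraction of reported results that are false
```

Comparing an estimate of the swfdr computed from the reported p-values against `mean(null[reported])` across a grid of settings is essentially the comparison the plot above summarizes.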

We also simulate a pretty serious p-value hacking scenario where people report only the minimum p-value they observe out of 20 p-values. Here our assumption of uniformity is strongly violated. But we still get pretty accurate estimates of the swfdr for values in the range of the estimate we report in our paper (14%).

[Figure “only-min”: estimated swfdr versus the truth when only the minimum of 20 p-values is reported]
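And a similarly minimal sketch of the hacking scenario, again just an illustration rather than our actual simulation code: each null “study” runs 20 tests and reports only its smallest p-value, so the reported null p-values pile up near zero instead of being uniform.

```r
set.seed(2)

# each null "study" runs 20 tests and reports only its smallest p-value
hacked <- replicate(10000, min(runif(20)))

mean(hacked < 0.05)                   # roughly 64% of hacked studies clear the 0.05 bar
hist(hacked[hacked < 0.05], breaks = 50,
     xlab = "reported p-value",
     main = "Null p-values under min-of-20 hacking")  # piled up near zero, not flat
```

This is exactly the kind of violation of the uniformity assumption that the second simulation probes.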

Since I recognize this is only a couple of simulations, I have also put the code up on Github with the rest of our code for the paper so other people can test it out.

Whether you are convinced by Gelman or convinced by my response, I agree with him that it is pretty unlikely that “most published research is false,” so I’m glad our paper is at least bringing that important point up. I also hope that by introducing a new estimator of the science-wise fdr we inspire more methodological development, and that philosophical criticisms won’t prevent people from looking at the data in new ways.

Statisticians and computer scientists - if there is no code, there is no paper

I think it has been beaten to death that the incentives in academia lean heavily toward producing papers and less toward producing/maintaining software. There are people who are way, way more knowledgeable than I am about building and maintaining software. For example, Titus Brown hit a lot of the key issues in his interview. The open source community is also filled with advocates and researchers who know way more about this than I do.

This post is more about my views on changing the perspective of code/software in the data analysis community. I have been frustrated often with statisticians and computer scientists who write papers where they develop new methods and seem to demonstrate that those methods blow away all their competitors. But then no software is available to actually test and see if that is true. Even worse, sometimes I just want to use their method to solve a problem in our pipeline, but I have to code it from scratch!

I have also had several cases where I emailed the authors for their software and they said it “wasn’t fit for distribution”, or they “don’t have code”, or the “code can only be run on our machines”. I totally understand the first and last - my code isn’t always pretty either (I have zero formal training in computer science, so messy code is actually the most likely scenario) - but I always say, “I’ll take whatever you’ve got and I’m willing to hack it out to make it work”. I often still get turned down.

So I have a new policy when evaluating the CVs of candidates for jobs, or when I’m reading a paper as a referee. If the paper is about a new statistical method or machine learning algorithm and there is no software available for that method - I simply mentally cross it off the CV. If I’m reading a data analysis and there isn’t code that reproduces the analysis - I mentally cross it off. In my mind, new methods/analyses without software are just vaporware. Now, you’d definitely have to cross a few papers off my CV based on this principle. I do that. But I’m trying really hard going forward to make sure nothing gets crossed off.

In a future post I’ll talk about the new issue I’m struggling with - maintaining all that software I’m creating.

Sunday data/statistics link roundup (1/20/2013)

  1. This might be short. I have a couple of classes starting on Monday. The first is our [advanced methods](http://www.jhsph.edu/courses/course/140.753/01/2012/16424/) class. This is one of my favorite classes to teach, our Ph.D. students are pretty awesome and they always amaze me with what they can do. The other is my Coursera debut in [Data Analysis](https://www.coursera.org/course/dataanalysis). We are at about 88,000 enrolled. Tell your friends, maybe we can make it an even 100k! In related news, some California schools are experimenting with offering credit for online courses. (via Sherri R.)
  2. Some interesting numbers on why there aren’t as many “gunners” in the NBA - players who score a huge number of points.  I love the talk about hustling, rotating team defense. I have always enjoyed watching good defense more than good offense. It might not be the most popular thing to watch, but seeing the Spurs rotate perfectly to cover the open man is a thing of athletic beauty. My Aggies aren’t too bad at it either…(via Rafa).
  3. A really interesting article suggesting that nonsense math can make arguments seem more convincing to non-technical audiences. This is tangentially related to a previous study which showed that more equations led to fewer citations in biology articles. Overall, my take-home message is that we don’t necessarily need fewer equations; we need to elevate statistical/quantitative literacy to the importance of reading literacy. (via David S.)
  4. This has been posted elsewhere, but a reminder to send in your statistical stories for the 365 stories of statistics.
  5. Automatically generate a postmodernism essay. Hit refresh a few times. It’s pretty hilarious. It reminds me a lot of this article about statisticians. Here is the technical paper describing how they simulate the essays. (via Rafa)