Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Instead of research on reproducibility, just do reproducible research

Right now reproducibility, replicability, false positive rates, biases in methods, and other problems with science are a hot topic. As I mentioned in a previous post, pointing out a flaw in a scientific study is way easier to do correctly than generating a new scientific study. Some folks have noticed that right now there is a huge market for papers pointing out how science is flawed. The combination of the relative ease of pointing out flaws and the huge payout for writing these papers is helping to generate the hype around the “reproducibility crisis”.

I gave a talk a little while ago at an NAS workshop where I stated that all the tools for reproducible research already exist (the caveat being really large analyses, although that is changing as well). To make a paper completely reproducible, open, and available for post-publication review, you can use the following approach with no new tools or frameworks needed.

  1. Use GitHub for version control.
  2. Use rmarkdown or IPython notebooks for your analysis code.
  3. When your paper is done, post it to arXiv or bioRxiv.
  4. Post your data to an appropriate repository like SRA or a general-purpose site like figshare.
  5. Send any software you develop to a controlled repository like CRAN or Bioconductor.
  6. Participate in the post-publication discussion on Twitter and on a blog.
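As a rough illustration of steps 2 and 4, here is a minimal sketch of a header cell you might put at the top of an analysis notebook. It is not part of the original talk or a prescribed method; the function names `sha256_of_file` and `session_record`, the seed value, and the demo file are all hypothetical and chosen only for illustration. The idea is simply to fix the random seed and record the Python version and a checksum of each data file, so a reader can confirm they are rerunning the same analysis on the same data you posted.

```python
import hashlib
import platform
import random
import tempfile
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def session_record(data_paths, seed):
    """Fix the random seed and collect details a reader needs to rerun the analysis."""
    random.seed(seed)
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "data_sha256": {str(p): sha256_of_file(p) for p in data_paths},
    }


if __name__ == "__main__":
    # Demo on a throwaway file so the sketch runs anywhere (hypothetical data).
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.csv"
        demo.write_text("x,y\n1,2\n")
        for key, value in session_record([demo], seed=20150101).items():
            print(key, value)
```

Committing the printed record alongside the notebook in version control is one simple way to tie the code, the data, and the environment together.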

This is also true of open science, open data sharing, reproducibility, replicability, post-publication peer review, and all the other issues forming the “reproducibility crisis”. A lot of attention and heat has focused on the “crisis” or on folks who make a point of taking a stand on reproducibility, open science, or post-publication review. But in the background, outside of the hype, there is a large group of people who are quietly executing solid, open, reproducible science.

I wish that this group would get more attention, so I decided to point out a few of them. Next time somebody asks me about the research on reproducibility or open science, I’ll just point them here and tell them to follow the lead of the people doing it.

This list was made completely haphazardly, as all my lists are; it is just meant to indicate that there are a ton of people out there doing this. One thing that is also clear is that grad students and postdocs are adopting the approach I described at a very high rate.

Moreover, there are people who have been doing parts of this for a long time (like the physics or biostatistics communities with preprints, or the people who have used Sweave for years). I purposely left people off the list like Titus and Ethan, who have gone all in, even posting their grants online. I did this because they are very loud advocates of open science, and I wanted to highlight quieter contributors and point out that while there is a lot of noise going on over in one corner, many people are quietly doing really good science in another.