Instead of research on reproducibility, just do reproducible research
11 Dec 2015
Right now reproducibility, replicability, false positive rates, biases in methods, and other problems with science are the hot topic. As I mentioned in a previous post, pointing out a flaw with a scientific study is way easier to do correctly than generating a new scientific study. Some folks have noticed that right now there is a huge market for papers pointing out how science is flawed. The combination of the relative ease of pointing out flaws and the huge payout for writing these papers is helping to generate the hype around the “reproducibility crisis”.
I gave a talk a little while ago at an NAS workshop where I stated that all the tools for reproducible research exist (the caveat being really large analyses, although that is changing as well). To make a paper completely reproducible, open, and available for post-publication review, you can use the following approach with no new tools/frameworks needed.
- Use GitHub for version control.
- Use rmarkdown or iPython notebooks for your analysis code.
- When your paper is done, post it to arxiv or biorxiv.
- Post your data to an appropriate repository like SRA or a general purpose site like figshare.
- Send any software you develop to a controlled repository like CRAN or Bioconductor.
- Participate in the post-publication discussion on Twitter and on a blog.
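To make the "analysis code" step concrete, here is a minimal sketch of what a reproducible analysis in a notebook might look like. It is only an illustration under my own assumptions: the function names (`sha256_of`, `run_analysis`), the one-number-per-line data file, and the toy bootstrap are all hypothetical, not taken from anyone's actual workflow. The idea is simply to fix the random seed and to record the data checksum and software version next to the results.

```python
import hashlib
import platform
import random
import statistics
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_analysis(data_path, seed=2015):
    """Toy analysis (hypothetical): bootstrap the mean of whitespace-separated numbers."""
    # A local generator, so the seed alone determines every random draw.
    rng = random.Random(seed)
    values = [float(token) for token in Path(data_path).read_text().split()]

    boot_means = []
    for _ in range(200):
        sample = [rng.choice(values) for _ in values]
        boot_means.append(sum(sample) / len(sample))

    # Store provenance alongside the results so a reader can check
    # that they are using the same data, seed, and Python version.
    return {
        "seed": seed,
        "data_sha256": sha256_of(data_path),
        "python": platform.python_version(),
        "mean": sum(values) / len(values),
        "bootstrap_se": statistics.pstdev(boot_means),
    }


if __name__ == "__main__":
    # Demo on a throwaway data file; a real analysis would read the posted data.
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "data.txt"
        data_file.write_text("1\n2\n3\n4\n")
        print(run_analysis(data_file))
```

Running the same function twice on the same file should give identical output, which is the property you want anyone re-running your posted code and data to see.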
Existing tools likewise cover open science, open data sharing, reproducibility, replicability, post-publication peer review, and all the other issues forming the “reproducibility crisis”. A lot of attention and heat has focused on the “crisis” or on folks who make a point of taking a stand on reproducibility, open science, or post-publication review. But in the background, outside of the hype, there is a large group of people who are quietly executing solid, open, reproducible science.
I wish that this group would get more attention, so I decided to point out a few of them. Next time somebody asks me about the research on reproducibility or open science, I’ll just point them here and tell them to follow the lead of people doing it.
- Karl Broman - posts all of his talks online, generates many widely used open source packages, writes free/open tutorials on everything from knitr to making webpages, makes his papers highly reproducible.
- Jessica Li - posts her data online and writes open source software for her analyses.
- Mark Robinson - posts many of his papers as preprints on biorxiv, makes his analyses reproducible, writes open source software.
- Florian Markowetz - writes open source software, provides Bioconductor data for major projects, links his papers with his code nicely on his publications page.
- Raphael Gottardo - writes/maintains many open source software packages, makes his analyses reproducible and available via Github, posts preprints of his papers.
- Genevera Allen - writes software such as TCGA2STAT (https://cran.r-project.org/web/packages/TCGA2STAT/index.html) to make data easier to access, posts preprints on biorxiv and on arxiv.
- Lorena Barba - teaches open source MOOCs, with lessons as open source iPython modules, and reproducible code for her analyses.
- Alicia Oshlack - writes papers with completely reproducible analyses, publishes lots of open source software and publishes preprints for her papers.
- Baggerly and Coombes - although they are famous for a highly public reproducible piece of research, they have also quietly implemented policies like making all reports reproducible for their consulting center.
This list was made completely haphazardly, as all my lists are, and is just meant to indicate that there are a ton of people out there doing this. One thing that is clear too is that grad students and postdocs are adopting the approach I described at a very high rate.
Moreover, there are people who have been doing parts of this for a long time (like the physics or biostatistics communities with preprints, or how people have used Sweave for a long time). I purposely left people off the list like Titus and Ethan who have gone all in, even posting their grants online. I did this because they are very loud advocates of open science, but I wanted to highlight quieter contributors and point out that while there is a lot of noise going on over in one corner, many people are quietly doing really good science in another.