Simply Statistics: a statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

An economic model for peer review

I saw a tweet the other day.

It reminded me that a few years ago I had a paper that went through the peer review wringer. It drove me completely bananas. One thing that made me especially crazy about the process was how long the referees waited before reviewing and how terrible the reviews were after that long wait. So I started thinking about the “economics of peer review”: basically, what is the incentive for scientists to contribute to the system?

To get a handle on this idea, I designed a “peer review game” with a fixed number of players N who play for a fixed period of time. During that time, they can submit papers or they can review papers. Each player’s final score at the end is the number of papers they submitted that were accepted: S_i = \sum {\rm Submitted \; Papers \; Accepted}.

Based on this model, under closed peer review there is a single Nash equilibrium: the strategy profile in which no one reviews any papers. Basically, no one can hope to improve their score by reviewing; they can only hope to improve it by submitting more papers (sound familiar?). Under open peer review there are more potential equilibria, depending on the relative amount of goodwill you earn from your fellow reviewers by submitting good reviews.
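To see why, here is a back-of-the-envelope simulation in R. This is my own illustration, not the model or code from the paper: it assumes each player splits a fixed time budget between submitting and reviewing, a submission can only be accepted if someone actually reviews it, and reviewed submissions are accepted with a fixed probability p_accept.

    # Illustrative sketch only -- the time budget, pooled review capacity, and
    # p_accept are assumptions for this example, not the paper's actual model.
    expected_score <- function(own_review_frac, others_review_frac,
                               n_players = 20, n_units = 100, p_accept = 0.5) {
      own_submissions   <- n_units * (1 - own_review_frac)
      total_submissions <- n_units * ((n_players - 1) * (1 - others_review_frac) +
                                        (1 - own_review_frac))
      review_capacity   <- n_units * ((n_players - 1) * others_review_frac +
                                        own_review_frac)
      # Fraction of submissions that actually get looked at by a reviewer
      reviewed_frac <- min(1, review_capacity / max(total_submissions, 1))
      # Closed review: your score only counts your own accepted submissions
      own_submissions * reviewed_frac * p_accept
    }

    # Your expected score as you review more, with everyone else's effort fixed:
    sapply(c(0, 0.25, 0.5), expected_score, others_review_frac = 0.25)

Every unit of time you spend reviewing costs you submissions while the benefit is spread across everyone, so under these assumptions your own expected score only goes down as you review more. Under open review you would add a term that rewards the reviewer directly (the goodwill above), which is where the extra equilibria come from.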

We then built a model system for testing out our theory. The system involved having groups of students play a “peer review game” where they submitted solutions to SAT-style problems.

Each solution was then randomly assigned to another player to review. Those players could (a) review it and reject it, (b) review it and accept it, or (c) not review it. The person with the most points at the end of the time (one hour) won.

We found some cool things:

  1. In closed review, reviewing gave no benefit.
  2. In open review, reviewing gave a small positive benefit.
  3. Both systems gave comparable accuracy.
  4. All peer review increased the overall accuracy of responses.

The paper is here and all of the data and code are here.

The Drake index for academics

I think academic indices are pretty silly; maybe we should introduce so many academic indices that people can’t even remember which one is which. There are serious flaws with both citation indices and social media indices that I think render them close to meaningless in a lot of ways.

Regardless of these obvious flaws, I want in the game. Instead of the K-index for academics, I propose the Drake Index. Drake has achieved both critical and popular success. His song “Honorable Mentions” from the ESPYs (especially the first verse) reminds me of the motivation of the K-index paper.

To quantify both the critical and popular success of a scientist, I propose the Drake Index (TM). The Drake Index is defined as follows:

(# Twitter Followers)/(Max Twitter Followers by a Person in your Field) + (# Citations)/(Max Citations by a Person in your Field)

Let’s break the index down. There are two main components (Twitter followers and citations) measuring popular and critical acclaim. But they are measured on very different scales, so we normalize each to the maximum in the field. That puts both components between 0 and 1, which means your Drake Index score is between 0 and 2. Let’s look at a few examples to see how the index works.

  1. Drake = (16.9M followers)/(55.5M followers for Justin Bieber) + (0 citations)/(134 citations for Natalie Portman) = 0.30
  2. Rafael Irizarry = (1.1K followers)/(17.6K followers for Simply Stats) + (38,194 citations)/(185,740 citations for Doug Altman) = 0.27
  3. Roger Peng = (4.5K followers)/(17.6K followers for Simply Stats) + (4,011 citations)/(185,740 citations for Doug Altman) = 0.27
  4. Jeff Leek = (2.6K followers)/(17.6K followers for Simply Stats) + (2,348 citations)/(185,740 citations for Doug Altman) = 0.16

In the interest of this not being taken any more seriously than an afternoon blog post should be, I won’t calculate anyone else’s Drake Index. But you can :-).
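If you do want to compute your own, a two-line R function is all it takes. The numbers you plug in (followers, citations, and the field maxima) are whatever you dig up; the example call just reproduces the Jeff Leek entry from the list above.

    # Drake Index: popular acclaim + critical acclaim, each normalized to the
    # field maximum so both terms fall between 0 and 1.
    drake_index <- function(followers, max_followers_in_field,
                            citations, max_citations_in_field) {
      followers / max_followers_in_field + citations / max_citations_in_field
    }

    # Reproducing entry 4 above:
    drake_index(followers = 2600, max_followers_in_field = 17600,
                citations = 2348, max_citations_in_field = 185740)
    # roughly 0.16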

You think P-values are bad? I say show me the data.

Both the scientific community and the popular press are freaking out about reproducibility right now. I think they have good reason to: even the US Congress is now investigating the transparency of science. The concern has been driven by the very public reproducibility disasters in genomics and economics.

There are three major components to a reproducible and replicable study from a computational perspective: (1) the raw data from the experiment must be available, (2) the statistical code and documentation to reproduce the analysis must be available and (3) a correct data analysis must be performed.

There have been successes and failures in releasing all the data, but PLoS’s policy on data availability and the AllTrials initiative offer some hope. The most progress has been made on making code and documentation available. Galaxy, knitr, and IPython make it easier than ever to distribute literate programs, and people are actually using them!
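As a small, hedged illustration of how low the barrier has become, here is a toy knitr workflow: an ordinary R script whose lines beginning with #' become prose when the script is spun into a report with knitr::spin. The file names and the analysis are made up.

    # analysis.R -- a toy literate script; run knitr::spin("analysis.R") to turn
    # it into a shareable report where code, text, and output travel together.

    #' # A tiny reproducible analysis
    #' The raw data file ships alongside this script, and every step of the
    #' analysis is recorded below.
    dat <- read.csv("raw_data.csv")    # the raw data (component 1)
    fit <- lm(y ~ x, data = dat)       # the documented analysis (components 2 and 3)
    summary(fit)

    #' The fitted relationship:
    plot(dat$x, dat$y)
    abline(fit)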

The trickiest part of reproducibility and replicability is ensuring that people perform a good data analysis. The first problem is that we actually don't know which statistical methods lead to higher reproducibility and replicability in users' hands. Articles like the one that just came out in the NYT suggest that using one type of method (Bayesian approaches) over another (p-values) will address the problem. But the real story is that those are still 100% philosophical arguments. We actually have very little good data on whether analysts will perform better analyses using one method or another. I agree with Roger's tweet storm on this (quick, someone is wrong on the internet, Roger, fix it!).

This is even more of a problem because the data deluge demands that almost all data analysis be performed by people with basic to intermediate statistics training at best. There is no way around this in the short term. There just aren't enough trained statisticians/data scientists to go around. So we need to study the practice of statistics just like any other human behavior to figure out which methods work best in the hands of the people most likely to be using them.

Unbundling the educational package

I just got back from the World Economic Forum’s summer meeting in Tianjin, China and there was much talk of disruption and innovation there. Basically, if you weren’t disrupting, you were furniture. Perhaps not surprisingly, one topic area that was universally considered ripe for disruption was Education.

There are many ideas bandied about with respect to “disrupting” education and some are interesting to consider. MOOCs were the darlings of…last year…but they’re old news now. Sam Lessin has a nice piece in The Information (total paywall, sorry, but it’s worth it) about building a subscription model for universities. Aswath Damodaran has what I think is a nice framework for thinking about the “education business”.

One thing that I latched on to in Damodaran’s piece is the idea of education as a “bundled product”. Indeed, I think the key aspect of traditional on-site university education is the simultaneous offering of

  1. Subject matter content (i.e. course material)
  2. Mentoring and guidance by faculty
  3. Social and professional networking
  4. Other activities (sports, arts ensembles, etc.)

MOOCs have attacked #1 for many subjects, typically large introductory courses. Endeavors like the Minerva project are attempting to provide lower-cost seminar-style courses (i.e. anti-MOOCs).

I think the extent to which universities will truly be disrupted will hinge on how well we can unbundle the four (or maybe more?) elements described above and provide them separately but at roughly the same level of quality. Is it possible? I don’t know.

Applied Statisticians: people want to learn what we do. Let's teach them.

In this recent opinion piece, Hadley Wickham explains how data science goes beyond statistics and argues that data science is not promoted in academia. He defines data science as follows:

I think there are three main steps in a data science project: you collect data (and questions), analyze it (using visualization and models), then communicate the results.

and makes the important point that

Any real data analysis involves data manipulation (sometimes called wrangling or munging), visualization and modelling.

The above describes what I have been doing since I became an academic applied statistician about 20 years ago. It describes what several of my colleagues do as well. For example, 15 years ago Karl Broman, in his excellent job talk, covered all the items in Hadley’s definition. The arc of the talk revolved around the scientific problem and not the statistical models. He spent a considerable amount of time describing how the data was acquired and how he used perl scripts to clean up microsatellite data. More than half his slides contained visualizations, either illustrative cartoons or data plots. This research eventually led to his widely used “data product” R/qtl. Although not described in the talk, Karl used make to help make the results reproducible.
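To make that arc concrete, here is a deliberately generic R sketch of the shape such a workflow takes; the file and column names are placeholders, not Karl’s actual data or code.

    # The generic applied-statistics arc: acquire, wrangle, visualize, then model.
    raw <- read.delim("genotypes_raw.txt", stringsAsFactors = FALSE)   # acquire (placeholder file)

    # Wrangle: drop incomplete records and recode a messy text column
    clean <- raw[complete.cases(raw), ]
    clean$genotype <- factor(clean$genotype)

    # Visualize before modeling
    boxplot(phenotype ~ genotype, data = clean,
            main = "Phenotype by genotype (exploratory view)")

    # Only then fit a model
    fit <- lm(phenotype ~ genotype, data = clean)
    summary(fit)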

So why then does Hadley think that “Statistics research focuses on data collection and modeling, and there is little work on developing good questions, thinking about the shape of data, communicating results or building data products”?  I suspect one reason is that most applied work is published outside the flagship statistical journals. For example, Karl’s work was published in the American Journal of Human Genetics. A second reason may be that most of us academic applied statisticians don’t teach what we do. Despite writing a thesis that involved much data wrangling (reading music aiff files into Splus) and data visualization (including listening to fitted signals and residuals), the first few courses I taught as an assistant professor were almost solely on GLM theory.

About five years ago I tried changing the Methods course for our PhD students from one focusing on the math behind statistical methods to a problem- and data-driven course. This was not very successful, as many of our students were interested in the mathematical aspects of statistics and did not like the open-ended assignments. Jeff Leek built on that class by incorporating question development, much vaguer problem statements, data wrangling, and peer grading. He also found it challenging to teach the messier parts of applied statistics. That kind of work often requires exploration and failure, which can be frustrating for new students.

This story has a happy ending though. Last year Jeff created a data science Coursera course that enrolled over 180,000 students, with more than 6,000 completing it. This year I am subbing for Joe Blitzstein (talk about filling big shoes) in CS109, the data science undergraduate class Hanspeter Pfister and Joe created at Harvard last year. We have over 300 students registered, making it one of the largest classes on campus. I am not teaching them GLM theory.

So if you are an experienced applied statistician in academia, consider developing a data science class that teaches students what you do.