Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

What's wrong with the predicting h-index paper.

Editor’s Note: I recently posted about a paper in Nature that purported to predict the h-index. The authors contacted me to get my criticisms and then responded to them. They have requested the opportunity to respond publicly, and I think that is a totally reasonable request. Until there is a better comment-generating mechanism at the journal level, this seems like as good a forum as any to discuss statistical papers. I will post an extended version of my criticisms here and give the authors the opportunity to respond publicly in the comments.

The paper in question is clearly a clever idea and the kind that gets people fired up. Quantifying researchers’ output is all the rage, and being able to predict this quantity in the future would obviously make a lot of evaluators happy. I think it was, in that sense, a really good idea to chase down these data, since it was clear that if they found anything at all it would be very widely covered in the scientific/popular press.

My original post was inspired by my frustration with Nature, which has a history of publishing somewhat suspect statistical papers, such as this one. I posted the prediction contest after reading another paper I consider flawed, both for statistical and for scientific reasons. I originally commented on the statistics in my post. The authors, being good sports, contacted me for my criticisms. I sent them the following criticisms, which I think are sufficiently major that a statistical referee or a statistical journal would likely have rejected the paper:
  1. Lack of reproducibility. The code/data are not made available either through Nature or on your website. Reproducibility is a critical component of papers based on computation, and its absence has led to serious problems before. It is also easily addressable.
  2. No training/test set. You mention cross-validation (and maybe the reported R^2 is the R^2 on the held-out scientists?), but if you use the same cross-validation step both to optimize the model parameters and to estimate the error rate, you could see some major overfitting (a small sketch of this issue follows the list).
  3. The R^2 values are pretty low. An R^2 of 0.67 is obviously superior to the h-index alone, but (a) there is concern about overfitting, and (b) even without overfitting, an R^2 that low could lead to substantial variance in the predictions.
  4. The prediction error is not reported in the paper (or in the online calculator). How far off could you be at 5 years, at 10? Would the results still be impressive with those errors reported?
  5. You use model selection and show only the optimal model (as described in the last paragraph of the supplementary material), but the text gives no indication of the potential difficulties introduced by this model selection.
  6. You use a single regression model with no time variation in the coefficients and no potential non-linearity. Clearly, when predicting several years into the future, there will be variation with time and non-linearity. There is also likely substantial variation across individuals and career trajectories, outliers may be important, etc.
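
To make point 2 concrete, here is a minimal sketch of the concern. It uses simulated placeholder data and a generic ridge regression, not the paper’s actual predictors or model, and contrasts the optimistic R^2 obtained when the same cross-validation folds both tune the model and report its performance with the estimate from nested cross-validation, where the held-out scientists never touch the tuning step.

```python
# Sketch of criticism 2: reusing the same cross-validation folds to tune a
# model AND to report its error can overstate performance. The data below are
# simulated placeholders, not the actual predictors from the Nature paper.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
n, p = 200, 50                                  # few "scientists", many candidate predictors
X = rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(scale=3, size=n)       # weak true signal, lots of noise

grid = {"alpha": np.logspace(-3, 3, 13)}
inner = KFold(5, shuffle=True, random_state=1)  # folds used for tuning
outer = KFold(5, shuffle=True, random_state=2)  # folds used only for evaluation

# Optimistic: the folds that pick alpha are also the folds that produce the R^2.
tuned = GridSearchCV(Ridge(), grid, cv=inner, scoring="r2").fit(X, y)
print("R^2 from the tuning folds:", round(tuned.best_score_, 2))

# Honest: nested CV keeps the outer held-out folds away from the tuning step.
nested = cross_val_score(GridSearchCV(Ridge(), grid, cv=inner),
                         X, y, cv=outer, scoring="r2")
print("R^2 from nested CV:", round(nested.mean(), 2))
```

The gap between the two numbers grows with the amount of searching over predictors and parameters; reporting the first number instead of the second is exactly the kind of overfitting point 2 worries about.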
They carefully responded to these criticisms and hopefully they will post their responses in the comments. My impression based on their responses is that the statistics were not as flawed as I originally thought, but that the data aren’t sufficient to form a useful prediction. 
However, I think the much bigger flaw is the basic scientific premise. The h-index has been shown to have major flaws and biases (including gender bias) and to be a generally poor summary of a scientist’s contribution. See here, the list of criticisms here, and the discussion here for starters. The authors of the Nature paper propose a highly inaccurate predictor of this deeply flawed index. While that alone is sufficient to call into question the results in the paper, the authors also make bold claims about their prediction tool:
Our formula is particularly useful for funding agencies, peer reviewers and hiring committees who have to deal with vast numbers of applications and can give each only a cursory examination. Statistical techniques have the advantage of returning results instantaneously and in an unbiased way.
Suggesting that this type of prediction should be used to make important decisions about hiring, promotion, and funding is deeply flawed scientifically. Coupled with the online calculator the authors handily provide (which reports no measure of uncertainty), it makes it all too easy for people to miss the real value of scientific publications: the science contained in them.

Why we should continue publishing peer-reviewed papers

Several bloggers are calling for the end of peer-reviewed journals as we know them. Jeff suggests that we replace them with a system in which everyone posts their papers on their blog, PubMed aggregates the feeds, and peer review happens post-publication via, for example, counting up like and dislike votes. In my view, many of these critiques conflate problems from different aspects of the process. Here I try to break down the current system into its key components and defend the one aspect I think we should preserve (at least for now): pre-publication peer-review.

To avoid confusion let me start by enumerating some of the components for which I agree change is needed.

  • There is no need to produce paper copies of our publications. Indulging our preference for reading hard copies does not justify keeping the price of disseminating our work twice as high as it should be. 
  • There is no reason to keep sending the same manuscript (adapted to fit each journal’s guidelines) to journal after journal until it gets accepted. This frustrating and time-consuming process adds very little value (we previously described Nick Jewell’s solution).
  • There is no reason for publications to be static. As Jeff and many others suggest, readers should be able to systematically comment on and rate published papers, and authors should be able to update them.

However, all these changes can be implemented without doing away with pre-publication peer-review.

A key reason American and British universities consistently lead the pack of research institutions is their strict adherence to a peer-review system that minimizes cronyism and tolerance for mediocrity. At the center of this system is a promotion process in which outside experts evaluate a candidate’s ability to produce high-quality ideas. Peer-reviewed journal articles are the backbone of this evaluation. When reviewing a candidate I familiarize myself with his or her work by reading 5-10 key papers. It’s true that I read these disregarding the journal, and blog posts would serve the same purpose. But I also use the publication section of the CV, not only because reading all the papers is logistically impossible, but because those papers have already been evaluated by roughly three referees plus an editor and so provide an assessment independent of mine. I also use the journal’s prestige: although it is a highly noisy measure of quality, the law of large numbers starts kicking in after 10 papers or so.

So are three reviewers better than the entire Internet? Can a reddit-like system provide as much signal as the current peer-reviewed journals? You can think of the current system as a cooperative in which we all agree to read each other’s papers thoroughly (we evaluate 2-3 for each one we publish), with journals taking care of the logistics. The result of a review is an estimate of quality ranging from highest (Nature, Science) to 0 (not published). This estimate is certainly noisy given the bias and the variable quality of referees and editors. But across all the papers on a CV, the variance is reduced and the bias averages out (I note that we complain vociferously when the bias keeps us from publishing in a good journal, but we rarely say a word when the bias helps us get into a better journal than we deserved). Jeff’s argument is that post-publication review will result in many more evaluations and therefore a stronger signal-to-noise ratio. I need to see evidence of this before being convinced. In the current system, roughly three referees commit to thoroughly reviewing the paper. If they do a sloppy job, they embarrass themselves in front of an editor or an AE (not a good thing). With a post-publication review system nobody is forced to review. I fear most papers will go without comments or votes, including really good ones. My feeling is that marketing and PR will matter even more than they do now, and that’s not a good thing.
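
To illustrate the variance-reduction argument, here is a toy simulation; the quality scale and the noise level of an individual review are invented for illustration, not estimated from real review data. Each refereed paper contributes one noisy assessment of a candidate’s underlying quality, and averaging across a ten-paper CV shrinks the error of the combined signal by roughly the square root of the number of papers.

```python
# Toy simulation: each peer-reviewed paper yields a noisy assessment of a
# candidate's true quality; averaging across ~10 papers on a CV reduces the
# noise. The scales below are made up for illustration, not calibrated to data.
import numpy as np

rng = np.random.default_rng(42)
n_candidates = 10_000
papers_per_cv = 10
true_quality = rng.normal(size=n_candidates)          # candidate's underlying signal
review_noise = rng.normal(scale=1.5,                  # referees/editors are noisy
                          size=(n_candidates, papers_per_cv))
per_paper = true_quality[:, None] + review_noise      # one noisy score per paper
cv_average = per_paper.mean(axis=1)                   # what a committee effectively sees

print("sd of error, single paper :", round((per_paper[:, 0] - true_quality).std(), 2))
print("sd of error, 10-paper mean:", round((cv_average - true_quality).std(), 2))
# The second value is about 1.5 / sqrt(10), roughly 0.47: the law of large numbers at work.
```

This toy example only addresses the variance; the bias argument above relies on different referees and journals being biased in different directions, so that their errors partially cancel across the papers on a CV.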

Dissemination of ideas is another important role of the literature. Jeff describes a couple of anecdotes to argue that dissemination can be sped up by simply posting papers on your blog.

I posted a quick idea called the Leekasso, which led to some discussion on the blog and has nearly 2,000 page views

But the typical junior investigator does not have a blog with hundreds of followers. Will their papers ever be read if even more papers are added to the already bloated scientific literature? The current peer-review system provides an important filter. There is an inherent trade-off between speed of dissemination and quality, and it’s not clear to me that we should swing the balance all the way over to the speed side. There are other ways to speed up dissemination that we should try first. Also, there is nothing stopping us from posting our papers online before publication and promoting them via Twitter or an aggregator. In fact, as pointed out by Jan Jensen on Jeff’s post, arXiv papers are indexed on Google Scholar within a week, and Google Scholar also keeps track of arXiv citations.

The Internet is bringing many changes that will improve our peer-review system. But the current pre-publication peer-review process does a decent job of  

  1. providing signal for the promotion process and
  2. reducing noise in the literature to make dissemination possible. 

Any alternative system should be evaluated carefully before we dismantle one that has helped keep our universities at the top of the world rankings.

Sunday Data/Statistics Link Roundup (10/7/12)

  1. Jack Welch got a little conspiracy-theory crazy with the job numbers. Thomas Lumley over at StatsChat makes a pretty good case for debunking the theory. I think the real take-home message of Thomas’ post, and one worth celebrating/highlighting, is that the agencies that produce the jobs report do so based on a fixed and well-defined study design. Careful efforts by government statistics agencies make it hard to fudge or change the numbers. This is an underrated and hugely important component of a well-run democracy.
  2. On a similar note, Dan Gardner at the Ottawa Citizen points out that evidence-based policy making is actually not enough. He points out the critical problem with evidence: in the era of data, what is a fact? “Facts” can come from flawed or biased studies just as easily as from strong studies. He suggests that a truly “evidence-based” administration would invest more money in research/statistical agencies. I think this is a great idea.
  3. An interesting article by Ben Bernanke suggesting that an optimal approach (in baseball and in policy) is one based on statistical analysis, coupled with careful thinking about long-term versus short-term strategy. I think one of his arguments, about letting players play even when they are struggling in the short term, is actually a case for letting the weak law of large numbers play out. If you have a player with skill/talent, they will eventually converge to their “true” numbers. It’s also good for their confidence… (via David Santiago).
  4. Here is another interesting peer review dust-up. It explains why some journals "reject" papers when they really mean major/minor revision, in order to push down their reported review times. I think this highlights yet another problem with pre-publication peer review. The evidence is mounting, but I hear we may get a defense of the current system from one of the editors of this blog, so stay tuned…
  5. Several people (Sherri R., Alex N., many folks on Twitter) have pointed me to this article about gender bias in science. I was initially a bit skeptical of such a strong effect across a broad range of demographic variables. After reading the supplemental material carefully, it is clear I was wrong. It is a very well designed and executed study, and it suggests that there is still a strong gender bias in science, across ages and disciplines. Interestingly, both men and women were biased against the female candidates. This is clearly a non-trivial problem to solve and needs a lot more work; maybe one step is to make recruitment packages more flexible (see the comment by Allison T. especially).

Fraud in the Scientific Literature

Not just one statistics interview... John McGready is the Jon Stewart of statistics

Editor’s Note: We usually reserve Fridays for posting Simply Statistics interviews. This week, we have a special guest post by John McGready, a colleague of ours who has been doing interviews with many of us in the department and has some cool ideas about connecting students in their first statistics class with cutting-edge researchers wrestling with many of the same concepts applied to modern problems. I’ll let him explain…

I teach a two-quarter course in introductory biostatistics to master’s students in public health at Johns Hopkins. The majority of the class is composed of MPH students, but there are also students doing professional master’s degrees in environmental health, molecular biology, health policy and mental health. Despite the short length of the course, it covers the “greatest hits” of biostatistics, encompassing everything from exploratory data analysis up through and including multivariable proportional hazards regression. The course focus is more conceptual and less mathematical/computing-centric than the other two introductory sequences taught at Hopkins; as such, it has earned the unfortunate nickname “baby biostatistics” from some at the School. This, in my opinion, is a misnomer: statistical reasoning is often the most difficult part of learning statistics. We spend a lot of time focusing on the current literature and making sense of or critiquing research by considering not only the statistical methods employed and the numerical findings, but also the study design and the logic of the substantive conclusions made by the study authors.

Via the course, I always hope to demonstrate the importance of biostatistics as a core driver of public health discovery, the importance of statistical reasoning in the research process, and how the fundamentals that are covered form the framework for more advanced methodology. At some point it dawned on me that the best approach for doing this was to have my colleagues speak to my students about these ideas. Because of timing and scheduling constraints, this proved difficult to do in a live setting. However, in June of 2012 a video recording studio opened here at the Hopkins Bloomberg School. At that point, I knew that I had to get my colleagues on video so that I could share their wealth of experience and expertise with my students and give the students multiple perspectives. To my delight, my colleagues are very amenable to being interviewed and have been very generous with their time. I plan to continue doing the interviews as long as my colleagues are willing and the studio is available.

I have created a YouTube channel for these interviews. At some point in the future, I plan to invite the biostatistics community as a whole to participate. This will include interviews with visitors to my department and submissions by biostatistics faculty and students from other schools. (I realize I am very lucky to have these facilities and video expertise at Hopkins, but many folks are tech-savvy enough to film their own videos on their cameras, phones, etc… in fact, you have seen such creativity from the editors of this here blog.) With the help of some colleagues I plan on making a complementary website that will allow for easy submission of videos for posting, so stay tuned!