Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

The world has changed from analogue to digital and it’s time mathematical education makes the change too.

Reverse scooping

I would like to define a new term: reverse scooping is when someone publishes your idea after you, and doesn’t cite you. It has happened to me a few times. What does one do? I usually send a polite message to the authors with a link to my related paper(s). These emails are usually ignored, but not always. Most times I don’t think it is malicious, though. In fact, I almost reverse scooped a colleague recently. People arrive at the same idea a few months (or years) later and there is just too much literature to keep track of. And remember, the culprit authors were not the only ones who missed your paper; the referees and associate editor missed it as well. One thing I have learned is that if you want to claim an idea, try to include it in the title or abstract, as very few papers get read cover to cover.

New S.E.C. Tactics Yield Actions Against Hedge Funds


Reproducible Research in Computational Science

First of all, thanks to Rafa for scooping me with my own article. Not sure if that’s reverse scooping or recursive scooping or….

The latest issue of Science has a special section on Data Replication and Reproducibility. As part of the section I wrote a brief commentary on the need for reproducible research in computational science. Science has a pretty tight word limit for its commentaries and so it was unfortunately necessary to omit a number of relevant topics.

The editorial introducing the special section, as well as a separate editorial in the same issue, seem to emphasize the errors/fraud angle. This might be because Science has once or twice been at the center of instances of scientific fraud. But as I’ve said previously (and a point I tried to make in the commentary), reproducibility is not needed solely to prevent fraud, although that is an important objective. Another important objective is getting ideas across and disseminating knowledge. I think this second objective often gets lost because there’s a sense that knowledge dissemination already happens and that it’s the errors that are new and interesting. While the errors are perhaps new, there is a problem of ideas not getting across as quickly as they could because of a lack of code and/or data. The lack of published code/data is arguably holding up the advancement of science (if not Science).

One important idea I wanted to get across was that we can ramp up to achieve the ideal scenario, if getting there immediately is not possible. People often get hung up on making the data available but I think a substantial step could be made by simply making code available. Why doesn’t every journal just require it? We don’t have to start with a grand strategy involving funding agencies and large consortia. We can start modestly and make useful improvements.

A final interesting question that came up as the issue was going to press was whether I was talking about “reproducibility” or “replication”. As I made clear in the commentary, I define “replication” as independent people going out and collecting new data and “reproducibility” as independent people analyzing the same data. Apparently, others have the reverse definitions for the two words. The confusion is unfortunate because one idea has a centuries-long history whereas the other has only recently become important. I’m going to stick to my guns here but we’ll have to see how the language evolves.

Roger's perspective on reproducible research published in Science