Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

R Workshop

I am going to start a continuing “R Workshop” series of posts with R tips and tricks. If you have questions you’d like answered or were wondering about certain aspects, please leave them in the comments.

Prezi

Andrew Jaffe pointed me to prezi.com. It looks like a new way of making presentations. Andrew made an example here in just a couple of minutes. Here is one about Coca-Cola.

Things I like: 

  1. I go to a lot of Beamer/PowerPoint talks, so these presentations at least look different and could be interesting. 
  2. It is cool how easy it is to arrange slides in a non-linear order, which could avoid clicking forward a few slides and then back a few slides. 
  3. I also like how the “global picture” of the talk can be shown in a display. 

Things I’m worried about:

  1. All the zooming and panning might start to drive people nuts, like slide transitions in PowerPoint. 
  2. There is serious potential for confusing presentations; organization is already a problem with some talks. 
  3. There is potential for people to spend too much time on making the prezi look cool and less on content. 

Update: In the comments, Abhijit points out that David Smith put together a presentation on the R ecosystem using Prezi. Check it out here.

Submitting scientific papers is too time consuming

As an academic who does a lot of research for a living, I spend a lot of my time writing and submitting papers. Before my time, this process involved sending multiple physical copies of a paper by snail mail to the editorial office. New technology has changed this process. Now to submit a paper you generally have to: (1) find a Microsoft Word or LaTeX template for the journal and use it for your paper and (2) upload the manuscript and figures (usually separately). This is a big improvement over snail mail submission! But it still takes a huge amount of time. Some simple changes would give academics back huge blocks of time to focus on teaching and research.

Just to give an idea of how complicated the current system is, here is an outline of what it takes to submit a paper.

To complete step (1) you go to the webpage of the journal you are submitting to, find their template files, and wrestle your content into the template. Sometimes this requires finding additional files that are not on the website of the journal you are submitting to. It always requires a large amount of tweaking the text and content to fit the template.

To complete step (2) you have to go to the webpage of the journal and start an account with their content management system. There are frequently different requirements for usernames and passwords, leading to a proliferation of both. Then you have to upload the files and fill out between 5 and 7 web forms with information about the authors, information about the paper, information about the funding, information about human subjects research, etc. If the files aren’t in the right format you may have to reformat them before they will be accepted. Some journals even have editorial assistants who will go over your submission and find problems that have to be resolved before your paper can even be reviewed.

This whole process can take anywhere from one to ten hours, depending on the journal. If you have to revise your paper for that journal, you have to go through the process again. If your paper is rejected, then you have to start all over with a new template and a new content management system at a new journal.

It seems like a much simpler system would be for people to submit their papers in PDF/Word format with all the figures embedded. If the paper is accepted to a journal, then of course you might need to reformat the submission to make it easier for typesetters to lay out your article. But that could happen just one time, once a paper is accepted.

This seems like a small thing. But suppose you submit a paper between 10 and 15 times a year (very common for academics in my field). Suppose it takes on average 3 hours to submit a paper. That is at least 3 x 10 = 30 hours a year, almost an entire workweek, just dealing with reformatting papers!
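
For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of that estimate in Python. The submission counts and the 3 hours per submission come from the paragraph above; the 40-hour workweek and the function name are my own assumptions for illustration.

```python
# Back-of-the-envelope estimate of annual time spent submitting papers.
# Figures from the post: 10 to 15 submissions a year, about 3 hours each.
HOURS_PER_SUBMISSION = 3
WORKWEEK_HOURS = 40  # assumption: a standard 40-hour workweek


def annual_submission_hours(submissions_per_year, hours_each=HOURS_PER_SUBMISSION):
    """Total hours per year spent on submissions (hypothetical helper)."""
    return submissions_per_year * hours_each


low = annual_submission_hours(10)   # low end of the range in the post
high = annual_submission_hours(15)  # high end of the range in the post

print(f"Between {low} and {high} hours a year")
print(f"Low end as a fraction of a workweek: {low / WORKWEEK_HOURS}")
```

Swapping in a different per-submission time (the post mentions anywhere from one to ten hours) shows how quickly the total grows.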

In the comments, I’d love to hear about the best/worst experiences you have had submitting papers. Where is good? Where is bad?

Cool papers

  1. Here is a paper where they scraped Twitter data over a year and showed how the tweets corresponded with sleep patterns and diurnal rhythms. The coolest part of this paper is that these two guys just went out and collected the data for free. I wish they had focused on more interesting questions, though; it seems like you could do a lot with data like this. 
  2. Since flu season is upon us, here is an interesting paper where the authors used data on friendship networks and class structure in a high school to study flu transmission. They show targeted treatment isn’t as effective as people had thought when using random mixing models. 
  3. This one is a little less statistical. Over the last few years there were some pretty high profile papers that suggested that over-expressing just one protein could double or triple the lifetime of flies or worms. Obviously, that is a pretty crazy/interesting result. But in this paper some of those results are called into question. 

Defining data science

Rebranding of statistics as a field seems to be a popular topic these days and “data science” is one of the potential rebranding options. This article over at Revolutions is a nice summary of where the term comes from and what it means. This quote seems pretty accurate:

My own take is that Data Science is a valuable rebranding of computer science and applied statistics skills.