Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday data/statistics link roundup (8/12/12)

  1. An interesting blog post about the top N reasons to do a Ph.D. in bioinformatics or computational biology. A couple of points that I find interesting, and that could be said of any program in biostatistics as well: computing is the key skill of the 21st century, and computational skills are highly transferable. Via Andrew J. 
  2. Here is an interesting auto-complete map of the United States where the prompt was, “Why is [state] so”. It seems like using the Google auto-complete functions can lead to all sorts of humorous data; xkcd has used it as a data source a couple of times in the past. By the way, the person(s) who think Idaho is boring haven’t been to the right parts of Idaho. (Via Rafa.) 
  3. One of my all-time favorite statistics quotes appears in this column by David Brooks: “…what God hath woven together, even multiple regression analysis cannot tear asunder.” It seems like the perfect quote for any study that attempts to build a predictive model for a complicated phenomenon where only limited knowledge of the underlying mechanisms is available. 
  4. I’ve been reading up a lot on how to summarize and communicate risk. At the moment, I’m following a lot of David Spiegelhalter’s stuff, and I really liked this 30,000-foot-view summary.
  5. It is interesting how often you see R popping up in random places these days. Here is a blog post about predicting the stock market that appeared on Business Insider, with some clearly R-created plots. 
  6. Roger and I had a post on MOOCs this week from the perspective of faculty teaching the courses. For a more departmental/administrative-level view, be sure to re-read Rafa’s post on the future of graduate education.

How Big Data Became So Big

When dealing with poop, it's best to just get your hands dirty

I’m a relatively new dad. Before the kid we affectionately call the “tiny tornado” (TT) came into my life, I had relatively little experience dealing with babies and all the fluids they emit. So admittedly, I was a little squeamish dealing with the poopy explosions the TT would create. Inevitably, things would get much more messy than they had to be while I was being too delicate with the issue. It took me an embarrassingly long time for an educated man, but I finally realized you just have to get in there and change the thing even if it is messy, then wash your hands after. It comes off. 

It is a similar situation in my professional life, but I’m having a harder time learning the lesson. There are frequently things that I’m not really excited to do: review a lot of papers, go to long meetings, revise a draft of that paper that has just been sitting around forever. Inevitably, once I get going they usually aren’t as difficult or as arduous as I thought. Even better, once they are done I feel a huge sense of accomplishment and relief. I used to have a metaphor for this. I’d tell myself, “Jeff, just rip off the band-aid”. Now, I think, “Jeff, just get your hands dirty”. 

Why we are teaching massive open online courses (MOOCs) in R/statistics for Coursera

Editor’s Note: This post was written by Roger Peng and Jeff Leek. 

A couple of weeks ago, we announced that we would be teaching free courses in Computing for Data Analysis and Data Analysis on the Coursera platform. At the same time, a number of other universities also announced partnerships with Coursera leading to a large number of new offerings. That, coupled with a new round of funding for Coursera, led to press coverage in the New York Times, the Atlantic, and other media outlets.

There was an ensuing explosion of blog posts and commentaries from academics. The opinions ranged from dramatic, to negative, to critical, to um…hilariously angry. Rafa posted a few days ago that many of the folks freaking out are missing the point - the opportunity to reach a much broader audience of folks with our course content. 

[Before continuing, we’d like to make clear that at this point no money has been exchanged between Coursera and Johns Hopkins. Coursera has not given us anything and Johns Hopkins hasn’t given them anything. For now, it’s just a mutually beneficial partnership — we get their platform and they get to use our content. In the future, Coursera will need to figure out a way to make money, and they are currently considering a number of options.] 

Now that the initial wave of hype has died down, we thought we’d outline why we are excited about participating in Coursera. We think it is only fair to start by saying this is definitely an experiment. Coursera is a newish startup and as such is still figuring out its plan/business model. Similarly, our involvement so far has been a little whirlwind and we haven’t actually taught courses yet, and we are happy to collect data and see how things turn out. So ask us again in 6 months when we are both done teaching.

But for now, this is why we are excited.

  1. Open Access. As Rafa alluded to in his post, this is an opportunity to reach a broad and diverse audience. As academics devoted to open science, we also think that opening up our courses to the biggest possible audience is, in principle, a good thing. That is why we are both basing our courses on free software and teaching the courses for free to anyone with an internet connection. 
  2. Excitement about statistics. The data revolution means that there is a really intense interest in statistics right now. It’s so exciting that Joe Blitzstein’s stat class on iTunes U has been one of the top courses on that platform. Our local superstar John McGready has also put his statistical reasoning course up on iTunes U to a similar explosion of interest. Rafa recently put his statistics for genomics lectures up on Youtube and they have already been viewed thousands of times. As people who are super pumped about the power and importance of statistics, we want to get in on the game. 
  3. We work hard to develop good materials. We put effort into building materials that our students will find useful. We want to maximize the impact of these efforts. We have over 30,000 students enrolled in our two courses so far. 
  4. It is an exciting experiment. Online teaching, including very very good online teaching, has been around for a long time. But the model of free courses at incredibly large scale is actually really new. Whether you think it is a gimmick or something here to stay, it is exciting to be part of the first experimental efforts to build courses at scale. Of course, this could flame out. We don’t know, but that is the fun of any new experiment. 
  5. Good advertising. Every professor at a research school is a start-up of one. This idea deserves its own blog post. But if you accept that premise, to keep the operation going you need good advertising. One way to do that is writing good research papers, another is having awesome students, and a third is giving talks at statistical and scientific conferences. This is an amazing new opportunity to showcase the cool things that we are doing. 
  6. Coursera built some cool toys. As statisticians, we love new types of data. It’s like candy. Coursera has all sorts of cool toys for collecting data about dropout rates, participation, discussion board answers, peer review of assignments, etc. We are pretty psyched to take these out for a spin and see how we can use them to improve our teaching.
  7. Innovation is going to happen in education. The music industry spent years fighting a losing battle over music sharing. Mostly, this damaged their reputation and stopped them from developing new technology like iTunes/Spotify that became hugely influential/profitable. Education has been done the same way for hundreds (or thousands) of years. As new educational technologies develop, we’d rather be on the front lines figuring out the best new model than fighting to hold on to the old model. 

Finally, we’d like to say a word about why we think in-person education isn’t really threatened by MOOCs, at least for our courses. If you take one of our courses through Coursera you will get to see the lectures and do a few assignments. We will interact with students through message boards, videos, and tutorials. But there are only 2 of us and 30,000 people registered. So you won’t get much one-on-one interaction. On the other hand, if you come to the top Ph.D. program in biostatistics and take Data Analysis, you will now get 16 weeks of one-on-one interaction with Jeff in a classroom, working on tons of problems together. In other words, putting our lectures online now means at Johns Hopkins you get the most qualified TA you have ever had. Your professor. 

A non-exhaustive list of things I have failed to accomplish

A few years ago I stumbled across a blog post that described a person’s complete cv. The idea was that the cv listed both the things they had accomplished and the things they had failed to accomplish. At the time, it really helped me to see that to be successful you have to be willing to fail over and over. 

I use my website to show the things I have accomplished career-wise. But I have also failed to achieve a lot of the things I set out to do. The reason was that there was strong competition for the awards/positions I was up for and other deserving people got them.   

  1. Applied to MIT undergrad in 1999 - rejected
  2. Donovan J. Thompson Award 2001 - did not receive
  3. Applied for Barry Goldwater scholarship 2002 - rejected
  4. Applied for NSF Pre-Doctoral Fellowship 2003 - rejected
  5. Applied for graduate school in math at MIT 2003 - rejected
  6. One of my first 3 papers rejected at PLoS Biology 2005
  7. Many subsequent rejections of papers - too many to list exhaustively but here is one example
  8. Applied for Youden Award 2010 - rejected
  9. Applied for Microsoft Faculty Fellowship 2012 - rejected
  10. Applied for Sloan Fellowship 2012 - rejected
  11. Many grants have been rejected, again too many to list exhaustively