Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday data/statistics link roundup (12/2/13)

  1. I’m in Australia for Bioinfo Summer 2013! First time in Australia and excited about the great lineup of speakers and to meet a bunch of people at the University of Adelaide. 
  2. An interesting post about how CS has become the de facto language of our times. They specifically talk about CS50 at Harvard. I think that, in terms of being an informed citizen, CS and Statistics are quickly being added to Reading, Writing, and Arithmetic as the required baseline knowledge (link via Alex N.).
  3. A long but fascinating read by Gary King about restructuring the social sciences with a focus on ending the quantitative/qualitative divide. I think a similar restructuring has been going on in biology for a while. It is nearly impossible to be a modern molecular biologist without at least some basic training in statistics. Similarly, statisticians are experiencing an inverted revolution where we are refocusing on applications, and some basic scientific experience is becoming a required component of being a statistician (link via Rafa).
  4. This is how you make a splash in data science. Rochester is hiring 20! faculty across multiple disciplines. It will be interesting to see how that works out (link via Rafa). This goes along with the recent announcement of the Moore Foundation funding Berkeley, UW, and NYU to build data science cultures/environments.
  5. PLoS is rich and they have to figure out what to do! They are a non-profit, but their journal PLoS One publishes about 30k papers a year at about 1k a pop. That is some serious money, which they need to figure out how to spend pronto. My main suggestion: fund research to figure out a way to put peer reviewing on the same level as publishing in terms of academic credit (link via Simina B.).
  6. A group of psychologists got together and performed replication experiments for 13 major effects. They replicated 11/13 (of course depending on your definition of replication). Hopefully these results are a good first step toward reducing the mania around the “replication crisis” and refocusing attention back on real solutions.

Statistical zealots

Yesterday my data sharing policy went a little bit viral. It hit the front page of Hacker News and was a trending repo on Github. I was reading the comments on Hacker News and came across this gem:

So, while I can imagine there are good Frequentists Statisticians out there, I insist that frequentism itself is bogus.

This is the extension of a long-standing debate about the relative merits of frequentist and Bayesian statistical methods. It is interesting that I largely only see one side of the debate being played out these days. The Bayesian zealots have it in for the frequentists in a big way. The Hacker News comments are one example, but here are [a few](http://wmbriggs.com/blog/?p=5062) more. Interestingly, even the “popular geek press” is getting in the game.

I think it probably deserves a longer post but here are my thoughts on statistical zealotry:

  1. User effect >>> Philosophy effect. The person doing the statistics probably matters more than the statistical philosophy. I would rather have Andrew Gelman analyze my data than a lot of frequentists. Similarly, I’d rather have John Storey analyze my data than a lot of Bayesians.
  2. I agree with Noahpinion that this is likely more a philosophy battle than a real practical applications battle.
  3. I like Rob Kass’s idea that we should move away from frequentist vs. Bayesian to pragmatism. I think most real applied statisticians have already done this, if for no other reason than being pragmatic helps you get things done.
  4. Papers like this one that claim total victory for one side or the other all have one thing in common: they rarely use real data to verify their claims. The real world is messy and one approach never wins all the time.

My final thought on this matter is: never trust people with an agenda bearing extreme counterexamples.

Simply Statistics interview with Daphne Koller, Co-Founder of Coursera

Jeff and I had an opportunity to sit down with Daphne Koller, Co-Founder of Coursera and Rajeev Motwani Professor of Computer Science at Stanford University. Jeff and I both teach massive open online courses using the Coursera platform and it was great to be able to talk with Professor Koller about the changing nature of education today.

Some highlights:

  • [1:35] On the origins of Coursera: “I actually came to that realization when listening to talk about YouTube, and realizing that, why does it make sense for me to come and deliver the same lecture year after year after year where I could package it in much smaller bite size chunks that were much more fun and much more cohesive and then use the class time for engaging with students in more meaningful ways.”
  • [7:22] On the role of MOOCs in academia: “Sometimes I have these discussions with some people in academic institutions who say that they feel that by engaging, for example, with MOOCs or blogs or social media they are diverting energy from what is their primary function which is teaching of their registered students…. But I think for most academic institutions, if I had to say what the primary function of an academic institution is, it’s the creation and dissemination of knowledge…. The only way society is going to move forward is if more people are better educated.”
  • [10:15] On teaching: “I think that teaching is a scholarly work as well, a kind of distillation of knowledge that has to occur in order to put together a really great course.”
  • [11:19] On teaching to the world: “Teaching, and quality of teaching, that used to be something that you could hide away from everyone…here, we’re suddenly in a world where teaching is really visible to everyone, and as a consequence, good teaching is going to be visible as a role model.”
  • [13:33] On work/life balance: “It’s been insane. It’s also been somewhat surreal…. Sometimes I look at my life and I’m saying really, I mean, whose life is this?”

You must be at least 20 years old for this job

The New York Times is recruiting a chief data scientist.

Future of Statistics take home messages. #futureofstats

A couple weeks ago we had the Future of Statistics Unconference. You can still watch it online here. Rafa also attended the Future of Statistical Sciences Workshop and wrote a great summary which you can read here.

I decided to write a summary of take home messages from our speakers at the Unconference. You can read it on Github here. I put it on Github for two reasons:

  1. I agree with Hadley’s statement that the future of statistics is on Github.
  2. I summarized them based on my interpretation and would love collaboration on the document. If you want to add your new thoughts/summaries, add a new section with your bullet pointed ideas and send me a pull request!

I sent our speakers a gift for presenting in the Unconference (if you were a speaker and didn’t get yours, email me!). Hadley posted the front on Twitter. Here is the back:

[Photo: 2013-11-21 10.16.54]

P.S. Stay tuned for the future of Simply Statistics Unconferences.