27 Jan 2012
Some NIH R01 paylines are down to 10%. Roughly speaking, this means only about 10% of grant applications are being funded. The plot below highlights that all we need is a tiny little slice from Defense, Medicare, Medicaid or Social Security to bring that back up to 20%. The plot was taken from Alex Tabarrok’s great article in the Atlantic.
Update: The y-axis unit is billions of US dollars.
26 Jan 2012
Like many statisticians, I was amped to see a statistics paper appear in Science. Given the impact that statistics has on the scientific community, it is a shame that more statistics papers don’t appear in the glossy journals like Science or Nature. As I pointed out in the previous post, if the paper that introduced the p-value was cited every time this statistic was used, the paper would have over 3 million citations!
But a couple of our readers* have pointed to a response to the MIC paper published by Noah Simon and Rob Tibshirani. Simon and Tibshirani show that the MIC statistic has less power than another statistic designed for the same purpose, which was published in 2009 in the Annals of Applied Statistics. A nice summary of the discussion is provided by Florian over at his blog.
If the AoAS statistic came out first (by 2 years) and is more powerful (according to simulation), should the MIC statistic have appeared in Science?
The whole discussion reminds me of a recent blog post suggesting that journals need to pick one: groundbreaking or definitive. The post points out that the two are in many ways in opposition to each other.
Again, I’d suggest that statistics papers get short shrift in the glossy journals and I would like to see more. And the MIC statistic is certainly groundbreaking, but it isn’t clear that it is definitive.
As a comparison, a slightly different story played out with another recent high-impact statistical method, the false discovery rate (FDR). The original papers were published in statistics journals. Then when it was clear that the idea was going to be big, a more general-audience-friendly summary was published in PNAS (not Science or Nature but definitely glossy). This might be a better way for the glossy journals to know what is going to be a major development in statistics versus an exciting - but potentially less definitive - method.
25 Jan 2012
Our previous post on the future of (statistics) graduate education was motivated by the Stanford online course on Artificial Intelligence. Here is an update on the class, which had 160,000 people enroll. Some highlights: (1) Sebastian Thrun has given up his tenure at Stanford and started a new online university called Udacity. (2) 248 students got a perfect score: they never got a single question wrong over the entire course of the class. All 248 took the course online; not one was enrolled at Stanford. (3) Students from Afghanistan completed the course. What do you think are the chances these students could afford Stanford’s tuition? (4) There were more students from Lithuania alone than there are students at Stanford altogether.
The class evaluations were not perfect. Here is a particularly harsh one. They also need to figure out how to evaluate online students. But I am sure there are plenty of people working on that problem. Here is an example. Regardless, this was the first such experiment and for a first try it seems like a huge success to me. As more professors try this (for example, Harvard’s Gary King is conducting a similar class in Quantitative Research Methodology), it will become clearer that there is no future for in-class lectures as we know them today.
Thanks to Alex and Jeff for all the links.
25 Jan 2012
I wrote a quick (and very dirty) R script for creating a comparison cloud and a commonality cloud for President Obama’s 2011 and 2012 State of the Union speeches. The cloud on the left shows words that have different frequencies between the two speeches and the cloud on the right shows the words in common between the two speeches. Here is a higher resolution version.

The focus on jobs hasn’t changed much. But it is interesting how the 2012 speech seems to focus more on practical issues (tax, pay, manufacturing, oil) versus more emotional issues in 2011 (future, schools, laughter, success, dream).
The wordcloud R package does all the heavy lifting.
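For readers who want to see the kind of arithmetic that sits behind the two clouds, here is a minimal sketch in Python. It is not the script I used and it is not the wordcloud package's code; the function names `word_rates` and `compare_speeches` are made up for illustration, and the tokenizer is deliberately crude. It assumes a "comparison" score is the difference in a word's share of each speech, and a "commonality" score is the smaller of the two shares.

```python
import re
from collections import Counter


def word_rates(text):
    """Return each word's share of the total word count in `text`."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def compare_speeches(text_a, text_b):
    """Score words by how differently, and how commonly, two texts use them."""
    rates_a = word_rates(text_a)
    rates_b = word_rates(text_b)

    # Difference in usage rate: positive means the word is more typical
    # of text_a, negative means it is more typical of text_b.
    vocabulary = set(rates_a) | set(rates_b)
    difference = {
        word: rates_a.get(word, 0.0) - rates_b.get(word, 0.0)
        for word in vocabulary
    }

    # Commonality: for words used in both texts, the smaller of the two rates.
    shared = set(rates_a) & set(rates_b)
    common = {word: min(rates_a[word], rates_b[word]) for word in shared}

    return difference, common
```

Calling `compare_speeches(speech_2011, speech_2012)` on the two transcripts (hypothetical variable names) gives two dictionaries; sorting `difference` by value surfaces the words that most separate the speeches, and sorting `common` surfaces the words they share. Stop-word removal and stemming are left out here to keep the sketch short.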
23 Jan 2012
The tough economic times we live in, and the potential for big paydays, have made entrepreneurship cool. From the venture capitalist-in-chief to the JavaScript-coding mayor of New York, everyone is on board. No surprise there: successful startups lead to job creation, which can have a major positive impact on the economy.
The game has been dominated for a long time by the folks over in CS. But the value of many recent startups is either based on, or can be magnified by, good data analysis. Here are a few startups that are based on data/data analysis:
- The Climate Corporation - analyzes climate data to sell farmers weather insurance.
- Flightcaster - uses public data to predict flight delays.
- Quid - uses data on startups to predict success, among other things.
- 100plus - personalized health prediction startup, predicting health based on public data.
- Hipmunk - The main advantage of this site for travel is better data visualization and an algorithm to show you which flights have the worst “agony”.
To launch a startup you need just a couple of things: (1) a good, valuable source of data (there are lots of these on the web) and (2) a good idea about how to analyze them to create something useful. The second step is obviously harder than the first, but the companies above prove you can do it. Then, once it is built, you can outsource/partner with developers - web and otherwise - to implement your idea. If you can build it in R, someone can make it an app.
These are just a few of the startups whose value is entirely derived from data analysis. But companies from LinkedIn, to Bitly, to Amazon, to Walmart are trying to mine the data they are generating to increase value. Data is now being generated at unprecedented scale by computers, cell phones, even thermostats! With this onslaught of data, the need for people with analysis skills is becoming incredibly acute.
Statisticians, like computer scientists before them, are poised to launch, and make major contributions to, the next generation of startups.