Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Statistics isn't math but statistics can produce math

Mathgen, the web site that can produce randomly generated mathematics papers, has apparently gotten a paper accepted in a peer-reviewed journal (although perhaps not the most reputable one). I am not at all surprised this happened, but it’s fun to read both the paper and the reviewer’s comments.

(Thanks to Kasper H. for the pointer.)

Comparing Hospitals

There was a story a few weeks ago on NPR about how Medicare will begin fining hospitals that have 30-day readmission rates that are too high. This process was introduced in the Affordable Care Act and

Under the health care law, the penalties gradually will rise until 3 percent of Medicare payments to hospitals are at risk. Medicare is considering holding hospitals accountable on four more measures: joint replacements, stenting, heart bypass and treatment of stroke.

Those of you taking my computing course on Coursera have already seen some of the data used for this assessment, which can be obtained at the Hospital Compare web site. It’s also worth noting that underlying the analysis was a detailed and thoughtful report published by the Committee of Presidents of Statistical Societies (COPSS), which was chaired by Tom Louis, a professor here at Johns Hopkins.

The report, titled “Statistical Issues in Assessing Hospital Performance,” covers much of the current methodology and its criticisms and has a number of recommendations. Of particular concern for hospitals is the issue of shrinkage targets: in a hierarchical model, the estimate of the readmission rate for a hospital is shrunk towards the mean. But which mean? Hospitals with higher-risk or sicker patient populations will look quite a bit worse than hospitals sitting amongst a healthy population if they are both compared to the same mean.
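To make the "which mean?" question concrete, here is a minimal sketch of shrinkage under a simple beta-binomial setup. This is an illustration only, not the model used by Medicare or described in the COPSS report. The function name, the two target means, the prior strength, and the hospital's counts are all made-up assumptions.

```python
def shrunken_rate(readmissions, discharges, target_mean, prior_strength):
    """Posterior-mean readmission rate under a Beta prior centred on target_mean.

    prior_strength acts like a number of pseudo-discharges: the larger it is,
    the harder the raw rate is pulled towards target_mean.
    """
    return (readmissions + prior_strength * target_mean) / (discharges + prior_strength)


# All numbers below are hypothetical, chosen only to illustrate the idea.
OVERALL_MEAN = 0.20         # assumed mean rate across all hospitals
HIGH_RISK_PEER_MEAN = 0.26  # assumed mean rate among hospitals with sicker patients
PRIOR_STRENGTH = 50         # assumed weight given to the shrinkage target

readmissions, discharges = 25, 100  # one hypothetical high-risk hospital

toward_overall = shrunken_rate(readmissions, discharges, OVERALL_MEAN, PRIOR_STRENGTH)
toward_peers = shrunken_rate(readmissions, discharges, HIGH_RISK_PEER_MEAN, PRIOR_STRENGTH)

print(f"raw rate:                 {readmissions / discharges:.3f}")
print(f"shrunk to overall mean:   {toward_overall:.3f} (target {OVERALL_MEAN:.2f})")
print(f"shrunk to peer-group mean: {toward_peers:.3f} (target {HIGH_RISK_PEER_MEAN:.2f})")
```

With these made-up numbers, the same hospital sits above its target when shrunk towards the overall mean and below its target when shrunk towards the mean of comparable hospitals. So the choice of target can change whether the hospital looks worse or better than its benchmark.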

The report is worth reading even if you’re just interested in the practical application of hierarchical models. And the web site is fun to explore if you want to know how the hospitals around you are faring.

Johns Hopkins Grad Anthony Damico Shows How To

[vimeo 43305640 w=500 h=281]

Johns Hopkins grad Anthony Damico shows how to make coffee with R (except not really). The BLS mug is what makes it for me.

A statistician loves the #insurancepoll...now how do we analyze it?

Amanda Palmer broke Twitter yesterday with her insurance poll. She started off just talking about how hard it is for musicians who rarely have health insurance, but then wandered into polling territory. She sent out a request for people to respond with the following information:

quick twitter poll. 1) COUNTRY?! 2) profession? 3) insured? 4) if not, why not, if so, at what cost per month (or covered by job)?

This quick little poll struck a nerve with people and her Twitter feed blew up. Long story short, tons of interesting information was gathered from folks. This information is frequently kept semi-obscured, particularly the cost of health insurance for folks in different places. This isn’t the sort of info that insurance companies necessarily publicize widely and isn’t the sort of thing people talk about.

The results were really fascinating and it’s worth reading the above blog post or checking out the hashtag: #insurancepoll. But the most fascinating thing for me as a statistician was thinking about how to analyze these data. @aubreyjaubrey is apparently collecting the data someplace; hopefully she’ll make it public.

At least two key issues spring to mind:

  1. This is a massive convenience sample.
  2. It is being collected through a social network.

Although I’m sure there are more. If a student is looking for an amazingly interesting/rich data set and some seriously hard stats problems, they should get in touch with Aubrey and see if they can make something of it!
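As a rough illustration of why the convenience-sample issue matters, here is a small sketch of how unequal reply rates distort a naive estimate. Every number in it is a made-up assumption, not something taken from the #insurancepoll data, and the function name is hypothetical.

```python
def naive_insured_share(true_insured_share, reply_prob_insured, reply_prob_uninsured):
    """Expected share of insured people among those who reply to the poll."""
    insured_replies = true_insured_share * reply_prob_insured
    uninsured_replies = (1 - true_insured_share) * reply_prob_uninsured
    return insured_replies / (insured_replies + uninsured_replies)


# Hypothetical values: uninsured followers are assumed to be five times
# more likely to reply than insured followers.
TRUE_INSURED_SHARE = 0.85
REPLY_PROB_INSURED = 0.01
REPLY_PROB_UNINSURED = 0.05

naive = naive_insured_share(TRUE_INSURED_SHARE, REPLY_PROB_INSURED, REPLY_PROB_UNINSURED)

print(f"true insured share:  {TRUE_INSURED_SHARE:.2f}")
print(f"share among replies: {naive:.2f}")
```

Under these assumptions the replies badly understate the insured share, even though nobody answered dishonestly. Real reply probabilities are unknown and also depend on who follows whom, which is part of what makes the analysis hard.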

Sunday Data/Statistics Link Roundup (10/14/12)

  1. A fascinating article about the debate on whether to regulate sugary beverages. One of the protagonists is David Allison, a statistical geneticist, among other things. It is fascinating to see the interplay of statistical analysis and public policy. Yet another example of how statistics/data will drive some of the most important policy decisions going forward. 
  2. A related article is this one on the way risk is reported in the media. It is becoming more and more clear that to be an educated member of society now means that you absolutely have to have a basic understanding of the concepts of statistics. Both leaders and the general public are responsible for the danger that lies in misinterpreting/misleading with risk. 
  3. A press release from the Census Bureau about how the choice of college major can have a major impact on career earnings. More data breaking the results down by employment characteristics and major are here and here. These data update some of the data we have talked about before in calculating expected salaries by major. (via Scott Z.)
  4. An interesting article about Recorded Future that describes how they are using social media data etc. to try to predict events that will happen. I think this isn’t an entirely crazy idea, but the thing that always strikes me about these sorts of projects is how hard it is to measure success. It is highly unlikely you will ever exactly predict a future event, so how do you define how close you were? For instance, if you predicted an uprising in Egypt, but missed by a month, is that a good or a bad prediction?
  5. Seriously guys, this is getting embarrassing. An article appears in the New England Journal “finding” an association between chocolate consumption and Nobel prize winners. This is, of course, a horrible statistical analysis and, unless it was published as a joke, it is irresponsible of the NEJM to publish it. I’ll bet any student in Stat 101 could find the huge flaws with this analysis. If the editors of the major scientific journals want to continue publishing statistical papers, they should get serious about statistical editing.