Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Private health insurers to release data

It looks like four major private health insurance companies will be releasing data for use by academic researchers. They will create a non-profit institute called the Health Care Cost Institute and deposit the data there. Researchers can request the data from the institute by (I’m guessing) writing a short proposal.

Health insurance billing claims data might not sound all that exciting, but they are a gold mine of very interesting information about population health. In my group, we use billing claims from Medicare Part A to explore the relationships between ambient air pollutants and hospital admissions for various cardiovascular and respiratory diseases. The advantage of using a database like Medicare is that the population is very large (about 48 million people) and highly relevant. Furthermore, the data are just sitting there, already collected. The disadvantage is that you get relatively little information about those people. For example, you can’t find out what a particular Medicare enrollee’s blood pressure is on a given day. Also, it requires some pretty sophisticated data analysis skills to go through these large databases and extract the information you need to address a scientific question. But this “disadvantage” is what allows statisticians to play an important role in making scientific discoveries.

It’s not clear what kind of information will be made available from the private insurers—it looks like it’s mostly geared towards doing economic/cost analysis. However, I’m guessing that there will be a host of other uses for the data that will be revealed as time goes on. 

Finish and publish

Roger pointed us to this Amstat News profile of statisticians, including one of Francesca Dominici. Francesca has used her statistics skills to become a top environmental scientist. She had this advice for young [academic] statisticians:

First, I would say find a good mentor in or outside the department. Prioritize, manage your time, and identify the projects you would like to lead. Focus the most productive time of day on those projects. Take ownership of projects. The biggest danger is getting pulled in very different directions; focus on one main project. Finish everything you start. Always publish. Even if it is not revolutionary, publish.

I think this is great advice. And I want to add to the last two sentences. If you are smart and it took you time to figure out the solution to a problem you find interesting, chances are others will want to read about it. So follow Francesca’s advice: finish and publish. Remember Voltaire’s quote: “perfection is the enemy of the good”.

Statistician Profiles

Just in case you forgot to renew your subscription to Amstat News, there’s a nice little profile of statisticians (including my good colleague Francesca Dominici) in the latest issue explaining how they ended up where they are.

I remember a few years ago I was at a dinner for our MPH program and the director at the time, Ron Brookmeyer, told all the students to ask the faculty how they ended up in public health. The implication, of course, was that the route was likely to be highly nonlinear. It was definitely that way for me.

Statisticians in particular, I think, have the ability to lead interesting careers simply because we have the ability to operate in a variety of substantive fields. I started out developing point process models for predicting wildfire occurrence. Perhaps to the chagrin of my advisor, I’m not doing much point process modeling now, but rather am working in environmental health doing quite a bit of air pollution epidemiology.

So ask a statistician how they ended up where they are. It’ll probably be an interesting story.

Data Sources

Here are places you can get data sets to analyze (for class projects, fun and profit!)

  1. Data Market
  2. Infochimps
  3. Data.gov
  4. Factual.com

I’m sure there are a ton more…would love to hear from people. 

Meetings

In this TED talk Jason Fried explains why work doesn’t happen at work. He describes the evils of meetings. Meetings are particularly disruptive for applied statisticians, especially for those of us who hack data files, explore data for systematic errors, get inspiration from visual inspection, and thoroughly test our code. Why? Before I become productive I go through a ramp-up/boot-up stage. Scripts need to be found, data loaded into memory, and most importantly, my brain needs to re-familiarize itself with the data and the essence of the problem at hand. I need a similar ramp-up for writing as well. It usually takes me between 15 and 60 minutes before I am in full-productivity mode. But once I am in “the zone”, I become very focused and I can stay in this mode for hours. There is nothing worse than interrupting this state of mind to go to a meeting. I lose much more than the hour I spend at the meeting. A short way to explain this is that having 10 separate hours to work is basically nothing, while having 10 hours in the zone is when I get stuff done.

Of course not all meetings are a waste of time. Academic leaders and administrators need to consult and get advice before making important decisions. I find lab meetings very stimulating and, generally, productive: we unstick the stuck and realign the derailed. But before you go and set up a standing meeting, consider this calculation: a weekly one-hour meeting with 20 people translates into 1 hour x 20 people x 52 weeks/year = 1,040 person-hours of potentially lost production per year. Assuming 40-hour weeks, that is 26 work weeks, or six months of one person's time. How many grants, papers, and lectures can we produce in six months? And this does not take into account the non-linear effect described above. Jason Fried suggests you cancel your next meeting, notice that nothing bad happens, and enjoy the extra hour of work.
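If you want to run the same back-of-the-envelope calculation for your own standing meetings, here is a minimal Python sketch. The function name and its defaults (52 meetings a year, 40-hour work weeks) are my own illustrative assumptions, and it deliberately ignores the ramp-up cost described earlier, so treat the result as a lower bound.

```python
def meeting_cost(hours_per_meeting, attendees,
                 meetings_per_year=52, hours_per_work_week=40):
    """Return (person-hours per year, equivalent work weeks) for a standing meeting.

    Hypothetical helper: the defaults assume a weekly meeting and 40-hour weeks.
    """
    # Total time spent in the meeting, summed over everyone attending
    person_hours = hours_per_meeting * attendees * meetings_per_year
    # Express that total as full work weeks for a single person
    work_weeks = person_hours / hours_per_work_week
    return person_hours, work_weeks


# The example from the post: a weekly one-hour meeting with 20 people
person_hours, work_weeks = meeting_cost(1, 20)
print(f"{person_hours} person-hours per year, about {work_weeks:.0f} work weeks")
```

For the 20-person weekly meeting this gives 1,040 person-hours, or 26 work weeks, which is the six months quoted above. Plug in the numbers for your own meetings before adding another one to the calendar.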

I know many others who are like me in this regard, and for you I have these recommendations: 1- Avoid unnecessary meetings, especially if you are already in full-productivity mode. Don’t be afraid to use this as an excuse to cancel. If you are in a soft $ institution, remember who pays your salary. 2- Try to bunch all the necessary meetings together into one day. 3- Set aside at least one day a week to stay home and work for 10 hours straight. Jason Fried also recommends that every workplace declare a day on which no one talks. No meetings, no chit-chat, no friendly banter, etc… No-talk Thursdays, anyone?