DealBook: Glaxo to Make Hostile Bid for Human Genome Sciences
09 May 2012
Consider this exercise. Come up with a list of the top 5 people that you think are really good at data analysis.
There’s one catch: They have to be people you’ve never met and have never had any sort of personal interaction with (e.g., email or chat). So basically people who have written papers/books you’ve read, who have given talks you’ve seen, or whom you know through other publicly available information. Who comes to mind? It’s okay to include people who are no longer living.
The other day I was thinking about the people who I think are really good at data analysis, and it occurred to me that they were all people I knew. So I started thinking about people that I don’t know (and there are many) but who are equally good at data analysis. This turned out to be much harder than I thought. I’m sure it’s not because they don’t exist; I think it’s because good data analysis chops are hard to evaluate from afar using the standard methods by which we evaluate people.
I think there are a few reasons. First, people who are great at data analysis are likely not publishing papers or being productive in a manner that I, an outsider, would be able to observe. If they’re at a pharmaceutical company working on a new drug or at some fancy new startup, there’s no way I’m ever going to know about it unless I’m directly involved.
Another reason is that even for people who are well-known scientists or statisticians, the products they produce don’t really highlight the difficulties overcome in data analysis. For example, many good papers in the statistics literature will describe a new method with brief reference to the data that inspired the method’s development. In those cases, the data analysis usually appears obvious, as most things do after they’ve been done. Furthermore, papers usually exclude all the painful details about merging, cleaning, and inspecting the data as well as all the other things you tried that didn’t work. Papers in the substantive literature have a similar problem, which is that they focus on a scientific problem of interest and the analysis of the data is secondary.
As skills in data analysis become more important, it seems odd to me that we don’t have a great way to evaluate a person’s ability to do it, the way we do in other areas.
The very, very cool UCLA Data Fest is going on as we speak. This is a statistical analysis marathon where teams of undergrads work through the night (and day) to address an important problem through data analysis. Last year they looked at crime data from the Los Angeles Police Department. I’m looking forward to seeing how this year goes.
Great work by Rob Gould and the Department of Statistics there.