Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

What are the products of data analysis?

Thanks to everyone for the feedback on my post on knowing when someone is good at data analysis. A couple of people suggested I take a look here for a few people who have proven they’re good at data analysis. I think that’s a great idea and a good place to start.

But I also think that while demonstrating an ability to build good prediction models is impressive and definitely shows an understanding of the data, not all important problems can be easily posed as prediction problems. Most of my work does not involve prediction at all, and the problems I face (i.e., estimating very small effects in the presence of large unmeasured confounding factors) would be difficult to formulate as a prediction challenge (at least, I can’t think of an easy way). In fact, part of my and my colleagues’ research involves showing how statistical methods designed for prediction problems can fail miserably when applied in non-prediction settings.

The general question I have is this: what useful product can you produce from a data analysis that demonstrates the quality of that analysis? A very small mean squared error from a prediction model would be one such product (especially if it were smaller than everyone else’s). Maybe a cool graph with a story behind it?

If I were hiring a musician for an orchestra, I wouldn’t have to meet that person to have strong evidence that he or she was good. I could just listen to some recordings of that person playing, and that would be a pretty good predictor of how that person would perform in the orchestra. In fact, some major orchestras do completely blind auditions so that, although the person is present in the room, all you hear is the sound of the playing.

What seems to be true with music, at least, is that even though the final performance doesn’t specifically reveal the important decisions that were made along the way to craft the interpretation of the music, somehow one is still able to appreciate that all those decisions were made and that they benefitted the performance. To me, it seems unlikely that anyone would arrive at a sublime performance either by chance or by some route that didn’t involve talent and hard work. Maybe it could happen once, but to produce a great performance over and over requires more than just luck.

What products could you send to someone to convince them you were good at data analysis? I raise this question primarily because when I look around at the products that I make (research papers, software, books, blogs), even if they are very good, I don’t think they necessarily convey any useful information about my ability to analyze data.

What’s the data analysis equivalent of a musician’s performance?