Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Tukey Talks Turkey #futureofstats

I’ve been digging up old “future of statistics” writings in anticipation of our Unconference on the Future of Statistics this Wednesday 12-1pm EDT. Last week I mentioned Daryl Pregibon’s experience trying to build statistical expertise into software. One classic is “The Future of Data Analysis” by John Tukey, published in the Annals of Mathematical Statistics in 1962.

Perhaps the most surprising aspect of this paper is how relevant it remains today. I think that with just a few small revisions it could easily be published in a journal now, and few people would find it out of place.

In Section 3, titled “How can new data analysis be initiated?”, he describes directions in which statisticians should go to grow the field of data analysis. The advice itself is quite general, though, and probably should be heeded by any junior statistician just starting out in research.

How is novelty most likely to begin and grow? Not through work on familiar problems, in terms of familiar frameworks, and starting with the results of applying familiar processes to the observations. Some or all of these familiar constraints must be given up in each piece of work which may contribute novelty.

Tukey’s article serves as a coherent and comprehensive roadmap for the development of data analysis as a field. He suggests that we should study how people analyze data and uncover “what works” and what doesn’t. However, he appears to draw the line at suggesting that such study should result in a single way of analyzing a given type of data. Rather, statisticians should maintain some flexibility in modeling and analysis. I personally think the reality should be somewhere in the middle. Too much flexibility can lead to problems, but rigidity is not the solution.

It is interesting, from my perspective, how much of Tukey’s roadmap was essentially ignored, given how clear and coherent it was in 1962. In fact, the field pretty much went in the other direction, towards more mathematical elegance (I’m guessing Tukey sensed this would happen). His article is uncomfortable to read because it’s full of problems that arise in real data and are difficult to handle with standard approaches. He has an uncanny ability to make up methods that look totally bizarre at first glance but are totally reasonable after some thought.

I honestly can’t think of a better way to end this post than to quote Tukey himself.

The future of data analysis can involve great progress, the overcoming of real difficulties, and the provision of a great service to all fields of science and technology. Will it? That remains to us, to our willingness to take up the rocky road of real problems in preference to the smooth road of unreal assumptions, arbitrary criteria, and abstract results without real attachments. Who is for the challenge?

Read the paper. And then come join us at 12pm EDT tomorrow.