30 Oct 2013
The Unconference on the Future of Statistics will begin at 12pm EDT today. Watch the live stream here.
29 Oct 2013
Tomorrow is the Unconference on the Future of Statistics from 12PM-1PM EDT. There are two ways that you can get in the game:
- Ask questions for our speakers on Twitter with the hashtag #futureofstats. Don't wait; start right now. Roger, Rafa, and I are monitoring the hashtag and collecting questions, and we will pick some to ask the speakers tomorrow during the Unconference.
- If you have an idea about the future of statistics, write it up, post it on Github, Blogger, WordPress, or your personal website, and then tweet it with the hashtag #futureofstats. We will do our best to collect these and post them with the video so your contributions become part of the Unconference.
29 Oct 2013
I’ve been digging up old “future of statistics” writings in anticipation of our Unconference on the Future of Statistics this Wednesday, 12-1pm EDT. Last week I mentioned Daryl Pregibon’s experience trying to build statistical expertise into software. One classic is “The Future of Data Analysis” by John Tukey, published in the Annals of Mathematical Statistics in 1962.
Perhaps the most surprising aspect of this paper is how relevant it remains today. With just a few small revisions it could easily be published in a journal today, and few people would find it out of place.
In Section 3 titled “How can new data analysis be initiated?” he describes directions in which statisticians should go to grow the field of data analysis. But the advice itself is quite general and probably should be heeded by any junior statistician just starting out in research.
How is novelty most likely to begin and grow? Not through work on familiar problems, in terms of familiar frameworks, and starting with the results of applying familiar processes to the observations. Some or all of these familiar constraints must be given up in each piece of work which may contribute novelty.
Tukey’s article serves as a coherent and comprehensive roadmap for the development of data analysis as a field. He suggests that we should study how people analyze data and uncover “what works” and what doesn’t. However, he appears to draw the line at suggesting that such study should result in a single way of analyzing a given type of data. Rather, statisticians should maintain some flexibility in modeling and analysis. I personally think the reality should be somewhere in the middle: too much flexibility can lead to problems, but rigidity is not the solution.
It is interesting, from my perspective, how much of Tukey’s roadmap was essentially ignored, given how clear and coherent it was in 1962. In fact, the field largely went in the other direction, towards more mathematical elegance (I’m guessing Tukey sensed this would happen). His article is uncomfortable to read because it is full of problems that arise in real data and are difficult to handle with standard approaches. He has an uncanny ability to make up methods that look totally bizarre at first glance but are totally reasonable after some thought.
I honestly can’t think of a better way to end this post than to quote Tukey himself.
The future of data analysis can involve great progress, the overcoming of real difficulties, and the provision of a great service to all fields of science and technology. Will it? That remains to us, to our willingness to take up the rocky road of real problems in preference to the smooth road of unreal assumptions, arbitrary criteria, and abstract results without real attachments. Who is for the challenge?
Read the paper. And then come join us at 12pm EDT tomorrow.
28 Oct 2013
Our online conference, live-streamed on YouTube, will happen on October 30th, 12PM-1PM Baltimore (UTC-4:00) time. You can find more information here or sign up for email alerts here. I get bored with the usual speaker bios at conferences, so I am turning ours into a game. Below you will find three bullet-pointed items of interest about each of our speakers. Two of them are truths and one is a lie. See if you can spot the lies, and sign up for the Unconference!
Hadley Wickham
- Created the ggplot2/devtools packages.
- Developed R’s first class system.
- Is chief scientist at RStudio.
Daniela Witten
- Developed the most popular method for inferring Facebook connections.
- Created the Spacejam algorithm for inferring networks.
- Made the Forbes 30 under 30 list twice as a rising scientific star.
Joe Blitzstein
- A Professor of the Practice of Statistics at Harvard University.
- Created the first statistical method for automatically teaching the t-test.
- His statistics 101 course is frequently in the top 10 courses on iTunes U.
Hongkai Ji
- Developed the hmChIP database of over 2,000 ChIP-Seq and ChIP-Chip data samples.
- Coordinated the analysis of the orangutan genome project.
- Analyzed data to help us understand sonic-hedgehog mediated neural patterning.
Sinan Aral
- Coined the phrase “social networking potential”.
- Ran a large randomized study that determined the value of upvotes.
- Discovered that peer influence is dramatically overvalued in product adoption.
Hilary Mason
- Is a co-founder of DataGotham and HackNY.
- Developed computational algorithms for identifying the optimal cheeseburger.
- Founded the first company to create link sorting algorithms.