Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Battling Bad Science

Here is a pretty awesome TED talk by epidemiologist Ben Goldacre where he highlights how science can be used to deceive/mislead. It’s sort of like epidemiology 101 in 15 minutes. 

This seems like a highly topical talk. Over on his blog, Steven Salzberg has pointed out that Dr. Oz has recently been engaging in some of these shady practices on his show. Too bad he didn’t check out the video first. 

In the comments section of the TED talk, one viewer points out that Dr. Goldacre doesn’t talk about the role of the FDA and other regulatory agencies. I think that regulatory agencies are under-appreciated and deserve credit for addressing many of these potential problems in the conduct of clinical trials. 

Maybe there should be an agency regulating how science is reported in the news? 

Why does Obama need statisticians?

It’s worth following up a little on why the Obama campaign is recruiting statisticians (note to Karen: I am not looking for a new job!). Here’s the blurb for the position of “Statistical Modeling Analyst”:

The Obama for America Analytics Department analyzes the campaign’s data to guide election strategy and develop quantitative, actionable insights that drive our decision-making. Our team’s products help direct work on the ground, online and on the air. We are a multi-disciplinary team of statisticians, mathematicians, software developers, general analysts and organizers - all striving for a single goal: re-electing President Obama. We are looking for staff at all levels to join our department from now through Election Day 2012 at our Chicago, IL headquarters.

Statistical Modeling Analysts are charged with predicting electoral outcomes using statistical models. These models will be instrumental in helping the campaign determine how to most effectively use its resources. </blockquote>

I wonder if there’s a bonus for predicting the correct outcome, win or lose?

The Obama campaign didn’t invent the idea of heavy data analysis in campaigns, but they seem to be heavy adopters. There are 3 openings in the “Analytics” category as of today.

Now, can someone tell me why they don’t just call it simply “Statistics”?

Kindle Fire and Machine Learning

Amazon released it’s new iPad competitor, the Kindle Fire, today. A quick read through the description shows it has some interesting features, including a custom-built web browser called Silk. One innovation that they claim is that the browser works in conjunction with Amazon’s EC2 cloud computing platform to speed up the web-surfing experience by doing some computing on your end and some on their end. Seems cool, if it really does make things faster.

Also there’s this interesting bit:

Machine Learning

Finally, Silk leverages the collaborative filtering techniques and machine learning algorithms Amazon has built over the last 15 years to power features such as “customers who bought this also bought…” As Silk serves up millions of page views every day, it learns more about the individual sites it renders and where users go next. By observing the aggregate traffic patterns on various web sites, it refines its heuristics, allowing for accurate predictions of the next page request. For example, Silk might observe that 85 percent of visitors to a leading news site next click on that site’s top headline. With that knowledge, EC2 and Silk together make intelligent decisions about pre-pushing content to the Kindle Fire. As a result, the next page a Kindle Fire customer is likely to visit will already be available locally in the device cache, enabling instant rendering to the screen.

That seems like a logical thing for Amazon to do. While the idea of pre-fetching pages is not particularly new, I haven’t yet heard of the idea of doing data analysis on web pages to predict which things to pre-fetch. One issue this raises in my mind, is that in order to do this, Amazon needs to combine information across browsers, which means your surfing habits will become part of one large mega-dataset. Is that what we want?

On the one hand, Amazon already does some form of this by keeping track of what you buy. But keeping track of every web page you goto and what links you click on seems like a much wider scope.

Once in a lifetime collapse

Baseball Prospectus uses Monte Carlo simulation to predict which teams will make the postseason. According to this page, on Sept 1st, the probability of the Red Sox making the playoffs was 99.5%. They were ahead of the Tampa Bay Rays by 9 games. Before last night’s game, in September, the Red Sox had lost 19 of 26 games and were tied with the Rays for the wild card (the last spot for the playoffs). To make this event even more improbable, The Red Sox were up by one in the ninth with two outs and no one on for the last place Orioles. In this situation the team that’s winning, wins more than 95% of the time. The Rays were in exactly the same situation as the Orioles, losing to the first place Yankees (well, their subs). So guess what happened? The Red Sox lost, the Rays won. But perhaps the most amazing event is that these two games, both lasting much more than usual (one due to rain the other to extra innings) ended within seconds of each other. 

Update: Nate Silver beat me to it. And has much more!

Obama recruiting analysts who know R

Obama recruiting analysts who know R