Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Survival analysis for hard drives

How long do hard drives last?

Backblaze has kept up to 25,000 hard drives constantly online for the last four years. Every time a drive fails, they note it down, then slot in a replacement. After four years, Backblaze now has some amazing data and graphs that detail the failure rate of hard drives over the first four years of their life.

I guess it’s easier to do this with hard drives than it is for people.
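Backblaze's setup is a textbook survival-analysis problem. Drives that fail contribute events, and drives still spinning when observation stops are right-censored: we only know they lasted *at least* that long. The standard tool for this is the Kaplan-Meier estimator. Below is a minimal sketch in plain Python, assuming each drive is summarized by an observed time (in years) and a failed/censored flag. The function name `kaplan_meier` and the eight drive lifetimes are hypothetical illustrations, not Backblaze's data or code.

```python
from collections import Counter


def kaplan_meier(times, failed):
    """Kaplan-Meier survival estimate.

    times  -- time each drive was observed (e.g. years in service)
    failed -- True if the drive failed at that time, False if it was still
              running when observation stopped (right-censored)

    Returns a list of (t, S(t)) pairs, one per distinct failure time.
    """
    failures = Counter(t for t, f in zip(times, failed) if f)
    leaving = Counter(times)  # drives that exit the risk set at each t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = failures.get(t, 0)
        if d:
            # Conditional probability of surviving past t, given alive at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Censored drives count as at risk at t, then drop out afterward.
        at_risk -= leaving[t]
    return curve


# Hypothetical lifetimes for eight drives (not Backblaze data).
years = [1, 2, 2, 3, 4, 4, 4, 4]
failed = [True, True, False, True, False, False, False, False]

for t, s in kaplan_meier(years, failed):
    print(f"P(drive survives past year {t}) = {s:.3f}")
```

With these made-up numbers the estimate steps down to 0.875, 0.750, and 0.600 after years 1, 2, and 3. The drive censored at year 2 still counts as at risk at year 2 but not afterward, which is what lets the estimator use partial information instead of throwing those drives away.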

Future of Statistical Sciences Workshop is happening right now #FSSW2013

ASA Executive Director Ron Wasserstein is tweeting like a madman. If you’re not in London, catch up on what’s happening at the hashtag #FSSW2013.

Apple's Touch ID and a worldwide lesson in sensitivity and specificity

I’ve been playing with my new iPhone 5s for the last few weeks, and first let me just say that it’s an awesome phone. Don’t listen to whatever Jeff says. It’s probably worth it just for the camera, but I’ve been particularly interested in the behavior of Apple’s fingerprint sensor (a.k.a. Touch ID). Before the phone came out, there were persistent rumors of a fingerprint sensor from now-defunct AuthenTec, and I wondered how the sensor would work given that it was unlikely to be perfect.

Apple reportedly sold 9 million iPhone 5c and 5s models over the opening weekend alone. Of those, about 7 million were estimated to be the 5s model, which includes the fingerprint sensor (the 5c does not). So millions of people have now been using this thing, and I’m getting the sense that many of them are experiencing the same behavior I’ve observed over the last few weeks.

  • The sensor appears to have a high specificity. If you put the wrong finger, or the wrong person’s finger, on the sensor, it will not let you unlock the phone. I haven’t seen a single instance of a false positive here, which seems like a good thing.
  • The sensor’s sensitivity is modest. Given the correct finger, the sensor seems to have a sensitivity of between 50% and 80%, based on my completely unscientific guesstimation. It seems to depend a little on the finger. I don’t know whether this is high or low compared with other fingerprint sensors, but it’s mildly annoying to have to switch fingers or type in the passcode more often than I expected.
  • Behavior seems to change depending on the task. This is pure speculation, but it seems the sensor is a bit more open to false positives if you’re using it to buy a song on iTunes. Although I haven’t actually seen it happen, it feels like I don’t have to place my finger on the sensor so perfectly if I’m just purchasing a song or an app.
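To pin down the two terms used above: sensitivity is the fraction of genuine attempts (correct finger) that unlock the phone, and specificity is the fraction of impostor attempts (wrong finger or wrong person) that are rejected. A small sketch with hypothetical tallies, chosen only to sit inside the rough 50% to 80% range guessed above; they are not measurements.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of correct-finger attempts that unlock the phone."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of wrong-finger attempts that are rejected."""
    return true_neg / (true_neg + false_pos)


# Hypothetical tallies, not actual Touch ID measurements.
genuine_unlocked, genuine_rejected = 65, 35
impostor_rejected, impostor_unlocked = 50, 0

print(sensitivity(genuine_unlocked, genuine_rejected))    # 0.65
print(specificity(impostor_rejected, impostor_unlocked))  # 1.0
```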

If my experiences in any way reflect reality, this all makes sense. Apple had to choose cutoffs that trade false positives against false negatives, and I think they erred on the side of security. Having a high specificity is critical because it prevents a bad guy from accessing the phone. A low sensitivity is annoying, but not critical, because the correct user can always type in a passcode. Modifying the behavior based on the task makes sense too: you can’t buy songs or apps without first unlocking the phone, so a slightly looser cutoff at that point carries less risk.
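The cutoff trade-off can be illustrated with a toy matcher that assigns each attempt a similarity score and unlocks when the score is at or above a cutoff. This is a sketch under loudly stated assumptions: Apple has not published how Touch ID scores fingerprints, and the scores below are invented to show the mechanics, not to model the real sensor.

```python
def rates(genuine_scores, impostor_scores, cutoff):
    """Sensitivity and specificity when a score >= cutoff unlocks."""
    sens = sum(s >= cutoff for s in genuine_scores) / len(genuine_scores)
    spec = sum(s < cutoff for s in impostor_scores) / len(impostor_scores)
    return sens, spec


# Hypothetical match scores on a 0-1 scale (not Apple's actual scoring).
genuine = [0.55, 0.62, 0.70, 0.78, 0.83, 0.88, 0.91, 0.95, 0.97, 0.99]
impostor = [0.05, 0.10, 0.18, 0.22, 0.30, 0.35, 0.41, 0.48, 0.58, 0.66]

for cutoff in (0.5, 0.7, 0.9):
    sens, spec = rates(genuine, impostor, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.1f}, specificity={spec:.1f}")
# cutoff=0.5: sensitivity=1.0, specificity=0.8
# cutoff=0.7: sensitivity=0.8, specificity=1.0
# cutoff=0.9: sensitivity=0.4, specificity=1.0
```

Raising the cutoff buys specificity at the cost of sensitivity, which matches the "err on the side of security" choice. The speculated task-dependent behavior would correspond to calling `rates` with a lower cutoff for purchases on an already-unlocked phone, accepting a few more false positives where the downside is smaller.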

Overall, I think Apple did a good job with the fingerprint sensor, especially for version 1.0. I’m guessing they’re making improvements in the technology/software as we speak and will want to improve the sensitivity before they start using it for more tasks or applications.

Out with Big Data, in with Hyperdata

Big data is so last year.

Collecting data from all sorts of odd places and analyzing it much faster than was possible even a couple of years ago has become one of the hottest areas of the technology industry. The idea is simple: With all that processing power and a little creativity, researchers should be able to find novel patterns and relationships among different kinds of information.

For the last few years, insiders have been calling this sort of analysis Big Data. Now Big Data is evolving, becoming more “hyper” and including all sorts of sources. Start-ups like Premise and ClearStory Data, as well as larger companies like General Electric, are getting into the act.

...

“Hyperdata comes to you on the spot, and you can analyze it and act on it on the spot,” said Bernt Wahl, an industry fellow at the Center for Entrepreneurship and Technology at the University of California, Berkeley. “It will be in regular business soon, with everyone predicting and acting the way Amazon instantaneously changes its prices around.”

How to Host a Conference on Google Hangouts on Air

We recently hosted the first ever Simply Statistics Unconference on the Future of Statistics. In preparing for it, we learned a lot about how to organize such an event, and frankly, we wished there had been a bit more organized documentation on how to do this. The various Google web sites were full of nice videos demonstrating how cool the technology is, but not much in the way of specific instructions on how to get it done.

I posted on GitHub my step-by-step list of instructions for how to set up and run a conference on Google Hangouts on Air in the hope that someone would find it useful. I’m also happy to accept corrections if something there is not right.