Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Not So Standard Deviations Episode 18 - Divide by n-1, or n-2, or Whatever

Hilary and I talk about statistical software in fMRI analyses, the differences between software testing differences in proportions (a must listen!), and a preview of JSM 2016.

Also, Hilary and I have just published a new book, Conversations on Data Science, which collects some of our episodes in an easy-to-read format. The books is available from Leanpub and will be updated as we record more episodes.

If you have questions you’d like us to answer, you can send them to nssdeviations @ gmail.com or tweet us at @NSSDeviations.

Subscribe to the podcast on iTunes.

Subscribe to the podcast on Google Play.

Please leave us a review on iTunes!

Support us through our Patreon page.

Show Notes:

Download the audio for this episode.

Listen here:

Tuesday update

It Might All Be Wrong

Tom Nichols and colleagues have published a paper on the software used to analyze fMRI data:

Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data. Here, we used resting-state fMRI data from 499 healthy controls to conduct 3 million task group analyses. Using this null data with different experimental designs, we estimate the incidence of significant results. In theory, we should find 5% false positives (for a significance threshold of 5%), but instead we found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.

Criminal Justice Forecasts

The ongoing discussion over the use of prediction algorithms in the criminal justice system reminds me a bit of the introduction of DNA evidence decades ago. Ultimately, there is a technology that few people truly understand and there are questions as to whether the information they provide is fair or accurate.

Shameless Promotion

I have a new book coming out with Hilary Parker, based on our Not So Standard Deviations podcast. Sign up to be notified of its release (which should be Real Soon Now).

Not So Standard Deviations Episode 18 - Back on Planet Earth

With Hilary fresh from Use R! 2016, Hilary and I discuss some of the highlights from the conference. Also, some followup about a previous Free Advertising and the NSSD drinking game.

If you have questions you’d like us to answer, you can send them to nssdeviations @ gmail.com or tweet us at @NSSDeviations.

Subscribe to the podcast on iTunes.

Subscribe to the podcast on Google Play.

Please leave us a review on iTunes!

Support us through our Patreon page.

Show notes:

Download the audio for this episode.

Tuesday Update

If you weren’t sick of Theranos yet….

Looks like there will be a movie version of the Theranos saga which, as far as I can tell, isn’t over yet, but no matter. It will be done by Adam McKay, the writer-director of The Big Short (excellent film), and will star Jennifer Lawrence as Elizabeth Holmes. From Vanity Fair:

Legendary Pictures snapped up rights to the hot-button biopic for a reported $3 million Thursday evening, after outbidding and outlasting a swarm of competition from Warner Bros., Twentieth Century Fox, STX Entertainment, Regency Enterprises, Cross Creek, Amazon Studios, AG Capital, the Weinstein Company, and, in the penultimate stretch, Paramount, among other studio suitors.

Based on a book proposal by two-time Pulitzer Prize-winning journalist John Carreyrou titled Bad Blood: Secrets and Lies in Silicon Valley, the project (reported to be in the $40 million to $50 million budget range) has made the rounds to almost every studio in town. It’s been personally pitched by McKay, who won an Oscar for best adapted screenplay for last year’s rollicking financial meltdown procedural The Big Short.

Frankly, I think we all know how this movie will end.

The People vs. OJ Simpson vs….Statistics

I’m in the middle of watching The People vs. OJ Simpson and so far it is fantastic—I highly recommend it. One thing that is not represented in the show is the important role that statistics played in the trial. The trial was just in the early days of using DNA as evidence in criminal trials and there were many questions about how likely it was to find DNA matches in blood.

Terry Speed ended up testifying for the defense (Simpson) and in this nice interview, he explains how that came to be:

At the beginning of the Simpson trial, there was going to be a pre-trial hearing and experts from both sides would argue in front of the judge as to what approaches should be accepted. Other pre-trial activities dragged on, and the one on DNA forensics was eventually scrapped. The DNA experts, including me were then asked whether they wanted to give evidence for the prosecution or defence, or leave. I did not initially plan to join the defence team, but wished to express my point of view in what was more or less a scientific environment before the trial started, but when the pre-trial DNA hearing was scrapped, I decided that I had no choice but to express my views in court on behalf of the defence, which I did.

The full interview is well worth the read.

AI is the residual

I just recently found out about the AI effect which I thought was interesting. Basically, “AI” is whatever can’t be explained, or in other words, the residuals of machine learning.

A Year at Stack Overflow

David Robinson (@drob) has a great post on his blog about his first year as a data scientist at Stack Overflow. This section in particular stood out for me:

I like using R to learn interesting things about our data, but my longer term goal is to make it easy for any of our engineers to do so….Towards this goal, I’ve been focusing on building reliable tools and frameworks that people can apply to a variety of problems, rather than “one-off” analysis scripts. (There’s an awesome post by Jeff Magnusson at StitchFix about some of these general challenges). My approach has been building internal R packages, similar to AirBnb’s strategy (though our data team is quite a bit younger and smaller than theirs). These internal packages can query databases and parsing our internal APIs, including making various security and infrastructure issues invisible to the user.

The world needs an army of David Robinsons.