Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Not So Standard Deviations Episode 24 - 50 Minutes of Blathering

Another IRL episode! Hilary and I met at a Jimmy John’s to talk data science, like you do. Topics covered include RStudio Conf, polling, millennials, Karl Broman, and more!

If you have questions you’d like us to answer, you can send them to nssdeviations @ gmail.com or tweet us at @NSSDeviations.

Subscribe to the podcast on iTunes or Google Play. And please leave us a review on iTunes.

Support us through our Patreon page.

Get the Not So Standard Deviations book.

Show notes:

Download the audio for this episode


Should I make a chatbot or a better FAQ?

Roger pointed me to this interesting article (paywalled, sorry!) about Facebook’s chatbot service. The article made a couple of interesting points. The first was the explicit acknowledgement of the process I outlined in a previous post for building an AI startup: (1) convince (or in this case pay) some humans to be your training set, and (2) collect data on those humans and then use it to build your AI.

The other fascinating point is that they figured out how many data points they would need before they could reasonably replace a human with an AI chatbot. The original estimate was tens of thousands; the ultimate number was millions or more. I have been thinking a lot about how the AI “revolution” is really just a tradeoff between parameters and data points. A billion-parameter prediction algorithm may work amazingly well, as long as you have a few hundred billion data points to train it with.

But the theme of the article was that chatbots have had some missteps and may not be ready for prime time. I think the main reason is that, at the moment, most AI efforts can only report facts; they can’t intuit a user’s intention, reframe the question for them, or go beyond the facts and the current state of the world.

One example I’ve run into recently was booking a ticket on an airline. I wanted to know if I could make a certain change to my ticket, but the airline didn’t have any information online about the change I wanted to make. After checking thoroughly I clicked the “Chat with an agent” button and was directed to what was clearly a chatbot. The chatbot asked a question or two and then sent me to the “make changes to a ticket” page of the website.

I eventually had to call and get a person on the phone, because what I wanted to ask about wasn’t covered by the publicly available information. They set me straight and I booked the ticket. The chatbot wasn’t helpful because it could only respond with information already available on the website. It couldn’t recognize a new situation, realize it had to ask around, figure out that this was an edge case, and then make a ruling to help me out.

I would guess that most of the time when a person interacts with a chatbot, they do so only because they have already looked through all the publicly available information in the FAQ, etc. and couldn’t find what they needed. So an alternative solution, which would require a lot less work and a much smaller training set, is to just have a more complete FAQ.

The question to me is whether anyone other than Facebook or Google has a training set big enough to make a chatbot worth it.

The Dangers of Weighting Up a Sample

There’s a great story by Nate Cohn over at the New York Times’ Upshot about the dangers of “weighting up” a sample from a survey. In this case, it concerns a U.S.C./LA Times poll asking people who they plan to vote for in the presidential election:

The U.S.C./LAT poll weights for many tiny categories: like 18-to-21-year-old men, which U.S.C./LAT estimates make up around 3.3 percent of the adult citizen population. Weighting simply for 18-to-21-year-olds would be pretty bold for a political survey; 18-to-21-year-old men is really unusual.

The U.S.C./LA Times poll apparently goes even further:

When you start considering the competing demands across multiple categories, it can quickly become necessary to give an astonishing amount of extra weight to particularly underrepresented voters — like 18-to-21-year-old black men. This wouldn’t be a problem with broader categories, like those 18 to 29, and there aren’t very many national polls that are weighting respondents up by more than eight or 10-fold. The extreme weights for the 19-year-old black Trump voter in Illinois are not normal.

It’s worth noting (as a good thing) that the U.S.C./LA Times poll data is completely open, thus allowing the NYT to reproduce this entire analysis.

I haven’t done much in the way of survey analyses, but I’ve done some inverse probability weighting and in my experience it can be a tricky procedure in ways that are not always immediately obvious. The article discusses weight trimming, but also notes the dangers of that procedure. Overall, a good treatment of a complex issue.
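To make the weighting-up problem concrete, here is a minimal sketch in R. This is not the poll’s actual methodology, and the numbers are made up: a post-stratification weight is roughly a cell’s share of the population divided by its share of the sample, so a tiny, underrepresented cell can end up counting many times over, and trimming caps those weights at the cost of some bias.

```r
# Hypothetical numbers, just to illustrate how extreme weights arise
pop_share <- 0.033       # cell's share of the adult citizen population
n_sample  <- 1000        # total respondents
n_cell    <- 2           # respondents who actually fall in that cell

# Post-stratification weight: population share / sample share
w <- pop_share / (n_cell / n_sample)
w
#> [1] 16.5               # each of those 2 respondents counts ~16.5 times

# Weight trimming caps extreme weights (here at 8x), trading a bit of
# bias for a large reduction in variance
w_trimmed <- pmin(w, 8)
w_trimmed
#> [1] 8
```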

Information and VC Investing

Sam Lessin at The Information has a nice post (sorry, paywall, but it’s a great publication) about how increased measurement and analysis is changing the nature of venture capital investing.

This brings me back to what is happening at series A financings. Investors have always, obviously, tried to do diligence at all financing rounds. But series A investments used to be an exercise in a few top-level metrics a company might know, some industry interviews and analysis, and a whole lot of trust. The data that would drive capital market efficiency usually just wasn’t there, so capital was expensive and there were opportunities for financiers. Now, I am seeing more and more that after a seed round to boot up most companies, the backbone of a series A financing is an intense level of detail in reporting and analytics. It can be that way because the companies have the data.

I’ve seen this happen in other areas where data comes in to disrupt the way things are done. Good analysis only gives you an advantage if no one else is doing it. Once everyone accepts the idea and everyone has the data (and a good analytics team), there’s no more value left in the market.

Time to search elsewhere.

papr - it's like tinder, but for academic preprints

As part of the Johns Hopkins Data Science Lab we are setting up a web and mobile data product prototyping shop, and as part of that process I’ve been working on different types of very cheap and easy-to-prototype apps. A few days ago I posted about creating a distributed data collection app with Google Sheets.

So for fun I built another kind of app. This one I’m calling papr, and it’s sort of like “Tinder for preprints”. I scraped all of the papers out of the http://biorxiv.org/ database. When you open the app you see a randomly selected paper and you can rate it along two axes:

  • Is the paper interesting? - a paper can be rated as exciting or boring. We leave the definitions of those terms up to you.
  • Is the paper correct or questionable? - a paper can either be solidly correct or potentially questionable in its results. We leave the definitions of those terms up to you.

When you click on your rating you are shown another randomly selected paper from bioRxiv. You can “level up” as you rate more papers. You can also download your ratings at any time.
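In case it helps to see the shape of this kind of app, here is a minimal Shiny sketch of the idea. This is not papr’s actual code; the paper titles and column names are hypothetical placeholders. It shows a randomly selected paper, records a rating on the two axes, moves to the next random paper, and lets the user download their ratings.

```r
library(shiny)

# Hypothetical stand-in for the scraped bioRxiv data
papers <- data.frame(
  title = c("Paper A", "Paper B", "Paper C"),
  stringsAsFactors = FALSE
)

ui <- fluidPage(
  titlePanel("papr (sketch)"),
  h4(textOutput("title")),
  # The two rating axes from the post, crossed into four buttons
  actionButton("exciting_correct",      "Exciting / Correct"),
  actionButton("exciting_questionable", "Exciting / Questionable"),
  actionButton("boring_correct",        "Boring / Correct"),
  actionButton("boring_questionable",   "Boring / Questionable"),
  br(), br(),
  downloadButton("download", "Download my ratings")
)

server <- function(input, output, session) {
  current <- reactiveVal(sample(nrow(papers), 1))  # index of the paper shown
  ratings <- reactiveVal(data.frame())             # accumulated ratings

  rate <- function(interest, correctness) {
    ratings(rbind(
      ratings(),
      data.frame(title = papers$title[current()],
                 interest = interest,
                 correctness = correctness,
                 stringsAsFactors = FALSE)
    ))
    current(sample(nrow(papers), 1))               # show another random paper
  }

  observeEvent(input$exciting_correct,      rate("exciting", "correct"))
  observeEvent(input$exciting_questionable, rate("exciting", "questionable"))
  observeEvent(input$boring_correct,        rate("boring",   "correct"))
  observeEvent(input$boring_questionable,   rate("boring",   "questionable"))

  output$title <- renderText(papers$title[current()])

  output$download <- downloadHandler(
    filename = function() "my_ratings.csv",
    content  = function(file) write.csv(ratings(), file, row.names = FALSE)
  )
}

shinyApp(ui, server)
```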

If you have any feedback on the app I’d love to hear it, and if anyone knows how to get custom domain names to work with shinyapps.io I’d also love to hear from you. I tried the instructions with no luck…

Try the app here:

https://jhubiostatistics.shinyapps.io/papr/