06 Mar 2013
Simulation is commonly used by statisticians/data analysts to: (1) estimate variability/improve predictors, (2) evaluate the space of potential outcomes, and (3) evaluate the properties of new algorithms or procedures. Over the last couple of days, discussions of simulation have popped up in a couple of different places.
First, the reviewers of a paper my student is working on asked a question about the behavior of the method under different conditions. I mentioned in passing that I thought it was a good idea to simulate some cases where our method will definitely break down.
I also saw this post by John Cook about simple/complex models. He raises the really important point that increasingly complex models built on a canonical, small data set can fool you. You can make the model more and more complicated - but in other data sets the assumptions might not hold and the model won't generalize. Of course, simple models can have the same problems, but generally simple models will fail on small data sets in the same way they would fail on larger data sets (in my experience) - either they work or they don't.
These two ideas got me thinking about why I like simulation. Some statisticians, particularly applied statisticians, aren't fond of simulation for evaluating methods. I think the reason is that you can always simulate a situation that meets all of your assumptions and make your approach look good. Real data rarely conform to model assumptions and so are harder to "trick". On the other hand, I really like simulation: it can reveal a lot about how and when a method will work well, and it allows you to explore scenarios - particularly for new or difficult-to-obtain data.
Here are the simulations I like to see:
- Simulation where the assumptions are true There are a surprising number of proposed methods/analysis procedures/analyses that fail or perform poorly even when the model assumptions hold. This could be because the methods overfit, have a bug, are computationally unstable, are in the wrong place on the bias/variance tradeoff curve, etc. etc. etc. I always do at least one simulation for every method where the answer should be easy to get, because I know if I don't get the right answer, it is back to the drawing board.
- Simulation where things should definitely fail I like to try out a few realistic scenarios where I’m pretty sure my model assumptions won’t hold and the method should fail. This kind of simulation is good for two reasons: (1) sometimes I’m pleasantly surprised and the model will hold up and (2) (the more common scenario) I can find out where the model assumption boundaries are so that I can give concrete guidance to users about when/where the method will work and when/where it will fail.
The first type of simulation is easy to come up with - generally you can just simulate from the model. The second type is much harder. You have to creatively think about reasonable ways that your model can fail. I’ve found that using real data for simulations can be the best way to start coming up with ideas to try - but I usually find that it is worth building on those ideas to imagine even more extreme circumstances. Playing the evil demon for my own methods often leads me to new ideas/improvements I hadn’t thought of before. It also helps me to evaluate the work of other people - since I’ve tried to explore the contexts where methods likely fail.
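To make this concrete, here is a minimal sketch (in Python, not from the original discussion) of both kinds of simulation on a toy problem: estimating a slope by ordinary least squares, first with independent Gaussian errors (the assumptions hold) and then with heavy-tailed Cauchy errors (they clearly don't). All function names and parameter values are illustrative.

```python
# Minimal sketch of the two kinds of simulation: one where the model
# assumptions hold and one where they should definitely fail.
# Everything here (names, sample sizes, error distributions) is illustrative.

import numpy as np

rng = np.random.default_rng(42)

def simulate_slope_estimates(noise, n=100, true_slope=2.0, n_sim=1000):
    """Repeatedly simulate y = true_slope * x + noise and return OLS slope estimates."""
    estimates = np.empty(n_sim)
    for i in range(n_sim):
        x = rng.uniform(0, 1, size=n)
        y = true_slope * x + noise(n)
        # OLS slope for a no-intercept model: sum(x*y) / sum(x^2)
        estimates[i] = np.sum(x * y) / np.sum(x * x)
    return estimates

# (1) Assumptions hold: independent Gaussian errors.
gaussian = simulate_slope_estimates(lambda n: rng.normal(0, 1, size=n))

# (2) Assumptions should fail: very heavy-tailed (Cauchy) errors.
heavy_tailed = simulate_slope_estimates(lambda n: rng.standard_cauchy(size=n))

for label, est in [("Gaussian errors", gaussian), ("Cauchy errors", heavy_tailed)]:
    print(f"{label}: mean estimate = {est.mean():.2f}, SD = {est.std():.2f}")
```

Under Gaussian errors the estimates should concentrate tightly around the true slope of 2; under Cauchy errors the occasional wild estimate is exactly the kind of breakdown the second type of simulation is meant to reveal.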
In any case, if you haven't simulated the extremes, I don't think you really know how your methods/analysis procedures are working.
04 Mar 2013
There's a nice article by Nick Bilton in the New York Times Bits blog about the need for context when looking at Big Data. Actually, the article starts off by describing how Google's Flu Trends model overestimated the number of people infected with flu in the U.S. this season, but then veers off into a more general discussion about Big Data.
My favorite quote comes from Mark Hansen:
“Data inherently has all of the foibles of being human,” said Mark Hansen, director of the David and Helen Gurley Brown Institute for Media Innovation at Columbia University. “Data is not a magic force in society; it’s an extension of us.”
Bilton also talks about a course he taught where students built sensors to install in elevators and stairwells at NYU to see how often they were used. The idea was to explore how often and when the NYU students used the stairs versus the elevator.
As I left campus that evening, one of the N.Y.U. security guards who had seen students setting up the computers in the elevators asked how our experiment had gone. I explained that we had found that students seemed to use the elevators in the morning, perhaps because they were tired from staying up late, and switch to the stairs at night, when they became energized.
“Oh, no, they don’t,” the security guard told me, laughing as he assured me that lazy college students used the elevators whenever possible. “One of the elevators broke down a few evenings last week, so they had no choice but to use the stairs.”
I can see at least three problems here, not necessarily mutually exclusive:
- Big Data are often "Wrong" Data. The students used the sensors to measure something, but the sensors didn't give them everything they needed. Part of this is that the sensors were cheap, and budget was likely a big constraint here. But Big Data are often big because they are cheap. Of course, the sensors still couldn't tell them that the elevator was broken.
- A failure of interrogation. With all the data the students collected with their multitude of sensors, they were unable to answer the question “What else could explain what I’m observing?”
- A strong desire to tell a story. Upon first look, the data seemed to "make sense," or at least to match a preconceived notion of what they should look like. This is related to #2 above: you have to challenge what you see. It's very easy and tempting to let the data tell an interesting story rather than the right story.
I don’t mean to be unduly critical of some students in a class who were just trying to collect some data. I think there should be more of that going on. But my point is that it’s not as easy as it looks. Even trying to answer a seemingly innocuous question of how students use elevators and stairs requires some forethought, study design, and careful analysis.
03 Mar 2013
- A really nice example where epidemiological studies are later confirmed by a randomized trial. From a statistician’s point of view, this is the idealized way that science would work. First, data that are relatively cheap (observational/retrospective studies) are used to identify potential associations of interest. After a number of these studies show a similar effect, a randomized study is performed to confirm what we suspected from the cheaper studies.
- Joe Blitzstein talking about the "Soul of Statistics" (we interviewed Joe a while ago). Teaching statistics is critical for modern citizenship. It is not just about learning which formula to plug a number into - it is about critical thinking with data. Joe's talk nails this issue.
- Significance magazine has a writing contest. If you are a grad student in statistics/biostatistics this is an awesome way to (a) practice explaining your discipline to people who are not experts - a hugely important skill and (b) get your name out there, which will help when it comes time to look for jobs/apply for awards, etc.
- A great post from David Spiegelhalter about the UK court’s interpretation of probability. It reminds me of the Supreme Court’s recent decision that also hinged on a statistical interpretation. This post brings up two issues I think are worth a more in-depth discussion. One is that it is pretty clear that many court decisions are going to hinge on statistical arguments. This suggests (among other things) that statistical training should be mandatory in legal education. The second issue is a minor disagreement I have with Spiegelhalter’s characterization that only Bayesians use epistemic uncertainty. I frequently discuss this type of uncertainty in my classes although I take a primarily frequentist/classical approach to teaching these courses.
- Thomas Lumley is giving an online course in complex surveys.
- On the protective value of an umbrella when encountering a lion. Seems like a nice way to wrap up a post that started with the power of epidemiology and clinical trials. (via Karl B.)
27 Feb 2013
Editor's note: With the sequestration deadline hours away, the careers of many young US scientists are on the line. In this guest post, our colleague Steven Salzberg, an avid defender of NIH and its peer review process, tells us why now more than ever the NIH should prioritize funding R01s over other project grants.
First let's get the obvious facts out of the way: the federal budget is a mess, and Congress is completely dysfunctional. When it comes to NIH funding, this is not a good thing.
Hidden within the larger picture, though, is a serious menace to our decades-long record of incredibly successful research in the United States. The investigator-driven, basic research grant is in even worse shape than the overall NIH budget. A recent analysis by FASEB, shown in the figure here, reveals that the number of new R01s reached its peak in 2003 - ten years ago! - and has been steadily declining since. In 2003, 7,430 new R01s were awarded. In 2012, that number had dropped to 5,437, a 27% decline.

For those who might not be familiar with the NIH system, the R01 grant is the crown jewel of research grants. R01s are awarded to individual scientists to pursue all varieties of biomedical research, from very basic science to clinical research. For R01s, NIH doesn’t tell the scientists what to do: we propose the ideas, we write them up, and then NIH organizes a rigorous peer review (which isn’t perfect, but it’s the best system anyone has). Only the top-scoring proposals get funded.
This process has gotten much tougher over the years. In 1995, the success rate for R01s was 25.9%. Today it is 18.4% and falling. This includes applications from everyone, even the most experienced and proven scientists. Thus no matter who you are, you can expect that there is more than an 80% chance that your grant application will be turned down. In some areas it is even worse: NIAID’s website announced that it is currently funding only 6% of R01s.
Why are R01s declining? Not for lack of interest: the number of applications last year was 29,627, an all-time high. Besides the overall budget problem, another problem is growing: the fondness of the NIH administration for big, top-down science projects, often with the letters "ome" or "omics" attached.
Yes, the human genome was a huge success. Maybe the human microbiome will be too. But now NIH is pushing gigantic, top-down projects: ENCODE, 1000 Genomes, the cancer genome anatomy project (CGAP), the cancer genome atlas (TCGA), a new "brain-ome" project, and more. The more money is allocated to these big projects, the fewer R01s NIH can fund. For example, NIAID, with its 6% R01 success rate, has been spending tens of millions of dollars per year on 3 large Microbial Genome Sequencing Center contracts and tens of millions more on 5 large Bioinformatics Resource Center contracts. As far as I can tell, no one uses these bioinformatics resource centers for anything - in fact, virtually no one outside the centers even knows they exist. Furthermore, these large, top-down driven sequencing projects don't address specific scientific hypotheses, but they produce something that the NIH administration seems to love: numbers. It's impressive to see how many genomes they've sequenced, and it makes for nice press releases. But very often we simply don't need these huge, top-down projects to answer scientific questions. Genome sequencing is cheap enough that we can include it in an R01 grant, if only NIH will stop pouring all its sequencing money into these huge, monolithic projects.
I'll be the first person to cheer if Congress gets its act together and funds NIH at a level that allows reasonable growth. But whether or not that happens, the growth of big science projects, often created and run by administrators at NIH rather than scientists who have successfully competed for R01s, represents a major threat to the scientist-driven research that has served the world so well for the past 50 years. Many scientists are afraid to speak out against this trend, because by doing so we (yes, this includes me) are criticizing those same NIH administrators who manage our R01s. But someone has to say something. A 27% decline in the number of R01s over the past decade is not a good thing. Maybe it's time to stop the omics train.
25 Feb 2013
Netflix is using data to create original content for its subscribers, the first example of which was House of Cards. Three main data points for this show were that (1) People like David Fincher (because they watch The Social Network, like, all the time); (2) People like Kevin Spacey; and (3) People liked the British version of House of Cards. Netflix obviously has tons of other data, including when you stop, pause, rewind certain scenes in a movie or TV show.
Netflix has always used data to decide which shows to license, and now that expertise is extended to the first-run. And there was not one trailer for “House of Cards,” there were many. Fans of Mr. Spacey saw trailers featuring him, women watching “Thelma and Louise” saw trailers featuring the show’s female characters and serious film buffs saw trailers that reflected Mr. Fincher’s touch.
Using data to program television content is about as new as Brylcreem, but Netflix has the Big Data and has direct interaction with its viewers (so does Amazon Prime, which apparently is also looking to create original content). So the question is, does it work? My personal opinion is that it's probably not any worse than previous methods, but may not be a lot better. But I would be delighted to be proven wrong. From my walks around the hallway here it seems House of Cards is in fact a good show (I haven't seen it). But one observation probably isn't enough to draw a conclusion here.
John Landgraf of FX Networks thinks Big Data won’t help:
“Data can only tell you what people have liked before, not what they don’t know they are going to like in the future,” he said. “A good high-end programmer’s job is to find the white spaces in our collective psyche that aren’t filled by an existing television show,” adding, those choices were made “in a black box that data can never penetrate.”
I was a bit confused when I read this, but I'm pretty sure the word "programmer" here refers to a television programmer. This quote is reminiscent of Steve Jobs' line about how it's not the consumer's job to know what he/she wants. It also reminds me of financial markets, where all the data in the world can only tell you about the past.
In the end, can any of it help you predict the future? Or do some people just get lucky?