Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Missing not at random data makes some Facebook users feel sad

This article, published last week, explained how “some younger users of Facebook say that using the site often leaves them feeling sad, lonely and inadequate”. Being a statistician gives you an advantage here because we know that naive estimates from missing not at random (MNAR) data can be very biased. The posts you see on Facebook are not a random sample from your friends’ lives. We see pictures of their vacations, abnormally flattering pictures of themselves, reports on their major achievements, etc., but no view of the mundane occurrences of daily life. Here is a simple cartoon explanation of how MNAR data can give you a biased view of what’s really going on. Suppose your life occurrences are rated from 1 (worst) to 5 (best). This table compares what you see to what is really going on after 15 occurrences:

[Table: what you see on Facebook vs. what really happened, for 15 occurrences rated 1–5]
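If you want to see the mechanism without the table, here is a minimal R sketch of the same cartoon. The cutoff (only occurrences rated 4 or 5 get posted) is my assumption for illustration, not something from the article.

```r
# Cartoon MNAR example: only the good stuff gets posted.
# Assumption for illustration: occurrences rated 4 or 5 show up on Facebook.
set.seed(2014)

occurrences <- sample(1:5, size = 15, replace = TRUE)  # what really happened
posted      <- occurrences[occurrences >= 4]           # what you see in the feed

mean(occurrences)  # your friend's actual average
mean(posted)       # the biased average you observe
```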

edge.org asks famous scientists what scientific concept to throw out & they say statistics

I don’t think I’ve ever been forwarded one link on the web more than I have been forwarded the edge.org post on “What scientific idea is ready for retirement?”. Here are a few of the comments with my responses. I’m going to keep them brief because I think the edge.org crowd pushes people to say outrageous things, so it isn’t even clear they mean what they say.

I think the whole conceit of the question is a little silly. If you are going to retire a major scientific idea you had better have a replacement, or at least a guess at what we could do next. The prompt totally ignores the key question: “Suppose we actually did what you suggested, what would we do instead?”

On getting rid of big clinical trials

It is a commonly held but erroneous belief that a larger study is always more rigorous or definitive than a smaller one, and a randomized controlled trial is always the gold standard. However, there is a growing awareness that size does not always matter and a randomized controlled trial may introduce its own biases. We need more creative experimental designs.

My response: Yes, clinical trials work. Yes, bigger trials and randomized trials are more definitive. There is currently no good alternative for generating causal statements that doesn’t require quite severe assumptions. The “creative experimental designs” idea has serious potential to be abused by folks who say things like “Well my friend Susie totally said that diet worked for her…”. The author says we should throw out RCTs, with all the benefits they have provided, because it is hard to get women to adhere to a pretty serious behavioral intervention over an 8-year period. If anything, this should make us reconsider what counts as a reasonable intervention, not the randomized trial part.

On bailing on statistical independence assumptions

It is time for science to retire the fiction of statistical independence… So the overwhelming common practice is simply to assume that sampled events are independent. An easy justification for this is that almost everyone else does it and it’s in the textbooks. This assumption has to be one of the most widespread instances of groupthink in all of science.

My response: There are a huge number of statistical methods for dealing with non-independent data. Statisticians have been working on this for decades with blocking, stratification, random effects, deep learning, multilevel models, GEE, GARCH models, etc., etc., etc. It’s a fact that statistical independence is a fiction, but sometimes it is a useful one.
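As a toy illustration of one of these methods (none of this code is from the post), here is a sketch using the lme4 package: simulated clustered data fit with a naive model that assumes independence and with a random-intercept model that does not.

```r
# Toy illustration: clustered (non-independent) data.
# Compare a naive independence model with a random-intercept model (lme4).
library(lme4)

set.seed(1)
cluster <- rep(1:20, each = 10)            # 20 clusters of 10 observations each
u       <- rnorm(20, sd = 2)[cluster]      # shared cluster-level noise
x       <- rnorm(200)
y       <- 1 + 0.5 * x + u + rnorm(200)

naive <- lm(y ~ x)                         # pretends observations are independent
mixed <- lmer(y ~ x + (1 | cluster))       # models the within-cluster dependence

summary(naive)
summary(mixed)                             # compare the standard errors
```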

On bailing on the p-value (or any other standardized statistical procedure)

Not for a minute should anyone think that this procedure has much to do with statistics proper… A 2011 paper in Nature Neuroscience presented an analysis of neuroscience articles in Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience showing that although 78 did as they should, 79 used the incorrect procedure.

My response: P-values on their own and P-values en masse are both annoying and not very helpful. But we need a way to tell whether the effect sizes you observed are going to replicate or not. P-values are probably not the best thing for measuring that (maybe you should try to estimate it directly?). But any procedure you scale up to hundreds of thousands of users is going to cause all sorts of problems. If you give people more dimensions along which to call their result “real” or “significant” you aren’t going to reduce false positives. At scale we need fewer researcher degrees of freedom, not more.
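To make the researcher-degrees-of-freedom point concrete, here is a toy R sketch (mine, not from the edge.org piece): the data are pure noise, but the analyst gets to pick among three analyses after seeing the results, and the “significant” rate climbs well above the nominal 5%.

```r
# Toy sketch: researcher degrees of freedom inflate false positives.
# The data are pure noise; the analyst reports whichever analysis looks best.
set.seed(7)

one_experiment <- function(n = 50) {
  x <- rnorm(n)
  y <- rnorm(n)                                          # no real effect anywhere
  p1 <- t.test(y[x > 0], y[x <= 0])$p.value              # split on the sign of x
  p2 <- cor.test(x, y)$p.value                           # test the correlation
  p3 <- t.test(y[1:(n / 2)], y[(n / 2 + 1):n])$p.value   # split the sample in half
  min(p1, p2, p3)                                        # report the "best" result
}

mean(replicate(2000, one_experiment()) < 0.05)           # well above the nominal 5%
```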

On science not being self-correcting

The pace of scientific production has quickened, and self-correction has suffered. Findings that might correct old results are considered less interesting than results from more original research questions. Potential corrections are also more contested. As the competition for space in prestigious journals has become increasingly frenzied, doing and publishing studies that would confirm the rapidly accumulating new discoveries, or would correct them, became a losing proposition… Public registration of the design and analysis plan of a study before it is begun. Clinical trials researchers have done this for decades, and in 2013 researchers in other areas rapidly followed suit. Registration includes the details of the data analyses that will be conducted, which eliminates the former practice of presenting the inevitable fluctuations of multifaceted data as robust results. Reviewers assessing the associated manuscripts end up focusing more on the soundness of the study’s registered design rather than disproportionately favoring the findings. This helps reduce the disadvantage that confirmatory studies usually have relative to fishing expeditions. Indeed, a few journals have begun accepting articles from well-designed studies even before the results come in.

Wait, I thought there was a big rise in retraction rates that has everyone freaking out. Isn’t there a website just dedicated to outing and shaming people who retract stuff? I think registration of study designs for confirmatory research is a great idea. But I wonder what the effect would be of reducing the opportunity for scientific mistakes that turn into big ideas. This person needs to read the ROC curves of science. Any basic research system that doesn’t allow for a lot of failure is never going to discover anything interesting.

Big effects are due to multiple small effects

So, do big effects tend to have big explanations, or many explanations? There is probably no single, simple and uniformly correct answer to this question. (It’s a hopeless tree!) But, we can use a simple model to help make an educated guess.

The author simulates 200 variables, each drawn from a N(0, i) for i = 1, …, 5, and finds that most of the largest values come from the N(0,5) rather than the N(0,1). This says nothing about simple or complex phenomena; it says a lot about how a N(0,5) is more variable than a N(0,1). It does not address the issue of whether hypotheses are correct or not.
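Here is roughly what I mean, as a quick R sketch. The exact simulation from the edge.org piece isn’t reproduced here, so the details below (200 draws per value of i, treating i as the standard deviation) are my guesses:

```r
# Rough sketch of the simulation described above (details are my assumptions):
# draw values from N(0, i) for i = 1,...,5 and see where the largest ones come from.
set.seed(123)

sds   <- rep(1:5, each = 200)
draws <- rnorm(length(sds), mean = 0, sd = sds)

# Which standard deviation produced the 20 largest absolute values?
table(sds[order(abs(draws), decreasing = TRUE)[1:20]])
```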

Bonus round: On abandoning evolution

Intelligent design and other Creationist critiques have been easily shrugged off and the facts of evolution well established in the laboratory, fossil record, DNA record and computer simulations. If evolutionary biologists are really Seekers of the Truth, they need to focus more on finding the mathematical regularities of biology, following in the giant footsteps of Sewall Wright, JBS Haldane, Ronald Fisher and so on.

Among many other things, this person needs a course in statistics. The people he is talking about focused on quantifying uncertainty about biology, not certainty or mathematical regularity.

One I actually agree with: putting an end to the idea that Big Data solves all problems

No, I don’t literally mean that we should stop believing in, or collecting, Big Data. But we should stop pretending that Big Data is magic.

That guy must be reading our blog. The key word in data science is science, after all.

On focusing on the variance rather than the mean

Our focus on averages should be retired. Or, if not retired, we should give averages an extended vacation. During this vacation, we should catch up on another sort of difference between groups that has gotten short shrift: we should focus on comparing the difference in variance (which captures the spread or range of measured values) between groups.

I actually like most of this article, but the format of the edge.org pieces killed it. The author says we should stop caring about the mean, or make it secondary. I completely agree we should consider the variance - the examples he points out are great. But we should also keep the first moment in mind before moving on to the second, so not “retire”, just “add to”.
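As a small illustration of “add to, don’t retire” (simulated data, not from the article), here is an R sketch where two groups have the same mean but different variances; the mean comparison finds nothing while the variance comparison does.

```r
# Two simulated groups with equal means but unequal variances.
set.seed(42)
group1 <- rnorm(100, mean = 10, sd = 1)
group2 <- rnorm(100, mean = 10, sd = 3)

t.test(group1, group2)    # first moment: no detectable difference in means
var.test(group1, group2)  # second moment: clear difference in variances
```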

 

No one asked me, but here is what I’d throw out:

  • Sweeping generalizations without careful theory, experimentation, and good data
  • Oversimplified questions that don’t ask for potential solutions that deal with the complexity of the real world
  • Sensationalism by scientists about science
  • Sensationalism by journalists about science
  • Absolutist claims about uncertain data

Sunday data/statistics link roundup (1/12/2014)

Well it technically is Monday, but I never went to sleep, so that still counts as Sunday, right?

  1. As a person who has taught a couple of MOOCs I’m used to getting some pushback from people who don’t like the whole concept. But I’m still happy that I’m not the only one who thinks they are a pretty good idea and still worth doing. I think that both the hype and the backlash are too much. The hype claimed MOOCs would completely end the university as we know it. The backlash says they will have no impact. I think it is more likely that they will have a major impact on people who traditionally don’t attend college. That’s ok with me. I think this post gets it about right.
  2. The Leekasso is finally dethroned! Korbinian Strimmer used my simulation code and compared it to CAT scores in the sda package coupled with Higher Criticism feature selection. Here is the accuracy plot. Looks like Leekasso is competitive with CAT-Leekasso, but CAT+HC wins. Big win for Github there and thanks to Korbinian for taking the time to do the simulation!
  3. Jack Andraka is getting some pushback from serious scientists on the draft of his paper describing the research he outlined in his TED talk. He is taking the criticism like a pro, which says a lot about the guy. From reading the second hand reviews, it sounds like his project was like most good science projects - it made some interesting progress but needs a lot of grinding before it turns into something real. The hype made it sound too good to be true. I hope that he will just ignore the hype machine from here on in and keep grinding (via Rafa).
  4. I’ve probably posted this before, but here is the illustrated guide to a Ph.D. Lest you think that little bump doesn’t matter, don’t forget to scroll to the bottom and read this.
  5. The bmorebiostat bloggers (http://bmorebiostat.com/), if you aren’t following them, you should be.
  6. Potentially cool website for accessing treasury data.
  7. Ok it’s 5am. I need a githug and then off to bed.

The top 10 predictor takes on the debiased Lasso - still the champ!

After posting on the comparison between the Lasso and the always-top-10 predictor (Leekasso), I got some feedback that the problem could be I wasn’t debiasing the Lasso (thanks Tim T. on Twitter!). The idea behind debiasing (as I understand it) is to use the Lasso to do feature selection and then fit a model without shrinkage to “debias” the coefficients. The debiased model is then used for prediction. Noah Simon, who knows approximately infinitely more about this than I do, kindly provided some code for fitting a debiased Lasso. He is not responsible for any mistakes/silliness in the simulation; he was just nice enough to provide some debiased Lasso code. He mentions a similar idea appears in the relaxo package if you set phi = 0.
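For readers who want the idea in code, here is a minimal sketch of the debiasing step as described above: the Lasso picks the features, then an unpenalized refit on those features provides the coefficients used for prediction. This is not Noah’s code, and the simulated data and choices (glmnet, lambda.min) are just for illustration.

```r
# Minimal sketch of a "debiased" Lasso: select features with the Lasso,
# then refit without shrinkage on the selected features.
# Not Noah's code; simulated data for illustration only.
library(glmnet)

set.seed(1)
n <- 100; p <- 500
x <- matrix(rnorm(n * p), n, p)
prob <- plogis(as.numeric(x[, 1:10] %*% rep(1, 10)))   # 10 truly active features
y <- rbinom(n, 1, prob)

cvfit    <- cv.glmnet(x, y, family = "binomial")
beta     <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]  # drop the intercept
selected <- which(beta != 0)

# Unpenalized refit on the selected columns "debiases" the coefficients;
# this refit model is then used for prediction.
refit <- glm(y ~ x[, selected, drop = FALSE], family = binomial)
summary(refit)
```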

I used the same simulation setup as before and tried out the Leekasso, the Lasso, and the debiased Lasso. Here are the accuracy results (more red = higher accuracy):

[Figure: accuracy results for the Leekasso, the Lasso, and the debiased Lasso (more red = higher accuracy)]

The results suggest the debiased Lasso still doesn’t work well under this design. Keep in mind, as I mentioned in my previous post, that the Lasso may perform better under a different causal model.

Update: Code available here on Github if you want to play around.

Preparing for tenure track job interviews

Editor’s note: This is a slightly modified version of a previous post.

If you are in the job market you will soon be receiving (or have already received) an invitation for an interview. So how should you prepare? You have two goals. The first is to make a good impression. Here are some tips:

1) During your talk, do NOT go over your allotted time. Practice your talk at least twice, both times in front of a live audience that asks questions.

2) Know your audience. If it’s a “math-y” department, give a more “math-y” talk. If it’s an applied department, give a more applied talk. But (sorry for the cliché) be yourself. Don’t pretend to be interested in something you are not, as this almost always backfires.

3) Learn about the faculty’s research interests. This will help during the one-on-one meetings.

4) Be ready to answer the questions “what do you want to teach?” and “where do you see yourself in five years?”

5) I can’t think of any department where it is necessary to wear a suit (correct me if I’m wrong in the comments). In some places you might feel uncomfortable wearing a suit while those interviewing you are in shorts and t-shirts.

Second, and just as important, you want to figure out if you like the department you are visiting. Do you want to spend the next 5, 10, 50 years there? Make sure to find out as much as you can to answer this question. Some questions are more appropriate for junior faculty, the more sensitive ones for the chair. Here are some example questions I would ask:

1) What are the expectations for promotion? Would you promote someone publishing exclusively in subject matter journals such as Nature, Science, Cell, PLoS Biology, American Journal of Epidemiology? Somebody publishing exclusively in Annals of Statistics? Is being a PI on an R01 a requirement for tenure?

2) What are the expectations for teaching/service/collaboration? How are teaching and committee service assignments made?

3) How did you connect with your collaborators? How are these connections made?

4) What percent of my salary am I expected to cover? Is it possible to do this by being a co-investigator?

5) Where do you live? How are the schools? How is the commute?

6) How many graduate students does the department have? How are graduate students funded? If I want someone to work with me, do I have to cover their stipend/tuition?

7) How is computing supported? This varies a lot from place to place. Some departments share amazing systems; ask how the costs are shared. How is the IT staff? Is R supported? In other places you might have to buy your own hardware. Get all the details.

Specific questions for the junior faculty:

Are the expectations for promotion made clear to you? Do you get feedback on your progress? Do the senior faculty mentor you? Do the senior faculty get along? What do you like most about the department? What could be improved? In the last 10 years, what percent of junior faculty have been promoted?

Questions for the chair:

What percent of my salary am I expected to cover? How soon? Is there bridge funding? What is a standard startup package? Can you describe the promotion process in detail? What space is available for postdocs? (For hard money places) I love teaching, but can I buy out of teaching with grants?

I am sure I missed stuff, so please comment away….