06 Aug 2014
Web companies now regularly use A/B testing and experimentation to determine which features to push for advertising or for improving the user experience. A/B testing is a form of randomized controlled trial that was originally employed in psychology but was first adopted on a massive scale in Biostatistics. Since then, a large amount of work on trials and trial design has been done in the Biostatistics community. Some of these ideas may be useful in the same context within web companies; many of them are probably already in use, and I just haven’t seen published examples. Here are some examples:
- Sequential study designs. Here the sample size isn’t fixed in advance (fixing it in advance is something I imagine is pretty hard to do with web experiments); instead, as the experiment goes on, the data are evaluated and a stopping rule that controls the appropriate error rates is used. Here are a couple of good (if a bit dated) reviews of sequential designs [1] [2].
- Adaptive study designs. These are study designs that use covariates or responses to adapt the treatment assignments of people over time. With careful design and analysis choices, you can still control the relevant error rates. Here are a couple of reviews of adaptive trial designs [1] [2].
- Noninferiority trials. These are trials designed to show that one treatment is at least as good as the standard of care. They are often implemented when a good placebo group is not available, often for ethical reasons. In light of the ethical concerns about human subjects research at tech companies, this could be a useful trial design. Here is a systematic review of noninferiority trials [1].
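To make the sequential idea concrete, here is a minimal sketch of Wald’s sequential probability ratio test (SPRT), one of the simplest sequential stopping rules, applied to a stream of binary outcomes (say, conversions under a test variant). The function name and the specific parameter values are purely illustrative; a real web experiment would more likely use a group-sequential or alpha-spending design from the reviews above.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on a stream of 0/1 outcomes.

    Returns (decision, n): decision is "H1", "H0", or None if the data ran out
    before either boundary was crossed; n is the number of observations used.
    """
    upper = math.log((1 - beta) / alpha)  # crossing this boundary accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing this boundary accepts H0
    llr = 0.0  # running log-likelihood ratio
    n = 0
    for x in observations:
        n += 1
        # Each success pushes the log-likelihood ratio up, each failure down.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, n
```

With alpha = beta = 0.05 and hypothesized conversion rates of 0.5 versus 0.7, nine straight successes are already enough to stop and accept H1, and six straight failures stop in favor of H0 — the sample size is determined by the data rather than fixed before the experiment starts.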
It is also probably useful to read about proportional hazards models and time-varying coefficients. Obviously these are just a few ideas that might be useful, but talking to a Biostatistician who works on clinical trials (not me!) would be a great way to get more information.
05 Aug 2014
Web companies have been doing human subjects research for a while now. Companies like Facebook and Google have employed statisticians for almost a decade (or more) and part of the culture they have introduced is the idea of randomized experiments to identify ideas that work and that don’t. They have figured out that experimentation and statistical analysis often beat out the opinion of the highest paid person at the company for identifying features that “work”. Here “work” may mean features that cause people to read advertising, or click on ads, or match up with more people.
This has created a huge amount of value and definitely a big interest in the statistical community. For example, today’s session on “Statistics: The Secret Weapon of Successful Web Giants” was standing room only.
But at the same time, these experiments have raised some issues. Recently scientists from Cornell and Facebook published a study where they experimented with the news feeds of users. This turned into a PR problem for Facebook and Cornell because people were pretty upset they were being experimented on and weren’t being told about it. This has led defenders of the study to say: (a) Facebook is doing the experiments anyway, they just published it this time, (b) in this case very little harm was done, (c) most experiments done by Facebook are designed to increase profitability, at least this experiment had a more public good focused approach, and (d) there was a small effect size so what’s the big deal?
OK Cupid then published a very timely blog post with the title, “We experiment on human beings!”, probably at least in part to take advantage of the press around the Facebook experiment. This post was received with less vitriol than the Facebook study, but it really drove home the point that large web companies perform as much human subjects research as most universities, and with little or no oversight.
This is the way academic research used to work. Scientists used their common sense and their scientific sense to decide on what experiments to run. Most of the time this worked fine, but then things like the Tuskegee Syphilis Study happened. These really unethical experiments led to the National Research Act of 1974, which codified rules about [institutional review boards (IRBs)](http://en.wikipedia.org/wiki/Institutional_review_board) to oversee research conducted on human subjects and guarantee their protection. IRBs are designed to consider the ethical issues involved with performing research on humans, balancing the protection of subjects’ rights with the advancement of science.
Facebook, OK Cupid, and other companies are not subject to IRB approval, yet they are performing more and more human subjects experiments. Obviously the studies described in the Facebook paper and the OK Cupid post pale in comparison to the Tuskegee study. I also know scientists at these companies and know they are ethical and really trying to do the right thing. But it raises interesting questions about oversight. Given the emotional, professional, and economic value that these websites control for individuals around the globe, it may be time to discuss whether human subjects research conducted by companies should have the equivalent of “institutional review boards”.
Companies that test drugs on humans, such as Merck, are subject to careful oversight and regulation to prevent potential harm to patients during the discovery process. This is obviously not the optimal solution for speed, which is understandably a major advantage and goal of tech companies. But there are issues that deserve serious consideration. For example, I think it is nowhere near sufficient to claim that by signing the terms of service people have given informed consent to be part of an experiment. That being said, they could just stop using Facebook if they don’t like being experimented on.
Our reliance on these tools for all aspects of our lives means that it isn’t easy to just tell people, “Well, if you don’t like being experimented on, don’t use that tool.” You would have to give up at minimum Google, Gmail, Facebook, Twitter, and Instagram to avoid being experimented on. But you’d also have to give up using smaller sites like OK Cupid, because almost all web companies are recognizing the importance of statistics. One good place to start might be to consider new and flexible forms of consent that make it possible to opt in and out of studies in an informed way, but with enough speed and flexibility not to slow down innovation at tech companies.
29 Jul 2014
I’ve been introducing people to R for quite a long time now, and I’ve been doing some reflecting today on how that process has changed quite a bit over time. I first started using R around 1998–1999, and I think I first started talking about it informally to my fellow classmates (and some faculty) back when I was in graduate school at UCLA. There, the department was officially using Lisp-Stat (which I loved) and only later converted its courses over to R. Through various brown-bag lunches and seminars I would talk about R, and the main selling point at the time was “It’s just like S-PLUS but it’s free!” As it turns out, S-PLUS was basically abandoned by academics, and its ownership changed hands a number of times over the years (it is currently owned by TIBCO). I still talk about S-PLUS when I talk about the history of R, but I’m not sure many people nowadays actually have any memories of the product.
When I got to Johns Hopkins in 2003 there wasn’t really much of a modern statistical computing class, so Karl Broman, Rafa Irizarry, Brian Caffo, Ingo Ruczinski, and I got together and started what we called the “KRRIB” class, which was basically a weekly seminar where one of us talked about a computing topic of interest. I gave some of the R lectures in that class, and when I asked people who had heard of R before, almost no one raised their hand. And no one had actually used it before. My approach was pretty much the same at the time, although I left out the part about S-PLUS because no one had used that either. A lot of people had experience with SAS or Stata or SPSS. A number of people had used something like Java or C/C++ before, and so I often used that as a reference frame. No one had ever used a functional-style programming language like Scheme or Lisp.
Over time, the population of students (mostly first-year graduate students) slowly shifted to the point where many of them had been introduced to R while they were undergraduates. This trend mirrored the overall trend with statistics where we are seeing more and more students do undergraduate majors in statistics (as opposed to, say, mathematics). Eventually, by 2008–2009, when I’d ask how many people had heard of or used R before, everyone raised their hand. However, even at that late date, I still felt the need to convince people that R was a “real” language that could be used for real tasks.
R has grown a lot in recent years, and it is being used in so many places now that I think it’s essentially impossible for one person to keep track of everything that is going on. That’s fine, but it makes “introducing” people to R an interesting experience. Nowadays in class, students are often teaching me something new about R that I’ve never seen or heard of before (they are quite good at Googling around for themselves). I feel no need to “bring people over” to R. In fact, it’s quite the opposite: people might start asking questions if I weren’t teaching R.
Even though my approach to introducing R has evolved over time, with the topics that I emphasize or de-emphasize changing, I’ve found there are a few topics that I always stress to people who are generally newcomers to R. For whatever reason, these topics are always new or at least a little unfamiliar.
- R is a functional-style language. Back when most people primarily saw something like C as a first programming language, it made sense to me that the functional style of programming would seem strange. I came to R from Lisp-Stat so the functional aspect was pretty natural for me. But many people seem to get tripped up over the idea of passing a function as an argument or not being able to modify the state of an object in place. Also, it sometimes takes people a while to get used to doing things like lapply() and map-reduce types of operations. Everyone still wants to write a for loop!
- R is both an interactive system and a programming language. Yes, it’s a floor wax and a dessert topping–get used to it. Most people seem to expect one or the other. SAS users wonder why you need to write 10 lines of code to do what SAS can do in one massive PROC statement. C programmers wonder why you don’t write more for loops. C++ programmers are confused by the weird system for object orientation. In summary, no one is ever happy.
- Visualization/plotting capabilities are state-of-the-art. One of the big selling points back in the “old days” was that from the very beginning R’s plotting and graphics capabilities were far more elegant than the ASCII-art that was being produced by other statistical packages (true for S-PLUS too). I find it a bit strange that this point has largely remained true. While other statistical packages have definitely improved their output (and R certainly has some areas where it is perhaps deficient), R still holds its own quite handily against those other packages. If the community can continue to produce things like ggplot2 and rgl, I think R will remain at the forefront of data visualization.
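For readers coming from C or Java, the functional point above — passing a function as an argument and building a result rather than mutating state in a for loop — can be sketched with a rough Python analogue of R’s lapply(). This is not R itself, and the names here are purely illustrative:

```python
def apply_to_groups(groups, f):
    """Rough analogue of R's lapply(): apply f to each element of a collection."""
    # No accumulator is mutated in place; the result is built functionally.
    return {name: f(values) for name, values in groups.items()}

# Per-variant click data; f computes a click-through rate for each group.
clicks = {"variant_a": [1, 0, 1, 1], "variant_b": [0, 0, 1, 0]}
rates = apply_to_groups(clicks, lambda xs: sum(xs) / len(xs))
# rates == {"variant_a": 0.75, "variant_b": 0.25}
```

The same computation written as a for loop over a mutable dictionary is what most newcomers reach for first; the functional version states what is computed rather than how the state evolves.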
I’m looking forward to teaching R to people as long as people will let me, and I’m interested to see how the next generation of students will approach it (and how my approach to them will change). Overall, it’s been just an amazing experience to see the widespread adoption of R over the past decade. I’m sure the next decade will be just as amazing.
25 Jul 2014
I think that the main distinction between academic statisticians and those calling themselves data scientists is that the latter are very much willing to invest most of their time and energy into solving specific problems by analyzing specific data sets. In contrast, most academic statisticians strive to develop methods that can be very generally applied across problems and data types. There is a reason for this of course: historically statisticians have had enormous influence by developing general theory/methods/concepts such as the p-value, maximum likelihood estimation, and linear regression. However, these types of success stories are becoming more and more rare while data scientists are becoming increasingly influential in their respective areas of applications by solving important context-specific problems. The success of Moneyball and the prediction of election results are two recent widely publicized examples.
A survey of papers published in our flagship journals makes it quite clear that context-agnostic methodology is valued much more than detailed descriptions of successful solutions to specific problems. These applied papers tend to get published in subject-matter journals and do not usually receive the same weight in appointments and promotions. This culture has therefore kept most statisticians holding academic positions away from collaborations that require substantial time and energy investments in understanding and attacking the specifics of the problem at hand. Below I argue that to remain relevant as a discipline we need a cultural shift.
It is of course understandable that, to remain a discipline, academic statisticians can’t devote all our effort to solving specific problems and none to trying to generalize those solutions. It is the development of these abstractions that defines us as an academic discipline and not just a profession. However, if our involvement with real problems is too superficial, we run the risk of developing methods that solve no problem at all, which will eventually render us obsolete. We need to accept that as data and problems become more complex, more time will have to be devoted to understanding the gory details.
But what should the balance be?
Note that many of the giants of our discipline were very much interested in solving specific problems in genetics, agriculture, and the social sciences. In fact, many of today’s most widely-applied methods were originally inspired by insights gained by answering very specific scientific questions. I worry that the balance between application and theory has shifted too far away from applications. An unfortunate consequence is that our flagship journals, including our applied journals, are publishing too many methods seeking to solve many problems but actually solving none. By shifting some of our efforts to solving specific problems we will get closer to the essence of modern problems and will actually inspire more successful generalizable methods.
16 Jul 2014
One of the best things to happen on the Internet recently is that Jan de Leeuw has decided to own the Twitter/Facebook universe. If you do not already, you should be following him. Among his many accomplishments, he founded the Department of Statistics at UCLA (my alma mater), which is currently thriving. On the occasion of the Department’s 10th birthday, there was a small celebration, and I recall Don Ylvisaker mentioning that the reason they invited Jan to UCLA way back when was because he “knew everyone and knew everything”. Pretty accurate description, in my opinion.
Jan’s been tweeting quite a bit of late, but recently had this gem:
followed by
I’m not sure what Jan’s thinking behind the first tweet was, but I think many in statistics would consider it a “good thing” to be a minor subfield of data science. Why get involved in that messy thing called data science where people are going wild with data in an unprincipled manner?
This is a situation where I think there is a large disconnect between what “should be” and what “is reality”. What should be is that statistics should include the field of data science. Honestly, that would be beneficial to the field of statistics and would allow us to provide a home to many people who don’t necessarily have one (primarily, people working on the border between the two fields). Nate Silver made reference to this in his keynote address to the Joint Statistical Meetings last year when he said data science was just a fancy term for statistics.
The reality, though, is the opposite. Statistics has chosen to limit itself to a few areas, such as inference, as Jan mentions, and to willfully ignore other important aspects of data science as “not statistics”. This is unfortunate, I think, because unlike many in the field of statistics, I believe data science is here to stay. The reason is that statistics has decided not to fill the spaces that have been created by the increasing complexity of modern data analysis. The needs of modern data analyses (reproducibility, computing on large datasets, data preprocessing/cleaning) didn’t fall into the usual statistics curriculum, and so they were ignored. In my view, data science is about stringing together many different tools for many different purposes into an analytic whole. Traditional statistical modeling is a part of this (often a small part), but statistical thinking plays a role in all of it.
Statisticians should take on the challenge of data science and own it. We may not be successful in doing so, but we certainly won’t be if we don’t try.