21 Oct 2011
Chris Barr
Chris Barr is an assistant professor of biostatistics at the Harvard School of Public Health in Boston. He moved to Boston after getting his Ph.D. at UCLA and then doing a postdoc at Johns Hopkins Bloomberg School of Public Health. Chris has done important work in environmental biostatistics and is also the co-founder of OpenIntro, a very cool open-source (and free!) educational resource for statistics.
Which term applies to you: data scientist/statistician/analyst?
I’m a “statistician” by training. One day, I hope to graduate to “scientist”. The distinction, in my mind, is that a scientist can bring real insight to a tough problem, even when the circumstances take them far beyond their training.
Statisticians get a head start on becoming scientists. Like chemists and economists and all the rest, we were trained to think hard as independent researchers. Unlike other specialists, however, we are given the opportunity, from a young age, to see all types of different problems posed from a wide range of perspectives.
How did you get into statistics/data science (e.g. your history)?
I studied economics in college, and I had planned to pursue a doctorate in the same field. One day a senior professor of statistics asked me about my future, and in response to my stated ambition, said: “Whatever an economist can do, a statistician can do better.” I started looking at graduate programs in statistics and noticed UCLA’s curriculum. It was equal parts theory, application, and computing, and that sounded like how I wanted to spend my next few years. I couldn’t have been luckier. The program and the people were fantastic.
What is the problem currently driving you?
I’m working on so many projects, it’s difficult to single out just one. Our work on smoking bans (joint with Diez, Wang, Samet, and Dominici) has been super exciting. It is a great example of how careful modeling can really make a big difference. I’m also soloing a methods paper on residual analysis for point process models that is bolstered by a simple idea from physics. When I’m not working on research, I spend as much time as I can on OpenIntro.
What is your favorite paper/idea you have had? Why?
I get excited about a lot of the problems and ideas. I like the small teams (one, two, or three authors) that generally take on theory and methods problems; I also like the long stretches of thinking time that go along with those papers. That said, big science papers, where I get to team up with smart folks from disciplines and destinations far and wide, really get me fired up. Last, but not least, I really value the work we do on open source education and reproducible research. That work probably has the greatest potential for introducing me to people, internationally and in small local communities, that I’d never know otherwise.
Who were really good mentors to you? What were the qualities that really helped you?
Identifying key mentors is such a tough challenge, so I’ll adhere to a self-imposed constraint by picking just one: Rick Schoenberg. Rick was my doctoral advisor, and has probably had the single greatest impact on my understanding of what it means to be a scientist and colleague. I could tell you a dozen stories about the simple kindness and encouragement that Rick offered. Most importantly, Rick was positive and professional in every interaction we ever had. He was diligent, but relaxed. He offered structure and autonomy. He was all the things a student needs, and none of the things that make students want to read those xkcd comics. Now that I’m starting to make my own way, I’m grateful to Rick for his continuing friendship and collaboration.
I know you asked about mentors, but if I could mention somebody who, even though not my mentor, has taught me a ton, it would be David Diez. David was my classmate at UCLA and colleague at Harvard. We are also cofounders of OpenIntro. David is probably the hardest working person I know. He is also the most patient and clear thinking. These qualities, like Rick’s, are often hard to find in oneself and can never be too abundant.
What is OpenIntro?
OpenIntro is part of the growing movement in open source education. Our goal, with the help of community involvement, is to improve the quality and reduce the cost of educational materials at the introductory level. Founded by two statisticians (Diez, Barr), our early activities have generated a full-length textbook (OpenIntro Statistics: Diez, Barr, Cetinkaya-Rundel) that is available for free in PDF and at cost ($9.02) in paperback. People can also use openintro.org to manage their course materials for free, whether they are using our book or not. The software, developed almost entirely by David Diez, makes it easy for people to post lecture notes, assignments, and other resources. Additionally, it gives people access to our online question bank and quiz utility. Last but not least, we are sponsoring a student project competition. The first round will be this semester, and interested people can visit openintro.org/stat/comp for additional information. We are little fish, but with the help of our friends (openintro.org/about.php) and involvement from the community, we hope to do a good thing.
How did you get the idea for OpenIntro?
Regarding the book and webpage - David and I had both started writing a book on our own; David was keen on an introductory text, and I was working on one about statistical computing. We each realized that trying to solo a textbook while finishing a PhD was nearly impossible, so we teamed up. As the project began to grow, we were very lucky to be joined by Mine Cetinkaya-Rundel, who became our co-author on the text and has since played a big role in developing the kinds of teaching supplements that instructors find so useful (labs and lecture notes to name a few). Working with the people at OpenIntro has been a blast, and a bucket full of nights and weekends later, here we are!
Regarding making everything free - David and I started the OpenIntro project during the peak of the global financial crisis. With kids going to college while their parents’ house was being foreclosed, it seemed timely to help out the best way we knew how. Three years later, as I write this, the daily news is running headline stories about the Occupy Wall Street movement featuring hard times for young people in America and around the world. Maybe “free” will always be timely.
For More Information
Check out Chris’ webpage, his really nice publications including this one on the public health benefits of cap and trade, and the OpenIntro project website. Keep your eye open for the paper on cigarette bans Chris mentions in the interview; it is sure to be good.
Related Posts: Jeff’s interview with Daniela Witten, Rafa on the future of graduate education, Roger on colors in R.
20 Oct 2011
The job of the statistician is almost entirely about collaboration. Sure, there’s theoretical work that we can do by ourselves, but most of the impact that we have on science comes from our work with scientists in other fields. Collaboration is also what makes the field of statistics so much fun.
So one question I get a lot from people is “How do you find good collaborations?” Or, put another way, how do you find good collaborators? It turns out this distinction is more important than it might seem.
My approach to developing collaborations has evolved over time and I consider myself fairly lucky to have developed a few very productive and very enjoyable collaborations. These days my strategy for finding good collaborations is to look for good collaborators. I personally find it important to work with people that I like as well as respect as scientists, because a good collaboration is going to involve a lot of personal interaction. A place like Johns Hopkins has no shortage of very intelligent and very productive researchers that are doing interesting things, but that doesn’t mean you want to work with all of them.
Here’s what I’ve been telling people lately about finding collaborations, which is a mish-mash of a lot of advice I’ve gotten over the years.
- Find people you can work with. I sometimes see situations where a statistician will want to work with someone because he/she is working on an important problem. Of course, you want to be working on a problem that interests you, but it’s only partly about the specific project. It’s very much about the person. If you can’t develop a strong working relationship with a collaborator, both sides will suffer. If you don’t feel comfortable asking (stupid) questions, pointing out problems, or making suggestions, then chances are the science won’t be as good as it could be.
- It’s going to take some time. I sometimes half-jokingly tell people that good collaborations are what you’re left with after getting rid of all your bad ones. Part of the reasoning here is that you actually may not know what kinds of people you are most comfortable working with. So it takes time and a series of interactions to learn these things about yourself and to see what works and doesn’t work. Of course, you can’t take forever, particularly in academic settings where the tenure clock might be ticking, but you also can’t rush things either. One rule I heard once was that a collaboration is worth doing if it will likely end up with a published paper. That’s a decent rule of thumb, but see my next comment.
- It’s going to take some time. Developing good collaborations will usually take some time, even if you’ve found the right person. You might need to learn the science, get up to speed on the latest methods/techniques, learn the jargon, etc. So it might be a while before you can start having intelligent conversations about the subject matter. Then it takes time to understand how the key scientific questions translate to statistical problems. Then it takes time to figure out how to develop new methods to address these statistical problems. So a good collaboration is a serious long-term investment which has some risk of not working out. There may not be a lot of papers initially, but the idea is to make the early investment so that truly excellent papers can be published later.
- Work with people who are getting things done. Nothing is more frustrating than collaborating on a project with someone who isn’t that interested in bringing it to a close (i.e. a published paper, completed software package). Sometimes there isn’t a strong incentive for the collaborator to finish (e.g. she/he is already tenured) and other times things just fall by the wayside. So finding a collaborator who is continuously getting things done is key. One way to determine this is to check out their CV. Is there a steady stream of productivity? Papers in good journals? Software used by lots of other people? Grants? Web site that’s not in total disrepair?
- You’re not like everyone else. One thing that surprised me was discovering that just because someone you know works well with a specific person doesn’t mean that you will work well with that person. This sounds obvious in retrospect, but there were a few situations where a collaborator was recommended to me by a source that I trusted completely, and yet the collaboration didn’t work out. The bottom line is to trust your mentors and friends, but realize that differences in personality and scientific interests may determine a different set of collaborators with whom you work well.
These are just a few of my thoughts on finding good collaborators. I’d be interested in hearing others’ thoughts and experiences along these lines.
Related Posts: Rafa on authorship conventions, finish and publish
20 Oct 2011
Brian Caffo from the comments:
Personal theorem: the application of statistics in any new field will be labeled “Technical-sounding word” + ics. Examples: Sabermetrics, analytics, econometrics, neuroinformatics, bioinformatics, informatics, chemometrics.
It’s like how adding mayonnaise to anything turns it into salad (e.g. egg salad, tuna salad, ham salad, pasta salad, …)
I’d like to be the first to propose the statistical study of turning things into salad. So-called mayonnaisics.
19 Oct 2011
All statisticians in academia are constantly confronted with the question of where to publish their papers. Sometimes it’s obvious: A theoretical paper might go to the Annals of Statistics or JASA Theory & Methods or Biometrika. A more “methods-y” paper might go to JASA or JRSS-B or Biometrics or maybe even Biostatistics (where all three of us are or have been associate editors).
But where should the applied papers go? I think this is an increasingly large category of papers being produced by statisticians. These are papers that do not necessarily develop a brand new method or uncover any new theory, but apply statistical methods to an interesting dataset in a not-so-obvious way. Some papers might combine a set of existing methods that have never been combined before in order to solve an important scientific problem.
Well, there are some official applied statistics journals: JASA Applications & Case Studies or JRSS-C or Annals of Applied Statistics. At least they have the word “application” or “applied” in their title. But the question we should be asking is: if a paper is published in one of those journals, will it reach the right audience?
What is the audience for an applied stat paper? Perhaps it depends on the subject matter. If the application is biology, then maybe biologists. If it’s an air pollution and health application, maybe environmental epidemiologists. My point is that the key audience is probably not a bunch of other statisticians.
The fundamental conundrum of applied stat papers comes down to this question: If your application of statistical methods is truly addressing an important scientific question, then shouldn’t the scientists in the relevant field want to hear about it? If the answer is yes, then we have two options: Force other scientists to read our applied stat journals, or publish our papers in their journals. There doesn’t seem to be much momentum for the former, but the latter is already being done rather frequently.
Across a variety of fields we see statisticians making direct contributions to science by publishing in non-statistics journals. Some examples are this recent paper in Nature Genetics and a paper I published a few years ago in the Journal of the American Medical Association. I think there are two key features that these papers (and many others like them) have in common:
- There was an important scientific question addressed. The first paper investigates variability of methylated regions of the genome and its relation to cancer tissue and the second paper addresses the problem of whether ambient coarse particles have an acute health effect. In both cases, scientists in the respective substantive areas were interested in the problem and so it was natural to publish the “answer” in their journals.
- The problem was well-suited to be addressed by statisticians. Both papers involved large and complex datasets for which training in data analysis and statistics was important. In the analysis of coarse particles and hospitalizations, we used a national database of air pollution concentrations and obtained health status data from Medicare. Linking these two databases together and conducting the analysis required enormous computational effort and statistical sophistication. While I doubt we were the only people who could have done that analysis, we were very well-positioned to do so.
So when statisticians are confronted by scientific problems that are both (1) important and (2) well-suited for statisticians, what should we do? My feeling is we should skip the applied statistics journals and bring the message straight to the people who want/need to hear it.
There are two problems that come to mind immediately. First, sometimes the paper ends up being so statistically technical that a scientific journal won’t accept it. And of course, in academia, there is the sticky problem of how to get promoted in a statistics department when your CV is filled with papers in non-statistics journals. This entry is already long enough, so I’ll address these issues in a future post.