The Care and Feeding of the Biostatistician

08 Oct 2013

Editor’s Note: This guest post was written by Elizabeth C. Matsui, an Associate Professor in the Division of Pediatric Allergy and Immunology at the Johns Hopkins School of Medicine.

I’ve been collaborating with Roger for several years now and we have had quite a few discussions about characteristics of a successful collaboration between a clinical investigator and a biostatistician. I can’t remember for certain, but I think that this cartoon may have been the impetus for some of our discussions. I have joked that I should write a guide for clinical investigators entitled, “The Care and Feeding of the Biostatistician.” Fortunately, Roger has a good sense of humor and appreciates the ironic title, so he asked me to write down a few thoughts for Simply Statistics. Forging successful collaborations may seem less important than other skills such as grant writing, but successful collaboration is an important determinant of career success, and for many people, an enormous source of career satisfaction. And in the current scientific environment, in which large, complex datasets and sophisticated quantitative and data visualization methods are becoming increasingly common, collaboration with biostatisticians is necessary to harness the full potential of your data and to have the greatest scientific impact. In some cases, not engaging a biostatistical collaborator may put you at risk of making statistical missteps that could lead to erroneous results.

Be respectful of time. This tenet, of course, is applicable to all collaborations, but may be a more common stumbling block for clinical investigators working with biostatisticians. Most power estimates and sample size calculations, for example, are more complex than appreciated by the clinical investigator. A discussion about the research question, primary outcome, etc. is required and some thought has to go into determining the most appropriate approach before your biostatistician collaborator has even laid hands on the keyboard and fired up R. At a minimum, engage your biostatistician collaborator earlier than you might think necessary, and ideally, solicit their input during the planning stages. Engaging a biostatistician sooner rather than later not only fosters good will, but will also improve your science. A biostatistician’s time, like yours, is valuable, so respect their time by allocating an appropriate level of salary support on grants. Most academicians I come across appreciate that budgets are tight, so they understand that they may not get the level of salary support that they think is most appropriate. However, “finding room” in the budget for 1% salary support for a biostatistician sends the message that the biostatistician is an afterthought, a necessity for a sample size calculation and a competitive grant application, but in the end, just a formality. Instead, dedicate sufficient salary support in your grant to support the level of biostatistical effort that will be needed. This sends the message that you would like your biostatistician collaborator to be an integral part of the investigator team and provides an opportunity for the kind of regular, ongoing interactions that are needed for productive collaborations.
Understand that a biostatistician is not a computational tool. Although sample size and power calculations are probably the most common service solicited from biostatisticians, and biostatisticians can be enormously helpful in this arena, they have the most impact when they are engaged in discussions about study designs and analytic approaches for a scientific question. Their quantitative approach to scientific problems provides a fresh perspective that can increase the scientific impact of your work. My sense is that this is also much more interesting work for a biostatistician than sample size and power calculations, and engaging them in interesting work goes a long way towards cementing a mutually productive collaboration.
Make an effort to learn the language of biostatistics. Technical jargon is a serious impediment to successful collaboration. Again, this is true of all cross-discipline collaborations, but may be particularly true in collaborations with biostatisticians. The field has a penchant for eponymous methods (Hosmer-Lemeshow, Wald, etc.) and terminology that is entertaining, but not intuitive (jackknife, bootstrapping, lasso). While I am not suggesting that a clinical investigator needs to enroll in biostatistics courses (why gain expertise in a field when your collaborator provides this expertise?), I am advocating for educating yourself about the basic concepts and terminology of statistics. Know what is meant by: distribution of a variable, predictor variable, outcome variable, and variance, for example. There are some terrific “Biostatistics 101”-type lectures and course materials online that are excellent resources. But also lean on your biostatistician collaborator by asking him/her to explain terminology and teach you these basics, and do not be afraid to ask questions.
When all else fails (and even when all else doesn’t fail), draw pictures. In truth, this is often the place where I start when I first engage a biostatistician. Showing your biostatistician collaborator what you expect your data to look like in a figure or conceptual diagram simplifies communication because it avoids jargon, and biostatisticians can readily grasp from a figure or diagram the key information they need to come up with a sample size estimate or analytic approach.
Teach them your language. Clinical medicine is also rife with jargon, and just as biostatistical jargon can make it difficult to communicate clearly with a biostatistician, so can clinical jargon. Avoid technical jargon where possible, and define terminology where it is not possible. Educate your collaborator about the background, context and rationale for your scientific question and encourage questions.
Generously share your data and ideas. In many organizations, biostatisticians are very interested in developing new methods, applying more sophisticated methods to an “old” problem, and/or answering their own scientific questions. Do what you can to support these career interests, such as sharing your data and your ideas. Sharing data opens up avenues for increasing the impact of your work, as your biostatistician collaborator has opportunities to develop quantitative approaches to answering research questions related to your own interests. Sharing data alone is not sufficient, though. Discussions about what you see as the important, unanswered questions will help provide the necessary background and context for the biostatistician to make the most of the available data. As highlighted in a recent book, giving may be an important and overlooked component of success, and I would argue, also a cornerstone of a successful collaboration.
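
As a rough illustration of why even a "simple" sample size calculation depends on the upfront discussion described in the first point, here is a minimal sketch in Python. It uses the textbook normal approximation for a two-sided comparison of two group means with equal group sizes and a common standard deviation. The function name n_per_group and every number below are hypothetical, chosen only for illustration and not taken from any real study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Normal approximation, two-sided test, equal group sizes, common SD.
    delta: smallest difference in means worth detecting (an assumption)
    sd:    assumed standard deviation of the outcome (an assumption)
    """
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(power)            # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical scenarios: the same outcome (SD = 1.0) under three
# different guesses about the size of the effect.
for delta in (1.0, 0.5, 0.25):
    print(f"expected difference {delta}: {n_per_group(delta, sd=1.0)} per group")
```

Halving the expected difference roughly quadruples the required sample size, so the answer is driven almost entirely by assumptions that only the clinical investigator and the biostatistician can settle together. A real calculation would also have to reflect the actual design (dropout, clustering, repeated measures, unequal groups), which this sketch ignores.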

The Leek group policy for developing sustainable R packages

07 Oct 2013

As my group has grown over the past few years and I have more people writing software, I have started to progressively freak out more and more about how to make sure that the software is sustainable as students graduate and move on to bigger and better things. I am also concerned with maintaining quality of the software we are developing in a field where the pace of development/discovery is so high.

As a person who simultaneously (a) has no formal training in CS or software development and (b) believes that if there is no software there is no paper, I am worried about creating a bunch of unsustainable software. So I solicited the advice of people around here who know more about it than I do, and I collected my past experience with creating software and how I screwed it up. I put it all together in the Leek group guide to building and maintaining software packages.

The guide covers (among other things):

When to start building a package
How to version the package
How to document the package
What not to include
How to build unit tests
How to create a vignette
The commitment I expect in terms of software maintenance

I put it on Github because I’m still not 100% sure I got it right. The policy takes effect as of now. But I would welcome feedback/pull requests on how we can improve the policy to make it better and reduce the probability that I end up with a bunch of broken packages when all my awesome students, who are much better coders than me, eventually graduate.

Sunday data/statistics link roundup (10/6/2013)

06 Oct 2013

A fascinating read about applying decision theory to mathematical proofs. They talk about Type I and Type II errors and everything.
Statistical concepts explained through dance. Even for a pretty culture-deficient dude like me this is cool.
Lots of good talks from the WIN Workshop, including by one of our speakers for the Unconference on the Future of Statistics.
The best advice for graduate students (or any academics) I have seen in my time writing the Sunday Links. You must try, and then you must ask (via Seth F.).
Alberto C. has a MOOC on infographics and visualization that looks pretty cool. That way you can avoid this kind of thing.
This picture is awesome. Nothing to do with statistics. (via @AstroKatie).
If you aren’t reading Thomas L.’s notstatschat, you should be.
Karl B. has an interesting presentation on open access that is itself open access. First Beamer theme I’ve seen that didn’t make me want to cover my eyes in sadness. My only problem is I wish open access publishing wasn’t so expensive. Can’t we just use a blog/figshare to publish journals that are almost as good? This dude says peer review is old news anyway.

Repost: Finding good collaborators

04 Oct 2013

Editor’s note: Simply Statistics is still freaking out about the government shutdown and potential impending economic catastrophe if the debt ceiling isn’t raised. Since anything new we might write seems trivial compared to what is going on in Washington, we are reposting an awesome old piece by Roger on finding good collaborators.

The job of the statistician is almost entirely about collaboration. Sure, there’s theoretical work that we can do by ourselves, but most of the impact that we have on science comes from our work with scientists in other fields. Collaboration is also what makes the field of statistics so much fun.

So one question I get a lot from people is “how do you find good collaborations”? Or, put another way, how do you find good collaborators? It turns out this distinction is more important than it might seem.

My approach to developing collaborations has evolved over time and I consider myself fairly lucky to have developed a few very productive and very enjoyable collaborations. These days my strategy for finding good collaborations is to look for good collaborators. I personally find it important to work with people that I like as well as respect as scientists, because a good collaboration is going to involve a lot of personal interaction. A place like Johns Hopkins has no shortage of very intelligent and very productive researchers that are doing interesting things, but that doesn’t mean you want to work with all of them.

Here’s what I’ve been telling people lately about finding collaborations, which is a mish-mash of a lot of advice I’ve gotten over the years.

Find people you can work with. I sometimes see situations where a statistician will want to work with someone because he/she is working on an important problem. Of course, you want to be working on a problem that interests you, but it’s only partly about the specific project. It’s very much about the person. If you can’t develop a strong working relationship with a collaborator, both sides will suffer. If you don’t feel comfortable asking (stupid) questions, pointing out problems, or making suggestions, then chances are the science won’t be as good as it could be.
It’s going to take some time. I sometimes half-jokingly tell people that good collaborations are what you’re left with after getting rid of all your bad ones. Part of the reasoning here is that you actually may not know what kinds of people you are most comfortable working with. So it takes time and a series of interactions to learn these things about yourself and to see what works and doesn’t work. Of course, you can’t take forever, particularly in academic settings where the tenure clock might be ticking, but you also can’t rush things either. One rule I heard once was that a collaboration is worth doing if it will likely end up with a published paper. That’s a decent rule of thumb, but see my next comment.
It’s going to take some time. Developing good collaborations will usually take some time, even if you’ve found the right person. You might need to learn the science, get up to speed on the latest methods/techniques, learn the jargon, etc. So it might be a while before you can start having intelligent conversations about the subject matter. Then it takes time to understand how the key scientific questions translate to statistical problems. Then it takes time to figure out how to develop new methods to address these statistical problems. So a good collaboration is a serious long-term investment which has some risk of not working out. There may not be a lot of papers initially, but the idea is to make the early investment so that truly excellent papers can be published later.
Work with people who are getting things done. Nothing is more frustrating than collaborating on a project with someone who isn’t that interested in bringing it to a close (i.e. a published paper, completed software package). Sometimes there isn’t a strong incentive for the collaborator to finish (e.g. she/he is already tenured) and other times things just fall by the wayside. So finding a collaborator who is continuously getting things done is key. One way to determine this is to check out their CV. Is there a steady stream of productivity? Papers in good journals? Software used by lots of other people? Grants? Web site that’s not in total disrepair?
You’re not like everyone else. One thing that surprised me was discovering that just because someone you know works well with a specific person doesn’t mean that you will work well with that person. This sounds obvious in retrospect, but there were a few situations where a collaborator was recommended to me by a source that I trusted completely, and yet the collaboration didn’t work out. The bottom line is to trust your mentors and friends, but realize that differences in personality and scientific interests may determine a different set of collaborators with whom you work well.

These are just a few of my thoughts on finding good collaborators. I’d be interested in hearing others’ thoughts and experiences along these lines.

Statistical Ode to Mariano Rivera

30 Sep 2013

Mariano Rivera is an outlier in many ways. The plot below shows one of them: top 10 pitchers ranked by postseason saves.


Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek
