Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Dissecting the genomics of trauma

Today the results of a study I’ve been involved with for a long time (read: since my early graduate school days) came out in PLoS Medicine (also Princeton News coverage, Eurekalert press release).

We looked at gene expression profiles - how much each of your 20,000 genes is turned on or turned off - in patients who had experienced blunt force trauma. Using these profiles we were able to distinguish very early on which of the patients were going to have positive or negative health trajectories. The idea was to compare patients to themselves and see how much their genomic profiles deviated from the earliest measurements.
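
To make the "compare patients to themselves" idea concrete, here is a minimal sketch in Python. It is not the method from the paper: the data layout, the function name `deviation_from_baseline`, the toy numbers, and the choice of Euclidean distance are all assumptions made purely for illustration.

```python
import math

def deviation_from_baseline(profiles):
    """Distance of each time point's profile from the earliest one.

    `profiles` is a list of equal-length lists: one expression vector per
    time point, ordered from earliest to latest (a hypothetical layout).
    Euclidean distance is an illustrative choice, not the paper's metric.
    """
    if not profiles:
        return []
    baseline = profiles[0]
    distances = []
    for profile in profiles:
        if len(profile) != len(baseline):
            raise ValueError("all profiles must cover the same genes")
        squared = sum((x - b) ** 2 for x, b in zip(profile, baseline))
        distances.append(math.sqrt(squared))
    return distances

# Toy data: two made-up patients, three genes, three time points each.
patients = {
    "patient_A": [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
    "patient_B": [[1.0, 2.0, 3.0], [4.0, 6.0, 3.0], [1.0, 2.0, 3.0]],
}

# One deviation trajectory per patient, each relative to that patient's own start.
trajectories = {name: deviation_from_baseline(p) for name, p in patients.items()}
```

Each patient serves as their own control here: the trajectory starts at zero by construction, and what matters is how far and for how long it moves away from that starting point.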

I’m excited about this paper for a couple of reasons: (1) as we say in the paper, “Trauma is the number one killer of individuals 1-44y of age in the United States”, and (2) the communicating author and the joint first authors, Keyur Desai and Chuen Seng Tan, were statisticians, highlighting the important role statistics played in the scientific process.

Update: If you want to check out the data or analyze them yourself, there is a website explaining how to access the data and code here.

Advice for stats students on the academic job market

Job hunting season is upon us. Openings are already being posted here, here, and here. So you should have your CV, research statement, and web page ready. I highly recommend having a web page. It doesn’t have to be fancy. Here, here, and here are some good ones ranging from simple to a bit over the top. Minimum requirements are a list of publications and a link to a CV. If you have written software, link to that as well.

The earlier you submit the better. Don’t wait for your letters. Keep in mind two things: 1) departments have a limit on how many people they can invite and 2) admissions committee members get tired after reading 200+ CVs.

If you are seeking an academic job your CV should focus on the following: PhD granting institution, advisor (including postdoc advisor if you have one), and papers. Be careful not to drown out these most important features with superfluous entries. For papers, include three sections: 1-published, 2-under review, and 3-in preparation. For 2, include the journal names and if possible have tech reports available on your web page. For 3, be ready to give updates during the interview. If you have papers for which you are co-first author be sure to highlight that fact somehow.

So what are the different types of jobs? Before listing the options I should explain the concept of hard versus soft money. Revenue in academia comes from tuition (in public schools the state kicks in some extra $), external funding (e.g. NIH grants), services (e.g. patient care), and philanthropy (endowment). The money that comes from tuition, services, and philanthropy is referred to as hard money. Every year roughly the same amount is available and the way it’s split among departments rarely changes. When it does, it’s because your chair has either lost or won a long hard-fought zero-sum battle. Research money, or soft money, comes from NIH, NSF, DoD, etc., and one has to write grants to raise funding (which pays part or all of your salary). These days about 10% of grant applications are funded, so it is certainly not guaranteed. Although at the school level the law of large numbers kicks in, at the individual level it certainly doesn’t. Note that the breakdown of revenue varies widely from institution to institution. Liberal arts colleges are almost 100% hard money while research institutes are almost 100% soft money.
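
The law-of-large-numbers point can be made with a little arithmetic. The sketch below takes the roughly 10% funding rate quoted above and assumes, for simplicity, that applications succeed independently and with the same probability, which is certainly not true in practice. The application counts (3 for an individual, 500 for a school) are made-up numbers.

```python
import math

FUNDING_RATE = 0.10  # the "about 10%" figure quoted in the post

def prob_no_award(n_applications, rate=FUNDING_RATE):
    """Chance that none of n independent applications is funded."""
    return (1.0 - rate) ** n_applications

def sd_of_funded_fraction(n_applications, rate=FUNDING_RATE):
    """Standard deviation of the funded fraction (binomial model)."""
    return math.sqrt(rate * (1.0 - rate) / n_applications)

# Hypothetical counts: one investigator versus a whole school.
individual_apps = 3
school_apps = 500

individual_risk = prob_no_award(individual_apps)       # chance of coming up empty
school_risk = prob_no_award(school_apps)               # effectively zero
individual_sd = sd_of_funded_fraction(individual_apps)
school_sd = sd_of_funded_fraction(school_apps)
```

Under these assumptions an investigator with three applications comes up empty about 73% of the time, while the school's funded fraction stays within a percentage point or two of 10%. That gap is why soft money is predictable for the institution and stressful for the individual.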

So to simplify, your salary will come from teaching (tuition) and research (grants). The percentages will vary depending on the department. Here are four types of jobs:

1) Soft money university positions: examples are Hopkins and Harvard Biostat. A typical breakdown is 75% soft/25% hard. To earn the hard money you will have to teach, but not that much. In my dept we teach 48 classroom hours a year (equivalent to one one-semester class). To earn the soft money you have to write, and eventually get, grants. As a statistician you don’t necessarily have to write your own grants; you can partner up with other scientists that need help. And there are many! Salaries are typically higher in these positions. Stress levels are also higher given the uncertainty of funding. I personally like this as it keeps me motivated, focused, and forces me to work on problems important enough to receive NIH funding.

1a) Some schools of medicine have Biostatistics units that are 100% soft money. One does not have to teach, but, unless you have a joint appointment, you won’t have access to grad students. Still these are tenure track jobs. Although at 100% soft what does tenure mean? The Oncology Biostat division at Hopkins is an example. I should mention that at MD Anderson, one only needs to raise 50% of one’s salary and the other 50% is earned via service (statistical consulting to the institution). I imagine there are other places like this, as well as institutions that use endowments to provide some hard money.

2) Hard money positions: examples are Berkeley and Stanford Stat. A typical breakdown is 75% hard/25% soft. You get paid a 9 month salary. If you want to get paid in the summer and pay students, you need a grant. Here you typically teach two classes a semester but many places let you “buy out” of teaching if you can get grants to pay your salary. Some tension exists when chairs decide who teaches the big undergrad courses (lots of grunt work) and who teaches the small seminar classes where you talk about your own work.

3) Research associate positions: examples are jobs in schools of medicine in departments other than Stat/Biostat. These positions are typically 100% soft and are created because someone at the institution has a grant to pay for you. These are usually not tenure track positions and you rarely have to teach. You also have less independence since you have to work on the grant that funds you.

4) Industry: typically 100% hard. There are plenty of for-profit companies where one can have a fruitful research career. AT&T, Google, IBM, Microsoft, and Genentech are all examples of companies with great research groups. Note that S, the language that R is based on, was born in Bell Labs. And one of the co-creators of R now does his research at Genentech. Salaries are typically higher in industry and cafeteria food can be quite awesome. The drawbacks are no access to students and lack of independence (although not always!).

Update: A reader points out that I forgot:

5) Government jobs: The FDA and NIH are examples of agencies that have research positions. The NCI’s Biometric Research Branch is an example. I would classify these as 100% hard. But it is different than other hard money places in that you have to justify your budget every so often. Service, collaborative, and independent research is expected.  A drawback is that you don’t have access to students although you can get joint appointments. At Hopkins we have a couple of NCI researchers with joint appointments. 

Ok, that is it for now. Sometime in December we will blog about job interviews. 

The Duke Saga

For those of you that don’t know about the saga involving genomic signatures, I highly recommend reading this very good summary published in The Economist. Baggerly and Coombes are two statisticians that can confidently say they have made an impact on clinical research and actually saved lives. A paper by this pair describing the details was published in the Annals of Applied Statistics as most of the Biology journals refused to publish their letters to the editor. Baggerly is also a fantastic public speaker, as seen in this video and this one.

What is a Statistician?

This column was written by Terry Speed in 2006 and is reprinted with permission from the IMS Bulletin, http://bulletin.imstat.org

In the generation of my teachers, say from 1935 to 1960, relatively few statisticians were trained for the profession. The majority seemed to come from mathematics, without any specialized statistical training. There was also a sizeable minority coming from other areas, such as astronomy (I can think of one prominent example), chemistry or chemical engineering (three), economics (several), history (one), medicine (several), physics (two), and psychology (several). In those days, PhD programs in statistics were few and far between, and many, perhaps most people moved into statistics because they were interested in the subject, or were responding to a perceived need. They learned the subject on the job, either in government, industry or academia. I also think statistics benefited disproportionately from the minority coming from outside mathematics and statistics, but that may be a personal bias.

This diversity of backgrounds seems to have diminished from the mid-1960s. Almost all of my colleagues in statistics over the last 40 years had some graduate training in statistics. Typically they had a PhD in statistics, probability or mathematics, the last two with some exposure to statistics. A few had masters degrees or diplomas in statistics. My experience probably reflects that of most of you.

By the 1960s our subject had become professional, there was a ticket of entry into it — a PhD or equivalent — and many graduate programs handing them out. I know many statistics departments now include people with joint appointments, for example in the biological, engineering or social sciences, but I have the impression that the majority are people who trained in statistics and moved ‘away’ through their interest in applications there, rather than people from these other areas who were embraced by the statisticians. As is to be expected, there are plenty of exceptions.

Why am I presenting this made-up history of the recent origins of statisticians? Because I have the sense that the situation which has prevailed for about 40 years is changing again. I see a steady trickle, which I predict will grow substantially, of people not trained in statistics moving into our profession. Many have noticed, and I have previously remarked on, the current shortage of bright young people going into our subject. We probably all know universities, institutes or industries trying hard to recruit statisticians, and coming up empty handed. On the other hand, there has been substantial growth in areas which, while not generally regarded as mainstream statistics, might well have been, had things gone differently. My unoriginal observation is that some people from these areas are starting to see statistics as a worthwhile career, not beating but joining us. Computer science, machine learning, image analysis, information theory and bioinformatics, to name a few, have all provided future statisticians to statistics departments around the world in recent years, and I think there will be much more of this.

Recently there was a call for applications for the new United Kingdom EPSRC Statistics Mobility Fellowships, whose aim is “to attract new researchers into the statistics discipline at an early stage in their career”. Is this “mobility” a good idea? In my view, unquestionably yes. Not only do we need an influx of talent to swell our numbers, we also need it to broaden and enrich our subject, so that much of the related activity we now see taking place outside of statistics, and threatening its future, comes inside. In his highly stimulating polemic “Statistical Modelling: The Two Cultures” published in Statistical Science just 5 years ago (16:199–231, 2001), my late colleague Leo Breiman argued that “the focus in the statistical community on data models has:

  • led to irrelevant theory and questionable scientific conclusions; 
  • kept statisticians from using more suitable algorithmic models; 
  • prevented statisticians from working on exciting new problems.”

His view was that “we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.”

One, perhaps over-optimistic, view is that the reform that Leo so desired will come automatically as mainstream statistics is joined by “outsiders” from fields like those mentioned above. Are there risks in this trend? There must be. We want statistics broadened and enriched; we don’t want to see it fragmented, trivialized, or otherwise weakened. We need our theorists working hard to incorporate all these new ideas into our long-standing big picture, we need the newcomers to become familiar with the best we have to offer, and we all need to work together in answering the questions of all the people outside our discipline needing our involvement.

Data visualization and art

Mark Hansen is easily one of my favorite statisticians today. He is a Professor of Statistics at UCLA and his collaborations with artists have brought data visualization to a whole new place, one that is both informative and moving. 

Here is a video of his project with Ben Rubin called Listening Post. The installation grabs conversations from unrestricted chat rooms and processes them in real-time to create interesting “themes” or “movements”. I believe this one is called “I am” and the video is taken from the Whitney Museum of American Art.

[youtube http://www.youtube.com/watch?v=dD36IajCz6A&w=420&h=345]

Here is some pretty cool time-lapse photography of the installation of Listening Post at the San Jose Museum of Art.

[youtube http://www.youtube.com/watch?v=cClHQU6Fqro]