Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Interview with Héctor Corrada Bravo

Héctor Corrada Bravo is an assistant professor in the Department of Computer Science and the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park. He moved to College Park after finishing his Ph.D. in computer science at the University of Wisconsin and a postdoc in biostatistics at the Johns Hopkins Bloomberg School of Public Health. He has done outstanding work at the intersection of molecular biology, computer science, and statistics. For more info, check out his webpage.

Which term applies to you: statistician/data scientist/computer
scientist/machine learner?

I want to understand interesting phenomena (in my case mostly in
biology and medicine), and I believe that our ability to collect a large number of relevant
measurements and infer characteristics of these phenomena can drive
scientific discovery and commercial innovation in the near future.
Perhaps that makes me a data scientist, and means that, depending on the
task at hand, one or more of the other terms apply.

A lot of the distinctions many people make between these terms are
vacuous and unnecessary, but some are nonetheless useful to think
about. For example, both statisticians and machine learners [sic] know
how to create statistical algorithms that compute interesting and informative objects using measurements (perhaps) obtained through some stochastic or partially observed
process. These objects could be genomic tools for cancer screening, or
statistics that better reflect the relative impact of baseball players
on team success.

Both fields also give us ways to evaluate and characterize these objects.
However, sometimes these objects are tools that serve an immediate,
utilitarian purpose, and thinking like an engineer (as many people in
Machine Learning do) is the right approach.
Other times, these objects are there to help us gain insight into our
world, and thinking the way many statisticians do is the right
approach. You need both of these ways of thinking to do interesting
science, and dogmatically avoiding either of them is a terrible idea.

How did you get into statistics/data science (i.e. your history)?

I got interested in Artificial Intelligence at one point, and found
that my mathematics background was nicely suited to work on this. Once
I got into it, thinking about statistics and how to analyze and
interpret data was natural and necessary. I started working with two
wonderful advisors at Wisconsin, Raghu Ramakrishnan (CS) and Grace Wahba (Statistics),
who helped shape the way I approach problems from different angles
and with different goals. The last piece was discovering that
computational biology is a fantastic setting in which to apply and
devise these methods to answer really interesting questions.

What is the problem currently driving you?

I’ve been working on cancer epigenetics to find specific genomic
measurements for which increased stochasticity appears to be general
across multiple cancer types. Right now, I’m really wondering how far
into the clinic these discoveries can be taken, if at all. For
example, can we build tools that use these genomic measurements to
improve cancer screening?

How do you see CS/statistics merging in the future?

I think that future got here some time ago, but is about to get much
more interesting.

Here is one example: Computer Science is about creating and analyzing
algorithms and building the systems that can implement them. Much of
what computer scientists have done addresses problems concerning how to
keep, find, and ship around information (Operating Systems, Networks,
Databases, etc.). Often these efforts have been driven by very specific
needs, e.g., commercial transactions in databases. In some ways,
companies have moved from asking “how do I use data to keep track
of my activities?” to “how do I use data to decide which activities to do
and how to do them?” Statistical tools should be used to answer these
questions, and systems built by computer scientists have statistical
algorithms at their core.

Beyond R, what are some really useful computational tools for
statisticians to know about?

I think a computational tool that everyone can benefit a lot from
understanding better is algorithm design and analysis. This doesn’t
have to be at a particularly deep level, but just getting a sense of
how long a particular process might take, and how to devise a different way of doing it that might make it more efficient, is really useful. I’ve been toying with the idea of creating a CS course called (something like) “Highlights of continuous
mathematics for computer science” that reminds everyone of the cool
stuff that one learns in math, now that we can appreciate its usefulness. Similarly, I think
statistics students can benefit from “Highlights of discrete
mathematics for statisticians”.

Now a request for comments from you and your readers: beyond R,
what are some really useful statistical tools for computer scientists
to know about?

Review times in statistics journals are long. Should statisticians
move to conference papers?

I don’t think so. Long review times (anything more than 3 weeks) are
really not necessary. We tend to publish in journals with fairly quick
review times that produce (for the most part) really useful and
insightful reviews.

I was recently talking to senior members in my field who were telling
me stories about the “old times” when CS was moving from publishing
mainly in journals to publishing mainly in conferences. But
now people working on collaborative projects (like computational biology) work in fields
that primarily publish in journals, so CS as a field needs to be able to
properly evaluate their impact and productivity. There is no perfect
system.

For instance, review requests in fields where conferences are the main
publication venue come in waves (dictated by the conference schedule).
Reviewers have a lot of papers to go over in a relatively short time,
which makes their job of providing really helpful and fair reviews
harder. So, in that respect, the journal system can be better. The one thing that is universally true is that you don’t need long review times.

Previous Interviews: Daniela Witten, Chris Barr, Victoria Stodden

Google Scholar Pages

If you want to get to know more about what we’re working on, you can check out our Google Scholar pages:

I’ve only been using it for a day, but I’m pretty impressed by how much it picked up. My only problem so far is having to merge different versions of the same paper.

The History Of Nonlinear Principal Components

Video: http://www.youtube.com/watch?v=V-hFORcBj44

The History of Nonlinear Principal Components Analysis, a lecture given by Jan de Leeuw. For those who have about 45 minutes to spare, it’s a very nice talk given in Jan’s characteristic style.

Amazon EC2 is #42 on Top 500 supercomputer list

Preparing for tenure track job interviews

If you are on the job market, you will soon receive (or have already received) an invitation for an interview. So how should you prepare? You have two goals. The first is to make a good impression. Here are some tips:

1) During your talk, do NOT go over your allotted time. Practice your talk at least twice, both times in front of a live audience that asks questions.

2) Know your audience. If it’s a “math-y” department, give a more “math-y” talk. If it’s an applied department, give a more applied talk. But (sorry for the cliché) be yourself. Don’t pretend to be interested in something you are not. I remember one candidate who pretended to be interested in applications, and it backfired badly during the talk.

3) Learn about the faculty’s research interests. This will help during the one-on-one interviews.

4) Be ready to answer the questions “What do you want to teach?” and “Where do you see yourself in five years?”

5) I can’t think of any department where it is necessary to wear a suit (correct me if I’m wrong in the comments). In some places you might feel uncomfortable wearing a suit while those interviewing you are in shorts and T-shirts. But do dress up. Show them you care.

Second, and just as important, you want to figure out whether you like the department you are visiting. Do you want to spend the next 5, 10, 50 years there? Make sure to find out as much as you can to answer this question. Some questions are more appropriate for junior faculty; save the more sensitive ones for the chair. Here are some example questions I would ask:

1) What are the expectations for promotion? Would you promote someone publishing exclusively in Nature? Somebody publishing exclusively in Annals of Statistics? Is being a PI on an R01 a requirement for tenure? 

2) What are the expectations for teaching/service/collaboration? How are teaching and committee service assignments made?   

3) How did you connect with your collaborators? How are these connections made?

4) What percent of my salary am I expected to cover? Is it possible to do this by being a co-investigator?

5) Where do you live? How are the schools? How is the commute?  

6) How many graduate students does the department have? How are graduate students funded? If I want someone to work with me, do I have to cover their stipend/tuition?

Specific questions for the junior faculty:

Are the expectations for promotion made clear to you? Do you get feedback on your progress? Do the senior faculty mentor you? Do the senior faculty get along? What do you like most about the department? What can be improved? In the last 10 years, what percent of junior faculty got promoted?

Questions for the chair:

What percent of my salary am I expected to cover? How soon? Is there bridge funding? What is a standard startup package? Can you describe the promotion process in detail? What space is available for postdocs? (For a hard-money place:) I love teaching, but can I buy out teaching with grants?

I am sure I missed stuff, so please comment away….

Update: I can’t believe I forgot computing! Make sure to ask about computing support. This varies a lot from place to place. Some departments share amazing systems; in others you might have to buy your own hardware. Ask how costs are shared, how good the IT staff is, and whether R is supported. Get all the details.