26 Mar 2012
I just read this fascinating post on _why, apparently a bit of a cult hero among enthusiasts of the Ruby programming language. One of the most interesting bits was The Little Coder’s Predicament, which boiled down essentially says that computer programming languages have grown too complex - so children/newbies can’t get the instant gratification when they start programming. He suggested a simplified “gateway language” that would get kids fired up about programming, because with a simple line of code or two they could make the computer do things like play some music or make a video.
I feel like there is a similar ramp up with data scientists. To be able to do anything cool/inspiring with data you need to know (a) a little statistics, (b) a little bit about a programming language, and (c) quite a bit about syntax.
Wouldn’t it be cool if there was an R package that solved the little data scientist’s predicament? The package would have to have at least some of these properties:
- It would have to be easy to load data sets, one line of not complicated code. You could write an interface for RCurl/read.table/download.file for a defined set of APIs/data sets so the command would be something like: load(“education-data”) and it would load a bunch of data on education. It would handle all the messiness of scraping the web, formatting data, etc. in the background.
- It would have to have a lot of really easy visualization functions. Right now, if you want to make pretty plots with ggplot(), plot(), etc. in R, you need to know all the syntax for pch, cex, col, etc. The plotting function should handle all this behind the scenes and make super pretty pictures.
- It would be awesome if the functions would include some sort of dynamic graphics (with svgAnnotation or a wrapper for D3.js). Again, the syntax would have to be really accessible/not too much to learn.
That alone would be a huge start. In just 2 lines kids could load and visualize cool data in a pretty way they could show their parents/friends.
25 Mar 2012
Shortly after the Duke trial scandal broke, the Institute of Medicine convened a committee to write a report on translational omics. Several statisticians (including one of our interviewees) either served on the committee or provided key testimony. The report came out yesterday. Nature, Nature Medicine, and Science had posts about the release. Keith Baggerly sent an email with his thoughts and he gave me permission to post it here. He starts by pointing out that the Science piece has a key new observation:
The NCI’s Lisa McShane, who spent months herself trying to validate Duke results, says the IOM committee “did a really fine job” in laying out the issues. NCI now plans to require that its cooperative groups who want to use omics tests follow a checklist similar to that in the IOM report. NCI has not yet decided whether it should add new requirements for omics tests to its peer review process for investigator-initiated grants. But “our hope is that this report will heighten everyone’s awareness,” McShane says.
Some further thoughts from Keith:
First, the report helps clarify the regulatory landscape: if omics-based tests (which the FDA views as medical devices) will direct patient therapy, FDA approval in the form of an Investigational Device Exemption (IDE) is required. This is in keeping with increased guidance FDA has been providing over the past year and a half dealing with companion diagnostics. It seems likely that several of the problems identified with the Duke trials would have been caught by an FDA review, particularly if the agency already had cause for concern, such as a letter to the editor identifying analytical shortcomings.
Second, the report recommends the publication of the full data, code, and metadata used to construct the omics assays prior to their use to guide patient therapy. Had such data and code been available earlier, this would have greatly reduced the amount of effort required for others (including us) to check and potentially extend on the underlying results.
Third, the report emphasizes, repeatedly, that the test must be fully specified (“locked down”) before it is validated, let alone used to guide patient therapy. Quite a bit of effort is given to providing an explicit definition of locked down, in part (we suspect) because both Lisa McShane (NCI) and Robert Becker (FDA) provided testimony that incomplete specification was a problem their agencies encountered frequently. Such specification would have prevented problems such as that identified by the NCI for the Lung Metagene Score (LMS) in 2010, which led the NCI to remove the LMS evaluation as a goal of the Phase III cooperative group trial CALGB-30506.
Finally, the very existence of the report is recognition that reproducibility is an important problem for the omics-test community. This is a necessary step towards fixing the problem.
23 Mar 2012
The NIH provides financial support for a large percentage of biological and medical research in the United States. This funding supports a large number of US jobs, creates new knowledge, and improves healthcare for everyone. So I am signing this petition:
NIH funding is essential to our national research enterprise, to our local economies, to the retention and careers of talented and well-educated people, to the survival of our medical educational system, to our rapidly fading worldwide dominance in biomedical research, to job creation and preservation, to national economic viability, and to our national academic infrastructure.
The current administration is proposing a flat $30.7 billion FY 2013 NIH budget. The graph below (left) shows how small the NIH budget is in comparison to the Defense and Medicare budgets in absolute terms. The difference between the administration’s proposal and the petition’s proposal ($33 billion) are barely noticeable.
The graph on the right shows how in 2003 growth in the NIH budget fell dramatically while medicare and military spending kept growing. However, despite the decrease in rate, the NIH budget did continue to increase under Bush. If we follow Bush’s post 2003 rate (dashed line), the 2013 budget will be about what the petition asks for: $33 billion.

If you agree that the relatively modest increase in the NIH budget is worth the incredibly valuable biological, medical, and economic benefits this funding will provide, please consider signing the petition before April 15