09 Aug 2013
A silly, but actually very serious, error in the supplementary material of a recent paper in Organometallics is causing a stir on the internets (I saw it on Andrew G.’s blog). The error in question is a comment in the supplementary material of the paper:
Emma, please insert NMR data here! where are they? and for this compound, just make up an elemental analysis . . .
As has been pointed out on the chemistry blogs, this is actually potentially a pretty serious problem. Apparently, the type of analysis in question is relatively easy to make up or, at minimum, leaves a lot of researcher degrees of freedom.
This error reminds me of another slip-up, this one from a paper in BMC Bioinformatics. Here is the key bit, from the abstract:
In this study, we have used (insert statistical method here) to compile unique DNA methylation signatures from normal human heart, lung, and kidney using the
These slip-ups seem pretty embarrassing/funny at first pass. I will also admit that in some ways, I’m pretty sympathetic as a person who advises students and analysts. The comments on intermediate drafts of papers frequently say things like, “put this analysis here” or “fill in details here”. I think if one slipped through the cracks and ended up in the abstract or supplement of a paper I was publishing, I’d look pretty silly too.
But there are some more important issues here that relate to analysts/bioinformaticians/computing experts being directed by scientists. In some cases the scientists might not understand statistics, which has its own set of problems. But often the scientists know exactly what they are talking about; the analyst and their advisor/boss just need to communicate about what is acceptable and what isn’t acceptable in practice. This is beautifully covered in this post on advice for lonely bioinformaticians. I would extend that to all students/lonely analysts in any field. Finally, in the era of open science and collaboration, it is important to make sure that statements made in the margins of drafts can’t be misinterpreted and to check for typos in final submitted drafts of papers. Always double-check for typos.
08 Aug 2013
A couple of cool things happened at this year’s JSM.
- Twitter adoption went way up and it was much easier for people (like me) who weren’t there to keep track of all the action by monitoring the #JSM2013 hashtag.
- Nate Silver gave the keynote, and a crowd showed up ([photo](https://twitter.com/rafalab/status/364480835577073664/photo/1)).
Nate Silver is hands down the rockstar of our field. I mean, no other statistician changing jobs would make the news at the Times, at ESPN, and on pretty much every other major news source.
Silver’s talk at JSM focused on 11 principles of statistical journalism, which are covered really nicely here by Joseph Rickert from Revolution. After his talk, he answered questions tweeted from the audience. He brought the house down (I’m sure in person, but definitely on Twitter) with an answer, perfectly weighted for the audience, to a question about data scientists versus statisticians:
Data scientist is just a sexed up word for statistician
Of course statisticians loved to hear this, but data scientists didn’t necessarily agree.
I’ve talked about the statistician/data scientist divide before and how I think that we need better marketing as statisticians. I think it is telling that some of the very accomplished, very successful people tweeting about Nate’s quote are uncomfortable being labeled statistician. The reason, I think, is that statisticians have a reputation for focusing primarily on theory and not being willing to do the schlep.
I do think there is some cachet to having the “hot job title” but eventually solving real problems matters more. Which leads me to my favorite part of Nate’s quote, the part that isn’t getting nearly as much play as it should:
Just do good work and call yourself whatever you want.
I think that as statisticians we should embrace a “big tent” approach to labeling. But rather than making it competitive by saying data scientists aren’t that great and are just “sexed up” statisticians, we should make it inclusive: “data scientists are statisticians because being a statistician is awesome and anyone who does cool things with data is a statistician”. People who build websites, or design graphics, or make reproducible documents, or build pipelines, or hack low-level data are all statisticians and we should respect them all for their unique skills.
07 Aug 2013
Sorry for the delay with my session picks for Wednesday. Here’s what I’m thinking of:
- 8:30-10:20am: Bayesian Methods for Causal Inference in Complex Settings (CC-520a) or Developments in Statistical Methods for Functional and Imaging Data (CC-522bc)
- 10:30am-12:20pm: Spatial Statistics for Environmental Health Studies (CC-510c) or Big Data Exploration with Amazon (CC-516c)
- 2-3:50pm: There are some future stars in the session Environmental Impacts on Public and Ecological Health (CC-512h) and Statistical Challenges in Cancer Genomics with Next-Generation Sequencing and Microarrays (CC-514a)
- 4-5:50pm: Find out who won the COPSS award! (CC-517ab)
06 Aug 2013
It seems like Monday was a big hit at JSM with Nate Silver’s talk and all. Rafa estimates that there were about 1 million people there (+/- 1 million). Ramnath Vaidyanathan has a nice summary of the talk and the Q&A afterwards. Among other things, Silver encouraged people to start a blog and communicate directly with the public. Couldn’t agree more! Thanks to all who live-tweeted at #JSM2013. I felt like I was there.
On to Tuesday! Here’s where I’d like to go:
- 8:30-10:20am: Spatial Uncertainty in Public Health Problems (CC-513b); and since Nate says education is the next important area, Statistical Knowledge for Teaching: Research Results and Implications for Professional Development (CC-520d)
- 10:30am-12:20pm: Check out the latest in causal inference at Fresh Perspectives on Causal Inference (CC-512f) and come see the future of statistics at the **SBSS Student Paper Travel Award Winners II** (CC-520d)
- 2-3:50pm: There’s a cast of all-stars over in the Biased Epidemiological Study Designs: Opportunities and Challenges (CC-511c) session and a visualization session with an interesting premise Painting a Picture of Life in the United States (CC-510a)
- 4-5:50pm: Only two choices here, so take your pick (or flip a coin).
05 Aug 2013
I’m sadly not able to attend the Joint Statistical Meetings this year (where Nate Silver is the keynote speaker!) in the great city of Montreal. I’m looking forward to checking out the chatter on #JSM2013 but in the meantime, here are the sessions I would have attended if I’d been there. If I pick more than one session for a given time slot, I assume you can run back and forth between the two.
- 8:30-10:20am: Kasper Hansen is presenting in Statistical Methods for High-Dimensional Data: Presentations by Junior Researchers (CC-515c) and there are some great people in The Profession of Statistics and Its Impact on the Media (CC-516d)
- 10:30am-12:20pm: There are some heavy hitters in the Showcase of Analysis of Correlated Measurements (CC-511d); this session has a great title Herd Immunity: Teaching Techniques for the Health Sciences (CC-515b)
- 2-3:50pm: I have a soft spot in my heart for a good MCMC session like Challenges in Using Markov Chain Monte Carlo in Modern Applications (CC-510d); I also have a soft spot for visualization and Simon Urbanek - Visualizing Big Data Interactively (CC-510b)
- 4-5:50pm: I would check out Nate Silver’s talk (CC-517ab)
Have fun!