Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday data/statistics link roundup (12/30/12)

  1. An interesting new app called 100plus, which looks like it uses public data to help determine how little decisions (walking more, one more glass of wine, etc.) lead to better or worse health. Here’s a post describing it on the healthdata.gov blog. As far as I can tell, the app is still in beta, so only the folks who have a code can download it.
  2. Data on mass shootings from the Mother Jones investigation.
  3. A post by Hilary M. on “Getting Started with Data Science”. I really like the suggestion of just picking a project and doing something, getting it out there. One thing I’d add to the list is that I would spend a little time learning about an area you are interested in. With all the free data out there, it is easy to just “do something”, without putting in the requisite work to know why what you are doing is good/bad. So when you are doing something, make sure you take the time to “know something”.
  4. An analysis of various measures of citation impact (also via Hilary M.). I’m not sure I follow the reasoning behind all of the analyses performed (seems a little like throwing everything at the problem and hoping something sticks) but one interesting point is how citation/usage are far apart from each other on the PCA plot. This is likely just because the measures cluster into two big categories, but it makes me wonder. Is it better to have a lot of people read your paper (broad impact?) or cite your paper (deep impact?).
  5. A [post](https://twitter.com/hmason/status/285163907360899072) on Twitter about how big data does not mean you can ignore the scientific method. We have talked a little bit about this before, in terms of how one should motivate statistical projects.