Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday Data/Statistics Link Roundup (7/15/12)

  1. A really nice list of journals’ software/data release policies from Titus’ blog. Interesting that he couldn’t find a data release policy for the New England Journal of Medicine. I wonder if that is because it publishes mostly clinical studies, where the data are often protected for privacy reasons? It seems like there is eventually going to be a big discussion of the relative importance of privacy and open data in the clinical world.
  2. Some interesting software that can be used to build virtual workflows for computational science. It seems like a lot of data analysis is still done via “drag and drop” programs. I can’t help but wonder if our effort should be focused on developing drag-and-drop tools or on educating the next generation of scientists to have at least minimal scripting skills.
  3. We added StatsChat by Thomas L. and company to our blogroll. Lots of good stuff there, for example, this recent post on when randomized trials don’t help. You can also follow them on Twitter.
  4. A really nice post on processing public data with R. As more and more public data become available from governments, companies, APIs, etc., the ability to quickly obtain, process, and visualize those data is going to be hugely valuable.
  5. Speaking of public data, you could get it from APIs or from government websites. But beware those category 2 problems.