Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday data/statistics link roundup (4/22)

  1. Now we know who is to blame for the pie chart. I had no idea it had been around, straining our ability to compare relative areas, since 1801. However, the same guy (William Playfair) apparently also invented the bar chart. So he wouldn’t be totally shunned by statisticians. (via Leonid K.)
  2. A nice article in the Guardian about the current group of scientists that are boycotting Elsevier. I have to agree with the quote that leads the article, “All professions are conspiracies against the laity.” On the other hand, I agree with Rafa that academics are partially to blame for buying into the closed access hegemony. I think more than a boycott of a single publisher is needed; we need a change in culture. (first link also via Leonid K)
  3. A blog post on how to add a transparent image layer to a plot. For some reason, I have wanted to do this several times over the last couple of weeks, so the serendipity of seeing it on R Bloggers merited a mention. 
  4. I agree the Earth Institute needs a better graphics advisor. (via Andrew G.)
  5. A great article on why multiple choice tests are used - they are an easy way to collect data on education. But that doesn’t mean they are the right data. This reminds me of the Tukey quote: “The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data”. It seems to me if you wanted to have a major positive impact on education right now, the best way would be to develop a new experimental design that collects the kind of data that really demonstrates mastery of reading/math/critical thinking. 
  6. Finally, a bit of a bleg…what is the best way to do the SVD of a huge (think 1e6 x 1e6), sparse matrix in R? Preferably without loading the whole thing into memory…