Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

R and the little data scientist's predicament

I just read this fascinating post on _why, apparently a bit of a cult hero among enthusiasts of the Ruby programming language. One of the most interesting bits was The Little Coder’s Predicament, which boiled down essentially says that computer programming languages have grown too complex - so children/newbies can’t get the instant gratification when they start programming. He suggested a simplified “gateway language” that would get kids fired up about programming, because with a simple line of code or two they could make the computer do things like play some music or make a video. 

I feel like there is a similar ramp up with data scientists. To be able to do anything cool/inspiring with data you need to know (a) a little statistics, (b) a little bit about a programming language, and (c) quite a bit about syntax. 

Wouldn’t it be cool if there was an R package that solved the little data scientist’s predicament? The package would have to have at least some of these properties:

  1. It would have to be easy to load data sets, one line of not complicated code. You could write an interface for RCurl/read.table/download.file for a defined set of APIs/data sets so the command would be something like: load(“education-data”) and it would load a bunch of data on education. It would handle all the messiness of scraping the web, formatting data, etc. in the background. 
  2. It would have to have a lot of really easy visualization functions. Right now, if you want to make pretty plots with ggplot(), plot(), etc. in R, you need to know all the syntax for pch, cex, col, etc. The plotting function should handle all this behind the scenes and make super pretty pictures. 
  3. It would be awesome if the functions would include some sort of dynamic graphics (with svgAnnotation or a wrapper for D3.js). Again, the syntax would have to be really accessible/not too much to learn. 

That alone would be a huge start. In just 2 lines kids could load and visualize cool data in a pretty way they could show their parents/friends.