Simply Statistics: A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Online education: many academics are missing the point

Many academics are complaining about online education and warning us that it can lead to a lower-quality product. For example, the New York Times recently published this op-ed piece asking whether “online education [will] ever be education of the very best sort?”. Although pretty much every controlled experiment comparing online and in-class education finds that students learn just about the same under both approaches, I do agree that in-person lectures are more enjoyable for both faculty and students. But who cares? My enjoyment and the enjoyment of the 30 privileged students who physically sit in my classes seem negligible compared to the potential of reaching and educating thousands of students all over the world. Also, using recorded lectures will free up time that I can spend on one-on-one interactions with tuition-paying students. But what most excites me about online education is the possibility of being part of the movement that redefines existing disciplines as the number of people learning grows by orders of magnitude. How many Ramanujans are out there eager to learn Statistics? I would love it if they learned it from me.

Voters Say They Are Wary of Ads Made Just for Them

Buy your own analytics startup for $15,000 (at least as of now)

Really Big Objects Coming to R

I noticed in the development version of R the following note in the NEWS file:

There is a subtle change in behaviour for numeric index values 2^31 and larger.  These used never to be legitimate and so were treated as NA, sometimes with a warning.  They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.

This is significant news indeed!

Some background: In the old days, when most of us worked on 32-bit machines, objects in R were limited to about 4GB in size (and practically a lot less) because memory addresses were indexed using 32-bit numbers. When 64-bit machines became more common in the early 2000s, that limit was removed. Objects could theoretically take up more memory because of the dramatically larger address space. For the most part, this turned out to be true, although there were some growing pains as R was transitioned to run on 64-bit systems (I remember many of those pains).

However, even with the 64-bit systems, there was a key limitation, which is that vectors, one of the fundamental objects in R, could only have a maximum of 2^31-1 elements, or roughly 2.1 billion elements. This was because array indices in R were stored internally as signed integers (specifically as ‘R_len_t’), which are 32 bits on most modern systems (take a look at .Machine$integer.max in R).
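
To see that old ceiling concretely, here is a minimal R sketch (it just checks the integer limit; the commented-out allocation would fail on an R build without long-vector support, with an error message that varies by version and platform):

    ## Array indices were stored as 32-bit signed integers, so the largest
    ## representable index was 2^31 - 1:
    .Machine$integer.max               # 2147483647
    .Machine$integer.max == 2^31 - 1   # TRUE

    ## On an R build without long-vector support, a request this big fails:
    ## x <- numeric(2^31)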

You might think that 2.1 billion elements is a lot, and for a single vector it still is. But you have to consider the fact that internally R stores all arrays, no matter how many dimensions there are, as just long vectors. So that would limit you, for example, to a square matrix that was no bigger than roughly 46,000 by 46,000. That might have seemed like a large matrix back in 2000, but it seems downright quaint now. And if you had a 3-way array, the per-dimension limit gets even smaller.
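
The arithmetic behind those limits is just taking roots of 2^31 - 1; here is the back-of-the-envelope version:

    ## Largest square matrix that fits under the 2^31 - 1 element cap
    floor(sqrt(2^31 - 1))          # 46340, i.e. roughly 46,000 by 46,000

    ## For a cube-shaped 3-way array, the per-dimension limit shrinks a lot
    floor((2^31 - 1)^(1/3))        # 1290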

Now it appears that change is a comin’. The details can be found in the R source starting at revision 59005 if you follow along on Subversion.

A new type called ‘R_xlen_t’ has been introduced with a maximum value of 4,503,599,627,370,496, which is 2^52. As they say where I grew up, that’s a lot of McNuggets. So if your computer has enough physical memory, you will soon be able to index vectors (and matrices) that are significantly longer than before.
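
For the curious, here is a rough sketch of what working past the old limit will look like, assuming a 64-bit build with long-vector support and plenty of physical memory (about 24GB for this particular example):

    ## A vector longer than 2^31 - 1 elements: 3 billion doubles, ~24GB of RAM
    x <- numeric(3e9)
    length(x)       # 3e+09, returned as a double since it no longer fits in an integer
    x[2^31] <- 1    # indexing past the old limit, as described in the NEWS item above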

A Contest for Sequencing Genomes Has Its First Entry in Ion Torrent
