Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Oh no, the Leekasso....

An astute reader (Niels Hansen, who is visiting our department today) caught a bug in my code on Github for the Leekasso. I had:

lm1 = lm(y ~ leekX)

predict.lm(lm1,as.data.frame(leekX2))

Unfortunately, this meant that I was getting predictions for the training set on the test set. Since I set up the test/training sets the same, this meant that I was actually getting training set error rates for the Leekasso. Neils Hansen noticed the bug and reran the fixed code with this term instead:

lm1 = lm(y ~ ., data = as.data.frame(leekX))

predict.lm(lm1,as.data.frame(leekX2))

He created a heatmap subtracting the average accuracy of the Leekasso/Lasso and showed they are essentially equivalent.

LeekassoLasso

This is a bummer, the Leekasso isn’t a world crushing algorithm. On the other hand, I’m happy that just choosing the top 10 is still competitive with the optimized lasso on average. More importantly, although I hate being wrong, I appreciate people taking the time to look through my code.

Just out of curiosity I’m taking a survey. Do you think I should publish this top10 predictor thing as a paper? Or do you think it is too trivial?