Oh no, the Leekasso....

12 Mar 2014

An astute reader (Niels Hansen, who is visiting our department today) caught a bug in my Leekasso code on GitHub. I had:

lm1 = lm(y ~ leekX)

predict.lm(lm1,as.data.frame(leekX2))

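Unfortunately, because the formula referred to the matrix leekX directly, predict.lm could not find a variable with that name in the new data frame and fell back on the training matrix. This meant that I was getting predictions for the training set when I thought I was getting predictions for the test set. Since I set up the test and training sets the same way, I was actually reporting training set error rates for the Leekasso. Niels Hansen noticed the bug and reran the analysis with this fixed code instead: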

lm1 = lm(y ~ ., data = as.data.frame(leekX))

predict.lm(lm1,as.data.frame(leekX2))

He created a heatmap of the difference in average accuracy between the Leekasso and the Lasso, and it showed they are essentially equivalent.
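For anyone who wants to see the bug in isolation, here is a minimal sketch on simulated data. The names leekX, leekX2, and y mirror the snippets above, but the data, the sizes n and p, and the object names lm_bug, lm_fix, pred_bug, and pred_fix are made up for illustration and are not the original Leekasso simulation.

```r
# Simulated stand-ins for the training and test sets (illustrative only).
set.seed(1)
n <- 50   # assumed: same number of rows in training and test sets
p <- 10   # assumed: ten predictors, as in a "top 10" model
leekX  <- matrix(rnorm(n * p), n, p)   # training predictors
leekX2 <- matrix(rnorm(n * p), n, p)   # test predictors
y <- rnorm(n)                          # training outcome

# Buggy version: the formula names the matrix leekX itself, so predict()
# looks for a variable called "leekX" in newdata, does not find it, and
# picks up the training matrix from the enclosing environment instead.
lm_bug <- lm(y ~ leekX)
pred_bug <- predict(lm_bug, as.data.frame(leekX2))

# Fixed version: fit on a data frame so the model terms are the columns
# V1, ..., V10, which also exist in as.data.frame(leekX2).
lm_fix <- lm(y ~ ., data = as.data.frame(leekX))
pred_fix <- predict(lm_fix, as.data.frame(leekX2))

# The buggy "test" predictions are just the training fitted values.
bug_matches_training <- isTRUE(all.equal(unname(pred_bug), unname(fitted(lm_bug))))
# The fixed predictions are genuinely computed from the test predictors.
fix_matches_training <- isTRUE(all.equal(unname(pred_fix), unname(fitted(lm_fix))))

print(bug_matches_training)  # TRUE
print(fix_matches_training)  # FALSE
```

In this sketch the two sets have the same number of rows, so R raises no warning about a mismatch between newdata and the variables it found, which is part of what makes this kind of mistake easy to miss.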

This is a bummer: the Leekasso isn’t a world-crushing algorithm. On the other hand, I’m happy that just choosing the top 10 is still competitive with the optimized lasso on average. More importantly, although I hate being wrong, I appreciate people taking the time to look through my code.

Just out of curiosity I’m taking a survey. Do you think I should publish this top10 predictor thing as a paper? Or do you think it is too trivial?

Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek
