02 Feb 2012
This plan has been making the rounds on Twitter and is being attributed to William Cleveland in 2001 (thanks to Kasper for the link). I’m not sure of the provenance of the document but it has some really interesting ideas and is worth reading in its entirety. I actually think that many Biostatistics departments follow the proposed distribution of effort pretty closely.
One of the most interesting sections is the discussion of computing (emphasis mine):
Data analysis projects today rely on databases, computer and network hardware, and computer and network software. A collection of models and methods for data analysis will be used only if the collection is implemented in a computing environment that makes the models and methods sufficiently efficient to use. In choosing competing models and methods, analysts will trade effectiveness for efficiency of use.
…..
This suggests that statisticians should look to computing for knowledge today, just as data science looked to mathematics in the past.
I also found the theory section worth a read and figure it will definitely lead to some discussion:
Mathematics is an important knowledge base for theory. It is far too important to take for granted by requiring the same body of mathematics for all. Students should study mathematics on an as-needed basis.
….
Not all theory is mathematical. In fact, the most fundamental theories of data science are distinctly nonmathematical. For example, the fundamentals of the Bayesian theory of inductive inference involve nonmathematical ideas about combining information from the data and information external to the data. Basic ideas are conveniently expressed by simple mathematical expressions, but mathematics is surely not at issue.
01 Feb 2012
There was recently a fascinating article published in PNAS that compared the sound quality of different types of violins. In this study, researchers assembled a collection of six violins, three of which were made by Stradivari and Guarneri del Gesu and three made by modern luthiers (i.e. 20th century). The combined value of the “old” violins was $10 million, about 100 times greater than the combined value of the “new” violins. Also, they note:
Numbers of subjects and instruments were small because it is difficult to persuade the owners of fragile, enormously valuable old violins to release them for extended periods into the hands of blindfolded strangers.
Yeah, I’d say so.
They then got 21 professional violinists to try them all out while wearing glasses that obscured their vision, so they couldn’t see which violin they were playing. The researchers were also blinded to the type of violin as the study was being conducted.
The conclusions were striking:
We found that (i) the most-preferred violin was new; (ii) the least-preferred was by Stradivari; (iii) there was scant correlation between an instrument’s age and monetary value and its perceived quality; and (iv) most players seemed unable to tell whether their most-preferred instrument was new or old.
First, I’m glad the researchers got people to actually play the instruments. I don’t think it’s sufficient to just listen to some recordings because usually the recordings are by different performers and the quality of the recording itself may vary quite a bit. Second, the study was conducted in a hotel room for its “dry acoustics”, but I think changing the venue might have changed the results. Third, even though the authors don’t declare any specific financial conflict of interest, it’s worth noting that the second author is a violinmaker who could theoretically benefit if people decide they no longer need to focus on old Italian violins.
I was surprised, but not that surprised, at the results. As a lifelong violinist, I had always wondered whether the Strads and the Guarneris were that much better. I once played on a Guarneri (for about 30 seconds) and I think it’s fair to say that it was incredible. But I’ve also seen some amazing violins made by guys in Brooklyn and New Jersey. I’d always heard that Strads have a darker, more mellow sound, which I suppose is nice, but I think these days people may prefer a brighter and bigger sound, especially for those larger modern-day concert halls.
I hope that this study and others like it will get people to focus on which violins sound good rather than where they came from. I’m glad to see the use of data pose a challenge to another long-standing convention.
31 Jan 2012
I find it surprising that NBA commentators rarely talk about field goal percentage. Everybody knows that the more you shoot the more you score. But players who score a lot are admired without consideration of their FG%. Of course, having a high FG% is not necessarily admirable, as many players only take easy shots while top scorers need to take difficult ones. Regardless, missing is undesirable, and players who miss more than usual are not criticized enough. Iverson, for example, had a lowly career FG% of 43 yet he regularly made the All-Star team. But I am not surprised he never won an NBA championship: it’s hard to win when your top scorer misses so often.

Experts consider Kobe to be one of the all-time greats and compare him to Jordan. They never mention that he is consistently among league leaders in missed shots. So far this year, Kobe has missed a whopping 279 times for a league-leading 13.3 misses per game. In contrast, LeBron has missed 8.8 per game and has scored about the same per game. The plot above (made with this R script) shows career FG% for players who are considered superstars, are top scorers, and have won multiple championships (red lines are 1st and 3rd quartiles). I also include Gasol, LeBron, Wade, and Dominique. Note that Kobe has the worst FG% in this group. So how does he win 5 championships? Well, perhaps Shaq and later Gasol made up for his misses. Note that the first year Kobe played without Shaq, the Lakers did not make the playoffs. Also, during Kobe’s career the Lakers’ record has been similar with and without him. Experts may compare Kobe to Jordan, but perhaps we should be comparing him to Dominique.
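For anyone who wants to check the arithmetic behind "misses per game", here is a minimal Python sketch. The function names are made up for illustration, and the game count of 21 is only inferred from the two numbers quoted above (279 misses at 13.3 per game), not taken from a box score.

```python
def field_goal_pct(made, attempted):
    """Fraction of field goal attempts that went in."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return made / attempted


def misses_per_game(total_misses, games):
    """Average number of missed field goals per game."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total_misses / games


# Numbers quoted in the post: 279 misses at 13.3 misses per game.
# The implied number of games is an inference, not a sourced figure.
implied_games = round(279 / 13.3)
print(implied_games)
print(round(misses_per_game(279, implied_games), 1))
```

Misses per game depends on how many shots a player takes, which is part of why a rate like FG% is the more natural thing to compare across players.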
Update: Please see Brunsloe87’s comment for a much better analysis than mine. He/she points out that it’s too simplistic to look at FG%. Instead we should look at something closer to points scored per shot taken. This rewards players, like Kobe, who draw many fouls and have a high FT%. There is a weighted statistic called true shooting % (TS%) that tries to summarize this, and below I include a plot of TS% for the same players. Kobe is no Jordan but he is not as bad as Dominique either. He is somewhere in the middle.

The comment also points out that Magic didn’t shoot as much as other superstars so it’s unfair to include him. A better plot would plot TS% versus shots taken (e.g. FGA+FTA/2) but I’ll let someone with more time make that one. Anyways, this plot explains why the early 80s Lakers (Magic+Kareem) were so good.
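To make the difference between FG% and TS% concrete, here is a small Python sketch. It assumes the commonly used definition of TS% (usually called true shooting percentage), TS% = PTS / (2 × (FGA + 0.44 × FTA)); the 0.44 weight on free throw attempts is the conventional choice, whereas the rough "shots taken" measure mentioned above uses FTA/2. The two stat lines are invented purely for illustration and do not belong to any real player.

```python
def field_goal_pct(fgm, fga):
    """Field goals made divided by field goals attempted."""
    return fgm / fga


def true_shooting_pct(points, fga, fta, ft_weight=0.44):
    """Points per scoring attempt, scaled so that making half of
    one's two-point shots (with no free throws) gives 0.5.

    ft_weight is the share of a 'shot' that one free throw attempt
    counts for; 0.44 is the conventional value, 0.5 matches FGA + FTA/2.
    """
    attempts = fga + ft_weight * fta
    if attempts <= 0:
        raise ValueError("player has no scoring attempts")
    return points / (2.0 * attempts)


# Hypothetical player A: 10 of 20 on two-pointers, never gets to the line.
a_fg = field_goal_pct(10, 20)
a_ts = true_shooting_pct(points=20, fga=20, fta=0)

# Hypothetical player B: 8 of 20 on two-pointers, plus 9 of 10 free throws.
b_fg = field_goal_pct(8, 20)
b_ts = true_shooting_pct(points=16 + 9, fga=20, fta=10)

print(f"A: FG% = {a_fg:.3f}, TS% = {a_ts:.3f}")
print(f"B: FG% = {b_fg:.3f}, TS% = {b_ts:.3f}")
```

In this made-up example player B has the lower FG% but the higher TS%, which is the kind of reversal the comment is pointing at: a player who draws fouls and makes free throws looks worse than he is if you only count missed field goals.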
30 Jan 2012
A growing trend in education is to put lectures online, for free. The Khan Academy, Stanford’s recent AI course, and Gary King’s new quantitative government course at Harvard are three of the more prominent examples. This new pedagogical format is more democratic, free, and helps people learn at their own pace. It has led some, including us here at Simply Statistics, to suggest that the future of graduate education lies in online courses, or to forecast the end of in-class lectures.
All this excitement led John Cook to ask, “What do colleges sell?”. The answers he suggested were: (1) real credentials, like a degree, (2) motivation to ensure you did the work, and (3) feedback to tell you how you are doing. As John suggests, online lectures really only target motivated and self-starting learners. For graduate students, this may work (maybe), but for the vast majority of undergrads or high-school students, self-guided learning won’t work due to a lack of motivation.
I would suggest that until the feedback, assessment, and credentialing problems have been solved, online lectures are still more edu-tainment than education.
Of these problems, I think we are closest to solving the feedback problem with online quizzes and tests to go with online lectures. What we haven’t solved are assessment and credentialing. The reason is there is no good system for verifying that a person taking a quiz/test online is who they say they are. This issue has two consequences: (1) it is difficult to require that a person do online quizzes/tests like we do with in-class quizzes/tests and (2) it is difficult to believe credentials given to people who take courses online.
What does this have to do with statistics? Well, what we need is a Completely Automated Online Test for Student Identity (COATSI). People will notice a similarity between my acronym and the acronym for CAPTCHAs, the simple online Turing tests used to prove that you are a human and not a computer.
A COATSI needs to:
- Be completely automated
- Provide tests that verify the identity of the student being assessed
- Be usable throughout an online quiz/test/assessment
- Be simple and easy to solve
I can’t think of a deterministic system that can be used for this purpose. My suspicion is that a COATSI will need to be statistical. For example, one idea is to have people sign in with Facebook, then at random intervals while they are solving problems, they have to identify their friends by name. If they do this quickly/consistently enough, they are verified as the person taking the test.
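To show what "statistical" could mean here, below is a rough Python sketch of one possible decision rule, not a worked-out proposal. It assumes each friend-identification challenge is scored right or wrong, that an impostor would get any single challenge right with some probability `impostor_rate`, and that challenges are independent; the function names, the 0.25 rate, and the 0.01 threshold are all made-up placeholders. It also ignores response time, which the idea above would want to use as well.

```python
from math import comb


def impostor_tail_probability(n_challenges, n_correct, impostor_rate):
    """P(X >= n_correct) when X ~ Binomial(n_challenges, impostor_rate).

    This is the chance that someone guessing at rate impostor_rate
    would do at least as well as the observed test taker.
    """
    if not 0 <= n_correct <= n_challenges:
        raise ValueError("n_correct must be between 0 and n_challenges")
    if not 0.0 <= impostor_rate <= 1.0:
        raise ValueError("impostor_rate must be a probability")
    p = impostor_rate
    return sum(
        comb(n_challenges, k) * p**k * (1 - p) ** (n_challenges - k)
        for k in range(n_correct, n_challenges + 1)
    )


def verify_identity(n_challenges, n_correct, impostor_rate=0.25, alpha=0.01):
    """Accept the test taker if an impostor would rarely score this well.

    impostor_rate and alpha are hypothetical tuning values that would
    have to be estimated from real data.
    """
    tail = impostor_tail_probability(n_challenges, n_correct, impostor_rate)
    return tail < alpha


# 9 of 10 friends named correctly: very unlikely for someone guessing.
print(verify_identity(n_challenges=10, n_correct=9))
# 4 of 10: entirely plausible for an impostor, so not verified.
print(verify_identity(n_challenges=10, n_correct=4))
```

The hard statistical work would be in the parts this sketch assumes away: estimating how well an impostor (say, a friend of the student) can actually do, and deciding how many challenges you can ask before the test stops being "simple and easy to solve".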
I don’t have a good solution to this problem yet; I’d love to hear more suggestions. I also think this seems like a potentially hugely important and very challenging problem for a motivated grad student or postdoc….
29 Jan 2012
- A really nice D3 tutorial. I’m 100% on board with D3; if they could figure out a way to export the graphics as PDFs, I think this would be the best visualization tool out there.
- A personalized calculator that tells you what number person you are (of the 7 billion or so) based on your birthday. I’m person 4,590,743,884. Makes me feel so special….
- An old post of ours, on dongle communism. One of my favorite posts, it came out before we had much traffic but deserves more attention.
- This isn’t statistics/data related but too good to pass up. From the Bones television show, malware fractals shaved into a bone. I love TV science. Thanks to Dr. J for the link.
- Stats are popular…