Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Innovation and overconfidence

I posted a while ago on how overconfidence may be a good thing. I just read this fascinating article by Neal Stephenson (via aldaily.com) about innovation starvation. The article focuses a lot on how science fiction inspires people to work on big/hard/impossible problems in science. It’s a great read for the nerds in the audience. But one quote stuck out for me:

Most people who work in corporations or academia have witnessed something like the following: A number of engineers are sitting together in a room, bouncing ideas off each other. Out of the discussion emerges a new concept that seems promising. Then some laptop-wielding person in the corner, having performed a quick Google search, announces that this “new” idea is, in fact, an old one—or at least vaguely similar—and has already been tried. Either it failed, or it succeeded. If it failed, then no manager who wants to keep his or her job will approve spending money trying to revive it. If it succeeded, then it’s patented and entry to the market is presumed to be unattainable, since the first people who thought of it will have “first-mover advantage” and will have created “barriers to entry.” The number of seemingly promising ideas that have been crushed in this way must number in the millions.

This has to be the single biggest killer of ideas for me. I come up with an idea, google it, find something that is close, and think, “Well, it has already been done, so I will skip it.” I wonder how many of those ideas would have actually turned into something interesting if I had just had a little more overconfidence and skipped the googling?

OracleWorld Claims and Sensations

Larry Ellison, the CEO of Oracle, like most technology CEOs, has a tendency toward the over-the-top sales pitch. But it’s fun to keep track of what these companies are up to just to see what they think the trends are. It seems clear that companies like IBM, Oracle, and HP, which focus substantially on the enterprise (or try to), think the future is data data data. One piece of evidence is the list of companies that they’ve acquired recently.

Ellison claims that they’ve developed a new computer that integrates hardware with software to produce an overall faster machine. Why do we need this kind of integration? Well, for data analysis, of course!

I was intrigued by this line from the article:

On Sunday Mr. Ellison mentioned a machine that he claimed would do data analysis 18 to 23 times faster than could be done on existing machines using Oracle databases. The machine would be able to compute both standard Oracle structured data as well as unstructured data like e-mails, he said.

It’s always a bit hard in these types of articles to figure out what they mean by “data analysis”, but even so, there’s an important idea here.

Alex Szalay talks about the need to “bring the computation to the data”. This comes from his experience working with ridiculous amounts of data from the Sloan Digital Sky Survey. There, the traditional model of pulling the data onto your computer, running some analyses, and then producing results just does not work. But the opposite is often reasonable. If the data are sitting in an Oracle/Microsoft/etc. database, you bring the analysis to the database and operate on the data there. This presumes the analysis program is smaller than the dataset; otherwise the approach doesn’t quite work.
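To make the contrast concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for a large database. The table name, column names, and data are all made up for illustration, and this is not how the Sloan Digital Sky Survey or Oracle actually do it. It just compares pulling every row to the client with asking the database for the answer.

```python
import sqlite3

# Hypothetical in-memory table standing in for a large remote database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (object_id INTEGER, brightness REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(i, float(i % 10)) for i in range(100_000)],
)


def mean_by_pulling(conn):
    """Traditional model: move every row to the client, then analyze."""
    rows = conn.execute("SELECT brightness FROM measurements").fetchall()
    return sum(r[0] for r in rows) / len(rows), len(rows)


def mean_in_database(conn):
    """Bring the computation to the data: only the answer leaves the database."""
    rows = conn.execute("SELECT AVG(brightness) FROM measurements").fetchall()
    return rows[0][0], len(rows)


pulled_mean, rows_moved_pull = mean_by_pulling(conn)
db_mean, rows_moved_db = mean_in_database(conn)

print("pulled:", pulled_mean, "rows moved:", rows_moved_pull)
print("in-db: ", db_mean, "rows moved:", rows_moved_db)
```

Both versions give the same mean, but the first moves every row across the connection and the second moves one. With a toy table that difference is invisible; with survey-scale data it is the whole point.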

So if Oracle’s magic computer is real, it and others like it could be important as we start bringing more computations to the data.

Karl's take on meetings

Department of Analytics, anyone?

This article following up on the Moneyball PR demonstrates one of the reasons why statistics might be doomed:

Julia Rozovsky is a Yale M.B.A. student who studied economics and math as an undergraduate, a background that prepared her for a traditional — and lucrative — consulting career. Instead, partly as a result of reading “Moneyball” and finding like-minded people, she pointed herself toward work in analytics.

Why can’t they call it statistics?? The message, of course, is that statistics is boring and analytics is awesome. We probably need to start changing the names of our departments.

Bits: Big Data: Sorting Reality From the Hype
