16 Jan 2012
Video: http://www.youtube.com/watch?v=oPzERmPlmw8
A tribute to one of the most popular methods in statistics.
15 Jan 2012
- Statistics help for journalists (don’t forget to keep rating stories!) This is the kind of thing that could grow into a statisteracy page. The author also has a really nice plug for public schools.
- An interactive graphic to determine if you are in the 1% from the New York Times (I’m not…).
- Mike Bostock’s d3.js presentation - this is some really impressive visualization software. You have to change the slide numbers manually, but it is totally worth it. Check out slide 10 and slide 14. This is the future of data visualization. Here is a beginner’s tutorial to d3.js by Mike Dewar.
- An online diagnosis prediction start-up (Symcat) based on data analysis from two Hopkins Med students.
Finally, a bit of a bleg. I’m going to try to make this link roundup a regular post. If you have ideas for links I should include, tweet us @simplystats or send them to Jeff’s email.
13 Jan 2012
The Twitter universe is abuzz about this article in the New York Times. Arthur Brisbane, who responds to readers’ comments, asks
I’m looking for reader input on whether and when New York Times news reporters should challenge “facts” that are asserted by newsmakers they write about.
He goes on to give a couple of examples of qualitative facts that reporters have used in stories without questioning the veracity of the claims. As many people pointed out in the comments, this is completely absurd. Of course reporters should check facts and report when the facts in their stories, or those stated by candidates, are not correct. That is the purpose of news reporting.
But I think the question is a little more subtle when it comes to quantitative facts and statistics. Depending on which subsets of the data you look at, which summary statistics you pick, and how you present the information, you can say a lot of different things with the same data. As long as you report what you calculated, you are technically reporting a fact - but it may be deceptive. The classic example is median vs. mean home prices: if Bill Gates is in your neighborhood, no matter what the other houses cost, the mean price is going to be pretty high!
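To make this concrete, here is a minimal sketch in Python, using made-up prices, of how a single extreme value drags the mean up while leaving the median untouched:

```python
import statistics

# Nine modest homes plus one extreme outlier (all prices hypothetical)
prices = [180_000] * 9 + [50_000_000]

print(statistics.mean(prices))    # 5162000.0 - dominated by the outlier
print(statistics.median(prices))  # 180000.0  - barely notices it
```

Both numbers are “facts” about the same neighborhood; only one of them describes the houses most people there actually live in.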
Two concrete things can be done to deal with the malleability of facts in the data age.
First, we need to require that our reporters, policy makers, politicians, and decision makers report the context of numbers they state. It is tempting to use statistics as blunt instruments, punctuating claims. Instead, we should demand that people using statistics to make a point embed them in the broader context. For example, in the case of housing prices, if a politician reports the mean home price in a neighborhood, they should be required to state that potential outliers may be driving that number up. How do we make this demand? By not believing any isolated statistics - statistics will only be believed when the source is quoted and the statistic is described.
But this isn’t enough, since the context and statistics will be meaningless without raising overall statisteracy (statistical literacy, not to be confused with numeracy). In the U.S., literacy campaigns have been promoted by library systems. Statisteracy is becoming just as critical; the same level of social pressure and assistance should be applied to individuals who don’t know basic statistics as to those who don’t have basic reading skills. Statistical organizations, academic departments, and companies interested in analytics/data science/statistics all have a vested interest in raising the population’s statisteracy. Maybe a website dedicated to understanding the consequences of basic statistical concepts, rather than the concepts themselves?
And don’t forget to keep rating health news stories!
13 Jan 2012
Michael Eisen recently published a New York Times op-ed arguing that a bill meant to protect publishers, introduced in the House of Representatives, will result in taxpayers paying twice for scientific research. According to Eisen:
If the bill passes, to read the results of federally funded research, most Americans would have to buy access to individual articles at a cost of $15 or $30 apiece. In other words, taxpayers who already paid for the research would have to pay again to read the results.
We agree and encourage our readers to write Congress opposing the “Research Works Act”. However, whereas many are vilifying the publishers that are lobbying for this act, I think we academics are the main culprits keeping open access from succeeding.
If this bill makes it into law, I do not think the main issue will be US taxpayers paying twice for research, but rather that access will be even more restricted for the general scientific community. Interested parties outside the US - and in developing countries in particular - should have unrestricted access to scientific knowledge. Congresswoman Carolyn Maloney gets it wrong by not realizing that giving China (and other countries) access to scientific knowledge is beneficial to science in general, and consequently to everyone. However, to maintain the high quality of research publications we currently enjoy, someone needs to pay for competent editors, copy editors, support staff, and computer servers. Open access journals shift the costs from readers to authors, who have plenty of funds (grants, startup funds, etc.) to cover the charges. By charging the authors, papers can be made available online for free. Free to everyone. Open access. PLoS has demonstrated that the open access model is viable, but a paper in PLoS Biology will run you $2,900 (see Jeff’s table). Several non-profit societies and for-profit publishers, such as Nature Publishing Group, offer open access for about the same price.
So given all the open access options, why do gated journals survive? I think the main reason is that we - the scientific community - through appointments and promotions committees, study sections, award committees, etc., use journal prestige to evaluate publication records, disregarding open access as a criterion (see Eisen’s related post on decoupling publication and assessment). Therefore, those who decide to publish only in open access journals may hinder not only their own careers, but also the careers of their students and postdocs. The other reason is that, for authors, publishing gated papers is typically cheaper than publishing open access papers, and we don’t always make the more honorable decision.
Another important consideration is that a substantial proportion of publication costs comes from printing paper copies. My department continues to buy print copies of several stat journals as well as some of the general science magazines. The Hopkins library, on behalf of the faculty, buys print versions of hundreds of journals. As long as we continue to create a market for paper copies, the journals will continue to allocate resources to producing them. Somebody has to pay for this, yet with online versions already being produced, the print versions are superfluous.
Apart from opposing the Research Works Act as Eisen proposes, there are two more things I intend to do in 2012: 1) lobby my department to stop buying print versions and 2) lobby my study section to give special consideration to open access publications when evaluating a biosketch or a progress report.
11 Jan 2012
We here at Simply Statistics are big fans of science news reporting. We read newspapers, blogs, and the news sections of scientific journals to keep up with the coolest new research.
But health science reporting, although exciting, can also be incredibly frustrating to read. Many articles have sensational titles, like “How using Facebook could raise your risk of cancer”. The articles go on to describe some research and interview a few scientists, then typically make fairly large claims about what the research means. This isn’t surprising - eye-catching headlines are important in this era of short attention spans and information overload.
If just a few extra pieces of information were reported in news stories about science, it would be much easier to evaluate whether the cancer risk was serious enough to shut down our Facebook accounts. In particular, we thought any news story should report:
- A link back to the original research article where the study (or studies) being described was published. Not just a link to another news story.
- A description of the study design (was it a randomized clinical trial? a cohort study? 3 mice in a lab experiment?)
- Who funded the study - if a study involving cancer risk was sponsored by a tobacco company, that might say something about the results.
- Potential financial incentives of the authors - if the study is reporting a new drug and the authors work for a drug company, that might say something about the study too.
- The sample size - many health studies are based on a very small sample size, only 10 or 20 people in a lab. Results from these studies are much weaker than results obtained from a large study of thousands of people (see the sketch after this list for why).
- The organism - many health science news reports are based on studies performed in lab animals and may not translate to human health. For example, here is a report with the headline “Alzheimers may be transmissible, study suggests”. But if you read the story, the scientists injected brain tissue from Alzheimer’s-afflicted humans into mice.
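Why does sample size matter so much? As a rough illustration - a sketch with made-up numbers, not data from any real study - the standard error of an estimated mean shrinks like one over the square root of the sample size:

```python
import math

sd = 10.0  # assumed standard deviation of whatever is being measured

# Standard error of the mean: sd / sqrt(n)
for n in [10, 100, 10_000]:
    se = sd / math.sqrt(n)
    print(f"n = {n:>6}: standard error of the mean = {se:.2f}")
```

A study of 10 people produces an estimate roughly 30 times noisier than a study of 10,000, which is why a headline built on a handful of lab subjects deserves extra skepticism.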
So we created a citizen-science website for evaluating health news reporting called HealthNewsRater. It was built by Andrew Jaffe and Jeff Leek, with Andrew doing the bulk of the heavy lifting. We would like you to help us collect data on the quality of health news reporting. When you read a health news story on the Nature website, at nytimes.com, or on a blog, we’d like you to take a second to report on the news. Just determine whether the 6 pieces of information above are reported and input the data at HealthNewsRater.
We calculate a score for each story based on the formula:
HNR-Score = (5 points for a link to the original article + 1 point for each of the other five criteria) / 2
The score weights the link to the original article very heavily, since this is the best source of information about the actual science underlying the story.
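In code, the scoring rule looks something like the following sketch (the function and argument names are ours for illustration, not the site’s actual implementation):

```python
def hnr_score(links_original: bool, design: bool, funding: bool,
              financial_interests: bool, sample_size: bool,
              organism: bool) -> float:
    """Score a story: 5 points for linking the original article,
    1 point for each of the other five criteria, divided by 2."""
    others = [design, funding, financial_interests, sample_size, organism]
    return (5 * links_original + sum(others)) / 2

# A story that links the original article and reports the study design
# and sample size, but omits funding, financial interests, and organism:
print(hnr_score(True, True, False, False, True, False))  # 3.5 out of 5
```

Under this rule a story that reports everything scores 5, and one that reports all five secondary criteria but never links the original article still only scores 2.5.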
In a future post we will analyze the data we have collected, make it publicly available, and let you know which news sources are doing the best job of reporting health science.
Update: If you are a web developer with an interest in health news, contact us to help make HealthNewsRater better!