Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Sunday data/statistics link roundup (6/17)

Happy Father’s Day!

  1. A really interesting read on randomized controlled trials (RCTs) and public policy. The examples in the boxes are fantastic. This seems to be one of the cases where the public policy folks are borrowing ideas from Biostatistics, which has been involved in randomized controlled trials for a long time. It’s a cool example of adapting good ideas in one discipline to the specific challenges of another. 
  2. Roger points to this link in the NY Times about the “Consumer Genome”, which is basically a collection of information about your purchases and consumer history. On Twitter, Leonid K. asks: ‘Since when has “genome” become a generic term for “a bunch of information”?’ I completely understand the reaction against the “genome of x”, which is an over-used analogy. Still, I don’t think the analogy is that unreasonable: like a genome, the information contained in your purchase/consumer history says something about you, but doesn’t tell the whole story. I wonder how this information could be used for public health, since it is already being used for advertising….
  3. This PeerJ journal looks like it has the potential to be good. They even encourage open peer review, which has some benefits. I’m not sure it is sustainable; see, for example, this breakdown of the costs. I still think we can do better.
  4. Elon Musk is one of my favorite entrepreneurs. He tackles what I consider to be some of the most awe-inspiring and important problems around. This article about the Tesla S got me all fired up about how a person with vision can literally change the fuel we run on. Nothing to do with statistics, other than I think now is a similarly revolutionary time for our discipline. 
  5. There was some interesting discussion on Twitter of the usefulness of the Yelp dataset I posted for academic research. Not sure if this ever got resolved, but I think that as more and more data sets from companies/startups become available, the terms of use for these data will be critical.
  6. I’m still working on Roger’s puzzle from earlier this week. 

Statisticians, ASA, and Big Data

Today I got my copy of Amstat News and eagerly opened it before I realized it was not the issue with the salary survey….

But the President’s Corner section had the following column on big data by ASA president Robert Rodriguez.

Big Data is big news. It is the focus of stories in The New York Times and the subject of technology blogs, business forums, and economic studies. This column describes how statisticians can prepare for opportunities in Big Data and explains the distinctive value our profession can provide.

Here’s a homework assignment for you all: Please read the column and explain what’s wrong with it. I’ll post the answer in a (near) future post.

Poison gas or...air pollution?

From our Beijing bureau, we have the following message from the U.S. embassy that was recently issued to U.S. citizens in China:

The Embassy has received reports from U.S. citizens living and traveling in Wuhan that the air quality in the city has been particularly poor since yesterday morning.  On June 11 at 16:20, the Wuhan Environmental Protection Administrative Bureau posted information about this on its website.  Below is a translation of that information:

“Beginning on June 11, 2012 around 08:00 AM, the air quality inside Wuhan appeared to worsen, with low visibility and burning smells. According to city air data, starting at 07:00 AM this morning, the density of the respiratory particulate matter increased in the air downtown; it increased quickly after 08:00 AM.  The density at 14:00 approached 0.574mg/m3, a level that is deemed “serious” by national standards.  An analysis of the air indicates the pollution is caused from burning of plant material northeast of Wuhan.

It’s not immediately clear which pollutant they’re talking about, but it’s probably PM10 (particulate matter less than 10 microns in aerodynamic diameter). If so, that level is quite high: the U.S. 24-hour average standard is 0.15 mg/m3 (note that the reported level was an hourly level).
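To put the two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes the reported pollutant really is PM10, and it ignores the fact that an hourly reading and a 24-hour average are different kinds of measurement, so treat the ratio as illustrative only. The variable and function names are made up for this example.

```python
# Figures quoted in the post; that the pollutant is PM10 is only a guess.
reported_mg_m3 = 0.574         # hourly level reported for Wuhan at 14:00
us_24h_standard_mg_m3 = 0.15   # U.S. 24-hour PM10 standard cited above


def mg_to_ug(mg_per_m3: float) -> float:
    """Convert milligrams per cubic meter to micrograms per cubic meter."""
    return mg_per_m3 * 1000.0


# How many times the 24-hour standard the hourly reading was.
ratio = reported_mg_m3 / us_24h_standard_mg_m3

print(f"reported: {mg_to_ug(reported_mg_m3):.0f} ug/m3")
print(f"standard: {mg_to_ug(us_24h_standard_mg_m3):.0f} ug/m3")
print(f"ratio:    {ratio:.1f}x")
```

Under those assumptions, the reported hourly level works out to a bit under four times the 24-hour standard.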

Our investigation of downtown’s districts, together with reports from all of Wuhan’s large industrial enterprises, has determined that there has not been any explosion, sewage release, leakage of any poisonous gas, or any other type of urgent environmental accident from large industrial enterprises.  Nor is there burning of crops in the new city area.  News spread online of a chlorine leak from Qingshan or a boiler explosion at Wuhan Iron and Steel Plant is only rumor.

So, this is not some terrible incident; it’s just the usual smell. Good to know.

According to our investigation, the abnormal air quality in our city is mainly caused by the burning of the crops northeast of Wuhan towards Hubei province.  Similar air quality is occurring in Jiangsu, Henan and Anhui provinces, as well as in Xiaogan, Jingzhou, Jingmen and Xiantao, cities nearby Wuhan.

The weather forecast authority of the city has advised that recent weather conditions have not been good for the dispersion of pollutants.”

The embassy goes on to warn:

U.S. citizens are reminded that air pollution is a significant problem in many cities and regions in China.  Health effects are likely to be more severe for sensitive populations, including children and older adults.  While the quality of air can differ greatly between cities or between urban and rural areas, U.S. citizens living in or traveling to China may wish to consult their doctor when living in or prior to traveling to areas with significant air pollution.

Big Data Needs May Create Thousands Of Tech Jobs

Green: E.P.A. Soot Rules Expected This Week
