Creating the field of evidence based data analysis - do people know what a p-value looks like?
16 Oct 2014

In the medical sciences, there is a discipline called “evidence based medicine”. The basic idea is to study the actual practice of medicine using experimental techniques. The reason is that while we may have good experimental evidence about specific medicines or practices, the global behavior and execution of medical practice may also matter. There have been some success stories from this approach and also backlash from physicians who don’t like to be told how to practice medicine. However, on the whole it is a valuable and interesting scientific exercise.
Roger introduced the idea of evidence based data analysis in a previous post. The basic idea is to study how data analysts actually behave in practice. There is a strong history of this type of research within the data visualization community, starting with Bill Cleveland and extending forward to work by Diane Cook, , Jeffrey Heer, and others.
Today we published a large-scale evidence based data analysis randomized trial. Two of the most common data analysis tasks (for better or worse) are exploratory analysis and the identification of statistically significant results. Di Cook’s group calls this idea “graphical inference” or “visual significance”, and they have [studied](http://www.tandfonline.com/doi/abs/10.1080/01621459.2013.808157) humans' ability to detect significance and how it associates with demographics and visual characteristics of the plot.
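One common way to put graphical inference into practice is a "lineup": the plot of the real data is hidden among plots of null data, and a viewer who picks out the real panel provides evidence against the null. The Python below is only a minimal sketch under stated assumptions: `make_lineup` is a hypothetical helper (not code from our study or from Cook's group), and it assumes the null hypothesis is "no relationship between x and y", simulated by randomly permuting y.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def make_lineup(
    x: Sequence[float],
    y: Sequence[float],
    n_panels: int = 20,
    seed: Optional[int] = None,
) -> Tuple[List[List[Point]], int]:
    """Hide the real (x, y) data among n_panels - 1 null panels.

    Hypothetical helper: each null panel keeps x fixed and randomly permutes y,
    which destroys any x-y relationship but preserves both marginal
    distributions. Returns the panels and the secret index of the real one.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_panels < 2:
        raise ValueError("a lineup needs at least two panels")
    rng = random.Random(seed)
    real_position = rng.randrange(n_panels)  # where the real data is hidden
    panels: List[List[Point]] = []
    for i in range(n_panels):
        ys = list(y)
        if i != real_position:
            rng.shuffle(ys)  # null panel: break the pairing of x and y
        panels.append(list(zip(x, ys)))
    return panels, real_position


# Example: a weak linear trend plus noise, hidden in a 20-panel lineup.
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(50)]
ys = [0.3 * xi + rng.gauss(0, 1) for xi in xs]
panels, answer = make_lineup(xs, ys, seed=42)
print(len(panels), "panels; the real data is panel", answer)
```

With 20 panels, a viewer who sees no real signal picks the true data with probability 1/20 = 0.05, which is what lets a lineup be read like a significance test at that level. Plotting the panels is left out to keep the sketch dependency-free.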
We performed a randomized study to determine if data analysts with basic training could identify statistically significant relationships. Or as the first author put it in a tweet:
First paper just dropped! Can you tell the difference between these two plots? https://t.co/Lng0FWI0XY pic.twitter.com/zFCwwcxaAX
— Aaron Fisher (@PrfFarnsworth) October 16, 2014
What we found was that people were pretty bad at detecting statistically significant results, but that over multiple trials they could improve. This is a tentative first step toward understanding how the general practice of data analysis works. If you want to play around and see how good you are at seeing p-values, we also built this interactive Shiny app. If you don’t see the app, you can also go to the Shiny app page here.
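To make concrete what "seeing a p-value" in a scatterplot means, the Python below computes a permutation-test p-value for Pearson's correlation. This is an illustrative assumption, not the procedure from our paper: the trial may have defined significance with a different test (for example a regression t-test), and `pearson_r` and `permutation_p_value` are hypothetical names.

```python
import math
import random
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(
    x: Sequence[float],
    y: Sequence[float],
    n_perm: int = 2_000,
    seed: Optional[int] = None,
) -> float:
    """Two-sided permutation p-value for H0: no association between x and y.

    Shuffling y simulates the null; the p-value is the share of shuffles whose
    |r| is at least as large as the observed |r| (with a +1 correction so the
    estimate is never exactly zero).
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    ys = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(ys)
        if abs(pearson_r(x, ys)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Example: the kind of noisy scatterplot a viewer might be asked to judge.
rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(100)]
ys = [0.5 * xi + rng.gauss(0, 1) for xi in xs]
print(f"r = {pearson_r(xs, ys):.3f}, p = {permutation_p_value(xs, ys, seed=2):.4f}")
```

Scoring a viewer's yes/no guess against whether p falls below 0.05 is one way to grade this kind of task; the 0.05 cutoff is an assumption here, not a detail stated in the post.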