The landscape of data analysis
10 Jan 2013
I have been getting some questions via email, LinkedIn, and Twitter about the content of the Data Analysis class I will be teaching for Coursera. Data Analysis and Data Science mean different things to different people. So I made a video describing how Data Analysis fits into the landscape of other quantitative classes here:
Here is the corresponding presentation. I also made a tentative list of topics we will cover, subject to change at the instructor’s whim. Here it is:
- The structure of a data analysis (steps in the process, knowing when to quit, etc.)
- Types of data (census, designed studies, randomized trials)
- Types of data analysis questions (exploratory, inferential, predictive, etc.)
- How to write up a data analysis (compositional style, reproducibility, etc.)
- Obtaining data from the web (through downloads mostly)
- Loading data into R from different file types
- Plotting data for exploratory purposes (boxplots, scatterplots, etc.)
- Exploratory statistical models (clustering)
- Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)
- Basic model checking (primarily visually)
- The prediction process
- Study design for prediction
- Cross-validation
- A couple of simple prediction models
- Basics of simulation for evaluating models
- Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)
Of course, that is a ton of material for 8 weeks, so we will be covering just the very basics. I think it is really important to remember that being a good Data Analyst is like being a good surgeon or writer. There is no such thing as a prodigy in surgery or writing, because both require long experience, trying lots of things out, and learning from mistakes. I hope to give people the basic information they need to get started and point to resources where they can learn more. I also hope to give them a chance to practice some of the basics a couple of times and to learn that in data analysis the first goal is to “do no harm”.