Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

Statistics project ideas for students (part 2)

A little while ago I wrote a post on statistics projects ideas for students. In honor of the first Simply Statistics Coursera offering, Computing for Data Analysis, here is a new list of student projects for folks excited about trying out those new R programming skills. Again we have rated each project with my best guess difficulty and effort required. Happy computing!

Data Analysis

  1. Use city data to predict areas with the highest risk for parking tickets. Here is the data for Baltimore. (Difficulty: Moderate, Effort: Low/Moderate)
  2. If you have a Fitbit with a premium account, download the data into a spreadsheet (or get Chris’s data)  Then build various predictors using the data: (a) are you running or walking, (b) are you having a good day or not, (c) did you eat well that day or not, (d) etc. For special bonus points create a blog with your new discoveries and share your data with the world. (Difficulty: Depends on what you are trying to predict, Effort: Moderate with Fitbit/Jawbone/etc.)

Data Collection/Synthesis

  1. Make a list of skills associated with each component of the Data Scientist Venn Diagram. Then update the data scientist R function described in this post to ask a set of questions, then plot people on the diagram. Hint, check out the readline() function. (Difficulty: Moderately low, Effort:__Moderate)
  2. HealthData.gov has a ton of data from various sources about public health, medicines, etc. Some of this data is super useful for projects/analysis and some of it is just data dumps. Create an R package that downloads data from healthdata.gov and gives some measures of how useful/interesting it is for projects (e.g. number of samples in the study, number of variables measured, is it summary data or raw data, etc.) (Difficulty: Moderately hard, Effort: High)
  3. Build an up-to-date aggregator of R tutorials/how-to videos, summarize/rate each one so that people know which ones to look at for learning which tasks. (Difficulty: Low, Effort: Medium)

Tool building

  1. Build software that creates a 2-d author list and averages people’s 2-d author lists. (Difficulty: Medium, Effort: Low)
  2. Create an R package that interacts with and downloads data from government websites and processes it in a way that is easy to analyze. (Difficulty: Medium, Effort: High)

_
_