Simply Statistics A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek

The limiting reagent for big data is often small, well-curated data

I’ve been working on “big” data in genomics since I was a first year student in graduate school (a longer time than I’d rather admit). At the time, “big” meant microarray studies with a couple of hundred patients. Of course, that is now a really small drop in the pond compared to the huge sequencing data sets, like the one published recently in Nature.

Despite the exploding size of these genomic data sets, the discovery process is almost always limited by the quality and quantity of useful metadata that go along with them. In the trauma study I referenced above, the genomic data was both costly and hard to collect. But the bigger, more impressive feat was to collect the data from trauma patients at relatively precise time points after they had been injured. Along with the genomic data a host of clinical data was also collected and aligned with the genomic data.

The key insights derived from the data were the relationships between low-dimensional and high-dimensional measurements. 

This is actually relatively common:

  • In computer vision you need quality labeled images to use as a training set (this type of manual labeling is so common it forms the basis for major citizen science projects like Zooniverse).
  • In genome-wide association studies you need accurate phenotypes.
  • In the analysis of social networks like those in the Framingham Heart Study, you need to collect data on obesity levels, etc.

One common feature of these studies is that they are examples of what computer scientists call _supervised learning_. Most hypothesis-driven research falls into this category. It is important to recognize that these studies can only work with painstaking and careful collection of small data. So in many cases, the limits to the insights we can obtain from big data are imposed by how much schlep we are willing to put in to get small data.

Announcing the Simply Statistics Unconference on the Future of Statistics #futureofstats

Sign up here!

We here at Simply Statistics are pumped about the Statistics 2013 Future of Statistical Sciences Workshop (Nov. 11-12). It is a great time to be a statistician and discussing the future of our discipline is of utmost importance to us. In fact, we liked the idea so much that we decided to get in the game ourselves.

We are super excited to announce the first ever “Unconference” hosted by Simply Statistics. The unconference will focus on the Future of Statistics and will be held October 30th from 12-1pm EST. The unconference will be hosted on Google Hangouts and will be simultaneously live-streamed on YouTube. After the unconference is over we will maintain a recorded version for viewing on YouTube. Our goal is to complement and continue the discussion inspired by the Statistics 2013 Workshop.

This unconference will feature some of the most exciting and innovative statistical thinkers in the field discussing their views on the future of the field and focusing on issues that affect junior statisticians the most: education, new methods, software development, collaborations with natural sciences/social sciences, and the relationship between statistics and industry.

The confirmed presenters are:

  • Daniela Witten, Assistant Professor, Department of Biostatistics, University of Washington
  • Hongkai Ji, Assistant Professor, Department of Biostatistics, Johns Hopkins University
  • Joe Blitzstein, Professor of the Practice, Department of Statistics, Harvard University
  • Sinan Aral, Associate Professor, MIT Sloan School of Management
  • Hadley Wickham, Chief Scientist, RStudio
  • Hilary Mason, Chief Data Scientist at Accel Partners

Follow us on Twitter or sign up for the Unconference at http://simplystatistics.org/unconference. In the month or so leading up to the conference we would also love to hear your thoughts on the future of statistics. Let us know about your ideas on Twitter with the hashtag #futureofstats; we’ll be compiling the information and will make it available along with the talks so that you can tell us what you think the future is.

Tell your friends, tell your family, it is on!

Data Analysis in the top 9 courses in lifetime enrollment at Coursera!

Holy cow I just saw this, my Coursera class is in the top 9 by all time enrollment!

Only problem is those pesky other classes ahead of me. Help me take down Creativity, Innovation and Change (what good is all that anyway :-) by signing up here!

So you're moving to Baltimore

Editor’s Note: This post was written by Brian Caffo, occasional Simply Statistics contributor and Director of Graduate Studies in the Department of Biostatistics at Johns Hopkins. This was written primarily for incoming graduate students, but if you’re planning on moving to Baltimore anyway, feel free to use it to your advantage!

Congratulations on picking Hopkins Biostatistics for your graduate studies. Now that you’re either here or coming to Baltimore, I’m guessing that you’ll need some start-up knowledge for this quirky, fun city. Here’s a guide to some of my favorite Baltimore places and traditions.

Put more in the comments!

Events

First, let me discuss some sporting events that you should be aware of. Absolutely top on the list is going to a baseball game at Camden Yards to watch the Orioles. There are lots of games on days, nights and weekends and for the most part, tickets are easy to get and relatively cheap. Going to see the (two-time Super Bowl champion) NFL Ravens is a bit harder and more expensive, but well worth the splurge once during your studies. Then you can come back to your research on the long-term impact of football head trauma.

The [Preakness](http://www.preakness.com/) horse race is another that’s worth going to at least once. The Preakness takes place on a Saturday and is a very popular event; this can translate to big crowds. If you don’t like big crowds but would like to see what all the fuss is about, you may enjoy the Black-Eyed Susan Stakes; this is a day of racing at Pimlico on the Friday before the Preakness where the crowds are smaller, it costs $5 to get into the track and you can enjoy the celebratory atmosphere of the Preakness. Another fun event is the Baltimore Grand Prix, which happens every Labor Day weekend (at least for the next few years). Since you’re at Hopkins, try to go catch a lacrosse game. The Hopkins team is consistently among the best. If you’re a distance runner, there’s the Baltimore Marathon. Also, I hesitate to include this with sports, but I can’t get enough of the Kinetic Sculpture “Race”, the most fun Baltimore event that I can think of. And we would be doing Hilary Parker a disservice if we failed to mention the Charm City Roller Girls!

The main non-sporting events that I like are the festivals. Every year, especially during the summer, every neighborhood has a festival. Honfest in Hampden is surely the one not to be missed (but there are festivals in every notable neighborhood, including the Fells Point Festival). At Christmas time, there’s the Miracle on 34th Street right nearby, and 36th Street (the Avenue) is a fun place to go out for shopping and eating, regardless of whether Honfest is going on. During the summer months, a local radio station sponsors “First Thursdays”, a free concert series at the Washington Monument in Mt. Vernon.

Things to do during the day

Probably you’ll visit the Harbor as one of the first things you do. Make sure to hit the National Aquarium, the American Visionary Art Museum and the Maryland Science Center (not all in one day). Downtown there’s the Walters Art Museum, and the Baltimore Museum of Art is on the Johns Hopkins Homewood campus. Go see Fort McHenry, where Francis Scott Key wrote the National Anthem. The Museum of African American History and Culture is right near the Inner Harbor on Pratt Street.

If you’re outdoorsy, Patapsco and Gunpowder Falls appear to be good places nearby. Catoctin Park is nearby with Camp David tucked in it somewhere; you’ll know you’ve found it when the Secret Service tackles you. If you don’t want to travel too far, just outside the northern border of the city is Robert E. Lee Park, which has a nice hiking trail and a dog park. When you’re done there you can grab lunch at the Haute Dog.

If you have kids, the Baltimore Zoo is a really nice outdoor zoo that is a great place to go if the weather is nice. It’s in Druid Hill Park, which is also a great place to go running or biking. If you’re willing to drive an hour or more, the outdoor options are basically endless.

DC and Philly are easy day trips using the train and Annapolis is an easy drive. If you go to DC, only schedule a few museums right near one another, otherwise you’ll spend the whole day walking. On a nice day, the National Zoo is fantastic (and free). The MARC train goes to DC from Penn Station and is under $10 each way, but it only runs in the morning and evening. Outside of those times you can take the Amtrak train. If you drive, it’s usually about an hour one-way, depending on where you’re going.

Things to do during the night

I have little kids. How would I know? My answer is, fight about bedtime and collapse. However, if I was forced to come up with something, I would say go to Patterson Park Lanes and do Duckpin Bowling. Make sure to reserve a lane earlier on in the week if you want to go at night on a weekend.

From my outside vantage point, there appears to be tons of nightlife. The best places appear to be in upscale city areas, like Fells Point, Canton, downtown, Harbor East and Federal Hill. Also, catch a show at the [France-Merrick Performing Arts Center](http://www.france-merrickpac.com/home.html) or Center Stage or any one of the many local theatres. The best places to go to movies are the Senator, Rotunda, the Charles and the Landmark at Harbor East.

The Baltimore Symphony is one of the top orchestras in the country and usually has interesting programs. You can usually just show up a few minutes before the concert and get a good (cheap) ticket. There’s also opera at the Lyric Opera House, but Ingo will tell you that the real stuff is in DC at the National Opera.

Things to eat

There are too many restaurants to discuss, so I’ll just give some recommendations. If you have to have deli food, go to Attman’s on Lombard Street. If you need authentic Chinese food, go to Hunan Taste in Catonsville. All of the Korean restaurants are just north of North Avenue on Charles; try Jong Kak. If you’re a locavore and want to go out for a nice dinner, there are a lot of choices. I like the Woodberry Kitchen and Clementine. If you want to break the bank, go to the Charleston, probably the fanciest restaurant in the city. Also, make sure to hit the big Farmers’ Market on Sunday at least once. The best place to go drink beer and eat crabs is LP Steamers. If you want a crab cake the size of a softball, go to Faidley’s in Lexington Market. Lexington Market is its own spectacle that you should try at least once. If you need an Italian deli, there are several (Mastellone’s is my favorite, but this list at least omits Isabella’s in Little Italy and Ceriello in Belvedere Square).

What you eat

You’re a Baltimoron now, so you drink Natty Boh and eat Utz potato chips and Berger cookies. (Don’t question; this is what you do now.) In the summer, go get an egg cream snowball with marshmallow. If you want high-end local beer, I like Heavy Seas and Union Craft. If you’re a coffee drinker, you drink Zeke’s coffee now.

Baltimore stuff

So you need to know a few things so you don’t look the fool. I’ve created a Baltimore cheat sheet. Normally I wouldn’t suggest cheating, but feel free to write this on your hand or something.

  • The O’s are the baseball team (Orioles, named after a species of bird that lives around here); they have a rich history and are in a division with poser glamour bankroll teams: the Yankees and Red Sox. You do not like the Yankees or Red Sox now.

  • Cal Ripken Jr is a former O’s player who broke a famous record for number of consecutive games played.

  • The Ravens are the football team (named after the poem by Edgar Allan Poe; see below). They have been very good for a while. There was an issue where the old team, the Baltimore Colts, left Baltimore for Indianapolis and Baltimore subsequently got Cleveland’s team and named it the Ravens. So, now you don’t like Indianapolis Colts fans and people from Cleveland don’t like you.

  • Lacrosse is a sport that exists and Hopkins is good at it.

  • Thurgood Marshall, the first black US Supreme Court justice, was born here. The airport is named after him.

  • The author Edgar Allan Poe lived, worked, died and was buried here. You can go visit his grave.

  • The most famous baseball player ever, Babe Ruth, was born, grew up and started in baseball here. He really liked duckpin bowling, so the story goes.

  • Olympic swimmer Michael Phelps grew up, lives and trains here.

  • John Waters, a famous director of cult classic films, is from Baltimore, and the city is prominent in many of his movies.

  • HL Mencken was a celebrated intellectual and writer.

  • Frederick Douglass, the abolitionist and intellectual, was born and lived near here.

  • There was a wonderfully done and controversial television program from HBO, The Wire, by David Simon, that everyone talks about around here. It was filmed in, and is about, Baltimore.

  • There is a Baltimore accent, but you may miss it at first. People say hon as a term of endearment, pronounce Baltimore as Bawlmer and Washington as Warshington, among other things. Think about all of the time you can save for research now, by omitting several pesky syllables.

That’s it for now. We’ll do another one on Hopkins and research in the area.

Help needed for establishing an ASA statistical genetics and genomics section

To promote research and education in statistical genetics and genomics, some of us in the community would like to establish a statistical genetics and genomics section of the American Statistical Association (ASA). Having an ASA section gives us certain advantages, such as having allocated invited sessions at JSM, young investigator and student awards, and senior investigator awards in statistical genetics and genomics, as well as a community in which to interact and exchange information.

We need at least 100 ASA members to pledge that they will join the section (if you are in more than 3 sections already you will be asked to pay a nominal fee of less than $10). If you are interested please fill in a row in the following Google doc by November 1st:

https://docs.google.com/spreadsheet/ccc?key=0AtD3gd8kGN45dE9BZ1pTYWtCa0M2VWhKckRoUE9KLVE#gid=0