Podcast #6: Data Analysis MOOC Post-mortem
25 Mar 2013
Jeff and I talk about Jeff’s recently completed MOOC on Data Analysis.
I am in the process of uploading the video lectures for Data Analysis. I am getting ready to send out the course wrap-up email and I wanted to include the link to the YouTube playlist as well.
Unfortunately, YouTube keeps reporting that a pair of the videos in week 2 are duplicates. This is true despite them being different lengths (12:15 vs. 16:58), having different titles, and having dramatically different content. I [found this explanation](http://productforums.google.com/forum/#!topic/youtube/Yc7hHqwtBX0) on the forums:
YouTube uses a checksum to determine duplicates. The chances of having two different files containing different content but have the same checksum would be astronomical.
That isn’t on the official Google documentation page, which is pretty sparse, but it is the only description I can find of how YouTube checks for duplicate content. A checksum is a function you apply to the data from a video that, ideally, yields different values with high probability when different videos are uploaded and the same value when the same video is uploaded. One possible (but poor) checksum function would be the length of the video. Obviously that won’t work in general, because many different videos might be exactly 2 minutes long.
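To make the checksum idea concrete, here is a minimal sketch in Python. It is not how YouTube does it, since their method isn’t documented. I am assuming SHA-256 from `hashlib` as a stand-in for a content checksum, and the function names and the byte strings playing the role of video files are hypothetical.

```python
import hashlib


def content_checksum(data: bytes) -> str:
    """Checksum over the full contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def length_checksum(data: bytes) -> int:
    """A poor checksum: only the size of the data."""
    return len(data)


def looks_duplicate(a: bytes, b: bytes, checksum) -> bool:
    """Flag two uploads as duplicates when their checksums match."""
    return checksum(a) == checksum(b)


# Hypothetical stand-ins for two different videos of identical size.
video_a = b"A" * 1000
video_b = b"B" * 1000

# The length-based checksum wrongly flags the pair as duplicates.
print(looks_duplicate(video_a, video_b, length_checksum))
# The content-based checksum tells them apart.
print(looks_duplicate(video_a, video_b, content_checksum))
```

With a content-based checksum, two files that differ anywhere should almost never collide. That is why a duplicate report for two videos of different lengths is so surprising.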
Regardless, it looks like YouTube can’t distinguish my lecture videos. I’m thinking of trying Vimeo or something else if I can’t get this figured out. Of course, if someone has a suggestion (short of re-exporting the videos from Camtasia) that would allow me to circumvent this problem, I’d love to hear it!
Update: I ended up fiddling with the videos and got them to upload. Thanks to the helpful comments!
David Madigan sends the following. It looks like a really interesting place to submit papers for both statisticians and data scientists, so submit away!
Statistical Analysis and Data Mining, An American Statistical Association Journal
Call for Papers
Special Issue on Observational Healthcare Data
Guest Editors: Patrick Ryan, J&J and Marc Suchard, UCLA
Due date: July 1, 2013
Data sciences is the rapidly evolving field that integrates mathematical and statistical knowledge, software engineering and large-scale data management skills, and domain expertise to tackle difficult problems that typically cannot be solved by any one discipline alone. Some of the most difficult, and arguably most important, problems exist in healthcare. Knowledge about human biology has exponentially advanced in the past two decades with exciting progress in genetics, biophysics, and pharmacology. However, substantial opportunities exist to extend the evidence base about human disease, patient health and effects of medical interventions and translate knowledge into actions that can directly impact clinical care. The emerging availability of 'big data' in healthcare, ranging from prospective research with aggregated genomics and clinical trials to observational data from administrative claims and electronic health records through social media, offer unprecedented opportunities for data scientists to contribute to advancing healthcare through the development, evaluation, and application of novel analytical solutions to explore these data to generate evidence at both the patient and population level. Statistical and computational challenges abound and methodological progress will draw on fields such as data mining, epidemiology, medical informatics, and biostatistics to name but a few.
This special issue of Statistical Analysis and Data Mining seeks to capture the current state of the art in healthcare data sciences. We welcome contributions that focus on methodology for healthcare data and original research that demonstrates the application of data sciences to problems in public health.