“Data scientist” is one of the buzzwords in the running to rebrand applied statistics mixed with some computing. David Champagne, over at Revolution Analytics, described the skills for being a data scientist with a Venn diagram. Just for fun, I wrote a little R function for determining where you land on the data science Venn diagram. Here is an example of a plot the function makes, using the Simply Statistics bloggers.
The code can be found here. You will need the png and klaR R packages to run the script. You also need to either download the file datascience.png or be connected to the internet.
Here is the function's signature, with its default arguments:
dataScientist(names = c("D. Scientist"), skills = matrix(rep(1/3, 3), nrow = 1), addSS = TRUE, just = NULL)
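The snippet below is a quick sketch of the input shapes, building arguments for two people. It assumes that `skills` has one row per entry in `names` and three columns of proportions summing to 1. That assumption rests only on the default `matrix(rep(1/3, 3), nrow = 1)`. The column labels and the names are hypothetical, and this post does not state the column order the script expects.

```r
# Hypothetical column labels; the order dataScientist() expects is an assumption.
areas <- c("computing", "statistics", "domain")

# One row per person; each row holds proportions that sum to 1,
# mirroring the default rep(1/3, 3).
skills <- rbind(
  c(0.2, 0.6, 0.2),
  c(0.5, 0.3, 0.2)
)
colnames(skills) <- areas
people <- c("A. Statistician", "B. Hacker")  # hypothetical names

# Sanity checks before plotting: one row per name, each row summing to 1.
stopifnot(nrow(skills) == length(people),
          isTRUE(all.equal(unname(rowSums(skills)), rep(1, nrow(skills)))))

# With the script sourced, the call would then look like:
# dataScientist(names = people, skills = skills)
```

The check is cheap. It catches a row that doesn't add up to 1 before that row reaches the plot.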
So how do you define your skills? Here is how it works:
If you are an academic
You calculate your skills by adding up your papers in journals. The classification scheme is the following:
Some journals are general, like Nature, Science, the Nature sub-journals, PNAS, and PLoS One. For a paper in one of those journals, decide which area it falls in by identifying its main contribution in terms of the non-academic classification below.
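The academic tally can be sketched in a few lines of R. This is a minimal illustration, not the author's code. The three area labels are hypothetical stand-ins for the classification scheme, and the paper list is made up. Turning counts into proportions is an assumption about how the one-row `skills` matrix is meant to be filled.

```r
# Hypothetical area labels; substitute the actual classification scheme.
areas <- c("computing", "statistics", "domain")

# Made-up example: one entry per paper, giving the area it falls in.
# Papers in general journals are assigned by their main contribution.
papers <- c("statistics", "statistics", "domain", "computing",
            "statistics", "domain")

# Count papers per area (factor levels keep areas with zero papers),
# then convert the counts to proportions.
counts <- table(factor(papers, levels = areas))
props <- as.numeric(counts) / sum(counts)

# A 1 x 3 matrix, the same shape as the default skills argument.
skills <- matrix(props, nrow = 1, dimnames = list(NULL, areas))
skills
```

Using `factor(..., levels = areas)` rather than a bare `table(papers)` keeps all three columns, even for someone with no papers in one area.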
If you are a non-academic
Since papers aren’t involved, determine the percentage of your time you spend on each of the following:
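For the non-academic route, here is a minimal sketch of turning time percentages into a skills row. It assumes the row holds proportions, as the default `rep(1/3, 3)` suggests. The labels and percentages are made-up placeholders, not the post's actual categories.

```r
# Hypothetical labels and made-up percentages of working time.
time_pct <- c(computing = 50, statistics = 30, domain = 20)

# The percentages should account for all of your time.
stopifnot(sum(time_pct) == 100)

# Percent -> proportion, shaped as a one-row skills matrix.
skills <- matrix(time_pct / 100, nrow = 1,
                 dimnames = list(NULL, names(time_pct)))
skills
```

If your estimates don't quite reach 100, dividing by `sum(time_pct)` instead of 100 rescales them to proportions.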
Enjoy!
…we are in the process of changing themes. The spammers got to us in the notes. I tried to fix the HTML, and that didn’t go so well. New theme up shortly.
Update: Done! We are back in business - minus the spammers.
Application programming interfaces (APIs) are tools built by companies, governments, and other organizations to let software engineers interact with their websites. One of the main uses of these APIs is to let software engineers build apps on top of Facebook, Twitter, etc. Many APIs are really helpful for statisticians and data scientists as well: using them, it is generally very easy to collect large amounts of interesting data. Here are some examples of APIs (you may need to sign up for accounts to get access to some of these). They vary in how easy it is to get data from them and in how useful that data is. If people know of other good ones, I’d love to see them in the comments.
Web 2.0
Publishing
Government
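Many web APIs are queried with plain HTTP GET requests whose parameters are passed in the URL. Below is a minimal base-R sketch of assembling such a URL. The endpoint and parameter names are hypothetical, since each real API documents its own.

```r
# Build a GET request URL from a base URL and a named list of query
# parameters. The endpoint used below is hypothetical.
build_query <- function(base_url, params) {
  # Percent-encode each value (spaces, &, =, ...) so it is safe in a URL.
  encoded <- vapply(params,
                    function(v) utils::URLencode(as.character(v), reserved = TRUE),
                    character(1))
  paste0(base_url, "?",
         paste(names(params), encoded, sep = "=", collapse = "&"))
}

url <- build_query("https://api.example.com/search",
                   list(q = "data science", count = 10))
url
```

Actually fetching the data needs a network connection, often an API key, and a parser for the response. For JSON responses, `jsonlite::fromJSON()` accepts a URL directly.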