If you are in technical sales or trying to break into Big Data or Data Science, this post is a must-read for you. It is based on real-life examples, research and analysis from a practitioner's standpoint.
There is a sad but realistic truth about Data Science (DS henceforth) and Big Data (BD). There is an unfortunate dichotomy in this space: those who are on the inside track (the Insiders), and those who are trying to get in (the Outsiders). In the world of the “Insiders” there is a persistent message, which has garnered support among employers, that amounts to keeping the Outsiders out by raising the entry barrier. How? By raising the number of skills required. I came up with a list of must-have skills for a BD or DS role based on my own analytics exercise on hundreds of DS/BD job descriptions I gathered through web scraping. So guess how many skills are required? Twenty-two! Setting the bar at twenty-two skills or more, and counting, is ridiculously unrealistic!
The would-be DS/BD game changer happened quietly, with little coverage in the tech media world. This was Microsoft’s acquisition of Revolution Analytics. So, what does the Microsoft acquisition have to do with Outsiders wanting to get into Data Science? A lot! Let me explain.
The one fundamental skill which is a must-have in DS is Statistics! And what does the DS community use as a Statistics programming language? R! Here is where the story gets interesting.
A branch of DS is Machine Learning or Predictive Analysis. This is the true value-add for any BD initiative: to be able to make a data-driven decision about future strategy.
To do this in R, the very popular analytics language, you need to go through training and become something of an expert before you can get close to solving real problems. R is archaic. It is an implementation of the S language (the same language behind S-Plus), and that lineage is roughly 40 years old.
The great news is that Microsoft just acquired Revolution Analytics (hence the post title). Revolution R (a product of Revolution Analytics) is known for enabling parallel processing of R resulting in massive performance gains.
Today, R is supported in Azure Machine Learning in its current format. So why does this matter? The reason is very simple. There is talk that Microsoft will probably take R, modernize it and make it a first-class language, add it to Visual Studio with IntelliSense support and allow everyone to develop Azure ML solutions in Visual Studio, ready to be published directly to Azure. The net effect:
- It will be nothing short of a coup for the Outsiders in the DS/BD field who can now develop solutions without the need for the massive R learning curve cost.
- Azure ML already provides a lot of the R functionality today out-of-the-box so it’s a natural extension to the existing functionality
- Azure ML makes it very simple to share code, models, and common problem solutions, thus:
  - I will get my solutions at 10X time-to-market
  - I can profit from my work if I choose to publish it in the Azure Gallery
- It will change the market dynamics, as it will ease the short supply of BD/DS talent
- And it will unleash the genius of statisticians and business users with minimal programming experience, letting them conduct their own experiments.
- Coupled with HDInsight, which has access to Analytics APIs, this is the most efficient BD solution on the market today
What’s more: keep an eye on this space. As I mentioned, if you are in Microsoft technical sales, this will be your ticket to moving large enterprises onto Azure with a strategy rather than just a plain tactical need! If you are evaluating DS technology today, you probably know very little about Azure ML and the functionality it provides. I would highly encourage you to evaluate Azure ML before heading to Cloudera, MapR or any other Hadoop vendor. You will not be disappointed!
I am a freelance consultant with over 24 years of experience in IT, strategy and economics. I specialize in Cloud, Data Architecture, DS, Machine Learning and corporate strategy, and I provide architectural consulting, training, technical research, and data-driven decision-making solutions based on economics-based statistical methods grounded in scientific frameworks.
The 22 Skills List of Data Science:
- R Programming
- Getting and Cleaning Data
- Exploratory Data Analysis
- Reproducible Research
- Statistical Inference
- Regression Models
- Practical Machine Learning
- Developing Data Products
- Data Visualization
- Hadoop (including the Azure HDInsight technology stack)
- Orchestrate data workflows
- Data ingestion/curation using Pig, Hive, Sqoop or other Hadoop tools
- Hadoop cluster configuration using Hadoop big data architecture
- High-level design using Business Analysis, Microsoft Azure Platform Knowledge, Blob Storage API Knowledge
- Blob Storage API Knowledge
- Metadata management tool
- Model client data
- Data profiling – Information analyzer/Excel preferred
- Decide how data is going to be used to make decisions
- Knowledge of both tools and methods from statistics, machine learning and software engineering, as well as being human and showing persistence
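
For readers curious about the kind of analytics exercise behind a list like this, here is a minimal Python sketch of tallying skill mentions across job descriptions. It is not the method actually used for this post, which is not described here. The keyword list, the `count_skill_mentions` function and the three sample ads are all hypothetical and made up for illustration only.

```python
import re
from collections import Counter

# Hypothetical skill keywords, loosely based on the list above.
SKILLS = ["R", "Hadoop", "Hive", "Pig", "Sqoop",
          "machine learning", "statistics", "regression"]


def count_skill_mentions(descriptions, skills=SKILLS):
    """Count how many job descriptions mention each skill at least once."""
    counts = Counter()
    for text in descriptions:
        for skill in skills:
            # Require non-word characters on both sides so that "R" does
            # not match inside words such as "Regression" or "engineer".
            pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1
    return counts


# Made-up job descriptions, NOT real scraped data.
sample_ads = [
    "Looking for a data scientist with R, statistics and machine learning experience.",
    "Hadoop engineer: Hive, Pig, Sqoop. Regression testing experience a plus.",
    "Analyst with strong statistics background; R preferred.",
]

counts = count_skill_mentions(sample_ads)
for skill, n in counts.most_common():
    print(f"{skill}: mentioned in {n} of {len(sample_ads)} ads")
```

Note that plain keyword matching is crude: the second ad counts toward "regression" even though it is about regression testing rather than regression models, so a real exercise would need some manual review of the matches.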