In March this year I enrolled in the openSAP course ‘Getting Started with Data Science’. It gave great insight into the business value a data scientist can have and how SAP can make their life easier.
Here are some elements I would like to point out.
What I really liked in the course was the use of the cross-industry standard process for data mining (CRISP-DM) to walk through the steps of the process. With this methodology, the data science process becomes reliable and repeatable, even for people with little data science background. It provides a framework for recording experience and allows projects to be replicated.
In the first phase, the business understanding phase, the goals of the project are determined. The success of the project must be described from a business perspective and a data science perspective.
The success criteria for the different data science models will differ depending on whether the models are predictive or descriptive type models and the type of algorithm chosen.
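To make the difference in success criteria concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the function names `accuracy` and `rmse` and all the numbers are my own assumptions, chosen only to show that a classification model and a regression model are typically judged by different measures.

```python
import math

def accuracy(actual, predicted):
    """Share of class labels predicted correctly (a typical classification criterion)."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

def rmse(actual, predicted):
    """Root mean squared error (a typical regression criterion)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up labels and values, for illustration only.
churn_actual = [1, 0, 1, 1, 0]
churn_predicted = [1, 0, 0, 1, 0]
sales_actual = [10.0, 12.0, 14.0]
sales_predicted = [11.0, 12.0, 13.0]

print("classification accuracy:", accuracy(churn_actual, churn_predicted))
print("regression RMSE:", rmse(sales_actual, sales_predicted))
```

For a descriptive model such as clustering there is no target variable to compare against, so the criteria would look different again.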
Descriptive analysis describes or summarizes raw data and makes it more interpretable. It describes the past, i.e. any point in time at which an event occurred, whether it was one minute ago or one year ago. Descriptive analytics are useful because they allow us to learn from past behaviors and understand how these might influence future outcomes. Common examples of descriptive analytics are reports that provide historical insights regarding a company’s production, financials, operations, sales, inventory and customers. Descriptive analytical models include cluster models, association rules, and network analysis.
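A very small example of what "summarizing raw data" can look like, using only the Python standard library. The sales figures are invented for illustration, and the choice of summary measures is my own assumption rather than something prescribed by the course.

```python
import statistics

# Hypothetical monthly sales figures, invented for illustration.
monthly_sales = [120, 135, 128, 150, 142, 160]

# A descriptive summary: it only describes what already happened.
summary = {
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```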
Predictive analysis predicts what might happen in the future by providing estimates about the likelihood of a future outcome. One common application is the use of predictive analytics to produce a credit score. These scores are used by financial services to determine the probability of customers making future credit payments on time. Typical business uses include understanding how sales might close at the end of the year, predicting what items customers will purchase together, or forecasting inventory levels based upon a myriad of variables. Predictive analytical models include classification models, regression models, and neural network models.
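As a rough sketch of the predictive idea, the snippet below fits a straight line to past sales with ordinary least squares and extrapolates one month ahead. The helper `fit_line` and the data are hypothetical and kept deliberately simple. A real forecast would use far more variables and a proper tool such as the one used in the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up history: month number and sales in that month.
months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(months, sales)

# Estimate for a month that has not happened yet.
forecast_month_6 = slope * 6 + intercept
print("forecast for month 6:", forecast_month_6)
```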
Data understanding and data preparation, steps 2 and 3 in the process, are the most time-consuming steps and take up about 50% to 80% of the total time.
A friend of mine is a data scientist, and he totally agrees with that. He considers these steps very important to get a ‘feel’ for the data.
After the data understanding and data preparation, the modelling starts. There are a lot of models available, depending on the problem you are trying to solve.
- Classification – binary target variable
- Detect anomalies or outliers (data cleansing or decision support)
- Regression – continuous target variable
- Forecasting with time series data
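To give one of these model types a concrete shape, here is a minimal sketch of anomaly detection using z-scores. The function name `find_outliers`, the threshold of 2 standard deviations, and the readings are all assumptions made for illustration. This is one simple technique among many, not the method used in the course.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values whose z-score exceeds the threshold in absolute terms."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Invented sensor readings with one obviously odd value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]

print("outliers:", find_outliers(readings))
```

Flagged values could then be checked and corrected (data cleansing) or passed on to someone for a decision (decision support).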
The course included some very nice exercises that gave a better understanding of what it is like to work as a data scientist with the SAP Predictive Analytics tool.
With this course my understanding of the work of a data scientist has increased considerably. A data scientist spends most of his/her time getting to know the data. That makes a properly working data warehouse relevant as a source of reliable data. Also, when a model has been approved, the rules of the algorithm can then be incorporated in the BI environment to monitor the predictions and use the outcomes in an easy manner.
To conclude: SAP Predictive Analytics is a great tool for a data scientist to use, and it can thus increase the value of the BI environment as a whole.