
Getting Started with Data Science

In March this year I enrolled in the openSAP course ‘Getting Started with Data Science’. It gave great insight into the business value a data scientist can have and how SAP can make their life easier.

Some elements I would like to point out.

Project methodology

What I really liked in the course is the use of the cross-industry standard process for data mining (CRISP-DM) to walk through the steps of the process. Using this methodology the data science process becomes reliable and repeatable by people with little data science background. It provides a framework for recording experience and allows projects to be replicated.

https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

In the first phase, the business understanding phase, the goals of the project are determined. The success of the project must be described from a business perspective and a data science perspective.

The success criteria for the different data science models will differ depending on whether the models are predictive or descriptive type models and the type of algorithm chosen.

Descriptive analysis describes or summarizes raw data and makes it more interpretable. It describes the past, i.e. any point in time at which an event occurred, whether it was one minute ago or one year ago. Descriptive analytics are useful because they allow us to learn from past behaviors and understand how these might influence future outcomes. Common examples of descriptive analytics are reports that provide historical insights regarding a company’s production, financials, operations, sales, inventory and customers. Descriptive analytical models include cluster models, association rules, and network analysis.
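To make the idea of descriptive analysis concrete, here is a minimal sketch in Python that summarizes past sales per region. It is not taken from the course: the records and the name summarize_by_region are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical sales records: (region, month, revenue).
REGION_SALES = [
    ("North", "2017-01", 120.0),
    ("North", "2017-02", 135.0),
    ("South", "2017-01", 80.0),
    ("South", "2017-02", 95.0),
    ("South", "2017-03", 110.0),
]


def summarize_by_region(records):
    """Return total, average and count of revenue per region."""
    grouped = defaultdict(list)
    for region, _month, revenue in records:
        grouped[region].append(revenue)
    return {
        region: {
            "total": sum(values),
            "average": mean(values),
            "count": len(values),
        }
        for region, values in grouped.items()
    }


if __name__ == "__main__":
    for region, stats in summarize_by_region(REGION_SALES).items():
        print(region, stats)
```

The summary only describes what already happened; it makes no statement about future months.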

Predictive analysis predicts what might happen in the future – providing estimates about the likelihood of a future outcome. One common application is the use of predictive analytics to produce a credit score. These scores are used by financial services to determine the probability of customers making future credit payments on time. Typical business uses include: understanding how sales might close at the end of the year, predicting what items customers will purchase together, or forecasting inventory levels based upon a myriad of variables. Predictive analytical models include classification models, regression models, and neural network models.
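As an illustration of the predictive side, the sketch below fits a straight line to past monthly sales and extrapolates one month ahead. This is a plain least-squares fit written for this post, not the algorithm used by SAP Predictive Analytics, and the sales figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate y for a new x using the fitted line."""
    return intercept + slope * x


# Hypothetical history: month number and sales in that month.
MONTHS = [1, 2, 3, 4, 5]
MONTHLY_SALES = [100.0, 110.0, 120.0, 130.0, 140.0]

if __name__ == "__main__":
    intercept, slope = fit_line(MONTHS, MONTHLY_SALES)
    print("estimate for month 6:", predict(intercept, slope, 6))
```

A real forecast would also need a measure of uncertainty around the estimate, which this sketch leaves out.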

Data understanding and data preparation, steps 2 and 3 in the process, are the most time-consuming and take up about 50% to 80% of the total time.
A friend of mine is a data scientist and he fully agrees with that. He considers these steps very important to get a ‘feel’ for the data.
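A first step in getting a feel for the data is often a simple profile of each column. The sketch below counts missing and distinct values per column. The customer records and the function name profile are hypothetical, and treating None and the empty string as missing is an assumption.

```python
def profile(rows):
    """Count missing and distinct non-missing values for each column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            stats = columns.setdefault(name, {"missing": 0, "values": set()})
            # Assumption: None and the empty string both mean "missing".
            if value is None or value == "":
                stats["missing"] += 1
            else:
                stats["values"].add(value)
    return {
        name: {"missing": stats["missing"], "distinct": len(stats["values"])}
        for name, stats in columns.items()
    }


# Hypothetical customer records with some gaps.
CUSTOMERS = [
    {"id": 1, "country": "NL", "age": 34},
    {"id": 2, "country": "", "age": None},
    {"id": 3, "country": "NL", "age": 51},
]

if __name__ == "__main__":
    for column, stats in profile(CUSTOMERS).items():
        print(column, stats)
```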

After the data understanding and data preparation, the modelling starts. There are a lot of models available, depending on the problem you are trying to solve.

Descriptive models:

  • Association
  • Clustering

Predictive models:

  • Classification – binary target variable
  • Detect anomalies or outliers (data cleansing or decision support)
  • Regression – continuous target variable
  • Forecasting with time series data
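The difference between classification and regression in the list above comes down to the kind of target variable. The sketch below is a rule of thumb written for this post, not something from the course or the SAP tool: a target with exactly two distinct values suggests classification, and any other numeric target suggests regression. The function name suggest_model_family is made up.

```python
def suggest_model_family(target_values):
    """Suggest a predictive model family from the values of the target variable."""
    observed = [value for value in target_values if value is not None]
    if not observed:
        raise ValueError("target has no observed values")
    # Exactly two distinct outcomes, e.g. churn yes/no: classification.
    if len(set(observed)) == 2:
        return "classification"
    # Any other numeric target is treated as continuous: regression.
    if all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in observed
    ):
        return "regression"
    return "undetermined"


if __name__ == "__main__":
    print(suggest_model_family(["yes", "no", "no", "yes"]))
    print(suggest_model_family([12.5, 13.1, 9.8, 10.4]))
```

In practice the choice also depends on the business question, so this check is only a starting point.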

In the course, some very nice exercises were given that gave a better understanding of what it is to work as a data scientist with the SAP Predictive Analytics tool.

With this course my understanding of the work of a data scientist has increased considerably. A data scientist spends most of his/her time getting to know the data. That makes a properly working data warehouse important as a source of reliable data. Also, when a model has been approved, the rules of the algorithm can then be incorporated in the BI environment to monitor the predictions and use the outcomes in an easy manner.

To conclude: SAP Predictive Analytics is a great tool for a data scientist to use and SAP Predictive Analytics can thus increase the value of the BI environment as a whole.

SAP Lumira and Tableau compared

SAP Lumira and Tableau are both data visualization tools that can be used to explore data. Recently I did small projects in both tools, so I am finally able to make a comparison.

Tableau
Based on my experience of working with Tableau (version 10.3) I found that the tool is amazingly rich. There is great functionality for building graphs, especially when it comes to visualizing distributions of data. You can build nice tooltips that give additional information. Within the dashboard view it is easy to create a responsive layout, so you can use your dashboard on different devices. When I needed to figure things out, I could go online for help. For most topics I found the answer. I worked with the free online option, and that is a very generous tool already. What I also liked is that the tool just does things without prompting ‘are you sure’. That makes the tool feel very quick. Of course you need to press the undo button quite often when it turns out that you didn’t really want to do it, but you get used to that.
What I found difficult was working with the horizontal and vertical grids. It felt as if things happened at random when I tried to place the worksheets or other components in the grids. The swearing jar got filled very quickly. I also tried to build a story, but I didn’t find it a very useful option, since I couldn’t do a lot of customizing. Another thing: all this great stuff is sometimes hidden very well, so you need to do a lot of trial and error.

SAP Lumira
The data visualization tool SAP Lumira (version 1.31) has a nice focus on the process (data preparation – visualize – build a story). Compared with Tableau it lacks a lot of additional functionality. The basic graphs are covered, but special graphs such as box plots are very limited. Working with the tool does not feel quick, since there is a lot of prompting. Building a restricted or calculated measure is also more complicated, and less functionality is available. When it comes to online help, the SAP Community has been nearly murdered by the implementation of a new platform last October; since then a lot of contributors have been lost. Of course, the big advantage of SAP Lumira shows when you work within an SAP environment, where you can benefit from better integration.

So I must confess that I have become a big fan of Tableau. SAP will deliver a new Lumira release in the coming months, and it will be interesting to see if they have been able to bridge the gap.

To conclude: SAP Lumira is a nice tool to do the basics in data visualization, but Tableau is the tool to use when you need rich functionality.

HERUG 2017

HERUG2017 Day 5, Amsterdam, 13 April 2017

Last month (April 2017) I had the pleasure to attend the Higher Education & Research User Group (HERUG) conference in Amsterdam.

I was also a presenter at two sessions. On Tuesday I presented together with Pieter-Jan Aartsen of the UvA about the project we did on the implementation of the IMR.
Partly this has been described in this blog.

On Wednesday I had the pleasure to be presenting together with Masood Nazir of the UvA about the project we did concerning the implementation of the evaluation reports.

Both presentations can be downloaded from the HERUG site: Developing management dashboards for University of Amsterdam and Quality evaluation reports with SAP Design Studio.

I also visited some interesting sessions myself and got in touch with useful contacts.

To conclude: It was a very interesting conference.

Integration of SAP Lumira in an existing environment

SAP Lumira is a great tool with a lot of possibilities. However, when it comes to the integration in an existing environment, there remain a few inconveniences.

Imagine an existing environment where several connected applications have been built with SAP Design Studio, based on SAP BW, which is reached through an SAP NetWeaver portal, using single sign on.

Ideally, users should be able to work with Lumira desktop and easily access BW BEx query data, for which they are already authorized anyway. Unfortunately, single sign on is not supported in this case(!), so business analysts are required to log on to retrieve the data. It is possible to make a connection to HANA views with single sign on supported, but HANA views need to be built and maintained as well, and sometimes require an additional license fee.

Thus if you want to use your SAP Business Warehouse with SAP Lumira, what are the options?
1. Give users passwords they can use to log on to the system (but they have already been spoiled with single sign on and will have to do this every time from now on)
2. No direct access to BW queries, so they can only use downloaded Excel data, and data cannot be refreshed automatically

So no great choices there.

Then, when the business analyst has finished his/her story, based on several data sources, and wants to share the report, another challenge arises. To be able to share the report, he/she must save it on the BusinessObjects Enterprise platform, for which single sign on is not supported either. Also, the BusinessObjects platform is not the nicest looking environment, so in general it is better to keep business users away from it.

A solution for the latter is to let the consuming business users access the report through a link. This can be organized, for example, using a consistent naming convention for the reports. But unfortunately these links cannot be managed easily.

At the moment, SAP is working on Lumira 2.0. The vision of SAP is to have a better integration between SAP Design Studio and Lumira desktop. This sounds very promising. Hopefully single sign on will be working then as well in all the environments.

So for now the question is what to do. For the users to make better use of the data warehouse, a tool such as Lumira is essential. Unfortunately, the current version does not benefit from the potential unique selling point it could have over its competitors, namely perfect integration within the platform.

To conclude: great visualizations and analyses can be done with Lumira desktop, but perfect integration on the BusinessObjects Enterprise platform has not been realized yet.

This week I brought a report live that uses the International Business Communication Standards (IBCS) as the guideline in presenting information. Although this report can only be considered as a small baby step, still the benefits of using standards are clear.

At the beginning of this year I was at the launch of the book SAP BO Design Studio – The Comprehensive Guide. At this book launch, Dr Rolf Hichert presented his SUCCESS formula on IBCS. He didn’t send his audience away empty-handed: everybody got his poster with the SUCCESS principles explained. For me, the best place to have a poster like this is at the project site, since it can be inspiring.

Then, after the summer, a change to the quarterly management reports was required. With a glance at the SUCCESS poster, this was an opportunity to standardize the look and feel of this report in a consistent language, using IBCS as the guideline.

The end result of the management report is a Word document. So, very pragmatically, it was decided that the data warehouse would deliver the mandatory tables and graphs per chapter in an automated way, formatted according to the IBCS guidelines. The users use a screen capture tool to copy and paste them into the Word document and then add commentary and explanation to the figures.

With a strict deadline for this 3rd Quarter report, a small project team was formed, and priorities were set.

The steps to deliver this report were basically the same as for any other report. In addition, I made a mock-up in Word and Excel following the IBCS guidelines, to see how it would turn out. I also investigated the use of the graphomate add-on, which is a very beautiful tool to use, but at this stage not feasible to procure. So in the end I used the standard graph components of Design Studio and a customizable SDK table component to meet the special formatting requirements.

The IBCS guidelines had a positive impact on speeding up the process. Especially in the design phase, in the translation of the list of required topics into graphs and tables, it really helped to have clear guidelines.

So far, the reactions in general have been positive. The report went live last week and will be used more intensively in the coming period. The end product will be used in the quarterly management meetings. Hopefully these small steps in using IBCS guidelines will be appreciated and embraced by the organization.

To conclude: with a limited amount of time and resources, a consistent and clear management report was built using the IBCS guidelines.