
Tangible benefits of a good datawarehouse – part 2

The benefits of a datawarehouse are not easily quantifiable. Therefore I like to point them out whenever there is an opportunity; see also Tangible benefits of a good datawarehouse.

Last week the UvA received a positive quality assessment from the NVAO (the accreditation organization tasked with providing an expert and objective assessment of the quality of higher education in the Netherlands and Flanders).

In their review, the committee expressed its admiration for the way in which the UvA monitors its quality assurance using UvAData (the datawarehouse) and UvAQ (whose reports are also built and distributed using the datawarehouse). For the full press release, click here.

With this positive quality assessment, the UvA saves a lot of time and effort that would otherwise need to be spent on thoroughly assessing every program individually.

In conclusion: A good datawarehouse saves time and effort and has good business value.

Robots

One of the topics that is getting more attention lately is ‘Robotic Process Automation’ (RPA). At Tacstone we are building a new proposition around this service. In essence it is a new software service that allows you to automate processes without having to modify the existing software systems. The software can log on, simulate keyboard and mouse input, and read information from the screen. This service is most useful for processes that are highly repetitive and have a relatively high volume. In this blog I would like to explain a use case we implemented a few years ago, based on a similar principle.
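To make the idea concrete, here is a minimal, hypothetical sketch in Python with pyautogui: a script that logs on to an existing application purely through its GUI, simulating keyboard and mouse input and "reading" the screen via a screenshot. The coordinates and credentials are placeholders; this is not Tacstone's actual tooling.

```python
# Minimal RPA-style sketch: drive an existing application through its GUI
# so the underlying software does not have to be modified.
# Coordinates and credentials below are hypothetical placeholders.
import time
import pyautogui

def log_on():
    pyautogui.click(400, 300)                               # click the (hypothetical) user-name field
    pyautogui.typewrite("service_account", interval=0.05)   # simulate keyboard input
    pyautogui.press("tab")
    pyautogui.typewrite("********", interval=0.05)
    pyautogui.press("enter")
    time.sleep(5)                                           # wait for the application to load

def read_screen():
    # "Read" information from the screen: capture it for a downstream
    # OCR or image-comparison step.
    pyautogui.screenshot("after_login.png")

if __name__ == "__main__":
    log_on()
    read_screen()
```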

In our case, we wanted to make beautiful reports which we could then distribute as PDF. The tool we wanted to work with, SAP Design Studio (SAP Lumira Designer these days), allows for making these beautiful reports. Unfortunately, the standard function to print to PDF is very limited, and at the time (2014, Design Studio 1.2) it was not possible to automate it either.

To solve the problem of proper printing, we developed an SDK component that could handle this in the desired way. With this component, you could press a print button and the default ‘print to PDF’ dialog from your browser would show up, with the layout just right. All you needed to do then was ‘save as PDF’.

This was already a big improvement. At the time we reasoned that, since scheduling is basic functionality SAP had built in the past for its BEx tools, SAP would build it into the new tools as well. In the meantime, we could have an intern manually run these reports and save them as PDF. On average an intern could do maybe 20 per hour, so that seemed acceptable. So we had a fallback scenario to continue on this route.

But of course that sounds like a horrible job, so we tried to figure out a way to automate it. Long story short: using PhantomJS, Java and ABAP, we managed to build a robot that mimics this human interaction, without having to wait for standard SAP functionality.
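The original robot was built with PhantomJS, Java and ABAP; as an illustration of the same principle with today's tooling, here is a rough Python sketch using Selenium and headless Chrome to open a report URL and save it as a PDF. The URL and the wait time are placeholders, not the actual setup.

```python
# Sketch: render a (hypothetical) report URL to PDF with headless Chrome,
# a modern stand-in for the PhantomJS-based robot described above.
import base64
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

REPORT_URL = "https://bi.example.com/report?id=1234"   # placeholder URL

options = Options()
options.add_argument("--headless=new")                 # run Chrome without a UI
driver = webdriver.Chrome(options=options)
try:
    driver.get(REPORT_URL)
    time.sleep(10)                                     # crude wait until the report has rendered
    result = driver.execute_cdp_cmd("Page.printToPDF", {"printBackground": True})
    with open("report.pdf", "wb") as f:
        f.write(base64.b64decode(result["data"]))      # CDP returns the PDF base64-encoded
finally:
    driver.quit()
```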

The robot has now been in production for more than three years, has produced more than 80,000 PDF reports, and is still going strong. Had an intern done this manually, it would have taken that person more than 4,000 working hours! Building the robot was therefore a sound investment. Also, the scheduling functionality SAP offers as standard is not as sophisticated as we need, so although the robot started out as a temporary solution, it is becoming pretty permanent.

In conclusion: A robot to replace your human interface is worth considering!

Scrum please!

Last week I had the pleasure of attending a Scrum Master training, given by Zilverline. I have to admit, I was skeptical at first about Scrum. I am a trained Prince2 Practitioner, so I figured, this is probably “just old wine in new bags”, as we say in Holland.

And in a way that is of course the case. Still, the principles of Scrum are very attractive, especially since I realized I was already working in a very agile way. This has mainly to do with the environment and the product I work with. In this environment I am a trusted resource with quite a lot of freedom to put product increments into production. The product owner is “a man with a vision” who can look at the bigger picture. And the product is BI, where quick prototyping is needed to demonstrate the potential of a new report, and the time from idea to product can be short.

What I liked about Scrum is the limited overhead: four meetings, three roles, two lists. Compared to Prince2 that is a lot less to maintain. The roles are few and clear. The role of the Product Owner makes a lot of sense, especially since it is carried by a single person.

Planning Poker was something I liked very much as well. It can be a good team-building exercise, and it embraces the principle that estimating is very difficult and should not become a goal in itself.

It reminds me of a quote from Terry Pratchett, Going Postal:
“Mr. Pony struggled manfully with the engineer’s permanent dread of having to commit himself to anything, and managed, ‘Well, if we don’t lose too many staff, and the winter isn’t too bad, but of course there’s always—’”

So in that sense, if you do not have to commit completely but just go for an order of magnitude, you quickly get a good-enough estimate of the workload. And you can start the work!

In my current project, we could do with a bit more structure, and additional team building is needed to tear down the artificial organizational boundaries. I am convinced Scrum will be helpful in achieving that goal.

I passed the test, so I can now call myself Scrum Master. I do feel like a sage!

To conclude: Scrum please!

Enterprise deep learning with TensorFlow

An interesting course offered by openSAP is Enterprise deep learning with TensorFlow, which is currently running in its last week. It gave me a great insight into the current state of machine learning possibilities.

It was a very hands-on training where it was possible to play with TensorFlow applications; TensorFlow is an open-source library for numerical computation. For SAP, TensorFlow is a key element in the SAP Leonardo Machine Learning architecture. With SAP Leonardo, SAP aims to make machine learning easy to use for businesses.

Deep learning is a sub-field of neural networks, machine learning, and artificial intelligence. It is inspired by the architecture of the human brain and consists of neural networks with many layers.

Deep learning is a promising approach when:

  • there is a large amount of training data available
  • it concerns solving an image/audio/natural language problem
  • the raw input data has little structure and the model needs to learn meaningful representations from it (e.g., pixels in an image)

One of the topics in the course was convolutional networks, which are used to classify objects in images. The complexity of this task is enormous, but by combining several techniques and doing smart optimizations it becomes feasible.
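For illustration, a minimal convolutional network in TensorFlow/Keras looks roughly like the models built in the course exercises; the input shape and number of classes below are placeholders.

```python
# A small convolutional network for image classification (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),            # downsampling keeps the complexity manageable
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```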

Some example use cases were also given. One of them was medical image segmentation with fully convolutional networks, in which images from an MRI scanner are processed to construct a new image that points out possible cancer cells.

I found the explanation of how to deal with unsupervised and reinforcement learning very informative as well. To explain:
machine learning applications fall into three broad contexts:

  • Supervised learning; in this case there is a dataset with labels or annotations. Usually this dataset is not very big, because labeling all the data is costly. Most machine learning is done this way.
  • Unsupervised learning; in this case there is data without labels or annotations. Typically this data is generated by machines or software, in an Internet of Things kind of way. Machine learning offers techniques to identify anomalies and outliers in such data and put it to good use. An example is a financial pattern that is monitored: when an anomaly occurs, it may be due to fraud (a minimal sketch follows this list).
  • Reinforcement learning; here there is no initial dataset. The dataset is accumulated through experience: the learning agent interacts with the environment in a trial-and-error kind of way. An example is a robot learning a task. It performs actions; when an action is correct it is rewarded, and when it is incorrect there is no reward, or a penalty.
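As a small illustration of the unsupervised case, the sketch below flags outliers in unlabeled, synthetic “transaction” data with an Isolation Forest from scikit-learn; it is a generic example, not material from the course.

```python
# Unsupervised anomaly detection on synthetic transaction data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=100.0, scale=10.0, size=(1000, 2))   # regular payments
outliers = rng.normal(loc=300.0, scale=5.0, size=(5, 2))     # suspicious payments
transactions = np.vstack([normal, outliers])

model = IsolationForest(contamination=0.01, random_state=42)
labels = model.fit_predict(transactions)      # -1 = anomaly, 1 = normal

print("Flagged as anomalies:", np.where(labels == -1)[0])
```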

Another inspiring example was generating new images using GANs (Generative Adversarial Networks). Here a generator produces images and is combined with a discriminator that determines whether a picture is a real or a fake image (e.g., blurry). This approach gives impressive results.

To conclude: another very inspiring course from the Open SAP learning environment. Very useful machine learning techniques for businesses were presented.

Getting Started with Data Science

In March this year I enrolled in the openSAP course ‘Getting Started with Data Science’. It gave a great insight into the business value a data scientist can add and how SAP can make their life easier.

Some elements I would like to point out:

Project methodology

What I really liked in the course is the use of the cross-industry standard process for data mining (CRISP-DM) to walk through the steps of the process. Using this methodology the data science process becomes reliable and repeatable by people with little data science background. It provides a framework for recording experience and allows projects to be replicated.

https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

In the first phase, the business understanding phase, the goals of the project are determined. The success of the project must be described from a business perspective and a data science perspective.

The success criteria for the different data science models will differ depending on whether the models are predictive or descriptive type models and the type of algorithm chosen.

Descriptive analysis describes or summarizes raw data and makes it more interpretable. It describes the past, i.e. any point in time at which an event occurred, whether one minute ago or one year ago. Descriptive analytics are useful because they allow us to learn from past behaviors and understand how these might influence future outcomes. Common examples of descriptive analytics are reports that provide historical insights regarding a company’s production, financials, operations, sales, inventory and customers. Descriptive analytical models include cluster models, association rules, and network analysis.

Predictive analysis predicts what might happen in the future – providing estimates about the likelihood of a future outcome. One common application is the use of predictive analytics to produce a credit score. These scores are used by financial services to determine the probability of customers making future credit payments on time. Typical business uses include: understanding how sales might close at the end of the year, predicting what items customers will purchase together, or forecasting inventory levels based upon a myriad of variables. Predictive analytical models include classification models, regression models, and neural network models.

Data understanding and data preparation, steps 2 and 3 in the process, are the most time-consuming and take up about 50% to 80% of the total time.
A friend of mine is a data scientist and he completely agrees with that. He considers these steps very important for getting a ‘feel’ for the data.

After the data understanding and data preparation, the modelling starts. There are a lot of models available, depending on the problem you are trying to solve (a small sketch of one model of each kind follows the lists below).

Descriptive models:

  • Association
  • Clustering

Predictive models:

  • Classification – binary target variable
  • Detect anomalies or outliers (data cleansing or decision support)
  • Regression – continuous target variable
  • Forecasting with time series data
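To make the distinction concrete, here is a generic sketch of one model of each kind on synthetic data, using scikit-learn rather than the SAP Predictive Analytics tool used in the course: clustering as a descriptive model and classification with a binary target as a predictive model.

```python
# Descriptive vs. predictive modelling on synthetic data (illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)        # binary target for the predictive case

# Descriptive: clustering summarizes structure without a target variable.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("Cluster sizes:", np.bincount(clusters))

# Predictive: classification estimates the likelihood of an outcome.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("Hold-out accuracy:", clf.score(X_test, y_test))
```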

In the course, some very nice exercises were given that provided a better understanding of what it is like to work as a data scientist with the SAP Predictive Analytics tool.

With this course my understanding of the work of a data scientist has increased considerably. A data scientist spends most of his/her time getting to know the data. That makes a properly working datawarehouse relevant as a source of reliable data. Also, when a model has been approved, the rules of the algorithm can be incorporated into the BI environment to monitor the predictions and make the outcomes easy to use.

To conclude: SAP Predictive Analytics is a great tool for a data scientist to use and SAP Predictive Analytics can thus increase the value of the BI environment as a whole.

SAP Lumira and Tableau compared

SAP Lumira and Tableau are both data visualization tools that can be used to explore data. Recently I did small projects in both tools, so I am finally able to make a comparison.

Tableau
Based on my experience of working with Tableau (version 10.3), the tool is amazingly rich. There is great functionality for building graphs, especially when it comes to visualizing distributions of data. You can build nice tooltips that give additional information. Within the dashboard view it is easy to create a responsive layout, so you can use your dashboard on different devices. When I needed to figure things out, I could go online for help and found the answer for most topics. I worked with the free online option, which is already very generous. What I also liked is that the tool just does things without prompting ‘are you sure’; that makes it feel very quick. Of course you need to press the undo button quite often when it turns out you didn’t really want to do something, but you get used to that.
What I found difficult was dealing with the horizontal and vertical grids. Things seemed to happen at random when I tried to place worksheets or other components in them; the swearing jar filled up very quickly. I also tried to build a story, but I didn’t find it a very useful option, since it allows little customizing. Another thing: all this great functionality is sometimes hidden very well, so you need a lot of trial and error.

SAP Lumira
The data visualization tool SAP Lumira (version 1.31) has a nice focus on the process (data preparation – visualize – build a story). Compared with Tableau it lacks a lot of functionality. The basic graphs are covered, but special graphs such as box plots are very limited. Working with the tool does not feel quick, since there is a lot of prompting. Building a restricted or calculated measure is also more complicated, and less functionality is available. When it comes to online help, the SAP Community was nearly killed off by the move to a new platform last October, and since then a lot of contributors have been lost. Of course, the big advantage of SAP Lumira is that when working within an SAP environment you benefit from better integration.

So I must confess that I have become a big fan of Tableau. SAP will deliver a new Lumira release in the coming months; it will be interesting to see whether they have been able to bridge the gap.

To conclude: SAP Lumira is a nice tool to do the basics in data visualization, but Tableau is the tool to use when you need rich functionality.

HERUG 2017

HERUG2017 Day 5, Amsterdam, 13 April 2017

Last month (April 2017) I had the pleasure to attend the Higher Education & Research User Group (HERUG) conference in Amsterdam.

I was also a presenter at two sessions. On Tuesday I presented together with Pieter-Jan Aartsen of the UvA about the project we did on the implementation of the IMR, part of which has been described in this blog.

On Wednesday I had the pleasure of presenting together with Masood Nazir of the UvA about the project concerning the implementation of the evaluation reports.

Both presentations can be downloaded from the HERUG site: Developing management dashboards for University of Amsterdam and Quality evaluation reports with SAP Design Studio.

I also attended some interesting sessions myself and made some useful contacts.

To conclude: It was a very interesting conference.

Integration of SAP Lumira in an existing environment

SAP Lumira is a great tool with a lot of possibilities. However, when it comes to the integration in an existing environment, there remain a few inconveniences.

Imagine an existing environment where several connected applications have been built with SAP Design Studio, based on SAP BW, which is reached through a SAP Netweaver portal, using single sign on.

Ideally, users should be able to work with Lumira desktop and easily access BW BEx query data for which they are already authorized anyway. Unfortunately, single sign on is not supported in this case(!), so business analysts are required to log on to retrieve the data. It is possible to make a connection to HANA views, for which single sign on is supported, but HANA views need to be built and maintained as well, and sometimes require an additional license fee.

So if you want to use your SAP Business Warehouse with SAP Lumira, what are the options?
1. Give users passwords they can use to log on to the system (but they have been spoiled with single sign on and will have to log on every time from now on).
2. Give no direct access to BW queries, so users can only work with downloaded Excel data, which cannot be refreshed automatically.

So no great choices there.

Then, when the business analyst has finished his story, based on several data sources, and wants to share his report, another challenge arises. To share the report, he/she must save it on the BusinessObjects Enterprise platform, for which single sign on is not supported either. Also, the BusinessObjects platform is not the nicest-looking environment, so in general it is better to keep business users away from it.

A solution for the latter is to let the consuming business users access the report through a link. This can be organized, for example, by using a consistent naming convention for the reports, but unfortunately such links cannot be managed easily.

At the moment, SAP is working on Lumira 2.0. The vision of SAP is to have better integration between SAP Design Studio and Lumira desktop. This sounds very promising. Hopefully single sign on will then work in all environments as well.

So for now the question is what to do. For users to make better use of the datawarehouse, a tool like Lumira is essential. Unfortunately, the current version does not benefit from the potential unique selling point it could have over its competitors, namely perfect integration within the platform.

To conclude: great visualizations and analyses can be done with Lumira desktop, but perfect integration on the BusinessObjects Enterprise platform has not been realized yet.

Management reporting with IBCS

This week I brought a report live that uses the International Business Communication Standards (IBCS) as a guideline for presenting information. Although this report is only a small baby step, the benefits of using standards are already clear.

At the beginning of this year I was at the launch of the book SAP BO Design Studio – The Comprehensive Guide. At this book launch, Dr Rolf Hichert presented his SUCCESS formula for IBCS. He didn’t send his audience away empty-handed: everybody got a poster with the SUCCESS principles explained. For me, the best place for a poster like this is at the project site, where it can be inspiring.

Then, after the summer, a change to the quarterly management reports was required. With a glance at the SUCCESS poster, this was an opportunity to standardize the look and feel of the report in a consistent language, using IBCS as a guideline.

The end result of the management report is a Word document. So, very pragmatically, it was decided that the datawarehouse would deliver the mandatory tables and graphs per chapter in an automated way, formatted according to the IBCS guidelines. The users copy them with a screen capture tool, paste them into the Word document and then add commentary and explanation to the figures.

With a strict deadline for this 3rd Quarter report, a small project team was formed, and priorities were set.

The steps to deliver this report were basically the same as for any other report. In addition, I made a mock-up in Word and Excel following the IBCS guidelines, to see how it would turn out. I also investigated the graphomate add-on, which is a beautiful tool to use but at this stage not feasible to procure. So in the end I used the standard graph components of Design Studio and a customizable SDK table component to meet the special formatting requirements.
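As an aside, the IBCS notation itself is tool-independent. The sketch below is a rough matplotlib illustration of the idea (outlined bars for plan, solid bars for actual) with made-up figures; it is not the Design Studio or SDK component used in the project.

```python
# Rough IBCS-inspired chart: outlined bars for plan, solid bars for actual.
import numpy as np
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
plan = np.array([100, 110, 120, 115])      # made-up figures
actual = np.array([95, 118, 112, 121])

x = np.arange(len(months))
fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(x - 0.2, plan, width=0.4, facecolor="white", edgecolor="black", label="Plan")
ax.bar(x + 0.2, actual, width=0.4, color="dimgray", label="Actual")
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.set_title("Revenue: actual vs. plan (illustrative data)")
ax.legend(frameon=False)
plt.tight_layout()
plt.savefig("ibcs_sketch.png")
```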

The IBCS guidelines had a positive impact on the speed of the process. Especially in the design phase, when translating the list of required topics into graphs and tables, it really helped to have clear guidelines.

So far the reactions have generally been positive. The report went live last week and will be used more intensively in the coming period. The end product will be used in the quarterly management meetings. Hopefully these small steps in applying the IBCS guidelines will be appreciated and embraced by the organization.

To conclude: with a limited amount of time and resources, a consistent and clear management report was built using the IBCS guidelines.

SAP BW/4HANA – the next step towards simplicity

Last month SAP announced the next generation of SAP Business Warehouse: BW/4HANA.

This logical step in the evolution of the Business Warehouse is completely dedicated to the HANA platform and, as such, can do without the millions of lines of code needed to support other databases.
It has also shed the heritage of previous SAP BW versions by allowing only 4 building blocks (Composite Provider, Open ODS View, InfoObject and Datastore Object), instead of the roughly 15 that are commonly used today.

I think this is a good step towards simplicity, and it will have huge benefits in development and maintenance. Of course, this level of simplicity can already be achieved today with BW on HANA (release 7.5), but without the benefit of the more efficient code base in the background.

Obviously, for customers considering a move to this new version, it is good advice to choose the moment well: not too soon, because there will still be too many bugs to fix, but also not too late, because you will want to benefit from the improvements as soon as possible.
Nevertheless, the move towards SAP BW/4HANA is inevitable if you want to continue doing datawarehousing with SAP, since future developments will focus on this platform.

So with this in mind, and since BW/4HANA will only allow 4 building blocks, it is sound advice to consider using only the allowed building blocks in new developments. And if possible, migrate the current datamodels to the new objects.

To conclude: the next step in the evolution of SAP BW is here!

Text Analytics and Text Mining with SAP HANA platform

Last month I finished the openSAP course Text Analytics with SAP HANA Platform. This was a nice opportunity to get a grip on the concepts involved. In this blog I give a short overview of what I learned about Text Analytics and Text Mining.

Text Analytics

Text Analytics can be used to:

  • search text-related content, e.g. CRM-related documents
  • extract meaningful, structured information from unstructured text
  • combine unstructured data with structured data

As an example, when an evaluation asks for both a score and open questions, it is useful to combine this information to see whether there is a correlation.
Also, for consumer-oriented companies, it can be used to structure information from social media.

SAP HANA executes the following steps to analyze text:

1. Structuring the data
In this step documents of all types are taken apart and stored in fact tables in a structured way (a small tokenization and stemming sketch follows the list below).

  • File format filtering; convert any binary document format to text/HTML.
  • Language detection; identify the language to apply appropriate tokenization and stemming
  • Tokenization; decomposing word sequences e.g. “card-based payment systems” becomes “card” “based” “payment” “systems”
  • Stemming; normalizing tokens to linguistic base form e.g. houses -> house; ran -> run
  • Identify part-of-speech; tagging word categories, e.g. quick: Adjective; houses: Noun-Plural
  • Identify noun groups; identifying concepts e.g. text data; global piracy
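The sketch below illustrates the tokenization and stemming steps in a generic way, using Python and NLTK's Porter stemmer; it is only meant to show the principle, not the SAP HANA text analysis engine itself.

```python
# Naive tokenization plus Porter stemming, to illustrate the two steps above.
import re
from nltk.stem import PorterStemmer

text = "Card-based payment systems were rolled out to many houses."
tokens = re.findall(r"[a-z]+", text.lower())      # tokenization (very naive)
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]         # stemming, e.g. houses -> hous

print(tokens)
print(stems)
```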

2. Entity determination
In this step pre-defined entity types are used to classify the data, e.g. Winston Churchill: PERSON; U.K.: COUNTRY (a small named-entity sketch follows the list below).
Possible entity types are:

  • Who; people, job title, and national identification numbers
  • What; companies, organizations, financial indexes, and products
  • When; dates, days, holidays, months, years, times, and time periods
  • Where; addresses, cities, states, countries, facilities, Internet addresses, and phone numbers
  • How much; currencies and units of measure
  • Generic concepts; big data, text data, global piracy, and so on
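As a generic illustration of entity determination (again, not the HANA engine itself), spaCy's pre-trained English model can tag entity types such as persons, places and dates in a sentence:

```python
# Named-entity tagging with spaCy (requires: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Winston Churchill addressed the U.K. parliament in October 1941.")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)   # prints the detected entities and their types
```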

3. Fact extraction
Fact extraction is realized through rules that look for sentiments between entities.
Example: I love your product.
In this context, ‘love’ is classified as a strong positive sentiment.

Another example is known as Voice of the Customer, with typical classifications:

  • Sentiments: strong positive, weak positive, neutral, weak negative, strong negative, and problems
  • Requests: general and contact info
  • Emoticons: strong positive, weak positive, weak negative, strong negative
  • Profanity: ambiguous and unambiguous

With these steps, text analysis gives ‘structure’ to two sorts of elements from unstructured text: entities and facts. Counts of these entities and facts can then be combined with structured information.

Text Mining

The second topic in the course is Text Mining.

Text Mining works at the document level: it makes semantic determinations about the overall content of documents relative to other documents.
This differs from text analysis, which performs linguistic analysis and extracts information embedded within documents.
With text mining you can:

  • identify similar documents;
  • identify key terms of a document;
  • identify related terms;
  • categorize new documents based on a training corpus.

Text Mining works by representing a document collection as a huge terms/documents matrix, in which each element represents the weight of a term in a document.
Based on this matrix, the vector space model is used to calculate the similarity between documents.

To categorize documents a “reference set” of previously classified documents is used. By comparing an input document to the documents in the reference set the most likely categories are returned.
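A compact way to see the terms/documents matrix and vector space idea in action is the scikit-learn sketch below, which categorizes a new document by comparing its TF-IDF vector against a tiny reference set; the documents and categories are made up.

```python
# Categorize a new document against a reference set via TF-IDF and cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference_docs = [
    "invoice payment overdue reminder",        # category: finance
    "server outage incident network down",     # category: incidents
    "course enrollment exam schedule",         # category: education
]
categories = ["finance", "incidents", "education"]

vectorizer = TfidfVectorizer()
ref_matrix = vectorizer.fit_transform(reference_docs)     # terms/documents matrix

new_doc = ["the network incident caused an outage on the server"]
similarities = cosine_similarity(vectorizer.transform(new_doc), ref_matrix)[0]

best = similarities.argmax()
print("Most likely category:", categories[best], "score:", round(float(similarities[best]), 2))
```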

Text Mining is used for example to:

  • Highlight the key terms when viewing a patent document
  • Identify similar incidents for faster problem solving
  • Categorize new scientific papers along a hierarchy of topics

To summarize: Text Analytics and Text Mining are very interesting tools to deal with unstructured data.

SAP BO Design Studio – release 1.6 is out

In April 2015 I wrote an evaluation of our experience with SAP BO Design Studio 1.3, after being live for 5 months. The general feeling was that the new application was a substantial improvement, but we were also waiting for some enhancements. Now, with Design Studio 1.6 released at the end of November 2015, it is interesting to see where we stand.

Design Studio 1.4: functionality improvements – cleaner applications

With release 1.4 we made some functional improvements to our template that help maintain our reports and organize them more logically.

  • Fragmented bookmarking makes bookmarking more robust. With regular bookmarking, even the smallest change to an application invalidated all bookmarks; fragmented bookmarks solve this;
  • The context menu is used for filtering on members in the table. We used to mimic this functionality with BIAL (SAP BI Action Language) coding, but the context menu works a lot better;
  • In some applications we use checkbox groups, usually for key-figures;
  • Drag and drop in the filter panel is enabled;
  • Global Scripts are used to organize the BIAL coding;

Design Studio 1.5: huge performance improvement

The main thing about the 1.5 version is the improvement in performance. Initial startup became a lot faster, and working with the tool on the platform has also improved significantly.

Furthermore, we use binding properties where possible to minimize the BIAL coding.

Design Studio 1.6: interesting improvements

I will need more time to figure out what we will use from DS 1.6, but from reading the ‘what’s new’ document, I already like a few features:

  • Crosstabs hierarchies exported to Excel; This is something we have been waiting for. We even created an SDK component to mimic this behavior;
  • Assigning bookmarks to folders; This might be the solution for some of our users who are asking for a central environment where they can retrieve all their bookmarks;
  • No restart after updating the SDK; A nice improvement: we always had to ask administrators to restart the servers after business hours, which is no longer necessary;

As for the new components, I need to become more familiar with them.

Platform: Netweaver versus BO BI

In March 2014, when we had just started with SAP Design Studio, the Netweaver platform seemed more robust than the BO BI platform, as I described in my blog. However, since then a lot of improvements have been made to the BO BI platform. Also, SAP states that future platform development will be concentrated on the BI platform. That makes the choice a lot easier.

Comparison with April this year

In April 2015 I mentioned 5 bullets we were skeptical about. My current view is that 4 of them are now more or less covered, and with DS 1.6 steps have been made towards covering the fifth:

  • Stability: with DS1.5 already the applications feel more robust;
  • Export to Excel: exporting hierarchies is now supported with DS1.6, and other issues have been solved;
  • Bookmarking: with fragmented bookmarking in DS1.4, and assigning bookmarks to folders, this has improved a lot;
  • Performance: great improvement with DS1.5;
  • Mobile readiness: with DS 1.6 steps are made towards a responsive design;

To conclude: Design Studio is really getting mature. Keep up the good work!

Correlation and causation

These days I am reading Stephen Few’s new book, Signal. As always with Stephen Few’s books, it puts my daily work in a new perspective and inspires several ideas. One of the chapters I particularly like is about ‘correlation and causation’.

To summarize the conclusion: it is important to always use your head!

This is a conclusion I can really relate to. Nowadays, with new technologies and new social media data sources, there are more opportunities to find correlations between variables.

But what to do with these correlations?

Just using these correlations without knowing whether there is a meaningful relationship is not sound practice. As Stephen Few explains, finding a correlation between two variables is a potential step on the path to finding a cause, which can lead to something beneficial.

Also, the number of meaningful relations in the world will not increase by the number of correlations we find. It will only make them harder to find. In that sense, Big Data will give us a lot of additional correlations, but will this help in finding meaningful relations that matter?

As explained by Stephen Few, for every observed relationship between two variables X and Y, we can only conclude that X causes Y when (a small numerical illustration follows the list):

  • It is certain that there is a real relationship (ruling out unreliable data measurements; the relationship is statistically significant and the sample size is big enough).
  • Y does not cause X.
  • It is not the case that something else, related to X, causes Y.
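The third point is the classic confounder problem. A tiny numerical illustration: in the sketch below a hidden variable Z drives both X and Y, producing a strong correlation even though neither causes the other.

```python
# Spurious correlation through a common cause Z (no causal link between X and Y).
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=10_000)               # the hidden common cause
x = z + 0.3 * rng.normal(size=10_000)     # X depends on Z, not on Y
y = z + 0.3 * rng.normal(size=10_000)     # Y depends on Z, not on X

print("corr(X, Y) =", round(np.corrcoef(x, y)[0, 1], 2))   # close to 0.9
```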

Checking these three points is something you need your head for. And tools like SAP Design Studio and SAP Lumira can help you there with their visualization and analysis capabilities.

To conclude: always use your head!