This week I attended the hands-on session ‘Zero to Snowflake’ provided by Snowflake. One of the reasons for me to look into Snowflake is that the Erasmus University Rotterdam (EUR) is replacing its SAP BI stack with a combination of Matillion, Snowflake and Tableau.
Snowflake is a data platform in the Cloud. In their own words:
“Built from the ground up for the cloud, Snowflake’s unique multi-cluster shared data architecture delivers the performance, scale, elasticity, and concurrency today’s organizations require.
Snowflake is a single, integrated platform delivered as-a-service. It features storage, compute, and global services layers that are physically separated but logically integrated. Data workloads scale independently from one another, making it an ideal platform for data warehousing, data lakes, data engineering, data science, modern data sharing, and developing data applications.”
The introduction I joined was a hands-on session, working through a PDF with prepared SQL statements. And even though it was scripted, I must say I was impressed by the tool.
Some key findings/learnings I had:
- I was told the tool is called ‘Snowflake’ because the two French founders loved skiing and it is an English word they could pronounce.
- The scalability is absolutely impressive; a warehouse can be resized quickly and flexibly (see the first sketch after this list).
- The pricing is very transparent. You pay per second and it works with credits. If you want to increase performance, you move up a warehouse size; each step (for example from Small to Medium) roughly doubles the speed and also doubles the number of credits per hour. If I remember correctly, a credit is about 5 USD.
- The cloud region is a choice, so if your data needs to stay within the EU, this can be arranged. The cloud platform (AWS, Azure, Google Cloud) is a choice as well.
- The cloud software is updated regularly and stays backwards compatible.
- It had been more than 10 years since I last worked in a SQL environment. I had forgotten how nice it is to just write some code, select it, and execute it.
- The data we worked with showed in a great way how you can combine several types of data sources and formats (CSV and JSON); see the second sketch after this list.
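To give an idea of the scaling: resizing a virtual warehouse is a single statement. A minimal sketch, assuming a hypothetical warehouse named analytics_wh; each size step roughly doubles the compute and the credits per hour.

```sql
-- Hypothetical warehouse name; the resize takes effect almost immediately
alter warehouse analytics_wh set warehouse_size = 'LARGE';

-- Scale back down or suspend when the heavy lifting is done, to stop consuming credits
alter warehouse analytics_wh set warehouse_size = 'SMALL';
alter warehouse analytics_wh suspend;
```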
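And a hedged sketch of the CSV/JSON combination (the table and column names below are made up, not the ones from the lab): structured CSV data lands in ordinary columns, raw JSON goes into a VARIANT column, and Snowflake’s colon notation lets you query and join the two directly.

```sql
-- Made-up tables: trips loaded from CSV, weather_raw holding one JSON document per row
create or replace table weather_raw (v variant);

select
    t.station_name,
    count(*)                    as rides,
    w.v:weather[0]:main::string as conditions
from trips t
join weather_raw w
  on date_trunc('hour', t.start_time) = date_trunc('hour', w.v:observation_time::timestamp)
group by t.station_name, conditions;
```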
All in all I got a great first impression of what this tool can do and how to use it.
As for a use case in an SAP BI environment, I can see some possibilities as well. I don’t want to go as far as to completely replace the SAP BI stack. However, sometimes users just want to have the data from the data warehouse, either to make their own reports in non-SAP tools or to practice data science using Python. A tool like Snowflake might be useful as a layer between SAP BI and these tools. The advantages could be:
- The data is still centralized, so the ‘single version of the truth’ can be maintained (as opposed to having several datasets going around in the organization)
- Access to the data set is granted in an authorized, controlled environment (see the sketch after this list)
- Initial data enrichment can be done flexibly in this environment, without the need to load the data into SAP BI
- In Snowflake a lot is done with SQL scripting, which is more natural for data scientists who want to work with the data than having to train them in SAP BI
- The scalability of the tool is very flexible, and the costs are based on usage rather than the number of users
- It’s in the cloud, so it is easier for IT to maintain
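As a minimal sketch of what such a layer could look like (all database, view and role names below are made up, not an actual EUR setup): data is loaded centrally once, enriched in a view, and exposed read-only to a data-science role.

```sql
-- Made-up names throughout: an enriched view on top of centrally loaded data,
-- exposed read-only to a data-science role
create or replace view analytics.reporting.sales_enriched as
    select s.*, c.customer_segment
    from analytics.core.sales s
    join analytics.core.customers c on c.customer_id = s.customer_id;

create role if not exists datascience_reader;
grant usage  on database analytics                          to role datascience_reader;
grant usage  on schema   analytics.reporting                to role datascience_reader;
grant select on view     analytics.reporting.sales_enriched to role datascience_reader;
```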
To conclude: Snowflake is a great tool!