The choice of data warehouse depends on data volume, data variety and analysis requirements. In this blog, we compare Snowflake and Azure Data Warehouse to see which is the better fit for Power BI reporting. The results of our testing therefore reflect the performance of the two platforms only with respect to Power BI reporting.
What data did we use?
We took an open-source stock dataset, then mocked and duplicated it up to 100 GB to serve as our fact table. We also have date, time and security code dimensions. Together they form a star schema with one fact table and three dimension (master) tables. We loaded the same data into both Snowflake and Azure Data Warehouse.
What did we do with the data?
We built the same Power BI dashboard on top of Snowflake and Azure Data Warehouse, each connected in DirectQuery mode, and modelled a semantic star schema in the Power BI model. As a result, any interaction with these dashboards fires similar queries on both systems. We compare the performance of those queries to draw our conclusions.
What systems did we use?
We chose warehouse tiers with similar costs. In Azure, we chose DW200c Gen 2, which costs around $3.02/hour on a pay-as-you-go subscription. It is available much cheaper with longer-term commitments.
In Snowflake, we chose the XS tier, which costs between $2 and $4 per hour depending on the subscription type. You can secure price discounts with pre-purchased Snowflake capacity options.
How did we tune the data warehouses?
In Azure, we used round-robin distribution on the fact table, so the 100 GB of data is spread evenly across all distributions. We used replicated distribution for the master tables to minimize data movement between distributions.
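The round-robin placement described above can be illustrated with a small Python sketch. It is only a toy model, not how Azure implements it internally: the count of 60 distributions is an assumption taken from how Azure's dedicated SQL pools are documented, and the function names are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Assumption: a fixed count of 60 distributions, as documented for
# Azure dedicated SQL pools. This value is not taken from our test setup.
NUM_DISTRIBUTIONS = 60


def round_robin_assign(row_ids, num_distributions=NUM_DISTRIBUTIONS):
    """Assign each row to the next distribution in turn, ignoring row content."""
    targets = cycle(range(num_distributions))
    return {row_id: next(targets) for row_id in row_ids}


def rows_per_distribution(assignment):
    """Count how many rows landed in each distribution."""
    return Counter(assignment.values())


# Toy fact table with 600,000 row ids standing in for the 100 GB fact.
fact_assignment = round_robin_assign(range(600_000))
counts = rows_per_distribution(fact_assignment)
print(min(counts.values()), max(counts.values()))
```

Because rows are dealt out in turn regardless of their values, every distribution ends up with the same share of the fact table, which is the property we relied on.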
In Snowflake, we defined clustering keys on the fact-table columns used for filtering. Beyond this, we couldn't find any specific optimization techniques for Snowflake, which claims to already store data in a layout optimized for performance.
What did we observe?
Criteria | Snowflake Warehouse | Azure Data Warehouse |
Tier | XS Standard | DW200c Gen 2 |
Operating cost | $2-4/hour, billed per second with a 60-second minimum | $3.02/hour |
Storage cost | $23 per TB per month | $122.88 per TB per month |
Auto-resume | Yes | No |
Auto-scaling | Yes | No; on-demand scaling is available |
Result caching | Yes | Yes |
Performance | Performed well for single-table and cached queries | At least 30% faster than Snowflake on complex queries |
In most cases, Azure returned results faster than Snowflake. Analyzing the query plans, Snowflake appeared to spend most of its time on I/O; it seems to need extra time to bring the data into the compute layer. However, Snowflake's auto-resume feature can outweigh any performance advantage of Azure. Because Azure Data Warehouse has no auto-resume option, we must keep the system running, and billing, at all times. In Snowflake, the cluster comes alive automatically when a query arrives, so it does not need to be kept running. Similarly, Snowflake can auto-scale the number of clusters, whereas in Azure scaling must be scripted or done manually. Other than this, we felt both systems are competitively priced for their service.
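To make the billing difference concrete, here is a rough Python sketch of the arithmetic. The hourly rates and the 60-second minimum come from the comparison above; the 730-hour month and the sample workload (22 working days with one 8-hour session each) are assumptions for illustration only, not measurements from our tests.

```python
AZURE_RATE_PER_HOUR = 3.02          # DW200c Gen 2, pay-as-you-go
SNOWFLAKE_RATE_RANGE = (2.0, 4.0)   # XS tier, depending on subscription type
HOURS_PER_MONTH = 730               # assumption: average hours in a month


def always_on_cost(rate_per_hour, hours=HOURS_PER_MONTH):
    """Cost of a warehouse that is left running for the whole period."""
    return rate_per_hour * hours


def per_second_cost(rate_per_hour, active_periods_seconds, minimum_seconds=60):
    """Cost when each active period is billed per second with a minimum charge."""
    billed_seconds = sum(max(s, minimum_seconds) for s in active_periods_seconds)
    return rate_per_hour * billed_seconds / 3600


# Hypothetical workload: 22 working days, one 8-hour reporting session per day.
sessions = [8 * 3600] * 22

azure_monthly = always_on_cost(AZURE_RATE_PER_HOUR)
snowflake_low = per_second_cost(SNOWFLAKE_RATE_RANGE[0], sessions)
snowflake_high = per_second_cost(SNOWFLAKE_RATE_RANGE[1], sessions)

print(f"Azure, always on:        ${azure_monthly:,.2f}")
print(f"Snowflake, auto-resume:  ${snowflake_low:,.2f} to ${snowflake_high:,.2f}")
```

Under these assumptions the always-on warehouse is billed for every hour of the month, while the auto-resuming one is billed only for the hours it is actually queried. A workload that keeps the warehouse busy around the clock would narrow or remove that gap.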
The main difference between the two is qualitative, not quantitative. Azure is built for tunability, whereas Snowflake is built for ease of use. It's best to demo both products and choose what works best for your organization.
The post [Video] Cloud Data Warehouse Comparison – Azure & Snowflake appeared first on Visual BI Solutions.