
Understanding the ‘n’ Key Difference between Power BI Report Server and Power BI Service


What are Power BI Report Server and Power BI Service?

Power BI Report Server is the on-premises option in the Microsoft Power BI deployment suite. Power BI Service, on the other hand, is Microsoft's cloud-hosted solution.

 

Usage of Capacity Nodes

There are several major differences between going on-premises and going to the cloud. A point common to both, however, is the use of Premium capacity for your organization's resources to run on, which is covered by Power BI Premium. A Power BI Premium purchase can be deployed on-premises, and the organization is given the option to scale up and use Power BI Service when needed.

Power BI provides this scalability within the existing business model. This blog explains the key differences between Power BI Report Server and Power BI Service.

 

Top 6 Key Differences between Power BI Report Server and Power BI Service

1. Creating Reports in the Browser

This is one of the service's most distinctive features: because the solution is hosted in the cloud, it can be consumed through any web browser with access to Power BI Service. Users with Power BI accounts can consume the service, and the available features depend on the user's license (Free or Pro).

2. Creation of Dashboards

Power BI supports the creation of dashboards only in Power BI Service. There are multiple options, from pinning tiles to a dashboard to placing advanced visualizations on it. Combined with the Azure Machine Learning integration, this becomes a great advantage of Power BI Service.

3. Usage of AI and Machine Learning Capabilities

Power BI Service connects directly to Azure Machine Learning, so AI charts and components can be used from within Power BI. The Key Influencers visual is one such example, available when reports and dashboards are built on Power BI Service.

4. Data Alerts

Data alerts are arguably a must-have feature for fast-moving organizations that want to act quickly when their KPIs dip below normal levels. This feature is not available in Report Server. Alerts are set on Power BI visuals such as KPI cards, and they can also be received in Power BI Mobile.

5. Usage of R and Python Engine

The use of R and Python in Power BI can be enabled or disabled from the settings in Power BI Service. This option does not apply to Power BI Report Server.
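For context, a Python visual in Power BI Service runs a script against a pandas DataFrame that Power BI injects under the name `dataset`. The sketch below is a minimal illustration only: the hand-built sample DataFrame stands in for the one Power BI would normally provide, so it can also run outside Power BI.

```python
import pandas as pd
import matplotlib.pyplot as plt

# In a Power BI Python visual, a pandas DataFrame named 'dataset' is injected
# automatically with the fields added to the visual. Here we build a stand-in
# so the script also runs outside Power BI.
dataset = pd.DataFrame({
    "Region": ["North", "South", "East", "West"],
    "Sales": [120, 95, 143, 88],
})

# Typical visual script: aggregate, then render with matplotlib.
# Power BI captures the figure produced by plt.show().
summary = dataset.groupby("Region", as_index=False)["Sales"].sum()
plt.bar(summary["Region"], summary["Sales"])
plt.title("Sales by Region (illustrative data)")
plt.show()
```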

6. Preview Features

When a release goes out, Power BI users are eager to try all its new features. Only Power BI Service receives preview features, which are tested and reviewed before wider rollout. Power BI Report Server has to wait until a preview feature becomes generally available.

 

Feature Comparison List

Feature | Power BI Report Server | Power BI Service | Notes
Deployment | On-premises or private cloud | Public cloud | Power BI Report Server can be deployed in Azure VMs (hosted cloud) if licensed through Power BI Premium
Source data | Cloud and/or on-premises | Cloud and/or on-premises | NA
License | Power BI Premium or SQL Server EE with SA | Power BI Pro and/or Power BI Premium | NA
Lifecycle | Modern lifecycle policy | Fully managed service | NA
Release cycle | Once every 4 months | Once a month | Latest features and fixes come to Power BI Service first. Most core functionality comes to Power BI Report Server in the next few releases; some features are meant only for the Power BI Service.
Create Power BI reports in Power BI Desktop | Yes | Yes | NA
Create Power BI reports in the browser | No | Yes | NA
Gateway required | No | Yes, for on-premises data sources | NA
Real-time streaming | No | Yes | Real-time streaming in Power BI
Dashboards | No | Yes | Dashboards in the Power BI service
Distribute group of reports using apps | No | Yes | Create and publish apps with dashboards and reports
Content packs | No | Yes | Organizational content packs: Introduction
Connect to services like Salesforce | Yes | Yes | Connect to the services you use with content packs in the Power BI service. In Power BI Report Server, use certified connectors to connect to services; see Power BI report data sources in Power BI Report Server for details.
Q&A | No | Yes | Q&A in the Power BI service and Power BI Desktop
Quick insights | No | Yes | Automatically generate data insights with Power BI
Analyze in Excel | No | Yes | Analyze in Excel
Paginated reports | Yes | Yes | Paginated reports are available in the Power BI service in preview in a Premium capacity
Power BI mobile apps | Yes | Yes | Power BI mobile apps overview
ArcGIS maps | No | Yes | ArcGIS maps in Power BI service and Power BI Desktop by Esri
Email subscriptions for Power BI reports | No | Yes | Subscribe yourself or others to a report or dashboard in the Power BI service
Email subscriptions for paginated reports | Yes | No | E-mail delivery in Reporting Services
Data alerts | No | Yes | Data alerts in the Power BI service
Row-level security (RLS) | Yes | Yes | Available in both DirectQuery (data source) and Import mode; Row-level security in the Power BI service; Row-level security in Power BI Report Server
Full-screen mode | No | Yes | Full-screen mode in the Power BI service
Advanced Office 365 collaboration | No | Yes | Collaborate in an app workspace with Office 365
R visuals | No | Yes | Create R visuals in Power BI Desktop and publish them to the Power BI service. You can't save Power BI reports with R visuals to Power BI Report Server.
Preview features | No | Yes | Opt in for Power BI service preview features
Custom visuals | Yes | Yes | Custom visuals in Power BI

 

Which one should you go with?

This depends purely on organizational needs. If the organization requires reporting to stay within its own firewalls and security perimeter, Power BI Report Server is recommended. If the organization wants to explore advanced AI and ML capabilities, Power BI Service is the better choice.

Source: https://docs.microsoft.com/en-us/power-bi/report-server/compare-report-server-service

 

Know more about Microsoft Power BI services offerings from Visual BI solutions here.

 



Self-Service tools with ETL Capabilities: A quick comparison


1. Introduction

Over the past couple of years, there has been a strong inclination in the BI market towards self-service tools. This blog aims to provide a brief comparison of three self-service tools with ETL capabilities: Alteryx 2019.1.6.58192, Tableau Prep 2018.3.1 and Power BI Dataflow (July 2019).

Alteryx is one of the newer tools in the market for self-service BI and analytics, extending to spatial and predictive analytics. Even though Alteryx is predominantly used as an analytics tool, it also has ETL capabilities that help users prepare and blend data from multiple sources and export analytic results to a variety of platforms.

Tableau and Power BI are major players in the BI industry and have consistently been identified as market leaders for self-service BI tools by Gartner over the last couple of years. To bridge the gap between raw data and a fully functional dynamic report, both vendors introduced an additional component as part of their service to include self-service ETL capabilities: Tableau Prep and Power BI Dataflow are the ETL counterparts for Tableau and Power BI, respectively.

Though comparing the data preparation capabilities of an analytical tool like Alteryx to the ETL counterparts of visualization tools such as Tableau and Power BI is like comparing apples and oranges, we are curious to understand how easily data preparation can be achieved from a self-service perspective. ETL capabilities are compared across the three tools based on the following parameters:

  • Data Connectivity
  • Data Preparation
  • Transformation and Join Capabilities

2. Data Connectivity

Since the first step in an ETL process is to extract the data, it is essential to understand the list of possible sources that these tools can connect to and retrieve data from.

To view the complete list of sources, click here.

Compared to the other tools, self-service users can connect to a wider variety of data sources in Alteryx. Data can also be loaded easily into tables for further processing and consumption, whereas this is more limited in the other tools.

 

3. Data Preparation

The next step of an ETL process is data preparation, which involves converting raw data into a usable format for further analysis. This might involve the following processes:

    1. Data Cleansing
    2. Building Expressions
    3. Sampling and Indexing

3.1 Data Cleansing


All three tools comfortably cover common data cleansing needs such as replacing NULL values; removing whitespace, tabs, numbers and special characters; modifying the case of text; selecting or deselecting columns; reordering, renaming and resizing columns; and modifying data types.

In addition to the above-listed data cleansing options, direct options for removing duplicate whitespace, restricting the column size and adding a column description are available only in Alteryx.
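To make the cleansing steps above concrete, here is a small pandas sketch that performs the same kinds of operations; the column names and values are invented purely for illustration and do not come from any of the three tools.

```python
import pandas as pd

# Illustrative raw data with the usual problems: nulls, stray whitespace, wrong types.
raw = pd.DataFrame({
    " Customer Name ": ["  Alice ", "Bob", None],
    "Revenue": ["100", "250", None],
})

clean = raw.rename(columns=lambda c: c.strip())                       # trim column names
clean["Customer Name"] = clean["Customer Name"].fillna("Unknown").str.strip()  # replace nulls, trim values
clean["Revenue"] = pd.to_numeric(clean["Revenue"]).fillna(0)          # fix data type, replace nulls
print(clean)
```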

3.2 Building Expressions


To create a new column or update an existing one, an expression is used, which can be built with the help of the following categories of functions:

    • Conditional functions such as if, elseif, etc.,
    • Conversion functions which include data type conversions
    • Date Time functions such as Datediff(), DateParse() etc.,
    • Math functions such as ABS(), Mean(), Mode() etc.,
    • Operators such as Boolean AND, OR, NOT, etc.,
    • String functions such as TRIM(), SUBSTR(), LEN(), REPLACE() etc.,
    • Test functions which includes Isnull(), IsNum() etc.,

Comparing the functions across the three tools, operations such as Boolean expressions, regular expressions and random number generation are explicitly available in Alteryx. Even though most of these functions can be achieved in all the tools, Power BI Dataflow, unlike Alteryx, offers no direct option to incorporate them, since its expression-building capability is restricted. Achieving them requires workarounds or solid knowledge of Power Query, which adds complexity for novice users.
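As a rough illustration of the expression categories listed above (conditional, date and string functions), the following pandas sketch builds equivalent calculated columns; the field names and threshold are hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_date": ["2019-07-01", "2019-07-15"],
    "ship_date":  ["2019-07-05", "2019-07-28"],
    "status":     [" open ", "CLOSED"],
})

orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["ship_date"] = pd.to_datetime(orders["ship_date"])

# DateDiff-style expression
orders["lead_time_days"] = (orders["ship_date"] - orders["order_date"]).dt.days
# String functions: TRIM plus a case change
orders["status"] = orders["status"].str.strip().str.lower()
# Conditional (IF/ELSEIF-style) expression
orders["is_late"] = orders["lead_time_days"] > 7
print(orders)
```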

3.3 Data Sampling and Indexing

Sampling – fetches a set of records from the input.

Indexing – assigns a unique identifier that increments sequentially for every record in the data. All three tools provide indexing functionality.

4. Data Transformation

With data transformations, users can convert data from one structure or format to another, which includes activities such as data integration and data management. In terms of data transformation, the following parameters are considered for the analysis:

5. Conclusion

An overall summary of the three tools is given in the table below:

From the perspective of a self-service tool with ETL capabilities, almost all the basic features are possible in Alteryx, Tableau Data Prep, and Power BI Data flow.

In terms of advanced functionality for data preparation and transformation, Alteryx provides a more intuitive experience for novice end users, whereas workarounds are needed in Tableau Prep and Power Query knowledge in Power BI Dataflow. Power BI Dataflow is a very recent launch and still in its initial version; we expect it to ramp up quickly in features and capabilities, much as Power BI itself has improved month over month.

The tool an analyst picks depends on the analytical task at hand. If the user's platform is Tableau or Power BI, Tableau Prep or Power BI Dataflow is the natural choice for self-service ETL. If the analyst needs to dive deep into the data for predictive, spatial and BI analytics in addition to data preparation/ETL, Alteryx is the better fit.

References:

  1. https://docs.microsoft.com/en-us/power-bi/service-dataflows-data-sources
  2. https://onlinehelp.tableau.com/current/prep/en-us/prep_connect.htm
  3. https://help.alteryx.com/current/DataSources/SupportedDataSources.htm

Learn more about Visual BI Solutions Microsoft Power BI offerings here. 

 


Looker, the new era of SQL


On June 6th, 2019, there was a huge buzz in the BI realm about Google acquiring an analytics startup called Looker. The all-cash deal, worth a whopping $2.6 billion, turned all eyes toward Looker. With just seven years in the BI space, Looker had already established itself with 1,500+ customers and revenue exceeding $100 million.

Looker is a BI platform that connects to any relational database. Data can be modeled in LookML, its agile modeling layer, and then consumed in Looker's visualization layer, where Looks and dashboards are built. This helps various users and business analysts make faster, better-informed, data-driven decisions.

Unlike plain SQL, Looker focuses on reusability, collaboration and organized querying, which led to the development of LookML. LookML abstracts the SQL that would otherwise be written into a more readable language, making development easier. This lets users and business analysts put more emphasis on analysis than on development. Git integration enables Looker to automatically track changes and control what needs to be moved to production. It also splits development and visualization into two separate tracks, so one person can develop while another visualizes without waiting on each other. Once development is complete and pushed to production, all changes are automatically captured in the visualization layer, allowing users to see the latest development changes.

Looker's visualization layer, the front end of Looker, is well suited for self-service BI. It incorporates the features a self-service BI tool should have, from scheduling emails and sending alerts when KPI conditions are met, to exploring further insights from existing Looks. Custom visualizations can also be built using D3, and Looker provides a REST API that can be consumed by other visualizations and applications. Looker is fast because it queries the database directly; it can be configured to use read-only connections and access only the minimum data required to answer a query, focusing on data availability rather than storage.

It is now easy to guess why Google acquired Looker: it broadens Google's Business Intelligence portfolio. With BigQuery, Google's in-house analytical engine, already in the picture, Looker makes Google a strong contender against Amazon and Microsoft, who currently hold the fort of BI in the cloud.

 

Know more about our Visual BI offerings and to read more related content click here .

 


Scheduling Workflow In Alteryx Designer


Introduction

Scheduling is the final step of every workflow: it configures a process to run at different times based on the requirement. In this blog, we will look at how to schedule a workflow in Alteryx, along with a brief comparison of its scheduling features with SAP BODS.

Like any scheduler, Alteryx Scheduler makes it easy to specify the date, time and frequency at which Alteryx workflows run. In addition, Alteryx Scheduler can be configured to employ distributed scheduling across your network of computers (workers).

Every workflow that you create in the Alteryx designer can be scheduled by selecting Options >> Schedule workflow from the menu bar.

When you click the schedule option, a pop-up window appears with the list of Alteryx Galleries and Alteryx controllers.

Select the location where you want to schedule the workflow, or add a new location.

Note: The Alteryx Gallery is a cloud-based application for publishing, sharing, and executing workflows. It communicates directly with the Alteryx Service for the management and execution of the workflows and utilizes a MongoDB persistence layer for all state maintenance.
The Alteryx Service Controller is responsible for managing the service settings and delegating work to the Alteryx Service Workers, which are responsible for executing analytic workflows.

A new pop-up then opens with a list of predefined scheduling options.
You can schedule a workflow using one of the three methods below, with an option to provide a name for the scheduled job:

  • Once
  • Recurring
  • Custom

Scheduling – Once

This option allows us to run a workflow once, at a specific time.

Scheduling – Recurring

This option allows us to schedule a job to run hourly, daily, weekly or monthly for a specific period.

You can select the interval at which the workflow should repeat from the drop-down list. The time selection options differ based on the interval selected, letting you configure how the job recurs.

Each option in the frequency tab addresses a specific requirement or scenario:

  1. Hourly: If the workflow needs to run multiple times every day at a regular interval, specify the duration in hours and minutes in the input field.
  2. Daily: If a workflow needs to run every day at a specific time, select the daily option and specify the time at which the workflow should be executed.
  3. Weekly: If a workflow needs to run on particular days of the week, select the weekly option, choose the days on which the workflow should run and schedule it.
  4. Monthly: If a workflow needs to run monthly on a given day or date, use the monthly option, where we can select a day of the month or week.

Scheduling – Custom

This option allows us to schedule a workflow to run on a custom list of dates across different months within a given period.

If the workflow needs to run at a set time on different days in different months, the custom option can be used, as shown.

Note: You can also assign the priority of the scheduling job and the worker, to distribute the load. Let us see more about workers and priorities of workflow in the next section.

Since SAP BODS offers similar scheduling features, here is a brief comparison with Alteryx.

Alteryx Vs BODS

Feature | Alteryx | BODS
Scheduling from the tool without external services | Yes | No (need to schedule from the server)
Scheduling multiple times a day | Yes | Yes
Option to specify the end date of scheduling | Yes | Yes
Assigning priority for the job | Yes | No (need to write a shell script)
Scheduling daily or only on working days (Monday–Friday) | Yes | Yes
Scheduling on different days of a week | Yes | Yes
Scheduling on a specific day of the month (e.g. first Sunday) | Yes | Yes
Scheduling on a custom set of dates in a set of months | Yes | Yes
Assigning a worker | Yes | No
Triggering a mail before/after running | Yes | Yes
Triggering a mail when a job completes with or without error | Yes | Yes
Attaching assets of the job by default in the mail | Yes | No (need to write a shell script)
Running a custom command when a workflow completes | Yes | No (need to write a shell script)

Conclusion

We can schedule every workflow using the Alteryx designer with the simple steps described above. Based on the above comparison, we can infer that scheduling in Alteryx is much easier and offers far more features than SAP BODS.

 

Learn more about Visual BI Solutions Microsoft Power BI offerings here. 


The Decline of HADOOP and Ushering An Era of Cloud


Five years ago, the go-to architecture for a data lake was HADOOP, which was synonymous with big data. What exactly is HADOOP, and why did its rise to stardom start to decline?

HADOOP is primarily composed of three layers:

  1. HDFS – Storage Layer
  2. MapReduce – Processing Layer
  3. Yarn – Resource Management Layer (Yet Another Resource Negotiator)

The Layered attack by Cloud

With the advent of the cloud, providers such as Amazon Web Services, Microsoft Azure and Google Cloud Platform began offering cheaper, more scalable and faster alternatives to on-premises HDFS, dealing HADOOP a heavy blow. One of the primary factors in HADOOP's adoption was its ability to scale for exponential data volume growth on cheap hardware, but it remained an on-premises platform, and the maintenance overhead loomed over organizations. Cloud providers offered the same scalability with no maintenance overhead and at lower cost. A new architecture is gaining momentum in the cloud that separates the compute and storage layers, enabling scaling on both fronts; Snowflake is a good example, providing a data warehousing architecture that scales compute and storage dynamically in the cloud.

The selling point when HADOOP was introduced was its MapReduce component. Built on Java, MapReduce did exactly what its name suggests, and HADOOP could distribute and store huge amounts of data across the nodes of a cluster. It was a great open-source concept, quickly adopted and modified by various cloud vendors to suit their needs, and HADOOP was never able to catch up with them. The complexity involved also discouraged developers from getting information out of HADOOP in time.

At the YARN layer, Kubernetes and Cloud Foundry have started to draw away enthusiasts. YARN is an orchestration layer developed entirely in Java, limiting it largely to Java-focused tools. With the rise of microservices and the enthusiasm for Python-based data mining tools, Kubernetes proved to be an effective alternative: the developer focuses primarily on development rather than on deployment, scaling and management of the application. Cloud Foundry, in turn, provided serverless computing, deployment and scaling of applications that YARN could not.

Factoring but not Cloud

The subsequent blow came from in-memory computing. Hardware costs have fallen while in-memory capacity has grown into the terabytes. Spark, with its Scala-based programming model and ability to run almost entirely in memory, could churn through data at up to 100 times the speed of HADOOP. Riding the same wave was real-time computation: while cloud adapters and several other solutions offered real-time data processing, HADOOP was still left with batch processing.

The foreseeable future is guided by machine learning, which drives important decisions and has the attention of every organization. Machine learning on HADOOP is not as easy as with the MLaaS offerings from cloud providers such as IBM Watson, Google Cloud Machine Learning Engine and BigML, to name a few.

The End and an era of new Beginning

With the advent of cloud computing, in-memory applications and everything-as-a-service, HADOOP is losing its luster. Architectures that scale on both the compute and storage fronts have captured the market, while HADOOP was left out of consideration as belonging to a bygone era. The Cloudera and Hortonworks merger might yet revive this once-thriving big data solution. HADOOP is in decline; however, it is not dead.


Triggering Events from a Workflow


Introduction

Sooner or later you will face a requirement either to get notified while a workflow is running or to run another job based on the status of the workflow. For this, Alteryx Designer gives us the option to trigger an event from a workflow. The event can be either of the following:

  1. Send mail
  2. Run Command


Send mail – event

This option is used to send a mail at different stages of the workflow. When you click the Send Email option, a pop-up window opens. By default it carries all the information about the workflow, to which we can add our own message. To send a mail, we need to provide the SMTP configuration.

We can also add attachments, such as documentation about the workflow or details of the person who can be contacted.

The event can be triggered at any of the predefined points in the workflow listed below.

 

In the mail, we can include the assets attached to the workflow, and we can also add our own files as assets to the mail as attachments through the Assets tab.

 

Run Command – Event 

The Run Command option allows us to open any installed application, and it can also trigger another workflow to run. It opens the selected service or application and runs the given command argument against it.

You can select any program on the computer to be opened, using the browse option next to the command box.

Please refer to the blogs below for more information on using the Run Command:

  1. https://community.alteryx.com/t5/Alteryx-Server-Knowledge-Base/Scheduling-Workflows-Using-Event-Run-Command/ta-p/36583
  2. https://help.alteryx.com/2018.3/RunCommand.htm

Conclusion

You can add multiple events to a single workflow, such as sending a mail both before the start and after the completion of the workflow. We can also configure a dependent workflow to run after the parent workflow completes, directly at the workflow level during development.

 


Snowflake – The Unique Data Warehousing Solution


As we know, the BI space has grown tremendously over the last few decades, and recently Snowflake has been creating a lot of buzz. Snowflake offers unique features that make it the next big thing in the market. Built entirely for the cloud, with its multi-cluster shared data architecture, Snowflake already has a great head start and looks like a potential winner among the data warehouses on the market.

Advantage of Snowflake Architecture

Completely built on SQL, Snowflake not only stores structured data but also handles semi-structured data such as JSON, XML and Avro. Data is sorted and stored in micro-partitions in a columnar format, which accounts for fast data retrieval. Snowflake also has an impressive caching mechanism: when a new query matches a result already in the cache from the past 24 hours, the result is returned immediately without any computation, provided the underlying data hasn't changed. Snowflake has three different caches, namely the result cache, the local disk cache and the remote disk cache.

Cloning a database creates a copy of the existing database instance in seconds without disturbing the original. Cloning doesn't duplicate any data; it is a metadata-only operation, so no extra storage cost is incurred for the cloned data.

Another notable feature of Snowflake is its hybrid architecture, which combines the advantages of traditional shared-disk and shared-nothing database architectures by separating the storage layer from compute. With this, we can load data and query it at the same time without contention. The database storage layer resides in scalable cloud storage such as Amazon S3, while computation is handled by virtual warehouses, which query the databases and can be created, resized and deleted dynamically depending on resource usage. Together with the cloud's scalability and near-zero management, this makes Snowflake a prime player in the market.

Snowflake has centralized storage, which represents the data as a single source of truth. With the help of virtual warehouses, the same data can be queried by different services without any impact on each other. This is achieved using multi-cluster warehouses, where each virtual warehouse scales independently but has access to the same data.

Snowflake introduced a concept called 'Time Travel', which acts as a snapshot of the data at a point in time. By default, Time Travel covers 24 hours, and with the Enterprise Edition it can extend up to 90 days. Native to the cloud, Snowflake also takes care of fail-safe mechanisms: in the case of disk failures, it provides 7 days of fail-safe protection. Let's look at the various factors that make Snowflake a better tool than its competitors.
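As an aside, the cloning and Time Travel features described above can also be exercised programmatically, for example through the snowflake-connector-python package. The sketch below is illustrative only: the account details and the `sales` table are placeholders, not objects from this post.

```python
import snowflake.connector

# Placeholder connection details; replace with your own account, credentials and context.
conn = snowflake.connector.connect(
    account="xy12345", user="demo_user", password="********",
    warehouse="COMPUTE_WH", database="DEMO_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Zero-copy clone: a metadata-only operation, so it completes in seconds and
# incurs no extra storage cost until the clone diverges from the original.
cur.execute("CREATE TABLE sales_clone CLONE sales")

# Time Travel: query the table as it looked one hour ago (within the retention period).
cur.execute("SELECT COUNT(*) FROM sales AT (OFFSET => -3600)")
print(cur.fetchone())

cur.close()
conn.close()
```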

Security Aspect

Snowflake stores all its data only in encrypted form, with end-to-end encryption covering data at rest on disk and data in motion, so users can forget about building complex security models on their own. It supports two-factor authentication and federated authentication with single sign-on. Authorization is role-based, and predefined policies can be set up for limited access. Snowflake has SOC 2 Type 2 certification on both AWS and Azure, and if required, an additional level of encryption can be applied across all network communications.

Cost Aspect

A major advantage of Snowflake is its cost, though prices vary slightly across regions. On average, the storage cost of 1 TB of data is as little as $23 per month. Snowflake saves cost by compressing stored data at roughly a 3:1 ratio; unlike Google BigQuery, which charges for uncompressed data, Snowflake charges only for the compressed data. This compression has no impact on performance, as most operations work through metadata. The compute charge is separate and accounted on a pay-per-second basis, depending on the Snowflake credits the warehouse uses. Another advantage is zero-copy cloning, where data can be cloned while paying for the master data only once, eliminating the need for separate environments. The cost varies across Snowflake editions and with the size of the warehouse selected for computing, but overall it is a much better solution, with the added advantages of the scalability and agility of the cloud.
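A quick back-of-the-envelope check of the storage figures quoted above (about $23 per compressed TB per month and a roughly 3:1 compression ratio) might look like this; actual rates vary by region, edition and pricing model.

```python
# Illustrative storage-cost estimate using the figures cited in this post.
raw_tb = 3.0                          # uncompressed data volume in TB
compressed_tb = raw_tb / 3.0          # ~3:1 compression ratio
monthly_storage_cost = compressed_tb * 23.0   # ~$23 per compressed TB per month
print(f"{raw_tb} TB raw ≈ {compressed_tb:.1f} TB billed ≈ ${monthly_storage_cost:.0f}/month")
```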

Performance Aspect

Snowflake separates the compute layer from storage, which boosts performance: CapSpeciality, a US insurance provider, achieved reporting 200x faster and was able to query 10 years of data in less than 15 minutes. The ability to scale clusters up and down automatically, along with the option to specify minimum and maximum cluster counts, makes Snowflake unique. Native to SQL, it handles both structured and semi-structured data with no degradation in performance. We can specify the minimum and maximum number of clusters for each warehouse, and Snowflake automatically scales out for concurrency and scales up for performance without manual intervention. Snowflake offers instant elasticity, providing an unlimited number of users with consistent performance, predictable pricing and no overbuying.

Maintenance Aspect

Instead of the heavy maintenance investments other tools require, Snowflake offers near-zero administration. There is no tuning or indexing to specify, as Snowflake takes care of everything automatically, and the upfront investment of installing other tools is removed from the picture entirely. Snowflake's fail-safe mechanism also eliminates the need for separate backups.

Summary

Snowflake is climbing fast and making an impact on the analytics market with its exceptional architecture. The efficiency and flexibility it brings to the traditional data warehouse on the cloud are remarkable compared to its competitors, and these unique aspects make Snowflake a standout product to watch.

 

Know more about our Visual BI offerings and to read more related content click here.


Alteryx Weekly Challenge – Dice Game


Introduction

The Alteryx technical community is very active. In addition to the learning materials available in Alteryx Academy, it also provides various business scenarios through the "Weekly Challenge".

Anyone can solve the challenges posted in the community; this helps members apply their learning, collaborate and discover several different approaches to a problem.

In this blog, I will discuss in detail one such solution to a challenge that was posted in week 168.

Challenge Description

Consider that you have three ordinary dice. Roll all three dice and compute the score in the following manner:

  • Multiply the largest number by the second largest number
  • Add the remaining (smallest) value
  • Using this scoring method, find the score that occurs most often

GIF Source: https://community.alteryx.com/t5/Weekly-Challenge/Challenge-168-Dice-Game-Born-to-Solve/td-p/427239

To make this easier to understand, I have taken two outcomes of the three dice and illustrated the steps of the workflow below.

Step 1: Get the Dice input

Let's have one Text Input tool for each of the 3 dice, with values 1 to 6.

 

Step 2: Calculate the Probability

Generally, the Append tool is used to perform a Cartesian join, which appends the values of one input to each record of another input.

Using this, I have taken two of the text inputs to generate the combinations of two dice. The output of this Cartesian join is then taken as input to the next Append tool to generate all combinations of three dice, which is 216 (6*6*6).

 

 

Step 3: To Calculate the largest and Smallest Number

Now, the largest and smallest numbers can be calculated by –

  • Pivoting the columns
  • Sorting the output
  • Adding record id to find the largest and smallest value

We will look into the details of these steps below.

3.1  Pivot the Column

Pivoting of columns can be achieved with the Transpose tool, which rotates horizontal data onto a vertical axis. Before rotating the column values, we need a unique ID to track them after the transpose. Hence, I have used the Record ID tool to create a unique ID for each record and then performed the transpose.

 

3.2  Sort the Output

Now the data is arranged vertically. In this state, sorting must be performed to order the records from smallest to largest based on two columns: Record ID and Value.

 

3.3  Add Record ID to find the Largest, Smallest value

Again, I've added another Record ID as the first column to determine which record holds the smallest, second largest and largest value, in the following manner:

Output = Record ID modulo 3

Output Value | Record
1 | Smallest
2 | Second Largest
0 | Largest

 


 

Step 4: Find the occurrence of Score

Then we find the number of occurrences of each score by unpivoting the columns and implementing the scoring method. The details of each step are below.

4.1  Unpivot the Column

The Cross Tab tool rotates vertical data onto a horizontal axis. It needs two inputs: one column as the header and another column as the value.

 

4.2  Implement the scoring method

Once the data is arranged horizontally, we can perform the calculation using the Formula tool and then leverage the Summarize tool to count the occurrences of each score. Finally, I've used the Sample tool to fetch the topmost record.

 

As displayed above, there are 216 possible outcomes when three dice are rolled, and the most frequent score occurs 13 times out of these 216 outcomes.
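If you want to sanity-check this result outside Alteryx, the same scoring rule can be enumerated in a few lines of Python; this is an independent cross-check, not part of the workflow itself.

```python
from collections import Counter
from itertools import product

# Enumerate all 6*6*6 = 216 outcomes of rolling three dice and apply the
# scoring rule from the challenge: largest * second largest + smallest.
scores = Counter()
for roll in product(range(1, 7), repeat=3):
    smallest, middle, largest = sorted(roll)
    scores[largest * middle + smallest] += 1

most_common_score, occurrences = scores.most_common(1)[0]
print(f"{sum(scores.values())} outcomes; score {most_common_score} "
      f"occurs most often ({occurrences} times)")
```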

 

The final workflow looks like the one below.

 

In subsequent blogs, we will demonstrate similar features and usages of Alteryx to resolve other weekly challenges. I hope you can leverage some of these options and capabilities to address your enterprise data blending and data wrangling needs.

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

https://community.alteryx.com/t5/Weekly-Challenge/bd-p/weeklychallenge

 

 



Ad Hoc Analysis using Data Utility in Lumira Designer


Performance optimization and a near self-service experience are features customers are always looking for. Reusing data sources across an application and dynamic slice-and-dice facilities are available only to a primitive extent in SAP Lumira Designer. The Data Utility feature in VBX expands this further, providing advanced runtime data-source manipulation capabilities while minimizing the number of data-source instances in the dashboard. Let's take a deep dive into how VBX's Data Utility can help you overcome performance bottlenecks.

Reutilization of data sources

In Lumira Designer, there is a data selection feature for components that allows the developer to restrict or filter the data shown at runtime. This feature allows reusing the same data source without creating multiple copies. Unfortunately, data selection works only on measures, not on dimensions (although users can select individual members of a dimension).

There is a Data Utility feature in all the VBX charts which allows dimension and measure selection, changing the order of dimensions, measure aggregation and the suppression type. This feature is also available as a discrete component called Data Utility Source under the custom data source category. When creating large dashboards, if similar data utility rules are applied in various parts of the dashboard, it can be tedious to recreate the same set of rules for many individual components; data utility sources can be used to store these sets of rules. Creating a data utility source is akin to creating virtual copies of the "source" data source and selecting the required dimensions and measures that should appear in the charts. These data sources can then be assigned to any component like a typical data source.

As shown in the image below, we are using two components that are assigned to two virtual copies of our data source, while the main data source (DS_4) is assigned to the Data Utility component.

Once the data source has been assigned to the Data Utility component, data selection can be done separately for each virtual copy, including dimensions. This reduces runtime and shows only the information required in the initial view.

Data Utility provides an edge over its native counterpart by also allowing users to select dimensions.

Near Self-service experience

Allowing the user to change the granularity of the data set and select how the data is presented at runtime gives the user a near self-service experience.
The navigation panel in Lumira Designer allows the user to change the displayed dimensions, their order and the measures in the data source.
However, that change impacts the data source itself, so all visualizations bound to it are affected. With data utility data sources, by contrast, each chart uses its own virtual data source, and the dimensions and measures in all these virtual data sources can be controlled from a single cockpit component called "Data Utility", which can be added to the application layout just like the navigation panel. This allows users to change the data utility rules defined in the virtual data sources at runtime.

As seen in the GIF above, two virtual copies (DS_REGION and DS_STORE) have been created and assigned to a chart and a table respectively. The component renders in the user-defined layout and provides an Initial View-like appearance at runtime, with dimensions and measures shown against column and row buttons labelled "C" and "R", where C represents Columns and R represents Rows. Dimensions can be reassigned to rows or columns with a single click on "R" or "C". Measures, however, cannot be reassigned to rows or columns; they stay where they were configured at design time. Measures can only be added or removed, and at least one measure must remain. Members of both dimensions and measures can be deselected at any time.

The resulting dashboard allows the user to change granularity on the fly to get more insights. This can be extended further, letting users select the data source from available connections at runtime using the backend connection technical component.

Performance

Let us consider two scenarios where we need to view the key figures across various dimensions in four different visualizations with and without the Data Utility Component.
As the profiling statistics above show, the data source time on startup without Data Utility is 5.3 seconds.

With Data Utility:
Using the Data Utility component cuts the data source time on startup by more than half, to 2.4 seconds.

DISCLAIMER

Information provided in this document is intended to be used only as a learning aid. While Visual BI Solutions has used reasonable care in compiling this document it does not warrant that this information is error-free. Visual BI Solutions assumes no liability whatsoever for any damages incurred by usage of this documentation.
This is intended to be a living document with changes to be made over time.

 

Know more about our Visual BI offerings and to read more related content click here.

 


A Guide To Convert HANA Calculation Views to CDS Views


Recent guidance on data modeling encourages users to implement or model as much as possible at the database level. In SAP, we have calculation views and CDS views, which use the push-down mechanism to process data at the database level rather than in the application layer.

Use Case for Migrating HANA Calculation Views to CDS

To consume HANA calculation views in reporting tools such as Analysis for Office or SAP Analytics Cloud, users must be defined at the database level. From a security standpoint, many customers prefer not to create users at the database level. Customers using S/4HANA for reporting can migrate from calculation views to ABAP CDS, which moves authorization maintenance to SAP NetWeaver Gateway and eliminates authorizations at the DB level.

With the introduction of S/4HANA and reporting capabilities within the ERP system, SAP created standard CDS-view content on the database tables, readily available to end users for reporting purposes. With reporting enabled on the ERP system, users can analyze live data as soon as it is posted.

Checklist for the Conversion

The following are the basic parameters to consider for the conversion:

  1. Naming Standards
  2. Types of calculation view (Master or Transaction)
  3. View types (Private, Reuse and Reporting (Queries))
  4. Client Handling
  5. Variables and parameters
  6. Aggregation types & Assigning UOM/Currency Code

Naming Standards

1) File suffix
The file suffix differs according to SAP HANA XS version:

  • XS classic: hdbdd, for example, MyModel.hdbdd.
  • XS advanced: hdbcds, for example, MyModel.hdbcds.

2) Permitted characters

CDS object and package names can include the following characters:

  • Lowercase or uppercase letters (a-z, A-Z) and the underscore character (_)
  • Digits (0-9)

3) Forbidden characters
The following restrictions apply to the characters you can use (and their position) in the name of a CDS document or a package:

  • You cannot use either the hyphen (-) or the dot (.) in the name of a CDS document.
  • You cannot use a digit (0-9) as the first character of the name of either a CDS document or a package, for example, 2CDSobjectname.hdbdd (XS classic) or acme.com.1package.hdbcds (XS advanced).
  • The CDS parser does not recognize either CDS document names or package names that consist exclusively of digits, for example, 1234.hdbdd (XS classic) or 999.hdbcds (XS advanced).

VDM/Analytic models Types

When converting HANA calculation views to CDS views, choosing the right virtual/analytic data model plays an important role in how the data is accessed.

Business data of an SAP system is exposed to consumers through the VDM as understandable, relatable, reusable, executable, stable and compatible models; hence the data model type of a view can be defined as follows:

HANA Calculation View | CDS View | Description
Private views | @VDM.viewType: #BASIC | Created on top of DDIC tables/views, with no redundancies
Reuse views | @VDM.viewType: #COMPOSITE | Derived from or composed of BASIC views; includes joins/associations, calculated fields, etc.
Query views | @VDM.viewType: #CONSUMPTION | Expose data to different analytical tools; can be built on top of BASIC or COMPOSITE views

Data aggregation, slicing and dicing, and multi-dimensional data consumption are handled by the Analytic Manager. The data category of a view can be defined as follows, so that the Analytic Manager knows how to interpret the data.

HANA Calculation View | CDS View | Description
Dimension-based calculation view / attribute view | @Analytics.dataCategory: #DIMENSION | Represents master data and can be used for replication
Star join-based calculation view / analytic view | @Analytics.dataCategory: #FACT | Centre of the star schema; contains only measures and is intended for replication, hence not joined with master data
Cube-based calculation view | @Analytics.dataCategory: #CUBE | Factual data with redundancies, as it includes master data joins; data is replicated from facts and queries can be built on top of it
Not applicable | @Analytics.dataCategory: #AGGREGATIONLEVEL | Write-back functionality consuming a CUBE view

Client Handling

Client dependency of a view is determined using the @ClientHandling.type annotation. The default value is @ClientHandling.type: #INHERITED.

HANA Calculation View | Annotation | Description
Session client | #CLIENT_DEPENDENT | Client-specific
Cross client | #CLIENT_INDEPENDENT | Cross-client
Not applicable | #INHERITED | Depends on the data sources used (see below)

With #INHERITED, the behaviour depends on the data sources used: the view is client-specific if at least one of the data sources is client-specific, and cross-client if none of them is.

A client handling algorithm can also be specified; it determines how the client condition is applied when joining the data sources in the view. The default value is @ClientHandling.algorithm: #AUTOMATED.

Annotation | Can be grouped with | Cannot be grouped with
#AUTOMATED | #INHERITED, #CLIENT_DEPENDENT | #CLIENT_INDEPENDENT
#SESSION_VARIABLE | #INHERITED, #CLIENT_DEPENDENT | #CLIENT_INDEPENDENT
#NONE | #CLIENT_INDEPENDENT | #INHERITED, #CLIENT_DEPENDENT

Aggregation Types & UOM

Aggregation in calculation views can be defined in two ways:

  1. Native SQL
  2. CDS Incorporated

The SQL approach supports most of the aggregation types defined in the SQL standard. The supported functions are SUM, MIN, MAX, AVG, COUNT, VAR and STDDEV.

In the CDS-specific approach, semantics are defined to specify what type of aggregation should happen. The supported aggregation types are shown in the image below.

Aggregations are defined with the annotation @DefaultAggregation: #<aggregation type> (as mentioned in the image above).

We can also declare a characteristic as a unit of measure or currency code with @Semantics.unitOfMeasure: true or @Semantics.currencyCode: true.

A UOM/currency can be applied to a measure to show meaningful data at the reporting layer using:

@Semantics.quantity.unitOfMeasure: 'Quantity Field Name'

@Semantics.amount.currencyCode: 'Currency Field Name'

Parameters and Variables

Parameters in CDS views help you filter data at the base node; once a parameter is declared, it is mandatory. A CDS view with parameters is declared after the view name, with syntax as follows.

define view /*ViewName*/
  with parameters
    parameter1 : abap.char( 10 ),
    parameter2 : abap.dats
  as select from ...

Like the 'Manage Mappings' option in HANA calculation views, which pushes input parameters from the top-level calculation view down to the base level, CDS views also allow parameters to be pushed down using the syntax shown below.

Like variables in HANA calculation views, CDS views allow the creation of variables at the query (consumption) level, using the annotation @VDM.viewType with the value #CONSUMPTION, as shown below.

The possible settings for a variable are as follows:

Type | Values
selectionType | HIERARCHY_NODE, INTERVAL, RANGE, SINGLE
defaultValue | Any constant value
mandatory | Boolean
multipleSelections | Boolean

 

 

REFERENCES

https://help.sap.com/viewer/cc0c305d2fab47bd808adcad3ca7ee9d/7.5.9/en-US/efe9c80fc6ba4db692e08340c9151a17.html

https://help.sap.com/doc/abapdocu_752_index_htm/7.52/en-US/abencds_client_handling.htm

 

 

Know more about our Visual BI offerings and to read more related content click here.


SAP Analytics Cloud – Overview of Planning capabilities


SAP Analytics Cloud (SAC) is SAP's cloud-based analytical tool with a wide range of capabilities. SAP brings its extensive enterprise analytics experience to bear in SAC, an enterprise-level cloud-based analytics platform that caters to multiple user groups with a simple UI and a gentle learning curve.

 

In this series of blogs, we will look at the planning capabilities of SAC in particular. The current BI market has a plethora of planning tools, both on-premises and cloud-based, but in most cases they are built for specific planning capabilities (as expected) and do not cater to analytical needs. This often leads to different tools being used for analytics and planning, and with that, multiple data sets and the usual interoperability issues.

This works well when a lot of multivariate planning needs to be done; the planning systems are often set up to perform extensive disaggregation and aggregation at various levels, driven by a great deal of configuration. This also makes such tools hard to use for the high-level planner who just wants to plan at a higher level and simulate business outcomes rather than develop a fully fledged plan.

For this purpose, we can classify the planning functions at a high level with some examples.

  • A financial planner who has to forecast the annual plan for the company, more often at an SKU level or Business Line level, which involves variables like Production capacity, material cost, logistics cost, etc.
  • A high-level planner who wants to simulate based on macro planning variables

 

This brings us to another segue: there are two kinds of levers that can be used for planning – macro and micro. A macro variable is something that affects the entire business, such as exchange rates, trade tariffs, cost of logistics or balancing plant output. These are high-level variables with a deep impact on planning across multiple levels. A micro variable is something specific to the enterprise or plant, like the cost of utilities, the cost of logistics to move goods to a warehouse, trucking costs or labor costs.

In a rapidly changing business environment, it is very important to understand the impact of these macro variables, because they can change quickly and business plans have to change with them. This involves questions like: how will my profitability change if exchange rates go up by 5%? If there is a trade tariff of 20%, what should my manufacturing mix be?

Unfortunately, in most cases these variables are baked so deeply into the planning logic that simulating any one of them would still require many changes to the planning model. As a result, the business often relies on Excel sheets and rules of thumb to simulate, rather than trying to change the planning model, since these are mostly one-off scenarios.
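To make the macro-lever idea concrete, here is a deliberately simplified what-if calculation of the kind of question posed above. All figures are invented for illustration; SAC models this kind of simulation formally.

```python
# Toy what-if on a single macro lever: revenue earned in a foreign currency,
# costs incurred in local currency. All numbers are made up for illustration.
revenue_fc = 1_000_000        # revenue in foreign-currency units
fx_rate = 0.90                # local currency per foreign-currency unit
costs_local = 700_000         # costs in local currency

def profit(rate: float) -> float:
    return revenue_fc * rate - costs_local

base = profit(fx_rate)
shocked = profit(fx_rate * 1.05)   # exchange rate up 5%
print(f"Base profit: {base:,.0f}; after +5% FX move: {shocked:,.0f} "
      f"({(shocked - base) / base:+.1%})")
```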

For this reason, we have SAC Planning, which gives you a mix of both: you can do high-level planning and simulate your forecasts, and you also have the capability to push the disaggregation down if you are using SAP Business Planning and Consolidation (BPC).

In this case, SAC allows for two options:

  1. Import data from SAP BPC (NetWeaver and Microsoft) into SAC and report on it by mapping the dimensions, then use SAC for planning instead of BPC and write data back to BPC from SAC.
  2. Run predictive analysis on BPC data locally within SAC to identify trends and simulations, and write back if necessary.

A further advantage is that once the BPC data is imported into SAC, you can combine it with other data models to develop your plan further.

 

These use cases are referred to as hybrid planning, and we will look into them in a subsequent blog. SAP Analytics Cloud combines Business Intelligence (BI), planning and predictive analytics in a single solution. Alongside its many other features, its planning capabilities are referred to as SAC Planning. The user interface is user-friendly, with a very low learning curve for users to get acquainted with it.

SAP, being an enterprise-focused company, has built SAP Analytics Cloud as a platform for all your analytical needs and not just as a visualization platform. As a result, it supports a multitude of functions such as planning, predictive analytics, and more.

 

Reach out to our team here to know more about SAP Analytics Cloud and other offerings from Visual BI Solutions.

 

 


The post SAP Analytics Cloud – Overview of Planning capabilities appeared first on Visual BI Solutions.

Data Ingestion Techniques in Snowflake


Snowflake is a data warehouse built exclusively for the cloud. Unlike traditional shared-disk and shared-nothing architecture, Snowflake has a multi-cluster shared data architecture that is faster, easier to use and highly scalable. In this blog, we are going to cover the various data ingestion techniques in Snowflake.

Data Ingestion using Web Interface

The most straightforward approach to data ingestion into Snowflake is through the Snowflake Web Interface. However, the wizard supports loading only a small number of files of limited size (up to 50 MB).

Once you have created the table, click on the table to open the table details page and click on the Load Table option. When you select the Load Table option, the Load Data wizard opens which will load the file into your table.

1. Select the desired warehouse that is intended for data loading purposes.
data-ingestion-techniques-snowflake-1

 

2. Once you have selected the warehouse, you can either choose to load files from the local machine or if you have Amazon S3 storage in your landscape, you can choose to load it from there.
data-ingestion-techniques-snowflake-2

 

3. Select a file format for your data files from the dropdown list. You can also create a new named file format.
data-ingestion-techniques-snowflake-3

 

4. Once the file format is chosen, specify the load options in case an error occurs. Finally, click the Load button. Snowflake then loads the data file into the specified table via the selected warehouse.
data-ingestion-techniques-snowflake-4

Loading data from the internal stage using the COPY command

To load a bulk dataset from our local machine to Snowflake, SnowSQL should be used to upload these data files to one of the following Snowflake stages:

  1. Named Internal Stage
  2. Internal Stage for the specified table
  3. Internal Stage for the current user

Once the files are staged, the data in them can be loaded into the table using the COPY INTO command. To use SnowSQL, download and install it from the Snowflake web user interface.

  • Once installed, open a command-line window and type snowsql -v to check the installed version.
    data-ingestion-techniques-snowflake-5

 

  • Connect to SnowSQL by typing in your account name and username. You will be prompted for a password; once provided, you will be connected to your Snowflake account.
    data-ingestion-techniques-snowflake-6

 

  • Once connected, set the database, schema, and warehouse used for the processing.
    data-ingestion-techniques-snowflake-7

 

  • Use the PUT command to load the data files from the local machine to a Snowflake stage (internal, user, or table stage). Here we have created a named stage “INTERNAL_STAGE” and loaded the data file into that stage.
    data-ingestion-techniques-snowflake-8

 

  • Use the COPY command to populate the table with the contents of the data file from the staging area (a scripted equivalent of these steps is sketched after this list).
    data-ingestion-techniques-snowflake-9
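For reference, here is a minimal Python sketch of the same flow using the Snowflake Python connector instead of the SnowSQL CLI. The connection details, local file path, stage, table, and file format are placeholders, not values from this example.

import snowflake.connector

# Placeholder connection details
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)
cur = conn.cursor()

# Create a named internal stage (equivalent to INTERNAL_STAGE above)
cur.execute("CREATE STAGE IF NOT EXISTS INTERNAL_STAGE")

# PUT uploads the local file to the stage, just like PUT in SnowSQL
cur.execute("PUT file://C:/data/sales.csv @INTERNAL_STAGE AUTO_COMPRESS=TRUE")

# COPY INTO loads the staged file into the target table
cur.execute("""
    COPY INTO MYTABLE
    FROM @INTERNAL_STAGE
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
conn.close()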

 

Loading data files staged in Amazon S3, Microsoft Azure, etc. to Snowflake

If you already have an Amazon Web Services (AWS) account or a Microsoft Azure account in your landscape, then you can use S3 buckets or Azure containers to store and manage the data files. You can bring these data files into Snowflake either by accessing the storage directly and loading into Snowflake tables, or by staging the data files in an external stage and accessing the external stage instead. You can create an external stage either via the interface or by creating the stage in a worksheet. In the case of Amazon S3, you need an AWS key and a secret key to access the bucket; in the case of Azure, a Shared Access Signature (SAS) token needs to be generated to access the Azure container. For example:

CREATE STAGE "SF_TRYOUTS"."COE_TRYOUTS".S3_STAGE URL = 's3://test' CREDENTIALS = (AWS_KEY_ID = 'raghavia' AWS_SECRET_KEY = '********');

Copy the data files into your table using the COPY command by:

  • Accessing the S3 bucket directly:
    COPY INTO MYTABLE FROM 's3://test' CREDENTIALS = (AWS_KEY_ID = 'raghavia' AWS_SECRET_KEY = '********') FILE_FORMAT = (FORMAT_NAME = my_csv_format);
  • Accessing the external stage created above:
    COPY INTO MYTABLE FROM @S3_STAGE FILE_FORMAT = (FORMAT_NAME = my_csv_format) PATTERN = '.*.csv';

Loading data from various data sources into Snowflake

Having different data sources across the organization, each with different requirements, can pose a challenge. Data integration involves combining data from different sources and enabling users to query and manipulate the data from a single interface to derive analytics and statistics. Snowflake can operate with a variety of data integration tools such as Alooma, Fivetran, Stitch, Matillion, etc.
data-ingestion-techniques-snowflake-10

 

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 


The post Data Ingestion Techniques in Snowflake appeared first on Visual BI Solutions.

Deliver a powerful data story with Power BI Visuals Session at Microsoft Business Application Summit, Georgia, World Congress Center Atlanta.


“Deliver a Powerful Data Story with Power BI Visuals”, a session by Ranin Salameh and Gopal Krishnamurthy held on June 10 and 11 at the Microsoft Business Applications Summit 2019, Georgia World Congress Center, Atlanta, received overwhelming participation.

The session was presented by Gopal Krishnamurthy, CEO, Visual BI Solutions and Ranin Salameh, Product Manager at Microsoft.

Watch more about Visual BI’s experience with Power BI platform and a demonstration of valQ visuals for dynamic planning and simulations by Gopal Krishnamurthy, CEO Visual BI Solutions  & Ranin Salameh, Product Manager, Microsoft Power BI here.

microsoft-business-applications-summit-2019-visualbi-booth-33

 

ABOUT THE EVENT

The Microsoft Business Applications Summit 2019 event in Atlanta, had a great line up of sessions related to PowerApps, Microsoft Flow, Common Data Service and related technologies. Click here to view the session catalog.


The post Deliver a powerful data story with Power BI Visuals Session at Microsoft Business Application Summit, Georgia, World Congress Center Atlanta. appeared first on Visual BI Solutions.

Visual BI at Microsoft Inspire 2019 Event


Gopal Krishnamurthy shared his experience and the success story of valQ crossing 10k+ downloads with colossal support from Power BI at the Microsoft Inspire event, held July 14-18 in Las Vegas, NV. Here is a snippet of stories from the best Power BI partners, combined into a short recording showcased at the beginning of the “Microsoft Power BI opportunity and roadmap” session. The session was led by Microsoft General Manager Arun Ulag. Other partners also joined the event and shared their experiences in the msinspire film – Rob Collie, Darren Goonawardana, Brian Knight, Liz Hamilton, Darrin Lange, PMP, and Karolina Kocalevski.

microsoft-inspire-microsoft-ready-2019-visualbi-booth-307-event-page

 

About Microsoft Inspire

This co-location of Microsoft Inspire and Microsoft Ready will create multiple opportunities for partners and field teams to connect, learn, and collaborate on solutions that will accelerate the digital transformation and success of Microsoft customers. These shared experiences will add another dimension to the content-rich learning experience for which Microsoft Inspire is traditionally known. This year’s Microsoft Inspire will focus on three key ingredients to enable building and growing profitable businesses. Click here for more information about the event.


The post Visual BI at Microsoft Inspire 2019 Event appeared first on Visual BI Solutions.

Why the Future of Data Warehousing is Cloud?


Undoubtedly, Cloud is the future. Businesses of all sizes, industries, and geographies are transforming into cloud-based solutions. Cloud adoption has drastically improved and it has become the easiest way to run any business. Data Warehousing is no exception when it comes to cloud adoption. There are multiple benefits that attract organizations to migrate from an on-premise to a cloud-based data warehousing solution.

Installation & Maintenance

Hardware procurement, operating system installation, patching, and database deployment are things of the past. The platform is readily available, and with minimal IT support and little impact on productivity, systems can be deployed and made operational.

Easy Licensing Model

Subscription-based licensing and pay-for-what-you-use are some of the licensing options. The initial capital required is much lower, and the return on investment comes quicker than ever.

Scalability

Scaling up or down is not just easier but is also cheaper and faster.

Availability & Security

Highly encrypted data storage, asynchronous data backup, and disaster recovery have never been easier. Very limited downtime ensures a high degree of availability.

These are some of the pressing issues in any legacy on-premise data warehousing solution. They never seemed like issues until cloud-based solutions started making their way into the market and the shortcomings of on-premise solutions became clearly evident. It makes even more sense now for organizations to explore and migrate to cloud-based data warehouses. There are several players in the cloud data warehousing space; a few of them are:

1. Snowflake

Snowflake is a cloud-based software-as-a-service (SaaS) offering that provides cloud-based data storage and analytic services. Snowflake can be hosted on cloud platforms such as Amazon Web Services or Microsoft Azure. One of the highlights of Snowflake is that it can handle high read concurrency by isolating compute resources from storage resources. It ensures greater elasticity and maximum performance with its unique multi-cluster shared data architecture, adding compute resources during peak load periods and scaling them down when loads subside. It has unique features like zero-copy clone and Time Travel. Snowflake can handle both structured and semi-structured data like AVRO, JSON, etc. Storage can be scaled independently of compute, so data loading and unloading can be done without worrying about running queries and workloads. In Snowflake, compute and storage are billed separately; storage charges are based on compressed terabytes per month and compute charges are billed on a pay-per-second model. Snowflake provides security by encrypting all the data that is stored on disk.
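To make “zero-copy clone” and “Time Travel” concrete, here is a minimal sketch using the Snowflake Python connector; the connection details and the SALES table are placeholders, not objects referenced in this post.

import snowflake.connector

# Placeholder connection details
conn = snowflake.connector.connect(account="<account>", user="<user>", password="<password>",
                                    warehouse="<warehouse>", database="<database>", schema="<schema>")
cur = conn.cursor()

# Zero-copy clone: creates a writable copy of the table without duplicating storage
cur.execute("CREATE TABLE SALES_DEV CLONE SALES")

# Time Travel: query the table as it looked one hour ago
cur.execute("SELECT COUNT(*) FROM SALES AT(OFFSET => -60*60)")
print(cur.fetchone())
conn.close()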

2. Google’s BigQuery

BigQuery by Google is a serverless, highly scalable, cost-effective cloud data warehouse with an in-memory BI Engine and with built-in machine learning capabilities.  It is powered by the Google Cloud Platform, which is a one-stop-shop for all Google Cloud Products.  BigQuery can source data from various cloud storage-based source systems. As per the requirements, either you can bring the data into BigQuery or you can access the external data remotely through a powerful federated query option without data replication.  BigQuery’s high-speed Streaming Insertion API provides support for real-time analytics with BI and AI capabilities. By having separate storage and computation layers, you can choose the best-fit storage and processing units for your business. Also, you have an option to choose between on-demand and flat-rate pricing models.  It can integrate very well with various leading BI Data Reporting/Visualization tools like Tableau, Qlik, Looker, SAP Analytics Cloud, etc.

3. Amazon’s Redshift

Amazon Redshift is a data warehousing offering available as part of Amazon Web Services (AWS). It offers an enterprise-level relational database system with a columnar storage structure that can store petabytes of data. With its massively parallel processing and read-optimization techniques, it can deliver analytics with excellent performance. Modeling across structured and semi-structured data file systems is possible in Redshift. Its analytics tool, Amazon QuickSight, is loaded with machine learning capabilities that give you diagnostic analytics and predictions. You are not limited to Amazon products either; with ODBC and JDBC drivers you can connect Redshift to different reporting tools like Tableau, Spotfire, etc. It works on a pay-for-what-you-use model and you can easily scale the storage up or down based on requirements. You can even load live streaming data with Amazon Kinesis Data Firehose and get real-time analytics from it.

4. SAP Data Warehouse Cloud

SAP, of late, is also getting into the race and it has revealed its own enterprise-level solution called SAP Data Warehouse Cloud. It is powered by SAP HANA Cloud Services which helps to integrate all the heterogeneous data in one place. You can seamlessly integrate on-premise as well as cloud data sources. Fueled by SAP HANA in-memory technology, it is expected to give unmatched performance benefits. Additionally, it is expected to provide real-time business benefits with out-of-the-box advanced analytics options using SAP Analytics Cloud. It has a concept called “Spaces” that helps to build a logical area for each line of business and lets you manage the storage and computation for each space.  You can leverage predictive, planning and machine learning capabilities that are in-built into the solution. It is also loaded with prebuilt templates and business content that are ready to be consumed. It helps users to get instant benefits and businesses will run better on SAP Data Warehouse Cloud.

The very first look at the solution through videos and blogs has created a real buzz in the SAP installed base. The product is under active development and a beta release is expected soon. Register for the beta program here.

 

For more information and insights on SAP Data Warehouse Cloud click here.


The post Why the Future of Data Warehousing is Cloud? appeared first on Visual BI Solutions.


SAP Data Warehouse Cloud – The First Look


SAP Data Warehouse Cloud is an enterprise-class cloud data warehousing solution from SAP. It could be your one-stop solution for all your business needs: in addition to data warehousing and data integration capabilities, you get real-time business insights through the built-in advanced analytics of SAP Analytics Cloud. Powered by SAP HANA Cloud Services, it integrates data from all types of heterogeneous sources into one place. The solution caters to the needs of businesses of all kinds and sizes.

The three key layers of a Data Warehousing solution are carefully crafted into the SAP Data Warehouse Cloud solution to provide numerous benefits.
sap-data-warehouse-cloud-the first-look

Data Integration

Organizations find it critical to tap business benefits by deriving smart insights and arriving at critical decisions from the enormous amount of data that gets generated every day. But with the data distributed across multiple systems in diverse formats, integration becomes troublesome.

With the introduction of SAP Data Warehouse Cloud powered by SAP HANA Cloud, integration of data from different landscapes has become easier. It offers the flexibility of managing and combining data from multi-cloud, hybrid (cloud and on-premise), and on-premise environments. Decision-making capabilities are further enhanced by analyzing not only structured data but also unstructured data. You can harness the power of heterogeneous data (structured, semi-structured, and unstructured) across multiple systems using SAP Data Warehouse Cloud. It also gives 3rd-party ETL tools open access to import any data from any source into SAP Data Warehouse Cloud.

Data Modeling

SAP Data Warehouse Cloud has a new concept called Spaces, a logical area that can be created for each line of business inside an organization. Here, the semantics are built with a natural-language index to provide KPI names specific to each line of business (Space). With the data and semantics readily available, business users can create their own reports based on the available KPIs and make decisions faster and smarter.

You can model in SAP Data Warehouse Cloud using two methods: SQL and graphical. Graphical modeling lets business users create their own models, whereas SQL modeling typically requires IT involvement. SAP Data Warehouse Cloud also comes with pre-installed standard business content across different modules right out of the box.

Analytics

SAP Data Warehouse Cloud is tightly coupled with an embedded analytics platform, SAP Analytics Cloud. It also comes pre-loaded with industry-specific templates that are ready for consumption. Other 3rd-party tools can also consume data from SAP Data Warehouse Cloud via built-in secured channels.

With the integration of SAP Analytics Cloud and its planning, predictive, and forecasting capabilities, businesses can easily adopt hassle-free planning and data simulation across KPIs and publish the planned data back into the system with less involvement from IT. Advanced analytics is also possible via machine learning and Python integration, which lets you forecast KPIs. Powerful, key insights are derived on real-time data with the use of SAP Data Warehouse Cloud.

Thus, SAP Data Warehouse Cloud provides an end-to-end cloud solution with quick turnaround time and a reduced total cost of ownership. All eyes in the data warehousing world are on the upcoming release of SAP Data Warehouse Cloud. You can register for the beta program here – https://saphanacloudservices.com/data-warehouse-cloud/

For more information and insights on SAP Data Warehouse Cloud click here.


The post SAP Data Warehouse Cloud – The First Look appeared first on Visual BI Solutions.

Fetching Office 365 users using Graph API


Microsoft Graph is a gateway to the data and insights in Office 365. It exposes APIs and libraries to access data from Office 365 services, Azure Active Directory, and more, connecting these services through relationships from which we can draw valuable insights. Microsoft Graph is a RESTful web API that enables you to access Microsoft cloud service resources: after you register your app and get authentication tokens for a user or service, you can make requests to the Microsoft Graph API. In this blog, we will briefly cover how to connect to the Graph API and fetch the Office 365 users list from Azure Data Factory, with examples.

Building a pipeline to fetch O365 Users

1. Create a new pipeline and drag in a Web activity, which will fetch the access token needed to authorize calls to the Graph API.
2. Configure the Web activity by providing the following details:

  • URL
  • Method
  • Headers
  • Body

fetching-office-365-users-using-graph-api

 

3. Drag in a Copy activity and configure the source and sink datasets with the necessary parameters. The source dataset will be a REST dataset with one parameter for holding the access token.
fetching-office-365-users-using-graph-api

4. Provide the expression for the parameter that will hold the access token generated by the Web activity.
fetching-office-365-users-using-graph-api

The sink dataset will be an Azure Data Lake file (here, a JSON-formatted file is used as the sink).

5. Validate and trigger the pipeline.

Following the above steps, your pipeline looks like the one in the below image.
fetching-office-365-users-using-graph-api

 

fetching-office-365-users-using-graph-api

The output file will look like below
fetching-office-365-users-using-graph-api

Thus, we have a list of Office 365 users and their details in the Users.json file, fetched from the Graph API using Azure Data Factory. For reference, the equivalent direct REST calls are sketched below.
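This minimal Python sketch shows what the pipeline does under the hood: request a token via the OAuth 2.0 client-credentials flow, then call the Graph /users endpoint. The tenant ID, client ID, and client secret are placeholders for your own Azure AD app registration.

import requests

tenant_id = "<tenant-id>"          # placeholder
client_id = "<app-client-id>"      # placeholder
client_secret = "<app-secret>"     # placeholder

# 1. Get an access token via the OAuth 2.0 client-credentials flow
token_resp = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    },
)
access_token = token_resp.json()["access_token"]

# 2. Call the Graph /users endpoint with the bearer token
users_resp = requests.get(
    "https://graph.microsoft.com/v1.0/users",
    headers={"Authorization": f"Bearer {access_token}"},
)
print(users_resp.json()["value"])  # the same user list that lands in Users.json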

Read more blogs on Microsoft here, and click here to know more about Visual BI Solutions’ Self-Service offerings.


The post Fetching Office 365 users using Graph API appeared first on Visual BI Solutions.

Data Extraction from SAP ECC using Delta token – Azure Data Factory


Very often, we have a data warehouse into which we would like to integrate data from many sources. Like any OLTP system, SAP ERP Central Component (ECC) has its own challenges, which restrict us from hooking on to the underlying data directly. One of the recommended ways of exposing the data is through an OData service. In this blog, we will discuss how to consume the OData service and handle deltas in Azure Data Factory.

For this use case, we exposed ECC data through SAP’s Operational Data Provisioning (ODP) service, which in turn provides an OData endpoint. The OData service returns a delta token at the end of each request, which is used later to fetch only the delta records, and so on. We also break the response into multiple pages by enabling pagination. Follow this link to understand the usage syntax.

Azure Data Factory is an on-demand data orchestration tool with native connectors to many storage and transformation services. To deal with OData, we could use the OData connector, the HTTP connector, or the REST connector. The OData connector parses the response automatically but does not allow passing additional headers or enabling pagination; similarly, the HTTP connector does not support pagination. The REST connector supports both. Based on these considerations, we recommend using the REST connector to extract ECC data.
data-extraction-sap-ecc-using-delta-token-azure-data-factory-1

 

How do we read the data?

The base URL of SAP’s OData service would be as below:

http://<Host>:<Port>/sap/opu/odata/SAP/<Service Name>/<Entity Set Name>/<Query Options>

For the initial load, we invoke this URL with the “odata.track-changes” preference header. This returns all the records in the entity along with a delta URL, which is unique to each user. For subsequent reads, we use this delta URL to get only the delta records. However, this would mean storing the response in a file and reading it again to extract the delta URL. To avoid this extra read, we instead use the URL below, which returns the history of all delta tokens generated by the user.

http://<Host>:<Port>/sap/opu/odata/sap/<Service-Name>/<Delta Set Name>

This API returns the delta URLs in chronological order, and we use the latest delta token for each run. The same link explains the usage of these endpoints.

Alternatively, you could use query parameters to get the delta, if the table has an “updated date” column.
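As an illustration, here is a minimal Python sketch of the reads described above; the header name follows the “odata.track-changes” convention mentioned earlier. The host, port, service, entity set, and credentials are placeholders, and the exact shape of the delta-link entity set depends on the ODP service definition.

import requests

base = "http://<Host>:<Port>/sap/opu/odata/SAP/<ServiceName>"
auth = ("<user>", "<password>")   # placeholder credentials

# Initial load: ask the ODP OData service to track changes so it returns a delta link
initial = requests.get(
    f"{base}/<EntitySetName>",
    headers={"Prefer": "odata.track-changes", "Accept": "application/json"},
    auth=auth,
)
payload = initial.json()["d"]     # OData v2 JSON envelope
all_rows = payload["results"]     # full data set on the first run

# Subsequent runs: read the delta-link history entity set and call the newest delta URL
history = requests.get(
    f"{base}/<DeltaSetName>",
    headers={"Accept": "application/json"},
    auth=auth,
).json()["d"]["results"]
# Field names in the history payload vary by service; pick the most recent delta URL from it.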

This is how the data orchestration process flow would look:
data-extraction-sap-ecc-using-delta-token-azure-data-factory-2

 

Reach out to us for more details on the delta implementation or for questions on exposing SAP ECC via an OData service.


The post Data Extraction from SAP ECC using Delta token – Azure Data Factory appeared first on Visual BI Solutions.

AD Credentials Passthrough in Azure Databricks for implementing RBAC


Azure Databricks is a Unified Analytics Platform built by the creators of Apache Spark. Databricks is the first Unified Analytics Platform that can handle all your analytical needs, from ETL to training AI models, and it is committed to security, taking a security-first approach while building the product. This blog features one such new security capability provided by Databricks.

There are two methods to connect to Azure Data Lake,

  1. API Method
  2. Mount Method

To connect through the API method or the mount method, a service principal ID and key are provided. The security concern with connecting through a service principal key is that anyone who has access to the Databricks instance will have access to all the files in the Data Lake, even if they do not individually have access to those files.

Now that is a security concern, how do we solve it? That’s where AD Credentials Passthrough comes into the picture. Say, we have marketing and finance related files in the Data Lake and we do not want marketing to access the finance files. We need to implement Role-Based Access Control, in Databricks. We can use this Credentials Passthrough method to achieve this goal. By enabling this option, Databricks would pass your AD access token to the Data Lake and fetch only the data the user has access to read. This works with Databricks instances in the premium tier, and with high concurrency clusters.
ad-credentials-passthrough-azure-databricks-implementing-rbac

Under the cluster’s Advanced Options, you will find the Data Lake Credential Passthrough option; as you can see, it works for both Data Lake Gen1 and Gen2. Now, let’s see how it works. For testing purposes, we have a file in the Data Lake to which one user has access and another does not.
ad-credentials-passthrough-azure-databricks-implementing-rbac

First, we try to access that file with Finance User credentials and we can read the file.
ad-credentials-passthrough-azure-databricks-implementing-rbac

Now, we try the same with the Marketing user’s credentials and receive an access-denied error.

As you can see, the AD credentials are used to obtain a token, which is passed on to the Data Lake to check whether the user has access to the file. We can also implement this with a mounted path: while creating the mount, instead of providing the usual service-principal configuration, use the passthrough configuration for the respective storage generation; the configuration keys differ between ADLS Gen 1 and ADLS Gen 2 (a sketch for Gen 2 follows below).
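As a reference, here is a minimal sketch of a passthrough mount for ADLS Gen2, based on the configuration pattern in the Azure Databricks documentation; the container, storage account, and mount names are placeholders, and the snippet assumes it runs in a Databricks notebook where spark and dbutils are predefined. Please verify the keys against the current documentation for your runtime version.

# Runs in an Azure Databricks notebook (premium tier, high-concurrency cluster
# with credential passthrough enabled); 'spark' and 'dbutils' are provided by Databricks.
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName"),
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",  # placeholder
    mount_point="/mnt/passthrough",                                        # placeholder
    extra_configs=configs,
)

# Reads through this mount are authorized with the calling user's AD token,
# so each user only sees the files they have been granted access to in the Data Lake.
df = spark.read.csv("/mnt/passthrough/finance/report.csv", header=True)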

For testing purposes, we removed the Finance user’s access to the file and created two mount paths: vbitraining (mounted with a service principal key) and vbitraining1 (mounted with credential passthrough). When we try to access the file through both mount points, you can see the error.

There are some limitations with using this method:

  • It is not supported in Scala; currently, it is supported in Python and SQL.
  • It supports Data Lake Gen1 and Gen2 only; other storage options do not work with this method.
  • It does not support some deprecated methods, but the sc and spark objects work without issues.

Now that we’ve seen this method, we can provide access to the marketing distribution list for their folder, so only the team can access it.

Reach out to us for any questions and click here to read blogs from Microsoft Azure category.


The post AD Credentials Passthrough in Azure Databricks for implementing RBAC appeared first on Visual BI Solutions.

Introduction to Azure Data Lake Gen2


Azure Data Lake Storage Gen2 is Microsoft’s latest version of cloud-based big data storage. In the prior version, Gen1, hot/cold storage tiers and redundant storage were not available. Azure Blob storage, on the other hand, had hot and cold storage but fell short on features like directory- and file-level security that are available in Azure Data Lake Storage Gen1. Azure Data Lake Storage Gen2 was introduced to close this gap in storage capabilities and features.

Azure Data Lake Storage Gen2 is built on Azure Blob storage as its foundation. It combines features from Azure Data Lake Storage Gen1 (file system semantics, directory- and file-level security, and scalability) with the low-cost tiered storage and the high-availability/disaster-recovery capabilities of Azure Blob storage.

Azure Data Lake Gen1 vs Azure Data Lake Gen2

Storage model – Gen1: file system storage in which data is distributed in blocks in a hierarchical file system. Gen2: both file system storage (for performance and security) and object storage (for scalability).
Hot/Cold storage tier – Gen1: not supported. Gen2: supported.
Redundant storage – Gen1: not supported. Gen2: supported.
Azure Data Lake Analytics – Gen1: supported. Gen2: not supported (as of 2nd July 2019).

Creating Azure Data Lake Gen2 and Converting Blob Storage to Gen 2

1. Go to All resources -> Click Add -> Choose Storage Account -> Choose the Account Kind as StorageV2.

 

2. Once you have created the storage account, go to Configuration and enable the Hierarchical Namespace (a short SDK sketch of working with the resulting hierarchical namespace follows these steps).
introduction-azure-data-lake-gen2
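Once the account is created with the hierarchical namespace enabled, it can also be worked with programmatically. Below is a minimal sketch using the azure-storage-file-datalake Python SDK; the account name, key, file system, directory, and file names are placeholders, and method names may differ slightly between SDK versions.

from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account details
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key>",
)

# A "file system" in ADLS Gen2 corresponds to a blob container
fs = service.create_file_system("raw")

# The hierarchical namespace allows real directories and file-level operations
directory = fs.get_directory_client("finance/2019")
directory.create_directory()

file_client = directory.create_file("report.csv")
data = b"region,amount\nUS,100\n"
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))

# List all paths under the 'finance' directory
for path in fs.get_paths(path="finance"):
    print(path.name)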

 

Azure Data Lake Gen 2 provides different access tiers for storing data.

Hot Storage

When Data Lake Gen 2 is created with the hot access tier, files in the storage are readily accessible. The storage cost for the hot access tier is higher, whereas the access cost is lower. If the files are not accessed frequently, this tier ends up costing more than necessary.

Cool Storage

The cool access tier in Data Lake Gen 2 (Storage Account V2) is meant for files that are not accessed frequently. For example, monthly or annual reports that are consumed only once a month or once a year have low access rates, which keeps the access cost down. In the cool access tier, the storage cost is lower whereas the access cost is higher, so frequently accessed files end up costing more.

You can choose the hot access tier at the time of creating the Storage Account V2 (Data Lake Gen 2).

introduction-azure-data-lake-gen2

You can also change the access tier at the object level once the storage account is created; this option is available in Blob Storage but not in the Azure Data Lake Gen 2 general-availability release.

To convert an existing Blob Storage account (V1) to Gen 2, upgrade the storage account from V1 to V2 by clicking the Upgrade button under Configuration and then enabling the Hierarchical Namespace under Data Lake Gen 2.
introduction-azure-data-lake-gen2

Use cases of Blob Storage and Azure Data Lake Gen 2

Blob Storage can be useful when you are going to store only backup files, images, and videos that see very few transactions, which keeps the transaction cost low. When comparing the transaction cost of Gen 2 and Blob Storage, the Gen 2 transaction cost is a little higher due to the overhead of the namespace; however, the storage cost for Blob Storage and Gen 2 is the same. Choose the storage based on whether your usage is analytical or non-analytical.

Pros of Azure Data Lake Gen 2 over Gen 1

  • Azure Data Lake Gen 2 contains both file system storage and the object storage available in Blob storage, which gives the flexibility to store Excel files, images, videos, etc.
  • Hierarchical File system leverages better query performance in ADLS Gen 2.
  • Object storage leverage better scalability and is cost-effective.
  • Granular security to files and directory level can be achieved with help of Role-based Access Control (RBAC) and Access Control List (ACLs)

A diagram to illustrate Azure Data Lake Gen 2:

 

Cons of Azure Data Lake Gen 2 (updates expected soon for the features below)

  • Snapshots and soft delete, which are available in Azure Storage, are not available in Gen 2
  • Object-level storage tiers (such as hot/cold/archive) and lifecycle management are not available in Gen 2
  • Direct connectivity from Power BI or Azure Analysis Services is not available; Power BI Dataflows can connect to Azure Data Lake Gen 2
  • Integration with Azure Data Lake Analytics (U-SQL) is not available as of now in Gen 2

For more information on limitations, please refer to the link below:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues

Pricing of Azure Data Lake Gen 2

When we compare the Azure Data Lake Gen 2 pricing with Gen 1, Gen 2 pricing will be half the price of Gen 1.

https://azure.microsoft.com/en-in/pricing/details/storage/data-lake/

Upgrading Azure Data Lake Gen 1 to Gen 2

Please refer to the Microsoft recommended practice for upgrading Gen 1 to Gen 2.

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-upgrade

 

Reach out to us for any questions and click here to read blogs from Microsoft Azure category.

 

 

 


The post Introduction to Azure Data Lake Gen2 appeared first on Visual BI Solutions.
