Channel: Visual BI Solutions

Azure Data Factory Alerts to User


Azure Data Factory alerts provide an automated response that helps monitor and audit Azure Data Factory activity. These alerts are proactive and more efficient than manual monitoring. Alerts can be fired on both success and failure of a pipeline, based on the rule configuration.

Alert Rule

Azure Data Factory alerts use an alert rule, which defines the criteria upon which the alerts should trigger. We can enable or disable alert rules.

To create a new alert rule, we must configure three properties:

    1. Alert Rule Name and Severity
    2. Target Criteria
    3. Configure Email/SMS/Push/Voice notification

Steps to configure the alert rule

1. Navigate to the Monitor tab in Azure Data Factory. Select the Alerts & Metrics panel and select New Alert Rule.

azure-data-factory-alerts-to-user

 

2. Set Alert Rule Name and add severity to the alert.
azure-data-factory-alerts-to-user

 

3. In Target Criteria, select the Azure Data Factory metric on which the alerts must be triggered. An estimated monthly rate is displayed based on the configured criteria.

Here I have selected ‘Failed pipeline runs metrics’ which will trigger only when a pipeline activity fails.
azure-data-factory-alerts-to-user

 

This will navigate to the configure alert logic tab.

  • Set Show History to compare the metric values within that time period
  • Select the dimension values so that the alert filters on the right time series. Here the dimensions are pipeline name and failure type; these dimensions vary for each Data Factory metric.
azure-data-factory-alerts-to-user

 

  • Alert Logic – compares the metric value with a threshold calculated based on the time aggregation.
azure-data-factory-alerts-to-user

Based on the above alert logic condition, we can receive alerts on the very first pipeline failure.

  • Set the period and frequency based on which the above time aggregation in the alert logic condition works.
azure-data-factory-alerts-to-user

 

After adding this criterion, we will be redirected to the alert rule page. Only two criteria can be added.

4. In Configure Email/SMS/Push/Voice notification, an action group must be set.

  • Action Group: An action group defines a set of notification preferences and actions used by Azure alerts
  • We can either create a new action group or use an existing action group
  • To create a new action group,
  • Enter a name in the Action group name box and enter a name in the Short name box, which is used in each alert sent using this group.
  • Then select Add Notification and enter a name for Action Name

There are four options to receive the alerts, and carrier charges may apply for some of these services.

  1. Email
  2. SMS
  3. Azure App push notifications
  4. Voice

5. After configuring all the mentioned properties, click Create alert rule and set the Enable option. Alerts will be triggered only if the rule is enabled.

If the selected pipeline fails, an alert will be triggered and the fired alert will be in the active state. We can check this alert in the Monitor service from the Azure portal.
azure-data-factory-alerts-to-user

 

While an alert is in the active state, if the same pipeline run completes successfully, the alert is deactivated and an automated "deactivated" severity alert mail is sent. If we want to stop these deactivation mails, we can control this through suppress notifications under the action group.
azure-data-factory-alerts-to-user

 

These alerts provide an around-the-clock service to manage and monitor the data factory. They allow us to react quickly to problems that affect dependent processes and strengthen the data integration process. However, these alerts come with regular monthly rates.

Azure Data Factory alerts are one of the ways to receive notifications in case of a pipeline run failure. To know more about Azure pricing information, click here.
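
As an illustration of one such alternative (not part of the alert rule setup above), failed pipeline runs can also be polled programmatically. The sketch below uses the Azure Data Factory Python management SDK; the subscription, resource group and factory names are placeholders, and the notification step is left to whichever channel you prefer.

    # Illustrative sketch: poll Azure Data Factory for failed pipeline runs.
    # Resource names below are placeholders for your own environment.
    from datetime import datetime, timedelta, timezone

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

    subscription_id = "<subscription-id>"
    resource_group = "<resource-group>"
    factory_name = "<data-factory-name>"

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Query pipeline runs updated in the last 24 hours whose status is "Failed".
    now = datetime.now(timezone.utc)
    filters = RunFilterParameters(
        last_updated_after=now - timedelta(hours=24),
        last_updated_before=now,
        filters=[RunQueryFilter(operand="Status", operator="Equals", values=["Failed"])],
    )

    response = adf_client.pipeline_runs.query_by_factory(resource_group, factory_name, filters)
    for run in response.value:
        # Each entry carries the pipeline name, run id and error message, which can
        # then be forwarded through any notification channel of your choice.
        print(run.pipeline_name, run.run_id, run.status, run.message)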

 

To learn more about other alternatives for triggering alerts in case of pipeline run failure refer here.


The post Azure Data Factory Alerts to User appeared first on Visual BI Solutions.


How Data Integration works in Snowflake?


Data has always been at the heart of the modern enterprise. For decades, most of the effort has focused on bringing together or joining data from various enterprise applications such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Point of Sale (POS), etc. for analysis.

Data integration is the process of combining data from different sources to provide a unified view of the data with valuable information and meaningful insights. Movement of data into a data warehouse can be achieved using ETL or ELT. Extract, Transform and Load is the general procedure used to move data from a source database to a target database. The series of functions involved in data integration are as follows.

Extract – Desired Data is extracted from Homogeneous or Heterogeneous Data sets

Transform – Data is transformed/Modified to desired format for storage

Load – Migration of data to target database or Data Marts or Data Warehouse
data-integration-works-snowflake

In ETL (Extract, Transform and Load), data is first loaded into a staging layer from the different source systems, transformation logic is applied on the staging server using tools like SAP BODS, Alteryx, Informatica, etc., and the result is then transferred to the data warehouse, providing unified data for reporting purposes. (The staged data needs to be maintained over time to remove unnecessary data from the system.) In ELT (Extract, Load and Transform), data from the different source systems is loaded into a single system and the transformation logic is applied within that same system. With the ELT approach, integrating data in a single place gives assurance of accuracy and consistency of the data over its entire lifecycle.

 

What is Snowflake and how does data integration work?

In the earlier days of warehousing, database costs were much higher; even though storage costs went down over the years, there wasn't much change in the design structure, leaving it dependent on external tools. With the introduction of cloud data warehouses, which separate the storage and compute layers and offer scalable storage at lower cost, data from different source systems can be stored in a single place, providing more accuracy than traditional systems. Maintenance and operations costs are also reduced drastically by dynamic pricing strategies.

Snowflake is a SaaS (Software as a Service) offering that provides an analytic data warehouse, hosted on cloud platforms such as Amazon Web Services (AWS) or Microsoft Azure. The storage resources in Snowflake can be scaled independently of the computing resources, and hence data loading and unloading can be done without worrying about running queries and workloads.

Though data integration (preparation, migration or movement of data) in Snowflake (image shown below) can be handled using both ETL and ELT processes, ELT is the preferred approach within the Snowflake architecture.
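
To make the ELT idea concrete, here is a minimal, hedged sketch using the Snowflake Python connector; the stage, table, warehouse and schema names are placeholders and not from any specific environment.

    # Minimal ELT sketch with the Snowflake Python connector. The stage, table and
    # warehouse names are illustrative placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="<account_identifier>",
        user="<user>",
        password="<password>",
        warehouse="LOAD_WH",
        database="SALES_DB",
        schema="RAW",
    )
    cur = conn.cursor()

    # Extract + Load: copy staged files into a raw table as-is.
    cur.execute("""
        COPY INTO raw_orders
        FROM @orders_stage
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)

    # Transform: build the reporting table inside Snowflake (the "T" of ELT).
    cur.execute("""
        CREATE OR REPLACE TABLE orders_by_country AS
        SELECT country, SUM(amount) AS total_sales
        FROM raw_orders
        GROUP BY country
    """)

    cur.close()
    conn.close()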

 

Architecture in Snowflake

Data ingestion/transformation in Snowflake can be done using external third-party tools like Alooma, Stitch, etc. To learn more about how to load data via data ingestion tools, Snowflake provides partner accounts which offer a free trial. As of June 2019, the partner and non-partner tools supported by Snowflake are as below.
data-integration-works-snowflake

We can find the complete list of data integration tools compatible with Snowflake (updated by Snowflake over time) at https://docs.snowflake.net/manuals/user-guide/ecosystem-etl.html.

To load structured/transformed data (ETL) into Snowflake, third-party transformation tools like Alteryx, Informatica, etc. are used in the transformation process, whereas if the transformation needs to be applied after data ingestion, the preferred approach is dbt.

 

Why DBT?

dbt (data build tool) is an open-source command-line tool which enables business data analysts/scientists to effectively transform the data after it is loaded into the data warehouse. It works on different warehouses (Redshift, BigQuery and Snowflake). Most external transformation tools need the data to be pushed to their server, transformed, and moved back into the database, whereas dbt compiles the code and materializes the results using the compute resources of the warehouse itself.
data-integration-works-snowflake

Image Source: https://www.getdbt.com/product/

Data materialized inside the warehouse with dbt is expressed in the form of tables and views. The core function of dbt is to compile the code and execute it against the data warehouse using a DAG (Directed Acyclic Graph), with the outputs stored as tables and views. With the integration of Jinja, an open-source template language used in the Python ecosystem, code reuse becomes much easier (e.g. if statements and for loops). With templates created in Jinja, we use the ref() function to reference other models for easier coding; these are similar to functions from a SQL perspective.

The functionality of dbt is not limited to transformation of data; it also provides:

  • Version control
  • Quality Assurance
  • Documentation

Transformation can also occur externally using ETL tools, with the results then loaded into Snowflake; however, with storage and compute priced separately, ELT provides more accurate and consistent data over time. Since Snowflake can handle massive storage, data can be loaded into Snowflake and then transformed using dbt, reducing the load time and time spent in transit and making the system more effective.

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 


The post How Data Integration works in Snowflake? appeared first on Visual BI Solutions.

How to achieve Delta/Incremental Load in Alteryx


Updating or loading only the new records in the target, without touching the old records, is a common requirement in any ETL process. In this blog, let us look at how to achieve a delta/incremental update in Alteryx on an Excel file with a simple example, and how the same is handled with In-DB tools (for example in HANA).

Requirement

We need to load the records that are not available in the target file or table. For example, we need to load only the 5 records that are not available in the target table.
delta-incremental-load-alteryx

delta-incremental-load-alteryx

delta-incremental-load-alteryx

Logic needed

To perform the delta load, you need to restrict or filter out the source records that are already available in the target.

Solution

You can easily achieve this using the join tool, which can be used to filter out the records already available in the target.

The Join tool has three output anchors: L, J and R. The L anchor helps us achieve this: it gives us the records that are not available in the target, based on the comparison condition configured in the tool. In our case, configure the Join tool as shown below.

delta-incremental-load-alteryx

 

delta-incremental-load-alteryx

Output of L anchor

 

The new records from the Join tool can be inserted into the target file using the Output Data tool, by configuring the tool to append the data to the existing sheet.
delta-incremental-load-alteryx

Workflow

Using the Join tool and Output Data tool with the above configuration, we can achieve the delta load in the workflow shown below.
delta-incremental-load-alteryx

 

If you are working with the In-DB tools, the delta load has to be handled differently, as shown below:
delta-incremental-load-alteryx

 

In the above example workflow using In-DB tools, you will not find anchors like the ones the default Join tool has. That is why you do a left outer join of the source table with the target: you get all the source records joined with the target, and the joined target fields are null for the records that are not available in the target.

Using the Filter tool, we can keep the records that have null values in the newly joined field, which gives us the delta records that need to be loaded. Then we can deselect the target-table fields using the Select tool and load the result into the target.
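
For reference, the same delta logic can be expressed outside Alteryx as a small pandas sketch; the file and column names (e.g. OrderID) are assumed for illustration.

    # Illustrative pandas equivalent of the delta load: keep only source records
    # whose key does not already exist in the target. Column names are made up.
    import pandas as pd

    source = pd.read_excel("source.xlsx")
    target = pd.read_excel("target.xlsx")

    # Left join source to target on the key; rows with no match are the delta
    # (this mirrors the L anchor of the Join tool / the null filter in In-DB).
    merged = source.merge(target[["OrderID"]], on="OrderID", how="left", indicator=True)
    delta = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

    # Append the new records to the existing target data.
    pd.concat([target, delta], ignore_index=True).to_excel("target.xlsx", index=False)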

Conclusion

In Alteryx, delta changes can be captured easily with the Join tool by comparing key fields, whether in a flat file or in a database.

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 


The post How to achieve Delta/Incremental Load in Alteryx appeared first on Visual BI Solutions.

Looker’s Persistent Derived Tables


Persistent Derived Tables (PDTs) are one of Looker's unique key features. Many BI tools allow us to create derived tables, but they are local to that particular tool. Also, when we talk about persistent tables, the first concern everyone raises is performance. In this blog we will see how Looker is changing the perception of derived tables.

With Looker's Persistent Derived Tables, we can extend the capabilities of Looker and use it as a modeling tool. Looker supports many dialects, and we can use LookML to create derived tables and persist them in the database. In the database, these tables can act as normal database objects.

At a high level, Looker's derived table functionality provides a way to create new tables that don't already exist in the database to extend our analysis. In some cases, they can also play a valuable role in enhancing query performance.

Below,we will see how to configure PDT’s (Persistent Derived Tables) in Looker.

Looker's ability to provide PDTs depends on whether the database dialect supports them. Snowflake, Redshift, Google BigQuery, SAP HANA, DB2, Azure Warehouse and many more support PDTs. Follow here for updates.

The image below shows a sample of PDT enabled Looker database connection.

Where to Configure and Prerequisites?

In the Connections parameters, check the box “Persistent Derived Tables” to enable the feature when creating a connection or for the existing connection.

This reveals the Temp Database field and the PDT Overrides column. Looker displays this option only if the database dialect you chose supports using PDTs.

Temp Database is a Prerequisite.

Although this is labeled Temp Database, we will enter either the database name or schema name that Looker should use to create persistent derived tables with the appropriate write permissions.

Each connection must have its own Temp Database or Schema; they cannot be shared across connections.

Identify PDT and Datagroup Maintenance Schedule

This setting accepts a cron expression that indicates when Looker should check datagroups and persistent derived tables (that are based on sql_trigger_value) to be regenerated or dropped. Below are sample cron expressions to use.

cron expression – Definition
*/5 8-17 * * MON-FRI – Check datagroups and PDTs every 5 minutes during business hours, Monday through Friday
*/5 8-17 * * * – Check datagroups and PDTs every 5 minutes during business hours, every day
0 8-17 * * MON-FRI – Check datagroups and PDTs every hour during business hours, Monday through Friday
1 3 * * * – Check datagroups and PDTs every day at 3:01am

Below are a couple of resources to assist with creating cron strings.

Please check here to configure sql_trigger_value regeneration for your database.

How to define Derived Tables?

We can define a derived table in one of these ways:

  1. Using LookML to define a Native Derived Table (NDT):

These derived tables are defined in LookML, referring to dimensions and measures in the model.

  2. Turning a SQL query into a derived table definition:

These derived tables are defined in SQL, referring to tables and columns in your database. We can use SQL Runner to create the SQL query and turn it into a derived table definition. We cannot refer to LookML dimensions and measures in a SQL-based derived table.
lookers-persistent-derived-tables

Making a Derived Table Persistent

There is no parameter that means “make this derived table persistent.” Rather, persistence is created by the addition of the datagroup_trigger, sql_trigger_value, or persist_for parameter.

Please find details here

Rebuilding PDTs

These PDTs are rebuilt periodically based on one of three settings (details here):

  • triggered by a change (using sql_trigger_value)
  • a set time period (using persist_for)
  • a caching policy definition in a datagroup, triggered by a change using sql_trigger, and assigned to the PDT using datagroup_trigger

Where to find created Persisted Derived Tables and Datagroups?

Already created datagroups and Persistent Derived Tables LookML can be found under the Admin -> Database section. Clicking on LookML will take us to the model where we defined these objects.

We can see default “ecommerce_etl” datagroup and default Persistent Derived tables.

Considerations

There are some situations where you should avoid persistence. Persistence should not be added to derived tables that will be extended (the extends parameter allows you to reuse code), because each extension of a PDT will create a new copy of the table in your database. Also, persistence cannot be added to derived tables that make use of templated filters or Liquid parameters. There are potentially an infinite number of possible user inputs with those features, so the number of persistent tables in your database could become unmanageable.

Conclusion

Looker provides great flexibility to use PDTs for performance by persisting the results of complex queries to the database. We can create PDTs with LookML or with direct SQL against the database, and regenerate or drop them via the PDT maintenance schedule and SQL trigger parameter values. Looker also allows us to index PDTs; in fact, it is a must. PDTs are a large subject in Looker; please go through here to better understand caching and datagroups.

 

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 


The post Looker’s Persistent Derived Tables appeared first on Visual BI Solutions.

Data Tiering Properties in ADSO


In this blog, we are going to see the different data tiering options available with the release of SAP BW/4HANA 1.0 SP04 and their advantages.

Need for Data Tiering

  • Maintaining large volumes of data in memory impacts system performance
  • Data growth proportionately affects hardware and storage costs.

With the release of BW/4HANA 1.0 SP04, SAP is providing a functionality to classify the data in your datastore object as hot, warm and cold based on the data type, operational usefulness, performance requirements, frequency of access and security requirements of data.
data-tiering-properties-adso

Hot (SAP HANA Standard Nodes)

This tier is used to store data that is frequently accessed and is of high importance. Since data is being stored in-memory of SAP HANA database, data retrieval and query performance is higher.

Warm (SAP HANA Extension Nodes)

This tier is used to store data that is less frequently accessed. Data is stored in a dedicated SAP HANA extension node (available with SAP BW 7.4 with HANA 1.0 SP12), which will be reserved entirely for warm data storage and processing.

When the database is queried, it automatically retrieves data from the appropriate node. The extension node provides more storage capacity compared to worker nodes by extensively leveraging disk, loading and unloading data into memory whenever necessary. While this is an advantage over storing everything in memory, access to warm data is slower than to hot data, because the data sets need to be loaded into memory for processing.

For the minimal setup, we need to have one master node to store hot data and one extension node to store warm data.

Cold (External Cold Store)

This tier is used to store data for sporadic or very limited access. The data is stored externally in SAP IQ, Hadoop or SAP Vora that are managed separately from SAP HANA database but still accessible at any time.

Implementation Steps

In the General tab of the ADSO, under Data Tiering Properties, you can choose whether to store your data in the standard layer (hot), the extension layer (warm) or the external layer (cold). Based on the selected temperature, you can then choose the method by which the data is moved to the specified storage layer, i.e. either at the partition level or the object level. To store data in the external layer, partitions must be defined.

  • If you have chosen "On Partition Level", you must create partitions based on characteristics such as time characteristics (0CALDAY, 0CALMONTH, etc.) to divide the entire dataset of the DataStore object into several small independent units.
  • If you have chosen "On Object Level", it refers to the entire dataset of the DataStore object.
data-tiering-properties-adso

To create a partition,

1. Go to the Settings tab of the ADSO, browse for the relevant characteristic by which we are going to split (e.g. 0CALYEAR), and provide a lower limit and upper limit for the partition.
data-tiering-properties-adso

 

2. Once created, you can choose to split the partition either by simple split, which will split the partition into two or by equidistant split, which will split the partition based on days, months, years etc.
data-tiering-properties-adso

 

3. Finally, you can maintain the temperature setting for each individual partition by clicking on "Maintain Temperatures", which will take us to transaction RSOADSODTOS. Below, we have created a partition based on 0CALYEAR and defined warm storage for data below the year 2014 and hot storage for recent years.
data-tiering-properties-adso

 

Automation of Data Movement across different tiers

SAP has provided a new process chain type Adjust Data Tiering, which can be used to schedule the DTO job.

It has also provided a set of APIs that can be used to change and schedule temperature changes. This makes it possible to automate temperature changes and data movements.

API – Description
RSO_ADSO_DTO_API_GET_INFO – Reads the partition temperatures of a specific DataStore object (advanced)
RSO_ADSO_DTO_API_SET_TEMPS – Changes the partition temperatures
RSO_ADSO_DTO_API_EXECUTE – Executes the DTO job

 

Learn more about Visual BI’s SAP HANA Offerings here and click here for reading more SAP HANA related blogs.


The post Data Tiering Properties in ADSO appeared first on Visual BI Solutions.

Fetching Top N Records in Alteryx


In a large dataset, when you want to focus on certain data, you can filter, aggregate, or fetch the top or bottom records. Alteryx provides numerous tools to achieve this data preparation.

In our previous blog, we solved one of the weekly challenges to demonstrate the usage of tools like transpose, crosstab, sorting, sampling.

In this blog, we have taken one business case and explained the steps involved to solve it in Alteryx.

Requirement

Let’s say the CEO of a company wants to know which are the Top 3 Products that are performing well and in which countries they are performing (i.e., top 3 countries).

Steps of the workflow

Step 1: Get the input

The Text Input tool will have the details of country-wise product sales.
fetching-top-n-records-alteryx

Step 2: Find Top 3 Countries by sales

2.1 Required fields selection

Select Tool is used to restrict the data to only Country and its Sales amount.

fetching-top-n-records-alteryx

2.2 Aggregate Sales

By default, the Select tool won't aggregate the input data, so the Summarize tool is used to aggregate the Sales amount by Country. (The Select tool is used to restrict the input data to the required fields, which improves the workflow's performance.)
fetching-top-n-records-alteryx

2.3 Find Top 3 countries

Use the Sort tool to arrange the Sales value in descending order. Further, use the Sampling tool to pick the top 3 sorted records.
fetching-top-n-records-alteryx

The output of Step 2 looks like this:
fetching-top-n-records-alteryx

In Alteryx Designer, we use various tools for different operations that are later organized into executable workflows. Tool Container is the component that is used to group tools in the workflow. This will include/exclude a set of tools from the workflow execution.
fetching-top-n-records-alteryx

To get this tool container:

  • Select the required tool -> Right click -> In the context menu, choose the Tool Container option.
  • Type the tool name in the search box and select it

Note: We will use this tool in upcoming steps to group the tools.

 

Step 3: Find Top 3 Products from each country

Arrange the Sales amount in descending order using the Sort tool. Specifying the top 3 values in the Sampling tool alone would give only the top 3 records overall as the output, but what we finally need is the top 3 products in each country. This can be accomplished by leveraging the Group by column option in the Sampling tool. Check the configuration below:

From steps 2 and 3, we now have the top 3 countries and the top 3 products for each country.

Step 4: Find Intersection

To achieve our requirement, we must find the Intersection of these two results. To do this, we can use the Join tool in Alteryx as shown below.

The Join tool has three output anchors, namely the L, J and R anchors. Here we have joined the step 2 and step 3 outputs with an inner join (the J anchor).

fetching-top-n-records-alteryx

Solution

fetching-top-n-records-alteryx

Final Output

fetching-top-n-records-alteryx

With this we have achieved our business case of finding the top 3 products in the top 3 countries, while exploring Alteryx tools such as Select, Join, Sort, and Sample.
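
For comparison, the same end-to-end logic can be sketched in pandas; the sample data and column names below are invented for illustration.

    # Illustrative pandas version of the workflow: top 3 countries by total sales,
    # top 3 products within each country, then the intersection of the two results.
    import pandas as pd

    df = pd.DataFrame({
        "Country": ["US", "US", "US", "UK", "UK", "DE", "DE", "FR"],
        "Product": ["A", "B", "C", "A", "C", "B", "C", "A"],
        "Sales":   [900, 750, 400, 650, 300, 500, 450, 200],
    })

    # Step 2: top 3 countries by aggregated sales (Summarize + Sort + Sample).
    top_countries = (df.groupby("Country", as_index=False)["Sales"].sum()
                       .nlargest(3, "Sales")["Country"])

    # Step 3: top 3 products per country (Sort + Sample with Group by).
    top_products = (df.sort_values("Sales", ascending=False)
                      .groupby("Country").head(3))

    # Step 4: inner join = keep only the products of the top countries.
    result = top_products[top_products["Country"].isin(top_countries)]
    print(result)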

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 


The post Fetching Top N Records in Alteryx appeared first on Visual BI Solutions.

SAP Analytics Cloud – Consuming Data Using ODATA as a Data Source


SAP Analytics Cloud allows you to connect to various data sources from live data connections to on-premise or cloud systems. In this blog, let us see how OData can be used as a data source for SAP Analytics Cloud.

***

What is OData?

OData is a REST-based protocol for querying and updating data. It is built on standardized technologies such as HTTP, XML and JSON. It provides a uniform way for both creating data and data models.
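
For background (this is not an SAC configuration step), the sketch below shows how an OData V2 service is typically queried over HTTP from Python; the service URL, entity set and field names are hypothetical.

    # Illustrative OData V2 request with the requests library. The service URL and
    # entity set below are hypothetical placeholders.
    import requests

    service_url = "https://services.example.com/odata/SalesService.svc"

    # System query options: project two fields, filter, limit the result,
    # and ask for a JSON representation.
    params = {
        "$select": "Country,Revenue",
        "$filter": "Revenue gt 100000",
        "$top": "10",
        "$format": "json",
    }
    response = requests.get(f"{service_url}/SalesOrders", params=params, timeout=30)
    response.raise_for_status()

    # In OData V2 JSON, the entity collection is wrapped in a "d" envelope.
    for row in response.json()["d"]["results"]:
        print(row["Country"], row["Revenue"])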

Connecting to OData Services

1. Go to  Connections from Main Menu of SAP Analytics Cloud.
sap-analytics-cloud-consuming-data-using-odata-data-source-1

 

2. Click Add Connection.
sap-analytics-cloud-consuming-data-using-odata-data-source-2

3. Select OData Services under Acquire Data.
sap-analytics-cloud-consuming-data-using-odata-data-source-3

4. Add a Connection Name and Description as required.

5. SAP OData Service – If you use SAP OData Service, you can make use of specific SAP metadata as given in SAP ANNOTATIONS FOR ODATA VERSION 2.0. Some features like search, filter, sort are available which are disabled by default.

6. On-premise OData service – You need to install an SAP CP Cloud Connector and properly map the OData service URL to the on-premise system.

7. Enter the Data Service URL and click create.
sap-analytics-cloud-consuming-data-using-odata-data-source-4

 

8. The created connection appears in the Connection page.
sap-analytics-cloud-consuming-data-using-odata-data-source-5

 

Creating a Model using OData Connection

1. Select the Create Model option from Main Menu.
sap-analytics-cloud-consuming-data-using-odata-data-source-6

 

2. Choose the option Get data from a data source to connect with OData source. Select the OData Services option.

3. Choose the previously created OData Connection. Click Next.
sap-analytics-cloud-consuming-data-using-odata-data-source-7

 

4. Select the Option Create a new query if you want to create a new query for the model using OData.

5. A new pop-up appears asking for a new query name. Enter the required query name and select any one of the metadata options available.
sap-analytics-cloud-consuming-data-using-odata-data-source-8

 

6. The Build Query popup allows you to select the data for the query to be created. Once the data is selected, click Create.
sap-analytics-cloud-consuming-data-using-odata-data-source-9

7. Once the query is created, you will be taken to the model creation screen. Including/excluding dimensions and other data management options are available. Once complete, select 'Create model'; you can now use the OData model in a story or analytic application.
sap-analytics-cloud-consuming-data-using-odata-data-source-10

To refresh the data, go to the Connections page and switch to the Schedules tab. You can view the model created along with the option to refresh the data.
sap-analytics-cloud-consuming-data-using-odata-data-source-11

Advantage

You can connect to customized OData services; however, proper configuration is needed to take full advantage of them.

Limitation

SAP Analytics Cloud supports only OData Version 2.0 for now. Hence complex algorithms and arithmetic operations are not supported yet.

***

To learn more about SAP Analytics Cloud check out our series of blogs here. Reach out to us here today if you are interested in evaluating if SAP Analytics Cloud is right for you.

 


The post SAP Analytics Cloud – Consuming Data Using ODATA as a Data Source appeared first on Visual BI Solutions.

Security and Localization in Looker


There are commonly two types of information security in BI reporting:

  1. Row-level security – This enables filtering of rows of data from a query output based on user authorization. Example: business users from the Dallas sales division should only see Dallas sales.
  2. Column-level security – This enables the hiding or masking of fields or columns from query output based on user authorizations. Example: when a customer credit information report is shared with agencies, they should not see credit card or phone numbers.

When providing analytics to users, we frequently need to restrict data at the row or column level. Though we usually leverage data security inherited from the database, now and then we need to configure security at the front end. In this blog we will discuss the out-of-the-box LookML options to configure data security, localize the Looker UI, and localize the data model.

Looker provides simple ways to handle row- and column-level security and localization. In Looker, administrators can set granular permissions by user or group and can restrict data access from the database level all the way down to the row or column level.

Column (Field) level security

We can exclude individual objects by using fields with an explore.

We can use fields under the explore parameter in order to take advantage of the ALL_FIELDS* set and then exclude fields. For example:

security-localization-looker-1

  • All fields and sets must be fully scoped (use view_name.field_name syntax)
  • Fields and sets may be excluded by preceding them with a hyphen (-)
  • Set names always end with an asterisk (*)
  • We may use the built-in set name ALL_FIELDS*, which includes all fields in the Explore
  • We may use the built-in set name view_name*, which includes all fields in the referenced view.
  • A list of fields may be used like this: [view_name.field_a, view_name.field_b].
  • We can also reference a set of fields (which is defined in a view’s set parameter), like [view_name.set_a*]. The asterisk tells Looker that we are referencing a set name, rather than a field name.

Row Level security

access_filter lets us apply user-specific data restrictions. An access_filter parameter is specific to a single Explore, so we need to make sure we apply an access_filter parameter to each Explore that needs a restriction.

In the example below, we created a user attribute called currency and used it along with access_filter in the model. Note: the user attribute's User Access must be "View" to use it in access_filter.
security-localization-looker

A WHERE clause is added to the SQL when the field "currency" is used in the Explore. Please note, we have not added any filters manually.
security-localization-looker

Localizing the Looker User Interface

Certain Looker user interface text can be displayed in the following languages:

Language – Locale Code and Strings File Name
French – fr_FR
German – de_DE
Japanese – ja_JP
Spanish – es_ES
Italian – it_IT
Korean – ko_KR
Dutch – nl_NL
Portuguese – pt_PT
Brazilian Portuguese – pt_BR
Russian – ru_RU
English – en

To enable localization, set the locale for users or user groups through one of the following methods:

To set a locale for individual users: Use one of the codes in the above table in the Locale (beta) field on the Edit User page in the Admin panel.

To set a locale for a user group: Assign one of the codes in the above table to the locale user attribute for a particular user group. If users within a group have set a custom value using the Locale (beta) setting, the custom value will override any value assigned to the group. To prevent that from happening, ensure the User Access setting for the locale user attribute is not set to Edit.

For users with no locale set, Looker uses English as the default.

Below are the results of setting the user attribute locale (beta) to es_MX. Please note the Browse, Explore, Development and Admin menu items.
security-localization-looker

Localizing View and Field Names in Looker

With model localization, we can customize how the model’s labels and descriptions display according to a user’s locale. We can localize labels, group labels, and descriptions in the model. We can localize LookML dashboard title, description and text.

Models can be localized into any language using .strings files. To localize both data models and the Looker user interface, the models' .strings files must be named with the same locale code used for the user interface.

Conclusion

Looker's LookML helps us configure data security, and the out-of-the-box (beta) locale feature helps us localize most of the Looker UI menu items as well as labels, titles and descriptions with ease.

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

 


The post Security and Localization in Looker appeared first on Visual BI Solutions.


Unlimited Audit Capabilities in Looker


System and content usage monitoring is a necessary, day-to-day part of any tool administration. Currently, organizations are opting to switch their BI reporting tools or are interested in having multiple BI tools according to their needs. When a new tool is introduced, management always wants to observe how users are adopting it through comprehensive usage metrics, and how the system is responding through performance auditing. We can also use these metrics to identify data security and access breaches, if any.

Looker has out-of-the-box capabilities for most audit needs. In this blog, we discuss Looker's out-of-the-box features for content and system usage metrics and monitoring.

unlimited-audit-capabilities-looker

Image: Performance Audit Dashboard

Looker admins and users need the see_system_activity permission to see the Looker instance's Usage panel, which connects directly to Looker's underlying application database, called i__looker.

The i__looker database stores information about your Looker instance, including all Looks and dashboards saved on your instance, user information, and 90 days of historical queries:

Looker users with access to Usage can see the sections below, each with dashboards:

1. User Activity

2. Content Activity

3. Performance Audit

4. Usage

All of these sections provide various levels of information, explained below. The Looks can be explored further as needed.

1. User Activity:

Total Users, Users by Type, Weekly Querying Users and more.

2. Content Activity:

  • Total Dashboards, Total Looks, Popular Explores, Explores by User Group, Dashboard Usage
  • Unused Content, Unused Explores and more.

3. Performance Audit:

  • Queries executed in the last 30 days and variance with the past period, % of results fetched from cache
  • Queries issued by source (the source being the action/visual which generates/triggers the query, e.g. SQL Runner, Look), average run time by source, errors by source
  • Hourly schedules, hourly PDT (Persistent Derived Tables) builds, schedule failures, PDT overview and exceptions, and more.

4. Usage:

  • Active users per day (last 2 weeks), top users (last 30 days), query run time performance pivoted by seconds bucket
  • List of public Looks, list of top Looks, list of top dashboards
  • Scheduled plan performance, scheduled plan status, and more.

In addition to the above metrics, we have options to further explore usage metrics using the URLs below.

The History Explore

The History Explore includes information about each query run on your Looker instance in the last 90 days. Looker automatically truncates its data every 90 days.

https://<my.looker.com>/explore/i__looker/history , replace <my.looker.com> with the address of your Looker instance

The History Explore has a wealth of information; columns of interest include query.sql_text, query.model, query.explore and history.runtime_in_seconds.

One example use case:

What Is the Average Runtime for Different Models?

https://<my.looker.com>/explore/i__looker/history?fields=query.model,history.average_runtime&f[history.result_source]=query&sorts=history.average_runtime+desc&limit=500&column_limit=50&query_timezone=America%2FLos_Angeles&vis=%7B%7D&filter_config=%7B%22history.result_source%22%3A%5B%7B%22type%22%3A%22%3D%22%2C%22values%22%3A%5B%7B%22constant%22%3A%22query%22%7D%2C%7B%7D%5D%2C%22id%22%3A0%2C%22error%22%3Afalse%7D%5D%7D&origin=share-expanded

We can build similar queries for the most popular dashboards, unused Explores in the last 90 days, unused fields in the last 90 days, and so on.

Unused content – https://<my.looker.com>/browse/unused

Looker provides similar Explores for:

Dashboards – https://<my.looker.com>/explore/i__looker/dashboard

Users – https://<my.looker.com>/explore/i__looker/user

Looks – https://<my.looker.com>/explore/i__looker/look

 

Conclusion

Looker provides out-of-the-box analytics for Looker instance usage and performance monitoring. We have looked at some of the features available in this area; please explore further in the Looker guide at https://docs.looker.com/admin-options/server/usage.

 


The post Unlimited Audit Capabilities in Looker appeared first on Visual BI Solutions.

Performance Enhancement in Alteryx


What is In-DB Processing?

When handling huge volumes of data, the major challenge is moving data out of the source into the analytical environment. This process of moving data in and out of the database is time-consuming, and it invariably affects the overall performance. To overcome this challenge, Alteryx has introduced In-Database processing, which simplifies data blending across heavy datasets without moving data out of the database. When a database is connected through In-Database processing, a direct connection to the database is established; the processing steps are pushed into the database and only the required data is retrieved into the Alteryx environment. By restricting the movement of vast volumes of data, performance is significantly improved.

Alteryx has introduced a dedicated set of tools (with the In-DB suffix) to assist and support In-Database processing. This dedicated set of tools is added under the 'In-Database' palette. This blog explores the In-DB capabilities of Alteryx 2019.2.7.63499.

What are the databases that support In-DB processing?

In-Database processing is supported by the following databases

Supported Databases

1. Amazon Redshift
2. Apache Spark ODBC
3. Cloudera Impala
4. Databricks
5. EXASOL
6. Hive
7. HP Vertica
8. IBM Netezza
9. Microsoft Analytics Platform System
10. Microsoft Azure SQL Database and Azure DWH
11. Microsoft SQL Server 2008, 2012, 2014, 2016
12. MySQL
13. Oracle
14. Pivotal Greenplum
15. PostgreSQL
16. SAP HANA
17. Snowflake
18. Teradata

 

How is connectivity to a database established?

Four In-DB tools are available for establishing connectivity to the database:

  • Connect In-DB
  • Write Data In-DB
  • Data Stream In
  • Data Stream Out

 

Connect In-DB enables the user to establish a connection and access the tables available in the database. It enables live connectivity to the database, allowing the user to view the schemas and tables listed in the database, from which the table of interest can be chosen and accessed.

A table that has massive volumes of data can be accessed in seconds with the Connect In-DB tool, whereas with the generic Input Data tool it can take hours. This vast difference is mainly because the generic Input Data tool fetches all the data into the Alteryx environment, which is a time-consuming process and hence affects the overall performance.
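
The underlying idea is "push the processing to the database" instead of "pull everything and process locally". The generic sketch below illustrates the two patterns using an in-memory SQLite table as a stand-in for a real database; it is not Alteryx-specific.

    # Generic illustration of why In-DB processing is faster: filter/aggregate in
    # the database (pushdown) instead of pulling every row into the client.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("East", 100.0), ("West", 250.0), ("East", 75.0)])

    # Pattern 1 (like a generic Input tool): fetch every row, then filter locally.
    all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    east_total_local = sum(amount for region, amount in all_rows if region == "East")

    # Pattern 2 (like Connect In-DB + Filter/Summarize In-DB): the database does
    # the filtering and aggregation, and only the small result travels back.
    east_total_pushdown = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = 'East'"
    ).fetchone()[0]

    print(east_total_local, east_total_pushdown)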

Write Data In-DB assists with writing the data back to the database. This ensures that the schema of existing tables in the database remains unaltered. There are 5 ways in which the output mode can be configured:

  • Create a New Table
  • Append Existing
  • Delete Data and Append
  • Overwrite Table (Drop)
  • Create a Temporary Table

In-DB processing doesn't change the schema of the table; hence all the output modes are restricted in such a way that the schema remains unaltered.

Data Stream In, along with the Input Data tool, can connect to external datasets and push them into the database. If the user has a small lookup table that needs to be included in the workflow for the analysis, the Data Stream In tool comes in handy. Instead of moving the larger dataset into the Alteryx environment for processing, Data Stream In provides an option to move the smaller lookup file into the database, reducing the runtime effectively.

Data Stream Out pulls data from the database into the Alteryx environment. With this option, selected data fields become available in the Alteryx environment so that they can be integrated with other tools for further analysis.

How do the filter, cleanse and transform In-DB tools differ from the generic tools?

Filter In-DB helps to query and filter records that match the specified condition. Like the generic Filter tool, Filter In-DB also has two types of filters, namely basic and advanced. If the business is interested in specific analytics, the In-DB filter can be used to fetch only the selective set of records involved in the analysis into the Alteryx environment (via the Data Stream Out tool) and perform further analytics there. This process can effectively improve performance.

Select In-DB can select/deselect the fields to be analysed, and reorder and rename the fields. Resizing of the fields is not possible, as it changes the schema design. The main difference between the generic Select tool and In-DB Select is that the generic Select tool fetches all the records into the Alteryx environment and then applies the select operation, whereas In-DB Select fetches from the database only the records for the fields chosen in the select and brings them into the environment for processing.

Formula In-DB is used to write expressions based on the existing fields. The Formula In-DB tool ensures that the schema of the tables in the database remains unaltered; hence there is no option to add an additional column to which expressions can be added.

Summarize In-DB can perform basic operations such as group, sum, count, count distinct, etc. Like the generic tool, Summarize In-DB provides an option to perform aggregation operations based on the datatype of a particular column.

What are the tools available for combining data?

Join In-DB combines data streams based on mutual fields. Inner Join, Left Outer Join, Right Outer Join and Full Outer Join are the four available join options for In-DB. This tool is extremely effective for data blending of tables across multiple platforms.

Union In-DB combines data from different data streams into a single stream and is used when combining data from more than one stream.

 

Summary

When the dataset is massive, the performance of the system can be significantly improved by using In-DB tools. This set of tools is easy to use, and with them data can be streamed in or out of the database as and when required, so they impose few practical restrictions. As a best practice, it is always recommended to use In-DB tools when handling larger datasets, as this significantly improves performance.

Note: The performance values observed for some of the tools are subject to change based on system configuration and do not remain fixed. These values can be obtained by enabling performance profiling under the Runtime tab of the workflow configuration.

References

  1. https://pages.alteryx.com/rs/716-WAC-917/images/FasterDataBlending_Warehouse_j307_Final.pdf
  2. https://3danim8.wordpress.com/2015/05/13/the-brilliance-of-alteryx-in-database-processing/
  3. https://pages.alteryx.com/rs/716-WAC-917/images/In-Database%20Blending%20for%20Big%20Data%20Preparation.pdf

 

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

 


The post Performance Enhancement in Alteryx appeared first on Visual BI Solutions.

SAP Analytics Cloud – Smart Insights


Most visualization tools allow you to view top contributors to a specific dimension being analyzed. What if you want to analyze the top contributors of a selected value (data point) without pivoting the data?

The machine learning techniques in SAP Analytics Cloud allow us to pick a data point and analyze its contributors. You can also enter a search query in natural language and get appropriate results.

Add Smart Insights

Smart Insights is a simple, easy-to-use machine learning feature which discovers the top contributors of a selected data point. You can add Smart Insights to the chart either by clicking on the icon in the context menu or by selecting a data point.
sap-analytics-cloud-smart-insights-1

 

Consider an example – Sales by Cities. By enabling Smart Insights, you can see that the top contributor New York is approximately 6000% above average. Also, the sales are driven significantly by Subcategory – Chairs.
sap-analytics-cloud-smart-insights-2

 

Also, notice that the Average reference line is represented automatically as soon as you enable smart insights.

Smart Insight on Data Point Selection

Similarly, when you select a data point and click on the “Insights” icon you can see the smart insights for that member in a separate panel.
sap-analytics-cloud-smart-insights-3

 

Consider the above example: on selecting Chicago, it shows that the subcategory 'Phone' is the top contributor, at 320% above average. You also learn that the preferred ship mode is Standard Class, at 126% above average. One more insight is that consumer goods are doing well in Chicago.
sap-analytics-cloud-smart-insights-4

 

These smart insights allow business users to dive deep into the data and pick up on details which could otherwise be easily overlooked. They also help the user investigate unexpected behavior of a data point.

Smart Insight as Chart

Smart Insights are also represented as a chart when you choose to view smart insight for a selected data point. You can even include this chart in the story using the Add as Chart option.
sap-analytics-cloud-smart-insights-5

 

The chart added will automatically have “Chicago” as a Filter.
sap-analytics-cloud-smart-insights-6

 

Reach out to us here today if you are interested to evaluate if SAP Analytics Cloud is right for you.

 


The post SAP Analytics Cloud – Smart Insights appeared first on Visual BI Solutions.

Macros in Alteryx Designer


A macro is a workflow or group of tools built into a single component, available in the tool palette, that can be inserted into another workflow. Create a macro to save an analytic process you perform repeatedly, then use the macro within a workflow without having to recreate the analytic process each time.

You can create or convert any workflow to a macro by changing its type in the configuration window as shown below.
macros-alteryx-designer

You can also find a set of macros in the Alteryx Gallery that you can download and import into Alteryx Designer.

Requirement

Data cleansing is one of the most important tasks in any ETL process. Alteryx provides a Data Cleansing tool which helps you perform cleaning operations on the data. With this tool, you can handle special characters, spaces, nulls and letter casing.

Let us assume you have a requirement to handle dates and phone numbers. You can find an advanced data cleansing macro with which you can handle date formats, phone numbers, data validation and more. Let us see how to import this "Cleanse" macro into Alteryx.

Importing a macro

Let’s walk through the steps with an example where we are going to import the custom data cleanse tool with additional options.
macros-alteryx-designer

The macros are saved as an Alteryx package (.yxzp file) in the gallery, which on opening will prompt whether to import or not. Selecting 'Yes' shows the list of files in the package, and you can then import the package into the application.
macros-alteryx-designer

Once the package is imported, open the .yxmc file and save it in the macro repository; the macro will then be added as a tool in Alteryx Designer.

Macros repository

You can find the macro repository from  Options > User Settings > Edit User Settings > Macros tab
macros-alteryx-designer

If no folder is selected yet, you can select a folder as the macro repository using the icon provided.

Any .yxmc file saved in the folder will be loaded as a tool in Alteryx Designer, as shown below, and can be used in any workflow.
macros-alteryx-designer

Cleanse Tool configuration window

macros-alteryx-designer

Conclusion

Macros greatly reduce development work, as you can build logic once and reuse it in multiple workflows. We will discuss more about macros in subsequent blogs.

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 


The post Macros in Alteryx Designer appeared first on Visual BI Solutions.

Toggle Snowflake Warehouse based on User in Looker


Snowflake's multi-warehouse architecture allows querying any database using any warehouse by setting the context in the Snowflake Web UI or in SnowSQL. How can we leverage this feature when we are consuming Snowflake data in Looker? This blog helps us configure Looker users to use different Snowflake warehouses.
toggle-snowflake-warehouse-based-user-looker-1

Looker is a great choice to use in conjunction with Snowflake because, among other reasons, Looker does nearly all its processing on the database using SQL. By utilizing the database for processing, Looker takes advantage of Snowflake’s on-demand, elastic compute.

In this example we see two users accessing the same Snowflake database from Looker. One user is a sales user interested in aggregated sales data, and the other is a data scientist performing complex analysis. The sales user's workloads are typically lighter and take a few seconds to complete. The data scientist, on the other hand, writes complex queries that use analytical functions and take minutes. In this scenario, I want the sales user to use a separate warehouse with less compute and the data scientist to use a different, larger warehouse. Snowflake's shared-data, multi-cluster architecture, with its tight integration with Looker, allows us to do just that.

How do we do that?

Using user attributes in Looker.

Prerequisites

In Snowflake: To configure this feature, we need at least two Warehouses.

In Looker: Access to create User Groups, Users, User Attributes and Database Connection in Looker.

Let’s start with the procedure

We have configured two virtual warehouses in Snowflake with the parameters below.

toggle-snowflake-warehouse-based-user-looker-2

 

 

We have created two User Groups namely “Data Scientists” and “Sales Users” in Looker and assigned users respectively.
toggle-snowflake-warehouse-based-user-looker-3

 

Now comes the key part of the configuration, create a user attribute.

What is the User Attribute?

User attributes provide a customized experience for each Looker user. A Looker admin defines a user attribute and then applies a user attribute value to a user group or to individual users.

User attributes can be used to configure Database Connections, Data Actions, Filters, Scheduled Dashboards, and Looks, Access Filters, Connecting to Git Providers, Liquid Variables, Google BigQuery Data Limits, Embedded Dashboards. Click here for more information.

We have created a user attribute for the Snowflake warehouse. This user attribute can be referenced in the database connection. Under the "Group Values" tab, we can assign which Snowflake warehouse each user group should use for its queries.
toggle-snowflake-warehouse-based-user-looker-4

 

That’s all, now we are ready to configure the toggle switch in our Snowflake database connection.

Use the below parameter in the “Additional Parameters” section.

                                warehouse={{ _user_attributes['snowflake_warehouse'] }}
toggle-snowflake-warehouse-based-user-looker-5

 

Now run Looker queries or dashboards as members of the different user groups and observe the query "History" in Snowflake; we should see that the respective warehouse has been used to execute each query.
toggle-snowflake-warehouse-based-user-looker-6
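
Behind the scenes, this amounts to each session running its queries on a different warehouse. A hedged sketch of that idea with the Snowflake Python connector is shown below; the credentials, table and warehouse names are placeholders.

    # Illustrative sketch: the same query routed to different warehouses per
    # session, which is what the Looker user attribute effectively does.
    import snowflake.connector

    def run_query(warehouse: str) -> None:
        conn = snowflake.connector.connect(
            account="<account_identifier>",
            user="<user>",
            password="<password>",
            database="SALES_DB",
            schema="PUBLIC",
            warehouse=warehouse,  # e.g. a smaller or larger warehouse
        )
        try:
            cur = conn.cursor()
            cur.execute("SELECT COUNT(*) FROM orders")
            print(warehouse, cur.fetchone()[0])
        finally:
            conn.close()

    # A sales-style session and a data-science-style session use different compute.
    run_query("SALES_WH")
    run_query("DATASCIENCE_WH")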

Conclusion

Looker provides the option to use a different warehouse for different user sessions, based on requirements, with out-of-the-box features.

 

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

 

 


The post Toggle Snowflake Warehouse based on User in Looker appeared first on Visual BI Solutions.

Snowflake Best Practices for Performance Optimization


Snowflake was built for the cloud and supports an unlimited number of virtual warehouses – effectively independently sized compute clusters that share access to a common data store.

This Elastic Parallel Processing means it’s possible to run complex data science operations, ELT loading and Business Intelligence queries against the same data without contention for resources.

Snowflake was designed for efficiency, with almost no performance tuning options for the user to consider. We will discuss a few best practices to get the most performance in a cost-efficient way.

 

Data loading recommendations

Split files for loading into Snowflake: Considering Snowflake's multi-cluster and multi-threading architecture, split your data into multiple small files rather than one large file to make use of all the nodes in the cluster. Loading a single large file puts only one node into action while the other nodes sit idle, even with a larger warehouse. Follow the same practice for data unloading as well.

Have a separate large warehouse to support ingesting large files; this provides full horsepower for data loading. The warehouse can be turned off after data loading.
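
A hedged sketch of this loading pattern with the Snowflake Python connector is shown below; the file paths, stage, table and warehouse names are placeholders.

    # Illustrative loading pattern: stage many small files and load them in parallel.
    # File paths, stage and table names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="<account_identifier>",
        user="<user>",
        password="<password>",
        warehouse="LOAD_WH",       # dedicated loading warehouse
        database="SALES_DB",
        schema="RAW",
    )
    cur = conn.cursor()

    # Upload all split files (e.g. orders_000.csv ... orders_099.csv) to the table stage.
    cur.execute("PUT file:///data/orders_*.csv @%raw_orders AUTO_COMPRESS=TRUE")

    # COPY INTO distributes the staged files across the warehouse's nodes/threads.
    cur.execute("""
        COPY INTO raw_orders
        FROM @%raw_orders
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)

    # Suspend the loading warehouse when done to save credits.
    cur.execute("ALTER WAREHOUSE LOAD_WH SUSPEND")
    conn.close()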

 

Data querying recommendations

Dedicated warehouse for querying: Snowflake automatically caches data in the virtual warehouse (local disk cache), so place users querying the same data on the same virtual warehouse. This maximizes the chances that data retrieved to the cache by one user will also be used by others. Suspending the warehouse will erase this cache.

The Result Cache is maintained by the Global Services layer; any query executed by any user on the account can be served from it, provided the SQL text is identical. Results are retained for 24 hours.

The Snowflake Query Profile feature helps us analyze queries run from BI tools as well. If you have multiple BI tools and common users, a dedicated warehouse per BI tool helps identify which tool generated a query, since there is no out-of-the-box option to identify the origin.

Consider scaling out (multi-cluster) the warehouse used for BI analytics to cater to concurrent users, as sketched below.
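A possible definition for such a BI warehouse is sketched below; bi_reporting_wh is a hypothetical name, and the multi-cluster settings assume Enterprise Edition or higher:

    -- One warehouse per BI tool / user community keeps the local disk cache
    -- warm and makes each tool's queries easy to spot in the query history.
    CREATE WAREHOUSE IF NOT EXISTS bi_reporting_wh
      WITH WAREHOUSE_SIZE    = 'SMALL'
           MIN_CLUSTER_COUNT = 1
           MAX_CLUSTER_COUNT = 3        -- scales out for concurrent users
           SCALING_POLICY    = 'STANDARD'
           AUTO_SUSPEND      = 600      -- longer idle time preserves the cache
           AUTO_RESUME       = TRUE;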

 

Design Recommendations

Storing semi-structured data in a VARIANT column: For data that is mostly regular and uses only native types (strings and integers), the storage requirements and query performance for operations on relational data and on data in a VARIANT column are very similar. Load the data set into a VARIANT column in a table, then use the FLATTEN function to extract the objects and keys you plan to query into a separate table.
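A minimal sketch of this pattern follows; raw_orders, order_lines and the JSON keys (order_id, line_items, sku, qty) are hypothetical and only illustrate the VARIANT-plus-FLATTEN approach:

    -- Land the raw JSON as-is in a VARIANT column.
    CREATE TABLE IF NOT EXISTS raw_orders (v VARIANT);

    -- Extract the keys you query most often into a relational table.
    CREATE TABLE IF NOT EXISTS order_lines AS
    SELECT v:order_id::NUMBER     AS order_id,
           item.value:sku::STRING AS sku,
           item.value:qty::NUMBER AS qty
    FROM raw_orders,
         LATERAL FLATTEN(INPUT => v:line_items) item;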

Date/Time Data Types for Columns: When defining columns to contain dates or timestamps, choose a date or timestamp data type rather than a character data type. Snowflake stores DATE and TIMESTAMP data more efficiently than VARCHAR, resulting in better query performance.

Set a clustering key for larger data sets: Specifying a clustering key is not necessary for most tables; Snowflake performs automatic tuning via its optimization engine and micro-partitioning. Set clustering keys only for data sets larger than about 1 TB, and only when the Query Profile shows that a significant percentage of the total duration is spent scanning.
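Assuming a hypothetical sales_history table that is mostly filtered by date and region, the statements might look like this:

    -- Only worthwhile for large (1 TB+) tables with heavy scan time.
    ALTER TABLE sales_history CLUSTER BY (sale_date, region);

    -- Check how well the table is clustered on those columns.
    SELECT SYSTEM$CLUSTERING_INFORMATION('sales_history', '(sale_date, region)');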

Use transient tables as needed: Snowflake supports the creation of transient tables. Snowflake does not preserve a Fail-safe history for these tables, which can result in a measurable reduction in storage costs. Consider using transient tables where appropriate, such as for POCs.
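A small sketch, with poc_staging and sales_raw as hypothetical table names:

    -- Transient tables have no Fail-safe period, which trims storage costs
    -- for scratch or POC data.
    CREATE TRANSIENT TABLE IF NOT EXISTS poc_staging AS
    SELECT * FROM sales_raw;

    -- Optionally drop Time Travel retention to zero for pure scratch data.
    ALTER TABLE poc_staging SET DATA_RETENTION_TIME_IN_DAYS = 0;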

Create and maintain a dashboard for Snowflake usage: Snowflake provides an out-of-the-box way to audit the cost incurred for the account, along with other usage metrics, as visualizations under the "Account" section. This information is only available to the Snowflake Account Admin and to others who are granted access to query it.

Create and maintain a live dashboard for the developers and users who consume Snowflake so they can better manage their usage. Ready-made usage dashboards are available for Looker and Tableau for easy implementation:

https://looker.com/platform/blocks/source/cost-and-usage-analysis-by-snowflake

https://www.tableau.com/about/blog/2019/5/monitor-understand-snowflake-account-usage
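If you would rather build your own dashboard, queries against the shared ACCOUNT_USAGE views can feed it; the sketch below sums credits per warehouse for the last 30 days (it requires the ACCOUNTADMIN role or equivalent granted access):

    -- Credits consumed per warehouse over the last 30 days.
    SELECT warehouse_name,
           SUM(credits_used) AS credits_last_30_days
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits_last_30_days DESC;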

Set SEARCH_PATH: Setting the session's SEARCH_PATH can shave milliseconds off queries, since unqualified object names are resolved directly against the right database and schema.
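For example (analytics_db and reporting are hypothetical names; the SEARCH_PATH syntax is an assumption based on the documented session parameter and should be verified against your Snowflake version):

    -- Point the session at the right database and schema so unqualified
    -- object names resolve without extra lookups.
    USE DATABASE analytics_db;
    USE SCHEMA reporting;

    -- Alternatively, set the session-level search path (syntax assumed; verify
    -- against the SEARCH_PATH parameter documentation).
    ALTER SESSION SET SEARCH_PATH = '$current, $public, analytics_db.reporting';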

 

Conclusion:

Snowflake's architecture loads, stores and manages data without additional configuration; the points discussed above are a few levers for using Snowflake in an optimized way. Refer to the Snowflake documentation for more detail on fine-tuning your instance.

 

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

Subscribe to our Newsletter

The post Snowflake Best Practices for Performance Optimization appeared first on Visual BI Solutions.

Advanced Spatial Analytics – A quick look


The volume of potential data in any industry is tremendous, but a great deal of effort goes into converting this voluminous data into a usable format. Generally, visualization tools with geospatial capabilities or analytical languages such as Python are used for analytics on geographical data.

To handle geospatial data in a simplified way, Alteryx has introduced a separate component – Spatial Analytics. It provides a unique way of performing analytic operations on geographical data within a single workflow. This blog gives an insight into spatial objects in Alteryx and a simple demo of how those objects perform.

Spatial analysis is one of the most intriguing and remarkable aspects of GIS. These tools help with critically important decisions that are beyond the scope of simple visual analysis. Alteryx uses the spatial objects highlighted below to incorporate spatial analytics.

There are three types of data in this scenario.

  • Traditional data: Transaction data, customer data, Inventory data, Marketing data, Store data, etc.,
  • Spatial Data: Trade area data, GIS mapping data, Location aware mobile service data, Asset Location data, Drive time data, etc.,
  • Enrichment data: Demographic, Firmographic, Industry-Specific and Segmentation.

Our demo includes 4 tools:

  • Input Tool
  • Create a Point/Street Geo-Code Tool
  • Trade Area Tool
  • Spatial Matching Tool

Input

Using the Input tool, you can easily ingest data from a variety of file formats. We use the Input tool to connect to two data sources: (1) the customer database, to retrieve customer geo-locations, and (2) the store database, to retrieve store locations.

Consider a simple customer and Store dataset with Longitude and Latitude fields.
advanced-spatial-analytics-quick-look

advanced-spatial-analytics-quick-look

Street Geo-Coding /Create Point

The Street Geocode tool spatially enables your data by taking standard address information or latitude/longitude information and creating spatial points.

The Create Point tool plots the location using the longitude and latitude and creates the spatial points.
advanced-spatial-analytics-quick-look

advanced-spatial-analytics-quick-look

Note: we can use the Street Geocoder tool to get a street-level view of the map. To use that tool, we need a license for the Geocoder package, and it must be installed along with Alteryx Designer. Please refer to the links below.

https://community.alteryx.com/t5/Location-Data-Knowledge-Base/Data-Products-101-Local-Installation/ta-p/355403

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Help-with-Street-Geocoder/td-p/34625

Trade Area Geocoding

The Trade Area tool is used to plot a boundary within a targeted area surrounding each location point. To create a drive-time polygon, the Trade Area tool starts at each location and traverses the road network out to the specified distance, e.g. 10 miles.

The Non-Overlapping Drive Time tool is used to create drive-time trade areas that do not overlap for a point.
advanced-spatial-analytics-quick-look

Like the customer data, the store data has been plotted using the Create Point tool. The centroid value generated by the Create Point tool is given as input to the Trade Area tool, where we must specify the radius/drive time. Here we have taken a radius of 5.0 miles, so the Trade Area tool covers the area within that radius.
advanced-spatial-analytics-quick-look

 

We have information for six stores. All the stores are plotted, each covered by the specified radius.
advanced-spatial-analytics-quick-look

Spatial Match Tool

This tool helps establish a spatial relationship between two sets of spatial objects. It returns the list of customers who fall inside or outside the specified drive time, and is used here to combine the spatial points from the customer dataset with the trade-area polygons.
advanced-spatial-analytics-quick-look

The Spatial Match tool is used to find the customers who fall within a given store's radius; it requires two inputs.

Target – the customer centroid value is the input for the Target.

Universe – the input for the Universe is a spatial object; the spatial object output from the Trade Area tool is given as input to the Universe.
advanced-spatial-analytics-quick-look

Note: The intersection condition can be modified. Here we have chosen the condition as Target within the Universe.

The Spatial Match tool produces two outputs: one contains the customers/records that satisfy the intersection condition, and the other contains the unmatched records.
advanced-spatial-analytics-quick-look

Green dots indicate the store locations and red dots indicate the customers.

Alteryx includes a dedicated set of tools to make advanced spatial analytics accessible to business users. Additional tools include the Distance tool, Find Nearest tool, and Heat Map tool.

 

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

Subscribe to our Newsletter

The post Advanced Spatial Analytics – A quick look appeared first on Visual BI Solutions.


Keyboard Shortcuts for BW/4HANA


Working in BW/4HANA from Eclipse

Until now you may have worked with SAP GUI for BW development, but development in BW/4HANA is done in Eclipse. Working in BW/4HANA through Eclipse is a lot easier than you might think, and there are many keyboard shortcuts you can use.

You can see some of the most useful shortcuts below.

Operations Shortcuts
Open SAP GUI Ctrl+6
Create a new object Ctrl+N
Activate an object Ctrl+F3
Validate an Object Ctrl+F2
Maximize\Restore tab Ctrl+M
Search\Find object Ctrl+H
Open Resource Ctrl+Shift+R
Open an Object Ctrl+Shift+D
Toggle tab Ctrl+F7
Toggle perspective Ctrl+F8
Toggle between Open objects Ctrl+E
Move across open objects Alt+(→ or ←)
Close current Object Ctrl+(F4 or W)
Close all open objects Ctrl+Shift+(F4 or W)
Get Where used list Ctrl+F5
Undo Ctrl+Z
Redo Ctrl+Y
Quick Access Ctrl+3
Open views Alt+Shift+Q
Show error logs Alt+Shift+Q+L
Refresh data preview Ctrl+Shift+P

 

You can get the full list of available shortcuts in Eclipse using Ctrl+Shift+L.

 

Read more about similar Self Service BI topic here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

Subscribe to our Newsletter

The post Keyboard Shortcuts for BW/4HANA appeared first on Visual BI Solutions.

An Introduction to Fuzzy Matching in Alteryx


We often want to match words that refer to the same thing but are written somewhat differently, or with a typographical error. This is a common problem in a wide range of scenarios, from human error to joining data from different databases where values are coded differently. Alteryx provides the Fuzzy Match tool to address these scenarios with ease. Fuzzy matching is a process that enables the identification of duplicates or matches that are not exactly the same.

Data preparation is the key to success!

In order to perform successful fuzzy matching, it is essential to prep the data for it. To extract the most out of the Fuzzy Match tool, the data must be properly cleansed. All unneeded fields can also be removed to keep the matching process simple and clean. Keep in mind that we can always use a Join tool to join back the required fields at the end of the fuzzy matching process.

The very first step is setting the record ID for our data. The fuzzy match tool uses the Record ID to output the list of matched records. We can either use an existing field as Record ID or create one from scratch. In case we are comparing records from different sources (Merge Process), it is always good to keep an eye on the Record ID field to make sure that these IDs are not getting overlapped and duplicated.

The next step in the preparation process is to cleanse the data, and a comprehensive set of tools is available in Alteryx for this. For better success rates, it is important to understand the data and correct corrupt or inaccurate records in the data set. Data cleansing may also involve harmonization and standardization of the data. After cleansing, the data model should be consistent in most aspects across the different sources.

The following is a set of tools frequently used in the data preparation process, though the process is not limited to this set.

  1. Record ID
  2. Data Cleansing Tool
  3. Multi-field formula tool
  4. Select Tool
  5. CASS
  6. Unique Tool
  7. Formula Tool

Fuzzy Matching is an Art

Once the data preparation process is successful, then and only then, performing the fuzzy matching on top of this cleansed data sets would yield the required outcome.

Unlike the other tools in the Join family, the Fuzzy Match tool has only one input stream available, so it is always necessary to union the different data streams into one before fuzzy matching.

There are two match modes available in this tool.

  1. Purge Mode
    Records from a single source will be compared against each other to identify the potential duplicates.
  2. Merge Mode
    Records from different sources will be compared to identify duplicates across the different input files. It is always good to de-dupe the input files before using Merge mode because it does not detect duplicate records within the same source. Each source must also contain a Source ID; the Source ID column helps locate the stream of data a record is coming from.

Once we have decided on the matching mode, specify the Record ID field on the tool, and then specify the match threshold as a percentage. If the match score is less than this value, the record will not be considered a potential match. Under the match fields section, select the required field and an appropriate match style for it; the match style can be either pre-configured or custom.

Once these configurations are done in the fuzzy matching tool and the workflow is executed, the Tool will output the list of potential matches based on the configured match styles and threshold percentage.

The Fuzzy Match tool outputs only the Record ID fields. Duplicates are expected in the result set because of the way keys are generated during the fuzzy matching process, so it is important to de-dupe the result set right after every use of the Fuzzy Match tool in the workflow.

 

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

 

Subscribe to our Newsletter

The post An Introduction to Fuzzy Matching in Alteryx appeared first on Visual BI Solutions.

Can Snowflake fit your pocket as Complete DW?


For the past few months, Snowflake has been establishing its footprint in the BI world with its unique architecture and cost-effective cloud technology. But many users might not be sure how Snowflake could be leveraged, or whether Snowflake should replace some tool in their existing architecture.

In this blog, we consider what would happen if a company had no DW other than Snowflake and check whether it is really cost-effective. Assume Company ABC Corp has 2,000 users and 3 TB of data, with roughly 1–1.5 GB of data growth daily, and that they use Snowflake as their only DW solution. We assume Snowflake Enterprise Edition on AWS in the US East region, so Snowflake charges $3 per credit. Since they know their data size beforehand, they pay the capacity storage cost upfront – $23/TB per month.

Cost Consumption in Snowflake:

Loading Part (365 days, 3 hours/day)

For the daily load of data into Snowflake, we assume they will use the warehouse for 3 hours, at the end of the day (non-working hours). They have an independent warehouse set up for the daily loads, sized 'Small'. 365 days are counted because the load runs daily.

Total Compute – 3 (hours per day) * 365 (days per year) * 2 (credits for S Size) = 2,190 Credits per year

Warehouse Active hours/day Days/Year Credits/Size (S) Clusters Total
WH_LOAD 3 365 2 1 2190 Credits

 

Development Part (250 days, 10 hours/day)

 

For the business analysts and developers who build the required KPIs and insights, they allocate a separate development warehouse sized 'Extra Small' (XS). Because multiple people will be working on it, they set the number of clusters to 2 to provide enough threads. Developers will not be working throughout the year, so we take roughly 250 days and 10 hours per day for their workload.

Total Compute – 10 (hours per day) * 250 (days per year) * 1 (credits for XS size) * 2 (Clusters) = 5000 Credits per year

 

Warehouse Active hours/day Days/Year Credits/Size (XS) Clusters Total
WH_DEVELOP 10 250 1 2 5000 Credits

 

Consumption Part

 

Based on the business use, we are assuming the data warehouse is going to be running for close to 16 hours per working day.

We will have three warehouses: one for the C-level executives, one for the managers/analysts, and one dedicated to the data science team. The data science team gets an M-sized warehouse, and the C-level and manager-level groups each get an XS warehouse.

XS size – 1 credit/hour (2 warehouses)

M size – 4 credits/hour (1 warehouse)

XS size Cluster Compute per year (Executives)= 8 (hours per day) * 260 (days per year) * 1 (credits for XS size) * 1 (Clusters) = 2080 Credits per year

XS size Cluster Compute per year (Managers/Analysts) = 16 (hours per day) * 300 (days per year) * 1 (credits for XS size) * 1 (Clusters) = 4800 Credits per year

M size Cluster Compute per year = 10 (hours per day) * 260 (days per year) * 4 (credits for M size) = 10,400 Credits per year. On combining, we have 17,280 Credits per year

Warehouse Active hours/day Days/Year Credits/Size (XS, M) Clusters Total
WH_TOP_LEVEL 8 260 1 1 2080 Credits
WH_MED_LEVEL 16 300 1 1 4800 Credits
WH_DATASCIENCE 10 260 4 1 10400 Credits

 

Storage Capacity ($23 per TB per month)

 

3 TB of data is stored: 3 × $23 × 12 = $828 per year.

Overall Credits/Cost Spent:

 

2,190 + 5,000 + 17,280 = 24,470 credits per year. In dollars, that is 24,470 × $3 = $73,410, plus $828 for storage, for a total of $74,238 per year.
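Expressed as a single formula (simply restating the arithmetic above):

\[
\text{Annual cost} = \underbrace{(2{,}190 + 5{,}000 + 17{,}280) \times \$3}_{\text{compute}} + \underbrace{3 \times \$23 \times 12}_{\text{storage}} = \$73{,}410 + \$828 = \$74{,}238
\]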

 

Note:

  • Out of the 2,000 users, only 1,200 are active, of which 800 would use the system on an as-needed basis. We also assume there are only 50–75 concurrent users at a time and have designed the warehouses accordingly.
  • This architecture is designed in comparison with an existing On-prem Scenario vs Snowflake Warehouse for an imaginary Company ABC Corp so the costs may vary depending on the usage/use-cases.
  • The Warehouses can be more optimally designed according to the front-end reporting tool or as per various domains depending on the architecture to leverage the advantages of Cache.
  • We have not considered the cost of ingesting data into Snowflake, nor the cost of the reporting tool. We also assume Snowpipe is not used for ingestion, as it involves an additional cost.
  • The price mentioned is based on our assumption. The costs/credit may vary depending on the exact scenario
  • All these depictions of hours are assumed in upper limits. We have data getting ingested into the system every day and users develop/consume the reports round the year.
  • The prices above assume no discounts; Snowflake may offer discounts that reduce the cost further.

 

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here. 

Subscribe to our Newsletter

The post Can Snowflake fit your pocket as Complete DW? appeared first on Visual BI Solutions.

Meet us in our booth in Power Platform World Tour 2019 at Calgary


Hear from our experts

 

1. Planning and Forecasting with Power BI: The ValQ Journey, presented by Chandler Stevens, VP – Microsoft BI & Analytics Practice

Date & Time: Thursday, October 3, 4:00 PM- 5:00 PM

Venue: Power Platform World Tour, Calgary, Alberta, Best Western Plus Village

Launching ValQ to become one of the most downloaded visualizations in Power BI history did not happen overnight. Come learn about the ValQ journey as Chandler Stevens shares the ups and downs of developing, testing, marketing and launching a Power BI solution while partnering with Microsoft.

Presentation Style: Talk with Demo
Intended Audience: Intermediate with knowledge in Power BI Visualization SDK

Link to the event page

 

2. Application Lifecycle Management, Session presented by Shankar Narayanan SGS, BI Solutions Specialist

 

Date & Time : Friday, October 4, 10:15 AM- 11:15 AM

Venue: Power Platform World Tour, Calgary, Alberta, Best Western Plus Village

This session is about DevOps and how Application Lifecycle Management is done in Power BI using DevOps. It talks about the different methods, best practices and success stories with a demo.

Presentation Style: Talk with Demo
Intended Audience: Intermediate with knowledge in Power BI offering and Service

Link to the event page

 

Reach out to us at marketing@visualbi.com for any further inquiry.

Subscribe to our Newsletter

The post Meet us in our booth in Power Platform World Tour 2019 at Calgary appeared first on Visual BI Solutions.

Set Actions in Tableau


In the Tableau 2018.3 release, a unique feature was introduced that gives developers and business users greater flexibility and ease of use to perform complex analyses with just a few clicks.

Earlier, it was possible to create sets by:

    1. Adding dimension values manually
    2. Based on measure ranges and conditional statements
    3. Top/Bottom N conditions based on the measure

With Set Actions, it is possible to dynamically pass dimension values to existing sets based on value selections in other visualizations. To clear up any confusion, let's understand how Set Actions work in Tableau with an example.

Create SET A for Customer Name dimension and select any values.

set-actions-tableau

Creating Customer Name Set

 

Add SET A and the Customer Name dimension adjacent to each other on the Rows shelf. The customer names are categorized as IN and OUT based on the values selected when creating the set.

set-actions-tableau

Set Actions in Tableau

 

set-actions-tableau

Set Actions in Tableau

 

Create another bar chart for Sales by Customer Name and add both sheets to a dashboard as shown below.

With Set Actions implemented, the values in the set change based on the selection in the source sheet. For example, if a set contains customers A, B and C and users select customers X, Y and Z from another chart, those values replace A, B and C in the set.

To add Set Actions

Go to Dashboard -> Actions -> Add Actions -> Change Set Value.

Select the Sales by Customer Name sheet as the source, which we will use to pass customer names to SET A.

set-actions-tableau

Set Actions Example

The target Set is SET A.

set-actions-tableau

Set Actions Example

 

Select Customer Names from Sales by Customer Name bar chart and those values will be passed to SET A.

Through Set Actions, business users can select customer names from a chart and pass them into sets to perform complex analysis.

Let’s put Set Actions to use with a proportional brushing analysis example.
Let's find out the contribution of selected Sub-Categories to total sales in each state. Build a Sales by Sub-Category bar chart. Then create a Sales by State bar chart and convert the measure into % of total, setting the Compute Using value to Table (Across). In the same sheet, create a Set for Sub-Category with the first five Sub-Categories selected.

set-actions-tableau

Set Actions in Tableau Proportional Brushing

 

Add the above two sheets onto a dashboard. Create a Set Action with Sales by Sub-Category sheet as source and target Set as Sub-Category Set that was created in the previous step.

set-actions-tableau

Set Actions Example

Select Sub-Category values from the Sales by Sub-Category bar chart and we will be able to find the selected Set of Sub-Category’s contribution to the total sales in each state.

With Set Actions, many complex analyses can be performed easily, even by non-technical users. A similar feature, Parameter Actions, was released in Tableau 2019.2. What are its capabilities? We will cover it in subsequent blogs.

 

To learn more about Visual BI’s Tableau Consulting & End User Training Programs, contact us here.

Subscribe to our Newsletter

The post Set Actions in Tableau appeared first on Visual BI Solutions.
