An Introduction to Fuzzy Matching in Alteryx

We often want to match words that were referring to the same thing, but they were written somewhat different, or with a typographical error. This is a common problem we experience in a wide range of scenarios starting from human errs to joining data from different databases where it coded differently. Alteryx dispense fuzzy match tool to address these scenarios with ease. Fuzzy matching is a process that enables the identification of duplicates or matches that are not the same.

Data preparation is the key to success!

In order to perform a successful fuzzy matching, it is essential to prep the data for it. To extract the most out of a fuzzy matching tool, data must be properly cleaned. Also, all unneeded fields can be removed to keep the matching process simple and clean. Please keep in mind that, we can always use a Join tool to join back the required fields at the end of the fuzzy matching process.

The very first step is setting the record ID for our data. The fuzzy match tool uses the Record ID to output the list of matched records. We can either use an existing field as Record ID or create one from scratch. In case we are comparing records from different sources (Merge Process), it is always good to keep an eye on the Record ID field to make sure that these IDs are not getting overlapped and duplicated.

The next step in the preparation process is to cleanse the data. A comprehensive set of tools is available in Alteryx to perform this step. Better success rates, it is always important to understand the data and correcting corrupt or inaccurate records from the data set. Data cleansing activities may also involve activities like, harmonization of data and standardization of data. After cleansing activities, the data model should be consistent in most of the aspects with different sources.

Following is a set of possible tools to use in the Data Preparation Process, but it is not only limited to this set. These tools are just frequently used.

Record ID
Data Cleansing Tool
Multi-field formula tool
Select Tool
CASS
Unique Tool
Formula Tool

Fuzzy Matching is an Art

Once the data preparation process is successful, then and only then, performing the fuzzy matching on top of this cleansed data sets would yield the required outcome.

Unlike the other tools in the Join family, the Fuzzy matching tool has only one input stream available. So, it always necessary to union the different data streams into one before Fuzzy matching.

There are two match modes available in this tool.

Purge Mode
Records from a single source will be compared against each other to identify the potential duplicates.
Merge Mode
Records from different sources will be compared to identify the duplicates across different input files. It is always good to de-dupe the input files before the merge mode because Merge mode does not detect duplicate records within the same source. And, each source must contain a Source ID. Source ID column helps us to locate the stream of data where a record is coming from.

Once we decided on the matching mode, specify the record ID field on the tool. And then, the match threshold as a percentage should be specified on the tool. If the match score is less than this specified value, then the record will not be considered as a potential match. Under the match fields section, select the required field and appropriate match style for the selected field. Match style can be either pre-configured or custom style.

Once these configurations are done in the fuzzy matching tool and the workflow is executed, the Tool will output the list of potential matches based on the configured match styles and threshold percentage.

The fuzzy matching tool will output the Record ID fields only. It is always expected that there will be duplicated in the result set, its all related to the way how keys are generated in the Fuzzy matching process. So, it is very important that every time the Fuzzy matching tool is used in the workflow, we should always de-dupe the result set right after it.

Read more about similar Self Service BI topics here and learn more about Visual BI Solutions Microsoft Power BI offerings here.

Subscribe to our Newsletter

The post An Introduction to Fuzzy Matching in Alteryx appeared first on Visual BI Solutions.

An Introduction to Fuzzy Matching in Alteryx

Data preparation is the key to success!

Fuzzy Matching is an Art

Subscribe to our Newsletter

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112