Knowledge Base Article

Unify: Core concepts

This page describes the core concepts underpinning the Quantexa Unify workload.

Overview

The Unify workload is built on the Entity Resolution component of Quantexa’s industry-leading Decision Intelligence Platform. This provides Unify with best-in-class Entity Resolution capabilities, all available in a few clicks inside your Fabric tenant.

The concepts listed on this page are fundamental to understanding the Unify workload’s capabilities and Quantexa’s Entity Resolution features within the workload.

Core concepts

The following sub-sections describe core concepts underpinning the Unify workload.

Entity

An Entity is the representation of a real-world person or object, such as a customer or bank account. Quantexa distinguishes Entities from their real-world counterparts to make clear that Entities are simply compiled from data points found in the Data Sources you provide.

Entity Type

An Entity Type is a category of Entity. Entity Resolution in the Unify workload recognizes the following Entity Types:

  • Individual
  • Business
  • Telephone
  • Address
  • Account

Entity Group

An Entity Group provides further refinement of Entities within an Entity Type. For example, the Telephone Entity Type may contain both landline and mobile phone entries; the landline and mobile phone numbers are each a separate Entity Group within the Telephone Entity Type. Quantexa provides several predefined Entity Groups within the Unify workload.

Entity Resolution

Entity Resolution is the process of identifying Entities within your Data Sources by finding the various, and likely disparate, occurrences of each Entity across the available data.

Based on your use case and data quality, you can adjust the strictness threshold for matching, known as the Matching Level, that the workload uses for Entity Resolution. The Matching Level impacts the level of Overlinking or Underlinking. These concepts are explained below.

REMEMBER: You can view metrics about the resolved Entities in Power BI, for example, or through the workload’s output tables or Delta Lake files in OneLake.
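In miniature, the process can be pictured as comparing cleaned attributes across records and merging the references that match into one resolved Entity. The following Python sketch is purely illustrative and is not Quantexa’s algorithm; the records, fields, and matching rule are all invented for the example.

```python
# Illustrative sketch of Entity Resolution: NOT Quantexa's algorithm.
# Two invented Data Sources refer to the same person in different ways.
records = [
    {"id": "crm-1",  "name": "MICHAEL GREENE", "phone": "07700900123"},
    {"id": "bank-7", "name": "M GREENE",       "phone": "07700900123"},
    {"id": "crm-2",  "name": "SARAH HILL",     "phone": "07700900456"},
]

# Union-find: every record starts in its own group; groups merge
# whenever the matching rule fires.
parent = {r["id"]: r["id"] for r in records}

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

def matches(a, b):
    # Toy rule: same phone number and same surname (last name token).
    return (a["phone"] == b["phone"]
            and a["name"].split()[-1] == b["name"].split()[-1])

for i, a in enumerate(records):
    for b in records[i + 1:]:
        if matches(a, b):
            union(a["id"], b["id"])

# Each group of linked references becomes one resolved Entity.
entities = {}
for r in records:
    entities.setdefault(find(r["id"]), []).append(r["id"])

print(entities)  # two Entities: one for Michael Greene, one for Sarah Hill
```

The two Greene records resolve to a single Entity despite their different name presentations, while Sarah Hill remains a separate Entity.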

→ Matching Level

A Matching Level is a strictness threshold for matching that Unify refers to when deciding whether to resolve Entity references. You must specify the Matching Level that the workload should apply to an Iteration. For each Iteration, you can choose one of the following three Matching Level options:

  • Default: The standard Matching Level that applies to most use cases, striking a balance between Overlinking and Underlinking. Overlinking and Underlinking are explained below.
  • Fuzzy: A looser Matching Level that casts a wider net. It enables more matches to be found, but may result in some Overlinking.
  • Strict: A stricter Matching Level that only resolves Entity references where there is strong confidence that the match is correct. It minimizes incorrect matches, but may result in some Underlinking.

For further information on Matching Levels, see Unify: A closer look at selected key features.
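One way to picture Matching Levels is as similarity thresholds: the looser the threshold, the more pairs link. The sketch below uses Python’s standard-library difflib as a crude stand-in scorer; the threshold values and level names mapped to them are invented and do not reflect how Unify scores matches internally.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for real match scoring."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical thresholds standing in for the three Matching Levels.
MATCHING_LEVELS = {"Fuzzy": 0.6, "Default": 0.75, "Strict": 0.9}

a, b = "ACME TRADING LTD", "ACME TRADING LIMITED"
score = similarity(a, b)  # roughly 0.89 for this pair

for level, threshold in MATCHING_LEVELS.items():
    decision = "match" if score >= threshold else "no match"
    print(f"{level:7} (>= {threshold}): {decision}")
# Fuzzy and Default link the pair; Strict does not.
```

Here the two strings refer to the same business, so the Strict level’s refusal to link them is an example of Underlinking; with a genuinely different pair, the Fuzzy level’s willingness to link would be an example of Overlinking.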

→ Overlinking

Overlinking occurs when multiple references are incorrectly linked to the same Entity, even though they refer to different real-world Entities. An Overlinked Entity is an Entity that is incorrectly resolved with one or more other Entities.

Overlinking is typically caused by similarities between the records of different Entities, such as two separate customers having the same name and even the same address.

→ Underlinking

Underlinking occurs when two or more references to the same real-world Entity are not linked in the dataset. An Underlinked Entity is an Entity that is only partially resolved.

Underlinking is typically caused by missing or incorrectly entered data, such as one customer being listed multiple times in one database under different names or addresses, and with no other data to connect those references.
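Both failure modes can be seen side by side with a toy matcher. The records and the naive exact-match rule below are invented for illustration; real Entity Resolution weighs many more attributes, which is precisely what reduces both effects.

```python
# Invented records illustrating the two failure modes; not real output.
records = [
    # Two DIFFERENT people who happen to share a name and address:
    {"id": 1, "name": "JAMES LEE", "address": "1 HIGH ST"},   # father
    {"id": 2, "name": "JAMES LEE", "address": "1 HIGH ST"},   # son
    # The SAME person entered twice, once with a typo:
    {"id": 3, "name": "ANNA KOWALSKI", "address": "9 OAK RD"},
    {"id": 4, "name": "ANA KOWALSKI",  "address": "9 OAK RD"},
]

def naive_match(a, b):
    # Exact name + address equality: too blunt in both directions.
    return a["name"] == b["name"] and a["address"] == b["address"]

pairs = [(a["id"], b["id"])
         for i, a in enumerate(records)
         for b in records[i + 1:]
         if naive_match(a, b)]

print(pairs)
# (1, 2) is linked: Overlinking  -> two real people collapsed into one Entity.
# (3, 4) is not:    Underlinking -> one real person split across two Entities.
```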

Project

A Project is one instance of your Unify workload. It is a collection of Data Sources you have uploaded that you can then use for various Iterations.

Version History

All changes to a Project, such as the upload of new Data Sources, are automatically recorded in the Version History. You can view the history of your changes by clicking Version History under your workload's Home tab.

Data Source

A Data Source is a Lakehouse Table in OneLake, which you can create from a file you upload or from another source in OneLake. You upload your Data Sources to a Project within your Unify workload. You can upload multiple Data Sources to your Project. However, you can only upload one Data Source at a time.

In the Demo version of Unify, you cannot use your own Data Sources. Instead, Quantexa provides example customer Data Sources for the following two fictional product brands:

  • Contoso
  • Northwind

Each Data Source contains example data, such as names, addresses, and telephone numbers, for customers of the brand. Each has a different schema and columns, reflecting the diverse and messy data typically encountered in an organization.

In Full User and Trial versions of Unify, you may use your own Data Sources. These may contain your organization’s internal data or external data from third parties, such as corporate registries or watchlists.

Data Mapping

The Data Mapping process is an automatic process that runs when you upload a Data Source to your Project. It does the following:

  • Analyzes the uploaded Data Source’s contents.
  • Uses an inference engine to determine the appropriate data schema.
    • For example, a field containing names is mapped to the Individual Entity Type. From this, it then maps the component parts of the field to the appropriate Entity Groups within that Entity Type, such as Forename or Surname.
  • Applies the necessary parsing, cleansing, and standardization of your raw input data. For further information, see the definition for Parsing, Cleansing, and Standardization on this page.
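As a rough intuition for the inference step, a toy classifier might sample a column’s values and map each column to an Entity Type. The Python sketch below is invented for illustration; Unify’s inference engine is far more sophisticated than these three rules.

```python
import re

# Invented sample of an uploaded Data Source: column name -> sample values.
columns = {
    "cust_name": ["Michael Greene", "Sarah Hill"],
    "contact":   ["+44 7700 900123", "+44 7700 900456"],
    "addr_1":    ["1 High Street", "42 Oak Road"],
}

PHONE_RE = re.compile(r"^\+?[\d\s\-()]{7,}$")

def infer_entity_type(values):
    """Toy rule set mapping a column to an Entity Type by its contents."""
    if all(PHONE_RE.match(v) for v in values):
        return "Telephone"
    if all(re.search(r"\d", v) for v in values):
        return "Address"   # crude: contains digits but is not a phone number
    return "Individual"    # fallback: looks like free-text names

mapping = {col: infer_entity_type(vals) for col, vals in columns.items()}
print(mapping)
# {'cust_name': 'Individual', 'contact': 'Telephone', 'addr_1': 'Address'}
```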

After the mapping process is complete, a Data Mapping panel lets you view and refine the results of the process. You can also view data quality metrics for the raw input data.

For further information on Data Mapping, see Unify: A closer look at selected key features. For guidance on reviewing and editing the initial Data Mapping output, see Unify: Step-by-step guide to using the workload.

Parsing, Cleansing, and Standardization

The Unify workload parses, cleanses, and standardizes your Data Source data automatically as part of the Data Mapping process. It uses Quantexa’s Machine Learning model to do so.

  • Parsing splits source data into its component parts. For example, parsing a raw full name entry of Michael Greene creates Forename = Michael and Surname = Greene.
  • Cleansing manipulates the raw data to prepare it for optimal Entity Resolution. For example, removing generic terms such as Ltd or Organization, and removing punctuation and default values. It also converts all data to uppercase.
  • Standardization replaces different presentations of the same data with a single version for consistent formatting. For example, a dataset may contain USA, AMERICA, UNITED STATES, or UNITED STATES OF AMERICA in the country field. Standardization converts all of these to US.

The main purpose of parsing, cleansing, and standardization is to create consistent data that facilitates linking through Entity Resolution.
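Using the examples above, the three steps can be sketched as follows. This is a minimal stand-in in Python, not Quantexa’s Machine Learning model; the generic-term list and country table are illustrative subsets.

```python
import re

GENERIC_TERMS = {"LTD", "ORGANIZATION"}          # illustrative subset
COUNTRY_STANDARD = {                             # illustrative subset
    "USA": "US", "AMERICA": "US",
    "UNITED STATES": "US", "UNITED STATES OF AMERICA": "US",
}

def parse_name(raw):
    """Parsing: split a full name into its component parts."""
    forename, _, surname = raw.strip().partition(" ")
    return {"Forename": forename, "Surname": surname}

def cleanse(raw):
    """Cleansing: uppercase, strip punctuation, drop generic terms."""
    text = re.sub(r"[^\w\s]", "", raw.upper())
    return " ".join(w for w in text.split() if w not in GENERIC_TERMS)

def standardize_country(raw):
    """Standardization: collapse variants to a single canonical value."""
    return COUNTRY_STANDARD.get(raw.strip().upper(), raw.strip().upper())

print(parse_name("Michael Greene"))   # {'Forename': 'Michael', 'Surname': 'Greene'}
print(cleanse("Acme Trading, Ltd."))  # 'ACME TRADING'
print(standardize_country("United States of America"))  # 'US'
```

After these steps, "Acme Trading, Ltd." and "ACME TRADING LIMITED" both reduce toward the same comparable form, which is what makes subsequent linking possible.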

Iteration

An Iteration is the execution of Entity Resolution for a Project at a specific version. You can select a different set of Data Sources for each Iteration, which may help you identify the Data Sources that provide the highest quality of Entity data.

An Iteration execution submits a series of automatic background jobs to do the following:

  • Resolve and build Entities.
  • Generate the resulting Entity data as Lakehouse tables, which you can view in OneLake or Power BI.
    • These typically include tables for the different Entity Types, and a table containing links between the records and the resulting Entities to show how the Entities have been built from your Data Sources.
    • You can also view a Semantic Model showing the relationships between the output tables.
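To picture the shape of those outputs, the toy links table below maps source records to resolved Entities. All table, column, and identifier names here are invented; inspect your own Iteration’s Lakehouse tables in OneLake for the actual schema.

```python
# Invented miniature of an Iteration's link output; real table and
# column names will differ -- check your Lakehouse tables in OneLake.
links = [  # one row per source record, pointing at its resolved Entity
    {"source": "contoso",   "record_id": "c-101", "entity_id": "IND-1"},
    {"source": "northwind", "record_id": "n-205", "entity_id": "IND-1"},
    {"source": "contoso",   "record_id": "c-102", "entity_id": "IND-2"},
]

# Derive a per-Entity view: which source records built each Entity?
entities = {}
for row in links:
    entities.setdefault(row["entity_id"], []).append(
        f'{row["source"]}/{row["record_id"]}')

for entity_id, members in sorted(entities.items()):
    print(entity_id, "<-", members)
# IND-1 was resolved from records in BOTH Data Sources; IND-2 from one.
```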

When executing an Iteration, you can select the Matching Level you want to use when resolving Entities. For further details, see the definition for Matching Level on this page.

For further information on Iterations, see Unify: A closer look at selected key features.

Semantic Model

The Semantic Model output by an Iteration shows the relationships between the input and output tables of that Iteration. For further information on Semantic Models in Microsoft Fabric, see Power BI Semantic Models in Microsoft Fabric.

For further information on Semantic Models and other automated Unify outputs, see Unify: A closer look at selected key features.

Next Steps

For a guide to using the Unify workload, see Unify: Step-by-step guide to using the workload. For an applied example of the step-by-step guide, see Unify: Example workflow.
