This guide provides essential information about Quantexa's Entity Resolution solution and the concepts that underpin it.
Useful resources
The following resources will help you understand Quantexa’s approach to Entity Resolution and get an overview of the Quantexa Unify solution.
Introduction
The Quantexa Unify for Microsoft Fabric workload enables Data Warehouse specialists, Business Analysts, Data Engineers, and Data Scientists to unify and resolve internal data, such as customer data, with external data, such as Corporate registry or Watchlist data, to generate more accurate and reliable information about real-world Entities for improved analysis and decision-making. You can view the output data and share findings and insights in PowerBI and other applications through native integrations within the Microsoft Fabric ecosystem.
An automatic schema inference engine ingests and analyzes your data to apply the appropriate parsing, cleansing, and data standardizations. From this, the Entity Resolution process identifies connections in your data to generate information about individuals, businesses, accounts, addresses, phone numbers, and emails that more accurately represent their real-world equivalents.
Entity Resolution uses all the information in your source data to find connections between data records. Unlike the traditional record-matching approach, it does not rely solely on matching unique identifiers. Instead, it enables you to uncover connections through indirect relationships.
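To make the idea of indirect relationships concrete, the following is a minimal sketch in Python. It is not Quantexa's algorithm: it simply groups records that share any phone number or email using a union-find structure. The record layout, field names, and the `resolve` function are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only.
records = [
    {"id": "r1", "name": "Michael Greene", "phone": "555-0101", "email": None},
    {"id": "r2", "name": "M. Greene", "phone": "555-0101", "email": "mg@example.com"},
    {"id": "r3", "name": "Mike Green", "phone": None, "email": "mg@example.com"},
    {"id": "r4", "name": "Sara Lopez", "phone": "555-0199", "email": None},
]


def resolve(records, keys=("phone", "email")):
    """Group records that share any non-empty value for the given keys."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Remember the first record seen for each (field, value) pair and
    # merge any later record that carries the same pair.
    seen = {}
    for r in records:
        for key in keys:
            value = r.get(key)
            if value is None:
                continue
            if (key, value) in seen:
                union(r["id"], seen[(key, value)])
            else:
                seen[(key, value)] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(sorted(group) for group in groups.values())


entities = resolve(records)
print(entities)
```

In this sketch, r1 and r3 share no field value directly. They end up in the same group only because each is connected to r2, which is the kind of indirect relationship that matching on a single unique identifier would miss.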
The Quantexa Unify workload supports the end-to-end process from ingesting, mapping, and resolving your source data, to generating reliable information for more effective decision-making.
Quantexa concepts
The following concepts are fundamental to understanding Quantexa’s Unify workload solution.
Entity
Entities are the real-world people and objects represented in your data, such as a customer or bank account. Entity Resolution recognizes the following Entity Types:
- Individual
- Business
- Telephone
- Address
- Account
Entity Group
An Entity Group provides further refinement of Entities per Entity Type. For example, a telephone Entity may contain a landline and mobile phone entry. Quantexa provides several predefined Entity Groups.
Entity Resolution
Entity Resolution is the process of identifying Entities within your Data Sources by finding matches in the available data. Based on the quality and completeness of your data, you can control the level of strictness the Entity Resolution engine applies when matching data for each Entity Type. A looser setting increases the risk of overlinking, and a stricter setting increases the risk of underlinking.
Overlinking
An overlinked Entity is an Entity that is incorrectly resolved with one or more other Entities. Overlinking occurs when multiple references are incorrectly linked to the same Entity, even though they refer to different real-world Entities. Overlinking is typically caused by similarities between the records of different Entities. For example, if two customers in a sales database have the same name and address but are actually two different people, linking them as one Entity would result in an overlinking error.
Underlinking
An underlinked Entity is an Entity that is only partially resolved. Underlinking occurs when two or more references to the same real-world Entity are not linked in the dataset. Underlinking is typically caused by missing or incorrectly entered data. For example, if a customer is listed in a sales database under different names or addresses, not linking these references together would result in an underlinking error.
Matching levels
For each Entity Type, you can select one of the following Matching levels to determine how Entities are resolved from your data.
- Default: The standard matching level that applies to most use cases. It strikes a balance between overlinking and underlinking.
- Fuzzy: A looser matching level that casts the net wider, enabling more matches to be found, although this may result in some overlinking. This is appropriate for use cases where you need to find all possible matches and are prepared to review them. For example, matching customers to a watchlist for a Financial Crime use case.
- Strict: A stricter matching level that only resolves data together where there is strong confidence that the match is correct. This is appropriate for use cases where you must avoid creating an incorrect match. For example, generating a master set of customers in a Master Data Management (MDM) use case where you don’t review all the matches and may automatically contact customers or take actions on their accounts.
You can view metrics about the resolved Entities and their quality in PowerBI, or by using the generated Tables or Delta Lake files in OneLake.
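The trade-off between the Matching levels can be illustrated with a toy rule. The sketch below assumes, purely for illustration, that each level requires a minimum number of agreeing fields before two records are linked. The thresholds, field names, and functions are hypothetical and do not describe how the Quantexa engine scores matches.

```python
# Hypothetical mapping of Matching level to the minimum number of
# agreeing fields. The real engine's rules are not described in this guide.
MIN_AGREEING_FIELDS = {"Fuzzy": 1, "Default": 2, "Strict": 3}
FIELDS = ("name", "address", "date_of_birth")


def agreeing_fields(a, b):
    """Count fields that are populated in the first record and equal in both."""
    return sum(1 for f in FIELDS if a.get(f) is not None and a.get(f) == b.get(f))


def is_match(a, b, level="Default"):
    """Link two records when enough fields agree for the chosen level."""
    return agreeing_fields(a, b) >= MIN_AGREEING_FIELDS[level]


# Two different people who share a name and an address.
father = {"name": "john smith", "address": "1 high street", "date_of_birth": "1950-03-02"}
son = {"name": "john smith", "address": "1 high street", "date_of_birth": "1980-01-01"}
# The same person as `son`, recorded at a new address.
son_moved = {"name": "john smith", "address": "9 low road", "date_of_birth": "1980-01-01"}

for level in ("Fuzzy", "Default", "Strict"):
    print(level, is_match(father, son, level), is_match(son, son_moved, level))
```

Under these assumed rules, Fuzzy and Default link the father and son records, which is an overlinking error, while Strict fails to link the two records for the son, which is an underlinking error. No single threshold removes both kinds of error, which is why the level is chosen per use case.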
Data Source
A Data Source is a Lakehouse Table in OneLake, which you can create from another source in OneLake or an uploaded file. You can upload multiple Data Sources to a Project.
In Public Preview, Quantexa provides the following example customer Data Sources for two fictional product brands:
Each one contains sample data, such as names, addresses, and telephone numbers, for customers of the brand. Each file has a different schema and set of columns, reflecting the diverse and messy data typically encountered in an organization.
In Private Preview, Data Sources may contain your organization’s internal data or external data from third parties, such as Corporate Registry or Watchlist data.
Project
A Project is a collection of related Data Sources you can use for a particular outcome. A Project may have one or more Versions.
Parsing
Parsing splits source data into its component parts. For example, parsing a raw full name data entry of Michael Greene creates a Forename = Michael and a Surname = Greene.
When parsing addresses, Quantexa uses a Machine Learning model to split, cleanse, and standardize addresses into their component parts. For example, 31 54th Street #2A splits into a house number, street name, and flat number.
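The two examples above can be sketched in Python as follows. This is a deliberately naive, rule-based illustration and not the Machine Learning model that Quantexa uses: the function names, the output keys, and the regular expression are assumptions, and the address pattern only handles the simple "number street #unit" layout shown in the example.

```python
import re


def parse_full_name(full_name):
    """Naive split: the first token is the forename, the last is the surname."""
    parts = full_name.split()
    if not parts:
        return None
    return {"forename": parts[0], "surname": parts[-1]}


# Hypothetical pattern for addresses shaped like "31 54th Street #2A".
ADDRESS_PATTERN = re.compile(
    r"^(?P<house_number>\d+)\s+"       # leading house number
    r"(?P<street_name>.+?)"            # street name, as short as possible
    r"(?:\s+#(?P<flat_number>\w+))?$"  # optional "#unit" suffix
)


def parse_address(address):
    """Return the address components, or None if the pattern does not fit."""
    match = ADDRESS_PATTERN.match(address.strip())
    return match.groupdict() if match else None


print(parse_full_name("Michael Greene"))
print(parse_address("31 54th Street #2A"))
```

A fixed pattern like this breaks quickly on real-world address formats, which is one reason a learned model is a better fit for address parsing than hand-written rules.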
Bootstrapping
When you add a Data Source to a Project, a Bootstrapping process analyzes its contents. Bootstrapping uses an inference engine to determine the appropriate data schema. For example, a field containing names is mapped to name fields for the Individual Entity Type. From this analysis, the inbuilt mapping process identifies the suitable Entity Groups and applies the necessary parsing, cleansing, and standardization. After Bootstrapping is complete, you can view the raw and cleansed data using the Data Mapping features.
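As a rough intuition for what schema inference involves, the sketch below guesses a field type for each column by testing its values against simple patterns. It is an assumption-laden illustration, not the Bootstrapping inference engine: the patterns, the 80% agreement threshold, and the type labels are all hypothetical.

```python
import re

# Hypothetical heuristics. A production inference engine would use far
# richer signals than two regular expressions.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "telephone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}


def infer_field_type(values, threshold=0.8):
    """Return the first type whose pattern fits most non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "unknown"
    for field_type, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return field_type
    return "unknown"


def infer_schema(table):
    """Map each column name to its inferred field type."""
    return {column: infer_field_type(values) for column, values in table.items()}


# Illustrative column-oriented sample with made-up column names.
sample = {
    "contact": ["a.jones@example.com", "b.khan@example.org", ""],
    "tel_no": ["+44 20 7946 0000", "(212) 555-0100", "555 0101"],
    "notes": ["called twice", "prefers email", "n/a"],
}

schema = infer_schema(sample)
print(schema)
```

Columns that fit no pattern are labeled "unknown" in this sketch. In the product, the Data Mapping panel is where you review and refine the inferred mappings.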
Data Mapping
A Data Mapping panel lets you view and refine the mapping of source fields to Entity Groups generated by the Bootstrapping process before you initiate Entity Resolution. For example, you may want to map an address field in your source data to a main, previous, or current address. You can also view data quality metrics for the pre-mapped data.
Iteration
An Iteration is the execution of Entity Resolution for a Project at a specific version. You can select a different set of Data Sources for each Entity Resolution Iteration, which may help you identify the Data Sources that provide the highest quality of Entity data when viewing the Entity Resolution outputs.
When submitting an Entity Resolution run through an Iteration, you can select the Matching level you want to use for resolving Entities.
Running an Entity Resolution Iteration submits a series of background jobs that cleanse the source data, resolve and build Entities, and generate the resulting Entity data as Lakehouse tables, which you can view in OneLake or PowerBI. Typical outputs include a table for each Entity Type (Address, Business, Email, Individual, and Telephone) and a table of links between the source records and the resulting Entities, which shows how the Entities were built from the source data.
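The links table is what lets you trace an Entity back to its source records. The sketch below shows one way to use such a table once its rows are loaded into Python. The column names (`entity_id`, `data_source`, `record_id`) and the sample rows are assumptions for illustration, so check the generated tables for the actual schema.

```python
from collections import defaultdict

# Hypothetical rows from a links table. Real column names may differ.
links = [
    {"entity_id": "individual-1", "data_source": "brand_a_customers", "record_id": "A-17"},
    {"entity_id": "individual-1", "data_source": "brand_b_customers", "record_id": "B-03"},
    {"entity_id": "individual-2", "data_source": "brand_a_customers", "record_id": "A-21"},
    {"entity_id": "individual-3", "data_source": "brand_b_customers", "record_id": "B-09"},
    {"entity_id": "individual-3", "data_source": "brand_b_customers", "record_id": "B-12"},
]


def records_by_entity(links):
    """Group (data_source, record_id) pairs under the Entity they belong to."""
    grouped = defaultdict(list)
    for row in links:
        grouped[row["entity_id"]].append((row["data_source"], row["record_id"]))
    return dict(grouped)


def multi_source_entities(links):
    """Return Entities built from records in more than one Data Source."""
    return sorted(
        entity_id
        for entity_id, recs in records_by_entity(links).items()
        if len({source for source, _ in recs}) > 1
    )


print(records_by_entity(links)["individual-1"])
print(multi_source_entities(links))
```

Counting how many Entities span more than one Data Source is a simple way to compare Iterations that use different sets of Data Sources.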
Project version
All changes to a Project are automatically saved to the Project history. Each time you run an Iteration, a new version of the Project is created using the Iteration name as the version identifier. A Project version includes all changes made to the Project since the last Iteration run. You can view a Project’s history using the History option on the Quantexa Unify workload's Home tab.
Next steps
You can now proceed to the Quantexa Unify: Product Tour for a detailed step-by-step tour of the product.