Before you begin this Product Tour, read the Getting Started: Quantexa Unify for Microsoft Fabric article to understand some essential concepts that underpin Quantexa's workload solution.
This document guides you through the main steps to generate resolved Entities from the sample data that Quantexa provides for the Quantexa Unify Public Preview.
Prerequisites
Before you start this Product Tour, complete the following prerequisites:
- Your Fabric administrator must enable Fabric for your organization.
- From Workloads in the left navigation pane, select the More Workloads tab, then add the Quantexa Unify workload to your Fabric tenant. Once added, it appears on the Fabric My Workloads tab. For more details on adding workloads, see the Microsoft tutorial.
- Someone with the appropriate permissions, such as a Capacity, Tenant, or Workspace Administrator, must activate the Quantexa Unify workload.
- Before using the Quantexa Unify workload, you must provide consent to the Quantexa application. Contact your Workload Administrator if you experience any issues.
Getting support
If you experience issues or require support whilst using the Quantexa Unify Workload, you can request help in the dedicated support Topic.
Overview
Follow these steps to generate resolved Entities from your organization's source data.
- Launch the Quantexa Unify workload.
- Create a Project.
- Select the Data Sources.
- Review data mapping.
- Run Entity Resolution.
- View Entity Resolution results.
NOTE: The Public Preview of Quantexa Unify can only be used with the sample data provided by Quantexa. If you want to use the product with your own data, you can request access to the Private Preview using the links available in the Quantexa Unify workload.
1: Launch the Quantexa Unify workload
Within Microsoft Fabric, launch the Quantexa Unify workload using one of the following methods:
- Select the Home icon, then select the Quantexa Unify workload.
- Select Microsoft Fabric from the left Navigation panel, then select Quantexa Unify in the menu.
2: Create a Project
Create a Project to add Data Sources for Entity Resolution.
- In the Quantexa Workspace, select the Quantexa Unify tile to create a Project.
This opens the New Project dialog.
- In the dialog, enter a name for your Project and select Create.
NOTE: If the dialog does not appear, see Fabric issue when creating new items in the Troubleshooting section.
Next, you can select and add Data Sources to your Project.
3: Select Data Sources
The next step is to select the data to perform Entity Resolution on. In the Public Preview, there are two demo Data Sources to choose from: Contoso and Northwind. They represent customer data from two different businesses or brands on which you want to run Entity Resolution.
NOTE: When accessing the Quantexa Unify workload in Private Mode, additional functionality is available for Data Sources, including using any of your own data. You can request access to Private Mode through the available links in the Quantexa workload.
- On the Home tab of your Quantexa workspace, use one of the following options to add a Data Source:
  - Select Add Data Source from the menu.
  - Click the Add Data Source button.
  - In the Explorer panel, click the + next to Data Sources.
- In the Public Preview, you can only select pre-defined Data Sources provided by Quantexa in the Quantexa OneLake. The actual file names may vary.
In the Public Preview, selecting an option to add a Data Source opens the Demo Data Sources overlay.
The following image shows example Data Sources.
On the overlay, select contoso and then select Connect.
Bootstrapping
After you select a Data Source, the Bootstrapping process starts. This process analyzes the structure and contents of your data and determines which fields should be used in the Entity Resolution process. For example, it detects the names of people and businesses, addresses, phone numbers, and email addresses. You can override this configuration as required.
The system is then automatically configured with the relevant Entity Groups.
After Bootstrapping completes, the Data Mapping section displays its results.
The next step is to review the data mapping to validate the configuration that the system has generated.
4: Review data mapping
You can use the Data Mapping panels to view the configuration of the system and the Data Viewer to see the underlying data and understand how it is being cleansed, parsed and normalized.
Review data quality metrics
The Data Mapping panel shows information about your data columns, including some data quality metrics.
The data quality metrics help you to understand your data and ensure the configuration is correct:
- Uniqueness – A measure of how many distinct values there are in the data as a percentage of the total rows.
- Populated – What proportion of the rows have a value in this field. High population statistics are ideal where you have linking information (such as customer names).
- Distinct Values – A count of the distinct values within the column.
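To make the three metrics concrete, the following minimal Python sketch computes them for a single column held in a list. It follows the definitions above (Uniqueness as distinct values divided by total rows), but it assumes that `None` and empty or whitespace-only strings count as unpopulated and are excluded from the distinct count. Quantexa Unify's exact rules may differ, and `column_quality` is a hypothetical helper, not part of the product.

```python
def column_quality(values):
    """Compute Populated, Distinct Values and Uniqueness for one column.

    Assumption (not from Quantexa): a value is populated if it is not None
    and not an empty or whitespace-only string.
    """
    total = len(values)
    populated = [v for v in values if v is not None and str(v).strip() != ""]
    distinct = len(set(populated))
    return {
        # Share of rows that have a value, as a percentage.
        "populated_pct": 100.0 * len(populated) / total if total else 0.0,
        # Count of distinct (populated) values in the column.
        "distinct_values": distinct,
        # Distinct values as a percentage of the total number of rows.
        "uniqueness_pct": 100.0 * distinct / total if total else 0.0,
    }


if __name__ == "__main__":
    print(column_quality(["Alice", "Bob", "Alice", None, ""]))
```

For that five-row column, three rows are populated (60%) and there are two distinct values, giving 40% uniqueness. A low uniqueness on a field you expected to identify people, such as a national identifier, is a hint to check the mapping.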
You can also review your raw and cleansed data in the Data Viewer panel to help you decide whether the configuration is correct.
Review data
The Data Viewer panel shows information about your raw and cleansed/processed data.
Select the expand icon to expand the Data Viewer.
The Data Viewer panel contains the Raw Data tab and the Entity Data tab:
- The Raw Data tab displays a table that reproduces the original columns and values from your uploaded Data Source. Here you can filter the columns of interest.
- The Entity Data tab displays how the raw data has been processed (cleansing, parsing, and normalization). The green columns show the core fields associated with each Entity type and how they have been processed. The remaining columns are the attributes associated with that Entity (for example, its address, phone number, and email).
You can use this table to validate the entity mappings to ensure data is being cleansed and processed as expected.
You can use the drop-down to switch between the different Entity types, and you can hide specific fields using the Columns Filter.
Update data mapping
The most common activity is to update or change the mappings that the system has automatically defined.
In the Data Mapping panel, you can use the drop-downs in the Entity Mapping and Mapping Field columns to update the mapping of your source data to Entity Groups.
For example, in the contoso data source most fields are mapped correctly, but the nationalID field has no assignment and is therefore not used in Entity Resolution.
You can update this configuration as follows:
- In the Entity Mapping column, find the field you want to remap, in this case nationalID, then select the drop-down and choose individual, because the national identifier is associated with the person.
- Now select the field to map this information into. In this case, you map nationalID into a field called nationalID, but you could also map a passport number or another unique identifier for the individual.
You have now successfully updated the mapping for the nationalID field, and it will be used in the Entity Resolution process.
Adding Entity Groups
Quantexa Unify supports having multiple Entities of a given type, such as multiple addresses or multiple contact details on a single row. Entity Groups are used to define the different entities within the data. You can view the configuration using the Manage Entities button on the Data Source tab.
In the Public Preview, you can create new Entity Groups on this panel. The workload also supports configuring new Entity Types and Matching Levels, but this functionality is not available in the Public Preview demo.
5: Run Entity Resolution
When you are satisfied with how the Data Sources are configured, you can run Entity Resolution. This is known as running an Iteration. Each Iteration uses the configuration as it was when the job started. A Project can contain multiple Iterations, each based on different configurations and Data Sources.
- In the Explorer panel of the Data Source tab, click the + icon next to Iterations, or select the Create Iteration option on the Home tab.
This opens the New Iteration dialog.
- In the New Iteration dialog, provide the following information:
  - Name – enter an identifiable name for your Iteration. This must be unique within the current Unify Project.
  - Matching level – select the Matching level to use. Default is recommended for most use cases. See the Getting Started: Quantexa Unify for Microsoft Fabric guide for an explanation of the Matching levels.
  - Destination – from the drop-down, select the destination Lakehouse where you want the resolved Entity output to be saved. Note: In the Public Preview, this Lakehouse also stores a copy of the source data.
- Select Run.
This submits several background jobs that process the mapped and cleansed source data and generate resolved Entities. While the Iteration runs, you can view its progress on screen.
NOTE: These jobs take approximately 5 minutes to complete. Most of this time is spent on the overhead of setting up the jobs, so running 1 row, 1,000 rows, or 100,000 rows takes around the same time. The Quantexa system has been proven at volumes of more than 60 billion records, but for the Public Preview the test files are intentionally small so that you can view the results quickly.
On completion, you can see the results of the run in the iteration details screen in the Iterations section of the Explorer panel.
6: Viewing the results in PowerBI and the Lakehouse
After Entities have been built from your source data, you can view the results as a list of tables for each Entity Type, and as a PowerBI chart.
View Iteration outputs
On successful completion of the Entity Resolution Iteration, the Iteration tab displays details of the Iteration.
The following sections provide information about each numbered section in the preceding image.
- Information section: displays status and timestamp information for the Iteration and allows you to view more Job Details if required.
- Total Entities section: displays a bar chart showing the number of Entities generated per Entity Type.
- Output data section: shows details of the tables generated by the Iteration for each Entity Type. This can be useful for understanding how many input rows were put into
Details include file path, file size, and the number of rows within the file.
View PowerBI reports
A completed Iteration generates a default set of PowerBI reports where you can analyze information about the generated Entities.
Use the following steps to view the reports.
- From the left navigation panel, select the Workspace where you ran your Iteration.
- Select the PowerBI Report generated for your Iteration. It has the following naming convention:
Quantexa:<Project name> - <Iteration name>
This opens a series of default charts generated in PowerBI for each of the Entity types generated by the Iteration.
- Select the Entity type you wish to view the output for in the left Pages panel.
For example, selecting the Individual Entity type displays the following charts:
A data viewer below the charts shows details of the resolved Entities.
The charts are interactive: selecting a chart element filters the tables below.
Selecting this table also opens a Filters panel on the right, where you can search for a specific resolved Entity.
NOTE: If the charts do not initially display, select the refresh icon in the top right of the Fabric menu.
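Because every report follows the naming convention Quantexa:<Project name> - <Iteration name>, you can locate the reports for a Project by name alone, for example when a Workspace contains many Iterations. The minimal Python sketch below applies that convention to a plain list of item names. The helper names and the sample item names are hypothetical, and the sketch assumes the Project name itself does not contain " - ".

```python
def report_name(project_name: str, iteration_name: str) -> str:
    """Build a report name using the documented convention
    Quantexa:<Project name> - <Iteration name>."""
    return f"Quantexa:{project_name} - {iteration_name}"


def reports_for_project(item_names, project_name):
    """Return the item names that match the convention for one Project, sorted.

    Hypothetical helper: assumes the Project name contains no " - ".
    """
    prefix = f"Quantexa:{project_name} - "
    return sorted(name for name in item_names if name.startswith(prefix))


if __name__ == "__main__":
    items = ["Quantexa:Demo - Iteration 2", "Quantexa:Demo - Iteration 1", "Sales report"]
    print(reports_for_project(items, "Demo"))
```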
View the Input and Output Data
Each Iteration writes a set of tables to the chosen Lakehouse, detailing which records have been linked together by Entity Resolution.
The sample data provided by Quantexa is also loaded into the same Lakehouse that the Iterations write to.
To access this data:
- Open the Lakehouse you selected for the Iteration output.
- On the left, you see a set of Tables.
The tables are as follows:
- Raw input tables contain the data used in the Iteration, for example "contoso" in this tutorial.
- Quantexa Entity Tables contain one row per resolved Entity, with some statistics about the Entity and some details about the different values or variations within it.
- Quantexa Record Tables provide the linkage back to the raw underlying data.
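The Record Tables are what let you trace a resolved Entity back to the raw rows it was built from. The minimal Python sketch below illustrates that linkage with small in-memory tables. The column names `entity_id` and `record_id`, the record IDs, and the sample values are all hypothetical; check the actual schemas in your Lakehouse before adapting the idea.

```python
from collections import defaultdict

# Hypothetical raw input rows keyed by a source record ID. The real raw tables
# (for example, contoso) have their own columns; these are illustrative only.
RAW_CONTOSO = {
    "C001": {"name": "Jane Smith", "email": "jane@example.com"},
    "C002": {"name": "J. Smith", "email": "jane@example.com"},
    "C003": {"name": "Omar Khan", "email": "omar@example.com"},
}

# Hypothetical Record Table rows: one row per source record, tagged with the
# ID of the resolved Entity it belongs to. Column names are assumptions.
RECORD_TABLE = [
    {"entity_id": "E1", "record_id": "C001"},
    {"entity_id": "E1", "record_id": "C002"},
    {"entity_id": "E2", "record_id": "C003"},
]


def raw_records_by_entity(record_rows, raw_rows):
    """Group raw input rows under the resolved Entity they were linked to."""
    grouped = defaultdict(list)
    for row in record_rows:
        raw = raw_rows.get(row["record_id"])
        if raw is not None:  # skip links to records missing from the raw table
            grouped[row["entity_id"]].append(raw)
    return dict(grouped)


if __name__ == "__main__":
    for entity_id, rows in raw_records_by_entity(RECORD_TABLE, RAW_CONTOSO).items():
        print(entity_id, [r["name"] for r in rows])
```

In a Fabric notebook, the same idea would typically be expressed as a join between the Record Table and the raw table on the shared record identifier.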
Additional features
This section describes additional features available in the Quantexa Unify workload.
Version History
Every change to a Project is captured in the Version History. This includes the addition or removal of Data Sources, details of each Iteration run, and changes to the Project settings.
Each Iteration run creates a new version of the Project that includes all changes made to it since the previous Iteration run.
To view the history of a Project, select Version History from the Home tab.
This opens the Version history overlay, where you can view all changes to the Project and all its versions. Each version of a Project is identified using the Iteration name.
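The versioning rule described above (changes accumulate until the next Iteration run, which then creates a version named after that Iteration) can be modelled with a small data structure. The Python sketch below is a hypothetical conceptual model, not Quantexa's implementation. It also enforces the rule from the New Iteration dialog that Iteration names must be unique within a Project.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectHistory:
    """Hypothetical model of a Project's Version History."""

    pending_changes: list[str] = field(default_factory=list)
    versions: list[tuple[str, list[str]]] = field(default_factory=list)

    def record_change(self, description: str) -> None:
        # Changes such as adding a Data Source or editing settings accumulate here.
        self.pending_changes.append(description)

    def run_iteration(self, iteration_name: str) -> None:
        # Iteration names must be unique within the Project.
        if any(name == iteration_name for name, _ in self.versions):
            raise ValueError(f"Iteration name already used: {iteration_name}")
        # The new version holds every change made since the previous run.
        self.versions.append((iteration_name, self.pending_changes))
        self.pending_changes = []


if __name__ == "__main__":
    history = ProjectHistory()
    history.record_change("Added Data Source contoso")
    history.run_iteration("Iteration 1")
    print(history.versions)
```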
Manage Project settings
You can view the settings for a Project using the Project settings option on the Home tab.
This opens the Project settings overlay, where you can view and edit the details of your Project.
See the Microsoft documentation for details about this overlay.
Troubleshooting
This section details known issues and how to resolve them.
Fabric issue when creating new items
Occasionally, after you attempt to create a new Project, the system gives no response or error message. This is due to a known intermittent issue in Microsoft Fabric, which appends the clientSideAuth=0 parameter to the URL in your web browser's address bar.
If the URL contains clientSideAuth=0, remove it, then try again.
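Removing the parameter by hand in the address bar is usually enough. If you need to clean such URLs programmatically, for example in a script that generates links, the following minimal Python sketch uses the standard `urllib.parse` module to drop `clientSideAuth=0` and keep the other query parameters. The function name and the example URLs are hypothetical. Note that `urlencode` re-encodes the remaining parameters, so their percent-encoding may change slightly (for example, spaces become `+`).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_client_side_auth(url: str) -> str:
    """Return the URL with any clientSideAuth=0 query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not (key == "clientSideAuth" and value == "0")
    ]
    # Rebuild the URL; an empty query means no "?" is emitted.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_client_side_auth("https://example.com/groups/abc?view=home&clientSideAuth=0"))
```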
Next Steps
For a guided tutorial showing Entity Resolution in action using the sample data available in the Public Preview, see Quantexa Unify: ER in Action.