Knowledge Base Article

Unify: Step-by-step guide to using the workload

This page provides a step-by-step guide to setting up and using the Unify workload. The guide indicates where capabilities and features differ in the Demo version of the workload, which has restricted capabilities compared to the Full User and Trial versions.

Overview

The following steps summarize the end-to-end process of using the Unify workload:

  1. Prepare your Data Sources.
  2. Launch the Quantexa Unify workload.
  3. Create a Project.
  4. Load your Data Sources. Loading your Data Sources automatically triggers Data Mapping of each Data Source.
  5. Review the output of the Data Mapping process and make any amendments if needed.
  6. Run an Iteration to conduct Entity Resolution.
  7. View your Iteration’s Entity Resolution results.

The following diagram provides a visual overview of the workflow:

Prerequisites

Before you start using the workload, as described in the Using the Unify workload section, complete the following prerequisites:

  1. Ensure your Fabric administrator has enabled Microsoft Fabric for your organization. For details, see the Microsoft article Enable Microsoft Fabric for your organization.

  2. Next, add the Unify workload to your Fabric tenant as follows:
    2.1. From the Fabric home page, click the Workloads button in the left navigation pane.
    2.2. From the Add more Workloads section, click the Quantexa Unify workload to add it to your tenant.
    2.3. Once added, the workload appears in the My Workloads section on the same page.
    2.4. For further details on adding workloads see the Microsoft tutorial.

  3. Next, someone in your organisation with the appropriate permissions, such as a Capacity, Tenant, or Workspace Administrator, must activate the Quantexa Unify workload.

  4. Before using the workload, you must also provide consent to the Quantexa application. Your organisation’s Workload Administrator will typically have provided consent on behalf of all users from your organisation already. However, if you experience any issues, contact your Workload Administrator.

  5. Finally, ensure you have prepared any Data Sources you want to use with the workload and that they are in a suitable format to upload as Lakehouse objects to Fabric.

NOTE: For the Demo version of Unify, you cannot use your own Data Sources and must use those provided by Quantexa instead.

Using the Unify workload

Once you have prepared your Data Sources and added the workload, the following steps guide you through using the Unify workload. The steps are separated into the following sub-sections:

  1. Creating your Project.
  2. Adding Data Sources.
  3. Data Mapping.
  4. Running an Iteration.
  5. Viewing and using the Unify output.

(1) Creating your Project

This section guides you through creating your Project in the Unify workload.

  1. After completing the prerequisites, navigate to the workspace you want to use the Unify workload in.
    • You can navigate to the workspace by clicking on the Workspaces item on the left-hand sidebar and selecting the workspace from the list.
    • Clicking on your workspace in the list takes you to the workspace’s homepage.
  2. Launch the workload by clicking + New item in the top left of your workspace’s homepage. This brings up a pop-up list of workloads.

  3. Click on the Unify workload you want to use from the Others section at the bottom of the list.

  4. A pop-up titled New Project appears.
    IMPORTANT: If a permission request appears at this stage, click Agree.

  5. In the New Project pop-up, type in a unique name for your Project that is easy for you to recall and identify.

  6. Click Create.

  7. You are then taken to the homepage of your Quantexa Unify Project.

(2) Adding Data Sources

  1. Once on your Project homepage, see the Explorer panel on the left side of the page. This shows your Project’s Data Sources and Iterations. On setting up your Project for the first time, the panel will show that you do not have any Data Sources or Iterations.

     

  2. To add a Data Source, use one of the following options:

    • From the Explorer panel, click the + button next to Data Sources.
    • From the menu under the Home tab, click Add Data Source.
    • On the main section of the page, click Add Data Source.

  3. When you click Add Data Source, a pop-up appears listing the various Lakehouse objects you can choose as your Data Source. This is your OneLake catalogue. Select the Lakehouse you want to add as a Data Source and click Connect.

NOTE: You can only add one Data Source at a time, so you must repeat this process for each Data Source you want to add.

IMPORTANT: If you are using a Demo version of the Unify workload, you cannot upload your own Data Sources. Instead, you can only use the Data Sources Quantexa provides in Unify: Contoso and Northwind.

(3) Data Mapping

Once you have connected a Data Source, the Data Mapping process for that Data Source runs automatically.

  1. Once the Data Mapping process completes, a Data Mapping table appears in the main section of the page.

     

    The table contains the following columns:
    • Field: the column name pulled from the Data Source, such as Forename, CustomerAddress and Email.
      • The key symbol against a field, for example against customerID, indicates that the field is a Primary Key. A Primary Key is a unique identifier for the Entity Resolution process. A field must be 100% unique and 100% populated to qualify as a Primary Key.
    • Entity Mapping: the Entity Group that the Field belongs to or maps to, such as Business, Individual or Email.
    • Mapping Field: the Entity attribute that the Field maps onto, such as forename and dateOfBirthString.
    • Type: the Field’s data type, such as Optional Int, String or Optional String. For example, the schema does not strictly require an Individual Entity Type to include a date of birth, making CustomerDoB an Optional String.
    • Uniqueness: the number of distinct values as a percentage of total rows.
    • Populated: the proportion of rows that have a value in this field.
    • Distinct Values: the count of distinct values within the field column.
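To make the Uniqueness, Populated and Distinct Values columns more concrete, the following Python sketch computes comparable figures for a single column. It is an illustration only and not how Unify calculates these values. The function name profile_field and the sample data are hypothetical, and the sketch assumes that empty strings and missing values both count as unpopulated.

```python
from typing import Dict, Optional, Sequence, Union


def profile_field(values: Sequence[Optional[str]]) -> Dict[str, Union[int, float, bool]]:
    """Profile one column of a Data Source.

    Assumption (not from the Unify documentation): None and "" both count
    as unpopulated, and only populated values count as distinct values.
    """
    total_rows = len(values)
    populated = [v for v in values if v is not None and v != ""]
    distinct_values = len(set(populated))

    def pct(part: int) -> float:
        # Percentage of total rows; an empty column is reported as 0%.
        return 100.0 * part / total_rows if total_rows else 0.0

    return {
        "distinct_values": distinct_values,
        "uniqueness_pct": pct(distinct_values),
        "populated_pct": pct(len(populated)),
        # Primary Key rule from the article: 100% unique and 100% populated.
        "primary_key_candidate": (
            total_rows > 0
            and distinct_values == total_rows
            and len(populated) == total_rows
        ),
    }


# Hypothetical sample columns.
customer_id = ["C001", "C002", "C003", "C004"]
email = ["ann@example.com", "ann@example.com", None, "bob@example.com"]

print(profile_field(customer_id))
print(profile_field(email))
```

In this sample, customer_id qualifies as a Primary Key candidate, while email does not because it contains a repeated value and a missing value.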

  2. At the bottom of the Data Mapping section is an expandable Data Viewer panel.

    The panel shows the following:
    • The Raw Data tab shows the field input strings that the Data Mapping process pulled from your Data Source.
    • The Entity Data tab shows the cleansed, parsed and standardized output for that Data Source, mapped to Entity Resolution fields.

  3. Once the Data Mapping table appears, review its outputs and amend the Data Mapping schema as needed.
    • For example, the process may mistakenly recognize three component parts of one address as three separate addresses. In such cases, you may want to manually amend the mapping table.
    • To manually amend the Data Mapping, complete the following steps:
      • Review the main Data Mapping section and amend any of the Entity Mapping and Mapping Field allocations using the drop-down options, as needed.
      • Additionally, review the Manage Entities tab, under the Data Source tab in the top left.
      • The Manage Entities tab allows you to review the Entity Types that are pre-populated by the Data Mapping process, and the Entity Groups within these Entity Types.
        • Unify provides six different Entity Types. You can also add new Entity Types as needed. This feature is available in all versions of Unify except for the Demo version.
      • Within the Manage Entities tab, the Entity Groups sub-tab lists the Entity Groups for each Entity Type. Ensure you click through each Entity Type and review the Entity Groups, making any amendments as needed.
        • For example, you may need to delete, add or edit Entity Groups as part of your Data Mapping review.

IMPORTANT: Manage Entities does not currently allow you to edit the Matching Level at a granular level for each Entity Type. You may only specify the Matching Level at the Iteration run stage.

IMPORTANT: Ensure that you review the Manage Entities tab for each of your Data Sources as there may be differences between the Data Sources. For example, in the Demo version, you will notice that the Entity Types for Contoso and Northwind are slightly different.

  4. You can resolve Entities within a single Data Source, such as where a database contains multiple entries per customer. However, Entity Resolution is typically used to resolve Entities across multiple Data Sources. Therefore, if you are using more than one Data Source, you must connect your second Data Source once the Data Mapping process for your first source completes.
    • The Data Mapping process outlined in the preceding points repeats for the second Data Source and any subsequent Data Sources.
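As a conceptual illustration of resolving Entities within and across Data Sources, the following Python sketch groups records that share the same email address once it has been trimmed and lower-cased. This is a deliberately simplified toy rule and does not represent the matching logic that Unify applies. The function name resolve_by_email, the source names crm and billing, and all sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def resolve_by_email(records: List[Dict[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group records into toy 'entities' keyed by a normalized email.

    Assumption: a matching normalized email means the same entity.
    Real Entity Resolution uses far richer matching than this.
    """
    entities: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for record in records:
        key = record["email"].strip().lower()
        entities[key].append((record["source"], record["id"]))
    return dict(entities)


# Hypothetical records from two Data Sources.
records = [
    {"source": "crm", "id": "1", "email": "Ann@Example.com"},
    {"source": "crm", "id": "2", "email": "ann@example.com "},
    {"source": "billing", "id": "9", "email": "ann@example.com"},
    {"source": "billing", "id": "10", "email": "bob@example.com"},
]

entities = resolve_by_email(records)
for email, members in entities.items():
    print(email, members)
```

Here, two crm records are resolved to each other within a single source, and both are also resolved to a billing record across sources, so four input records become two entities.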

(4) Running an Iteration

Once your Data Sources are mapped, you can now resolve and create Entities by running an Iteration.

  1. To run an Iteration, complete the following steps:
    • Return to the Explorer panel on the left.
    • Click the + next to Iterations.
    • A New Iteration pop-up appears.
    • Fill in the required details, with the following in mind:
      • Provide a unique Iteration name that is easy to identify.
      • Select the Data Sources you want to resolve in this specific Iteration. For example, if you have five different Data Sources, you may want to run multiple Iterations using different Data Source combinations each time. You may use any number of Data Sources in an Iteration, from a minimum of one.
      • Choose the Matching Level for this Iteration.
      • Select the Destination Lakehouse. This is the Lakehouse to which Unify writes the Entity Resolution output tables.
    • Click Run.
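To show how many different Data Source combinations are possible across Iterations, the following Python sketch lists every non-empty selection from five Data Sources. It is a planning aid only and does not interact with Unify. The function name iteration_combinations and the source names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def iteration_combinations(data_sources: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every non-empty selection of Data Sources, smallest first.

    An Iteration needs at least one Data Source, so selection sizes start at 1.
    """
    selections: List[Tuple[str, ...]] = []
    for size in range(1, len(data_sources) + 1):
        selections.extend(combinations(data_sources, size))
    return selections


# Hypothetical Data Source names.
sources = ["Source A", "Source B", "Source C", "Source D", "Source E"]
combos = iteration_combinations(sources)

print(len(combos))  # 2**5 - 1 = 31 possible selections
print(combos[0])    # ('Source A',)
print(combos[-1])   # all five Data Sources together
```

With five Data Sources there are 31 possible selections, so it is worth deciding in advance which combinations are meaningful and giving each Iteration a name that reflects its selection.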

  2. The Iteration first completes some pre-processing jobs. It then resolves the Entities.

    The whole process can take approximately 10 minutes. Most of this time is spent on the overheads of setting up the jobs. Therefore, running 1 row, 1,000 rows or 100,000 million rows of data all take approximately the same time. The Quantexa system has been proven at 60 billion+ record volumes.

NOTE: For the Demo version, the test files are intentionally small to allow you to easily view the results.

  3. Once complete, the page displays a summary of the Iteration as follows:

    • The Information section on the top left contains administrative details about the Iteration. You can view further details by clicking on the Job Details button.
    • The Total Entities section on the top right compares the number of input records against the number of resolved Entities in bar chart format, by Entity Type.
    • The Output Data section at the bottom shows the Lakehouse tables created from the input records and the resolved Entities, by Entity Type. This data is saved to the Lakehouse that you selected as your Destination Lakehouse when setting up the Iteration.
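One way to interpret the Total Entities comparison is as a reduction from input records to resolved Entities for each Entity Type. The following Python sketch shows that arithmetic. The function name summarize_resolution and all figures are hypothetical and are not taken from any Unify output.

```python
from typing import Dict


def summarize_resolution(
    input_records: Dict[str, int], resolved_entities: Dict[str, int]
) -> Dict[str, Dict[str, float]]:
    """Compare input record counts with resolved Entity counts per Entity Type."""
    summary: Dict[str, Dict[str, float]] = {}
    for entity_type, records in input_records.items():
        entities = resolved_entities.get(entity_type, 0)
        # Share of input records that were merged into another record's Entity.
        reduction_pct = 100.0 * (records - entities) / records if records else 0.0
        summary[entity_type] = {
            "records": records,
            "entities": entities,
            "reduction_pct": reduction_pct,
        }
    return summary


# Hypothetical counts, for illustration only.
input_records = {"Individual": 1000, "Business": 400}
resolved_entities = {"Individual": 800, "Business": 300}

summary = summarize_resolution(input_records, resolved_entities)
for entity_type, row in summary.items():
    print(entity_type, row)
```

A larger reduction indicates that more input records were resolved into shared Entities for that Entity Type.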

(5) Viewing and using the Unify output

Once your Iteration is complete, you can view the Iteration output. These outputs are as follows:

  • Entity Resolution Power BI Report
  • Semantic Model of the Data Sources and the underpinning Entity Resolution tables

For further detail on the content of these outputs, see Unify: A closer look at selected key features.

To view these outputs, complete the following steps:

  1. Return to the Workspace in which you set up your Unify workload. You can navigate to it using the Workspaces button on the left-hand navigation bar.

  2. A list of Fabric items, including your Workspace folder structure, is shown in the bottom panel of your Workspace. Scroll down through the list to find the following items, and click to open them:
    • The Iteration’s Power BI Report. The Report has the same name as your Iteration, preceded by 'Quantexa: '.
    • The Iteration’s Semantic Model, which includes the Entity Resolution tables. The Semantic Model has the same name as your Iteration.
      • Once in the Semantic Model space, you can view the related Entity Resolution tables by clicking on each table in the right-hand Tables panel.
      • To view the Semantic Model itself, click Open semantic model in the top menu bar.
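The naming convention described above can help you pick out an Iteration's outputs from a long list of Workspace items. The following Python sketch applies that convention to a plain list of item names. It does not call any Fabric API. The function names, the Iteration name and the list of Workspace items are hypothetical.

```python
from typing import Dict, Iterable

# Prefix applied to the Power BI Report name, as described in the article.
REPORT_PREFIX = "Quantexa: "


def expected_output_names(iteration_name: str) -> Dict[str, str]:
    """Return the item names expected for an Iteration's outputs."""
    return {
        "report": REPORT_PREFIX + iteration_name,
        "semantic_model": iteration_name,
    }


def find_iteration_items(item_names: Iterable[str], iteration_name: str) -> Dict[str, str]:
    """Return the expected outputs that are present in a list of item names."""
    available = set(item_names)
    expected = expected_output_names(iteration_name)
    return {kind: name for kind, name in expected.items() if name in available}


# Hypothetical Workspace contents.
workspace_items = ["Sales Lakehouse", "Customer Match v1", "Quantexa: Customer Match v1"]

print(find_iteration_items(workspace_items, "Customer Match v1"))
```

If an expected item is missing from the result, the Iteration may still be running or may have a different name from the one you searched for.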

Support

If you run into any issues while using the Unify workload, visit the Unify Support page. You can post a question outlining your issue and requesting help, or view previous posts to see if they answer your question.

Next steps

For an applied example of the step-by-step guide, including ways to use your Unify output downstream, see Unify: Example workflow.

Published 2 days ago