Unify: Example workflow
This page walks through an easy-to-follow example workflow from the perspective of a hypothetical customer, using the Data Sources provided in the Unify Demo version.
It shows the step-by-step guide in action, to help you understand how to use the workload and the different ways to use its output.
Overview
The following steps show how you would apply the step-by-step guide and interact with the workload at each stage, including analyzing the workload outputs.
Prerequisites
Before you begin, you complete all required prerequisites.
- These are listed in the Prerequisites section of the step-by-step guide.
Creating your Project
You now create your Project.
- You navigate to your workspace and click to create a New Item.
- You select the Unify workload from the pop-up list.
- You call the new Project BrandABC and create the Project.
Adding Data Sources
You have both the Contoso and Northwind data. You want to analyze the Contoso data first, so you decide to upload it by itself for now.
- On your Project homepage, you click to add a Data Source from the Explorer panel.
- In the pop-up, you select Contoso as your first Data Source and connect it.
The Data Mapping process automatically begins.
Data Mapping
Once the Data Mapping is complete, you are presented with the mapping output.
- You review the mapping output and see that the nationalId field is not mapped. It is, therefore, not being included in the Entity Resolution process.
- You decide to edit the mapping schema to map it to the Individual -> nationalId field, so that it is now included in the Data Mapping.
Running a first Iteration
You want to judge the quality of the Contoso Data Source by itself, so you decide to run Entity Resolution for only that source first.
NOTE: Remember that single-source Entity Resolution is a viable use for the Unify workload, even though you would more typically run Entity Resolution between two or more Data Sources.
- You create the first Iteration, name the Iteration Contoso, and click Run.
- The Iteration completes and you can now analyze the results.
Viewing and using the Unify output
The automatic outputs of the workload are as follows:
- Iteration summary
- Power BI Report
- Entity Resolution tables
- Semantic Model of the Entity Resolution tables
(A) You first view the Iteration summary.
- The summary appears automatically once the Iteration is complete.
- You see that the Total Entities chart and summary of Output Data show the total number of resolved Entities for each Entity Type.
(B) You next want to view the Power BI Report, which includes more detailed bar charts and data tables.
- You navigate to your Workspace and open the Power BI Report.
- You see the bar charts showing counts for Entities by certain measures. You navigate through the report using the different tabs on the Pages panel on the left, which allows you to view the results for each Entity Type.
- You also want to explore the underlying data. Therefore, you scroll down to the data table below the bar charts.
- You click on the table and see that the Filters panel on the right now shows certain options. You use these options to examine the data.
- For example, under the Individual tab on the Pages panel, you use the Name filter to search for all names matching ‘DEBORA’ (uppercase) and find that three Entities match.
- As another example, under the Address tab on the Pages panel, you use the Address filter to search for all addresses matching ‘St. Peters’ and find two Entities that match.
IMPORTANT: The search filters within the Pages panel are case sensitive.
(C) Next, you want to view the Entity Resolution tables in full and see a Semantic Model of them.
- You navigate to your Workspace and open the Semantic Model.
- On the Semantic Model homepage, you see the Tables panel on the right, which lists the output data tables from your Iteration. You click on one of the tables to view the underlying data.
- You next open the Semantic Model of the tables.
- You review the model to identify the relationships between the tables.
- Clicking on a specific table in the Data panel on the right takes you to that table in the Semantic Model.
IMPORTANT: You can also view the output data by opening the OneLake Lakehouse you sent your Iteration output to. Note that if your Lakehouse is shared within your organisation, other data tables unrelated to your Iteration appear here too.
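In a shared Lakehouse, it can help to narrow the table listing to the tables that belong to a single Iteration. The following is a minimal sketch, not part of the workload itself. It assumes that all output tables follow the quantexa_{projectName}_{iterationName}_ naming pattern used for the individual_records tables in the comparison code on this page. The function name tables_for_iteration and the example table list are hypothetical.

```python
def tables_for_iteration(all_tables, project_name, iteration_name):
    """Return the table names that belong to one Iteration, sorted.

    Assumes output tables are named quantexa_<project>_<iteration>_<suffix>.
    The comparison ignores case, because catalog listings may lowercase names.
    """
    prefix = f"quantexa_{project_name}_{iteration_name}_".lower()
    return sorted(t for t in all_tables if t.lower().startswith(prefix))


# Hypothetical listing of a shared Lakehouse
example_tables = [
    "contoso",
    "quantexa_BrandABC_Contoso_individual_records",
    "quantexa_BrandABC_ContosoNorthwind_individual_records",
    "sales_2023",
]

print(tables_for_iteration(example_tables, "BrandABC", "Contoso"))
```

The trailing underscore in the prefix keeps the Contoso Iteration from also matching tables of an Iteration whose name merely starts with Contoso. In a notebook, the list of names could come from something like [t.name for t in spark.catalog.listTables("EntityOutputs")], assuming the Lakehouse is called EntityOutputs as in the example Project.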
Uploading a new Data Source and running a new Iteration
Once you have reviewed the results of your first Iteration that uses only the Contoso Data Source, you decide to upload the Northwind Data Source and run an Iteration between the two.
- The steps are the same as in the preceding sections, starting from Adding Data Sources.
- However, one difference is that you must ensure you select Northwind as a second Data Source when running your second Iteration.
- You name this Iteration ContosoNorthwind.
- Once the second Iteration runs, you view the results, using the steps outlined in the Viewing and using the Unify output section.
Comparing results between Iterations
You want to compare results between the two Iterations. You do this programmatically by using the Notebook functionality within Fabric.
You complete the following steps:
- Within the Lakehouse where the output data is stored, you click Open notebook from the top menu bar and select New notebook.
- Next, you input the code required to run the comparison. The following code block provides an example of the code you would input if you were comparing the changes to the Individual Entity type, specifically the national ID Entity Group.
IMPORTANT: For your own Project, you must configure the name of the Lakehouse you selected for the outputs, and the Project and Iteration names.
# Configure to suit your Project.
lakehouse = "EntityOutputs"  # Name of the Lakehouse containing the output of the Iterations
projectName = "BrandABC"  # Name of the Quantexa Unify Project
contosoOnlyIterationName = "Contoso"  # Name of the Iteration with Contoso data only
contosoNorthwindIterationName = "ContosoNorthwind"  # Name of the Iteration with Contoso and Northwind data
# Raw input data (spark and display are provided by the Fabric notebook session)
contosoRecords = spark.sql(f"SELECT * FROM {lakehouse}.contoso")
# Resolved Entity output: Contoso only
individualsContosoOnly = spark.sql(f"SELECT * FROM {lakehouse}.quantexa_{projectName}_{contosoOnlyIterationName}_individual_records")
# Resolved Entity output: Contoso and Northwind
individualsContosoNorthwind = spark.sql(f"SELECT * FROM {lakehouse}.quantexa_{projectName}_{contosoNorthwindIterationName}_individual_records")
# Find rows in the Contoso and Northwind output that do not appear in the Contoso-only output.
# This covers Contoso records whose Entity has changed, and also rows that are new in the second Iteration.
changedDocuments = individualsContosoNorthwind.exceptAll(individualsContosoOnly)
display(changedDocuments)
# All rows of the Entities in the Contoso and Northwind build that have changed.
# distinct() avoids duplicate rows when an Entity has more than one changed record.
changedEntities = individualsContosoNorthwind.join(changedDocuments.select("entityId").distinct(), "entityId")
display(changedEntities)
# Join on the raw Contoso data (the inner join keeps only records that exist in the Contoso source)
entitiesWithRawData = changedEntities.join(contosoRecords, changedEntities["documentId"] == contosoRecords["customerID"], "INNER")
display(entitiesWithRawData.drop("entityType", "documentType", "documentId"))
This outputs a table showing the differences between the two Iterations.
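To make the first comparison step easier to reason about, the following sketch mimics what exceptAll does using plain Python, so it can be run outside a Spark session. It is an illustration only: the helper name except_all and the document and Entity IDs are hypothetical, and the rows are reduced to (documentId, entityId) pairs.

```python
from collections import Counter


def except_all(left, right):
    """Multiset difference: rows in left, minus one copy for each matching row in right."""
    return sorted((Counter(left) - Counter(right)).elements())


# Hypothetical (documentId, entityId) rows for each Iteration
contoso_only = [("C1", "E1"), ("C2", "E2"), ("C3", "E3")]
contoso_northwind = [("C1", "E1"), ("C2", "E9"), ("C3", "E3"), ("N1", "E9")]

changed = except_all(contoso_northwind, contoso_only)
print(changed)  # [('C2', 'E9'), ('N1', 'E9')]
```

In this made-up data, record C2 moved to a different Entity, and N1 is a record that only exists in the second Iteration. Both appear in the difference, which is why the final step of the comparison joins back to the raw Contoso data to keep only Contoso records.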
Next steps
Remember to refer back to the step-by-step guide for full instructions on using the workload.