Unify: Example workflow
This page provides an easy-to-follow example workflow from the perspective of a hypothetical customer using the Data Sources provided in the Unify Demo version. It shows the step-by-step guide in action to help you understand how to use the workload and the different ways to use its output.

Overview

The following steps show how you would apply the step-by-step guide and interact with the workload at each stage, including analyzing and using the workload outputs.

Prerequisites

Before beginning, you complete all required prerequisites. These are listed in the Prerequisites section of the step-by-step guide.

Creating your Project

You now create your Project. You navigate to your workspace and click to create a New Item. You select the Unify workload from the pop-up list. You call the new Project BrandABC and create the Project.

Adding Data Sources

You have both the Contoso and Northwind data. You want to analyze the Contoso data first, so you decide to upload it by itself for now. On your Project homepage, you click to add a Data Source from the Explorer panel. In the pop-up, you select Contoso as your first Data Source and connect it. The Data Mapping process begins automatically.

Data Mapping

Once the Data Mapping is complete, you are presented with the mapping output. You review the mapping output and see that the nationalId field is not mapped and is therefore not included in the Entity Resolution process. You decide to edit the mapping schema to map it to the Individual -> nationalId field, which includes it in the Data Mapping.

Running a first Iteration

You want to judge the quality of the Contoso Data Source by itself, so you decide to conduct Entity Resolution for only that source first.

NOTE: Remember that single-source Entity Resolution is a viable use for the Unify workload, even though you would more typically run Entity Resolution between two or more Data Sources.
You create the first Iteration, name the Iteration Contoso, and click Run. The Iteration completes and you can now analyze the results.

Viewing and using the Unify output

The automatic outputs of the workload are as follows:
- Iteration summary
- Power BI Report
- Entity Resolution tables
- Semantic Model of the Entity Resolution tables

(A) You first view the Iteration summary. The summary appears automatically once the Iteration is complete. You see that the Total Entities chart and summary of Output Data show the total number of resolved Entities for each Entity Type.

(B) You next want to view the Power BI Report, which includes more detailed bar charts and data tables. You navigate to your Workspace and open the Power BI Report. For guidance on how to find the Power BI Report, see Unify: Step-by-step guide to using the workload.

You see the bar charts showing counts for Entities by certain measures. You navigate through the report using the different tabs on the Pages panel on the left, which allows you to view the results for each Entity Type.

You also want to explore the underlying data. Therefore, you scroll down to the data table below the bar charts. You click on the table and see that the Filters panel on the right now shows certain options. You use these options to examine the data. For example, under the Individual tab on the Pages panel, you use the Name filter to search for all names matching ‘DEBORA’ (uppercase) and find that three Entities match. As another example, under the Address tab on the Pages panel, you use the Address filter to search for all addresses matching ‘St. Peters’ and find two Entities that match.

IMPORTANT: The search filters within the Pages panel are case sensitive.

(C) Next, you want to view the Entity Resolution tables in full and see a Semantic Model of them. You navigate to your Workspace and open the Semantic Model. For guidance on how to find the Semantic Model, see Unify: Step-by-step guide to using the workload.
On the Semantic Model homepage, you see the Tables panel on the right, which lists the output data tables from your Iteration. You click on one of the tables to view the underlying data. You next open the Semantic Model of the tables. You review the model to identify the relationships between the tables. Clicking on a specific table in the Data panel on the right takes you to that table in the Semantic Model.

IMPORTANT: You can also view the output data by opening the OneLake Lakehouse you sent your Iteration output to. Note that if your Lakehouse is shared in your organisation, other data tables unrelated to your Iteration appear here too.

Uploading a new Data Source and running a new Iteration

Once you have reviewed the results of your first Iteration, which uses only the Contoso Data Source, you decide to upload the Northwind Data Source and run an Iteration between the two. The steps are the same as in the preceding sections, starting from Adding Data Sources. The one difference is that you must select Northwind as a second Data Source when running your second Iteration. You name this Iteration Contoso-Northwind.

Once the second Iteration runs, you view the results using the steps outlined in the Viewing and using the Unify output section.

Comparing results between Iterations

You want to compare results between the two Iterations. You do this programmatically by using the Notebook functionality within Fabric. You complete the following steps:
1. You open the Lakehouse where the output data is stored.
2. You click Open notebook from the top menu bar and select New notebook.
3. You input the code required to run the comparison.

The following code block provides an example of the code you would input if you were comparing the changes to the Individual Entity type, especially the national ID Entity Group.

IMPORTANT: For your own Project, you must configure the name of the Lakehouse you selected for the outputs, and the Project and Iteration names.
    # Runs in a Fabric notebook, where the `spark` session and `display` are provided.
    # Configure to suit your Project.
    lakehouse = "EntityOutputs"  # Name of the Lakehouse containing the output of the Iterations
    projectName = "BrandABC"  # Name of the Quantexa Unify Project
    contosoOnlyIterationName = "Contoso"  # Name of the Iteration with Contoso data
    contosoNorthwindIterationName = "ContosoNorthwind"  # Name of the Iteration with Contoso and Northwind data

    # Raw input data
    contosoRecords = spark.sql(f"SELECT * FROM {lakehouse}.contoso")

    # Resolved Entity output: Contoso only
    individualsContosoOnly = spark.sql(
        f"SELECT * FROM {lakehouse}.quantexa_{projectName}_{contosoOnlyIterationName}_individual_records"
    )

    # Resolved Entity output: Contoso and Northwind
    individualsContosoNorthwind = spark.sql(
        f"SELECT * FROM {lakehouse}.quantexa_{projectName}_{contosoNorthwindIterationName}_individual_records"
    )

    # Find the rows of the Contoso-and-Northwind output that do not appear in the
    # Contoso-only output, that is, records whose associated Entity has changed
    changedDocuments = individualsContosoNorthwind.exceptAll(individualsContosoOnly)
    display(changedDocuments)

    # All the Entities in the Contoso-and-Northwind build that have changed.
    # distinct() prevents duplicated rows when an Entity has several changed records.
    changedEntities = individualsContosoNorthwind.join(
        changedDocuments.select("entityId").distinct(), "entityId"
    )
    display(changedEntities)

    # Join on the raw data
    entitiesWithRawData = changedEntities.join(
        contosoRecords,
        changedEntities["documentId"] == contosoRecords["customerID"],
        "inner",
    )
    display(entitiesWithRawData.drop("entityType", "documentType", "documentId"))

This outputs a table showing the differences between the two Iterations.

Next steps

Remember to refer back to the step-by-step guide when using the workload for full instructions on how to use it.

Unify: FAQs
This page sets out frequently asked end-user questions when using Unify. For frequently asked questions for administrators, see Adding the Unify workload: technical prerequisites.

Q: I’m experiencing a Fabric error or no response when creating a new Project

A: Occasionally, after attempting to create a new Project, there is no response or error message from Fabric. This is due to a known intermittent issue with the Microsoft Fabric system, where it appends the clientSideAuth=0 parameter to the URL in your web browser's address bar. If the URL contains clientSideAuth=0, remove it and try again.

Q: My Iteration fails when running it against my own Data Sources

A: Occasionally, your Iteration may fail when running it against your own Data Sources. In this instance, navigate to the monitoring hub to find the job that failed. The failed job’s name matches the name of the Iteration in the Unify UI. Once you have located the job in the monitoring hub, investigate the job’s logs to see why your Iteration failed.

Further support

If your query is not answered in the FAQs, you can contact Quantexa Unify Support for further assistance.

Unify: Further guidance on selected key features
This page provides further guidance on some key features of the Unify workload.

Overview

The selected features discussed on this page are those you will encounter as you use the Unify workload. This page provides further advice and guidance on using these features, in addition to their basic definitions on the Core Concepts page.

Features

The following sub-sections provide further detail on some of Unify's key features.

Data Mapping

For a definition of Data Mapping, see Unify: Core concepts.

Data Mapping is an integral part of Quantexa’s Entity Resolution solution. Quantexa’s Data Mapping process in the Unify workload focuses on mapping your Data Sources to pre-defined Entity Type and Entity Group fields.

In the context of the Unify workload, Data Mapping seeks to answer some initial questions about your Data Source, such as the following:
- What source fields match the Unify Entity attribute fields? Which should they be mapped to?
- For source fields that do not directly match Unify’s pre-defined Data Mapping fields, what are the most suitable matches? If there are no suitable matches, why?
- What Entity Types and Entity Groups are being populated by the source data? To what percentage are these fields being populated?

As noted in the step-by-step walkthrough, while you may edit the Data Mapping process output, the process itself runs automatically on loading a Data Source. This saves significant time and manual effort. However, to ensure accurate Data Mapping in Unify, your data must be in a suitable format and have some logical structure for the mapping process to read it effectively.

Iterations

For a definition of Iteration, see Unify: Core concepts.

Running an Iteration serves two purposes:
1. Conducting Entity Resolution on the Data Sources you include for that Iteration.
2. Comparing Entity Resolution outputs across multiple Iterations that use different Data Sources, or different combinations of Data Sources.
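As a rough illustration of the second purpose, the sketch below compares the records output of two Iterations in plain Python. It assumes each output row can be reduced to an (entityId, documentId) pair; the sample rows and the except_all helper are hypothetical and are not part of the Unify workload. The helper mimics a multiset difference, which is how Spark's exceptAll treats duplicate rows.

```python
from collections import Counter


def except_all(left, right):
    """Multiset difference: rows of `left` not matched one-for-one in `right`."""
    remaining = Counter(left) - Counter(right)
    return sorted(remaining.elements())


# Hypothetical (entityId, documentId) rows from two Iterations.
contoso_only = [("E1", "D1"), ("E2", "D2"), ("E3", "D3")]
contoso_northwind = [("E1", "D1"), ("E2", "D2"), ("E2", "D3"), ("E4", "N1")]

# Rows present in the two-source Iteration but not in the single-source one.
changed_rows = except_all(contoso_northwind, contoso_only)

# Entities touched by at least one changed row.
changed_entities = sorted({entity_id for entity_id, _ in changed_rows})

print(changed_rows)
print(changed_entities)
```

In this made-up data, record D3 moves from Entity E3 to Entity E2 once the second Data Source is added, so E2 is reported as changed alongside the new Entity E4.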
In addition to comparisons on the data content, an Iteration can help you compare data quality, Entity Resolution metrics and field population rates between your Data Sources.

The first scenario is straightforward: using Quantexa’s Entity Resolution features within the Unify workload, you can build a trusted data foundation directly.

The second scenario would be more complex without the Unify workload, as a true comparison would require a significant investment of time and resources. With the Unify workload, you run multiple Iterations using the same step-by-step process.

Matching Levels

For a definition of Matching Level, see Unify: Core concepts.

The availability of Matching Levels helps you tailor Unify’s Data Mapping and Entity Resolution processes to your Project’s needs. As a reminder, there are three available Matching Levels within the Quantexa Unify workload: Default, Fuzzy, and Strict.

The following are example use cases for Fuzzy and Strict Matching Levels.
- Fuzzy: You can use a Fuzzy Matching Level in a scenario like matching customers to a watchlist in the Financial Crime arena. Due to the seriousness of the matter, you want to ensure you find all possible matches. Even where there is Overlinking, you are happy to manually review the matches to find the correct ones.
- Strict: You can use a Strict Matching Level in a scenario like generating a master set of customers in Master Data Management. As the output may be used to trigger automatic action, such as contacting customers, and you are unlikely to review the matches, you want to ensure that all generated matches are correct. Even where there is Underlinking, you are happy to have a smaller scope of matches given the reputational and practical consequences of any incorrect matches.
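The trade-off between Overlinking and Underlinking in the use cases above can be sketched with a toy name matcher. This is an illustration only: it does not show how Unify implements its Matching Levels. The similarity measure (Python's difflib), the threshold values, and the sample names are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical similarity thresholds, for illustration only.
# They are not Unify's actual Matching Level settings.
THRESHOLDS = {"Strict": 1.0, "Default": 0.9, "Fuzzy": 0.8}


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def matches(query, candidates, level):
    """Return the candidates whose similarity to `query` meets the level's threshold."""
    threshold = THRESHOLDS[level]
    return [c for c in candidates if similarity(query, c) >= threshold]


candidates = ["DEBORA SMITH", "DEBORAH SMITH", "DEBRA SMYTHE", "JOHN CARTER"]

for level in THRESHOLDS:
    print(level, matches("DEBORA SMITH", candidates, level))
```

With these assumed thresholds, the strictest setting returns only the exact name, while the loosest also returns near-spellings that a reviewer would need to check. That mirrors the choice between a smaller set of safer matches and a wider set that needs manual review.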
The following factors can help you decide which Matching Level to choose at the Data Mapping stage and for each Iteration:
- The quality of your Data Source.
- The completeness of your Data Source.
- Your particular use case. For example, if you are planning to use the Entity Resolution output to execute automated tasks without reviewing all matches, it may be better to use a Strict Matching Level. For cases where you want to ensure you have all possible matches, even with Overlinking, you may want to use a Fuzzy Matching Level.

If you are not sure which Matching Level to use, you can opt for the Default Matching Level, as this strikes a balance between Overlinking and Underlinking.

Automated output

After completing an Iteration, the Unify workload automatically outputs the results of the Entity Resolution process into the following:

Iteration summary
The summary shown for an Iteration after Entity Resolution is a bar chart in the top-right corner. The bar chart compares the total number of input Records against the total number of resolved Entities for each Entity Type.

Power BI Report
The automatic report shows summaries of key information for Entity Types, such as Entity size, Entities by Address, and Entities by Business and Individual counts.

Entity Resolution records tables and Entities tables
Records tables show the records that triggered the resolution of a particular Entity. For example, the workload outputs multiple tables showing the relevant records for a particular Entity. Each records table covers a specific Entity type, such as Individual or Address.
Entities tables show the Entities the source data has resolved to. For example, you may input two Data Source tables, and after Entity Resolution, the workload outputs multiple additional tables showing the resolved Entities. Each table covers a specific Entity type, such as Individual or Address.
Semantic Model
An Iteration’s Semantic Model shows the relationships between the tables described in the preceding point and your input Data Source tables, within an Iteration. For further information on Semantic Models in Microsoft Fabric, see Power BI Semantic Models in Microsoft Fabric.

Additionally, using the automatic outputs, you can optionally create other outputs within the broader Fabric suite, including the following:
- Other types of Power BI reports. Power BI is a functionality provided by Microsoft Fabric, and not by the Unify workload. Power BI reports are typically based on one Semantic Model and can feature visualizations such as charts, graphs and tables to provide data insights. They can help you explore your data, and the output of Unify, further. For more information on Power BI reports, see Reports in Power BI.
- Notebooks
- Power Query (M script) with Dataflow Gen2

Next steps

For a guide to using the Unify workload, see Unify: Step-by-step guide to using the workload. For an applied example of the step-by-step guide, see Unify: Example workflow.