This article describes the Entity Resolution tools available for use on projects. These tools are most relevant to Implementation Engineers and to Senior Users who want to understand the detail of Entity Resolution within their deployments. All of the tools are intended to help you understand and analyze Entities, with a focus on improving Entity quality.
Tool Designations
A detailed description of each designation can be found on the documentation website. As a reminder:
- General Availability: This tool is available in a core Quantexa library. Quantexa supports the feature, which has comprehensive documentation and has been fully tested, so it is considered production ready. All Quantexa projects have access to these tools.
- Early Access: This tool will be added to the product, but is currently in a feature validation or testing phase. Projects should bear in mind that tools can undergo fundamental changes in response to Early Access feedback, so you will likely need to get agreement that the tool is appropriate before using it. Documentation is available on the AI documentation site. Speak to your project architect or Quantexa contact for details of accessing model files.
- Experimental Access: This tool does not receive formal support from Quantexa but has been released to support innovation or to validate approaches to solving a problem. These tools may never reach formal support and may not be compatible with every version of the Quantexa Platform. Customers should consider whether experimental tools are right for them, as getting the most value from them may require a level of expertise within the project.
- Delivery community: These tools are written by the delivery community and can be found in the delivery community repository. Support is provided on a best-efforts basis by the Quantexa Team on Community. Documentation exists in the repository as markdown files and can be viewed within GitHub. Access to the repository is not granted by default, so speak to your project architect or Quantexa contact if you do not already have it.
Product
Entity Lab
Entity Lab is available within the Quantexa UI and is the default way to review an Entity.
Entity Explorer
Entity Explorer provides a view of all Entities from a batch build and allows the user to investigate and query for Entities that match certain conditions. It is helpful for both thematic review and review of specific instances.
Note: This is available from 2.4 onwards.
Batch Resolver (QBR) Analyzer
The QBR Analyzer (also known as Batch Resolver Analyzer) summarizes the batch Entity build in two reports: a Compound Report and an Entity Report.
Note: As of version 2.4, this is available as two scripts, Compound Report and Entity Report.
Entity Report
The Entity Report provides summary statistics on the sizes and distribution of Entities generated by Batch Resolver.
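To illustrate the kind of statistics involved, the following is a minimal sketch in Python. It is not the Entity Report script itself. The input shape (a list of record ID and Entity ID pairs) and the function name `entity_size_summary` are assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical input: one (record_id, entity_id) pair per resolved record.
record_to_entity = [
    ("r1", "e1"), ("r2", "e1"), ("r3", "e1"),
    ("r4", "e2"),
    ("r5", "e3"), ("r6", "e3"),
]


def entity_size_summary(pairs):
    """Summarize how many records each Entity contains."""
    # Count the number of records attached to each Entity ID.
    sizes = Counter(entity_id for _, entity_id in pairs)
    values = list(sizes.values())
    return {
        "entity_count": len(values),
        "max_size": max(values),
        "mean_size": mean(values),
        "median_size": median(values),
        # Maps an Entity size to the number of Entities of that size.
        "size_distribution": dict(Counter(values)),
    }


summary = entity_size_summary(record_to_entity)
print(summary)
```

A long tail in `size_distribution` (a small number of very large Entities) is typically the first thing to look at when reviewing this kind of summary.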
Compound Report
The Compound Report provides summary statistics on the excluded Compounds. The report provides information to enable you to assess the quality of data and the effectiveness of Compounds used for Entity Resolution.
Statistical Profile Testing Framework
The Statistical Profile Testing Framework (SPTF) lets you configure statistics that are calculated at a set frequency from batch Entity builds. Use this tool to monitor population-level statistics of Entities.
Entities and Networks Regression Testing Framework
The regression testing framework is used to check that new deployments do not introduce Entity Resolution (ER) issues.
Note: This is deprecated as of 2.4.1.
Bad Entity Analyzer
The Bad Entity Analyzer finds Entities and Compounds that pull together too many instances of a trusted Compound. The reports it produces can be used to identify overlinking issues.
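The underlying idea can be sketched as follows. This is an illustration only, not the Bad Entity Analyzer's implementation. It assumes a hypothetical input of Entity ID and trusted Compound value pairs, and a hypothetical `max_distinct` threshold above which an Entity is treated as suspicious.

```python
from collections import defaultdict

# Hypothetical records: (entity_id, trusted_compound_value).
# A trusted Compound here is something like a national ID, where one
# real-world Entity is expected to have very few distinct values.
records = [
    ("e1", "ID-001"), ("e1", "ID-001"),
    ("e2", "ID-002"), ("e2", "ID-003"), ("e2", "ID-004"),
    ("e3", "ID-005"), ("e3", None),
]


def flag_overlinked(rows, max_distinct=2):
    """Return Entities holding more than max_distinct trusted values."""
    distinct = defaultdict(set)
    for entity_id, value in rows:
        # Missing values say nothing about overlinking, so skip them.
        if value is not None:
            distinct[entity_id].add(value)
    return {
        entity_id: len(values)
        for entity_id, values in distinct.items()
        if len(values) > max_distinct
    }


flagged = flag_overlinked(records)
print(flagged)
```

In this sample data, only `e2` is flagged because it pulls together three distinct values of the trusted Compound.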
Entity Quality - Overlinking
The Entity Quality Overlinking tool scores the structure of an Entity to predict whether the Entity is overlinked. Scores range from 0 to 100, and a higher score indicates better quality. The tool can be used to quickly find severe overlinking issues in your Batch Resolver runs. It can also be used to test for regressions between runs.
Note: This is available in product from version 2.5 onwards.
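One way to use such scores to test for regressions between runs is sketched below. The score lists, the `threshold` of 50, and the `tolerance` of 0.1 are all assumptions chosen for illustration. The tool does not prescribe them, and the scoring itself is not reproduced here.

```python
def low_score_fraction(scores, threshold=50):
    """Fraction of Entities whose quality score falls below threshold."""
    return sum(1 for score in scores if score < threshold) / len(scores)


# Hypothetical per-Entity quality scores (0 to 100, higher is better)
# from a baseline run and from a candidate run.
baseline = [95, 88, 72, 40, 91]
candidate = [93, 45, 30, 38, 90]

baseline_fraction = low_score_fraction(baseline)
candidate_fraction = low_score_fraction(candidate)

# Treat the candidate as a regression if the share of low-scoring
# Entities grows by more than an agreed tolerance (an assumption here).
tolerance = 0.1
regressed = (candidate_fraction - baseline_fraction) > tolerance

print(baseline_fraction, candidate_fraction, regressed)
```

Comparing the share of low-scoring Entities, rather than individual Entity scores, avoids relying on Entity IDs staying stable between runs.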
Early Access
Experimental
Entity Quality - Underlinking
The Entity Quality Underlinking tool searches for instances of underlinking in your data. It expands the network from a starting point and finds similar-looking Entities that may be the same, based on how similar their configured attributes are.
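The attribute-similarity step can be sketched as follows. This is a simplification and not the tool's implementation: it compares every pair of Entities instead of expanding a network from a starting point, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The attribute names, sample data, and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical Entities with the attributes configured for comparison.
entities = {
    "e1": {"name": "Acme Trading Ltd", "city": "London"},
    "e2": {"name": "Acme Trading Limited", "city": "London"},
    "e3": {"name": "Borealis Shipping", "city": "Oslo"},
}


def similarity(left, right, attributes):
    """Average string similarity across the configured attributes."""
    ratios = [
        SequenceMatcher(None, left[attr].lower(), right[attr].lower()).ratio()
        for attr in attributes
    ]
    return sum(ratios) / len(ratios)


def underlinking_candidates(ents, attributes, threshold=0.85):
    """Return pairs of Entities that look similar enough to review."""
    return [
        (a, b)
        for a, b in combinations(sorted(ents), 2)
        if similarity(ents[a], ents[b], attributes) >= threshold
    ]


candidates = underlinking_candidates(entities, ["name", "city"])
print(candidates)
```

Pairs returned by a check like this are candidates for review only. Similar attributes suggest, but do not prove, that two Entities are the same.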
ORCA
The Overlinking Root Cause Analyzer (ORCA) tool identifies the bad Compound types, values, and default values that contribute to overlinking in your Entity Resolution. Changes can then be made to address these issues and improve ER performance. Instructions and documentation can be provided on request.
ERO
The Entity Resolution Optimizer (ERO) is a tool designed to aid in the rapid prototyping of a Resolver configuration consumed by Batch Resolver.
The tool enables you to test the effect of different Compounds and Exclusions quickly and efficiently by simulating Batch Resolver. You can then analyze these different simulations to discover what changes may improve your Entities.
When you are satisfied with the Entities produced in the simulation, you can use the refined Resolver configuration in Batch Resolver as you typically do for Entity Resolution.
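The effect of changing Compounds and Exclusions can be illustrated with a toy simulation. This is a conceptual sketch only and does not reflect how ERO or Batch Resolver is implemented. It assumes that records link when they share the same values for all fields in a Compound, that an Exclusion is a value tuple to ignore, and that linked records are grouped with a simple union-find. The function name `resolve` and all field names and data are hypothetical.

```python
def resolve(records, compounds, exclusions=frozenset()):
    """Group records that share a Compound value, ignoring Exclusions."""
    parent = {record_id: record_id for record_id in records}

    def find(node):
        # Follow parent pointers to the root, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    first_seen = {}
    for record_id, attrs in records.items():
        for compound in compounds:
            key = tuple(attrs.get(field) for field in compound)
            # Skip incomplete Compounds and excluded values.
            if None in key or key in exclusions:
                continue
            owner = first_seen.setdefault((compound, key), record_id)
            parent[find(record_id)] = find(owner)

    groups = {}
    for record_id in records:
        groups.setdefault(find(record_id), set()).add(record_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records; "0000000000" is a placeholder phone number.
records = {
    "r1": {"name": "Ann Lee", "dob": "1990-01-01", "phone": "0000000000"},
    "r2": {"name": "Ann Lee", "dob": "1990-01-01", "phone": "555-0101"},
    "r3": {"name": "Bob Roy", "dob": "1985-05-05", "phone": "0000000000"},
}
compounds = [("name", "dob"), ("phone",)]

without_exclusion = resolve(records, compounds)
with_exclusion = resolve(records, compounds, exclusions={("0000000000",)})
print(without_exclusion)
print(with_exclusion)
```

Without the Exclusion, the placeholder phone number links all three records into one Entity. With it, `r3` is separated from `r1` and `r2`. Comparing simulations in this way is the kind of analysis the paragraph above describes.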
Delivery Community
Entity Build Diff
Entity Build Diff summarizes the differences between two batch Entity builds. It reports on the Compounds and Exclusions that result in changes and offers an option to visualize the changes to specific Entities.
Note: This is available for Quantexa 2.2, 2.3, and 2.4.
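The comparison that Entity Build Diff performs can be pictured with the following minimal sketch. It is not the tool's implementation, and it does not attribute changes to Compounds or Exclusions. It assumes each build is available as a hypothetical mapping from record ID to Entity ID, and it compares Entities by their member records so that the result does not depend on Entity IDs matching between builds.

```python
from collections import defaultdict


def groups(build):
    """Represent each Entity as the frozen set of its record IDs."""
    by_entity = defaultdict(set)
    for record_id, entity_id in build.items():
        by_entity[entity_id].add(record_id)
    return {frozenset(members) for members in by_entity.values()}


def diff_builds(old, new):
    """Compare two builds by Entity membership."""
    old_groups, new_groups = groups(old), groups(new)
    return {
        "unchanged": old_groups & new_groups,
        "removed": old_groups - new_groups,
        "added": new_groups - old_groups,
    }


# Hypothetical builds: record_id -> entity_id.
old_build = {"r1": "a", "r2": "a", "r3": "b", "r4": "c"}
new_build = {"r1": "x", "r2": "x", "r3": "x", "r4": "y"}

result = diff_builds(old_build, new_build)
print(result)
```

In this sample, the Entity containing only `r4` is unchanged even though its ID differs, while the two old Entities covering `r1`, `r2`, and `r3` have merged into one.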
Big Entity Visualizer
Big Entity Visualizer takes large Entities in your batch build and visualizes them. It does this by collapsing together records that share the same value for a configured element (e.g. businessDisplay) and then showing which Compounds cause the different names to link to each other.
Note: This is available for Quantexa 2.2, 2.3, and 2.4.
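The collapsing step used by Big Entity Visualizer can be sketched as follows. This is an illustration, not the tool's code. It assumes a hypothetical mapping from record ID to the value of the configured element, and a hypothetical list of record-to-record links labelled with the Compound that caused each link.

```python
from collections import defaultdict

# Hypothetical records in one large Entity: record_id -> businessDisplay.
records = {
    "r1": "Acme Ltd",
    "r2": "Acme Ltd",
    "r3": "ACME Holdings",
    "r4": "Acme Group",
}

# Hypothetical links: (record_id, record_id, compound_name).
links = [
    ("r1", "r2", "name+address"),
    ("r2", "r3", "phone"),
    ("r1", "r3", "email"),
    ("r3", "r4", "address"),
]


def collapse(display_by_record, record_links):
    """Collapse records by display value and keep the linking Compounds."""
    edges = defaultdict(set)
    for left, right, compound in record_links:
        a, b = display_by_record[left], display_by_record[right]
        # Links between records with the same value vanish once collapsed.
        if a != b:
            edges[tuple(sorted((a, b)))].add(compound)
    return dict(edges)


collapsed = collapse(records, links)
for (left, right), compound_names in collapsed.items():
    print(left, "<->", right, sorted(compound_names))
```

Four records and four links reduce to three nodes and two edges, each edge carrying the Compounds that connect the two names.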