Here are some useful tools to help you understand and analyze Entities when you are focusing on improving Entity quality.
Tool designations
These tools are categorized as follows:
- Product: This tool is available in a core Quantexa library. The Quantexa Product Management Team supports the code and it is considered production ready. All Quantexa projects have access to these tools.
- Pre-product or beta testing: This tool will be added to the product, but is currently in a beta testing phase. You will likely need to get some agreement that it's appropriate before using the tool. Documentation is available on the AI documentation site. Speak to your project architect or Quantexa contact for details of accessing model files.
- Delivery community: These tools are written by the delivery community and can be found in the delivery community repository. The Quantexa Team provides support on Community on a best-efforts basis. Documentation exists in the repository as markdown files and can be viewed within GitHub. Access to the repository is not granted by default, so speak to your project architect or Quantexa contact if you do not already have it.
Product
Entity Lab
Entity Lab is available within the Quantexa UI and is the default way to review an Entity.
Entity Explorer
Entity Explorer provides a view of all Entities from a batch build and allows the user to investigate and query for Entities that match certain conditions. It is helpful for both thematic review and review of specific instances.
Note: This is available from 2.4 onwards.
QBR Analyzer
QBR Analyzer provides summaries from the batch Entity build in two sets of reports, a compounds report and an Entity report.
Note: As of version 2.4 this is now available in two scripts, Compound Report and Entity Report.
Entity Report
The Entity Report provides summary statistics on the sizes and distribution of Entities generated by Batch Resolver.
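To make the idea of "sizes and distribution" concrete, the following is a minimal sketch of the kind of summary such a report contains. It is not the Entity Report script itself: the `record_to_entity` mapping, the function name, and the output keys are all hypothetical, and a real batch build would be processed at scale rather than in memory.

```python
from collections import Counter
from statistics import mean, median


def entity_size_summary(record_to_entity):
    """Summarize entity sizes from a hypothetical record_id -> entity_id mapping."""
    # Count how many records were resolved into each entity.
    sizes = Counter(record_to_entity.values())
    size_values = sorted(sizes.values())
    return {
        "entity_count": len(size_values),
        "mean_size": mean(size_values),
        "median_size": median(size_values),
        "max_size": size_values[-1],
        # Map each entity size to the number of entities of that size.
        "size_histogram": dict(Counter(size_values)),
    }


# Illustrative data only: six records resolved into three entities.
summary = entity_size_summary(
    {"r1": "e1", "r2": "e1", "r3": "e2", "r4": "e3", "r5": "e3", "r6": "e3"}
)
print(summary)
```

A long tail in the size histogram (a few very large entities) is usually the first thing to look for when reviewing this kind of summary.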
Compound Report
The Compound Report provides summary statistics on the excluded Compounds. The report provides information to enable you to assess the quality of data and the effectiveness of Compounds used for Entity Resolution.
Statistical Profile Testing Framework
SPTF allows for the configuration of statistics which can be calculated at a determined frequency from batch Entity builds. This tool should be used to monitor the population level statistics of Entities.
Entities and Networks Regression Testing Framework
The regression testing framework is used to make sure new deployments don't cause ER issues.
Note: This is deprecated as of 2.4.1.
Bad Entity Analyzer
The Bad Entity Analyzer is a tool to find Entities and compounds that pull together too many instances of a trusted compound. The reports produced can be used to identify issues of over-linking.
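The underlying check can be sketched as follows, assuming each record carries an `entity_id` and a value for some trusted field. This is an illustration of the idea only, not the Bad Entity Analyzer's implementation; the field name `passport_number`, the function name, and the threshold are hypothetical.

```python
from collections import defaultdict


def flag_overlinked(records, trusted_field, max_distinct):
    """Return entities holding more distinct trusted values than max_distinct."""
    distinct = defaultdict(set)
    for rec in records:
        value = rec.get(trusted_field)
        if value is not None:  # ignore records that lack the trusted field
            distinct[rec["entity_id"]].add(value)
    return {
        entity_id: len(values)
        for entity_id, values in distinct.items()
        if len(values) > max_distinct
    }


# Illustrative data: entity e1 pulls together three different passport numbers.
records = [
    {"entity_id": "e1", "passport_number": "P100"},
    {"entity_id": "e1", "passport_number": "P200"},
    {"entity_id": "e1", "passport_number": "P300"},
    {"entity_id": "e2", "passport_number": "P400"},
    {"entity_id": "e2", "passport_number": None},
]
flagged = flag_overlinked(records, "passport_number", max_distinct=1)
print(flagged)
```

An individual would normally be expected to hold very few distinct values of a trusted identifier, so an entity that exceeds the threshold is a candidate for over-linking review.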
Pre-product
Entity Quality - Overlinking
The Entity Quality Overlinking tool scores the structure of an Entity to predict whether the Entity is overlinked. Scores range from 0 to 100, and a higher score indicates better quality. The tool can be used to quickly find severe issues with overlinking in your Batch Resolver runs. It can also be used to test for regressions between runs.
Note: This will be in the product from version 2.5 onwards.
Entity Quality - Underlinking
The Entity Quality Underlinking tool searches for instances of underlinking in your data by expanding the network from a starting point and finding similar-looking Entities that may be the same based on how similar configured attributes are between them.
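The attribute comparison step can be illustrated with a small sketch. It assumes Entities are represented as dictionaries of attributes and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the actual tool's scoring method, its network expansion step, and its configuration format are not described here, so every name and the threshold below are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a, b, attributes):
    """Average string similarity across the configured attributes."""
    scores = [
        SequenceMatcher(None, str(a[attr]).lower(), str(b[attr]).lower()).ratio()
        for attr in attributes
    ]
    return sum(scores) / len(scores)


def underlinking_candidates(entities, attributes, threshold):
    """Return pairs of entity IDs whose attributes look similar enough to review."""
    candidates = []
    for (id_a, a), (id_b, b) in combinations(sorted(entities.items()), 2):
        if attribute_similarity(a, b, attributes) >= threshold:
            candidates.append((id_a, id_b))
    return candidates


# Illustrative data: e1 and e2 differ only in how the company suffix is written.
entities = {
    "e1": {"name": "Acme Trading Ltd", "city": "London"},
    "e2": {"name": "Acme Trading Limited", "city": "London"},
    "e3": {"name": "Zenith Foods", "city": "Paris"},
}
pairs = underlinking_candidates(entities, ["name", "city"], threshold=0.85)
print(pairs)
```

This sketch compares every pair of entities, which is only practical for small samples. Restricting comparisons to entities reached by expanding the network from a starting point, as the tool does, keeps the number of comparisons manageable.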
ORCA
The Overlinking Root Cause Analyzer (ORCA) tool identifies bad compound types, values, and default values which are contributing to overlinking in your Entity Resolution. Changes can then be made to address these issues and improve ER performance. Instructions and documentation can be provided on request.
Delivery community
Entity Build Diff
Entity Build Diff summarizes the differences between two batch Entity builds. It provides a report on the compounds and exclusions that result in changes as well as an option to visualize the changes that happen to specific Entities.
Note: This is only available for Quantexa 2.2. Support for 2.3 and 2.4 is planned soon.
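As a rough illustration of what comparing two builds involves, the sketch below diffs two hypothetical `record_id -> entity_id` mappings. It compares entities by their record membership rather than by ID, on the assumption that entity IDs are not stable between builds. It does not reproduce Entity Build Diff's compound and exclusion reporting, and all names are hypothetical.

```python
from collections import defaultdict


def entity_groups(record_to_entity):
    """Convert a record_id -> entity_id mapping into a set of record groupings."""
    groups = defaultdict(set)
    for record_id, entity_id in record_to_entity.items():
        groups[entity_id].add(record_id)
    return {frozenset(members) for members in groups.values()}


def diff_builds(old_build, new_build):
    """Summarize which record groupings disappeared or appeared between builds."""
    old_groups = entity_groups(old_build)
    new_groups = entity_groups(new_build)
    return {
        "unchanged": len(old_groups & new_groups),
        "removed_groupings": sorted(sorted(g) for g in old_groups - new_groups),
        "added_groupings": sorted(sorted(g) for g in new_groups - old_groups),
    }


# Illustrative data: record r3 is merged into the entity holding r1 and r2.
old_build = {"r1": "A", "r2": "A", "r3": "B", "r4": "C"}
new_build = {"r1": "X", "r2": "X", "r3": "X", "r4": "Y"}
report = diff_builds(old_build, new_build)
print(report)
```

Groupings that appear only in the old build and are subsets of a new grouping indicate a merge; the reverse pattern indicates a split.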
Big Entity Visualizer
Big Entity Visualizer takes large Entities in your batch build and visualizes them. It achieves this by collapsing records together that have the same value for a configured element (e.g. businessDisplay) and then visualizes which compounds cause the different names to link to each other.
Note: This is only available for Quantexa 2.2. Support for 2.3 and 2.4 is planned soon.
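The collapsing step that Big Entity Visualizer performs can be sketched as below. The record and link structures, the compound names, and the function name are hypothetical and chosen only for illustration; the actual tool's input format and rendering are documented in the delivery community repository.

```python
from collections import defaultdict


def collapse_entity(records, links, element):
    """Collapse records sharing the same element value and keep the linking compounds."""
    value_of = {rec["record_id"]: rec[element] for rec in records}
    edges = defaultdict(set)
    for rec_a, rec_b, compound in links:
        a, b = value_of[rec_a], value_of[rec_b]
        if a != b:  # links inside a collapsed node are not shown
            edges[tuple(sorted((a, b)))].add(compound)
    return {pair: sorted(compounds) for pair, compounds in edges.items()}


# Illustrative data: three records, two of which share a display name.
records = [
    {"record_id": "r1", "businessDisplay": "Acme Ltd"},
    {"record_id": "r2", "businessDisplay": "Acme Ltd"},
    {"record_id": "r3", "businessDisplay": "Acme Holdings"},
]
links = [
    ("r1", "r2", "name_address"),  # hypothetical compound names
    ("r2", "r3", "phone"),
    ("r1", "r3", "address_postcode"),
]
collapsed = collapse_entity(records, links, "businessDisplay")
print(collapsed)
```

The result maps each pair of distinct names to the compounds that connect them, which is the information a graph rendering of the collapsed Entity would draw as edges.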