Entity Quality: The Good, the Bad and the Ugly
This article looks at Entity Quality and how to judge it against business need. It introduces Overlinking and Underlinking as products of Entity Resolution and shows that the two concepts are opposite ends of a continuum. Entity Quality is the bedrock of any Entity Resolution (ER) based solution and has implications up and down the Quantexa stack, so getting it right is an essential part of an effective deployment. The article covers the basics of how to frame this problem and highlights that context is king for each solution.
Note: All the examples in the article are fictional and illustrate how to approach Entity Quality.
Read the full article here (login required): 3. Entity Quality: The Good, the Bad and the Ugly - Quantexa Community
62 Views, 1 like, 0 Comments

Scoring Concepts: Network Scoring
Networks derived in Graph Scripting DSL can be fed into a Scoring pipeline to derive information and insight in the form of network-based scenarios. This article outlines:
- Concepts and approaches available for batch Network Scoring, including extracting information from graphs, testing, and debugging.
- Available tools and methods for performing network analysis with Assess.
The steps in the article apply to Assess only.
Read the article (login required): 2. Scoring Concepts: Network Scoring - Quantexa Community
43 Views, 0 likes, 0 Comments

2.7 Quantexa Upgrade Guide
This guide complements the Docs Site guidance on the 2.7 Quantexa Upgrade. The upgrade consists of three main parts:
- Core Product Changes
- Removal of Quantexa Incubators
- Data Packs Migration
Read the full guide (login required): 2.7 Quantexa Upgrade Guide - Quantexa Community
Quick Upgrade Overview: Most of the Core Product changes are automated migrations and minor adjustments* which can be tested in a local environment. Migration to Delta Lake is going to be the…
31 Views, 1 like, 0 Comments

Elastic Load Optimization Strategies 📖
Loading data into Elasticsearch can lead to performance issues, such as excessively slow loads or loads that fail to complete. The Elastic Load Optimization Strategies guide outlines actionable steps to improve the performance and reliability of Elasticsearch loads.
Key Elastic load optimization strategies:
- Shard Count Analysis: Shards dictate parallelism in Elasticsearch. Adjusting the number of shards for a Document ensures efficient node utilization during loads.
- Spark Settings: Optimize Spark job cores based on Elasticsearch node capacity to enhance indexing performance.
- Identifying the Problematic Index: Pinpoint the specific indices causing issues, such as those related to search or a single Entity, for focused troubleshooting.
- Compounds Table Analysis: Analyze the Compounds/DocumentIndexInput.parquet table to uncover further optimization opportunities when issues persist.
- Compound Partitioning: Address large file sizes by repartitioning the compound table during the creation step.
Read the full article to explore these strategies and ensure faster, more reliable Elasticsearch loads (login required): Elastic Load Optimization Strategies - Quantexa Community
31 Views, 0 likes, 0 Comments

New guide: Using the Entity Quality Underlinking (EQU) tool for the first time 📖
The Entity Quality Underlinking (EQU) tool is a powerful resource for tuning and monitoring Entity Resolution. "Using the Entity Quality Underlinking (EQU) tool for the first time" is a detailed guide to implementing the tool, covering its design, implementation tips, and practical use cases.
Why is the Entity Quality Underlinking Tool useful?
The tool helps you:
- Identify underlinked Entities and analyze root causes through manual examination in the UI.
- Measure the extent of underlinking over time, especially when tracking this metric in Production.
- Adjust Entity Resolution templates to address the underlinking issues identified earlier.
What does the Entity Quality Underlinking Tool do?
- Monitoring and Tuning: The tool supports both tuning iterations and ongoing (BAU) Entity Resolution monitoring.
- Analysis: It observes the similarity of your Entities' Elements and identifies potentially underlinked Entities.
- Output: It generates Summary Statistics for tracking improvement or ongoing performance metrics, and Potentially Underlinked Entities for investigation in the User Interface (UI).
What's in the guide?
- Step-by-step instructions for implementation.
- Design considerations for effective use.
- Tips to ensure smooth implementation and accurate results.
Read the full article for a comprehensive understanding of how to integrate the Entity Quality Underlinking Tool into your Entity Resolution processes (login required): Using the Entity Quality Underlinking (EQU) tool for the first time - Quantexa Community
31 Views, 1 like, 0 Comments

New guide 📖 Scoring Concepts: Write-Once Steps
Discover how to use Write-Once Steps efficiently in the Assess framework for data transformation. The Write-Once Step is a data transformation step that is executable in both Batch and Dynamic contexts. This guide extends the Write-Once Steps product documentation and helps you decide in which situations Write-Once Steps should be used.
Key topics covered:
- The potential cost of using the wrong method
- When to use Write-Once Steps vs. Logical Sources
- Strategies for scoring networks in Batch (SparkScoringContext) and Dynamic (DynamicScoringContext) environments
Gain a deeper understanding of how to avoid duplicating logic across contexts and streamline your data engineering workflows.
Read the full article (login required) to explore practical scenarios and best practices for scoring networks with Write-Once Steps: 5. Scoring Concepts: Write-Once Steps - Quantexa Community
21 Views, 0 likes, 0 Comments
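
The idea of writing transformation logic once and running it in both a Batch and a Dynamic context can be sketched in plain Python. This is only a conceptual illustration, not the Assess API: `Network`, `edge_density`, `run_batch`, and `run_dynamic` are hypothetical names invented for this sketch, and the real Write-Once Step interface is described in the product documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Network:
    """Hypothetical stand-in for a network to be scored; not a Quantexa type."""
    network_id: str
    edge_count: int
    node_count: int


def edge_density(network: Network) -> float:
    """Transformation logic written once: edges per node (0.0 for an empty network)."""
    if network.node_count == 0:
        return 0.0
    return network.edge_count / network.node_count


def run_batch(step: Callable[[Network], float],
              networks: Iterable[Network]) -> Dict[str, float]:
    """Batch-style execution: apply the step to a whole collection of networks."""
    return {n.network_id: step(n) for n in networks}


def run_dynamic(step: Callable[[Network], float], network: Network) -> float:
    """Dynamic-style execution: apply the same step to one network on request."""
    return step(network)


if __name__ == "__main__":
    networks = [Network("a", 6, 3), Network("b", 0, 0)]
    print(run_batch(edge_density, networks))       # all networks at once
    print(run_dynamic(edge_density, networks[0]))  # a single network
```

Because both runners receive the same `edge_density` function, a change to the logic is made in one place and applies to both execution styles, which is the duplication the guide aims to help you avoid.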