Elastic Load Optimization Strategies
Loading data into Elasticsearch can sometimes lead to performance issues, such as slow data loads or loads that fail to complete. The Elastic Load Optimization Strategies guide outlines actionable steps to help improve the performance and reliability of Elasticsearch loads.

Key Elastic load optimization strategies:
- Shard Count Analysis: Shards dictate parallelism in Elasticsearch. Adjusting the number of shards for a Document ensures efficient node utilization during loads.
- Spark Settings: Optimize Spark job cores based on Elasticsearch node capacity to enhance indexing performance.
- Identifying the Problematic Index: Pinpoint the specific indices causing issues, such as those related to search or a single Entity, for focused troubleshooting.
- Compounds Table Analysis: Analyze the Compounds/DocumentIndexInput.parquet table to uncover further optimization opportunities when issues persist.
- Compound Partitioning: Address large file sizes by repartitioning the compound table during the creation step.

Read the full article to explore these strategies and ensure faster, more reliable Elasticsearch loads (login required): Elastic Load Optimization Strategies - Quantexa Community
Projects may encounter challenges with performance when loading data to Elasticsearch. This may present in the form of excessively slow loads or loads that fail to complete. The following outlines a series of steps projects should consider when trying to improve performance and reliability in such cases: Shard Count…

Common Elastic Loader errors
For common Elastic Loader errors, such as the Load Elastic job failing, read Common Elastic Loader errors (login required) on our Docs site. You'll find some common Elastic Loader failure cases and how to address them. The errors apply to both the Resolver-Search Elastic Loader and the Generic Elastic Loader. These include:
- Connection issues
- Performance issues
- Elasticsearch is overloaded
- Spark is running out of memory
- Count validation failure
- Data could not be indexed

Did you know you can also click the elastic-loader tag and then filter to find Questions with an answer that is Accepted by the Community?

Why Does Entity Quality Matter? & Best of the Community from March
March Top Picks
- Why Entity Quality Matters (login required)
- Enter our latest competition: The Knowledge Exchange
- New Education Programs: Scala & Spark Bootcamp and Quantexa Data Engineer Velocity Program
- Quantexa & Xander Talent - New Education Partnership
- Elevating Data Management: Unveiling the Pillars of a Trusted Data Foundation with Quantexa

Latest from the Community Library
- A day in the life of a... Senior Learning Designer
- A day in the life of... an Academy Trainee
- How To Test Your Upgrades (login required)
- Updates to Quantexa Supported Versions (login required)

Upcoming events
- 3rd May: Community Connect. Join for a demo of top Community features.

Best of Q&A
- Unable to load Batch Scores to Elastic (login required)
- How to fix image style for investigation icon in the Qx UI? (login required)
- Error creating bean with name 'springSecurityFilterChain' (login required)

New & Popular Ideas
- Usability increase using 2 screens for investigations (login required)
- Changing the default settings for Graphic Filters in Data Viewer (login required)
- Make updates to metadata.parquet optional (login required)

In case you missed it
- Welcome to Quantexa 2.6 | 2.6.0 Release Announcement
- Badge of the Month: The Name Dropper Badge

Community quick links
- Submit and vote for Ideas in our Ideas Portal
- Join one of our Specialist User Groups: FinCrime, Insurance, Data Management & KYC
- Browse blogs, articles and guides in our Community Library

Welcome to Quantexa 2.3 | 2.3.0 Release Announcement
In Quantexa 2.3, you will find:
- Entity Store, a new component in our set of Entity Resolution capabilities
- Security Model V2, a new way to manage Role-Based Access Control within The Quantexa Platform
- Enhancements to Assess, including Path Ranking, a new way to define paths
- Support for Elasticsearch 8 and Spark 3.2

Entity Store
In 2.3.0, we are introducing Entity Store in beta. Before 2.3.0, all interactions with the Quantexa UI and the Quantexa Mid-tier APIs depended on resolving Entities on-the-fly (dynamically). The Entity Store allows a persisted, or materialized, view of Entities to be stored in the system. In this first release, the core Entity Store loads a full set of pre-resolved Entities, produced by Batch Resolver, into Elasticsearch. The Entity Store can then be used by Explorer to query the underlying Entities, as a cache for Resolver to improve performance, and to support the new Entity REST API.

Security Model V2
We have introduced a new framework for authentication and authorization across the platform. This gives the platform greater control over the data and features that a user can access and enables easier integration with Identity Providers (IdPs). End-users no longer need to interact directly with low-level Quantexa Roles. Instead, they can share their work with Groups that reflect their organizational structure, for example, UK Fraud Investigators. Using a new User Management Screen, you can now easily provision users and Groups (collections of users that can be assigned Roles and Dynamic Privileges) to The Quantexa Platform from Identity Providers.

Path Ranking
When writing Network Scores, you can now use Path Ranking to specify the rules that define the relative importance of paths (chains of Documents and Entities that connect two Nodes). This makes the Network easier to analyze by signaling the most significant areas to explore further.
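The release notes don't show Path Ranking's rule syntax. The following is therefore only a hypothetical Python sketch of the general idea:
1. Enumerate the cycle-free paths that connect two Nodes through Documents and Entities.
2. Order those paths with a user-supplied rule.

All of the following are invented for illustration: the node names, the link types (`shared_address`, `shared_account`), the weights, and the "average link weight" rule. This is not Quantexa's Path Ranking API or configuration.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: nodes are Document/Entity ids, links carry a type label.
Graph = Dict[str, List[Tuple[str, str]]]
Path = Tuple[List[str], List[str]]  # (nodes visited, link types traversed)


def build_graph(links: List[Tuple[str, str, str]]) -> Graph:
    """Build an undirected adjacency list from (node_a, node_b, link_type) triples."""
    graph: Graph = {}
    for a, b, link_type in links:
        graph.setdefault(a, []).append((b, link_type))
        graph.setdefault(b, []).append((a, link_type))
    return graph


def simple_paths(graph: Graph, start: str, end: str, max_hops: int) -> List[Path]:
    """Depth-first search for every cycle-free path from start to end using at most max_hops links."""
    found: List[Path] = []

    def walk(node: str, nodes: List[str], link_types: List[str]) -> None:
        if node == end:
            found.append((list(nodes), list(link_types)))
            return
        if len(link_types) == max_hops:
            return
        for neighbour, link_type in graph.get(node, []):
            if neighbour not in nodes:  # never revisit a node, so every path stays simple
                nodes.append(neighbour)
                link_types.append(link_type)
                walk(neighbour, nodes, link_types)
                nodes.pop()
                link_types.pop()

    walk(start, [start], [])
    return found


def rank_paths(paths: List[Path], link_weights: Dict[str, float]) -> List[Path]:
    """Hypothetical rule: rank by average link weight, highest first; unknown link types score 0."""
    def score(path: Path) -> float:
        _, link_types = path
        return sum(link_weights.get(t, 0.0) for t in link_types) / max(len(link_types), 1)

    return sorted(paths, key=score, reverse=True)


# Usage: two Individuals connected via a shared address Document and a shared account Document.
graph = build_graph([
    ("Person A", "Address Doc 1", "shared_address"),
    ("Address Doc 1", "Person B", "shared_address"),
    ("Person A", "Account Doc 2", "shared_account"),
    ("Account Doc 2", "Person B", "shared_account"),
])
ranked = rank_paths(simple_paths(graph, "Person A", "Person B", max_hops=4),
                    {"shared_account": 0.9, "shared_address": 0.4})
for nodes, _ in ranked:
    print(" -> ".join(nodes))
# prints:
# Person A -> Account Doc 2 -> Person B
# Person A -> Address Doc 1 -> Person B
```

Enumerating every simple path grows quickly on dense Networks, which is why the sketch caps the search with `max_hops`. The ranking rule here only looks at link types, but any function of a path could be swapped in as the sort key.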
Dual Context Sources
Dual Context Sources enable scoring logic to be designed that works in both batch and dynamic contexts, and provide a config-based tool that generates Source steps for both pipelines. This simplifies the deployment of a build with a dual architecture.

Support for Elasticsearch and Spark
Quantexa now supports Elasticsearch 8 across the platform, except for Offline Indexers. Spark 3.2 is also now supported across the platform.
Warning: Spark 3.0 is deprecated, and support will be removed completely in the next release of the platform.

Spring Boot upgrade
Spring Boot has been upgraded to version 2.6.8. This results in faster startup times and resolves a number of security vulnerabilities found in older versions of Spring Boot.

Documentation Site Glossary
We have completely revamped and reworked our Documentation site glossary, adding almost 100 new terms. Explore the Glossary to find out more.

Other highlights
- For more control and flexibility when running ETL, Quantexa has added the ability to generate Resolver Search Loader and Compound Creator scripts at the point where users define a Root Model in Data Fusion.
- List inputs have been introduced for Entity Attribute functions to allow more flexibility and customizability. This gives users more ways to define Entity Attributes in Data Fusion and caters to a wider range of data models.
- You can now collapse the Query Builder in Explorer. This is the next step in improving the user experience of Explorer, allowing users to focus on the results of the query.

You will find the full set of Release Notes on the Quantexa Documentation site. If you are unable to access them, you will need to ask a user with access to submit a Documentation site access request through the Quantexa Support Portal.

Welcome to Parsers 4.1 | Release Announcement
We are excited to announce the release of version 4.1 of Quantexa's Standard Parsers. This release focuses on improving the integration with Fusion UI (look out for exciting 2.6 release announcements coming soon) and improving the file structure of configuration files. This release includes the following highlights:
- Consistency of configuration files: general improvements to Parser and lexicon configuration files have been introduced to make sure the way you use these files is consistent across all available Parsers. This will make your configuration easier to understand and simplify the process of making future modifications.
- To minimise redundant data storage in Elasticsearch, you can now exclude business standardisation terms that aren't used in areas such as exclusions for Entity Resolution.
- Similarly, you can now choose whether the Individual Parser parses multiple names or just a single name, to reduce your Elasticsearch footprint.
- For the Telephone Parser, you can now specify conditional parsing rules to increase output accuracy. For example, if you have more specific parsing rules for UK telephone numbers, you can now use the country code to parse these numbers differently from the default telephone parsing behaviour.

Note: There are no changes in this release that affect the output of parsing.

October Community Digest
Welcome to this spooktacular edition of our Community Digest, where we gather the most thrilling posts from our members just in time for Halloween.

Public content:
- Tips & Tricks for Managing Large and Complex Networks - Discussion
- Join our Monthly Community Connect on November 10th - Event
- Quantexa's Chris Bagnall wins ACAMS Today Article of the Year 2023 - Announcement
- Understanding Human Trafficking - Blog
- Elasticsearch and Why We Use It - Discussion
- Automatic Data Cleaning Through Data Normalisation and Statistics - Blog
- Unlocking the Power of pKYC: Smarter KYC Processes EMEA and APAC webinars
- Community Competition - Refer 5 Colleagues to Win!
- Elasticsearch Considerations for Quantexa - Blog
- Perspective: Addressing SEC-Identified AML Program Deficiencies at Broker-Dealers - KYC Group Discussion
- Using Data Fusion for the first time - Blog

Members content (log in required):
- Make dropdowns in Explorer charts searchable and sort options alphabetically in multiple languages - Idea
- Load Elastic not connecting - Academy Q&A
- Auditing search viewer click event for audit monitoring - Idea
- When aggregating entity scores, only one appears in the customer scorecard - Q&A
- Running out of screen real estate in the Investigation view - Idea

Community quick links:
- Submit and vote for Ideas in our Ideas Portal
- Join one of our Specialist User Groups: FinCrime, Insurance, Data Management & KYC
- Browse blogs, articles and guides in our Community Library

New to the Community? Sign up for a Community Tour.

Elastic Collection
Below you can find links to all the articles and discussions relating to Elastic in the Community. Elasticsearch, or Elastic, is a near real-time, distributed storage, search, and analytics engine. Since the beginning, Quantexa has used Elasticsearch to store and query the data we ingest into the Quantexa platform, as we knew Search was going to be such a central feature of the platform.

Articles available to everyone:
- Elasticsearch Considerations For Quantexa
- Elasticsearch and Why We Use It

Log in required:
- Useful Elasticsearch API Calls

Bookmark this thread and be notified whenever we publish a new article on Elastic. To bookmark a thread, click the chevron icon next to the title of the post.

Useful Elasticsearch API Calls
In this article, a Solutions Architect outlines how to manage Elasticsearch (Elastic) clusters and indices, which is crucial for maintaining a Quantexa implementation. The article provides an overview of common Elasticsearch API calls that can be used for cluster and index management.

Key topics:
- Cluster Management:
  - Retrieve cluster statistics using the _cluster/stats endpoint.
- Index Manipulation:
  - Create an index with specific settings.
  - Delete an index.
  - Open and close an index.
  - Enable read/write on an index.
  - Reindex data from one index to another.
  - Force a merge operation for better read performance.
  - Move shards within the cluster.
  - Alter the number of replicas for redundancy.
  - Disable and re-enable shard allocation.
- Index Interrogation:
  - List aliases.
  - List indices, including their health and status.
  - List shards and their details.
  - List segments within an index.
  - Retrieve index mapping.
  - List unassigned shards.
  - Check the progress of a force merge operation.
- Index Entry Manipulation:
  - Add an entry to an index.
- Nodes:
  - List nodes in the cluster, including their roles and resource usage.
  - List nodes with detailed queries.
- Search:
  - Perform a search query within an index using cURL requests.
- Tasks:
  - List active tasks within the cluster.
  - List detailed task information.

This serves as a practical guide for managing Elasticsearch clusters and indices in the context of a Quantexa implementation, providing essential commands to effectively maintain and optimize the Elasticsearch environment.

Useful Elasticsearch API Calls - Quantexa Community
Managing your Elasticsearch clusters and indices is an important part of building and maintaining your Quantexa implementation. In this blog we have outlined and explained some of the more common Elasticsearch API calls that you may need. The example cURL requests below assume you are on an Elastic node. If that's not the…

Elasticsearch and Why We Use It
This article gives an overview of what Elasticsearch is, and how and why it's used at Quantexa. If you're a data scientist, business analyst, or an end user, this piece will give you some useful context for what Elasticsearch is all about.

What is Elasticsearch?
Elasticsearch, or Elastic, is a near real-time, distributed storage, search, and analytics engine. Since the beginning, Quantexa has used Elasticsearch to store and query the data we ingest into the Quantexa platform, as we knew Search was going to be such a central feature of the platform. Elasticsearch powers both our Search and Entity Resolution capabilities.

How does it work?
Data is passed to Elasticsearch, to be stored, following each step of the Extract, Transform, and Load (ETL) process. This process, as shown in the diagram, is generally handled by Data Fusion, which takes in the raw data and prepares it to be used for Entity Resolution (ER). Each of the four indexes sources data from a different point in the process, as follows:
- Document Indexes: Once the data has been Cleansed, Document indexes are created as an output. These indexes are queried by Document Search and can also be used for dynamic Scoring of Documents.
- Resolver Indexes: The linking data extracted during Cleansing can be used to resolve Entities. This data is stored in "Resolver" indexes and is used to perform dynamic Entity Resolution for UI features such as Investigations, as well as batch processes such as Graph Scripting.
- Entity Store Indexes: Once Entities are created, the information about them is stored together to increase the speed and reliability of Entity searches and to allow more detailed inspection of them in the UI.
- Other Indexes: When data is discovered as part of the Cleansing process which can't be used for Entity Resolution, but could still add value, it is transferred to Other indexes.
Examples of Other index data might include individual transactions, which are often too numerous to visualize as Documents in a Network diagram. Other indexes can be queried and displayed in the UI within a table, or with other visualizations such as Sankey diagrams, in our Explorer feature.

Any "other data" in the platform that is not indexed after the final step relates to Quantexa-specific data about how the platform itself operates, such as a list of active Investigations, and is stored elsewhere.

You can read about the indexes in more detail in our article about Elasticsearch considerations for Quantexa. You'll find more detail about the configuration of Elasticsearch indexes on the Resolver Elasticsearch configuration page of our Documentation site.

How is a search performed?
Between what the user does and Elasticsearch sits a layer known as the Search service, which passes queries between the User Interface (UI) and Elasticsearch to make Search work. This allows Elasticsearch to understand our complex data models, which differ depending on the Document type. For example, suppose the user enters a query in the UI filtered on "Forename" and "Surname". Some additional work is needed to handle that request, because that filter might correspond to multiple different locations in the Document, such as "Shareholder name" or "Beneficial owner name". The Search service uses logic, configured per Document type, to translate all instances of that type of data into something consistent. The result can then be passed to Elasticsearch, which returns the right results.

Deployments configure different filters, under customizable groups, depending on the needs of the project, and the user can then select the corresponding options in the Search UI. Each filtered search retrieves specific information from the indexes stored in Elasticsearch, depending on which filter is used.

Why do we use Elasticsearch?
Elasticsearch supports nested logic, which allows our Search to be smarter. For example, suppose you ran a search for directors with forename "Michael" and surname "Greene". Without nested logic telling the system that both terms must match the same director, you'd also get results involving "David Green" and "Michael Jones". This happens because the forename can be matched on one director and the surname on another.

Plus, when you look at more detailed information about a Document, Elasticsearch data is rendered in the UI through the Document Viewer. This functionality is only available thanks to the way we store the data in Elasticsearch. Overall, the approach makes configuration easier down the line and means that Quantexa, rather than the deployment, takes on the complexity of nesting data.

Where can I find out more?
For more details on the architecture of how Elasticsearch is implemented, see Elasticsearch Considerations for Quantexa. Refer to the Documentation site for further details about Elasticsearch, or find general information on Elasticsearch's website.
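The search sections of this article describe two ideas:
- The Search service translates one UI filter into several Document-specific locations.
- Nested logic keeps a forename and surname tied to the same person.

Below is a minimal Python sketch of both. It uses Elasticsearch's real `nested` field type and `nested` query, wrapped in a `bool` query with `should` and `minimum_should_match`. Several parts are hypothetical:
- the `NAME_FILTER_PATHS` config dictionary;
- the Document type `company`;
- the field paths (`directors`, `shareholders`, `beneficial_owners`).

It is not Quantexa's Search service or its configuration format. The two matcher functions use exact, case-sensitive comparison purely to contrast flattened and nested semantics; Elasticsearch `match` queries analyze text instead.

```python
from typing import Dict, List

# Hypothetical Search-service config: for each Document type, the nested object paths
# that a UI "Forename + Surname" filter is translated into.
NAME_FILTER_PATHS: Dict[str, List[str]] = {
    "company": ["directors", "shareholders", "beneficial_owners"],
}

# Matching mapping sketch: each path uses the "nested" field type. Without it,
# Elasticsearch flattens arrays of objects and loses which names belong together.
COMPANY_MAPPING = {
    "mappings": {
        "properties": {
            path: {
                "type": "nested",
                "properties": {"forename": {"type": "text"}, "surname": {"type": "text"}},
            }
            for path in NAME_FILTER_PATHS["company"]
        }
    }
}


def name_filter_query(doc_type: str, forename: str, surname: str) -> dict:
    """Translate one UI name filter into a query over every configured path.

    Each nested clause requires forename AND surname to match the same person;
    the outer bool/should accepts a match under any one of the configured paths.
    """
    clauses = [
        {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"match": {f"{path}.forename": forename}},
                            {"match": {f"{path}.surname": surname}},
                        ]
                    }
                },
            }
        }
        for path in NAME_FILTER_PATHS[doc_type]
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def flattened_match(people: List[Dict[str, str]], forename: str, surname: str) -> bool:
    """Roughly what a flattened (non-nested) mapping checks: each term may match a different person."""
    return any(p["forename"] == forename for p in people) and any(p["surname"] == surname for p in people)


def nested_match(people: List[Dict[str, str]], forename: str, surname: str) -> bool:
    """What the nested query checks: a single person must match both terms."""
    return any(p["forename"] == forename and p["surname"] == surname for p in people)


directors = [{"forename": "Michael", "surname": "Jones"}, {"forename": "David", "surname": "Greene"}]
print(flattened_match(directors, "Michael", "Greene"))  # True: a false positive
print(nested_match(directors, "Michael", "Greene"))     # False
```

Keeping one `nested` clause per path means each clause binds both terms to a single object. The outer `should` with `minimum_should_match: 1` lets the same UI filter hit a director, a shareholder, or a beneficial owner.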