Welcome to Quantexa 2.3 | 2.3.0 Release Announcement
In Quantexa 2.3, you will find: Entity Store, a new component in our set of Entity Resolution capabilities; Security Model V2, a new way to manage Role-Based Access and Control within The Quantexa Platform; enhancements to Assess, including Path Ranking, a new way to define paths; and support for Elasticsearch 8 and Spark 3.2.

Entity Store

In 2.3.0, we are introducing Entity Store in beta. Before 2.3.0, all interactions with the Quantexa UI and the Quantexa Mid-tier APIs depended upon resolving Entities on-the-fly (dynamically). The Entity Store allows a persisted or materialized view of Entities to be stored in the system. In this first release, the core Entity Store will load a full set of pre-resolved Entities, produced by Batch Resolver, into Elasticsearch. The Entity Store can then be used by Explorer to enable querying of the underlying Entities, as a cache for Resolver to improve performance, and to support the new Entity REST API.

Security Model V2

We have introduced a new framework for authentication and authorization across the platform. This gives the platform greater control over the data and features that a user can access and enables easier integration with Identity Providers (IdPs). End-users no longer need to interact directly with low-level Quantexa Roles. They are now able to share their work with Groups that are logical to their organizational structure, for example, UK Fraud Investigators. Using a new User Management Screen, you can now provision users and Groups (collections of users that can be assigned Roles and Dynamic Privileges) to The Quantexa Platform from Identity Providers simply and easily.

Path Ranking

With Path Ranking, you can now specify rules that define the relative importance of paths (chains of Documents and Entities that connect two Nodes) when writing Network Scores. This makes the Network easier to analyze by signaling the most significant areas to explore further.
Dual Context Sources

Dual Context Sources let you design scoring logic that works in both batch and dynamic, and provide a config-based tool that generates Source steps for both pipelines. This simplifies the deployment of a build with a dual architecture.

Support for Elasticsearch and Spark

Quantexa now supports Elasticsearch 8 across the platform, except for Offline Indexers. Spark 3.2 is also now supported across the platform.

Warning: Spark 3.0 is deprecated and support will be removed completely in the next release of the platform.

Spring Boot upgrade

Spring Boot has been upgraded to version 2.6.8. This results in faster startup times and resolves a number of security vulnerabilities found in older versions of Spring Boot.

Documentation Site Glossary

We have completely revamped and reworked our Documentation site glossary, adding almost 100 new terms. Explore the Glossary to find out more.

Other highlights

For more control and flexibility when running ETL, Quantexa has added the ability to generate Resolver Search Loader and Compound Creator scripts at the point where users define a Root Model in Data Fusion.

List inputs have been introduced for Entity Attribute functions. This allows more flexibility and customizability, gives users more ways to define Entity Attributes in Data Fusion, and caters to a wider range of data models.

You are now able to collapse the Query Builder in Explorer. This is the next step in improving the user experience of Explorer, allowing users to focus on the results of the query.

You will find the full set of Release Notes on the Quantexa Documentation site. If you are unable to access them, you will need to get a user with access to submit a Documentation site access request through the Quantexa Support Portal.

FAQ: Elasticsearch "cluster health: Not connected" in Chrome / "Connection refused" script error
FAQ relevant for: all Academy versions

On the VDIs, Elasticsearch will sometimes become disconnected. When that happens, the data isn't available for easy viewing and you will also see errors in your UI. If you try to run, for example, a Load Elastic ETL script while Elasticsearch isn't connected, you will get an error such as:

Exception in thread "main" java.net.ConnectException: Connection refused

To reconnect the Elasticsearch service, run the following command anywhere in a terminal window on your VDI:

sudo systemctl restart elasticsearch.service

FAQ: I'm missing data in Elasticsearch / my number of docs are wrong
FAQ relevant for: all Academy versions

If you have completed the ETL pipeline stages of your project and uploaded the data to Elasticsearch, then when checking your indices in the Elasticsearch Head plugin on Chrome you should have numbers similar to the picture below (to get a bigger version of the image, right-click it and choose the option to open it in a new tab).

Note: If your numbers vary a little from these, for example 152k addresses instead of 148k, that's OK. The numbers will change a little for the resolver indices (Individual/Address/Business) based on the compound keys you have imported in the respective *.qentity Fusion config files.

If your numbers are significantly different, go back through your ETL pipeline and carefully check each stage to see where you lose the data along the way. A good way to approach this problem is to work forwards from CreateCaseClass and check the output of each stage to find the problem area. You should also use the counts in Elasticsearch to guide you. For example, if you have only half the number of businesses listed above, and no individuals, you probably haven't joined your Third Parties onto the ICIJ document properly, so you would want to go back and double check how you have done this join and on what fields.

Specific points to consider:

Have I correctly parsed all of the necessary fields in my qmodel files?
Have I used the correct type of joins in CreateCaseClass, and have I joined on the correct fields?
Have I outputted the correct Dataset at the end of CreateCaseClass?
Have I loaded up the DocumentDataModel.parquet (the output of CreateCaseClass) into a Spark-Shell to check the output there?
Have I correctly identified and defined all relevant start paths in my qentity files?
Do I have a good range of compound keys for each Entity?
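If you prefer the terminal to the Head plugin, the counts can also be read and compared from a script. The sketch below is a minimal illustration, not an Academy tool: it assumes Elasticsearch is reachable at http://localhost:9200 (the address used elsewhere in this FAQ), the function names `list_counts` and `check_count` are made up for this example, and the 10% tolerance in the usage note is an arbitrary choice rather than an official threshold.

```shell
#!/bin/sh
# Sketch only: assumes Elasticsearch listens on localhost:9200.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print "<index> <docs.count>" for every index via the _cat/indices API.
list_counts() {
  curl -s "$ES_URL/_cat/indices?h=index,docs.count"
}

# check_count NAME EXPECTED ACTUAL TOLERANCE_PERCENT
# Prints "NAME ok" when ACTUAL is within TOLERANCE_PERCENT of EXPECTED,
# otherwise prints "NAME mismatch".
check_count() {
  name=$1
  expected=$2
  actual=$3
  tol=$4
  diff=$(($actual - $expected))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - $diff))
  fi
  if [ $(($diff * 100)) -le $(($expected * $tol)) ]; then
    echo "$name ok"
  else
    echo "$name mismatch"
  fi
}
```

For example, `check_count address 148000 152000 10` treats the 152k-versus-148k difference mentioned in the note above as acceptable, while a count that is half the expected value is reported as a mismatch. Run `list_counts` first to see the actual figures for your indices.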
If you are convinced that you have done all of the above correctly, you can try to clear the data from Elasticsearch, restart the service and then re-upload the data using the following three commands. Note that the first command deletes every index in Elasticsearch.

curl -X DELETE 'http://localhost:9200/_all'
sudo systemctl restart elasticsearch.service
./runQSS.sh -s com.quantexa.academy.task.icij.model.etl.IcijLoadElasticScript -c ../external.conf -r elastic.icij

Welcome to Parsers 4.1 | Release Announcement
We are excited to announce the release of version 4.1 of Quantexa's Standard Parsers. This release focuses on improving the integration with Fusion UI (look out for exciting 2.6 release announcements coming soon) and on improvements to the file structure of configuration files.

This release includes the following highlights, which are detailed below.

Consistency of configuration files - general improvements to Parser and lexicon configuration and files have been introduced to make sure the way you use these files is consistent across all available Parsers. This will make your configuration easier to understand and simplify the process of making future modifications.

To minimise redundant data storage in Elasticsearch, you can now exclude business standardisation terms that aren't used in areas such as exclusions for Entity Resolution. Similarly, you can now choose to parse multiple names or just a single name for the Individual Parser to reduce your Elasticsearch footprint.

For the Telephone Parser, you can now specify conditional parsing rules to increase the output accuracy. For example, if you have more specific parsing rules for UK telephone numbers, you can now use the country code to parse these telephone numbers differently to the default telephone parsing behaviour.

Note: There are no changes in this release that affect the output of parsing.

October Community Digest
Welcome to this spooktacular edition of our Community Digest, where we gather the most thrilling posts from our members just in time for Halloween.

Public content:

Tips & Tricks for Managing Large and Complex Networks - Discussion
Join our Monthly Community Connect on November 10th - Event
Quantexa's Chris Bagnall wins ACAMS Today Article of the Year 2023 - Announcement
Understanding Human Trafficking - Blog
Elasticsearch and Why We Use It - Discussion
Automatic Data Cleaning Through Data Normalisation and Statistics - Blog
Unlocking the Power of pKYC: Smarter KYC Processes EMEA and APAC webinars
Community Competition - Refer 5 Colleagues to Win!
Elasticsearch Considerations for Quantexa - Blog
Perspective: Addressing SEC-Identified AML Program Deficiencies at Broker-Dealers - KYC Group Discussion
Using Data Fusion for the first time - Blog

Members content (log in required):

Make dropdowns in Explorer charts searchable and sort options alphabetically in multiple languages - Idea
Load Elastic not connecting - Academy Q&A
Auditing search viewer click event for audit monitoring - Idea
When aggregating entity scores, only one appears in the customer scorecard - Q&A
Running out of screen real estate in the Investigation view - Idea

Community quick links:

Submit and vote for Ideas in our Ideas Portal
Join one of our Specialist User Groups: FinCrime, Insurance, Data Management & KYC
Browse blogs, articles and guides in our Community Library

New to the Community? Sign up for a Community Tour

Common Elastic Loader errors
For common Elastic Loader errors, such as a Load Elastic job failing, read Common Elastic Loader errors (login required) on our Docs site. You'll find some common Elastic Loader failure cases and how to address them. The errors apply to both the Resolver-Search Elastic Loader and the Generic Elastic Loader. These include:

Connection issues
Performance issues
Elasticsearch is overloaded
Spark is running out of memory
Count validation failure
Data could not be indexed

Did you know you can also click the elastic-loader tag and then filter to find Questions with an answer that is Accepted by the Community?

Elasticsearch and Why We Use It
This article gives an overview of what Elasticsearch is, and how and why it's used at Quantexa. If you're a data scientist, business analyst, or an end user, this piece will give you some useful context for what Elasticsearch is all about.

What is Elasticsearch?

Elasticsearch, or Elastic, is a near real-time, distributed storage, search, and analytics engine. Since the beginning, Quantexa has used Elasticsearch to store and query the data we ingest into the Quantexa platform, as we knew Search was going to be such a central feature of the platform. Elasticsearch powers both our Search and Entity Resolution capabilities.

How does it work?

Data is passed to Elasticsearch, to be stored, following each step of the Extract, Transform, and Load (ETL) process. This process, as shown in the diagram, is generally handled by Data Fusion, which takes in the raw data and prepares it to be used for Entity Resolution (ER). Each of the four indexes sources data from a different point in the process, as follows:

Document Indexes: Once the data has been Cleansed, Document indexes are created as an output. These indexes are queried by Document Search and can also be used for dynamic Scoring of Documents.

Resolver Indexes: The linking data extracted during Cleansing can be used to resolve Entities. This data is stored in "Resolver" indexes and is used to perform dynamic Entity Resolution for UI features such as Investigations, as well as batch processes such as Graph Scripting.

Entity Store Indexes: Once Entities are created, the information about them is stored together to increase the speed and reliability of Entity searches and to allow more detailed inspection of them in the UI.

Other Indexes: When data is discovered as part of the Cleansing process which can't be used for Entity Resolution, but could still add value, it is transferred to Other indexes.
Examples of this might include individual transactions, which are often too numerous to visualize as Documents in a Network diagram. Other indexes can be queried and displayed in the UI within a table or with other visualizations such as Sankey diagrams in our Explorer feature.

Any "other data" in the platform not indexed after the final step relates to Quantexa-specific data around how the platform itself operates, such as a list of active Investigations, and is stored elsewhere.

You can read about the indexes in more detail in our article about Elasticsearch considerations for Quantexa. You'll find more detail about the configuration of Elasticsearch indexes on our Resolver Elasticsearch configuration page on our Documentation site.

How is a search performed?

There is a layer between what the user does and Elasticsearch, known as the Search service, which passes queries between the User Interface (UI) and Elasticsearch to make Search work. This allows Elasticsearch to understand our complex data models, which differ depending on the Document type.

For example, if the user puts a query into the UI filtered for "Forename" and "Surname", some additional work needs to be done to handle that request, as that filter might correspond to multiple different locations in the Document, such as "Shareholder name" or "Beneficial owner name". The Search service uses logic, configured to the Document type, to translate all instances of that type of data into something consistent, so they can be passed to Elasticsearch and it can then pass back the right results.

Deployments will configure different filters, under customizable groups, depending on the needs of the project, and the user will then be able to select the corresponding options in the Search UI. Each filtered search will retrieve specific information from the indexes stored in Elasticsearch, depending on which filter is used.

Why do we use Elasticsearch?
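The forename-and-surname case discussed in this section can be made concrete with Elasticsearch's nested query, which only matches a Document when a single nested object satisfies every clause. The sketch below is an illustration under assumptions, not Quantexa's actual query or mapping: the index name `my-document-index`, the `directors` path and its `forename` and `surname` fields are hypothetical, and the field at that path would need to be mapped with the `nested` type for the query to work.

```shell
#!/bin/sh
# Sketch only: index name, "directors" path and field names are hypothetical.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a nested query that matches a document only when ONE director
# object has both the given forename ($1) and the given surname ($2).
# The values are inserted as-is, so they must not contain double quotes.
nested_director_query() {
  cat <<EOF
{
  "query": {
    "nested": {
      "path": "directors",
      "query": {
        "bool": {
          "must": [
            { "match": { "directors.forename": "$1" } },
            { "match": { "directors.surname": "$2" } }
          ]
        }
      }
    }
  }
}
EOF
}

# Send the query to the (hypothetical) index and print the raw response.
search_directors() {
  nested_director_query "$1" "$2" |
    curl -s -H 'Content-Type: application/json' \
      -X POST "$ES_URL/my-document-index/_search" -d @-
}
```

Calling `search_directors Michael Greene` would send the query. Because both `match` clauses sit inside one `nested` block, a Document whose directors are "David Greene" and "Michael Jones" would not match, since no single director object carries both names.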
Elasticsearch supports nested logic, which makes our Search smarter. For example, if you searched for directors with forename "Michael" and surname "Greene" without nested logic telling the system that you only want results where both terms match, you would also get results such as "David Green" and "Michael Jones".

In addition, when you look at more detailed information about a Document, Elasticsearch data is rendered in the UI through the Document Viewer. This functionality is only available thanks to the way we store the data in Elasticsearch.

Overall, the approach makes configuration easier down the line and means that Quantexa takes on the complexity of nesting data, rather than it being taken on by the deployment.

Where can I find out more?

For more details on the architecture of how Elasticsearch is implemented, see Elasticsearch Considerations for Quantexa. Refer to the Documentation Site for further details about Elasticsearch, or find general information on Elasticsearch's website.

Why Does Entity Quality Matter? & Best of the Community from March
March Top Picks

Why Entity Quality Matters (login required)
Enter our latest competition: The Knowledge Exchange
New Education Programs: Scala & Spark Bootcamp and Quantexa Data Engineer Velocity Program
Quantexa & Xander Talent - New Education Partnership
Elevating Data Management: Unveiling the Pillars of a Trusted Data Foundation with Quantexa

Latest from the Community Library

A day in the life of a... Senior Learning Designer
A day in the life of... an Academy Trainee
How To Test Your Upgrades (login required)
Updates to Quantexa Supported Versions (login required)

Upcoming events

3rd May Community Connect - Join for a demo of top Community features.

Best of Q&A

Unable to load Batch Scores to Elastic (login required)
How to fix image style for investigation icon in the Qx UI? (login required)
Error creating bean with name 'springSecurityFilterChain' (login required)

New & Popular Ideas

Usability increase using 2 screens for investigations (login required)
Changing the default settings for Graphic Filters in Data Viewer (login required)
Make updates to metadata.parquet optional (login required)

In case you missed it

Welcome to Quantexa 2.6 | 2.6.0 Release Announcement
Badge of the Month: The Name Dropper Badge

Community quick links

Submit and vote for Ideas in our Ideas Portal
Join one of our Specialist User Groups: FinCrime, Insurance, Data Management & KYC
Browse blogs, articles and guides in our Community Library

FAQ - List of Academy Frequently Asked Questions
Below is a list of FAQs for the Quantexa Academies. Have an idea for an FAQ? Please let us know by emailing training@quantexa.com with the idea or by commenting on this post!

General

Quantexa Documentation Links for the Academy
Gradle taking too long to index
I tried to run a spark shell script and got a "Permission denied" error message
My Quantexa Licence has expired, how do I update it?
I'm trying to run a script and seeing an error about "Unrecognized option: -s"

Fusion

What are Document Attributes used for?
What are Entity Attributes used for?
What are (Entity) Records?
What are Traversals?

Elastic

Elasticsearch "cluster health: Not connected" in Chrome / "Connection refused" when running Load script error
I'm missing data in Elasticsearch / my number of docs are wrong

Resolver Config

Why are search constraints not working & how to use wildcards (* characters) within field sets?
How can I configure the timeline?

Batch Resolver (ENG)

I tried to run Batch Resolver (ENG) but got an error about Timestamps / Dates

Scoring

My score isn't triggering / doesn't appear in the UI
How can I handle dates when writing a Quantexa Score in Scala?
Checking Score Outputs (Optional Sink Step)

UI

I tried to start my UI but some of the apps didn't start up / I got a UI error
Search configuration could not be loaded in the UI
I can't find Addresses or Individuals when I search in my UI!
Data / Attributes aren't showing up in my UI!

VDI

VDI Usage and Common Issues
I'm having trouble pasting into my VDI

Admin

I can't access a website or course

BA Academy

List of Technical Business Analyst (BA) Academy Frequently Asked Questions

Don't forget to bookmark this page in the upper right corner to save it for easy access.