MLflow at Quantexa
In the dynamic landscape of Machine Learning (ML) development, effective experiment tracking is paramount, especially as organizations scale their operations. As ML projects grow more complex, so does the need for tools that efficiently manage experiments, iterations, and model versions. At Quantexa, we encountered this challenge head-on and looked for a robust way to streamline our ML workflows. A big part of the solution turned out to be MLflow, a platform designed to simplify the end-to-end machine learning lifecycle. In this blog, we describe how we use MLflow at Quantexa.
What is MLflow?
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It helps Data Scientists and Machine Learning Engineers track experiments, package code into reproducible runs, and share and deploy models. It also comes with a graphical user interface that makes it easy to use.
At Quantexa, we primarily use MLflow for experiment tracking and reproducibility, which is the focus of this post. If you are interested in using MLflow for model deployment, we recommend reading the documentation on the MLflow website.
What issues is MLflow solving?
MLflow addresses several critical challenges in our machine learning workflow and improves our overall efficiency and effectiveness. Any client building models on top of Quantexa data will face these challenges too. Here is how it tackles three of our most important issues.
Reproducibility and traceability
At Quantexa, we have a considerable number of machine learning models, and the origin of each one must be well documented and readily available. This matters for compliance and for internal purposes: the team and our clients should know each model's performance metrics, strengths, and weaknesses.
The models are also valuable intellectual property for the company, and we need to be able to reproduce their results. MLflow helps with both issues. When models are trained, we record the code version, the model parameters, and the source of the original data, so we can recreate a model exactly if we ever need to. We also keep a record of the experiments used to select models, so a Data Scientist in the future can understand why certain decisions were made during the prototyping phase.
Collaborative insight
We want Data Scientists to be able to collaborate easily. MLflow provides a centralized repository for tracking and sharing experiments, which encourages a culture of knowledge exchange and innovation within the team. It is particularly useful for getting feedback from non-technical stakeholders, who can use the MLflow UI to access all the information they need to assess and comment on models. This dramatically shortens the feedback cycle, saves time for everyone involved, and leads to better models.
Data correctness
At Quantexa, we run predictions over very large datasets, and at this scale it is difficult to assess the quality and correctness of the data. We always perform manual sense checks, but mistakes can still be missed, particularly when errors affect only part of a dataset (e.g., one country). To stop errors from slipping through, we produce many metrics and plots whenever we receive a data refresh and automatically compare them with what we have seen previously. We log all of these metrics in MLflow, which makes it easy for the whole team to audit the data and spot mistakes such as missing data. Large differences are highlighted automatically, even when they occur in only one geographical region or data source.
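A minimal sketch of the kind of automatic comparison described above, assuming each refresh arrives as a list of row dictionaries with a country field. The function names, the `country`/`address` fields, and the 0.05 threshold are all hypothetical rather than Quantexa's actual checks. It computes a null rate per country and field for two refreshes and flags any pair whose rate moved by more than the threshold, so a problem confined to one country is not averaged away in a single global figure. The flagged rows could then be attached to an MLflow run, for example with `mlflow.log_dict`.

```python
from typing import Dict, List, Tuple

# Null rate per (country, field) for one data refresh, e.g. {("UK", "address"): 0.02}
NullRates = Dict[Tuple[str, str], float]


def null_rates(rows: List[dict], group_key: str, fields: List[str]) -> NullRates:
    """Fraction of rows per group whose field is missing or None."""
    counts: Dict[str, int] = {}
    nulls: Dict[Tuple[str, str], int] = {}
    for row in rows:
        group = row[group_key]
        counts[group] = counts.get(group, 0) + 1
        for field in fields:
            if row.get(field) is None:
                nulls[(group, field)] = nulls.get((group, field), 0) + 1
    return {
        (group, field): nulls.get((group, field), 0) / n
        for group, n in counts.items()
        for field in fields
    }


def flag_large_changes(previous: NullRates, current: NullRates,
                       threshold: float = 0.05) -> List[Tuple[str, str, float, float]]:
    """Return (group, field, old_rate, new_rate) where the rate moved by more than threshold.

    Groups/fields absent from the previous refresh are treated as having a
    null rate of 0.0, so gaps in brand-new groups are also surfaced.
    """
    flagged = []
    for key, new_rate in sorted(current.items()):
        old_rate = previous.get(key, 0.0)
        if abs(new_rate - old_rate) > threshold:
            flagged.append((key[0], key[1], old_rate, new_rate))
    return flagged


previous_refresh = [
    {"country": "UK", "address": "1 High St"},
    {"country": "UK", "address": "2 Low Rd"},
    {"country": "SG", "address": "3 Orchard Rd"},
    {"country": "SG", "address": "4 Marina Bay"},
]
current_refresh = [
    {"country": "UK", "address": "1 High St"},
    {"country": "UK", "address": "2 Low Rd"},
    {"country": "SG", "address": None},  # addresses missing for one country only
    {"country": "SG", "address": None},
]

old = null_rates(previous_refresh, "country", ["address"])
new = null_rates(current_refresh, "country", ["address"])
print(flag_large_changes(old, new))  # [('SG', 'address', 0.0, 1.0)]
```

A global null rate for `address` would move from 0% to 50% here, while the per-country view points straight at the Singapore data as the source.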
These flagged differences can then easily be forwarded to upstream teams to resolve the issues, which dramatically reduces the risk of erroneous predictions caused by incorrect data.
How do we use it?
In practice, we mainly use MLflow for three workflows.
Feature generation
A pre-processing step transforms the data produced by Quantexa Entity Resolution into a different tabular format, which our models then use for training and prediction. During this step, we record aggregate information about the data as plots and CSVs. This information is automatically compared with previous runs, and large discrepancies are highlighted in easy-to-read files. As mentioned above, this reduces the risk that downstream model training and prediction use erroneous data.
We also record the code versions and parameters of this pipeline. This means we can always re-create the pre-processing step, and it documents the data source our models are trained on.
The information we record at this stage includes the mean, median, and variance of each feature; a data drift score for each feature; and the percentage of null values in each column.
Figure: A plot visualizing the top drifting features across countries. Country and feature names have been omitted.
Figure: A plot visualizing the percentage of fields with missing values in the Documents; these fields are used to produce features. Country and field names have been omitted.
Model training
We use MLflow to record model training, both to track multiple experiments during the prototyping phase of a project and as a reference for models in production. We record all the information required to reproduce a model if needed.
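To illustrate the test-set information recorded at training time, here is a minimal sketch that computes a binary confusion matrix, precision, and recall from test labels and predictions using only the standard library. The function name `binary_test_metrics`, the `test_*` metric keys, and the hyperparameter values are hypothetical. In a real run, the returned dictionary could be passed to `mlflow.log_metrics` and the hyperparameters to `mlflow.log_params`, as shown in the comments.

```python
from typing import Dict, Sequence


def binary_test_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix counts plus precision and recall for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Avoid division by zero when nothing is predicted positive
    # or the test set contains no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "test_tp": float(tp), "test_fp": float(fp),
        "test_fn": float(fn), "test_tn": float(tn),
        "test_precision": precision, "test_recall": recall,
    }


hyperparameters = {"max_depth": 6, "learning_rate": 0.1}  # illustrative values only
metrics = binary_test_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(metrics["test_precision"], metrics["test_recall"])

# With MLflow installed, both could be recorded in a single run, e.g.:
# import mlflow
# with mlflow.start_run():
#     mlflow.log_params(hyperparameters)
#     mlflow.log_metrics(metrics)
```

Keeping every metric as a float in one flat dictionary matches what `mlflow.log_metrics` expects, so the same helper can serve both prototyping experiments and production reference runs.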
The information we record when training models includes the model hyperparameters, test metrics such as precision and recall, and the confusion matrix.
Figure: An ROC (receiver operating characteristic) curve produced against the test set during training.
Figure: A cumulative gains curve produced against the test set during training.
Staging model evaluation
We often iterate on our production machine learning models to improve their performance, either by adding more labeled examples to their training datasets or by adding more features. When we train a new model that we think should replace the existing one, we want to be sure that the new model is actually better. Performance on a test set does not always tell the full story, particularly when the amount of labeled data is limited. So, before upgrading to any new model, we run both models across the entire unlabeled dataset to see how they differ in practice, which is important for understanding the real business impact of the change. This evaluation is recorded in MLflow so that Data Scientists and any relevant stakeholders can review it. The information we record during the evaluation includes the average size of the differences in model score between the old and new models, the individual examples with the largest score differences, and individual examples with SHAP explainability plots.
Figure: An explainability plot produced when the model is used for inference across unlabeled data. Feature names have been omitted.
Figure: A visualization of the scores produced by the model currently in production compared with the model selected to replace it.
Conclusion
In the dynamic realm of machine learning, effective experiment tracking is indispensable, especially as organizations grow.
At Quantexa, MLflow has become our solution, simplifying the ML lifecycle and addressing critical challenges in reproducibility, collaboration, and data correctness. It improves our internal workflows and can help clients who build models on Quantexa data manage the same complexity.
Quantexa's AI Roundup - 2023
In July 2023, Quantexa announced a significant investment in its Artificial Intelligence (AI) capabilities (Quantexa Bringing Total Investment in AI R&D to over $250M by 2027). Since then, the AI space has advanced considerably, and several of Quantexa’s core AI capabilities have grown. Alongside the significant growth of the NLP capability, Quantexa’s Analytical Innovation team have completed the MVPs of their three flagship products, which are now available as experimental releases. These tools use Quantexa networks to uncover insights: the Entity Resolution AI suite, Q-Knowledge Graph, and Shell Company Detection. In this roundup post, we introduce the three products and show how they can add value to your Quantexa deployment.
The Entity Resolution (ER) AI Suite
The ER AI suite provides a series of tools that analyse the outputs of Quantexa’s ER product and use AI to suggest improvements to the configurations powering ER. In particular, it can detect overlinking and underlinking in Quantexa Entities, along with their root causes.
The overlinking detection tool is powered by machine learning, with features based on the qualities of the Entity’s constituent record-compound graph (read more about using the Entity Quality Overlinking tool for the first time). These graph-based features use several complex graph algorithms (e.g., the Stoer-Wagner algorithm) to find shapes that are indicative of overlinking, such as ‘bridges’ in the network that incorrectly link Entities together and graphs with very long paths. Statistical techniques can then be applied to determine which compounds or data points may be causing the overlinking.
The underlinking tool uses sophisticated graph algorithms to find ‘Super-Entities’: Entities that should be formed from several existing Entities. This helps the user identify template changes that would merge such Entities in future ER runs.
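A ‘bridge’ has a precise graph-theory meaning: an edge whose removal disconnects the graph. As a minimal sketch, assuming an Entity’s records and compounds are held as an undirected adjacency map (node names such as `rec2` and `phone:0123` are hypothetical), the code below uses Tarjan’s classic bridge-finding algorithm to list such edges. It illustrates the concept only and is not the ER AI suite’s implementation, which the post describes as combining several algorithms (such as Stoer-Wagner) into machine-learning features. The recursive depth-first search suits small Entities; very large graphs would need an iterative version.

```python
from typing import Dict, List, Optional, Set, Tuple

# Undirected graph as an adjacency map; nodes are records and compounds.
Graph = Dict[str, Set[str]]


def add_edge(graph: Graph, u: str, v: str) -> None:
    """Add an undirected edge, e.g. between a record and a compound it contains."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def find_bridges(graph: Graph) -> List[Tuple[str, str]]:
    """Tarjan's bridge-finding algorithm for a simple undirected graph.

    Edge (u, v), with v discovered from u, is a bridge when nothing in v's
    depth-first subtree can reach u or an earlier vertex except through that
    edge, i.e. when low[v] > disc[u].
    """
    disc: Dict[str, int] = {}  # discovery order of each vertex
    low: Dict[str, int] = {}   # earliest discovery order reachable from the subtree
    bridges: List[Tuple[str, str]] = []
    counter = 0

    def dfs(u: str, parent: Optional[str]) -> None:
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v in sorted(graph[u]):
            if v == parent:
                continue  # ignore the tree edge we arrived by (no parallel edges)
            if v in disc:
                low[u] = min(low[u], disc[v])  # edge to an already-visited vertex
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.append((min(u, v), max(u, v)))

    for node in sorted(graph):
        if node not in disc:
            dfs(node, None)
    return sorted(bridges)


# Two tightly linked groups of records joined only by one shared phone number.
g: Graph = {}
for record, compound in [
    ("rec1", "addr:1 HIGH ST"), ("rec2", "addr:1 HIGH ST"),
    ("rec1", "name:JOHN SMITH"), ("rec2", "name:JOHN SMITH"),
    ("rec3", "addr:9 PARK LN"), ("rec4", "addr:9 PARK LN"),
    ("rec3", "name:JANE DOE"), ("rec4", "name:JANE DOE"),
    ("rec2", "phone:0123"), ("rec3", "phone:0123"),
]:
    add_edge(g, record, compound)

print(find_bridges(g))  # [('phone:0123', 'rec2'), ('phone:0123', 'rec3')]
```

Both reported edges run through the single shared phone compound, which is exactly the kind of weak link that could merge two different real-world entities into one overlinked Entity.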
Q-Knowledge Graph
Q-Knowledge Graph is a series of tools for analysing large-scale Quantexa Entity and Document graphs. It scales to billions of nodes and edges and uses sophisticated optimisation techniques to provide extremely fast implementations of core graph transformations and algorithms. As well as providing commonly used graph algorithms such as PageRank out of the box, it connects to common graph learning libraries such as PyG. This enables several use cases across Risk, KYC, and MDM, and the tool has already been deployed for transactional use cases at a global bank. It will also be a core back-end component of a number of upcoming Quantexa AI products.
Shell Company Detection
The Shell Company Detection tool uses machine learning to identify shell companies from characteristics of their local ego-networks. The model combines structural features (e.g., links to known shell directors), temporal features (e.g., patterns of director resignation), and static features (including the size of the corresponding corporate registry Document). For more information, see What can Network structure tell us about risk? The current model is built specifically for the UK and Singapore and can capture some behaviours specific to shells in these jurisdictions. Models focused on other jurisdictions are coming this year.
Upcoming AI releases
The NLP team at Quantexa are also developing Text2Networks, a machine learning pipeline for working with unstructured data, which will be available in the next major release of Quantexa. Text2Networks is a highly configurable pipeline of ML models that maps any unstructured textual data into a graph. It detects, labels, and organizes people, places, and things in the real world; the supported Entity types include People, Locations, Companies, and Geo-political organizations.
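The post does not describe how Text2Networks builds its graph internally. As a hedged sketch of only the final step, assuming an upstream model has already detected and labelled mentions, the code below links each document to the entities it mentions, in the spirit of the Entity and Document graphs mentioned under Q-Knowledge Graph. The mention tuple format, the type labels (e.g. `GEO_POLITICAL`), and the naive upper-casing normalisation are all hypothetical; a real pipeline would need proper entity resolution at that point.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical output of an upstream recognition step: (document id, mention text, type)
Mention = Tuple[str, str, str]

SUPPORTED_TYPES = {"PERSON", "LOCATION", "COMPANY", "GEO_POLITICAL"}


def build_document_entity_graph(mentions: List[Mention]) -> Dict[str, Set[str]]:
    """Link each document node to the typed entity nodes mentioned in it (undirected)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text, entity_type in mentions:
        if entity_type not in SUPPORTED_TYPES:
            continue  # drop types the pipeline is not configured to keep
        # Naive normalisation so "Acme Ltd" and "ACME  LTD" become one node.
        entity_node = f"{entity_type}:{' '.join(text.upper().split())}"
        doc_node = f"DOC:{doc_id}"
        graph[doc_node].add(entity_node)
        graph[entity_node].add(doc_node)
    return dict(graph)


def connected_entities(graph: Dict[str, Set[str]], entity_node: str) -> Set[str]:
    """Entities that share at least one document with the given entity."""
    linked: Set[str] = set()
    for doc_node in graph.get(entity_node, set()):
        linked |= graph[doc_node]
    linked.discard(entity_node)
    return linked


mentions = [
    ("news-1", "Acme Ltd", "COMPANY"),
    ("news-1", "Jane Doe", "PERSON"),
    ("news-1", "Singapore", "GEO_POLITICAL"),
    ("sar-7", "ACME LTD", "COMPANY"),
    ("sar-7", "John  Smith", "PERSON"),
    ("sar-7", "next Tuesday", "DATE"),  # unsupported type, ignored
]
graph = build_document_entity_graph(mentions)
print(sorted(connected_entities(graph, "COMPANY:ACME LTD")))
```

Because a news article and a SAR both mention the same company, the graph connects that company to people and places drawn from two otherwise unrelated sources.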
With Text2Networks integrated into the core Quantexa product, our users will be able to incorporate any textual data source that is important to their business. Concrete examples could include global news, intelligence reports, and Suspicious Activity Reports (SARs).
Several other tools are in development, including further tooling within the ER quality suite and Q-Knowledge Graph, as well as other risk models such as the SME detection tool, which will arrive in later releases of Quantexa. To keep up with the latest releases, be sure to follow our Release Announcements topic.
Quantexa Acquires Aylien, an Award-Winning Provider of AI-Based Risk & Market Intelligence Solutions
The acquisition combines Aylien’s advanced AI and Natural Language Processing (NLP) with Quantexa’s Decision Intelligence Platform, boosting our customers’ ability to unify the world of structured and unstructured data to augment and automate decision making. According to Gartner®, “end-users report unstructured data within their organizations is growing more than 30% year-over-year.”
Founded in 2012, Aylien offers solutions that include a News API for building intelligent applications with aggregated multi-lingual content from over 80,000 trusted sources across the web, traditional media, and licensed news outlets. Aylien also provides an application that analysts use to proactively identify, investigate, and monitor news data to assess critical business risks and opportunities.
Aylien’s innovative offerings in market insights and risk management immediately expand the portfolio of solutions Quantexa can offer across multiple industries. The joint offerings will empower customers to create real-time, reliable streams of actionable insights across risk and compliance practices and to identify new opportunities for growth. Customers will be able to stay one step ahead of potential risks by proactively monitoring their entire risk landscape across ESG, third parties, reputational issues, and operations, beyond compliance and controls.
“We are thrilled to welcome the Aylien team into the Quantexa organization following an extensive process run by our Corporate Development team,” said Quantexa CEO, Vishal Marria. “Aylien is a first-class organization with an impressive team that continues to push the boundaries of what is possible in NLP. Their unique approach and commitment to combining research and commercial software development has allowed Aylien to deliver significant value to their customers using AI to extract intelligence for critical decision making.
Our organizations are strongly aligned when it comes to how we build our solutions and our cultural values.”
“For over a decade we have helped our customers become more efficient and resilient by using our cutting-edge AI to allow them to put their data in context and make confident decisions,” said Parsa Ghaffari, Aylien CEO. “In the last few months, we have worked closely with Quantexa’s leadership team and have seen first-hand how closely our cultures and R&D efforts align. Quantexa is experiencing remarkable growth and I am confident that they will leverage Aylien’s offerings and bring their benefits to new solutions and customers. We are excited about the expanding opportunities we will have as part of the Quantexa team.”
Read more about the acquisition here
A Roadmap to Resilience: How Banks Can Leverage AI to Advance AML Capabilities
Pleased to share my latest article in The International Banker on how regulated entities can maximise their Compliance programme investment in AI, starting with getting their data into the best possible shape to inform any downstream model or process. You can find the article here: https://internationalbanker.com/technology/a-roadmap-to-resilience-how-banks-can-leverage-ai-to-advance-aml-capabilities/
Quantexa's AI Commitment
Quantexa is bringing its total global investment in AI research and development to over $250M by 2027. This fresh injection of capital will help clients quickly and responsibly advance the use of AI to protect, optimize, and grow their organizations. Read the full blog on the Quantexa website: AI in Context: How Quantexa Augments and Automates Decision Intelligence with AI
The relationship between human and machine will become increasingly important, and the insight needed to drive the right decisions would be impossible without AI. See how Quantexa continues to drive AI innovation.
IDC: Revenue for AI Software Will Reach $279B in 2027
A recent forecast from International Data Corporation (IDC) shows that the worldwide artificial intelligence (AI) software market will grow from $64 billion in 2022 to nearly $251 billion in 2027, a compound annual growth rate (CAGR) of 31.4%. The forecast for AI-centric software includes Artificial Intelligence Platforms, AI Applications, AI System Infrastructure Software (SIS), and AI Application Development and Deployment (AD&D) software (excluding AI platforms). However, it does not include Generative AI platforms and applications, which IDC recently forecast will generate revenues of $28.3 billion in 2027. Together, the two forecasts come to roughly $279 billion, the figure in the headline.
A recent IDC survey found that roughly a third of respondents believe that, over the next 12 months, organizations will prefer to buy AI software from a vendor or to use in-house support alongside vendor-supplied AI software for specific use cases or application areas. This indicates growing demand for AI solutions and highlights the need for approaches tailored to individual business requirements. Quantexa's Parsa Ghaffari weighs in with his thoughts: IDC: Revenue for AI Software Will Reach $279B in 2027
Gartner: CISOs Need to Champion AI TRiSM to Improve AI Results
By 2026, organizations that operationalize artificial intelligence (AI) transparency, trust and security will see their AI models achieve a 50% improvement in terms of adoption, business goals and user acceptance, according to Gartner, Inc. Speaking at the Gartner Security & Risk Management Summit in London today, Mark Horvath, VP Analyst at Gartner said, “CISOs can’t let AI control their organization. AI requires new forms of trust, risk and security management (TRiSM) that conventional controls don’t provide. Chief information security officers (CISOs) need to champion AI TRiSM to improve AI results, by, for example, increasing the speed of AI model-to-production, enabling better governance or rationalizing AI model portfolio, which can eliminate up to 80% of faulty and illegitimate information.”
Felix Hoddinot, Chief Analytics Officer with Quantexa added, “The TRiSM is a great model highlighting requirements for responsible AI solutions. Quantexa’s Decision Intelligence platform is a great way to achieve these requirements. The transparency delivered by our contextual AI approach drives trust and enables effective AI risk management. Quantexa’s unique capabilities to embedding data security within our Entity Resolution offering is more important now than ever as we deploy solutions with ever larger and more extensive data sets.”
Read more here: Gartner: CISOs Need to Champion AI TRiSM to Improve AI Results
Google Banking Survey: C-Suites & Boards More Involved in Tech Decisions with AI
New research explored the sentiment towards generative AI (gen AI) in banking among North American banking executives and consumers. The study, based on a survey of 350 banking executives responsible for gen AI decisions and more than 2,000 banking consumers in the United States, found broad interest in gen AI technologies as a way to improve operations and the customer experience, although some barriers and risks remain. Most banking executives (92%) stated there is high demand for gen AI within the banking industry, and 95% stated it has the potential to transform the industry. According to almost all banking respondents (96%), increased interest in gen AI is driving senior leadership, such as C-suite executives and boards of directors, to get more involved in technology and IT decisions. Quantexa's Parsa Ghaffari weighs in: Google Banking Survey: C-Suites and Boards More Involved in Tech Decisions Due to Heightened Interest in Gen AI
Gen AI and the evolving role of marketing - Capgemini
The majority of marketers (62%) believe that generative AI will augment human creativity, enhancing unique human qualities such as intuition, emotion, and context understanding. Organizations already investing in generative AI for marketing dedicate 62% of their total marketing technology budget towards it, seeing this breakthrough technology as a catalyst for creativity and innovation in marketing. That’s according to Capgemini Research Institute’s latest report ‘Generative AI and the evolving role of marketing: A CMO’s Playbook’, which reveals that half of organizations have already set aside specific budgets and almost half (47%) have allocated teams for the implementation of generative AI in marketing. Quantexa's Matt Hooper weighs in: Capgemini: 60% Integrating Generative AI Into Marketing
📣Upcoming Webinar: The Biggest Challenges in Data Quality: How Far Can AI Go to Solve Them? 📣
In this webinar, Dan Onions, Global Head of Data Management at Quantexa, and Martin Maisey, Head of Data Management EMEA, will delve into the pressing question on every data professional's mind: "How can AI help me?"
Unlock the full potential of your data strategy: As AI technologies, particularly LLMs, become increasingly integral to data management strategies, ensuring the quality and reliability of these systems' outputs is paramount. Our experts will explore the critical role of foundational data quality in harnessing AI effectively and responsibly, and address key challenges, such as achieving consistency and accuracy in AI-generated outputs and aligning them with regulatory standards already on the horizon. Attendees will gain insights into practical applications of AI in the real world, understanding how to make AI outputs on data trustworthy across the entire organization.
Register Your Place Here: The Biggest Challenges in Data Quality: How Far Can AI Go to Solve Them? (quantexa.com)