By Matt Jackson and Christina Tkachenko
Background
In the models behind Quantexa's decision intelligence, many data points come together to drive a recommendation or highlight a risk. This data can take various forms, from financial data and text to contextual information. Using data of all kinds to fuel our AI solutions can improve their predictive capabilities. However, careful consideration of how different types of data are combined is important to maximize the effectiveness of this approach.
Quantexa's Shell Detection models use information from corporate registry documents to estimate the likelihood of a business being a shell company. Shell company fraud is an umbrella term for a range of risky company behaviors, including operations linked to Mini Umbrella Companies (MUCs), shelf companies, phoenix companies, and property tax fraud. Illegitimate shells can be used for money laundering, tax evasion, or hiding the identities of their owners.
In addition to data derived directly from each document, our Shell Detection models also use contextual information about company directors. This context comes from the Quantexa Graph, constructed by linking businesses, individuals, and many other types of entities across multiple documents using Quantexa's entity resolution capability.
Features have different scopes
Adding contextual information to our models introduces a challenge. To incorporate context, we use graph features calculated from a director's local ego-Graph. At the graph level, however, these features can comprise information about multiple, related businesses, and a business with more than one director can appear on multiple ego-Graphs (Fig. 1). To make predictions about a single business, we need to be able to link graph-level information to individual businesses. There are several ways this can be done, each affecting model training and inference.
Fig. 1: Using resolved entities (top) to generate ego-graphs (bottom).
A potential solution is to train models at both the business and ego-Graph levels and use the results of both (we can call this Method 1). This results in one business-level prediction and several graph-level predictions per business, depending on how many ego-Graphs it appears in. Decisions could then be made based on the results from both models.
However, this approach brings complications. Should we predict that a business is a shell company if its Graph-level predictions contradict one another? This solution would require business logic implemented on top of the model predictions, which may be difficult to justify. It also doubles the ongoing support effort, with two models to maintain.
Another option is to join each business to its various graph-level feature scores before model training and prediction (Method 2). This results in duplicated sets of document-level features, with a different set of graph features for each ego-Graph a business is found in.
Again though, this solution comes with some problems. In this case, there can be multiple conflicting predictions for a single business. In addition, we artificially duplicate document-level data for our model training, which may have unintended consequences for model performance.
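The duplication problem is easy to see in a minimal sketch. Assuming hypothetical pandas DataFrames `documents` (one row per business) and `ego_graph_features` (one row per business/ego-Graph pair), the Method 2 join looks something like this:

```python
import pandas as pd

# Hypothetical document-level features: one row per business.
documents = pd.DataFrame({
    "business_id": ["B1", "B2"],
    "doc_feature": [0.3, 0.9],
})

# Hypothetical graph-level features: one row per (business, ego-Graph) pair.
ego_graph_features = pd.DataFrame({
    "business_id": ["B1", "B1", "B2"],
    "ego_graph_id": ["G1", "G2", "G3"],
    "incorporation_count": [12, 2, 1],
})

# Method 2: join document features onto every ego-Graph row. B1's
# document features are duplicated across two training rows, each
# carrying a different (possibly conflicting) set of graph features.
training_rows = documents.merge(ego_graph_features, on="business_id")
print(training_rows)
```

Each duplicated row can receive its own prediction, which is exactly where the conflicting per-business results come from.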
Quantexa’s Roll-down method
Instead, Quantexa's shell company model pipelines 'roll down' data calculated at the ego-Graph level to the document level using aggregations. This means contextual information can drive single-business predictions without complex business logic or artificial data duplication.
Suitable aggregation logic is selected for each feature. Consider an example graph-level feature that counts the number of businesses incorporated by a director within a short period. If we know that a high count is linked to risk, we can take the maximum of that count across all the ego-Graphs a business is found in, and use that to drive our model predictions.
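A minimal sketch of this roll-down, assuming a hypothetical pandas DataFrame `ego_graph_features` holding the per-ego-Graph counts:

```python
import pandas as pd

# Hypothetical graph-level feature: for each ego-Graph a business sits
# on, how many businesses its director incorporated in a short window.
ego_graph_features = pd.DataFrame({
    "business_id": ["B1", "B1", "B2"],
    "ego_graph_id": ["G1", "G2", "G3"],
    "incorporation_count": [12, 2, 1],
})

# Roll-down: one row per business, keeping the maximum count across all
# of its ego-Graphs, since a high count is the risky signal here.
rolled_down = (
    ego_graph_features
    .groupby("business_id", as_index=False)
    .agg(max_incorporation_count=("incorporation_count", "max"))
)
print(rolled_down)  # B1 -> 12, B2 -> 1
```

The rolled-down frame has exactly one row per business, so it can be joined back onto the document-level features without any duplication.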
Fig. 2: Ego-Graph level calculations are aggregated, with different aggregations used depending on their underlying relationship to risk.
In some cases, the optimal aggregation for a feature is less obvious. Here, we can play to the strengths of tree-based algorithms such as XGBoost and add multiple aggregations of the same graph-level feature to the training data. During model training, the model can then select a suitable aggregation for each feature based on its contribution to risk.
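As a sketch, again with hypothetical feature names, the same graph-level feature can be rolled down under several aggregations at once:

```python
import pandas as pd

# Hypothetical graph-level feature with no obvious "risky" direction.
ego_graph_features = pd.DataFrame({
    "business_id": ["B1", "B1", "B2"],
    "shared_address_count": [5, 1, 0],
})

# Roll the same feature down several ways; a tree-based model can then
# pick whichever aggregation produces the most informative splits.
multi_agg = ego_graph_features.groupby("business_id").agg(
    shared_address_max=("shared_address_count", "max"),
    shared_address_min=("shared_address_count", "min"),
    shared_address_mean=("shared_address_count", "mean"),
)
print(multi_agg)
```

After training, a gain-based feature importance (for example, XGBoost's `Booster.get_score(importance_type="gain")`) can indicate which aggregation the model actually relied on.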
Effect on model performance
We can evaluate each of the methods discussed above (Method 1, Method 2, and the Roll-down method) by comparing model performance. Each model is trained with XGBoost on the same labeled Shell Company data; only the method used to combine information from the Graph and document levels is changed.
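For reference, the metrics reported in Table 1 can be computed with scikit-learn; a minimal sketch with placeholder labels and predictions:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# y_true: ground-truth shell labels; y_pred: a model's 0/1 predictions.
y_true = [1, 0, 1, 1, 0]  # placeholder values for illustration
y_pred = [1, 0, 0, 1, 0]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2%}")
print(f"Precision: {precision_score(y_true, y_pred):.2%}")
print(f"Recall:    {recall_score(y_true, y_pred):.2%}")
print(f"F1:        {f1_score(y_true, y_pred):.2%}")
```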
Comparing the three feature combination methods shows that the Roll-down method performs best:
| Feature combination method | Accuracy | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| Method 1 (threshold 0.5) | 72.21% | 98.28% | 7.53% | 13.99% |
| Method 2 | 94.51% | 89.12% | 93.01% | 91.03% |
| Roll-down method | 99.56% | 94.25% | 94.25% | 94.25% |
Table 1: Comparison of model performance using different methods to combine business- and Graph-level features.
Method 1 uses business logic to combine the graph- and document-level predictions. A graph-level model predicts high-risk directors, and these directors' ego-Graphs are filtered based on the number of predicted shell documents they contain. Businesses on these high-risk ego-Graphs are then treated as predicted shell companies. Though this business-logic-driven approach results in a model with few false positives (high precision), it suffers from a very high false negative rate (low recall). In practice, businesses exhibiting risky behavior would go unnoticed by the model. While thresholds can be adjusted to trade off precision and recall, the method still suffers from poor overall accuracy.
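A sketch of this kind of post-hoc combination logic, where the threshold and data shapes are illustrative assumptions rather than Quantexa's implementation:

```python
import pandas as pd

# Hypothetical graph-level output: per ego-Graph, how many of its
# documents the graph-level model predicted to be shells.
graph_preds = pd.DataFrame({
    "ego_graph_id": ["G1", "G2"],
    "predicted_shell_docs": [4, 0],
})
# Which businesses sit on which ego-Graphs.
memberships = pd.DataFrame({
    "ego_graph_id": ["G1", "G1", "G2"],
    "business_id": ["B1", "B2", "B3"],
})

SHELL_DOC_THRESHOLD = 2  # illustrative threshold, not Quantexa's value

# Business logic: every business on a sufficiently risky ego-Graph is
# treated as a predicted shell company.
risky_graphs = graph_preds.loc[
    graph_preds["predicted_shell_docs"] >= SHELL_DOC_THRESHOLD,
    "ego_graph_id",
]
flagged = memberships.loc[memberships["ego_graph_id"].isin(risky_graphs)]
print(flagged["business_id"].tolist())  # ['B1', 'B2']
```

Hard rules like this are what drive the skewed precision/recall trade-off: a business is only ever flagged if an entire ego-Graph crosses the threshold.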
Method 2 shows a large improvement in performance, giving companies an equal opportunity to score on either document- or graph-level patterns learned through a common model. However, because document information is duplicated across rows with varying graph-level features, XGBoost can be presented with conflicting graph feature information during training, and complex patterns of behavior spanning multiple ego-Graphs may be lost.
Finally, the Roll-down method improves model performance further, sensibly combining complex contextual data from the graph and document levels without duplicating data or resorting to arbitrary business logic.
Using contextual information in machine learning models can be powerful, but as we have shown here, there are pitfalls to navigate when combining data with different scopes. For Quantexa's Shell Detection models, using an appropriate method of data aggregation before training and inference results in a significant increase in model performance.