(by Matt Jackson and Christina Tkachenko)
Corporate registry Documents provide useful structured information for risk modelling. In particular, the information they contain can be a good indicator of suspicious activity. Yet this information is sometimes not enough to inform a decision or trigger an action. By looking at each business’s data in isolation, we miss the complex connections between businesses.
For example, the same individuals, addresses and telephone numbers can appear across several companies. At Quantexa we resolve such information into interconnected Entities that form Networks.
By observing these Networks, we can link their structures to high-risk business behaviours. Network structure refers to the arrangement and pattern of connections between Nodes within a graph.
Capturing graph structure
There are several ways to distil complex Network structures into aggregate measures. They fall into two main categories. The first is graph machine learning (GML), which involves Network encoding (see Predicting Risk through Network Shape). The second relies on more classical graph theory techniques. GML methods can be powerful under the right circumstances, but they tend to be less explainable, which affects the business decisions that can be built on them. Regulators often require justifications for AI-derived decisions. Customers may feel that ‘black-box’ AI decisions lack a solid foundation. Finding biases and errors in models is also more difficult without interpretability.
Classical graph theory approaches differ in this respect. Their calculations are repeatable and fully explainable, which supports transparent and fair business decisions.
We apply graph theory approaches to differentiate legitimate businesses from potential risks. Classic measures like PageRank and degree centrality capture certain Network qualities. Additionally, we create custom Network science measures. These measures capture distinct features associated with particular risks.
Network shape and risk
Structural properties of co-directorship Networks differ between Mini-Umbrella Companies (MUCs) and legitimate businesses (Figure 1). In legitimate Networks, the central individual is a director of only a few businesses, each with a large board of directors. Such Networks tend to be denser (Figure 1a). In contrast, MUC co-directorship Networks are sparse (Figure 1b) and often star-shaped. MUC incorporation agents tend to sit on the boards of many businesses, each with few directors.
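The dense-versus-sparse contrast can be made concrete with a small sketch. The board memberships below are hypothetical toy data, and the helper names `co_directorship_edges` and `density` are ours rather than part of any Quantexa API. The sketch links every pair of directors who share a board, then computes graph density as the fraction of possible Edges that are present.

```python
from itertools import combinations


def co_directorship_edges(boards):
    """Return undirected edges linking directors who share at least one board."""
    edges = set()
    for directors in boards.values():
        # Sorting gives each pair a canonical (a, b) order, so an edge
        # shared by two boards is only stored once.
        for a, b in combinations(sorted(set(directors)), 2):
            edges.add((a, b))
    return edges


def density(edges):
    """Fraction of possible undirected edges that are present."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


# Hypothetical legitimate pattern: few businesses, each with a large board.
legit_boards = {
    "A Ltd": ["subject", "d1", "d2", "d3"],
    "B Ltd": ["subject", "d3", "d4", "d5"],
}

# Hypothetical MUC pattern: many businesses, each with few directors.
muc_boards = {f"MUC {i}": ["subject", f"m{i}"] for i in range(6)}

legit_edges = co_directorship_edges(legit_boards)
muc_edges = co_directorship_edges(muc_boards)

legit_density = density(legit_edges)
muc_density = density(muc_edges)

print(f"legitimate density: {legit_density:.3f}")
print(f"MUC density:        {muc_density:.3f}")
```

In this toy data the legitimate Network is much denser than the star-shaped MUC Network, which mirrors the qualitative pattern described above.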
Figure 1. Shared directorship graphs. The yellow Node depicts the Subject director of the Network. Each Node represents an individual, and each Edge represents a shared directorship. (a) A typical, legitimate Network of co-directorships. (b) A MUC co-directorship Network.
A typical MUC co-directorship graph has measurably different properties from legitimate Networks. We can use PageRank to measure each Node’s relative importance within a Network. This results in consistently different patterns for the legitimate and MUC Networks (Figure 2a).
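As an illustration of how Subject PageRank responds to Network shape, here is a minimal power-iteration sketch on two toy graphs. The damping factor of 0.85, the iteration count and the graphs themselves are assumptions made for the example; the article does not state which PageRank variant or parameters were used.

```python
from itertools import combinations


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as an edge list."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    rank = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbours:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy dense Network: the subject and six other directors, all connected.
dense_nodes = ["subject"] + [f"d{i}" for i in range(6)]
dense_edges = list(combinations(dense_nodes, 2))

# Toy star Network: the subject is the only link between six directors.
star_edges = [("subject", f"m{i}") for i in range(6)]

dense_rank = pagerank(dense_edges)
star_rank = pagerank(star_edges)

print(f"subject PageRank, dense graph: {dense_rank['subject']:.3f}")
print(f"subject PageRank, star graph:  {star_rank['subject']:.3f}")
```

In the fully connected toy graph every Node receives the same share of rank, while in the star the Subject holds a far larger share than any other Node.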
PageRank does a good job of separating legitimate and MUC co-directorship Networks. Additionally, we can encode our domain knowledge of MUC Networks’ structure by designing a custom graph measure. Typically, directors in legitimate Networks share common, context-rich relationships with one another. In MUCs, however, we observe contrasting behaviour: the businesses have nothing in common except the incorporation agent that created them.
To detect this pattern, we measure the Subject director’s Network connectivity: the proportion of Edges lost if the Subject Node is removed from the Network. This score lies in the interval (0, 1]. At the low end are dense graphs, in which the Subject Node accounts for only a small share of the Edges. In a fully connected graph the score is 2/n, where n is the number of Nodes, because n - 1 of the n(n - 1)/2 Edges touch the Subject. At the high end are star-like graphs, where every Edge passes through the Subject and the score is 1. This domain-knowledge-derived feature provides very good class separation (Figure 2b).
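The connectivity measure described above can be sketched in a few lines. The function name `connectivity_via_subject` and the toy graphs are ours, chosen for illustration; this is a minimal reading of the definition in the text, not the production implementation.

```python
from itertools import combinations


def connectivity_via_subject(edges, subject):
    """Proportion of edges lost when the subject node is removed."""
    # frozenset makes (a, b) and (b, a) the same undirected edge.
    unique_edges = {frozenset(edge) for edge in edges}
    if not unique_edges:
        raise ValueError("the graph has no edges")
    lost = sum(1 for edge in unique_edges if subject in edge)
    return lost / len(unique_edges)


# Fully connected toy graph on n = 5 nodes: the expected score is 2 / n.
n = 5
complete_nodes = ["subject"] + [f"d{i}" for i in range(n - 1)]
complete_edges = list(combinations(complete_nodes, 2))

# Star-shaped toy graph: every edge touches the subject, so the score is 1.
star_edges = [("subject", f"m{i}") for i in range(6)]

complete_score = connectivity_via_subject(complete_edges, "subject")
star_score = connectivity_via_subject(star_edges, "subject")

print(f"fully connected graph: {complete_score}")
print(f"star graph:            {star_score}")
```

The two toy cases reproduce the end points discussed in the text: 2/n for the fully connected graph and 1 for the star.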
Figure 2. Histograms for (a) Subject PageRank and (b) connectivity via Subject. Both measures show strong class separation between legitimate co-directorship Networks (blue) and MUC Networks (orange).
Model performance uplift
We incorporate structural features into our machine learning models for MUC detection. The resulting model shows a strong performance uplift: we are better able to identify positive MUC cases, which is reflected in the improved Precision and Recall metrics (Table I).
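For readers less familiar with the two metrics, the sketch below shows how Precision and Recall are computed from binary labels. The labels and predictions are made-up toy values used only to exercise the function; they are not results from the models discussed here.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 marks a MUC case."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Precision: of the cases flagged as MUC, how many really are MUCs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the real MUCs, how many the model flagged.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical toy labels, not data from the article.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```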
Table I. Model performance can be improved by the addition of structural features
We presented a comparative analysis of legitimate businesses and Mini-Umbrella Companies (MUCs). The analysis reveals clear differences in the properties of their co-directorship Networks. Corporate registry Documents alone provide a rather limited view, whereas structural Network features offer valuable insights into the connections between Entities. Models built on these features are fully explainable and have strong predictive capabilities.