In this article, we ask Rosie Lang, Principal Data Engineer at Quantexa, to share her experience and walk us through the setup and use of Quantexa's Entity-Entity, a feature that lets you see how individual Entities are associated.
So what’s Entity-Entity all about?
Figure 1. Entity-Entity
Entity-Entity allows you to clearly see the associations between different Entities, rather than having to work out those links manually by tracing mutual connections.
Why did you find the feature useful?
From looking at a Quantexa Network in the standard Document-Entity model, you can’t tell how different Entities are associated.
For example: “Which phone is associated with which individual in the diagram below?”
Figure 2. Who owns which phone?
The answer? Use Entity-Entity links.
Figure 3. Who owns which phone? – Solution
The phone Entities are now directly connected to the individuals who own them.
Adding the links was very similar to adding “associated Entities” for Compounds, and the code needed to be written was similar…
— Rosie Lang, Principal Data Engineer at Quantexa
What were your findings during implementation?
The actual implementation was very simple. It is possible to implement Entity-Entity (E-E) links in both Data Fusion and the lenses framework (legacy Extract, Transform and Load (ETL)). Both approaches are documented and supported. The initial projects that implemented E-E links used Fusion, so that is the recommended approach.
For the most part adding the links was very similar to adding “associated Entities” for Compounds, and the code needed to be written was similar.
What implementation and design steps do you suggest?
Decide on Entity-Entity links.
- Work out which Entity-Entity edges you wish to display to users and add them into your Fusion config.
Decide which Document-Entity links you wish to keep and which you wish to cut.
- Mark these edges as “primary” and “secondary” respectively, by adding edge attributes in your config.
Make your perspective and decide which links to show.
- You can remove the secondary Document-Entity links entirely, or you can fade them out.
- You can either make different perspectives that users can switch between, or you can make one global view. Remember that a perspective is only worthwhile once users have familiarized themselves with it, so don’t create unnecessary perspectives.
Consider Auto Node Grouping if there is a clear primary Entity.
- To make a true Entity-only view, you can auto-group Documents with their Entities.
- For example, consider a customer Document which always has a single customer Entity resolved. You don’t need both the customer Document node and the customer Entity node on the Network – you can group them for a cleaner view.
- For a transaction Document, by contrast, there are two key Entities and no obvious primary Entity, so it would be harder to define a primary Entity to group with.
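The primary/secondary marking and the cut-or-fade choice described in the steps above can be sketched in a few lines of Python. This is only an illustration of the design idea: the `Edge`, `Role` and `apply_perspective` names, the node identifiers and the opacity value are all hypothetical and are not part of the Quantexa Fusion config or perspective API.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"      # edge you want to keep in the view
    SECONDARY = "secondary"  # edge you want to cut or fade out


@dataclass(frozen=True)
class Edge:
    # Hypothetical edge record; not a Quantexa type.
    source: str
    target: str
    kind: str                # "document-entity" or "entity-entity"
    role: Role = Role.PRIMARY


def apply_perspective(edges, secondary_mode="cut", faded_opacity=0.3):
    """Return (edge, opacity) pairs for the edges a perspective would show.

    secondary_mode="cut" drops secondary edges entirely;
    secondary_mode="fade" keeps them at reduced opacity.
    """
    if secondary_mode not in ("cut", "fade"):
        raise ValueError("secondary_mode must be 'cut' or 'fade'")
    shown = []
    for edge in edges:
        if edge.role is Role.PRIMARY:
            shown.append((edge, 1.0))
        elif secondary_mode == "fade":
            shown.append((edge, faded_opacity))
    return shown


# A transaction Document with an account and an individual (made-up ids).
edges = [
    Edge("doc:txn-1", "entity:account-A", "document-entity", Role.PRIMARY),
    Edge("doc:txn-1", "entity:individual-X", "document-entity", Role.SECONDARY),
    Edge("entity:account-A", "entity:individual-X", "entity-entity", Role.PRIMARY),
]

cut_view = apply_perspective(edges, "cut")
fade_view = apply_perspective(edges, "fade")
```

Writing the edges out like this for each data source, before touching any config, makes it easier to check with the team which links will survive in each perspective.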
What design considerations should be taken into account?
Unattached Entities
- With complex data models or poor data quality, it is not always obvious which is the “primary Entity”, and you may end up with unattached Entities. This will be confusing for users.
For example:
- First, take a transaction Document, without Entity-Entity implemented.
Figure 4. Entity Model
- Then, a clear design would be to add Account-Individual links and to cut the Individual-Document links.
Figure 5. Entity links
- However, what happens if you have a transaction Document with no Account Entity…
Figure 6. Unattached Individuals
The options are:
- Fade out Document-Entity links instead of cutting.
- Use more complex logic to decide the primary Entity (for example, if account is populated then account, otherwise use individual).
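The second option, falling back to another Entity type when the preferred one is missing, could look something like the sketch below. The function name, the dictionary shape and the Entity type names are assumptions made for illustration; in a real project this logic would live wherever you define your links.

```python
def choose_primary_entity(entities_by_type, preference=("account", "individual")):
    """Pick the primary Entity for one Document.

    entities_by_type maps an Entity type name to the resolved Entity id,
    or to None / an empty string when that Entity is not populated.
    Returns (entity_type, entity_id) for the first populated type in
    preference order, or None if nothing is populated.
    """
    for entity_type in preference:
        entity_id = entities_by_type.get(entity_type)
        if entity_id:
            return entity_type, entity_id
    return None


# Transaction with an account: the account is primary.
with_account = choose_primary_entity({"account": "A1", "individual": "I1"})

# Transaction with no account: fall back to the individual,
# so the individual does not end up unattached.
without_account = choose_primary_entity({"account": None, "individual": "I1"})
```

Keeping the preference order as an explicit parameter makes the rule easy to review per data source, which matters because the "right" primary Entity is a design decision rather than a technical one.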
Direction of links
- All of our links are non-directional, so it wasn’t clear whether we should be adding “incoming” or “outgoing” links. Ultimately, it didn’t matter, but to ensure consistency across the data sources we should have decided on an approach up front.
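One way to settle the direction up front is to agree a fixed ordering of Entity types and always point the link from the earlier type to the later one. The sketch below assumes that convention; the `TYPE_ORDER` ranking and the `orient_link` helper are hypothetical and only show how a single shared rule keeps every data source consistent.

```python
# Hypothetical, team-agreed ranking of Entity types: lower rank is the source.
TYPE_ORDER = {"account": 0, "individual": 1, "telephone": 2}


def orient_link(a, b):
    """Return (source, target) for a non-directional link.

    a and b are (entity_type, entity_id) tuples. The result is the same
    whichever way round the two ends are passed in, so every data source
    produces the link in the same direction.
    """
    def sort_key(node):
        entity_type, entity_id = node
        return (TYPE_ORDER[entity_type], entity_id)

    return (a, b) if sort_key(a) <= sort_key(b) else (b, a)


link_1 = orient_link(("individual", "I1"), ("account", "A1"))
link_2 = orient_link(("account", "A1"), ("individual", "I1"))
```

Both calls give the same source and target, which is the property that matters here; the particular ranking chosen is much less important than applying it everywhere.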
What implementation tips would you like to share?
- While the technical implementation is fairly trivial, the design is not. For most data sources, it requires a fair bit of “thinking”. For each data source, I would recommend drawing out wire diagrams and discussing them as a team.
- Spending some time sketching ideas on a whiteboard is also recommended. The E-E design of each data source should be its own piece of work, and it would be challenging for someone not experienced in Networks or data modelling.
Additional information
Did you know that if you log in to (or sign up for) the Community, you can unlock further technical details on Quantexa products and services on our Documentation site?
Entity-Entity linking on the Quantexa Documentation site is very helpful for giving additional background information on this topic.
For further reading on entities, you can also visit the Entity Resolution section of our Community library (login required).
Build information
Version 2.0.0 TQP, May 2022