This article gives a technical introduction to Knowledge Graphs. Note that:
- QKnowledgeGraph is the analytics library for working with Knowledge Graphs.
- Knowledge Graphs created by Quantexa are the graphs created and analyzed using QKnowledgeGraph.
In an introductory Quantexa blog, we explained why knowledge graphs are popular and how Knowledge Graphs created by Quantexa, drawing on Quantexa’s Decision Intelligence Platform, facilitate unsurpassed knowledge graph scale. We used the analogy of the "widest angle" James Webb telescope to compare it with similar technologies. The full capability of knowledge graph generation and analytics allows you to explore and analyze the widest networks, illuminating relationships and knowledge, with no need to define an ontology or be constrained by a graph database.
Here are some key technical highlights and what you need to know to get started with Knowledge Graphs created by Quantexa and analyzed by QKnowledgeGraph. We refer to supporting documentation throughout. Much of this blog and the following one is discussed in detail in the documentation QKnowledgeGraph Technical Overview and Using Knowledge Graph-Based Scores in Assess, which also includes executable code examples.
When To Use QKnowledgeGraph
QKnowledgeGraph is a Quantexa engine for large-scale graph analytics in Python, which allows you to interact directly with Knowledge Graphs created by Quantexa. Use it when you want to:
- Perform graph analytics or Graph Machine Learning (GML) at scale on Quantexa networks.
- Create a Batch Perspective of your networks for analytical or scoring purposes.
- Create simple ego-networks in Batch faster than Graph Scripting, servicing at-speed workflows, for example populating machine learning feature stores.
The core capabilities of QKnowledgeGraph include:
- Creating Analytical Perspectives: Create new relationships between existing Nodes and Edges in Quantexa Networks.
- Expansion Queries: Fast neighborhood expansions to create multi-hop ego-networks. Expansion Queries also support expansions through Profligate Entities and provide an intuitive query Application Programming Interface (API).
- Graph Algorithm Library: This includes a PageRank algorithm, which we introduce in this blog and demonstrate with an example in a later article.
- Export to Open Source Graph Formats: Exports use an open data format of Nodes and Edges as flat Parquet files, readable with Pandas, PySpark, and PyArrow. You can export to the following formats:
- Graph Machine Learning Adapter: Load graph data into Graph Neural Networks (GNNs) for training deep-learning graph models using PyG. This is analogous to Petastorm for knowledge graphs.
Knowledge Graphs created by Quantexa center on Entities and understanding key Entity-to-Entity relationships. They do not include Documents. This makes computation intentionally light, allowing scaled use for key investigatory use cases, and aiding the creation of simple ego-centric networks in batch at speed. This can include encoding network context into machine learning models by calculating graph analytical statistics as a feature engineering step, for example to perform machine learning inference, and/or model training.
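To make the idea of a multi-hop ego-network concrete, here is a minimal sketch in plain Python. It is not the QKnowledgeGraph query API: the function name `ego_network`, the edge list, and the Entity IDs are all hypothetical and chosen for illustration only. It shows a breadth-first expansion that stops after a fixed number of hops from a source Entity.

```python
from collections import deque


def ego_network(edges, source, max_hops):
    """Return hop distances and edges within max_hops of source (undirected)."""
    # Build an adjacency map from the undirected edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search that records each node's hop distance from the source.
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    # Keep only edges whose endpoints both fall inside the ego-network.
    ego_edges = [(a, b) for a, b in edges if a in hops and b in hops]
    return hops, ego_edges


# Hypothetical Entity-to-Entity edges (illustrative IDs only).
edges = [
    ("biz-1", "ind-1"), ("ind-1", "addr-1"), ("addr-1", "ind-2"),
    ("ind-2", "biz-2"), ("biz-2", "ind-3"),
]
hops, ego_edges = ego_network(edges, "biz-1", max_hops=2)
print(hops)
print(ego_edges)
```

Because the graph holds only Entities and their relationships, each expansion touches a small amount of data per hop, which is what keeps batch creation of many ego-networks practical.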
Feature Engineering with Network Context
For typical Machine Learning (ML) projects, data scientists may need to encode Network context into their model by calculating graph analytical statistics in a feature engineering step. QKnowledgeGraph supports this use case by providing a scalable way to parallelize algorithms from the popular NetworkX graph analytics package across the neighborhood of each Node. You can calculate features based on Node attributes in the same processing step. A typical workflow is:
- Run Batch Resolver up to the Flatten Entity Attributes stage.
- Ingest Batch Resolver outputs to create a Knowledge Graph.
- Define and save a Perspective that captures relevant Network context.
- Create ego-networks with QKnowledgeGraph.
- Convert ego-networks to NetworkX format.
- Extract features from NetworkX graphs in parallel with Spark.
- Perform inference on an existing AI model or train a new model.
In this way, Knowledge Graphs can be used with QKnowledgeGraph to populate external offline and online feature stores.
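The feature extraction step in the workflow above can be sketched with NetworkX alone. This is an illustration under stated assumptions, not the QKnowledgeGraph implementation: the function `ego_features`, the chosen features, and the Entity IDs are hypothetical, and the Spark parallelization is left out. In a real pipeline, a per-Node function like this would be the unit of work distributed across the cluster.

```python
import networkx as nx


def ego_features(graph, node, radius=1):
    """Compute simple network features from the ego-network around one node."""
    ego = nx.ego_graph(graph, node, radius=radius)
    return {
        "node": node,
        "degree": graph.degree(node),            # number of direct neighbors
        "ego_nodes": ego.number_of_nodes(),      # size of the neighborhood
        "ego_edges": ego.number_of_edges(),
        "ego_density": nx.density(ego),          # how tightly connected it is
        "clustering": nx.clustering(graph, node),
    }


# Hypothetical Entity graph (illustrative IDs only).
graph = nx.Graph()
graph.add_edges_from([
    ("biz-1", "ind-1"), ("biz-1", "ind-2"), ("ind-1", "ind-2"),
    ("ind-2", "addr-1"), ("addr-1", "biz-2"),
])

# One feature row per node; this loop is what would be parallelized.
features = [ego_features(graph, n, radius=1) for n in graph.nodes]
by_node = {row["node"]: row for row in features}
print(by_node["biz-1"])
```

Each row is a flat record keyed by Node, so it can be written to a feature store or joined to other Entity-level features.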
Introducing PageRank
As noted earlier, QKnowledgeGraph produces statistics and predictions in Batch, typically at the Entity level. This architecture suits the following outputs:
- PageRank.
- Node classification prediction by a Graph Neural Network (GNN) or another graph-based machine learning model.
- Results from most popular graph algorithms, which can be applied directly through the NetworkX backend for QKnowledgeGraph.
We plan to add community detection and other algorithms in future QKnowledgeGraph releases.
Each of these outputs is a Spark DataFrame containing an Entity ID and a numeric value. For example, for PageRank, the numeric value is a probability between 0 and 1.
PageRank is an algorithm that measures a Node’s importance relative to its neighbors. It typically requires 50-100 traversals of the global network to converge, a challenging compute problem for many environments. You can then incorporate analytics calculated using QKnowledgeGraph into your Scoring DAG and use them in an Assess Score.
PageRank is a more global centrality measure than localized measures such as Node degree, which only considers immediate neighbors. PageRank is recursive in nature: a Node has a high PageRank if its neighbors have a high PageRank.
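The recursive definition is easiest to see in a small power-iteration sketch. The code below is a textbook formulation in plain Python, not the QKnowledgeGraph implementation. The function name `pagerank`, the damping factor of 0.85, the tolerance, and the toy graph are assumptions made for illustration. Each pass redistributes every Node's current rank along its outgoing links, and the loop repeats until the scores stop changing.

```python
def pagerank(out_links, damping=0.85, tol=1e-6, max_iter=100):
    """Power-iteration PageRank on a directed graph given as {node: [targets]}."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution

    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes an equal share of its rank along every outgoing link.
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break  # scores have stopped moving
    return rank, iterations


# Toy directed graph (hypothetical): "c" receives links from a, b and d.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks, iterations = pagerank(links)
print(ranks, iterations)
```

The scores sum to 1, which is why each value can be read as a probability. Every pass visits every edge in the graph, so the repeated passes are what make the algorithm expensive on very large networks.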
Personalized PageRank
You can also personalize PageRank to treat specific Nodes as important sources. This skews the output, based on the question you are looking to answer, to favor these more important Nodes. Personalized PageRank (PPR) applies to situations where you model flow, for example to find points of high and low concentration. PPR can show Nodes with high connectivity to known bad Nodes. Think of suppliers with bank accounts from sanctioned countries, for example.
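As a rough illustration of the personalization idea, the sketch below changes one thing relative to standard PageRank: the restart probability is concentrated on a set of seed Nodes instead of being spread over the whole graph. This is a textbook formulation in plain Python, not the QKnowledgeGraph API. The function name `personalized_pagerank`, the fixed iteration count, and the supplier graph are hypothetical, and the sketch assumes every seed appears in the edge list.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Personalized PageRank on an undirected edge list, restarting at seeds."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)

    # Restart probability is spread over the seed nodes only.
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) * restart[v] for v in nodes}
        # Each node shares its rank equally among its neighbors.
        for v in nodes:
            share = damping * rank[v] / len(neighbors[v])
            for u in neighbors[v]:
                new_rank[u] += share
        rank = new_rank
    return rank


# Hypothetical chain of relationships starting at a known bad account.
edges = [
    ("sanctioned-acct", "supplier-A"), ("supplier-A", "buyer-1"),
    ("buyer-1", "supplier-B"), ("supplier-B", "acct-2"),
    ("acct-2", "supplier-C"),
]
scores = personalized_pagerank(edges, seeds={"sanctioned-acct"})
for node in ("supplier-A", "supplier-B", "supplier-C"):
    print(node, round(scores[node], 4))
```

In this toy chain, the supplier closest to the seed account receives the highest score and the score falls with distance, which matches the reading of PPR as connectivity to known bad Nodes.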
Final Comments
For further information on QKnowledgeGraph and Knowledge Graphs created by Quantexa, explore the relevant sections in the technical documentation. The QKnowledgeGraph documentation also includes an example of how to add the output of a normally computationally intense PageRank algorithm into your Batch and Scoring DAGs.