This article guides organizations through the benefits and implementation practices of centralizing data sources when deploying Quantexa for overlapping use cases. It covers the process from deciding which data sources to centralize, based on use case and data source requirements, to developing and maintaining those centralized data sources.
Introduction
Organizations often deploy Quantexa for multiple use cases whose data sources usually overlap. Quantexa recommends centralizing data sources when there is more than one use case for the Quantexa Platform. This brings significant benefits, including:
- Reduced data storage, development, infrastructure, and operational costs
- Improved infrastructure utilization
- Reduced time spent on quality assurance
- Shorter batch run times
- Reduced time-to-value when onboarding new use cases
Conceptually, centralizing data sources means no longer treating Quantexa as a single logical unit and instead splitting it into a Data tier and use-case-specific Application tiers. Centralized data sources are owned, managed, released, and deployed independently from the Application tier, and are consumed by multiple downstream use cases.
Deciding which data sources to centralize
When deciding which data sources to centralize, first consider the requirements of both the use case and the data source:
Figure 1: Use Case and Data Source requirements for deciding which data sources to centralize
Several industry-specific data sources are already good candidates for centralization, thanks to their broad applicability across use cases within the industry. A few examples are illustrated in the image below.
Figure 2: Examples of industry-specific data suitable for centralization
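As a rough illustration of the overlap check behind this decision, the sketch below counts how many use cases consume each data source and flags those shared by more than one. The use case names, data source names, and the threshold of two are hypothetical and are not taken from Quantexa documentation; a real assessment would also weigh the requirements shown in Figure 1.

```python
from collections import defaultdict


def centralization_candidates(use_case_sources, min_use_cases=2):
    """Return data sources consumed by at least `min_use_cases` use cases.

    `use_case_sources` maps a use case name to the data sources it needs.
    The result maps each shared data source to its sorted list of consumers.
    """
    consumers = defaultdict(set)
    for use_case, sources in use_case_sources.items():
        for source in sources:
            consumers[source].add(use_case)
    return {
        source: sorted(cases)
        for source, cases in sorted(consumers.items())
        if len(cases) >= min_use_cases
    }


# Hypothetical mapping of use cases to the data sources they consume.
use_case_sources = {
    "kyc": ["customers", "corporate_registry", "watchlists"],
    "fraud": ["customers", "transactions"],
    "aml": ["customers", "transactions", "watchlists"],
}

candidates = centralization_candidates(use_case_sources)
for source, cases in candidates.items():
    print(f"{source}: {', '.join(cases)}")
```

In this made-up example, only `corporate_registry` is used by a single use case, so it would remain with that use case while the other three sources are considered for centralization.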
Developing centralized data sources
Centralized data sources are designed to support a variety of use cases. As a result, the interface between the data source code and application code should no longer be managed by the same team. Centralized data sources will have interfaces that span different teams and deployments, making it crucial to apply rigor in their definition and use.
When developing centralized data sources, Quantexa recommends the following practices:
- Separating Data tier code from Application tier code to enable independent management and releases.
- Ensuring code and data artifacts are published in an artifactory accessible to the Application tier.
- Publishing Data tier artifacts at least once per sprint to keep consumers up to date with changes.
- Upgrading and testing Application tier code when new Data tier artifacts are published.
- Implementing version numbering for data source artifacts to track changes made to the interface.
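The version numbering recommendation can be sketched as a compatibility check that an Application tier build might run before adopting a newly published Data tier artifact. This is a minimal sketch that assumes a MAJOR.MINOR.PATCH scheme in which a major version bump signals a breaking interface change; the article does not prescribe a particular numbering scheme, and the function names here are hypothetical.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable Version."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_compatible(built_against: str, published: str) -> bool:
    """Check whether a published artifact can replace the one built against.

    Assumption: the interface is unchanged or only extended within the same
    major version, so any equal-or-newer version with the same major number
    is treated as safe to adopt.
    """
    built = parse_version(built_against)
    new = parse_version(published)
    return new.major == built.major and new >= built


# Hypothetical versions of a centralized data source artifact.
print(is_compatible("2.3.0", "2.4.1"))  # same major, newer minor
print(is_compatible("2.3.0", "3.0.0"))  # major bump: interface may have changed
print(is_compatible("2.3.0", "2.2.9"))  # older than the version built against
```

A check like this only narrows down which upgrades need attention. Application tier code should still be upgraded and tested against each newly published artifact, as recommended above.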
Maintaining centralized data sources
Quantexa recommends that centralized data sources are maintained by a team that is independent of any single use case.
When adopting centralized data sources, the application and data source teams should consider the following roles and responsibilities:
Figure 3: Roles and responsibilities for maintaining centralized data sources
Related Links
For more information on managing centralized data sources, please visit:
Managing centralized data sources
If you are unable to access the documentation site, please get in touch with your Quantexa point of contact or the Community team at community@quantexa.com.