This page describes Quantexa's technology-related recommendations for multi use case deployments. Many of the practices described on this page will also benefit single use case deployments.
These best practices are:
- Share data in Elasticsearch, but have a separate analytics pipeline and UI per use case.
- Ensure you have automated testing which regularly tests integration between components.
- Ensure you have easily scalable infrastructure.
- Organize the Quantexa codebase into multiple repositories, divided by component.
- Ensure each repository is owned by a single team, and is published regularly.
1. Share data in Elasticsearch, but have a separate analytics pipeline and UI per use case.
There are many possibilities for sharing components of the Quantexa system between use cases.
Sharing components between use cases has various advantages, including reducing the cost of processing and storing data. This is especially beneficial for large data sets, such as corporate registry sources or transaction history.
However, sharing components increases the amount of cross-team coordination and testing, which can reduce the agility of each use case. It may also limit their ability to meet use-case-specific requirements.
For most deployments, Quantexa recommends that the data ingestion process for common data sources is shared between use cases. This means that all use cases use the same batch outputs and Elasticsearch indices. You can read more about centralizing common data sources here:
https://community.quantexa.com/kb/articles/231-5-centralized-data-sources
For most use cases, Quantexa recommends that separate batch entity builds, network builds, analytics, and mid-tier/UI are maintained. This allows each use case to make customizations without impacting other use cases, including:
- adding supplementary data sources;
- tuning entities and networks;
- implementing new scores;
- customizing the UI.
2. Ensure you have automated testing which regularly tests integration between components.
Interfaces between components are particularly important in a multi use case deployment. Each component may be owned by a different team, and a single component may interface with multiple downstream use cases. As such, these interfaces need to be highly reliable and consistent.
We recommend the use of automated testing to streamline testing of these interfaces and to provide assurance across all Quantexa use cases.
Testing shared components
For components which are shared between multiple use cases, such as common data sources, the following activities are key:
- End-to-end batch pipeline: Perform an end-to-end run of the code over full data. Confirm that there are no failures and that broadly correct outputs are produced. For common data sources, this should include running Batch Resolver over a standard set of data sources using a default Resolver configuration.
- Statistical Profile Testing Framework: Run the relevant modules from the SPTF on the outputs from the batch pipeline. Use the Test Suites to confirm that outputs have not changed unexpectedly.
- Generate representative outputs: publish the outputs from the batch data pipeline. This provides a reliable test interface which consumers can use independently to test their deployments.
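The "outputs have not changed unexpectedly" check above can be sketched as a simple statistical comparison between two batch runs. The sketch below is an illustration of the idea only, not the Statistical Profile Testing Framework API; the function name, profile shape, and tolerance are all hypothetical:

```python
# Illustrative sketch only: compares record counts per source between two
# batch runs and flags any source whose volume shifted beyond a tolerance.
# This is NOT the SPTF API; all names here are hypothetical.

def profile_drift(baseline: dict[str, int], latest: dict[str, int],
                  tolerance: float = 0.05) -> list[str]:
    """Return sources whose record count moved by more than `tolerance`
    (as a fraction of the baseline), or which appeared/disappeared."""
    drifted = []
    for source in baseline.keys() | latest.keys():
        before = baseline.get(source)
        after = latest.get(source)
        if before is None or after is None:
            drifted.append(source)      # source added or removed entirely
        elif abs(after - before) > tolerance * before:
            drifted.append(source)      # volume shifted beyond tolerance
    return sorted(drifted)

# Example: the customer source grew by 50%, which should be investigated.
baseline = {"customer": 1_000_000, "transaction": 5_000_000}
latest = {"customer": 1_500_000, "transaction": 5_100_000}
print(profile_drift(baseline, latest))  # ['customer']
```

In practice the baseline profile would be produced from the previous published batch outputs, so that each new run is compared against the last known-good interface.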
Testing use-case-specific components
To ensure that each use case can easily test their interface with shared components, implement the following automated tests for the use-case-specific components:
- End-to-end batch pipeline: Use the test outputs provided by the shared component. Confirm that there are no failures and that broadly correct outputs are produced (such as Tasks).
- Statistical Profile Testing Framework: Run the relevant modules from the SPTF on the outputs from the batch pipeline. Use the Test Suites to confirm that outputs have not changed unexpectedly.
- Mid-tier search APIs: call the search API for each Document type, confirming that the correct number of results is returned.
- User journey: make a series of calls to the mid-tier APIs following a typical user journey in the UI, confirming that all requests return successfully.
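The search API checks above can be automated with a small smoke-test script. The sketch below assumes a hypothetical endpoint path and JSON response shape; adapt both to your deployment's actual mid-tier API:

```python
# Illustrative smoke-test helper: the endpoint path and response shape are
# hypothetical; substitute your deployment's actual mid-tier search API.

EXPECTED_COUNTS = {"customer": 120, "transaction": 4_500}  # per Document type

def verify_search_response(doc_type: str, response: dict,
                           expected: dict[str, int] = EXPECTED_COUNTS) -> None:
    """Raise AssertionError if the search response for `doc_type` does not
    contain the expected number of results."""
    actual = response.get("totalResults")
    assert actual == expected[doc_type], (
        f"{doc_type}: expected {expected[doc_type]} results, got {actual}"
    )

# In a real pipeline this would wrap an HTTP call, e.g. (hypothetical URL):
#   import requests
#   resp = requests.post(f"{BASE_URL}/api/search/{doc_type}", json=query)
#   verify_search_response(doc_type, resp.json())
verify_search_response("customer", {"totalResults": 120})
print("customer search check passed")
```

Running a script like this on a schedule against each environment catches interface regressions early, before a downstream team discovers them during their own testing.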
3. Ensure you have easily scalable infrastructure.
The infrastructure usage of a Quantexa use case is not static over time. It will be higher or lower depending on ongoing development/testing activities and scheduling of batch runs. As the number of use cases and developers increases, this variability increases too.
Static infrastructure will either cause contention during busy periods, or will sit idle during quiet periods. Both of these outcomes lead to higher costs.
If you are using cloud-based infrastructure, ensure you are making the most of the ability to quickly turn infrastructure on and off. For some infrastructure components this can be achieved through autoscaling. The greatest flexibility can be achieved by ensuring developers have the ability to launch ephemeral resources as needed.
For on-premise infrastructure, you should seek to minimize contention:
- Ensure developers use the smallest possible data set for the task at hand, to reduce the amount of infrastructure they require.
- Ensure there is ample cross-team communication about planned upcoming infrastructure usage. Factor additional time into timelines if needed.
- Consider how large workloads can be offset between use cases, for example by timing batch windows carefully.
4. Organize the Quantexa codebase into multiple repositories, divided by component.
Storing all Quantexa code in a single repository can cause challenges for multi use case deployments. Most notably, it prevents use cases from releasing a new version of their code independently from other use cases.
We instead recommend that you create single-purpose code repositories to simplify the storage and management of code across the deployment.
The recommended repository structure is:
- One repository per data source, containing data ingestion code and configuration (e.g. Data Fusion). Each data source may be shared between multiple use cases, or may be specific to a single use case.
  Where there are closely-related data sources, these can share a single repository.
- One or two repositories per use case, containing:
- Analytics: batch entity resolution configuration, batch network-building configuration, scoring, alerting, task loading.
- Application tier: UI customization code, and mid-tier code/configuration.
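As a concrete illustration, a deployment with two use cases and three data sources might be laid out as follows (all repository names are hypothetical):

```
corp-registry-ingest/    # shared data source: ingestion code and config
transactions-ingest/     # shared data source
watchlist-ingest/        # data source specific to one use case
aml-analytics/           # use case 1: entity/network builds, scoring, tasks
aml-application/         # use case 1: UI customization and mid-tier config
fraud-analytics/         # use case 2: analytics
fraud-application/       # use case 2: application tier
```

With this split, the AML team can release a new scoring version without touching the fraud repositories, while both use cases consume the shared ingestion repositories as versioned dependencies.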
5. Ensure each repository is owned by a single team, and is published regularly.
Giving ownership of each repository to a single team allows for clear definition of responsibilities for the development and maintenance of each component.
It's important to allow downstream components to stay aligned with changes as they are made, rather than accumulating migration effort for a formal release. To achieve this:
- Publish versions of common repositories at least once per sprint. This should include producing both code artifacts (namely published binaries and a tagged codebase) and data artifacts (such as indices in Elasticsearch) which can be used by consumers.
- Ensure that owners of common components have a clear understanding of breaking vs. non-breaking changes. Breaking changes are those which require consumers to adjust their code or perform other manual migrations. These should be rare, well-documented, and clearly communicated. Release notes should summarize the breaking and non-breaking changes in each release.
- As new versions of common artifacts are published, consumers must migrate promptly to the latest version. Quantexa recommends that each team update their dependencies to the latest published version of each common artifact at the start of each sprint. As part of this activity, teams must perform any mandatory migrations, and use automated testing to ensure that their code runs correctly using the new dependencies.
Whilst this approach may sound complex, it becomes a straightforward habit once established. These few small regular tasks will significantly reduce the frequency of more serious dependency issues later on.
Further reading
For further information on Quantexa's best practices for multi use case deployments, see: