Introduction
Quantexa deployments require a simple set of underlying components:
- Distributed Spark cluster for Quantexa batch data processing
- Container platform to serve the Quantexa UI and microservices
- Elasticsearch for serving data to the mid-tier
- RDBMS for storing user state in the mid-tier
Alongside these, you will need good development and production tooling.
Recommended deployment patterns
Quantexa recommends deploying the above components in the following way:
Spark
Use a distributed Spark cluster; many customers share an existing Spark cluster, or alternatively set up a dedicated cluster for their needs. Recommended distributed Spark deployments are
Running Spark for Quantexa in either Standalone or Apache Mesos deployments is not recommended. Note that on-premise Kubernetes clusters set up for microservice deployments are unlikely to have the resources and configuration required to run Spark.
If deploying into a cloud environment such as GCP or AWS, it is recommended to use a cloud native distributed Spark service.
Scaling: Distributed Spark clusters scale horizontally with the volume of data to be processed.
Networking: Spark must be able to make outbound network connections to the Quantexa mid-tier applications, and to Elasticsearch.
Container platform
Quantexa strongly recommends using Kubernetes, or a Kubernetes-derived container platform such as OpenShift, to host Quantexa's mid-tier components. Pairing a container platform with Quantexa's microservice architecture allows for easy scaling and highly available deployments.
Quantexa provides Helm charts to simplify deploying and managing its mid-tier applications on Kubernetes.
Key notes on using Kubernetes with Quantexa:
- Quantexa's container deployment model fully supports using your own container base images to meet your organization's security and patching requirements. The only requirement is that they can run java processes in a JVM.
- Quantexa containers can require more resources than common platform limits. Quantexa recommends that containers be able to use up to 6 vCPUs and 12GB of memory.
- TCP communication must be allowed between different containers.
- It is recommended to use a service mesh such as Istio to provide inter-application.
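As a rough illustration of the resource note above, the sketch below checks whether a platform's per-container limit leaves room for the recommended 6 vCPUs and 12GB of memory. The `ContainerLimit` type and `limit_shortfalls` function are hypothetical names chosen for this example; they are not part of any Quantexa or Kubernetes API, and the limit values would have to come from your own platform configuration.

```python
from dataclasses import dataclass
from typing import List

# Recommended per-container ceiling, taken from the note above.
RECOMMENDED_VCPUS = 6.0
RECOMMENDED_MEMORY_GB = 12.0


@dataclass(frozen=True)
class ContainerLimit:
    """Maximum resources the platform allows one container (hypothetical type)."""
    vcpus: float
    memory_gb: float


def limit_shortfalls(limit: ContainerLimit) -> List[str]:
    """Return a description of each shortfall; an empty list means the limit is sufficient."""
    problems = []
    if limit.vcpus < RECOMMENDED_VCPUS:
        problems.append(
            f"vCPU limit {limit.vcpus:g} is below the recommended {RECOMMENDED_VCPUS:g}"
        )
    if limit.memory_gb < RECOMMENDED_MEMORY_GB:
        problems.append(
            f"memory limit {limit.memory_gb:g}GB is below the recommended {RECOMMENDED_MEMORY_GB:g}GB"
        )
    return problems
```

A check like this could run as part of environment onboarding, so that a platform limit below the recommendation is raised with the platform team before deployment rather than discovered under load.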
Scaling: container platform clusters scale horizontally with the volume of users or load on the system.
Networking: the container platform must be able to make outbound connections to Elasticsearch and the RDBMS. It must be able to accept inbound connections from Spark.
Elasticsearch
Elasticsearch is used to serve data to the Quantexa mid-tier applications, both for display and dynamic entity resolution. The performance of Elasticsearch has a direct impact on the performance seen in the Quantexa UI and REST APIs, so it is critical to set up your Elasticsearch cluster correctly.
The Elasticsearch cluster used by Quantexa can be deployed across a series of virtual or physical machines, or within Kubernetes. If deploying into a cloud environment such as GCP or AWS, it is possible to use a cloud native Elasticsearch service. The OpenSearch fork of Elasticsearch can also be used with Quantexa.
Key notes on using Elasticsearch with Quantexa:
- Elastic strongly recommends using SSDs for storage; otherwise, significant performance degradation may be seen.
- Tiered data storage, where Elasticsearch has Hot, Warm, Cold and Frozen data tiers, cannot be used.
- Quantexa's use of Elasticsearch tends to be more CPU and memory intensive than typical log management use cases.
Scaling: Elasticsearch clusters scale horizontally with the volume of data to be processed, and concurrent user load on the Quantexa mid-tier.
Networking: Elasticsearch must be able to accept inbound connections from both the container platform and Spark.
RDBMS
Quantexa does not intensively use the RDBMS or store large volumes of data in it. The RDBMS is mainly used to store state data for saved investigations, tasks and audit.
Quantexa can use an RDBMS deployed as a service, in Kubernetes or on dedicated virtual machines. It is recommended that the deployment pattern transparently provides support for high availability and disaster recovery.
There must be a one-to-one relationship between RDBMS and application deployments, so a single set of database schemas cannot be shared by multiple different deployments of the mid-tier.
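The one-to-one rule can be checked mechanically if you keep an inventory of which schemas each mid-tier deployment uses. The following is a minimal sketch under that assumption: the `shared_schemas` function, the deployment names and the `host/schema` identifier format are all hypothetical and not taken from Quantexa tooling.

```python
from collections import defaultdict
from typing import Dict, List


def shared_schemas(deployment_schemas: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each schema claimed by more than one deployment to those deployments.

    An empty result means no database schema is shared between deployments.
    """
    users = defaultdict(set)
    for deployment, schemas in deployment_schemas.items():
        for schema in schemas:
            users[schema].add(deployment)
    return {
        schema: sorted(deployments)
        for schema, deployments in users.items()
        if len(deployments) > 1
    }


# Hypothetical inventory: two deployments wrongly point at the same schema.
inventory = {
    "mid-tier-dev": ["db-dev/investigations", "db-dev/audit"],
    "mid-tier-test": ["db-test/investigations", "db-dev/audit"],
}
conflicts = shared_schemas(inventory)
```

Here `conflicts` flags `db-dev/audit` as used by both deployments, which is the situation the rule above forbids.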
Networking: the RDBMS must be able to accept inbound requests from the container platform.
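Taken together, the networking notes in the sections above describe four connection paths. The sketch below records them as data and adds a plain TCP reachability probe that could be run from a source component. The component labels and function names are assumptions made for this example, and the host and port to probe depend entirely on your environment.

```python
import socket
from typing import Set, Tuple

# (source, destination) pairs described in the Networking notes above.
REQUIRED_FLOWS: Set[Tuple[str, str]] = {
    ("spark", "container-platform"),
    ("spark", "elasticsearch"),
    ("container-platform", "elasticsearch"),
    ("container-platform", "rdbms"),
}


def is_required_flow(source: str, destination: str) -> bool:
    """Return True if the documented architecture needs source to reach destination."""
    return (source, destination) in REQUIRED_FLOWS


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try to open a TCP connection; True on success, False on any socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful `can_connect` only shows that a TCP handshake completed; it does not confirm that authentication, TLS or the service itself is healthy.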
Deployment and operational tooling
Along with the core platform components, it is highly recommended to deploy Quantexa with appropriate SDLC and production tooling, such as monitoring and log observability. The above diagram gives an example deployment and operational tooling approach.
SDLC Tools
Quantexa highly recommends the use of:
- Version control using Git, which stores the deployment specific code and configuration.
- Artefact repositories, such as Nexus or Artifactory, to store and serve Quantexa compiled binaries to the Spark tier.
- Container repositories, to store the Quantexa mid-tier containers and serve them to the container platform.
Use of these tools, alongside thorough CI/CD tooling, is recommended to ensure easy code management and to support automated or semi-automated deployments, reducing the risk of human error.
Production Tools
When deploying into a production environment, Quantexa recommends having the following tooling to aid the operational running of the deployment:
- A batch scheduler, such as Apache Airflow, to orchestrate the Quantexa batch component run in Spark. Existing batch scheduling tools can be used.
- An external IAM system providing either SAML or LDAP. This allows users' access to be managed independently of the Quantexa system, fitting into existing processes to manage roles within an organization.
- Monitoring and log management across both Quantexa's batch and mid-tier application components, to provide observability of the health and state of components over time and to aid service management.
Development tooling
Developers deploying Quantexa require access to a GUI development desktop to perform project analysis and development activities.
The desktop environment needs a range of development tools, which in turn need a reasonable level of resource (4 vCPUs, 8GB RAM, 50GB disk).
Standard Scala and Node development tools required in this environment include the Java JDK, Scala, Node.js, npm, Gradle, Git and an IDE.
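A quick way to confirm that a development desktop has the command-line tools listed above is to look each one up on the PATH. The sketch below assumes the tools are exposed under their usual executable names (`java`, `scala`, `node`, `npm`, `gradle`, `git`); that mapping, and the `missing_commands` helper itself, are assumptions for illustration. It does not check versions or the IDE.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the tools listed above.
REQUIRED_COMMANDS = ("java", "scala", "node", "npm", "gradle", "git")


def missing_commands(
    commands: Iterable[str] = REQUIRED_COMMANDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the commands that cannot be found on the PATH, in the order given."""
    return [command for command in commands if which(command) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```

The `which` parameter defaults to `shutil.which` and exists only so the lookup can be replaced in tests.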