Quantexa makes extensive use of underlying platforms to enable it to operate reliably at scale. When choosing these platforms the key principles were that they should be:
- Open source such that customers would be able to deploy them without incurring licence costs
- Widely adopted such that there is a large community of people with expertise in them and most organisations have already deployed them
- Horizontally scalable such that there are no limitations to data volumes or system workload
- Easily deployable in the cloud such that customers can either deploy straight into their chosen cloud platform or migrate an existing system into the cloud at a later date
Quantexa uses Apache Spark for all batch data processing. Spark is an open-source, distributed processing system that uses in-memory caching and optimised query execution to process big data workloads. Processing can be scaled out across large clusters of commodity hardware, allowing it to handle enormous datasets. Because Spark runs executable binaries, releases can be reliably tested prior to deployment.
Quantexa uses Elasticsearch for search and low-latency data lookup. Elasticsearch has a rich feature set which is used across the Quantexa platform. Its extensive use of in-memory caching is valuable for the performance of intensive operations over large datasets, and it is horizontally scalable, enabling large datasets (e.g. global corporate registries or massive transaction backlogs) to be queried efficiently.
In particular Quantexa uses Elasticsearch for the following functions:
- Search - Elasticsearch provides excellent search functionality, including facets, wildcarding, fuzzy matching and synonyms.
- Dynamic Entity Resolution - this is a very intensive process, as building a single entity requires a large number of iterative queries. The Quantexa platform relies on Elasticsearch's scalability and performance optimisations to achieve this with good performance.
- Explorer/transaction viewers - Elasticsearch's rich querying functionality, optimised aggregations and significant terms aggregation are all used heavily by both Explorer and the transaction viewers
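To make the features listed above more concrete, the following is a minimal sketch of Elasticsearch request bodies for fuzzy matching, wildcarding, facets (a terms aggregation) and a significant terms aggregation. It only builds the JSON bodies and does not call a cluster. The field names (`name`, `country`, `account_id`, `counterparty`) and the helper function names are hypothetical and are not taken from the Quantexa platform.

```python
import json


def fuzzy_search(field: str, text: str) -> dict:
    """Match query that tolerates small spelling differences."""
    return {"query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}}}


def wildcard_search(field: str, pattern: str) -> dict:
    """Wildcard query, e.g. pattern 'quant*' matches any term starting with 'quant'."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def faceted_search(field: str, text: str, facet_field: str, size: int = 10) -> dict:
    """Fuzzy search plus a terms aggregation that returns facet counts."""
    body = fuzzy_search(field, text)
    body["aggs"] = {"facet": {"terms": {"field": facet_field, "size": size}}}
    return body


def significant_terms_request(filter_field: str, filter_value: str, terms_field: str) -> dict:
    """Find terms that are unusually frequent within a filtered subset of documents."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"term": {filter_field: filter_value}},
        "aggs": {"unusual": {"significant_terms": {"field": terms_field}}},
    }


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(json.dumps(faceted_search("name", "Jon Smith", "country"), indent=2))
    print(json.dumps(significant_terms_request("account_id", "A-1", "counterparty"), indent=2))
```

Each body would be sent to an index's `_search` endpoint. Synonyms are not shown because they are configured in the index's analysis settings rather than in the query itself.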
Container management platform
Quantexa strongly recommends using Kubernetes, or a Kubernetes-derived container platform such as OpenShift, for hosting Quantexa's mid-tier components. Container management platforms take care of a number of boring, challenging and fiddly but extremely important tasks, including:
- Easy replication of the applications
- Health monitoring and self-healing
- Resource management
These help you build platforms which are easy to manage whilst having high availability.
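As an illustration of how these three tasks are expressed in Kubernetes, the sketch below builds a Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. Replication maps to `replicas`, health monitoring to the liveness and readiness probes, and resource management to `resources`. The application name, image, port, health path and resource figures are all assumed values for the example, not Quantexa defaults.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 3,
                        port: int = 8080, health_path: str = "/health") -> dict:
    """Build a minimal apps/v1 Deployment manifest for one containerised service."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Health monitoring and self-healing: a failing liveness probe restarts
        # the container; a failing readiness probe removes it from service.
        "livenessProbe": {
            "httpGet": {"path": health_path, "port": port},
            "initialDelaySeconds": 30,
            "periodSeconds": 10,
        },
        "readinessProbe": {
            "httpGet": {"path": health_path, "port": port},
            "periodSeconds": 5,
        },
        # Resource management: requests guide scheduling, limits cap usage.
        "resources": {
            "requests": {"cpu": "500m", "memory": "1Gi"},
            "limits": {"cpu": "1", "memory": "2Gi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # replication: number of identical pods to keep running
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image, for illustration only.
    manifest = deployment_manifest("example-mid-tier", "registry.example.com/mid-tier:1.0")
    print(json.dumps(manifest, indent=2))
```

Increasing `replicas` is typically all that is needed to run more copies of a stateless service, which is a large part of why such platforms make high availability easier to achieve.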
Quantexa uses a database to store state for the mid-tier. This includes user investigations, tasks and audits of user actions. This database is typically small, i.e. at least two orders of magnitude smaller than the data held in Elasticsearch.