Introduction
This blog shares the challenges encountered, the solutions implemented, and the key lessons learned during a complex real-world streaming project for Lending Fraud detection. The project had two primary objectives:
- Decommissioning a legacy system – retiring an outdated platform that could no longer support the client's fraud detection needs.
- Implementing a near-real-time decision engine – integrating Quantexa’s Entity Resolution (ER) and network capabilities to enhance fraud detection and decision-making.
The project faced significant functional and non-functional challenges, requiring meticulous performance optimization. Initial performance metrics did not meet expectations, but a series of refinements facilitated the Go-Live. Further optimizations are ongoing to fully meet stringent performance requirements.
Project background
Replacing the legacy system
The client aimed to retire their existing system, which lacked advanced capabilities such as:
- 4-hop network generation
- Complex Entity Resolution (ER) algorithms
Quantexa’s platform was selected to address these deficiencies, offering superior fraud detection through enhanced network analysis.
Building a real-time decision system
The client developed a centralized decision engine, functioning as a hub that integrates assessments from multiple spoke systems. Quantexa’s solution serves as one of these key spokes, responsible for:
- Advanced Entity Resolution (ER)
- Network generation and scoring
- Providing fraud investigators with enriched data for decision-making
The streaming data processing layer for the solution relies on a client-managed enterprise Kafka platform.
Performance requirements
A key requirement was to complete ingestion, network expansion, and scoring within a 10-second SLA. While task loading is not in the critical SLA path, it plays a crucial role in fraud investigations by providing investigators with relevant case links.
Additional data sources
The project introduced two new watchlist data sources alongside the existing sources (customers, Orbis, SMR, etc.):
- External Watchlist – provided by the Australian Financial Crimes Exchange (AFCX), a non-profit combating financial crime.
- Internal Watchlist – a proprietary list maintained by the client to flag entities linked to financial crime.
These watchlists are ingested using a Spark-based hourly micro-batch, consuming events from the same enterprise Kafka platform.
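As a rough illustration of this micro-batch pattern, the Scala sketch below uses Spark Structured Streaming with an hourly trigger. The broker address, topic names, and sink paths are placeholders for illustration, not the project's actual configuration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object WatchlistMicroBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("watchlist-ingest")
      .getOrCreate()

    // Read watchlist events from the enterprise Kafka platform.
    // Broker address and topic names are illustrative placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-broker:9092")
      .option("subscribe", "afcx-watchlist,internal-watchlist")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Trigger one micro-batch per hour, matching the hourly ingestion
    // cadence described above; the sink path is also a placeholder.
    events.writeStream
      .format("parquet")
      .option("path", "/data/watchlists/raw")
      .option("checkpointLocation", "/data/watchlists/_checkpoints")
      .trigger(Trigger.ProcessingTime("1 hour"))
      .start()
      .awaitTermination()
  }
}
```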
Quantexa streaming pipeline overview
The Quantexa streaming fraud detection pipeline processes the bank's lending applications. The end-to-end pipeline includes:
- Ingestion
- Network Expansion
- Scoring
- Task Loading
- Persisting scorecards in Elasticsearch for future retrieval
Project key characteristics and solutions to address challenges
1. Application tier
Challenge
In non-Kafka deployments, the startup sequence of mid-tier applications is generally non-critical. However, Kafka applications function as "clients," meaning that applications such as expand-score and task-load depend on core services being available beforehand. Additionally, without service discovery, Kafka clients are unaware of core service availability, leading to failed API requests.
Solution
- Sequential startup – Core services start first, followed by Kafka services, with a brief delay in between.
- API health-check – Kafka services check core service availability before starting. If unavailable, they shut down and must be restarted.
- Retry mechanism – API calls retry up to three times with exponential backoff.
These measures reduce failures and ensure a stable startup sequence. In a containerized environment (e.g., Kubernetes), service dependencies could be managed through orchestration tools.
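A minimal Scala sketch of the health-check and retry behaviour described above follows. The core service's health endpoint and the exact backoff parameters are assumptions for illustration, not the project's actual code.

```scala
import scala.annotation.tailrec

object StartupGuard {
  // Hypothetical health probe: returns true once the core service responds.
  // The endpoint URL is a placeholder for the real health-check endpoint.
  def coreServiceHealthy(): Boolean =
    try {
      val conn = new java.net.URL("http://core-service:8080/health")
        .openConnection().asInstanceOf[java.net.HttpURLConnection]
      conn.setConnectTimeout(2000)
      conn.setReadTimeout(2000)
      conn.getResponseCode == 200
    } catch { case _: java.io.IOException => false }

  // Retry up to `maxAttempts` times with exponential backoff, mirroring the
  // three-attempt policy described above.
  @tailrec
  def awaitCoreService(attempt: Int = 1, maxAttempts: Int = 3, delayMs: Long = 1000): Boolean =
    if (coreServiceHealthy()) true
    else if (attempt >= maxAttempts) false
    else {
      Thread.sleep(delayMs)
      awaitCoreService(attempt + 1, maxAttempts, delayMs * 2)
    }

  def main(args: Array[String]): Unit =
    if (!awaitCoreService()) {
      // Fail fast so the service can be restarted once core services are up.
      sys.error("Core services unavailable; shutting down Kafka service")
    }
}
```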
2. Managing high task volumes
Challenge
Each Lending Application required a corresponding task, even if its risk score was zero. With 5,000+ applications daily, this resulted in over 1 million tasks annually, exacerbated by multiple update and outcome events per application.
Solution
- Initial task creation – A task is created only for the first application submission.
- Task updates instead of new tasks – Subsequent application events update the existing task.
- APIs used:
  - Investigation Client: Refresh Graph & Check Refresh State
  - Investigation Client: Expand
This approach reduces unnecessary task creation while maintaining a timeline view of application updates.
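The sketch below illustrates the create-or-update decision in Scala. The task-service interface and its methods are hypothetical placeholders, not the actual Quantexa task-loader API.

```scala
// Illustrative create-or-update flow; the task-service interface below is a
// hypothetical placeholder, not the actual Quantexa task-loader API.
object TaskLoadSketch {
  case class ApplicationEvent(applicationId: String, eventType: String, payload: String)

  trait TaskService {
    def findTaskByApplicationId(applicationId: String): Option[String] // existing task id, if any
    def createTask(event: ApplicationEvent): String                    // returns new task id
    def updateTask(taskId: String, event: ApplicationEvent): Unit
  }

  def loadTask(service: TaskService, event: ApplicationEvent): String =
    service.findTaskByApplicationId(event.applicationId) match {
      // First submission for this application: create a new task.
      case None => service.createTask(event)
      // Later update/outcome events: append to the existing task, keeping a
      // single timeline view per application.
      case Some(taskId) =>
        service.updateTask(taskId, event)
        taskId
    }
}
```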
3. Propagating Event Metadata
Challenge
After upgrading from Quantexa 2.1 to 2.6, metadata propagation was disrupted due to the new record extraction and ingestion schema definitions. This affected scoring decisions and task creation.
Solution
Initially, metadata was embedded in the document model, but this led to overwrites on subsequent events. The final solution customized the document-ingest service to override the schema of the document-ingest success topic so that it carries the full application event, preserving metadata integrity across updates.
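As a rough sketch of the idea (all field names here are assumptions, not the actual Quantexa schema), the enriched success-topic payload can be modelled as an envelope that carries the full application event alongside the document reference:

```scala
// Sketch of an enriched document-ingest success message; all field names
// are illustrative assumptions, not the actual Quantexa schema.
case class EventMetadata(eventId: String, eventType: String, eventTimestamp: Long)

case class IngestSuccess(
  documentId: String,      // id of the ingested document
  documentType: String,    // e.g. "lending-application"
  originalEvent: String,   // full application event, carried verbatim
  metadata: EventMetadata  // metadata preserved across updates
)
```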
4. Meeting Performance Requirements
Challenge
The decision engine required a 5 TPS throughput, with a <10-second response time. Initial performance tests showed:
- 0.4 TPS throughput
- 40th percentile: <10s, but 90th percentile: >30s
Solution
Infrastructure optimization
- Increased heap space to reduce GC overhead; some services spent 30+ seconds per minute on garbage collection (the sketch after this list shows one way to measure this).
- Configured the JVM's ActiveProcessorCount (`-XX:ActiveProcessorCount`) to optimize thread allocation on the single-node environment.
- Adjusted service instance counts:
  - Reduced `app-investigate` instances
  - Increased `app-resolve` and `app-scoring` instances
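GC overhead of this kind can be confirmed from inside the JVM via the standard management beans. Below is a minimal Scala sketch that samples cumulative GC time over one minute; it is illustrative only and not part of the project's codebase.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

object GcMonitor {
  def main(args: Array[String]): Unit = {
    // Cumulative GC time (ms) across all collectors since JVM start.
    def totalGcMillis(): Long =
      ManagementFactory.getGarbageCollectorMXBeans.asScala
        .map(_.getCollectionTime)
        .filter(_ >= 0) // some collectors report -1 when unsupported
        .sum

    val before = totalGcMillis()
    Thread.sleep(60000)
    val after = totalGcMillis()

    // A healthy service should spend far less than 30s/min in GC.
    println(s"GC time in the last minute: ${after - before} ms")
  }
}
```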
OpenSearch Index Optimization
- Identified Orbis indices as bottlenecks using slow logs.
- Implemented (the underlying OpenSearch operations are sketched after this list):
  - Segment merging after each Elastic Load (`jobSettings.indexAdmin.sendMergeSegmentMessage`)
  - Optimized shard counts (`indexShards` settings)
  - Added replicas for high availability
Results
- TPS increased to 1.2
- Response times <10s for 60% of applications
- Further improvements raised this to 80% of applications scoring within 10s
Conclusion
This project highlighted key challenges in implementing a streaming Lending Fraud detection solution and demonstrated how Kafka, Quantexa’s ER, and optimized infrastructure can meet complex real-time processing requirements. Lessons learned include:
- The importance of service startup sequencing in Kafka-based architectures.
- Task update strategies to manage high event volumes.
- Ensuring metadata integrity after product upgrades.
- Performance tuning across infrastructure, application, and indexing layers.
While the project has improved significantly since its initial deployment, ongoing refinements are necessary to achieve full SLA compliance. Future enhancements will continue to leverage improvements in newer versions of the Quantexa platform, further optimizing processing efficiency and resilience.
Further reading:
https://community.quantexa.com/kb/categories/209-platform-architecture-kafka-streaming