FAQ: I'm missing data in Elasticsearch / my document counts are wrong
FAQ relevant for: all Academy versions
If you have completed the ETL pipeline stages of your project and uploaded the data to Elasticsearch, then when you check your indices in the Elasticsearch Head plugin in Chrome you should see numbers similar to those in the picture below (to get a bigger version of the image, right-click it and choose the option to open it in a new tab).
If your numbers are significantly different from these, go back through your ETL pipeline and carefully check each stage to see where you lose data along the way. A good way to approach this is to work forwards from CreateCaseClass and check the output of each stage until you find the problem area. You should also use the counts in Elasticsearch to guide you. For example, if you have only half the number of businesses shown in the picture, and no individuals, you probably haven't joined your Third Parties onto the ICIJ document properly, so you would want to go back and double-check how you have done this join and on which fields.
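If you prefer to check the counts from a terminal rather than the Head plugin, Elasticsearch's `_cat/indices` API lists each index with its document count. The following is a minimal sketch, assuming Elasticsearch is listening on `http://localhost:9200` as in the commands in this FAQ. `flag_empty_indices` is a hypothetical helper name, not part of Elasticsearch or the Academy tooling; it only highlights indices that contain no documents.

```shell
# Hypothetical helper: print any index whose document count is zero or missing.
# Reads "index docs.count" pairs, one per line, from standard input.
flag_empty_indices() {
  awk 'NF > 0 && ($2 == "" || $2 == 0) { print "EMPTY: " $1 }'
}

# Assumed usage against a local node (uncomment to run):
# curl -s 'http://localhost:9200/_cat/indices?h=index,docs.count' | flag_empty_indices
```

An empty index is only the most obvious symptom; counts that are present but far too low still need to be compared against the expected numbers by eye.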
Specific points to consider:
- Have I correctly parsed all of the necessary fields in my qmodel files?
- Have I used the correct type of joins in CreateCaseClass, and have I joined on the correct fields?
- Have I output the correct Dataset at the end of CreateCaseClass?
- Have I loaded up the DocumentDataModel.parquet (the output of CreateCaseClass) into a Spark-Shell to check the output there?
- Have I correctly identified and defined all relevant start paths in my qentity files?
- Do I have a good range of compound keys for each Entity?
If you are convinced that you have done all of the above correctly, you can try clearing the data from Elasticsearch, restarting the service and then re-uploading the data to Elasticsearch using the following three commands:
curl -X DELETE 'http://localhost:9200/_all'
sudo systemctl restart elasticsearch.service
./runQSS.sh -s com.quantexa.academy.task.icij.model.etl.IcijLoadElasticScript -c ../external.conf -r elastic.icij
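After a restart, Elasticsearch can take a little while before it accepts requests, so a load started straight away may fail to connect. The sketch below polls the cluster health endpoint before you run the load script. It assumes the node is at `http://localhost:9200`; `wait_for_es`, the attempt limit and the two-second pause are illustrative choices, not part of the Academy tooling.

```shell
# Hypothetical helper: wait until Elasticsearch answers, or give up.
# $1: base URL (default http://localhost:9200), $2: max attempts (default 30)
wait_for_es() {
  url="${1:-http://localhost:9200}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; the response body is discarded.
    if curl -sf -o /dev/null "$url/_cluster/health"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Elasticsearch did not respond at $url" >&2
  return 1
}

# Assumed usage: only start the load once the node responds (uncomment to run):
# wait_for_es && ./runQSS.sh -s com.quantexa.academy.task.icij.model.etl.IcijLoadElasticScript -c ../external.conf -r elastic.icij
```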
Dan Pryer - Senior Academy Lead (Quantexa)