Need help making the LoadElasticScript job load data faster

We are loading data into Elasticsearch hosted on Kubernetes.
Elasticsearch cluster data nodes = 10
Data volume = 500 million records
Data processing speed = 1.9 million records processed in 6.5 hours.
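For scale, the figures above can be turned into a throughput number and a projected total load time. This is a quick sketch, assuming the observed rate stays constant across all 500 million records; the function names are illustrative only.

```python
def records_per_second(records: int, hours: float) -> float:
    """Average throughput over the observed window."""
    return records / (hours * 3600)


def projected_hours(total_records: int, rate_per_second: float) -> float:
    """Hours needed for the full volume at a constant rate (an assumption)."""
    return total_records / rate_per_second / 3600


# Figures taken from the post: 1.9 million records in 6.5 hours, 500 million in total.
rate = records_per_second(1_900_000, 6.5)
total_hours = projected_hours(500_000_000, rate)

print(f"{rate:.1f} records/s")
print(f"{total_hours:.0f} hours (~{total_hours / 24:.0f} days)")
```

At the current rate this works out to about 81 records per second, or roughly 71 days for the full data set.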
It seems my job is not fully utilizing the 10 data nodes. Could you please suggest what I can change to make this job run faster?
Please see the configuration below.
etlConfig configuration:
elastic {
  dnb {
    documentType: "dnb"
    runId: 149
    metadataPath: "gs://dnb-p2d-s-sto-g-inbound/resolver-output-data/metadata.parquet"
    jobSettings {
      dataPath {
        hdfsRoot: "/dnb-p2d-s-sto-g-inbound/resolver-output-data/20230620/149/dnb"
      }
      indexSettings {
        name: "dnb-r-2-5-0"
        indexCreationOptions {
          search {
            additionalSettings {
              "index.mapping.nested_objects.limit": "1000000"
              "index.mapping.total_fields.limit": "100000"
              "index.mapping.depth.limit": "100"
              "index.mapping.nested_fields.limit": "100"
              "number_of_shards": "100"
            }
          }
        }
      }
      metricsOptions {
        collectLoadSizes: true
        collectIndexSizes: true
      }
    }
    elasticSettings {
      elasticNodes: {
        searchNodes = ["elasticsearch.stg.p2d.prod.gcpdnb.net:443"]
        resolverNodes = ["elasticsearch.stg.p2d.prod.gcpdnb.net:443"]
      }
      auth {
        user = "elastic"
        password = "p2d@esk@admin"
      }
      https {
        enabled: true
      }
      clientRetrySettings {
        timeoutInSeconds = 180
        retries = 10
        retryWait = 10
      }
      bulk {
        sizeInMb = 1000
        entries = 50
        retries = 10
        retryWait = 10
      }
    }
    useSynonyms = ${synonyms.useSynonyms}
    incrementalMode = false
    deleteSettings {
      batchSize = 1000
      indexTypes: ["doc2rec", "address", "business", "individual", "telephone", "email"]
    }
    updateMode = ${incrementalMode.enabled}
  }
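To make the bulk and shard settings above concrete, here is a rough sketch of what they imply. It assumes the loader flushes a bulk request as soon as either limit is reached, which is how the elasticsearch-hadoop connector treats `es.batch.size.entries` and `es.batch.size.bytes`; whether `bulk.entries` and `bulk.sizeInMb` map onto those settings in this job is an assumption. The 2 KB average document size is purely hypothetical, and shards are assumed to spread evenly with replicas ignored.

```python
import math


def docs_per_bulk(max_entries: int, max_mb: int, avg_doc_bytes: int) -> int:
    """Documents per bulk request if a flush fires on whichever limit is hit first."""
    by_size = (max_mb * 1024 * 1024) // avg_doc_bytes
    return max(1, min(max_entries, by_size))


AVG_DOC_BYTES = 2 * 1024  # hypothetical average document size, not from the post

# bulk.entries = 50 and bulk.sizeInMb = 1000 come from the config above.
batch = docs_per_bulk(max_entries=50, max_mb=1000, avg_doc_bytes=AVG_DOC_BYTES)
bulk_requests = math.ceil(500_000_000 / batch)

# number_of_shards = 100 spread over 10 data nodes (even spread assumed).
primary_shards_per_node = 100 / 10

print(batch, bulk_requests, primary_shards_per_node)
```

Under these assumptions the entries limit, not the size limit, decides every flush: each request carries 50 documents, so the full load needs about 10 million bulk requests.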
Spark-submit configuration:
spark-submit \
  --class com.quantexa.scriptrunner.QuantexaSparkScriptRunner \
  --master yarn \
  --executor-cores 8 \
  --num-executors 36 \
  --executor-memory 8G \
  --driver-memory 200G \
  --conf spark.executor.memoryOverhead=$E_OVERHEAD \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf "spark.es.nodes.wan.only=true" \
  --conf "spark.yarn.dist.archives=$LIBPOSTALHOME/joint.tar.gz,$LIBPOSTALHOME/libpostal_datadir.tar.gz" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -DlibpostalDataDir=./libpostal_datadir.tar.gz" \
  --conf "spark.executor.extraLibraryPath=./joint.tar.gz" \
  --conf "spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/" \
  --conf spark.task.maxFailures=10 \
  --conf "spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED" \
  --conf "spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED" \
  --conf "spark.driver.extraClassPath=/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/:/usr/share/aws/aws-java-sdk/:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/:/usr/share/aws/emr/emrfs/auxlib/*" \
  --conf spark.sql.shuffle.partitions=2005 \
  --conf spark.default.parallelism=2005 \
  --conf spark.sql.debug.maxToStringFields=1000 \
  --jars /home/hadoop/p2d/jars/allds/data-source-all-shadow-dependency-$VERSION.jar /home/hadoop/p2d/jars/allds/data-source-all-shadow-projects-$VERSION.jar \g
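The resource flags above fix how many tasks can run at the same time. Here is a rough sketch of that arithmetic, assuming the default of one core per task (`spark.task.cpus` is not set in the command) and assuming the stage that writes to Elasticsearch has as many partitions as `spark.sql.shuffle.partitions`, which may not hold for this job.

```python
import math

# Values taken from the spark-submit command and the cluster description.
num_executors = 36
executor_cores = 8
partitions = 2005
data_nodes = 10

# With one core per task (assumed default), each core is one task slot.
task_slots = num_executors * executor_cores

# Number of scheduling "waves" needed to work through all partitions.
waves = math.ceil(partitions / task_slots)

# Concurrent tasks per data node, if every task writes independently
# (an assumption about how the loader behaves).
tasks_per_data_node = task_slots / data_nodes

print(task_slots, waves, tasks_per_data_node)
```

That gives 288 concurrent tasks, 7 waves for 2005 partitions, and about 29 concurrent writers per data node if every task writes at once.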