Issue
I am uploading a pandas DataFrame to Elasticsearch (using elasticsearch==6.3.1). If the DataFrame is smaller than about 100MB it works fine. I am using the solution from How to export pandas data to elasticsearch?
import json

from elasticsearch import Elasticsearch

def rec_to_actions(df):
    # yield alternating action-metadata and document lines, as the bulk API expects
    for record in df.to_dict(orient="records"):
        yield '{ "index" : { "_index" : "%s", "_type" : "%s" }}' % (INDEX, TYPE)
        yield json.dumps(record, default=int)

e = Elasticsearch([{'host': 'localhost', 'port': 9200}])
r = e.bulk(rec_to_actions(df))
This works perfectly, but for DataFrames over 100MB the bulk call fails with a 413 (Request Entity Too Large) error:
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
TransportError: TransportError(413, '')
How do I handle this? I tried setting http.max_content_length: 350mb in the elasticsearch.yml file, but I am still getting the error.
One more question: how do I add a timestamp field to the records in the above function?
Solution
A 413 means the request body exceeded the node's http.max_content_length (100mb by default, which matches the threshold you are seeing), so the robust fix is to keep each individual request small: send the records in batches, or use the parallel_bulk helper, for example:
from elasticsearch import helpers

results = list(helpers.parallel_bulk(e, generator_fn(), thread_count=2, chunk_size=400, request_timeout=1000, raise_on_error=False))
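Note that unlike the raw e.bulk() call above, the bulk helpers expect each action as a dict rather than pre-serialized string pairs, so rec_to_actions cannot be passed in unchanged. Below is a minimal sketch of a compatible generator, assuming INDEX and TYPE are defined as in the question; it also adds a timestamp field (the @timestamp name is only a convention), which covers the second question:

from datetime import datetime

from elasticsearch import helpers

def df_to_actions(df):
    # one timestamp for the whole upload; move it inside the loop for per-record times
    now = datetime.utcnow().isoformat()
    for record in df.to_dict(orient="records"):
        record["@timestamp"] = now  # assumed field name; use whatever your mapping expects
        yield {
            "_index": INDEX,  # assumed defined elsewhere, as in the question
            "_type": TYPE,
            "_source": record,
        }

for ok, item in helpers.parallel_bulk(e, df_to_actions(df), thread_count=2, chunk_size=400, raise_on_error=False):
    if not ok:
        print(item)  # inspect failed actions instead of raising

parallel_bulk returns a lazy generator, so it must be consumed (the loop above, or the list() wrapper in the example) for any indexing to actually happen.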
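If you would rather keep the raw e.bulk() call, the batching alternative is simply to split the DataFrame into slices small enough that each request stays under http.max_content_length. A minimal sketch reusing rec_to_actions from the question (the 10000-row batch size is an assumption; tune it to your row width):

batch_rows = 10000
for start in range(0, len(df), batch_rows):
    chunk = df.iloc[start:start + batch_rows]
    e.bulk(rec_to_actions(chunk))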
Answered By - DARK_C0D3R