0 votes
2 views
in Python by (16.4k points)
closed by

I used influxdb-python to insert a large amount of data read from a Redis stream. The stream is created with maxlen=600 and a new entry is appended roughly every 100 ms, and I need to keep all of that data, so I read the stream and move the entries into InfluxDB (I don't know whether there is a better database for this). However, with batch inserts only ⌈count/batch_size⌉ points remain, one at the end of each batch; everything else appears to have been overwritten.

Here is the code:

import redis
from apscheduler.schedulers.blocking import BlockingScheduler
import time
import datetime
import os
import struct
from influxdb import InfluxDBClient

def parse(datas):
    # datas is one stream entry: (entry ID, field dict)
    ts, data = datas
    w_json = {
        "measurement": 'sensor1',
        "fields": {
            "Value": data[b'Value'].decode('utf-8'),
            "Count": data[b'Count'].decode('utf-8')
        }
    }
    return w_json

def archived_data(rs, client):
    # read up to 600 new entries from the stream as consumer 'test' of 'group1'
    results = rs.xreadgroup('group1', 'test', {'test1': ">"}, count=600)
    if len(results) != 0:
        print("len(results[0][1]) = ", len(results[0][1]))
        datas = list(map(parse, results[0][1]))
        client.write_points(datas, batch_size=300)
        print('insert success')
    else:
        print("No new data is generated")

if __name__ == "__main__":
    try:
        rs = redis.Redis(host="localhost", port=6379, db=0)
        # recreate the consumer group so the whole stream is re-read on start-up
        rs.xgroup_destroy("test1", "group1")
        rs.xgroup_create('test1', 'group1', '0-0')
    except Exception as e:
        print("error = ", e)
    try:
        client = InfluxDBClient(host="localhost", port=8086, database='test')
    except Exception as e:
        print("error = ", e)
    try:
        # archive the stream contents every 60 seconds
        sched = BlockingScheduler()
        sched.add_job(archived_data, 'interval', seconds=60, args=[rs, client])
        sched.start()
    except Exception as e:
        print(e)

The data in InfluxDB changes as follows:

> select count(*) from sensor1;
name: sensor1
time count_Count count_Value
---- ----------- -----------
0    6           6

> select count(*) from sensor1;
name: sensor1
time count_Count count_Value
---- ----------- -----------
0    8           8

> select Count from sensor1;
name: sensor1
time                Count
----                -----
1594099736722564482 00000310
1594099737463373188 00000610
1594099795941527728 00000910
1594099796752396784 00001193
1594099854366369551 00001493
1594099855120826270 00001777
1594099913596094653 00002077
1594099914196135122 00002361

Why does the data appear to be overwritten, and how can I fix this so that all of the data is inserted?

I would appreciate any help in solving it.

closed

4 Answers

0 votes
by (15.4k points)
selected by
 
Best answer
You are encountering an issue where data seems to be overwritten when inserting it into InfluxDB using batch inserts. This occurs while reading data from a Redis-Stream and attempting to move it to InfluxDB for long-term storage. You have observed that only a portion of the data is successfully inserted, with the rest appearing to be overwritten.

To address this issue and ensure that all the data is inserted into InfluxDB without being overwritten, you can consider the following suggestions:

Validate Redis-Stream data: Double-check the Redis-Stream data you are reading to ensure that it doesn't contain duplicates or repeated entries. Verifying the uniqueness of the data will help prevent unintentional overwriting during insertion.

Adjust batch size: Experiment with different batch sizes when using the write_points function. You currently have a batch size of 300, but it may not be optimal for your specific data. Try increasing the batch size or even removing it to insert all the data at once, ensuring that the batch size aligns with your system's capabilities.

Handle data parsing errors: Implement error handling mechanisms to gracefully handle any errors that may occur during data parsing or decoding. This will prevent exceptions from disrupting the data insertion process and help ensure the successful transfer of all data.

Verify data retention policy: Verify the retention policy settings in your InfluxDB configuration. It is possible that the overwritten data is a result of retention policy rules. Adjust the retention policy if necessary to retain the desired data for your purposes.

By taking these steps and carefully reviewing your code, adjusting the batch size, addressing any data parsing errors, and verifying the retention policy, you should be able to overcome the issue of data being overwritten during insertion into InfluxDB.
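
A rough sketch of the batch-size and error-handling suggestions above (safe_parse is an illustrative name, not from the original post; it assumes the parse function and results variable from the question):

def safe_parse(entry):
    # skip entries whose fields are missing or cannot be decoded
    try:
        return parse(entry)
    except (KeyError, UnicodeDecodeError) as e:
        print("skipping malformed entry:", e)
        return None

points = [p for p in map(safe_parse, results[0][1]) if p is not None]
if points:
    # batch_size=None sends everything in a single request; a value such as 300
    # splits the write into several smaller HTTP requests
    client.write_points(points, batch_size=None)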
0 votes
by (26.4k points)

In InfluxDB, the combination of timestamp + tags must be unique (i.e. two data points with the same tag values and the same timestamp cannot coexist). Unlike SQL databases, InfluxDB does not throw a unique-constraint violation; it silently overwrites the existing point with the incoming one. Your points have no tags and no explicit time field, so the server assigns the timestamps itself; points written together can end up sharing a timestamp and therefore overwrite each other, which matches only a handful of points surviving per batch.
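
Building on that, here is a minimal sketch of one possible fix (not the original poster's code, and assuming the stream entries carry the default millisecond-sequence IDs such as b'1594099736722-0'): derive each point's time field from the Redis stream entry ID, so no two points can share a timestamp.

def parse(entry):
    entry_id, data = entry  # e.g. (b'1594099736722-0', {b'Value': ..., b'Count': ...})
    ms, seq = entry_id.decode('utf-8').split('-')
    return {
        "measurement": "sensor1",
        # the millisecond part of the stream ID becomes the point's timestamp;
        # the sequence number is kept as a tag in case two entries share a millisecond
        "time": int(ms),
        "tags": {"seq": seq},
        "fields": {
            "Value": data[b'Value'].decode('utf-8'),
            "Count": data[b'Count'].decode('utf-8'),
        },
    }

# write with millisecond precision so the integer time values above are interpreted correctly
client.write_points(list(map(parse, results[0][1])), time_precision='ms', batch_size=300)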

0 votes
by (25.7k points)
The issue you are facing with overwriting data in InfluxDB while using batch inserts could be due to the way you are handling the Redis-Stream data and the batch size in your code. Here are a few suggestions to resolve the issue:

Check the Redis-Stream data: Make sure that the Redis-Stream data you are reading is unique and not duplicating the previously read data. You can add some logging or print statements to verify the data being processed.

Adjust batch size: Experiment with different batch sizes in the write_points function. You have set the batch size to 300, but it may not be suitable for your specific data and use case. Try increasing the batch size or even removing it altogether to insert all the data at once. Keep in mind the limitations of InfluxDB and the available system resources.

Handle data parsing errors: Add error handling mechanisms to handle any potential errors that may occur during data parsing or decoding. This will ensure that any exceptions do not interrupt the data insertion process.

Verify data retention policy: Check the retention policy settings in your InfluxDB configuration. It is possible that data is being overwritten due to retention policy rules. Adjust the retention policy if necessary to retain all the data you want.

By reviewing these aspects of your code and adjusting the batch size and data handling, you should be able to resolve the issue of overwritten data in InfluxDB.
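
For the retention-policy check, one quick way to inspect it is with the same influxdb-python client (the database name 'test' comes from the question; adjust it to your setup):

# list the retention policies on the 'test' database; a duration of '0s' means
# points are kept forever, anything shorter will expire old points
for rp in client.get_list_retention_policies('test'):
    print(rp['name'], rp['duration'], rp['default'])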
0 votes
by (19k points)
You're facing an issue where data appears to be overwritten during insertion into InfluxDB using batch inserts. You're reading data from a Redis-Stream and transferring it to InfluxDB, but only a portion of the data is successfully inserted while the rest is overwritten. To resolve this, validate the Redis-Stream data, adjust the batch size, handle data parsing errors, and verify the retention policy in InfluxDB.
