# Creating a pandas DataFrame with dtype float64 changes the last digit of its entry (a fairly large number)


I tried to create a pandas DataFrame like this:

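```python
import pandas as pd
import numpy as np

# Show up to 20 digits after the decimal point
# (the option is 'display.precision'; the bare 'precision' alias fails on newer pandas)
pd.set_option('display.precision', 20)

a = pd.DataFrame([10212764634169927, 10212764634169927, 10212764634169927],
                 columns=['counts'], dtype=np.float64)
```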

Displaying `a` gives:

```
counts
0  10212764634169928.0
1  10212764634169928.0
2  10212764634169928.0
```

So my question is: why is the last digit modified?

EDIT: I understand it has to do with the dtype, but why is the last digit increased by 1 specifically? If I use 10212764634169926 instead, nothing happens: the result stays 10212764634169926. The same goes for 10212764634169928, which stays 10212764634169928.
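The pattern described in the edit can be reproduced without pandas. A Python `float` is an IEEE 754 double, the same format as `np.float64`, so round-tripping a few neighbouring integers through it shows which values survive unchanged. This is a minimal sketch; the range of integers is chosen only for illustration.

```python
# Round-trip integers around the value from the question through a Python float
# (an IEEE 754 double, the same format as np.float64) and back to int.
results = {}
for n in range(10212764634169924, 10212764634169930):
    results[n] = int(float(n))
    print(n, "->", results[n])

# 10212764634169924 -> 10212764634169924
# 10212764634169925 -> 10212764634169924
# 10212764634169926 -> 10212764634169926
# 10212764634169927 -> 10212764634169928
# 10212764634169928 -> 10212764634169928
# 10212764634169929 -> 10212764634169928
```

Even integers come back unchanged, while odd ones move by 1, sometimes up and sometimes down.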


The issue comes from floating-point numbers, not from pandas. The following shows the value that is actually stored when the integer is converted to a float (Python prints it in exponential notation):

```
>>> float(10212764634169927)
1.0212764634169928e+16
```
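This also explains why the last digit goes up by exactly 1. A float64 keeps 53 significant bits, and 10212764634169927 lies between 2**53 and 2**54, so neighbouring float64 values in that range are 2 apart and only even integers can be stored. The odd input sits exactly halfway between 10212764634169926 and 10212764634169928, and IEEE 754's default rounding (round half to even) picks the neighbour whose significand is even. A minimal sketch checking each step, assuming Python 3.9+ for `math.ulp`:

```python
import math

n = 10212764634169927

# n is in [2**53, 2**54); with 53 significant bits the gap between floats is 2
assert 2**53 <= n < 2**54
gap = math.ulp(float(n))
print(gap)  # 2.0

# The two representable neighbours of the odd integer n
lo, hi = n - 1, n + 1

# In this range a float64 stores value = significand * 2, so the significand is value // 2.
# Round half to even keeps the neighbour whose significand is even.
print(lo // 2 % 2, hi // 2 % 2)  # 1 0
print(int(float(n)))  # 10212764634169928
```

For 10212764634169925 the tie goes the other way (down to ...924), because there the lower neighbour has the even significand.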

To show that float32 produces an even larger difference, here are the same values converted to both formats:

```
>>> a.astype('float64')
counts
0  10212764634169928.0
1  10212764634169928.0
2  10212764634169928.0

>>> a.astype('float32')
counts
0  10212764362473472.0
1  10212764362473472.0
2  10212764362473472.0
```
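The same reasoning covers the float32 result. NumPy's `np.finfo(dtype).nmant` gives the number of stored significand bits (52 for float64, 23 for float32, not counting the implicit leading 1), so near this number the gap between neighbouring values is `2**(53 - nmant)`: 2 for float64 but 2**30 (about 1.07e9) for float32. A small sketch comparing both:

```python
import numpy as np

n = 10212764634169927

# n lies in [2**53, 2**54), so its binary exponent is 53
exp = n.bit_length() - 1

stored = {}
for dtype in (np.float64, np.float32):
    nmant = np.finfo(dtype).nmant   # stored significand bits, excluding the implicit leading 1
    step = 2 ** (exp - nmant)       # gap between neighbouring representable values near n
    stored[dtype.__name__] = int(dtype(n))
    print(dtype.__name__, "step =", step, "stored =", stored[dtype.__name__],
          "error =", stored[dtype.__name__] - n)

# float64 step = 2 stored = 10212764634169928 error = 1
# float32 step = 1073741824 stored = 10212764362473472 error = -271696455
```

For float64 the error is at most 1 (half the gap of 2); for float32 it can reach 2**29, which is why the float32 values already differ from the input in the ninth significant digit.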
