
Is there an easy way to check whether two data frames are different copies or views of the same underlying data that doesn't involve manipulations? I'm trying to get a grip on when each is generated, and given how idiosyncratic the rules seem to be, I'd like an easy way to test.

For example, I thought "id(df.values)" would be stable across views, but it doesn't seem to be:

import pandas as pd

# Make two data frames that are views of the same data.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=['row1', 'row2'],
                  columns=['a', 'b', 'c', 'd'])
df2 = df.iloc[0:2, :]

# Demonstrate they are views:
df.iloc[0, 0] = 99
df2.iloc[0, 0]
Out[70]: 99

# Now try to compare the id of the values attribute.
# Different despite being views!
id(df.values)
Out[71]: 4753564496
id(df2.values)
Out[72]: 4753603728

# And we can, of course, compare df and df2
df is df2
Out[73]: False

1 Answer


You can check this in either of two ways:

  • Compare the values.base attribute rather than the values attribute itself, i.e. test df.values.base is df2.values.base rather than df.values is df2.values. Each access to values can return a new array object, which is why the ids differ, but views share the same base array.

  • Or use the (admittedly internal) _is_view attribute: df2._is_view is True for a view.
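
As a complement to the two checks above, here is a minimal sketch of a memory-overlap test. It swaps in NumPy's np.shares_memory, which the answer does not mention, so treat it as an alternative rather than the answer's own method. The helper name shares_data and the variable names are hypothetical, and the sketch assumes both frames hold a single dtype.

```python
import numpy as np
import pandas as pd


def shares_data(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the frames' NumPy arrays overlap in memory."""
    # Assumption: both frames hold a single dtype, so to_numpy() can expose
    # the existing buffer. Mixed-dtype frames generally get consolidated into
    # a fresh array, which would make this check report False.
    return bool(np.shares_memory(left.to_numpy(), right.to_numpy()))


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=["row1", "row2"], columns=["a", "b", "c", "d"])
view_like = df.iloc[0:2, :]   # slice of df, as in the question
independent = df.copy()       # deep copy with its own buffer

print(shares_data(df, view_like))
print(shares_data(df, independent))
```

Unlike comparing id(df.values), this does not depend on getting the same array object back twice; it only asks whether the two arrays overlap in memory. The deep copy should report no overlap, while the slice is expected to report overlap for a single-dtype frame like this one.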
