Is there an easy way to check whether two data frames are independent copies or views of the same underlying data, without having to modify them? I'm trying to get a grip on when each is generated, and given how idiosyncratic the rules seem to be, I'd like an easy way to test.

For example, I thought "id(df.values)" would be stable across views, but it doesn't seem to be:

import pandas as pd

# Make two data frames that are views of the same data.

df = pd.DataFrame([[1,2,3,4],[5,6,7,8]], index = ['row1','row2'],
                  columns = ['a','b','c','d'])

df2 = df.iloc[0:2,:]

# Demonstrate they are views:

df.iloc[0,0] = 99

df2.iloc[0,0]

Out[70]: 99

# Now try and compare the id on values attribute

# Different despite being views!

id(df.values)

Out[71]: 4753564496

id(df2.values)

Out[72]: 4753603728

# And we can, of course, compare df and df2

df is df2

Out[73]: False

1 Answer


You can verify this in either of two ways:

  • Compare the values.base attribute rather than the values attribute itself. Accessing df.values can return a new array object each time, so its identity differs even between views, whereas views over the same data share a base:

df.values.base is df2.values.base    # rather than: df.values is df2.values

  • Or use the (admittedly internal) _is_view attribute: df2._is_view is True.
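As a complement to the two checks above, here is a minimal sketch that asks NumPy directly whether the two frames' buffers overlap. It is not part of the original answer: it swaps in np.shares_memory, and the helper name frames_share_memory is hypothetical. It assumes single-dtype frames, where to_numpy() can return a view; for mixed-dtype frames to_numpy() typically builds a new array, so the check would report False even for views.

```python
import numpy as np
import pandas as pd


def frames_share_memory(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Return True if the NumPy arrays behind `a` and `b` overlap in memory.

    Hypothetical helper; assumes single-dtype frames, where to_numpy()
    can hand back a view rather than a freshly allocated array.
    """
    return np.shares_memory(a.to_numpy(), b.to_numpy())


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=["row1", "row2"], columns=["a", "b", "c", "d"])
sliced = df.iloc[0:2, :]   # positional slice, as in the question
copied = df.copy()         # deep copy with its own buffer

print(frames_share_memory(df, sliced))
print(frames_share_memory(df, copied))
```

Unlike _is_view, this relies only on public APIs, and unlike the base comparison it does not depend on both arrays reporting the same owning object.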

