
Subclassing pandas classes seems to be a common need, but I could not find any references on the subject (it seems that the pandas developers are still working on it: https://github.com/pydata/pandas/issues/60).

There are some SO threads on the subject, but I am hoping that someone here can give a more systematic account of the current best way to subclass pandas.DataFrame so that it satisfies two requirements that I think are general:

```python
import numpy as np
import pandas as pd

class MyDF(pd.DataFrame):
    # how to subclass pandas DataFrame?
    pass

mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
print(type(mydf))  # <class '__main__.MyDF'>

# Requirement 1: Instances of MyDF, when calling standard methods of DataFrame,
# should produce instances of MyDF.
mydf_sub = mydf[['A', 'C']]
print(type(mydf_sub))  # <class 'pandas.core.frame.DataFrame'>

# Requirement 2: Attributes attached to instances of MyDF, when calling standard
# methods of DataFrame, should still attach to the output.
mydf.myattr = 1
mydf_cp1 = MyDF(mydf)
mydf_cp2 = mydf.copy()
print(hasattr(mydf_cp1, 'myattr'))  # False
print(hasattr(mydf_cp2, 'myattr'))  # False
```

And are there any significant differences when subclassing pandas.Series? Thank you.

1 Answer


For Requirement 1, just define the `_constructor` property so that it returns your subclass:

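```python
import numpy as np
import pandas as pd

class MyDF(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls _constructor to build the results of most DataFrame
        # methods; returning the subclass keeps those results as MyDF
        return MyDF

mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
print(type(mydf))      # <class '__main__.MyDF'>
mydf_sub = mydf[['A', 'C']]
print(type(mydf_sub))  # <class '__main__.MyDF'>
```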
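For the Series part of the question, the main difference is that the two classes need to know about each other. A minimal sketch, following the pattern shown in the pandas documentation on subclassing: the DataFrame subclass sets `_constructor_sliced` (used when an operation drops to one dimension, e.g. selecting a single column) and the Series subclass sets `_constructor_expanddim` (used when an operation goes up to two dimensions, e.g. `to_frame()`). `MySeries` is a name chosen for this illustration.

```python
import numpy as np
import pandas as pd


class MySeries(pd.Series):
    @property
    def _constructor(self):
        # Series -> Series results (arithmetic, slicing, ...) stay MySeries
        return MySeries

    @property
    def _constructor_expanddim(self):
        # Series -> DataFrame results (e.g. to_frame) become MyDF
        return MyDF


class MyDF(pd.DataFrame):
    @property
    def _constructor(self):
        # DataFrame -> DataFrame results stay MyDF
        return MyDF

    @property
    def _constructor_sliced(self):
        # DataFrame -> Series results (e.g. one column) become MySeries
        return MySeries


mydf = MyDF(np.random.randn(3, 4), columns=["A", "B", "C", "D"])
col = mydf["A"]
print(type(col).__name__)             # MySeries
print(type(col * 2).__name__)         # MySeries
print(type(col.to_frame()).__name__)  # MyDF
```

If only one of the two classes is subclassed, crossing dimensions falls back to the plain pandas type.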

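I don't think there is a simple solution for Requirement 2. You need to override `__init__` or `copy`, or do something in `_constructor`. For example, the version below copies a fixed set of attribute names both in `__init__` (for copy-construction such as `MyDF(mydf)`) and in the function returned by `_constructor` (for methods such as `copy()` and column selection):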

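```python
import numpy as np
import pandas as pd

class MyDF(pd.DataFrame):
    # names of the custom attributes to carry over to new frames
    _attributes_ = "myattr1,myattr2"

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        # copy-construction, e.g. MyDF(mydf): take the attributes from the source
        if len(args) == 1 and isinstance(args[0], MyDF):
            args[0]._copy_attrs(self)

    def _copy_attrs(self, df):
        # write straight into __dict__ to bypass DataFrame.__setattr__;
        # attributes that were never set are copied as None
        for attr in self._attributes_.split(","):
            df.__dict__[attr] = getattr(self, attr, None)

    @property
    def _constructor(self):
        # return a factory (not the class) that builds a MyDF and then
        # copies this instance's attributes onto it
        def f(*args, **kw):
            df = MyDF(*args, **kw)
            self._copy_attrs(df)
            return df
        return f

mydf = MyDF(np.random.randn(3, 4), columns=['A', 'B', 'C', 'D'])
print(type(mydf))      # <class '__main__.MyDF'>
mydf_sub = mydf[['A', 'C']]
print(type(mydf_sub))  # <class '__main__.MyDF'>

mydf.myattr1 = 1
mydf_cp1 = MyDF(mydf)
mydf_cp2 = mydf.copy()
print(mydf_cp1.myattr1, mydf_cp2.myattr1)  # 1 1
```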
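pandas also documents a `_metadata` class attribute for this case (see "Subclassing pandas data structures" in the pandas docs): attribute names listed in it are copied to the result by `NDFrame.__finalize__`, which pandas calls at the end of many methods, including `copy()` and column selection. A minimal sketch, assuming a pandas version with this mechanism and reusing the attribute names from the example above:

```python
import numpy as np
import pandas as pd


class MyDF(pd.DataFrame):
    # attribute names that __finalize__ copies from the source to the result
    _metadata = ["myattr1", "myattr2"]

    @property
    def _constructor(self):
        return MyDF


mydf = MyDF(np.random.randn(3, 4), columns=["A", "B", "C", "D"])
mydf.myattr1 = 1  # stored as an instance attribute because it is in _metadata

print(mydf.copy().myattr1)       # 1
print(mydf[["A", "C"]].myattr1)  # 1
```

As far as I know, the plain constructor call `MyDF(mydf)` does not go through `__finalize__`, so an `__init__` override is still needed if copy-construction should keep the attributes. Methods that build their result without calling `__finalize__` will also drop them.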

