I want to mark some quantiles in my data, and for each row of the DataFrame, I would like the entry in a new column called e.g. "xtile" to hold this value.

For example, suppose I create a data frame like this:

    import pandas, numpy as np

    dfrm = pandas.DataFrame({'A': np.random.rand(100),
                             'B': (50 + np.random.randn(100)),
                             'C': np.random.randint(low=0, high=3, size=(100,))})

And let's say I write my own function to compute the quintile of each element in an array. I have my own function for this, but as an example, just refer to scipy.stats.mstats.mquantiles.

    import scipy.stats as st

    def mark_quintiles(x, breakpoints):
        # Assume this is filled in, using st.mstats.mquantiles.
        # This returns an array the same shape as x, with an integer for which
        # breakpoint-bucket that entry of x falls into.
        ...
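For concreteness, here is a minimal sketch of how such a function might be filled in. It is not my actual implementation; it assumes the breakpoints are cumulative probabilities in increasing order, and that each entry should get the index of the first quantile edge that is greater than or equal to it (the use of np.searchsorted for the bucketing is an assumption of this sketch).

```python
import numpy as np
from scipy.stats import mstats


def mark_quintiles(x, breakpoints):
    """Return, for each entry of x, the index of the breakpoint bucket it falls into."""
    x = np.asarray(x, dtype=float)
    # Empirical quantiles of x at the requested probabilities (one edge per breakpoint).
    edges = np.asarray(mstats.mquantiles(x, prob=breakpoints))
    # side="left" sends values <= edges[0] to bucket 0,
    # values in (edges[0], edges[1]] to bucket 1, and so on.
    labels = np.searchsorted(edges, x, side="left")
    # Guard: anything above the last edge is folded into the last bucket.
    return np.minimum(labels, len(edges) - 1)
```

With breakpoints [0.2, 0.4, 0.6, 0.8, 1.0] this yields integer labels 0 through 4, in an array the same shape as x.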

Now, the real question is how to use transform to add a new column to the data. Something like this:

    def transformXtiles(dataFrame, inputColumnName, newColumnName, breaks):
        dataFrame[newColumnName] = mark_quintiles(dataFrame[inputColumnName].values,
                                                  breaks)
        return dataFrame

And then:

    dfrm.groupby("C").transform(lambda x: transformXtiles(x, "A", "A_xtile", [0.2, 0.4, 0.6, 0.8, 1.0]))

The problem is that the above code does not add the new column "A_xtile". It just returns my data frame unchanged. If I first add a column called "A_xtile" full of dummy values, such as NaN, then it does successfully overwrite that column with the correct quintile markings.

But it is extremely inconvenient to have to create the column first for anything like this that I may want to add on the fly.

Note that a simple "apply" will not work here, since it won't know how to make sense of the possibly differently-sized result arrays for each group.
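One pattern that might avoid the dummy column is to select the single input column before calling transform, so that each group produces a same-length result that can be assigned to a new column of the original frame. The sketch below is illustrative only: bucket_labels is a hypothetical stand-in for mark_quintiles that uses np.quantile rather than scipy, and the seeded random generator is an assumption added for reproducibility.

```python
import numpy as np
import pandas as pd


def bucket_labels(values, probs):
    # Hypothetical stand-in for mark_quintiles: per-array quantile edges,
    # then the index of the first edge >= each value.
    edges = np.quantile(values, probs)
    labels = np.searchsorted(edges, values, side="left")
    return np.minimum(labels, len(probs) - 1)


rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
dfrm = pd.DataFrame({
    "A": rng.random(100),
    "B": 50 + rng.standard_normal(100),
    "C": rng.integers(low=0, high=3, size=100),
})
breaks = [0.2, 0.4, 0.6, 0.8, 1.0]

# Transform only column "A" within each group of "C"; returning a Series with
# the group's own index lets the results align with the original rows.
dfrm["A_xtile"] = dfrm.groupby("C")["A"].transform(
    lambda s: pd.Series(bucket_labels(s.to_numpy(), breaks), index=s.index)
)
```

Here the quantile edges are computed separately inside each group of "C", and the assignment creates "A_xtile" directly on dfrm without it having to exist beforehand.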