pandas concat generates nan values

Question

asked Oct 5, 2019 in Data Science by ashely (50.2k points)

I am curious why a simple concatenation of two data frames in pandas:

shape: (66441, 1)
dtypes: prediction int64
dtype: object
isnull().sum(): prediction 0
dtype: int64
shape: (66441, 1)
dtypes: CUSTOMER_ID int64
dtype: object
isnull().sum(): CUSTOMER_ID 0
dtype: int64

of the same shape and both without NaN values

foo = pd.concat([initId, ypred], join='outer', axis=1)
print(foo.shape)
print(foo.isnull().sum())

can result in a lot of NaN values if joined.

(83384, 2)
CUSTOMER_ID 16943
prediction 16943

How can I fix this problem and prevent NaN values from being introduced?

Trying to reproduce it like

aaa = pd.DataFrame([0,1,0,1,0,0], columns=['prediction'])
print(aaa)
bbb = pd.DataFrame([0,0,1,0,1,1], columns=['groundTruth'])
print(bbb)
pd.concat([aaa, bbb], axis=1)

failed, i.e. it worked just fine and no NaN values were introduced.

1 Answer

vinita · Answer 1 · 2019-10-05T07:08:44+0000

The problem is that the two frames have different index values. With axis=1, concat aligns rows on the index, so wherever a label exists in only one frame, the other frame's column gets NaN:

aaa = pd.DataFrame([0,1,0,1,0,0], columns=['prediction'], index=[4,5,8,7,10,12])
print(aaa)
    prediction
4 0
5 1
8 0
7 1
10 0
12 0
bbb = pd.DataFrame([0,0,1,0,1,1], columns=['groundTruth'])
print(bbb)
   groundTruth
0 0
1 0
2 1
3 0
4 1
5 1
print(pd.concat([aaa, bbb], axis=1))
    prediction  groundTruth
0          NaN          0.0
1          NaN          0.0
2          NaN          1.0
3          NaN          0.0
4          0.0          1.0
5          1.0          1.0
7          1.0          NaN
8          0.0          NaN
10         0.0          NaN
12         0.0          NaN
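Before changing anything, it can help to confirm that mismatched index labels really are the cause. The following is a minimal sketch that reuses the aaa and bbb frames from above; the variable names same_index, shared, only_in_aaa, only_in_bbb and expected_rows are only illustrative, and the row count assumes each index has no duplicate labels.

```python
import pandas as pd

aaa = pd.DataFrame([0, 1, 0, 1, 0, 0], columns=['prediction'],
                   index=[4, 5, 8, 7, 10, 12])
bbb = pd.DataFrame([0, 0, 1, 0, 1, 1], columns=['groundTruth'])

# False here means concat(axis=1) has to align rows on differing labels
same_index = aaa.index.equals(bbb.index)

# Labels present in both frames: only these rows end up fully populated
shared = aaa.index.intersection(bbb.index)

# Labels present in just one frame: each one yields a row with a NaN
only_in_aaa = aaa.index.difference(bbb.index)
only_in_bbb = bbb.index.difference(aaa.index)

# With unique labels, an outer join has one row per distinct label
expected_rows = len(shared) + len(only_in_aaa) + len(only_in_bbb)

print(same_index)
print(list(shared), list(only_in_aaa), list(only_in_bbb))
print(expected_rows, len(pd.concat([aaa, bbb], axis=1)))
```

The same accounting is consistent with the numbers in the question: 66441 + 16943 = 83384, which is what you would expect if 16943 labels in each frame had no partner in the other.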

So the solution is to call reset_index on both frames, provided the original index values are not needed:

aaa.reset_index(drop=True, inplace=True)
bbb.reset_index(drop=True, inplace=True)
print(aaa)
   prediction
0 0
1 1
2 0
3 1
4 0
5 0
print(bbb)
   groundTruth
0 0
1 0
2 1
3 0
4 1
5 1
print(pd.concat([aaa, bbb], axis=1))
   prediction  groundTruth
0           0            0
1           1            0
2           0            1
3           1            0
4           0            1
5           0            1
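If you would rather not modify the original frames in place, or you want to keep the labels of one of them, two variations are possible. This is only a sketch: it assumes both frames have the same number of rows and that row i in one corresponds to row i in the other, and the names by_position and keep_labels are made up for illustration.

```python
import pandas as pd

aaa = pd.DataFrame([0, 1, 0, 1, 0, 0], columns=['prediction'],
                   index=[4, 5, 8, 7, 10, 12])
bbb = pd.DataFrame([0, 0, 1, 0, 1, 1], columns=['groundTruth'])

# Variation 1: positional concat on reset copies; aaa and bbb stay untouched
by_position = pd.concat(
    [aaa.reset_index(drop=True), bbb.reset_index(drop=True)], axis=1
)

# Variation 2: keep aaa's labels. A NumPy array carries no index,
# so the values are assigned by position and nothing is aligned.
keep_labels = aaa.assign(groundTruth=bbb['groundTruth'].to_numpy())

print(by_position)
print(keep_labels)
```

Both variations pair rows purely by position, so they are only appropriate when the row order of the two frames is known to match.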


@vinita I tried this, but it still gives me NaN values in one of the columns. Here is my code:

ohe = OneHotEncoder(handle_unknown = 'ignore', sparse = False)
train_x_encoded = pd.DataFrame(ohe.fit_transform(train_x[['model', 'vehicleType', 'brand']]))
train_x_encoded.columns = ohe.get_feature_names(['model', 'vehicleType', 'brand'])
train_x.drop(['model', 'vehicleType', 'brand'], axis = 1, inplace = True)
train_x = train_x.reset_index(drop = True)
train_x_encoded = train_x_encoded.reset_index(drop = True)
train_x_final = pd.concat([train_x_encoded, train_x], axis = 1)

brollyy, Jul 18, 2021
