
I'm trying to merge a pandas (0.14.1) DataFrame and a Series. The Series should form a new column, with some NaNs (since the index values of the Series are a subset of the index values of the DataFrame).

This works for a toy example, but not with my data (detailed below).

Example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'], index=pd.date_range('1/1/2011', periods=6, freq='D'))
df1

                   A         B         C         D
2011-01-01 -0.487926  0.439190  0.194810  0.333896
2011-01-02  1.708024  0.237587 -0.958100  1.418285
2011-01-03 -1.228805  1.266068 -1.755050 -1.476395
2011-01-04 -0.554705  1.342504  0.245934  0.955521
2011-01-05 -0.351260 -0.798270  0.820535 -0.597322
2011-01-06  0.132924  0.501027 -1.139487  1.107873

s1 = pd.Series(np.random.randn(3), name='foo', index=pd.date_range('1/1/2011', periods=3, freq='2D'))
s1

2011-01-01   -1.660578
2011-01-03   -0.209688
2011-01-05    0.546146
Freq: 2D, Name: foo, dtype: float64

pd.concat([df1, s1], axis=1)

                   A         B         C         D       foo
2011-01-01 -0.487926  0.439190  0.194810  0.333896 -1.660578
2011-01-02  1.708024  0.237587 -0.958100  1.418285       NaN
2011-01-03 -1.228805  1.266068 -1.755050 -1.476395 -0.209688
2011-01-04 -0.554705  1.342504  0.245934  0.955521       NaN
2011-01-05 -0.351260 -0.798270  0.820535 -0.597322  0.546146
2011-01-06  0.132924  0.501027 -1.139487  1.107873       NaN

The situation with my data (see below) seems identical: concatenating a Series with a DatetimeIndex whose values are a subset of the DataFrame's. But it raises ValueError: Shape of passed values is (5, 286), indices imply (5, 276). Why doesn't it work?

In [187]: df.head()
Out[188]:
                         high       low     loc_h     loc_l
time
2014-01-01 17:00:00  1.376235  1.375945  1.376235  1.375945
2014-01-01 17:01:00  1.376005  1.375775       NaN       NaN
2014-01-01 17:02:00  1.375795  1.375445       NaN  1.375445
2014-01-01 17:03:00  1.375625  1.375515       NaN       NaN
2014-01-01 17:04:00  1.375585  1.375585       NaN       NaN

In [186]: df.index
Out[186]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 271, Freq: None, Timezone: None

In [189]: hl.head()
Out[189]:
2014-01-01 17:00:00    1.376090
2014-01-01 17:02:00    1.375445
2014-01-01 17:05:00    1.376195
2014-01-01 17:10:00    1.375385
2014-01-01 17:12:00    1.376115
dtype: float64

In [187]: hl.index
Out[187]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
Length: 89, Freq: None, Timezone: None

In: pd.concat([df, hl], axis=1)
Out: [stack trace] ValueError: Shape of passed values is (5, 286), indices imply (5, 276)

1 Answer


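In this case, join worked but concat failed. A likely cause is duplicate values in one of the indexes: concat with axis=1 aligns the objects on the union of their indexes, and repeated labels make that alignment ambiguous, which surfaces as a mismatch between the shape of the passed values and the shape the index implies.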
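A minimal sketch of the join route, on invented data with one repeated timestamp in df's index (the values and the column name "hl" are assumptions for illustration). DataFrame.join does a left join on the index by default, so df keeps all of its own rows, including the duplicate, and the unique-indexed Series is matched against them. join needs a named Series, because the name becomes the new column label.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 17:02 appears twice in df's index
idx = pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:01",
                      "2014-01-01 17:02", "2014-01-01 17:02",
                      "2014-01-01 17:03"])
df = pd.DataFrame({"high": np.arange(5.0), "low": np.arange(5.0)}, index=idx)

# hl has a unique index that is a subset of df's index
hl = pd.Series([1.0, 2.0],
               index=pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:02"]))

# Left join on the index; the Series name becomes the column label
joined = df.join(hl.rename("hl"))
print(joined)
```

Both 17:02 rows receive the same hl value, and timestamps missing from hl get NaN.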

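Check both indexes for duplicate values, e.g. df.index.is_unique and hl.index.is_unique. (The toy df1 and s1 are built with date_range, so their indexes are unique, which is why that example worked.)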
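A small sketch of that check on made-up timestamps. The helper report_duplicates is hypothetical, not a pandas function; it relies on Index.is_unique and on Index.duplicated(keep=False), which flags every occurrence of a repeated label so you can see which labels are the problem.

```python
import pandas as pd

def report_duplicates(index):
    """Hypothetical helper: return the labels that occur more than once."""
    if index.is_unique:
        return []
    # keep=False marks every occurrence of a repeated label, not just the extras
    return sorted(set(index[index.duplicated(keep=False)]))

idx = pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:02",
                      "2014-01-01 17:02", "2014-01-01 17:05"])
print(idx.is_unique)  # False
print(report_duplicates(idx))
```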

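If there are duplicates, remove them before concatenating, e.g. df = df[~df.index.duplicated(keep='first')]. Note that df.drop_duplicates(inplace=True) does not do this: it removes rows whose column values are duplicated and ignores the index.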
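A minimal sketch of the fix on toy data shaped like the question's (all values are invented). It keeps the first row for each repeated timestamp, after which concat along axis=1 aligns on the union of two unique indexes. If the repeated rows hold different values, whether to keep the first, the last, or an aggregate (for example via groupby(level=0)) depends on your data.

```python
import pandas as pd

# Toy data: 17:02 appears twice in df's index
df = pd.DataFrame(
    {"high": [1.0, 2.0, 3.0, 3.5, 4.0], "low": [0.5, 1.5, 2.5, 3.0, 3.5]},
    index=pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:01",
                          "2014-01-01 17:02", "2014-01-01 17:02",
                          "2014-01-01 17:03"]),
)
hl = pd.Series([9.0, 8.0], name="hl",
               index=pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:02"]))

# Keep only the first row for each timestamp
df_unique = df[~df.index.duplicated(keep="first")]

# Both indexes are now unique, so concat can align them; missing hl values become NaN
result = pd.concat([df_unique, hl], axis=1)
print(result)
```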

