
My data can have multiple events on a given date, or no events at all. I take these events, get a count by date, and plot them. However, when I plot them, my two series don't always match.

idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())

s = df.groupby(['simpleDate']).size()

In the above code, idx becomes a range of, say, 30 dates: 09-01-2013 to 09-30-2013. However, s may only have 25 or 26 days because no events happened on some dates. I then get an AssertionError because the sizes don't match when I try to plot:

fig, ax = plt.subplots()    

ax.bar(idx.to_pydatetime(), s, color='green')

What's the proper way to tackle this? Should I remove the dates with no values from idx, or (which I'd rather do) add the missing dates to the series with a count of 0? I'd rather have a full graph of 30 days with 0 values. If this approach is right, any suggestions on how to get started? Do I need some sort of dynamic reindex function?

Here's a snippet of s (df.groupby(['simpleDate']).size()); notice there are no entries for 04 and 05.

09-02-2013     2

09-03-2013    10

09-06-2013     5

09-07-2013     1

1 Answer


Use Series.reindex:

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,

               '09-03-2013': 10,

               '09-06-2013': 5,

               '09-07-2013': 1})

s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)

print(s)

This gives the following output (only the first eight of the 30 rows are shown):

2013-09-01     0

2013-09-02     2

2013-09-03    10

2013-09-04     0

2013-09-05     0

2013-09-06     5

2013-09-07     1

2013-09-08     0
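
The same idea can be applied directly to the grouped counts from the question. The following is a minimal sketch, not a definitive implementation: the sample DataFrame is hypothetical and stands in for the asker's df (only the simpleDate column name comes from the question), and the date range is fixed to September 2013 for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df;
# only the 'simpleDate' column name comes from the question.
df = pd.DataFrame({
    "simpleDate": pd.to_datetime(
        ["2013-09-02", "2013-09-02", "2013-09-03", "2013-09-06", "2013-09-07"]
    )
})

# Count events per day; days with no events are absent from the result.
s = df.groupby("simpleDate").size()

# Full daily range (assumed here to be the whole of September 2013).
idx = pd.date_range("2013-09-01", "2013-09-30")

# Align the counts to the full range, filling missing days with 0.
s = s.reindex(idx, fill_value=0)

# Both now have the same length, one entry per calendar day.
print(len(idx), len(s))
print(s.loc[pd.Timestamp("2013-09-04")])
```

Once s has one entry per date in idx, the plotting call from the question, ax.bar(idx.to_pydatetime(), s, color='green'), receives the same number of x positions and bar heights.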
