Working in Jupyter with a pandas DataFrame, I have a dataset with rows like this:

color: white

engineType: diesel

make: Ford

manufacturingYear: 2004

accidentCount: 123

What I need to do is plot charts of accident counts (y-axis) by manufacturing year (x-axis) for every combination of color/engineType/make. Any ideas how to proceed with this?

To speed things up I have this initial setup:

import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import random

colors = ['white', 'black', 'silver']
engineTypes = ['diesel', 'petrol']
makes = ['ford', 'mazda', 'subaru']
years = range(2000, 2005)
rowCount = 100

def randomEl(data):
    # Pick rowCount random elements from data (with replacement)
    rand_items = [data[random.randrange(len(data))] for item in range(rowCount)]
    return rand_items

df = DataFrame({
    'color': Series(randomEl(colors)),
    'engineType': Series(randomEl(engineTypes)),
    'make': Series(randomEl(makes)),
    'year': Series(randomEl(years)),
    'accidents': Series([int(1000*random.random()) for i in range(rowCount)])
})

## 1 Answer


We can use groupby() to sum the accidents for each unique combination of 'color', 'engineType' and 'make'.

The query is:

accident_counts = df.groupby(['color', 'engineType', 'make'])['accidents'].sum()
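To see what this query returns, here is a minimal sketch on a small hand-made frame. The `toy` data and the name `toy_counts` are made up for illustration and are not part of the question's dataset. The result is a Series whose index is a three-level MultiIndex, one entry per combination that actually occurs in the data.

```python
import pandas as pd

# Hypothetical toy data, small enough to verify by hand.
toy = pd.DataFrame({
    'color': ['white', 'white', 'black', 'white'],
    'engineType': ['diesel', 'diesel', 'petrol', 'petrol'],
    'make': ['ford', 'ford', 'mazda', 'ford'],
    'year': [2000, 2001, 2000, 2000],
    'accidents': [10, 5, 7, 3],
})

# Same query as above, applied to the toy frame.
toy_counts = toy.groupby(['color', 'engineType', 'make'])['accidents'].sum()

# Each index entry is a (color, engineType, make) tuple.
print(toy_counts.loc[('white', 'diesel', 'ford')])  # 10 + 5 = 15
print(len(toy_counts))                              # 3 combinations occur
```

Note that the 'year' column plays no role here: all years are summed into a single total per combination.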

To plot the results, we can use Matplotlib:

import matplotlib.pyplot as plt

accident_counts.plot(kind='bar')

plt.show()
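The bar chart above shows one total per combination, with all years merged together. Since the question asks for accidents by year, one possible variation is to keep 'year' on the x-axis and draw one line per combination. The sketch below assumes the column names from the question's setup ('year', 'accidents'); the `demo` frame and the name `by_year` are hypothetical stand-ins, and `pivot_table` is just one way to reshape the data, not necessarily what the answer's author intended.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's df (same column names).
demo = pd.DataFrame({
    'year': [2000, 2000, 2001, 2001, 2001],
    'color': ['white', 'black', 'white', 'white', 'black'],
    'engineType': ['diesel', 'petrol', 'diesel', 'diesel', 'petrol'],
    'make': ['ford', 'mazda', 'ford', 'ford', 'mazda'],
    'accidents': [10, 7, 5, 2, 4],
})

# Rows: one per year. Columns: one per (color, engineType, make) combination.
# Years in which a combination has no rows are filled with 0.
by_year = demo.pivot_table(
    index='year',
    columns=['color', 'engineType', 'make'],
    values='accidents',
    aggfunc='sum',
    fill_value=0,
)

# Each column becomes one line; the legend shows the combination tuple.
ax = by_year.plot(kind='line', marker='o')
ax.set_xticks(list(by_year.index))  # show whole years on the x-axis
ax.set_xlabel('year')
ax.set_ylabel('accidents')
plt.show()
```

With many combinations a single chart gets crowded. In that case it may be clearer to loop over `by_year.columns` and plot each column on its own axes.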
