
I'm working in Jupyter with a pandas DataFrame, and my dataset has rows like this:

color: white
engineType: diesel
make: Ford
manufacturingYear: 2004
accidentCount: 123

What I need to do is plot charts of accident counts (y-axis) by manufacturing year (x-axis) for every combination of color/engineType/make. Any ideas on how to proceed?

To speed things up, here is my initial setup:

import random

import numpy as np
import pandas as pd
from pandas import DataFrame, Series

colors = ['white', 'black', 'silver']
engineTypes = ['diesel', 'petrol']
makes = ['ford', 'mazda', 'subaru']
years = range(2000, 2005)
rowCount = 100

def randomEl(data):
    # Draw rowCount random elements (with replacement) from data
    rand_items = [data[random.randrange(len(data))] for item in range(rowCount)]
    return rand_items

# Random test data: one row per vehicle group, with an accident count
df = DataFrame({
    'color': Series(randomEl(colors)),
    'engineType': Series(randomEl(engineTypes)),
    'make': Series(randomEl(makes)),
    'year': Series(randomEl(years)),
    'accidents': Series([int(1000 * random.random()) for i in range(rowCount)])
})

1 Answer


Use groupby() to total the accidents for each unique combination of 'color', 'engineType' and 'make'. Because the charts need one value per manufacturing year, 'year' goes into the grouping keys as well:

accident_counts = df.groupby(['color', 'engineType', 'make', 'year'])['accidents'].sum()
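A small self-contained illustration of what a grouped result looks like when 'year' is one of the grouping keys, using a few made-up rows (hypothetical data, not from the question). `unstack('year', fill_value=0)` moves the years into columns, giving one row per color/engineType/make combination and 0 where a combination has no data for a year; this is one convenient shape for plotting one line per combination, not the only option.

```python
import pandas as pd

# A few hand-made rows in the same shape as the question's df (made-up values)
df = pd.DataFrame({
    'color': ['white', 'white', 'white', 'black'],
    'engineType': ['diesel', 'diesel', 'diesel', 'petrol'],
    'make': ['ford', 'ford', 'ford', 'mazda'],
    'year': [2000, 2000, 2001, 2001],
    'accidents': [10, 5, 7, 3],
})

# Sum accidents per color/engineType/make/year (a Series with a 4-level index)
accident_counts = df.groupby(['color', 'engineType', 'make', 'year'])['accidents'].sum()

# Move 'year' into the columns: one row per combination, 0 where a year has no rows
by_year = accident_counts.unstack('year', fill_value=0)
# white/diesel/ford -> 2000: 15, 2001: 7
# black/petrol/mazda -> 2000: 0, 2001: 3
print(by_year)
```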

To plot the results, use Matplotlib:

import matplotlib.pyplot as plt
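A minimal sketch of the plotting step, assuming one small line chart per color/engineType/make combination (3 x 2 x 3 = 18 charts) laid out in a 6 x 3 grid of subplots, with years that have no data drawn as 0. The sample data is regenerated with `random.seed(0)` so the snippet runs on its own, and `random.choice` stands in for the question's `randomEl` helper; the grid layout, figure size and zero-filling are illustrative choices, not requirements.

```python
import random

import matplotlib.pyplot as plt
import pandas as pd

random.seed(0)  # reproducible sample data

colors = ['white', 'black', 'silver']
engineTypes = ['diesel', 'petrol']
makes = ['ford', 'mazda', 'subaru']
years = list(range(2000, 2005))
rowCount = 100

# Random sample data in the same shape as the question's setup
df = pd.DataFrame({
    'color': [random.choice(colors) for _ in range(rowCount)],
    'engineType': [random.choice(engineTypes) for _ in range(rowCount)],
    'make': [random.choice(makes) for _ in range(rowCount)],
    'year': [random.choice(years) for _ in range(rowCount)],
    'accidents': [int(1000 * random.random()) for _ in range(rowCount)],
})

# Total accidents per combination and year, reshaped to one row per
# color/engineType/make combination and one column per year.
# reindex adds any combination or year missing from the sample, filled with 0.
all_combos = pd.MultiIndex.from_product(
    [colors, engineTypes, makes], names=['color', 'engineType', 'make'])
by_year = (df.groupby(['color', 'engineType', 'make', 'year'])['accidents'].sum()
             .unstack('year', fill_value=0)
             .reindex(index=all_combos, columns=years, fill_value=0))

# 18 charts laid out as 6 rows x 3 columns, sharing both axes for easy comparison
fig, axes = plt.subplots(nrows=6, ncols=3, figsize=(12, 18), sharex=True, sharey=True)

for ax, ((color, engine, make), row) in zip(axes.flat, by_year.iterrows()):
    # row is a Series indexed by year holding that combination's accident totals
    ax.plot(row.index, row.values, marker='o')
    ax.set_title(f'{color} / {engine} / {make}')
    ax.set_xticks(years)

# Axis labels only on the outer edge of the grid
for ax in axes[-1, :]:
    ax.set_xlabel('Manufacturing year')
for ax in axes[:, 0]:
    ax.set_ylabel('Accidents')

fig.tight_layout()
plt.show()
```

Sharing the y-axis makes the 18 panels directly comparable; drop `sharey=True` if the totals differ so much that small combinations become unreadable.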


