0 votes
in Data Science by (17.6k points)

Working in Jupyter with pandas, I have a dataset with rows like this:

color: white

engineType: diesel

make: Ford

manufacturingYear: 2004

accidentCount: 123

What I need to do is plot charts of accident counts (y-axis) by manufacturing year (x-axis) for every combination of color/engineType/make. Any ideas on how to proceed?

To speed things up I have this initial setup:

import random

import numpy as np
import pandas as pd
from pandas import DataFrame, Series

colors = ['white', 'black', 'silver']
engineTypes = ['diesel', 'petrol']
makes = ['ford', 'mazda', 'subaru']
years = range(2000, 2005)
rowCount = 100

def randomEl(data):
    # Pick rowCount random elements (with replacement) from data.
    return [random.choice(data) for _ in range(rowCount)]

df = DataFrame({
    'color': Series(randomEl(colors)),
    'engineType': Series(randomEl(engineTypes)),
    'make': Series(randomEl(makes)),
    'year': Series(randomEl(years)),
    'accidents': Series([int(1000 * random.random()) for i in range(rowCount)])
})


1 Answer

0 votes
by (41.4k points)

We can use groupby() to total the accidents for each unique combination of 'color', 'engineType' and 'make'.

The aggregation is:

accident_counts = df.groupby(['color', 'engineType', 'make'])['accidents'].sum()
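To make the shape of that result concrete, here is a minimal sketch on a three-row stand-in table (the `tiny` frame and its values are made up for illustration, not taken from the question's data). Grouping by several columns and calling `sum()` returns a Series whose index has one level per grouping column, so a single combination can be looked up with a tuple.

```python
import pandas as pd

# Hypothetical three-row table with the same column names as the setup.
tiny = pd.DataFrame({
    "color": ["white", "white", "black"],
    "engineType": ["diesel", "diesel", "petrol"],
    "make": ["ford", "ford", "mazda"],
    "year": [2000, 2001, 2000],
    "accidents": [10, 5, 7],
})

# One total per (color, engineType, make) combination.
totals = tiny.groupby(["color", "engineType", "make"])["accidents"].sum()

# Look up a single combination by its key tuple.
white_diesel_ford = totals.loc[("white", "diesel", "ford")]  # 10 + 5 = 15
```

Note that this aggregation adds up all years for a combination, so the per-year detail is no longer present in `totals`.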

To plot the results, we will use Matplotlib:

import matplotlib.pyplot as plt
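One possible way to continue from here, offered as a sketch under stated assumptions rather than a definitive solution: since the question asks for accidents by year, add 'year' to the grouping keys so the per-year totals are kept, then draw one line per color/engineType/make combination. The helper name `plot_accidents_by_year` and the seeded stand-in data are assumptions made for illustration; in practice you would pass the `df` from the question.

```python
import random

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data with the same columns as the question's setup (assumed values).
random.seed(0)
row_count = 100
df = pd.DataFrame({
    "color": random.choices(["white", "black", "silver"], k=row_count),
    "engineType": random.choices(["diesel", "petrol"], k=row_count),
    "make": random.choices(["ford", "mazda", "subaru"], k=row_count),
    "year": random.choices(list(range(2000, 2005)), k=row_count),
    "accidents": [random.randrange(1000) for _ in range(row_count)],
})

def plot_accidents_by_year(data):
    """Draw one line per (color, engineType, make) combination."""
    keys = ["color", "engineType", "make"]
    # Total accidents per combination and year; 'year' stays as a column.
    yearly = data.groupby(keys + ["year"])["accidents"].sum().reset_index()

    fig, ax = plt.subplots(figsize=(10, 6))
    for (color, engine, make), group in yearly.groupby(keys):
        ax.plot(group["year"], group["accidents"], marker="o",
                label=f"{color}/{engine}/{make}")
    ax.set_xlabel("year")
    ax.set_ylabel("accidents")
    ax.legend(fontsize="small", ncol=2)
    return fig, ax, yearly

fig, ax, yearly = plot_accidents_by_year(df)
```

In Jupyter the figure should render inline after the cell runs; elsewhere, `plt.show()` displays it. With up to 18 combinations a single axes gets crowded, so a grid of subplots (one per combination) may read better.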


