Making matplotlib scatter plots from dataframes in Python's pandas

Question

asked Sep 27, 2019 in Python by Sammy (47.6k points)

What is the best way to make a series of scatter plots using matplotlib from a pandas dataframe in Python?

For example, if I have a dataframe df that has some columns of interest, I find myself typically converting everything to arrays:

import matplotlib.pyplot as plt
# df is a DataFrame: fetch col1 and col2
# and drop na rows if any of the columns are NA
mydata = df[["col1", "col2"]].dropna(how="any")
# Now plot with matplotlib
vals = mydata.values
plt.scatter(vals[:, 0], vals[:, 1])

The problem with converting everything to an array before plotting is that it forces you to break out of data frames.

Consider these two use cases where having the full dataframe is essential to plotting:

For example, what if you now wanted to look at the values of col3 that correspond to the points you plotted in the call to scatter, and colour (or size) each point by that value? You'd have to go back, pull out the non-NA values of col1 and col2, and look up their corresponding col3 values.
Is there a way to plot while preserving the dataframe? For example:

mydata = df.dropna(how="any", subset=["col1", "col2"])
# plot a scatter of col1 by col2, with sizes according to col3
# (hypothetical call illustrating the desired interface)
scatter(mydata[["col1", "col2"]], s=mydata["col3"])

Similarly, imagine that you wanted to filter or colour each point differently depending on the values of some of its columns. For instance, what if you wanted to automatically plot labels next to the points that meet a certain cutoff on col1 and col2 (where the labels are stored in another column of the df), or colour these points differently, as people do with data frames in R? For example:

mydata = df.dropna(how="any", subset=["col1", "col2"])
myscatter = scatter(mydata[["col1", "col2"]], s=1)
# Plot in red, with smaller size, all the points that
# have a col2 value greater than 0.5
myscatter.replot(mydata["col2"] > 0.5, color="red", s=0.5)

How can this be done?

1 Answer

Vishal · Answer 1 · 2019-09-27T08:54:50+0000

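To make a matplotlib scatter plot directly from a pandas dataframe, you can use the DataFrame.plot.scatter method. Its s argument accepts the marker sizes, so you can pass col3 without converting anything to arrays.

The code is as follows: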

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=['col1', 'col2'])
df['col3'] = np.arange(len(df))**2 * 100 + 100  # marker sizes
df.plot.scatter(x='col1', y='col2', s=df['col3'])
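
For the second use case in the question (colouring or labelling only the points that meet a cutoff), one possible approach is to build a boolean mask on the dataframe and use it both for the colours and for the annotations. This is only a sketch: the random data and the `label` column are made up for illustration, and the 0.5 cutoff on col2 is taken from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data; "label" is a hypothetical column holding point names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=["col1", "col2"])
df["col3"] = np.arange(len(df)) ** 2 * 100 + 100
df["label"] = [f"p{i}" for i in range(len(df))]
df.loc[0, "col1"] = np.nan  # one missing value to show the NA handling

# Drop rows with NA in the plotted columns; col3 and label stay aligned
mydata = df.dropna(how="any", subset=["col1", "col2"])

# Boolean mask computed on the dataframe itself
high = mydata["col2"] > 0.5
colors = np.where(high, "red", "blue")

fig, ax = plt.subplots()
# Columns are passed directly; no conversion to a 2-D array is needed
ax.scatter(mydata["col1"], mydata["col2"], s=mydata["col3"], c=colors)

# Annotate only the points that pass the cutoff
for row in mydata[high].itertuples():
    ax.annotate(row.label, (row.col1, row.col2))

# Call plt.show() or fig.savefig(...) to view the result
```

Because the rows are filtered once with dropna and every later step indexes the same mydata, the sizes, colours and labels stay matched to the right points.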

