8.1. Pandas Built-in Data Visualization#
Pandas offers a simple, high-level interface for creating plots directly from Series and DataFrame objects. It builds on Matplotlib behind the scenes, allowing you to make quick, readable plots with calls like df.plot() or Series.plot(), while still retaining access to Matplotlib’s full range of customization options.
In this section you will learn:
plotting: Use df.plot(kind=...) or the shorthand methods such as df.plot.line, df.plot.hist, df.plot.scatter, and df.plot.box to create plots.
custom parameters: Most plotting calls accept keyword arguments such as figsize, lw, alpha, title, xlabel, and ylabel, and return a Matplotlib Axes object for further customization. Arguments that Pandas does not handle itself are passed through to Matplotlib.
style sheets: Change the look of all plots globally with plt.style.use(...).
color & size: For scatter plots you can color by a column (c), size by an array (s=df['col']*scale), and choose a colormap with cmap.
fine-tuning: For large or dense data, consider hexbin or KDE plots instead of scatter.
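As a minimal sketch of the first two points, the example below draws the same line plot with kind=... and with the shorthand method, then customizes the returned Axes. The DataFrame demo and its columns are synthetic stand-ins invented for illustration, not the data files used later in this section.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only (an assumption, not the section's df1/df2).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 2)).cumsum(axis=0), columns=["A", "B"])

# Generic form: choose the plot type with kind=...
ax_generic = demo.plot(kind="line", figsize=(8, 3), lw=1, alpha=0.8, title="kind='line'")

# Shorthand form: the same plot through the accessor method, on a new figure.
ax_short = demo.plot.line(figsize=(8, 3), lw=1, alpha=0.8, title="df.plot.line()")

# Both calls return a Matplotlib Axes, so Matplotlib methods work on the result.
ax_short.set_xlabel("row number")
ax_short.set_ylabel("cumulative sum")
ax_short.axhline(0, color="black", lw=0.5)  # reference line at zero
```

Here figsize and title are handled by Pandas, while lw and alpha are forwarded to Matplotlib's line drawing.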
Note: %matplotlib inline is an IPython magic command that tells Jupyter Notebook to render Matplotlib plots directly inside the notebook output cell instead of in a separate pop-up window. Recent Jupyter environments (IPython >= 7) enable inline plotting by default, so you usually do not need to run it yourself.
Common plot types for exploratory work in Pandas include:
Plot Type | Command Example | Usage Example |
---|---|---|
Area | df.plot.area() | Show cumulative totals or overlapping trends over time |
Bar / Stacked Bar | df.plot.bar() / df.plot.bar(stacked=True) | Compare categorical data; stacked shows parts of a whole |
Histogram | df['A'].plot.hist(bins=30) | Display frequency distribution of numeric data |
Line | df.plot.line() | Visualize trends or changes over time |
Scatter | df.plot.scatter(x='A', y='B') | Explore relationships or correlations between two variables |
Box | df.plot.box() | Summarize data distribution and detect outliers |
Hexbin | df.plot.hexbin(x='A', y='B', gridsize=25) | Visualize density of points in large scatter datasets |
KDE / Density | df['A'].plot.kde() | Estimate the probability density function of a variable |
Pie | df['A'].plot.pie() | Show proportions or percentage breakdowns of a whole |
Later, when you learn Matplotlib, you will see why these plotting methods are much easier to use. Pandas visualization balances ease of use with control over the figure: many of the plot calls also accept the keyword arguments of the underlying matplotlib.pyplot function.
%pip install pandas numpy --quiet ### ensure pandas and numpy are installed; comment out when done
%pip install matplotlib --quiet ### ensure matplotlib is installed; comment out when done
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt ### Pandas plotting is built on Matplotlib
# %matplotlib inline ### only needed in older Jupyter environments
Note: you may need to restart the kernel to use updated packages.
8.1.1. Loading Data#
Two CSV files of synthetic data are available to read in as DataFrames:
df1 = pd.read_csv('../../data/df1',index_col=0)
df2 = pd.read_csv('../../data/df2')
### what does the data look like?
df1.head()
| | A | B | C | D |
---|---|---|---|---|
2000-01-01 | 1.339091 | -0.163643 | -0.646443 | 1.041233 |
2000-01-02 | -0.774984 | 0.137034 | -0.882716 | -2.253382 |
2000-01-03 | -0.921037 | -0.482943 | -0.417100 | 0.478638 |
2000-01-04 | -1.738808 | -0.072973 | 0.056517 | 0.015085 |
2000-01-05 | -0.905980 | 1.778576 | 0.381918 | 0.291436 |
### descriptive statistics
df1.describe()
| | A | B | C | D |
---|---|---|---|---|
count | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 |
mean | -0.017755 | 0.048072 | -0.001723 | 0.002432 |
std | 0.957223 | 1.004197 | 0.982384 | 1.066366 |
min | -3.693201 | -2.719020 | -2.987971 | -3.182746 |
25% | -0.639101 | -0.652530 | -0.690831 | -0.676107 |
50% | -0.017793 | 0.058035 | -0.012805 | -0.044868 |
75% | 0.623478 | 0.696946 | 0.706496 | 0.721699 |
max | 3.412236 | 3.199850 | 3.342484 | 2.879793 |
df2.head()
| | a | b | c | d |
---|---|---|---|---|
0 | 0.039762 | 0.218517 | 0.103423 | 0.957904 |
1 | 0.937288 | 0.041567 | 0.899125 | 0.977680 |
2 | 0.780504 | 0.008948 | 0.557808 | 0.797510 |
3 | 0.672717 | 0.247870 | 0.264071 | 0.444358 |
4 | 0.053829 | 0.520124 | 0.552264 | 0.190008 |
df2.describe()
| | a | b | c | d |
---|---|---|---|---|
count | 10.000000 | 10.000000 | 10.000000 | 10.000000 |
mean | 0.460880 | 0.352935 | 0.587008 | 0.631597 |
std | 0.340793 | 0.301272 | 0.284332 | 0.258158 |
min | 0.039762 | 0.008948 | 0.103423 | 0.190008 |
25% | 0.212334 | 0.179302 | 0.427949 | 0.457694 |
50% | 0.371366 | 0.240298 | 0.555036 | 0.584144 |
75% | 0.753558 | 0.515799 | 0.873619 | 0.837267 |
max | 0.937288 | 0.997075 | 0.907307 | 0.977680 |
8.1.2. Style Sheets#
By default, Pandas pulls colors from Matplotlib’s axes.prop_cycle, an rcParam (runtime configuration parameter) that holds a cycler, an iterator over a list of predefined colors. That is why each line gets a different color when you plot multiple lines. The default cycle has 10 colors, after which the colors repeat.
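To make the color cycle concrete, the sketch below reads the colors out of axes.prop_cycle and plots more lines than the cycle holds. It assumes Matplotlib's built-in default style (forced here with plt.style.context) and uses a made-up 12-column DataFrame named wide.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Temporarily use Matplotlib's built-in defaults so any active style sheet is ignored.
with plt.style.context("default"):
    # The cycler stores its colors under the "color" key.
    cycle_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    print(len(cycle_colors), cycle_colors[:3])

    # Hypothetical 12-column frame: more lines than colors in the cycle.
    wide = pd.DataFrame(np.arange(24).reshape(2, 12))
    ax = wide.plot(legend=False)
    line_colors = [line.get_color() for line in ax.lines]

# The 11th line reuses the color of the 1st because the cycle wraps around.
print(line_colors[0] == line_colors[10])
```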
Matplotlib has style sheets (themes) you can use to make your plots look a little nicer. Popular style sheets include:
bmh (Bayesian Methods for Hackers)
fivethirtyeight (FiveThirtyEight is a news site)
ggplot (R’s ggplot2 default theme)
A style sheet defines a set of style rules that all your plots follow, which gives them a unified, more professional look and feel. You can even create your own style sheet.
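If you want to try a style without changing every later plot, one option is Matplotlib's plt.style.context, which applies a style only inside a with-block. The sketch below is an illustration on synthetic data (the Series values is an assumption); it also lists the style names available in the local installation.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Names that plt.style.use() accepts in this Matplotlib installation.
print(sorted(plt.style.available)[:5])

rng = np.random.default_rng(1)
values = pd.Series(rng.normal(size=500), name="A")  # synthetic data

# The style applies only inside the with-block; rcParams are restored afterwards.
face_before = plt.rcParams["axes.facecolor"]
with plt.style.context("ggplot"):
    face_inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    values.plot.hist(ax=ax, bins=30)
face_after = plt.rcParams["axes.facecolor"]

print(face_before, face_inside, face_after)
```

By contrast, plt.style.use(...) changes the settings for the rest of the session, which is what the cells in this section do.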
Before plt.style.use():
df1['A'].hist()
<Axes: >

Call the stylesheet. Let’s use the ggplot theme:
plt.style.use('ggplot')
After applying plt.style.use('ggplot'):
df1['A'].hist()
<Axes: >

Now try plt.style.use('bmh'):
plt.style.use('bmh')
df1['A'].hist()
<Axes: >

Next, the fivethirtyeight style:
plt.style.use('fivethirtyeight')
df1['A'].hist()
<Axes: >

A dark background theme:
plt.style.use('dark_background')
df1['A'].hist()
<Axes: >

Let’s stick with the ggplot style for now.
plt.style.use('ggplot')
# plt.style.use('bmh')
8.1.3. Plot Types#
Pandas has several built-in plot types, most of them statistical in nature:
df.plot.area
df.plot.bar & df.plot.barh
df.plot.box
df.plot.density
df.plot.hexbin
df.plot.hist
df.plot.kde
df.plot.line
df.plot.pie
df.plot.scatter
Let’s call these plot-type methods (named by the key terms in the list above, e.g. ‘box’, ‘barh’) to see how they work.
### remember the dataframe
df2.head()
| | a | b | c | d |
---|---|---|---|---|
0 | 0.039762 | 0.218517 | 0.103423 | 0.957904 |
1 | 0.937288 | 0.041567 | 0.899125 | 0.977680 |
2 | 0.780504 | 0.008948 | 0.557808 | 0.797510 |
3 | 0.672717 | 0.247870 | 0.264071 | 0.444358 |
4 | 0.053829 | 0.520124 | 0.552264 | 0.190008 |
8.1.3.1. Area Plot#
df2.plot.area(alpha=0.4)
<Axes: >

8.1.3.2. Bar Plots#
Bar plots are one of the most common ways to compare categorical data visually. They represent quantities as rectangular bars whose length (or height) corresponds to the value being measured. In Python, these are typically created using Matplotlib or Seaborn, or directly from Pandas as shown here.
Both bar() and barh() create bar charts; the difference is simply orientation. Stacked bar charts, on the other hand, show how multiple subcategories contribute to a total within each main category.
Plot Type | Orientation | Purpose | Best For |
---|---|---|---|
Bar Plot | Vertical | Compare category values | Simple category comparisons |
Barh Plot | Horizontal | Compare category values with long labels | Readability and ranking-type data |
Stacked Bar | Either | Show part-to-whole relationships | Composition of totals across categories |
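The three variants in the table can be sketched on a small categorical dataset. The sales DataFrame below is hypothetical and exists only to make the part-to-whole idea visible; the last lines inspect the drawn rectangles to show where a stacked segment starts.

```python
import pandas as pd

# Hypothetical sales figures: rows are categories, columns are subcategories.
sales = pd.DataFrame(
    {"online": [30, 45, 25], "in_store": [50, 20, 40]},
    index=["North", "South", "West"],
)

# Grouped vertical bars: compare values side by side.
ax_side = sales.plot.bar(rot=0, title="Grouped: compare values")

# Stacked vertical bars: each bar's total height is the row sum.
ax_stack = sales.plot.bar(stacked=True, rot=0, title="Stacked: parts of a whole")

# Horizontal stacked bars: same data, easier to read with long labels.
ax_horiz = sales.plot.barh(stacked=True, title="Horizontal stacked")

# In the stacked chart, each in_store segment starts where the online segment ends.
first_in_store = ax_stack.containers[1][0]
print(first_in_store.get_y(), first_in_store.get_height())
```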
8.1.3.2.1. Vertical Bar Plot#
### bar plot
df2.plot.bar()
<Axes: >

8.1.3.2.2. Horizontal Bar Plot#
### bar plot horizontal
df2.plot.barh()
<Axes: >

### bar plot stacked
df2.plot.bar(stacked=True)
<Axes: >

8.1.3.3. Histograms#
df1['A'].plot.hist(bins=100)
<Axes: ylabel='Frequency'>

8.1.3.4. Line Plots#
# df1.plot.line(x=df1.index, y='B',figsize=(12,3),lw=1)
df1.plot.line(y='B',figsize=(10,6),lw=5)
<Axes: >

8.1.3.5. Scatter Plots#
df1.plot.scatter(x='A',y='B')
<Axes: xlabel='A', ylabel='B'>

8.1.4. Color Maps#
Use c to color each point by the value of another column, and cmap to choose the colormap.
For the full list of colormaps, see: http://matplotlib.org/users/colormaps.html
### the color of each point corresponds to the values in column C
### try them out
# df1.plot.scatter(x=df1['A'], y=df1['B'], c=df1['C'], cmap='coolwarm') ### need to use x='A' instead of x=df1['A']
df1.plot.scatter(x='A', y='B', c='C', cmap='coolwarm') ### color value from 'C'; colormap
df1.plot.scatter(x='A', y='B', c='C', cmap='viridis') ### another colormap
<Axes: xlabel='A', ylabel='B'>


8.1.5. Size#
Use s to set the marker size from another column. Here s is passed as an array rather than a column name so that the values can be scaled. Marker sizes must be non-negative, and column C contains negative values, so take the absolute value first:
### try different sizes
# df1.plot.scatter(x='A', y='B', s=df1['C'].abs()*200)
df1.plot.scatter(x='A',y='B',s=df1['C'].abs()*10)
<Axes: xlabel='A', ylabel='B'>
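Another way to keep marker sizes valid is to rescale the column into a fixed positive range before passing it to s. The sketch below uses min-max scaling on synthetic data; the DataFrame pts and the range of 10 to 200 are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
pts = pd.DataFrame(rng.normal(size=(300, 3)), columns=["A", "B", "C"])  # synthetic

# Min-max scale C into the range [10, 200] so every marker area is positive.
c = pts["C"]
sizes = 10 + 190 * (c - c.min()) / (c.max() - c.min())

# s sets the marker area (in points squared); c and cmap color the same column.
ax = pts.plot.scatter(x="A", y="B", s=sizes, c="C", cmap="viridis", alpha=0.6)
```

Unlike taking the absolute value, this keeps the ordering of the original values: the smallest C gets the smallest marker.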

8.1.5.1. Box Plots#
df2.plot.box() # Can also pass a by= argument for groupby
<Axes: >

8.1.5.2. Hexagonal Bin Plot#
Useful for bivariate data, as an alternative to a scatter plot when there are many overlapping points:
np.random.seed(42)
df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df.plot.hexbin(x='a', y='b', gridsize=25, cmap='Oranges')
<Axes: xlabel='a', ylabel='b'>

8.1.5.3. Kernel Density Estimation plot (KDE)#
A KDE plot is a smoothed version of a histogram. It requires SciPy:
%pip install scipy ### ensure scipy is installed; comment out when done
df2['a'].plot.kde()
<Axes: ylabel='Density'>
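To see the relationship between a histogram and its KDE, you can draw both on the same Axes. This is a sketch on synthetic data (the Series sample is an assumption) and it needs SciPy installed; the bw_method value of 0.3 is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
sample = pd.Series(rng.normal(size=1000), name="x")  # synthetic data

fig, ax = plt.subplots(figsize=(6, 3))

# density=True rescales the bars so their total area is 1, matching the KDE's scale.
sample.plot.hist(ax=ax, bins=30, density=True, alpha=0.4)

# bw_method controls smoothing: smaller values follow the data more closely.
sample.plot.kde(ax=ax, bw_method=0.3, lw=2)
ax.set_xlabel("x")
```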

8.1.5.4. Density Plot#
df2.plot.density()
<Axes: ylabel='Density'>

The density plot is a good example of how Pandas visualization differs from Matplotlib and Seaborn:
Library | Function | Notes |
---|---|---|
Seaborn | sns.kdeplot(...) | most common, easy to use |
Pandas | df.plot.density() | convenient for quick plots |
Matplotlib + SciPy | scipy.stats.gaussian_kde(...) with plt.plot(...) | manual control over details |
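The Pandas and Matplotlib + SciPy rows can be compared side by side in a short sketch. The data is synthetic, and the grid limits and point count in the manual version are arbitrary choices; Seaborn is left out only to avoid an extra dependency.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
data = pd.Series(rng.normal(size=500), name="a")  # synthetic data

fig, (ax_pd, ax_manual) = plt.subplots(1, 2, figsize=(9, 3), sharey=True)

# Pandas: one call picks the grid, fits the KDE, and draws the curve.
data.plot.density(ax=ax_pd, title="Pandas")

# Matplotlib + SciPy: fit the estimator, choose the grid, and draw explicitly.
kde = gaussian_kde(data.to_numpy())
grid = np.linspace(data.min() - 1, data.max() + 1, 400)
ax_manual.plot(grid, kde(grid))
ax_manual.set_title("Matplotlib + SciPy")
ax_manual.set_xlabel("a")
```

The manual route takes more lines but exposes every step, such as the bandwidth and the evaluation grid, which is the kind of control the table refers to.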
The end.