8.2. Matplotlib Overview#
Section worksheet on Colab
Section exercise - Questions
8.2.1. Introduction#
Matplotlib is the foundational plotting library in the Python ecosystem, originally developed by John D. Hunter to provide MATLAB-like plotting capabilities in Python. It offers a rich, programmatic interface for producing high-quality 2D graphics (and basic 3D plots via extension modules). For a history and introduction of Matplotlib, see matplotlib.org ’s History section.
Matplotlib sits at a low level in the visualization stack:
Many higher-level libraries (for example, Pandas’ plotting methods and Seaborn) build on top of it.
It interoperates smoothly with NumPy arrays and, therefore, the scientific Python toolchain.
Matplotlib can be used either via the stateful pyplot API (the
plot()
fuction that creates both the Figure and Axes automatically) for quick interactive work, or viaThe object-oriented API for precise control in complex figures (the
plt.figure()
function that creates and returns a new Figure object).
For data scientists, Matplotlib is valuable both as a first plotting tool to learn and as a powerful engine for fine-grained customization when presentation quality and exact control are required. Overall, it is an excellent 2D and 3D graphics library for generating scientific figures.
Some of the major advantages of Matplotlib are:
Generally easy to get started for simple plots
Support for custom labels and texts
Great control of every element in a figure
High-quality output in many formats
Very customizable in general
Matplotlib lets you create and customize quality figures programmatically. This notebook will introduce the core APIs and workflows. Before continuing, review the official Matplotlib documentation and gallery for tutorials and practical examples: https://matplotlib.org/ (see the Tutorials and Examples ).
Pandas .plot() is a convenience wrapper around plt.plot() for quick DataFrame visualization. Here we are using the plot function from Matplotlib. The differences between Pandas .plot()
and Matplotlib plt.plot()
can be summarized as:
Feature |
|
Pandas |
---|---|---|
Typical use |
Fine-grained, figure-level control |
Fast EDA from DataFrame/Series |
Built on |
Core matplotlib |
Wrapper around matplotlib |
Customization |
Full control |
Limited, but can access underlying axes |
Data Input |
Expects x/y arrays |
Pandas columns/index |
8.2.1.1. Installation#
You’ll need to install matplotlib first with either using pip
in the terminal:
pip install matplotlib ### in terminal with .venv enabled
or using %pip
in the notebook:
%pip install matplotlib ### in Jupyter Notebook
Import the matplotlib.pyplot
module under the name plt
:
import matplotlib.pyplot as plt
To render visualization outside of a Jupyter notebook (e.g., VS Code), you usually need to use the command plt.show()
. Also, if you are using a Jupyter (IPython) version before version 7, you may need to add the %matplotlib inline
line to render plots in the notebook cell.
You can check out the version of IPython by:
### remember !pip vs %pip? what is the difference?
!jupyter --version
Selected Jupyter core packages...
IPython : 9.6.0
ipykernel : 6.30.1
ipywidgets : not installed
jupyter_client : 8.6.3
jupyter_core : 5.8.1
jupyter_server : 2.17.0
jupyterlab : 4.4.9
nbclient : 0.10.2
nbconvert : 7.16.6
nbformat : 5.10.4
notebook : 7.4.7
qtconsole : not installed
traitlets : 5.14.3
### as a bonus, to check if we need %matplotlib inline programmatically
j = !jupyter --version
print(type(j), j)
jv = int(j[1].split(':')[1].strip().split('.')[0])
if jv < 7:
%matplotlib inline
<class 'IPython.utils.text.SList'> ['Selected Jupyter core packages...', 'IPython : 9.6.0', 'ipykernel : 6.30.1', 'ipywidgets : not installed', 'jupyter_client : 8.6.3', 'jupyter_core : 5.8.1', 'jupyter_server : 2.17.0', 'jupyterlab : 4.4.9', 'nbclient : 0.10.2', 'nbconvert : 7.16.6', 'nbformat : 5.10.4', 'notebook : 7.4.7', 'qtconsole : not installed', 'traitlets : 5.14.3']
That line is only for jupyter notebooks, if you are using another editor, you’ll use: plt.show() at the end of all your plotting commands to have the figure pop up in another window.
8.2.2. Plotting with plt.plot( )#
Let’s walk through a simple example with two NumPy arrays. Lists work too, but you’ll usually pass NumPy arrays or Pandas columns, which behave like arrays. Here we use:
The plt.plot() function to create plots.
Other plt functions (such as xlabel, ylabel, and title) to customize the plots.
8.2.2.1. Basic Matplotlib Commands#
Task |
Matplotlib plt.plot() |
Pandas df.plot() |
Notes |
---|---|---|---|
To plot |
|
|
Uses DataFrame/Series directly; x defaults to index if omitted. |
X label |
|
|
|
Y label |
|
|
Same pattern as xlabel. |
Title |
|
|
Pandas supports |
8.2.2.2. The Data#
The data we want to plot:
import numpy as np
x = np.linspace(0, 5, 11) ### 11 numbers from 0 to 5, inclusive
y = x ** 2
x
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
y
array([ 0. , 0.25, 1. , 2.25, 4. , 6.25, 9. , 12.25, 16. ,
20.25, 25. ])
The following code produces a basic line plot. As you work, use Shift+Tab in Jupyter to view each function’s inline documentation (docstrings).
8.2.2.3. plt.plot( ) vs. Pandas df.plot( )#
passing arrays as data, unlike Pandas
# plt.plot(x, y, 'r') ### 'r' means color red
# plt.plot(x, y, c='r') ### same as above, we saw this in Pandas plot
plt.plot(x, y, c='red') ### same as above, we saw this in Pandas plot
plt.xlabel('X Axis Title') ### in Pandas plot, this is automatic from column names
plt.ylabel('Y Axis Title') ### in Pandas plot, this is automatic from column names
plt.title('String Title') ### in Pandas plot, this is missing and needs to be set manually
# plt.show()
Text(0.5, 1.0, 'String Title')

The same plot could be done in Pandas plot(), and it’s even easier:
import pandas as pd
df = pd.DataFrame({'x': x, 'y': y}) ### DataFrame with two columns
# df.plot.line(x='x', y='y') ### line plot is default in Pandas; axes is automatically created
# df.plot(x='x', y='y') ### same as above
df.plot('x', 'y', xlabel='X Axis Title', ylabel='Y Axis Title', title='String Title')
### this could be surprising, Pandas plot() uses all numeric columns by default
df.plot()
<Axes: >


8.2.3. Multiple Plots in One Canvas#
We use the subplot()
(NOT subplots()
later) function in pyplot to create multiple plots in one figure. The syntax is:
plt.subplot(nrows, ncols, plot_number)
### plt.subplot(nrows, ncols, plot_number) ### when you want multiple plots in one figure
### order is important, plot after subplot
plt.subplot(1, 2, 1) ### subplot 1 of 1 row, 2 columns
plt.plot(x, y, 'r--') ### 'r--' means red dashed line
plt.subplot(1, 2, 2) ### subplot 2 of 1 row, 2 columns
plt.plot(y, x, 'g*-'); ### 'g*-' means green line with star markers

8.2.4. Matplotlib Object Oriented Method#
Now that we’ve seen the basics, let’s break it all down with a more formal introduction of Matplotlib’s Object Oriented API. This means we will instantiate figure objects and then call methods or attributes from that object.
Now that we’ve covered the basics, let’s move to Matplotlib’s object-oriented API: we’ll create Figure and Axes objects and control the plot by calling methods on those objects.
8.2.4.1. The Object Oriented Method#
With Matplotlib’s object-oriented approach, you create a Figure and one or more Axes, then control the plot by calling methods on those objects. This shines when you need multiple plots or precise layout on a single canvas.
To start, create a figure and axes (the plot object):
### just an explanation on figsize
### the default size of the figure is 640*480 pixels
### (width * height 8 by 6 inches at 80 dpi)
fig = plt.figure() ### create a figure object
<Figure size 640x480 with 0 Axes>
With the OOP approach, we gain control of the size and position of axes. Here we will
create the figure object first, then
Add axes to the figure
Use plot() to draw the plot
Also, we are also seeing a set of methods here to customize the axes such as:
set_xlabel
set_ylabel
set_title
### creating axes based on figure
fig = plt.figure() ### create empty canvas on the figure (object)
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) ### left, bottom, width, height (range 0 to 1)
### left = 0.1 => start 10% in from the figure’s left edge
### bottom = 0.1 => start 10% up from the bottom
### width = 0.8 => span 80% of the figure’s width
### height = 0.8 => span 80% of the figure’s height
### 1. creates blank canvas fig
### 2. add axes to figure
### 3. plot on the axes

While slightly more involved in OO approach, this approach gives us precise control over where the plot axes are placed and makes it straightforward to add multiple axes to a single figure.
### 1. creates blank canvas fig
### 2. add axes to figure
### 3. plot on the axes
fig = plt.figure()
### add two axes to the figure
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8]) ### main axes
axes2 = fig.add_axes([0.2, 0.5, 0.4, 0.3]) ### inset axes
### larger figure Axes 1
axes1.plot(x, y, 'b')
axes1.set_xlabel('X_label_axes1')
axes1.set_ylabel('Y_label_axes1')
axes1.set_title('Axes 1 Title')
### insert figure Axes 2
axes2.plot(y, x, 'r')
axes2.set_xlabel('X_label_axes2')
axes2.set_ylabel('Y_label_axes2')
axes2.set_title('Axes 2 Title')
Text(0.5, 1.0, 'Axes 2 Title')

8.2.5. subplots( )#
The plt.subplots()
object will act as a more automatic axes manager. You create the figure objects and axes in one step, as in comparison with the two-step way of plt.plot(). After that, we plot the axes.
### this cell is an FYI.
### unlike plt.plot(), plt.subplots() returns both a figure and axes object(s)
### the axes object is an array of axes if more than one subplot
# print(type(plt.subplots())) ### tuple of (figure, array of axes)
# plt.subplots()[0] ### figure object
# plt.subplots()[0] ### array of axes object
# plt.subplots()[1]
### use is similar to plt.figure(), except use tuple unpacking to grab fig and axes
fig, axes = plt.subplots() ### assign tuple data to fig and axes
### now use the axes object to add stuff to plot
axes.plot(x, y, 'r')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');

With subplots()
, you can specify the number of rows and columns when creating the subplots() object:
### empty canvas of 1 by 2 subplots
### note we have two axes objects here
fig, axes = plt.subplots(nrows=1, ncols=2) ### similar to plt.subplot(1, 2, ith axes)
# fig, axes = plt.subplots(1, 2) ### same as above

# Axes is an array of axes to plot on
axes
array([<Axes: >, <Axes: >], dtype=object)
We can loop through this array:
for ax in axes:
ax.plot(x, y, 'b')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('title')
# Display the figure object
fig

A common issue with matplolib is overlapping subplots or figures. We ca use fig.tight_layout() or plt.tight_layout() method, which automatically adjusts the positions of the axes on the figure canvas so that there is no overlapping content:
fig, axes = plt.subplots(nrows=1, ncols=2)
for ax in axes:
ax.plot(x, y, 'g')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('title')
fig
plt.tight_layout()

8.2.6. Figure Size/Ratio and DPI#
Matplotlib allows the aspect ratio, DPI and figure size to be specified when the Figure object is created. You can use the figsize
and dpi
keyword arguments.
figsize
is a tuple of the width and height of the figure in inchesdpi
is the dots-per-inch (pixel per inch).
For example:
The same arguments can also be passed to layout managers, such as the subplots
function:
fig, axes = plt.subplots(figsize=(12,3))
axes.plot(x, y, 'r')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title')
Text(0.5, 1.0, 'title')

# fig = plt.figure(figsize=(3,1), dpi=50)
fig = plt.figure(figsize=(3,1), dpi=100)
# fig = plt.figure(figsize=(3,1), dpi=200)
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
axes.plot(x, y, 'r')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title')
Text(0.5, 1.0, 'title')

8.2.7. Saving figures#
Matplotlib can generate high-quality output in a number formats, including PNG, JPG, EPS, SVG, PGF and PDF.
To save a figure to a file we can use the savefig
method in the Figure
class:
fig.savefig("filename.png")
Here we can also optionally specify the DPI and choose between different output formats:
fig.savefig("filename.png", dpi=200)
8.2.8. Legends#
We have been using the methods set_xlabel, set_ylabel, and set_title when creating plots. Now let’s look at another import element in an axes, the legends.
You can use the label=”label text” keyword argument when plots or other objects are added to the figure, and then using the legend method to add the legend to the figure:
x
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
fig = plt.figure(figsize=(5,3), dpi=100)
ax = fig.add_axes([0.1, 0.1, 1, 1])
ax.plot(x, x**2, label="x**2")
ax.plot(x, x**3, label="x**3") ### cubed
ax.legend()
<matplotlib.legend.Legend at 0x10e606d50>

Notice how are legend overlaps some of the actual plot!
The legend function takes an optional keyword argument loc that can be used to specify where in the figure the legend is to be drawn. The allowed values of loc are numerical codes for the various places the legend can be drawn. See the documentation page for details. Some of the most common loc values are:
# Lots of options....
### ax.legend(loc=1) # upper right corner
### ax.legend(loc=2) # upper left corner
### ax.legend(loc=3) # lower left corner
### ax.legend(loc=4) # lower right corner
### most common to choose
ax.legend(loc=0) # let matplotlib decide the optimal location
fig

8.2.9. Colors, Linewidths, Linetypes#
Matplotlib gives you a lot of options for customizing colors, linewidths, and linetypes.
8.2.9.1. Colors with MatLab like syntax#
With matplotlib, we can define the colors of lines and other graphical elements in a number of ways. First of all, we can use the MATLAB-like syntax where 'b'
means blue, 'g'
means green, etc. The MATLAB API for selecting line styles are also supported: where, for example, ‘b.-’ means a blue line with dots:
# MATLAB style line color and style
fig, ax = plt.subplots()
ax.plot(x, x**2, 'b.-') # blue line with dots
ax.plot(x, x**3, 'g--') # green dashed line
[<matplotlib.lines.Line2D at 0x10e687d90>]

8.2.9.2. Colors with the color= parameter#
We can also define colors by their names or RGB hex codes and optionally provide an alpha value using the color
and alpha
keyword arguments. Alpha indicates opacity.
fig, ax = plt.subplots(figsize=(5,3), dpi=100)
ax.plot(x, x+1, color="blue", alpha=0.5) ### half-transparent
ax.plot(x, x+2, color="#8B008B") ### RGB hex code
ax.plot(x, x+3, color="#FF8C00") ### RGB hex code
[<matplotlib.lines.Line2D at 0x10e6aec10>]

8.2.9.3. Line and marker styles#
To change the line width, we can use the linewidth
or lw
keyword argument. The line style can be selected using the linestyle
or ls
keyword arguments:
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, x+1, color="red", linewidth=0.25)
ax.plot(x, x+2, color="red", linewidth=0.50)
ax.plot(x, x+3, color="red", lw=1.00)
ax.plot(x, x+4, color="red", lw=2.00)
### possible linestype options ‘-‘, ‘–’, ‘-.’, ‘:’, ‘steps’
ax.plot(x, x+5, color="green", lw=3, linestyle='-')
ax.plot(x, x+6, color="green", lw=3, ls='-.')
ax.plot(x, x+7, color="green", lw=3, ls=':')
### custom dash
line, = ax.plot(x, x+8, color="black", lw=1.50)
line.set_dashes([5, 10, 15, 10]) ### format: line length, space length, ...
### possible marker symbols: marker = '+', 'o', '*', 's', ',', '.', '1', '2', '3', '4', ...
ax.plot(x, x+ 9, color="blue", lw=3, ls='-', marker='+')
ax.plot(x, x+10, color="blue", lw=3, ls='--', marker='o')
ax.plot(x, x+11, color="blue", lw=3, ls='-', marker='s')
ax.plot(x, x+12, color="blue", lw=3, ls='--', marker='1')
### marker size and color
ax.plot(x, x+13, color="purple", lw=1, ls='-', marker='o', markersize=2)
ax.plot(x, x+14, color="purple", lw=1, ls='-', marker='o', markersize=4)
ax.plot(x, x+15, color="purple", lw=1, ls='-', marker='o', markersize=8, markerfacecolor="red")
ax.plot(x, x+16, color="purple", lw=1, ls='-', marker='s', markersize=8,
markerfacecolor="yellow", markeredgewidth=3, markeredgecolor="green");

8.2.10. Control over axis appearance#
In this section we will look at controlling axis sizing properties in a matplotlib figure.
8.2.10.1. Plot Range#
We can configure the ranges of the axes using the set_ylim
and set_xlim
methods in the axis object, or axis('tight')
for automatically getting “tightly fitted” axes ranges:
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].plot(x, x**2, x, x**3)
axes[0].set_title("default axes ranges")
axes[1].plot(x, x**2, x, x**3)
axes[1].axis('tight')
axes[1].set_title("tight axes")
axes[2].plot(x, x**2, x, x**3)
axes[2].set_ylim([0, 60])
axes[2].set_xlim([2, 5])
axes[2].set_title("custom axes range");

8.2.11. Some Plot Types#
So far, we have used a line plot to introduce Matplotlib. There are many specialized plots we can create, such as barplots, histograms, scatter plots, and much more. Most of these types of plots we will actually create using Seaborn, a statistical plotting library for Python. Here let us take a look at some examples of these types of plots:
plt.scatter(x,y)
<matplotlib.collections.PathCollection at 0x10e992710>

from random import sample
data = sample(range(1, 1000), 100) ### chooses 100 unique random elements from sequence.
plt.hist(data)
(array([ 9., 8., 10., 11., 7., 11., 6., 10., 11., 17.]),
array([ 11. , 107.3, 203.6, 299.9, 396.2, 492.5, 588.8, 685.1, 781.4,
877.7, 974. ]),
<BarContainer object of 10 artists>)

### list comprehension
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
### rectangular box plot
plt.boxplot(data, vert=True, patch_artist=True);
### Box = Q1 to Q3 (IQR = Q3−Q1)
### Line in box = median
### Whiskers = last data points within 1.5 × IQR from Q1/Q3
### Circles = points outside that range
