Seaborn

7. Seaborn#

7.1. Overview#

Seaborn is a high-level visualization library built on top of Matplotlib. It gives you cleaner defaults and concise plotting functions for statistical graphics, while still letting you use Matplotlib for fine-grained control. Pandas plotting is excellent for fast, direct plotting from a DataFrame. Seaborn becomes stronger when you need semantic mappings (color/style/size by variables), cleaner statistical defaults, and grouped comparisons.

Practical rules:

Start with Pandas for quick checks.
Use Seaborn for statistical/grouped visuals.
Drop to Matplotlib when you need exact low-level control.

As a comparison between the three visualization libraries:

In this notebook, we will use the following learning path:

Quick start (first useful plots)
Data model and semantic mappings (hue, style, size)
Core seaborn plot families (distribution, relational, categorical)
Figure-level wrappers and faceting (displot, relplot, catplot)
Multivariate views (pairplot, jointplot, heatmap)
Optional KDE internals appendix

By convention, Seaborn is imported as sns:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# %pip install seaborn
import seaborn as sns

# Set consistent defaults for all following plots
sns.set_theme(style='whitegrid', context='notebook')

tips = sns.load_dataset('tips')
planets = sns.load_dataset('planets')
iris = sns.load_dataset('iris')
penguins = sns.load_dataset('penguins')

7.1.1. Quick Comparison#

7.1.1.1. Seaborn vs. Pandas#

Compare the two plots below; the Seaborn version encodes more information about the data:

DataFrame.plot.scatter() (pandas): fast DataFrame-first plotting.
sns.scatterplot(): semantic mappings (hue, style, size) with cleaner statistical defaults.

Style note: this plot uses the global theme set earlier with sns.set_theme(style='whitegrid', context='notebook').

Both plots are placed side-by-side using plt.subplots(). The ax= parameter tells each plotting function which subplot panel to draw on — without it, every call would create its own separate figure.

../../_images/213bbb37c8a03dd8f36c04216afbbc381032f7b977f134fca656138dfe93dd61.png
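The side-by-side comparison described above can be sketched as follows. This is a minimal example under stated assumptions: the small DataFrame `df` and its column names (`bill`, `tip`, `time`) are made up for illustration and are not the tips dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Small made-up dataset (hypothetical values, not the tips data)
df = pd.DataFrame({
    'bill': [10.0, 15.5, 20.0, 25.5, 30.0, 35.5],
    'tip':  [1.5, 2.0, 3.0, 3.5, 5.0, 4.5],
    'time': ['Lunch', 'Dinner', 'Lunch', 'Dinner', 'Dinner', 'Lunch'],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# pandas: quick scatter straight from the DataFrame
df.plot.scatter(x='bill', y='tip', ax=axes[0], title='pandas plot.scatter()')

# seaborn: same data, with color mapped to a categorical column
sns.scatterplot(data=df, x='bill', y='tip', hue='time', ax=axes[1])
axes[1].set_title('seaborn scatterplot()')

plt.tight_layout()
```

Both calls receive ax=, so they share one figure instead of each opening its own.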

7.1.2. Comparing Libraries#

Compare histogram APIs across pandas, Matplotlib, and seaborn:

pandas: DataFrame-oriented convenience
Matplotlib: lower-level direct control
seaborn: statistical defaults and cleaner styling

fig, ax = plt.subplots(1, 3, figsize=(10, 3))

data = tips[['total_bill', 'tip']]
print(data.head())

### Pandas histogram
data.plot(kind='hist', title='Pandas plot(kind="hist")', ax=ax[0])

### Matplotlib histogram
ax[1].hist(data, label=['total_bill', 'tip'])
ax[1].set_title('Matplotlib ax.hist()')
ax[1].legend()

### Seaborn histogram
sns.histplot(data=data, ax=ax[2])
ax[2].set_title('Seaborn histplot()')

plt.tight_layout()

   total_bill   tip
     16.99  1.01
     10.34  1.66
     21.01  3.50
     23.68  3.31
     24.59  3.61

../../_images/1899120e8598152e50f4c91022f2a4262f272e754b40ac8ec7bbf6c37189d51f.png

The histograms are different because:

Matplotlib and pandas use 10 bins by default, whereas Seaborn's default (bins='auto') chooses the bins with a rule based on the Sturges and Freedman-Diaconis (FD) estimators, which may produce more or fewer bins depending on the data distribution.
The Matplotlib bars are relatively narrow because we pass a 2-column DataFrame directly to ax.hist(), so Matplotlib draws the two columns side by side, with each bar taking about half of the bin width.
Seaborn applies a default alpha (transparency) so that overlapping histograms remain visible.
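To see how a fixed bin count and a rule-based bin count can differ, the sketch below uses NumPy's np.histogram_bin_edges() on synthetic data. It only approximates the idea; Seaborn may post-process the result (for example, cap the number of bins), so treat it as an illustration rather than Seaborn's exact internals.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=10.0, size=244)   # synthetic, right-skewed data

fixed_edges = np.histogram_bin_edges(values, bins=10)     # fixed count, as in Matplotlib/pandas
auto_edges = np.histogram_bin_edges(values, bins='auto')  # rule-based choice

print('fixed bins:', len(fixed_edges) - 1)
print('auto bins: ', len(auto_edges) - 1)
```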

The rule of thumb for choosing among the libraries: if you want a fast exploratory plot, Seaborn is more automatic; if you need precise control over bins, colors, and layout, Matplotlib gives you more flexibility.

7.1.3. Plotting Methods Example#

Let’s start with a few axes-level examples. This gives you a fast way to recognize seaborn syntax before diving into API details. Here we see three different plotting functions of seaborn: histplot(), scatterplot(), and boxplot().

Let’s take a look at the tips dataset.

# df.loc[row_label, column_label]

tips.loc[:, ['total_bill', 'tip', 'size', 'day', 'time', 'sex']].head()

	total_bill	tip	size	day	time	sex
0	16.99	1.01	2	Sun	Dinner	Female
1	10.34	1.66	3	Sun	Dinner	Male
2	21.01	3.50	3	Sun	Dinner	Male
3	23.68	3.31	2	Sun	Dinner	Male
4	24.59	3.61	4	Sun	Dinner	Female

Using Matplotlib's plt.subplots() function, you can place the plots side by side. The Seaborn calls look familiar, but the function names differ from Matplotlib's. Here one plot is created by Matplotlib and three by Seaborn.

fig, ax = plt.subplots(1, 4, figsize=(16, 3))

### Matplotlib
ax[0].hist(tips['total_bill'], bins=25, alpha=0.7)

### Seaborn
sns.histplot(data=tips, x='total_bill', bins=25, ax=ax[1])
ax[1].set_title('Histogram')

### Seaborn with hue
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='time', ax=ax[2])
ax[2].set_title('Scatter + hue')

### Seaborn categorical plot
sns.boxplot(data=tips, x='day', y='total_bill', ax=ax[3])
ax[3].set_title('Categorical boxplot')

plt.tight_layout()

../../_images/9d3550dc9cfa7780be41d14bfb75094deac8ceb6c12f9c4cda360fa5cb00d18a.png

7.2. Seaborn Data Model and Semantic Mappings#

Seaborn works best with tidy (long-form) data:

each column is a variable
each row is one observation

Common semantic mappings:

hue: map category/value to color
style: map category to marker or line style
size: map numeric values to marker size

Important distinction:

Use size=... to map from data
Use s=... for a constant marker size
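The difference between s= and size= can be checked by reading the marker sizes back from the drawn points. This is a small sketch; the DataFrame and its `party` column are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [2, 4, 3, 5],
    'party': [1, 2, 4, 6],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# s= : one constant marker area for every point
sns.scatterplot(data=df, x='x', y='y', s=100, ax=axes[0])

# size= : marker area derived from a column; sizes= sets the (min, max) range
sns.scatterplot(data=df, x='x', y='y', size='party', sizes=(30, 180), ax=axes[1])

constant_sizes = axes[0].collections[0].get_sizes()
mapped_sizes = axes[1].collections[0].get_sizes()
print(constant_sizes, mapped_sizes)
```

The first panel reports a single size value, while the second reports one size per row of the data.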

7.2.1. Data Formats#

Data can be structured in different ways. Understanding these formats is essential because most Python visualization and analysis libraries, including Seaborn and pandas, expect data in a specific format. The common data formats are tidy, wide, and nested.

Format	Best For
Tidy	Seaborn, statsmodels, most analysis
Wide	Excel, quick inspection, some ML models
Nested	Raw API/JSON data before processing

In practice, data often arrives in wide or nested format and should be converted to tidy format before visualization or analysis. For example, sns.barplot(data=df, x='category', y='value') assumes each row is one observation. If your DataFrame is wide (one column per category), there are no 'category' and 'value' columns for Seaborn to find, so the call fails or draws something unexpected. The practical point: if a Seaborn plot looks wrong or throws an error, first check whether your DataFrame is in tidy format; if it is not, reshape it with pd.melt().

7.2.2. Tidy (Long) Format#

In tidy data,

each row is one observation,
each column is one variable, and
each cell holds one value.

This is the preferred format for analysis and visualization.

import pandas as pd

tidy = pd.DataFrame({
    'student': ['Alice', 'Alice', 'Bob', 'Bob'],
    'subject': ['math', 'english', 'math', 'english'],
    'score':   [90, 85, 78, 88]
})
tidy

	student	subject	score
0	Alice	math	90
1	Alice	english	85
2	Bob	math	78
3	Bob	english	88

7.2.3. Wide Format#

In wide data,

each row represents one entity (here, one student), and
variables are spread across multiple columns.

This format is easier for humans to read at a glance.

wide = pd.DataFrame({
    'student': ['Alice', 'Bob'],
    'math':    [90, 78],
    'english': [85, 88]
})
wide

	student	math	english
0	Alice	90	85
1	Bob	78	88

7.2.4. Nested Format#

Nested data stores values inside dictionaries or lists, common when loading from JSON or APIs. It must be flattened before use.

nested = {
    'Alice': {'math': 90, 'english': 85},
    'Bob':   {'math': 78, 'english': 88}
}
nested

{'Alice': {'math': 90, 'english': 85}, 'Bob': {'math': 78, 'english': 88}}
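One possible way to flatten a nested dictionary like this is pd.DataFrame.from_dict() followed by melt(). This is a sketch of one approach, not the only one.

```python
import pandas as pd

nested = {
    'Alice': {'math': 90, 'english': 85},
    'Bob':   {'math': 78, 'english': 88},
}

# Outer keys become the row index, inner keys become columns (wide format)
wide = pd.DataFrame.from_dict(nested, orient='index')
wide = wide.rename_axis('student').reset_index()

# Wide -> tidy: one row per (student, subject) pair
tidy = wide.melt(id_vars='student', var_name='subject', value_name='score')
print(tidy)
```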

7.2.5. Converting Formats#

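Wide vs. tidy data errors: if Seaborn complains that it cannot find or interpret a column name (depending on the version, this may be a ValueError saying it could not interpret the value, or a KeyError: 'column_name'), check whether your DataFrame is in tidy format. Use pd.melt() to convert wide to tidy.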

students = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'gender':  ['F', 'M', 'F', 'M', 'F'],
    'grade':   [10, 10, 11, 11, 12],
    'math':    [90, 78, 85, 72, 95],
    'english': [85, 88, 80, 76, 92],
    'science': [92, 80, 88, 70, 96]
})
print("Wide DataFrame:")
print(students, "\n")

# Wide --> Tidy
tidy = pd.melt(students, 
               id_vars=['student', 'gender', 'grade'],
               var_name='subject', 
               value_name='score')
print("Tidy DataFrame:")
print(tidy)

# Tidy --> Wide
wide = tidy.pivot(index='student',
                  columns='subject', values='score')

# Nested --> Tidy
# After the transpose, outer keys (students) are rows and inner keys are columns
flat = pd.DataFrame(nested).T.reset_index()
flat.columns = ['student'] + list(flat.columns[1:])   # rename 'index' to 'student'
flat = pd.melt(flat, id_vars='student',
               var_name='subject', value_name='score')

Wide DataFrame:
  student gender  grade  math  english  science
 Alice      F     10    90       85       92
   Bob      M     10    78       88       80
 Carol      F     11    85       80       88
 David      M     11    72       76       70
   Eve      F     12    95       92       96 

Tidy DataFrame:
   student gender  grade  subject  score
  Alice      F     10     math     90
    Bob      M     10     math     78
  Carol      F     11     math     85
  David      M     11     math     72
    Eve      F     12     math     95
  Alice      F     10  english     85
    Bob      M     10  english     88
  Carol      F     11  english     80
  David      M     11  english     76
    Eve      F     12  english     92
 Alice      F     10  science     92
   Bob      M     10  science     80
 Carol      F     11  science     88
 David      M     11  science     70
   Eve      F     12  science     96
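A quick way to convince yourself that melt() and pivot() are inverse operations is a round trip on a small table. The sketch below uses a shortened, made-up version of the students data and keeps every identifier column in the pivot index so that nothing is lost.

```python
import pandas as pd

students = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Carol'],
    'grade':   [10, 10, 11],
    'math':    [90, 78, 85],
    'english': [85, 88, 80],
})

# Wide -> tidy
tidy = students.melt(id_vars=['student', 'grade'],
                     var_name='subject', value_name='score')

# Tidy -> wide, keeping every identifier column in the index
wide_again = (
    tidy.pivot(index=['student', 'grade'], columns='subject', values='score')
        .reset_index()
        .rename_axis(columns=None)   # drop the leftover 'subject' axis label
)

# Column order may differ after pivoting, so align before comparing
wide_again = wide_again[students.columns]
print(wide_again.equals(students))
```

Passing a list to index= in pivot() assumes a reasonably recent pandas version.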

7.3. Figure vs. Axes-Level APIs#

Seaborn has two common usage patterns:

Figure-level functions (e.g., displot, relplot, catplot) manage an entire figure/grid and return grid objects (FacetGrid, JointGrid, PairGrid).
Axes-level functions (e.g., histplot, scatterplot, lineplot, boxplot) draw on a specific Matplotlib ax and return an Axes. In other words, axes-level functions make self-contained plots.

Rule of thumb:

Use figure-level when faceting is the main goal.
Use axes-level for custom subplot composition.

The syntax for using seaborn is:

sns.PLOT-TYPE(data=df, x='col_a', y='col_b', hue='col_c')
      ↑           ↑         ↑          ↑          ↑
   plot type   DataFrame  x-axis     y-axis     color by

The tables below compare the two APIs and list common figure-level functions with their typical axes-level counterparts.

7.3.1. API Differences#

Aspect	Figure-level	Axes-level
Returns	`FacetGrid` · `JointGrid` · `PairGrid`	`Axes`
Faceting	Built-in via `row=`, `col=`, `col_wrap=`	Not available — use `plt.subplots()` + loop
Sizing	`height=` + `aspect=`	`plt.subplots(figsize=(w, h))`
Multiple panels	Automatic grid from data	`fig, axes = plt.subplots()` + pass `ax=`
Layering plots	✗ Hard	✓ Easy — multiple functions on same `ax`
When to use	Grouped/faceted views, small multiples	Custom layouts, overlaying, single-axes control

7.3.2. Function Reference#

Figure-level	Axes-level Functions	Typical use
`displot()`	`histplot()`, `kdeplot()`, `ecdfplot()`	Distributions, histograms, KDE, ECDF
`relplot()`	`scatterplot()`, `lineplot()`	Relationships and trends
`catplot()`	`stripplot()`, `swarmplot()`, `boxplot()`, `violinplot()`, `boxenplot()`, `barplot()`, `countplot()`, `pointplot()`	Categorical comparisons
`jointplot()`	`scatterplot()`, `kdeplot()`, `histplot()`, `regplot()`, `residplot()`, `kind="hex"` (Matplotlib hexbin)	Bivariate plots with marginals and/or regression
`pairplot()`	`scatterplot()`, `kdeplot()`, `histplot()`	Pairwise relationships across variables

7.4. Distribution Plots#

Distribution plots help you understand spread, skew, and shape of numerical variables.

7.4.1. Histogram#

A histogram groups values into bins and shows frequency in each bin. histplot() is an axes-level function that returns an Axes object for further customization.

ax = sns.histplot(data=tips, x='total_bill')    ### sns.PLOT-TYPE(data=df, x='col_a', y='col_b', hue='col_c')
ax.set_title('Histogram of Total Bill')
ax.set_xlabel('Total Bill ($)')
ax.set_ylabel('Frequency')  
ax.figure.set_size_inches(4, 3)
plt.tight_layout()

../../_images/610c2d74efc5bffce5ff4d93bf75e0b144458b84ba1dfe7ae2cdf4f02d80898c.png

To control the size, create the Axes with Matplotlib and pass it via ax=.
The bins= argument sets the number of bins and therefore the bar width. Here we switch from the default to 30 bins, which reveals more detail in the distribution.

fig, ax = plt.subplots(figsize=(4, 3))
sns.histplot(data=tips, x='total_bill', bins=30, ax=ax)

<Axes: xlabel='total_bill', ylabel='Count'>

../../_images/66e497e4f9f0df0be1fa1612d8092e8ca79a021b26fff457b2ad0ab91cef18f1.png

### Exercise: Histogram for Total Bill
#   1. Create a figure and axes with figsize=(5, 3).
#   2. Plot a histogram of tips['total_bill'] using sns.histplot.
#   3. Use bins=20.
#   4. Set the title to 'Total Bill Distribution'.
### Your code starts here.




### Your code ends here.

Text(0.5, 1.0, 'Total Bill Distribution')

../../_images/1f1157c64ea26154b5ca1dfc0512eae4b8e8468486b62dbe257149d937f273ef.png

7.4.1.1. Bins, Alpha, and KDE#

Three common controls:

bins: granularity of histogram bars
alpha: transparency
kde=True: overlay smooth density estimate

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
bins = [30, 10]
for i, col in enumerate(data.columns):
    sns.histplot(data[col], bins=bins[i], label=col, ax=axes[i])
    axes[i].set_title(f'{col} w/ sns.histplot()')
    axes[i].legend()
plt.tight_layout()

../../_images/b9e90f30e0a82e2f70a7f66be249bb2c5528ef67c289d701e4c343573ec439ed.png

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for i, col in enumerate(data.columns[:2]):
    sns.histplot(data[col], alpha=0.25, bins=30, ax=axes[i])
    axes[i].set_title(f'{col} alpha=0.25')
plt.tight_layout()

../../_images/32359d07b740353e73e3978728e48f387de20a0f2aadb9d794835260236d1c96.png

To overlay KDE, set keyword argument kde=True when plotting.

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for i, col in enumerate(data.columns):
    ### add kde=True for kernel density estimate overlay
    sns.histplot(data[col], alpha=0.5, bins=30, kde=True, ax=axes[i]) 
    axes[i].set_title(f'{col} + kde=True')
plt.tight_layout()

../../_images/22c4ca9a94c7580651d5474cea15a21bd5eca1e24ce54c4512bea8c77ed2d6a3.png

### Exercise: Compare Histogram Settings
#   1. Create fig, axes = plt.subplots(1, 2, figsize=(8, 3)).
#   2. On axes[0], plot tips['tip'] with bins=10 and title 'bins=10'.
#   3. On axes[1], plot tips['tip'] with bins=30, alpha=0.4, kde=True,
#      and title 'bins=30, alpha=0.4, kde=True'.
#   4. Call plt.tight_layout().
### Your code starts here.




### Your code ends here.

../../_images/0e93b174181a8dec553d5c8ee396c6beccef983cb3f9db0574446f9fbd4de124.png

7.4.2. KDE / Density#

KDE is a smooth estimate of a distribution. It is often useful alongside histograms.

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

sns.kdeplot(data=data, alpha=0.5, ax=axes[0])   ### wide-form input: each numeric column gets its own curve, so x/y are not needed
axes[0].set_title('KDE')

sns.kdeplot(data=data, alpha=0.5, ax=axes[1], fill=True)    ### add fill=True for filled KDE
axes[1].set_title('KDE w/ fill')

plt.tight_layout()

../../_images/51fe2ad97e1e904a9d671b42edcd3da75611511f8f349fd90dc9ecde232d5e1c.png

A 2D KDE is a smoothed counterpart of a scatter plot.

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

sns.kdeplot(data=data, x='total_bill', y='tip', ax=axes[0])  ### specify x and y for 2D KDE
axes[0].set_title('2D KDE')

sns.scatterplot(data=data, x='total_bill', y='tip', ax=axes[1])
axes[1].set_title('Scatter')

plt.tight_layout()

../../_images/32dc1420533879a382889ee0ef91f2934fa51e484b24130225f156a26c683054.png

Here we use a density plot as an example of how pandas syntax differs from Matplotlib and Seaborn:

Library	Function	Notes
Seaborn	`sns.kdeplot()`	most common, easy to use
Pandas	`.plot(kind="density")`	convenient for quick plots
Matplotlib + SciPy	`gaussian_kde()`	manual control over details
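To make the "manual control" row more concrete, here is a NumPy-only sketch of a one-dimensional Gaussian KDE. It illustrates the general idea, not what Seaborn or SciPy do internally; the helper name `gaussian_kde_1d`, the synthetic data, and the bandwidth rule of thumb are assumptions made for this example.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centered on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(42)
samples = rng.normal(loc=20.0, scale=8.0, size=200)   # synthetic data

# Rule-of-thumb bandwidth: sample std times n^(-1/5) (an assumption for this sketch)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(samples.min() - 4 * bandwidth, samples.max() + 4 * bandwidth, 400)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to roughly 1 (trapezoidal rule)
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is the same trade-off as choosing bins= for a histogram.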

### Exercise: KDE for Two Variables
#   1. Create fig, axes = plt.subplots(1, 2, figsize=(8, 3)).
#   2. On axes[0], draw sns.kdeplot for tips['total_bill'] with fill=True.
#   3. On axes[1], draw sns.kdeplot for tips['tip'] with fill=True.
#   4. Title the plots 'total_bill KDE' and 'tip KDE'.
#   5. Call plt.tight_layout().
### Your code starts here.




### Your code ends here.

../../_images/34d7122ad232daba3345099acf5d2b4715e252b5949fc712c01602afc9c19058.png

7.5. Relational Plots#

Relational plots show how variables move together (patterns, trends, clusters).

7.5.1. Scatter Plot#

sns.scatterplot(x='total_bill', y='tip', data=tips, s=100, legend=True)

<Axes: xlabel='total_bill', ylabel='tip'>

../../_images/9179a3d8cb37584af18ecc80f606f4c3d17abd7ec0394d504e3937409dfabf21.png

To control the figure size, create the Axes with plt.subplots() and pass it via ax=:

fig, ax = plt.subplots(figsize=(4, 3))
sns.scatterplot(x='total_bill', y='tip', data=tips, s=100, legend=True, ax=ax)

<Axes: xlabel='total_bill', ylabel='tip'>

../../_images/5f8f821ab0934b52101727d0d2abc24bb2f6594c5d54908b34b7af66f9093608.png

Add hue= to color the points by a categorical column:

fig, ax = plt.subplots(figsize=(4, 3))
sns.scatterplot(x='total_bill', y='tip', data=tips, s=100, legend=True, ax=ax, hue='sex')

<Axes: xlabel='total_bill', ylabel='tip'>

../../_images/90e791a7152d4610a9897463a89d22c5d069ca5958bc0a9422dfb814acc67172.png

Add size= and move the legend outside the plot using bbox_to_anchor= (bbox stands for bounding box). Because size= now maps marker size from the data, the constant s=100 is dropped.

fig, ax = plt.subplots(figsize=(4, 3))
sns.scatterplot(x='total_bill', y='tip', data=tips, legend=True, ax=ax, hue='sex', size='size')
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))  ### anchor the legend's upper-left corner just outside the axes

<matplotlib.legend.Legend at 0x11f057610>

../../_images/244357f16e47c87cebbdebafa5a967f46a2ecaabb1a8cdad5a14c7dc4c383100.png

### Exercise: Scatter with Semantic Mapping
#   1. Create a scatter plot of total_bill vs tip.
#   2. Color points by time (hue='time').
#   3. Use marker size mapping from the 'size' column with sizes=(30, 180).
#   4. Set the title to 'Tips: Bill vs Tip by Time and Party Size'.
### Your code starts here.




### Your code ends here.

Text(0.5, 1.0, 'Tips: Bill vs Tip by Time and Party Size')

../../_images/0a55575e8e220319af7116ab3a84c5da1a02e8c888f01a6acc5d7a5de384a766.png

7.5.2. Line Plot#

lineplot() is commonly used for trends across an ordered x-axis. When an x value repeats, Seaborn by default aggregates the y values (mean) and draws a 95% confidence interval around the estimate.

sns.lineplot(x='day', y='tip', data=tips, estimator='mean', errorbar=('ci', 95))

<Axes: xlabel='day', ylabel='tip'>

../../_images/bb2f6445e44b881fad34b37aeaa4c61c00e9ba08f6ea6fd592a0b2525190c1c4.png

### Exercise: Average Tip by Day
#   1. Create a line plot with x='day' and y='tip' using the tips dataset.
#   2. Use estimator='mean'.
#   3. Keep a 95% confidence interval using errorbar=('ci', 95).
#   4. Set the title to 'Average Tip by Day'.
### Your code starts here.




### Your code ends here.

Text(0.5, 1.0, 'Average Tip by Day')

../../_images/f099ba83ef2c00b7887c86803f1ada81231ad50d550b177dee98255ff1533ab7.png

7.6. Categorical Plots#

Categorical plots summarize or compare values across groups.

sns.catplot() is a figure-level wrapper for categorical plots (kind='box', 'violin', 'bar', 'count', etc.).

Use it when you want consistent faceting/layout behavior across categorical plot types.

7.6.1. Box Plot#

Note

The with sns.axes_style(...) block is a context manager that temporarily applies a style only to the plots created inside it. Once the block ends, Seaborn reverts to the previous theme — nothing is permanently changed.

with sns.axes_style(style='ticks'):
    # g = sns.catplot(data=tips, x='day', y='total_bill', kind='box')
    g = sns.catplot(data=tips, x='day', y='total_bill', hue='sex', kind='box')
    g.set_axis_labels('Day', 'Total Bill')

../../_images/9f61c1224da6588deab568bc7076015d98fc2f9505261ade8317881e1a9fccd1.png
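The temporary nature of the style can be confirmed by inspecting Matplotlib's rcParams before, inside, and after the with block. This is a small sketch; note that it calls sns.set_theme(style='whitegrid') first, which resets the global theme.

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='whitegrid')
before = plt.rcParams['axes.grid']

with sns.axes_style('ticks'):
    inside = plt.rcParams['axes.grid']   # the 'ticks' style turns the background grid off

after = plt.rcParams['axes.grid']        # restored when the block exits
print(before, inside, after)
```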

### Exercise: Box Plot by Day and Sex
#   1. Use sns.catplot with kind='box'.
#   2. Plot x='day', y='total_bill', and hue='sex' from tips.
#   3. Set axis labels to 'Day' and 'Total Bill'.
### Your code starts here.




### Your code ends here.

<seaborn.axisgrid.FacetGrid at 0x11e7c8e10>

../../_images/dc0950d30fb5fc4a49ce94f972b30a570fd827686b2022c26813ff4f3613441e.png

7.6.2. Bar Plot#

barplot() shows an estimate (the mean by default) with an uncertainty interval. This example also demonstrates two plotting techniques:

Saving the Axes: instead of calling plt.subplots(), we keep the Axes object that sns.barplot() returns and style it directly.
The with statement: as in the box plot example, sns.axes_style() is used as a context manager, so the style applies only to plots created inside the block. Once the block ends, Seaborn returns to the previous style.

with sns.axes_style(style='dark'):
    g = sns.barplot(data=tips, x='day', y='total_bill', errorbar=('ci', 95))
    g.set_xlabel('Day')
    g.set_ylabel('Total Bill')
    g.set_title('Average Total Bill by Day with 95% CI')

../../_images/b853f9ddb0600a66f6cdc9407fa8db820aa869f10d062da66f1edd8e785c085f.png

Add hue= and Seaborn splits each bar by group and builds the legend automatically:

with sns.axes_style(style='dark'):
    g = sns.catplot(data=tips, x='day', y='total_bill', hue='sex', kind='bar', errorbar=('ci', 95))
    g.set_axis_labels('Day', 'Total Bill')

../../_images/9d82cd7982b268bd4b6ce526a5b06d766118509ec1dba52898701fd05e6ceae0.png

### Exercise: Mean Total Bill by Day
#   1. Create a categorical bar plot with x='day' and y='total_bill'.
#   2. Use kind='bar' and errorbar=('ci', 95).
#   3. Set axis labels to 'Day' and 'Total Bill'.
### Your code starts here.




### Your code ends here.

<seaborn.axisgrid.FacetGrid at 0x11ed860d0>

../../_images/40dc470eb14e6012deda29597f14432fd43c297c62125973f630e6349a1c905f.png

7.6.3. Count Plot (`kind='count'`)#

Count plots show category frequencies. This is different from histograms, which bin continuous numeric values.

print(planets.head())

with sns.axes_style('white'):
    g = sns.countplot(
        data=planets, 
        x='year', 
        color='steelblue'
        )
    ticks = g.get_xticks()
    labels = [t.get_text() for t in g.get_xticklabels()]   # get labels before changing ticks

    g.set_xticks(ticks[::5])                    # set every 5th tick position
    g.set_xticklabels(labels[::5], rotation=45) # set every 5th label

### with figure level catplot:
# with sns.axes_style('white'):       ### with for temporary style
#     g = sns.catplot(
#         data=planets,
#         x='year',
#         aspect=2,           ### aspect ratio (width/height);
#         kind='count',
#         color='steelblue',
#         height=3
#     )

            method  number  orbital_period   mass  distance  year
Radial Velocity       1         269.300   7.10     77.40  2006
Radial Velocity       1         874.774   2.21     56.95  2008
Radial Velocity       1         763.000   2.60     19.84  2011
Radial Velocity       1         326.030  19.40    110.62  2007
Radial Velocity       1         516.220  10.50    119.47  2009

../../_images/7ef24dbc454d4c7b7206d30adcdefe31823f50f711e5d24019daf9b692f043ad.png
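A count plot draws the same numbers that value_counts() reports. The sketch below uses a tiny made-up `year` column to compare the two.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({'year': [2008, 2009, 2009, 2010, 2010, 2010, 2011]})

# Frequency of each category, computed with pandas
counts = df['year'].value_counts().sort_index()
print(counts)

fig, ax = plt.subplots(figsize=(4, 3))
sns.countplot(data=df, x='year', color='steelblue', ax=ax)

# Heights of the drawn bars (sorted, so the comparison ignores bar order)
bar_heights = sorted(p.get_height() for p in ax.patches)
print(bar_heights)
```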

### Exercise: Count Planet Discoveries by Year
#   1. Create a count plot using planets with x='year'.
#   2. Use kind='count', height=3, aspect=2, and color='steelblue'.
#   3. Set x tick labels to show every 5th year.
### Your code starts here.




### Your code ends here.

<seaborn.axisgrid.FacetGrid at 0x11e8f3c50>

../../_images/df8f5fc77f4bb566383a518a9370a85bb7c033b11c0e5a92cda30f19203936be.png

7.7. Figure-Level Wrappers and Faceting#

Use figure-level wrappers when you want seaborn to build panel grids directly from grouping variables.

These wrappers make small-multiple comparisons much easier (row=, col=, hue=, height=, aspect=).

7.7.1. `displot()`#

Here col= is set to time, which has two levels (Lunch and Dinner), so displot() automatically draws one panel per level.

tips.head(3)

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3

sns.displot(data=tips, x='total_bill', col='time', kde=True, height=3, aspect=1.2)

<seaborn.axisgrid.FacetGrid at 0x11df22c10>

../../_images/b89f9038d90a57e3c8cf45d2f980cfbb58f0c989e17da0165e2f62b0c58a07ff.png

sns.displot(data=tips, kind='kde', x='tip', col='time', height=3, aspect=1.2)

<seaborn.axisgrid.FacetGrid at 0x11e771e50>

../../_images/1fa5f49d416df4b0ab29a08075c95e2221ff38addcc1ab9f593a0ed1a742e7cc.png

7.7.2. `relplot()`#

relplot() is a figure-level Seaborn function for relational data.

Parameter	Value	Effect
`kind`	`'scatter'`	Uses scatter plots
`x`, `y`	`'total_bill'`, `'tip'`	Axes in each panel
`hue`	`'sex'`	Point color encodes sex
`col`	`'time'`	Separate columns for Lunch / Dinner
`row`	`'smoker'`	Separate rows for Yes / No smoker
`height`, `aspect`	`3`, `1`	Each facet is about 3×3 inches

sns.relplot(
    data=tips,
    x='total_bill', y='tip',
    hue='sex', col='time', row='smoker',
    kind='scatter', height=3, aspect=1
)

<seaborn.axisgrid.FacetGrid at 0x11f276c10>

../../_images/760759670984765cd576a4f730354a0dfdb3a2c02e6f1cc84c4596c759ac82f9.png

7.7.3. `catplot()`#

cat = sns.catplot(
    data=tips,
    x='day', y='total_bill', hue='sex', col='time',
    kind='box', height=3, aspect=1
)

type(cat)

seaborn.axisgrid.FacetGrid

../../_images/a32d4db21d320585b593a1dbdd1c0dfd6213b70452837393246afe8901a73d63.png

7.8. FacetGrid#

FacetGrid is the object returned by figure-level functions such as displot(), relplot(), and catplot(), and it is also a class you can instantiate directly.

After creating the FacetGrid, call its map() method to draw a plotting function on every facet.

tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']

grid = sns.FacetGrid(data=tips, row='sex', col='time', margin_titles=True)
grid.map(plt.hist, 'tip_pct', bins=np.linspace(0, 40, 15))

<seaborn.axisgrid.FacetGrid at 0x11f3a2e90>

../../_images/6ef2071c1d1703323db3c528df21c7b220fc4cb7e8c4d07bd3a1f6b67a280779.png

When to use FacetGrid vs. figure-level functions:

Use FacetGrid when you need:

Custom functions beyond Seaborn’s built-in plots
Multi-step plot construction (e.g., grid.map() multiple times)
Fine control over individual facets

For standard faceting, displot(), relplot(), and catplot() are simpler and recommended.

### Exercise: Faceted Tip Percentage Histograms
#   1. Create tips['tip_pct'] as 100 * tip / total_bill.
#   2. Build a FacetGrid with row='sex' and col='time'.
#   3. Map plt.hist on 'tip_pct' using bins=np.linspace(0, 40, 15).
### Your code starts here.




### Your code ends here.

<seaborn.axisgrid.FacetGrid at 0x11f5bc550>

7.9. Multivariate Views#

These plots help you inspect relationships among multiple variables at once.

7.9.1. Pair Plots#

sns.pairplot(tips)

<seaborn.axisgrid.PairGrid at 0x11dd82660>

../../_images/5b116443bde55ad65bf0ea20e63922b0552ac6942459a72c3575f3ca1bd3524c.png

sns.pairplot(tips, hue='sex', palette='coolwarm')

<seaborn.axisgrid.PairGrid at 0x11fedead0>

../../_images/89a1a39973825350853f82ca9d2040a570f791478c332fb1ab7109397b25b021.png

sns.pairplot(iris, hue='species', height=1.5)

<seaborn.axisgrid.PairGrid at 0x120569310>

../../_images/bd187b397096ee751430a97b9960ff3f82e0ab699bcd0bb378371265af414e8a.png

### Exercise: Pairplot with Group Coloring
#   1. Create a pairplot using the iris dataset.
#   2. Color points by species using hue='species'.
#   3. Set height=1.5 for each subplot.
### Your code starts here.




### Your code ends here.

<seaborn.axisgrid.PairGrid at 0x11de06520>

7.9.2. Joint Plots#

### you may be tempted to do this, but it won't work because sns.jointplot creates its own figure and axes
# fig, ax = plt.subplots(1, 2, figsize=(8, 3))
# sns.jointplot(data=tips, x='total_bill', y='tip', kind='scatter', ax=ax[0])
# sns.jointplot(data=tips, x='total_bill', y='tip', kind='kde', fill=True, ax=ax[1])
###

sns.jointplot(data=tips, x='total_bill', y='tip', kind='scatter', height=4)
sns.jointplot(data=tips, x='total_bill', y='tip', kind='kde', fill=True, height=4)

<seaborn.axisgrid.JointGrid at 0x120f1a350>

../../_images/2ccd3d96bad540b0272125058e50005ef7b3209c32c3a250b6e6458a8a222e5f.png

../../_images/0a8bcc976bb391bb8a913eef8566265fd05f8afc2334fcf2122bc3622dc14ef7.png

with sns.axes_style('white'):
    sns.jointplot(data=tips, kind='hex', x='total_bill', y='tip', height=4)

../../_images/1ac648404e0c8f0aaed68c846f19b2ccfcad6819ff6db10074545b1d0b306a8e.png

sns.jointplot(x='total_bill', y='tip', data=tips, kind='reg')

<seaborn.axisgrid.JointGrid at 0x11de07360>

../../_images/da433206069b7a68ec8ed6ea4f38a4e8e806d073a0c0037aaefca24c60bb04ad.png

### Exercise: Joint Plot with Regression
#   1. Create a joint plot of total_bill vs tip from tips.
#   2. Use kind='reg'.
#   3. Keep default marginal distributions.
### Your code starts here.




### Your code ends here.

<seaborn.axisgrid.JointGrid at 0x11de06b10>

../../_images/61a1b733272907cfd08ec99a15a415f340dcee13f685d39f23c8411025898c42.png

7.9.3. Heatmap (Common EDA Pattern)#

Heatmaps are useful for compact matrix-style summaries, such as correlation matrices.

corr = tips[['total_bill', 'tip', 'size']].corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(corr, annot=True, cmap='Blues', vmin=-1, vmax=1, ax=ax)
ax.set_title('Correlation Heatmap')

Text(0.5, 1.0, 'Correlation Heatmap')

../../_images/2671714270112b2f50b408a2a9c1c103a0bb15af6a44b0dafa2a32410ea7da3e.png

### Exercise: Correlation Heatmap
#   1. Compute a correlation matrix from ['total_bill', 'tip', 'size'].
#   2. Plot a heatmap with annot=True and cmap='Blues'.
#   3. Set vmin=-1 and vmax=1.
#   4. Add title 'Tips Correlation Heatmap'.
### Your code starts here.




### Your code ends here.

Text(0.5, 1.0, 'Tips Correlation Heatmap')

../../_images/cce4508ce4054b41ff4378286704a7761f989366e97d57dbb6e3debc4735be87.png

7.10. Themes and Global Style#

Seaborn theme settings apply globally in the notebook and help keep plots visually consistent.

sns.set_theme(style='ticks', context='talk', palette='deep')
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='time')

<Axes: xlabel='total_bill', ylabel='tip'>

../../_images/6bd38b0747cb5d2ab0d8f0352bc412f5fca117e2d1a641aadf75bf3140259fcd.png

# Reset to a moderate default for the rest of your work
sns.set_theme(style='whitegrid', context='notebook')

To see what the sns.palettes module exposes (the named palettes themselves are stored in SEABORN_PALETTES):

print(dir(sns.palettes))

['MPL_QUAL_PALS', 'QUAL_PALETTES', 'QUAL_PALETTE_SIZES', 'SEABORN_PALETTES', '_ColorPalette', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_color_to_rgb', '_parse_cubehelix_args', '_patch_colormap_display', 'blend_palette', 'color_palette', 'colorsys', 'crayon_palette', 'crayons', 'cubehelix_palette', 'cycle', 'dark_palette', 'desaturate', 'diverging_palette', 'get_color_cycle', 'get_colormap', 'hls_palette', 'husl', 'husl_palette', 'light_palette', 'mpl', 'mpl_palette', 'np', 'set_color_codes', 'xkcd_palette', 'xkcd_rgb']

The commonly used built-in options are:

Qualitative (good for categories)

'deep', 'muted', 'pastel', 'bright', 'dark', 'colorblind'

Sequential

'Blues', 'Greens', 'Reds', 'Oranges', 'Purples', 'Greys'

Diverging

'coolwarm', 'RdBu', 'BrBG', 'PiYG'

Perceptually uniform

'viridis', 'plasma', 'magma', 'inferno', 'cividis'
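Any of these names can be passed to sns.color_palette() to get concrete colors. A short sketch:

```python
import seaborn as sns

# First three colors of a qualitative palette, as RGB tuples in [0, 1]
deep3 = sns.color_palette('deep', 3)
print(deep3.as_hex())

# A Matplotlib colormap name sampled into 5 discrete colors
blues5 = sns.color_palette('Blues', 5)
print(len(blues5))

# Names of the built-in Seaborn qualitative palettes
names = sorted(sns.palettes.SEABORN_PALETTES)
print(names)
```

The same names are accepted by palette= in plotting functions and in sns.set_theme().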

To browse the built-in palettes:

# Build a single palette; as the last expression in a Jupyter cell it renders as a color swatch
sns.color_palette('deep')
# sns.color_palette('muted')

# Dict of all built-in seaborn palettes (name -> list of hex colors)
sns.palettes.SEABORN_PALETTES

{'deep': ['#4C72B0',
  '#DD8452',
  '#55A868',
  '#C44E52',
  '#8172B3',
  '#937860',
  '#DA8BC3',
  '#8C8C8C',
  '#CCB974',
  '#64B5CD'],
 'deep6': ['#4C72B0', '#55A868', '#C44E52', '#8172B3', '#CCB974', '#64B5CD'],
 'muted': ['#4878D0',
  '#EE854A',
  '#6ACC64',
  '#D65F5F',
  '#956CB4',
  '#8C613C',
  '#DC7EC0',
  '#797979',
  '#D5BB67',
  '#82C6E2'],
 'muted6': ['#4878D0', '#6ACC64', '#D65F5F', '#956CB4', '#D5BB67', '#82C6E2'],
 'pastel': ['#A1C9F4',
  '#FFB482',
  '#8DE5A1',
  '#FF9F9B',
  '#D0BBFF',
  '#DEBB9B',
  '#FAB0E4',
  '#CFCFCF',
  '#FFFEA3',
  '#B9F2F0'],
 'pastel6': ['#A1C9F4', '#8DE5A1', '#FF9F9B', '#D0BBFF', '#FFFEA3', '#B9F2F0'],
 'bright': ['#023EFF',
  '#FF7C00',
  '#1AC938',
  '#E8000B',
  '#8B2BE2',
  '#9F4800',
  '#F14CC1',
  '#A3A3A3',
  '#FFC400',
  '#00D7FF'],
 'bright6': ['#023EFF', '#1AC938', '#E8000B', '#8B2BE2', '#FFC400', '#00D7FF'],
 'dark': ['#001C7F',
  '#B1400D',
  '#12711C',
  '#8C0800',
  '#591E71',
  '#592F0D',
  '#A23582',
  '#3C3C3C',
  '#B8850A',
  '#006374'],
 'dark6': ['#001C7F', '#12711C', '#8C0800', '#591E71', '#B8850A', '#006374'],
 'colorblind': ['#0173B2',
  '#DE8F05',
  '#029E73',
  '#D55E00',
  '#CC78BC',
  '#CA9161',
  '#FBAFE4',
  '#949494',
  '#ECE133',
  '#56B4E9'],
 'colorblind6': ['#0173B2',
  '#029E73',
  '#D55E00',
  '#CC78BC',
  '#ECE133',
  '#56B4E9']}

# Matplotlib colormap names can be used as palette names too
import matplotlib.pyplot as plt
plt.colormaps()  # lists all registered colormap names, including those seaborn adds (e.g. 'rocket')

['magma',
 'inferno',
 'plasma',
 'viridis',
 'cividis',
 'twilight',
 'twilight_shifted',
 'turbo',
 'berlin',
 'managua',
 'vanimo',
 'Blues',
 'BrBG',
 'BuGn',
 'BuPu',
 'CMRmap',
 'GnBu',
 'Greens',
 'Greys',
 'OrRd',
 'Oranges',
 'PRGn',
 'PiYG',
 'PuBu',
 'PuBuGn',
 'PuOr',
 'PuRd',
 'Purples',
 'RdBu',
 'RdGy',
 'RdPu',
 'RdYlBu',
 'RdYlGn',
 'Reds',
 'Spectral',
 'Wistia',
 'YlGn',
 'YlGnBu',
 'YlOrBr',
 'YlOrRd',
 'afmhot',
 'autumn',
 'binary',
 'bone',
 'brg',
 'bwr',
 'cool',
 'coolwarm',
 'copper',
 'cubehelix',
 'flag',
 'gist_earth',
 'gist_gray',
 'gist_heat',
 'gist_ncar',
 'gist_rainbow',
 'gist_stern',
 'gist_yarg',
 'gnuplot',
 'gnuplot2',
 'gray',
 'hot',
 'hsv',
 'jet',
 'nipy_spectral',
 'ocean',
 'pink',
 'prism',
 'rainbow',
 'seismic',
 'spring',
 'summer',
 'terrain',
 'winter',
 'Accent',
 'Dark2',
 'Paired',
 'Pastel1',
 'Pastel2',
 'Set1',
 'Set2',
 'Set3',
 'tab10',
 'tab20',
 'tab20b',
 'tab20c',
 'grey',
 'gist_grey',
 'gist_yerg',
 'Grays',
 'magma_r',
 'inferno_r',
 'plasma_r',
 'viridis_r',
 'cividis_r',
 'twilight_r',
 'twilight_shifted_r',
 'turbo_r',
 'berlin_r',
 'managua_r',
 'vanimo_r',
 'Blues_r',
 'BrBG_r',
 'BuGn_r',
 'BuPu_r',
 'CMRmap_r',
 'GnBu_r',
 'Greens_r',
 'Greys_r',
 'OrRd_r',
 'Oranges_r',
 'PRGn_r',
 'PiYG_r',
 'PuBu_r',
 'PuBuGn_r',
 'PuOr_r',
 'PuRd_r',
 'Purples_r',
 'RdBu_r',
 'RdGy_r',
 'RdPu_r',
 'RdYlBu_r',
 'RdYlGn_r',
 'Reds_r',
 'Spectral_r',
 'Wistia_r',
 'YlGn_r',
 'YlGnBu_r',
 'YlOrBr_r',
 'YlOrRd_r',
 'afmhot_r',
 'autumn_r',
 'binary_r',
 'bone_r',
 'brg_r',
 'bwr_r',
 'cool_r',
 'coolwarm_r',
 'copper_r',
 'cubehelix_r',
 'flag_r',
 'gist_earth_r',
 'gist_gray_r',
 'gist_heat_r',
 'gist_ncar_r',
 'gist_rainbow_r',
 'gist_stern_r',
 'gist_yarg_r',
 'gnuplot_r',
 'gnuplot2_r',
 'gray_r',
 'hot_r',
 'hsv_r',
 'jet_r',
 'nipy_spectral_r',
 'ocean_r',
 'pink_r',
 'prism_r',
 'rainbow_r',
 'seismic_r',
 'spring_r',
 'summer_r',
 'terrain_r',
 'winter_r',
 'Accent_r',
 'Dark2_r',
 'Paired_r',
 'Pastel1_r',
 'Pastel2_r',
 'Set1_r',
 'Set2_r',
 'Set3_r',
 'tab10_r',
 'tab20_r',
 'tab20b_r',
 'tab20c_r',
 'grey_r',
 'gist_grey_r',
 'gist_yerg_r',
 'Grays_r',
 'rocket',
 'rocket_r',
 'mako',
 'mako_r',
 'icefire',
 'icefire_r',
 'vlag',
 'vlag_r',
 'flare',
 'flare_r',
 'crest',
 'crest_r']

7.11. Why KDE Looks Smooth#

This section is optional and focuses on intuition for KDE construction.

rugplot marks each observation on one axis. A KDE can be viewed as the sum of many smooth kernels centered on those observations.

sns.set_theme(palette='viridis')

ax = sns.rugplot(tips['total_bill'])
ax.set_title('Rug Plot of Total Bill')
ax.set_xlabel('Total Bill ($)')
ax.set_ylabel('Frequency')

ax.figure.set_size_inches(4, 3)
plt.tight_layout()

../../_images/5762067f11c742d1d555b4bb4b781aa2a26bbd5f6430634b227f2528ccb37d25.png

# Don't worry about understanding this code in depth.
# It visualizes how summing basis functions forms a KDE-like curve.
from scipy import stats

# Create dataset
np.random.seed(42)
dataset = np.random.randn(25)

ax = sns.rugplot(dataset)

x_min = dataset.min() - 2
x_max = dataset.max() + 2
x_axis = np.linspace(x_min, x_max, 100)

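# Normal-reference rule-of-thumb bandwidth: (4 * sigma^5 / (3 * n))^(1/5)
bandwidth = ((4 * dataset.std()**5) / (3 * len(dataset)))**.2

kernel_list = []
for data_point in dataset:
    # One Gaussian kernel centered on this observation
    kernel = stats.norm(data_point, bandwidth).pdf(x_axis)
    kernel_list.append(kernel)

    # Rescale for display only (the unscaled kernel stays in kernel_list)
    kernel = kernel / kernel.max()
    kernel = kernel * .4
    plt.plot(x_axis, kernel, color='grey', alpha=0.5)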

plt.ylim(0, 1)

ax.figure.set_size_inches(4, 3)
plt.tight_layout()

../../_images/4698da5a3dc9f18e7370bf9b198330e73a28dcae9992de47b00c75cf24c3a56d.png

sum_of_kde = np.sum(kernel_list, axis=0)
plt.figure(figsize=(4, 3))
plt.plot(x_axis, sum_of_kde)
sns.rugplot(dataset)
plt.yticks([])
plt.suptitle('Sum of the Basis Functions');

../../_images/48ee169274f348c16ae4abc4ea32cb62192d8e749fb6b1d21702189fe0c09d42.png
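
The summed curve has the right shape, but it is not yet a probability density: the area under the sum of n kernels is roughly n, not 1. The sketch below is a NumPy-only illustration, assuming Gaussian kernels and the same bandwidth formula as the cell above. It divides the sum by the number of observations and checks that the area is close to 1. The helper `gaussian_pdf` and the variable names are defined here for illustration; this is not how seaborn implements `kdeplot` internally.

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.standard_normal(25)

# Same rule-of-thumb bandwidth formula as in the cell above
bandwidth = ((4 * sample.std() ** 5) / (3 * len(sample))) ** 0.2

grid = np.linspace(sample.min() - 2, sample.max() + 2, 400)

def gaussian_pdf(x, center, width):
    """Normal density with mean `center` and standard deviation `width`."""
    z = (x - center) / width
    return np.exp(-0.5 * z ** 2) / (width * np.sqrt(2 * np.pi))

# One kernel per observation: shape (25, 400)
kernels = np.array([gaussian_pdf(grid, point, bandwidth) for point in sample])

# Averaging (sum divided by n) turns the summed curve into a density
density = kernels.sum(axis=0) / len(sample)

# Riemann-sum approximation of the area under the curve
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Each kernel integrates to 1 on its own, so the average of n kernels also integrates to 1, up to the small tail mass that falls outside the grid.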

7.12. Styling#

sns.set_theme(style='whitegrid', context='notebook')

7.12.1. Size#

The size= and s= parameters do different things.

size= is a semantic mapping, not a fixed marker size. For example, size='size' scales each marker by the value in the data column named size.
s= sets one constant marker size for all points.

sns.scatterplot(data=tips, x='total_bill', y='tip', s=100)

sns.scatterplot(data=tips, x='total_bill', y='tip', size='size')

fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# constant size points
sns.scatterplot(data=tips, x='total_bill', y='tip', s=80, ax=ax[0])
ax[0].set_title('Constant size (s=80)')

# size mapped from a real numeric column in the dataframe
sns.scatterplot(
    data=tips, x='total_bill', y='tip',
    hue='time', style='sex', size='size', sizes=(20, 220),
    ax=ax[1]
)
ax[1].set_title('Mapped hue/style/size')
ax[1].legend(loc='upper left', bbox_to_anchor=(1, 1))  # anchor the legend's upper-left corner just outside the plot

plt.tight_layout()

../../_images/affed018dead9c3c8516c9a3c437306f000340b6465279099cd08feaf3a8f147.png

7.12.2. Alpha#

When Seaborn’s histplot() layers several distributions on one Axes (its default, multiple='layer'), it draws the bars semi-transparently by default, so overlapping series stay visible. Pandas and Matplotlib do not do this for you. In the comparison below, alpha=0.5 is passed explicitly to the Pandas and Matplotlib calls, while histplot() needs no extra argument.

Pandas plot(kind="hist"): overlapping series are drawn opaque unless you set alpha= yourself
Matplotlib ax.hist(): no automatic transparency either; set alpha= when histograms overlap
Seaborn histplot(): layered series are semi-transparent by default

data = tips[['total_bill', 'tip']]

fig, ax = plt.subplots(1, 3, figsize=(10, 3))

### Pandas histogram
data.plot(ax=ax[0], kind='hist', title='Pandas plot(kind="hist")', alpha=0.5)

### Matplotlib histogram
ax[1].hist(data, label=['total_bill', 'tip'], alpha=0.5)
ax[1].set_title('Matplotlib ax.hist()')
ax[1].legend()

### Seaborn histogram
sns.histplot(data=data, ax=ax[2])
ax[2].set_title('Seaborn histplot()')

plt.tight_layout()

../../_images/78bd47599799f04ae541becb0b0b2d1f5477bd8b6c319190dbfb7b5d8367f936.png

7.12.3. Overlay#

When combining multiple plot types, axes-level functions give you full control because each one draws on the Axes you pass with ax=. Figure-level functions create their own figure, which makes layering harder. In the example below, all three layers are drawn on the same Matplotlib Axes object (ax).

ax.axhline() adds a horizontal line spanning all or part of the Axes width (see matplotlib.pyplot.axhline). The basic syntax of axhline() is:

ax.axhline(y=0, xmin=0, xmax=1, **kwargs)

Commonly used parameters for ax.axhline() include:

Parameter	Meaning
`y`	y-value where the horizontal line is drawn
`xmin`	start of the line (fraction of axis width, 0–1)
`xmax`	end of the line (fraction of axis width, 0–1)
`**kwargs`	styling options like `color`, `linestyle`, `linewidth`
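
The xmin and xmax fractions from the table can be sketched as follows. This is a minimal Matplotlib-only illustration with made-up data; the names `full` and `partial` are just labels for the two returned lines.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 10], [0, 4])

# Full-width line (defaults: xmin=0, xmax=1)
full = ax.axhline(y=1.0, color='grey', linestyle=':')

# Line across the middle half of the Axes only; xmin/xmax are
# fractions of the Axes width, not data coordinates
partial = ax.axhline(y=2.0, xmin=0.25, xmax=0.75, color='red', linewidth=2)
```

Because xmin and xmax are fractions of the Axes width, the partial line keeps covering the middle half even if the x-axis limits change later.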

fig, ax = plt.subplots()
sns.scatterplot(data=tips, x='total_bill', y='tip', ax=ax)
sns.rugplot(data=tips, x='total_bill', ax=ax)                   # Add rug marks
ax.axhline(y=tips['tip'].mean(), color='red', linestyle='--')   # Add reference line at the mean tip

<matplotlib.lines.Line2D at 0x12182f4d0>

../../_images/7ee181db8e7c9e6d436677d3008be54cbbbab80aa3ff1f52081f2327805b303c.png
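
Layering on a figure-level plot is still possible, just less direct. The sketch below assumes a small made-up DataFrame (`demo`) in place of tips. `relplot()` returns a FacetGrid, and when no row= or col= faceting is used, its single Axes is available as `g.ax`.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the tips dataset
demo = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 40.0],
    'tip': [1.5, 3.0, 4.0, 6.5],
})

# Figure-level call: seaborn creates the figure and returns a FacetGrid
g = sns.relplot(data=demo, x='total_bill', y='tip', height=3)

# With no row=/col= faceting, the single Axes is available as g.ax
g.ax.axhline(y=demo['tip'].mean(), color='red', linestyle='--')
```

With faceting, the grid holds several Axes instead, so a reference line would need to be added to each one.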