4.1. Pandas Series#


import sys
from pathlib import Path

current = Path.cwd()
for parent in [current, *current.parents]:
    if (parent / '_config.yml').exists():
        project_root = parent  # ← Add project root, not chapters
        break
else:
    project_root = Path.cwd().parent.parent

sys.path.insert(0, str(project_root))

from shared import thinkpython, diagram, jupyturtle

The Pandas Series is a one-dimensional labeled array that can hold any data type. It’s the primary building block of Pandas and serves as the foundation for the more complex DataFrame structure.

Key characteristics of Pandas Series include:

  • Labeled index: Each element has an associated label (index)

  • Heterogeneous data types: Can hold mixed data types (unlike NumPy arrays, which are homogeneous)

  • Built on NumPy: Internally uses NumPy arrays for efficient storage

  • Flexible creation: Can be created from lists, arrays, dictionaries, and more

When to use Series:

  • Labeled data where the index has meaning (e.g., names, IDs)

  • A single column of data extracted from a DataFrame

  • Lookup tables or mapping data (e.g., mapping codes to values)

  • Time series data with datetime indexing

  • When you want to perform vectorized operations on 1D data with index alignment

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a NumPy array from a Series is that a Series can have labels, meaning the elements can be indexed by labels instead of just numerical positions, which helps a lot when performing data analysis. Additionally, while NumPy arrays are designed for homogeneous numeric data, a Series can hold any arbitrary Python object.
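To make the contrast concrete, here is a small sketch (the fruit labels are just an illustration) comparing positional access on a NumPy array with label-based access on a Series:

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])            ### homogeneous, positional access only
ser = pd.Series([10, 20, 30], index=['Apple', 'Banana', 'Cherry'])

print(arr[1])           ### position 1 -> 20
print(ser['Banana'])    ### label-based -> 20
print(ser.iloc[1])      ### positional access still works -> 20

### a Series can hold arbitrary Python objects
mixed = pd.Series([42, 'text', [1, 2]])
print(mixed.dtype)      ### object
```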

# %pip install pandas

import numpy as np
import pandas as pd

4.1.1. Creating Series#

A Pandas Series can be created by loading data from existing storage such as dataset files and SQL data sources, or directly from Python objects like lists, dictionaries, or even single scalar values.

When creating a Series:

  • We commonly pass the data as an argument to the data parameter, or supply both the data and index parameters.

  • Automatic indexing: You can feed a sequence object (e.g., list and ndarray) to create a Pandas Series, and the default integer index (0, 1, 2, …) will be automatically created, unless you supply an index.

    • dict: Passing a dictionary uses the dictionary keys as the Series index and the values as the data.

  • Custom index: You can provide custom labels via the index argument to make your Series more descriptive and easier to work with.

  • dtype: Pandas infers the dtype automatically; specify dtype (dtype=) explicitly if you need a particular type.

  • By specifying parameter names explicitly, you can pass arguments in any order, making your code more readable and less error-prone.

    • List comprehensions can be used inline when building a Series from a generated sequence.

  • (Note that NaN, Not a Number, is a float value in pandas and NumPy.)

To create a Pandas Series, the syntax is:

pd.Series(data=None, index=None, dtype=None, name=None, copy=None)

where the parameters are

| Parameter | Type | Default | Description | Example |
|-----------|------|---------|-------------|---------|
| `data` | array-like, Iterable, dict, or scalar | `None` | The values for the Series. Can be a list, array, dictionary, or single value. | `pd.Series([1, 2, 3])` |
| `index` | array-like or Index | `None` | Labels for each element. If not provided, defaults to a RangeIndex (0, 1, 2, …). | `pd.Series([1, 2], index=['a', 'b'])` |
| `dtype` | str, numpy.dtype, or ExtensionDtype | `None` | Data type to force. If not specified, it will be inferred from the data. | `pd.Series([1, 2], dtype='float64')` |
| `name` | str | `None` | The name of the Series (useful when converting to a DataFrame). | `pd.Series([1, 2], name='MyColumn')` |
| `copy` | bool | `False` | Whether to copy the input data. If False, data is referenced (not copied). | `pd.Series(data, copy=True)` |

A simple example of creating a Pandas Series is:

### Create a simple Series from a list

ser = pd.Series([1, 3, 5, np.nan, 6, 8])
ser
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
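The name and dtype parameters from the table can be combined in one call; a small sketch (the name 'price' is just an illustration):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'],
              dtype='float64', name='price')
print(s.name)     ### price
print(s.dtype)    ### float64
```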

4.1.1.1. From Sequences#

From the above evaluation outcome, you notice that each element now has a corresponding index — like a list’s positions, but explicit. Usually, you would want to supply labels to make the data more descriptive by setting index. The index is the set of labels that uniquely identifies each element in a Series (or row in a DataFrame), enabling intuitive label-based access and automatic alignment of data across objects.

Now let’s prepare the following data for more Series creation:

labels = ['Apple','Banana','Cherry']          ### a string list
lst = [10,20,30]                ### a numerical list
arr = np.array(lst)             ### a numpy array
dict = {'Apple':10,'Banana':20,'Cherry':30}   ### a dictionary of key-value pairs (note: the name shadows the built-in dict; avoid this in real code)

### note that we use tuple unpacking to loop over name-object pairs
for name, obj in [('labels', labels), ('lst', lst), ('arr', arr), ('dict', dict)]:
    print(f"{name}\t: {obj}")
labels	: ['Apple', 'Banana', 'Cherry']
lst	: [10, 20, 30]
arr	: [10 20 30]
dict	: {'Apple': 10, 'Banana': 20, 'Cherry': 30}

See below for creating a Series by passing list data as an argument to the data parameter, then with both the data and index parameters, and finally passing the same objects positionally without parameter names. These examples show how to use the Series constructor to create Series from Python and NumPy objects, along with the Series parameters.

Observe

  • How the Series() constructor is used to create a Pandas Series.

  • How the parameters are used in these examples.

print(pd.Series(data=lst), "\n")                ### create Series by sending lst as argument with parameter data
print(pd.Series(data=lst,index=labels), "\n")   ### add labels: index
print(pd.Series(lst, labels), "\n")             ### same as above, without parameter names

print(pd.Series(arr), "\n")
print(pd.Series(arr, index=labels)) 
0    10
1    20
2    30
dtype: int64 

Apple     10
Banana    20
Cherry    30
dtype: int64 

Apple     10
Banana    20
Cherry    30
dtype: int64 

0    10
1    20
2    30
dtype: int64 

Apple     10
Banana    20
Cherry    30
dtype: int64
### EXERCISE: Create Series from a List
# You have three lists representing student names, their scores, and grades:

names = ['Alice', 'Bob', 'Charlie', 'David']
scores = [95, 87, 92, 78]
grades = ['A', 'B', 'A', 'C']

# 1. Create a Series from the scores list with names as the index
# 2. Create a Series from a NumPy array of scores with names as index
# 3. Print both Series to compare

### Your code starts here:




### Your code ends here.


# Solution
names = ['Alice', 'Bob', 'Charlie', 'David']
scores = [95, 87, 92, 78]
grades = ['A', 'B', 'A', 'C']

# 1. Create Series from list
scores_series = pd.Series(data=scores, index=names)
print("Series from list:")
print(scores_series)
print()

# 2. Create Series from NumPy array
scores_array = np.array(scores)
scores_from_array = pd.Series(scores_array, index=names)
print("Series from NumPy array:")
print(scores_from_array)
print()

# 3. Verify they're the same 
print("Are they equal?", scores_series.equals(scores_from_array))
Series from list:
Alice      95
Bob        87
Charlie    92
David      78
dtype: int64

Series from NumPy array:
Alice      95
Bob        87
Charlie    92
David      78
dtype: int64

Are they equal? True

4.1.1.2. From Python dictionary#

Since a Python dictionary maps keys to values, passing it to pd.Series will use the dictionary keys as the Series index and the corresponding values as the Series data — a convenient way to create a labeled Series from key–value pairs.

### key-value pairs

pd.Series(dict)
Apple     10
Banana    20
Cherry    30
dtype: int64
### EXERCISE: Create Series from Dictionary
# You have product inventory data stored as a dictionary:

inventory = {
    'Laptop': 15,
    'Mouse': 45,
    'Keyboard': 30,
    'Monitor': 12,
    'Headset': 25
}

# 1. Create a Series from the inventory dictionary
# 2. Create another Series but only for items: ['Laptop', 'Monitor', 'Headset']
# 3. What happens to items not in the dictionary when you specify them in the index?

### Your code starts here:




### Your code ends here.


# Solution
inventory = {
    'Laptop': 15,
    'Mouse': 45,
    'Keyboard': 30,
    'Monitor': 12,
    'Headset': 25
}

# 1. Create Series from dictionary
inventory_series = pd.Series(inventory)
print("Full inventory:")
print(inventory_series)
print()

# 2. Series with specific items
selected_items = pd.Series(inventory, index=['Laptop', 'Monitor', 'Headset'])
print("Selected items:")
print(selected_items)
print()

# 3. What if we request an item not in dictionary?
with_missing = pd.Series(inventory, index=['Laptop', 'Monitor', 'Tablet'])
print("With missing item 'Tablet':")
print(with_missing)
print("\nNote: Missing items get NaN value")
Full inventory:
Laptop      15
Mouse       45
Keyboard    30
Monitor     12
Headset     25
dtype: int64

Selected items:
Laptop     15
Monitor    12
Headset    25
dtype: int64

With missing item 'Tablet':
Laptop     15.0
Monitor    12.0
Tablet      NaN
dtype: float64

Note: Missing items get NaN value

4.1.1.3. Using List Comprehension#

When we need a Python list in the middle of a workflow, a list comprehension lets us build it inline without interruption. Note that in the code below we place the list comprehension directly inside the Series() constructor.

### create a list using list comprehension
### this is a demonstration of list comprehension and we save the list to a variable
nums_lst = [ num for num in range(5) ]    ### [0, 1, 2, 3, 4]

### directly put the list comprehension in the Series function
pd.Series([ num**2 for num in range(5) ])


### indexes are added automatically
0     0
1     1
2     4
3     9
4    16
dtype: int64
### EXERCISE: Series with List Comprehension
# Create Series using list comprehension for the following tasks:

# 1. Create a Series of the first 10 even numbers (0, 2, 4, 6, ..., 18)
# 2. Create a Series of  squares of numbers from 1 to 10, with index as the numbers themselves
# 3. Create a Series with Fahrenheit temperatures (32, 50, 68, 86, 104) 
#    converted to Celsius, using the formula: C = (F - 32) * 5/9

### Your code starts here:




### Your code ends here.


# Solution

# 1. First 10 even numbers
even_numbers = pd.Series([n * 2 for n in range(10)])
print("Even numbers:")
print(even_numbers)
print()

# 2. Squares with custom index
squares = pd.Series([n**2 for n in range(1, 11)], index=range(1, 11))
print("Squares:")
print(squares)
print()

# 3. Fahrenheit to Celsius
fahrenheit = [32, 50, 68, 86, 104]
celsius = pd.Series([(f - 32) * 5/9 for f in fahrenheit], index=fahrenheit)
print("Temperature conversions:")
print(celsius)
Even numbers:
0     0
1     2
2     4
3     6
4     8
5    10
6    12
7    14
8    16
9    18
dtype: int64

Squares:
1       1
2       4
3       9
4      16
5      25
6      36
7      49
8      64
9      81
10    100
dtype: int64

Temperature conversions:
32      0.0
50     10.0
68     20.0
86     30.0
104    40.0
dtype: float64

4.1.2. Series Data Types#

A Pandas Series can hold data of various types, including integers, floats, strings, and even Python objects. The data type of the Series is inferred from the input data, but it can also be explicitly specified using the dtype parameter when creating the Series. This flexibility allows for efficient storage and manipulation of heterogeneous data within a single Series object. Some key points to note about Series Data Types:

  • Type inference: Pandas automatically infers the most appropriate data type

  • Memory efficiency: Using the correct dtype can significantly reduce memory usage

  • Performance: Numeric types are faster for mathematical operations

  • Type coercion: Mixed types are converted to object dtype (least efficient)

The common data types (dtypes) and their use cases include:

  • Numeric types (int64, float64): Mathematical operations, statistical analysis

  • Object type: Mixed data, strings, or when type is uncertain

  • Categorical: Repeated string values, ordered data, memory optimization

  • Datetime: Time series data, temporal analysis

  • Boolean: Filtering, conditional logic

Examine the examples below to learn about common Pandas Series dtypes and simple conversions/inspections:

### Series with different dtypes

s_int = pd.Series([1, 2, 3])
s_float = pd.Series([1.0, 2.5, np.nan])
s_str = pd.Series(['a', 'b', 'c'], dtype='string')    ### before pandas 3, plain strings default to the object dtype
s_mixed = pd.Series([1, 'a', 3.0])
s_bool = pd.Series([True, False, True])
s_dt = pd.Series(pd.date_range('2020-01-01', periods=3))
s_cat = pd.Series(pd.Categorical(['low', 'medium', 'high', 'low'],
                                 categories=['low', 'medium', 'high'],
                                 ordered=True))
### mixed with dictionary, list, and tuple
s_obj = pd.Series([{'x': 1}, [1, 2], (3, 4)])

### place the series above into a list of tuples
series_list = [
    ("s_int, \t\tinteger", s_int),
    ("s_float, \tfloat (with NaN)", s_float),
    ("s_string, \tstring", s_str),
    ("s_mixed, \tmixed (object)", s_mixed),
    ("s_boolean, \tboolean", s_bool),
    ("s_datetime, \tdatetime", s_dt),
    ("s_categorical, \tcategorical", s_cat),
    ("s_object,  \tobject", s_obj),
]

### print out the series:
for name, s in series_list:
    print(f"{name:40}\t dtype -> {s.dtype}")
s_int, 		integer                        	 dtype -> int64
s_float, 	float (with NaN)              	 dtype -> float64
s_string, 	string                       	 dtype -> string
s_mixed, 	mixed (object)                	 dtype -> object
s_boolean, 	boolean                     	 dtype -> bool
s_datetime, 	datetime                   	 dtype -> datetime64[ns]
s_categorical, 	categorical             	 dtype -> category
s_object,  	object                      	 dtype -> object
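The memory-efficiency point about categoricals from above is easy to check; a quick sketch (exact byte counts vary by platform and pandas version):

```python
import pandas as pd

### repeated string values: compare object dtype vs category
colors = ['red', 'green', 'blue'] * 1000
s_obj = pd.Series(colors)              ### object dtype: one Python string per element
s_cat = s_obj.astype('category')       ### 3 categories plus small integer codes

print(s_obj.memory_usage(deep=True))
print(s_cat.memory_usage(deep=True))   ### substantially smaller
```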

4.1.2.1. Type Conversion#

You can change the data type of a Series using the astype() method. This is useful when you need to convert values for compatibility, performance, or analysis—for example, converting strings to categories, floats to integers, or objects to more specific types. Be careful: if the conversion is not possible (e.g., converting a string that can’t be parsed as a number), pandas will raise an error.
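For strings that may not parse as numbers, pd.to_numeric with errors='coerce' is a gentler alternative to astype(): invalid values become NaN instead of raising. A minimal sketch:

```python
import pandas as pd

s = pd.Series(['1', '2', 'oops'])
### s.astype(int) would raise a ValueError because of 'oops'
print(pd.to_numeric(s, errors='coerce'))
### 0    1.0
### 1    2.0
### 2    NaN
### dtype: float64
```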

### small demos of conversion operations

print("\nConversion operations:\n")
print("1. s_float \t -> fillna(0) \t-> astype(int): \t->", s_float.fillna(0).astype(int).tolist())
print("2. s_str \t\t\t-> astype('category') \t-> dtype:", s_str.astype('category').dtype)
print("\n3. s_mixed \t\t\t\t\t\t-> dtype (inferred):", s_mixed.dtype)

print("\n4. s_cat value counts (preserves categorical ordering):")
print(s_cat.value_counts(sort=False))
Conversion operations:

1. s_float 	 -> fillna(0) 	-> astype(int): 	-> [1, 2, 0]
2. s_str 			-> astype('category') 	-> dtype: category

3. s_mixed 						-> dtype (inferred): object

4. s_cat value counts (preserves categorical ordering):
low       2
medium    1
high      1
Name: count, dtype: int64
### EXERCISE: Type Conversion in Series
# You have a Series with string numbers that need to be processed:

string_numbers = pd.Series(['10', '20', '30', '40', '50'])
prices = pd.Series([19.99, 29.50, 15.75, 42.00, 8.25])

# 1. Convert string_numbers to integers
# 2. Convert prices to integers (rounding down)
# 3. Create a categorical Series from: ['low', 'high', 'medium', 'low', 'high']
#    with ordered categories: ['low', 'medium', 'high']

### Your code starts here:




### Your code ends here.


# Solution
string_numbers = pd.Series(['10', '20', '30', '40', '50'])
prices = pd.Series([19.99, 29.50, 15.75, 42.00, 8.25])

# 1. Convert strings to integers
numbers_as_int = string_numbers.astype(int)
print("String to int:")
print(numbers_as_int)
print(f"dtype: {numbers_as_int.dtype}\n")

# 2. Convert floats to integers (truncates decimals)
prices_as_int = prices.astype(int)
print("Float to int:")
print(prices_as_int)
print(f"dtype: {prices_as_int.dtype}\n")

# 3. Create categorical Series
priority_data = ['low', 'high', 'medium', 'low', 'high']
priority = pd.Series(
    pd.Categorical(priority_data, 
                   categories=['low', 'medium', 'high'], 
                   ordered=True)
)
print("Categorical Series:")
print(priority)
print(f"dtype: {priority.dtype}")
String to int:
0    10
1    20
2    30
3    40
4    50
dtype: int64
dtype: int64

Float to int:
0    19
1    29
2    15
3    42
4     8
dtype: int64
dtype: int64

Categorical Series:
0       low
1      high
2    medium
3       low
4      high
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']
dtype: category

4.1.3. Advanced Series Creation Patterns#

Here is a demonstration of robust, efficient patterns for creating Pandas Series—covering error handling, dtype pre-allocation for performance, and handling special or edge-case values.

4.1.3.1. Error handling in Series creation#

Use simple validation and try/except blocks to ensure Series creation from heterogeneous or external data sources fails gracefully and provides informative messages.

### safe Series creation with error handling

try:
    # This might fail if data types are incompatible
    problematic_series = pd.Series([1, 'a', 3.14])
    print("Series created successfully:", problematic_series.dtype)
except Exception as e:
    print(f"Error creating Series: {e}")
Series created successfully: object

4.1.3.2. Performance optimization#

Pre-allocating Series with the correct numeric dtype and minimizing repeated conversions/allocations reduces memory overhead and speeds up large-data operations. The timings below compare the two creation paths; because np.random.randn already produces float64 data, the explicit dtype changes little here and the two times come out essentially equal.

### pre-allocate Series dtypes for better performance with large datasets

import time

n = 1000000

### method 1: direct creation
start = time.time()
large_series1 = pd.Series(np.random.randn(n))
time1 = time.time() - start

### method 2: pre-allocate with dtype (faster for known types)
start = time.time()
large_series2 = pd.Series(np.random.randn(n), dtype=np.float64)
time2 = time.time() - start

print(f"Direct creation: {time1:.4f}s")
print(f"Pre-allocated: {time2:.4f}s")
Direct creation: 0.0291s
Pre-allocated: 0.0302s

4.1.3.3. Edge cases and special values#

This section demonstrates the flexibility of Series: you can store unusual or non-scalar objects (including classes, types, and functions) in a Pandas Series, although this is uncommon in typical data analysis workflows.

### even functions or types can be stored in Series (though not commonly used)

print(pd.Series([tuple, list, np.array]), "\n")
print(pd.Series([sum, print, len]))
0              <class 'tuple'>
1               <class 'list'>
2    <built-in function array>
dtype: object 

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object

4.1.4. Series Indexing#

The power of a Pandas Series comes from its index — an ordered set of labels that identify each element and enable fast, expressive, label-based operations. Some of the key concepts to learn about Series indexing include:

  • Label vs. position: use .loc for label-based access and .iloc for position-based access (e.g., ser.loc['Apple'], ser.iloc[0]).

  • Flexible labels: index values can be any hashable type (strings, integers, datetimes, etc.). The index itself can carry a name.

  • Alignment behavior: arithmetic and combine operations automatically align on index labels (missing labels yield NaN), so ser1 + ser2 uses the union of indexes.

  • Uniqueness & ordering: indexes may be non-unique and are ordered — choose the index type and uniqueness appropriate for your use case.

  • Performance: lookups by label are fast (dict-like). Vectorized operations on a well-chosen index/dtype give the best performance.
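The non-uniqueness point is worth seeing once: with duplicate labels, a label lookup returns a Series rather than a scalar. A small sketch:

```python
import pandas as pd

dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
print(dup['a'])               ### two rows share the label 'a' -> returns a Series
print(dup['b'])               ### unique label -> returns the scalar 3
print(dup.index.is_unique)    ### False
```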

There are four types of indexing operations in Pandas Series:

  1. Label-based access (preferred for labeled data)

    • Single value: ser1['Apple'] or ser1.loc['Apple']

    • Multiple labels: ser1[['Apple', 'Cherry']] or ser1.loc[['Apple','Cherry']]

    • Label slicing (inclusive end): ser1['Apple':'Cherry'] or ser1.loc['Apple':'Cherry']

  2. Positional access (use for integer positions)

    • Single position: ser1.iloc[0]

    • Slice (end exclusive): ser1.iloc[0:2]

    • Avoid ser1[0] when the index is integer-labelled — it is ambiguous.

  3. Safe access and fast scalar ops

    • Safe get without KeyError: ser1.get('Durian', 'Not found')

    • Fast scalar access/assignment by label: ser1.at['Banana'] = 25

    • Fast scalar access/assignment by position: ser1.iat[0]

  4. Boolean indexing

    • Filter by condition: ser1[ser1 > 15]

    • Use .isin() to filter by a list of labels: ser1[ser1.index.isin(['Apple','Cranberry'])]
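A quick sketch of the label-based patterns above, including inclusive label slicing, .isin() filtering, and a safe .get() lookup:

```python
import pandas as pd

ser1 = pd.Series([10, 20, 30], index=['Apple', 'Banana', 'Cherry'])

print(ser1.loc[['Apple', 'Cherry']])                    ### multiple labels
print(ser1.loc['Apple':'Cherry'])                       ### inclusive: all three rows
print(ser1[ser1.index.isin(['Apple', 'Cranberry'])])    ### keeps only labels present
print(ser1.get('Durian', 'Not found'))                  ### safe lookup -> 'Not found'
```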

There are four core indexing methods in Pandas Series.

| # | Method | Based On | Slice End | Safe? |
|---|--------|----------|-----------|-------|
| 1. | `s.loc[]` | Label | Inclusive | Y |
| 2. | `s.iloc[]` | Position | Exclusive | Y |
| 3. | `s[]` | Label or position | Depends | N |
| 4. | Boolean | Condition | N/A | Y |

Pandas also has special scalar access methods, which are ways to safely or quickly access a single value from a Series.

| # | Method | Access Type | Safe | Speed | Use Case |
|---|--------|-------------|------|-------|----------|
| 1. | `.get()` | Label | Y | Medium | Safe lookup when unsure a label exists |
| 2. | `.at[]` | Label (scalar) | N | Fast | Single value |
| 3. | `.iat[]` | Position (scalar) | N | Fast | Single value |

*.get() does not raise a KeyError if the label doesn’t exist.

Let’s see how index alignment works in arithmetic. When adding two Series, pandas:

  1. Takes the union of all index labels

  2. Aligns matching labels

  3. Adds values where both exist

  4. Returns NaN where one side is missing

### creating 2 Series with custom indexes

ser1 = pd.Series( [ 10, 20, 30 ], index = [ 'Apple', 'Banana', 'Cherry' ])
ser2 = pd.Series( [ 10, 20, 40 ], index = [ 'Apple', 'Banana', 'Cranberry'] )
print(ser1, "\n")    
print(ser2)

ser1 + ser2
Apple     10
Banana    20
Cherry    30
dtype: int64 

Apple        10
Banana       20
Cranberry    40
dtype: int64
Apple        20.0
Banana       40.0
Cherry        NaN
Cranberry     NaN
dtype: float64

4.1.4.1. Simple indexing#

  • .loc[]: based on label

  • .iloc[]: based on position

  • Boolean: based on a condition

# Example Series for demonstration
import pandas as pd
ser = pd.Series({'Apple': 10, 'Banana': 20, 'Cherry': 30})
print("The Series:", "\n", ser, "\n")

# Label-based indexing
print("1. Label-based (ser['Apple']):\t\t", ser['Apple'])
print("1. Label-based (ser.loc['Banana']):\t", ser.loc['Banana'], "\n")

# Position-based indexing
print("2. Position-based (ser.iloc[0]):\t\t", ser.iloc[0], "\n")

# Boolean indexing
print(f"3. Boolean indexing: ser[ser > 15]:\n {ser[ser > 15]}")
The Series: 
 Apple     10
Banana    20
Cherry    30
dtype: int64 

1. Label-based (ser['Apple']):		 10
1. Label-based (ser.loc['Banana']):	 20 

2. Position-based (ser.iloc[0]):		 10 

3. Boolean indexing: ser[ser > 15]:
 Banana    20
Cherry    30
dtype: int64
### EXERCISE: Indexing Practice
# Given this Series of city temperatures:

temps = pd.Series([72, 85, 68, 91, 77], 
                  index=['NYC', 'Miami', 'Seattle', 'Phoenix', 'Denver'])

# 1. Access Miami's temperature using label-based indexing with .loc
# 2. Get the first and last temperatures using position-based indexing with .iloc
# 3. Find all cities with temperature above 75 degrees using boolean indexing

### Your code starts here:




### Your code ends here.


# Solution
temps = pd.Series([72, 85, 68, 91, 77], 
                  index=['NYC', 'Miami', 'Seattle', 'Phoenix', 'Denver'])

# 1. Label-based access
miami_temp = temps.loc['Miami']
print(f"1. Miami temperature (using .loc): {miami_temp}°F\n")

# 2. Position-based access
first_temp = temps.iloc[0]
last_temp = temps.iloc[-1]
print(f"2. First temperature (using .iloc[0]): {first_temp}°F")
print(f"   Last temperature (using .iloc[-1]): {last_temp}°F\n")

# 3. Boolean indexing
hot_cities = temps[temps > 75]
print(f"3. Cities with temperature > 75°F:")
print(hot_cities)
1. Miami temperature (using .loc): 85°F

2. First temperature (using .iloc[0]): 72°F
   Last temperature (using .iloc[-1]): 77°F

3. Cities with temperature > 75°F:
Miami      85
Phoenix    91
Denver     77
dtype: int64

4.1.4.2. Scalar Access#

  • .get(): by label; does not raise an error when the label is not found.

  • .at[]: by label; fast scalar access; can also update

  • .iat[]: by position; fast scalar access; can also update

%%expect KeyError

ser = pd.Series([10, 20], index=['Apple', 'Banana'])

ser["Durian"]    ### no such label: raises a KeyError
KeyError: 'Durian'
ser.get('Durian', 'Not found')
'Not found'
ser1.at['Banana']
np.int64(20)
ser1.at['Banana'] = 25
ser1.iat[0]
np.int64(10)
### EXERCISE: Scalar Access Methods
# Given this Series of product prices:

products = pd.Series([29.99, 15.50, 42.00, 8.99], 
                     index=['Shirt', 'Socks', 'Jacket', 'Hat'])

# 1. Use .get() to safely retrieve the price of 'Shoes' (which doesn't exist), 
#    return 'Not available' if not found
# 2. Use .at to update the price of 'Hat' to 12.99
# 3. Use .iat to get the price of the second item (position 1)

### Your code starts here:




### Your code ends here.


# Solution
products = pd.Series([29.99, 15.50, 42.00, 8.99], 
                     index=['Shirt', 'Socks', 'Jacket', 'Hat'])

print("Original Series:")
print(products)
print()

# 1. Safe access with .get()
shoes_price = products.get('Shoes', 'Not available')
print(f"1. Price of 'Shoes' using .get(): {shoes_price}\n")

# 2. Update value with .at
products.at['Hat'] = 12.99
print("2. After updating Hat price using .at:")
print(products)
print()

# 3. Access by position with .iat
second_item_price = products.iat[1]
print(f"3. Price at position 1 using .iat: ${second_item_price}")
Original Series:
Shirt     29.99
Socks     15.50
Jacket    42.00
Hat        8.99
dtype: float64

1. Price of 'Shoes' using .get(): Not available

2. After updating Hat price using .at:
Shirt     29.99
Socks     15.50
Jacket    42.00
Hat       12.99
dtype: float64

3. Price at position 1 using .iat: $15.5

4.1.4.3. .index and .values#

Note that we can access the index labels and the underlying values of a Series via the .index and .values attributes.

print(ser1.index)
print(ser1.values)
Index(['Apple', 'Banana', 'Cherry'], dtype='object')
[10 25 30]
### EXERCISE: Working with Index and Values
# Given this Series of monthly sales:

sales = pd.Series([12000, 15000, 13500, 18000, 16500],
                  index=['Jan', 'Feb', 'Mar', 'Apr', 'May'])

# 1. Extract and print the index values
# 2. Extract and print the values as a NumPy array
# 3. Calculate the total sales by summing the values
# 4. Find which month had the maximum sales (hint: use the index and values together)

### Your code starts here:




### Your code ends here.


# Solution
sales = pd.Series([12000, 15000, 13500, 18000, 16500],
                  index=['Jan', 'Feb', 'Mar', 'Apr', 'May'])

# 1. Extract index
print("1. Index values:")
print(sales.index)
print()

# 2. Extract values
print("2. Values as NumPy array:")
print(sales.values)
print(f"   Type: {type(sales.values)}\n")

# 3. Total sales
total = sales.values.sum()
print(f"3. Total sales: ${total:,}\n")

# 4. Month with maximum sales
max_month = sales.index[sales.values.argmax()]
max_value = sales.max()
print(f"4. Maximum sales month: {max_month} with ${max_value:,}")
1. Index values:
Index(['Jan', 'Feb', 'Mar', 'Apr', 'May'], dtype='object')

2. Values as NumPy array:
[12000 15000 13500 18000 16500]
   Type: <class 'numpy.ndarray'>

3. Total sales: $75,000
4. Maximum sales month: Apr with $18,000

4.1.5. Slicing in Series#

Slicing lets you select a range of values from a Series, not just single items. This is useful for working with subsets of your data.

  1. Label-based Slicing (.loc)

    • Use labels to select a range.

    • Inclusive: The end label is included.

  2. Position-based Slicing (.iloc)

    • Use integer positions to select a range.

    • Exclusive: The end position is not included (like standard Python slicing).

# Label-based slicing (inclusive)
print(ser.loc['Apple':'Cherry'], "\n")
print(ser['Apple':'Cherry'])

# Position-based slicing (exclusive)
print("\nser.iloc[0:2]:")
print(ser.iloc[0:2])

# Try changing the slice to see what happens!
# For example:
# print(ser.loc['Banana':'Banana'])
# print(ser.iloc[1:3])
Apple     10
Banana    20
dtype: int64 

Apple     10
Banana    20
dtype: int64

ser.iloc[0:2]:
Apple     10
Banana    20
dtype: int64

Comparison between .loc and .iloc:

| Slicing Type | Syntax | End/Stop | Use Case |
|--------------|--------|----------|----------|
| Label-based | `ser.loc['A':'C']` | Inclusive (because .loc is label-based, not position-based) | Named index |
| Position-based | `ser.iloc[0:2]` | Exclusive | Integer positions |

4.1.5.1. Avoid Python style indexing/slicing#

s = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
s[0:2]
A    10
B    20
dtype: int64

With an integer index, however, plain [] lookups are treated as labels, so requesting a position that is not a label generates an error:

%%expect KeyError

s = pd.Series([10, 20, 30], index=[100, 200, 300])
print(s, "\n")

print(s[1])       ### there's no index 1; there's 100
100    10
200    20
300    30
dtype: int64 
KeyError: 1

See below: with an integer index, what does s[1:4] mean, labels or positions?

s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

print(s, "\n")

s[1:4]        ### What do you mean by this?
1    10
2    20
3    30
4    40
5    50
dtype: int64 
2    20
3    30
4    40
dtype: int64
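Plain s[1:4] sliced by position here. To remove the ambiguity, state your intent explicitly with .loc (labels, inclusive end) or .iloc (positions, exclusive end) on the same Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

print(s.loc[1:4])     ### labels 1 through 4, inclusive -> 10, 20, 30, 40
print(s.iloc[1:4])    ### positions 1 through 3 -> 20, 30, 40
```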
print(ser1.iloc[1] , "\n")
print(ser1.iloc[0:2])
25 

Apple     10
Banana    25
dtype: int64

| Slicing Type | Syntax | End Behavior | Example | Result |
|--------------|--------|--------------|---------|--------|
| Label-based | `ser['Apple':'Cherry']` | Inclusive | Includes ‘Cherry’ | Apple, Banana, Cherry |
| Label-based | `ser.loc['Apple':'Cherry']` | Inclusive | Includes ‘Cherry’ | Apple, Banana, Cherry |
| Position-based | `ser.iloc[0:2]` | Exclusive | Excludes position 2 | Apple, Banana |

4.1.5.2. Masking#

In pandas, Boolean indexing and masking refer to the same fundamental operation.

4.1.5.2.1. Boolean Indexing vs Masking#

import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

### "Boolean Indexing"
result1 = s[s > 25]

### "Masking"
mask = s > 25
result2 = s[mask]

print(result1, "\n")
print(result2)
2    30
3    40
4    50
dtype: int64 

2    30
3    40
4    50
dtype: int64

You can use .where() and .mask() to replace values:

  • s.where(COND): keeps elements where the condition is True, replaces with NaN where False

  • s.mask(COND): keeps elements where the condition is False, replaces with NaN where True

| Method | What Happens |
|--------|--------------|
| `.where(cond, other=x)` | If condition fails → use x |
| `.mask(cond, other=x)` | If condition passes → use x |

### creating 2 Series with custom indexes

ser = pd.Series( [ 10, 20, 30 ], index = [ 'Apple', 'Banana', 'Cherry' ])
ser
Apple     10
Banana    20
Cherry    30
dtype: int64
### filtering: keep values > 15, otherwise replace with 0; then mask values > 15 with True

print(ser.where(ser > 15, other=0), "\n")
print(ser.mask(ser > 15, other=True), "\n")
Apple      0
Banana    20
Cherry    30
dtype: int64 

Apple       10
Banana    True
Cherry    True
dtype: object 
### EXERCISE: Use Series Masking to Filter Data
### You have a Series:
import pandas as pd

scores = pd.Series([85, 92, 78, 95, 88, 73, 91], 
                   index=['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank', 'Grace'])

### A. Use boolean indexing to find all students who scored 85 or higher.
### B. Use the .where() method to keep scores that are 85 or higher, and 
### replace lower scores with NaN.
### C. Use the .mask() method to replace scores below 85 with NaN (keep 
### scores 85 and above).


print("=== PART A: Boolean Indexing ===")
passing_scores = scores[scores >= 85]
print(passing_scores)
print(f"Shape: {passing_scores.shape}\n")

print("=== PART B: Using .where() ===")
result_where = scores.where(scores >= 85)
# result_where = scores.where(scores >= 85, other=np.nan)  # explicitly set other to np.nan for clarity
print(result_where)
print(f"Shape: {result_where.shape}\n")

print("=== PART C: Using .mask() ===")
# result_mask = scores.mask(scores < 85)
result_mask = scores.mask(scores < 85, other=np.nan)
print(result_mask)

# print()

# print(f"Shape: {result_mask.shape}\n")
=== PART A: Boolean Indexing ===
Alice    85
Bob      92
David    95
Emma     88
Grace    91
dtype: int64
Shape: (5,)

=== PART B: Using .where() ===
Alice      85.0
Bob        92.0
Charlie     NaN
David      95.0
Emma       88.0
Frank       NaN
Grace      91.0
dtype: float64
Shape: (7,)

=== PART C: Using .mask() ===
Alice      85.0
Bob        92.0
Charlie     NaN
David      95.0
Emma       88.0
Frank       NaN
Grace      91.0
dtype: float64