4.1. Pandas Series#
The Pandas Series is a one-dimensional labeled array that can hold any data type. It’s the primary building block of Pandas and serves as the foundation for the more complex DataFrame structure.
Key characteristics of Pandas Series include:
Labeled index: Each element has an associated label (index)
Heterogeneous data types: Can hold mixed data types by falling back to the object dtype (typed NumPy arrays, by contrast, are homogeneous)
Built on NumPy: Internally uses NumPy arrays for efficient storage
Flexible creation: Can be created from lists, arrays, dictionaries, and more
When to use Series:
Labeled data where the index has meaning (e.g., names, IDs)
A single column of data extracted from a DataFrame
Lookup tables or mapping data (e.g., mapping codes to values)
Time series data with datetime indexing
When you want to perform vectorized operations on 1D data with index alignment
A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series has labels, meaning the elements can be indexed by label instead of only by numerical position, which helps a lot when performing data analysis. Additionally, while NumPy arrays are typically used for homogeneous numeric data, a Series can hold arbitrary Python objects by using the object dtype.
# %pip install pandas
import numpy as np
import pandas as pd
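As a small illustration of the label-versus-position difference described above, the sketch below builds a NumPy array and a Series from the same values. The variable names (demo_arr, demo_ser) and the fruit labels are made up for this example.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]
demo_arr = np.array(values)                                        # positions only
demo_ser = pd.Series(values, index=["Apple", "Banana", "Cherry"])  # positions + labels

print(demo_arr[1])         # NumPy: access by integer position
print(demo_ser["Banana"])  # Series: access by label
print(demo_ser.iloc[1])    # Series: positional access is still available via .iloc
```

All three lookups reach the same element; only the Series lets you name it.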
4.1.1. Creating Series#
A Pandas Series can be created by loading data from existing storage such as dataset files and SQL data sources, or directly from Python objects like lists, dictionaries, or even single scalar values.
When creating a Series:
We commonly pass data as an argument to the data parameter, or use both the data and index parameters with objects.
Automatic indexing: You can feed a sequence object (e.g., list and ndarray) to create a Pandas Series, and the default integer index (0, 1, 2, …) will be automatically created, unless you supply an index.
dict: Passing a dictionary uses the dictionary keys as the Series index and the values as the data.
Custom index: You can provide custom labels via the index argument to make your Series more descriptive and easier to work with.
dtype: Pandas infers the dtype automatically; specify dtype (dtype=) explicitly if you need a particular type.
By specifying parameter names explicitly, you can pass arguments in any order, making your code more readable and less error-prone.
List comprehensions can be used inline when building a Series from a generated sequence.
(Note that NaN, Not a Number, is a float value in pandas and NumPy.)
To create a Pandas Series, the syntax is:
pd.Series(data=None, index=None, dtype=None, name=None, copy=None)
where the parameters are
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | array-like, Iterable, dict, or scalar | None | The values for the Series. Can be a list, array, dictionary, or single value. |
| index | array-like or Index | None | Labels for each element. If not provided, defaults to RangeIndex (0, 1, 2…). |
| dtype | str, numpy.dtype, or ExtensionDtype | None | Data type to force. If not specified, will be inferred from data. |
| name | str | None | The name of the Series (useful when converting to DataFrame). |
| copy | bool | None | Whether to copy input data. If False, data is referenced (not copied). |
A simple example of creating a Pandas Series is:
### Create a simple Series from a list
ser = pd.Series([1, 3, 5, np.nan, 6, 8])
ser
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
4.1.1.1. From Sequences#
From the output above, notice that each element now has a corresponding index. This works like the positions of a list, except that the index is stored explicitly. Usually, you would supply labels to make the data more descriptive by setting index. The index is the set of labels that identify each element in a Series (or row in a DataFrame), enabling intuitive label-based access and automatic alignment of data across objects.
Now let’s prepare the following data for more Series creation:
labels = ['Apple','Banana','Cherry'] ### a string list
lst = [10,20,30] ### a numerical list
arr = np.array(lst) ### a numpy array
dict = {'Apple':10,'Banana':20,'Cherry':30} ### a dictionary (note: this name shadows the built-in dict; avoid it outside short demos)
### note that we use tuple unpacking to loop over name-object pairs
for name, obj in [('labels', labels), ('lst', lst), ('arr', arr), ('dict', dict)]:
    print(f"{name}\t: {obj}")
labels : ['Apple', 'Banana', 'Cherry']
lst : [10, 20, 30]
arr : [10 20 30]
dict : {'Apple': 10, 'Banana': 20, 'Cherry': 30}
The examples below create a Series from list data in three ways: passing the list to the data parameter, passing both data and index by name, and passing the same objects positionally without parameter names. The last two examples do the same with a NumPy array.
Observe:
How the Series() constructor is used to create Pandas Series.
How the parameters are used in these examples.
print(pd.Series(data=lst), "\n") ### create Series by sending lst as argument with parameter data
print(pd.Series(data=lst,index=labels), "\n") ### add labels: index
print(pd.Series(lst, labels), "\n") ### same as above, without parameter names
print(pd.Series(arr), "\n")
print(pd.Series(arr, index=labels))
0 10
1 20
2 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
0 10
1 20
2 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
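The remaining constructor parameters can be sketched in the same way. The example below is only an illustration: it passes keyword arguments in a deliberately shuffled order, forces a dtype, sets a name, and broadcasts a scalar over an index. The variable names (fruit_prices, flags) are hypothetical.

```python
import pandas as pd

# Keyword arguments may appear in any order.
fruit_prices = pd.Series(
    name="price",
    index=["Apple", "Banana", "Cherry"],
    dtype="float64",
    data=[1, 2, 3],
)
print(fruit_prices)
print(fruit_prices.name, fruit_prices.dtype)

# A scalar is repeated for every label in the index.
flags = pd.Series(0, index=["a", "b", "c"])
print(flags)
```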
### EXERCISE: Create Series from a List
# You have three lists representing student names, their scores, and grades:
names = ['Alice', 'Bob', 'Charlie', 'David']
scores = [95, 87, 92, 78]
grades = ['A', 'B', 'A', 'C']
# 1. Create a Series from the scores list with names as the index
# 2. Create a Series from a NumPy array of scores with names as index
# 3. Print both Series to compare
### Your code starts here:
### Your code ends here.
Series from list:
Alice 95
Bob 87
Charlie 92
David 78
dtype: int64
Series from NumPy array:
Alice 95
Bob 87
Charlie 92
David 78
dtype: int64
Are they equal? True
4.1.1.2. From Python dictionary#
Since a Python dictionary maps keys to values, passing it to pd.Series will use the dictionary keys as the Series index and the corresponding values as the Series data — a convenient way to create a labeled Series from key–value pairs.
### key-value pairs
pd.Series(dict)
Apple 10
Banana 20
Cherry 30
dtype: int64
### EXERCISE: Create Series from Dictionary
# You have product inventory data stored as a dictionary:
inventory = {
'Laptop': 15,
'Mouse': 45,
'Keyboard': 30,
'Monitor': 12,
'Headset': 25
}
# 1. Create a Series from the inventory dictionary
# 2. Create another Series but only for items: ['Laptop', 'Monitor', 'Headset']
# 3. What happens to items not in the dictionary when you specify them in the index?
### Your code starts here:
### Your code ends here.
Full inventory:
Laptop 15
Mouse 45
Keyboard 30
Monitor 12
Headset 25
dtype: int64
Selected items:
Laptop 15
Monitor 12
Headset 25
dtype: int64
With missing item 'Tablet':
Laptop 15.0
Monitor 12.0
Tablet NaN
dtype: float64
Note: Missing items get NaN value
4.1.1.3. Using List Comprehension#
When a workflow needs a generated Python list, a list comprehension builds it in a single expression. Note that in the code below the second list comprehension is placed directly inside the Series constructor.
### create a list using list comprehension
### this is a demonstration of list comprehension and we save the list to a variable
nums_lst = [ num for num in range(5) ] ### [0, 1, 2, 3, 4]
### directly put the list comprehension in the Series function
pd.Series([ num**2 for num in range(5) ])
### indexes are added automatically
0 0
1 1
2 4
3 9
4 16
dtype: int64
### EXERCISE: Series with List Comprehension
# Create Series using list comprehension for the following tasks:
# 1. Create a Series of the first 10 even numbers (0, 2, 4, 6, ..., 18)
# 2. Create a Series of squares of numbers from 1 to 10, with index as the numbers themselves
# 3. Create a Series with Fahrenheit temperatures (32, 50, 68, 86, 104)
# converted to Celsius, using the formula: C = (F - 32) * 5/9
### Your code starts here:
### Your code ends here.
Even numbers:
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18
dtype: int64
Squares:
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
dtype: int64
Temperature conversions:
32 0.0
50 10.0
68 20.0
86 30.0
104 40.0
dtype: float64
4.1.2. Series Data Types#
A Pandas Series can hold data of various types, including integers, floats, strings, and even Python objects. The data type of the Series is inferred from the input data, but it can also be explicitly specified using the dtype parameter when creating the Series. Each Series has exactly one dtype; heterogeneous values are accommodated by falling back to the general-purpose object dtype. Some key points to note about Series data types:
Type inference: Pandas automatically infers the most appropriate data type
Memory efficiency: Using the correct dtype can significantly reduce memory usage
Performance: Numeric types are faster for mathematical operations
Type coercion: Mixed types are converted to object dtype (least efficient)
The common data types (dtypes) and their use cases include:
Numeric types (int64, float64): Mathematical operations, statistical analysis
Object type: Mixed data, strings, or when type is uncertain
Categorical: Repeated string values, ordered data, memory optimization
Datetime: Time series data, temporal analysis
Boolean: Filtering, conditional logic
Examine the examples below to learn about common Pandas Series dtypes and simple conversions/inspections:
### Series with different dtypes
s_int = pd.Series([1, 2, 3])
s_float = pd.Series([1.0, 2.5, np.nan])
s_str = pd.Series(['a', 'b', 'c'], dtype='string') ### explicit string dtype; before pandas 3, strings default to object
s_mixed = pd.Series([1, 'a', 3.0])
s_bool = pd.Series([True, False, True])
s_dt = pd.Series(pd.date_range('2020-01-01', periods=3))
s_cat = pd.Series(pd.Categorical(['low', 'medium', 'high', 'low'],
categories=['low', 'medium', 'high'],
ordered=True))
### mixed with dictionary, list, and tuple
s_obj = pd.Series([{'x': 1}, [1, 2], (3, 4)])
### place the series above into a list of tuples
series_list = [
("s_int, \t\tinteger", s_int),
("s_float, \tfloat (with NaN)", s_float),
("s_string, \tstring", s_str),
("s_mixed, \tmixed (object)", s_mixed),
("s_boolean, \tboolean", s_bool),
("s_datetime, \tdatetime", s_dt),
("s_categorical, \tcategorical", s_cat),
("s_object, \tobject", s_obj),
]
### print out the series:
for name, s in series_list:
    print(f"{name:40}\t dtype -> {s.dtype}")
s_int, integer dtype -> int64
s_float, float (with NaN) dtype -> float64
s_string, string dtype -> string
s_mixed, mixed (object) dtype -> object
s_boolean, boolean dtype -> bool
s_datetime, datetime dtype -> datetime64[ns]
s_categorical, categorical dtype -> category
s_object, object dtype -> object
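To make the memory-efficiency point above concrete, here is a rough sketch that compares the object and category dtypes on repeated strings. The data and the names (sizes_obj, sizes_cat) are invented for illustration, and the exact byte counts depend on your pandas version and platform.

```python
import pandas as pd

# 30,000 values drawn from only three distinct strings (hypothetical data).
sizes_obj = pd.Series(["low", "medium", "high"] * 10_000, dtype="object")
sizes_cat = sizes_obj.astype("category")

# deep=True also counts the Python string objects, not just the pointers to them.
obj_bytes = sizes_obj.memory_usage(deep=True)
cat_bytes = sizes_cat.memory_usage(deep=True)

print(f"object  : {obj_bytes:,} bytes")
print(f"category: {cat_bytes:,} bytes")
```

The categorical version stores each distinct string once plus a small integer code per row, which is why it tends to be much smaller when values repeat.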
4.1.2.1. Type Conversion#
You can change the data type of a Series using the astype() method. This is useful when you need to convert values for compatibility, performance, or analysis—for example, converting strings to categories, floats to integers, or objects to more specific types. Be careful: if the conversion is not possible (e.g., converting a string that can’t be parsed as a number), pandas will raise an error.
### small demos of conversion operations
print("\nConversion operations:\n")
print("1. s_float \t -> fillna(0) \t-> astype(int): \t->", s_float.fillna(0).astype(int).tolist())
print("2. s_str \t\t\t-> astype('category') \t-> dtype:", s_str.astype('category').dtype)
print("\n3. s_mixed \t\t\t\t\t\t-> dtype (inferred):", s_mixed.dtype)
print("\n4. s_cat value counts (preserves categorical ordering):")
print(s_cat.value_counts(sort=False))
Conversion operations:
1. s_float -> fillna(0) -> astype(int): -> [1, 2, 0]
2. s_str -> astype('category') -> dtype: category
3. s_mixed -> dtype (inferred): object
4. s_cat value counts (preserves categorical ordering):
low 2
medium 1
high 1
Name: count, dtype: int64
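The warning above about failed conversions can be sketched as follows. This is a minimal example on made-up input (raw_codes); it shows astype raising on an unparseable string, and pd.to_numeric with errors="coerce" as one possible way to get NaN instead of an exception.

```python
import pandas as pd

raw_codes = pd.Series(["10", "20", "oops"])   # hypothetical messy input

# astype raises when a value cannot be parsed as an integer.
try:
    raw_codes.astype(int)
except ValueError as exc:
    print("astype failed:", exc)

# errors="coerce" turns unparseable values into NaN instead of raising.
cleaned_codes = pd.to_numeric(raw_codes, errors="coerce")
print(cleaned_codes)
```

Because NaN is a float, the coerced result has a float dtype even though the valid entries are whole numbers.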
### EXERCISE: Type Conversion in Series
# You have a Series with string numbers that need to be processed:
string_numbers = pd.Series(['10', '20', '30', '40', '50'])
prices = pd.Series([19.99, 29.50, 15.75, 42.00, 8.25])
# 1. Convert string_numbers to integers
# 2. Convert prices to integers (rounding down)
# 3. Create a categorical Series from: ['low', 'high', 'medium', 'low', 'high']
# with ordered categories: ['low', 'medium', 'high']
### Your code starts here:
### Your code ends here.
String to int:
0 10
1 20
2 30
3 40
4 50
dtype: int64
dtype: int64
Float to int:
0 19
1 29
2 15
3 42
4 8
dtype: int64
dtype: int64
Categorical Series:
0 low
1 high
2 medium
3 low
4 high
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']
dtype: category
4.1.3. Advanced Series Creation Patterns#
Here is a demonstration of robust, efficient patterns for creating Pandas Series—covering error handling, dtype pre-allocation for performance, and handling special or edge-case values.
4.1.3.1. Error handling in Series creation#
Use simple validation and try/except blocks to ensure Series creation from heterogeneous or external data sources fails gracefully and provides informative messages.
### safe Series creation with error handling
try:
    # Mixed types do not raise here: pandas falls back to the object dtype
    problematic_series = pd.Series([1, 'a', 3.14])
    print("Series created successfully:", problematic_series.dtype)
except Exception as e:
    print(f"Error creating Series: {e}")
Series created successfully: object
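Since the mixed-type example above succeeds, it may help to see inputs that do raise. The sketch below wraps the constructor in a hypothetical helper, make_series (not part of pandas), and feeds it an index of the wrong length and a dtype that the data cannot satisfy.

```python
import pandas as pd

def make_series(data, index=None, dtype=None):
    """Hypothetical helper: return a Series, or None with a message if creation fails."""
    try:
        return pd.Series(data, index=index, dtype=dtype)
    except (ValueError, TypeError) as exc:
        print(f"Could not create Series: {exc}")
        return None

ok_series = make_series([1, 2, 3], index=["a", "b", "c"])
bad_length = make_series([1, 2, 3], index=["a", "b"])   # 3 values but only 2 labels
bad_dtype = make_series(["1", "x"], dtype="int64")      # "x" cannot become an integer
```

Catching specific exception types rather than a bare Exception keeps unrelated bugs from being silently swallowed.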
4.1.3.2. Performance optimization#
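Specifying the dtype up front avoids a later conversion pass and the temporary copy it creates, which matters for large data. The operations below time the creation of a large Series with and without an explicit dtype. Because np.random.randn already returns float64 values, passing dtype=np.float64 does no extra work here, so the two timings are essentially the same; a difference appears only when the requested dtype differs from the dtype of the input.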
### pre-allocate Series dtypes for better performance with large datasets
import time
n = 1000000
### method 1: direct creation
start = time.time()
large_series1 = pd.Series(np.random.randn(n))
time1 = time.time() - start
### method 2: pass dtype explicitly (no conversion is needed here, since the data is already float64)
start = time.time()
large_series2 = pd.Series(np.random.randn(n), dtype=np.float64)
time2 = time.time() - start
print(f"Direct creation: {time1:.4f}s")
print(f"Pre-allocated: {time2:.4f}s")
Direct creation: 0.0291s
Pre-allocated: 0.0302s
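Where the choice of dtype does show up clearly is memory. As a rough sketch (names such as as_float32 are made up, and float32 is only appropriate if the reduced precision is acceptable for your data), the same values can be stored at half the size:

```python
import numpy as np
import pandas as pd

n_values = 1_000_000
rng = np.random.default_rng(0)
samples = rng.standard_normal(n_values)            # float64 by default

as_float64 = pd.Series(samples)                    # dtype inferred: float64
as_float32 = pd.Series(samples, dtype="float32")   # explicit, lower precision

print(as_float64.dtype, as_float64.nbytes)   # 8 bytes per value
print(as_float32.dtype, as_float32.nbytes)   # 4 bytes per value
```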
4.1.3.3. Edge cases and special values#
This section demonstrates the flexibility of Series: you can store unusual or non-scalar objects (including classes, types, and functions) in a Pandas Series, although this is uncommon in typical data analysis workflows.
### even functions or types can be stored in Series (though not commonly used)
print(pd.Series([tuple, list, np.array]), "\n")
print(pd.Series([sum, print, len]))
0 <class 'tuple'>
1 <class 'list'>
2 <built-in function array>
dtype: object
0 <built-in function sum>
1 <built-in function print>
2 <built-in function len>
dtype: object
4.1.4. Series Indexing#
The power of a Pandas Series comes from its index — an ordered set of labels that identify each element and enable fast, expressive, label-based operations. Some of the key concepts to learn about Series indexing include:
Label vs. position: use .loc for label-based access and .iloc for position-based access (e.g., ser.loc['Apple'], ser.iloc[0]).
Flexible labels: index values can be any hashable type (strings, integers, datetimes, etc.). The index itself can carry a name.
Alignment behavior: arithmetic and combine operations automatically align on index labels (missing labels yield NaN), so ser1 + ser2 uses the union of indexes.
Uniqueness & ordering: indexes may be non-unique and are ordered — choose the index type and uniqueness appropriate for your use case.
Performance: lookups by label are fast (dict-like). Vectorized operations on a well-chosen index/dtype give the best performance.
There are four types of indexing operations in Pandas Series:
Label-based access (preferred for labeled data)
    Single value: ser1['Apple'] or ser1.loc['Apple']
    Multiple labels: ser1[['Apple', 'Cherry']] or ser1.loc[['Apple','Cherry']]
    Label slicing (inclusive end): ser1['Apple':'Cherry'] or ser1.loc['Apple':'Cherry']
Positional access (use for integer positions)
    Single position: ser1.iloc[0]
    Slice (end exclusive): ser1.iloc[0:2]
    Avoid ser1[0] when the index is integer-labelled, because it is ambiguous.
Safe access and fast scalar ops
    Safe get without KeyError: ser1.get('Durian', 'Not found')
    Fast scalar access/assignment by label: ser1.at['Banana'] = 25
    Fast scalar access/assignment by position: ser1.iat[0]
Boolean indexing
    Filter by condition: ser1[ser1 > 15]
    Use .isin() to filter by a list of labels: ser1[ser1.index.isin(['Apple','Cranberry'])]
There are four core indexing methods in Pandas Series.
| # | Method | Based On | Slice End | Safe? |
|---|---|---|---|---|
| 1 | .loc[] | Label | Inclusive | Y |
| 2 | .iloc[] | Position | Exclusive | Y |
| 3 | [] | Label or position | Depends | N |
| 4 | Boolean | Condition | N/A | Y |
Pandas also has special scalar access methods, which are ways to safely or quickly access a single value from a Series.
| # | Method | Access Type | Safe | Speed | Use Case |
|---|---|---|---|---|---|
| 1 | .get()* | Label | Y | Medium | Safe lookup when you are not sure the label exists |
| 2 | .at[] | Label (scalar) | N | Fast | Single value |
| 3 | .iat[] | Position (scalar) | N | Fast | Single value |
* .get() does not raise a KeyError if the label doesn’t exist.
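Let’s look at index alignment. When two Series are added, as with ser1 + ser2 in the code that follows, pandas:
Takes the union of all index labels
Aligns matching labels
Adds values where both exist
Returns NaN where one side is missing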
### creating 2 Series with custom indexes
ser1 = pd.Series( [ 10, 20, 30 ], index = [ 'Apple', 'Banana', 'Cherry' ])
ser2 = pd.Series( [ 10, 20, 40 ], index = [ 'Apple', 'Banana', 'Cranberry'] )
print(ser1, "\n")
print(ser2)
ser1 + ser2
Apple 10
Banana 20
Cherry 30
dtype: int64
Apple 10
Banana 20
Cranberry 40
dtype: int64
Apple 20.0
Banana 40.0
Cherry NaN
Cranberry NaN
dtype: float64
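If NaN is not what you want for labels that appear on only one side, one option is the add method with fill_value. The sketch below rebuilds the two Series under different names (left, right) so it does not overwrite ser1 and ser2.

```python
import pandas as pd

left = pd.Series([10, 20, 30], index=["Apple", "Banana", "Cherry"])
right = pd.Series([10, 20, 40], index=["Apple", "Banana", "Cranberry"])

# Treat a label that is missing on one side as 0 instead of producing NaN.
total = left.add(right, fill_value=0)
print(total)
```

The result still covers the union of both indexes, but Cherry and Cranberry now keep their single-sided values.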
4.1.4.1. Simple indexing#
.loc[]: based on label
.iloc[]: based on position
Boolean: based on a condition
# Example Series for demonstration
import pandas as pd
ser = pd.Series({'Apple': 10, 'Banana': 20, 'Cherry': 30})
print("The Series:", "\n", ser, "\n")
# Label-based indexing
print("1. Label-based (ser['Apple']):\t\t", ser['Apple'])
print("1. Label-based (ser.loc['Banana']):\t", ser.loc['Banana'], "\n")
# Position-based indexing
print("2. Position-based (ser.iloc[0]):\t\t", ser.iloc[0], "\n")
# Boolean indexing
print(f"3. Boolean indexing: ser[ser > 15]:\n {ser[ser > 15]}")
The Series:
Apple 10
Banana 20
Cherry 30
dtype: int64
1. Label-based (ser['Apple']): 10
1. Label-based (ser.loc['Banana']): 20
2. Position-based (ser.iloc[0]): 10
3. Boolean indexing: ser[ser > 15]:
Banana 20
Cherry 30
dtype: int64
### EXERCISE: Indexing Practice
# Given this Series of city temperatures:
temps = pd.Series([72, 85, 68, 91, 77],
index=['NYC', 'Miami', 'Seattle', 'Phoenix', 'Denver'])
# 1. Access Miami's temperature using label-based indexing with .loc
# 2. Get the first and last temperatures using position-based indexing with .iloc
# 3. Find all cities with temperature above 75 degrees using boolean indexing
### Your code starts here:
### Your code ends here.
1. Miami temperature (using .loc): 85°F
2. First temperature (using .iloc[0]): 72°F
Last temperature (using .iloc[-1]): 77°F
3. Cities with temperature > 75°F:
Miami 85
Phoenix 91
Denver 77
dtype: int64
4.1.4.2. Scalar Access#
.get(): by label; does not raise an error when the label is not found
.at[]: by label; fast scalar access; can also update the value
.iat[]: by position; fast scalar access; can also update the value
%%expect KeyError
ser = pd.Series([10, 20], index=['Apple', 'Banana'])
ser["Durian"]   ### raises KeyError: 'Durian' is not in the index
KeyError: 'Durian'
ser.get('Durian', 'Not found')
'Not found'
ser1.at['Banana']
np.int64(20)
ser1.at['Banana'] = 25
ser1.iat[0]
np.int64(10)
### EXERCISE: Scalar Access Methods
# Given this Series of product prices:
products = pd.Series([29.99, 15.50, 42.00, 8.99],
index=['Shirt', 'Socks', 'Jacket', 'Hat'])
# 1. Use .get() to safely retrieve the price of 'Shoes' (which doesn't exist),
# return 'Not available' if not found
# 2. Use .at to update the price of 'Hat' to 12.99
# 3. Use .iat to get the price of the second item (position 1)
### Your code starts here:
### Your code ends here.
Original Series:
Shirt 29.99
Socks 15.50
Jacket 42.00
Hat 8.99
dtype: float64
1. Price of 'Shoes' using .get(): Not available
2. After updating Hat price using .at:
Shirt 29.99
Socks 15.50
Jacket 42.00
Hat 12.99
dtype: float64
3. Price at position 1 using .iat: $15.5
4.1.4.3. .index and .values#
Note that we can access the index labels and the values of a Series through the .index and .values attributes.
print(ser1.index)
print(ser1.values)
Index(['Apple', 'Banana', 'Cherry'], dtype='object')
[10 25 30]
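Alongside .index and .values, a few related accessors are often handy. The sketch below uses a made-up Series (stock) and shows to_numpy(), which the pandas documentation suggests as an alternative to .values, together with tolist() and items().

```python
import pandas as pd

stock = pd.Series([10, 25, 30], index=["Apple", "Banana", "Cherry"], name="stock")

labels_list = stock.index.tolist()   # plain Python list of labels
values_arr = stock.to_numpy()        # NumPy array of the values
pairs = list(stock.items())          # (label, value) tuples

print(labels_list)
print(values_arr)
print(pairs)
```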
### EXERCISE: Working with Index and Values
# Given this Series of monthly sales:
sales = pd.Series([12000, 15000, 13500, 18000, 16500],
index=['Jan', 'Feb', 'Mar', 'Apr', 'May'])
# 1. Extract and print the index values
# 2. Extract and print the values as a NumPy array
# 3. Calculate the total sales by summing the values
# 4. Find which month had the maximum sales (hint: use the index and values together)
### Your code starts here:
### Your code ends here.
1. Index values:
Index(['Jan', 'Feb', 'Mar', 'Apr', 'May'], dtype='object')
2. Values as NumPy array:
[12000 15000 13500 18000 16500]
Type: <class 'numpy.ndarray'>
3. Total sales: $75,000
4. Maximum sales month: Apr with $18,000
4.1.5. Slicing in Series#
Slicing lets you select a range of values from a Series, not just single items. This is useful for working with subsets of your data.
Label-based Slicing (.loc)
    Use labels to select a range.
    Inclusive: The end label is included.
Position-based Slicing (.iloc)
    Use integer positions to select a range.
    Exclusive: The end position is not included (like standard Python slicing).
# Recreate the three-item Series (ser was reassigned to two items in the scalar-access example)
ser = pd.Series({'Apple': 10, 'Banana': 20, 'Cherry': 30})
# Label-based slicing (inclusive)
print(ser.loc['Apple':'Cherry'], "\n")
print(ser['Apple':'Cherry'])
# Position-based slicing (exclusive)
print("\nser.iloc[0:2]:")
print(ser.iloc[0:2])
# Try changing the slice to see what happens!
# For example:
# print(ser.loc['Banana':'Banana'])
# print(ser.iloc[1:3])
Apple 10
Banana 20
Cherry 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
ser.iloc[0:2]:
Apple 10
Banana 20
dtype: int64
Comparison between .loc and .iloc:
| Slicing Type | Syntax | End/Stop | Use Case |
|---|---|---|---|
| Label-based | ser.loc['Apple':'Cherry'] | Inclusive (because .loc is label-based, not position-based) | Named index |
| Position-based | ser.iloc[0:2] | Exclusive | Integer positions |
4.1.5.1. Avoid Python style indexing/slicing#
s = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
s[0:2]
A 10
B 20
dtype: int64
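With string labels, s[0:2] above falls back to positional slicing. With an integer-labelled index, however, plain [] treats an integer as a label rather than a position, so it raises a KeyError when that label does not exist.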
%%expect KeyError
s = pd.Series([10, 20, 30], index=[100, 200, 300])
print(s, "\n")
print(s[1]) ### there's no index 1; there's 100
100 10
200 20
300 30
dtype: int64
KeyError: 1
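See below: with integer labels, does s[1:4] mean labels 1 to 4 or positions 1 to 3? As the output shows, the slice is applied by position here, even though s[1] on its own would be looked up as a label. Using .loc or .iloc removes the ambiguity.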
s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])
print(s, "\n")
s[1:4] ### What do you mean by this?
1 10
2 20
3 30
4 40
5 50
dtype: int64
2 20
3 30
4 40
dtype: int64
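To remove the ambiguity, the same slice can be written explicitly with .loc or .iloc. The sketch below uses a separate variable name (numbered) so it does not overwrite s.

```python
import pandas as pd

numbered = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

by_label = numbered.loc[1:4]       # labels 1 through 4, end label included
by_position = numbered.iloc[1:4]   # positions 1, 2, 3, end position excluded

print(by_label.tolist())      # [10, 20, 30, 40]
print(by_position.tolist())   # [20, 30, 40]
```

The two results differ in both length and content, which is why spelling out the accessor is the safer habit.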
print(ser1.iloc[1] , "\n")
print(ser1.iloc[0:2])
25
Apple 10
Banana 25
dtype: int64
| Slicing Type | Syntax | End Behavior | Example | Result |
|---|---|---|---|---|
| Label-based | ser.loc['Apple':'Cherry'] | Inclusive | Includes ‘Cherry’ | Apple, Banana, Cherry |
| Label-based | ser['Apple':'Cherry'] | Inclusive | Includes ‘Cherry’ | Apple, Banana, Cherry |
| Position-based | ser.iloc[0:2] | Exclusive | Excludes position 2 | Apple, Banana |
4.1.5.2. Masking#
In pandas, Boolean indexing and masking refer to the same fundamental operation.
4.1.5.2.1. Boolean Indexing vs Masking#
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50])
### "Boolean Indexing"
result1 = s[s > 25]
### "Masking"
mask = s > 25
result2 = s[mask]
print(result1, "\n")
print(result2)
2 30
3 40
4 50
dtype: int64
2 30
3 40
4 50
dtype: int64
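You can use .where() and .mask() to replace values:
s.where(COND): Keeps elements where the condition is True, replaces with NaN where False
s.mask(COND): Keeps elements where the condition is False, replaces with NaN where True
Both methods accept a second argument x (the other parameter) that is used instead of NaN:
| Method | What Happens |
|---|---|
| s.where(COND, x) | If condition fails → use x |
| s.mask(COND, x) | If condition passes → use x |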
### create a Series with a custom index
ser = pd.Series( [ 10, 20, 30 ], index = [ 'Apple', 'Banana', 'Cherry' ])
ser
Apple 10
Banana 20
Cherry 30
dtype: int64
### where: keep values > 15 and replace the rest with 0; mask: replace values > 15 with True
print(ser.where(ser > 15, other=0), "\n")
print(ser.mask(ser > 15, other=True), "\n")
Apple 0
Banana 20
Cherry 30
dtype: int64
Apple 10
Banana True
Cherry True
dtype: object
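Masks can also be combined before they are applied. The following sketch, on made-up data (readings), uses & for AND and ~ for NOT; each comparison needs its own parentheses because these operators bind more tightly than > and <.

```python
import pandas as pd

readings = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

in_range = (readings > 15) & (readings < 45)   # AND: both conditions hold
extremes = ~in_range                           # NOT: invert the mask

print(readings[in_range])
print(readings[extremes])
```

The same combined masks can be passed to .where() or .mask() in place of a single condition.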
### EXERCISE: Use Series Masking to Filter Data
### You have a Series:
import pandas as pd
scores = pd.Series([85, 92, 78, 95, 88, 73, 91],
index=['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank', 'Grace'])
### A. Use boolean indexing to find all students who scored 85 or higher.
### B. Use the .where() method to keep scores that are 85 or higher, and
### replace lower scores with NaN.
### C. Use the .mask() method to replace scores below 85 with NaN (keep
### scores 85 and above).
=== PART A: Boolean Indexing ===
Alice 85
Bob 92
David 95
Emma 88
Grace 91
dtype: int64
Shape: (5,)
=== PART B: Using .where() ===
Alice 85.0
Bob 92.0
Charlie NaN
David 95.0
Emma 88.0
Frank NaN
Grace 91.0
dtype: float64
Shape: (7,)
=== PART C: Using .mask() ===
Alice 85.0
Bob 92.0
Charlie NaN
David 95.0
Emma 88.0
Frank NaN
Grace 91.0
dtype: float64