7.1. Pandas Series#
7.1.1. Overview#
The Pandas Series is a one-dimensional labeled array capable of holding any data type. It’s the primary building block of Pandas and serves as the foundation for the more complex DataFrame structure.
7.1.1.1. Key Characteristics:#
Labeled index: Each element has an associated label (index)
Heterogeneous data types: Can hold mixed data types (unlike NumPy arrays which are homogeneous)
Built on NumPy: Internally uses NumPy arrays for efficient storage
Flexible creation: Can be created from lists, arrays, dictionaries, and more
7.1.1.2. When to use Series:#
Time series data with datetime indexing
Labeled data where index meaning is important
Single column of data from a larger dataset
Lookup tables or mapping data
The first main Pandas data type you will learn about is the Series. When you create a Series by passing a list of values, pandas builds a default RangeIndex.
You should note that this process is very similar to creating a NumPy array, where we started by using the np.array() function to turn a list into an array.
You should also note that a Series can hold elements of mixed data types, which makes it similar to a Python list in this respect, unlike the homogeneous ndarray.
A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a NumPy array from a Series is that a Series can have labels, meaning the elements can be indexed by labels instead of just numerical positions, which helps a lot when performing data analysis. Additionally, while NumPy arrays are designed for homogeneous numeric data, a Series can hold any arbitrary Python object.
A Pandas Series can be created by loading data from existing storage such as dataset files and SQL data sources, or directly from Python objects like lists, dictionaries, or even single scalar values.
import numpy as np
import pandas as pd
7.1.2. Creating Series#
7.1.2.1. From Python Objects#
Pandas Series can be constructed from many Python objects, most commonly lists, NumPy arrays, and dictionaries. Note that:
When you pass a list or NumPy array, pandas creates a default integer index (0, 1, 2, …) unless you supply an index.
Passing a dictionary uses the dictionary keys as the Series index and the values as the data.
Pandas infers the dtype automatically; specify it explicitly with the dtype= argument if you need a particular type.
You can provide custom labels via the index argument to make your Series more descriptive and easier to work with.
List comprehensions can be used inline when building a Series from a generated sequence.
When creating a Series, the pd.Series() constructor accepts several important arguments:
data: The actual values (list, array, dictionary, etc.)
index: Custom labels for each element (optional)
dtype: Explicit data type specification (optional)
By specifying parameter names explicitly, you can pass arguments in any order, making your code more readable and less error-prone.
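For instance, here is a minimal sketch (the values and fruit labels are made up for illustration, matching the examples later in this section) showing that named arguments can be supplied in any order and still build the same Series:
### keyword arguments may be supplied in any order
s1 = pd.Series(data=[10, 20, 30], index=['Apple', 'Banana', 'Cherry'], dtype='float64')
s2 = pd.Series(index=['Apple', 'Banana', 'Cherry'], dtype='float64', data=[10, 20, 30])
print(s1.equals(s2))   ### True: both calls produce the same Series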
You can feed a sequence to create a Pandas Series; the index will be created automatically and the data type (dtype) inferred from the elements.
ser = pd.Series([1, 3, 5, np.nan, 6, 8])
print(type(ser))
ser
<class 'pandas.core.series.Series'>
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
From the evaluation above, notice that each element now has a corresponding index, just as in a list but shown explicitly. Usually you will want to supply labels to make the data more descriptive by setting index. The index is the set of labels that uniquely identify each element in a Series (or row in a DataFrame), enabling intuitive label-based access and automatic alignment of data across objects.
Now let’s prepare the following data:
labels = ['Apple','Banana','Cherry'] ### a string list
lst = [10,20,30] ### a numerical list
arr = np.array(lst) ### a numpy array
dict = {'Apple':10,'Banana':20,'Cherry':30} ### a dictionary: key-value pairs
### note that we use tuple unpacking to loop over name-object pairs
for name, obj in [('labels', labels), ('lst', lst), ('arr', arr), ('dict', dict)]:
print(f"{name}\t: {obj}")
labels : ['Apple', 'Banana', 'Cherry']
lst : [10, 20, 30]
arr : [10 20 30]
dict : {'Apple': 10, 'Banana': 20, 'Cherry': 30}
Now create a Series by passing the data with the data parameter, then with both the data and index parameters, and finally by passing the objects positionally without parameter names:
print(pd.Series(data=lst), "\n") ### create Series by sending lst as argument with parameter data
print(pd.Series(data=lst,index=labels), "\n") ### add labels: index
print(pd.Series(lst, labels), "\n") ### same as above, without parameter names
print(pd.Series(arr), "\n")
print(pd.Series(arr, index=labels))
0 10
1 20
2 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
0 10
1 20
2 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
Using list comprehension
When we need to build a Python list inside a workflow, a list comprehension lets us do so in a single expression. Note that in the code below we place the list comprehension directly inside the Series constructor.
### create a list using list comprehension
### this is a demonstration of list comprehension and we save the list to a variable
nums_lst = [ num for num in range(5) ] ### [0, 1, 2, 3, 4]
### directly put the list comprehension in the Series function
pd.Series([ num for num in range(5) ])
### indexes are added automatically
0 0
1 1
2 2
3 3
4 4
dtype: int64
From a Python dictionary
Since a Python dictionary maps keys to values, passing it to pd.Series will use the dictionary keys as the Series index and the corresponding values as the Series data — a convenient way to create a labeled Series from key–value pairs.
### key-value pairs
pd.Series(dict)
Apple 10
Banana 20
Cherry 30
dtype: int64
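As mentioned in the overview, a Series can also be created from a single scalar value; in that case the scalar is broadcast to every label of the index you provide. A minimal sketch:
### a scalar value is repeated for each index label
pd.Series(5.0, index=['Apple', 'Banana', 'Cherry'])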
7.1.3. dtypes in Pandas Series#
A Pandas Series can hold data of various types, including integers, floats, strings, and even Python objects. The data type of the Series is inferred from the input data, but it can also be explicitly specified using the dtype parameter when creating the Series. This flexibility allows heterogeneous data to be stored and manipulated within a single Series object. Some key points to note about Series data types:
Type inference: Pandas automatically infers the most appropriate data type
Memory efficiency: Using the correct dtype can significantly reduce memory usage
Performance: Numeric types are faster for mathematical operations
Type coercion: Mixed types are converted to object dtype (least efficient)
The common data types (dtypes) and their use cases include:
Numeric types (int64, float64): Mathematical operations, statistical analysis
Object type: Mixed data, strings, or when type is uncertain
Categorical: Repeated string values, ordered data, memory optimization
Datetime: Time series data, temporal analysis
Boolean: Filtering, conditional logic
Examine the examples below to learn about common Pandas Series dtypes and simple conversions/inspections:
### Series with different dtypes
s_int = pd.Series([1, 2, 3])
s_float = pd.Series([1.0, 2.5, np.nan])
s_str = pd.Series(['a', 'b', 'c'])
s_mixed = pd.Series([1, 'a', 3.0])
s_bool = pd.Series([True, False, True])
s_dt = pd.Series(pd.date_range('2020-01-01', periods=3))
s_cat = pd.Series(pd.Categorical(['low', 'medium', 'high', 'low'],
categories=['low', 'medium', 'high'],
ordered=True))
### mixed with dictionary, list, and tuple
s_obj = pd.Series([{'x': 1}, [1, 2], (3, 4)])
### place the series above into a list of tuples
series_list = [
("s_int, \t\tinteger", s_int),
("s_float, \tfloat (with NaN)", s_float),
("s_string, \tstring", s_str),
("s_mixed, \tmixed (object)", s_mixed),
("s_boolean, \tboolean", s_bool),
("s_datetime, \tdatetime", s_dt),
("s_categorical, \tcategorical", s_cat),
("s_object, \tobject", s_obj),
]
### print out the series:
for name, s in series_list:
print(f"{name:40}\t dtype -> {s.dtype}")
s_int, integer dtype -> int64
s_float, float (with NaN) dtype -> float64
s_string, string dtype -> object
s_mixed, mixed (object) dtype -> object
s_boolean, boolean dtype -> bool
s_datetime, datetime dtype -> datetime64[ns]
s_categorical, categorical dtype -> category
s_object, object dtype -> object
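One of the points above, memory efficiency, can be checked directly with memory_usage(deep=True). The sketch below (the repeated city names are invented for illustration) compares an object-dtype Series of strings with its categorical equivalent; the categorical version typically uses far less memory when values repeat:
### compare memory usage of object vs. category dtype for repeated strings
cities = pd.Series(['Paris', 'London', 'Paris', 'Tokyo'] * 1000)
print("object dtype  :", cities.memory_usage(deep=True), "bytes")
print("category dtype:", cities.astype('category').memory_usage(deep=True), "bytes")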
7.1.3.1. Type casting#
You can use astype to explicitly cast dtypes. See examples 1 and 2 below.
### small demos of conversion operations
print("\nConversion operations:\n")
print("1. s_float \t -> fillna(0) \t-> astype(int): \t->", s_float.fillna(0).astype(int).tolist())
print("2. s_str \t\t\t-> astype('category') \t-> dtype:", s_str.astype('category').dtype)
print("\n3. s_mixed \t\t\t\t\t\t-> dtype (inferred):", s_mixed.dtype)
print("\n4. s_cat value counts (preserves categorical ordering):")
print(s_cat.value_counts(sort=False))
Conversion operations:
1. s_float -> fillna(0) -> astype(int): -> [1, 2, 0]
2. s_str -> astype('category') -> dtype: category
3. s_mixed -> dtype (inferred): object
4. s_cat value counts (preserves categorical ordering):
low 2
medium 1
high 1
Name: count, dtype: int64
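Note that example 1 fills the missing value before casting, because converting a Series containing NaN with astype(int) raises an error (the plain NumPy int64 dtype cannot represent missing values). If you need to keep the missing value and still have an integer dtype, one option is pandas' nullable integer extension type, written with a capital I. A minimal sketch with an invented series:
### 'Int64' (capital I) is the nullable integer dtype, which can hold <NA>
s_nullable = pd.Series([1.0, np.nan, 3.0]).astype('Int64')
print(s_nullable)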
7.1.4. Advanced Series Creation Patterns#
This section demonstrates robust and efficient patterns for creating Pandas Series—covering error handling, dtype pre-allocation for performance, and handling special or edge-case values.
Error handling in Series creation
Use simple validation and try/except blocks to ensure Series creation from heterogeneous or external data sources fails gracefully and provides informative messages.
### safe Series creation with error handling
try:
# This might fail if data types are incompatible
problematic_series = pd.Series([1, 'a', 3.14])
print("Series created successfully:", problematic_series.dtype)
except Exception as e:
print(f"Error creating Series: {e}")
Series created successfully: object
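When external data is supposed to be numeric but may contain bad entries, another option besides try/except is to coerce the offending values to NaN with pd.to_numeric. A short sketch using made-up values:
### invalid entries become NaN instead of raising an error
raw = pd.Series(['1', '2', 'oops', '4'])
print(pd.to_numeric(raw, errors='coerce'))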
Performance optimization
Pre-allocating Series with the correct numeric dtype and minimizing repeated conversions and allocations can reduce memory overhead and speed up work with large data. The cells below time the creation of two large Series; note from the output that for a simple creation like this the difference is negligible.
### pre-allocate Series dtypes for better performance with large datasets
import time
### method 1: direct creation
start = time.time()
large_series1 = pd.Series(np.random.randn(100000))
time1 = time.time() - start
### method 2: pre-allocate with dtype (faster for known types)
start = time.time()
large_series2 = pd.Series(np.random.randn(100000), dtype=np.float64)
time2 = time.time() - start
print(f"Direct creation: {time1:.4f}s")
print(f"Pre-allocated: {time2:.4f}s")
Direct creation: 0.0031s
Pre-allocated: 0.0034s
Edge cases and special values
This section demonstrates the flexibility of Series: you can store unusual or non-scalar objects (including classes, types, and functions) in a Pandas Series, although this is uncommon in typical data analysis workflows.
### even functions or types can be stored in Series (though not commonly used)
print(pd.Series([tuple, list, np.array]), "\n")
print(pd.Series([sum, print, len]))
0 <class 'tuple'>
1 <class 'list'>
2 <built-in function array>
dtype: object
0 <built-in function sum>
1 <built-in function print>
2 <built-in function len>
dtype: object
7.1.5. Series Indexing#
The power of a Pandas Series comes from its index: an ordered set of labels that identify each element and enable fast, expressive, label-based operations. Some of the key concepts to learn about Series indexing include:
Label vs. position: use .loc/.at for label-based access and .iloc/.iat for position-based access (e.g., ser.loc['Apple'], ser.iloc[0]).
Flexible labels: index values can be any hashable type (strings, integers, datetimes, etc.), and the index itself can carry a name (see the sketch after this list).
Alignment behavior: arithmetic and combine operations automatically align on index labels (missing labels yield NaN), so ser1 + ser2 uses the union of indexes.
Uniqueness and ordering: indexes may be non-unique and are ordered; choose the index type and uniqueness appropriate for your use case.
Performance: lookups by label are fast (dict-like), and vectorized operations on a well-chosen index and dtype give the best performance.
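As a small illustration of the "flexible labels" point, both a Series and its index can carry a name; the names appear in the printed output and are kept when the Series later becomes part of a DataFrame. A sketch with hypothetical fruit data:
### both the Series and its index can be given a name
stock = pd.Series([10, 20, 30],
                  index=pd.Index(['Apple', 'Banana', 'Cherry'], name='fruit'),
                  name='quantity')
print(stock)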
There are four types of indexing operations in Pandas Series:
Label-based access (preferred for labeled data)
Single value: ser1['Apple'] or ser1.loc['Apple']
Multiple labels: ser1[['Apple', 'Cherry']] or ser1.loc[['Apple', 'Cherry']]
Label slicing (inclusive end): ser1['Apple':'Cherry'] or ser1.loc['Apple':'Cherry']
Positional access (use for integer positions)
Single position: ser1.iloc[0]
Slice (end exclusive): ser1.iloc[0:2]
Avoid ser1[0] when the index is integer-labelled; it is ambiguous.
Safe access and fast scalar ops
Safe get without KeyError: ser1.get('Durian', 'Not found')
Fast scalar access/assignment by label: ser1.at['Banana'] = 25
Fast scalar access/assignment by position: ser1.iat[0]
Boolean indexing
Filter by condition: ser1[ser1 > 15]
Use .isin() to filter by a list of labels: ser1[ser1.index.isin(['Apple', 'Cranberry'])]
Note that:
Bracket-based label slicing is stop-inclusive; positional slicing with .iloc is end-exclusive.
Prefer .loc/.iloc/.at/.iat for explicit, unambiguous indexing.
Remember alignment semantics when combining Series: missing labels produce NaN, which can change dtypes (e.g., integers -> floats).
Common methods
Useful helpers to reshape, filter, and merge while preserving or changing index semantics: reindex, isin, where, combine_first, reset_index.
### creating 2 Series with custom indexes
ser1 = pd.Series([10, 20, 30], index=['Apple', 'Banana', 'Cherry'])
ser2 = pd.Series([10, 20, 40], index=['Apple', 'Banana', 'Cranberry'])
print(ser1, "\n")
print(ser2)
Apple 10
Banana 20
Cherry 30
dtype: int64
Apple 10
Banana 20
Cranberry 40
dtype: int64
7.1.5.1. Simple indexing#
print(ser1['Apple'])
10
# print("ser1:\n", ser1, "\n")
print(ser1)
Apple 10
Banana 20
Cherry 30
dtype: int64
7.1.5.1.1. Label-based access (.loc) - inclusive slicing for labels#
print(ser1.loc['Banana'], "\n")
print(ser1.loc[['Apple', 'Cherry']], "\n")
print(ser1['Apple':'Cherry'])
20
Apple 10
Cherry 30
dtype: int64
Apple 10
Banana 20
Cherry 30
dtype: int64
### 2) Position-based access (.iloc) - integer positions, Python-style slices (end exclusive)
print(ser1.iloc[1] , "\n")
print(ser1.iloc[0:2])
20
Apple 10
Banana 20
dtype: int64
7.1.5.2. 3) Fast scalar access / assignment (.at and .iat)#
print(ser1.at['Banana'])
ser1.at['Banana'] = 25 ### update the value by label
print(ser1.at['Banana'])
print(ser1.iat[0], "\n")
20
25
10
7.1.5.3. 4) Boolean indexing and isin#
print(ser1[ser1 > 15], "\n")
print(ser1[ser1.index.isin(['Apple', 'Cranberry'])], "\n")
Banana 25
Cherry 30
dtype: int64
Apple 10
dtype: int64
7.1.5.4. 5) Safe get with default (avoids KeyError)#
In pandas, the .get() method for a Series is a safe way to access values by index label without raising a KeyError if the label doesn't exist. It works similarly to Python's dictionary .get() method.
print(ser1.get('Durian', 'Not found'), "\n")
Not found
7.1.5.5. 6) Reindexing and filling missing values#
new_idx = ['Cherry', 'Apple', 'Durian']
print("Reindex to", new_idx, "with fill_value=0:\n", ser1.reindex(new_idx, fill_value=0), "\n")
Reindex to ['Cherry', 'Apple', 'Durian'] with fill_value=0:
Cherry 30
Apple 10
Durian 0
dtype: int64
7.1.5.6. 7) Aligning / combining two Series (useful when indexes differ)#
# ser1 + ser2 (alignment on index)
print(ser1 + ser2, "\n")
# fill missing in ser1 from ser2 using combine_first
print(ser1.combine_first(ser2))
Apple 20.0
Banana 45.0
Cherry NaN
Cranberry NaN
dtype: float64
Apple 10
Banana 25
Cherry 30
Cranberry 40
dtype: int64
7.1.5.7. 8) Conditional selection with where (keeps values that satisfy condition)#
# "Keep values > 15, else 0"
print(ser1.where(ser1 > 15, other=0), "\n")
Apple 0
Banana 25
Cherry 30
dtype: int64
7.1.5.8. 9) Inspect index and values directly#
print(ser1.index)
print(ser1.values)
Index(['Apple', 'Banana', 'Cherry'], dtype='object')
[10 25 30]
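Of the common methods listed earlier, reset_index is the only one not demonstrated above. It moves the index labels into an ordinary column and returns a DataFrame, or, with drop=True, discards the labels and returns a Series with a fresh RangeIndex. A brief sketch using ser1:
### reset_index moves the labels into a column (the result is a DataFrame)
print(ser1.reset_index(), "\n")
### drop=True discards the labels and keeps a plain RangeIndex
print(ser1.reset_index(drop=True))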
7.1.6. More on Indexing#
Observe the operation below and notice that the computation is driven by the index. When we add two Series, Pandas aligns them by their index labels before performing the operation:
The result's index is the union of the two input indexes.
For labels present in both Series, values are added element-wise.
If a label is missing in one Series, the result for that label is NaN (missing value propagates).
Presence of NaN may change dtype (e.g., integers -> floats) because NaN is a float value.
ser1 + ser2
Apple 20.0
Banana 45.0
Cherry NaN
Cranberry NaN
dtype: float64