7.1. Pandas Series#

7.1.1. Overview#

The Pandas Series is a one-dimensional labeled array capable of holding any data type. It’s the primary building block of Pandas and serves as the foundation for the more complex DataFrame structure.

7.1.1.1. Key Characteristics:#

  • Labeled index: Each element has an associated label (index)

  • Heterogeneous data types: Can hold mixed Python objects in one Series (stored with the generic object dtype), whereas NumPy arrays are normally used with a single numeric dtype

  • Built on NumPy: Internally uses NumPy arrays for efficient storage

  • Flexible creation: Can be created from lists, arrays, dictionaries, and more

7.1.1.2. When to use Series:#

  • Time series data with datetime indexing

  • Labeled data where index meaning is important

  • Single column of data from a larger dataset

  • Lookup tables or mapping data

The first main data type you will learn about in Pandas is the Series. When you create a Series by passing a list of values, pandas creates a default RangeIndex (0, 1, 2, …).

  • You should note that this process is very similar to creating a NumPy array, where we started by using the np.array() function to turn a list into an array.

  • You should also note that a Series can hold elements of different types, as a list can. In that case pandas stores them with the generic object dtype, whereas a Series of purely numeric values gets a numeric dtype such as int64 or float64.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a NumPy array from a Series is that a Series can have labels, meaning the elements can be indexed by labels instead of just numerical positions, which helps a lot when performing data analysis. Additionally, while NumPy arrays are designed for homogeneous numeric data, a Series can hold any arbitrary Python object.
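
The difference between positional and label-based access can be sketched as follows. This is a minimal illustration; the fruit labels are simply sample data.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
ser = pd.Series(arr, index=['Apple', 'Banana', 'Cherry'])

# A NumPy array can only be addressed by position
second_from_array = arr[1]

# A Series can be addressed by label or, via .iloc, by position
banana_by_label = ser['Banana']
banana_by_position = ser.iloc[1]

print(second_from_array, banana_by_label, banana_by_position)
```

All three expressions refer to the same element; only the Series lets you name it.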

A Pandas Series can be created by loading data from existing storage such as dataset files and SQL data sources, or directly from Python objects like lists, dictionaries, or even single scalar values.

import numpy as np
import pandas as pd

7.1.2. Creating Series#

7.1.2.1. From Python Objects#

Pandas Series can be constructed from many Python objects — most commonly lists, NumPy arrays, and dictionaries. Note that:

  • When you pass a list or NumPy array, pandas creates a default integer index (0, 1, 2, …) unless you supply an index.

  • Passing a dictionary uses the dictionary keys as the Series index and the values as the data.

  • Pandas infers the dtype automatically; specify dtype (dtype=) explicitly if you need a particular type.

  • You can provide custom labels via the index argument to make your Series more descriptive and easier to work with.

  • List comprehensions can be used inline when building a Series from a generated sequence.

When creating a Series, the pd.Series() constructor accepts several important arguments:

  • data: The actual values (list, array, dictionary, etc.)

  • index: Custom labels for each element (optional)

  • dtype: Explicit data type specification (optional)

By specifying parameter names explicitly, you can pass arguments in any order, making your code more readable and less error-prone.
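
As a small sketch of these arguments, the example below passes them by keyword in a non-default order and also builds a Series from a single scalar value. The variable names prices and flags are hypothetical.

```python
import pandas as pd

# Keyword arguments may appear in any order
prices = pd.Series(index=['Apple', 'Banana'], dtype='float64', data=[10, 20])
print(prices)

# A scalar is repeated for every label in the supplied index
flags = pd.Series(0, index=['x', 'y', 'z'])
print(flags)
```

Because dtype='float64' is requested explicitly, the integer inputs are stored as floats.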

You can pass a sequence to pd.Series(); pandas creates the index automatically and infers the data type (dtype) from the values. In the example below, the presence of np.nan makes the inferred dtype float64.

ser = pd.Series([1, 3, 5, np.nan, 6, 8])
print(type(ser))
ser
<class 'pandas.core.series.Series'>
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

From the output above, you can see that each element now has a corresponding index label. This is like the implicit positions of a list, but made explicit. Usually, you would want to supply labels through the index argument to make the data more descriptive. The index is the set of labels that identify each element in a Series (or each row in a DataFrame), enabling intuitive label-based access and automatic alignment of data across objects. Labels are usually unique, although pandas does not require it.

Now let’s prepare the following data:

labels = ['Apple','Banana','Cherry']          ### a string list
lst = [10,20,30]                ### a numerical list
arr = np.array(lst)             ### a numpy array
dict = {'Apple':10,'Banana':20,'Cherry':30}   ### a dictionary: key-value pairs (note: this name shadows the built-in dict; avoid it in real code)

### note that we use tuple unpacking to loop over name-object pairs
for name, obj in [('labels', labels), ('lst', lst), ('arr', arr), ('dict', dict)]:
    print(f"{name}\t: {obj}")
labels	: ['Apple', 'Banana', 'Cherry']
lst	: [10, 20, 30]
arr	: [10 20 30]
dict	: {'Apple': 10, 'Banana': 20, 'Cherry': 30}

The calls below create a Series in three ways: passing the data with the keyword data, passing both data and index as keywords, and passing the same two objects positionally without parameter names. The last two calls repeat the pattern with a NumPy array:

print(pd.Series(data=lst), "\n")                ### create Series by sending lst as argument with parameter data
print(pd.Series(data=lst,index=labels), "\n")   ### add labels: index
print(pd.Series(lst, labels), "\n")             ### same as above, without parameter names
print(pd.Series(arr), "\n")
print(pd.Series(arr, index=labels)) 
0    10
1    20
2    30
dtype: int64 

Apple     10
Banana    20
Cherry    30
dtype: int64 

Apple     10
Banana    20
Cherry    30
dtype: int64 

0    10
1    20
2    30
dtype: int64 

Apple     10
Banana    20
Cherry    30
dtype: int64

Using list comprehension

When we need to create a Python list object in a workflow, list comprehension allows us to do so without interruption. Note that in the code below we place the list comprehension in the Series function.

### create a list using list comprehension
### this is a demonstration of list comprehension and we save the list to a variable
nums_lst = [ num for num in range(5) ]    ### [0, 1, 2, 3, 4]

### directly put the list comprehension in the Series function
pd.Series([ num for num in range(5) ])


### indexes are added automatically
0    0
1    1
2    2
3    3
4    4
dtype: int64

From Python dictionary

Since a Python dictionary maps keys to values, passing it to pd.Series will use the dictionary keys as the Series index and the corresponding values as the Series data — a convenient way to create a labeled Series from key–value pairs.

### key-value pairs

pd.Series(dict)
Apple     10
Banana    20
Cherry    30
dtype: int64
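
When a dictionary is combined with an explicit index argument, pandas selects and orders the entries by that index, and labels missing from the dictionary become NaN. The sketch below assumes a hypothetical dictionary named stock.

```python
import pandas as pd

stock = {'Apple': 10, 'Banana': 20, 'Cherry': 30}

# Only the requested labels are kept, in the requested order;
# 'Durian' is not a key of the dictionary, so its value is NaN
picked = pd.Series(stock, index=['Cherry', 'Apple', 'Durian'])
print(picked)
```

Note that the result is float64, because NaN cannot be stored in an int64 Series.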

7.1.3. dtypes in Pandas Series#

A Pandas Series can hold data of various types, including integers, floats, strings, and arbitrary Python objects. The data type of the Series is inferred from the input data, but it can also be specified explicitly with the dtype parameter when creating the Series. A Series has exactly one dtype; mixed values are still allowed, but they are stored with the generic object dtype, which is less efficient than a specialized one. Some key points to note about Series data types:

  • Type inference: Pandas automatically infers the most appropriate data type

  • Memory efficiency: Using the correct dtype can significantly reduce memory usage

  • Performance: Numeric types are faster for mathematical operations

  • Type coercion: Mixed types are converted to object dtype (least efficient)

The common data types (dtypes) and their use cases include:

  • Numeric types (int64, float64): Mathematical operations, statistical analysis

  • Object type: Mixed data, strings, or when type is uncertain

  • Categorical: Repeated string values, ordered data, memory optimization

  • Datetime: Time series data, temporal analysis

  • Boolean: Filtering, conditional logic

Examine the examples below to learn about common Pandas Series dtypes and simple conversions/inspections:

### Series with different dtypes

s_int = pd.Series([1, 2, 3])
s_float = pd.Series([1.0, 2.5, np.nan])
s_str = pd.Series(['a', 'b', 'c'])
s_mixed = pd.Series([1, 'a', 3.0])
s_bool = pd.Series([True, False, True])
s_dt = pd.Series(pd.date_range('2020-01-01', periods=3))
s_cat = pd.Series(pd.Categorical(['low', 'medium', 'high', 'low'],
                                 categories=['low', 'medium', 'high'],
                                 ordered=True))
### mixed with dictionary, list, and tuple
s_obj = pd.Series([{'x': 1}, [1, 2], (3, 4)])

### place the series above into a list of tuples
series_list = [
    ("s_int, \t\tinteger", s_int),
    ("s_float, \tfloat (with NaN)", s_float),
    ("s_string, \tstring", s_str),
    ("s_mixed, \tmixed (object)", s_mixed),
    ("s_boolean, \tboolean", s_bool),
    ("s_datetime, \tdatetime", s_dt),
    ("s_categorical, \tcategorical", s_cat),
    ("s_object,  \tobject", s_obj),
]

### print out the series:
for name, s in series_list:
    print(f"{name:40}\t dtype -> {s.dtype}")
s_int, 		integer                        	 dtype -> int64
s_float, 	float (with NaN)              	 dtype -> float64
s_string, 	string                       	 dtype -> object
s_mixed, 	mixed (object)                	 dtype -> object
s_boolean, 	boolean                     	 dtype -> bool
s_datetime, 	datetime                   	 dtype -> datetime64[ns]
s_categorical, 	categorical             	 dtype -> category
s_object,  	object                      	 dtype -> object
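
The memory-efficiency point can be checked with a rough sketch like the one below, which compares repeated strings stored as object with the same values stored as category. The sample data and variable names are made up for illustration, and the exact byte counts depend on the pandas version and platform.

```python
import pandas as pd

# 30,000 strings drawn from only three distinct values
levels = ['low', 'medium', 'high'] * 10_000

s_as_object = pd.Series(levels, dtype='object')
s_as_category = s_as_object.astype('category')

# deep=True also counts the memory of the Python string objects
obj_bytes = s_as_object.memory_usage(deep=True)
cat_bytes = s_as_category.memory_usage(deep=True)

print(f"object:   {obj_bytes} bytes")
print(f"category: {cat_bytes} bytes")
```

The categorical version stores each distinct string once plus a small integer code per row, which is why it is smaller here.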

7.1.3.1. Type casting#

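You can use astype to cast a Series to another dtype explicitly (examples 1 and 2 below). In example 1, the missing value is filled with fillna(0) first because NaN cannot be converted to an integer.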

### small demos of conversions operations

print("\nConversions operations:\n")
print("1. s_float \t -> fillna(0) \t-> astype(int): \t->", s_float.fillna(0).astype(int).tolist())
print("2. s_str \t\t\t-> astype('category') \t-> dtype:", s_str.astype('category').dtype)
print("\n3. s_mixed \t\t\t\t\t\t-> dtype (inferred):", s_mixed.dtype)

print("\n4.s_cat value counts (preserves categorical ordering):")
print(s_cat.value_counts(sort=False))
Conversions operations:

1. s_float 	 -> fillna(0) 	-> astype(int): 	-> [1, 2, 0]
2. s_str 			-> astype('category') 	-> dtype: category

3. s_mixed 						-> dtype (inferred): object

4.s_cat value counts (preserves categorical ordering):
low       2
medium    1
high      1
Name: count, dtype: int64
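
If replacing missing values with 0 is not acceptable, one alternative is the nullable integer dtype 'Int64' (capital I), which keeps missing entries as <NA>. The following is a minimal sketch, assuming the float values are whole numbers.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan])

# Casting directly to a NumPy integer dtype fails because of the NaN
try:
    s.astype('int64')
    direct_cast_failed = False
except ValueError as exc:
    direct_cast_failed = True
    print(f"astype('int64') failed: {exc}")

# The nullable integer dtype keeps the missing value
nullable = s.astype('Int64')
print(nullable)
```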

7.1.4. Advanced Series Creation Patterns#

This section demonstrates robust and efficient patterns for creating Pandas Series—covering error handling, dtype pre-allocation for performance, and handling special or edge-case values.

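Error handling in Series creation

Use simple validation and try/except blocks so that Series creation from heterogeneous or external data sources fails gracefully and provides informative messages. Note that mixed types alone do not raise an error: as the output below shows, pandas falls back to the object dtype.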

### safe Series creation with error handling

try:
    # Mixed types do not raise here; pandas falls back to the object dtype
    problematic_series = pd.Series([1, 'a', 3.14])
    print("Series created successfully:", problematic_series.dtype)
except Exception as e:
    print(f"Error creating Series: {e}")
Series created successfully: object
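
For a case where the constructor really does raise, consider an index whose length does not match the data. The sketch below wraps the call in a hypothetical helper, safe_series, which is not part of pandas.

```python
import pandas as pd

def safe_series(data, **kwargs):
    """Hypothetical helper: return a Series, or None if construction fails."""
    try:
        return pd.Series(data, **kwargs)
    except (ValueError, TypeError) as exc:
        print(f"Could not build Series: {exc}")
        return None

# Mixed types are fine: the result has object dtype
ok = safe_series([1, 'a', 3.14])

# Three values but only two labels: pandas raises ValueError
bad_len = safe_series([1, 2, 3], index=['a', 'b'])

print(ok is not None, bad_len is None)
```

Catching specific exception types rather than a bare Exception keeps unrelated bugs visible.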

Performance optimization

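The operations below time the creation of a large Series with and without an explicit dtype. Passing dtype up front avoids a separate conversion later and, when a smaller type such as float32 is sufficient, reduces memory use. When the input array already has the requested dtype, as it does here, the argument adds no conversion work, so the two timings differ only by measurement noise.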

### pre-allocate Series dtypes for better performance with large datasets

import time

### method 1: direct creation
start = time.time()
large_series1 = pd.Series(np.random.randn(100000))
time1 = time.time() - start

### method 2: pass dtype explicitly (no conversion needed here, since the array is already float64)
start = time.time()
large_series2 = pd.Series(np.random.randn(100000), dtype=np.float64)
time2 = time.time() - start

print(f"Direct creation: {time1:.4f}s")
print(f"Pre-allocated: {time2:.4f}s")
Direct creation: 0.0031s
Pre-allocated: 0.0034s
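
A single time.time() measurement is noisy, so a more reliable comparison repeats the call with timeit. The sketch below also shows where the dtype choice has a measurable effect, namely memory use; the variable names are illustrative and no timing figures are claimed.

```python
import timeit

import numpy as np
import pandas as pd

values = np.random.randn(100_000)

s64 = pd.Series(values)                     # float64, 8 bytes per value
s32 = pd.Series(values, dtype=np.float32)   # float32, 4 bytes per value

# index=False counts only the stored values
bytes64 = s64.memory_usage(index=False)
bytes32 = s32.memory_usage(index=False)
print(f"float64: {bytes64} bytes, float32: {bytes32} bytes")

# Repeat the construction several times to average out noise
elapsed = timeit.timeit(lambda: pd.Series(values), number=20)
print(f"20 constructions took {elapsed:.4f}s")
```

Halving the storage comes at the cost of precision, so float32 is only appropriate when that trade-off is acceptable.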

Edge cases and special values

This section demonstrates the flexibility of Series: you can store unusual or non-scalar objects (including classes, types, and functions) in a Pandas Series, although this is uncommon in typical data analysis workflows.

### even functions or types can be stored in Series (though not commonly used)

print(pd.Series([tuple, list, np.array]), "\n")
print(pd.Series([sum, print, len]))
0              <class 'tuple'>
1               <class 'list'>
2    <built-in function array>
dtype: object 

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object

7.1.5. Series Indexing#

The power of a Pandas Series comes from its index — an ordered set of labels that identify each element and enable fast, expressive, label-based operations. Some of the key concepts to learn about Series indexing include:

  • Label vs. position: use .loc / .at for label-based access and .iloc / .iat for position-based access (e.g., ser.loc['Apple'], ser.iloc[0]).

  • Flexible labels: index values can be any hashable type (strings, integers, datetimes, etc.). The index itself can carry a name.

  • Alignment behavior: arithmetic and combine operations automatically align on index labels (missing labels yield NaN), so ser1 + ser2 uses the union of indexes.

  • Uniqueness & ordering: indexes may be non-unique and are ordered — choose the index type and uniqueness appropriate for your use case.

  • Performance: lookups by label are fast (dict-like). Vectorized operations on a well-chosen index/dtype give best performance.

There are four types of indexing operations in Pandas Series:

  1. Label-based access (preferred for labeled data)

    • Single value: ser1['Apple'] or ser1.loc['Apple']

    • Multiple labels: ser1[['Apple', 'Cherry']] or ser1.loc[['Apple','Cherry']]

    • Label slicing (inclusive end): ser1['Apple':'Cherry'] or ser1.loc['Apple':'Cherry']

  2. Positional access (use for integer positions)

    • Single position: ser1.iloc[0]

    • Slice (end exclusive): ser1.iloc[0:2]

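    • Avoid ser1[0]: with an integer-labelled index it is interpreted as a label, not a position, and with other index types the positional fallback is deprecated in recent pandas versions.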

  3. Safe access and fast scalar ops

    • Safe get without KeyError: ser1.get('Durian', 'Not found')

    • Fast scalar access/assignment by label: ser1.at['Banana'] = 25

    • Fast scalar access/assignment by position: ser1.iat[0]

  4. Boolean indexing

    • Filter by condition: ser1[ser1 > 15]

    • Use .isin() to filter by a list of labels: ser1[ser1.index.isin(['Apple','Cranberry'])]

Note that:

  • Label slicing (with brackets or .loc) includes the stop label; positional slicing with .iloc excludes the stop position.

  • Prefer .loc / .iloc / .at / .iat for explicit, unambiguous indexing.

  • Remember alignment semantics when combining series: missing labels produce NaN, which can change dtypes (e.g., integers -> floats).

Common methods

  • Useful helpers to reshape, filter, and merge while preserving or changing index semantics:

    • reindex, isin, where, combine_first, reset_index
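
The label-versus-position distinction matters most when the labels are themselves integers. The sketch below uses a deliberately reversed integer index and a non-unique index; both Series are made-up examples.

```python
import pandas as pd

# Integer labels in reverse order: label 0 sits at position 2
s = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = s.loc[0]       # the element whose label is 0
by_position = s.iloc[0]   # the first element
print(by_label, by_position)

# With a non-unique index, a single label can select several elements
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
selected = dup.loc['a']
print(selected)
```

Using .loc and .iloc makes the intent explicit in both cases.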

### creating 2 Series with custom indexes

ser1 = pd.Series( [ 10, 20, 30 ], index = [ 'Apple', 'Banana', 'Cherry' ])
ser2 = pd.Series( [ 10, 20, 40 ], index = [ 'Apple', 'Banana', 'Cranberry'] )
print(ser1, "\n")    
print(ser2)
Apple     10
Banana    20
Cherry    30
dtype: int64 

Apple        10
Banana       20
Cranberry    40
dtype: int64

7.1.5.1. Simple indexing#

print(ser1['Apple'])
10
# print("ser1:\n", ser1, "\n")
print(ser1)
Apple     10
Banana    20
Cherry    30
dtype: int64

7.1.5.1.1. Label-based access (.loc) - inclusive slicing for labels#

print(ser1.loc['Banana'], "\n")
print(ser1.loc[['Apple', 'Cherry']], "\n")
print(ser1['Apple':'Cherry'])
20 

Apple     10
Cherry    30
dtype: int64 

Apple     10
Banana    20
Cherry    30
dtype: int64

7.1.5.1.2. 2) Position-based access (.iloc) - integer positions, Python-style slices (end exclusive)#

print(ser1.iloc[1] , "\n")
print(ser1.iloc[0:2])
20 

Apple     10
Banana    20
dtype: int64

7.1.5.2. 3) Fast scalar access / assignment (.at and .iat)#

print(ser1.at['Banana'])
ser1.at['Banana'] = 25               ### update set by label
print(ser1.at['Banana'])
print(ser1.iat[0], "\n")
20
25
10 

7.1.5.3. 4) Boolean indexing and isin#

print(ser1[ser1 > 15], "\n")
print(ser1[ser1.index.isin(['Apple', 'Cranberry'])], "\n")
Banana    25
Cherry    30
dtype: int64 

Apple    10
dtype: int64 
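
Boolean conditions can also be combined. The sketch below uses the element-wise operators & (and), | (or), and ~ (not); each condition needs its own parentheses because these operators bind more tightly than comparisons. The Series is re-created here so the example is self-contained.

```python
import pandas as pd

ser = pd.Series([10, 25, 30], index=['Apple', 'Banana', 'Cherry'])

both = ser[(ser > 15) & (ser < 30)]      # strictly between 15 and 30
either = ser[(ser < 15) | (ser == 30)]   # below 15 or exactly 30
negated = ser[~(ser > 15)]               # not greater than 15

print(both, either, negated, sep="\n\n")
```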

7.1.5.4. 5) Safe get with default (avoids KeyError)#

In pandas, the .get() method for a Series is a safe way to access values by index label without raising a KeyError if the label doesn’t exist. It works similarly to Python’s dictionary .get() method.

print(ser1.get('Durian', 'Not found'), "\n")
Not found 
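
Two related behaviours are worth a quick sketch: .get() returns None when no default is given, and the in operator tests membership against the index labels rather than the values.

```python
import pandas as pd

ser = pd.Series([10, 20], index=['Apple', 'Banana'])

missing = ser.get('Durian')       # no default supplied
has_label = 'Apple' in ser        # checks the index labels
has_value = 10 in ser             # 10 is a value, not a label

print(missing, has_label, has_value)
```

To test whether a value is present, compare against the values instead, for example (ser == 10).any().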

7.1.5.5. 6) Reindexing and filling missing values#

new_idx = ['Cherry', 'Apple', 'Durian']
print("Reindex to", new_idx, "with fill_value=0:\n", ser1.reindex(new_idx, fill_value=0), "\n")
Reindex to ['Cherry', 'Apple', 'Durian'] with fill_value=0:
 Cherry    30
Apple     10
Durian     0
dtype: int64 
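
For comparison, here is a sketch of the same reindex call without fill_value. The missing label is filled with NaN, which forces the result to float64. The Series is re-created so the example stands alone.

```python
import pandas as pd

ser = pd.Series([10, 25, 30], index=['Apple', 'Banana', 'Cherry'])

# 'Durian' is not in the original index, so its value becomes NaN
reindexed = ser.reindex(['Cherry', 'Apple', 'Durian'])
print(reindexed)
```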

7.1.5.6. 7) Aligning / combining two Series (useful when indexes differ)#

# ser1 + ser2 (alignment on index)
print(ser1 + ser2, "\n")
# fill missing in ser1 from ser2 using combine_first
print(ser1.combine_first(ser2))
Apple        20.0
Banana       45.0
Cherry        NaN
Cranberry     NaN
dtype: float64 

Apple        10
Banana       25
Cherry       30
Cranberry    40
dtype: int64
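
If labels missing from one side should count as zero rather than produce NaN, the .add() method accepts a fill_value argument. A minimal sketch, using two Series named a and b with the same values as above:

```python
import pandas as pd

a = pd.Series([10, 25, 30], index=['Apple', 'Banana', 'Cherry'])
b = pd.Series([10, 20, 40], index=['Apple', 'Banana', 'Cranberry'])

# A label missing from one Series is treated as 0 before adding
total = a.add(b, fill_value=0)
print(total)
```

The result is still float64, because the alignment step introduces missing values before they are filled.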

7.1.5.7. 8) Conditional selection with where (keeps values that satisfy condition)#

# "Keep values > 15, else 0"
print(ser1.where(ser1 > 15, other=0), "\n")
Apple      0
Banana    25
Cherry    30
dtype: int64 
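
The counterpart of where is mask, which replaces the values that do satisfy the condition. A short sketch with the same data:

```python
import pandas as pd

ser = pd.Series([10, 25, 30], index=['Apple', 'Banana', 'Cherry'])

kept = ser.where(ser > 15, other=0)      # keep values > 15, others become 0
replaced = ser.mask(ser > 15, other=0)   # replace values > 15 with 0

print(kept, replaced, sep="\n\n")
```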

7.1.5.8. 9) Inspect index and values directly#

print(ser1.index)
print(ser1.values)
Index(['Apple', 'Banana', 'Cherry'], dtype='object')
[10 25 30]
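
As noted earlier, the index can carry a name, and so can the Series itself. The sketch below sets both and extracts the underlying data with to_numpy(); the names 'fruit' and 'stock' are hypothetical.

```python
import pandas as pd

ser = pd.Series([10, 25, 30], index=['Apple', 'Banana', 'Cherry'])

ser.index.name = 'fruit'   # name of the index
ser.name = 'stock'         # name of the Series

values = ser.to_numpy()        # values as a NumPy array
labels = ser.index.tolist()    # labels as a plain Python list

print(ser)
print(values, labels)
```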

7.1.6. More on Indexing#

The operation below shows that arithmetic is performed on matching index labels, not on positions. When we add two Series, Pandas aligns them by their index labels before performing the operation:

  • The result’s index is the union of the two input indexes.

  • For labels present in both Series, values are added element-wise.

  • If a label is missing in one Series, the result for that label is NaN (missing value propagates).

  • Presence of NaN may change dtype (e.g., integers -> floats) because NaN is a float value.

ser1 + ser2
Apple        20.0
Banana       45.0
Cherry        NaN
Cranberry     NaN
dtype: float64
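
These alignment rules can be verified directly. The sketch below rebuilds the two Series under the names a and b, checks the labels of the result, and then drops the unmatched labels to recover an integer result.

```python
import pandas as pd

a = pd.Series([10, 25, 30], index=['Apple', 'Banana', 'Cherry'])
b = pd.Series([10, 20, 40], index=['Apple', 'Banana', 'Cranberry'])

result = a + b

# The labels of the result are the union of both indexes
result_labels = result.index.tolist()

# Labels present in only one Series are NaN
unmatched = result[result.isna()].index.tolist()

# Dropping them allows a cast back to integers
matched = result.dropna().astype('int64')

print(result_labels, unmatched)
print(matched)
```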