4.4. Missing Data#
import numpy as np
import pandas as pd
Missing data is common in real-world datasets and can affect analysis, aggregation, and model training.
In pandas, missing values are represented with special sentinel markers (placeholder values that mean “missing”), not with a separate universal null type.
Common missing-value markers in pandas:
- `None`: Python’s null singleton. In pandas, it is treated as missing and often appears in `object` columns.
- `np.nan` (NaN): IEEE floating-point “Not a Number,” commonly used for missing values in numeric/float contexts.
- `pd.NA`: pandas’ missing-value scalar for nullable extension dtypes (for example `Int64`, `boolean`, and `string`), which helps preserve logical dtypes.
- `pd.NaT`: pandas’ missing-value marker for datetime-like values (`datetime64`, `timedelta64`, etc.).
Important comparison behavior:
- `np.nan != np.nan` is `True` (NaN never equals anything, including itself)
- `pd.NA == pd.NA` returns `<NA>` (unknown), not `True`
Because of this, detect missing values with isna() / notna() rather than equality checks.
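A minimal check makes this concrete: equality tests on missing values give unreliable answers, while isna() detects them consistently.

```python
import numpy as np
import pandas as pd

print(np.nan == np.nan)   ### False: NaN never equals anything, even itself
print(np.nan != np.nan)   ### True
print(pd.NA == pd.NA)     ### <NA>: the comparison itself is unknown
print(pd.isna(np.nan))    ### True: isna() is the reliable detection method
print(pd.isna(pd.NA))     ### True
```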
The following table summarizes the four sentinel missing value markers in Pandas:
| Marker | Full Name | Introduced By | dtype | Common? | Use Case |
|---|---|---|---|---|---|
| `None` | Python null object | Python | `object` | Most common | Missing values in string/object columns |
| `np.nan` | Not a Number | NumPy | `float64` | Most common | Missing values in numerical/float columns |
| `pd.NA` | Not Available | Pandas 1.0 (new) | nullable extension dtypes | Growing | Missing marker for nullable dtypes (e.g., `Int64`) |
| `pd.NaT` | Not a Time | Pandas | datetime/timedelta | Specialized | Missing values in datetime or timedelta columns |
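As a quick illustration of pd.NaT (a minimal sketch): it appears automatically when a datetime-like sequence contains a missing entry.

```python
import pandas as pd

### a None among Timestamps becomes NaT, and the dtype stays datetime-like
dates = pd.Series([pd.Timestamp('2024-01-01'), None, pd.Timestamp('2024-01-03')])
print(dates.dtype)        ### a datetime64 dtype, not object
print(dates[1])           ### NaT
print(pd.isna(dates[1]))  ### True
```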
Let’s explore each of these sentinel values in detail, starting with None.
4.4.1. None as a Sentinel Value#
A sentinel value is a special value used to signal that data is missing, invalid, or absent — essentially a placeholder that means “there’s nothing here.” In Pandas, the choice of sentinel value depends on the data type:
None as a sentinel:

- Python’s native `None` object is used for object/string arrays
- When you include `None` in a NumPy array, the entire array is forced to `object` dtype
- This is because `None` is a Python object, not a native NumPy type
- Object arrays are usually slower and less type-stable; many operations fall back to Python objects

Why NaN for numerical data:

- For numerical arrays, Pandas uses `NaN` (Not a Number) as the sentinel instead
- `NaN` is a special IEEE 754 floating-point value that can coexist with numbers
- This preserves native numerical dtypes and enables fast, compiled operations
- However, it forces integer arrays to become float arrays (since `NaN` is a float value)
Pay attention to how dtypes change in the following examples:
### dtype is int64
arr = np.array([1, 1, 2, 3])
arr.dtype
dtype('int64')
In the following context, NumPy infers the arr elements as Python objects because of None.
arr = np.array([1, None, 2, 3])
print("arr.dtype:", arr.dtype)
arr
arr.dtype: object
array([1, None, 2, 3], dtype=object)
The problem with object dtype: once None forces an array to object dtype, NumPy operations break because they expect native numerical types:
%%expect TypeError
arr.sum() ### will generate a TypeError
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Forcing dtype=float to avoid TypeError:
To prevent the TypeError with object dtype arrays, you can explicitly set dtype=float when creating the array. This converts None to NaN, which NumPy can handle natively.
However, this doesn’t solve the missing data problem — it just changes how NumPy handles it. Arithmetic operations with NaN propagate the missing value through calculations, so the sum still results in NaN. This behavior is intentional: it forces you to explicitly decide how to handle missing data rather than silently ignoring it.
arr = np.array([1, None, 2, 3], dtype=float)
print(arr[1]) ### None is converted to NaN (Not a Number) when using float dtype
arr.sum() ### NaN propagates through calculations, so the result is NaN
nan
np.float64(nan)
%%expect TypeError
### EXERCISE: Working with None in NumPy Arrays
# 1. print: Create a NumPy array [1, None, 2, 3], call it arr, and then:
# 2. print and observe its dtype — what type does NumPy infer?
# 3. Try calling arr.sum() and note what happens.
# 4. print: create the same array with dtype=float, call it arr_float
# 5. print arr_float.sum() again and observe the result.
### Your code starts here:
### Your code ends here.
arr: [1 None 2 3]
dtype of arr: object
arr_float: [ 1. nan 2. 3.]
dtype of arr_float: float64
sum of arr_float: nan
4.4.2. NaN: Missing Numerical Data#
Unlike None, NaN (Not a Number) is a special IEEE 754 floating-point value that’s standardized across computing systems. When you create an array with NaN, NumPy keeps the native floating-point dtype instead of converting to object dtype.
Key advantages of NaN over None:

- Preserves numerical dtype (float64) rather than forcing object dtype
- Enables fast, vectorized operations
- Works seamlessly with NumPy’s mathematical functions
- Recognized by specialized functions like `np.nansum()`, `np.nanmean()`, etc.
Creating an array with NaN values while preserving float dtype:
arr = np.array([1, np.nan, 3, 4], dtype=float)
print(type(arr))
arr.dtype
<class 'numpy.ndarray'>
dtype('float64')
4.4.2.1. Standard sum with NaN#
When you use regular NumPy operations like np.sum() on an array containing NaN, the result propagates the missing value — the entire sum becomes NaN. This forces you to explicitly handle missing data rather than silently ignoring it:
np.sum(arr)
np.float64(nan)
4.4.2.2. NaN-aware functions#
NumPy provides specialized functions like np.nansum(), np.nanmean(), and np.nanstd() that ignore NaN values during computation. These allow you to work with incomplete data while getting meaningful results:
### Examples of NaN-aware functions
print(f"Sum (ignoring NaN):\t {np.nansum(arr)}")
print(f"Mean (ignoring NaN):\t {np.nanmean(arr)}")
print(f"Std (ignoring NaN):\t {np.nanstd(arr)}")
print(f"Min (ignoring NaN):\t {np.nanmin(arr)}")
print(f"Max (ignoring NaN):\t {np.nanmax(arr)}")
Sum (ignoring NaN): 8.0
Mean (ignoring NaN): 2.6666666666666665
Std (ignoring NaN): 1.247219128924647
Min (ignoring NaN): 1.0
Max (ignoring NaN): 4.0
4.4.2.3. Limitation of NaN#
A key limitation of NaN is that it’s defined only for floating-point numbers—there’s no native NaN sentinel for integers, strings, or other types.
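This limitation is easy to see directly (a minimal sketch): an integer array refuses a NaN assignment, and converting to float first is the usual workaround.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=int)
try:
    arr[0] = np.nan            ### NaN cannot be stored in an integer array
except ValueError as e:
    print('ValueError:', e)

### the workaround: convert to float, which natively supports NaN
arr_f = arr.astype(float)
arr_f[0] = np.nan
print(arr_f)                   ### [nan  2.  3.]
```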
### EXERCISE: Using NaN-Aware Functions
### 1. print: Create a NumPy array with some np.nan values:
# np.nan, 3, np.nan, 5
### 2. print: sum the array using np.sum()
### 3. print: sum the array using np.nansum().
### 4. print: the mean of the array using np.nanmean()
### 5. print: the standard deviation of array using np.nanstd().
### Your code begins here
### Your code ends here
the array: [ 1. nan 3. nan 5.]
np.sum() : nan
np.nansum() : 9.0
np.nanmean(): 3.0
np.nanstd() : 1.632993161855452
4.4.3. None, NaN, and NA in Pandas#
Both None and NaN serve as missing-value markers in Pandas, and the library treats them nearly interchangeably, automatically converting between them as needed.
A Series with both np.nan and None shows that Pandas converts both to NaN and uses a float64 dtype:
pd.Series( [ 1, np.nan, 2, None ] )
0 1.0
1 NaN
2 2.0
3 NaN
dtype: float64
4.4.3.1. Dtype promotion and upcasting#
When Pandas needs to store values with different types in a single Series or array, it “promotes” to a more general dtype that can accommodate all values. This is especially important for missing values.
Since many dtypes don’t have a native missing-value representation, Pandas must upcast to a compatible type:
- Integers are promoted to `float64` (because `NaN` is a float value)
- Booleans are promoted to `object` (to accommodate `None`)
- Floats stay as float (already support `NaN`)
- Objects stay as object (already support `None` or `NaN`)

The typical promotion hierarchy:

- `bool` → `int` → `float` → `complex`
- For Pandas-specific types: `int` → `float` (when `NaN` is needed), or → nullable dtypes like `Int64` (when `pd.NA` is used)
The examples below demonstrate how Pandas handles dtype conversion automatically when missing values are introduced:
ser = pd.Series(range(3), dtype=int)
print("=== ser: ===")
print(ser, "\n")
print("the dtype of ser is: ", ser.dtype)
ser[0] = None ### update element[0] to None
print("\n=== ser updated with None: ===")
print(ser)
print(f"\npandas upcast the type to: {ser.dtype}")
=== ser: ===
0 0
1 1
2 2
dtype: int64
the dtype of ser is: int64
=== ser updated with None: ===
0 NaN
1 1.0
2 2.0
dtype: float64
pandas upcast the type to: float64
4.4.3.2. Explicit Nullable Integer#
Here we explicitly request pandas’ nullable integer dtype (Int64) so missing values are represented with pd.NA instead of forcing float upcasting.
Pandas added nullable dtypes to address situations where type casting is an issue: for example, how to represent a true integer array with missing data. Int64 (capital I) is pandas’ nullable integer dtype, distinct from NumPy’s int64 (lowercase), which is not nullable. These dtypes are distinguished by the capitalization of their names (e.g., Int64 vs int64) and, for backward compatibility, are used only when explicitly requested. For example:
### requesting NA: dtype=Int64, not int64
pd.Series([1, np.nan, 2, None, pd.NA], dtype='Int64')
0 1
1 <NA>
2 2
3 <NA>
4 <NA>
dtype: Int64
4.4.3.3. Nullable Dtypes in Practice#
Use nullable dtypes when you want missing values without losing the original logical type:
- `Int64` for integers with missing values
- `boolean` for three-state logic (True/False/`<NA>`)
- `string` for text with `pd.NA`
# Compare default inference vs explicit nullable dtype
s_default = pd.Series([1, None, 3])
s_nullable = pd.Series([1, None, 3], dtype='Int64') ### upcast to nullable integer dtype
print('default dtype :', s_default.dtype)
print('nullable dtype :', s_nullable.dtype)
print(s_nullable)
flags = pd.Series([True, False, pd.NA], dtype='boolean')
names = pd.Series(['Alice', None, 'Charlie'], dtype='string')
print('flags dtype :', flags.dtype)
print('names dtype :', names.dtype)
default dtype : float64
nullable dtype : Int64
0 1
1 <NA>
2 3
dtype: Int64
flags dtype : boolean
names dtype : string
# pd.NA follows 3-valued logic: comparisons can return <NA>
print('pd.NA == pd.NA ->', pd.NA == pd.NA)
print('pd.isna(pd.NA) ->', pd.isna(pd.NA))
print()
mask = s_nullable > 1
print('mask values:')
print(mask)
print()
# For indexing, convert unknown mask entries to False
print('safe filter result:')
print(s_nullable[mask.fillna(False)])
pd.NA == pd.NA -> <NA>
pd.isna(pd.NA) -> True
mask values:
0 False
1 <NA>
2 True
dtype: boolean
safe filter result:
2 3
dtype: Int64
In summary, Pandas has two common missing-data paths: the default legacy upcasting behavior, and nullable extension dtypes.
| Type class | Default path (with `NaN`/`None`) | Nullable path (explicit nullable dtype) | Missing marker |
|---|---|---|---|
| floating | Stays `float64` | `Float64` | `NaN` / `pd.NA` |
| object/text | Stays `object` | `string` | `None`, `NaN` / `pd.NA` |
| integer | Upcasts to `float64` | Stays nullable integer (`Int64`) | `NaN` / `pd.NA` |
| boolean | Upcasts to `object` | Stays nullable `boolean` | `None` / `pd.NA` |
### EXERCISE: Pandas Missing Value Handling
### 1. print: Create a Pandas Series with a mix of None, np.nan, and regular values.
### Observe the dtype. Then create the same Series with dtype='Int64'.
### What missing-value marker does each version use?
### 2. print: Create the same Series with nullable Int64 dtype
### 3. produce the same results as seen below.
### Your code begins here
### Your code ends here
Default dtype: float64
0 1.0
1 NaN
2 2.0
3 NaN
4 3.0
dtype: float64
Nullable Int64 dtype: Int64
0 1
1 <NA>
2 2
3 <NA>
4 3
dtype: Int64
4.4.4. Null Value Operations#
Pandas provides a small set of core tools for null-value work:
| Tool | Purpose | Typical use |
|---|---|---|
| `isna()` / `isnull()` | Detect missing values | Build a Boolean mask (`df.isna()`) |
| `notna()` / `notnull()` | Detect non-missing values | Filter to valid entries (`df[df['col'].notna()]`) |
| `dropna()` | Remove missing data | Drop rows/columns with nulls based on rules (`how`, `thresh`, `subset`) |
| `fillna()` | Replace missing data | Fill with constants, statistics, or method-based values |
isnull() and notnull() are aliases for isna() and notna().
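This equivalence is easy to verify with a minimal check:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, None])
print(s.isna().equals(s.isnull()))    ### True: identical results
print(s.notna().equals(s.notnull()))  ### True
```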
4.4.4.1. Check Your Objects#
Before applying null-value fixes, run a quick structural check:
| Check | Why it matters |
|---|---|
| `df.info()` | Confirms shape, non-null counts, and memory usage |
| `df.isna().sum()` | Counts missing values per column |
| `df.dtypes` | Verifies column types before/after cleaning |
Also, isna is available both as a top-level function (pd.isna) and as object methods (Series.isna, DataFrame.isna).
### build a DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame({
'A' : [ 1, 2, np.nan ],
'B' : [ 5, np.nan, np.nan],
'C' : [ 1, 2, 3]
})
df
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
| 2 | NaN | NaN | 3 |
4.4.4.1.1. df.info( )#
df.info() prints a compact summary of the DataFrame: row count, column names, non-null counts, dtypes, and memory usage. For missing-data checks, the key part is the Non-Null Count column, which tells you how many values are present in each column.
### df.info() will show non-null count
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 2 non-null float64
1 B 1 non-null float64
2 C 3 non-null int64
dtypes: float64(2), int64(1)
memory usage: 200.0 bytes
4.4.4.1.2. isna().sum()#
df.isna() creates a boolean DataFrame (True for missing, False for present). Chaining .sum() counts True values per column, so the result shows how many missing values each column contains.
### Count missing values by column
df_nan_sum = df.isna().sum()
print("Sum of NaN's:")
print(df_nan_sum)
Sum of NaN's:
A 1
B 2
C 0
dtype: int64
4.4.4.1.3. The dtypes Attribute#
df.dtypes returns a Series where the index is column names and each value is that column’s data type. The last line in the display (dtype: object) is the dtype of this resulting Series (not the dtype of your DataFrame columns).
### Inspect dtypes
df.dtypes
A float64
B float64
C int64
dtype: object
4.4.4.2. Detecting Null Values#
Use these paired methods to build Boolean masks:
| Method | Meaning of `True` | Common use |
|---|---|---|
| `isnull()` / `isna()` | Value is missing | Locate/count nulls |
| `notnull()` / `notna()` | Value is present | Keep valid entries |
Let’s start with a Pandas Series.
ser = pd.Series([1, np.nan, 'hello', None])
ser
0 1
1 NaN
2 hello
3 None
dtype: object
isnull()/notnull() can be called as Series methods or as top-level pandas functions; both forms return the same Boolean mask.
### method vs function forms (same result)
print("isnull() as method:")
print(ser.isnull())
print("\nisnull() as top-level function:")
print(pd.isnull(ser))
print("\nnotnull() as method:")
print(ser.notnull())
print("\nnotnull() as top-level function:")
print(pd.notnull(ser))
isnull() as method:
0 False
1 True
2 False
3 True
dtype: bool
isnull() as top-level function:
0 False
1 True
2 False
3 True
dtype: bool
notnull() as method:
0 True
1 False
2 True
3 False
dtype: bool
notnull() as top-level function:
0 True
1 False
2 True
3 False
dtype: bool
Here, Boolean masks are used as an index into a Series or DataFrame:
print("ser[ser.isnull()]:")
print()
print(ser.isnull())
print()
print(ser[ser.isnull()])
print()
print("\nser[ser.notnull()]:")
print(ser[ser.notnull()])
ser[ser.isnull()]:
0 False
1 True
2 False
3 True
dtype: bool
1 NaN
3 None
dtype: object
ser[ser.notnull()]:
0 1
2 hello
dtype: object
Now let’s look at a Pandas DataFrame.
df
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
| 2 | NaN | NaN | 3 |
df.isnull()
| A | B | C | |
|---|---|---|---|
| 0 | False | False | False |
| 1 | False | True | False |
| 2 | True | True | False |
df.notnull()
| A | B | C | |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | False | True |
| 2 | False | False | True |
Again, a Boolean mask can be used as an index. Note that for a DataFrame, df[df.notnull()] keeps the original shape and simply leaves NaN where the mask is False:
df[ df.notnull() ]
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
| 2 | NaN | NaN | 3 |
### EXERCISE: Detecting Missing Values
#
# 1. print: create a Series ("ser") with the elements:
#    1, np.nan, 'hello', None, 5
# 2. print: Use isnull() to create a boolean mask
# 3. print: Count how many missing values are in the Series
# 4. print: Filter the Series to show only non-null values
### Your code starts here:
### Your code ends here.
Original Series:
0 1
1 NaN
2 hello
3 None
4 5
dtype: object
Boolean mask (isnull):
0 False
1 True
2 False
3 True
4 False
dtype: bool
Number of missing values: 2
Non-null values:
0 1
2 hello
4 5
dtype: object
4.4.4.3. Dropping Null Values#
Beyond masking, pandas provides dropna() to remove missing entries. On a Series, its behavior is straightforward:
ser = pd.Series([1, np.nan, 'hello', None])
ser
0 1
1 NaN
2 hello
3 None
dtype: object
ser.dropna()
0 1
2 hello
dtype: object
In a DataFrame, dropna() removes whole rows or whole columns, not individual cells.

- By default, it returns a new object with missing values removed.
- Use `inplace=True` only if you want to modify the original object directly.
df = pd.DataFrame(
[
[1, np.nan, 2],
[2, 3, 5],
[np.nan, 4, 6]
]
)
df
| 0 | 1 | 2 | |
|---|---|---|---|
| 0 | 1.0 | NaN | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | NaN | 4.0 | 6 |
### dropping rows by default
df.dropna()
| 0 | 1 | 2 | |
|---|---|---|---|
| 1 | 2.0 | 3.0 | 5 |
### dropping columns instead
# df.dropna(axis=1) ### the same as below
df.dropna(axis='columns')
| 2 | |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 6 |
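Besides the axis, dropna() also accepts how (drop only when all values are missing) and subset (restrict the check to certain columns). A minimal sketch on a small throwaway frame:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, np.nan]})
print(df2.dropna(how='all'))      ### drops row 1 only: every value there is NaN
print(df2.dropna(subset=['A']))   ### drops rows with NaN in column 'A' only
```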
thresh means: keep rows (or columns) that have at least that many non-missing values.

- `df.dropna(thresh=2)` keeps rows with 2 or more non-NaN values
- Rows with fewer than 2 non-missing values are dropped

So thresh sets a minimum data-completeness requirement before keeping a row/column.
df.dropna(thresh=2)
| 0 | 1 | 2 | |
|---|---|---|---|
| 0 | 1.0 | NaN | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | NaN | 4.0 | 6 |
### EXERCISE: Dropping Missing Values
#
# Create a DataFrame:
# df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan], 'C': [np.nan, np.nan, np.nan]})
# 1. print: the df
# 2. print: Drop rows with any missing values
# 3. Drop columns where ALL values are missing
# 4. Drop rows only if they have missing values in column 'A'
### Your code starts here:
### Your code ends here.
Original DataFrame:
A B C
0 1.0 4.0 NaN
1 NaN 5.0 NaN
2 3.0 NaN NaN
Drop rows with any NaN:
Empty DataFrame
Columns: [A, B, C]
Index: []
Drop columns where all values are NaN:
A B
0 1.0 4.0
1 NaN 5.0
2 3.0 NaN
Drop rows with NaN in column 'A':
A B C
0 1.0 4.0 NaN
2 3.0 NaN NaN
4.4.4.3.1. Filling Null Values#
Instead of dropping missing values, you may want to substitute a valid value—either a constant (e.g., 0) or an estimate via imputation (e.g., mean) or interpolation (estimated values between observed points). While you could do this with a Boolean mask from isna()/isnull(), Pandas offers the dedicated fillna() method, which returns a new object (or can operate in place) with nulls replaced.
ser = pd.Series(
[1, np.nan, 3, None, 5],
index=list('abcde'),
dtype='Int64'
)
ser
a 1
b <NA>
c 3
d <NA>
e 5
dtype: Int64
We can fill NA entries with a single value such as 0:
# ser.fillna(0) ### fill NA entries with a single value, such as zero
ser.fillna(value=0)
a 1
b 0
c 3
d 0
e 5
dtype: Int64
### We can specify a forward fill to propagate the previous value forward:
ser.ffill()
a 1
b 1
c 3
d 3
e 5
dtype: Int64
### Or we can specify a backward fill to propagate the next values backward:
ser.bfill()
a 1
b 3
c 3
d 5
e 5
dtype: Int64
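The interpolation mentioned earlier can be done with interpolate(), which by default fills gaps linearly from the neighboring observed values. A minimal sketch on a float Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])
print(s.interpolate())   ### linear by default: the gaps become 2.0 and 4.0
```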
For DataFrames, the options are similar: you can additionally specify the axis (rows or columns) along which the fill should be applied:
df = pd.DataFrame({
'A' : [ 1, 2, np.nan ],
'B' : [ 5, np.nan, np.nan],
'C' : [ 1, 2, 3]
})
df
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | NaN | 2 |
| 2 | NaN | NaN | 3 |
### fill a column (Series) with the mean of that column
df['A'].fillna(value=df['A'].mean())
0 1.0
1 2.0
2 1.5
Name: A, dtype: float64
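fillna() also accepts a dict mapping column names to fill values, so each column can get its own treatment. A minimal sketch, rebuilding the same df for self-containment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [5, np.nan, np.nan],
    'C': [1, 2, 3]
})
### per-column fills: the mean (1.5) for 'A', the constant 0 for 'B'
filled = df.fillna({'A': df['A'].mean(), 'B': 0})
print(filled)
```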
print(df) ### original code
### ffill along rows (default)
df.ffill()
A B C
0 1.0 5.0 1
1 2.0 NaN 2
2 NaN NaN 3
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | 5.0 | 2 |
| 2 | 2.0 | 5.0 | 3 |
print(df) ### original code
### bfill along columns
df.bfill(axis=1)
A B C
0 1.0 5.0 1
1 2.0 NaN 2
2 NaN NaN 3
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1.0 |
| 1 | 2.0 | 2.0 | 2.0 |
| 2 | 3.0 | 3.0 | 3.0 |
### fillna with the mean of each column (numeric only)
### Per-column mean (most common)
print(df) ### original code
# df.fillna(value=df.mean(numeric_only=True), inplace=True)
df.fillna(value=df.mean(numeric_only=True))
A B C
0 1.0 5.0 1
1 2.0 NaN 2
2 NaN NaN 3
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1 |
| 1 | 2.0 | 5.0 | 2 |
| 2 | 1.5 | 5.0 | 3 |
### Advanced; FYI only
### T is the transpose of the DataFrame, which swaps rows and columns.
# This allows us to compute the mean across rows instead of columns.
### Per-row mean (fill each row’s NaNs with that row’s mean)
df.T.fillna(value=df.T.mean(numeric_only=True)).T
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1.0 |
| 1 | 2.0 | 2.0 | 2.0 |
| 2 | 3.0 | 3.0 | 3.0 |
### Advanced; FYI only
### Per-row mean using apply and a lambda function
df.apply(lambda row: row.fillna(row.mean()), axis=1)
| A | B | C | |
|---|---|---|---|
| 0 | 1.0 | 5.0 | 1.0 |
| 1 | 2.0 | 2.0 | 2.0 |
| 2 | 3.0 | 3.0 | 3.0 |
### EXERCISE: Filling Missing Values
#
# Create a DataFrame using the dictionary: {'X': [1, 2, np.nan, 4], 'Y': [np.nan, 2, 3, 4]}
# 1. Fill missing values in column 'X' with the mean of 'X'
# 2. Fill missing values in column 'Y' with forward fill
# 3. Fill all remaining NaN with 0
#
### Your code starts here:
### Your code ends here.
Original DataFrame:
X Y
0 1.0 NaN
1 2.0 2.0
2 NaN 3.0
3 4.0 4.0
After filling X with mean:
X Y
0 1.000000 NaN
1 2.000000 2.0
2 2.333333 3.0
3 4.000000 4.0
After forward filling Y:
X Y
0 1.000000 NaN
1 2.000000 2.0
2 2.333333 3.0
3 4.000000 4.0
After filling remaining with 0:
X Y
0 1.000000 0.0
1 2.000000 2.0
2 2.333333 3.0
3 4.000000 4.0
### EXERCISE: Handling Missing Data with dropna() and fillna()
df = pd.DataFrame({'X': [1, 2, np.nan, 4], 'Y': [np.nan, 2, 3, 4]})
### Using the DataFrame df, perform the following steps:
### 1. Check missing values per column with isna().sum()
### 2. Drop rows that have any missing values; compare shapes
### 3. Fill a numeric column's NaN with the column mean
### 4. Try dropna(thresh=2) and observe the difference
### Your code begins here
# 1. Check missing counts
# 2. Drop rows with any NaN
# 3. Fill a numeric column with the mean
# 4. Try thresh
### Your code ends here
1. Missing values:
X 1
Y 1
dtype: int64
2. Original shape: (4, 2) → After dropna(): (2, 2)
3. Filled 'X' NaN with mean (2.33)
4. After dropna(thresh=2) shape: (2, 2)