20. Multiple Regression
In simple linear regression we related one feature to a target. Real data sets rarely have a single useful feature, so we turn to multiple regression, which fits a linear model using several predictors at once. This lets us capture richer structure (e.g., size, neighborhood, and age for a house) and measure the unique contribution of each feature while holding the others fixed.
20.1. What you’ll learn
When multiple regression is appropriate and how it extends simple regression
How to specify the model, estimate coefficients, and interpret them
How to check assumptions (linearity, additivity, limited multicollinearity) and run residual diagnostics
How to evaluate model quality with train/test splits and error metrics
20.2. The model
A multiple regression with predictors \(x_1, x_2, \dots, x_p\) and target \(y\) is written as:

\[
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
\]
\(\beta_0\) is the intercept (predicted \(y\) when all predictors are 0).
Each \(\beta_j\) captures the change in \(\hat{y}\) for a one-unit increase in \(x_j\), holding other predictors constant.
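To make the coefficients concrete, here is a minimal sketch using scikit-learn's LinearRegression on a tiny made-up housing-style table; the feature names (area, bedrooms, age) and prices are illustrative, not from a real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: each row is [area (m^2), bedrooms, age (years)]
X = np.array([
    [120, 3, 15],
    [ 80, 2, 30],
    [150, 4,  5],
    [ 95, 2, 20],
    [200, 5,  2],
    [ 60, 1, 40],
])
y = np.array([250_000, 160_000, 340_000, 190_000, 450_000, 120_000])  # prices

model = LinearRegression().fit(X, y)

print("Intercept (beta_0):", model.intercept_)
# One coefficient per predictor: the change in the predicted price for a
# one-unit increase in that feature, holding the other features fixed.
for name, coef in zip(["area", "bedrooms", "age"], model.coef_):
    print(f"beta for {name}: {coef:.1f}")
```

Reading the output, the coefficient for bedrooms, for example, estimates how much the predicted price changes when we add one bedroom while keeping area and age fixed.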
20.3. Typical workflow
Inspect and clean the data; handle missing values and outliers.
Explore relationships with pairplots/correlations to spot multicollinearity.
Split into training and testing sets.
Fit the regression model on training data; examine coefficients and p-values.
Check residuals for patterns, heteroscedasticity, and non-linearity.
Evaluate on held-out data using metrics like \(R^2\) and RMSE (steps 3–6 are sketched in code after this list).
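A compact sketch of steps 3–6, assuming the inspection and exploration of steps 1–2 are already done; the data here is synthetic and the variable names (area, bedrooms, age) are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in for a cleaned housing table.
n = 200
area = rng.uniform(50, 250, n)
bedrooms = rng.integers(1, 6, n).astype(float)
age = rng.uniform(0, 50, n)
X = np.column_stack([area, bedrooms, age])
y = 1500 * area + 10_000 * bedrooms - 800 * age + rng.normal(0, 20_000, n)

# Step 3: split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 4: fit the model on the training data and inspect the coefficients.
model = LinearRegression().fit(X_train, y_train)
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)

# Step 5: a quick residual check; systematic patterns would suggest
# non-linearity or heteroscedasticity worth investigating further.
residuals = y_train - model.predict(X_train)
print("Mean residual (should be close to 0):", residuals.mean())

# Step 6: evaluate on the held-out test set.
y_pred = model.predict(X_test)
print("Test R^2:", r2_score(y_test, y_pred))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
```

For p-values and a full coefficient table (step 4), statsmodels' `OLS` summary is the usual tool; scikit-learn reports only the point estimates.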
20.4. In this chapter
We’ll start with a housing-price example to see how adding multiple predictors improves accuracy over a single-feature model.
We’ll interpret coefficients carefully (e.g., “holding area constant, an extra bedroom adds…”), and connect them to domain knowledge.
We’ll practice diagnosing model issues and improving the model by engineering features or removing collinear variables (a small collinearity check is sketched below).
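As a small preview of the collinearity diagnostics, here is a sketch using variance inflation factors (VIF) from statsmodels; the predictors below are synthetic and deliberately constructed so that two of them are nearly redundant.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)

# total_area is almost a linear function of living_area, so both should
# show inflated VIFs, while age should not.
n = 100
living_area = rng.uniform(50, 200, n)
total_area = living_area + rng.normal(0, 5, n)
age = rng.uniform(0, 50, n)

X = sm.add_constant(np.column_stack([living_area, total_area, age]))

for i, name in enumerate(["const", "living_area", "total_area", "age"]):
    print(f"VIF({name}) = {variance_inflation_factor(X, i):.1f}")
```

A common rule of thumb treats VIFs above roughly 5–10 as a sign that a predictor is largely explained by the others, which is when dropping or combining variables becomes worth considering.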
Use the next notebooks to dive into code examples, model fitting, and evaluation.