20. Multiple Regression#

In simple linear regression we related one feature to a target. Real data sets rarely have a single useful feature, so we turn to multiple regression, which fits a linear model using several predictors at once. This lets us capture richer structure (e.g., size, neighborhood, and age for a house) and measure the unique contribution of each feature while holding the others fixed.

20.1. What you’ll learn#

  • When multiple regression is appropriate and how it extends simple regression

  • How to specify the model, estimate coefficients, and interpret them

  • How to check assumptions (linearity, additivity, multicollinearity, residual diagnostics)

  • How to evaluate model quality with train/test splits and error metrics

20.2. The model#

A multiple regression model with predictors \(x_1, x_2, \dots, x_p\) and target \(y\) is written as:

\[ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \]
  • \(\beta_0\) is the intercept (predicted \(y\) when all predictors are 0).

  • Each \(\beta_j\) gives the expected change in \(\hat{y}\) for a one-unit increase in \(x_j\), holding the other predictors constant.
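The coefficients \(\beta_0, \dots, \beta_p\) are typically estimated by ordinary least squares. As a minimal sketch (the data here is synthetic and purely illustrative), we can build a design matrix with a leading column of ones for the intercept and solve the least-squares problem directly with NumPy:

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2 + 3*x1 - 1*x2,
# so least squares should recover the coefficients almost exactly.
rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 3 * x1 - 1 * x2

# Design matrix: a column of ones (for beta_0), then the predictors.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: beta = argmin ||X @ beta - y||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 4))  # close to [2, 3, -1]
```

With real data the target also contains noise, so the recovered coefficients only approximate the underlying relationship; libraries such as statsmodels or scikit-learn wrap this same computation with richer diagnostics.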

20.3. Typical workflow#

  1. Inspect and clean the data; handle missing values and outliers.

  2. Explore relationships with pairplots/correlations to spot multicollinearity.

  3. Split into training and testing sets.

  4. Fit the regression model on training data; examine coefficients and p-values.

  5. Check residuals for patterns, heteroscedasticity, and non-linearity.

  6. Evaluate on held-out data using metrics like \(R^2\) and RMSE.
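The workflow above can be sketched end to end with scikit-learn. The data here is hypothetical (a housing-style example where price depends on area and bedroom count plus noise); the split/fit/evaluate steps mirror items 3, 4, and 6 of the list:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical housing data: price driven by area and bedrooms plus noise.
rng = np.random.default_rng(42)
n = 200
area = rng.uniform(50, 250, n)      # square meters
bedrooms = rng.integers(1, 6, n)    # number of bedrooms
price = 50_000 + 1_200 * area + 8_000 * bedrooms + rng.normal(0, 10_000, n)

X = np.column_stack([area, bedrooms])

# Step 3: hold out a test set before fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

# Step 4: fit the model on training data only.
model = LinearRegression().fit(X_train, y_train)

# Step 6: evaluate on the held-out data.
pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"R^2  = {r2_score(y_test, pred):.3f}")
print(f"RMSE = {rmse:,.0f}")
```

Because the model is evaluated only on data it never saw during fitting, the reported \(R^2\) and RMSE estimate how well it generalizes rather than how well it memorized the training set.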

20.4. In this chapter#

  • We’ll start with a housing-price example to see how adding multiple predictors improves accuracy over a single-feature model.

  • We’ll interpret coefficients carefully (e.g., “holding area constant, an extra bedroom adds…”), and connect them to domain knowledge.

  • We’ll practice diagnosing model issues and improving them by engineering features or removing collinear variables.
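One common diagnostic for collinear variables is the variance inflation factor (VIF). As a sketch using only NumPy (the data and threshold are illustrative assumptions, not from this chapter), we regress each predictor on the others and compute \( \mathrm{VIF}_j = 1 / (1 - R_j^2) \); values far above 1 flag redundancy:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (with an intercept).
    """
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # nearly collinear with x1
x3 = rng.normal(size=300)                  # independent predictor
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))  # x1 and x2 get large VIFs; x3 stays near 1
```

A common rule of thumb treats VIF above roughly 5–10 as a sign that a predictor is largely explained by the others, suggesting it be dropped or combined with them.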

Use the next notebooks to dive into code examples, model fitting, and evaluation.