15. Testing Hypotheses
Data scientists routinely confront binary questions about the world. You’ve already seen examples:
Is milk beneficial for health?
Does Rolla have a higher-than-average crime rate?
Have Missouri’s demographics shifted over the past decade?
The credibility of any answer depends on the data behind it. For demographics, comprehensive census records can resolve questions with minimal uncertainty. Rolla’s crime rate, for both property and violent crimes, is reported to be significantly higher than both the state and national averages. Some sources, however, note that local residents often feel safe, with concerns typically centered on suspicious activity and weather. Questions like “Is milk good for you?” ultimately require domain expertise, but a first step is to analyze evidence from observational studies and randomized experiments.
In this chapter, we’ll practice answering yes/no questions by drawing on random samples and empirical distributions, and we’ll learn how the quality and design of data collection shape the strength of our conclusions.