Descriptive Statistics & Distribution

14. Descriptive Statistics & Distribution#

Goals

Describe a variable using center, spread, and shape
Compute descriptive statistics: mean, median, variance, standard deviation
Understand distributions
Connect probability to data-generating processes
Randomness
The Law of Large Numbers

### headers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

14.1. Data and Variables#

When describing data, make sure you cover center, spread, and shape.

tips = sns.load_dataset('tips')
tips                      ### size means number of ppl in the party

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3
240	27.18	2.00	Female	Yes	Sat	Dinner	2
241	22.67	2.00	Male	Yes	Sat	Dinner	2
242	17.82	1.75	Male	No	Sat	Dinner	2
243	18.78	3.00	Female	No	Thur	Dinner	2

244 rows × 7 columns

tips.shape       ### a dataframe has shape, which is a tuple

(244, 7)

14.1.1. Population vs sample#

Population is the full group we care about (e.g., “all lunch bills at the restaurant this year”), whereas sample is the subset we actually measured (e.g., the 244 rows in the tips dataset). We only ever see the sample, but we talk like we know the population.

14.1.2. Variable vs Observation#

Each row in the dataframe/dataset is an observation (e.g., one restaurant bill), whereas columns are variables (e.g., total_bill, tip, smoker, time (Dinner/Lunch), etc.).

Quantitative variable: e.g., total_bill (numeric, measured)
Categorical variable: time (Lunch vs Dinner), smoker (Yes/No)

14.1.3. Distribution#

A distribution shows how often different values occur. For numeric data:

Are most bills around $10? $20? $50?
Are there a few very large bills? (outliers)

For categorical data:

How common is “Dinner” vs “Lunch”?

14.1.4. Descriptive Statistics#

df.describe() gives you the descriptive statistics of the dataset.

tips.describe()

	total_bill	tip	size
count	244.000000	244.000000	244.000000
mean	19.785943	2.998279	2.569672
std	8.902412	1.383638	0.951100
min	3.070000	1.000000	1.000000
25%	13.347500	2.000000	2.000000
50%	17.795000	2.900000	2.000000
75%	24.127500	3.562500	3.000000
max	50.810000	10.000000	6.000000

Key summaries for a numeric variable:

Mean (average)
Median (middle value)
Standard deviation (typical distance from the mean)
IQR (middle 50% range)

Let’s compute them for total_bill.

14.1.5. Center: mean vs median#

col = tips['total_bill']

summary = pd.DataFrame({
    'mean': [col.mean()],
    'median': [col.median()],
    'std (SD)': [col.std()],
    'IQR': [col.quantile(0.75) - col.quantile(0.25)]
})
summary

	mean	median	std (SD)	IQR
0	19.785943	17.795	8.902412	10.78

14.1.6. Visualizing One Variable#

We’ll use:

Histogram (counts or density)
Boxplot (median, quartiles, outliers)
KDE / density curve (smoothed shape)

Interpretation checklist: center / spread / shape / outliers.

sns.set_theme()

fig, axes = plt.subplots(1, 2, figsize=(10,4))

sns.histplot(tips, x='total_bill', kde=True, ax=axes[0])
axes[0].set_title('Histogram + KDE')

sns.boxplot(y=tips['total_bill'], ax=axes[1])
axes[1].set_title('Boxplot')

plt.tight_layout()

../../_images/db811e3fa6f728ac1bb2f6e2e5d853128456d97305eebc9994cf4cc67696881d.png

14.2. Distributions#

14.2.1. Normal Distribution#

A normal distribution—also called a Gaussian distribution—is a symmetrical, bell-shaped probability distribution centered on the mean, where most data points cluster around that mean with frequencies tapering off equally toward both extremes.

Key Properties:

Defined by its mean (μ) and standard deviation (σ).
The mean, median, and mode are equal and located at the center.
Symmetrical: the left and right sides are mirror images.
68% of data falls within 1 standard deviation, 95% within 2, and 99.7% within 3 standard deviations from the mean (“Empirical Rule”).
Frequently models real-world phenomena due to the Central Limit Theorem.

Example Applications:

Heights of People: Human heights in a given population typically follow a normal distribution—most people are near the average height, with fewer extremely short or tall.
IQ Scores: Measured intelligence is designed to follow a normal curve, with an average of 100 and most people close to the average.
Measurement Errors: Repeated measurements of something (e.g., lengths, weights) often cluster around the true value, following a normal distribution.
Test Scores: Large-scale standardized test results (e.g., SAT, GRE) often approximate a normal distribution after some normalization.

Example	Mean (μ)	Standard Deviation (σ)	What is Measured
Adult heights	170 cm	10 cm	Human height distribution
IQ scores	100	15	Intelligence
SAT scores	1050	200	Standardized tests

Let us use the random module in NumPy to explore.

from numpy import random
random.seed(42)

x = random.normal(size=(2, 3))
print(type(x))
print(x)

<class 'numpy.ndarray'>
[[ 0.49671415 -0.1382643   0.64768854]
 [ 1.52302986 -0.23415337 -0.23413696]]

Now let’s add mean and standard deviation (because this is a normal distribution). Here we are controlling the center and the spread of the distribution.

### generate a random normal distribution of size 2X3 with mean at 1 and SD of 2

random.seed(42)
x = random.normal(loc=1, scale=2, size=(2, 3))
print(x)

[[1.99342831 0.7234714  2.29537708]
 [4.04605971 0.53169325 0.53172609]]

Time for some visualization. Note that we are using height and aspect (ratio) to control figure size.

random.seed(42)

np.random.seed(42)
sns.displot(random.normal(size=1000), height=3, aspect=2,kde=True)

<seaborn.axisgrid.FacetGrid at 0x1159eec10>

../../_images/55cbcd06713e5f8ec87d1ba982de23c4f24d71ca12f1a7e4ef98c11f76827a37.png

A KDE only plot. Note that we are controlling figure size by

plt.figure(figsize)
dpi

random.seed(42)

np.random.seed(42)

plt.figure(figsize=(6.5, 3))
sns.kdeplot(random.normal(size=1000))

<Axes: ylabel='Density'>

../../_images/b3ac7681b457c78c8345ff7274c35573d58b90edfb24b90c9e049a35fc89766e.png

Now let’s add dpi to control figure size.

np.random.seed(42)

import matplotlib as mpl
with mpl.rc_context({"figure.dpi": 100}):
    plt.figure(figsize=(6.5, 3))
    sns.kdeplot(random.normal(size=1000))

np.random.seed(42)

import matplotlib as mpl
with mpl.rc_context({"figure.dpi": 50}):
    plt.figure(figsize=(6.5, 3))
    sns.kdeplot(random.normal(size=1000))

../../_images/eabbb54c94ac8b6944edfac1fffbea73a0e08f33ef5a446b4df4ae5846f779bc.png

14.2.2. Binomial Distribution#

A binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials, where each trial has only two outcomes: “success” or “failure.” Every trial must have the same probability of success. This distribution is fundamental for analyzing repeated yes/no-type experiments (Bernoulli trials) and appears frequently in statistics and data science.

Key Features:

There are n fixed trials.
Each trial is independent.
There are only two possible outcomes per trial (success or failure).
The chance of success p is constant for each trial.

For example, flipping a fair coin 10 times and counting the number of heads. Here, each flip is independent, and the probability of heads (success) is 0.5. The distribution of the number of heads follows a binomial distribution with n=10, p=0.5.

# binomial vs normal distributions
random.seed(42)

norm = random.normal(loc=50, scale=5, size=1000)
bino = random.binomial(n=100, p=0.5, size=1000)

fig, ax = plt.subplots(figsize=(4, 3))
sns.kdeplot(data=norm, label='Normal', color='red')
sns.kdeplot(data=bino, label='Binomial', color='green')
plt.tight_layout()

../../_images/67c4915d9fbffce73ea7d211a556ca8cd7c9e28c1d15a56c858620761a9f6db3.png

Visually comparing normal distribution and binomial distribution.

### test out 1 scaler
random.binomial(n=100, p=0.5, size=1)

array([45])

### normal distribution vs binomial
### binomial approximates normal distribution when n is big (np >=5)
### binomial "spikes" because it's integers

random.seed(42)
norm = random.normal(loc=50, scale=5, size=1000)  ### mean=50, SD=5 to match binomial
bino = random.binomial(n=100, p=0.5, size=1000)

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
sns.histplot(norm, label='normal', color='red', ax=ax[0], kde=True)
sns.histplot(bino, label="binomial", color='green', ax=ax[1], kde=True)
fig.legend()
fig.tight_layout()

../../_images/02386812dde3c386442124099ace292d83828be7c29cc342b72fcf166c56d877.png

Coin flipping! (If you count the numbers of heads, it will be a normal distribution)

# Define the coin sides
coin = ['Heads', 'Tails']

# Simulate 100 coin flips
random.seed(42)
flips = np.random.choice(coin, size=100)

# Count the number of heads
num_heads = np.count_nonzero(flips == 'Heads')

print("Number of Heads:", num_heads)

sns.displot(flips, height=3, aspect=2, bins=10, kde=False)  

Number of Heads: 44

<seaborn.axisgrid.FacetGrid at 0x11598d1d0>

../../_images/8a90e78d20e83c190de96b2dfbfa5d4facff2af576bc11a4f7618eb478ed8c67.png

14.2.3. Uniform Distribution#

A uniform distribution is a probability distribution where every outcome within a certain range is equally likely to occur. This means there is no bias—each result has the same probability as any other within the specified set.

Examples:

Rolling a fair six-sided die. Each face (1 through 6) has a 1/6 chance of appearing, so the probability of rolling any particular number is equal.
Flipping a fair coin. Heads and tails each have a 50% chance.

The numpy.random.uniform() method draws samples from a uniform distribution. Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

The syntax:

random.uniform(low=0.0, high=1.0, size=None)

random.seed(42)

unif1 = random.uniform()                               ### return 1 number [0, 1)
unif2 = random.uniform(low=1, high=5, size=3)          ### return size=3 numbers [0, 1)
unif3 = random.uniform(low=5, high=10, size=(3, 3))    ### return a 3x3 2D array
unif4 = np.round(random.uniform(low=5, high=10, size=(3, 3)), 2)     ### rounded
unif5 = random.randint(low=5, high=10, size=(3, 3))    ### int

for unif in [ unif1, unif2, unif3, unif4, unif5 ]:
    print(unif, '\n')         

0.3745401188473625 

[4.80285723 3.92797577 3.39463394] 

[[5.7800932  5.7799726  5.29041806]
 [9.33088073 8.00557506 8.54036289]
 [5.10292247 9.84954926 9.1622132 ]] 

[[6.06 5.91 5.92]
 [6.52 7.62 7.16]
 [6.46 8.06 5.7 ]] 

[[8 8 7]
 [8 8 5]
 [7 9 7]] 

14.2.3.1. Best Example: Rolling Die#

We eventually flip 1 million times here and see that all 6 numbers and have equal probability distribution.

### flip for many times

fig, axes = plt.subplots(1, 5, figsize=(12, 3))

flips = []
for i, ran in enumerate([ 10, 100, 1000, 10000, 1000000 ]):
    for flip in range(ran):                  ### if n > 26, print vertically
        flips.append(random.randint(1, 7))
    # sns.histplot(flips, ax=axes[i])
    sns.histplot(flips, ax=axes[i], bins=6)

plt.tight_layout()

../../_images/76ecb7b5ba4ed55e8483c142a4b942b9866ae69f2bc6a2a550e722f0dd7a1b0e.png

Note that binning can affect the plotting greatly.

### displot with height and aspect

sns.displot(flips, height=2.5, aspect=3/2)
sns.displot(flips, height=2.5, aspect=3/2, bins=6)

<seaborn.axisgrid.FacetGrid at 0x1169302d0>

../../_images/8d7adb4ce21f6c1446f8716cb3cdaef81705bf9cadadf8a1e56bb80c2eecbb28.png

../../_images/e391d3cf416cd56e767aeda6ae0d31df19aeb69a58b38a6c57edbdc65914336b.png

Now use randint() to append 1000,000 random ints.

### flip for 1 million times
random.seed(42)

flips2 = []
for flip in range(1000000):
    flips2.append(random.randint(1, 7))

plt.figure(figsize=(4, 3))
sns.histplot(flips2)

print(len(flips2))
print(len(flips2)/6)

1000000
166666.66666666666

../../_images/f304b08103e61da1e3c4346e3b65181b76bebaaaf64250d48e64b3de45b014e4.png

Use the random.uniform() function to check. Looks uniform to me.

random.seed(42)
unif = random.uniform(0, 5, size=1000000)

fig, ax = plt.subplots(1, 2, figsize=(6, 4))

sns.histplot(data=unif, bins=6, ax=ax[0])
sns.histplot(data=unif, bins=100, ax=ax[1])

plt.tight_layout()

../../_images/853f38e3c4e8e6298bc0b1b35f653b3d1539ac9b77d1529aba84241620c297b3.png

But, when we have smaller n and larger bins, we see spikes:

random.seed(42)
unif = random.uniform(0, 5, size=1000)    ### low, high exclusive
sns.displot(data=unif, kind='hist', height=3, aspect=3/2, bins=100)
# sns.displot(unif, kind='kde')
# plt.show()

<seaborn.axisgrid.FacetGrid at 0x119279f90>

../../_images/23afbc4b3ae9e7da5fb652f19951731261c1f2c82da8c98ca7d970491a3bdb15.png

Using scatter plot to visualize.

random.seed(42)
unif = random.uniform(0, 5, size=1000)

sns.scatterplot(data=unif)

<Axes: >

../../_images/4b9ad9b3d0ecee22cf29895b91eb8b418df45a22955f1a8250ce57dcf1d119de.png

fig, axes = plt.subplots(1, 4, figsize=(12, 3)) ### now we have one fig with two axes

unif1 = random.uniform(0, 5, size=100)
unif2 = random.uniform(0, 5, size=1000)
unif3 = random.uniform(0, 5, size=10000)
unif4 = random.uniform(0, 5, size=10000000)

sns.scatterplot(unif1, ax=axes[0])          
sns.scatterplot(unif2, ax=axes[1])
sns.scatterplot(unif3, ax=axes[2])
sns.scatterplot(unif4, ax=axes[3])

<Axes: >

../../_images/9294fe419dcfb764bb15633aebb7007c255e66c7642768e49fed55b17d448de2.png

Now, produce a similar plot like the one above for random.randn(). Note the binning strategy here.

fig, axes = plt.subplots(1, 4, figsize=(12, 3)) ### now we have one fig with two axes

datasets = [ unif1, unif2, unif3, unif4 ]
for i, data in enumerate(datasets):

    q25, q75 = np.percentile(data, [25, 75])            ### bins
    bin_width = 2 * (q75 - q25) * len(data) ** (-1/3)   ### bins
    bins = int((data.max() - data.min()) / bin_width)   ### bins 
    
    sns.histplot(data=data, ax=axes[i], bins=bins)      ### plot

plt.tight_layout()

../../_images/449316f8327fda4bb751b4bfb4b9744aef496c797d7e936957c52c2661431ecd.png

Visualizing using KDE.

fig, axes = plt.subplots(1, 4, figsize=(12, 3)) ### now we have one fig with two axes

for i, data in enumerate(datasets):

    ### bins ###
    q25, q75 = np.percentile(data, [25, 75])
    bin_width = 2 * (q75 - q25) * len(data) ** (-1/3)
    bins = int((data.max() - data.min()) / bin_width)   ### bins #
    
    sns.kdeplot(data=data, ax=axes[i])      ### plot

plt.tight_layout()

../../_images/ebf9ddcd2456f931a140afcbb7e879fa79232a6fd6ac1c5348aba474ef2f0513.png

14.3. The Law of Large Numbers#

We’ll simulate coin flips to show:

Randomness
Law of large numbers (the running average stabilizes)

Each flip is 0 (tails) or 1 (heads).

tips.sample(5)

	total_bill	tip	sex	smoker	day	time	size
101	15.38	3.00	Female	Yes	Fri	Dinner	2
127	14.52	2.00	Female	No	Thur	Lunch	2
75	10.51	1.25	Male	No	Sat	Dinner	2
22	15.77	2.23	Female	No	Sat	Dinner	2
220	12.16	2.20	Male	Yes	Fri	Lunch	2

rng = np.random.default_rng(42)

type(rng.integers(0, 2, 10))
rng = np.random.default_rng(42) 
flips = rng.integers(0, 2, 200)
flips

array([0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0,
       1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1,
       0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1,
       0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
       1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
       0, 1])

np.arange(len(flips)) + 1     ### fixed and grow linearly

array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
        92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103, 104,
       105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
       118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
       131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
       144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156,
       157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
       170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182,
       183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195,
       196, 197, 198, 199, 200])

flips.cumsum()                ### grow slower because of 0, 1

array([  0,   1,   2,   2,   2,   3,   3,   4,   4,   4,   5,   6,   7,
         8,   9,  10,  11,  11,  12,  12,  13,  13,  13,  14,  15,  16,
        16,  17,  18,  18,  18,  18,  18,  19,  20,  20,  21,  22,  22,
        23,  23,  24,  25,  25,  25,  26,  26,  27,  28,  29,  30,  30,
        30,  30,  30,  30,  31,  31,  32,  33,  34,  35,  35,  36,  36,
        36,  37,  37,  37,  37,  38,  38,  38,  38,  39,  39,  39,  39,
        40,  41,  42,  42,  42,  43,  44,  45,  45,  45,  46,  47,  47,
        48,  49,  49,  50,  50,  50,  51,  52,  52,  53,  53,  54,  54,
        55,  56,  57,  58,  58,  59,  59,  60,  61,  61,  62,  63,  63,
        63,  63,  63,  63,  64,  65,  65,  66,  67,  67,  68,  69,  70,
        71,  72,  72,  73,  74,  74,  75,  75,  75,  75,  76,  76,  76,
        76,  77,  78,  78,  78,  79,  79,  80,  80,  81,  81,  81,  82,
        82,  83,  84,  85,  86,  87,  87,  87,  87,  88,  88,  88,  88,
        88,  88,  88,  89,  90,  91,  91,  92,  92,  93,  94,  95,  95,
        95,  96,  96,  96,  96,  96,  96,  96,  97,  98,  99,  99, 100,
       100, 100, 100, 100, 101])

flips.cumsum() / (np.arange(len(flips)) + 1)  ### will get as big as about half of len() 

array([0.        , 0.5       , 0.66666667, 0.5       , 0.4       ,
       0.5       , 0.42857143, 0.5       , 0.44444444, 0.4       ,
       0.45454545, 0.5       , 0.53846154, 0.57142857, 0.6       ,
       0.625     , 0.64705882, 0.61111111, 0.63157895, 0.6       ,
       0.61904762, 0.59090909, 0.56521739, 0.58333333, 0.6       ,
       0.61538462, 0.59259259, 0.60714286, 0.62068966, 0.6       ,
       0.58064516, 0.5625    , 0.54545455, 0.55882353, 0.57142857,
       0.55555556, 0.56756757, 0.57894737, 0.56410256, 0.575     ,
       0.56097561, 0.57142857, 0.58139535, 0.56818182, 0.55555556,
       0.56521739, 0.55319149, 0.5625    , 0.57142857, 0.58      ,
       0.58823529, 0.57692308, 0.56603774, 0.55555556, 0.54545455,
       0.53571429, 0.54385965, 0.53448276, 0.54237288, 0.55      ,
       0.55737705, 0.56451613, 0.55555556, 0.5625    , 0.55384615,
       0.54545455, 0.55223881, 0.54411765, 0.53623188, 0.52857143,
       0.53521127, 0.52777778, 0.52054795, 0.51351351, 0.52      ,
       0.51315789, 0.50649351, 0.5       , 0.50632911, 0.5125    ,
       0.51851852, 0.51219512, 0.5060241 , 0.51190476, 0.51764706,
       0.52325581, 0.51724138, 0.51136364, 0.51685393, 0.52222222,
       0.51648352, 0.52173913, 0.52688172, 0.5212766 , 0.52631579,
       0.52083333, 0.51546392, 0.52040816, 0.52525253, 0.52      ,
       0.52475248, 0.51960784, 0.52427184, 0.51923077, 0.52380952,
       0.52830189, 0.53271028, 0.53703704, 0.53211009, 0.53636364,
       0.53153153, 0.53571429, 0.53982301, 0.53508772, 0.53913043,
       0.54310345, 0.53846154, 0.53389831, 0.52941176, 0.525     ,
       0.52066116, 0.52459016, 0.52845528, 0.52419355, 0.528     ,
       0.53174603, 0.52755906, 0.53125   , 0.53488372, 0.53846154,
       0.54198473, 0.54545455, 0.54135338, 0.54477612, 0.54814815,
       0.54411765, 0.54744526, 0.54347826, 0.53956835, 0.53571429,
       0.53900709, 0.53521127, 0.53146853, 0.52777778, 0.53103448,
       0.53424658, 0.53061224, 0.52702703, 0.53020134, 0.52666667,
       0.52980132, 0.52631579, 0.52941176, 0.52597403, 0.52258065,
       0.52564103, 0.52229299, 0.52531646, 0.52830189, 0.53125   ,
       0.53416149, 0.53703704, 0.53374233, 0.5304878 , 0.52727273,
       0.53012048, 0.52694611, 0.52380952, 0.52071006, 0.51764706,
       0.51461988, 0.51162791, 0.51445087, 0.51724138, 0.52      ,
       0.51704545, 0.51977401, 0.51685393, 0.51955307, 0.52222222,
       0.52486188, 0.52197802, 0.51912568, 0.52173913, 0.51891892,
       0.51612903, 0.51336898, 0.5106383 , 0.50793651, 0.50526316,
       0.5078534 , 0.51041667, 0.51295337, 0.51030928, 0.51282051,
       0.51020408, 0.50761421, 0.50505051, 0.50251256, 0.505     ])

rng = np.random.default_rng(42)        ### random generator; seed: 42

flips = rng.integers(0, 2, size=200)   ### 200 0 or 1
running_prop = flips.cumsum() / (np.arange(len(flips)) + 1)     ### cumsum(): cumulative sum
                                                                ### output: an array
plt.figure(figsize=(12,4))
plt.plot(running_prop, marker='o', markersize=2, linewidth=1)   ### matplotlib.pyplot the array
plt.axhline(0.5, color='red', linestyle='--')
plt.xlabel('Number of flips')
plt.ylabel('Proportion heads so far')
plt.title('Law of Large Numbers intuition')
plt.ylim(0,1)
plt.show()

../../_images/05c587a789dfe378fd558db38c9f28a824067a63fdee228602995c6a2475dcff.png