#KB Probability Theory — Part 3: Continuous Probability Distributions

Dear Statisticians!

Welcome back! Now that we’ve explored discrete probability distributions, it’s time to turn to another set of questions: How long will a customer wait for their order? How much will a stock price fluctuate in a day? These are continuous outcomes, where values can fall at any point within a range, and predicting them accurately calls for continuous probability distributions — the mathematical tools that help us measure and predict outcomes across an infinite set of possible values.

In this article, I will cover continuous probability distributions and their applications, focusing on the most widely used: the normal distribution. From defining continuous variables to understanding the “bell curve” that underpins so much of statistical theory, this article will bring clarity to how we measure, model, and anticipate data points that vary continuously.

What Are Continuous Probability Distributions?

Imagine tracking the wait time for each customer at Ella’s Cold Brew. On a busy morning, one person might wait exactly 2.7 minutes, another 3.1 minutes, and yet another 4.52 minutes. Instead of modeling discrete outcomes — like counting the number of customers — continuous distributions allow us to model events that can take any real value across an unbroken range. Since there are infinitely many possible values, the probability of the variable assuming any single exact value is essentially zero. Instead, probabilities are assigned to intervals (e.g. “What is the likelihood a customer waits between 2 and 3 minutes?”), and the likelihood is determined by calculating the area under the probability density function (PDF) over that range.
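To make this concrete, here is a minimal Python sketch (the wait-time distribution and its parameters are illustrative assumptions, not data from Ella’s) that computes the probability of a wait between 2 and 3 minutes as the area under a PDF, once by numerical integration and once via the cumulative distribution function (CDF):

# A sketch: probability as area under a PDF, assuming wait times
# follow a normal distribution with mean 3 and standard deviation 1.
from scipy.stats import norm
from scipy.integrate import quad

wait = norm(loc=3, scale=1)  # hypothetical wait-time distribution

# P(2 <= X <= 3) as the area under the PDF between 2 and 3
area, _ = quad(wait.pdf, 2, 3)

# The same probability via the CDF: F(3) - F(2)
via_cdf = wait.cdf(3) - wait.cdf(2)

print(f"Area under PDF:  {area:.4f}")     # ~0.3413
print(f"CDF difference:  {via_cdf:.4f}")  # ~0.3413

# Note: P(X == 2.7) is zero for a continuous variable;
# only intervals carry probability.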

Introduction to the Normal Distribution

The normal distribution — or Gaussian distribution — is perhaps the most recognized continuous probability distribution, often visualized as the classic “bell curve.”

Normal Distribution

This symmetrical, bell-shaped curve describes how data points are distributed around a central value, with most values clustering near the mean and fewer values appearing as you move further away on either side. The prevalence of the normal distribution in natural and social phenomena stems from the Central Limit Theorem, which explains that the sum of many independent random variables tends toward a normal distribution, regardless of their original distributions. From biological measurements like heights and weights to test scores and measurement errors, the normal distribution is central to many fields.
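A quick way to see the Central Limit Theorem at work is to simulate it. The sketch below (the uniform source distribution and the sample sizes are arbitrary illustration choices) sums many independent uniform variables and checks that the result behaves like a normal distribution:

# A sketch of the Central Limit Theorem: sums of independent Uniform(0, 1)
# variables (a decidedly non-normal distribution) end up approximately normal.
import numpy as np

rng = np.random.default_rng(42)

# 10,000 sums, each of 50 independent Uniform(0, 1) draws
sums = rng.uniform(0, 1, size=(10_000, 50)).sum(axis=1)

# Theory: mean = 50 * 0.5 = 25, std = sqrt(50 / 12) ≈ 2.04
print(f"Sample mean: {sums.mean():.2f}")  # ~25.0
print(f"Sample std:  {sums.std():.2f}")   # ~2.04

# Roughly 68% of the sums should fall within one std of the mean
within = np.mean(np.abs(sums - sums.mean()) < sums.std())
print(f"Within ±1 std: {within:.1%}")     # ~68%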

For example, if we consider customer waiting times at Ella’s Cold Brew, most times will likely hover around an average value (let’s say 3 minutes), with shorter or longer wait times becoming progressively less frequent as you move away from this average. This pattern mirrors the bell curve and illustrates how the normal distribution reflects real-life variations.

Key Parameters of the Normal Distribution

The normal distribution has two fundamental parameters: the mean (μ) and the standard deviation (σ). Together, they determine the position and shape of the bell curve, providing insights into both central tendency and variability within a dataset.

The Mean (μ): The mean marks the center of a normal distribution and represents the peak of the bell curve, where most data points are concentrated. For example, if the average customer wait time is μ = 3 minutes, the highest point on the curve would be at 3 minutes, meaning most customers experience a wait time close to this value. As a measure of central tendency, μ shifts the entire curve along the x-axis without changing its shape.

The Standard Deviation (σ): The standard deviation indicates how spread out the values are around the mean. A smaller σ results in data points clustering tightly around μ, creating a tall, narrow curve, while a larger σ produces a wider, flatter curve, showing that values are more spread out. For instance, if wait times at Ella’s Cold Brew have a high standard deviation, customers’ wait times vary greatly, making it harder to predict a typical wait duration.

Impact of Parameters on Distribution Shape

Adjusting μ shifts the entire curve along the horizontal axis without changing its shape, effectively relocating the center of the distribution. Modifying σ alters the width and height of the curve, affecting how concentrated or dispersed the data appear.
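To see both effects numerically, here is a minimal sketch (the parameter values are arbitrary illustration choices) that evaluates the normal PDF at its peak for a few settings of μ and σ:

# A sketch: how mu and sigma shape the normal PDF.
# The peak sits at x = mu with height 1 / (sigma * sqrt(2 * pi)).
from scipy.stats import norm

for mu, sigma in [(3, 0.5), (3, 1.0), (5, 1.0)]:
    peak = norm(loc=mu, scale=sigma).pdf(mu)
    print(f"mu={mu}, sigma={sigma}: peak at x={mu}, height={peak:.3f}")

# Changing mu relocates the peak without changing its height;
# doubling sigma halves the height and widens the curve.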

The Empirical Rule: Understanding Data Distribution

The empirical rule, also known as the 68–95–99.7 rule, describes how data are distributed in a normal distribution relative to the mean and standard deviations.

Empirical Rule

For a normally distributed dataset, the rule specifies that:

  • 68.3% of data falls within one standard deviation (±1σ) of the mean.
  • 95.4% of data lies within two standard deviations (±2σ) of the mean.
  • 99.7% of data is within three standard deviations (±3σ) of the mean.

This rule offers a simple way to estimate the likelihood of a value falling within a certain range in a normal distribution. For instance, if wait times at Ella’s Cold Brew follow a normal distribution with an average (μ) of 3 minutes and a standard deviation (σ) of 1 minute, then about 68.3% of customers will wait between 2 and 4 minutes, 95.4% between 1 and 5 minutes, and 99.7% between 0 and 6 minutes.
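As a quick check, the following sketch (using the same μ = 3 and σ = 1 from the example) recovers the three percentages from the normal CDF:

# A sketch verifying the empirical rule for the wait-time example
# (mu = 3 minutes, sigma = 1 minute).
from scipy.stats import norm

mu, sigma = 3, 1
wait = norm(loc=mu, scale=sigma)

for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    prob = wait.cdf(high) - wait.cdf(low)
    print(f"±{k}σ ({low} to {high} minutes): {prob:.1%}")

# Prints approximately 68.3%, 95.4%, and 99.7%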

Practical Applications of the Empirical Rule

  • In healthcare, normal body temperature follows the Empirical Rule closely, averaging 98.6°F (37°C). Doctors expect most healthy individuals to fall within one degree of this mean, between 97.6°F and 99.6°F. A temperature of 100.6°F (two degrees above normal) signals a mild fever, while anything above 101.6°F (three degrees above) indicates a significant fever needing immediate care. This guideline helps both healthcare providers and parents decide when medical attention is necessary.
  • In education, a standardized college entrance exam might have a mean score of 500. Knowing that 68% of students score within 100 points of the mean (between 400 and 600), admissions officers can categorize applicants more efficiently. For example, a score of 600 places a student in the top 16% of test-takers (above +1σ), while a score of 700 reflects outstanding performance in roughly the top 2% (above +2σ), as the sketch after this list verifies. This framework helps counselors provide students with clear guidance about their college prospects.
  • In finance, investment firms use the Empirical Rule to gauge portfolio risk by studying historical returns on diversified portfolios. Under typical market conditions, for a portfolio with a mean annual return of 8% and a standard deviation of 12%, the rule suggests that 68.3% of annual returns will range from -4% to +20% (±1σ). If a return falls outside the range of -28% to +44%, it’s considered a rare (3σ) event, prompting risk managers to act. This understanding supports decisions on stop-loss orders, portfolio rebalancing, and risk-based product development.
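Here is a minimal Python sketch of the exam example (mean 500 and standard deviation 100 are taken from the bullet above) that converts cutoff scores into tail percentages:

# A sketch: tail percentages for the exam example (mean 500, sd 100).
from scipy.stats import norm

scores = norm(loc=500, scale=100)

for cutoff in (600, 700):
    top = scores.sf(cutoff)  # survival function: P(X > cutoff)
    print(f"Score above {cutoff}: top {top:.1%} of test-takers")

# Prints roughly "top 15.9%" for 600 and "top 2.3%" for 700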

The Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution with a mean of zero (μ = 0) and a standard deviation of one (σ = 1). This standardization lets us compare different normal distributions on the same scale. To convert any value from a normal distribution to the standard normal distribution, we calculate a z-score, indicating how many standard deviations the value is from the mean. The z-score can then be used with standard normal distribution tables (like z-tables) or software functions, such as those in Excel, to determine probabilities.

Z-Score Transformation

To convert any data point x from a normal distribution with mean μ and standard deviation σ into the standard normal distribution, we use the formula:

z = (x − μ) / σ

The resulting z-score tells us how far and in what direction x deviates from the mean in terms of standard deviations. A positive z-score indicates a value above the mean, while a negative z-score represents a value below the mean.

For example, if the average wait time at Ella’s Cold Brew is 3 minutes with a standard deviation of 1 minute, and we want to find the z-score for a 4.5-minute wait, we can use the formula like this:

z = (4.5 − 3) / 1 = 1.5

This z-score of 1.5 indicates that a 4.5-minute wait time is 1.5 standard deviations above the mean. Using the Empirical Rule for a quick, mental estimate: about 68% of values fall within one standard deviation of the mean, so roughly 84% of wait times fall below +1σ (4 minutes), and about 97.7% fall below +2σ (5 minutes). Interpolating between these, somewhere above 90% of wait times fall below 4.5 minutes, leaving under a 10% chance of a longer wait (the exact value from a z-table is about 93.3% below, or a 6.7% chance of exceeding 4.5 minutes). This approximation provides a useful estimate without requiring precise calculations or a standard normal table.
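The same calculation in a short Python sketch (all values come from the example above):

# A sketch: z-score and exact tail probability for the wait-time
# example (mu = 3 minutes, sigma = 1 minute).
from scipy.stats import norm

mu, sigma, x = 3, 1, 4.5
z = (x - mu) / sigma
print(f"z-score: {z}")  # 1.5

# P(wait > 4.5 min) = P(Z > 1.5), via the standard normal survival function
print(f"P(wait > {x} min): {norm.sf(z):.1%}")   # ~6.7%
print(f"P(wait < {x} min): {norm.cdf(z):.1%}")  # ~93.3%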

Statistical Inference and Applications

One of the main uses of the normal distribution is in statistical inference, where we use sample data to make predictions about a larger population. In continuous probability, we’re often interested in calculating probabilities for certain ranges and constructing confidence intervals that estimate population parameters based on sample data.

Probability Calculations in Excel

Consider a portfolio with an average return of 8% per year (μ = 8%) and a standard deviation of 5% (σ = 5%). If we want to estimate the probability that the portfolio’s annual return will fall outside of one standard deviation (either below 3% or above 13%), we can use the NORM.DIST function in Excel.

1. Calculate the probability of returns being below 3% (1 standard deviation below the mean):

= NORM.DIST(0.03, 0.08, 0.05, TRUE)
  • This function returns approximately 15.87%, representing the probability that the portfolio return is less than 3%.

2. Calculate the probability of returns being below 13% (1 standard deviation above the mean):

= NORM.DIST(0.13, 0.08, 0.05, TRUE)
  • This function returns approximately 84.13%, representing the probability that the portfolio return is less than 13%.

3. Determine the probability of returns being outside of one standard deviation: Since we want to find the probability of the portfolio return being below 3% or above 13%, we need to calculate the chance of returns falling outside this range. The probability of returns staying between 3% and 13% is: 0.8413 − 0.1587 = 0.6826

So, the probability of returns being outside this range (one standard deviation) is: 1 − 0.6826 = 0.3174

In summary, there’s approximately a 31.74% chance that the portfolio’s return will fall outside the range of 3% to 13%.
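For readers working outside Excel, here is a minimal Python sketch of the same calculation (μ = 8% and σ = 5% come from the example above):

# A sketch: probability of returns outside ±1 standard deviation,
# mirroring the Excel NORM.DIST example (mu = 8%, sigma = 5%).
from scipy.stats import norm

returns = norm(loc=0.08, scale=0.05)

below = returns.cdf(0.03)           # ~0.1587, like NORM.DIST(0.03, 0.08, 0.05, TRUE)
inside = returns.cdf(0.13) - below  # ~0.6826
outside = 1 - inside                # ~0.3174

print(f"P(return < 3%):          {below:.2%}")
print(f"P(3% < return < 13%):    {inside:.2%}")
print(f"P(outside one std dev):  {outside:.2%}")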

Confidence Intervals

To create a confidence interval for the portfolio’s expected return, say at the 95% level, we take a two-tailed approach because we are interested in the likelihood that returns fall within a certain range above and below the mean, capturing the middle 95% of possible outcomes symmetrically around it. A 95% confidence level implies that we are 95% confident the actual portfolio return will fall within this interval, leaving a 5% chance (or alpha, α = 0.05) that it falls outside the interval: 2.5% in each tail of the distribution.

Confidence Intervals (Normal Distribution)

In constructing this confidence interval, we split α across both tails (0.05 / 2 = 0.025), meaning that each tail contains 2.5% of the probability. We then look up this probability in a z-table (or calculate it in Excel with =NORM.S.INV(0.975)) to find the corresponding z-score for the 97.5th percentile of the standard normal distribution. The z-score that corresponds to this level is approximately 1.96.

Thus, for a 95% confidence interval, the multiplier Zα/2 is 1.96, which gives us the confidence interval formula:

CI = μ ± Zα/2 × σ

To apply the confidence interval formula for the portfolio’s annual return, we’ll use the mean return μ = 8% and the standard deviation σ = 5%, along with the multiplier Zα/2 = 1.96 for a 95% confidence level. Plugging these values into our confidence interval formula:

8% ± 1.96 × 5% = 8% ± 9.8%

This means we are 95% confident that the portfolio’s annual return will fall between -1.8% (8% − 9.8%) and +17.8% (8% + 9.8%). In practical terms, if the portfolio’s average return and variability remain consistent, we can expect its performance to stay within this range 95% of the time. This confidence interval offers insight into potential return fluctuations and allows for more informed risk management.
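A short Python sketch of the same interval, including the NORM.S.INV step (values from the example above):

# A sketch: 95% confidence interval for the portfolio example
# (mu = 8%, sigma = 5%).
from scipy.stats import norm

mu, sigma, alpha = 0.08, 0.05, 0.05

# z-score at the 97.5th percentile, like =NORM.S.INV(0.975) in Excel
z = norm.ppf(1 - alpha / 2)  # ~1.96

low, high = mu - z * sigma, mu + z * sigma
print(f"95% interval: {low:.1%} to {high:.1%}")  # -1.8% to +17.8%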

Practical Considerations

The normal distribution works best when data are symmetrically distributed, with minimal skewness or outliers. It assumes a continuous random variable affected by many additive factors. So make sure this distribution actually fits your data before applying it.

Testing for Normality

Before assuming a dataset follows a normal distribution, statistical tests like the Shapiro-Wilk test or graphical methods, such as Q-Q plots, can help assess normality. In Excel, you can use the Histogram tool to visually inspect the data distribution by displaying how frequently values occur across specified intervals, which helps reveal whether data approximates a normal distribution. Look for a symmetric, bell-shaped curve where most values cluster around the mean with fewer values toward the extremes. Additionally, examining statistical measures like the mean, median, and skewness can provide further insights; if the mean and median are close and skewness is near zero, this often suggests a distribution closer to normal.
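As a sketch of how these checks look in Python (the dataset here is simulated purely for illustration):

# A sketch of normality checks on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=3, scale=1, size=200)  # stand-in for a real dataset

# Shapiro-Wilk test: a small p-value (e.g., < 0.05) suggests non-normality
stat, p = stats.shapiro(data)
print(f"Shapiro-Wilk: W={stat:.3f}, p={p:.3f}")

# Quick numeric checks: mean close to median, skewness near zero
print(f"mean={data.mean():.2f}, median={np.median(data):.2f}, "
      f"skewness={stats.skew(data):.2f}")

# For a Q-Q plot: stats.probplot(data, dist="norm", plot=ax)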

Alternative Distributions

If data deviate significantly from normality, consider alternative distributions:

  • t-distribution: Useful for small sample sizes when the population standard deviation is unknown.
  • Log-normal distribution: For positively skewed data, like income or biological growth.
  • Exponential distribution: Useful for modeling time between events, like customer arrivals.
  • Non-parametric methods: Statistical techniques that do not assume a specific distribution, useful when data do not meet parametric test assumptions.
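If you want to experiment with these alternatives, a sketch like the following shows how each parametric option can be explored in Python (parameter values are placeholders, not fitted to any data):

# A sketch: the alternative distributions above, with placeholder parameters.
from scipy import stats

t_dist = stats.t(df=10)               # t-distribution for small samples
log_normal = stats.lognorm(s=0.5)     # positively skewed data
exponential = stats.expon(scale=3.0)  # time between events (mean of 3)

for name, dist in [("t", t_dist), ("log-normal", log_normal),
                   ("exponential", exponential)]:
    print(f"{name}: mean={dist.mean():.2f}, std={dist.std():.2f}")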