Let \(x\) be a continuous random variable over the interval \([a, b]\) with probability density function \(f\). The expected value of \(x\) is defined by

\[E(x) = \int_a^b x \cdot f(x)dx\]

Wikipedia’s definition of expected value is useful here.

The expected value of a discrete random variable is the probability-weighted average of all its possible values. In other words, each possible value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed to produce the expected value. Intuitively, a random variable’s expected value represents the mean of a large number of independent realizations of the random variable.

The expected value is also known as the expectation, mathematical expectation, mean, or first moment. Expected value also applies to an absolutely continuous random variable, except that an integral of the variable with respect to its probability density replaces the sum.
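As a quick illustration of the discrete case, here’s a small sketch computing the expected value of a fair six-sided die, where each face has probability \(\frac{1}{6}\):

import numpy as np

values = np.arange(1, 7)         # faces of a fair die
probabilities = np.full(6, 1/6)  # each face is equally likely

# probability-weighted average of the outcomes
expected = np.dot(values, probabilities)
print(round(expected, 2))
## 3.5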

The concept of expected value can be generalized to functions of the random variable. In other words, for the expected value of \(y = g(x)\) we have

\[E(g(x)) = \int_a^b g(x) \cdot f(x)dx\]

The mean, \(\mu\), of a continuous random variable \(x\) is defined to be \(E(x)\). That is,

\[\mu = E(x) = \int_a^b x \cdot f(x)dx\]

where \(f\) is a probability density function for \(x\) defined over \([a, b]\).

Because two very different distributions can have the same mean, it is useful to have a second statistic that serves as a measure of how the data in a distribution are spread out.

The variance, \(\sigma^2\), of a continuous random variable \(x\), defined on \([a, b]\), with probability density function \(f\), is

\[\sigma^2 = E(x^2) - \mu^2 = E(x^2) - [E(x)]^2 = \int_a^b x^2 \cdot f(x)dx - \bigg[\int_a^b x \cdot f(x)dx\bigg]^2\]

The standard deviation, \(\sigma\), of a continuous random variable is defined as

\[\sigma = \sqrt{\sigma^2}\]

Let’s look at an example. Given the probability density function \(f(x) = \frac{1}{2}x\) over \([0, 2]\), find \(E(x)\) and \(E(x^2)\).

import numpy as np
import matplotlib.pyplot as plt

# the probability density function f(x) = (1/2) x
def f(x):
    return (1/2) * x

# sample the density over [0, 2] and plot it
x = np.linspace(0, 2)
y = f(x)

plt.plot(x, y)

This is what the density function looks like over the interval we’re interested in. The expected value is the sum of all possible outcomes weighted by their probability. For a continuous variable, each outcome \(x\) is weighted by the density \(f(x)\), and the weighted outcomes are accumulated by integration rather than summation. In our case, then, the expected value is

\[E(x) = \int_0^2 x \cdot \frac{1}{2}x \, dx\]

The \(dx\) part of this syntax may be confusing, but it’s just how integrals are written; we can think of it more like

\[E(x) = \int_0^2 \bigg(x \cdot \frac{1}{2}x \bigg) dx\]

and

\[E(x^2) = \int_0^2 \bigg(x^2 \cdot \frac{1}{2}x \bigg) dx\]

which can be thought of as a specific case of the more general

\[E(g(x)) = \int_a^b \bigg(g(x) \cdot f(x) \bigg) dx = \int_a^b g(x) \cdot f(x) dx\]

from sympy import symbols, integrate

x = symbols("x")

# E(x): integrate x * f(x) over [0, 2]
mean = integrate(x * ((1/2) * x), (x, 0, 2))
# E(x^2): integrate x^2 * f(x) over [0, 2]
squared = integrate(x**2 * ((1/2) * x), (x, 0, 2))

mean
## 1.33333333333333
squared
## 2.00000000000000

Let’s find the mean, the variance, and the standard deviation of the probability density function

\[f(x) = \frac{1}{2}x\]

over the same interval. We already have the mean, and we know that

\[\sigma^2 = E(x^2) - \mu^2\]

which translates to the variance being the difference between the expected value of the squared variable and the squared mean of the variable.

The standard deviation is the square root of the variance.

import math

# sigma^2 = E(x^2) - mu^2
variance = squared - mean**2
variance
## 0.222222222222222
standard_deviation = math.sqrt(variance)
standard_deviation
## 0.4714045207910318
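
As a sanity check, the same quantities can be computed exactly by keeping the coefficient as a rational number instead of the Python float 1/2; a small sketch with sympy’s exact arithmetic:

from sympy import symbols, integrate, sqrt

x = symbols("x")
f = x / 2  # exact rational coefficient instead of the float (1/2)

mu = integrate(x * f, (x, 0, 2))
variance_exact = integrate(x**2 * f, (x, 0, 2)) - mu**2
mu
## 4/3
variance_exact
## 2/9
sqrt(variance_exact)
## sqrt(2)/3

These match the decimals above: \(\frac{4}{3} \approx 1.3333\), \(\frac{2}{9} \approx 0.2222\), and \(\frac{\sqrt{2}}{3} \approx 0.4714\).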

The Normal Distribution

A continuous random variable \(x\) has a standard normal distribution if its probability density function is

\[f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\]

over \((-\infty, \infty)\).
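
Since this is a probability density, it should integrate to 1 over the whole real line; here’s a quick check using scipy’s quad:

from scipy.integrate import quad
import numpy as np

def standard_normal_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# total probability mass over (-infinity, infinity)
total, _ = quad(standard_normal_pdf, -np.inf, np.inf)
print(round(total, 6))
## 1.0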

A continuous random variable \(x\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\) if its probability density function is given by

\[f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(1/2)[(x - \mu) / \sigma]^2}\]

The graph of any normal distribution is a transformation of the graph of the standard normal distribution, determined by the mean and the standard deviation.

import scipy.stats as stats

# overlay normal densities with different means and standard deviations
x_axis = np.arange(-8, 8, 0.001)
plt.plot(x_axis, stats.norm.pdf(x_axis, 0, 0.5), label=r"$\mu = 0, \sigma = 0.5$")
plt.plot(x_axis, stats.norm.pdf(x_axis, 0, 1), label=r"$\mu = 0, \sigma = 1$")
plt.plot(x_axis, stats.norm.pdf(x_axis, 0, 2), label=r"$\mu = 0, \sigma = 2$")
plt.plot(x_axis, stats.norm.pdf(x_axis, -2, 1), label=r"$\mu = -2, \sigma = 1$")
plt.plot(x_axis, stats.norm.pdf(x_axis, 2, 1), label=r"$\mu = 2, \sigma = 1$")
plt.legend(frameon=False)

Because the normal distribution is extremely important in statistics, tables of approximate values of the definite integral of the standard normal distribution have been prepared using numerical approximation methods, and they’re generally used to speed things up. These are called z-tables. This article goes into some detail on how to compute these tables in Python. z-tables contain values of

\[P(0 \leq x \leq z) = \int_0^z \frac{1}{\sqrt{2\pi}} e^{-x^2/2} dx\]
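
A few rows of such a table can be reproduced directly with numerical integration; here’s a minimal sketch (the \(z\)-values are arbitrary picks):

from scipy.integrate import quad
import numpy as np

def standard_normal_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# P(0 <= x <= z) for a few z-values, as found in a z-table
for z in [0.5, 1.0, 1.5, 2.0]:
    area, _ = quad(standard_normal_pdf, 0, z)
    print(z, round(area, 4))
## 0.5 0.1915
## 1.0 0.3413
## 1.5 0.4332
## 2.0 0.4772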

Percentiles

If \(x\) is a continuous random variable and \(f\) is a probability density function over an interval \([a, b]\), the \(p\)th percentile is a value \(c\), with \(a < c < b\), such that

\[\frac{p}{100} = \int_a^c f(x)dx\]

For the standard normal distribution, with \(\mu = 0\) and \(\sigma = 1\), determine the percentile (probability) corresponding to each of the following \(z\)-values:

  • \(z\) = 0
  • \(z\) = -1.75
  • \(z\) = 2.25
from scipy.integrate import quad
import numpy as np

def normal_pdf(x):
    constant = 1.0 / np.sqrt(2 * np.pi)
    return constant * np.exp(-x**2 / 2.0)

scores = [0, -1.75, 2.25]

# integrate the density from -infinity up to each z-value
for z in scores:
    perc, _ = quad(normal_pdf, -np.inf, z)
    print(round(perc, 2))
## 0.5
## 0.04
## 0.99

The idea is that perc is the area under the curve up to the \(z\)-score. This is the cumulative distribution function; more formally,

\[P(x \leq z) = \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} e^{-x^2/2} dx\]

Let \(x\) be a continuous random variable with a standard normal distribution, and let’s find the following:

  • \(P(0 \leq x \leq 1.68)\)
  • \(P(-0.97 \leq x \leq 0)\)
  • \(P(-2.43 \leq x \leq 1.01)\)
  • \(P(1.90 \leq x \leq 2.74)\)
  • \(P(-2.98 \leq x \leq -0.42)\)
  • \(P(x \geq 0.61)\)
lower_bound = [0, -0.97, -2.43, 1.90, -2.98, -np.inf]
scores = [1.68, 0, 1.01, 2.74, -0.42, -0.61]

# the last case uses symmetry: P(x >= 0.61) = P(x <= -0.61)
for lower, upper in zip(lower_bound, scores):
    perc, _ = quad(normal_pdf, lower, upper)
    print(perc)
## 0.45352134213628
## 0.3339767539364704
## 0.8362029435624363
## 0.025644600597351307
## 0.33580148493090956
## 0.2709309037875521
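
These results can be cross-checked against scipy’s built-in cumulative distribution function; for instance, for the first and last intervals above:

import scipy.stats as stats

# P(0 <= x <= 1.68) via the standard normal CDF
print(round(stats.norm.cdf(1.68) - stats.norm.cdf(0), 4))
## 0.4535
# P(x >= 0.61), written directly as a right tail
print(round(1 - stats.norm.cdf(0.61), 4))
## 0.2709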

Z-Scores

For a normal distribution, we can assign a z-score (or standard score) to any point of the distribution. In statistics, the standard score is the signed fractional number of standard deviations by which the value of an observation or data point is above the mean value of what is being observed or measured.

\[z = \frac{x - \mu}{\sigma}\]
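
As a small illustration (the mean and standard deviation here are made-up numbers), a score of 92 on a test with \(\mu = 75\) and \(\sigma = 10\) sits 1.7 standard deviations above the mean, and its percentile follows from the standard normal CDF:

import scipy.stats as stats

# hypothetical test scores: mean 75, standard deviation 10
mu, sigma = 75, 10
score = 92

z = (score - mu) / sigma
print(z)
## 1.7
# percentile of this score under the normal model
print(round(stats.norm.cdf(z), 4))
## 0.9554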