A number between 0 and 1 that represents the likelihood that an event will occur is referred to as the event’s probability. A probability of 0 means that the event is impossible (will never occur), and a probability of 1 means that the event is certain to occur.

There are two types of probability. Experimental probabilities are determined by making observations and gathering data. Theoretical probabilities are determined by reasoning mathematically. In this sense, neither type is the one true probability. At times, experimental probabilities align well with theoretical ones, and in some situations you can derive one type of probability from the other.
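A small simulation can make the distinction concrete. The sketch below is an illustration added here, not taken from these notes: it computes the theoretical probability of heads for a fair coin by reasoning (1 favorable outcome out of 2) and an experimental probability by counting heads in simulated tosses. The seed and the number of tosses are arbitrary choices.

```python
import random

# Theoretical probability of heads for a fair coin, found by reasoning:
# 1 favorable outcome out of 2 equally likely outcomes
theoretical = 1 / 2

# Experimental probability: simulate many tosses and count the heads
# (seed and number of tosses are arbitrary choices for this sketch)
random.seed(0)
tosses = 10_000
heads = sum(random.random() < 0.5 for _ in range(tosses))
experimental = heads / tosses

print(f"theoretical = {theoretical}, experimental = {experimental:.4f}")
```

With many tosses the experimental value usually lands near 0.5, but it is an estimate and need not match the theoretical value exactly.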

Oftentimes, probability is introduced with examples about coin tosses, picking balls from urns, and picking cards from decks. The main idea is that there is a universe of possible outcomes, and an event’s probability depends on how many of those outcomes produce the event and how likely each of them is.

Jar / Urn Example

A jar contains 7 black balls, 6 yellow balls, 4 green balls, and 3 red balls, all the same size and weight. The jar is shaken well, and you remove 1 ball without looking.

  • What is the probability that the ball is red?
  • What is the probability that the ball is white?
black = 7
yellow = 6
green = 4
red = 3

balls = black + yellow + green + red
balls
## 20
# Probability of an outcome: number of favorable cases / total number of cases
def P(c, b):
  return c / b
  
P(red, balls)
## 0.15

The probability of picking a white ball is 0, because there are no white balls in the jar.
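The same counting rule extends to every color at once. Below is a minimal sketch, assuming we want exact fractions rather than floating-point values (Python’s standard `fractions.Fraction`); it includes white with a count of 0 and checks that the probabilities of all colors add up to 1.

```python
from fractions import Fraction

# Ball counts from the jar example; white is included with a count of 0
counts = {"black": 7, "yellow": 6, "green": 4, "red": 3, "white": 0}
total = sum(counts.values())  # 20 balls in all

# Probability of each color = (balls of that color) / (total balls)
probs = {color: Fraction(n, total) for color, n in counts.items()}

for color, p in probs.items():
    print(color, p, float(p))

# The probabilities of all possible outcomes add up to 1
print(sum(probs.values()))
```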

Continuous Random Variable

A random variable is a variable whose values (of which there can be infinitely many) depend on the outcomes of a random process. A continuous random variable can take any value in an interval.

Suppose that buses traveling between two destinations require at least 2 hr and at most 5 hr for the trip. If \(x\) is the number of hours a bus takes to make the trip, then \(x\) is a continuous random variable distributed over the interval \([2, 5]\).

Suppose that we want to know the probability that a bus will take between 4 hr and 5 hr, as represented by the notation

\[P(4 \leq x \leq 5)\]

There may be a function \(y = f(x)\) such that the area under its graph over a subinterval, e.g. \([4, 5]\), gives the probability that the trip time falls in that subinterval. This function is called the probability density function, and its integral over any subinterval of its domain gives the probability that \(x\) falls in that subinterval.

More formally, let \(x\) be a continuous random variable. A function \(f\) is said to be a probability density function for \(x\) if:

  • For all \(x\) in the domain of \(f\), we have \(0 \leq f(x)\).
  • The area under the graph of \(f\) is 1.
  • For any subinterval \([c, d]\) in the domain of \(f\), the probability that \(x\) will be in that subinterval is given by

\[P([c, d]) = \int_c^d f(x)dx\]
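The first two conditions can be checked symbolically for a concrete candidate. The helper `is_pdf` below is hypothetical, written for illustration with SymPy (which these notes use later); it finds the minimum of \(f\) on \([a, b]\) with `sympy.calculus.util.minimum` and integrates \(f\) over the whole interval.

```python
from sympy import Interval, integrate, symbols
from sympy.calculus.util import minimum

t = symbols("t", positive=True)

def is_pdf(f, var, a, b):
    """Hypothetical helper: check that f >= 0 on [a, b] and that its area is 1."""
    nonnegative = minimum(f, var, Interval(a, b)) >= 0
    area = integrate(f, (var, a, b))
    return bool(nonnegative) and area == 1

print(is_pdf(24 / t**3, t, 3, 6))  # bulb-lifetime density from the next example
print(is_pdf(t**2, t, 1, 4))       # nonnegative, but its area over [1, 4] is 21, not 1
```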

Example: Business Life of Products

A company that produces compact fluorescent bulbs determines that the life \(t\) of a bulb is from 3 to 6 yr and that the probability density function for \(t\) is given by

\[f(t) = \frac{24}{t^3}\] for \(3 \leq t \leq 6\).

  • Verify that the area under the curve of \(f\) is 1.
  • Find the probability that a bulb will last no more than 4 yr.
  • Find the probability that a bulb will last at least 4 yr and at most 5 yr.
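from sympy import integrate
from sympy.abc import t
import matplotlib.pyplot as plt
import numpy as np

# Probability density function for the bulb life t (in years), 3 <= t <= 6
def lifetime(t):
  return 24 / t**3

# Total area under f over [3, 6]; must equal 1
integrate(lifetime(t), (t, 3, 6))
## 1
# P(3 <= t <= 4): the bulb lasts no more than 4 yr
integrate(lifetime(t), (t, 3, 4))
## 7/12
# P(4 <= t <= 5): the bulb lasts at least 4 yr and at most 5 yr
integrate(lifetime(t), (t, 4, 5))
## 27/100
xvals = np.linspace(3, 6)

# Plot the density and shade the two probability regions
plt.plot(xvals, lifetime(xvals))
plt.fill_between(xvals, lifetime(xvals), where=np.logical_and(xvals >= 3, xvals <= 4), color="green", alpha=0.3)
plt.fill_between(xvals, lifetime(xvals), where=np.logical_and(xvals >= 4, xvals <= 5), color="red", alpha=0.3)
plt.show()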
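Both probabilities can also be read off a single antiderivative. The sketch below is an addition, not part of the original example: it integrates the density from 3 up to a variable endpoint to get the cumulative distribution function \(F\), where \(F(t)\) is the probability that a bulb lasts no more than \(t\) years. Mathematically \(F(t) = 4/3 - 12/t^2\) on \([3, 6]\), so \(F(4) = 7/12\) and \(F(5) - F(4) = 27/100\), matching the integrals above.

```python
from sympy import Rational, integrate, simplify, symbols

t, s = symbols("t s", positive=True)

# Density of the bulb life (in years), valid for 3 <= t <= 6
def lifetime(t):
    return 24 / t**3

# F(t) = probability that a bulb lasts no more than t years:
# integrate the density from the left endpoint 3 up to t
F = simplify(integrate(lifetime(s), (s, 3, t)))

p_at_most_4 = F.subs(t, 4)                    # P(life <= 4)
p_from_4_to_5 = F.subs(t, 5) - F.subs(t, 4)   # P(4 <= life <= 5)
print(F, p_at_most_4, p_from_4_to_5)
```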

Constructing Probability Density Functions

Suppose that you have an arbitrary nonnegative function \(f(x)\) whose definite integral over some interval \([a, b]\) is a positive number \(K\). Then

\[\int_a^b f(x)dx = K\] Multiplying both sides by \(1/K\) gives us

\[\frac{1}{K} \int_a^b f(x)dx = \int_a^b \frac{1}{K} f(x)dx = \frac{1}{K} K = 1\] Thus when we multiply the function \(f(x)\) by the constant \(1/K\), we obtain a function whose area over the given interval is 1. Because \(f(x)/K\) is also nonnegative, it satisfies the definition of a probability density function.
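The scaling argument translates directly into code. The function `normalize` below is a hypothetical helper, sketched with SymPy: it computes \(K\) by integration, refuses to proceed unless \(K > 0\), and returns \(f(x)/K\). The test function \(x(2 - x)\) on \([0, 2]\) is an arbitrary choice; its area is \(K = 4 - 8/3 = 4/3\).

```python
from sympy import integrate, simplify, symbols

x = symbols("x")

def normalize(f, var, a, b):
    """Hypothetical helper: scale a nonnegative f so its area over [a, b] is 1."""
    K = integrate(f, (var, a, b))
    if not K.is_positive:
        raise ValueError("the area K over [a, b] must be positive")
    return f / K

g = normalize(x * (2 - x), x, 0, 2)  # K = 4/3 here, so g = 3x(2 - x)/4
print(g, integrate(g, (x, 0, 2)))
```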

As an example, let’s find \(k\) such that

\[f(x) = kx^2\] is a probability density function over the interval \([1, 4]\).

from sympy import integrate
from sympy.abc import x

# Integrate the unscaled function x^2 over [1, 4] to get the area K
def f(x):
  return x**2

integrate(f(x), (x, 1, 4))
## 21

Thus, for \(f\) to be a probability density function, we must have

\[\int_1^4 kx^2 dx = k \int_1^4 x^2 dx = 21k = 1\]

so \(k = 1/21\), and the probability density function is

\[f(x) = \frac{1}{21}x^2\] for \(1 \leq x \leq 4\).
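Instead of reading off \(k\) by hand, SymPy can solve the condition directly. A minimal sketch, assuming SymPy’s `solve`; it also checks the total area and, as an illustrative extra, computes \(P(2 \leq x \leq 3)\) under the resulting density, which works out to \((27 - 8)/63 = 19/63\).

```python
from sympy import Eq, Rational, integrate, solve, symbols

x, k = symbols("x k")

# Area under k*x**2 over [1, 4] as an expression in k (it equals 21*k)
area = integrate(k * x**2, (x, 1, 4))

# Impose the pdf condition "area = 1" and solve for k
k_value = solve(Eq(area, 1), k)[0]

f = k_value * x**2
total = integrate(f, (x, 1, 4))      # should be 1
p_2_to_3 = integrate(f, (x, 2, 3))   # P(2 <= x <= 3)
print(k_value, total, p_2_to_3)
```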

Uniform Distributions

A continuous random variable \(x\) is said to be uniformly distributed over an interval \([a, b]\) if it has a probability density function \(f\) given by

\[f(x) = \frac{1}{b - a}\]

for \(a \leq x \leq b\). In other words, the density is constant over the interval, so subintervals of \([a, b]\) with equal lengths are equally likely to contain \(x\).

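# Draw 10,000 values uniformly distributed over [0, 2]
x = np.random.uniform(0, 2, size=10000)
bins = np.linspace(0, 2, 10)

# A uniform sample gives a roughly flat histogram across the bins
_, _, _ = plt.hist(x, bins, alpha=0.5, histtype='step', ec='black')
plt.show()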
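Because the density is constant, probabilities for a uniform variable reduce to a ratio of lengths: \(P([c, d]) = \int_c^d \frac{1}{b - a}dx = \frac{d - c}{b - a}\) for \(a \leq c \leq d \leq b\). The sketch below applies this to the bus example under an assumption these notes do not make, namely that trip times are uniform on \([2, 5]\); `uniform_prob` is a hypothetical helper, and a NumPy simulation serves as a rough experimental cross-check.

```python
import numpy as np

def uniform_prob(c, d, a, b):
    """Hypothetical helper: P(c <= x <= d) for x uniform on [a, b]."""
    lo, hi = max(c, a), min(d, b)     # clip [c, d] to the interval [a, b]
    return max(hi - lo, 0) / (b - a)  # overlap length times the height 1/(b - a)

# Assumption for illustration only: trip time x is uniform on [2, 5] hours
exact = uniform_prob(4, 5, 2, 5)      # (5 - 4) / (5 - 2) = 1/3

# Rough experimental check: fraction of simulated trips between 4 and 5 hr
rng = np.random.default_rng(0)
trips = rng.uniform(2, 5, size=100_000)
estimate = np.mean((trips >= 4) & (trips <= 5))
print(exact, estimate)
```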

Exponential Distributions

A continuous random variable \(x\) is exponentially distributed if it has a probability density function of the form

\[f(x) = ke^{-kx}\] for some constant \(k > 0\), over the interval \([0, \infty)\).

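# NumPy's first argument is the scale 1/k, so this draws 10,000 values with k = 0.5
x = np.random.exponential(2, 10000)
bins = np.linspace(0, 2, 10)

# The histogram decreases across the bins, roughly following the density k*exp(-k*x)
_, _, _ = plt.hist(x, bins, alpha=0.5, histtype='step', ec='black')
plt.show()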
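Integrating the exponential density gives a closed form for interval probabilities: \(P(0 \leq x \leq c) = \int_0^c ke^{-kx}dx = 1 - e^{-kc}\), which approaches 1 as \(c \to \infty\), so the total area is 1. The sketch below compares this formula with the fraction of simulated values at most \(c = 2\). It assumes \(k = 0.5\), the rate matching the scale 2 used above (NumPy parameterizes the exponential distribution by the scale \(1/k\)); `exp_cdf` is a hypothetical helper.

```python
import numpy as np

k = 0.5          # rate in f(x) = k*exp(-k*x); chosen to match the scale 2 above
scale = 1 / k    # NumPy's exponential sampler expects the scale 1/k

def exp_cdf(c, k):
    """Hypothetical helper: P(0 <= x <= c) = 1 - exp(-k*c) for the density k*exp(-k*x)."""
    return 1 - np.exp(-k * c)

exact = exp_cdf(2, k)            # 1 - e^(-1)

# Rough experimental check: fraction of simulated values that are at most 2
rng = np.random.default_rng(0)
samples = rng.exponential(scale, size=100_000)
estimate = np.mean(samples <= 2)
print(exact, estimate)
```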