The average, RMS value, and standard deviation of a vector are related by the formula

\[\mathbf{rms}(x)^2 = \mathbf{avg}(x)^2 + \mathbf{std}(x)^2\]

\(\mathbf{rms}(x)^2\) is the mean square value of the entries of \(x\), which can be expressed as the square of the mean value, plus the mean square fluctuation of the entries of \(x\) around their mean value.
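The identity is easy to check numerically. The following is a minimal sketch rather than a proof: the example vector is arbitrary, and it relies on `np.std` computing the population standard deviation (its default, `ddof=0`). With the sample standard deviation (`ddof=1`), the two sides would not match exactly.

```python
import numpy as np

# Arbitrary example vector, chosen only for illustration
x = np.array([2, -1, 1, 3, 2, -2])

avg = np.mean(x)                 # average of the entries
std = np.std(x)                  # population standard deviation (ddof=0)
rms = np.sqrt(np.mean(x ** 2))   # root-mean-square value

lhs = rms ** 2                   # mean square value
rhs = avg ** 2 + std ** 2        # squared mean plus mean square fluctuation

# The two sides agree up to floating-point rounding
identity_holds = bool(np.isclose(lhs, rhs))
print(lhs, rhs, identity_holds)
```

Both sides equal the mean of the squared entries, here 23/6.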


Mean return and risk. Suppose that an \(n\)-vector represents a time series of returns on an investment, expressed as a percentage, in \(n\) time periods over some interval of time.

Its average gives the mean return over the whole interval, often shortened to its return. Its standard deviation is a measure of how variable the return is, from period to period, over the interval, i.e., how much it typically varies from its mean, and is often called the (per period) risk of the investment.

You can compare multiple investments by plotting them on a risk-return plot, which gives the mean and standard deviation of the returns of each of the investments over some interval. A desirable return history vector has high mean return and low risk, meaning that the returns in the different periods are consistently high.

import numpy as np
import matplotlib.pyplot as plt

a = np.array([2, -1, 1, 3, 2, -2])
b = np.array([3, 1, 4, -2, 4, 5])
c = np.array([-1, -3, 0, 2, -1, 0])
d = np.array([6, -3, 2, -8, -4, 4])

plt.plot(a, label="investment a")
plt.plot(b, label="investment b")
plt.plot(c, label="investment c")
plt.plot(d, label="investment d")
plt.legend(frameon=False, loc="best")

import pandas as pd

# Summarize each investment by its mean return and its risk (standard deviation).
# A new name is used so the return vectors a, b, c, d are not overwritten.
summary = [(np.mean(x), np.std(x)) for x in (a, b, c, d)]

data = pd.DataFrame(summary, columns=["return", "risk"])
data["investment"] = ["a", "b", "c", "d"]
print(data)
##      return      risk investment
## 0  0.833333  1.771691          a
## 1  2.500000  2.362908          b
## 2 -0.500000  1.500000          c
## 3 -0.500000  4.890467          d
data.plot.scatter(x="risk", y="return")

# Label each point with the name of its investment
for i, txt in enumerate(data["investment"]):
    plt.annotate(txt, (data["risk"][i], data["return"][i]))

I found the solution to labeling scatter plots in a seven-year-old answer on Stack Overflow. That this is how you do it in Pandas and Matplotlib is shocking! I think I’ll just stick to R in the future when I get a hint that plotting in Python will be a chore.