
Variance, covariance, standard deviation and standard error

The study of statistics and data is the study of variation. The general situation is that we are interested in estimating a parameter of a population based on a sample drawn from that population.

Here we introduce some of the key basic measures.

Variance and covariance

The variance of a set of values $x$ is the sum of the squared deviations of each value from the mean of all the values $\bar x$, divided by one less than the number of observations $n$. (Dividing by $n-1$ gives the 'sample variance'; dividing by $n$ instead gives the 'population variance'. When $n$ is large the difference is negligible.) In essence, the variance is the average squared deviation from the mean in the data:

\[\begin{equation} \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)(x_i - \bar x) \end{equation}\]
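To make the $n-1$ distinction concrete, here is a minimal sketch in plain Python (independent of the linalg library used later in this post; the data are the first column of the demo matrix at the end):

x = [1, 0, 4, 2]
n = len(x)
mean = sum(x) / n

# Sum of squared deviations from the mean
ss = sum((xi - mean) ** 2 for xi in x)

print(ss / (n - 1))  # sample variance: 2.9166...
print(ss / n)        # population variance: 2.1875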

In matrix notation:

\[\begin{equation} \sigma^2 = \frac{1}{n-1}(x - \bar x)^T (x - \bar x) \end{equation}\]

We can compare the variation between two different variables and see if they 'covary' or not. This is called the covariance and is computed just like the variance, only it's between two sets of observations $x$ and $y$. In essence this is the average 'agreement' in the deviations from each variable's mean:

\[\begin{equation} \text{covariance} = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y) \end{equation}\]

In matrix notation:

\[\begin{equation} \text{covariance} = \frac{1}{n-1}(x - \bar x)^T (y - \bar y) \end{equation}\]
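As a quick check that the two forms agree, here is a plain-Python sketch computing the covariance both by the summation formula and as a dot product of the centered vectors ($x$ and $y$ are the first two columns of the demo matrix at the end of this post):

x = [1, 0, 4, 2]
y = [2, -1, 0, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Summation form
cov_sum = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Matrix form: dot product of the centered vectors
x_c = [xi - x_bar for xi in x]
y_c = [yi - y_bar for yi in y]
cov_dot = sum(a * b for a, b in zip(x_c, y_c)) / (n - 1)

print(cov_sum, cov_dot)  # both 0.5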

If we have observations of several variables, with each variable's observations forming a column of a matrix $A$, then $A^T A$ computes all of the dot products between pairs of columns. So if we use the centering matrix $C$ to remove the mean of each column, giving

\[\begin{equation} (CA)^T C A = A^T C^T C A = A^T C A \end{equation}\]

(since $C$ is symmetric and idempotent), and divide by one less than the number of observations, we get the 'covariance matrix':

\[\begin{equation} \text{covariance matrix} = \frac{1}{n-1} A^T C A \end{equation}\]

For example, with $n = 4$ observations of three variables (so dividing by $n - 1 = 3$):

\[\begin{equation} \frac{1}{3} \begin{bmatrix} 1 & 0 & 4 & 2 \\ 2 & -1 & 0 & 5 \\ 3 & 6 & 2 & 1 \end{bmatrix} \frac{1}{4} \begin{bmatrix} 3 & -1 & -1 & -1 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & 6 \\ 4 & 0 & 2 \\ 2 & 5 & 1 \end{bmatrix} \end{equation}\]

\[\begin{equation} = \frac{1}{3} \begin{bmatrix} 8.75 & 1.5 & -8 \\ 1.5 & 21 & -13 \\ -8 & -13 & 14 \end{bmatrix} \approx \begin{bmatrix} 2.92 & 0.5 & -2.67 \\ 0.5 & 7 & -4.33 \\ -2.67 & -4.33 & 4.67 \end{bmatrix} \end{equation}\]

Note the variance $\sigma^2$ of each column sitting along the diagonal.
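If NumPy is available, the $A^T C A$ computation can be verified directly; np.cov with rowvar=False treats columns as variables and uses the same $n-1$ normalisation by default:

import numpy as np

A = np.array([[1, 2, 3],
              [0, -1, 6],
              [4, 0, 2],
              [2, 5, 1]], dtype=float)
n = A.shape[0]

# Centering matrix: C = I - (1/n) * (matrix of ones)
C = np.eye(n) - np.ones((n, n)) / n

print(A.T @ C @ A / (n - 1))    # matches the worked example above
print(np.cov(A, rowvar=False))  # same result from NumPy directly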

Code implementation

We define a method that by default computes the covariance among the columns of an input matrix $A$, yielding the sample covariance (i.e. using the $n-1$ correction).

First we transpose the matrix if the user wants to treat rows as variables. Then we zero center the matrix, perform the $A^T A$ step, and normalise by the number of observations minus 1 (or by $n$ itself if sample is False):

def covar(A, axis=0, sample=True):
    # Work with variables as columns; transpose if they are rows
    if axis == 1:
        A = A.tr()
    # Subtract each column's mean (zero_center is from the previous post)
    A_zc = zero_center(A)
    # A^T A on the centered matrix gives the sums of squared deviations
    # and cross products
    A_cov = A_zc.tr().multiply(A_zc)
    # Normalise by n - 1 (sample) or n (population)
    N = la.size(A)[0]
    if sample:
        N -= 1
    A_cov = A_cov.div_elwise(N)
    return A_cov

To get just the variances of the columns of a matrix, rather than the full covariance matrix, we extract the values along the diagonal of the covariance matrix. If the variances were computed across the rows of the matrix, we transpose the resulting vector from a row to a column:

def var(A, axis=0, sample=True):
    # The variances sit on the diagonal of the covariance matrix
    A_covar = covar(A, axis, sample)
    A_var = la.Mat([A_covar.diag()])
    # Return a column vector if variables were rows
    if axis == 1:
        return A_var.tr()
    return A_var

Standard deviation and standard error

While the variance of, and covariance between, a set of variables are useful, they are squared quantities and as such are not on the same scale as the measured data. A more interpretable measure when reasoning about the data is therefore the square root of the variance. This is called the standard deviation:

\[\begin{equation} \sigma = \sqrt{\sigma^2} \end{equation}\]

If we divide the standard deviation by the square root of the number of observations, we have the standard error of the mean. This indicates the uncertainty around the sample mean as an estimate of the population mean (since the sample mean will not match the population mean exactly). The $\sqrt{n}$ term is like a 'penalty' incurred for making the 'leap' of inference to the population, and the penalty shrinks as the sample size increases:

\[\begin{equation} SE = \frac{\sigma}{\sqrt{n}} \end{equation}\]
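A toy illustration of this shrinking penalty in plain Python (sigma is held fixed at 2.16, the standard deviation of the third column in the demo below):

from math import sqrt

sigma = 2.16

# The spread of the data (sigma) stays the same, but the uncertainty
# around the sample mean shrinks as n grows
for n in [4, 16, 64, 256]:
    print(n, sigma / sqrt(n))  # 1.08, 0.54, 0.27, 0.135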

Code implementation

It is simple to convert from the variance to the standard deviation and from the standard deviation to the standard error:

from math import sqrt

def sd(A, axis=0, sample=True):
    # Standard deviation: element-wise square root of the variances
    A_var = var(A, axis, sample)
    stddevs = A_var.function_elwise(sqrt)
    return stddevs

def se(A, axis=0, sample=True):
    # Standard error: standard deviation divided by the square root
    # of the number of observations
    stddevs = sd(A, axis, sample)
    N = la.size(A)[axis]
    stderrs = stddevs.div_elwise(sqrt(N))
    return stderrs
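As a sanity check, assuming nothing beyond the Python standard library, the statistics module reproduces the same column-wise numbers (shown here for the third column of the demo matrix below):

from math import sqrt
import statistics

col = [3, 6, 2, 1]

print(statistics.variance(col))                # sample variance: 4.67
print(statistics.pvariance(col))               # population variance: 3.5
print(statistics.stdev(col))                   # standard deviation: 2.16
print(statistics.stdev(col) / sqrt(len(col)))  # standard error: 1.08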

Demo

We create a matrix, call the methods, and print the results (notice that when we compute the population covariance by setting the sample argument to False, the values are a bit smaller):

import linalg as la

A = la.Mat([[1, 2, 3],
            [0, -1, 6],
            [4, 0, 2],
            [2, 5, 1]])

la.print_mat(la.stats.covar(A), 2)

la.print_mat(la.stats.covar(A, sample=False), 2)

la.print_mat(la.stats.sd(A), 2)

la.print_mat(la.stats.se(A), 2)

Outputs:

>>> la.print_mat(la.stats.covar(A), 2)
[2.92, 0.5, -2.67]
[0.5, 7.0, -4.33]
[-2.67, -4.33, 4.67]

>>> la.print_mat(la.stats.covar(A, sample=False), 2)
[2.19, 0.38, -2.0]
[0.38, 5.25, -3.25]
[-2.0, -3.25, 3.5]

>>> la.print_mat(la.stats.sd(A), 2)
[1.71, 2.65, 2.16]

>>> la.print_mat(la.stats.se(A), 2)
[0.85, 1.32, 1.08]
