
Pearson’s correlation

Pearson's correlation coefficient is closely related to the covariance: it is the covariance normalised to fall between -1 and 1 by dividing by the product of the standard deviations of the two variables being correlated. Thus, correlation tells us how two variables tend to covary, but in 'standard', rather than absolute, units. This is similar to how a z-score abstracts away from the absolute scale of the raw scores.
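
For two variables $X$ and $Y$, this gives the familiar formula:

\[\begin{equation} \rho_{X,Y} = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y} \end{equation}\]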

In order to divide each element of the covariance matrix by the product of the standard deviations of the corresponding two variables, we can multiply the covariance matrix on the left and on the right by a diagonal matrix containing the reciprocals of the standard deviations:

\[\begin{equation} \begin{bmatrix} \frac{1}{\sigma_1} & 0 & 0 \\ 0 & \frac{1}{\sigma_2} & 0 \\ 0 & 0 & \frac{1}{\sigma_3} \end{bmatrix} % \begin{bmatrix} v_{1,1} & v_{1,2} & v_{1,3} \\ v_{2,1} & v_{2,2} & v_{2,3} \\ v_{3,1} & v_{3,2} & v_{3,3} \end{bmatrix} % \begin{bmatrix} \frac{1}{\sigma_1} & 0 & 0 \\ 0 & \frac{1}{\sigma_2} & 0 \\ 0 & 0 & \frac{1}{\sigma_3} \end{bmatrix} =% \begin{bmatrix} \rho_{1,1} & \rho_{1,2} & \rho_{1,3} \\ \rho_{2,1} & \rho_{2,2} & \rho_{2,3} \\ \rho_{3,1} & \rho_{3,2} & \rho_{3,3} \end{bmatrix} \end{equation}\]
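
Multiplying on the left by this diagonal matrix scales row $i$ of the covariance matrix by $\frac{1}{\sigma_i}$, and multiplying on the right scales column $j$ by $\frac{1}{\sigma_j}$, so each element of the result is:

\[\begin{equation} \rho_{i,j} = \frac{v_{i,j}}{\sigma_i \sigma_j} \end{equation}\]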

For instance, with the matrix $A$ containing observations of 3 variables, we can see by eye that the first column increases row by row, and so does the second column: these two variables are positively correlated. The third column decreases as we descend the rows, although not as linearly, so it is negatively correlated with the other two, though slightly less strongly:

\[\begin{equation} A =% \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 3 & 0 \\ 4 & 5 & 1 \\ 5 & 6 & -1 \end{bmatrix} \end{equation}\]

Doing the computations, we see that this is indeed the case:

\[\begin{equation} \begin{bmatrix} 0.63 & 0 & 0 \\ 0 & 0.61 & 0 \\ 0 & 0 & 0.67 \end{bmatrix} % \begin{bmatrix} 2.5 & 2.5 & -2.0 \\ 2.5 & 2.7 & -1.8 \\ -2.0 & -1.8 & 2.2 \end{bmatrix} % \begin{bmatrix} 0.63 & 0 & 0 \\ 0 & 0.61 & 0 \\ 0 & 0 & 0.67 \end{bmatrix} =% \begin{bmatrix} 1.0 & 0.96 & -0.85 \\ 0.96 & 1.0 & -0.74 \\ -0.85 & -0.74 & 1.0 \end{bmatrix} \end{equation}\]
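
As a quick cross-check outside the linalg project, the same sandwich product can be sketched with NumPy (np.cov uses the sample covariance by default, matching the values above):

import numpy as np

A = np.array([[1, 2, 3],
              [2, 3, 1],
              [3, 3, 0],
              [4, 5, 1],
              [5, 6, -1]])

V = np.cov(A, rowvar=False)           # covariance matrix; columns are the variables
D = np.diag(1 / np.sqrt(np.diag(V)))  # diagonal matrix of reciprocal standard deviations
print(D @ V @ D)                      # agrees with np.corrcoef(A, rowvar=False)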

In matrix notation, we have:

\[\begin{equation} K^{-\frac{1}{2}} V K^{-\frac{1}{2}} = \text{correlation matrix} \end{equation}\]

where $V$ is the covariance matrix, and $K$ is a diagonal matrix containing the variance of each variable (so $K^{-\frac{1}{2}}$ takes the reciprocal of the square root of each element on the diagonal, yielding the reciprocals of the standard deviations).
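
Written out for the three-variable example, these diagonal matrices are:

\[\begin{equation} K = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix} \qquad K^{-\frac{1}{2}} = \begin{bmatrix} \frac{1}{\sigma_1} & 0 & 0 \\ 0 & \frac{1}{\sigma_2} & 0 \\ 0 & 0 & \frac{1}{\sigma_3} \end{bmatrix} \end{equation}\]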

Code implementation

We compute the covariance matrix, take the variances from its diagonal, take the reciprocal of each square root, and store these values on the diagonal of a matrix corresponding to $K^{-\frac{1}{2}}$ (K_sqrt in the code). Then we do the two multiplications before returning the correlation matrix:

from math import sqrt

def corr(A, axis=0):
    V = covar(A, axis)                   # covariance matrix of A
    # reciprocals of the standard deviations, 1/sigma, from the diagonal variances
    sds = [1/sqrt(x) for x in V.diag()]
    # diagonal matrix K^(-1/2) holding those reciprocals
    K_sqrt = la.gen_mat([len(sds)]*2, values=sds, family='diag')
    # K^(-1/2) V K^(-1/2) gives the correlation matrix
    correlations = K_sqrt.multiply(V).multiply(K_sqrt)
    return correlations

Demo

We create a matrix, call the corr function from the stats module and print the result:

import linalg as la

A = la.Mat([[1, 2, 3],
            [2, 3, 1],
            [3, 3, 0],
            [4, 5, 1],
            [5, 6, -1]])

la.print_mat(la.stats.corr(A), 2)

Outputs:

>>> la.print_mat(la.stats.corr(A), 2)
[1.0, 0.96, -0.85]
[0.96, 1.0, -0.74]
[-0.85, -0.74, 1.0]
