I am a data scientist working on time series forecasting (using R and Python 3) at the London Ambulance Service NHS Trust. I earned my PhD in cognitive neuroscience at the University of Glasgow, working with fMRI data and neural networks. I favour Linux machines and working in the terminal, with Vim as my editor of choice.
It is often useful to normalise data. One type of normalisation is to 'zero center' the data by subtracting its mean, which yields data with a new mean of zero. Often we want to do this separately for each column or row of a matrix. It is also common to want to perform this operation on a square matrix. In that case, we can multiply by a 'centering matrix':
Here the matrix $C$ has the effect of subtracting $1/3$ from the first column, subtracting $2/3$ from the second column, and adding $1$ to the last column, ensuring that all columns sum to zero.
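For reference, the general form of the centering matrix and a small worked case can be written out as follows. This is a sketch: the matrix $X$ is a hypothetical example chosen so that its column means are $1/3$, $2/3$ and $-1$, matching the description above, and is not necessarily the matrix used in the original example.

```latex
\begin{align*}
C_n &= I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top},
\qquad
C_3 = \frac{1}{3}
\begin{pmatrix}
 2 & -1 & -1 \\
-1 &  2 & -1 \\
-1 & -1 &  2
\end{pmatrix} \\[6pt]
X &=
\begin{pmatrix}
1 & 1 & 1 \\
0 & 1 & 0 \\
0 & 0 & -4
\end{pmatrix},
\qquad
C_3 X =
\begin{pmatrix}
 2/3 &  1/3 &  2 \\
-1/3 &  1/3 &  1 \\
-1/3 & -2/3 & -3
\end{pmatrix}
\end{align*}
```

Each column of $C_3 X$ sums to zero, because left-multiplying by $C_n$ subtracts the column mean from every entry of that column.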
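We create a function to build the centering matrix:

def gen_centering(size):
    # Accept a single int n as shorthand for an n x n matrix.
    if isinstance(size, int):
        size = [size, size]
    # C = I - 1/n: subtract 1/n from every entry of the identity matrix.
    return la.eye(size).subtract(1/size[0])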
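To see the centering matrix at work without the `la` module, here is a minimal standalone sketch using plain nested lists and exact fractions. The helper names `centering_matrix` and `matmul` are hypothetical and are not part of the library described here.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return the n x n centering matrix I - 1/n as nested lists."""
    return [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


B = [[1, 1, 1],
     [0, 2, 0],
     [0, 3, -4]]

# Left-multiplying by the centering matrix subtracts each column's mean.
centered = matmul(centering_matrix(3), B)

# Every column of the centered matrix should now sum to zero.
column_sums = [sum(col) for col in zip(*centered)]
```

Using `Fraction` keeps the arithmetic exact, so the column sums come out as exactly zero rather than as small floating point residues.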
We then define a function that can zero center each column (axis=0), each row (axis=1), or the matrix as a whole (axis=2). Rows are handled by transposing the matrix, centering its columns, and transposing back. If the matrix is square, we use the centering matrix approach above. Otherwise, we take the mean along the relevant axis, build a matrix of the same size filled with those means, and subtract it from the original matrix:
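def zero_center(A, axis=0):
    if axis == 2:
        # Center the whole matrix on its global mean.
        global_mean = mean(mean(A)).data[0][0]
        return A.subtract(global_mean)
    elif axis == 1:
        # Center rows by working on the columns of the transpose.
        A = A.tr()
    if A.is_square():
        A = gen_centering(la.size(A)).multiply(A)
    else:
        A_mean = mean(A)
        # A column of ones times the row of means repeats the means down the rows.
        ones = la.gen_mat([la.size(A)[0], 1], values=[1])
        A_mean_mat = ones.multiply(A_mean)
        A = A.subtract(A_mean_mat)
    if axis == 1:
        # Undo the earlier transpose.
        A = A.tr()
    return A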
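The mean-subtraction branch can also be sketched in plain Python, independent of the `la` module. This is an illustrative version under the same axis convention (0 for columns, 1 for rows, 2 for the whole matrix). The name `zero_center_lists` is hypothetical, and it subtracts the means directly rather than building a matrix of means.

```python
def zero_center_lists(A, axis=0):
    """Zero center a nested-list matrix.

    axis=0 centers each column, axis=1 each row, axis=2 the whole matrix.
    """
    n_rows, n_cols = len(A), len(A[0])
    if axis == 2:
        global_mean = sum(sum(row) for row in A) / (n_rows * n_cols)
        return [[x - global_mean for x in row] for row in A]
    if axis == 1:
        centered = []
        for row in A:
            row_mean = sum(row) / n_cols
            centered.append([x - row_mean for x in row])
        return centered
    # Default: subtract each column's mean from the entries in that column.
    col_means = [sum(col) / n_rows for col in zip(*A)]
    return [[x - m for x, m in zip(row, col_means)] for row in A]


A = [[1, 2, 3],
     [-2, 1, 4],
     [0, 1, 2],
     [3, 6, 1]]

by_column = zero_center_lists(A)
by_row = zero_center_lists(A, axis=1)
overall = zero_center_lists(A, axis=2)
```

Because this version works with floats, sums of the centered values may differ from zero by a tiny rounding error, so any check should use a tolerance.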
We create two matrices, zero center the first along each axis and the second along its columns, and print the results (in two cases we print only 2 decimal places to make the output look pretty):
import linalg as la

A = la.Mat([[1, 2, 3],
            [-2, 1, 4],
            [0, 1, 2],
            [3, 6, 1]])
B = la.Mat([[1, 1, 1],
            [0, 2, 0],
            [0, 3, -4]])

# Zero center each column of A.
result = la.stats.zero_center(A)
la.print_mat(result)
# Zero center each row of A.
result = la.stats.zero_center(A, axis=1)
la.print_mat(result, 2)
# Zero center each column of the square matrix B.
result = la.stats.zero_center(B)
la.print_mat(result, 2)
Outputs:
>>> la.print_mat(result)
[0.5, -0.5, 0.5]
[-2.5, -1.5, 1.5]
[-0.5, -1.5, -0.5]
[2.5, 3.5, -1.5]
>>> la.print_mat(result, 2)
[-1.0, 0.0, 1.0]
[-3.0, 0.0, 3.0]
[-1.0, 0.0, 1.0]
[-0.33, 2.67, -2.33]
>>> la.print_mat(result, 2)
[0.67, -1.0, 2.0]
[-0.33, 0.0, 1.0]
[-0.33, 1.0, -3.0]