
Linear Algebra for Data Science | Revisiting High School

6 min read · Apr 23, 2018

A good understanding of Linear Algebra is intrinsic to analyzing Machine Learning algorithms, especially in Deep Learning, where so much happens behind the curtain. I have often noticed that confronting Mathematics and its associated formulas intimidates many aspirants, and I am no different. But this aspect of learning is unavoidable for any Data Science aspirant, so I tried to make it simple for myself. With the intention of helping fellow novices, I shall cover the rudimentary knowledge required for understanding Deep Learning. And you have my word that I will try to keep mathematical formulas and derivations out of this completely mathematical topic (so it might look lame to experts). Let us now brush up those high school topics like a primary school student.

Scalars, Vectors, Matrices & Tensors:

A Scalar is just a single number, written as a lowercase italic variable name. A Vector is an array of properly indexed numbers, written in bold italic typeface, with its elements presented as a column enclosed in square brackets; each element gives a coordinate along a separate axis. A Matrix is a 2-D array of numbers, written as a bold, italic, uppercase variable name, where each element is referred to by two indices; the array is enclosed in square brackets to explicitly identify its elements. A Tensor is just a matrix-like array with more than two axes, written as a bold typeface variable name. Also important to mention is the Transpose, which is the mirror image of a matrix along the main diagonal (from the top-left corner to the bottom-right). Basically, a Vector is a Matrix with just one column, and the transpose of a Vector is a matrix with one row.
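For readers who like to see things concretely, here is a tiny NumPy sketch of these objects (NumPy is my choice for illustration here, not something prescribed by the topic itself):

import numpy as np

s = 3.5                                  # scalar: a single number
v = np.array([1, 2, 3])                  # vector: 1-D array, one coordinate per axis
M = np.array([[1, 2], [3, 4], [5, 6]])   # matrix: 2-D array, indexed by (row, column)
T = np.zeros((2, 3, 4))                  # tensor: an array with more than two axes

print(M.shape)    # (3, 2)
print(M.T.shape)  # (2, 3) -- the transpose mirrors the matrix along its main diagonal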

Multiplying Matrices and Vectors:

Matrix multiplication is distributive and associative, but unlike scalar multiplication it is not commutative; the dot product between two vectors, however, is commutative. The Matrix product of matrices ‘A’ and ‘B’ is a resultant matrix ‘C’, where ‘A’ must have the same number of columns as ‘B’ has rows. A Hadamard (element-wise) product is just a matrix containing the products of the individual elements of ‘A’ and ‘B’. And the Dot product between two vectors x and y of the same dimensionality is the matrix product of the transpose of x and y. Since the dot product of two vectors is a scalar, it is equal to its own transpose, which is exactly why it is commutative.
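To make the distinction between the three products concrete, here is a small illustrative sketch (again just NumPy, with made-up numbers):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B              # matrix product: columns of A must match rows of B
H = A * B              # Hadamard (element-wise) product

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
d = x @ y              # dot product (transpose of x times y), a single scalar
print(d == y @ x)      # True -- the dot product is commutative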

Identity and Inverse Matrices:

An Identity matrix, represented as ‘I’, has all the main diagonal entries as 1 with the rest of the entries as 0; multiplying any vector by it leaves the vector unchanged. An Inverse Matrix helps to solve the equation Ax = b, where b is a known vector and x is a vector of unknown variables, and it lets us solve it many times over for different values of b. In practice, though, using the inverse explicitly isn’t advisable, because its representation on digital computers has limited precision. Algorithms that make direct use of the value of b yield more accurate estimates of x.
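A quick sketch of that precision point, assuming NumPy just for illustration: solving Ax = b directly is preferred over forming the inverse and multiplying.

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

I = np.eye(2)                        # identity matrix: I @ b gives back b
x_inv = np.linalg.inv(A) @ b         # works, but explicit inversion wastes precision
x_solve = np.linalg.solve(A, b)      # preferred: solves Ax = b directly from A and b

print(np.allclose(A @ x_solve, b))   # True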

Vector Norm:

A Norm is a function used to measure the size of Vectors (or even Matrices), by mapping vectors to non-negative values. The general Lᵖ norm is defined as ‖x‖ₚ = (Σᵢ |xᵢ|ᵖ)^(1/p), where p is a real number greater than or equal to 1. We won’t get into the details of the requirements it satisfies, such as the triangle inequality. The simple Euclidean Norm (p = 2) is the Euclidean distance from the origin to the point identified by x. Sometimes we measure the size of a vector by counting its non-zero elements, but that isn’t a norm, because scaling the vector doesn’t change the count of non-zero entries. Another common norm is the Max Norm, which simplifies to the absolute value of the element with the largest magnitude in the vector.

Now, if we wish to measure the size of a Matrix, we may use the Frobenius Norm, which is analogous in concept to the Euclidean Norm applied to all the entries of the matrix.
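All of these can be computed in one line each; here is an illustrative NumPy sketch with arbitrary example values:

import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.linalg.norm(x, 1)         # L1 norm: sum of absolute values -> 7.0
l2 = np.linalg.norm(x, 2)         # Euclidean (L2) norm -> 5.0
linf = np.linalg.norm(x, np.inf)  # max norm: largest absolute value -> 4.0
nnz = np.count_nonzero(x)         # count of non-zero entries (not a true norm)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')    # Frobenius norm, the matrix analogue of the L2 norm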

Special Matrices & Vectors:

> Diagonal Matrices consist mostly of zeros and have non-zero entries only along the main diagonal, like the Identity matrix. Not all diagonal matrices are square, and the non-square ones don’t have inverses.
> Symmetric Matrices often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments.
> A Unit Vector is a vector with unit norm.
> Orthonormal Vectors are orthogonal to each other and each has unit norm.
> An Orthogonal matrix is a square matrix whose rows and columns are mutually orthonormal.
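As a small sanity check (a sketch using a rotation matrix, which happens to be orthogonal), the defining property of an orthogonal matrix is easy to verify in code:

import numpy as np

D = np.diag([1.0, 2.0, 3.0])                      # diagonal matrix built from its main diagonal
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a 2-D rotation matrix is orthogonal

print(np.allclose(Q.T @ Q, np.eye(2)))            # True: rows/columns are mutually orthonormal
print(np.allclose(Q.T, np.linalg.inv(Q)))         # so its transpose is also its inverse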

Eigendecomposition:

This refers to the decomposition of a matrix into a set of eigenvectors and eigenvalues, used to analyze various properties of that matrix. An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v: Av = λv.

The scalar λ is known as the eigenvalue corresponding to this eigenvector. Note that every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues. A matrix whose eigenvalues are all positive is called positive definite. A matrix whose eigenvalues are all positive or zero-valued is called positive semi-definite. Likewise, if all eigenvalues are negative, the matrix is negative definite, and if all eigenvalues are negative or zero-valued, it is negative semi-definite.
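Here is a short illustrative sketch of the decomposition for a real symmetric matrix, checking Av = λv and positive definiteness (the matrix values are just an example):

import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # a real symmetric matrix

eigvals, eigvecs = np.linalg.eigh(A)     # eigh handles symmetric/Hermitian matrices
v = eigvecs[:, 0]                        # one eigenvector (a column of eigvecs)
lam = eigvals[0]                         # its corresponding eigenvalue

print(np.allclose(A @ v, lam * v))       # True: A v = lambda v
print(np.all(eigvals > 0))               # all eigenvalues positive -> A is positive definite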

[Image: Eigenvalues and Covariance]

SVD (Singular Value Decomposition):

It is another way to factorize a matrix, into singular vectors and singular values. SVD lets us discover the same kind of information that the Eigendecomposition reveals, but SVD is more generally applicable: every real matrix has an SVD, but not every real matrix has an Eigendecomposition. For example, if a matrix is not square, the Eigendecomposition is not defined and we must use the SVD instead. The most useful feature of the SVD is that we can use it to partially generalize matrix inversion to non-square matrices.
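A minimal sketch of the factorization on a non-square matrix (random values, purely for illustration):

import numpy as np

A = np.random.rand(4, 3)             # non-square, so the eigendecomposition is not defined

U, s, Vt = np.linalg.svd(A)          # U: left singular vectors, s: singular values,
                                     # Vt: right singular vectors (transposed)
S = np.zeros((4, 3))
S[:3, :3] = np.diag(s)               # place the singular values on the diagonal
print(np.allclose(U @ S @ Vt, A))    # True: A = U S V^T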

Moore-Penrose Pseudoinverse:

Matrix inversion is not defined for matrices that are not square, and this is where the Moore-Penrose Pseudoinverse enables us to make some headway.

The pseudoinverse is defined as A⁺ = V D⁺ Uᵀ, where U, D and V come from the singular value decomposition of A, and the pseudoinverse D⁺ of the diagonal matrix D is obtained by taking the reciprocal of its non-zero elements and then taking the transpose of the resulting matrix.
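In practice we rarely build this by hand; as a sketch, NumPy computes it from the SVD directly (random example data):

import numpy as np

A = np.random.rand(4, 3)               # a tall, non-square matrix
b = np.random.rand(4)

A_pinv = np.linalg.pinv(A)             # Moore-Penrose pseudoinverse, computed via the SVD
x = A_pinv @ b                         # least-squares solution of Ax = b

print(np.allclose(A @ A_pinv @ A, A))  # one of the defining properties of the pseudoinverse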

Trace Operator:

It gives the sum of all diagonal entries of a matrix. Some operations that are difficult to specify without resorting to summation notation can be specified using matrix products and the trace operator.

Writing an expression in terms of the trace operator enables us to manipulate the expression using various useful identities. Note that a scalar is its own trace: a = Tr(a).
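One such identity, shown here as a small sketch, is that the Frobenius norm can be written with the trace: ‖A‖F = √Tr(AAᵀ).

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

tr = np.trace(A)                                  # sum of diagonal entries: 1 + 4 = 5
fro = np.sqrt(np.trace(A @ A.T))                  # Frobenius norm expressed via the trace
print(np.isclose(fro, np.linalg.norm(A, 'fro')))  # True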

The Determinant:

The determinant of a square matrix, denoted det(A), is a function that maps matrices to real scalars. The determinant is equal to the product of all the eigenvalues of a matrix. The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space. If the determinant is 0, then space is contracted completely along at least one dimension, causing it to lose all its volume. If the determinant is 1, then the transformation preserves volume.
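A quick illustrative check of the eigenvalue-product property (example values only):

import numpy as np

A = np.array([[2.0, 0.0], [0.0, 3.0]])

det = np.linalg.det(A)                    # 6.0: how much A scales area/volume
eigvals = np.linalg.eigvals(A)
print(np.isclose(det, np.prod(eigvals)))  # True: determinant = product of eigenvalues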

I shall now conclude this recap of the Linear Algebra fundamentals required to advance in Data Science & Machine Learning. This certainly isn’t everything we need to know, but it can definitely act as a stepping stone. Unless we are in a hurry to become a Data Science unicorn straight away, this should be enough to begin with. Kindly keep me posted if you find any discrepancy in this post. Good luck, and below are a few other resources that you may want to check out for other relevant topics:

> Probability Distribution | Statistics for Deep Learning

> Delta Rule and Gradient Descent | Neural Networks

> Elementary Statistical Terms for Data Science Interviews

Written by Random Nerd

Lead Data Scientist - BFSI | Securities
