Calculating Principal Component Analysis (PCA): A Step-by-Step Guide

By Mbali Kalirane on 18 Apr 2024

Large datasets, like those with over 100 features can be a large problem particularly in model training. These can be computationally taxing and can lead to model overfitting. Thus, you won’t want to overload your machine learning model with too much information resulting from too many features. Ideally, you’ll want a dataset with a reduced number of features, while still having meaningful information as the original data. One way to do this is to use principal component analysis (PCA).

What is Principal Component Analysis?
How to Find the Principal Components?
Example of Finding the Principal Components
Conclusion
Sources

What is Principal Component Analysis (PCA)?

PCA is a method of dimensionality reduction, which takes all the original features in a dataset and combines them to form a reduced number of new features. These combined new features are referred to as principal components. PCA is suitable for this, as it’s designed to reduce dimensionality in data, while retaining as much variance in the data as possible. Hence the principal components are able to retain as much valuable and relevant information from the original data.

How to Find the Principal Components?

Finding the principal components involves finding two things: the standardized matrix and the eigenvector matrix.

Calculating the standardized matrix involves:

Calculating the mean of each column in the matrix
Calculating the standard deviation of each column in the matrix
Using the z-score formula to transform the original matrix to a standardized matrix

Finding these eigenvectors, involves going through a series of steps. This includes

Calculating the covariance matrix from the standardized matrix
Calculating the determinant of the covariance matrix and finding the characteristic equation
Finding the eigenvalues by solving the characteristic equation
Finding the eigenvectors for each eigenvalue

Let’s take a look at how these are defined.

1. What is the Covariance Matrix?

The covariance matrix is based on the standardized dataset and summarizes the covariances between different pairs of features in the dataset. It analyses how much two variables change together. A positive covariance indicates that as one variable increases, the other variable tends to increase as well. A negative covariance indicates an inverse relationship between variables.

2. What is the Determinant of Matrix?

The determinant is a scalar value that is calculated from your covariance matrix. It’s like a gauge of how much variance exists in your dataset when you apply a transformation such as PCA to your data. A large determinant means that there’s a lot of variance in the transformed data, indicating that the original dataset spreads out widely. A small determinant small, suggests that there’s less variance in the transformed data, meaning the original dataset is more tightly clustered together. A determinant of zero indicates that the data is linearly dependent and has no volume in the transformed space, which would cause issues in PCA.

3. Eigenvalues

Eigenvalues represent the amount of variance captured by each eigenvector of the covariance matrix. Larger eigenvalues correspond to more significant variance explained by the associated eigenvectors.

4. Eigenvectors

Eigenvectors are non-zero vectors that, when multiplied by a square matrix, result in a scaled version of the original vector. The scaling factor is the eigenvalue.

Example of Finding Principle Components (using a 3X3 Dataset)

Suppose we have the following dataset with 3 features and three instances:

\[\begin{array}{|c|c|c|c} \hline \text{} & \text{Feature 1} & \text{Feature 2} & \text{Feature 3} \\ \hline \text{sample 1} & 2 & -1 & 0 \\ \hline \text{sample 2} & -1 & 2 & -1 \\ \hline \text{sample 3} & 0 & -1 & 2 \\ \hline \end{array}\]

The first thing we’ll want to do is standardize the data.

1. Standardizing the Data

Standardization is the process of scaling the features of the original dataset to have a mean of 0 and a standard deviation of 1. This is important because PCA is sensitive to the relative scales of the original features. Standardizing ensures that each feature contributes equally to the calculation of principal components, preventing features with larger scales from dominating the analysis.

The standardization will be applied to the original dataset.

To standardize the dataset, we apply the z-score formula:

$\ Z = \frac{x_{ij} - \mu_i}{\sigma_i} $

where,

$x_{ij}$ is the value of an observation in the i-th feature and j-th sample.
$Z$ is the standardized values of $x_{ij}$.
$mu_i$ is the mean of feature i
$sigma_i$ is the standard deviation of feature {i}

But to use the z-score formula, we need to find the mean $mu_i$ and the standard deviation $sigma_i$ of each feature.

To calculate the mean, we use this formula:

$\bar{x} = \frac{\sum{x_ij}}{n} $ where,

$n$ is the number of samples or instances in the dataset
$x_{ij}$ is the value of an observation in the i-th feature and j-th sample.

To calculate the standard deviation, we use this formula below. I’ve chosen to use the formula for the sample standard deviation, which has $n-1$:

$\sigma = \sqrt{\frac{\sum{(x_ij-\bar{x})^2}}{n-1}} $

$n$ is the number of samples or instances in the dataset
$x_{ij}$ is the value of an observation in the i-th feature and j-th sample.
$\bar{x}$ is the sample mean

We’ll need to calculate the mean and standard deviation for each feature in the dataset.

Using the original dataset, we calculate the mean and standard deviation:

\[\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1\\ 0 & -1 & 2 \\ \end{pmatrix}\]

For feature 1 (the first column), we calculate the mean as follows:

$\bar{x} = \frac{2 + (-1) + 0}{3} = 1/3 $

And we calculate the standard deviation for feature 1 as follows:

$\sigma = \sqrt{\frac{(2-\frac{1}{3})^2+(0-\frac{1}{3})^2+(-1-\frac{1}{3})^2}{2}} = 1.527 $

Thus, for feature 1, the standard deviation = $1.527$ and the mean =$1/3$

The calculated standard deviations and means for all features in the dataset are as follows:

\[\begin{array}{|c|c|c|} \hline \text{} & \text{Feature 1} & \text{Feature 2} & \text{Feature 3} \\ \hline \text{mean} & 1/3 & 0 & 1/3 \\ \hline \text{standard deviation} & (1.527) & 1.732 & (1.527) \\ \hline \end{array}\]

Thus, we standardize each element in the original matrix by using the mean and standard deviation from the respective column in the original matrix.

For the first element $x_{11}$ (row 1 and column 1), using the mean and standard deviations for column 1, we have:

$\ x_{11}^* = \frac{2 - \frac{1}{3}}{1.527} = 1.091$

Thus, doing this for all other elements in the dataset, we have the standardized matrix:

\[\begin{pmatrix} 1.091 & -0.577 & -0.218 \\ -0.873 & 1.155 & -0.873\\ -0.218 & -0.577 & 1.091 \\ \end{pmatrix}\]

Verifying standardization by calculating the mean and standard deviation for each feature (column) in the standardized matrix:

\[\begin{array}{|c|c|c|} \hline \text{} & \text{Feature 1} & \text{Feature 2} & \text{Feature 3} \\ \hline \text{mean} & 0 & 0 & 0 \\ \hline \text{standard deviation} & 1 & 1 & 1 \\ \hline \end{array}\]

As we can see, the means are 0 and the standard deviations are 1. Thus, the data has indeed been standardized.

2. Finding the Covariance Matrix

To find the covariance matrix, we calculate the pairwise covariance between each feature in the standardized matrix. This will be the covariance between features:

feature A and A (AA)
feature B and B (BB)
feature C and C (CC)
feature A and B (AB)
feature A and C (AC)
feature B and C (BC)

The covariance is calculated using the covariance formula. I’ve chosen to use the sample covariance formula as shown below:

$Cov(X,Y) = \frac{\sum{(x_i-\bar{x})(y_i-\bar{y})}}{n-1}$

where $x_i$ and $y_i$ are the values of the features $X$ and $Y$ being compared.
$\bar{x}$ and $\bar{y}$ are the means of the features $X$ and $Y$.
$n$ is the number of samples

For features AA, we have the Covariance,

$Cov(A, A) = \frac{(1.091-0)(1.091-0)+(0.873-0)(-0.873-0)+(-0.218-0)(-0.218-0)}{2} = 1$

Between features AB, we have the Covariance,

$Cov(A, B) =\frac{(1.091-0)(-0.577-0)+(-0.873-0)(1.155-0)+(-0.218-0)(-0.577-0)}{2}= -0.756$

Thus, we will have our covariance matrix in this form:

\[A = \begin{pmatrix} AA & AB & AC \\ AB & BB & BC\\ AC & BC & CC \\ \end{pmatrix}\]

Thus, we have the covariance matrix:

\[A = \begin{pmatrix} 1 & -0.756 & 0.143 \\ -0.756 & 1 & -0.756\\ 0.143 & -0.756 & 1 \\ \end{pmatrix}\]

2. Calculating the Determinant

We will now calculate the determinant for our covariance matrix.

For our covariance matrix, A, the determinant is given by: $\det(A−λI)=0$,

where:

λ is the eigenvalue
I is the identity matrix.

We thus have,

\[\begin{align*} & \begin{vmatrix} 1 & -0.756 & 0.143 \\ -0.756 & 1 & -0.756\\ 0.143 & -0.756 & 1 \\ \end{vmatrix} & & - & & \lambda \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & 0 & 1 \\ \end{vmatrix} = 0 \end{align*}\] \[\begin{vmatrix} 1 - \lambda & -0.756 & 0.143 \\ -0.756 & 1 - \lambda & -0.756\\ 0.143 & -0.756 & 1 - \lambda \\ \end{vmatrix} = 0\]

3. Finding the Characteristic Equation

We will now use our determinant to find our characteristic equation.

To do this we can choose any row or column as our pivot row or column. I’ll choose row 1. Using the $+ - +$ rule, each element will alternate in sign, with column 1 being positive, column 2 being negative and column 3 being positive.

The pivot values are ${1}-\lambda$, $-0.756$ and $-0.143$.

Thus,

\[\ + (1-\lambda) \times \begin{vmatrix} 1 & -0.756\\ -0.756 & 1 \\ \end{vmatrix}\] \[\ - (-0.756) \times \begin{vmatrix} -0.756 & -0.756\\ 0.143 & 1- \lambda\\ \end{vmatrix}\] \[\ + (0.143) \times \begin{vmatrix} -0.756 & 1 - \lambda\\ 0.143 & -0.756 \\ \end{vmatrix}\] \[\begin{align} & = + (1-\lambda)[(1)(1)-(-0.756)(-0.756)] \\ & - (-0.756)[(-0.756)(1-\lambda)-(-0.756)(0.143)] \\ & + (0.143)[-0.756(-0.756)-(1-\lambda)(0.143)] = 0 \\ & \lambda^3 - 3\lambda^2 + 1.8364797\lambda + 0.0000617 = 0 \end{align}\] \[\ = + (1-\lambda)[(1)(1)-(-0.756)(-0.756)] - (-0.756)[(-0.756)(1-\lambda)-(-0.756)(0.143)] + (0.143)[-0.756(-0.756)-(1-\lambda)(0.143)] = 0\]

Finally, we have our characteristic equation:

\[{\lambda}^3 - 3{\lambda}^2+1.8364797{\lambda} + 0.0000617= 0\]

4. Finding the Eigenvalues

To find the eigenvalues, we will solve the in the characteristic equation for $\lambda$. Since we have a cubic equation, we can expect to have three solutions to the characteristic equation.

After solving the characteristic equation, we have the following solutions for $\lambda$:

$\lambda_1 = -0.0000336$
$\lambda_2 = 0.857$
$\lambda_3 = 2.143$

Thus, these values are our eigenvalues.

5. Finding the Eigenvectors

Finding the eigenvectors involves substituting each eigenvalue into the correlation matrix. Using Row Echelon Reduction, we can reduce the correlation matrix to solve for the resulting system of equations, x1, x2 and x3. Reducing the matrix to row echelon form is useful, as the matrix is reduced to a series of 1’s and 0s, making it more easier to solve.

Finding the eigenvector for $\lambda_1 = -0.0000336$:

We thus aim to solve this matrix:, $\det(A−1.0000336I)=0$

We substitute the eigenvalue in place of $\lambda$:

\[\begin{vmatrix} 2 - 0.0000336 & -0.756 & 0.143 \\ -0.756 & 1 - 0.0000336& -0.756\\ 0.143 & -0.756 & 1 - 0.0000336 \\ \end{vmatrix} = 0\]

Thus, we have the resulting value:

\[= \begin{vmatrix} 1.0000336 & -0.756 & 0.143 \\ -0.756 & 1.0000336 & -0.756\\ 0.143 & -0.756 & 1.0000336 \\ \end{vmatrix} = 0\]

Reducing the Matrix - Row Operations

Our aim is to solve the system. One way to make solving the system easier would be to use row echelon reduction. Row Echelon Reduction, helps simplify the process of solving this system.

By definition, a reduced matrix should have the following characteristics:

Each leading entry in each row is 1.
The leading 1 in each row occurs further to the left than the leading 1 in the row above it.
Any rows consisting entirely of zeros are at the bottom.

Arranging the matrix into row echelon form includes applying row operations, such as multiplication, division or addition.

Now let’s apply row operations to reduce the previous matrix:

\[= \begin{vmatrix} 1.0000336 & -0.756 & 0.143 \\ -0.756 & 1.0000336 & -0.756\\ 0.143 & -0756 & 1.0000336 \\ \end{vmatrix} = 0\]

Convert -0.75 in R2 to 0:

$0.756R1 + R2$:

\[= \begin{vmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & 0.429 & -0.648\\ 0.143 & -0.756 & 1.0000336 \\ \end{vmatrix} = 0\]

Convert 0.143 in row 3 to 0

$-0.143R1 + R3$:

\[= \begin{vmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & 0.429 & -0.648\\ 0 & -0.648 & 0.979 \\ \end{vmatrix} = 0\]

Swap row 2 and row 3:

$R2<->R3$:

\[= \begin{vmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & -0.648 & 0.979 \\ 0 & 0.429 & -0.648\\ \end{vmatrix} = 0\]

Change -0.648 in row to to 1

$R2 \div 0.648$:

\[= \begin{vmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & 1 & -1.512 \\ 0 & 0.429 & -0.648\\ \end{vmatrix} = 0\]

Convert -0.756 in row 1 into 0

$0.745R2 + R1$:

\[= \begin{vmatrix} 1.0000336 & 0 & -1.0000336 \\ 0 & 1 & -1.512 \\ 0 & 0.429 & -0.648\\ \end{vmatrix} = 0\]

Convert 0.428 in row 3 into 0:

\[-0.428R2 + R3\] \[= \begin{vmatrix} 1.0000336 & 0 & -1.0000336 \\ 0 & 1 & -1.512 \\ 0 & 0 & 0\\ \end{vmatrix} = 0\]

System of Equations

The system of equations is as follows:

\[\begin{align*} & (A - 0.0000336) \begin{vmatrix} x_1 \\ x_2\\ x_3\\ \end{vmatrix} & & = & & \begin{vmatrix} 1.0000336 & 0 & -1.0000336 \\ 0 & 1 & -1.512 \\ 0 & 0 & 0\\ \end{vmatrix} \begin{vmatrix} x_1 \\ x_2 \\ x_3 \\ \end{vmatrix} & & = & & \begin{vmatrix} 0 \\ 0 \\ 0 \\ \end{vmatrix} \end{align*}\]

Thus, Multiplying the matrices

\[\begin{align*} & (A - 0.0000336) \begin{vmatrix} x_1 \\ x_2\\ x_3\\ \end{vmatrix} & & = & & \begin{vmatrix} 1.0000336x_1 & 0x_2 & -1.0000336x_3 \\ 0x_1 & 1x_2 & -1.512x_3 \\ 0x_1 & 0x_2 & 0x_3\\ \end{vmatrix} & & = & & \begin{vmatrix} 0 \\ 0 \\ 0 \\ \end{vmatrix} \end{align*}\]

Thus, the eigenvector for $\lambda = 0.0000336$:

$1.0000336x_1-1.0000336x_3 = 0$
$x2-1.512x_3 = 0$
$1.0000336x_1 = 1.0000336x_3$
$x2 = 1.512x_3$
$x_3$

Thus, $v = \begin{pmatrix} x_1\\ x_2\\ x_3\\ \end{pmatrix} = \begin{pmatrix} x_3\\ 1.512x_3\\ x_3\\ \end{pmatrix}$

Let $x_3 = 1$. Thus,

\[v = \begin{pmatrix} 1.0000336\\ 1.512\\ 1\\ \end{pmatrix}\]

We do the Same thing for $\lambda = 0,875$ and $\lambda = 2.143$

The calculated eigenvalues for $\lambda = 0.857$ are: $v = \begin{pmatrix} -1\\ 0 \\ 1\\ \end{pmatrix}$

and for $\lambda = 2.143$

\[v = \begin{pmatrix} 1\\ -1.323 \\ 1\\ \end{pmatrix}\]

5. Find Principal Components

To find the principal components, we multiply our standardized matrix with out eigenvector matrix

To obtain our eigenvector matrix, we order the columns of the matrix by largest eigenvector by largest eigenvalue.

For example, our eigenvalues by order of magnitude are: $\lambda$ = 2.413, followed by $\lambda$ = 0.857, and then $\lambda$ = -0.0000336. Thus, the first column of the eigenvector matrix will have the eigenvector associated with the largest eigenvalue $\lambda$ = 2.413, the second column will have the eigenvector of the second largest eigenvalue $\lambda$ = 0.857 and the third column will have the eigenvector of the smallest eigenvalue $\lambda$ = -0.0000336.

Thus, our eigenvector matrix is as follows:

\[\begin{pmatrix} 1 & -1 & 1.0000336 \\ -1.323 & 0 & 1.512\\ 1 & -1 & 1 \\ \end{pmatrix}\]

Multiplying our standardized matrix with our eigenvector matrix:

\[\begin{align*} & \begin{pmatrix} 1.091 & -0.577 & -0.218 \\ -0.873 & 1.155 & -0.873\\ -0.218 & -0.577 & 1.091 \\ \end{pmatrix} & & \times & & \begin{pmatrix} 1 & -1 & 1.0000336 \\ -1.323 & 0 & 1.512\\ 1 & -1 & 1 \\ \end{pmatrix} \end{align*}\] \[= \begin{pmatrix} 1.636 & -0.873 & 0.000612 \\ -3.274 & 1.746 & 0.00331 \\ 1.636 & -0.873 & 0.000569 \\ \end{pmatrix}\]

Finally, from the multiplied matrices, we have our three principal components, PC1, PC2 and PC3:

\[\begin{array}{|c|c|c|} \hline \text{PC1} & \text{PC2} & \text{PC3} \\ \hline 1.636 & -0.873 & 0.000612 \\ \hline -3.274 & 1.746 & 0.00331 \\ \hline 1.636 & -0.873 & 0.000569 \\ \hline \end{array}\]

Conclusion

In summary, performing Principal Component Analysis (PCA) comprises of the following steps:

Calculating the standardized matrix involves:
Calculating the covariance matrix from the standardized matrix
Calculating the determinant of the covariance matrix and finding the characteristic equation
Finding the eigenvalues by solving the characteristic equation
Finding the eigenvectors for each eigenvalue, using matrix reduction
Finding the eigenvector matrix by arranging eigenvectors according to the magnitude of their associated eigenvalues
Finding the principal components by multiplying the standardized matrix and the eigenvector matrix

Sources

Unfold Data Science, Principal Component Analysis Step by Step
Biostatsquid, Principal Component Analysis
Thinkwellvids, Gauss-Jordan Elimination

Mbali Kalirane Connect on Linkedin Hello! I’m Mbali, your Analytics Writer. I’m a passionate data scientist who believes in the power of education and sharing knowledge. Feel free to connect with me on Linkedin by clicking on the 'Connect' button above

« EDA vs Data Preprocessing: What's the Difference?

The Basics of the K-Nearest Neighbors (KNN) Algorithm »