Calculating Principal Component Analysis (PCA): A Step-by-Step Guide
By Mbali Kalirane
Large datasets, such as those with over 100 features, can be a problem in model training: they are computationally taxing and can lead to model overfitting. You won't want to overload your machine learning model with information from too many features. Ideally, you want a dataset with a reduced number of features that still carries as much meaningful information as the original data. One way to achieve this is principal component analysis (PCA).
Table of Contents
- What is Principal Component Analysis?
- How to Find the Principal Components?
- Example of Finding the Principal Components
- Conclusion
- Sources
What is Principal Component Analysis (PCA)?
PCA is a method of dimensionality reduction: it takes all the original features in a dataset and combines them to form a smaller number of new features. These combined new features are referred to as principal components. PCA is designed to reduce the dimensionality of the data while retaining as much of its variance as possible, so the principal components preserve most of the valuable and relevant information from the original data.
How to Find the Principal Components?
Finding the principal components involves finding two things: the standardized matrix and the eigenvector matrix.
Calculating the standardized matrix involves:
- Calculating the mean of each column in the matrix
- Calculating the standard deviation of each column in the matrix
- Using the z-score formula to transform the original matrix to a standardized matrix
Finding these eigenvectors involves going through a series of steps:
- Calculating the covariance matrix from the standardized matrix
- Calculating the determinant of the covariance matrix and finding the characteristic equation
- Finding the eigenvalues by solving the characteristic equation
- Finding the eigenvectors for each eigenvalue
Let’s take a look at how these are defined.
1. What is the Covariance Matrix?
The covariance matrix is based on the standardized dataset and summarizes the covariances between different pairs of features in the dataset. It analyses how much two variables change together. A positive covariance indicates that as one variable increases, the other variable tends to increase as well. A negative covariance indicates an inverse relationship between variables.
2. What is the Determinant of a Matrix?
The determinant is a scalar value calculated from your covariance matrix. It gauges how much variance exists in your dataset when a transformation such as PCA is applied. A large determinant means there's a lot of variance in the transformed data, indicating that the original dataset spreads out widely. A small determinant suggests there's less variance in the transformed data, meaning the original dataset is more tightly clustered. A determinant of zero indicates that the data is linearly dependent and has no volume in the transformed space, which would cause issues in PCA.
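To see this numerically, here is a minimal NumPy sketch (the matrices are illustrative, not from our example) comparing the determinants of a spread-out and a nearly linearly dependent covariance matrix:

```python
import numpy as np

# Covariance matrix of two weakly correlated features: large determinant
spread_out = np.array([[1.0, 0.1],
                       [0.1, 1.0]])

# Covariance matrix of two almost perfectly correlated features:
# near-zero determinant, i.e. the data is nearly linearly dependent
collapsed = np.array([[1.0, 0.99],
                      [0.99, 1.0]])

print(np.linalg.det(spread_out))  # ≈ 0.99
print(np.linalg.det(collapsed))   # ≈ 0.0199
```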
3. Eigenvalues
Eigenvalues represent the amount of variance captured by each eigenvector of the covariance matrix. Larger eigenvalues correspond to more significant variance explained by the associated eigenvectors.
4. Eigenvectors
Eigenvectors are non-zero vectors that, when multiplied by a square matrix, result in a scaled version of the original vector. The scaling factor is the eigenvalue.
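In symbols, for a square matrix $A$, an eigenvector $v$ with eigenvalue $\lambda$ satisfies:

\[A v = \lambda v\]

For example, \(\begin{pmatrix} 2 & 0\\ 0 & 3\\ \end{pmatrix} \begin{pmatrix} 0\\ 1\\ \end{pmatrix} = 3 \begin{pmatrix} 0\\ 1\\ \end{pmatrix}\), so $(0, 1)$ is an eigenvector of this matrix with eigenvalue $3$.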
Example of Finding Principal Components (Using a 3×3 Dataset)
Suppose we have the following dataset with three features and three samples:
\[\begin{array}{|c|c|c|c|} \hline \text{} & \text{Feature 1} & \text{Feature 2} & \text{Feature 3} \\ \hline \text{sample 1} & 2 & -1 & 0 \\ \hline \text{sample 2} & -1 & 2 & -1 \\ \hline \text{sample 3} & 0 & -1 & 2 \\ \hline \end{array}\]The first thing we’ll want to do is standardize the data.
1. Standardizing the Data
Standardization is the process of scaling the features of the original dataset to have a mean of 0 and a standard deviation of 1. This is important because PCA is sensitive to the relative scales of the original features. Standardizing ensures that each feature contributes equally to the calculation of principal components, preventing features with larger scales from dominating the analysis.
To standardize the original dataset, we apply the z-score formula:
$\ Z = \frac{x_{ij} - \mu_i}{\sigma_i} $
where,
- $x_{ij}$ is the value of an observation in the i-th feature and j-th sample.
- $Z$ is the standardized value of $x_{ij}$.
- $\mu_i$ is the mean of feature $i$.
- $\sigma_i$ is the standard deviation of feature $i$.
But to use the z-score formula, we first need to find the mean $\mu_i$ and the standard deviation $\sigma_i$ of each feature.
To calculate the mean, we use this formula:
$\bar{x} = \frac{\sum_{j}{x_{ij}}}{n} $ where,
- $n$ is the number of samples or instances in the dataset
- $x_{ij}$ is the value of an observation in the i-th feature and j-th sample.
To calculate the standard deviation, we use the formula below. I’ve chosen the sample standard deviation, which divides by $n-1$:
$\sigma = \sqrt{\frac{\sum_{j}{(x_{ij}-\bar{x})^2}}{n-1}} $
- $n$ is the number of samples or instances in the dataset
- $x_{ij}$ is the value of an observation in the i-th feature and j-th sample.
- $\bar{x}$ is the sample mean
We’ll need to calculate the mean and standard deviation for each feature in the dataset.
Using the original dataset, we calculate the mean and standard deviation:
\[\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1\\ 0 & -1 & 2 \\ \end{pmatrix}\]For feature 1 (the first column), we calculate the mean as follows:
$\bar{x} = \frac{2 + (-1) + 0}{3} = 1/3 $
And we calculate the standard deviation for feature 1 as follows:
$\sigma = \sqrt{\frac{(2-\frac{1}{3})^2+(0-\frac{1}{3})^2+(-1-\frac{1}{3})^2}{2}} = 1.527 $
Thus, for feature 1, the standard deviation is $1.527$ and the mean is $1/3$.
The calculated standard deviations and means for all features in the dataset are as follows:
\[\begin{array}{|c|c|c|c|} \hline \text{} & \text{Feature 1} & \text{Feature 2} & \text{Feature 3} \\ \hline \text{mean} & 1/3 & 0 & 1/3 \\ \hline \text{standard deviation} & 1.527 & 1.732 & 1.527 \\ \hline \end{array}\]Thus, we standardize each element in the original matrix using the mean and standard deviation from its respective column.
For the first element $x_{11}$ (row 1 and column 1), using the mean and standard deviations for column 1, we have:
$\ x_{11}^* = \frac{2 - \frac{1}{3}}{1.527} = 1.091$
Thus, doing this for all other elements in the dataset, we have the standardized matrix:
\[\begin{pmatrix} 1.091 & -0.577 & -0.218 \\ -0.873 & 1.155 & -0.873\\ -0.218 & -0.577 & 1.091 \\ \end{pmatrix}\]Verifying standardization by calculating the mean and standard deviation for each feature (column) in the standardized matrix:
\[\begin{array}{|c|c|c|} \hline \text{} & \text{Feature 1} & \text{Feature 2} & \text{Feature 3} \\ \hline \text{mean} & 0 & 0 & 0 \\ \hline \text{standard deviation} & 1 & 1 & 1 \\ \hline \end{array}\]As we can see, the means are 0 and the standard deviations are 1. Thus, the data has indeed been standardized.
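We can reproduce this standardization step numerically. Here is a minimal NumPy sketch (the variable names are my own):

```python
import numpy as np

# Original dataset: rows are samples, columns are features
X = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# Column-wise mean and sample standard deviation (ddof=1 gives the n-1 formula)
mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)

# Z-score standardization
X_std = (X - mu) / sigma
print(X_std.round(3))
# [[ 1.091 -0.577 -0.218]
#  [-0.873  1.155 -0.873]
#  [-0.218 -0.577  1.091]]

# Verify: column means ≈ 0 and sample standard deviations ≈ 1
print(X_std.mean(axis=0).round(3))         # ≈ [0. 0. 0.]
print(X_std.std(axis=0, ddof=1).round(3))  # [1. 1. 1.]
```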
2. Finding the Covariance Matrix
To find the covariance matrix, we calculate the pairwise covariance between each feature in the standardized matrix. This will be the covariance between features:
- feature A and A (AA)
- feature B and B (BB)
- feature C and C (CC)
- feature A and B (AB)
- feature A and C (AC)
- feature B and C (BC)
The covariance is calculated using the covariance formula; I’ve chosen the sample covariance version, shown below:
$Cov(X,Y) = \frac{\sum{(x_i-\bar{x})(y_i-\bar{y})}}{n-1}$
- where $x_i$ and $y_i$ are the values of the features $X$ and $Y$ being compared.
- $\bar{x}$ and $\bar{y}$ are the means of the features $X$ and $Y$.
- $n$ is the number of samples
For the pair AA, we have the covariance:
$Cov(A, A) = \frac{(1.091-0)(1.091-0)+(-0.873-0)(-0.873-0)+(-0.218-0)(-0.218-0)}{2} = 1$
For the pair AB, we have the covariance:
$Cov(A, B) =\frac{(1.091-0)(-0.577-0)+(-0.873-0)(1.155-0)+(-0.218-0)(-0.577-0)}{2}= -0.756$
Our covariance matrix therefore has this form:
\[A = \begin{pmatrix} AA & AB & AC \\ AB & BB & BC\\ AC & BC & CC \\ \end{pmatrix}\]Calculating all the pairwise covariances, we have the covariance matrix:
\[A = \begin{pmatrix} 1 & -0.756 & 0.143 \\ -0.756 & 1 & -0.756\\ 0.143 & -0.756 & 1 \\ \end{pmatrix}\]
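This matches what NumPy computes. A quick check (reusing `X_std` from the earlier sketch; `np.cov` applies the $n-1$ sample formula by default):

```python
import numpy as np

X_std = np.array([[ 1.091, -0.577, -0.218],
                  [-0.873,  1.155, -0.873],
                  [-0.218, -0.577,  1.091]])

# rowvar=False treats each column as a feature
A = np.cov(X_std, rowvar=False)
print(A.round(3))
# [[ 1.    -0.756  0.143]
#  [-0.756  1.    -0.756]
#  [ 0.143 -0.756  1.   ]]
```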
3. Calculating the Determinant
We will now calculate the determinant for our covariance matrix.
For our covariance matrix $A$, we set the determinant of $A - \lambda I$ equal to zero: $\det(A−\lambda I)=0$,
where:
- λ is the eigenvalue
- I is the identity matrix.
We thus have,
\[A - \lambda I = \begin{pmatrix} 1 & -0.756 & 0.143 \\ -0.756 & 1 & -0.756\\ 0.143 & -0.756 & 1 \\ \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & 0 & 1 \\ \end{pmatrix}\] \[\det(A - \lambda I) = \begin{vmatrix} 1 - \lambda & -0.756 & 0.143 \\ -0.756 & 1 - \lambda & -0.756\\ 0.143 & -0.756 & 1 - \lambda \\ \end{vmatrix} = 0\]4. Finding the Characteristic Equation
We will now use our determinant to find our characteristic equation.
To do this, we can choose any row or column as our pivot; I’ll choose row 1. Using the $+\,-\,+$ sign rule of cofactor expansion, the elements alternate in sign, with column 1 positive, column 2 negative and column 3 positive.
The pivot values are $1-\lambda$, $-0.756$ and $0.143$.
Thus,
\[(1-\lambda) \begin{vmatrix} 1 - \lambda & -0.756\\ -0.756 & 1 - \lambda \\ \end{vmatrix} - (-0.756) \begin{vmatrix} -0.756 & -0.756\\ 0.143 & 1- \lambda\\ \end{vmatrix} + (0.143) \begin{vmatrix} -0.756 & 1 - \lambda\\ 0.143 & -0.756 \\ \end{vmatrix} = 0\]Expanding the $2\times2$ determinants:
\[\begin{align} & (1-\lambda)[(1-\lambda)(1-\lambda)-(-0.756)(-0.756)] \\ & - (-0.756)[(-0.756)(1-\lambda)-(-0.756)(0.143)] \\ & + (0.143)[(-0.756)(-0.756)-(1-\lambda)(0.143)] = 0 \end{align}\]Simplifying, and multiplying through by $-1$ so the leading term is positive, we have our characteristic equation:
\[{\lambda}^3 - 3{\lambda}^2+1.8364797{\lambda} + 0.0000617= 0\]5. Finding the Eigenvalues
To find the eigenvalues, we solve the characteristic equation for $\lambda$. Since we have a cubic equation, we can expect three solutions.
After solving the characteristic equation, we have the following solutions for $\lambda$:
- $\lambda_1 = -0.0000336$
- $\lambda_2 = 0.857$
- $\lambda_3 = 2.143$
Thus, these values are our eigenvalues.
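We can verify these roots numerically with NumPy’s polynomial solver; a minimal sketch:

```python
import numpy as np

# Coefficients of λ³ − 3λ² + 1.8364797λ + 0.0000617, highest degree first
coefficients = [1.0, -3.0, 1.8364797, 0.0000617]

print(np.roots(coefficients))
# roots ≈ 2.143, 0.857 and -0.0000336 (the order may vary)
```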
6. Finding the Eigenvectors
Finding the eigenvectors involves substituting each eigenvalue into $A - \lambda I$, where $A$ is the covariance matrix. Using row echelon reduction, we can reduce this matrix and solve the resulting system of equations for $x_1$, $x_2$ and $x_3$. Reducing the matrix to row echelon form is useful because it turns the matrix into a pattern of 1s and 0s, making the system easier to solve.
Finding the eigenvector for $\lambda_1 = -0.0000336$:
We thus aim to solve the system $(A - \lambda_1 I)\mathbf{x} = 0$, i.e. $(A + 0.0000336I)\mathbf{x} = 0$.
We substitute the eigenvalue in place of $\lambda$:
\[\begin{pmatrix} 1 + 0.0000336 & -0.756 & 0.143 \\ -0.756 & 1 + 0.0000336 & -0.756\\ 0.143 & -0.756 & 1 + 0.0000336 \\ \end{pmatrix}\]Thus, we have the resulting matrix:
\[\begin{pmatrix} 1.0000336 & -0.756 & 0.143 \\ -0.756 & 1.0000336 & -0.756\\ 0.143 & -0.756 & 1.0000336 \\ \end{pmatrix}\]Reducing the Matrix - Row Operations
Our aim is to solve this system, and row echelon reduction makes the process easier.
By definition, a reduced matrix should have the following characteristics:
- Each leading entry in each row is 1.
- The leading 1 in each row occurs further to the left than the leading 1 in the row above it.
- Any rows consisting entirely of zeros are at the bottom.
Arranging the matrix into row echelon form involves applying row operations, such as multiplying a row by a scalar, or adding a multiple of one row to another.
Now let’s apply row operations to reduce the previous matrix:
\[\begin{pmatrix} 1.0000336 & -0.756 & 0.143 \\ -0.756 & 1.0000336 & -0.756\\ 0.143 & -0.756 & 1.0000336 \\ \end{pmatrix}\]Convert $-0.756$ in R2 to 0:
\(0.756R1 + R2\):
\[\begin{pmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & 0.429 & -0.648\\ 0.143 & -0.756 & 1.0000336 \\ \end{pmatrix}\]Convert $0.143$ in R3 to 0:
\(-0.143R1 + R3\):
\[\begin{pmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & 0.429 & -0.648\\ 0 & -0.648 & 0.979 \\ \end{pmatrix}\]Swap R2 and R3:
\(R2 \leftrightarrow R3\):
\[\begin{pmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & -0.648 & 0.979 \\ 0 & 0.429 & -0.648\\ \end{pmatrix}\]Convert $-0.648$ in R2 to 1:
\(R2 \div (-0.648)\):
\[\begin{pmatrix} 1.0000336 & -0.756 & 0.143 \\ 0 & 1 & -1.512 \\ 0 & 0.429 & -0.648\\ \end{pmatrix}\]Convert $-0.756$ in R1 to 0:
\(0.756R2 + R1\):
\[\begin{pmatrix} 1.0000336 & 0 & -1.0000336 \\ 0 & 1 & -1.512 \\ 0 & 0.429 & -0.648\\ \end{pmatrix}\]Convert $0.429$ in R3 to 0:
\(-0.429R2 + R3\):
\[\begin{pmatrix} 1.0000336 & 0 & -1.0000336 \\ 0 & 1 & -1.512 \\ 0 & 0 & 0\\ \end{pmatrix}\]System of Equations
The system of equations is as follows:
\[\begin{pmatrix} 1.0000336 & 0 & -1.0000336 \\ 0 & 1 & -1.512 \\ 0 & 0 & 0\\ \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \end{pmatrix}\]Multiplying the matrices:
\[\begin{pmatrix} 1.0000336x_1 + 0x_2 - 1.0000336x_3 \\ 0x_1 + 1x_2 - 1.512x_3 \\ 0 \\ \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \end{pmatrix}\]Thus, for $\lambda_1 = -0.0000336$, the equations and their solutions are:
- $1.0000336x_1-1.0000336x_3 = 0$, which gives $x_1 = x_3$
- $x_2-1.512x_3 = 0$, which gives $x_2 = 1.512x_3$
- $x_3$ is a free variable
Thus, \(v = \begin{pmatrix} x_1\\ x_2\\ x_3\\ \end{pmatrix} = \begin{pmatrix} x_3\\ 1.512x_3\\ x_3\\ \end{pmatrix}\)
Let $x_3 = 1$. Thus,
\[v = \begin{pmatrix} 1\\ 1.512\\ 1\\ \end{pmatrix}\]We do the same for $\lambda = 0.857$ and $\lambda = 2.143$.
The calculated eigenvector for $\lambda = 0.857$ is: \(v = \begin{pmatrix} -1\\ 0 \\ 1\\ \end{pmatrix}\)
and for $\lambda = 2.143$
\[v = \begin{pmatrix} 1\\ -1.323 \\ 1\\ \end{pmatrix}\]
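All three hand-computed eigenpairs can be cross-checked with NumPy. Note that `np.linalg.eigh` (suited to symmetric matrices like ours) returns unit-length eigenvectors, so its columns may differ from ours by scale and sign:

```python
import numpy as np

A = np.array([[ 1.0,   -0.756,  0.143],
              [-0.756,  1.0,   -0.756],
              [ 0.143, -0.756,  1.0  ]])

# eigh returns eigenvalues in ascending order for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues.round(3))   # ≈ [-0.     0.857  2.143]

# Rescale the eigenvector of the largest eigenvalue so its first entry is 1:
v = eigenvectors[:, -1] / eigenvectors[0, -1]
print(v.round(3))             # ≈ [ 1.    -1.323  1.   ]
```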
7. Finding the Principal Components
To find the principal components, we multiply our standardized matrix with our eigenvector matrix.
To obtain our eigenvector matrix, we order the columns by the magnitude of the associated eigenvalues, from largest to smallest.
Our eigenvalues in order of magnitude are $\lambda$ = 2.143, followed by $\lambda$ = 0.857, and then $\lambda$ = -0.0000336. Thus, the first column of the eigenvector matrix holds the eigenvector associated with the largest eigenvalue $\lambda$ = 2.143, the second column holds the eigenvector of the second largest eigenvalue $\lambda$ = 0.857, and the third column holds the eigenvector of the smallest eigenvalue $\lambda$ = -0.0000336.
Thus, our eigenvector matrix is as follows:
\[\begin{pmatrix} 1 & -1 & 1 \\ -1.323 & 0 & 1.512\\ 1 & 1 & 1 \\ \end{pmatrix}\]Multiplying our standardized matrix with our eigenvector matrix:
\[\begin{align*} & \begin{pmatrix} 1.091 & -0.577 & -0.218 \\ -0.873 & 1.155 & -0.873\\ -0.218 & -0.577 & 1.091 \\ \end{pmatrix} & & \times & & \begin{pmatrix} 1 & -1 & 1 \\ -1.323 & 0 & 1.512\\ 1 & 1 & 1 \\ \end{pmatrix} \end{align*}\] \[= \begin{pmatrix} 1.636 & -1.309 & 0.0006 \\ -3.274 & 0 & 0.0004 \\ 1.636 & 1.309 & 0.0006 \\ \end{pmatrix}\]Finally, from the multiplied matrices, we have our three principal components, PC1, PC2 and PC3:
\[\begin{array}{|c|c|c|} \hline \text{PC1} & \text{PC2} & \text{PC3} \\ \hline 1.636 & -1.309 & 0.0006 \\ \hline -3.274 & 0 & 0.0004 \\ \hline 1.636 & 1.309 & 0.0006 \\ \hline \end{array}\]Note that PC3 is essentially zero, which reflects its near-zero eigenvalue: the third component carries virtually no variance, so it can safely be dropped.
Conclusion
In summary, performing principal component analysis (PCA) consists of the following steps:
- Calculating the standardized matrix from the original data
- Calculating the covariance matrix from the standardized matrix
- Calculating the determinant of the covariance matrix and finding the characteristic equation
- Finding the eigenvalues by solving the characteristic equation
- Finding the eigenvectors for each eigenvalue, using matrix reduction
- Finding the eigenvector matrix by arranging eigenvectors according to the magnitude of their associated eigenvalues
- Finding the principal components by multiplying the standardized matrix and the eigenvector matrix
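The entire procedure can be condensed into a short NumPy sketch (a minimal illustration of the steps above, with my own function and variable names, rather than a production implementation):

```python
import numpy as np

def pca(X):
    """Return the principal component scores and eigenvalues of dataset X."""
    # Step 1: standardize (z-score, sample standard deviation with n-1)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step 2: covariance matrix of the standardized data
    cov = np.cov(X_std, rowvar=False)

    # Steps 3-6: eigenvalues and eigenvectors (eigh suits symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Order the eigenvectors by descending eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Step 7: project the standardized data onto the eigenvector matrix
    return X_std @ eigenvectors, eigenvalues

X = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

pcs, eigenvalues = pca(X)
print(eigenvalues.round(3))  # ≈ [2.143 0.857 -0.   ]
print(pcs.round(3))          # the PC3 column is ≈ 0, so it can be dropped
```

Because `eigh` returns unit-length eigenvectors, these scores differ from the worked example’s by a constant scale (and possibly sign) per column; in practice, a library such as scikit-learn’s `sklearn.decomposition.PCA` wraps these same steps.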
Sources
- Unfold Data Science, Principal Component Analysis Step by Step
- Biostatsquid, Principal Component Analysis
- Thinkwellvids, Gauss-Jordan Elimination