Principal Component Analysis (PCA) For Dummies

PCA is one of the first tools you’ll put in your data science tool box. Typically, you’ve collected so much data, that you don’t even know where to begin, so you perform PCA to reduce the data down to two dimensions, and then you can plot it, and look for structure. There are plenty of other reasons to perform PCA, but that’s a common one.

However, when you ask how PCA works, you either get a simple graphical explanation, or long webpages that boil down to the statement “The PCAs of your data are the eigenvectors of your data’s covariance matrix”. If you understood what that meant, you wouldn’t be looking up how PCA works! I want to try to provide a gateway between the graphical intuition and that statement about covariance and eigenvectors. Hopefully this will give you 1) a deeper understanding of linear algebra, 2) a nice intuition about what the covariance matrix is, and of course, 3) help you understand how PCA works.

Continue reading