Project 4: Principal component analysis

The goal of this project is to implement the algorithm for computing the principal components for a data set.

Write two functions in Python 3 with the following signatures:

pca(data, M)

transform(data, components)

Where

data is an array-like with shape (n,d), containing n data points each of dimension d.
M is the desired number of principal components to find.
components is an array-like with shape (M, d) containing the principal components, one d-dimensional vector per row.

The pca function returns the principal components of the given data in a form that can be passed as a parameter to transform. The transform function returns the given data projected onto the space defined by those principal components (an array with shape (n, M)).
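A minimal sketch of pca.py, assuming NumPy is available (the assignment does not say which libraries are allowed). It uses one standard approach: center the data, form the d x d sample covariance matrix, and keep the eigenvectors with the M largest eigenvalues. It assumes at least two data points (n >= 2).

```python
import numpy as np


def pca(data, M):
    """Return the top-M principal components of data as an (M, d) array.

    Each row is a unit-length eigenvector of the sample covariance matrix;
    rows are ordered by decreasing eigenvalue (variance explained).
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)                    # subtract the per-feature mean
    cov = centered.T @ centered / (X.shape[0] - 1)   # (d, d) sample covariance
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:M]              # indices of the M largest eigenvalues
    # eigvecs stores eigenvectors as columns; transpose so each row is one component.
    return eigvecs[:, top].T


def transform(data, components):
    """Project data (n, d) onto components (M, d), returning an (n, M) array."""
    X = np.asarray(data, dtype=float)
    W = np.asarray(components, dtype=float)
    # Each output column is the dot product of every data point with one component.
    return X @ W.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    components = pca(X, 5)
    X_transformed = transform(X, components)
    print(components.shape, X_transformed.shape)  # (5, 10) (100, 5)
```

Because transform receives only the data and the components, this sketch does not subtract a mean before projecting. Centering would shift every projected point by the same constant vector, which leaves pairwise distances (and so a KNN classifier's behavior) unchanged.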

To test your implementation, choose one data set that you have used in a previous project and choose a classifier (preferably one of the KNN, MLP, or SVM classifiers you wrote in a previous project; if you aren't confident in any of your own, you may use a scikit-learn classifier instead). Measure the performance of that classifier on the original data set and then again on the data set transformed by principal component analysis.

(Keep in mind that the classifier's performance will presumably go down. PCA doesn't improve performance; rather, it makes classification more feasible when the data are high-dimensional. The hope is that performance drops only a little.)
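One way to gauge in advance how much a given M keeps is the fraction of total variance captured by the top M components: the sum of the M largest covariance eigenvalues divided by the sum of all of them. The hypothetical helper below (not part of the required interface) computes it with NumPy. A high fraction is only a rough guide; it does not guarantee that classifier accuracy will hold up.

```python
import numpy as np


def variance_retained(data, M):
    """Fraction of total variance captured by the top-M principal components.

    Each eigenvalue of the sample covariance matrix is the variance of the
    data along the corresponding eigenvector, so the ratio of the M largest
    to the total measures how much spread the projection keeps.
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / (X.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    return eigvals[:M].sum() / eigvals.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Features with very different spreads, so a few components dominate.
    X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
    for M in range(1, 6):
        print(M, round(variance_retained(X, M), 3))
```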

Submit your code in two files, pca.py and test_pca.py, so that the following code will work:

import pca

components = pca.pca(X, 5)

X_transformed = pca.transform(X, components)

Moreover, the following should work from the command line:

python3 test_pca.py

Running it should display information about the performance of your classifier with and without PCA.
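A minimal sketch of what test_pca.py might look like, assuming only NumPy. The synthetic Gaussian-blob data stand in for a data set from a previous project, and make_data, knn_predict, and accuracy are hypothetical helpers. In a real submission, pca and transform would come from `import pca`; compact copies are inlined here so the sketch runs on its own. The components are fit on the training split only and then applied to both splits, so no information from the test points leaks into the projection.

```python
import numpy as np


# Stand-ins for pca.pca and pca.transform (a real test_pca.py would `import pca`).
def pca(data, M):
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    # Scaling the scatter matrix by 1/(n-1) would not change its eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)
    return eigvecs[:, np.argsort(eigvals)[::-1][:M]].T


def transform(data, components):
    return np.asarray(data, dtype=float) @ np.asarray(components, dtype=float).T


def make_data(n_per_class=100, d=20, n_classes=3, seed=0):
    """Hypothetical stand-in for a real data set: Gaussian blobs in d dims."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(n_classes, d))
    X = np.vstack([c + rng.normal(size=(n_per_class, d)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y


def knn_predict(train_X, train_y, test_X, k=3):
    """Hypothetical k-nearest-neighbor classifier (Euclidean, majority vote)."""
    # Squared distances from every test point to every training point.
    dists = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority label among the k neighbors (ties go to the smallest label).
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])


def accuracy(predicted, actual):
    return float(np.mean(predicted == actual))


def main():
    X, y = make_data()
    # Shuffle, then hold out 30% of the points for testing.
    idx = np.random.default_rng(1).permutation(len(y))
    split = int(0.7 * len(y))
    train, test = idx[:split], idx[split:]

    base = accuracy(knn_predict(X[train], y[train], X[test]), y[test])
    print(f"KNN on original data ({X.shape[1]} dims): accuracy {base:.3f}")

    for M in (1, 2, 5):
        # Fit components on the training split only, then project both splits.
        components = pca(X[train], M)
        acc = accuracy(
            knn_predict(transform(X[train], components), y[train],
                        transform(X[test], components)),
            y[test],
        )
        print(f"KNN after PCA to {M} dims: accuracy {acc:.3f}")


if __name__ == "__main__":
    main()
```

Run it as `python3 test_pca.py`. The printed accuracies depend on the data set and classifier you choose, so none are reproduced here.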

Finally, include a README file that briefly describes how you tested your classifier and what results you obtained, along with anything else you think I should know in order to grade your submission as fairly as possible.

To turn in:

Copy pca.py, test_pca.py, README, and any other files your code needs (such as data sets for testing) to

/cslab/class/cs394/[your userid]/pca

Due Fri, May 3

Thomas VanDrunen

Last modified: Fri Apr 5 16:55:33 CDT 2019