The goal of this project is to implement the algorithm for computing the principal components for a data set.
Write two functions in Python 3 with the following signatures:
pca(data, M)
transform(data, components)
Where:
data is an array-like with shape (n, d), containing n data points, each of dimension d.
M is the desired number of principal components to be found.
components is an array-like with shape (M, d), containing the principal components as a collection of vectors.
The pca function returns the principal components of the given data in a form that can then be passed as a parameter to transform, which returns the given data projected onto the space defined by the principal components.
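As a rough illustration of the two signatures, here is a minimal sketch, not a reference solution. It assumes NumPy is available, finds the components with a singular value decomposition of the mean-centered data (an eigendecomposition of the covariance matrix would be an equally valid route), and assumes transform should return an array of shape (n, M) holding each point's coordinates along the components. Whether transform should also subtract a mean is not specified above; this sketch does not.

```python
import numpy as np


def pca(data, M):
    """Return the top M principal components as an array of shape (M, d)."""
    X = np.asarray(data, dtype=float)
    # Center each feature so the components describe variance about the mean.
    X_centered = X - X.mean(axis=0)
    # The rows of Vt are orthonormal directions, ordered by decreasing
    # singular value, i.e. by decreasing variance of the projected data.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return Vt[:M]


def transform(data, components):
    """Project data of shape (n, d) onto the components; result is (n, M)."""
    X = np.asarray(data, dtype=float)
    W = np.asarray(components, dtype=float)
    # Assumption: no centering here, since only data and components are passed.
    return X @ W.T
```

Because the components are orthonormal, the columns of the transformed data are ordered by decreasing variance, which gives a quick sanity check for an implementation.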
To test your implementation, choose one data set that you have used in a previous project and choose a classifier (preferably the KNN, MLP, or SVM classifier that you wrote in a previous project of your choice; or you may use a scikit-learn classifier if you aren't confident in any of your own).
Find the performance of that classifier on the original data set and then again on the data set transformed by principal component analysis.
(Keep in mind that the classifier's performance will presumably go down. PCA doesn't improve performance; rather, it makes classification more feasible when the data are high-dimensional. The hope is that the performance will only go down a little.)
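One possible shape for the comparison is sketched below. Everything in it is illustrative: nearest_neighbor_accuracy is a hypothetical stand-in for whichever classifier you choose, compare_with_and_without_pca is a hypothetical helper name, and pca_fn and transform_fn are placeholders for your own pca and transform functions. It assumes the data have already been split into training and test portions.

```python
import numpy as np


def nearest_neighbor_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-nearest-neighbor classifier using Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)
    # Squared distance from every test point to every training point.
    # This builds an (n_test, n_train, d) array, so it only suits small data.
    diffs = X_test[:, None, :] - X_train[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    predictions = y_train[sq_dists.argmin(axis=1)]
    return float((predictions == y_test).mean())


def compare_with_and_without_pca(pca_fn, transform_fn,
                                 X_train, y_train, X_test, y_test, M):
    """Return (accuracy on original data, accuracy on PCA-transformed data)."""
    # Fit the components on the training data only, then project both splits.
    components = pca_fn(X_train, M)
    acc_original = nearest_neighbor_accuracy(X_train, y_train, X_test, y_test)
    acc_reduced = nearest_neighbor_accuracy(
        transform_fn(X_train, components), y_train,
        transform_fn(X_test, components), y_test,
    )
    return acc_original, acc_reduced
```

Finding the components from the training split alone keeps the test points from influencing the projection, so the two accuracies are measured under the same conditions.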
Submit your code in two files, pca.py and test_pca.py, so that the following code will work:
import pca
components = pca.pca(X, 5)
X_transformed = pca.transform(X, components)
Moreover, the following should work from the command line:
python3 test_pca.py
This should display information about the performance of your classifier with and without PCA.
Finally, include a file README that (briefly) describes how you tested your classifier and what results you obtained, and anything else you think it would be good for me to know in order to give your submission the fairest grading.
To turn in:
Copy pca.py, test_pca.py, README, and any other files your code needs (such as data sets for testing) to
/cslab/class/cs394/[your userid]/pca
Due Fri, May 3