How to use scikit-learn PCA for features reduction and know which features are discarded

Question

1 Answer

Anurag · Answer 1 · 2019-06-17T13:05:50+0000

Principal Component Analysis (PCA) is a dimensionality reduction technique. It does not drop individual features; instead it projects the data onto a smaller set of new axes (the principal components) that capture as much of the variance as possible. It is an unsupervised method, so it works on unlabelled data.

The directions that the PCA object found during fitting are stored in pca.components_, one row per principal component. When you transform the data, only the part lying in the space spanned by these rows is kept; the vector space orthogonal to it is discarded.

PCA does not "discard" or "retain" any of your pre-defined features (encoded by the columns you specify). It mixes all of them (by weighted sums) to find orthogonal directions of maximum variance.
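To see how the original columns are mixed, you can print the weights stored in each row of pca.components_. The sketch below is only an illustration: the data and the feature names (height, weight, age) are made up for this example and are not from the question.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 6 samples, 3 features (names are illustrative only)
feature_names = ["height", "weight", "age"]
X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.9, 0.5],
    [5.0, 10.1, 0.4],
    [6.0, 12.0, 0.6],
])

pca = PCA(n_components=2)
pca.fit(X)

# components_ has shape (n_components, n_features):
# each row holds the weight of every original feature in that component
for i, component in enumerate(pca.components_):
    weights = ", ".join(
        f"{name}={w:+.3f}" for name, w in zip(feature_names, component)
    )
    print(f"PC{i + 1}: {weights}")
```

A feature with a weight close to zero in every retained component contributes little to the reduced data, but it is still part of the mix rather than being dropped outright.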

If this is not the behavior you are looking for, then PCA dimensionality reduction is not the way to go. For some simple general feature selection methods, you can take a look at sklearn.feature_selection.
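As a rough sketch of what feature selection looks like, the snippet below uses VarianceThreshold from sklearn.feature_selection, which removes columns whose variance does not exceed a threshold. The data and the names f0, f1, f2 are hypothetical; the point is that get_support() tells you exactly which original columns were kept and which were discarded.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data: column f0 is constant, so it carries no information
feature_names = np.array(["f0", "f1", "f2"])
X = np.array([
    [0.0, 1.0, 5.0],
    [0.0, 2.0, 3.0],
    [0.0, 3.0, 8.0],
    [0.0, 4.0, 1.0],
])

# The default threshold of 0.0 removes only zero-variance columns
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)

# Boolean mask over the original columns: True means the column was kept
mask = selector.get_support()
print("kept:", feature_names[mask])
print("discarded:", feature_names[~mask])
```

Other selectors in the same module, such as SelectKBest, expose the same get_support() method, so the pattern carries over when labels are available.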

For example, here is a basic PCA fit and transform:

# Principal Component Analysis
from numpy import array
from sklearn.decomposition import PCA
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# create the PCA instance, keeping 2 components
pca = PCA(n_components=2)
# fit on data
pca.fit(A)
# principal axes (one row per component) and the variance along each axis
print(pca.components_)
print(pca.explained_variance_)
# transform data
B = pca.transform(A)
print(B)
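The example above keeps both components, so nothing is actually reduced. A minimal sketch of a real reduction, assuming the same 3x2 matrix, is to ask for a single component and then check how much variance it retains with explained_variance_ratio_:

```python
import numpy as np
from sklearn.decomposition import PCA

# Same matrix as in the example above
A = np.array([[1, 2], [3, 4], [5, 6]])

# Keep only the first principal component
pca = PCA(n_components=1)
B = pca.fit_transform(A)

# Reduced data has one column instead of two
print(B.shape)

# Fraction of the total variance captured by the retained component
print(pca.explained_variance_ratio_)

# Map the reduced data back to the original 2-D space
A_restored = pca.inverse_transform(B)
print(A_restored)
```

Because the three rows of this particular matrix lie on a straight line, one component captures all of the variance and inverse_transform recovers the original matrix. On real data the ratio is usually below 1, and the reconstruction is only approximate.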

Hope this answer helps.
