Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (19k points)

I am building a model for binary classification problem where each of my data points is of 300 dimensions (I am using 300 features). I am using a PassiveAggressiveClassifier from sklearn. The model is performing really well.

I wish to plot the decision boundary of the model. How can I do so?

To get a sense of the data, I am plotting it in 2D using TSNE. I reduced the dimensions of the data in 2 steps - from 300 to 50, then from 50 to 2 (this is a common recommendation). Below is the code snippet for the same :

from sklearn.manifold import TSNE

from sklearn.decomposition import TruncatedSVD

X_Train_reduced = TruncatedSVD(n_components=50, random_state=0).fit_transform(X_train)

X_Train_embedded = TSNE(n_components=2, perplexity=40, verbose=2).fit_transform(X_Train_reduced)

#some convert lists of lists to 2 dataframes (df_train_neg, df_train_pos) depending on the label - 

#plot the negative points and positive points

scatter(df_train_neg.val1, df_train_neg.val2, marker='o', c='red')

scatter(df_train_pos.val1, df_train_pos.val2, marker='x', c='green')

image

I get a decent graph.

Is there a way that I can add a decision boundary to this plot which represents the actual decision boundary of my model in the 300 dim space?

1 Answer

0 votes
by (33.1k points)

For this problem, You can use scikit learn’s KNeighborsClassifier.

K Nearest Neighbors: 

KNN is a non-parametric, lazy learning algorithm. Its purpose is to use a database in which the data points are separated into several classes to predict the classification of a new sample point.

For example:

import numpy as np, matplotlib.pyplot as plt

from sklearn.neighbors.classification import KNeighborsClassifier

from sklearn.datasets.base import load_iris

from sklearn.manifold.t_sne import TSNE

from sklearn.linear_model.logistic import LogisticRegression

# replace the below by your data and model

iris = load_iris()

X,y = iris.data, iris.target

X_Train_embedded = TSNE(n_components=2).fit_transform(X)

print X_Train_embedded.shape

model = LogisticRegression().fit(X,y)

y_predicted = model.predict(X)

# replace the above by your data and model

# create meshgrid

resolution = 100 # 100x100 background pixels

X2d_xmin, X2d_xmax = np.min(X_Train_embedded[:,0]), np.max(X_Train_embedded[:,0])

X2d_ymin, X2d_ymax = np.min(X_Train_embedded[:,1]), np.max(X_Train_embedded[:,1])

xx, yy = np.meshgrid(np.linspace(X2d_xmin, X2d_xmax, resolution), np.linspace(X2d_ymin, X2d_ymax, resolution))

# approximate Voronoi tesselation on resolution x resolution grid using 1-NN

background_model = KNeighborsClassifier(n_neighbors=1).fit(X_Train_embedded, y_predicted) 

voronoiBackground = background_model.predict(np.c_[xx.ravel(), yy.ravel()])

voronoiBackground = voronoiBackground.reshape((resolution, resolution))

#plot

plt.contourf(xx, yy, voronoiBackground)

plt.scatter(X_Train_embedded[:,0], X_Train_embedded[:,1], c=y)

plt.show()

Hope this answer helps.

If you wish to learn more about Scikit Learn then visit this Scikit Learn Tutorial.

Browse Categories

...