0 votes
in Data Science by (17.6k points)

Is it possible to convert a string vector into an indexed one using numpy?

Suppose I have an array of strings like ['ABC', 'DEF', 'GHI', 'DEF', 'ABC']. I want to convert it to an array of integers like [0, 1, 2, 1, 0]. Is this possible using numpy? I know that Pandas has a Series class that can do this, courtesy of this answer. Is there something similar in numpy?

Edit: np.unique() only returns the unique values of the array. What I'm trying to do is convert the labels in the Iris dataset to indices: 0 for Iris-setosa, 1 for Iris-versicolor and 2 for Iris-virginica. Is there a way to do this using numpy?

1 Answer

0 votes
by (41.4k points)

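You can use numpy.unique with the parameter return_inverse=True. Along with the sorted unique values, it returns, for each element of the input, the index of that element in the unique array.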

For example, to factorize the values:

import numpy as np

L = ['ABC', 'DEF', 'GHI', 'DEF', 'ABC']
print(np.unique(L, return_inverse=True)[1])
# [0 1 2 1 0]
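For the Iris labels mentioned in the edit, the same call also gives you the lookup table from index back to label. The following is a minimal sketch with a small hand-written label array (the `labels` values are made up for illustration, not read from the actual dataset file). Because np.unique sorts the unique values, Iris-setosa, Iris-versicolor and Iris-virginica end up as 0, 1 and 2.

```python
import numpy as np

# Hypothetical sample of label strings; in practice these would come
# from the label column of the Iris dataset.
labels = np.array([
    "Iris-virginica", "Iris-setosa", "Iris-versicolor",
    "Iris-setosa", "Iris-virginica",
])

# classes: sorted unique labels
# indices: for each entry of labels, its position in classes
classes, indices = np.unique(labels, return_inverse=True)

print(classes)
print(indices)

# Indexing classes with indices reconstructs the original labels.
restored = classes[indices]
print((restored == labels).all())
```

Keeping `classes` around is useful if you later need to translate predicted indices back into label strings.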

If the input is a list or an array, pandas factorize also works:

import pandas as pd

print(pd.factorize(L)[0])
# [0 1 2 1 0]
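Note that the two approaches agree here only because the example list happens to introduce its values in sorted order. np.unique numbers the labels in sorted order, while pd.factorize (with its default sort=False) numbers them in order of first appearance. If you want first-appearance numbering without pandas, one possible sketch is to combine return_index and return_inverse; the variable names below are illustrative, not part of any API.

```python
import numpy as np

labels = np.array(["GHI", "ABC", "DEF", "ABC", "GHI"])

# uniques: sorted unique labels
# first_idx: position of the first occurrence of each unique label
# inverse: index into uniques for every element of labels
uniques, first_idx, inverse = np.unique(
    labels, return_index=True, return_inverse=True
)

# Order the unique labels by where they first appear in the input.
order = np.argsort(first_idx)

# rank[i] is the first-appearance rank of uniques[i].
rank = np.empty_like(order)
rank[order] = np.arange(len(order))

codes = rank[inverse]               # first-appearance codes
uniques_in_order = uniques[order]   # labels in first-appearance order

print(inverse)           # sorted-order codes from np.unique
print(codes)             # first-appearance codes
print(uniques_in_order)
```

Either numbering is a valid encoding; what matters is keeping the matching array of unique labels so the codes can be mapped back.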
