
I am exploring a dataset in CSV format with rows like the following:

example 1, class 1
example 2, class 1, class 2
example 3, class 2,
example 4, class 1, class 2, class 4

As you can see, each example has a variable number of classes. Is there a way (using numpy or pandas) to transform this data so that each row holds exactly one class, like the following?

example 1, class 1
example 2, class 1
example 2, class 2
example 3, class 2
example 4, class 1
example 4, class 2
example 4, class 4

I need this format so the data can be fed easily into neural network models. I have tried several approaches in pandas, but none has worked so far.

1 Answer


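You can do this with plain Python string manipulation and a list comprehension. Each line is split on commas and every field is stripped of surrounding whitespace; the first field is the example, and each remaining non-empty field becomes its own (example, class) row. The "if y" condition drops the empty field produced by a trailing comma, as in "example 3, class 2,".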

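import pandas as pd

def m(line):
    # Split a line on commas and strip whitespace from every field.
    return [field.strip() for field in line.split(',')]

with open('test.csv') as f:
    # x is the example; each non-empty y in ys becomes a separate row.
    df = pd.DataFrame(
        [[x, y] for x, *ys in map(m, f) for y in ys if y],
        columns=['Example', 'Class']
    )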

df

     Example    Class
0  example 1  class 1
1  example 2  class 1
2  example 2  class 2
3  example 3  class 2
4  example 4  class 1
5  example 4  class 2
6  example 4  class 4
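If you would rather stay within pandas, one possible alternative (not part of the original answer) is Series.str.split followed by DataFrame.explode, which requires pandas 0.25 or later. This is a minimal sketch: it assumes the file content is already in a string named raw (a hypothetical name; with a real file you could use open('test.csv').read() instead), and the sample rows are the ones from the question.

```python
import pandas as pd

# Sample rows from the question; "raw" is a hypothetical stand-in for the file content.
raw = """example 1, class 1
example 2, class 1, class 2
example 3, class 2,
example 4, class 1, class 2, class 4
"""

# One string per non-blank line.
lines = pd.Series(raw.splitlines())
lines = lines[lines.str.strip() != ""]

# Split each line into a list of fields: first field is the example, the rest are classes.
parts = lines.str.split(",")
df = pd.DataFrame({
    "Example": parts.str[0].str.strip(),
    "Class": parts.str[1:],
})

# explode() gives every class in the list its own row, repeating the Example value.
df = df.explode("Class")
df["Class"] = df["Class"].str.strip()

# Drop empty strings (from trailing commas) and NaN (from lines with no classes).
df = df[df["Class"].fillna("") != ""].reset_index(drop=True)

print(df)
```

Because read_csv expects a fixed number of columns, this sketch splits the lines itself rather than parsing the ragged rows with read_csv. The final filter plays the same role as the "if y" check in the comprehension above.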
