in Data Science by (17.6k points)

I have been working with a data frame in which each record has useful information inside square brackets and non-useful information outside them.

Sample Data frame:

 Record        Data

      1          Rohan is [age:10] with [height:130 cm].

      2          Girish is [age:12] with [height:140 cm].

      3          Both kids live in [location:Punjab] and [location:Delhi].

      4          They love to play [Sport:Cricket] and [Sport:Football].

Expected Output:

 Record        Data

      1          [age:10],[height:130 cm]

      2          [age:12],[height:140 cm]

      3          [location:Punjab],[location:Delhi]

      4          [Sport:Cricket],[Sport:Football]

I have been trying this but cannot get the desired output.

df['b'] = df['Record'].str.findall('([[][a-z \s]+[]])', expand=False).str.strip()

print(df['b'])

That doesn't seem to work.

I am new to Python.

1 Answer

by (41.4k points)

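Apply findall to the Data column (your attempt uses Record, which holds the row numbers rather than the text), then join the matches into a single string. The pattern \[.*?\] matches each bracketed chunk non-greedily, so it stops at the first closing bracket. Note also that str.findall does not accept an expand argument, and str.strip does not work on the lists that findall returns: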

df['b'] = df['Data'].str.findall(r'(\[.*?\])').str.join(', ')

print(df)

   Record                                               Data  \

0       1            Rohan is [age:10] with [height:130 cm].   

1       2           Girish is [age:12] with [height:140 cm].   

2       3   Both kids live in [location:Punjab] and [Delhi].   

3       4  They love to play [Sport:Cricket] and [Sport:F...   

                                   b  

0          [age:10], [height:130 cm]  

1          [age:12], [height:140 cm]  

2         [location:Punjab], [Delhi]  

3  [Sport:Cricket], [Sport:Football] 
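
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the data frame is built exactly from the sample rows in the question, and it joins with ',' (no space) so that the result matches the expected output in the question character for character. The variable name pattern is only for illustration, and the capturing parentheses from the one-liner above are dropped because findall returns the whole match when the pattern has no groups.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Record": [1, 2, 3, 4],
    "Data": [
        "Rohan is [age:10] with [height:130 cm].",
        "Girish is [age:12] with [height:140 cm].",
        "Both kids live in [location:Punjab] and [location:Delhi].",
        "They love to play [Sport:Cricket] and [Sport:Football].",
    ],
})

# \[ and \] match literal brackets; .*? is non-greedy, so each match
# stops at the first closing bracket instead of running to the last one.
pattern = r"\[.*?\]"

# findall returns a list of matches for each row;
# join collapses that list into one comma-separated string.
df["b"] = df["Data"].str.findall(pattern).str.join(",")

print(df["b"].tolist())
```

If you prefer a space after each comma, as in the output shown above, pass ', ' to join instead.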
