
I am working with a data frame in which each record has useful information inside square brackets and non-useful information outside them.

Sample Data frame:

 Record        Data
      1          Rohan is [age:10] with [height:130 cm].
      2          Girish is [age:12] with [height:140 cm].
      3          Both kids live in [location:Punjab] and [location:Delhi].
      4          They love to play [Sport:Cricket] and [Sport:Football].

Expected Output:

 Record        Data
      1          [age:10],[height:130 cm]
      2          [age:12],[height:140 cm]
      3          [location:Punjab],[location:Delhi]
      4          [Sport:Cricket],[Sport:Football]

I have been trying this but cannot get the desired output.

df['b'] = df['Record'].str.findall('([[][a-z \s]+[]])', expand=False).str.strip()

print(df['b'])

That doesn't seem to work.

I am new to Python.

1 Answer


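Your attempt fails for a few reasons: it targets the Record column instead of Data, str.findall does not accept an expand argument, the character class [a-z \s] excludes the digits, colons, and uppercase letters inside your brackets, and str.strip works on strings rather than on the lists that findall returns. Use str.findall with a non-greedy pattern to collect every bracketed substring, then str.join to combine each row's matches into one string: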

df['b'] = df['Data'].str.findall(r'\[.*?\]').str.join(', ')

print(df)

   Record                                               Data  \
0       1            Rohan is [age:10] with [height:130 cm].
1       2           Girish is [age:12] with [height:140 cm].
2       3   Both kids live in [location:Punjab] and [Delhi].
3       4  They love to play [Sport:Cricket] and [Sport:F...

                                   b
0          [age:10], [height:130 cm]
1          [age:12], [height:140 cm]
2         [location:Punjab], [Delhi]
3  [Sport:Cricket], [Sport:Football]
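
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample data from the question and applies the same findall-then-join approach. The DataFrame construction and the variable name pattern are my own additions for illustration. I also join with ',' (no space) here, which is an assumption made only so the result matches the expected output in the question exactly.

```python
import pandas as pd

# Sample data copied from the question (constructed by hand for this sketch).
df = pd.DataFrame({
    "Record": [1, 2, 3, 4],
    "Data": [
        "Rohan is [age:10] with [height:130 cm].",
        "Girish is [age:12] with [height:140 cm].",
        "Both kids live in [location:Punjab] and [location:Delhi].",
        "They love to play [Sport:Cricket] and [Sport:Football].",
    ],
})

# \[ and \] match literal brackets; .*? matches as few characters as
# possible, so each bracketed tag is captured as a separate match.
pattern = r"\[.*?\]"

# findall returns a list of matches per row; join collapses each list
# into a single string. The "," separator is an assumption chosen to
# match the question's expected output.
df["b"] = df["Data"].str.findall(pattern).str.join(",")

print(df["b"].tolist())
```

The non-greedy .*? matters here: with a greedy .* the pattern would run from the first opening bracket to the last closing bracket in the row and return one long match instead of two.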
