Back

Explore Courses Blog Tutorials Interview Questions
0 votes
1 view
in Data Science by (17.6k points)

Im new in pandas so would like to ask some help according to excel file. Here I have some sheet with column 1:

Index   Column1

1       PF7293 

2       NodeB Name=SN5208, LogicRNCID=106

3       KL5083

4       Label=DL7765A3U-2, CellID=28643, LogicRNCID=201

and I wanna create another column2 that should have some word from column1 and look like that:

Index Column2

1     PF7293 

2     SN5208

3     KL5083

4     DL7765

in excel we used MID. I would like to do the same using pandas. Thank you!

Question 2

New sheet looks like that:

Column1    Column2 

KL7110     BTS works 

KS5007     BSS works 

KL5066     Planned works 

KL5147     Planned works 

KL5066     Unplanned work 

KL5077     Power work

KL5077     Power work 

AN9045     MW work 

I wanna delete same value from Column 2 for one value in column1. For example here is 2 KL5077 in column1 with same value in column2 I would Like to delete one of them. And second problem here is 2 KL5066 in Column1 with different value in Column2 and in this case I would like to put values in Column2 together like "Planned work/Unplanned work". Hope I ve explained well))

1 Answer

0 votes
by (41.4k points)

We will use  Series.str.extract.For each subject string in the Series, extract groups from the first match of regular expression 

df['Column2'] = df['Column1'].str.extract(r'([A-Z]{2}\d{4})')

Where the regex pattern here can be though of as "2 uppercase letters" followed by "4 digits"

Output:

   Index                                          Column1 Column2

0      1                                           PF7293  PF7293

1      2                NodeB Name=SN5208, LogicRNCID=106  SN5208

2      3                                           KL5083  KL5083

3      4  Label=DL7765A3U-2, CellID=28643, LogicRNCID=201  DL7765

Answer for Question 2:

1) To drop the duplicate rows use:

df.drop_duplicates(subset=['Column1', 'Column2'], inplace=True)

2) To join the multiple 'Column2' values use:

df_new = df.groupby('Column1')['Column2'].apply('/'.join).reset_index()

Output:

  Column1                       Column2

0  AN9045                       MW work

1  KL5066  Planned works/Unplanned work

2  KL5077                    Power work

3  KL5147                 Planned works

4  KL7110                     BTS works

5  KS5007                     BSS works

Browse Categories

...