
I'm struggling with slicing. I thought it was generally easy and that I understood it, but in the situation below my ideas don't work.

Situation: In one of the columns of my DataFrame, I want to remove, in every row, a string that sometimes occurs and sometimes doesn't.

The problem looks like this:

1. I don't know the exact position where this string starts (it can be different in each row).

2. The string varies from row to row; however, it always starts with the same prefix, let's say: "¯main_".

3. After "¯main_" there are usually some digits (they vary), but their length is always the same (9 digits).

4. I have already split the data and have around 40 columns (each with a similar problem). That's why I'm looking for a more efficient solution than splitting, generating ~40 more columns and then dropping them.

5. Sometimes the "¯main_" string is followed by some additional text that I'd like to keep in the same column.
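Points 2, 3 and 5 together describe a fixed-shape pattern: the prefix "¯main_" followed by exactly nine digits, possibly with more text after it. A minimal sketch, assuming the digits are always ASCII 0-9 and that every occurrence should be removed, expresses this as a regular expression with Python's standard `re` module. The helper name `strip_main_tag` is hypothetical.

```python
import re

# "¯main_" followed by exactly nine digits (assumption: ASCII 0-9 only)
MAIN_TAG = re.compile(r"¯main_[0-9]{9}")


def strip_main_tag(value: str) -> str:
    """Remove every "¯main_" + 9-digit tag, wherever it starts.

    Text before and after the tag is kept, and strings without a tag
    are returned unchanged, so no -1 position check is needed.
    """
    return MAIN_TAG.sub("", value)


if __name__ == "__main__":
    column1 = ["A1-19", "B2-52", "C3-1245¯main_123456789",
               "D4", "Z89028", "F7¯main_123456789,Z241"]
    print([strip_main_tag(v) for v in column1])
    # ['A1-19', 'B2-52', 'C3-1245', 'D4', 'Z89028', 'F7,Z241']
```

In pandas, the same pattern string can be passed to `Series.str.replace(pattern, "", regex=True)`, which applies it to a whole column at once instead of row by row.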

Example:

Column1
A1-19
B2-52
C3-1245¯main_123456789
D4
Z89028
F7¯main_123456789,Z241

Looking for a result like this:

Column1
A1-19
B2-52
C3-1245
D4
Z89028
F7,Z241

The best solution I've come up with so far:

a = test.find("¯")
b = a + 14
df[0].str.slice(start=a, stop=b)

But:

1. It doesn't work properly.

2. I'm also aware that test.find() returns -1 when it doesn't find the character, and I don't know how to handle that. By writing a loop? I believe a better (more efficient) solution exists, but after a few hours of searching for it, I decided to ask for help.
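The attempt above has a few separate problems: `test.find("¯")` returns one position for one string, so the same `a` would be applied to every row; `str.slice(start=a, stop=b)` keeps the characters between `a` and `b` instead of removing them; and `a + 14` is one character short, because "¯main_" (6 characters) plus 9 digits is 15 characters. A minimal per-string sketch, assuming the tag length is always 15 and using a hypothetical helper `remove_tag`, shows the -1 guard and the difference between extracting and removing a slice:

```python
TAG_LENGTH = 15  # len("¯main_") == 6, plus 9 digits (assumed fixed)


def remove_tag(s: str, marker: str = "¯") -> str:
    """Cut TAG_LENGTH characters, starting at the first `marker`, out of `s`."""
    a = s.find(marker)
    if a == -1:
        # find() returns -1 when the marker is absent; slicing with -1
        # would count from the end of the string, so return s untouched
        return s
    b = a + TAG_LENGTH
    # s[a:b] would *extract* the tag; joining the parts around it removes it
    return s[:a] + s[b:]


if __name__ == "__main__":
    s = "F7¯main_123456789,Z241"
    a = s.find("¯")
    print(s[a:a + TAG_LENGTH])  # ¯main_123456789  (what a slice keeps)
    print(remove_tag(s))        # F7,Z241          (tag removed)
    print(remove_tag("D4"))     # D4               (no marker, unchanged)
```

This still runs once per value; for whole columns, a regex replacement is usually simpler because it needs no position arithmetic at all.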

Answer:

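1. Loop over all columns.

2. In each value, find the position of "¯" (missing values are skipped).

3. Append the text before the tag, joined with the text after it, to a helper list; values without "¯" are appended unchanged.

4. Finally, assign the list back to the column.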

print(df)
                   Column1
0                      NaN
1                    B2-52
2  C3-1245 ¯main_123456789
3                       D4
4                   Z89028
5  F7 ¯main_123456789,Z241

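for c in df.columns:
    out = []
    for x in df[c]:
        # x == x is False only for NaN, so missing values pass through unchanged
        if x == x:
            p = x.find('¯')
            # find() returns -1 when '¯' is not in the string
            if p != -1:
                # drop '¯main_' (6 characters) plus the 9 digits: 15 characters in total
                out.append(x[:p] + x[p+15:])
            else:
                out.append(x)
        else:
            out.append(x)
    df[c] = out

print(df)
    Column1
0       NaN
1     B2-52
2  C3-1245
3        D4
4    Z89028
5  F7 ,Z241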
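With ~40 columns, a vectorized alternative to the row loop above (a different technique, not part of the answer) is `DataFrame.replace` with `regex=True`: when the replacement value is a string, it substitutes each regex match inside string cells across every column in one call and leaves NaN alone. A minimal sketch, assuming the tag is always "¯main_" plus exactly nine digits; the `Column2` values are hypothetical sample data added only to show more than one column:

```python
import numpy as np
import pandas as pd

# "¯main_" followed by exactly nine digits (assumed fixed length)
PATTERN = r"¯main_[0-9]{9}"

df = pd.DataFrame({
    "Column1": [np.nan, "B2-52", "C3-1245¯main_123456789",
                "D4", "Z89028", "F7¯main_123456789,Z241"],
    # hypothetical second column, for illustration only
    "Column2": ["X1¯main_987654321", "Y2", np.nan,
                "W4", "V5¯main_000000001,Q9", "U6"],
})

# regex=True: every match inside a string cell is replaced with "",
# in all columns at once; non-string cells such as NaN are left as-is
cleaned = df.replace(PATTERN, "", regex=True)
print(cleaned)
```

To limit the cleanup to some columns, the same call can be applied to a subset, for example `df[cols] = df[cols].replace(PATTERN, "", regex=True)`, where `cols` is a hypothetical list of column names.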
