Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

I have a column in my CSV file in the following way

4048.187796

4254.6215672333340-0-0-

4229.9155995666671-0-0-

4427.0494321833340-0-0-

4303.428593050-0-0-

4256.6235064166670-0-0-

4132.5399525833330-0-0-

4263.5820142833341-0-0-

4320.6955591833340-0-0-

4342.1270119333330-0-0-

4447.8283416833340-0-0-

4409.2305202500010-0-0-

4280.650570850-1-0-

4283.5942898166680-0-0-

4341.1896358666670-0-0-

4263.1282187000010-0-0-

4222.3119095333330-0-0-

4314.9844073333331-0-0-

The format of the value is some fraction value + ((1|0)-.*)? Some lines will only have a fraction value. E,g. the 1st line. I want to split this into two columns as follows

4048.187796, 

4254.621567233334, 0-0-0-

4229.915599566667, 1-0-0-

4427.049432183334, 0-0-0-

4303.42859305, 0-0-0-

4256.623506416667, 0-0-0-

4132.539952583333, 0-0-0-

4263.582014283334, 1-0-0-

4320.695559183334, 0-0-0-

4342.127011933333, 0-0-0-

4447.828341683334, 0-0-0-

4409.230520250001, 0-0-0-

4280.65057085, 0-1-0-

4283.594289816668, 0-0-0-

4341.189635866667, 0-0-0-

4263.128218700001, 0-0-0-

4222.311909533333, 0-0-0-

4314.984407333333, 1-0-0-

I can do this by reading line by line and then manipulate each value by finding the index of '-' and substring that by index - 2. But as I have several files and each file has more than 1000 lines I don't want to do that. Is there a way for me to do this directly using panda and slice functions?

I tried df['new_col'] = df['last'].str.slice But Ican't give a fix value to slice start index as it changes from row to row

1 Answer

0 votes
by (36.8k points)

Try regular expression ^([0-9.]+)((?:[01]-)*)$ + str.extract:

df.last.str.extract('^([0-9.]+)((?:[01]-)*)$')

#                    0       1

#0         4048.187796        

#1   4254.621567233334  0-0-0-

#2   4229.915599566667  1-0-0-

#3   4427.049432183334  0-0-0-

#4       4303.42859305  0-0-0-

#5   4256.623506416667  0-0-0-

#6   4132.539952583333  0-0-0-

#7   4263.582014283334  1-0-0-

#8   4320.695559183334  0-0-0-

#9   4342.127011933333  0-0-0-

#10  4447.828341683334  0-0-0-

#11  4409.230520250001  0-0-0-

#12      4280.65057085  0-1-0-

#13  4283.594289816668  0-0-0-

#14  4341.189635866667  0-0-0-

#15  4263.128218700001  0-0-0-

#16  4222.311909533333  0-0-0-

#17  4314.984407333333  1-0-0-

 If you are a beginner and want to know more about Data Science the do check out the Data Science course

Browse Categories

...