Conditional skip of split of new lines

Question

asked Jul 24, 2019 in Python by Eresh Kumar (45.3k points)

My data looks like this:

04/07/16, 12:51 AM - User1: Hi
04/07/16, 8:19 PM - User2: Here’s a link for you
https://www.abcd.com/folder/1SyuIUCa10tM37lT0F8Y3D
04/07/16, 8:29 PM - User2: Thanks

Using the code below, I am able to read each line of the file as a separate item:

data = []
for line in open('/content/drive/My Drive/sample.txt'):
    items = line.rstrip('\r\n').split('\t')    # strip new-line characters and split on column delimiter
    items = [item.strip() for item in items]   # strip extra whitespace off data items
    data.append(items)

However, I do not want to split where a newline character is followed by a link. For example, lines 2 and 3 of the sample above are one single message, but they get split up because of the newline character.

CRLF

Is there a way to avoid splitting when a newline character is followed by http?

1 Answer

Shubham Rana · Answer 1 · 2019-07-24T18:35:41+0000

It can probably be optimised, but it works:

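data = []
with open('C:/Users/kavanaghal/python/sample.txt', 'r', encoding='utf-8') as f:
    prev = f.readline().strip()
    while True:
        nxt = f.readline().strip()
        if 'http' in nxt:
            # A link line continues the previous message, so join the two.
            data.append(prev + ": " + nxt)
            prev = f.readline().strip()  # strip here too, so no trailing newline is kept
            continue
        if prev:  # skip the empty string left over when the file ends right after a link
            data.append(prev)
        prev = nxt
        if not nxt:  # end of file (note: a blank line also strips to '' and ends the loop)
            break
print(data)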
>> ['04/07/16, 12:51 AM - User1: Hi',
    '04/07/16, 8:19 PM - User2: Here’s a link for you: https://www.abcd.com/folder/1SyuIUCa10tM37lT0F8Y3D',
    '04/07/16, 8:29 PM - User2: Thanks']
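
If you would rather not manage `readline()` calls by hand, the same idea can be written as a single pass over the lines: whenever a line starts with `http`, glue it onto the message collected just before it. This is only a sketch under a few assumptions: the function name `merge_link_lines` and the `sep` parameter are made up for illustration, a continuation line is assumed to begin with `http`, and blank lines are skipped rather than treated as the end of the data.

```python
def merge_link_lines(lines, sep=" "):
    """Join any line that starts with 'http' onto the message before it."""
    messages = []
    for raw in lines:
        line = raw.strip()               # drops '\r\n' as well as stray spaces
        if not line:
            continue                     # ignore blank lines
        if line.startswith("http") and messages:
            messages[-1] += sep + line   # continuation of the previous message
        else:
            messages.append(line)        # start of a new message
    return messages


# Sample lines copied from the question, with CRLF line endings.
sample = [
    "04/07/16, 12:51 AM - User1: Hi\r\n",
    "04/07/16, 8:19 PM - User2: Here’s a link for you\r\n",
    "https://www.abcd.com/folder/1SyuIUCa10tM37lT0F8Y3D\r\n",
    "04/07/16, 8:29 PM - User2: Thanks\r\n",
]
messages = merge_link_lines(sample)
print(len(messages))  # 3
```

An open file object can be passed in place of the list, since iterating over a file yields its lines one at a time. If the split on `'\t'` from the question is still needed, apply it to each merged message afterwards.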
