My data looks like this:

04/07/16, 12:51 AM - User1: Hi

04/07/16, 8:19 PM - User2: Here’s a link for you

https://www.abcd.com/folder/1SyuIUCa10tM37lT0F8Y3D

04/07/16, 8:29 PM - User2: Thanks

Using the code below, I can read the file so that each line becomes a separate entry:

data = []
with open('/content/drive/My Drive/sample.txt') as f:
    for line in f:
        items = line.rstrip('\r\n').split('\t')   # strip new-line characters and split on column delimiter
        items = [item.strip() for item in items]  # strip extra whitespace off data items
        data.append(items)

However, I do not want to split where a newline character is followed by a link. For example, User2's "Here’s a link for you" and the URL on the next line are a single message, but they end up as two entries because of the newline character between them.

Is there a way to avoid splitting when a newline character is followed by http?

1 Answer

It can probably be optimised, but it works:

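data = []
with open('C:/Users/kavanaghal/python/sample.txt', 'r', encoding='utf-8') as f:
    prev = f.readline().strip()              # first message
    while prev:                              # stops at end of file (or at a blank line)
        nxt = f.readline().strip()           # look ahead one line
        if 'http' in nxt:
            # the next line is a link, so join it to the current message
            data.append(prev + ": " + nxt)
            prev = f.readline().strip()      # strip here too, or the newline stays attached
            continue
        data.append(prev)
        prev = nxt
print(data)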

>> ['04/07/16, 12:51 AM - User1: Hi',
    '04/07/16, 8:19 PM - User2: Here’s a link for you: https://www.abcd.com/folder/1SyuIUCa10tM37lT0F8Y3D',
    '04/07/16, 8:29 PM - User2: Thanks']
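
If you want something a bit shorter, here is a minimal sketch of the same idea in a single pass. It assumes a link line always starts with http (so it uses startswith rather than in, which keeps ordinary messages that merely contain a link from being merged), and it skips blank lines instead of stopping at them. The helper name merge_link_lines is just made up for illustration, and the sample list stands in for the file contents from the question:

```python
def merge_link_lines(lines):
    """Join every line that starts with 'http' onto the message before it."""
    messages = []
    for raw in lines:
        line = raw.strip()          # drops '\r\n' and surrounding whitespace
        if not line:
            continue                # skip blank lines rather than stopping
        if line.startswith('http') and messages:
            # assumption: a line beginning with 'http' continues the previous message
            messages[-1] += ": " + line
        else:
            messages.append(line)
    return messages


# Stand-in for the contents of sample.txt, with CRLF line endings
sample = [
    "04/07/16, 12:51 AM - User1: Hi\r\n",
    "04/07/16, 8:19 PM - User2: Here’s a link for you\r\n",
    "https://www.abcd.com/folder/1SyuIUCa10tM37lT0F8Y3D\r\n",
    "04/07/16, 8:29 PM - User2: Thanks\r\n",
]

data = merge_link_lines(sample)
for message in data:
    print(message)
```

Since the function only iterates over its argument, you should be able to pass an open file object in place of the list, e.g. inside with open(path, encoding='utf-8') as f: data = merge_link_lines(f).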
