How can I edit my normalize function so that it also removes punctuation and end-of-line characters?

Question

asked Oct 11, 2020 in Data Science by blackindya (18.4k points)

How can I edit the normalize function so that it also removes punctuation and end-of-line characters?

Here is my current code:

filename = "bible.Sentences.15.txt"
def getData(filename):
    with open(filename, 'r') as f:
        # converting to list where each element is an individual line of text file
        lines = [line.rstrip() for line in f]
    return lines
filename = "bibleSentences.txt"
getData(filename)

def normalize(filename):
    # converting all letters to lowercase
    lowercase_lines = [x.lower() for x in getData(filename)]
    print(lowercase_lines)
    return lowercase_lines
normalize(filename)

1 Answer

supriya · Answer 1 · 2020-10-11T05:28:00+0000

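Here is the solution code. Note that normalize now takes the list of lines (for example, the result of getData) instead of the filename: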

import re
...
def normalize(data):
    # converting all letters to lowercase
    lowercase_lines = [x.lower() for x in data]
    # remove every character that is not a word character, a space, or a tab
    # (this drops punctuation and newlines; note that \w keeps underscores)
    stripped_lines = [re.sub(r"[^\w \t]+", "", x) for x in lowercase_lines]
    print(stripped_lines)
    return stripped_lines
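
If underscores should also count as punctuation, one possible alternative is to delete characters with a translation table instead of a regex. The following is only a sketch: it assumes that "punctuation" means the ASCII characters in string.punctuation, and the table name PUNCT_AND_EOL and the sample lines are made up for illustration.

```python
import string

# Hypothetical helper table: deletes ASCII punctuation (underscore included)
# together with carriage returns and newlines.
PUNCT_AND_EOL = str.maketrans("", "", string.punctuation + "\r\n")


def normalize(data):
    """Lowercase each line and drop punctuation and end-of-line characters."""
    return [line.lower().translate(PUNCT_AND_EOL) for line in data]


# Made-up sample lines, standing in for the contents of the text file.
sample = ["In the beginning, God created...\n", "snake_case; it's here!\r\n"]
print(normalize(sample))
```

Non-ASCII punctuation such as curly quotes is not in string.punctuation, so this version leaves it in place, whereas the regex approach would remove it.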
