
I have a folder full of text reports, and I want to load their contents into a single list variable.

Each element of the list should be the full content of one document.

So far I have this code, but it isn't working as intended:

dir = os.path.join(current_working_directory, 'FolderName')
file_list = glob.glob(dir + '/*.txt')

corpus = [] #-->my list variable

for file_path in file_list:
    text_file = open(file_path, 'r')
    corpus.append(text_file.readlines())
    text_file.close()

Is there a better way to do this?

1 Answer


You simply need to read() each file and append the result to your corpus list, as follows:

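import glob
import os

# Collect every .txt file inside FolderName under the current working directory
file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

corpus = []

for file_path in file_list:
    # The with block closes the file automatically, even if reading fails
    with open(file_path) as f_input:
        # read() returns the whole file as a single string
        corpus.append(f_input.read())

print(corpus)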

Each list entry will then be the entire contents of one text file. Note that readlines() would instead give you a list of lines for each file, so corpus would become a list of lists rather than a list of raw text strings.
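
To make the difference between read() and readlines() concrete, here is a minimal sketch. It is only an illustration, not part of the original answer: the helper name load_corpus, the file names a.txt and b.txt, and the UTF-8 encoding are all assumptions, and it swaps glob/os for pathlib as one possible alternative. It builds two small files in a temporary folder so it can be run as is.

```python
import tempfile
from pathlib import Path


def load_corpus(folder, pattern="*.txt", encoding="utf-8"):
    """Return a list with one string per matching file (hypothetical helper)."""
    # sorted() gives a predictable order; glob results are otherwise unordered.
    paths = sorted(Path(folder).glob(pattern))
    # read_text() opens, reads, and closes each file in one call.
    return [path.read_text(encoding=encoding) for path in paths]


# Demonstration in a throwaway directory with two made-up report files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first line\nsecond line\n", encoding="utf-8")
    Path(tmp, "b.txt").write_text("only line\n", encoding="utf-8")

    # One string per file, as with read() in a loop.
    corpus = load_corpus(tmp)

    # For comparison: readlines() splits the same file into a list of lines.
    with open(Path(tmp, "a.txt"), encoding="utf-8") as f_input:
        lines = f_input.readlines()

print(corpus)  # ['first line\nsecond line\n', 'only line\n']
print(lines)   # ['first line\n', 'second line\n']
```

Passing an explicit encoding is a design choice here: without it, Python falls back to a platform-dependent default, which can behave differently from one machine to another.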
