I am dealing with several large txt files, each of which has about 8,000,000 lines. A short example of the lines is:
usedfor zipper fasten_coat
usedfor zipper fasten_jacket
usedfor zipper fasten_pant
usedfor your_foot walk
atlocation camera cupboard
atlocation camera drawer
atlocation camera house
relatedto more plenty
The code to store them in a dictionary is:
import collections

dicCSK = collections.defaultdict(list)
for line in finCSK:                   # finCSK is the already-opened txt file
    line = line.strip('\n')
    try:
        r, c1, c2 = line.split(" ")
    except ValueError:
        print line                    # report the malformed line
        continue                      # and skip it, instead of reusing stale r, c1, c2
    dicCSK[c1].append(r + " " + c2)
It runs fine on the first txt file, but when it gets to the second txt file, I get a MemoryError.
I am using Windows 7 64-bit with Python 2.7 32-bit, an Intel i5 CPU, and 8 GB of memory. How can I solve this problem?
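(As a side note, I understand a 32-bit process on Windows can only address about 2 GB regardless of installed RAM; this is how the build can be double-checked, in case it matters:)

import struct
import sys

# pointer size: 4 bytes on a 32-bit build, 8 bytes on a 64-bit build
print "%d-bit interpreter" % (struct.calcsize("P") * 8)
print sys.maxsize   # 2**31 - 1 == 2147483647 on a 32-bit build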
Further explanation: I have four large files; each file contains different information about many entities. For example, I want to find all the information for cat, its parent node animal, its child node persian cat, and so on. So my program first reads all the text files into dictionaries, then scans all the dictionaries to find the information for cat, its parents, and its children, as in the sketch below.
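To make that second phase concrete, the lookup I have in mind looks roughly like this (collect_info and all_dics are just illustrative names; the real relation names for parent/child links differ in my data):

# Sketch of the lookup phase; all_dics would hold the four
# dicCSK-style dictionaries, one built per input file.
def collect_info(all_dics, entity):
    info = []
    for dic in all_dics:                  # scan every dictionary
        info.extend(dic.get(entity, []))  # all facts stored under entity
    return info

all_dics = [dicCSK]  # in reality, the four dictionaries built above
print collect_info(all_dics, "cat")          # the entity itself
print collect_info(all_dics, "animal")       # its parent node
print collect_info(all_dics, "persian_cat")  # its child node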