in Machine Learning by (15.7k points)

I'm new to machine learning, and for my first project, I'd like to write a naive Bayes spam filter. I was wondering if there are any publicly available training sets of labeled spam/not spam emails, preferably in plain text and not a dump of a relational database (unless they pretty-print those?).
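For orientation, the kind of filter described above can be sketched as a multinomial naive Bayes classifier with add-one (Laplace) smoothing. This is a minimal illustrative sketch, not a production filter. The class name `NaiveBayesSpamFilter`, the regex tokenizer, and the `"spam"`/`"ham"` labels are assumptions made for the example, and it expects at least one training message of each class before `classify` is called.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters, digits and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical example class: multinomial naive Bayes with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text: str, label: str) -> float:
        # log P(label) + sum of log P(token | label), computed in log space
        # so long messages do not underflow to zero.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in tokenize(text):
            # Add-one smoothing: unseen tokens get a small nonzero probability.
            score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        # Pick the label with the highest posterior (up to a shared constant).
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


nb = NaiveBayesSpamFilter()
nb.train("win cash now, claim your free prize", "spam")
nb.train("cheap pills, free offer, act now", "spam")
nb.train("meeting moved to tuesday, see agenda", "ham")
nb.train("can you review my draft before the meeting", "ham")
print(nb.classify("free cash prize"))  # prints "spam"
```

Working in log space and smoothing every count by one are the two details that usually trip up a first implementation. Without them, a single unseen word drives a class probability to zero, and long messages underflow.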

I know such a publicly available database exists for other kinds of text classification, specifically news article text. I just haven't been able to find the same sort of thing for emails.

1 Answer

by (33.2k points)

This dataset should help with your task: http://untroubled.org/spam/

The archive contains roughly a gigabyte of compressed spam messages collected over time. It contains only spam, so you will need to obtain non-spam (ham) email from another source to train a classifier on both classes.
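Because the archive holds only spam, building a labeled training set means pairing it with a separate folder of non-spam mail. Below is a minimal sketch under stated assumptions: the archive has been extracted so that each message is a separate raw RFC 822 file, and the ham folder uses the same layout. The function names `message_text` and `load_labeled`, and the two-directory layout, are hypothetical. Parsing uses only Python's standard `email` package.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def message_text(raw: bytes) -> str:
    """Return the subject plus any text/plain body parts of a raw message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts = [str(msg["subject"] or "")]
    for part in msg.walk():
        # Only text/plain parts are kept; HTML-only messages contribute
        # just their subject in this simplified sketch.
        if part.get_content_type() == "text/plain":
            try:
                parts.append(part.get_content())
            except (LookupError, UnicodeError):
                # Spam often declares bogus charsets; skip undecodable parts.
                continue
    return "\n".join(parts)


def load_labeled(spam_dir: str, ham_dir: str) -> list[tuple[str, str]]:
    """Label every file under spam_dir as 'spam' and every file under ham_dir as 'ham'."""
    examples = []
    for directory, label in ((spam_dir, "spam"), (ham_dir, "ham")):
        # rglob("*") walks subdirectories too, e.g. per-year or per-month folders.
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                examples.append((message_text(path.read_bytes()), label))
    return examples
```

The resulting `(text, label)` pairs can be shuffled and split into training and test sets. Keeping the spam and ham sources roughly comparable in size and time period helps avoid a classifier that learns "which archive did this come from" instead of "is this spam".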

Hope this answer helps.

Welcome to Intellipaat Community. Get your technical queries answered by top developers!