
I'm new to machine learning, and for my first project, I'd like to write a naive Bayes spam filter. I was wondering if there are any publicly available training sets of labeled spam/not spam emails, preferably in plain text and not a dump of a relational database (unless they pretty-print those?).

I know such a publicly available database exists for other kinds of text classification, specifically news article text. I just haven't been able to find the same sort of thing for emails.

1 Answer


This spam archive may help with your project: http://untroubled.org/spam/

The archive holds around a gigabyte of compressed spam messages accumulated over time. It contains spam only, so you will still need a separate source of non-spam email to train your filter.
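Once you have both spam and non-spam messages as plain text, a naive Bayes filter can be sketched in a few lines of standard-library Python. This is only a minimal illustration, not a tuned implementation: the function names (`tokenize`, `load_messages`, `train`, `classify`) and the one-message-per-file directory layout are assumptions made for the example, and it uses a simple multinomial model with add-one smoothing.

```python
import math
import re
from collections import Counter
from pathlib import Path

TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return TOKEN_RE.findall(text.lower())


def load_messages(directory):
    """Read every file under `directory` as one message.

    Assumes a hypothetical layout with one plain-text email per file.
    """
    paths = sorted(p for p in Path(directory).rglob("*") if p.is_file())
    return [p.read_text(errors="ignore") for p in paths]


def train(spam_texts, ham_texts):
    """Count tokens per class. Both lists are assumed to be non-empty."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text in spam_texts:
        counts["spam"].update(tokenize(text))
    for text in ham_texts:
        counts["ham"].update(tokenize(text))
    docs = {"spam": len(spam_texts), "ham": len(ham_texts)}
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, docs, vocab


def classify(text, model):
    """Return 'spam' or 'ham', whichever has the higher log-probability."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    scores = {}
    for label in counts:
        total_tokens = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)  # class prior
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for words
            # that were never seen in this class.
            score += math.log(
                (counts[label][tok] + 1) / (total_tokens + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)
```

With the messages unpacked into two folders (the folder names here are placeholders), usage would look like `model = train(load_messages("spam/"), load_messages("ham/"))` followed by `classify(some_email_text, model)`. Real email also has headers, encodings, and attachments that this sketch ignores.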

Hope this answer helps.
