in AI and Deep Learning by (50.2k points)

I have recently been inspired to write spam filters in JavaScript, Greasemonkey-style, for several websites I use that are prone to spam (especially in comments). When considering my options about how to go about this, I realize I have several options, each with pros/cons. My goal for this question is to expand on this list I have created, and hopefully determine the best way of client-side spam filtering with JavaScript.

As for what makes a spam filter the "best", I would say these are the criteria:

  • Most accurate

  • Least vulnerable to attacks

  • Fastest

  • Most transparent

Also, please note that I am trying to filter content that already exists on websites that aren't mine, using Greasemonkey Userscripts. In other words, I can't prevent spam; I can only filter it.

Here is my attempt, so far, to compile a list of the various methods along with their shortcomings and benefits:

Rule-based filters:

What it does: "Grades" a message by assigning a point value to different criteria (e.g. all uppercase, all non-alphanumeric, etc.). Depending on the score, the message is discarded or kept.

Pros:
  • Easy to implement

  • Mostly transparent

Cons:
  • Transparent: it's usually easy to reverse engineer the code to discover the rules, and thereby craft messages that won't be picked up

  • Hard to balance point values (false positives)

  • Can be slow; multiple rules have to be executed on each message, often using regular expressions

  • In a client-side environment, server interaction or user interaction is required to update the rules
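To make the approach concrete, a rule-based scorer might look like the sketch below. The rule names, weights, and threshold are illustrative assumptions, not a recommended configuration:

```javascript
// Sketch of a rule-based scorer: each matching rule adds points,
// and a message at or above the threshold is treated as spam.
// Rules and weights here are hypothetical examples.
const rules = [
  { name: "all-caps",      weight: 2, test: (m) => /^[^a-z]*$/.test(m) && /[A-Z]/.test(m) },
  { name: "many-links",    weight: 3, test: (m) => (m.match(/https?:\/\//g) || []).length >= 3 },
  { name: "repeated-char", weight: 1, test: (m) => /(.)\1{5,}/.test(m) },
];

function scoreMessage(message) {
  // Sum the weights of every rule that matches the message.
  return rules.reduce((score, rule) => score + (rule.test(message) ? rule.weight : 0), 0);
}

function isSpam(message, threshold = 3) {
  return scoreMessage(message) >= threshold;
}
```

Note how the false-positive problem shows up immediately: the threshold and weights have to be balanced by hand, and every added rule means another regex pass over each message.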

Bayesian filtering:

What it does: Analyzes word frequency (or trigram frequency) and compares it against the data it has been trained with.

Pros:
  • No need to craft rules

  • Fast (relatively)

  • Tougher to reverse engineer

Cons:
  • Requires training to be effective

  • Trained data must still be accessible to JavaScript, usually in the form of human-readable JSON, XML, or a flat file

  • Data set can get pretty large

  • Poorly designed filters are easy to confuse by padding a message with common words to lower its spamacity rating

  • Words that haven't been seen before can't be accurately classified, sometimes resulting in the entire message being misclassified

  • In a client-side environment, server interaction or user interaction is required to update the rules
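A minimal word-frequency classifier along these lines might look like the following sketch. The tokenizer, add-one smoothing, and in-memory training set are assumptions for illustration; in a real userscript the trained counts would be shipped as JSON and loaded at startup:

```javascript
// Sketch of a word-frequency naive Bayes classifier (illustrative).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// samples: array of { text, spam } pairs
function train(samples) {
  const model = { spam: {}, ham: {}, spamDocs: 0, hamDocs: 0 };
  for (const { text, spam } of samples) {
    const bucket = spam ? model.spam : model.ham;
    if (spam) model.spamDocs++; else model.hamDocs++;
    for (const word of tokenize(text)) {
      bucket[word] = (bucket[word] || 0) + 1;
    }
  }
  return model;
}

// Returns P(spam | text), computed in log space to avoid underflow.
// Add-one smoothing gives unseen words a small nonzero probability,
// which mitigates (but does not solve) the unseen-word problem above.
function spamProbability(model, text) {
  const vocabSize = new Set(
    Object.keys(model.spam).concat(Object.keys(model.ham))
  ).size;
  const total = (counts) => Object.values(counts).reduce((a, b) => a + b, 0);
  const spamTotal = total(model.spam);
  const hamTotal = total(model.ham);
  const docs = model.spamDocs + model.hamDocs;
  let logSpam = Math.log(model.spamDocs / docs);
  let logHam = Math.log(model.hamDocs / docs);
  for (const word of tokenize(text)) {
    logSpam += Math.log(((model.spam[word] || 0) + 1) / (spamTotal + vocabSize));
    logHam += Math.log(((model.ham[word] || 0) + 1) / (hamTotal + vocabSize));
  }
  return 1 / (1 + Math.exp(logHam - logSpam));
}
```

The "data set can get pretty large" problem is visible here too: the model is just word counts, so anyone who can read the script can read the training data.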

Bayesian filtering- server-side:

What it does: Applies the Bayesian filtering server-side by submitting each message to a remote server for analysis.

Pros:
  • All the benefits of regular Bayesian filtering

  • Training data is not revealed to users/reverse engineers

Cons:
  • Heavy traffic

  • Still vulnerable to uncommon words

  • Still vulnerable to adding common words to decrease spamacity

  • The service itself may be abused

  • To train the classifier, it may be desirable to allow users to submit spam samples for training; attackers may abuse this to poison the training data
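On the client, the remote check might be sketched as below. The endpoint URL and the response shape (a JSON body with a `spamProbability` field) are hypothetical; a real deployment would also need server-side rate limiting to blunt the abuse problems listed above:

```javascript
// Sketch of submitting a message to a hypothetical remote classifier.
function buildClassifyRequest(endpoint, message) {
  return {
    url: endpoint,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

// fetchImpl is injectable so the network call can be stubbed in tests.
async function classifyRemotely(message, fetchImpl = fetch) {
  const req = buildClassifyRequest("https://example.invalid/classify", message);
  const res = await fetchImpl(req.url, req);
  const { spamProbability } = await res.json(); // assumed response shape
  return spamProbability;
}
```

In a Greasemonkey script, `GM_xmlhttpRequest` would typically replace `fetch` here, since it can cross origins that the page's own fetch cannot.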

Blacklisting:
What it does: Applies a set of criteria to a message or some attribute of it. If one or more (or a specific number of) criteria match, the message is rejected. A lot like rule-based filtering, so see its description for details.

CAPTCHAs, and the like:

Not feasible for this type of application. I am trying to apply these methods to sites that already exist, using Greasemonkey; I can't start requiring CAPTCHAs in places where they didn't exist before someone installed my script.

Can anyone help me fill in the blanks? Thank you!

1 Answer

by (108k points)

There are several modern techniques for protecting web forms from spam. One of the most effective is the honeypot: add form fields that only spam bots can see and fill in, hidden from human users with CSS, so any submission that fills them in can be safely discarded. Techniques like this make it possible to build a spam-free contact form without using any CAPTCHAs.
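As an illustration of the honeypot idea, the server-side check can be as small as the sketch below; the field name `website` is a hypothetical example of a field a human never sees:

```javascript
// Honeypot check (illustrative): the "website" field is hidden from
// humans via CSS, so any non-empty value means a bot filled the form.
function isHoneypotTripped(fields) {
  // fields: plain object of submitted name -> value pairs
  return (fields["website"] || "").trim() !== "";
}
```

Note that honeypots prevent spam at submission time on forms you control, so they don't directly help the Greasemonkey filter-after-the-fact scenario in the question.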

