
I have recently been inspired to write spam filters in JavaScript, Greasemonkey-style, for several websites I use that are prone to spam (especially in comments). In considering how to go about this, I realize I have several options, each with pros and cons. My goal for this question is to expand on the list I have created, and hopefully determine the best approach to client-side spam filtering with JavaScript.

As for what makes a spam filter the "best", I would say these are the criteria:

  • Most accurate

  • Least vulnerable to attacks

  • Fastest

  • Most transparent

Also, please note that I am trying to filter content that already exists on websites that aren't mine, using Greasemonkey Userscripts. In other words, I can't prevent spam; I can only filter it.

Here is my attempt, so far, to compile a list of the various methods along with their shortcomings and benefits:

Rule-based filters:

What it does: "Grades" a message by assigning a point value to different criteria (i.e. all uppercase, all non-alphanumeric, etc.) Depending on the score, the message is discarded or kept.


Pros:

  • Easy to implement

  • Mostly transparent


Cons:

  • Transparent: it's usually easy to reverse-engineer the code to discover the rules, and thereby craft messages that won't be picked up

  • Hard to balance point values (false positives)

  • Can be slow; multiple rules have to be executed on each message, often using regular expressions

  • In a client-side environment, server interaction or user interaction is required to update the rules
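
Here is a minimal sketch of such a scorer; the specific rules, weights, and threshold are illustrative assumptions and would need tuning against real samples:

    // A rule-based scorer: each matching rule adds points, and a message
    // whose total reaches the threshold is treated as spam.
    const rules = [
      { test: (msg) => msg === msg.toUpperCase() && /[A-Z]/.test(msg), score: 3 }, // all caps
      { test: (msg) => (msg.match(/https?:\/\//g) || []).length > 2,   score: 4 }, // many links
      { test: (msg) => /\b(viagra|casino|free money)\b/i.test(msg),    score: 5 }, // spammy words
      { test: (msg) => msg.length < 10,                                score: 1 }, // very short
    ];

    const THRESHOLD = 5; // too low causes false positives

    function isSpam(message) {
      return rules.reduce((sum, r) => sum + (r.test(message) ? r.score : 0), 0) >= THRESHOLD;
    }

    // In a userscript, flagged comments could simply be hidden:
    // document.querySelectorAll('.comment').forEach((el) => {
    //   if (isSpam(el.textContent)) el.style.display = 'none';
    // });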

Bayesian filtering:

What it does: Analyzes word frequency (or trigram frequency) and compares it against the data it has been trained with. (A minimal sketch follows the pros/cons list below.)


Pros:

  • No need to craft rules

  • Fast (relatively)

  • Tougher to reverse engineer


Cons:

  • Requires training to be effective

  • Trained data must still be accessible to JavaScript, usually in the form of human-readable JSON, XML, or a flat file

  • Data set can get pretty large

  • Poorly designed filters are easy to confuse with a good helping of common words to lower the "spamacity" rating

  • Words that haven't been seen before can't be accurately classified, sometimes resulting in incorrect classification of the entire message

  • In a client-side environment, server interaction or user interaction is required to update the rules
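
Here is a minimal sketch of the word-frequency approach. The shape of the training data is an assumption; in a userscript it would likely ship as pre-trained JSON:

    // Naive Bayes with Laplace (add-one) smoothing, so words the
    // classifier has never seen don't zero out the whole product.
    const training = {
      spam: { messages: 100, totalWords: 600, words: { cheap: 40, pills: 35, click: 25 } },
      ham:  { messages: 100, totalWords: 900, words: { hello: 50, thanks: 30, article: 20 } },
    };

    function tokenize(message) {
      return message.toLowerCase().match(/[a-z']+/g) || [];
    }

    function spamProbability(message) {
      const vocabulary = new Set([
        ...Object.keys(training.spam.words),
        ...Object.keys(training.ham.words),
      ]).size;

      const total = training.spam.messages + training.ham.messages;
      let logSpam = Math.log(training.spam.messages / total); // prior P(spam)
      let logHam  = Math.log(training.ham.messages / total);  // prior P(ham)

      for (const word of tokenize(message)) {
        logSpam += Math.log(((training.spam.words[word] || 0) + 1) /
                            (training.spam.totalWords + vocabulary));
        logHam  += Math.log(((training.ham.words[word] || 0) + 1) /
                            (training.ham.totalWords + vocabulary));
      }
      // Convert the log-odds back to a probability between 0 and 1
      return 1 / (1 + Math.exp(logHam - logSpam));
    }

    // e.g. spamProbability('cheap pills, click here') -> close to 1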

Bayesian filtering- server-side:

What it does: Applies Bayesian filtering server-side by submitting each message to a remote server for analysis. (A minimal sketch follows the pros/cons list below.)


Pros:

  • All the benefits of regular Bayesian filtering

  • Training data is not revealed to users/reverse engineers


Cons:

  • Heavy traffic

  • Still vulnerable to uncommon words

  • Still vulnerable to adding common words to decrease "spamacity"

  • The service itself may be abused

  • To train the classifier, it may be desirable to allow users to submit spam samples for training; attackers may abuse this service as well
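
A minimal sketch of the remote round-trip; the endpoint URL and response shape are hypothetical. In a Greasemonkey script, GM_xmlhttpRequest would be the usual way around the same-origin policy; plain fetch() is shown for brevity:

    async function classifyRemotely(messages) {
      const response = await fetch('https://example.com/api/classify', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // Batching all of a page's comments into one request softens the
        // "heavy traffic" drawback noted above.
        body: JSON.stringify({ texts: messages }),
      });
      return response.json(); // assumed shape: [{ spam: true }, ...]
    }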


Blacklisting:

What it does: Applies a set of criteria to a message or some attribute of it. If one or more (or a specific number of) criteria match, the message is rejected. A lot like rule-based filtering, so see its description for details. (A short sketch follows below.)
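
A minimal sketch of the difference from the weighted scorer above: here a single matching criterion rejects the message outright (the criteria shown are illustrative):

    const criteria = [
      (msg) => /\[url=/i.test(msg),       // BBCode link markup
      (msg) => /(.)\1{9,}/.test(msg),     // any character repeated 10+ times
      (msg) => msg.trim().length === 0,   // empty message
    ];

    const isRejected = (msg) => criteria.some((test) => test(msg));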

CAPTCHAs, and the like:

Not feasible for this type of application. I am trying to apply these methods to sites that already exist; Greasemonkey will be used to do this, and I can't start requiring CAPTCHAs in places where they didn't exist before someone installed my script.

Can anyone help me fill in the blanks? Thank you!

1 Answer


Here are 6 modern solutions to protect web forms from spam, which include adding fields that only spam bots can see and fill in ("honeypot" fields), among others; the honeypot idea is sketched below.
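
A minimal sketch of the honeypot idea, assuming a form you control (which, as the question notes, is not the userscript scenario); the field name and form selector are hypothetical:

    const form = document.querySelector('#comment-form');

    const honeypot = document.createElement('input');
    honeypot.type = 'text';
    honeypot.name = 'website';        // a tempting name for bots
    honeypot.style.display = 'none';  // invisible to human visitors
    honeypot.autocomplete = 'off';
    form.appendChild(honeypot);

    form.addEventListener('submit', (event) => {
      // Humans never see the field, so any value in it means a bot filled it.
      if (honeypot.value !== '') event.preventDefault();
    });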

You can also refer to this link for building a spam-free contact form without using any CAPTCHAs.

