I have recently been inspired to write spam filters in JavaScript, Greasemonkey-style, for several websites I use that are prone to spam (especially in comments). There are several ways to go about this, each with pros and cons. My goal for this question is to expand on the list I have compiled below and, hopefully, determine the best way to do client-side spam filtering with JavaScript.

As for what makes a spam filter the "best", I would say these are the criteria:

  • Most accurate

  • Least vulnerable to attacks

  • Fastest

  • Most transparent

Also, please note that I am trying to filter content that already exists on websites that aren't mine, using Greasemonkey Userscripts. In other words, I can't prevent spam; I can only filter it.
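
To make the "filter, not prevent" constraint concrete, here is a minimal sketch of how a userscript might hide comments that some classifier flags. The `.comment` selector and the placeholder classifier are assumptions for illustration only; a real site needs its own selector, and any of the methods listed below could stand in for the classifier.

```javascript
// Hides every element whose text the classifier flags.
// Returns the number of elements that were hidden.
function hideSpam(elements, isSpam) {
  let hiddenCount = 0;
  for (const element of elements) {
    if (isSpam(element.textContent || "")) {
      element.style.display = "none";
      hiddenCount += 1;
    }
  }
  return hiddenCount;
}

// Only runs inside a browser page (for example, as a userscript).
if (typeof document !== "undefined") {
  // Placeholder classifier (assumption): flags one hard-coded phrase.
  const placeholderClassifier = (text) => /free money/i.test(text);
  // ".comment" is a hypothetical selector, not taken from any real site.
  hideSpam(document.querySelectorAll(".comment"), placeholderClassifier);
}
```

Hiding rather than removing the element keeps the filter reversible, which helps with the transparency criterion: a user could still reveal what was filtered.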

Here is my attempt, so far, to compile a list of the various methods along with their shortcomings and benefits:


Rule-based filters:

What it does: "Grades" a message by assigning a point value to different criteria (e.g. all uppercase or all non-alphanumeric). Depending on the score, the message is discarded or kept.

Benefits:

  • Easy to implement

  • Mostly transparent

Shortcomings:

  • Transparent: it's usually easy to reverse engineer the code to discover the rules, and thereby craft messages that won't be picked up

  • Hard to balance point values (false positives)

  • Can be slow; multiple rules have to be executed on each message, often using regular expressions

  • In a client-side environment, server interaction or user interaction is required to update the rules
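
A rough sketch of the scoring idea described above. The rule set, the point values, and the threshold are all made up for illustration (the link rule in particular is my own addition); balancing them is exactly the hard part mentioned in the shortcomings.

```javascript
// Hypothetical rules and weights, chosen only for illustration.
const RULES = [
  {
    name: "mostly uppercase",
    points: 2,
    test: (text) => {
      const letters = text.replace(/[^a-z]/gi, "");
      if (letters.length < 5) return false; // too short to judge
      const upper = letters.replace(/[^A-Z]/g, "").length;
      return upper / letters.length > 0.8;
    },
  },
  {
    name: "mostly non-alphanumeric",
    points: 3,
    test: (text) => {
      const visible = text.replace(/\s/g, "");
      if (visible.length === 0) return false;
      const symbols = visible.replace(/[a-z0-9]/gi, "").length;
      return symbols / visible.length > 0.5;
    },
  },
  {
    name: "contains a link",
    points: 1,
    test: (text) => /https?:\/\//i.test(text),
  },
];

const SPAM_THRESHOLD = 3; // assumed cut-off

// Adds up the points of every matching rule and reports which rules fired.
function scoreMessage(text) {
  const matched = RULES.filter((rule) => rule.test(text));
  const score = matched.reduce((sum, rule) => sum + rule.points, 0);
  return {
    score,
    matched: matched.map((rule) => rule.name),
    isSpam: score >= SPAM_THRESHOLD,
  };
}
```

Returning the names of the matched rules alongside the score keeps the filter transparent to the user, at the cost of also being transparent to spammers.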

Bayesian filtering:

What it does: Analyzes word frequency (or trigram frequency) and compares it against the data it has been trained with.

Benefits:

  • No need to craft rules

  • Fast (relatively)

  • Tougher to reverse engineer

Shortcomings:

  • Requires training to be effective

  • Trained data must still be accessible to JavaScript, usually in the form of human-readable JSON, XML, or a flat file

  • Data set can get pretty large

  • Poorly designed filters are easy to confuse with a good helping of common words to lower the spamicity rating

  • Words that haven't been seen before can't be accurately classified, sometimes resulting in incorrect classification of the entire message

  • In a client-side environment, server interaction or user interaction is required to update the training data
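
For reference, a minimal sketch of a word-frequency naive Bayes classifier of the kind described above. It assumes simple word tokens (not trigrams), two classes labelled `spam` and `ham`, and add-one smoothing for unseen words; all function names are my own, and this is not taken from any particular library.

```javascript
// Splits text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

function createClassifier() {
  return {
    counts: { spam: new Map(), ham: new Map() }, // token counts per class
    totals: { spam: 0, ham: 0 },                 // total tokens per class
    docs: { spam: 0, ham: 0 },                   // training messages per class
    vocabulary: new Set(),
  };
}

// label must be "spam" or "ham".
function train(model, text, label) {
  model.docs[label] += 1;
  const counts = model.counts[label];
  for (const token of tokenize(text)) {
    counts.set(token, (counts.get(token) || 0) + 1);
    model.totals[label] += 1;
    model.vocabulary.add(token);
  }
}

// log P(label) + sum of log P(token | label), with add-one smoothing so
// that an unseen word lowers the score instead of zeroing it out.
function logScore(model, tokens, label) {
  const totalDocs = model.docs.spam + model.docs.ham;
  let score = Math.log(model.docs[label] / totalDocs);
  const denominator = model.totals[label] + model.vocabulary.size;
  for (const token of tokens) {
    const count = model.counts[label].get(token) || 0;
    score += Math.log((count + 1) / denominator);
  }
  return score;
}

// Returns the estimated probability (0..1) that the text is spam.
function spamProbability(model, text) {
  if (model.docs.spam === 0 || model.docs.ham === 0) {
    throw new Error("train at least one spam and one ham message first");
  }
  const tokens = tokenize(text);
  const spam = logScore(model, tokens, "spam");
  const ham = logScore(model, tokens, "ham");
  return 1 / (1 + Math.exp(ham - spam));
}
```

Working in log space avoids underflow on long messages. Note that the `model` object is plain data, so it would have to be serialized (for example as JSON) and shipped with the script, which is the data-size and reverse-engineering trade-off listed above.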

Bayesian filtering, server-side:

What it does: Applies the Bayesian filtering server-side by submitting each message to a remote server for analysis.

Benefits:

  • All the benefits of regular Bayesian filtering

  • Training data is not revealed to users/reverse engineers

Shortcomings:

  • Heavy traffic

  • Still vulnerable to uncommon words

  • Still vulnerable to adding common words to decrease spamicity

  • The service itself may be abused

  • To train the classifier, it may be desirable to allow users to submit spam samples for training. Attackers may abuse this service
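
One way the traffic problem might be softened is to remember verdicts locally and only send messages that have not been classified yet. The sketch below covers just that bookkeeping; the remote service, its request format, and the shape of the `verdicts` object are assumptions, and the actual network call (for instance via Greasemonkey's cross-origin request API) is left out.

```javascript
// Collapses whitespace so trivially different copies share one cache key.
function normalize(text) {
  return text.trim().replace(/\s+/g, " ");
}

function createVerdictCache() {
  return new Map(); // normalized text -> boolean (true means spam)
}

// Returns the unique, normalized messages that still need a server verdict.
function selectUnclassified(cache, messages) {
  const pending = new Set();
  for (const message of messages) {
    const key = normalize(message);
    if (!cache.has(key)) pending.add(key);
  }
  return [...pending];
}

// Stores verdicts from the hypothetical service, assumed to arrive as an
// object that maps each submitted text to true (spam) or false (not spam).
function recordVerdicts(cache, verdicts) {
  for (const [key, isSpam] of Object.entries(verdicts)) {
    cache.set(key, Boolean(isSpam));
  }
}

// Returns true/false for known messages, undefined for unknown ones.
function lookupVerdict(cache, message) {
  return cache.get(normalize(message));
}
```

This does nothing about abuse of the service itself; it only reduces how often the same text is submitted.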

Blacklisting:

What it does: Applies a set of criteria to a message or some attribute of it. If one or more (or a specific number of) criteria match, the message is rejected. A lot like rule-based filtering, so see its description for details.
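
A small sketch of the "one or more (or a specific number of) criteria" idea. The blacklist entries here are invented examples, not a recommended list.

```javascript
// Hypothetical blacklist entries, for illustration only.
const BLACKLIST = [
  /\bviagra\b/i,
  /\bcasino\b/i,
  /https?:\/\/\S*\.example\b/i, // links to a placeholder ".example" domain
];

// Counts how many blacklist patterns match the text.
function countBlacklistHits(text, blacklist = BLACKLIST) {
  return blacklist.filter((pattern) => pattern.test(text)).length;
}

// Rejects the message when at least minHits patterns match.
function isBlacklisted(text, minHits = 1, blacklist = BLACKLIST) {
  return countBlacklistHits(text, blacklist) >= minHits;
}
```

Raising `minHits` trades missed spam for fewer false positives, for example `isBlacklisted(text, 2)` lets a message through when only a single entry matches.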

CAPTCHAs, and the like:

Not feasible for this type of application. I am trying to apply these methods to sites that already exist. Greasemonkey will be used to do this; I can't start requiring CAPTCHAs in places that they weren't before someone installed my script.


Can anyone help me fill in the blanks? Thank you.

1 Answer


Here are six modern solutions for protecting web forms from spam, which include adding fields that only spam bots can see and fill in, among others:

https://www.lifewire.com/solutions-to-protect-web-forms-from-spam-3467469

You can also refer to this link for building a spam-free contact form without using any CAPTCHAs: https://www.nfriedly.com/techblog/2009/11/how-to-build-a-spam-free-contact-forms-without-captchas/
