
I have recently been inspired to write spam filters in JavaScript, Greasemonkey-style, for several websites I use that are prone to spam (especially in comments). In considering how to go about this, I realize I have several options, each with pros and cons. My goal for this question is to expand on the list I have created, and hopefully determine the best approach to client-side spam filtering with JavaScript.

As for what makes a spam filter the "best", I would say these are the criteria:

  • Most accurate

  • Least vulnerable to attacks

  • Fastest

  • Most transparent

Also, please note that I am trying to filter content that already exists on websites that aren't mine, using Greasemonkey Userscripts. In other words, I can't prevent spam; I can only filter it.

Here is my attempt, so far, to compile a list of the various methods along with their shortcomings and benefits:

Rule-based filters:

What it does: "Grades" a message by assigning a point value to different criteria (i.e. all uppercase, all non-alphanumeric, etc.) Depending on the score, the message is discarded or kept.


Pros:

  • Easy to implement

  • Mostly transparent


Cons:

  • Transparent: it's usually easy to reverse-engineer the code to discover the rules, and thereby craft messages that won't be picked up

  • Hard to balance point values (false positives)

  • Can be slow; multiple rules have to be executed on each message, often using regular expressions

  • In a client-side environment, server interaction or user interaction is required to update the rules
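
Here is a minimal sketch of such a scorer; the specific rules, weights, and threshold are illustrative assumptions and would need tuning against real samples:

    // A rule-based scorer: each matching rule adds points, and a message
    // whose total reaches the threshold is treated as spam.
    const rules = [
      { test: (msg) => msg === msg.toUpperCase() && /[A-Z]/.test(msg), score: 3 }, // all caps
      { test: (msg) => (msg.match(/https?:\/\//g) || []).length > 2,   score: 4 }, // many links
      { test: (msg) => /\b(viagra|casino|free money)\b/i.test(msg),    score: 5 }, // spammy words
      { test: (msg) => msg.length < 10,                                score: 1 }, // very short
    ];

    const THRESHOLD = 5; // too low causes false positives

    function isSpam(message) {
      return rules.reduce((sum, r) => sum + (r.test(message) ? r.score : 0), 0) >= THRESHOLD;
    }

    // In a userscript, flagged comments could simply be hidden:
    // document.querySelectorAll('.comment').forEach((el) => {
    //   if (isSpam(el.textContent)) el.style.display = 'none';
    // });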

Bayesian filtering:

What it does: Analyzes word frequency (or trigram frequency) and compares it against the data it has been trained with. (A minimal sketch follows the pros/cons list below.)


Pros:

  • No need to craft rules

  • Fast (relatively)

  • Tougher to reverse engineer


Cons:

  • Requires training to be effective

  • Trained data must still be accessible to JavaScript, usually in the form of human-readable JSON, XML, or a flat file

  • Data set can get pretty large

  • Poorly designed filters are easy to confuse with a good helping of common words to lower the "spamacity" rating

  • Words that haven't been seen before can't be accurately classified, sometimes resulting in incorrect classification of the entire message

  • In a client-side environment, server interaction or user interaction is required to update the rules
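
Here is a minimal sketch of the word-frequency approach. The shape of the training data is an assumption; in a userscript it would likely ship as pre-trained JSON:

    // Naive Bayes with Laplace (add-one) smoothing, so words the
    // classifier has never seen don't zero out the whole product.
    const training = {
      spam: { messages: 100, totalWords: 600, words: { cheap: 40, pills: 35, click: 25 } },
      ham:  { messages: 100, totalWords: 900, words: { hello: 50, thanks: 30, article: 20 } },
    };

    function tokenize(message) {
      return message.toLowerCase().match(/[a-z']+/g) || [];
    }

    function spamProbability(message) {
      const vocabulary = new Set([
        ...Object.keys(training.spam.words),
        ...Object.keys(training.ham.words),
      ]).size;

      const total = training.spam.messages + training.ham.messages;
      let logSpam = Math.log(training.spam.messages / total); // prior P(spam)
      let logHam  = Math.log(training.ham.messages / total);  // prior P(ham)

      for (const word of tokenize(message)) {
        logSpam += Math.log(((training.spam.words[word] || 0) + 1) /
                            (training.spam.totalWords + vocabulary));
        logHam  += Math.log(((training.ham.words[word] || 0) + 1) /
                            (training.ham.totalWords + vocabulary));
      }
      // Convert the log-odds back to a probability between 0 and 1
      return 1 / (1 + Math.exp(logHam - logSpam));
    }

    // e.g. spamProbability('cheap pills, click here') -> close to 1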

Bayesian filtering- server-side:

What it does: Applies Bayesian filtering server-side by submitting each message to a remote server for analysis. (A minimal sketch follows the pros/cons list below.)


Pros:

  • All the benefits of regular Bayesian filtering

  • Training data is not revealed to users/reverse engineers


Cons:

  • Heavy traffic

  • Still vulnerable to uncommon words

  • Still vulnerable to adding common words to decrease "spamacity"

  • The service itself may be abused

  • To train the classifier, it may be desirable to allow users to submit spam samples for training; attackers may abuse this service as well
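
A minimal sketch of the remote round-trip; the endpoint URL and response shape are hypothetical. In a Greasemonkey script, GM_xmlhttpRequest would be the usual way around the same-origin policy; plain fetch() is shown for brevity:

    async function classifyRemotely(messages) {
      const response = await fetch('https://example.com/api/classify', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // Batching all of a page's comments into one request softens the
        // "heavy traffic" drawback noted above.
        body: JSON.stringify({ texts: messages }),
      });
      return response.json(); // assumed shape: [{ spam: true }, ...]
    }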


Blacklisting:

What it does: Applies a set of criteria to a message or some attribute of it. If one or more (or a specific number of) criteria match, the message is rejected. A lot like rule-based filtering, so see its description for details. (A short sketch follows below.)
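
A minimal sketch of the difference from the weighted scorer above: here a single matching criterion rejects the message outright (the criteria shown are illustrative):

    const criteria = [
      (msg) => /\[url=/i.test(msg),       // BBCode link markup
      (msg) => /(.)\1{9,}/.test(msg),     // any character repeated 10+ times
      (msg) => msg.trim().length === 0,   // empty message
    ];

    const isRejected = (msg) => criteria.some((test) => test(msg));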

CAPTCHAs, and the like:

Not feasible for this type of application. I am trying to apply these methods to sites that already exist; Greasemonkey will be used to do this, and I can't start requiring CAPTCHAs in places where they didn't exist before someone installed my script.

Can anyone help me fill in the blanks? Thank you!

1 Answer


Here are 6 modern solutions to protect web forms from spam, which include adding fields that only spam bots can see and fill in ("honeypot" fields), among others; the honeypot idea is sketched below.
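
A minimal sketch of the honeypot idea, assuming a form you control (which, as the question notes, is not the userscript scenario); the field name and form selector are hypothetical:

    const form = document.querySelector('#comment-form');

    const honeypot = document.createElement('input');
    honeypot.type = 'text';
    honeypot.name = 'website';        // a tempting name for bots
    honeypot.style.display = 'none';  // invisible to human visitors
    honeypot.autocomplete = 'off';
    form.appendChild(honeypot);

    form.addEventListener('submit', (event) => {
      // Humans never see the field, so any value in it means a bot filled it.
      if (honeypot.value !== '') event.preventDefault();
    });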

You can also refer to this link for building a spam-free contact form without using any CAPTCHAs.

