WPBayes: Naive Bayesian Comment Spam Filter for WordPress

As the name implies, this plugin implements spam filtering using a [Naive Bayesian classifier](http://en.wikipedia.org/wiki/Naive_Bayes_classifier). It automatically classifies new comments as legitimate or spam based on your past decisions.

**Important Warning**

This plugin requires modifications to a few WordPress core files. The required modifications are small, but the plugin won’t work at all without them. This also means there’s a chance that other plugins, especially other spam-fighting plugins, will no longer work. You will also need to redo the modifications if you upgrade WordPress.

**Requirement**

WordPress 1.5.2 and some PHP skills for editing PHP files.

**Installation**

First, make sure you are running WordPress 1.5.2. This plugin probably won’t work with any other version. If you are running a more recent version, keep an eye on this page for updates.

Then, you need to edit two files in your WordPress installation. In `wp-admin/post.php`, around line 670, add the line `do_action('edit_comment_pre', $comment_ID);` before the SQL query:


...
$content = apply_filters('comment_save_pre', $_POST['content']);
do_action('edit_comment_pre', $comment_ID);
$result = $wpdb->query("
    UPDATE $wpdb->comments SET
        comment_content = '$content',
        comment_author = '$newcomment_author',
        comment_author_email = '$newcomment_author_email',
        comment_approved = '$comment_status',
        comment_author_url = '$newcomment_author_url'".$datemodif."
    WHERE comment_ID = $comment_ID"
);
...

Next, in the file `wp-includes/comment-functions.php`, around line 575, add the line `$oldstatus = wp_get_comment_status($comment_id);` as follows:


...
function wp_set_comment_status($comment_id, $comment_status) {
    global $wpdb;
    $oldstatus = wp_get_comment_status($comment_id);

    switch($comment_status) {
        case 'hold':
...

Then, around line 594, change the line `do_action('wp_set_comment_status', $comment_id, $comment_status);` to `do_action('wp_set_comment_status', $comment_id, $comment_status, $oldstatus);`:


...
if ($wpdb->query($query)) {
    do_action('wp_set_comment_status', $comment_id, $comment_status, $oldstatus);
    return true;
} else {
...

Download the plugin: [wpbayes.tar.gz](https://priyadi.net/wp-content/plugins/wpbayes.tar.gz), and extract it in your `wp-content/plugins` directory. Now you should have the files `wpbayes.php`, `class.naivebayesian.php` and `class.naivebayesianWPstorage.php` in your plugins directory.

Enable the plugin from the ‘Plugins’ menu.

**Operation**

The plugin is designed to augment the built-in WordPress moderation system. If the plugin decides that a comment is legitimate, the comment is checked further against WordPress’ moderation system (keyword moderation, whitelisting, open proxy checking, etc.). However, if the plugin decides that the comment should go into the moderation queue or be marked as spam, it won’t consult the built-in WordPress moderation system.

No comment will be marked as spam if it passes the Bayesian test, even if it looks like spam to WordPress’ built-in moderation system. I decided to do this because I had quite a few false positives in the past few weeks, which was my main motivation for writing this plugin.

When enabled, the plugin automatically approves comments that appear legitimate (still subject to WordPress’ built-in moderation) and trashes ones that appear to be spam, learning from them in the process. Comments that are not clearly legitimate or spam are sent to the moderation queue. The plugin also learns from your decisions on comments in the moderation queue.
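The three outcomes described above can be sketched as a simple two-threshold decision. This is only an illustration: the function name and the default threshold values (0.2 and 0.8) are hypothetical and are not taken from the plugin, which lets you set its own thresholds on the options page.

```php
<?php
// Hypothetical helper, not part of the plugin: maps a spamminess score in
// [0, 1] to one of three outcomes using two assumed thresholds.
function sketch_decide(float $spamminess, float $hamThreshold = 0.2, float $spamThreshold = 0.8): string
{
    if ($spamminess < $hamThreshold) {
        // Looks legitimate: hand the comment to WordPress' built-in moderation.
        return 'pass-to-wordpress';
    }
    if ($spamminess > $spamThreshold) {
        // Looks like spam: mark it as spam without consulting the built-in checks.
        return 'spam';
    }
    // Unclear: hold the comment for a human decision.
    return 'moderation-queue';
}

echo sketch_decide(0.05), "\n";
echo sketch_decide(0.50), "\n";
echo sketch_decide(0.99), "\n";
```

With the assumed defaults, the three calls fall into the legitimate, unclear and spam bands respectively.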

If the plugin misclassifies a legitimate comment as spam or vice versa, you can correct the decision by changing the comment’s status, and the plugin will learn from your correction too. I recommend installing the [Paged Comment Editing](http://www.coldforged.org/paged-comment-editing-plugin/) plugin, so that you can easily reverse the status of legitimate comments that have been wrongly classified as spam.
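One way a correction like this could be handled is to remove the comment’s tokens from the class it was wrongly learned under and add them to the correct one. The sketch below is an assumption about the general technique, not the plugin’s actual code: the function names, the tokenizer and the in-memory array standing in for the database are all hypothetical.

```php
<?php
// Hypothetical tokenizer: lowercase words split on non-alphanumeric characters.
function sketch_tokenize(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Add one occurrence of every token in $text to the given class ('ham' or 'spam').
function sketch_learn(array $counts, string $text, string $class): array
{
    foreach (sketch_tokenize($text) as $token) {
        $counts[$class][$token] = ($counts[$class][$token] ?? 0) + 1;
    }
    return $counts;
}

// Remove one occurrence of every token in $text from the given class.
function sketch_unlearn(array $counts, string $text, string $class): array
{
    foreach (sketch_tokenize($text) as $token) {
        $current = $counts[$class][$token] ?? 0;
        if ($current <= 1) {
            unset($counts[$class][$token]);
        } else {
            $counts[$class][$token] = $current - 1;
        }
    }
    return $counts;
}

// Reverse an earlier decision: forget the old class, learn the new one.
function sketch_reclassify(array $counts, string $text, string $oldClass, string $newClass): array
{
    return sketch_learn(sketch_unlearn($counts, $text, $oldClass), $text, $newClass);
}

$counts = ['ham' => [], 'spam' => []];
$counts = sketch_learn($counts, 'cheap pills cheap', 'ham');               // wrongly learned as legitimate
$counts = sketch_reclassify($counts, 'cheap pills cheap', 'ham', 'spam');  // corrected by the moderator
print_r($counts);
```

A scheme like this needs to know the comment’s previous status as well as its new one, so that the old counts can be undone before the new ones are added.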

The plugin registers its own options page. From there, you can adjust the spamminess thresholds, reinitialize the database, and make the plugin learn from past decisions.

**Notes**

In my experience, Bayesian spam filtering is not as effective on blog comments as it is on email, probably because an email contains a lot more information than a blog comment. However, it should still be very effective.

This plugin uses pieces from the [PHP Naive Bayesian Filter](http://www.xhtml.net/php/PHPNaiveBayesianFilter) class by Loïc d’Anterroches. However, I don’t use its classification algorithm.

I’m using a modification of the original algorithm from Paul Graham’s [A Plan for Spam](http://www.paulgraham.com/spam.html). Instead of using the 15 most interesting tokens, I use cutoff points at 0.05 and 0.95, because in some cases 15 words is all the comment has.
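A minimal sketch of this variant follows. The per-token formula and the way probabilities are combined come from Graham’s essay. The function names are hypothetical, and reading the cutoffs as "use only tokens whose probability is at most 0.05 or at least 0.95" is my assumption, as is returning a neutral 0.5 when no token qualifies. It is not the plugin’s code.

```php
<?php
// Per-token spam probability in the style of "A Plan for Spam".
// $goodTotal and $badTotal are the numbers of legitimate and spam messages seen.
// Returns null when the token is too rare to be considered.
function sketch_token_probability(int $goodCount, int $badCount, int $goodTotal, int $badTotal): ?float
{
    $g = 2 * $goodCount; // good occurrences count double, to reduce false positives
    $b = $badCount;
    if ($g + $b < 5 || $goodTotal < 1 || $badTotal < 1) {
        return null;
    }
    $goodRatio = min(1.0, $g / $goodTotal);
    $badRatio = min(1.0, $b / $badTotal);
    return max(0.01, min(0.99, $badRatio / ($goodRatio + $badRatio)));
}

// Combine token probabilities, keeping only tokens at or beyond the cutoffs
// instead of the 15 most interesting ones (assumed reading of the cutoffs).
function sketch_combine(array $probabilities, float $low = 0.05, float $high = 0.95): float
{
    $spam = 1.0;
    $ham = 1.0;
    $used = 0;
    foreach ($probabilities as $p) {
        $p = max(0.01, min(0.99, $p)); // keep both products away from zero
        if ($p > $low && $p < $high) {
            continue; // not decisive enough to be counted
        }
        $spam *= $p;
        $ham *= 1.0 - $p;
        $used++;
    }
    if ($used === 0) {
        return 0.5; // assumption: no decisive token means "unclear"
    }
    return $spam / ($spam + $ham);
}

printf("%.4f\n", sketch_combine([0.99, 0.99, 0.50, 0.30]));
```

A fixed cutoff means a short comment contributes only its strongly indicative tokens, rather than having nearly every word counted.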

51 comments

Ronny says:

7 October 2005 at 05:55

Since there are no comments yet (which is unusual for your blog), either your comments are not working, or you’re posting very early in the morning while everyone else is still sleeping :D

What’s the default training threshold? How many manual decisions before it triggers automatic classification?

It’s a bit unfortunate that a modification to the core WP is needed.

muhnur says:

7 October 2005 at 07:21

Just now, after sahur, I got a 404 error when I tried to post a comment, so I couldn’t be number 1. What’s up with you, priyadi.net? :-w

abe says:

7 October 2005 at 08:20

yawn..
I don’t get it :((

andriansah says:

7 October 2005 at 08:53

Isn’t something like this already in default WordPress?

All comments get moderated based on email, and an email address that has been approved before goes straight through without being moderated again…

From what I understand, what your plugin does has already been accommodated (ooh… fancy word) since WordPress 1.5.0

Sorry if I’ve misunderstood what this plugin is for

amen says:

7 October 2005 at 08:54

Testing commenting with Firefox 1.5 beta on a Macintosh. Usually commenting here is reaaally slow
this time it’s better

toni says:

7 October 2005 at 09:15

I use Spam Karma 2, and so far it’s been pleasant .. it works in tiers .. starting from simple checks, up to JavaScript, and finally a captcha .. hmm, I guess the goals are different :D

Ronny says:

7 October 2005 at 10:22

(sorry off topic)

Testing commenting with Firefox 1.5 beta on a Macintosh. Usually commenting here is reaaally slow
this time it’s better

amen: so what’s the verdict, is it Firefox or the JavaScript that makes it slow? Or does the Mac suck, like Eko says? hehehe.. is commenting on my blog slow too?

Priyadi says:

7 October 2005 at 10:47

#1: i’m using paul graham’s original algorithm. that means a token is considered only if it has occurred at least 5 times. the modification is unfortunate but required because wordpress doesn’t provide enough hooks for this to work as i intended.

#4: it’s different. this is a bayesian spam filter, similar to the filters used for email, so wordpress ‘learns’ to tell whether a comment is spam or not. after enough ‘learning’ it can tell whether a comment is spam based on what it has learned.

ryosaeba says:

7 October 2005 at 15:39

damn, everyone beat me to it.

1. I originally wanted to test a new browser. this is pretty fast…. because I do use the noscript plugin :D
2. I use spam karma 2… and it seems pretty good.

andriansah says:

7 October 2005 at 19:47

#8
So it’s similar to what WordPress provides.. what’s it called? The one where you enter words associated with spam.

Priyadi says:

7 October 2005 at 21:09

#10: it’s different. with that one we have to enter the words manually; with this plugin they are automatically ‘learned’ as soon as we moderate a comment

geblek says:

8 October 2005 at 05:01

:o worth a try

Jauhari says:

8 October 2005 at 05:21

I don’t understand it yet :(

Rendy Maulana says:

8 October 2005 at 11:18

sophisticated, huh….

Ozzie says:

9 October 2005 at 11:54

@Priyadi: A question: does the plugin bypass WP’s built-in anti-spam system (it would be even better if it could)? And where does the Bayesian algorithm draw from? From the moment we install the plugin, or from earlier comments (before installing this plugin) that we categorized as spam?

I ask because juuust yesterday I deleted spam comments that were clogging MySQL (just found out… apparently spam comments are still stored in the database even after being deleted…)

Priyadi says:

10 October 2005 at 10:48

#15: it doesn’t bypass WP’s built-in anti-spam; the built-in anti-spam is only triggered if the comment passes the bayesian test. if you don’t want the built-in anti-spam, just turn it off. i myself only enable wordpress’ built-in whitelisting.

this plugin can learn from old spam as well as new spam. when a comment is categorized as spam, it immediately learns that it is spam (and vice versa). to learn from old spam, use the options menu.

algorithm used: paul graham’s algorithm

Pingback: WPBayes at Carpe Diem
hendito says:

11 October 2005 at 08:20

So where does this take effect … is it when moderation is needed? Or is the comment just submitted directly without moderation? Please advise

Priyadi says:

11 October 2005 at 11:11

#18: incoming comments are scanned by this plugin. if the spamminess is below the non-spam threshold, the comment is tested against wordpress’ built-in anti-spam. if the spamminess is above the spam threshold, it is immediately marked as spam. if it falls between the spam and non-spam thresholds, it goes into the moderation queue.

Pingback: WordPress Italy » Blog Archive » Plugin WordPress: WPBayes
Pingback: Blogging Pro China » WordPress Plugin: WPBayes
Pingback: Akismet and Home at Carpe Diem
Ozzie says:

27 October 2005 at 03:53

Whew, I just knew that the pingback from my site is jumbled like that… Well, I think it’s time to turn off the AJAX commenting system :(

Pingback: Moon in Melbourne » Akismet, 얼마나 잘 작동하는가?
Peter says:

30 December 2005 at 06:50

Hello, has anyone had experience with this plugin and WordPress 2.0?

Markov Vaughn says:

9 January 2006 at 04:24

Does anyone have a reference for the Bayes algorithm?
I mean the logic of it. contact me markov@australia.edu
thx b4

al says:

2 March 2006 at 19:05

How i cant get the plugin that shows th flag of country the browser and the so
Thanks

jon says:

10 May 2006 at 17:41

hi,
this plugin is very good.
how to do it with wp 2.0?

Packo says:

27 October 2006 at 22:09

Does it really work?

Brendon says:

9 December 2006 at 07:54

WordPress Trackback Spam!!!
I have installed plugins that prevent comment spams, but this won't prevent trackback to be blocked. I've been spam by many
MFA websites that most probably is from the same network with trackback, but they are not linking me on their website. May I
know how do they do it and how do I stop it? Without disabling trackback?
Thanks, and I'm using WordPress.

sun bingo says:

13 December 2006 at 02:41

i don’t think there is a way of stopping trackback spam
without disabling trackback

Doug's Bingo says:

19 December 2006 at 01:25

Thanks for the info. I will have to give this plugin a test drive. I just started creating wordpress blogs, and I am starting to see some spam. I will see if this will help.

Rich S says:

30 December 2006 at 09:40

:-? hmmm I was just wondering if you are using this for this blog. There seems to be a lot of garbly goop….

Priyadi says:

30 December 2006 at 15:38

#33: yes, i’m using it. however, it should be noted that i’m still using an ancient WP

Jake says:

5 January 2007 at 17:46

Hi Priyadi, I couldn’t see any instructions for use with WP 2.0. Will the extension work with WP 2.0 and is it stable?

Thanks :)

Domain regisztráció says:

17 January 2007 at 16:02

Hi, I need to make a decision on a blog spam system. Do I understand well that Spam karma works only from the data of the present installation? Or does it read data from any black list or central database?

forum overzicht says:

19 January 2007 at 04:22

is this for wordpress 2 compatible, and do you use it here?

Darin says:

24 January 2007 at 16:36

comment spam filter :d
but it seems Mas Pri’s blog doesn’t use this kind of plugin.. the proof is that all comments get through?
is that right? or am I the one who doesn’t get it..?

百变贝贝 says:

18 May 2007 at 21:03

does it support wordpress 2.2?

MaryJames says:

24 May 2007 at 04:33

Yo all

How I can change avatar in this forum?

goonie baby says:

23 July 2007 at 15:20

Mas, between this Naive Bayesian and Spam Karma, which one is better for WordPress? Thaaanks.

No Bull Bingo says:

20 September 2010 at 18:52

I am not sure whether this Word press spam filter works adequately or not but i stop commenting on the wordpress blog because they maximum had installed this spam filter they never get the real bloggers to comment on their, its very painful for us.

Pingback: Top 10 WordPress Anti Spam Plugins | WordPress Support Wiki
Pingback: Top 10 tools for Wordpress AntiSpam « TechInFo
Pingback: Top 10 Anti Spam Plugins For Wordpress | WPsharing
Pingback: Top 10 Anti-Spam პლაგინი ვარდპრესისთვის | Wordpress
Pingback: 10 plugin antispam per wordpress - sastgroup.com
Pingback: 10 plugin antispam per wordpress | buonaguida.com
Pingback: Top 10 WordPress Anti Spam Plugins | Search Engine Optimization
Pingback: Top 10 WordPress Anti Spam Plugins | Bestnet Support Wiki
Bingo picks says:

30 April 2012 at 05:18

Is there any updates coming in the near future for this plugin?
