WPBayes: Naive Bayesian Comment Spam Filter for WordPress
As the name implies, this plugin implements spam filtering using a Naive Bayesian classifier. It automatically classifies new comments as legitimate or spam based on your past decisions.
Important Warning
This plugin requires modifications to a few WordPress core files. The required modifications are small, but the plugin won’t work at all without them. This also means there’s a chance that other plugins, especially other spam-fighting plugins, will no longer work. You will also need to redo the modifications whenever you upgrade WordPress.
Requirements
WordPress 1.5.2 and some PHP skills for editing PHP files.
Installation
First, make sure you are running WordPress 1.5.2. This plugin probably won’t work with any other version. If you are running a more recent version, keep an eye on this page for updates.
Then, you need to edit two files in your WordPress installation. In wp-admin/post.php, around line 670, add the line do_action('edit_comment_pre', $comment_ID); before the SQL query:
...
$content = apply_filters('comment_save_pre', $_POST['content']);
do_action('edit_comment_pre', $comment_ID); // added for WPBayes
$result = $wpdb->query("
    UPDATE $wpdb->comments SET
        comment_content = '$content',
        comment_author = '$newcomment_author',
        comment_author_email = '$newcomment_author_email',
        comment_approved = '$comment_status',
        comment_author_url = '$newcomment_author_url'".$datemodif."
    WHERE comment_ID = $comment_ID"
);
...
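Why fire a hook before the UPDATE? One plausible reason (an assumption, not something stated above) is that a filter needs to see the comment as it was before the edit, for example to unlearn its old text. This sketch uses hypothetical stand-ins, demo_add_action() and demo_do_action(), instead of WordPress’ real add_action()/do_action(), and an in-memory array in place of the comments table, so it runs on its own:

```php
<?php
// Hypothetical stand-ins for WordPress' action hooks, for illustration only.
$hooks = [];

function demo_add_action(string $tag, callable $fn): void {
    global $hooks;
    $hooks[$tag][] = $fn;
}

function demo_do_action(string $tag, ...$args): void {
    global $hooks;
    foreach ($hooks[$tag] ?? [] as $fn) {
        $fn(...$args);
    }
}

// Hypothetical in-memory "comments table" and an event log.
$comments = [7 => 'old text with cheap pills'];
$log = [];

// Because the hook fires before the UPDATE, the callback can still read
// the comment as it was before the edit (e.g. to unlearn the old text).
demo_add_action('edit_comment_pre', function (int $id) use (&$comments, &$log): void {
    $log[] = 'before edit: ' . $comments[$id];
});

// Simplified version of the patched edit path in post.php.
function demo_edit_comment(int $id, string $content): void {
    global $comments;
    demo_do_action('edit_comment_pre', $id); // the added line
    $comments[$id] = $content;               // stands in for the SQL UPDATE
}

demo_edit_comment(7, 'new legitimate text');
echo implode("\n", $log), "\n"; // before edit: old text with cheap pills
```

Without the added line, a plugin would only ever see the comment after its text had already been overwritten.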
Next, in wp-includes/comment-functions.php, around line 575, add the line $oldstatus = wp_get_comment_status($comment_id); as follows:
...
function wp_set_comment_status($comment_id, $comment_status) {
    global $wpdb;
    $oldstatus = wp_get_comment_status($comment_id); // added for WPBayes
    switch($comment_status) {
        case 'hold':
...
Then, around line 594, change the line do_action('wp_set_comment_status', $comment_id, $comment_status); into do_action('wp_set_comment_status', $comment_id, $comment_status, $oldstatus); as follows:
...
    if ($wpdb->query($query)) {
        do_action('wp_set_comment_status', $comment_id, $comment_status, $oldstatus); // changed for WPBayes: also passes $oldstatus
        return true;
    } else {
...
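Passing $oldstatus matters because correcting a decision means first undoing what was learned before. This hypothetical sketch (the function names and the retraining policy are assumptions, not WPBayes’ code) turns a status change into training steps. The status strings are assumed to match WordPress 1.5’s wp_get_comment_status() (‘approved’, ‘unapproved’, ‘spam’) and wp_set_comment_status() (‘hold’, ‘approve’, ‘spam’, ‘delete’):

```php
<?php
// Map a status to the label it was (or will be) learned as, or null.
// Status names are assumptions based on WordPress 1.5.
function demo_label_for(string $status): ?string {
    switch ($status) {
        case 'approved':
        case 'approve':
            return 'ham';
        case 'spam':
            return 'spam';
        default:
            return null; // held, unapproved or deleted: nothing learned
    }
}

// Return the training steps for a status change, e.g.
// ['untrain:spam', 'train:ham'] when a false positive is rescued.
function demo_retrain_steps(string $oldStatus, string $newStatus): array {
    $old = demo_label_for($oldStatus);
    $new = demo_label_for($newStatus);
    if ($new === null || $old === $new) {
        return []; // nothing new to learn (a policy choice of this sketch)
    }
    $steps = [];
    if ($old !== null) {
        $steps[] = 'untrain:' . $old; // only possible if the old status is known
    }
    $steps[] = 'train:' . $new;
    return $steps;
}

print_r(demo_retrain_steps('spam', 'approve')); // untrain:spam, train:ham
```

With the original two-argument hook, the first step could not be computed, since the comment’s previous status is already gone by the time the action fires.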
Download the plugin, wpbayes.tar.gz, and extract it into your wp-content/plugins directory. You should now have the files wpbayes.php, class.naivebayesian.php and class.naivebayesianWPstorage.php in your plugins directory.
Enable the plugin from the ‘Plugins’ menu.
Operation
The plugin is designed to augment the built-in WordPress moderation system. If the plugin decides that a comment is legitimate, the comment is then checked further against WordPress’ moderation system (keyword moderation, whitelisting, open proxy checking, etc.). However, if the plugin decides that the comment should go into the moderation queue or be marked as spam, it won’t consult the built-in WordPress moderation system.
No comment will be marked as spam if it passes the Bayesian test, even if it appears to be spam according to WordPress’ built-in moderation system. I decided to do this because I had quite a few false positives in the past few weeks, which was my main motivation for writing this plugin.
When enabled, the plugin automatically approves comments that appear legitimate (still subject to WordPress’ built-in moderation) and trashes ones that appear to be spam, learning from them in the process. Comments that cannot clearly be classified as legitimate or spam are sent to the moderation queue. Your decisions on comments in the moderation queue are also learned by the plugin.
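The routing described above can be sketched as a single function. The threshold defaults (0.1 and 0.9) are hypothetical placeholders, not the plugin’s actual defaults; $builtinSaysOk stands in for WordPress’ own moderation checks, and treating a score exactly on a threshold as unclear is also an assumption:

```php
<?php
// Hypothetical routing sketch: returns 'spam', 'moderate' or 'approve'.
function demo_route(float $spamminess, bool $builtinSaysOk,
                    float $hamThreshold = 0.1, float $spamThreshold = 0.9): string {
    if ($spamminess > $spamThreshold) {
        return 'spam';      // built-in moderation is not consulted
    }
    if ($spamminess >= $hamThreshold) {
        return 'moderate';  // unclear: queue it and learn from your decision
    }
    // Passed the Bayesian test: never marked as spam here, but still
    // subject to WordPress' built-in moderation.
    return $builtinSaysOk ? 'approve' : 'moderate';
}

echo demo_route(0.5, true), "\n"; // moderate
```

Note that a comment which passes the Bayesian test can at worst end up in moderation, which is how false positives are avoided.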
If the plugin misclassifies a legitimate comment as spam or vice versa, you can correct the decision by changing the comment’s status, and the plugin will learn from your decision too. I recommend installing the Paged Comment Editing plugin, so that you can easily reverse the status of legitimate comments that have been wrongly classified as spam.
The plugin registers its own options page. From there, you can change the spamminess thresholds, reinitialize the database and make the plugin learn from past decisions.
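To make ‘reinitialize database’ and ‘learn from past decisions’ concrete, here is a hypothetical in-memory token store. The real plugin keeps its counts in the database through class.naivebayesianWPstorage.php, whose interface is not shown here; the class name, the tokenizer (runs of lowercase letters and digits, counting every occurrence) and the status strings are all assumptions:

```php
<?php
// Hypothetical in-memory stand-in for the plugin's database storage.
class DemoTokenStore {
    public $counts = ['ham' => [], 'spam' => []]; // token => occurrences
    public $docs = ['ham' => 0, 'spam' => 0];     // comments learned per label

    // Assumed tokenizer: lowercase runs of letters and digits.
    public static function tokenize(string $text): array {
        preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
        return $m[0];
    }

    // Add ($delta = 1) or remove ($delta = -1) a comment's tokens.
    public function train(string $text, string $label, int $delta = 1): void {
        foreach (self::tokenize($text) as $tok) {
            $n = ($this->counts[$label][$tok] ?? 0) + $delta;
            if ($n > 0) {
                $this->counts[$label][$tok] = $n;
            } else {
                unset($this->counts[$label][$tok]);
            }
        }
        $this->docs[$label] = max(0, $this->docs[$label] + $delta);
    }

    public function untrain(string $text, string $label): void {
        $this->train($text, $label, -1);
    }

    // "Reinitialize database": forget everything learned so far.
    public function reset(): void {
        $this->counts = ['ham' => [], 'spam' => []];
        $this->docs = ['ham' => 0, 'spam' => 0];
    }

    // "Learn from past decisions": replay already-moderated comments.
    // Each item is [text, status]; undecided comments are skipped.
    public function learnFromPast(array $comments): void {
        foreach ($comments as [$text, $status]) {
            if ($status === 'approved') {
                $this->train($text, 'ham');
            } elseif ($status === 'spam') {
                $this->train($text, 'spam');
            }
        }
    }
}

$store = new DemoTokenStore();
$store->learnFromPast([
    ['Great post, thanks!', 'approved'],
    ['Cheap pills, buy now', 'spam'],
    ['Is this spam?', 'unapproved'],
]);
print_r($store->docs); // ham => 1, spam => 1
```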
Notes
In my experience, Bayesian spam filtering is not as effective for blog comments as it is for email, probably because an email contains a lot more information than a blog comment. However, it should still be very effective.
This plugin uses pieces of the PHP Naive Bayesian Filter class by Loïc d’Anterroches. However, I don’t use its classification algorithm.
I’m using a modification of the original algorithm from Paul Graham’s A Plan for Spam. Instead of using the 15 most interesting tokens, I use cutoff points at 0.05 and 0.95, since in some cases 15 words is all a comment has.
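Graham combines per-token spam probabilities p as P = ∏p / (∏p + ∏(1 − p)). This sketch applies that formula but reads “cutoff points at 0.05 and 0.95” as “use every token whose probability is at most 0.05 or at least 0.95”. That reading, the neutral 0.5 returned when no token qualifies, and the function name are assumptions, not the plugin’s code:

```php
<?php
// Combine per-token spam probabilities, selecting tokens by cutoff
// instead of taking the 15 most interesting ones (assumed reading).
function demo_combine(array $tokenProbs, float $low = 0.05, float $high = 0.95): float {
    $logSpam = 0.0;
    $logHam = 0.0;
    $used = 0;
    foreach ($tokenProbs as $p) {
        if ($p > $low && $p < $high) {
            continue; // not decisive enough to count
        }
        $p = max(0.01, min(0.99, $p)); // Graham's clamp; also avoids log(0)
        $logSpam += log($p);
        $logHam += log(1 - $p);
        $used++;
    }
    if ($used === 0) {
        return 0.5; // no decisive tokens: neutral (assumption)
    }
    // Equal to prod(p) / (prod(p) + prod(1 - p)), computed in log space
    // so long comments do not underflow to 0.
    return 1 / (1 + exp($logHam - $logSpam));
}

printf("%.4f\n", demo_combine(['cheap' => 0.99, 'pills' => 0.97, 'the' => 0.5])); // 0.9997
```

With a short comment, this keeps every strongly indicative token rather than a fixed count, which is the point the paragraph above makes about 15-word comments.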
Since there are no comments yet (which is unusual for your blog), either your comments are not working, or you’re posting very early in the morning while everyone else is still sleeping :D
What’s the default training threshold? How many manual decisions before it triggers automatic classification?
It’s a bit unfortunate that a modification to the core WP is needed.
Right after sahur I got a 404 error when trying to post a comment, so I couldn’t be number 1. What’s wrong with you, priyadi.net? :-w
*yawn*..
Don’t get it :((
Isn’t something like this already in default WordPress?
All comments get moderated based on email, and once an email has been approved, its comments go straight through without being moderated again…
From what I can tell, your plugin is already accommodated (ooh, fancy words…) by WordPress 1.5.0.
Sorry if I misunderstood what this plugin is about.
Testing commenting with Firefox 1.5 beta on a Macintosh. Usually, posting a comment here is suuuper slooow.
This one is better.
I use Spam Karma 2, and so far it’s been pleasant.. it works in stages.. from simple checks, to JavaScript, and finally a CAPTCHA.. hmm, seems like it has a different goal :D
(sorry, off topic)
amen: so the conclusion is, is it Firefox or the JavaScript that makes it slow? Or does the Mac suck, like eko says? hehehe.. Is commenting on my blog slow too?
#1: I’m using the original Paul Graham algorithm. That means a token will be considered only if it has occurred at least 5 times. The modification is unfortunate but required, because WordPress doesn’t provide enough hooks for this to work as I intended.
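For reference, here is a sketch of the per-token probability from A Plan for Spam that this reply refers to: ham counts are doubled, a token whose doubled ham count plus spam count is below 5 is skipped, and the result is clamped to [0.01, 0.99]. Graham gives unknown tokens 0.4; this hypothetical version returns null instead so the caller can skip them, and the function name is an assumption:

```php
<?php
// Graham-style token probability from occurrence counts.
// $nHam / $nSpam are the numbers of ham / spam comments learned.
function demo_token_prob(int $hamCount, int $spamCount, int $nHam, int $nSpam): ?float {
    $g = 2 * $hamCount; // doubled to bias against false positives
    $b = $spamCount;
    if ($g + $b < 5) {
        return null;    // not seen often enough to be trusted
    }
    $hamRatio = min(1.0, $g / max(1.0, (float)$nHam));
    $spamRatio = min(1.0, $b / max(1.0, (float)$nSpam));
    $p = $spamRatio / ($hamRatio + $spamRatio);
    return max(0.01, min(0.99, $p));
}

var_dump(demo_token_prob(3, 0, 100, 100)); // float(0.01)
```

Because of the doubling, three occurrences in legitimate comments alone are enough for a token to count, while a token seen only in spam needs five.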
#4: It’s different. This is a Bayesian spam filter, similar to email filters, so WordPress ‘learns’ to tell whether a comment is spam or not. After enough ‘learning’, it can tell whether a comment is spam based on what it has learned.
Damn, everyone beat me to it.
1. I was going to test a new browser. This one’s pretty fast…. because I use the NoScript plugin :D
2. I use Spam Karma 2… and it seems pretty good.
#8
So it’s similar to what WordPress provides.. what’s it called? The one where you enter spam-related words.
#10: It’s different. With that one, you have to enter the words manually; with this plugin, they are ‘learned’ automatically when you moderate a comment.
:o might give it a try
I still don’t get it :(
Sophisticated, huh….
@Priyadi: Quick question: does the plugin bypass WP’s built-in anti-spam system (it’d be even better if it could)? And where does the Bayesian algorithm get its data from? Does it start from when we install the plugin, or from earlier comments (before installing this plugin) that we categorized as spam?
Because juuuust yesterday I deleted the spam comments clogging up MySQL (I only just found out… spam comments apparently stay stored in the database even after they’ve been deleted…)
#15: It doesn’t bypass WP’s built-in anti-spam; WP’s built-in anti-spam is only triggered if the comment passes the Bayesian test. If you don’t want to use the built-in anti-spam, just turn it off. Personally, I only enable WordPress’ built-in whitelisting.
This plugin can learn from old spam as well as new spam. When an email is categorized as spam, it immediately learns that it is spam (and vice versa). To learn from old spam, use the options menu.
Algorithm used: Paul Graham’s algorithm.
[...] After having no luck with WordPress’ built-in spam prevention, such as pre-approved comments and a moderation queue based on keywords, I’ve decided to use Priyadi’s newest anti-spam plugin for the commenting system. I have deleted all of my spam keywords in WordPress to see how well this plugin holds up against the wave of spam. The edits to the WordPress core files are finished, and I’ve set the plugin to learn from my past comments to determine which is which. I have also unchecked the option requiring commenters to have a previously approved comment. [...]
So where does it take effect… when moderation is needed? Or are comments submitted directly without moderation? Please advise.
#18: Incoming email is scanned by this plugin. If its spamminess is below the non-spam threshold, it is then tested against WordPress’ built-in anti-spam. If its spamminess is above the spam threshold, it is immediately marked as spam. If it falls between the non-spam and spam thresholds, it goes into the moderation queue.
[...] It can’t really be called a plugin as such, since it requires modifications to WordPress system files, but the idea seems to be a good one. As the name suggests, WPBayes filters spam using Bayesian classification: based on your previous choices, it tries to distinguish legitimate comments from actual spam. [...]
[...] WPBayes: uses the well-known Bayesian mechanism to filter spam comments. [...]
[...] Plus, I have replaced the WPBayes anti-spam plugin that worked fantastically for the last couple of weeks. I replaced it with Akismet, a new anti-spam plugin by the founder of WordPress. From what I read, Akismet uses a method similar to WPBayes’: a script that can learn from past mistakes. And since Akismet is a centralized anti-spam plugin, I’m pretty sure it learns from past mistakes even faster than the previous plugin. Well, let’s just cross our fingers. [...]
Whew, I just found out that the pingback from my site is jumbled like that… Well, I think it’s time to turn off the AJAX commenting system :(
[...] For now, I’d give it half a passing grade. However, compared with WPBayes, which worked on the earlier 1.5 version, it still falls short in some ways. WPBayes automatically handles comments that are clearly spam as spam. Akismet is said to classify suspected spam as spam, but so far most of it still ends up in moderation. Whenever I go into the dashboard, several times a day, a dozen or so comments are waiting to be handled, and most of them turn out to be spam. [...]
Hello, does anyone have experience using this plugin with WordPress 2.0?
Does anyone have a reference on the Bayes algorithm?
I mean the logic behind it. Contact me at markov@australia.edu
Thanks in advance.
How can I get the plugin that shows the country flag, the browser, and so on?
Thanks
Hi,
this plugin is very good.
How do I use it with WP 2.0?
Does it really work?
WordPress Trackback Spam!!!
I have installed plugins that prevent comment spam, but they don’t block trackback spam. I’ve been spammed via trackback by many MFA websites, most probably from the same network, but they are not linking to me on their websites. How do they do it, and how can I stop it without disabling trackback?
Thanks, and I’m using WordPress.
I don’t think there is a way of stopping trackback spam without disabling trackback.
Thanks for the info. I will have to give this plugin a test drive. I just started creating WordPress blogs, and I am starting to see some spam. I will see if this helps.
:-? hmmm I was just wondering if you are using this for this blog. There seems to be a lot of garbly goop….
#33: Yes, I’m using it. However, it should be noted that I’m still using an ancient WP.
Hi Priyadi, I couldn’t see any instructions for use with WP 2.0. Will the extension work with WP 2.0 and is it stable?
Thanks :)
Hi, I need to make a decision on a blog spam system. Do I understand correctly that Spam Karma works only from the data of the current installation? Or does it read data from a blacklist or a central database?
Is this compatible with WordPress 2, and do you use it here?
comment spam filter :d
But it looks like Mas Pri’s blog doesn’t use this kind of plugin.. after all, every comment gets through?
Is that right? Or am I the one who doesn’t get it..?
Does it support WordPress 2.2?
Yo all,
how can I change my avatar in this forum?
Mas, between this Naive Bayesian plugin and Spam Karma, which one is better for WordPress? Thaaanks.
I am not sure whether this WordPress spam filter works adequately, but I stopped commenting on WordPress blogs because most of them had installed this spam filter. They never get real bloggers to comment on them, and it’s very painful for us.
[...] WPBayes – Implements the spam filtering with the Naive Bayesian technique, which means it marks the comments as spam or not based on your previous decisions. To be honest, I didn’t use this one [...]
[...] 7. WPBayes: a very interesting plugin that uses a very well-written script; it decides whether to allow or block a comment based on your previous decisions. [...]
[...] WPBayes [...]
Is there any updates coming in the near future for this plugin?