
Another Spam Loser: WordPress and Akismet

It seems like WordPress may have lost the war on comment spam. Up until early this year the spam was filtered quite well; Akismet was doing a good job. Then undetected comment spam slowly started leaking through. It was only a couple of comments at first, but now it’s several per day. The absolute rate may not seem that high, but given my normal level of comment activity it causes a problem. It’s very likely now that I’m missing some genuine comments.

I noticed a strong pattern in the recent spam: the comments are all rather complimentary. They tend to say nice things about the website, playing to my vanity I guess. All of them are referrer spam: the poster’s website is hawking some garbage product, or sometimes it is a link to a YouTube video.

You’re so cool! I don’t think I have read something like this before. So wonderful to find another person with unique thoughts on this issue. Really.. thank you for starting this up. This website is one thing that is needed on the internet, someone with a little originality!

Generic, sometimes correct

Most of the comments are generic. They do reference the content, but only in a vague way, never with any specifics.

Its like you read my mind! You appear to know a lot about this, like you wrote the book in it or something. I think that you can do with a few pics to drive the message home a little bit, but instead of that, this is great blog. An excellent read. I will certainly be back.

The above is one of the better phrased examples. It reads like a proper comment and the grammar and spelling are okay. The constructive criticism is also a nice touch. It feels a bit more genuine than most of the comments, which are strictly positive.

It’s quite the contrast from the below.

Your mode of telling all in this piece of writing is in fact pleasant, all be able to without difficulty be aware of it, Thanks a lot.

If a spam filter had a language parser it should be able to pick out these types of comments and discard them. It wouldn’t even need a grammar checker, just attempt to parse a syntax tree. The first phrase would actually parse, but it contains a lot of junk, whereas I still can’t figure out a way to parse the second phrase.

These comments are interesting since they show that simple language parsing would be insufficient to filter out spam. Even though they are still rough, it wouldn’t take much to have a generator produce cleaner sentences. Genuine comments often make more mistakes, so the filter couldn’t be too strict anyway.

Meta-spam and appeal

I think my favourite comment spam is the kind below: audacious comments asking how to block comment spam.

Greetings! I know this is kinda off topic but I was wondering if you knew where I could locate a captcha plugin for my comment form? I’m using the same blog platform as yours and I’m having difficulty finding one? Thanks a lot!

Hi there, i read your blog occasionally and i own a similar one and i was just curious if you get a lot of spam remarks? If so how do you stop it, any plugin or anything you can advise? I get so much lately it’s driving me insane so any assistance is very much appreciated.

This form is interesting since I suspect it works often. It appeals to my vanity and also capitalizes on an actual issue I’m facing on the WordPress platform. The give-away of course is the “same blog platform” comment. Why would they not say WordPress if they know what platform I’m using? Of course, my brain doesn’t work that way and on first read it seems legit.
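The give-away phrase suggests one cheap check: see how much of a new comment is made of word sequences already seen in earlier spam. Below is a minimal sketch in Python of that idea. The function names (`shingles`, `template_overlap`), the four-word window, and the sample texts are my own assumptions for illustration; this is not how Akismet works as far as I know.

```python
import re


def shingles(text, size=4):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def template_overlap(comment, known_spam, size=4):
    """Fraction of the comment's shingles that also occur in known spam.

    Returns 0.0 for comments too short to contain a single shingle.
    """
    comment_shingles = shingles(comment, size)
    if not comment_shingles:
        return 0.0
    seen = set()
    for spam in known_spam:
        seen |= shingles(spam, size)
    return len(comment_shingles & seen) / len(comment_shingles)


# Hypothetical sample data: one previously seen spam comment.
known_spam = [
    "I'm using the same blog platform as yours and "
    "I'm having difficulty finding one"
]

suspect = "Hi, I'm using the same blog platform as yours, any tips?"
genuine = "The threshold trade-off between false positives and negatives matters"

print(template_overlap(suspect, known_spam))
print(template_overlap(genuine, known_spam))
```

A high fraction only says the wording is recycled. Where to put the cut-off is a judgement call, and a spammer who rewords the template would slip past it.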

Does your website have a contact page? I’m having problems locating it but, I’d like to send you an email. I’ve got some ideas for your blog you might be interested in hearing. Either way, great blog and I look forward to seeing it expand over time.

This comment I think works even more often than the previous. WordPress doesn’t have contact pages by default. I’m not sure that a lot of blog authors are even aware of that. I have however added a contact form, and it’s relatively easy to find. To me the comment is nonsense, but in the general case it probably works often.

The problem with this spam is that many authors may choose to accept it. From my experience with filtering, having these “likes” on the comments improves their score. Thus even if a system decides for other reasons it is spam, the user ranking pushes it towards ham.
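To make the effect concrete, here is a small sketch in Python of how author feedback could drag a score toward ham. It assumes a filter that blends its own content score with the share of authors who rejected similar comments. The function `blended_spam_score`, the 50/50 weighting, and the numbers are all hypothetical; I don’t know how Akismet actually combines these signals.

```python
def blended_spam_score(filter_score, approvals, rejections, weight=0.5):
    """Blend a content score (0 = ham, 1 = spam) with author feedback.

    approvals/rejections count how often authors accepted or rejected
    similar comments. With no feedback the content score is used as-is.
    """
    total = approvals + rejections
    if total == 0:
        return filter_score
    user_score = rejections / total  # share of authors who called it spam
    return (1 - weight) * filter_score + weight * user_score


# The content filter is fairly sure this is spam (0.9), but nine out of
# ten authors approved similar comments, pulling the score down.
print(blended_spam_score(0.9, approvals=9, rejections=1))
```

With these made-up numbers the blended score lands around the middle, well below the content filter’s own verdict, which is exactly the drift towards ham described above.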

Referrers

Hello! I’ve been reading your site for a while now and finally got the courage to go ahead and give you a shout out from Lubbock Texas! Just wanted to mention keep up the excellent work!

The spam in these comments comes from the user’s homepage. The comment system allows a random URL to be used. In many cases the link is directly to a crappy product page, so I’m actually surprised that they make it through at all.

Other links are to places that serve as legitimate home pages, like other blogs, or a YouTube channel. The few I checked are still obviously spam, but given their quantity and uniqueness they might be harder to detect.

If the link were simply removed from the comments the spam problem would go away. Unfortunately that would also ruin part of the blogging system. It is valuable to have links to users who post intelligent commentary. It’s often how I find other content, and what builds a community.

Is there a solution?

At the moment I’m just upset with WordPress, and I guess Akismet as they’re the provider. Some part of me doesn’t really care about the reasons, only that spam isn’t being filtered. For all the non-technical bloggers it’s probably the only way the issue is seen. My deeper interest in the spam comes from wishing to find a solution, and understand at a technical level what the problem is.

The language aspect is perhaps the most interesting. Could comments be parsed, interpreted, and checked in relation to the content of the article? This would certainly be a huge step. But it might make things worse. I imagine that it wouldn’t be too hard for a spammer to use a similar tool, parse my content, and generate what seems like relevant commentary.
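A very crude version of that relevance check needs no parsing at all: count how many of a comment’s meaningful words also appear in the article. The sketch below, in Python, is only an illustration under my own assumptions. The stopword list, the “longer than three letters” rule, and the names `content_words` and `relevance` are made up for the example, and real interpretation of language would go far beyond word overlap.

```python
import re

# A tiny, hand-picked stopword list; a real filter would need a fuller one.
STOPWORDS = {"this", "that", "with", "have", "from", "your", "about", "like"}


def content_words(text):
    """Lower-cased words longer than three letters, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 3 and w not in STOPWORDS}


def relevance(comment, article):
    """Fraction of the comment's content words that appear in the article."""
    comment_words = content_words(comment)
    if not comment_words:
        return 0.0
    return len(comment_words & content_words(article)) / len(comment_words)


# Hypothetical sample texts.
article = ("Akismet no longer filters WordPress comment spam well, "
           "and referrer links make it worse.")
generic = ("You're so cool! So wonderful to find another person "
           "with unique thoughts on this issue.")
specific = "Akismet misses referrer spam on my WordPress blog too."

print(relevance(generic, article))
print(relevance(specific, article))
```

The generic praise shares no content words with the article while the specific comment shares most of them. A spammer who copies a few words from the post would defeat this just as easily, which is the worry raised above.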

Ultimately I don’t think anything short of a reputation system for individuals is going to stop the spam issue. Unless a comment comes from a reputable person, it won’t make it through. The biggest issue here is that such a system should not filter critical remarks, only spam, and perhaps trolls. I’m not clear, though, on how we’d prevent a Windows user from losing their reputation due to virus bots sending on their behalf. I at least wish somebody would make an honest effort in this direction to see what would happen.

I’ll end with one more comment, just to stoke my pride a bit.

I am in fact pleased to glance at this webpage posts which consists of plenty of useful facts, thanks for providing these information.

Categories: Programming, Use Case


8 replies »

  1. Would you rather have to scan your spam queue for the false positives found by an aggressive filter or scan your approved queue (which one has to do anyway) for the false negatives allowed through by a less aggressive filter?

    As it stands (less aggressive) I have a high confidence anything in the spam queue is junk, so I can delete it in bulk without checking. Since I read all approved comments anyway, spotting the occasional ‘false flag’ seems less onerous.

    It’s the classic security threshold trade-off!

    • It’s no longer occasional though. The WordPress quick review mechanism (Notifications) is effectively broken for me. I click the popup showing comments and it is all spam. I must mark them all and reload the page to get the next set of comments. If I go a few days without reviewing then I have to do this a few times before the queue is cleared.

      It also means I reply to comments more slowly since I can no longer respond to the email notices. They are triggered 99% of the time by a spam comment now.

    • Wow, that sucks. At least on that account, I’m glad my blogs are off in a quiet corner. I almost never get false negatives, and many of them — by the time I get to the comments section — have already been recognized as spam and moved for me.

      Is it any better or different if you use the Comments area of the Dashboard?

    • I’ll probably have to start using the comments sections. At least there I can see the whole comment at a glance, source, and easily scroll.

      My problem really started this year. I think it has to do with certain types of articles and where they become popular. Some spammer bot must have flagged me incorrectly as a popular site — perhaps I should just improve their algorithm for them instead. :)

  2. Thanks. I just got my first one of these spam comments and was a bit perplexed. I did a web search for the phrase “using the same blog platform as yours” and got 5 million results! So I knew not to approve the comment. This blog post was search result #35 or so, and now it has all become clear to me what’s going on.

    • I’m having trouble deciding if this is a legitimate reply or a much improved comment bot. :)

      Glad I could help. And yes, that phrase seems endemic to spam comments now.

    • Well there you go, that’s your technical solution. Just do a damned Google search and if the same phrase (or a very similar one) turns up millions of hits, it’s spam whether dumb pressers mark it as ham or not. Also, articles with referral links should be checked to see what content they point to. All things that aren’t difficult to do. These are not clever linguistic masterpieces that subtly troll. They’re just badly disguised spam. I can equally copy some book passage with perfectly non-spam content and attach a referral link, same thing. Bottom line, it’s shoddy Akismet at fault.

  3. There are better anti-spam systems than Akismet. Look at WP SpamShield and Antispam Bee for just two examples. They’re both lightweight, don’t require endless processing through an outside server, and 99.9% effective against spam. They are also CAPTCHA-less. Best of all, they’re also free.
