
It seems like WordPress may have lost the war on comment spam. Up until early this year the spam was well filtered; Akismet was doing a good job. Then undetected comment spam slowly started leaking through. Only a couple got through at first, but now it’s several per day. The absolute rate may not seem that high, but relative to my normal level of comment activity it’s a problem. It’s now very likely that I’m missing some genuine comments.

I noticed a strong pattern in the recent spam: it is all rather complimentary. The comments tend to say nice things about the website, playing to my vanity, I guess. All of them are referrer spam: the poster’s website link points to some garbage product or, sometimes, a YouTube video.

You’re so cool! I don’t think I have read something like this before. So wonderful to find another person with unique thoughts on this issue. Really.. thank you for starting this up. This website is one thing that is needed on the internet, someone with a little originality!

Generic, sometimes correct

Most of the comments are generic. They do reference the content, but only vaguely, never with any specifics.

Its like you read my mind! You appear to know a lot about this, like you wrote the book in it or something. I think that you can do with a few pics to drive the message home a little bit, but instead of that, this is great blog. An excellent read. I will certainly be back.

The above is one of the better-phrased examples. It reads like a proper comment, and the grammar and spelling are okay. The constructive criticism is also a nice touch. It feels a bit more genuine than most of the comments, which are strictly positive.

It’s quite a contrast to the one below.

Your mode of telling all in this piece of writing is in fact pleasant, all be able to without difficulty be aware of it, Thanks a lot.

If a spam filter had a language parser, it should be able to pick out comments like these and discard them. It wouldn’t even need a grammar checker; it would only need to attempt to build a syntax tree. The first comment would actually parse, though it contains a lot of junk, whereas I still can’t figure out a way to parse the second.

The better-phrased comments are interesting because they show that simple language parsing would be insufficient to filter out spam. Even though they are still rough, it wouldn’t take much for a generator to produce cleaner sentences. Genuine comments often contain more mistakes, so the filter couldn’t be too strict anyway.

Meta-spam and appeal

My favourite comment spam is the kind below: audacious comments asking how to block comment spam.

Greetings! I know this is kinda off topic but I was wondering if you knew where I could locate a captcha plugin for my comment form? I’m using the same blog platform as yours and I’m having difficulty finding one? Thanks a lot!

Hi there, i read your blog occasionally and i own a similar one and i was just curious if you get a lot of spam remarks? If so how do you stop it, any plugin or anything you can advise? I get so much lately it’s driving me insane so any assistance is very much appreciated.

This form is interesting because I suspect it often works. It appeals to my vanity and also capitalizes on an actual issue I’m facing on the WordPress platform. The give-away, of course, is the “same blog platform” phrase: if they knew which platform I’m using, why not just say WordPress? Still, my brain doesn’t read that way, and on first read it seems legit.

Does your website have a contact page? I’m having problems locating it but, I’d like to send you an email. I’ve got some ideas for your blog you might be interested in hearing. Either way, great blog and I look forward to seeing it expand over time.

I think this comment works even more often than the previous ones. WordPress doesn’t provide a contact page by default, and I’m not sure many blog authors are even aware of that. I have, however, added a contact form, and it’s relatively easy to find. To me the comment is nonsense, but on most blogs it probably works.

The problem with this spam is that many authors may choose to approve it. In my experience with filtering, these “likes” on the comments improve their score. So even if a system decides for other reasons that a comment is spam, the user ranking pushes it towards ham.

Referrers

Hello! I’ve been reading your site for a while now and finally got the courage to go ahead and give you a shout out from Lubbock Texas! Just wanted to mention keep up the excellent work!

The spam in these comments is the user’s homepage link. The comment form lets the poster enter any URL they like. In many cases the link goes directly to a crappy product page, so I’m actually surprised these make it through at all.

Other links go to places that could pass as legitimate home pages, like other blogs or a YouTube channel. The few I checked were still obviously spam, but given their number and variety they are probably harder to detect automatically.

If the link were simply removed from the comments, the spam problem would go away. Unfortunately, that would also ruin part of the blogging system. It is valuable to have links to users who post intelligent commentary. It’s often how I find other content, and it’s what builds a community.
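One possible middle ground, sketched here in Python purely as an illustration: only link the homepage of commenters who already have an approved comment. The `Comment` type, the `approved_counts` lookup and `render_author` are hypothetical names, not part of WordPress, and keying on the display name is a simplification; a real system would need a more trustworthy notion of identity.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    author: str              # display name typed in by the poster
    homepage: Optional[str]  # the free-form URL field from the comment form
    body: str


def render_author(comment: Comment, approved_counts: dict[str, int]) -> str:
    """Render the byline, linking the homepage only for returning commenters.

    Hypothetical policy: a first-time commenter gets a plain-text name, so a
    spammer gains no link until a human has approved one of their comments.
    Keying on the display name is a simplification; names are easy to spoof.
    """
    name = html.escape(comment.author)
    if comment.homepage and approved_counts.get(comment.author, 0) > 0:
        url = html.escape(comment.homepage, quote=True)
        # rel="nofollow" asks search engines not to pass ranking credit.
        return f'<a href="{url}" rel="nofollow">{name}</a>'
    return name
```

A first-time poster would then show up as plain text, which removes the payoff for link spam while still rewarding regulars. The cost is that genuine newcomers also lose their link on their first comment.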

Is there a solution?

At the moment I’m just upset with WordPress, and I guess with Akismet, since they’re the provider. Some part of me doesn’t really care about the reasons, only that spam isn’t being filtered. For non-technical bloggers that’s probably the only way the issue is seen. My deeper interest in the spam comes from wanting to find a solution and to understand, at a technical level, what the problem is.

The language aspect is perhaps the most interesting. Could comments be parsed, interpreted, and checked in relation to the content of the article? This would certainly be a huge step, but it might also make things worse. I imagine it wouldn’t be too hard for a spammer to use a similar tool, parse my content, and generate what seems like relevant commentary.
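Even a crude version of that check is easy to sketch. The Python below, with hypothetical `content_words` and `specificity` helpers and a made-up stopword list, is a bag-of-words overlap measure, not a real parser: it scores the fraction of a comment’s longer words that also appear in the article. Generic praise scores near zero, but a spammer could game it just as easily by echoing words from the post, and short genuine comments would score low too.

```python
import re

# A tiny illustrative stopword list; a real filter would need a fuller one.
STOPWORDS = {"this", "that", "with", "have", "from", "your", "you're",
             "about", "would", "there", "their", "really", "just"}


def content_words(text: str) -> set[str]:
    """Lowercased words longer than three letters, minus stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if len(w) > 3 and w not in STOPWORDS}


def specificity(comment: str, article: str) -> float:
    """Fraction of the comment's content words that also occur in the article."""
    words = content_words(comment)
    if not words:
        return 0.0
    return len(words & content_words(article)) / len(words)


article = "Akismet filters comment spam on WordPress blogs using a hosted service."
generic = "You're so cool! This website is one thing that is needed on the internet."
specific = ("Akismet used to catch these, but lately the spam filters "
            "miss the WordPress comment spam.")
print(specificity(generic, article))   # 0.0
print(specificity(specific, article))  # 0.5
```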

Ultimately, I don’t think anything short of a reputation system for individuals will stop the spam. Under such a system, a comment that doesn’t come from a reputable person wouldn’t make it through. The biggest challenge is that such a system must not filter critical remarks, only spam and perhaps trolls. I’m also not clear how we’d prevent a Windows user from losing their reputation due to virus bots posting on their behalf. I at least wish somebody would make an honest effort in this direction, just to see what happens.
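A reputation gate along those lines might look something like the Python sketch below. Everything in it is an assumption for illustration: the `Reputation` class, the reward, penalty and threshold values, and the idea that an “identity” is something stable to key on. It only counts past verdicts (spam versus approved), never the tone of a comment, so critical remarks cost nothing; the heavy spam penalty also illustrates the Windows problem, since a single bot post wipes out several approved comments’ worth of standing.

```python
from collections import defaultdict

# Illustrative values only; choosing them well would take real data.
APPROVE_REWARD = 1
SPAM_PENALTY = 5
TRUST_THRESHOLD = 3


class Reputation:
    """Per-identity score built from moderation verdicts on past comments."""

    def __init__(self) -> None:
        self.scores = defaultdict(int)

    def record(self, identity: str, was_spam: bool) -> None:
        # Only the spam/approved verdict matters, not whether the comment was critical.
        self.scores[identity] += -SPAM_PENALTY if was_spam else APPROVE_REWARD

    def route(self, identity: str) -> str:
        # Trusted identities publish directly; everyone else waits for a human.
        if self.scores.get(identity, 0) >= TRUST_THRESHOLD:
            return "publish"
        return "moderate"


rep = Reputation()
for _ in range(3):
    rep.record("alice", was_spam=False)
print(rep.route("alice"))            # publish
rep.record("alice", was_spam=True)   # e.g. a bot posting from her infected machine
print(rep.route("alice"))            # moderate
print(rep.route("newcomer"))         # moderate
```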

I’ll end with one more comment, just to stoke my pride a bit.

I am in fact pleased to glance at this webpage posts which consists of plenty of useful facts, thanks for providing these information.
