IP Blocking is Cool
Since I moved this blog to WordPress from Blogger, I have had 467 real comments and 44,379 spam comments. Every one of the 467 is precious, and for those it is worth dealing with the forty thousand robot comments. In the time it took me to write that last sentence, the count has gone up to 44,389.
This latter number would be far higher but for the fact that I have set up my blog to automatically block, at the Apache level, any IP address that is responsible for more than one spam comment.
This measure is surprisingly effective. The behaviour of spammers is that they notice my blog exists and then shoot several dozen (sometimes hundreds of) spam comments from the same IP address. Since setting up this measure, I have blocked well over 12,000 IP addresses. I'll save my bandwidth for someone special, you the reader.
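As a rough illustration of the rule above (block any IP responsible for more than one spam comment), here is a minimal Python sketch. It is not the script this blog actually runs: the function names, the in-memory list of spam IPs and the second address are assumptions made for the example, and the output uses the older `Deny from` syntax that comes up in the comments below.

```python
from collections import Counter


def ips_to_block(spam_ips, threshold=1):
    """Return the IPs seen in more than `threshold` spam comments, sorted as strings."""
    counts = Counter(spam_ips)
    return sorted(ip for ip, n in counts.items() if n > threshold)


def htaccess_lines(ips):
    """Render one 'Deny from' line per IP, ready to append to .htaccess."""
    return [f"Deny from {ip}" for ip in ips]


if __name__ == "__main__":
    # Hypothetical log: one entry per spam comment received.
    spam_log = ["81.26.51.108", "10.0.0.7", "81.26.51.108"]
    print("\n".join(htaccess_lines(ips_to_block(spam_log))))
```

An IP with a single spam comment is left alone, so one stray false positive does not get anybody blocked.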
One interesting idea would be to study this list by doing reverse IP lookups to plot these computers by location, operating system and so on. Most that I have looked up myself have appeared to be zombie Windows-based PCs on normal commercial ISPs.
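A sketch of how such a study might start, assuming the blocked addresses are available as a plain list. Reverse DNS only hints at the ISP, not the operating system or exact location, and many addresses have no reverse record at all. The function names and the two-label suffix heuristic are assumptions for illustration.

```python
import socket
from collections import Counter


def hostname_suffix(hostname, parts=2):
    """Reduce 'host1.dsl.example.net' to 'example.net' (a crude ISP hint)."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-parts:])


def tally_networks(ips, resolve=socket.gethostbyaddr):
    """Count blocked IPs by the suffix of their reverse-DNS hostname.

    `resolve` can be swapped out, which makes the function usable offline.
    """
    tally = Counter()
    for ip in ips:
        try:
            hostname = resolve(ip)[0]  # gethostbyaddr returns (name, aliases, addresses)
        except OSError:  # covers socket.herror and socket.gaierror
            tally["(no reverse DNS)"] += 1
            continue
        tally[hostname_suffix(hostname)] += 1
    return tally
```

Calling `tally_networks(blocked_ips).most_common(10)` would then list the ten networks that show up most often.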
After all this, the odd spam still gets through: roughly 0.001% or so of attempted spam comments make it past my current set of tests, i.e. one or two per week. At some point the spammers just use humans, and there is little I can do about that.
My process has been to queue all surviving comments, just in case, so that I can weed out this small number. Normally I approve real comments in a matter of hours, but introducing the time delay is still a bit annoying.
Captchas are not the answer
Even worse are captchas (little games that make you guess the characters). I hate trying to fill them out myself. I also use the text-only elinks browser quite a lot, and captchas will not work with that. Please type out the current picture:
Furthermore, captchas are becoming less effective at their core aim of distinguishing a human from a machine, as it all descends into an arms race. Computer software is becoming better at beating them, while captchas are becoming more complex in response, meaning that humans have more trouble. The other day, someone's captcha was so obscured that it took me six tries to leave a comment on a blog. I won't bother to post there again.
Innocent until proven guilty
I do not want to make it like Indiana Jones and the Last Crusade just to make a comment: roll under a flying saw, spell something in Hebrew and then take a flying leap into the abyss. Furthermore, moderation of spam is my problem, not yours. It is already inconvenient to make comments; making it harder is bad, bad news.
The ancient Greek philosophers and the Old Testament argue that it is better that any number of guilty people go free rather than one innocent man be punished.
This week I have tried something different: I have allowed all comments that do not have too much in the way of hyperlinks to go live automatically. Yes, there will be the odd spam, but I will try to squish them quickly.
It is sometimes handy to be able to carry on the conversation privately (for example, I once helped someone install Gentoo), but 90% of the time I do not really care what your email address is. Asking for it twice, however, makes us different from other WordPress blogs and so breaks some spammers' scripts. It is dull to fill in, though, so it might be worth getting rid of the email field entirely.
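The "not too many hyperlinks" test could look something like the following Python sketch. The threshold of one link and the function names are assumptions for illustration; the rule actually running on this blog may differ.

```python
import re

# Counts URLs written with an explicit http:// or https:// scheme.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def count_links(text):
    """Return how many http(s) URLs appear in the comment text."""
    return len(LINK_PATTERN.findall(text))


def should_auto_approve(text, max_links=1):
    """Let a comment go live immediately unless it carries too many links."""
    return count_links(text) <= max_links
```

Anything that fails the check would simply fall back to the moderation queue rather than being thrown away.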
In the long run I might write my own blog software, then I would do comments very differently indeed. It would be a little more like a wiki and allow you to annotate and change the text itself, using different colours and layers, etc.
Teletubbies need not apply
This is not a blog which attracts small children; it is about taking control of your own technology, as well as ethics, Linux, Bash, Python and so on. None of these are hot topics in the primary school age bracket. Children should not be on the Internet unsupervised anyway. If you let your small kiddy browse the web alone then you are at best an idiot.
Therefore, the consequences of having the odd spam online for 10 minutes are not all that great. So I am going to try it without the cotton wool for a while longer and see how we get on.
What would you do?
What is your take on this? Have you dealt with this problem yourself somewhere? Do let me know.
<p>44,389? Your blog must be a lot more popular than mine, or you must be
really unlucky. I've had 4300 spams and around 380 legit comments since
installing Wordpress. Akismet captures and automatically removes
essentially all of the spam. It's very rare even for one to make it into
the moderation queue (showing that Akismet thinks it's spam but isn't
100% sure). I don't have any other anti-spam tools running but that. No
captchas and not even a required email address. Not sure what I'd do if I
was facing that kind of spam volume.</p>
<p>I like using little logic puzzles instead of captchas, like a simple math
problem or "Which of these pictures is a bunny?" sort of thing. If you were
inventive you could make it actually a bit of fun. I share your distaste of
captchas. On some message boards it takes me a dozen tries to make a post.</p>
<p>Hi Brian, I like your blog design by the way!</p>
<p>&gt; Your blog must be a lot more popular than mine, or you must be really
unlucky.</p>
<p>I have quite a lot of people turn up but a very low percentage of them leave
comments. I do not know what I am doing wrong :-( but recently I am trying to
encourage you all to leave comments.</p>
<p>If it was about politics or something then I would get more perhaps. Of
course, I did lose the old comments when moving from Blogger to WordPress.</p>
<p>So I think partly it has to do with the nature of the material and the fact
that most readers seem to be just passing through. They read my blog within a
'Planet' or other aggregator or have come via Google, searching for the
answer to some technical problem.</p>
<p>Thanks for liking my blog layout. I try to leave comments when I have
anything interesting to say. I think most sites have a majority of people
lurking and very few who participate so it's not just yours. (I read a
statistic about it somewhere, so it must be true.) Personally I read your
blog via Planet Larry.</p>
<p>Quality over quantity though, that's the way to go. Keep up the posts, they
are good reading.</p>
<p>"moderation of spam is my problem, not yours."</p>
<p>Thank you for saying that. I generally abhor the "modern solutions", since
adding tons of scripting and cookies makes me cringe. And I've written as a
web designer about how much I hate modern comment forms and their extreme
lack of usability.</p>
<p>Sadly, we're too sensible for the modern web. I just use Akismet and hope for
the best, since I am not "blogging for points", simply to document things.
But I never force the user to use cookies or a login, and if I ever stoop to
using a CAPTCHA I have a small army of friends who will hurt me until I
realise the error of my ways.</p>
<p>Among the most effective approaches I've seen were the simplest ones; forced
previews, time limits between posts, and so on.</p>
<p>I've been wondering how spam bots harvest forms on a page, and whether they
ignore some forms. If they do, we could set up a nonsense-like one that sends
the legit comments to a weird server and bounces back to our real one. If the
bots do not, we could instead just have two forms, and if both are used we
ignore both comments and blacklist the IP for a time. Thoughts?</p>
<p>Zeth,
The comment thing really gets to me too. I hate not being able to leave the
comment box free so that two people visiting the site simultaneously can
discuss my posts. Unfortunately I've not been too active recently (and moving
my blog from a dynamic to a static IP address seems to have reduced the spam),
so since installing Akismet on WordPress I've not had more than 79 spam.</p>
<p>I don't know what I'd do if I were faced with the levels of spam that you
have - it must be incredibly frustrating. I have started to use the OpenID
plugin on my blog - it means that anyone who's already posted won't double
post under different nicknames, and therefore their comments go live straight
away. It was pretty easy to install too - the actual OpenID site I also run
myself and it's nothing more than a (very) short PHP script.</p>
<p>@Brian, Planet Larry is really cool, it is nice to have fellow Gentooers
visit.</p>
<p>@BTreeHugger I have been thinking about the "Spam Trap" approach since
yesterday, when I added the extra email field.</p>
<p>'Spam Traps' are ways to try and divert the spam through hidden JavaScript or
whatever. They are very effective at distinguishing between a modern
graphical web browser like Firefox and a spam machine. However, they normally
have lots of false positives:
people using text-only browsers,
people with JavaScript turned off,
people using assistive technologies for the visually impaired, and
people with certain Windows security technologies.</p>
<p>This is both making the spam the visitor's problem (client side Javascript),
as well as punishing the innocent. However, if we can make the system recover
gracefully in case of false positives, then this might be the way to go.</p>
<p>@Andy, in the last week I have moved from a dynamic to static IP address. We
will see if this makes a difference. I suspect they are following URLs
though, and cool URLs do not change.</p>
<p>However, spammers often comment on the most linked to posts, i.e. the highest
Google ranked posts, rather than to the most recent posts. So one idea is to
treat comments on posts older than a month differently.</p>
<p>Well, I have used too many news systems. I never wanted to have a 'blog', so I just
used stand-alone news systems. For that reason I never bothered with
WordPress.
It took me quite a while to find a good, usable news system that produces
valid HTML. But after a while I found out the hard way that it got spam,
spam that I didn't want. So I looked for another news system. For a while it
was clean; I guess the bots didn't know about the new system. But the second news
system wasn't protected at all, and I had posted a Digg link, so I
ended up with 80 spam comments a day. Not fun at all [because I didn't have a
DELETE ALL comments button].
After a while I decided I couldn't stand it any longer and coded a
system myself, so now I use my own news system. Again, for a while the
bots were trying to figure out the change of news system. After some more time, a
single spam comment got in. So I coded a hidden input field,
which was great for getting rid of the stupid bots.</p>
<p>The bots learned their lesson, though, and I then received about 20 or so spam
comments. I cleaned them all out and added the math question. The trickiest part of
my math question is that the input box has something
totally unrelated inside it. What do you think a bot will type if the
label says "What's the colour of the sky?" while the input box's default
value is "pie"? So this seems to kill the spam. I haven't bothered with IP banning
yet, or with keeping a count of blocked spam, though I think I might add a count
just to see how much it kills <img src="/static/forum/img/smilies/smile.png">, so I'll know whether it's because they try and fail or
because they just decided to skip my site.</p>
<p>I'm totally against captchas, though I do have one on my webbie in the
contact section. Just because I'm too lazy to edit the code [It was something
premade I took. Haven't touched it for years, not sure it even sends to the
correct E-Mail XD].</p>
<p>@Brian: The bunny idea is nice, but I like to surf using w3m sometimes;
therefore, I don't like image tests in general, even if they're not as evil as
captchas.</p>
<p>You must have quite a high Google ranking, Zeth - I find that the higher
ranked on Google you are, the more spam you get :-/ I usually get around 50
or so spam comments a day, although fortunately Akismet filters most of these
out.</p>
<p>What I have done on occasion is just disable comments for a particular post
which seems to be generating a lot of traffic from spammers (it's usually an
old one so doesn't really do any harm to disable comments).</p>
<p>I don't think CAPTCHAs are worth the bandwidth to be honest, spam bots are
getting so much smarter that I don't think it's worth it.</p>
<p>How do you block IP addresses at the Apache level?</p>
<p>Thanks !</p>
<p>Easy, just add lines to your .htaccess file, e.g.</p>
<p>Deny from 81.26.51.108</p>
<p>will deny from that IP Address.</p>
<p>Though, by doing that you stop the legitimate user of that PC from visiting the site.
Also, think about the dynamic IP issue.</p>
<p>I agree with what bug said... I suppose what you could do instead is redirect
them to a URL which says "Your IP Address is blocked. To unblock, please
contact..." - although I suspect for the majority of web users they'd just
skip past as they couldn't be bothered to get themselves unblocked.</p>
<p>Well if someone's Windows PC has been hijacked and is now part of a botnet,
they no doubt have rather more pressing problems than not being able to read
my blog.</p>
<p>I found just having a hidden field in the form to do the trick.</p>
<ol class="arabic">
<li><p class="first">Hidden field that the user can not obviously fill out (can't fill it
out if you can't see it)</p>
</li>
<li><p class="first">Bot will fill it out because they just read the html</p>
</li>
<li><p class="first">Upon submit- if that field is filled out...you know a bot filled it
out. Ignore/ban.</p>
</li>
</ol>
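<p>The three steps above can be sketched in Python as a server-side check. The
field name <code>website_url</code> and the dictionary standing in for the
submitted form are assumptions for illustration, and the field would be
hidden from human visitors with CSS in the page itself.</p>

```python
def is_probably_bot(form, honeypot_field="website_url"):
    """Return True when the hidden honeypot field came back filled in.

    `form` is a plain dict of submitted field names to string values.
    A human never sees the field, so it stays missing or empty.
    """
    return bool(form.get(honeypot_field, "").strip())


if __name__ == "__main__":
    human = {"comment": "Nice post!", "website_url": ""}
    bot = {"comment": "Buy pills", "website_url": "http://spam.example"}
    print(is_probably_bot(human), is_probably_bot(bot))
```

<p>What happens next (ignore the comment, or ban the IP) is a separate
decision, so the check only reports what it saw.</p>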