Winds' Guide to Fighting Comment Spam
by Joe Katzman at January 6, 2005 12:05 AM
(posted Jan. 6, 2005; last updated June 12, 2005)
Six Apart, the folks behind the Movable Type software that runs this site, have just released a guide to fighting the spam that hits weblogs via comments, trackbacks, etc. As you might imagine, Jay Allen played a big role in compiling it. It's worth any blogger's time, especially for those who run MT installations.
We use our own mix of techniques here at Winds of Change.NET. I'm going to go well beyond the Six Apart guide and give you some general principles for building your own blog's defenses, then move on to what we're up to so you can see some of these ideas in action. I'll conclude by talking about the source of this problem, and what can be done.
Further thoughts and suggestions will be welcome in the comments section, of course, and this post will probably evolve over time.
Some Principles of Blog Defence
I'm assuming you want to keep your comments. Even so, you may lack the resources to implement the measures Winds of Change.NET has taken, or you may need something else. And like any enemy, spammers will change their tactics in future, and you'll have to react.
So here are a few general principles to remember. Six Apart didn't include them, but they're useful as you think about securing your blog against the hostile cyber-attacks of spammers. Now that you're a security designer (or about to become one), remember that:
- There is never one silver bullet solution to ANY security problem (though bullets might work rather well when it comes to organized spammers). Security systems are all brittle. When they fail, they often fail completely.
- Indeed, Bruce Schneier (author of "Beyond Fear" - and also a blogger) notes that "the most critical aspect of a security measure is not how well it works but how well it fails". That's true, so don't depend on just one option. Have a series of overlapping point solutions that each cover part of the problem.
- Profile matters. You may not need everything immediately. Start with a base you can maintain easily, then ramp up your investment by adding new pieces and approaches as your popularity and profile rise in the blog world and on Google et al. Spam volume may also force your hand, of course.
- When in doubt, pick a temporary solution that will buy you the time you need to decide on and carry out more "ideal" but time-consuming approaches.
- Make sure each system is working and that you understand its ins and outs before adding the next layer.
- No single failure should compromise the normal functioning of the entire system or, worse, add to the gravity of the initial breach. For instance, imagine a system that was 100% effective at stopping comment spam, but whose high CPU use under peak attack loads crashes the server and takes your blog offline. What have you really protected here? You might still use this system, but rather than making it your front-line effort, make it your last line of defence and work hard to keep its load at manageable levels by placing filtering methods ahead of it.
- Human judgment is the key, and no machine process can wholly replace it. So decide where human attention can make the most difference in your system, and commit it there. Remember that tech-support demands also consume human attention.
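The "last line of defence" advice can be pictured as a short-circuiting pipeline: cheap filters run first, and the expensive one only sees what they let through. Here is a minimal Python sketch under that assumption; the two filters and their rules (more than three links, the word "casino") are hypothetical stand-ins, not anything Winds actually runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Comment:
    author_ip: str
    body: str


# A filter returns a rejection reason, or None to pass the comment along.
Filter = Callable[[Comment], Optional[str]]


def too_many_links(c: Comment) -> Optional[str]:
    # Cheap check (hypothetical rule): a substring count, no disk or network access.
    return "too many links" if c.body.count("http") > 3 else None


def expensive_check(c: Comment) -> Optional[str]:
    # Placeholder for a costly last-line filter such as a big blacklist scan.
    return "expensive filter hit" if "casino" in c.body.lower() else None


def run_layers(c: Comment, layers: List[Tuple[str, Filter]]) -> Tuple[Optional[str], List[str]]:
    """Run filters in order and stop at the first rejection.

    Returns (reason or None, names of the layers that actually ran)."""
    ran: List[str] = []
    for name, f in layers:
        ran.append(name)
        reason = f(c)
        if reason is not None:
            return reason, ran
    return None, ran


# Cheapest first, so the expensive layer's load stays manageable.
LAYERS: List[Tuple[str, Filter]] = [("links", too_many_links), ("expensive", expensive_check)]
```

A link-stuffed comment is rejected by the first layer, and the expensive filter never runs on it; only comments that survive the cheap checks add to its CPU load.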
Which brings me to my last point, familiar to students of Eli Goldratt's Theory of Constraints for organizations:
Know where your system's bottlenecks are, and make all other decisions revolve around their limitations.
For many weblogs, the human element is the bottleneck because the authors have a very finite amount of time & attention to commit. Depending on your host, however, other bottlenecks could include CPU load (on overcrowded servers), limitations of your technical setup, etc. Figure out what the top 3-5 bottlenecks are, and rank them. Then use that ranking as a guide to all of your subsequent decisions re: improvements, defensive measures, etc.
These bottlenecks will also remind you that you can't do it all. Accept your limitations, and consciously make tradeoffs, such as "blunt force" methods like closing all comments on posts more than X days old if your time is really tight (the Six Apart guide has some tools for this). You'll miss some great comments - but if you don't have the time, then don't agonize. Just pay the price, know why you're paying it, and move on.
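The "close comments after X days" tradeoff boils down to one date comparison. A minimal Python sketch, assuming a 14-day cutoff as a hypothetical stand-in for "X days"; the Six Apart tools themselves are not shown here.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff; each blogger picks their own "X days".
MAX_COMMENT_AGE = timedelta(days=14)


def comments_open(posted: datetime, now: datetime, max_age: timedelta = MAX_COMMENT_AGE) -> bool:
    """Return True while a post is still young enough to accept comments."""
    return now - posted <= max_age
```

Spammers often target old, high-ranking posts that nobody is watching, which is why this blunt rule removes so much of the attack surface for so little effort.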
Of course, you can expand your limits by recruiting more technical members, forming affiliations, etc. If so, review your past decisions to make sure they're still what you want or need to do. Some of your old limitations may have lifted, at which point you can either fix some of those old tradeoffs, or apply the new resources to a new area if that's more productive overall.
Note the negative selection pressure this creates in the blogosphere. The spam onslaught is actually one of the reasons I believe Carnivorous Conservative was on the money with his prediction that group and federated blogs will rise in popularity as the blogosphere evolves.
What Is Winds of Change.NET Doing?
Winds has become a high-value target, so we use a number of approaches. I'm going to talk about a few:
- MT-Blacklist is one measure. As we explained in this post, MTBL 2.0 in particular can't be your only option unless your blog is quite small. It created 2 bottleneck issues for us (server CPU under attack loads, and our time/attention). In response, we've refined our system to place filters ahead of MTBL, and applied a software upgrade to help reduce the CPU problem. That helped a lot - and an after-action review even led to a decision to switch hosts.
- We held off as long as we could, but finally implemented James Seng's CAPTCHA ("type the number you see") plug-in, on display here at Andunie. Other defences can be sidestepped: spammers can still submit comment spam once they figure out the submission URL, and can promote new sites that aren't on the blacklist yet. CAPTCHA throws up a more fundamental roadblock. The downside is that it's invasive (it requires back-end modifications in a few places), it's a bit of a pain for our commenters (though only a mild one, from early reports), and it made our blog less accessible to the disabled (the morally disabled driving out the physically disabled, alas). Upside: it has been VERY effective, cutting comment spam to zero.
- Brad Choate's MT-DSBL plug-in was a way to choke off the many spams sent through exploited open proxies. It offers a point solution to the Distributed Denial of Service architecture of many spammer attacks, which rely on compromising others' computers around the net rather than launching from one easily identified server they own. We experimented with it, and found from reader feedback that it was blocking entire IP spaces of major providers. In our case, that cure was worse than the disease - which shows the importance of follow-up.
- In response, we briefly switched to a successor plug-in with many more features called SpamLookup. It monitors comments AND trackbacks, and has other features as well.
- The problem is that SpamLookup blocks a lot of legitimate trackbacks. While The Tweezer's Edge suggests combining a plug-in called MT-Moderate with SpamLookup, in order to ensure that trackbacks can eventually go through, we couldn't get that to work here. When we also discovered (that follow-up thing again) SpamLookup was blocking most trackbacks from Blogspot, and would not stay turned off for trackbacks... out the door it went.
- We also use a few proprietary techniques, like moving MT et al. out of their standard folders. This requires a certain degree of fiddling with MT's back end, but it has made a difference and cut MTBL's load.
- Changing the mt-comments.cgi filename, making the matching changes in mt.cfg, and then rebuilding the blog also helps. You'll need FTP plus some minimal training - fortunately, it's covered in the Six Apart guide. This generally has to be done about once every 48-72 hours to have a strong effect, but even once a week helps. An MT plug-in that automated these changes would be a big plus, as long as the new names came from a list the bloggers generated themselves. That human element would keep any one blog's list from being known beforehand. All of this becomes less important if you have CAPTCHA.
- We considered using and requiring TypeKey for all commenters, but too many people reported inability to set up their TypeKey identity due to technical problems, etc.
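For readers new to MT-Blacklist, the general idea behind a URL blacklist is to pull the linked hosts out of a comment and test each one against a list of patterns. The Python below is a minimal sketch of that idea, not MT-Blacklist's actual code; the patterns are hypothetical examples.

```python
import re
from typing import Iterable, List, Optional

# Captures the host part of each http(s) link in the comment text.
URL_RE = re.compile(r"https?://([^/\s\"'<>]+)", re.IGNORECASE)


def compile_blacklist(patterns: Iterable[str]) -> List[re.Pattern]:
    """Compile blacklist entries once, case-insensitively."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def blacklist_hit(text: str, blacklist: List[re.Pattern]) -> Optional[str]:
    """Return the first linked host that matches a blacklist pattern, else None."""
    for host in URL_RE.findall(text):
        for pat in blacklist:
            if pat.search(host):
                return host
    return None
```

Every comment has to be checked against every pattern, which is one reason a long blacklist gets CPU-hungry under attack, and why a spammer's brand-new domain sails through until someone adds it.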
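A "type the number you see" check can be built so the server doesn't need to remember each challenge: it signs the number together with the time it was issued and puts that signature in a hidden form field. The Python below is a minimal sketch of that approach, not James Seng's plug-in; the secret key, the five-digit length, and the 10-minute expiry are assumptions, and drawing the number as an image is left out. It also doesn't stop a token from being reused within its lifetime.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Tuple

SECRET_KEY = b"change-me"  # hypothetical server-side secret
MAX_AGE_SECONDS = 600      # assumed lifetime of one challenge


def _mac(number: str, issued: int) -> str:
    msg = f"{number}:{issued}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def new_challenge(now: Optional[int] = None) -> Tuple[str, str]:
    """Return (number to render as an image, token for a hidden form field)."""
    issued = int(time.time()) if now is None else now
    number = f"{secrets.randbelow(10**5):05d}"
    return number, f"{issued}:{_mac(number, issued)}"


def check_answer(answer: str, token: str, now: Optional[int] = None) -> bool:
    """Accept only if the typed number matches the token and it hasn't expired."""
    try:
        issued_str, mac = token.split(":", 1)
        issued = int(issued_str)
    except ValueError:
        return False  # malformed token
    current = int(time.time()) if now is None else now
    if not 0 <= current - issued <= MAX_AGE_SECONDS:
        return False  # expired, or issued "in the future"
    expected = _mac(answer.strip(), issued)
    # Constant-time comparison of the two signatures.
    return hmac.compare_digest(mac.encode(), expected.encode())
```

The accessibility cost mentioned above is real: anyone who can't read the image fails the check, whatever the server-side logic looks like.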
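MT-DSBL belongs to the family of DNS-based blocklists. The commenter's IPv4 octets are reversed and prepended to the list's zone, and the name is looked up in DNS. An answer means "listed" and a failed lookup means "not listed". The Python sketch below shows that lookup in general terms, not Brad Choate's code. The zone `list.dsbl.org` is an assumption based on DSBL's former setup, and that service has since shut down.

```python
import ipaddress
import socket

DNSBL_ZONE = "list.dsbl.org"  # assumption: DSBL's former zone; the service is gone


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the name a DNSBL expects: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str = DNSBL_ZONE) -> bool:
    """True if the zone returns an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN or any lookup failure: fail open and treat as not listed.
        return False
```

Failing open keeps a dead or slow list from silencing every commenter. Given the false positives we hit, a cautious setup might send listed comments to moderation instead of rejecting them outright.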
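The automation wished for in the filename-rotation item above might start like this minimal Python sketch. It picks the current period's comment-script name from a list the blogger wrote, then updates the config text. It assumes MT's `CommentScript` directive in mt.cfg (check your MT version's documentation), and the function names are hypothetical. Renaming the file on the server and rebuilding the blog are not shown.

```python
import re
from datetime import datetime, timezone
from typing import List, Optional

ROTATION_HOURS = 72  # the upper end of the 48-72 hour interval


def current_script_name(names: List[str], now: Optional[datetime] = None,
                        hours: int = ROTATION_HOURS) -> str:
    """Pick this period's filename from the blogger's own (non-empty) list."""
    now = now or datetime.now(timezone.utc)
    period = int(now.timestamp() // (hours * 3600))
    return names[period % len(names)]


def rewrite_config(cfg_text: str, new_name: str) -> str:
    """Replace the CommentScript line in mt.cfg-style text, or append one."""
    line = f"CommentScript {new_name}"
    pattern = re.compile(r"^CommentScript\s.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        # A function replacement avoids backslash surprises in new_name.
        return pattern.sub(lambda _m: line, cfg_text, count=1)
    return cfg_text.rstrip("\n") + "\n" + line + "\n"
```

Because the list comes from the blogger rather than from the plug-in, a spammer who studies the plug-in still can't predict the next name.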
Finally, we looked up the chain for additional firepower.
- Evariste of Discarded Lies had been carrying on some very interesting back-channel conversations with me about running group blogs, and his team does really good work. We've expanded our cooperation in a number of areas, and you'll see more joint features etc. in future. Evariste's role in our blog's technical improvement has also been a big plus, growing our capabilities.
- We also made broader anti-spam efforts part of our ISP evaluation. What measures does your ISP take to help protect the blogs within its space, and to advance anti-spam technology generally? Ask. We do.
Of course, we'll continue to evolve our defences, adding and subtracting based on many of the principles explained earlier in this post.
Facing the Enemy
While this problem seems to come from many directions, most of it is apparently the work of a small number of bad actors. We've seen this phenomenon in the email spam world before. O'Reilly's book "Spam Kings" adds more details, and even this spam map from Postini.com suggests it. I've heard a few experts opine that over 80% of today's email spam problems are the work of fewer than 100 bad actors when you get right down to it.
Ann Elisabeth's investigations and Teresa Nielsen Hayden's Lolita advisory suggest that a similar pattern may be at work with comment spam. The blogosphere's rapid growth is also making blog spamming more and more attractive.
Personally, I'm surprised all that in-your-face porn, drugs, etc. hasn't yet been declared "un-Islamic" and a fatwa issued for the deaths of those involved. It would be the best publicity Osama et al. could ever hope for, and a problem caused by a small circle of bad actors would be very susceptible to this solution.
That's probably too much to hope for, however, and the blog defence principles above remind us that even silver bullets are no silver bullet. So, we'll need to forge our own response.
- DSBL.org, the Distributed Sender Blackhole List, is one such existing response.
Other responses will become more and more necessary as blog readership grows and spammers grow more sophisticated. What they all share is the need to build tools and information first, and then to assemble an ever larger coalition to make use of the results:
- In order to get a better handle on the phenomenon's origin, vendors, bloggers, ISPs, and even related 3rd parties like DSBL.org will need to pool their investigative efforts a la the Spamhaus Project, and hopefully create a Spamdemic Map for the blogosphere.
- Because comment spam attacks are aimed much more intensively at each single site than email spams, many comment spam attacks may reach volumes high enough to qualify as Denial of Service, or even DDoS, attacks. If a comment spammer's attack takes a blog down, and it can be traced back to someone on the Blogosphere Spamdemic Map, it becomes possible to go after these people as criminal computer crackers. That's a much more serious offense than spamming, and gives us methods for shutting these people down that go beyond mere civil suits. We'll need to begin using them, with the cooperation of law enforcement and of other entities (like large corporations) who may also be pursuing these people.
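The pooled investigation proposed in the first point comes down to merging each reporter's list of spam origins into one tally and then measuring how concentrated the problem is. That is the kind of measurement behind claims like "80% from fewer than 100 actors". A minimal Python sketch; the reporters, origin identifiers and counts in any input are hypothetical, and no real spam data is involved.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pool_reports(sources: Dict[str, Iterable[str]]) -> Counter:
    """Merge spam-origin reports from many reporters (blogs, ISPs, lists).

    Each reporter contributes origin identifiers, e.g. advertised domains."""
    tally: Counter = Counter()
    for origins in sources.values():
        tally.update(origins)
    return tally


def top_origins(tally: Counter, n: int) -> List[Tuple[str, int]]:
    """The n most-reported origins, most frequent first."""
    return tally.most_common(n)


def share_of_top(tally: Counter, n: int) -> float:
    """Fraction of all reports attributable to the n most-reported origins."""
    total = sum(tally.values())
    return sum(c for _, c in tally.most_common(n)) / total if total else 0.0
```

A single blog sees only its own slice. Only the pooled tally could show whether a handful of origins really account for most of the volume.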
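Going after an attacker on denial-of-service grounds would need records showing that an attack's volume went beyond nuisance. A minimal sliding-window counter in Python can flag any source that exceeds a limit within a time window. The `limit` and `window` values are hypothetical, and nothing here reflects a legal definition of a DoS attack.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BurstDetector:
    """Flag sources whose submissions exceed `limit` within `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, source: str, t: float) -> bool:
        """Log one submission at time t (non-decreasing); True if over the limit."""
        q = self._events[source]
        q.append(t)
        # Drop events that have slid out of the window.
        while q and q[0] <= t - self.window:
            q.popleft()
        return len(q) > self.limit
```

Logging which sources tripped the detector, and when, produces the kind of timeline that could support the Spamdemic Map and any later complaint.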
More possibilities surely exist, and we'd love to hear your thoughts and ideas. Use the comments section to discuss:
- More principles of blog security
- Other ways of dealing with comment spam
- How we'll need to evolve long-term in order to clamp down on the small circle of comment spammers attacking our sites.
All rights reserved. This article can be found on the Internet at:
http://www.windsofchange.net/archives/winds_guide_to_fighting_comment_spam.php
Persons wishing to contact the author of this article for reprints etc. should put a request in the Comments section, or send an email to "joe", over here @windsofchange.net.