Winds of Change.NET: Liberty. Discovery. Humanity. Victory.


Winds' Guide to Fighting Comment Spam


(posted Jan. 6, 2005; last updated June 12, 2005)

Six Apart, the folks behind the Movable Type software that runs this site, have just released a Guide for Fighting Comment Spam, covering spam that reaches weblogs via comments, trackbacks, etc. As you might imagine, Jay Allen played a big role in compiling it. It's worth any blogger's time, especially for those who run MT installations.

We use our own mix of techniques here at Winds of Change.NET. I'm going to go well beyond the Six Apart guide and give you some general principles for building your own blog's defenses, then move on to what we're up to so you can see some of these ideas in action. I'll conclude by talking about the source of this problem, and what can be done.

Further thoughts and suggestions will be welcome in the comments section, of course, and this post will probably evolve over time.

Some Principles of Blog Defence

I'm assuming you want to keep your comments. Even so, you may lack the resources to implement the measures Winds of Change.NET has taken. You may need something else. And like any hostile enemy, spammers will change their tactics in future, and you'll have to react.

So here are a few general principles to remember. Six Apart didn't include them, but they're useful as you think about securing your blog against the hostile cyber-attacks of spammers. Now that you're a security designer - or about to become one - remember that:

  1. There is never one silver bullet solution to ANY security problem (though bullets might work rather well when it comes to organized spammers). Security systems are all brittle. When they fail, they often fail completely.

  2. Indeed, Bruce Schneier (author of "Beyond Fear" - and also a blogger) notes that "the most critical aspect of a security measure is not how well it works but how well it fails". This is true. Don't depend on just one option, therefore. Have a series of overlapping point solutions that each cover part of the problem.

  3. Profile matters. You may not need everything immediately. Start with a base you can maintain easily, then ramp up your investment by adding new pieces and approaches as your popularity and profile rise in the blog world and on Google et al. Spam volume may also force your hand, of course.

  4. When in doubt, pick a temporary solution that will buy you the time you need to decide on and carry out more "ideal" but time-consuming approaches.

  5. Make sure each system is working and that you understand its ins and outs before adding the next layer.

  6. No single failure should compromise the normal functioning of the entire system or, worse, add to the gravity of the initial breach. For instance, imagine a system that was 100% effective at stopping comment spam, but because of the maximum burst of the attack loads on your blog, its high CPU use ends up taking your blog offline by crashing the server. What have you really protected here? You might still use this system, but rather than making it your front-line effort, make it your last line of defence and work hard to keep its load at manageable levels by placing filtering methods ahead of it.

  7. Human judgment is the key, and no machine process can wholly replace it. So decide where human attention can make the most difference in your system, and commit it there. Remember that tech. support demands are also a form of human attention commitment.

Which brings me to my last point, familiar to students of Eli Goldratt's Theory of Constraints for organizations:

Know where your system's bottlenecks are, and make all other decisions revolve around their limitations.

For many weblogs, the human element is the bottleneck because the authors have a very finite amount of time & attention to commit. Depending on your host, however, other bottlenecks could include CPU load (on overcrowded servers), limitations of your technical setup, etc. Figure out what the top 3-5 bottlenecks are, and rank them. Then use that ranking as a guide to all of your subsequent decisions re: improvements, defensive measures, etc.
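
One way to picture principle 6 together with the bottleneck rule is a chain of checks run cheapest-first, so that the expensive check only sees what the cheap ones let through. What follows is a minimal Python sketch, not anything Movable Type or MT-Blacklist actually ships: the check names, the cost figures and the sample domain are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    cost: int                          # relative cost per comment (made-up units)
    is_spam: Callable[[dict], bool]    # returns True when the comment looks like spam


def first_failing_check(comment: dict, checks: List[Check]) -> Optional[str]:
    """Run checks cheapest-first; return the name of the first one that flags the comment."""
    for check in sorted(checks, key=lambda c: c.cost):
        if check.is_spam(comment):
            return check.name
    return None


# Hypothetical checks, for illustration only.
CHECKS = [
    Check("blacklist_scan", cost=50,
          is_spam=lambda c: "cheap-pills.example" in c["body"]),
    Check("too_many_links", cost=1,
          is_spam=lambda c: c["body"].count("http://") > 5),
    Check("empty_body", cost=1,
          is_spam=lambda c: not c["body"].strip()),
]
```

Calling first_failing_check(comment, CHECKS) stops at the first check that objects, so under an attack load most junk never reaches the costly blacklist scan. Ranking your own checks by what they cost your real bottleneck (CPU, or your time) is the part that matters, not the particular numbers used here.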

These bottlenecks will also remind you that you can't do it all. Accept your limitations, and consciously make tradeoffs of more "blunt force" methods like closing all comments to posts over X days old if your time is really tight (The Six Apart guide has some tools for this). You'll miss some great comments - but if you don't have the time, then don't agonize. Just pay the price, know why you're paying it, and move on.
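
The "close comments on posts over X days old" tradeoff comes down to a single date comparison. Here is a small Python sketch of the rule itself, not of any particular Movable Type tool; the function name and the 30-day default are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def comments_open(posted_on: date, today: date, max_age_days: int = 30) -> bool:
    """Return True while a post is still young enough to accept comments."""
    return today - posted_on <= timedelta(days=max_age_days)
```

For example, comments_open(date(2005, 1, 6), date(2005, 6, 12)) is False, since that post is well past the cutoff. The value of X is the knob: a shorter window costs you more late comments but leaves fewer old posts open to attack.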

Of course, you can expand your limits by recruiting more technical members, forming affiliations, etc. If so, review your past decisions to make sure they're still what you want or need to do. Some of your old limitations may have lifted, at which point you can either fix some of those old tradeoffs, or apply the new resources to a new area if that's more productive overall.

Note the negative selection pressure this creates in the blogosphere. The spam onslaught is actually one of the reasons I believe Carnivorous Conservative was on the money with his prediction that group and federated blogs will rise in popularity as the blogosphere evolves.

What Is Winds of Change.NET Doing?

Winds has become a high-value target, so we use a number of approaches. I'm going to talk about a few:

  • MT-Blacklist is one measure. As we explained in this post, MTBL 2.0 in particular can't be your only option unless your blog is quite small. It created 2 bottleneck issues for us (server CPU under attack loads, and our time/attention). In response, we've refined our system to place filters ahead of MTBL, and applied a software upgrade to help reduce the CPU problem. That helped a lot - and an after-action review even led to a decision to switch hosts.
  • We held off as long as we could, but finally implemented James Seng's CAPTCHA ("type the number you see") plug-in, on display here at Andunie. Spammers can submit comment spam once the submission URL is figured out, and use new sites not on the blacklist - but this throws up a more fundamental roadblock. The downside is that it's invasive (requires back-end modifications in a few places), is a bit of a pain for our commenters (though a mild one only from early reports), and our blog became less accessible to the disabled (the morally disabled driving out the physically disabled, alas). Upside: it has been VERY effective, cutting comment spam to zero.
  • Brad Choate's MT-DSBL plug-in was a way to choke off many spams sent by exploiting open proxies. This offers a point solution to the Distributed Denial of Service architecture of many spammer attacks, which rely on compromising others' computers around the net rather than launching from one easily-identified server they own. We experimented with it, and found from reader feedback it was blocking entire IP spaces of major providers. In our case, that cure was worse than the disease - which shows the importance of follow-up.
  • In response, we briefly switched to a successor plug-in with many more features called SpamLookup. It monitors comments AND trackbacks, and has other features as well.
  • The problem is that SpamLookup blocks a lot of legitimate trackbacks. While The Tweezer's Edge suggests combining a plug-in called MT-Moderate with SpamLookup, in order to ensure that trackbacks can eventually go through, we couldn't get that to work here. When we also discovered (that follow-up thing again) SpamLookup was blocking most trackbacks from Blogspot, and would not stay turned off for trackbacks... out the door it went.
  • We also use a few proprietary techniques, like changing the folders in which MT et al. are located from the standard configuration. This requires a certain degree of fiddling with MT's back end, but it has made a difference and cut MTBL's load.
  • Changing the mt-comments.cgi filename, and making changes in mt.cfg, and then rebuilding the blog, also helps. You'll need FTP plus some minimal training - fortunately, it's covered in the Six Apart guide. This generally has to be done about once every 48-72 hours to have a strong effect, but even once per week is helpful. An MT plug-in that would automate these changes etc. would be a big plus, as long as the name changes were a list the bloggers could generate themselves. This human element would keep the list on any one blog from being known beforehand. Becomes less important if you have CAPTCHA.
  • We considered using and requiring TypeKey for all commenters, but too many people reported inability to set up their TypeKey identity due to technical problems, etc.
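
The filename-rotation idea above could be automated along these lines. This is only a sketch in Python of the selection step, under the assumption that the blogger keeps a private list of candidate names; the function name and the sample filenames are hypothetical, and the actual renaming, config change and rebuild are left out because they depend on your installation.

```python
import random
from typing import List, Optional


def next_script_name(candidates: List[str], current: str,
                     rng: Optional[random.Random] = None) -> str:
    """Pick a new comment-script filename from the blogger's own list, never the current one."""
    rng = rng or random.Random()
    choices = [name for name in candidates if name != current]
    if not choices:
        raise ValueError("need at least one candidate other than the current name")
    return rng.choice(choices)
```

A call such as next_script_name(["talkback.cgi", "replies.cgi", "notes.cgi"], "replies.cgi") returns one of the other two names. Because the list is written by the blogger rather than shipped with a plug-in, a spammer who studies one blog learns nothing about the names used on another.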

Finally, we looked up the chain for additional firepower.

  • Evariste of Discarded Lies had been carrying on some very interesting back-channel conversations with me about running group blogs, and his team does really good work. We've expanded our cooperation in a number of areas, and you'll see more joint features etc. in future. Evariste's role in our blog's technical improvement has also been a big plus, growing our capabilities.
  • We also made broader anti-spam efforts part of our ISP evaluation. What measures does your ISP take to help protect the blogs within its space, and to advance anti-spam technology generally? Ask. We do.

Of course, we'll continue to evolve our defences, adding and subtracting based on many of the principles explained earlier in this post.

Facing the Enemy

While this problem seems to come from many directions, most of it is apparently the work of a small number of bad actors. We've seen this phenomenon in the email spam world before. O'Reilly's book "Spam Kings" adds more details, and even this spam map from Postini.com suggests it. I've heard a few experts opine that over 80% of today's email spam problems are the work of fewer than 100 bad actors when you get right down to it.

Ann Elisabeth's investigations and Teresa Nielsen Hayden's Lolita advisory suggest that a similar pattern may be at work with respect to comment spam. The blogosphere's rapid growth is also making blog spamming more and more attractive.

Personally, I'm surprised all that in-your-face porn, drugs, etc. hasn't yet been declared "un-Islamic" and a fatwa issued for the deaths of those involved. It would be the best publicity Osama et al. could ever hope for, and a problem caused by a small circle of bad actors would be very susceptible to this solution.

That's probably too much to hope for, however, and the blog defence principles above remind us that even silver bullets are no silver bullet. So, we'll need to forge our own response.

  • DSBL.org, the Distributed Sender Blackhole List, is one such response.
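
For readers wondering how a blackhole list is consulted: DNS-based lists conventionally answer a DNS lookup for the client's IPv4 address with its octets reversed, followed by the list's zone name. The Python sketch below illustrates that convention only. The zone dnsbl.example.org is a placeholder rather than DSBL's real zone, and is_listed is a hypothetical helper, not code from MT-DSBL.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a blackhole-list client looks up: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)   # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Treat a successful lookup as 'listed' and any lookup failure as 'not listed'."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

Here dnsbl_query_name("192.0.2.10", "dnsbl.example.org") gives "10.2.0.192.dnsbl.example.org". Note the failure mode chosen in is_listed: if the list is unreachable, the commenter is let through rather than blocked, on the theory that a lookup outage should not take your comments down with it. Given our own experience with whole provider ranges being blocked, a listing is probably better used as one signal among several than as an automatic ban.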

Other responses will become more and more necessary as blog readership grows and spammers become more and more sophisticated in their methods. What all share is the requirement for building first tools and information, and then assembling a larger and larger coalition to make use of the results:

  • In order to get a better handle on the phenomenon's origin, vendors, bloggers, ISPs, and even related 3rd parties like DSBL.org will need to pool their investigative efforts a la the Spamhaus Project, and hopefully create a Spamdemic Map for the blogosphere.
  • Because comment spam attacks are aimed much more intensively at each single site than email spams, the volume of many comment spam attacks may exceed thresholds required for Denial of Service or even DDoS attack status. If a comment spammer's attack takes a blog down, and it can be traced back to someone on the Blogosphere Spamdemic Map, it becomes possible to go after these people as criminal computer crackers. That's a much more serious offense than spamming, and gives us methods for shutting these people down that go beyond mere civil suits. We'll need to begin using them, with the cooperation of law enforcement and of other entities (like large corporations) who may also be pursuing these people.

More possibilities surely exist, and we'd love to hear your thoughts and ideas. Use the comments section to discuss:

  • More principles of blog security
  • Other ways of dealing with comment spam
  • How we'll need to evolve long-term in order to clamp down on the small circle of comment spammers attacking our sites.

6 TrackBacks

Tracked: January 6, 2005 4:11 AM
Ghost of a Flea from andunie.net
Excerpt: Over the past few days, I’ve been installing upgrades over at Nicholas Packwood’s Ghost of a Flea, which is one of the leading lights of the Canadian blogosphere. (And with traffic to match: last month about 40x what I received...
Tracked: January 6, 2005 5:31 AM
Excerpt: Recommended reading: Six Apart Guide to Combatting Comment Spam. This is direct from the makers of Movable Type. Because I'm such an opinionated fellow, I'm gonna take issue with the recommendations in their guide. I suggest you use their page as an ov...
Tracked: January 7, 2005 1:53 AM
natural selections from evolution
Excerpt: I'm pretty worn out after wrangling freshmen today, so here's some natural selections: If you're a blogger (and particularly if you use Movable Type), you should check out the Winds of Change guide to fighting comment spam. Arthur Chrenkoff fin...
Tracked: January 7, 2005 4:43 AM
Fighting spam from Munuviana
Excerpt: The Six Apart team put together a guide to fighting comment spam; Joe at Winds of Change has even more ideas on the topic. Personally the shutting old comments after 30 days approach has worked wonders, but at a cost...
Tracked: March 21, 2005 4:24 PM
Ghost of a Flea from Andúnië
Excerpt: Over the past few days, I’ve been installing upgrades over at Nicholas Packwood’s Ghost of a flea, which is one of the leading lights of the Canadian blogosphere. (And with traffic to match: last month about 40x what I received...
Tracked: May 29, 2005 12:30 PM
Winds' Guide to Fighting Comment Spam from Winds of Change.NET
Excerpt: Six apart has a good guide. Winds adds some general principles of blog defence, talks about our own measures, and concludes by talking about the source of this comment problem and what can be done.

15 Comments

You might want to batten down the hatches. My site has been hit pretty hard today, and it looks like LGF, Discarded Lies and Jihad Watch are down (besides Eurabian Times).

I'm so glad to see you bring this to the forefront.

I'm a Wordpress user, myself. There are a number of great plugins to deter or halt comment spam. On my own site, evolution, I went ahead and implemented a "captcha"-style filter and a filter based on Wordpress's built-in comment moderation system. That put a halt to 99% of my comment spam, which, astonishingly enough (or perhaps not, in the light of your post) came from one source promoting a certain online card game.

Now, spammers have turned to Trackback spam, as Wordpress hasn't really figured out how to deal with that yet. I have deleted over 70 highly disturbing spam Trackbacks (I will say nothing about their content; suffice it to say that you don't want to be associated with this material), and I am not alone among Wordpress users.

I'm fairly computer-savvy but my programming knowledge is dated and my familiarity with PHP is minimal. You are right; we do need to band together as bloggers and fight this.

That's one reason I don't have comments. Well, at least not at the Needle.
I just visited j.d. and his system seems very functional. His blog is great, I think we may be isomorphic. :)

It looks like it was a Hostmatters outage. Even their emergency forums were down.

My list:

1. Don't install Movable Type

...

6. Profit!

I use Blogger.

It seems to be pretty spam proof.

So far I've had just one spam. I've been open since 11 Sept 04. My traffic runs about 100 a day with peaks into the 1,500 range if I get an Instalanche.

I think blogger is the future. With MT you have thousands working the issue individually. Not very labor effective.

With blogger (and it has its problems - spam is not one) the company takes care of spam and I take care of content.

Spam is one of the reasons I use LiveJournal.

You can set comments to LJ "friends" only, LJ users only or those plus "anonymous" and all of those can be screened to prevent them from showing up until you have approved them. It makes spam useless because it never sees the light of day in my blog.

Hopefully Six Apart's acquisition of LiveJournal will not change all of that!

Brett Kottmann, relying on LJ to keep out comment spam is proven not to work. People with LJs have received comment spam that didn't get copied to them in email, even from non-LJ users--spammers get around the code.

I use TypeKey to eliminate comment spam. It does the job but does sometimes frustrate legitimate commenters. Now I'm being hit periodically with huge spam attacks on Trackback, all pornographic. Is there a defense?

Joanne, Trackbacks are a bit harder to deal with, but not impossible. Looking at their chain, we can come up with ideas like:

  • If they're using open proxies to broadcast, DSBL-related solutions will help. I suspect this is the M.O.
  • If it's coming from a limited set of IP addresses, IP bans would become productive.
  • MT-Blacklist scans both comments and trackbacks, or can be configured to do so if Trackback scanning is off for some reason. This is true in both 1.x and 2.x versions of the software.
  • I set email notification for Trackbacks in my Movable Type blog config, so I'm alerted. This allows me to see who's linking us, and also flags Trackback spam. If it's clear that I'm dealing with a lot of trackbacks or comment spams from a single source, I de-spam the first example, then go into MT 3.x's "comments" or "trackbacks" sections and do mass deletes.
  • In extreme situations, it would be possible to keep trackbacks on and notification on for your own edification, but edit their display out of one's blog templates by removing the tags etc.
  • In future versions, perhaps registered MT blogs could get a blog key to be entered in their config. It would serve as a form of authentication (in fact, why not use TypeKey's blog token), and blogs could then opt to display trackbacks only from authenticated blogs. That, plus a denial list, would improve defenses and allow other layers of "upstream" defences to come into play from Six Apart as spammers began attempting to abuse TypeKey registration.
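
Deciding whether trackback spam is "coming from a limited set of IP addresses" is a counting exercise. Here is a rough Python sketch, assuming you can export recent pings as (source IP, URL) pairs; the function name, the input shape and the default threshold of 10 are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def heavy_sources(pings: Iterable[Tuple[str, str]], threshold: int = 10) -> List[str]:
    """Return source IPs that sent at least `threshold` pings, busiest first."""
    counts = Counter(ip for ip, _url in pings)
    return [ip for ip, n in counts.most_common() if n >= threshold]
```

If the list that comes back is short, IP bans and mass deletes are likely to pay off. If it is empty while the spam keeps coming, the attack is probably spread across open proxies, and the DSBL-style approach is the better bet.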

Hey Joe:

"We considered using and requiring TypeKey for all commenters, but too many people reported inability to set up their TypeKey identity due to technical problems, etc."

How about using TypeKey but not requiring it, as we outlined in the comment spam guide? That way, you can use it as a free pass through Blacklist which will allow you to keep a tighter configuration.

Like I said in the guide, you'll be surprised at how many people sign in, even if you aren't moderating unregistered users.

For trackback spam, you might try Mark Carey's recently released MTDisguiseTrackbackURL plugin.

I haven't used it myself but I may soon install it. It simply outputs the trackback url via javascript and breaks the URL into parts, making it much harder for spambots to pick up. Same technique used to protect email addresses in mailto's.

Hope that helps!
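
The general technique described in this comment, printing a URL via JavaScript in pieces so that it never appears whole in the page source, can be sketched in Python as below. This is not the MTDisguiseTrackbackURL plug-in's code, which is not reproduced here; the function names, the chunk size and the sample URL are assumptions for illustration.

```python
import json
from typing import List


def split_url(url: str, size: int = 7) -> List[str]:
    """Cut the URL into short pieces so it never appears whole in the page source."""
    return [url[i:i + size] for i in range(0, len(url), size)]


def disguised_script(url: str, size: int = 7) -> str:
    """Return a <script> tag that reassembles and prints the URL in the browser."""
    parts = ", ".join(json.dumps(p) for p in split_url(url, size))
    return f"<script>document.write([{parts}].join(''));</script>"
```

Passing a trackback URL such as "http://blog.example/mt-tb.cgi/1234" to disguised_script yields markup that a browser turns back into the full URL, while a spambot scanning the raw HTML sees only fragments. A bot that actually runs JavaScript would not be fooled, so this is one more layer, not a silver bullet.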

I installed the MTDisguiseTrackbackURL plugin. Very painless and I think it'll be effective. Do take note of my comments (comment #4) at Mark's site, though, for small caveats.

My profound apologies for the duplicate trackbacks. Yet another MT frustration: when I edit a published post and then save, sometimes it repings.

Really, really embarrassing.

But to make this comment still on topic: For trackback spam, I've been using the combination of the MTDisguiseTrackbackURL discussed above, plus MT-Close2 (for trackbacks, although it also allows opening and closing of comments). Then there's this version of dsbl_deny.pl, which blocks comments and trackbacks from known open proxies.

I had the same trouble with Spamlookup and MT-Moderate plugins. I figured out that MT-Moderate 1.1.2 doesn't sit well with Spamlookup. I downgraded to 1.1.0 and everything works fine. You may want to give it another shot.

See here for more




