How often do you find a cross-over story about three notable Left Coast industries: venture capital, media, and -- err -- sex?
It seems that noted San Francisco sex writer Violet Blue did some checking on what the SF Chron and sfgate.com were doing with her content (NSFW WARNING) and didn't like what she found. Her past columns had been copied to another domain, all outbound links (and some punctuation) stripped, the articles split into multiple pages, the pages stuffed with keywords - some inappropriate, and festooned with pay-per-click ads. And it emerged that multiple domains had also been aliased to these dead-end copies. Now where have we seen that kind of behavior before?
Here I should mention that sfgate.com is apparently - by admission of the author - within the letter of her contract by making this use of her work. That relationship, now terminated, was based on a level of trust that she feels has been abused, and made no explicit stipulations on how the content can be reused. The interest here is what this occurrence may say about the ongoing behavior of the MSM online, and its implications for the industry's business model.
This incident is not a one-off.
The Violet Blue post also mentions the LA Times as creating similar ad-stuffed dead end pages, also with a list of multiple aliased domains. What caught my attention was that both the Chron's and Times' alias lists included subdomains of one common domain: perfectmarket.com. Perfect Market is an LA-area startup that claims to:
[help] newspapers, magazines, and broadcasters with a web presence and other online publishers grow their revenue with little effort and no risk. Our proprietary technology solution better fulfills the needs of intent users - people who arrive at their sites through keyword searches seeking specific information - with exactly what they're looking for in our customers' online content. Optimized content with relevant ads generates higher click-through rates for advertisers, and dramatically more revenue for publishers and their ad network partners.
Perfect Market is a well backed venture. It has raised over $20m in venture capital, the most recent round closing in February and announced yesterday. Interestingly, this round was led by the bankrupt Tribune Company, parent company of the LA Times. Perfect Market also has solid backing from more traditional VCs, including Trinity, Rustic Canyon and IdeaLab. (Mayfield also has a board seat, though no publicized investment.)
Again, nothing to see here from a legal perspective. The company is selling a service and technology to its MSM clients, who bear responsibility for its operation against content that they have bought or licensed.
There are three business perspectives that do emerge from considering Perfect Market's business model. The first is the potential reaction of authors who find their work reused in this fashion, and the consequent ability of MSM sites to work with those with an established byline. Ms. Blue has pretty much covered that by example, so I will pass.
The second is the reaction of the so-far-unnamed party to the transaction: the search engine. The keyword- and ad-stuffed dead-end copy pages apparently produced by Perfect Market's technology are identical, from a search company's point of view, to those created by more questionable tactics such as scraping. The intent is the same: to spam the index. This is the behavior that routinely gets questionable sites shoved to Google's back pages, or banished altogether. One has to wonder just how long this type of abuse will be tolerated, simply because it's being practiced by a recognized media outlet. (And we can note in passing that the irony of an MSM which routinely suggests that deep links are 'stealing' behaving in this fashion is thick enough to spread on toast.)
Finally come the implications for the businesses of the MSM sites themselves. While Perfect Market suggests that what they are enabling is an exhibit of MSM brand power, reality would seem to be the opposite. It should be clear that neither the potential reader nor the original author is going to be happy with the existence of a keyword-stuffed, link-stripped dead-end page. The difference between these pages and those of a more prosaic SEO spammer is simply the brand attached to them, which might entice the reader to click through. It should be obvious that the Chron, the Times, and other MSM outlets behaving in this fashion are doing no less than milking their brands to the detriment of long-term trust and value. It is a subtle, but telling, exhibition of their desperation.
(Cross-posted with minor edits from Due Diligence.)
Some of you may have noticed that Winds went down entirely for about an hour yesterday. We made major modifications to our infrastructure recently, in order to run Winds on a series of base platforms that were more CPU-friendly (Ubuntu/LightTPD not Red Hat/Apache, no more Virtuozzo or CPx control panel, which forced a hosting switch from the excellent folks at ServInt to our new friends at Pixelgate). That worked, and performance improved significantly. But yesterday... over to Ev:
"They called back and let me know what happened. It was a trackback spam attack so large, it drove the load average on the server so high that they couldn't even log in themselves without forcibly rebooting the box first. The spam attack resumed while I was on the phone with him, so I've disabled trackback. It's simply untenable to keep on, when it can disable the machine so badly that not only can't I log in, they can't log in when they're physically in front of the server."
We've killed trackbacks now, and they'll stay dead. Movable Type's approach to dealing with trackback & comment spam is fundamentally non-scalable, which means it's fundamentally broken in an age of cheap CPUs and no consequences for spammers. Worse, their security flaws forced us to migrate to MT 3.3 (and the only CAPTCHA system that works with it, plus the unfixable author link limit annoyances, etc.) and made our lives here worse, not better. We're as frustrated as some of you are.
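For anyone who would rather throttle trackbacks than kill them outright, here is a minimal sketch of the general idea: reject excess pings cheaply, before any expensive work gets done. It is an illustration only, not anything Movable Type or our hosts actually run. The class name PingThrottle, the limits, and the choice of key are all assumptions. Note that a per-IP key does little against an attack spread across many addresses; a single shared key (say, "all-trackbacks") caps the total instead.

```python
import time
from collections import defaultdict, deque


class PingThrottle:
    """Sliding-window limiter: at most max_hits accepted per window seconds, per key."""

    def __init__(self, max_hits=5, window=10.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock              # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key):
        now = self.clock()
        recent = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # over the limit: refuse before doing any real work
        recent.append(now)
        return True
```

The key passed to allow() could be the sender's IP address, the target entry, or one global string, depending on which bottleneck you are protecting.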
Which is why Winds of Change.NET will be moving to WordPress once some test migrations of other blogs are finished and confirmed to be trouble-free. WordPress is inherently more CPU-friendly (PHP not Perl), has a wider variety of features & plug-ins, and a community that is way, way ahead in anti-spam measures. I'm hoping this can happen by mid- to late November. It would be a fine birthday present for me, and a present for many of you, too.
Plagiarism Today has an excellent article about spamblogs, the problems faced by Google/Blogspot, its spread to MSN Spaces, and why this is likely to be a trend:
"The bitter truth is that the Web is more vulnerable than ever to splogging, not because of clever spammers but because of ill-prepared hosts. While Google responded to pressure from the blogging world to do a better job policing its service (though the effectiveness of its response is up for debate), other hosts have not taken any clear steps and many are completely unable to handle the problems that they face now."
Yes. This has been a discussion topic on Winds following our (continuing) ban on blogspot.com in comments or trackbacks. Personally, I believe we're headed for a blog future in which owning your own domain will be the only viable option to avoid fairly widespread blacklisting. As the PT article notes:
"Being a successful Web host is no longer just about having the best features or good servers and easy to use tools. It's also about having an effective abuse policy that not only frees up precious resources for legitimate users, but makes you a good neighbor on the Web.
Simply put, no one wants to use a service that has a bad reputation or has even been blacklisted for generating too much junk and, in a Web where sharing information and ideas is critical to survival, being blacklisted, can be a death kneel to an otherwise sound service.... We just want to run our sites, search our data and read our favorite pages in peace.
However, it's up to the hosts to create that and, frankly, I don't think most are up for the challenge."
I tend to agree; here's the whole piece if you want to read it.
I see a future in which the free sites are training/experimental grounds, and the more all-inclusive ones like Blogspot or MSN Spaces are their own little gated communities, accepting each other's links but not accepted or accepting very much beyond that radius.
That's sad, but the absence of meaningful penalties or enforcement against spammers makes it more or less inevitable.
So, how's the blogosphere doing? MarketingVOX notes:
"The blogosphere is doubling in size every six months and is now 60 times larger than it was three years ago, according to the latest quarterly installment of David Sifry's "State of the Blogosphere" report. He writes that Technorati now tracks over 35.3 Million blogs."
More data, links, and related articles over at MarketingVOX. The good news is, the survey believes that most of the growth is real, not 'splogs' (spammer blogs).
On which topic, blogspot.com owner Google is trying some CAPTCHA techniques to throttle down on the auto-generated blogspot splogs that have led to their domain's blacklisting at sites like Winds of Change.NET et al. Based on our MT-Blacklist logs (total spams blocked since inception: over 290,000) Google is improving, but isn't quite at the required seriousness level yet.
The Pentagon, on the other hand, is. As is the CIA.
The Pentagon's Defense Science Board will conduct a study this summer on the military implications of Internet search engines, online journals and blogs - see DID's coverage.
And wouldn't you know it, the CIA is in on the act, too... apparently:
"I can't get into detail of what, but I'll just say the amount of open source reporting that goes into the president's daily brief has gone up rather significantly... There has been a real interest at the highest levels of our government, and we've been able to consistently deliver products that are on par with the rest of the intelligence community."
Hmm. Should I be encouraged, or terrified?
Wonder if they also read and acted on ex-CIA employee Celeste Bilby's internal CIA blogging suggestion back in 2004?
I suppose it's too much to hope for JDAMs or covert hit squads targeted at the 100 or so problem children that make up most of the spammer universe....
Regrettably, Winds of Change.NET has been forced to reinstate the ban on blogspot.com URLs for comments and trackbacks. The amount of spam coming from that domain remains excessive, and Google seems unable to get a handle on the problem.
The bottom line is that if you host there, you live in a bad neighbourhood - and others will seek to protect themselves. If you're serious about blogging, there is no substitute for your own domain.
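For the curious, a domain ban of this kind can be sketched in a few lines. This is a rough illustration, not the actual filter running on Winds; the names BANNED_DOMAINS, is_banned_host, and find_banned_urls are made up for the example, and the only domain listed is the one named in this post. The one subtlety worth showing is matching on the hostname rather than the raw text, so that subdomains are caught but look-alike domains are not.

```python
import re
from urllib.parse import urlparse

BANNED_DOMAINS = {"blogspot.com"}  # illustrative; the post names only this one

# Rough URL finder for comment text; stops at whitespace, quotes, and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def is_banned_host(host, banned=BANNED_DOMAINS):
    """True if host is a banned domain or any subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in banned)


def find_banned_urls(text, banned=BANNED_DOMAINS):
    """Return every URL in text whose hostname falls under a banned domain."""
    hits = []
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname or ""
        except ValueError:
            continue  # malformed URL (e.g. a broken IPv6 literal): skip it
        if is_banned_host(host, banned):
            hits.append(url)
    return hits
```

A comment or trackback would be rejected, or held for moderation, whenever find_banned_urls returns a non-empty list.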
(Updated post; originally posted March 2, 2006.)
Hi all. Unfortunately, our efforts to convince al-Qaeda that they should be murdering spammers around the world and beheading them on video (charge 25 cents per view on Internet video, it would even be a profit center) haven't borne fruit yet. Couple of quick bulletins, therefore:
Speaking of authors:
If you use Blacklist as part of your system (we found SpamLookup had insuperable problems), the following strings should be entered. I've been looking at our spam logs, and the number of posts blocked is VERY high:
We've talked a bit about Winds' multi-layered spam-defenses before. As a fine close-out to the New Year, I received an email from Project Honey Pot:
"Regardless of how the rest of your day goes, here's something to be happy about - today a honey pot you installed successfully identified a previously unknown email harvester (IP: 184.108.40.206). You can find information about your newly identified harvester here. Info on all the harvesters that have been spotted by this honey pot is also available.
Don't forget to tell your friends you made the Internet a little better today. You can refer them to Project Honey Pot directly from our website.... Thanks from the entire Project Honey Pot team and, we're sure if they knew, from the Internet community as a whole."
Project Honey Pot is fairly easy to install, and I recommend it. Here's a fine New Years' resolution: let's make the Internet a more dangerous place for Spambots in 2006.
Seems our anti-spam plug-in SpamLookup has been blocking a lot of legitimate trackbacks lately (thanks to Security Watchtower for the alert) - including all blogspot trackbacks!
The situation is now fixed, and we encourage blogs to start sending us trackbacks again so our readers and authors can follow the links and see what you've written.
Hi all... You may have noticed that our comments form has changed slightly. We added James Seng's CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) feature, so you'll have to look at the graphic and type the numbers/letters in order to comment here.
Note that the CAPTCHA number displayed changes when the screen changes - you don't need it for "Preview," so just enter it when you're ready to post. Sorry about this, but at 1,000 - 2,000 comment spam attacks per day, it has become necessary to go this route in addition to our other defenses.
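If you're wondering how a check like this can work behind the scenes, here is one possible sketch. To be clear, it is a stand-in and not James Seng's plug-in: the secret key, the function names, and the token format are all assumptions, and drawing the code as a distorted graphic is left out entirely. The server picks a code, shows it as an image, and tucks a signed, expiring token into a hidden form field; on "Post" it checks the typed answer against the token.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = b"replace-me"  # hypothetical per-site secret; keep it out of the page
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # skips look-alikes such as O/0 and I/1


def _sign(code, expires):
    """HMAC over the upper-cased code and its expiry time."""
    msg = f"{code.upper()}:{expires}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def new_challenge(ttl=600, now=None):
    """Return (code to draw as an image, token for a hidden form field)."""
    now = int(time.time()) if now is None else int(now)
    code = "".join(secrets.choice(ALPHABET) for _ in range(6))
    expires = now + ttl
    return code, f"{expires}:{_sign(code, expires)}"


def check_answer(answer, token, now=None):
    """True only if the token is well-formed, unexpired, and matches the answer."""
    now = int(time.time()) if now is None else int(now)
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False  # tampered or truncated token
    if now > expires:
        return False
    expected = _sign(answer.strip(), expires)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

One caveat with a stateless token like this: the same code and token pair can be replayed until it expires, so a real deployment would also want to remember which tokens have already been used.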
See the Winds Guide to Fighting Comment Spam for more information and related links. Any serious problems, contact joe, right here @windsofchange.net.
CNN Money is running a story about a new IBM service that "spams the spammers." The idea behind the technology is that when a spam email is received, it is immediately sent back to the originating computer - not an email account. Or so they say.
Interesting idea, and you can find more via Slashdot... including an early commenter who points out that CNN's description of the system and what IBM's FairUCE actually does paint very different pictures. Nor is this the only thing the article gets blatantly wrong. Is it too much to ask that the media hire reporters who actually understand their subjects? (This Australian reporter, who writes about open-source software and Firefox browser adoption in businesses, clearly does.)
BTW, note IBM's integration requirements description for FairUCE:
"The FairUCE concept is currently implemented as an SMTP proxy that runs between multiple instances of Postfix on Linux. QMail and Sendmail support are being considered. It should be possible to use existing mail server(s) on the inside of the proxy; Postfix is currently required on the outside (optionally on a separate boundary server, protecting one's regular servers from most spam). End-users cannot install FairUCE at this time; end-users, please direct your mail administrator to this page."
That might not be a bad idea. As the Winds of Change.NET Guide to Fighting Comment Spam notes, there's no such thing as a single complete solution to cyber-security problems like spam. Based on IBM's description and FairUCE's FAQ, however, it seems like a step forward.
I wonder if we could convince them to share their technology as part of future blogging systems that would help us fight trackback and comment spam?
(posted Jan. 6, 2005; last updated June 12, 2005)
Six Apart, the folks behind the Movable Type software that runs this site, have just released a Guide for Fighting Comment Spam on weblogs, covering comments, trackbacks, etc. As you might imagine, Jay Allen played a big role in compiling it. It's worth any blogger's time, especially those who run MT installations.
We use our own mix of techniques here at Winds of Change.NET. I'm going to go well beyond the Six Apart guide and give you some general principles for building your own blog's defenses, then move on to what we're up to so you can see some of these ideas in action. I'll conclude by talking about the source of this problem, and what can be done.
Further thoughts and suggestions will be welcome in the comments section, of course, and this post will probably evolve over time.
I'm assuming you want to keep your comments. Even so, you may lack the resources to implement the measures Winds of Change.NET has taken. You may need something else. And like any hostile enemy, spammers will change their tactics in future and you'll have to react.
So here are a few general principles to remember. Six Apart didn't include them, but they're useful as you think about securing your blog against the hostile cyber-attacks of spammers. Now that you're a security designer - or about to become one - remember that:
Which brings me to my last point, familiar to students of Eli Goldratt's Theory of Constraints for organizations:
Know where your system's bottlenecks are, and make all other decisions revolve around their limitations.
For many weblogs, the human element is the bottleneck because the authors have a very finite amount of time & attention to commit. Depending on your host, however, other bottlenecks could include CPU load (on overcrowded servers), limitations of your technical setup, etc. Figure out what the top 3-5 bottlenecks are, and rank them. Then use that ranking as a guide to all of your subsequent decisions re: improvements, defensive measures, etc.
These bottlenecks will also remind you that you can't do it all. Accept your limitations, and consciously make tradeoffs of more "blunt force" methods like closing all comments to posts over X days old if your time is really tight (The Six Apart guide has some tools for this). You'll miss some great comments - but if you don't have the time, then don't agonize. Just pay the price, know why you're paying it, and move on.
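The "close comments on posts over X days old" tradeoff is simple enough to sketch. This is just an illustration of the rule itself, not one of the Six Apart tools; the function name comments_open and the 30-day default are assumptions, and timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def comments_open(posted_at, max_age_days=30, now=None):
    """True while a post is still young enough to accept comments.

    posted_at and now must both be timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    return now - posted_at <= timedelta(days=max_age_days)
```

The blunt part is deliberate: one cheap date comparison per comment attempt, no moderation queue, and the price is paid in lost late comments rather than in your time.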
Of course, you can expand your limits by recruiting more technical members, forming affiliations, etc. If so, review your past decisions to make sure they're still what you want or need to do. Some of your old limitations may have lifted, at which point you can either fix some of those old tradeoffs, or apply the new resources to a new area if that's more productive overall.
Note the negative selection pressure this creates in the blogosphere. The spam onslaught is actually one of the reasons I believe Carnivorous Conservative was on the money with his prediction that group and federated blogs will rise in popularity as the blogosphere evolves.
Winds has become a high-value target, so we use a number of approaches. I'm going to talk about a few:
Finally, we looked up the chain for additional firepower.
Of course, we'll continue to evolve our defences, adding and subtracting based on many of the principles explained earlier in this post.
While this problem seems to come from many directions, most of the problem is apparently the work of a small number of bad actors. We've seen this phenomenon in the email spam world before. O'Reilly's book "Spam Kings" adds more details, and even this spam map from Postini.com suggests it. I've heard a few experts opine that over 80% of today's email spam problems are the work of fewer than 100 bad actors when you get right down to it.
Ann Elisabeth's investigations and Teresa Nielsen Hayden's Lolita advisory suggest that a similar pattern may be at work with respect to comment spam. The blogosphere's rapid growth is also making blog spamming more and more attractive.
Personally, I'm surprised all that in-your-face porn, drugs, etc. hasn't yet been declared "un-Islamic" and a fatwa issued for the deaths of those involved. It would be the best publicity Osama et al. could ever hope for, and a problem caused by a small circle of bad actors would be very susceptible to this solution.
That's probably too much to hope for, however, and the blog defence principles above remind us that even silver bullets are no silver bullet. So, we'll need to forge our own response.
Other responses will become more and more necessary as blog readership grows and spammers become more and more sophisticated in their methods. What all share is the requirement for building first tools and information, and then assembling a larger and larger coalition to make use of the results:
More possibilities surely exist, and we'd love to hear your thoughts and ideas. Use the comments section to discuss:
Hi folks. You may have noticed that comments have been disabled on the site. They are now re-enabled.
With the help of Evariste from the excellent teamblog Discarded Lies, we have taken other measures to combat the 20,000 comment spam attempts we've seen in the last 2 weeks. For reasons we're still trying to figure out, the spams are causing problems for our hosts at Total Choice Hosting due to server load. At present, we will NOT require TypeKey registration. We're hoping these moves, plus the possible addition of this "type in the numbers you see and prove you're human" system will provide adequate defenses, while still allowing our valued readers to comment.
If you're considering an upgrade to Movable Type 3.x and you use MT-Blacklist to protect your site, or you're wondering what forced our hand, read on...
Lots of changes on Winds in November - moving hosts, record traffic (over 200,000 visits in November), and then a couple of weeks ago an upgrade from Movable Type 2.x to 3.x. The upgrade offered many benefits, but also came with a hidden price - a required upgrade to MT-Blacklist 2.0.
MTB 2 is a step back from version 1.6 in many ways, from interface to capabilities. It was put together as an emergency measure when MT3's TypeKey alone failed, and this shows. While more efficient in certain ways, overall it's a big jump in administrative overhead time. Over the last 2 weeks, Blacklist may have blocked 18,000 spams, but it also forced moderation of another 2,000 or so, and in many cases they were already-blacklisted items that got through due to a flaw in the system.
Blacklist also changes in one more important way. Instead of comparing new comments against a text file blacklist, it stores the blacklist items (in our case about 2,750 items and 60 programmed "catch alls" for various things, and we aren't unusual) in MySQL. This forces MySQL database calls whenever a comment is submitted. 1500 database hits a day may not mean much, but if you get 100 from various IPs in about 10 seconds, is that a problem? I don't know what's happening elsewhere, but it has been a problem for us at Total Choice Hosting.
So if you're unhappy with your present setup... the grass isn't always greener.
MySQL is pretty fast, but the question of how much additional load this sort of thing creates on shared servers with multiple bloggers, who could all be attacked in the same burst, is definitely worth investigating. I hope some qualified people will do so, and share the results.
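One way to take the database out of the hot path, sketched here purely as food for thought: read the blacklist into memory once, compile it into a single pattern, and only go back to the store every few minutes. This is not how MT-Blacklist works, and every name here (CachedBlacklist, loader, ttl) is an assumption; the loader stands in for whatever fetches the entries, whether a text file or a MySQL query. Entries are treated as plain strings, not regular expressions.

```python
import re
import time


class CachedBlacklist:
    """Holds blacklist strings in memory; reloads them at most once per ttl seconds."""

    def __init__(self, loader, ttl=300.0, clock=time.monotonic):
        self.loader = loader   # hypothetical callable returning an iterable of strings
        self.ttl = ttl
        self.clock = clock
        self.loaded_at = None
        self.matcher = None
        self.loads = 0         # how many times the backing store was actually read

    def _refresh(self):
        entries = [e.strip() for e in self.loader() if e.strip()]
        # One alternation of escaped literals; None when the list is empty.
        self.matcher = (
            re.compile("|".join(re.escape(e) for e in entries), re.IGNORECASE)
            if entries else None
        )
        self.loaded_at = self.clock()
        self.loads += 1

    def is_spam(self, comment_text):
        if self.loaded_at is None or self.clock() - self.loaded_at >= self.ttl:
            self._refresh()
        return bool(self.matcher and self.matcher.search(comment_text))
```

With something like this in a long-running process, a burst of 100 comments in 10 seconds costs one read of the backing store rather than 100. The catch is that it needs a persistent process to hold the cache, which a plain CGI setup that starts fresh on every request does not give you.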
For now, the bottom line is that we've taken some measures we can't tell you about, and others may follow that could add some additional steps for our commenters. Unfortunately, until we have a better option we're going to have to take that hit.
Meanwhile, if you're considering following in our recent technical footsteps, a word of friendly advice:
Not yet, not now. I wouldn't wish the aggro on anyone.