It didn't take a genius to guess that something was fishy when a Johns Hopkins study (coincidentally released very conveniently near the election) trumpeted a figure for Iraqi dead that exceeds even far-left body counts for Iraq by a factor of about 5, and reliable estimates by a factor in the double-digits.
It may take a fairly smart person to explain exactly what was wrong with the study, however, and how it produced such laughable results. Fortunately Steven E. Moore, who has been doing survey work in Iraq for over 2 years, can. He steps up to the plate and deservedly destroys the study as an effort that would flunk a grad student. That it was a work of political agitprop rather than anything honest or serious was obvious from the get-go - but it's always good to understand the whys and hows of its shoddy dishonesty, because you'll see this again and again in future.








I personally hope that Daniel Davies will show up and flog the peasants with his superior intellect. A more adept purveyor of the art of ad hominem and argument from authority, I never hope to meet.
Oh, dear, gasp. An "argument from authority" used against laymen discussing statistical modeling is quite the imposition.
Without defending the original study, Moore's particular "refutation" is innumerate nonsense. (The near-independence of the need of sample size from underlying population size is freshman stats stuff.)
It is amazing how at WoC, professional degrees and experience are a sign of bias while employment at a GOP-affiliated consulting firm (as Moore has) is a sign of honesty and impartiality.
The 650,000 figure seems to be a death rate of about 1% per year. Is that an inconceivable figure for a country involved in a civil war with massive health problems?
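As a rough check on that arithmetic, here is a minimal sketch. It assumes a population of about 27 million (the figure quoted later in this thread) and a study window of roughly 40 months. The window length is an assumption made for illustration and is not stated anywhere above.

```python
# Rough annualized death-rate check.
# The inputs marked "assumed" are illustrative, not taken from the study itself.
excess_deaths = 650_000      # figure under discussion
population = 27_000_000      # assumed: population figure quoted in this thread
months = 40                  # assumed: approximate length of the study window


def annual_rate(deaths: int, pop: int, window_months: int) -> float:
    """Deaths per person per year, spread evenly over the window."""
    return deaths / pop / (window_months / 12)


rate = annual_rate(excess_deaths, population, months)
print(f"{rate:.2%} of the population per year")
```

Under these assumptions the rate comes out a little under three-quarters of a percent per year, so "about 1%" is the right order of magnitude.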
So the Lancet study is bogus, according to a Republican lobbyist. Then why not do a legit study?
AJL,
Good comment. Note that "Smokin Joe" (smokin crack Joe, more like), authoritatively states that Moore "destroys" the study. How does Joe know? He doesn't, of course. But then the truth of the matter is less important than Joe's role as a spinner.
No doubt here in Joe's mind, even though he clearly doesn't know what the hell he is talking about.
The Stats article referenced in your link is also useful
Myself? I don't know. But, what I can continue to point out, is that Joe in no way really cares about the truth of the matter - but only cares advancing his biases.
Not only is Moore's point about number of clusters innumerate nonsense, his other point about how Les Roberts told him that they didn't collect demographic info is not, in fact, true. Roberts said that they did collect it and he told Moore this fact.
John, the survey is not about a country's death rate, but claims to be about death rates from a very specific cause i.e. war. Yes, 1% would be extraordinarily high, as shown easily by its radical departure from even the far left's numbers.
As for silly analogies to a "well-stirred soup," this glosses over precisely the point: Iraq is not that, and the thing they wish to measure is very, very uneven within Iraq. Few cluster points is, therefore, an obvious problem that requires no advanced statistics degree to see. There's also, of course, the issue of departure from the methods used by other, more reliable reports:
The JH report was poorly planned and poorly executed, but that does not alone indicate dishonesty. A lead team who saw no reason to double-check or question its wildly variant results before releasing them to much media fanfare at a critical political juncture, however, is a most excellent and reliable indicator.
Something Andrew wishes to preserve by dishonestly affirming the study's validity while saying he will not defend it.
Joe, you claim that the study was poorly planned and executed but you have no clue about survey methodology. Moore's article even refers to a 50 cluster survey and Moore used 75 clusters in his own surveys. There is nothing unusual or untoward or in any way wrong with a 47 cluster survey. And if it was too small it would have shown up in the results as a confidence interval that was too broad. Moore's criticism is rubbish, and the fact that you continue to advance it even after it has been demolished demonstrates your lack of knowledge and understanding.
Have you ever done even a first year course in stats? You can't bullshit your way through this stuff you know.
I wish I had time to look seriously at the Burnham study and do a real post, but work precludes that for a couple of weeks at the least.
When I heard on the radio that the Roberts group had done another study, here's what I immediately thought:
And anybody would know that the higher the casualty estimate, the more fiercely the Left Blogosphere would defend the study's sacred honor, and the more the Right Blogosphere would attack it. Science as politics (with an able assist from Roberts' prior study).
Well, (on a quick read), I was right on #1, #3, and #4 but wrong on #2. Burnham's figures are way high.
Why are they so high? Two choices.
Burnham et al. will certainly claim that they did go as far as they could re. safeguards under the circumstances, but as of now I see no reason to credit that notion. To do so properly would have necessarily meant inviting skeptics into the design phase of the study, so that people who are generally seen as trustworthy and apolitical could also vouch for the study's integrity.
Burnham's and Roberts' version of politicized science rejects the notion that they are doing politicized science. (E.g. the coyness regarding the rush to publication so that results hit the papers just before a key US election.) Since they aren't political, including expensive safeguards to make their findings credible in a politicized environment would seem to make no sense to them.
The Iraq Body Count approaches the Burnham study from a completely different angle than does the innumerate (sic) Moore in the WSJ. Some quite damning discussion from an unexpected source.
"You can't bullshit your way through this stuff you know."
Tim, just so you know, that is completely wrong. Joe bullshits his way through stuff like this ALL THE TIME.
It is his Modus Operandi:
It is also applied in fraud investigation when talking of behavior patterns that indicate specific types of fraud.
Thanks for stopping by to expose the fraud (spin) - it is appreciated!
Joe, if it weren't about Iraq, you wouldn't have such trouble separating (1) I support the study and (2) I am agnostic on the study but I recognize that Moore's attack on it is not mathematically sound. I hold position (2). I haven't studied the matter enough to know if there were systemic errors in the study (an unstirred, very separated soup, if you will). I've studied more than enough statistics to understand that except for small populations the margin of error is based on sample size as an absolute, not as a ratio of the sample to the population. Moore's implications to the contrary, by bringing up Kosovo, are somewhere between innumerate nonsense and a mathematical con job. The study's authors indicate the very large margin of error in the results. (For a brush-up from a completely neutral organization, check the link in comment 6.)
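The point about absolute sample size can be illustrated with a minimal sketch. It uses the textbook margin of error for a proportion from a simple random sample, with the finite population correction included. This is not the cluster design the study used, and the sample size of 1,000 is an arbitrary illustration. The two population figures are the Kosovo and Iraq numbers quoted in this thread.

```python
import math


def margin_of_error(n: int, population: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion from a simple random sample,
    including the finite population correction (FPC)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


# Same (illustrative) sample size, two very different population sizes.
for pop in (1_600_000, 27_000_000):
    print(f"population {pop:>10,}: +/- {margin_of_error(1_000, pop):.4f}")

# Changing the sample size, by contrast, moves the margin noticeably.
print(f"n = 250: +/- {margin_of_error(250, 27_000_000):.4f}")
```

The population size enters only through the correction factor, which is close to 1 whenever the sample is a small fraction of the population. Whether the sample is representative is a separate question that this formula does not address.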
A good round-up of the hysterical and 100 percent mathematically ignorant reaction to the Lancet may be found here. And in fairness to conservatives, there was a similar reaction on the left to "The Bell Curve" and its white-superiority conclusions. Sure there were (extremely) knowledgeable critics like Stephen Jay Gould, but there was also a lot of nonsense arguing backwards from the unpalatability of the conclusions to the invalidity of the methodology.
It's tricky trying to rig a statistical study to match your political beliefs. If you're not careful you'll end up with unbelievable results. Results that folks look at and reject because they don't pass the smell test. But since many of those involved in funding and conducting the study are not in on the fix you are pretty much stuck with your unbelievable figures. A really prudent American-hating group of British scientists would have taken more care.
Any scientific study is only as reliable as those conducting it. Time and time again scientists of all stripes have been shown to be as political, ambitious and more than willing to fudge on every stage of their work as the next man so we know that it DOES happen.
Furthermore I also see intractable problems of built-in bias in sampling and interviewing in an emotionally charged war-torn environment like Iraq. They were not asking interviewees in Camden, Ohio what brand of soap they preferred or who they preferred in an election, after all.
I haven't had time for a detailed analysis of the study, but, if an error did occur, it seems likely that the statistical error that occurred was probably sampling error. That is, the 47 clusters used in the study were not chosen in a manner that turned out to be truly independent. This doesn't necessarily mean that the authors had political intentions; it just means that the results are probably not accurate. If demographic data were collected, one quick check on this would be to compare the demographics of the study population to the overall demographics for Iraq. Large discrepancies between the two would lead one to doubt the sampling methodology.
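The demographic comparison suggested above could be sketched roughly as follows. All counts and shares here are made-up placeholders with generic group labels, not data from the study or from any census. The chi-square goodness-of-fit test also assumes independent observations, which clustered household data only approximate, so a large value is a flag for closer inspection and not proof of a bad sample.

```python
def chi_square_stat(observed: dict, expected_share: dict) -> float:
    """Pearson chi-square statistic comparing sample counts against
    the counts implied by known population shares."""
    total = sum(observed.values())
    stat = 0.0
    for group, count in observed.items():
        expected = total * expected_share[group]
        stat += (count - expected) ** 2 / expected
    return stat


# Hypothetical placeholder numbers, for illustration only.
sample_counts = {"group_a": 480, "group_b": 460, "group_c": 60}
population_share = {"group_a": 0.40, "group_b": 0.55, "group_c": 0.05}

# Standard 5% critical value for 2 degrees of freedom (3 groups - 1).
CRITICAL_5PCT_DF2 = 5.991

stat = chi_square_stat(sample_counts, population_share)
print(f"chi-square = {stat:.2f}; flag for review: {stat > CRITICAL_5PCT_DF2}")
```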
AMAC's link is a good one for someone interested in finding an objective critique of this study. It uses simple sanity checks to examine whether this estimate actually makes sense.
For those of you supporting the study, and who are claiming that critiques of it are "innumerate nonsense," you need to learn a bit of math and statistics yourself.
Small numbers of clusters (and, indeed, small sample sizes) can be representative of much larger populations. BUT...
If you use a small number of clusters (like the Lancet study), you also need to do a lot more groundwork than the small amount quoted by the study. Smaller sample sizes require much more data.
You need to not only get your primary data, you also need to ask a lot of other questions, covering a broader range of demographic variation, to show that your sample isn't screwy. Besides asking about the people who died, you need to ask about the families of the deceased, the neighborhoods they lived in, and more about the actual victims (income levels, marital status, political and religious affiliations, et cetera). That's how you double-check to see if your sample is truly representative of the population at large.
It's also a stronger proof that one or more of your researchers isn't screwing with your data (consciously or unconsciously) to get a predetermined result.
You might also note that the people who actually did the groundwork were not trained statisticians or polling experts, so the "innumerate nonsense" charge can be very easily laid at the feet of those folks.
Yes, even doctors can be idiots when it comes to stats and sample sizes. Or, in many cases especially doctors.
The responses here are a perfect example of the 'lies, damn lies, and statistics' saying.
Yes, the sample size versus population size is irrelevant assuming your sample size has a certain critical mass. Further increases to the sample size are reflected in the confidence interval.
However, and this is the point Moore is making and everyone is conveniently missing, that is only true if your sample is random. That idiot soup comment is a perfect exemplar of how shoddy statistics gets by the 'innumerate'. A well stirred soup has been randomly distributed already (the well stirred part). Thus you're guaranteed a random sample, and can achieve a very high degree of confidence with a single spoonful. In a highly fragmented data set, where the data isn't even close to being naturally randomly distributed, sample size versus population size is a problem, because you need enough sample points to be sure and hit all potential 'pockets' of similarly clumped data. And since you don't know where those 'pockets' are to begin with, you have to have enough samples to basically carpet the entire data set.
The very nature of clustering ensures you do not have a random sampling. In essence the Lancet study pulled a bait and switch. They really only have 47 samples, but they calculated the confidence interval based on the number of individual interviews instead of the number of clusters. This is why the study is bogus. They're claiming a far higher degree of random distribution of the sample set than their methodology actually gave them, thus their confidence interval is complete garbage.
You don't even need to know anything about statistics. Just think about it logically, if it were possible to get accurate results from extremely small numbers of clusters, why do all the deep-pocketed survey groups go for such large numbers of clusters (and interview far fewer people at each cluster than Lancet did)?
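The "47 samples versus thousands of interviews" question has a standard textbook treatment, the design effect, and a small sketch may help frame it. The 47 clusters and roughly 12,000 people are figures quoted in this thread. The intracluster correlation `rho` is an assumption swept over a range here, since nothing above says what it was. This sketch does not show how the study actually computed its interval.

```python
def design_effect(cluster_size: float, rho: float) -> float:
    """Kish design effect for equal-sized clusters:
    deff = 1 + (m - 1) * rho, where rho is the intracluster correlation."""
    return 1 + (cluster_size - 1) * rho


def effective_sample_size(n: int, n_clusters: int, rho: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    cluster_size = n / n_clusters
    return n / design_effect(cluster_size, rho)


# rho values are illustrative assumptions, not estimates from the study.
for rho in (0.0, 0.01, 0.05, 0.5, 1.0):
    ess = effective_sample_size(12_000, 47, rho)
    print(f"rho = {rho:<4}: effective sample size ~ {ess:,.0f}")
```

At `rho = 0` (people within a cluster are no more alike than strangers) the sample counts as all 12,000 individuals. At `rho = 1` (everyone in a cluster is identical) it counts as 47. Real surveys fall in between, which is why a clustered analysis widens the interval without reducing it to the cluster count.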
Wow, looks like I just beat the barrage of other posters on the sampling methodology issue. :-)
One other comment - seems like the whole Moore-Roberts "he said/he said" issue can easily be resolved if they communicated by e-mail. Post the e-mails on line and let us see what exactly was said to whom. It seems silly to argue over something that can be empirically resolved. Don't tell us what you said, just show us.
However, and this is the point Moore is making and everyone is conveniently missing, that is only true if your sample is random.
Except that is not the point Moore is making. Moore is making the (invalid) point that 47 clusters is just too few to be a good random sample. Moore: "This is astonishing: I wouldn't survey a junior high school, no less an entire country, using only 47 cluster points." Real statisticians: "On the face of it, this sounds like a fatal flaw. But unless the sample is actually biased, a smaller number of cluster points only has the effect of widening the confidence interval. Polls don't like large confidence intervals, but for the purposes of estimating large numbers of people, even the wide confidence interval of the Lancet study is informative. The point is that the number of clusters relative to the size of the population is less relevant than whether the sample of clusters is representative of the population. So when Moore implicitly criticizes the Lancet study in relation to a similar study on Kosovo which used 50 cluster points, 'for a population of just 1.6 million, compared to Iraq's 27 million,' the issue is not one of brute numbers, but whether the clusters chosen are representative of the overall population." (This is from the link in comment 6. I am sorry that the spam filter limits the number of links in comments.)
The STATS organization has a general defense of cluster sampling here, please read it. That's not from doctors. That's from a non-partisan group of statisticians and mathematicians. Excerpts:
I'm afraid alarms go off in my head when I read "You don't even need to know anything about statistics. Just think about it logically…." Logically heavy objects fall faster than light ones, right? I prefer knowing something about statistics and using the standard methods for calculating confidence intervals in clustered surveys.
Well, talboito, your rejoinder might be apt if Davies restricted his bile for us unwashed "laymen". He doesn't, and apparently, you also can't be bothered to learn who your interlocutors are prior to being an insufferable ass.
AJL, and others who claim to know some statistics:
I've been asking this question repeatedly over at Asymmetrical Information, with no answers yet:
Can anyone tell me:
1) If cluster-sampling has been used for death counts in peaceful, advanced countries (like the US, or UK, or Japan) where the official records are likely to be pretty good
and
2) how close those "test" counts came to the official counts.
Forgive my empiricism. But unless someone has tried this method for this purpose to make sure it works, the fact that it is mathematically impeccable and "widely accepted" doesn't mean anything. I don't care if it is "regularly used" in war zones, I want to know if it works. Starting under ideal conditions and moving out from there. It's the difference between "we regularly use leeches" and "our leeches are FDA approved as safe and effective."
As we used to say in the lab: in theory, theory is no different from practice. In practice, it usually is. (Or: what's the difference between theory and experiment? About 4 standard deviations)
This is in entirely good faith. I don't have an opinion one way or the other, I just want something better than "trust us, we're mathematicians."
However, and this is the point Moore is making and everyone is conveniently missing, that is only true if your sample is random.
Except that is not the point Moore is making.
Moore is making the (invalid) point that 47 clusters is just too few to be a good random sample.
So, Moore is not making the point that their sample wasn't sufficiently random, he was making the 'invalid' point that they didn't have enough points to have a random sample, thus making their sample...not...random?
Excuse me?
Try this on for size, I have 10 M&Ms, half red, half green. I sample 2, and get 2 red M&Ms. So according to my study, the M&Ms are 100% red. This is obviously bogus. In statistics, the sample needs both a certain fixed size, and a certain proportional size. Interestingly, the rough rule of thumb to calculate necessary minimum fixed size is 50, i.e., in just about any population size (excluding very small ones obviously) you need at least 50 samples to even BEGIN to get an accurate sample. Then you need an increase from that point, depending on the natural distribution of the data, to narrow the confidence ratio to something smaller than +/- bazillion percent.
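The M&M example can be worked out exactly with the hypergeometric formula, as in this minimal sketch. The 10-candy bag is from the comment above. The ten-million-candy bag is an added hypothetical, included to show what changes when the population grows and the sample does not.

```python
from math import comb


def prob_all_red(total: int, red: int, drawn: int) -> float:
    """Chance that every candy drawn (without replacement) is red."""
    return comb(red, drawn) / comb(total, drawn)


# The bag from the comment: 10 candies, 5 red, draw 2.
print(f"10 candies, draw 2:         {prob_all_red(10, 5, 2):.4f}")

# Hypothetical much larger bag, same 50/50 mix, same draw of 2.
print(f"10,000,000 candies, draw 2: {prob_all_red(10_000_000, 5_000_000, 2):.4f}")

# Same large bag, but a draw of 50.
print(f"10,000,000 candies, draw 50: {prob_all_red(10_000_000, 5_000_000, 50):.2e}")
```

A draw of 2 is misleading about as often whether the bag holds ten candies or ten million. What makes the all-red result vanishingly unlikely is drawing more candies, not drawing a larger fraction of the bag. That holds only if the draw is random, which is the separate issue argued over in this thread.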
They used a poorly distributed set of clusters, used very few clusters total, did little/no proper prep work to ensure the clusters would be properly distributed and, gasp, got results spectacularly divergent from that achieved by everyone else. But the methodology was just fine. Right.
Oh, and to be nit-picky, larger (more massive) objects DO 'fall' faster than smaller (less massive) objects, barring air resistance. Look at the equations for gravitational attraction if you don't believe me. Granted for most objects the difference is too small to measure, but if you drop something the mass of the moon, and something the mass of a bowling ball, the moon hits first. That's because it pulls the planet up to meet it faster than the bowling ball does.
Treefrog: Moore was OK with a Kosovo survey using 50 clusters and not OK with an Iraq survey using 47. (The confidence interval in the Iraq survey is, in fact, very large, and the authors have made no attempt to hide that.) The only explanation I see for Moore making for the distinction of the Kosovo and Iraq surveys is that he (mistakenly) believes that Iraq, being much more populous than Kosovo, needs a larger sample size to obtain the same (percentage) width confidence interval, or more likely that he knows this is not true but he figures a lot of his readers will believe it anyway.
Rob: I don't know the answer to that question—this sort of study is pretty far away from anything I've ever done practically. I hope Tim Lambert re-visits, because I suspect he would have some knowledge.
The only explanation I see for Moore making for the distinction of the Kosovo and Iraq surveys is that he (mistakenly) believes that Iraq, being much more populous than Kosovo, needs a larger sample size to obtain the same (percentage) width confidence interval,...
Let me fix that for you...
The only explanation I see for Moore making for the distinction of the Kosovo and Iraq surveys is that he (correctly) believes that Iraq, being much more diverse than Kosovo, needs a larger sample size to obtain the same (percentage) width confidence interval,
From the byline, about the author:
"Mr. Moore, a political consultant with Gorton Moore International, trained Iraqi researchers for the International Republican Institute from 2003 to 2004 and conducted survey research for the Coalition Forces from 2005 to 2006."
They seem to have left out "...and purveyor of truthiness to Republican's in their political civil war with their fellow Americans."
Less snarkily, in Kosovo, the violence was pretty evenly spread across the entire area, and the smaller cluster set (about the same size as the Iraq set) could get an accurate measure.
In Iraq, the violence has not been spread evenly. Not even close. I'm not even sure how you'd even begin to ensure that your small group of samples is properly representative of the areas with high violence and the areas with low violence. Given that even in high violence areas like Baghdad, it varies considerably from neighborhood to neighborhood.
If a sample point fell in a neighborhood that had, say, the shopping plaza for the area bombed, it's conceivable that almost every family interviewed might have lost someone, whereas a half mile away, the same sample group would've produced almost no losses.
The only even remotely workable approach I can see is to do what the UN team did and just carpet the landscape with sample points and brute force the problem.
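The worry about unevenly spread violence can be explored with a toy simulation. Everything in it is hypothetical: 10,000 neighborhoods, 5% of them "hot" with a 10% death rate and the rest at 0.5%, with clusters picked by simple random sampling. It is a sketch of the general behavior under those assumptions, not a model of Iraq or of the study's actual design.

```python
import random
import statistics


def make_neighborhoods(n: int, hot_share: float, rng: random.Random) -> list:
    """Hypothetical per-neighborhood death rates: a few 'hot' areas, most quiet."""
    return [0.10 if rng.random() < hot_share else 0.005 for _ in range(n)]


def estimate(rates: list, n_clusters: int, rng: random.Random) -> float:
    """Mean death rate over a simple random sample of neighborhoods."""
    return statistics.fmean(rng.sample(rates, n_clusters))


def spread(rates: list, n_clusters: int, rng: random.Random, trials: int = 2_000) -> float:
    """Standard deviation of the estimate across repeated surveys."""
    return statistics.pstdev([estimate(rates, n_clusters, rng) for _ in range(trials)])


rng = random.Random(0)  # fixed seed so the run is repeatable
rates = make_neighborhoods(10_000, 0.05, rng)
true_rate = statistics.fmean(rates)

print(f"true rate: {true_rate:.4f}")
for k in (47, 470):
    print(f"{k:>3} clusters: spread of estimates ~ {spread(rates, k, rng):.4f}")
```

In this toy setup the estimates from 47 random clusters scatter roughly three times more widely than those from 470, but they still center on the true rate. Few clusters on clumpy data cost precision. Bias would have to come from how the clusters were chosen, which a simulation with random selection cannot speak to.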
Much has been made about the fact that I am a Republican political consultant. I clearly labeled my op-ed with this information.
Les Roberts was a Democratic candidate for Congress in New York... http://www.thatsmycongress.com/lesroberts.html
This is just another example of the hidden political agenda behind this study. If you don't know how to do surveys (as 99.999% of American voters don't) and realize how terrible the fieldwork is, then you have no way of deciphering this as propaganda.
Kosovo is less "diverse" in what sense that matters here? Are you claiming that death is more evenly distributed in Kosovo? I don't really see how you could, without accepting the validity of the very studies you are arguing against. And, to answer a previous question I neglected, deep-pocket survey teams use larger samples because they want a smaller confidence interval than the Lancet's methods allow. I suspect you knew that at the time you asked, too.
AFAIK, not a single professional statistician of any political persuasion has objected to the general methods of the survey nor to the calculation of the confidence interval on the results. Now, statisticians can't, from the sidelines, refute claims of bias, either conscious or unconscious, in the survey, but if you think there are such, please detail what they are.
Are you claiming that death is more evenly distributed in Kosovo?
Yep, smaller geographic area, and the fighting covered, as far as I know, essentially all of it. That means the deaths are going to be a lot more spread out.
I'm not saying clustering is an invalid technique, incapable of producing usable results. I'm saying the Lancet implementation was horrible. Clustering is a method for attempting to avoid errors introduced because it's too expensive to get a traditional random sample set. Thus, we're already less accurate than normal, and they took a dreadfully small sample. Even the stated confidence ratio is atrocious.
AFAIK, not a single professional statistician of any political persuasion has objected to the general methods of the survey nor to the calculation of the confidence interval on the results.
Hmm, 5 seconds of googling came up with this...sure I could find more if I cared to spend more time at it...
Donald Berry
Steven Moore #26,
Welcome.
AJL #27:
> Now, statisticians can't, from the sidelines, refute claims of bias, either conscious or unconscious, in the survey, but if you think there are such, please detail what they are.
This is, to me, the more interesting question. Note that the confidence interval of the 2006 study is much tighter than that of the 2004 study. The lower bound still gives a death-by-violence figure that is much higher than what others have calculated by other methods. That makes me doubt the "random error" scenario. If properly presented, a design with a lot of recognized uncertainty should reflect that in a very large confidence interval.
I gave an example of a systematic bias possibility in comment #10. Note that any such biases would not be reflected in metrics like confidence intervals. But there's a more general point.
When setting out to do this study, Burnham and Roberts knew that findings of high Iraqi casualties would foment exactly the back-and-forth that we see in microcosm here. They knew that skeptics would be able to raise questions, that the value of their report as science would be severely questioned. (They also knew that they would attract passionate defenders, but that's not relevant here.) B & R could have chosen to design their study in such a way that the science would be highly defensible. That design would have had to start with the proposition that the cross-checks and safeguards demanded by reasonable skeptics would have to be built in to the study. It would have meant broadening their group to include some skeptics. There are no technical reasons why Burnham and Roberts did not go this route. They chose not to try.
In combination with their performance--and the Lancet editor's performance--with the 2004 study, I think this decision is highly suggestive of the notion that the authors did not feel that they needed to address these issues of trust and verification. And indeed, they are not issues for much of the intended audience of this study, in my opinion.
But I wonder whether it is a matter of some indifference to the authors, how their study is viewed five or ten years from now.
We can argue about the methodology of the study for a month and get nowhere. In the end all we have is their word that they didn't rig the results. I have to trust the source before I believe results and I most emphatically do not trust the source.
The fact is that the area they are trying to sample simply isn't amenable to polling in the same sense that, say, Scarsdale, NY, is amenable to polling of what political candidate or brand of toothpaste Scarsdale prefers.
What needs to be discussed is the validity of the underlying assumption: That we should care whether Iraqis kill each other in record numbers between the time of Saddam's fall and such time as a viable government may be formed. Just as long as the dead or the killers are not US soldiers I see no pressing reason to be concerned.
#26
"Much has been made about the fact that I am a Republican political consultant. I clearly labeled my op-ed with this information."
"This is just another example of the hidden political agenda behind this study."
At least you are not hiding your political agenda, Steven.
But what is the evidence that the authors of the study, the anonymous reviewers, and the editors of the journal are all hiding theirs, as you assert?
Simply not believing the results is certainly evidence of nothing.
Or are you just deploying "level 2" defense mechanisms...attack the motives of your opponents?
That byline of yours is starting to make more and more sense, and there is no evidence whatsoever that you are functioning in anything other than your capacity as a political consultant to an imploding party.
Honestly, I do not envy you or your job at this moment.
Being in the 99.999% of voters who don't know "stat," I'll simply point out what seems to be the most obvious problem:
1. The U.S. and Iraq have programs to reimburse victims of the war for property loss and casualties. Additional programs are always under discussion.
2. The study is based on self-reporting from the potential beneficiaries of such programs.
Therefore, any study based purely on self-reporting has a problem.
Steven Moore at #26,
What is your statistics training and education?
Your WSJ claim that the number of clusters was too small was not backed up by any citation to a statistical authority or any statistical reasoning, other than your claim that some surveys used a larger number of clusters. Are you using rule of thumb reasoning? And if so, based on what statistical authority?
You may have needed more precision (smaller confidence interval) for your opinion surveys. So you should use a larger number of clusters. Burnham didn't need a small confidence interval for his research since he was attempting to measure a crude effect. Even his lower bound showed 400k excess deaths.
Your claim that he collected no demographic data was false. He's explained that such data was collected about each household.
Burnham's survey showed 2 violent deaths pre-war, and 300 deaths post-war, out of 12,000 people sampled. In a country of 26+ million you can't reconcile that huge increase in violent deaths with Bush's 30k deaths or the IBC's claim of ~50k or the Brookings claim of ~60k.
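The arithmetic behind that comparison can be laid out in a short sketch. All inputs are the figures quoted in the comment above. The scaling is deliberately naive: it ignores survey weights, the pre-war baseline, and the length of the period covered, so it shows only the order of magnitude.

```python
violent_deaths_post_war = 300   # from the comment above
people_sampled = 12_000         # from the comment above
population = 26_000_000         # "26+ million", from the comment above

# Scale the sample's death share up to the whole population.
sample_share = violent_deaths_post_war / people_sampled
implied_deaths = sample_share * population
print(f"{sample_share:.1%} of the sample -> about {implied_deaths:,.0f} deaths")


def expected_in_sample(national_total: int) -> float:
    """Deaths one would expect among the people sampled
    if the national total were `national_total`."""
    return national_total / population * people_sampled


# The lower estimates mentioned in the comment above.
for total in (30_000, 50_000, 60_000):
    print(f"national total {total:>6,}: ~{expected_in_sample(total):.0f} deaths expected in sample")
```

On this naive scaling, the lower national totals would correspond to a few dozen violent deaths in a sample of this size, against the 300 the comment reports. The gap is too large for sampling noise alone to explain, so the disagreement turns on whether the sample or the interviews were biased.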
Comment to those who claimed that violence is spread out unevenly. Cluster sampling tends to understate the occurrence of rarer and episodic events like violent deaths. So if violence is concentrated in pockets, cluster sampling is more likely to miss it than to hit it. If it does hit it, you will have a big effect, but that point will mostly show up as an outlier (like Fallujah in the first study) and will be deleted from the study.
As to the bias of the interviewers. Burnham says they were medical doctors who were experienced in doing public health surveys. Could they have made up data? Why? Because they all hate us. LOL! If that's the case why trust any survey takers. They will always have a motive to lie to please the survey designers. By that criterion we should discredit all of Steven Moore's opinion surveys that he conducted for the Coalition because he would only hire Iraqis who supported the Coalition and would elicit answers that they all loved us.
Andy L (#24): From the byline, about the author . . . [t]hey seem to have left out "...and purveyor of truthiness to Republican's in their political civil war with their fellow Americans."
Andy L (#31): are you just deploying "level 2" defense mechanisms...attack the motives of your opponents?
WARNING!! IRONY ALERT!!
Did Wall Street Journal Find Fatal Flaw in Lancet Iraq Study?
October 18, 2006
Rebecca Goldin Ph.D. and Trevor Butterworth
http://www.stats.org/stories/did_wsj_flaw_iraq_oct18_06.htm
Moore:
"What happens when you don't use enough cluster points in a survey? You get crazy results when compared to a known quantity, or a survey with more cluster points.
With so few cluster points, it is highly unlikely the Johns Hopkins survey is representative of the population in Iraq."
"On the face of it, this sounds like a fatal flaw. But unless the sample is actually biased, a smaller number of cluster points only has the effect of widening the confidence interval. Polls don't like large confidence intervals, but for the purposes of estimating large numbers of people, even the wide confidence interval of the Lancet study is informative."
"The point is that the number of clusters relative to the size of the population is less relevant than whether the sample of clusters is representative of the population. So when Moore implicitly criticizes the Lancet study in relation to a similar study on Kosovo which used 50 cluster points, for a population of just 1.6 million, compared to Iraq's 27 million, the issue is not one of brute numbers, but whether the clusters chosen are representative of the overall population."
If we take an example like the fraudulent twin work of Cyril Burt, the results are garbage because the input was simply fabricated. But the study design, if it had been conducted as advertised, was valid.
Incidentally, you should note chew2's comment that the more evenly spread out deaths in Kosovo would influence the data in the opposite direction from what you claimed.
Treefrog
"Are you claiming that death is more evenly distributed in Kosovo?
Yep, smaller geographic area, and the fighting covered, as far as I know, essentially all of it. That means the deaths are going to be a lot more spread out."
I believe this is false. Kosovo was ethnic cleansing and guerrilla war, so was localized in ethnic pockets. True, we also bombed Serbian armed forces, but that was also localized and concentrated.
chew2 #33:
A bit of a straw man. Regarding the earlier study, here's an excerpt from an interview with Les Roberts --
I wrote then,
Here is a 1/4/06 interview of Gilbert Burnham in the Baltimore City Paper --
Under the circumstances, I think it's reasonable to question whether Burnham and Roberts are unbiased. In the City Paper quotes, Burnham seems to me to dismiss critics as unscientific, emotion-based dimwits. Such a perspective may not necessarily have been the best motivator for getting the science right the second time.
#36
It says about the same thing I'm saying.
But I think 'unreliable' is apt.
And the main point.
"Selecting clusters and households that are representative and random is enormously difficult. Moreover, any bias on the part of the interviewers in the selection process would occur in every cluster and would therefore be magnified. The authors point out the possibility of bias, but they do not account for it in their report.
They simply didn't select enough sample points to assure anyone of randomness on that basis (and they even acknowledge that themselves based on that enormous confidence interval). Which means we now have to trust that they sufficiently spread their clusters out enough to capture things correctly, and that trust is in short supply.
Extraordinary results require extraordinary rigor to prove. Even if you think they could have gotten accurate results based on their methodology, which I, obviously, don't believe, you have to believe that their study was MORE rigorous than all the other studies. That's the only way they could be right, is if everyone else is wrong.
As for the effects clustering would have on the data, we're into roll the dice territory here. How many outliers did they hit, and how would they even know? The only way to detect an outlier is by comparison with other data points. Which again returns to the number of samples problem.
Ideally, clustering doesn't skew the data one way or the other. If 10% of potential cluster areas had casualties to measure, then if my clusters are properly distributed, 10% of my clusters will fall on those areas, and the data should be right.
Practically you are correct, clustering has all kinds of reporting problems, an outlier hit on a particularly hard hit area will skew the data high. A really hard hit area will have no one left alive to interview, and thus skew the data low. These problems are all known, and the report mentions that they knew about them and took 'corrective' measures. Now we have potential bias issues, in theory a disinterested observer could correct for these in their sample set, but did the Lancet group do so properly?
Which is the point of the guy I quoted. Their methodology was potentially accurate, but not rigorously so, disagreed with the conventional numbers, and was open to bias.
In scientific terminology, there is no worse label for data than 'unreliable'. If this were any other field, the results would've ended up in the trash can.
Would you take prescription drugs where the clinical trial used this methodology to survey side effects?
#34
Not really sure what your definition of "irony" is, but I would say it is fairly obvious to suggest that a Republican Consultant's "critique", published in an "Opinion Journal", of a scientific study, published in a peer-review journal of high repute, is motivated by-ahem- something other than their desire for objective accuracy and the pursuit of truth...
Treefrog,
Are you a stats guy?
"Which is the point of the guy I quoted."
The stats guy you quoted made an idiotic statement, reflecting his own bias. He claimed the methodology was sound (because it was). He claims selecting clusters is hard, but doesn't claim that the study didn't do this randomly and correctly (because it was random). So all he can fall back on is the potential "bias of the interviewers" (i.e they would fake the answers). He presents no evidence to assume they are biased or that they faked the answers.
Give me a break.
You don't need training in statistics to claim fraud. Any monkey off the street can do that.
You imply that all the Iraqis hate us so will fake the results. So if some Americans, like you, did the interviewing, they would fake the results the other way because they wouldn't want the results to make us look bad?
Look, I think the estimates are shockingly high and have some doubts that the true number is that high. But it's just one study. I'm not going to call the people who did this study frauds, unless there is more evidence than has been presented so far. I also don't think Pres Bush's claim of 30,000 is at all realistic either. I think he probably just made that up.
Andy L #40:
With the alternatives being peer-review and no peer-review, I'll take the former for its benefits. However, those good effects are not limitless. Referees and editors aren't saints or all-knowing gods, and there are certain things that peer-review isn't good at addressing.
"published in a peer-review journal of high repute" is what we say when we like an article, or its conclusions. State it often or in hushed tones, and you'll soon elicit eye-rolls, at least from those who have participated in the process. If it was that good, scientists would have much less to talk about at Happy Hour.
chew2 #41, the latter part of your remark was directed to me rather than treefrog, I think:
No. I cut-and-pasted an interview that Les Roberts gave to a sympathetic reporter at the Socialist Worker. I'm stating (not implying) that the biases of field workers like that could skew the study's results, even in the absence of intent on the part of its senior authors.
"I'm offended by that!" or its cousin, Sarcasm, shouldn't be an acceptable answer to "Are you sure you've taken every practical step to insulate your proposed study from all sources of bias?"
Agreed. The impression that I get is that the authors are resistant to criticism, secure in their politics, and assured of the correctness of their methods and conclusions. Which, happily, agree with their politics. Their numbers may well be correct--I don't know either. In which case there are a lot of corollaries to consider, with respect to the deceit, negligence, and incompetence that others' numbers necessarily imply. The new Lancet study isn't "fraud." So far, it doesn't look like good science, either. (Though, without a thorough read, I'm unwilling to make a definitive judgement, one way or the other.)
Hunh? You could pick only one cluster and assure people of its "randomness". That can't be what you mean. (The confidence interval for one cluster would be extreme.) There isn't any evidence that the clusters here were picked in a non-random way, and the enormous confidence interval is an accurate reflection, mathematically, of the number of clusters chosen.
If there were fraud, interviewer bias, incentives for respondents to lie, or non-random selection of clusters, it wouldn't matter how many clusters were chosen (except, I guess, getting the entire population). You seem to be conflating two lines of attack here. The number of clusters is not evidence of bias and it is not evidence of lack of randomness. It is evidence of more or less precision in the results.
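The precision-versus-bias distinction can be sketched in a few lines of Python. This is a toy model with invented numbers (a true rate of 0.01, a systematic over-report of 0.003, Gaussian noise per cluster) and hypothetical function names; it is not a model of the actual survey. It just shows that adding clusters shrinks the random scatter but leaves a systematic error untouched.

```python
import random
import statistics

TRUE_RATE = 0.01          # assumed true death rate (illustrative)
SYSTEMATIC_BIAS = 0.003   # assumed constant distortion in every cluster

def one_survey(n_clusters, rng):
    """Estimate from one survey: mean of noisy, biased cluster readings."""
    readings = [TRUE_RATE + SYSTEMATIC_BIAS + rng.gauss(0, 0.005)
                for _ in range(n_clusters)]
    return statistics.mean(readings)

def summarize(n_clusters, trials=1000, seed=1):
    """Return (average error, scatter) over many repeated surveys."""
    rng = random.Random(seed)
    estimates = [one_survey(n_clusters, rng) for _ in range(trials)]
    return statistics.mean(estimates) - TRUE_RATE, statistics.stdev(estimates)

error_50, scatter_50 = summarize(50)
error_500, scatter_500 = summarize(500)

print(f" 50 clusters: average error {error_50:.4f}, scatter {scatter_50:.4f}")
print(f"500 clusters: average error {error_500:.4f}, scatter {scatter_500:.4f}")
```

Under these assumptions the average error stays near 0.003 for both cluster counts, while the scatter drops with more clusters. Whether any such systematic distortion existed in the real study is exactly what the thread is arguing about, and the sketch cannot answer that.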
Chew2 writes like someone who knows much more statistics than I do. Internet liars' dice is a dangerous game, but given that neither Treefrog nor Moore have even anted up with their stat training…
I ran into an interesting interview that mentions a previous Lancet death-count study that was found to be flawed in methodology. Page down to the third question. It seems The Lancet has tried this before. If first you don’t succeed try again with subtler cheating. “Link”:http://news.bbc.co.uk/1/hi/programmes/newsnight/4950254.stm
Get this: The Lancet’s own previous bogus study in 2004 put the figure at 98,000. We are expected to believe that in the intervening 2 years there have been 557,000 deaths.
#42
I'll agree with your first paragraph but disagree with the cynicism of the second, strongly.
Peer review is certainly not a mechanism for producing perfect research works, and no one who publishes would make such a claim. But it does provide a threshold for rooting out research that has obvious flaws or biases (again, not perfectly; what human construct is?), but certainly leagues better than the article published by the Republican Operative in a Right Wing Opinion Journal that doesn't even try to hide its bias, wouldn't you say?
It's like sniping at the pros at a baseball game from the stands, while shoving a hot dog down your throat. You might convince the folks sitting next to you (all you here) that you know how to play outfield better than Carlos Beltran, but ain't no one going to pay to see you play....
#41
I am not a statistician, I do programming of statistical models, with real statisticians, and the related fall out thereof. If my team tried to hand over sales estimates like this, we'd be tarred and feathered.
As for why the guy quoted sounds bad, it's because he's speaking scientifically (i.e. precisely) and not politically (i.e. to generate sound bites), which is why he seems to contradict himself. Real scientists always interview terribly because they qualify the heck out of everything. One of the reasons the media always falls for the nuts, because they are always sure of themselves, but I digress. Read him very carefully. What he says is that their methodology (meaning the clustering survey technique) is sound. But then he takes issues with their implementation of that methodology. Granted, he doesn't explicitly lay out all his issues with the study in the given quote.
Look, AJL wanted an example of an argument from authority against the study, so I produced one.
So all he can fall back on is the potential "bias of the interviewers" (i.e they would fake the answers).
Now we start with the straw men... Bias does not mean faking. Bias in this context would be undercorrecting, overcorrecting, incorrect base assumptions, incorrect base data (demographic), etc. In this case bias means (www.dictionary.com):
3. Statistics. a systematic as opposed to a random distortion of a statistic as a result of sampling procedure.
The distortion need not be deliberate. Often isn't.
You imply that all the Iraqis hate us so will fake the results. So if some Americans, like you, did the interviewing, they would fake the results the other way because they wouldn't want the results to make us look bad?
You have me confused with someone else.
I'm not calling them frauds, I'm saying their data is worthless.
I'll just hammer my basic point again. They produced results out of line with all similar data and lack the methodological rigor to show why their data is better than everyone else's.
I really don't care about charges of political bias, suspicious timing, etc, etc. Their results were out of line and they can't back them up. Their results belong in the junk bin unless someone else comes along with a better study backing them up.
robust statistics and cluster analysis were used to get a desired or pre-ordained conclusion. nonparametrics (robust statistics) are used when the shape of the distribution that you are sampling from is unknown. i sincerely doubt that the death distribution across all of Iraq is normal, ie gaussian. Some provinces are very high, some are very low.
an across-Iraq death estimate is design biased, because the high deathrate provinces have equal weight with low deathrate provinces. this skews the distribution in favor of more death.
the death contest is stupid.
this is the way it really is.
the LEFT: see? more death! you suck!
the RIGHT: see? less death than saddam! we rock! you suck!
Andy L #46, one final thought then I'll duck out. Possibly confusing to have two separate discussions (Clustering techniques and Bias) interwoven like this, anyway.
Peer review roots out obvious flaws or bias, yes... but "obvious" is in the eye of the beholder. In this context, I suspect that editor Richard Horton's rendition of peer review may have added little value. But not being privy to the process, the pre-review manuscript, or the reviews, I don't know.
Aside from peer-review-vetted letters to a few journals (and a few other conventions) the process is quite rarely used to contest and debate the peer-reviewed literature. This is a shortcoming that has been long noted in the hard sciences; Science had a recent report on the subject as it concerned some stem cell work.
I'd say Steven Moore gets as much credit as Tim Lambert, you, or me for stepping up and formulating his thoughts into an essay that he placed into the Public Square for inspection and debate. Yes, peer review (ex-Lancet) would probably have sharpened his piece, as that's one of its good effects. But I am not sure where he would have found a suitable forum for the exercise.
As far as Moore's bias, his metier was plainly listed with the article. You've twitted him with that repeatedly; readers here are smart enough to have noticed your point. You might consider letting it go for a while.
#44
Hunh? You could pick only one cluster and assure people of its "randomness". That can't be what you mean. (The confidence interval for one cluster would be extreme.) There isn't any evidence that the clusters here were picked in a non-random way, and the enormous confidence interval is an accurate reflection, mathematically, of the number of clusters chosen.
Sorry, that was sloppy, replace 'randomness' with 'representativeness' and you have what I meant.
If you read the report (available on the Lancet website) what you find is that they selected cluster points by divvying 50 points up amongst the various Iraqi 'Governorate' zones, then within those zones spread the points out by 'Governorate's constituent administrative units' population. Then they picked a random 'major street' and then a 'residential street' off of that major street, and then sent a team of interviewers in to interview households in that area at the discretion of the team.
My problem with the low sample size is that several provinces with populations around a million got 1 or 2 cluster points to sample the entire area. The cluster points weren't randomly selected within the Governorates; instead they were divvied out by 'proportional population'.
My question is how the heck you divvy out 2 points by proportional population. And how do you expect to get a representative sample out of that?
Full Report Here
If you want to judge for yourself.
matoko #48 wrote:
Wrong thread. This one is about the use of clustering in statistics and about bias (sampling and otherwise).
Better topics, actually. (I liked the first part of your comment more.)
#49
"I'd say Steven Moore gets as much credit as Tim Lambert, you, or me for stepping up and formulating his thoughts into an essay that he placed into the Public Square for inspection and debate."
Sorry for boring you with my repeated highlighting of Moore's perspective, but unfortunately the point still seems to have escaped you, as this comment suggests.
To suggest that Moore's purpose in writing his thin critique was anything other than to provide fuel to the thousands of online and on air debates being waged in defense of Bush's and the Republicans' Iraq war strategy...in other words, PR and damage control....strains credulity, and indicates that you are either unwilling to acknowledge or simply naive about the manner in which the Right operates in the "Public Square".
Amac,
Yes, I did attribute your comments to some other poster. The bias of the interviewers is a separate issue from the bias of the authors.
"The new Lancet study isn't "fraud." So far, it doesn't look like good science, either. (Though, without a thorough read, I'm unwilling to make a definitive judgement, one way or the other.)"
So your critique of the study is that the investigators were "biased" and would shade their results. This had nothing to do with whether their methods were "good science".
I think we essentially have 3 possibilities here although I'm open to others
1) The data is correct. The problem with this one is that it doesn't match anything else produced thus far, and doesn't feel intuitively correct either.
2) The authors deliberately threw the study. I.E. the fraud explanation. I don't like this one either. Say what you will about academics, but these guys do have good credentials and accusations of fraud need better evidence by far.
3) The implementation was flawed. This is my take. They were on the right track but lacked the budget to make it work, and sorta combed over the lack of data here. Insert professional pride and we have all the makings for a train wreck.
I'm not sure what other options we have, so take your pick.
treefrog
"Now we start with the straw men... Bias does not mean faking. Bias in this context would be undercorrecting, overcorrecting, incorrect base assumptions, incorrect base data (demographic), etc. In this case bias means (www.dictionary.com):"
Your stats guy was talking about the interviewers. The only faking they could do would be to fake answers to the interview.
Undercorrecting etc. would be by the study authors, not the interviewers. There is no evidence they did anything that was not good statistics.
They chose a cluster within each governate randomly. They weighted the cluster by the population in the governate. Not much else you can do to ensure no bias. Low violence governates should have been given roughly the right weight relative to high violence governates.
Your stats guy was talking about the interviewers. The only faking they could do would be to fake answers to the interview.
Actually the Lancet study itself talks about ways the interviewers could introduce bias into their results (other than faking) but basically just shrugs its shoulders and says oh well...
Wasn't very reassuring at all.
_They chose a cluster within each governate randomly._
Actually, this was my point. They didn't choose the clusters randomly. They assigned clusters to provinces based on 2004 population data. So far so good. But then they assigned the clusters within the provinces (Governates in the study speak) to 'Governorate's constituent administrative units', I'm assuming these are counties, or some approximation thereof (towns, villages, cities?). The method here, also distribution by population. NOT RANDOMLY. The problem I had with this goes back to the small number of clusters: how do you assign by population 2 clusters to a governate with a million people? Did the biggest cities get the clusters? Did they pick one big city and one tiny village? The report doesn't say and I'm not sure how you get an accurate cross section of the governate based on a whopping two cluster points.
I'd have felt better if they had picked the points randomly.
So who are these 650,000 allegedly dead people, and who killed them?
650,000 insurgents killed by Coalition forces? William Tecumseh Sherman would be proud. Un-fucking-believable kill ratio.
650,000 civilians killed by insurgent torture-and-murder crews? That hardly argues for the immorality of our efforts against such a genocidal enemy. Let those efforts be redoubled and trebled.
I imagine the left pictures them as a more diverse group, however. Poets, vegetarians, children skeet-shot by Halliburton employees ...
Treefrog,
"But then they assigned the clusters within the provinces (Governates in the study speak) to 'Governorate's constituent administrative units', I'm assuming these are counties, or some approximation thereof (towns, villages, cities?). The method here, also distribution by population. NOT RANDOMLY"
you're wrong, it was random. they chose the administrative unit by random number. then they chose a street within that administrative unit by random number. then they chose a house on that street by random number. that was the cluster chosen.
Glen, congrats on making all the other pro-War posters on this thread look like geniuses.
The excess deaths were not all from violence. They would include increased mortality from lack of access to hospitals, deterioration in living conditions, and many other causes. According to the study, there's a fair component of women and young children included.
By the way, do you have the names of the six million Jews handy? The ten million Stalin killed? Do you really want to require such a high standard of proof?
Treefrog,
Correction. You were right. They chose the administrative unit randomly, but with odds proportionate to its population size.
I don't think there is anything wrong with this. That meant that any household in the governate had an equal chance of being selected as the cluster center regardless of where located. If you had a town with 2000 people and a town with 20,000 people, why should they have an equal chance of being selected? That would bias the results to smaller towns.
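The equal-chance claim is easy to verify with exact fractions. The sketch below uses the two town sizes from the example above (2,000 and 20,000) and treats each person as one selectable unit, which is a simplification; the names `towns` and `household_chance` are just illustrative. It compares picking the town with odds proportional to its size against picking each town with equal odds, then picking one unit at random inside the chosen town.

```python
from fractions import Fraction

# Town sizes taken from the example in the comment above.
towns = {"small town": 2_000, "large town": 20_000}
total = sum(towns.values())

def household_chance(town, proportional_to_size):
    """Chance that one particular unit in `town` ends up selected."""
    if proportional_to_size:
        pick_town = Fraction(towns[town], total)   # odds scale with population
    else:
        pick_town = Fraction(1, len(towns))        # every town equally likely
    pick_unit = Fraction(1, towns[town])           # uniform pick inside the town
    return pick_town * pick_unit

pps = {t: household_chance(t, True) for t in towns}
equal = {t: household_chance(t, False) for t in towns}

for t in towns:
    print(f"{t}: proportional-to-size {pps[t]}, equal-odds {equal[t]}")
```

With proportional-to-size selection every unit has the same 1/22000 chance wherever it lives. With equal odds per town, a unit in the small town is ten times more likely to be picked than one in the large town, which is the bias toward smaller towns described above.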
re: #31
It is not my intent to attack anyone. I find Les Roberts a charming guy who is a passionate advocate for his point of view.
I just find myself at odds with the practice of labeling one's point of view as science.
I'm not in the practice of supporting Democrats, but if Les runs for Congress again, I'd strongly consider sending him a campaign donation.
I suggest this study be examined at Mystery Pollster whom I have found to be very coherent when discussing polling and statistical work
ummm...okthen.
the study is only bad science in that it has design bias. the previous lancet study was under attack and their problem statement alludes to continuing the study.
they were seeking validation of the previous results.
i doubt there was deliberate fraud, but the study design likely introduced both sample bias and collection bias.
the design was also chosen for collection pragmatics.
they attempted to use robust (non-parametric) statistics to smooth outlier deathrates, either extremely high or extremely low.
a better design with more accurate results would have done both inter province and across province aggregates.
i expect they did what they could with the fundage available.
still, how important is the 600K plus-or-minus 300K as a statistic? it's obvious that OIF has caused more death--so what? without factor analysis or principal components or something like that we don't really have much more information than before the survey.
if system entropy in Iraq is increasing at some interesting rate, we would also need to know stuff like power availability, goods and services, etc., all kinds of parameters besides the gross # of deaths since 2003.
chew2
I agree with you that there is nothing wrong with the base methodology they are using. I just have issues with the combination of this methodology and the sample size. The proportionate by size thing just doesn't work well with just a few points.
Check out table 1 on the report. The only areas to get more than 3 points were Ninewa and Baghdad. So Basrah for example got 3 points to survey 1.8 million people.
With so few sample points, random proportionate to size is going to heavily weight the larger population zones over the low pop zones. If a third of the population, for example, is in 2 or 3 large cities, and the rest is spread in countless little villages, if I only have 2 or 3 clusters, chances are they'll get assigned to the big cities, and ignore the smaller towns. If, as seems likely, the violence is mostly an urban phenomenon, this seems to me where they pulled the high numbers from.
If they just had more sample points, the random proportionate thing would've started throwing more sample points into the little towns and it would've worked.
Since we pretty much all agree their numbers are rather high, and I agree with you that outright fraud doesn't seem likely, that leaves just miscalculation (they screwed up the computations [possible but unlikely, these are professionals]) or some form of sample error.
The combination of cluster point selection method and low cluster point count is my bet. What's your guess?
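The big-cities-versus-villages scenario above can be put in rough numbers. The sketch assumes each cluster is drawn independently, with the chance of landing in the big cities equal to their population share (one third, as in the example). That independence is an assumption for illustration and may not match how the study actually allocated clusters; `cluster_split` is a made-up helper name.

```python
from math import comb

def cluster_split(k, city_share):
    """P(exactly j of k clusters land in the big cities), for j = 0..k."""
    return [comb(k, j) * city_share**j * (1 - city_share)**(k - j)
            for j in range(k + 1)]

city_share = 1 / 3  # "a third of the population" lives in the big cities

for k in (3, 12, 50):
    probs = cluster_split(k, city_share)
    print(f"{k:2d} clusters: P(all in cities) = {probs[k]:.4f}, "
          f"P(none in cities) = {probs[0]:.4f}, "
          f"expected in cities = {k * city_share:.1f}")
```

Under these assumptions, with 3 clusters all three land in the big cities about 4% of the time, and none of them do about 30% of the time. On average the split matches the population share, but with so few draws any single province can easily come out lopsided in either direction, and that lumpiness fades as the cluster count grows.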
This is getting less believable all the time.
Barring major disease or starvation, there is no reason why half a million people would die in a few years from having poorer living conditions and lowered access to hospitals.
No doubt there is typhoid and cholera in Iraq, and diarrhea from polluted water. But the estimates of typhoid that I've seen are a few thousand cases, not hundreds of thousands.
Furthermore, whatever the real numbers are, there's no reason to simply attribute this to invasion/occupation. The Coalition forces have provided vast amounts of food, clean water, and medical care to the population. They have also provided security, without which many, many more people would be dead.
During the reign of Saddam the Great, on the other hand, health services were cut 90% because of sanctions. Insulin and cancer drugs disappeared from Iraq. People went hungry while Eurocrats gorged themselves on Oil for Food bribes. Do you think that might have affected somebody's health?
I believe the death toll attributed to this was estimated at 5000 a month (children, not total deaths) by Unicef. What methodology did they use to arrive at this figure, and how does it compare with what we've seen since?
Frankly, I'm with Grackle. As long as the killed or killers are not American, I'm not concerned either. And let's face it, savages slaughter each other. It's what they do best.
Glen mentions clean water.
In the most recent estimate from Brookings, potable water, sewerage, and wastewater treatment capacity are all below pre-war (i.e., Saddam) levels. Took me three minutes to find.
Your post, Glen, is a textbook case of the worthless "argument from incredulity". (No one found the Holocaust real credible, till it happened.) I'm sorry it has to share thread with Amac and Treefrog, whose contributions are much more valuable.
Andrew:
I can't believe that you said that. I'm going to argue from incredulity and just pretend that you didn't.
The fact, Glen, that the 650K number is incredible to you is no more important than Noam Chomsky's finding reports of Pol Pot's atrocities incredible, nor the people who found (or still find) the Holocaust incredible, nor the American Communists who found reports of the gulags incredible—there are many ways to attack the survey results here, but "Wow, that's a big number" is not a legitimate one.
IIRC, the study claims that 31% of the excess deaths are directly attributable to coalition action, mostly shooting and airstrike.
That's 203k.
Ah, to think that clustering methodology would arouse such passion. My old stats professor would likely be jumping for joy. :-)
chew2/Treefrog: I believe one major issue with the clustering methodology used by the authors could occur if population density is correlated with ethnicity or other key factors in certain areas. For example, if minorities in certain areas are much more likely to live in larger communities, then they would be over represented in this analysis. This could bias the results since mixed neighborhoods probably suffer from a disproportionate amount of violence. I agree with Treefrog that a random selection process might be better. Increasing the number of clusters would also be nice.
Re: Moore/Roberts political affiliation. Isn't this irrelevant to the current discussion? The line of discussion here is regarding a study's statistical methodology, not who votes for who, and ad hominem attacks don't shed any clarity on the issue at hand. I'm skeptical of the survey results because of the sampling issue mentioned above, not because Roberts ran as a Democrat. Even if Rush Limbaugh, Ann Coulter, and Karl Rove said that these results were valid, that still wouldn't change my opinion.
Fretensisx #72, re: Roberts' (and Burnham's) political affiliation:
For my part, I don't think I've made ad hominem attacks. But I do think the question of the political orientation of the study authors is worth considering. In my opinion, the authors and the chief editor of The Lancet have shown themselves to be activists. They are dedicated to getting the truth out to the American voting public. They want voters to have certain information fresh in their minds: namely, the hideously large numbers of Iraqi deaths that the Bush Administration and its Republican Congressional allies are responsible for. They want this to be taken into account in the forthcoming elections.
Not that there's anything wrong with that stance.
But it strikes me as disingenuous to participate in this manner as political actors, and then respond to criticism and concerns of bias by saying, "why, that's outrageous--we are scientists, and our results--like all science--are about the impartial truth, and beyond that, our article was rigorously peer-reviewed!"
I would rather that they and their defenders look at the matter objectively, which is to say that by designing and using their studies to serve their ideological agenda, they are necessarily vulnerable to issues related to biases that might have influenced their work. As I noted earlier in the thread, they could have addressed this issue successfully in the study-design phase, and chose not to. My tentative conclusion is that the authors and editor were more interested in properly-timed headlines than in maximum methodological rigor. I recognize that others view the matter differently, and have presented their arguments ably in this thread.
This is not to dismiss the study or to proclaim that it is worthless. It's more along the lines of pharma company scientists publishing on a drug that their employer very much hopes makes it to market. These studies are often excellent. Ironically, The Lancet was at the forefront of demanding that those conflicts of interest (and potential and perceived conflicts of interest) be presented clearly to its readership.
(And now, back to cluster sampling issues.)
AMac: My comment on ad hominem attacks definitely was not directed at you; I think your posts have been very informative so far, particularly post #10. I was just getting disturbed by a number of posts where people seemed to dismiss a line of argument based on its source without any serious evaluation, i.e., that man is a Republican/Democrat, so his line of reasoning is wrong because all Republicans/Democrats are wrong.
I would say that most people who have had any exposure to the academic world are probably initially stunned at how political an environment it is. Anyone who has studied the history of science or has experience in the field probably has a chuckle at the idea of scientific papers as impartial truth and the objectiveness of peer review. But I would think that it is better to attack the Lancet article based on methodology issues rather than question the motivations of the researchers. If people have questions on methodology, arguing that the paper has already been peer-reviewed is not even close to a sufficient defense and leaves the discussion at a more objective level. Attacking someone's motivation is opening a Pandora's box of personal attacks and counterattacks, many of which are empirically impossible to prove or disprove, which is why I would hesitate to use that as a point of discussion.
Fretensisx, thanks for the clarification. Your point on politics in the academic world seems well taken. Public health is probably one of the medical specialties that suffers most from issues-related groupthink (or, is very constructively engaged in making science relevant, if you wish).
Pajamas Media's David Miniter conducts an email interview with Gilbert Burnham, Jousting with the Lancet. Some of the points discussed here are covered.
Via The Belmont Club.