[Edited by Nort with permission of the author]
A few weeks back I ran a survey related to the notion of a 'Cold Civil War' on this site. When I reported the results of the survey, I mentioned that I was also going to do some analysis with more powerful tools and report if I had found anything else interesting. Well, I did and I have.
Really short form: The Big Sort (see below) is likely onto something. I have some modest statistical evidence that WoC denizens are behaving in the way Bishop (the author of The Big Sort) suggests, and those who think Bush stole 2000 are somewhat more likely to 'sort' themselves out.
I detest it when the MSM trots out "the study showed" and gives no idea how the conclusion was reached. So here are the details: first my impression of "The Big Sort" hypothesis, and then my detailed description of what I think I am seeing in the survey data and why.
The Big Sort
In the discussion of the survey, a commenter suggested a relationship to a just-published book called The Big Sort, by Bill Bishop (reviewed by the WSJ here). I haven't read the book yet - it's on order from Amazon - but the thesis is easily described: "Like-minded people increasingly tend to live near like-minded people, thus amplifying the beliefs people hold." The author has an overview website, and here's a set of slides (PDF) from a presentation of his material (found here), that provides the basic talking points. One of the most important is that Bishop is not just regurgitating the Red vs. Blue state themes of the MSM, but looking at a finer geographical grain: "Not red and blue states, he is quick to insist; he calls that cliché an illusion. The reality is red and blue wards and precincts, suburbs and counties."
The 'Big Sort' is about the country turning into a collection of echo chambers, about networks becoming more disjoint over time. Not only was that shift in networks the logic behind the experimental design of my own survey, I'd also asked a question about moving for political reasons in the original survey. Bishop's hypothesis came my way just as I was trying to make sense of the further analysis of the survey. Explaining the intersection takes some further (and unfortunately lengthy) description of the process:
Data Mining the Cold Civil War
I started by importing the survey results into the R statistical system. This is a free, open-source analytics environment modeled on S, a famous Bell Labs package. I described the whole process at my home blog for those curious. (R is perhaps overkill for an experiment of this size, but learning my way around it was an additional goal beyond political curiosity.)
The test I used on the survey results is called correspondence analysis. Fortunately for me, two of the best-known experts in this procedure had provided code to implement it in R. Correspondence analysis is a form of factor analysis suited for use with categorical data, like survey answers. If that didn't make any sense, think of it as a type of data mining, attempting to find relationships among variables by analyzing a large number of samples.
What you're looking for in such a study are covariance patterns, ways in which some observations (survey responses in this case) correlate to and might predict other responses or characteristics. I obviously believed there would be such correlations and some particular underlying themes, or I wouldn't have named the survey after the hypothetical Cold Civil War, and based the questions on the notion of a breaking of personal networks as being diagnostic of its existence. It turns out such patterns do exist, and they shed some light on the notion of a Big Sort.
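For readers who want to see the mechanics, here is a minimal sketch of simple correspondence analysis computed with a singular value decomposition. It is written in Python with NumPy rather than R, it is not the code used for the survey, and the six-respondent indicator matrix is made up purely for illustration. The function name `correspondence_analysis` and the choice of standard column coordinates as the "loadings" are assumptions.

```python
import numpy as np

def correspondence_analysis(indicator):
    """Simple CA of a respondents-by-categories 0/1 indicator matrix.

    Assumes every row and every column has at least one nonzero entry.
    """
    N = np.asarray(indicator, dtype=float)
    P = N / N.sum()                 # correspondence matrix (proportions)
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    # Standardized residuals: departure from what independence would predict
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2               # principal inertia of each factor
    share = inertia / inertia.sum() # fraction of total inertia per factor
    col_std = Vt.T / np.sqrt(c)[:, None]            # standard column coordinates
    row_principal = (U * sv) / np.sqrt(r)[:, None]  # principal row coordinates
    return share, col_std, row_principal

# Hypothetical data: columns are TownYes, TownNo, CancelYes, CancelNo
toy = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
share, col_std, row_principal = correspondence_analysis(toy)
print(share)          # factors come out ordered by share of inertia
print(col_std[:, 0])  # first-factor coordinates for the four categories
```

The SVD returns factors in decreasing order of inertia, which is why the first one or two usually carry the interpretable patterns.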
Disengage
An analysis of this sort generates patterns of correlation (factors) in order of their importance in the data. The factors are mathematical constructs; that is, they aren't typically composed of a single variable, or some neat binary combination. Instead there are weights (loadings) that are assigned to the observed variables and serve as evidence either for or against the existence of the factor in a particular observation (survey answer).
It's common that the first few factors from an experiment have some interpretation comprehensible to humans. Later factors are often complex differences among variables, which become difficult to explain in English. That's the case here.
I'll call the first factor from the CCW study 'Disengage'. Here are some of the loadings for 'Disengage':
| Variable | Loading on 'Disengage' |
| --- | --- |
| FamilyYes | 6.20565381 |
| NeighborYes | 6.07101955 |
| EmployYes | 3.66292315 |
| TownYes | 2.86960559 |
| FriendYes | 2.08080388 |
| DonateYes | 1.97785510 |
| SOYes | 1.73062428 |
| MoveYes | 1.72145567 |
| VirtualYes | 1.27983818 |
| CompanyYes | 1.00500829 |
| CancelYes | 0.94177190 |
| .... | .... |
| FriendNo | -0.43358482 |
| TownNo | -0.47277723 |
| DonateNo | -0.66950134 |
| CompanyNo | -0.99150427 |
| VirtualNo | -1.05007323 |
| CancelNo | -1.25204879 |
A variable that has a 'Yes' at its end represents a positive answer to a particular question, and the same word with 'No' at the end means a negative answer. These are binary variables with either a one or zero, depending. So a "Yes" response to '... have you... boycotted a physical community?' would result in a 1 for TownYes, and a 0 for TownNo, and vice versa. A missing response would set a zero in each. Taking all of the replies in an individual's response to the survey, and adding up the loadings for the variables that are set to one would result in that individual's score on 'Disengage'.
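The scoring rule just described can be sketched in a few lines of Python. The loadings are copied from the table above; the elided middle rows are left out rather than guessed, so the hypothetical respondent below answers only the six questions for which both the 'Yes' and 'No' loadings are shown. The function name `disengage_score` and the respondent are illustrative assumptions, and the result is a partial sum, not a full survey score.

```python
# Loadings taken from the 'Disengage' table (only rows with both senses listed).
DISENGAGE_LOADINGS = {
    "TownYes": 2.86960559, "TownNo": -0.47277723,
    "FriendYes": 2.08080388, "FriendNo": -0.43358482,
    "DonateYes": 1.97785510, "DonateNo": -0.66950134,
    "VirtualYes": 1.27983818, "VirtualNo": -1.05007323,
    "CompanyYes": 1.00500829, "CompanyNo": -0.99150427,
    "CancelYes": 0.94177190, "CancelNo": -1.25204879,
}

def disengage_score(answers, loadings=DISENGAGE_LOADINGS):
    """Sum the loadings of every indicator variable that is set to 1.

    `answers` maps a question name to "Yes", "No", or None (missing).
    A missing answer leaves both indicators at 0 and adds nothing.
    """
    score = 0.0
    for question, answer in answers.items():
        if answer is None:
            continue
        score += loadings[question + answer]
    return score

# Hypothetical respondent (not a real survey record)
respondent = {
    "Town": "Yes", "Friend": "No", "Donate": "No",
    "Company": "Yes", "Virtual": "No", "Cancel": "Yes",
}
score = disengage_score(respondent)
print(round(score, 8))
```

Three 'Yes' answers outweigh three 'No' answers here because the 'Yes' loadings are larger in magnitude, which is the asymmetry discussed below.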
The first thing to observe is that all eleven of the 'Yes' responses are on the positive side of the scale for 'Disengage'. That - fortunately - verifies my hypothesis that having done one of the network-breaking acts predicts that others will also have occurred. The larger the positive number, the more suggestive the particular behavior is of 'Disengage'. The more negative the number for a variable, the more evidence against 'Disengage' it provides for an individual.
Some caution is needed with the variables I've put in italics. These are relatively rare behaviors, reported less than 5% of the time. When they occur, they are highly suggestive, but the number of samples to judge their correlation with other variables is limited. It's likely safe putting all of their 'Yes' variants on the positive side, but the particular magnitude of the loading should be judged with some skepticism.
Thinking further about frequency of occurrence gives insight into the magnitude of the loadings. For instance, 'TownYes' occurs in about 20% of the samples. 'TownNo' is therefore observed about 80% of the time. Finding a 'TownYes' response therefore gives more information about the individual than 'TownNo', and the magnitude of the loading - other things being equal - should be larger for the former, as it is. By the same argument, finding the 'No' case for one of the rarer (italicized) behaviors gives very little information, and it's these variables that I deleted from the middle of the table due to their low loadings and contribution. In the case of behaviors that are close to 50/50 splits, such as canceling subscriptions and boycotting companies, the magnitudes of their positive and negative loadings are similar.
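One way to put numbers on the "rarer answers tell you more" intuition is the information-theoretic surprisal, -log2(p). This is an analogy, not the formula correspondence analysis uses to compute loadings; the 20% and 80% figures are the approximate TownYes and TownNo frequencies quoted above.

```python
import math

def surprisal_bits(p):
    """Information, in bits, gained by observing an outcome of probability p."""
    return -math.log2(p)

town_yes_bits = surprisal_bits(0.20)  # the rarer answer
town_no_bits = surprisal_bits(0.80)   # the common answer
fifty_fifty_bits = surprisal_bits(0.50)

print(round(town_yes_bits, 2), round(town_no_bits, 2), fifty_fifty_bits)
```

The rare answer carries roughly 2.3 bits against about 0.3 bits for the common one, while a 50/50 question gives exactly 1 bit either way, matching the near-symmetric loadings seen for the evenly split behaviors.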
If you took the survey, you'll notice I've held out the responses for party affiliation and attitudes towards 9/11 and the 2000 elections. By my prior chi-squared analysis we know that there are significant correlations, in unsurprising directions, among these extra responses. As this was an experiment regarding network breaking behaviors, I didn't directly introduce these extra (and potentially confounding) responses into the analysis. Instead, after determining the loadings for 'Disengage' we can interpret these extra responses by averaging the scores on 'Disengage' for all individuals who gave a particular response on the extra questions. (If you followed the link on correspondence analysis, this means the behavioral answers are the 'active' variables, and the affiliation and attitude answers are 'supplementary' variables in my experimental design.) Here's what you get:
| Supplementary response | Mean 'Disengage' score |
| --- | --- |
| W911Inside | 0.67483124 |
| PartyRepublican | 0.19571195 |
| W2000Correct | 0.18021509 |
| W911War | 0.03715160 |
| PartyNeither | 0.03651692 |
| W2000Mess | -0.14203091 |
| W2000Stolen | -0.20110148 |
| W911Distraction | -0.21622403 |
| PartyDemocrat | -0.42674652 |
The big thing to notice here is the low magnitudes. None of these extra answers is more predictive of 'Disengage' than a single 'Yes' answer to a behavioral question. Only a rare troofer "Inside Job" response is a substantial predictor of 'Disengage'. Interestingly, knowing someone is a Democrat is mildly counter-indicative of 'Disengage', while being a Republican yields only a slight positive prediction and an Independent virtually nothing. Responses to the 9/11 and 2000 questions are even less predictive of 'Disengage'.
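The averaging step behind the supplementary table can be sketched as follows. This is a plain group mean in Python, under the assumption that each respondent already has a factor score; the scores and party labels are hypothetical, and `mean_score_by_response` is an illustrative name.

```python
from collections import defaultdict

def mean_score_by_response(scores, responses):
    """Average factor scores within each supplementary response group.

    Respondents whose supplementary answer is None (missing) are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, response in zip(scores, responses):
        if response is None:
            continue
        totals[response] += score
        counts[response] += 1
    return {response: totals[response] / counts[response] for response in totals}

# Hypothetical factor scores and party answers (not survey data)
scores = [1.5, -0.5, 0.25, -1.0, 0.75]
party = ["Republican", "Democrat", "Neither", "Democrat", "Republican"]
means = mean_score_by_response(scores, party)
print(means)
```

Because the supplementary answers never enter the factor computation, they cannot shape the factor; they are only projected onto it afterwards.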
This single factor - 'Disengage' - explains most of the patterns in the survey responses. If you're following along in CA, it explains 65.8% of the inertia. Yet, we haven't captured everything of interest: The original chi-square results showed significance for party affiliation and attitude towards the 2000 elections in respect of some behavioral answers, and we don't seem to have entirely captured that in this first factor. More may be required to complete the analysis.
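For anyone unfamiliar with the term, total inertia in correspondence analysis is the chi-squared statistic of the table divided by the number of observations, and each factor's percentage is its slice of that total. A minimal sketch, using a made-up 2x2 table of counts (the 65.8% figure itself comes from the R run and is not reproduced here):

```python
def chi2_and_inertia(table):
    """Return (chi-squared statistic, total inertia) for a table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, chi2 / n

# Hypothetical counts: rows are two attitude groups, columns are Yes / No
table = [[30, 10],
         [20, 40]]
chi2, inertia = chi2_and_inertia(table)
print(round(chi2, 4), round(inertia, 4))
```

A table whose cells match the independence prediction exactly would have zero inertia, so there would be nothing for the factors to explain.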
Before going there, one further note about factors: By the design of the analysis, each factor is uncorrelated with every other. Any particular survey answer will likely give insight on more than one of the factors, but the factors themselves are generated to be 'orthogonal' in the mathematical sense. Knowing where an individual scores on 'Disengage' alone does nothing to help predict, in the linear-correlation sense, where that person would score on succeeding factors in the analysis.
It turns out there's just one more factor worth examining in the analysis (each of the rest accounts for less than 1% of inertia), and it's a revealing one.
The Little Sort
Here are some of the loadings for the second factor (7.5% of inertia). I'm calling this one 'Sort', with prejudice:
| Variable | Loading on 'Sort' |
| --- | --- |
| SOYes | -9.81492003 |
| MoveYes | -6.00729401 |
| EmployYes | -5.77654300 |
| CancelNo | -1.07063014 |
| NeighborYes | -0.93366498 |
| FriendYes | -0.90672787 |
| CompanyNo | -0.55790551 |
| DonateNo | -0.53309995 |
| .... | .... |
| CompanyYes | 0.57308179 |
| MoveNo | 0.59338207 |
| CancelYes | 0.81189135 |
| TownYes | 0.88117168 |
| FamilyYes | 1.53527462 |
| DonateYes | 1.59127176 |
By an accident of the math, the loadings are reversed in sense from my plain English name for 'Sort'. For this factor, a negative loading indicates a proclivity for 'Sort' and a positive the reverse.
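The sign of a factor is arbitrary: multiplying every loading on an axis by -1 describes exactly the same pattern. A small Python sketch, using the loadings listed in the table above (elided rows omitted), flips the axis so that larger positive numbers mean more 'Sort'. The variable names are illustrative.

```python
# Loadings copied from the 'Sort' table; the elided middle rows are left out.
SORT_LOADINGS = {
    "SOYes": -9.81492003, "MoveYes": -6.00729401, "EmployYes": -5.77654300,
    "CancelNo": -1.07063014, "NeighborYes": -0.93366498, "FriendYes": -0.90672787,
    "CompanyNo": -0.55790551, "DonateNo": -0.53309995,
    "CompanyYes": 0.57308179, "MoveNo": 0.59338207, "CancelYes": 0.81189135,
    "TownYes": 0.88117168, "FamilyYes": 1.53527462, "DonateYes": 1.59127176,
}

# Flip the axis: positive now reads as evidence for 'Sort'.
flipped = {name: -value for name, value in SORT_LOADINGS.items()}

# Rank variables from strongest evidence for 'Sort' to strongest against.
ranked = sorted(flipped, key=flipped.get, reverse=True)
print(ranked[:3], ranked[-1])
```

The ordering and the gaps between variables are unchanged by the flip; only the direction of reading is reversed.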
Notice that the 'Yes' and 'No' variables are now mixed up on either side of the factor loadings. The 'Disengage' factor did indeed scrub out the general positive correlation among all network breaking behaviors. Leaving - something else. Neglecting two rare (italic) responses, the one action that provides the most evidence for 'Sort' is having moved domicile due to political issues. Supporting this physical world interpretation is the very low weight given to both senses of the "dropped out of a virtual community" question, -.158 and .137, when determining 'Sort'.
It's also interesting which behaviors stack up on the counter-indication side of 'Sort'. Among non-rare responses, having stopped donations to a nonprofit, boycotted a physical community, cancelled a subscription, or boycotted a company count against an individual displaying the 'Sort' factor.
What's going on here? In my earlier post I speculated that the most partisan individuals on the left might already have done the lower-impact network-breaking behaviors, and then taken the heavier decision to move during our period of interest. Meanwhile the rest of the citizenry started catching up with their own network breaks. Call this the "already done that" hypothesis.
Another possibility is that one may move in order to avoid the confrontations implied by some of the other acts. In that case, a respondent might answer 'no' to (for instance) boycotting a (local) company. The move was the primary act. The impacts on local companies, nonprofits and newspapers may be viewed as a side-effect, rather than a distinct, politically motivated act. We can call this the "avoid the aggravation" hypothesis, which seems more in line with Bishop's thesis.
More insight can be gained by mapping the supplementary variables against 'Sort', as we did for 'Disengage':
| Supplementary response | Mean 'Sort' score |
| --- | --- |
| W911Inside | -1.06681772 |
| W2000Stolen | -0.64506563 |
| W911Distraction | -0.35035944 |
| PartyDemocrat | -0.16844676 |
| W2000Mess | -0.07333407 |
| PartyRepublican | 0.03911077 |
| PartyNeither | 0.05742253 |
| W911War | 0.07057699 |
| W2000Correct | 0.23776289 |
Now we see the 2000 election effect found before. Of these extra variables, it's attitude towards the results of Bush v. Gore that best predicts 'Sort'. If you think it was a deliberate rip-off, you're more likely to have called the moving vans. Interpretation of 9/11 has a minor effect (except for the few troofers) and party identification has even less.
The Inadvertent Experiment
I didn't design the original Cold Civil War survey with this interpretation in mind, since I hadn't heard of Bishop's work before it was mentioned in comments. Albeit with the dangers of a small sample size, this analysis appears to have independently generated a result supporting the Big Sort hypothesis, from a different perspective than the voting and demographic records that appear to form the core of Bishop's analysis. The CCW survey may have inadvertently shed some light on motivations at the individual level, to go with the area-level Big Sort correlations.
Now, why doesn't somebody try a survey designed to match Bishop's ideas, in a larger and less politically charged venue? Maybe one of you lurkers has the resources and motivation to give it a try.
(NB: The original survey is still open and slowly accumulating more responses. If the number gets to twice that in my original sample, I'll rerun the analysis.)
