In a further blow to the concept of anonymity, statistical analysis can identify marks on different bubble forms as coming from the same person. That is, someone's marks on a bubble form can be used to identify them the same way handwriting might (though, it seems, still with less accuracy). The practical lesson: filling in your bubbles thoroughly and completely is probably the best way to stay anonymous.
The shocker here, of course, is that such a small mark (filling in a circle) can be unique across a sample of almost 100 surveys. It did bring to mind an experience from last fall, when some students distributed a two-page survey to almost that many people; among other things, it asked a few questions answered by marking boxes. Somehow, the first and second pages of the surveys got stacked in separate piles. After a short lecture on why correlating answers across questions matters when analyzing survey results, we decided to see if we could put at least some of the surveys back together. We started with the trivial cases: only one person used green pen. Three people used red, but one used check marks and the other two used Xs, and the two sets of Xs looked very different. In the end, we realized we could match all of the surveys back together with pretty high confidence. (We did, of course, intend to take any correlations found between the first and second pages with a very large grain of salt, but the reconstruction let the students proceed with the assignment.) I would love to have seen data in the paper on how well humans can recreate these matches: was the anonymity previously attributed to bubble forms real, or did it exist only because solving the problem by hand took prohibitive effort?
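The by-hand reconstruction of the survey pages amounts to matching loose pages on a few distinguishing features. Here is a minimal sketch in Python, assuming each page has already been reduced to a hypothetical (ink color, mark style) signature: pages whose signature is unique on both sides are paired, and anything else is set aside for a closer look. The page IDs, feature values, and the `match_pages` helper are illustrative inventions, not data from our survey, and this is not the method used in the paper.

```python
from collections import defaultdict

# Hypothetical signatures for each loose page: (page_id, ink_color, mark_style).
# Illustrative values only; in our case these features were judged by eye.
page_one = [
    ("A1", "green", "check"),
    ("A2", "red", "check"),
    ("A3", "red", "x-slanted"),
    ("A4", "red", "x-upright"),
    ("A5", "blue", "x-upright"),
    ("A6", "blue", "x-upright"),
]
page_two = [
    ("B1", "red", "x-upright"),
    ("B2", "green", "check"),
    ("B3", "blue", "x-upright"),
    ("B4", "red", "check"),
    ("B5", "red", "x-slanted"),
    ("B6", "blue", "x-upright"),
]


def match_pages(first_pages, second_pages):
    """Pair first and second pages whose (ink, style) signature is unique on both sides.

    Returns (matches, ambiguous): matches maps a first-page id to a second-page id;
    ambiguous maps each unresolved signature to the ids on each side that share it.
    """
    by_sig_1 = defaultdict(list)
    by_sig_2 = defaultdict(list)
    for page_id, ink, style in first_pages:
        by_sig_1[(ink, style)].append(page_id)
    for page_id, ink, style in second_pages:
        by_sig_2[(ink, style)].append(page_id)

    matches = {}
    ambiguous = {}
    for sig in set(by_sig_1) | set(by_sig_2):
        ones = by_sig_1.get(sig, [])
        twos = by_sig_2.get(sig, [])
        if len(ones) == 1 and len(twos) == 1:
            # Exactly one page on each side shares this signature: a confident match.
            matches[ones[0]] = twos[0]
        else:
            # Shared (or missing) signatures need another feature to break the tie.
            ambiguous[sig] = (ones, twos)
    return matches, ambiguous


matches, ambiguous = match_pages(page_one, page_two)
print(sorted(matches.items()))  # [('A1', 'B2'), ('A2', 'B4'), ('A3', 'B5'), ('A4', 'B1')]
print(ambiguous)  # {('blue', 'x-upright'): (['A5', 'A6'], ['B3', 'B6'])}
```

The two blue pages are left unresolved because ink and mark style alone don't separate them; by hand, we would then look for a finer feature, such as how the Xs were drawn, which is roughly the role the paper's statistical features play at scale.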
The paper itself is linked in the weblog entry above and looks to be a fairly accessible read for anyone with even a little background. It has an extensive discussion section covering where bubble forms are used (e.g., standardized tests, voting forms, research surveys) and what these results imply in each of those settings. It is another example of a broader trend: easily accessed large data sets, combined with the computational power to run statistical analyses across them, are steadily eroding our ability to obscure who we are.