Daily Kos

View Story | 133 comments

  •  Not probable: Suwannee County (4.00 / 7)

    Please excuse the ramblings of a tired statistician. Think of an election as flipping a coin, but not with 50-50 odds. Instead, once we count the votes, we know the odds of a randomly chosen ballot in Union County being for Kerry: p = 1251/(1251+3396) = 0.2692, according to the original count. I assume a binomial distribution.
    Now I can ask whether the hand count, Kerry=1272 and Bush=3393, is consistent with that distribution, based solely on the math of the binomial distribution. It turns out the hand-count data put p between 0.260 and 0.286 at the 95% confidence level (C.L.), in fine agreement with the original count.
    As for Lafayette County, the original count has p = 0.256. The hand count has p between 0.242 and 0.272 at 95% C.L.
    So, let's turn to the Suwannee County data. The reported original count gives p = 0.2885. Now, we take this p as the null hypothesis and compare it with the hand count, which has 9124 ballots, 2984 of them for Kerry. When I run this test, I find poor agreement: the hand count puts p between 0.317 and 0.337 at 95% C.L. At this level, the models representing the original count and the hand count are 3.9 standard deviations apart.
    I will think about this more tomorrow (later today); but there might be something here.
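
    As a rough check on the intervals quoted above, here is a minimal Python sketch. The comment does not say which interval formula was used, so the Wilson score interval below is an assumption; it appears to agree with the quoted figures to three decimals. The names `wilson_interval` and `counties` are illustrative, and Lafayette County is left out because its raw tallies are not given.

```python
import math

Z95 = 1.959964  # two-sided 95% critical value of the standard normal


def wilson_interval(successes, n, z=Z95):
    """Wilson score interval for a binomial proportion (an assumed method)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# (Kerry hand-count ballots, total hand-count ballots, p from the original count)
counties = {
    "Union": (1272, 1272 + 3393, 1251 / (1251 + 3396)),
    "Suwannee": (2984, 9124, 0.2885),
}

results = {}
for name, (kerry, total, p_original) in counties.items():
    low, high = wilson_interval(kerry, total)
    inside = low <= p_original <= high
    results[name] = (low, high, inside)
    print(f"{name}: hand-count p in [{low:.3f}, {high:.3f}], "
          f"original p = {p_original:.4f}, inside interval: {inside}")
```

    Under these assumptions the original Union figure falls inside the hand-count interval, while the original Suwannee figure falls well below it.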

    The Place of Dead Roads
    "Gentlemen, you can't fight in here! This is the War Room!"

    by Nicholas Phillips on Tue Nov 30, 2004 at 12:21:02 AM PDT

    •  Thanks for that additional analysis. (4.00 / 6)

      And to this person's eyes, nobody in their right mind would look at those results from the 60% sample, say "all is good," and quit counting. At the very least, that pattern with the 60% sample would compel you to keep counting because the tallies are, in fact, way out of line.

      Yet our good friends from the Miami Herald not only say everything is fine in these three counties, contrary to the data they themselves collected, they even have the gall to say everything is fine in the whole state!

      It's really, really shoddy reporting, and they need to be taken to task on it.

      •  It was pretty stupid (none / 0)

        It was pretty stupid that they stopped counting at 60%. Doing so was about as useful as, perhaps even less useful than, not counting any of the county at all.

        John McCain: Bush right to veto kids health insurance expansion

        by Chris Bowers on Tue Nov 30, 2004 at 12:32:13 AM PDT


        •  Yes, it was pretty stupid (none / 0)

          It was pretty stupid that they stopped counting at 60%. Doing so was about as useful as, perhaps even less useful than, not counting any of the county at all.

          They only needed to count 6,500 more and they would have had an argument.

          •  I wonder... (none / 0)

            ...why the writer included partial results from this county, when the picture would have been clear had the story only reported the two counties with negligible discrepancies?

            Yes, in fact, I do drive a Volvo.

            by KTinOhio on Tue Nov 30, 2004 at 02:03:28 PM PDT


        •  Hmmmm (none / 0)

          Did they say which precincts were counted? Does that county have more than one type of voting machine? That might explain why there is a partial count. But without a satisfactory explanation they should have just dropped this county from the story. It just gives rise to more questions.
      •  They stopped counting (none / 0)

        when it became clear that they were in a problem area.
         The first 2 counties listed above had Kerry gains of 0.5% and 0.3% (of the original total), while the extrapolation of the 3rd indicates that they would have found a 7.7% gain for Kerry had they continued counting. I don't know how the scan ballots would be sorted so that 60% was NOT a random sample. I think they have presented a smoking gun, and called it an alibi!
    •  Goodness! 3.9 standard deviations (none / 0)

      I have no idea what that means, and your post made my head hurt, but good work.

      This election was rigged. I know it and I don't need no stinkin proof, but since others seem to require some, I'm glad you math enabled folks are on the case.

      •  standard deviation (none / 1)

        You've probably heard the expression "bell-shaped curve." Let's say you plot a graph of the deviation from average of some parameter. If the quantity being studied is random and follows that bell curve (a normal distribution), then about 2/3 of all the values (SAT scores, IQs, height, weight, whatever) will fall within plus or minus one standard deviation of the average.

        About 95% will be within plus or minus 2 standard deviations, and about 99.7% will be within plus or minus 3 standard deviations. So when someone says that something is 3.9 standard deviations from the average, it means there was a very low probability of its happening by random chance.

        This doesn't, for example, necessarily mean that the election was rigged, but it means there is a high likelihood of some sort of anomaly having occurred (rigging being one obvious possibility) and suggests that further inquiry is needed.

        BTW, in the above discussion I've glossed over some of the subtle statistical points to try to get a simple explanation for non-technical people, but if anyone thinks I've oversimplified too much please correct/followup.
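
        For anyone who wants the numbers behind the rule of thumb above, here is a small Python sketch. It assumes an exactly normal (bell-shaped) distribution, which real vote tallies only approximate, and the function names are just illustrative.

```python
import math


def coverage_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def two_sided_tail(k):
    """Probability of landing more than k standard deviations away, in either direction."""
    return math.erfc(k / math.sqrt(2))


for k in (1, 2, 3, 3.9):
    print(f"within {k} SD: {coverage_within(k):.5f}   outside: {two_sided_tail(k):.2e}")
```

        Under that assumption, a result 3.9 standard deviations out would turn up by chance a little less than once in ten thousand tries.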

        •  3.9 standard deviations (none / 0)

          To put it simply: if the average male height is about 5'10" and you see someone who's 6'10", you know that person is tall. If you see a bunch of people who are 5'10" but a doctor tells them ALL they are 6'10", you've got a measurement problem.

          In this case we don't know if the individual we're looking at is really tall or if the measurements are off. (But if I could find a way to place a bet, you can guess where I'd put my money.)

          That's why we like to have more than one case.

    •  See my comment down thread (none / 0)

      on why this was probably not a random sample
    •  The one flaw in extrapolating these numbers... (none / 0)

      ...is that, under the null hypothesis of the total being accurate, the last 40% of the votes are not dependent on the first 60% of the votes.

      Take a deck of cards: 25% of the cards are spades.  I deal myself 31 cards (almost 60%) and, by chance, I get 11 spades (11/31 = 35.5%).  If I extrapolate to the 21 remaining cards, I would estimate that they hold 0.355*21 = 7.45 more spades, for a total of about 18 spades in the deck. Which is nonsense.  I know that there are exactly two spades remaining in the deck because there were only 13 spades to begin with.
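
      The card example can be worked through in a few lines of Python. This is only a sketch: it assumes a standard 52-card deck dealt at random without replacement (a hypergeometric draw), and the helper name `hypergeom_pmf` is mine.

```python
import math

DECK, SPADES = 52, 13
dealt, spades_seen = 31, 11

remaining_cards = DECK - dealt                             # 21 cards left
naive_remaining = spades_seen / dealt * remaining_cards    # extrapolating the 35.5% rate
actual_remaining = SPADES - spades_seen                    # what must really be left


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (math.comb(successes, k)
            * math.comb(population - successes, draws - k)
            / math.comb(population, draws))


# Chance of being dealt 11 or more spades in 31 cards from a fair deck
p_at_least = sum(hypergeom_pmf(k, DECK, SPADES, dealt)
                 for k in range(spades_seen, SPADES + 1))

print(f"naive estimate of spades remaining: {naive_remaining:.2f}")
print(f"actual spades remaining: {actual_remaining}")
print(f"P(at least {spades_seen} spades in {dealt} cards) = {p_at_least:.4f}")
```

      The hand is unusual but possible, and the known total pins down the remaining cards no matter what the extrapolation says.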

      Now, the real question is: given the expected total we began with (if there are no shenanigai), what are the odds that the hand recounted SUBsample could have been produced?

      Probability analysis of these counties is asking exactly that -- given that we expect 28.8% Kerry votes, if we pick 9,124 of the 15,675 total votes, what are the odds that we will get 32.7%?  95% of the time, we expect the Kerry total for that large a sample to be between 2548 (27.9%) and 2717 (29.8%).  Getting 2984 votes is way outside that...

      To put it another way: the odds of getting this many "spades" out of the deck are less than 10^-15, or less than one in a quadrillion (thousand trillion). Conclusion: our initial hypothesis (28.8% Kerry) must be rejected.
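
      Here is a rough Python cross-check of that tail probability, using only the standard library. It uses the same binomial model as the R snippet at the end of this comment. It also tries a hypergeometric model (drawing 9,124 ballots without replacement from the 15,675 cast), which is my own added assumption in the spirit of the card example. Both rest on the sample being random, and the function names are illustrative.

```python
import math

kerry_original, bush_original = 4522, 11153
population = kerry_original + bush_original     # 15,675 ballots in the original count
p_expected = kerry_original / population        # about 0.2885
sample_size = 2984 + 6140                       # 9,124 hand-counted ballots
kerry_in_sample = 2984


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    return sum(math.exp(log_comb(n, i) + i * math.log(p) + (n - i) * math.log(1 - p))
               for i in range(k, n + 1))


def hypergeom_upper_tail(k, pop, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    log_total = log_comb(pop, draws)
    return sum(math.exp(log_comb(successes, i)
                        + log_comb(pop - successes, draws - i) - log_total)
               for i in range(k, min(successes, draws) + 1))


binom_tail = binom_upper_tail(kerry_in_sample, sample_size, p_expected)
hyper_tail = hypergeom_upper_tail(kerry_in_sample, population, kerry_original, sample_size)

print(f"binomial P(X >= {kerry_in_sample}) = {binom_tail:.2e}")
print(f"hypergeometric P(X >= {kerry_in_sample}) = {hyper_tail:.2e}")
```

      Sampling without replacement leaves less room for chance variation, so the hypergeometric tail comes out smaller still.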

      HOWEVER, I must emphasize that this number is HEAVILY dependent on random sampling. As other posters have said, we don't know if there were very pro-Bush precincts excluded -- and counting the rest of the deck will tell us if there are 2 or 7 spades left.

      Only one thing to do: COUNT THE VOTES!

      (I did these calculations and graphs in R, an open-source stats package.  It's very hard to use, and I am far from an expert, but here's the code I entered.)
      ####
      expected = 4522/(11153+4522)
      result = 2984
      total = 2984 + 6140

      # c(2300:3000) says do every integer from 2300 to 3000
      plot(c(2300:3000), dbinom(c(2300:3000), total, expected, log=FALSE))

      # lower 95% interval
      qbinom(0.025, total, expected, log.p=FALSE)

      # upper 95% interval
      qbinom(0.975, total, expected, log.p=FALSE)
      #####
