NC 2012 Interstate Crosscheck election fraud Monte Carlo (statistical) simulator (source code)
Contents:
- Usage instructions
- Perl source code
- Sample output
- Simulated v. calculated distribution
- Email discussion
Usage instructions:
perl test_voter_fraud_stats.pl -h
NC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator
USAGE:
perl test_voter_fraud_stats.pl [options]
Where [options] is any or all of:
'-d' enable debug prints
'-n12345678' specify number of simulations to do (default = 1,000,000)
'-sFilename' specify file where intermediate results are periodically saved
'-rFilename' specify file(s) from which intermediate results should be loaded
('-rFilename' may be repeated to load multiple sets of results)
'-q' "quick mode" - does half as many rand() calls and runs ~32% faster
'-b' just benchmark the computer, do no simulations
'-h' or '-?' print these instructions
EXAMPLES:
1. Run 1,500,000 simulations (instead of the default of one million):
perl test_voter_fraud_stats.pl -n1,500,000
(commas are optional.)
2. Use a save-file so that you can restart the simulations if the program
gets stopped before completion:
perl test_voter_fraud_stats.pl -n200000 -srun1.txt
(The data is saved at the end of each row of progress-dots.)
3. View result before it's done (or afterward):
type run1.txt
('type' is for Windows; use 'cat' on Linux.)
4. Restart the simulations from where they left off:
perl test_voter_fraud_stats.pl -n200000 -rrun1.txt
(Note: run1.txt must already exist.)
5. Restart the simulations where they left off, and periodically update the
save-file so you can restart the program again later:
perl test_voter_fraud_stats.pl -n200000 -rrun1.txt -srun1.txt
(Note: run1.txt need not already exist.)
6. Run four instances simultaneously (perhaps on a 4-core computer):
At 1st cmd prompt:
perl test_voter_fraud_stats.pl -n250000 -rrun1.txt -srun1.txt
At 2nd cmd prompt:
perl test_voter_fraud_stats.pl -n250000 -rrun2.txt -srun2.txt
At 3rd cmd prompt:
perl test_voter_fraud_stats.pl -n250000 -rrun3.txt -srun3.txt
At 4th cmd prompt:
perl test_voter_fraud_stats.pl -n250000 -rrun4.txt -srun4.txt
7. Make a combined report from the data from four different simulation runs:
perl test_voter_fraud_stats.pl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt
by Dave Burton
http://www.sealevel.info/
M: 919-244-3316
This is free, uncopyrighted, open source software.
*** To view this 'help' message one screen-full at a time, pipe it to 'more':
perl test_voter_fraud_stats.pl -h | more
#!/usr/bin/perl
# by David A. Burton
# Cary, NC USA
# +1 919-481-2183
# Email: http://www.burtonsys.com/email/
# This is free, uncopyrighted, open source software.
# However, as a courtesy, I ask that you please retain this notice in copies of the program.
# TLIB Version Control fills in the version information for us:
$version_str = "";
#--=>keyflag<=-- "&(#)%n, version %v, %d "
$version_str = "&(#)test_voter_fraud_stats.pl, version 25, 03-Dec-20 ";
# Number of simulations to run:
$numruns = 1000000;
# Note: On my 2011 Dell i5-2310 PC, 1 million simulations takes 170 minutes, or 115 minutes in Quick Mode
# The 2012 NC Interstate Crosscheck found 35750 cases w/ voter name & date-of-birth matching voters in other States
$number_of_name_and_dob_matches = 35750;
# This program is written for Perl 5.008, but works (more slowly) even with
# Perl 4.0036, if you delete this line:
use Time::HiRes 'time'; # for Perl 4 you'll need to delete this line
# immediate output of debug prints
$| = 1;
$numruns_defaulted = 1; # changed to 0 (false) if they specify '-n...'
# echo the command line
print "\nperl $0 " . join(' ',@ARGV) . "\n\n";
# What version of Perl is this?
$hasperl5 = 0;
$perlver = "3 or earlier";
if ($] =~ /\$\$Revision\:\s*([0-9.]+)\s/) {
$perlver = $1; # probably 4.something
} elsif ($] =~ /([0-9][0-9.]*)/) {
$perlver = $1; # probably 5.something or 6.something
$hasperl5 = 1;
}
print "You are using Perl version $perlver\n";
$debugmode=0; # for debug prints
$start_time = time(); # for measuring the program's runtime
$savefile = ''; # file for periodically saving intermediate results
@restorefiles = (); # files from which to read previously calculated results
$benchmarkonly = 0; # 1 if '-b' was specified
$quickmode = 0; # 1 if '-q' was specified
$numruns_with_commas = &commafy($numruns); # initial value (in case user doesn't specify '-n...')
# parse command-line options
while (($#ARGV >= 0) && ('-' eq substr($ARGV[0],0,1))) {
if ($ARGV[0] =~ /^\-(\-|)[h\?]/i) {
&showhelp(); # display 'help' and exit
exit 1;
} elsif ($ARGV[0] =~ /^\-d$/i) {
$debugmode++; # turn on debug prints; specify twice for extra verbosity
} elsif ($ARGV[0] =~ /^\-b$/i) {
$benchmarkonly++; # just benchmark the computer, do no simulations
} elsif ($ARGV[0] =~ /^\-q$/i) {
$quickmode++; # "quick mode" -- does half as many rand() calls
} elsif ($ARGV[0] =~ /^\-n([0-9\,]+)$/i) { # specify number of simulations (default = 1 million)
$numruns = $1;
$numruns =~ s/,//g;
if ($numruns <= 0) {
$numruns = 1;
}
$numruns_defaulted = 0;
} elsif ($ARGV[0] =~ /^\-s(.+)$/i) {
$savefile = $1; # specify file into which results should periodically be saved
} elsif ($ARGV[0] =~ /^\-r(.+)$/i) {
push(@restorefiles,$1); # specify file(s) from which results should be restored
} else {
printf "\nERROR: unrecognized command-line option: '%s'\n\n", $ARGV[0];
&showhelp(); # display 'help' and exit
exit 1;
}
shift @ARGV;
}
# Detect whether HiRes is available for timing
$loResTimer = !$hasperl5; # Perl 4 never has HiRes available
if ($hasperl5) {
# Perl 5 should have HiRes, but let's double-check
if ((0.0+int($start_time)) == $start_time) {
# start_time is an exact integer -- looks suspiciously like HiRes is unavailable
&num_coincidences(); # do something which takes more than 1 millisecond, but less than 1 second
$start_time = time();
if ((0.0+int($start_time)) == $start_time) {
# yep, HiRes is unavailable
$loResTimer = 1;
}
}
}
# benchmark this computer
if ($loResTimer) {
# special Perl4 benchmarking kluge, since Time::HiRes is unavailable; wait for clock to 'tick'
do {
$end_time = time();
} while ($end_time == $start_time);
$start_time = $end_time;
}
$cntr = 0;
do {
# time a run of at least ten simulations
&num_coincidences();
$cntr++;
$end_time = time();
} while (($cntr < 10) || ($end_time == $start_time));
$speed = ($cntr / ($end_time - $start_time));
if ($hasperl5) {
$passmark = $speed / (94/1616); # My i5-2310 CPU has a Passmark rating of 1616, and it does 94 simulations / sec
} else {
$passmark = $speed / (65/1616); # Perl 4 is slower than Perl 5
$speed *= 0.95; # Perl 4 seems to slow down a bit for longer runs
}
if ($quickmode) {
$passmark *= .684; # correct for fact that num_coincidences runs faster w/ $quickmode=1
}
$passmark = int($passmark + 0.5);
$speed = int($speed + 0.5);
$speed = &commafy($speed);
print "Speed = $speed simulations/second (estimated single-thread Passmark score $passmark)\n";
if (-1 == $#ARGV) {
print "\n" . "- "x29 . "-\nNote: for instructions, ctrl-break or ctrl-C now, and run:\n perl $0 -h\n" . "- "x29 . "-\n\n";
}
sub showhelp {
print "\nNC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator\n" .
"\n" .
"USAGE:\n" .
"\n" .
" perl $0 [options]\n" .
"\n" .
"Where [options] is any or all of:\n" .
"\n" .
" '-d' enable debug prints\n" .
" '-n12345678' specify number of simulations to do (default = $numruns_with_commas)\n" .
" '-sFilename' specify file where intermediate results are periodically saved\n" .
" '-rFilename' specify file(s) from which intermediate results should be loaded\n" .
" ('-rFilename' may be repeated to load multiple sets of results)\n" .
" '-q' \"quick mode\" - does half as many rand() calls and runs ~32% faster\n" .
" '-b' just benchmark the computer, do no simulations\n" .
" '-h' or '-?' print these instructions\n" .
"\n" .
"EXAMPLES:\n" .
"\n" .
"1. Run 1,500,000 simulations (instead of the default of one million):\n" .
"perl test_voter_fraud_stats.pl -n1,500,000\n" .
"(commas are optional.)\n" .
"\n" .
"2. Use a save-file so that you can restart the simulations if the program\n" .
"gets stopped before completion:\n" .
"perl test_voter_fraud_stats.pl -n200000 -srun1.txt\n" .
"(The data is saved at the end of each row of progress-dots.)\n" .
"\n" .
"3. View result before it's done (or afterward):\n" .
"type run1.txt\n" .
"('type' is for Windows; use 'cat' on Linux.)\n" .
"\n" .
"4. Restart the simulations from where they left off:\n" .
"perl test_voter_fraud_stats.pl -n200000 -rrun1.txt\n" .
"(Note: run1.txt must already exist.)\n" .
"\n" .
"5. Restart the simulations where they left off, and periodically update the\n" .
"save-file so you can restart the program again later:\n" .
"perl test_voter_fraud_stats.pl -n200000 -rrun1.txt -srun1.txt\n" .
"(Note: run1.txt need not already exist.)\n" .
"\n" .
"6. Run four instances simultaneously (perhaps on a 4-core computer):\n" .
"At 1st cmd prompt:\n" .
" perl test_voter_fraud_stats.pl -n250000 -rrun1.txt -srun1.txt\n" .
"At 2nd cmd prompt:\n" .
" perl test_voter_fraud_stats.pl -n250000 -rrun2.txt -srun2.txt\n" .
"At 3rd cmd prompt:\n" .
" perl test_voter_fraud_stats.pl -n250000 -rrun3.txt -srun3.txt\n" .
"At 4th cmd prompt:\n" .
" perl test_voter_fraud_stats.pl -n250000 -rrun4.txt -srun4.txt\n" .
"\n" .
"7. Make a combined report from the data from four different simulation runs:\n" .
"perl test_voter_fraud_stats.pl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt\n" .
"\n" .
"by Dave Burton\n" .
"http://www.sealevel.info/\n" .
"M: 919-244-3316\n" .
"\n" .
"This is free, uncopyrighted, open source software.\n" .
"\n" .
"*** To view this 'help' message one screen-full at a time, pipe it to 'more':\n" .
"perl $0 -h | more\n";
exit(1);
}
# '-b' was specified, so exit after reporting benchmark results
if ($benchmarkonly) {
exit 0;
}
if ($debugmode) {
print "dbg: save file = '$savefile'\n";
print "dbg: restore files = '" . join("','", @restorefiles) . "'\n";
}
# we don't actually use this
$num_args = $#ARGV+1;
# Initialize the buckets. bucket[N] keeps track of how many simulations
# had N innocent coincidences of Last4SSN matching.
@buckets = ();
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
$buckets[$i] = 0;
}
# We go ahead and make 35751 buckets, even though less than 20 will ever be used,
# because it can't hurt, and it uses only an extra 1.8 MB of RAM and hardly affects
# performance at all.
# report the results, or save them to a file
sub report_results {
local($outpfile) = shift;
local($i, $highest_num, $sum, $percentage, $avg);
$highest_num = 0;
for ($i=0; $i <= $#buckets; $i++) { # '<=' so that the last bucket is checked, too
if ($buckets[$i]) {
$highest_num = $i;
}
}
$sum = 0;
for ($i=0; $i<=$highest_num; $i++) {
$sum += ($buckets[$i] * $i);
$percentage = 100 * ($buckets[$i] / $numruns_done); # percentage of the runs completed so far
printf $outpfile "%3d :%8d : %10.6f\n", $i, $buckets[$i], $percentage;
}
$avg = $sum / $numruns_done;
printf $outpfile "Average = %7.5f\n", $avg;
}
# save current (intermediate) results to a text file
sub save_buckets {
local($outfile) = shift;
open( OUTPUT, ">$outfile" ) || die "ERROR: could not write '$outfile', $!\n";
&report_results(OUTPUT);
close OUTPUT;
}
# Load intermediate results from a text file which was created by save_buckets().
# Note that this can be called multiple times to combine results from several files.
sub load_buckets {
local($inpfile) = shift;
local($sum) = 0;
local($num,$cnt,@tmp);
if (open(INPUT, "$inpfile")) {
while (<INPUT>) {
@tmp = split(/\s*\:\s*/, $_);
if (2 == $#tmp) {
($num,$cnt,$pct) = @tmp;
$num =~ s/[\s\,]//g; # delete whitespace and commas
$cnt =~ s/[\s\,]//g;
$buckets[$num] += $cnt;
$sum += $cnt;
}
}
close INPUT;
print "Loaded $sum simulations from '$inpfile'\n";
} elsif ($inpfile ne $savefile) {
die "ERROR: could not read '$inpfile', $!\n";
} # else if savefile and restorefile are identical, then it's okay if it doesn't initially exist
}
# if '-r...' was specified, then load initial buckets from file, to resume where we left off
for $fn (@restorefiles) {
&load_buckets($fn);
}
$number_of_runs_preloaded = 0;
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
if ($buckets[$i]) {
$number_of_runs_preloaded += $buckets[$i];
}
}
# $numruns_done is needed for calculating the average
$numruns_done = $number_of_runs_preloaded;
if ($numruns_defaulted && ($#restorefiles > 1) && ($number_of_runs_preloaded > 2)) {
# we're just making a combined report, so don't default numruns to a million
$numruns = $numruns_done;
}
if ($number_of_runs_preloaded > $numruns) {
$numruns = $number_of_runs_preloaded;
} else {
$remaining_numruns = $numruns - $number_of_runs_preloaded;
$remaining_numruns_with_commas = &commafy($remaining_numruns);
$estimated_runtime = $remaining_numruns / $speed;
$readable_estimated_runtime = &human_time($estimated_runtime);
print "Estimated run time = $readable_estimated_runtime for $remaining_numruns_with_commas simulations\n";
}
if (($estimated_runtime > (60*60)) && ('' eq $savefile)) {
print "Note: for long simulation runs like this, you really should use '-sSavefile' so\nthat you can resume if it is interrupted.\n";
}
# put commas into an integer if it is > 4 digits long
sub commafy {
local($number) = shift;
local(@pieces) = ();
$number .= '';
# if ($debugmode) { print "dbg: number='$number'\n"; }
if (length($number) > 4) {
while (length($number) > 0) {
if (length($number) <= 3) {
# we could omit this 'if' clause for Perl 5, but Perl 4 needs it
unshift(@pieces,$number);
$number = '';
} else {
unshift(@pieces,substr($number,-3));
substr($number,-3) = '';
}
# if ($debugmode) {
# $tm1 = join(',',@pieces);
# print "dbg: number='$number', pieces='$tm1'\n";
# }
}
$number = join(',',@pieces);
}
# if ($debugmode) { print "dbg: number='$number'\n"; }
return $number;
}
# Return a random integer between 1 and 9999, inclusive. (Won't return 0.)
sub rand10k {
local($result);
$result = rand(9999); # that's >= 0.0, and < 9999.0 (it can never return 9999)
$result = 1 + int($result);
return $result; # valid SSNs cannot end in 0000
}
# convert input in floating point seconds to nice, human-friendly time (e.g., "xx.x minutes")
sub human_time {
local($seconds) = shift;
local($result) = '';
if ($seconds >= 600) {
$minutes = $seconds / 60;
$minutes = int(($minutes * 10) + 0.5) / 10.0;
$result = sprintf("%3.1f minutes", $minutes);
} elsif ($seconds >= 60) {
$minutes = $seconds / 60;
$minutes = int(($minutes * 100) + 0.5) / 100.0;
$result = sprintf("%4.2f minutes", $minutes);
} else {
$result = sprintf("%4.2f seconds", $seconds);
}
return $result;
}
# Run one test: of 35,750 voters, how many match last-4-SSNs by innocent coincidence?
# The expected value, of course, is 35750/9999 = ~3.575
sub num_coincidences {
local($coincidences) = 0;
local($i);
local($ssn1);
local($ssn2);
if ($quickmode) {
$ssn1 = int(rand(9999)); # &rand10k(); -- 'inlined' for better performance
for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
$ssn2 = int(rand(9999)); # &rand10k();
if ($ssn1 == $ssn2) {
$coincidences++;
}
}
} else {
for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
$ssn1 = int(rand(9999)); # &rand10k(); -- 'inlined' for better performance
$ssn2 = int(rand(9999)); # &rand10k();
if ($ssn1 == $ssn2) {
$coincidences++;
}
}
}
return $coincidences;
}
$simulations_per_dot = 50;
$dots_per_line = 60;
$modulo_of_dot = int($dots_per_line/3);
if ($quickmode) {
$calls2rand = $numruns * (1 + $number_of_name_and_dob_matches);
} else {
$calls2rand = $numruns * 2 * $number_of_name_and_dob_matches;
}
$calls2rand = &commafy($calls2rand);
$numruns_with_commas = &commafy($numruns);
print "\n$numruns_with_commas simulations";
if (($numruns - $number_of_runs_preloaded) >= 1000) {
print " requires $calls2rand calls to rand(), which takes a while!\n";
print "So, after every $simulations_per_dot" . "th simulation it prints a dot ($dots_per_line/line), as a progress indicator.";
}
print "\n\n";
# for the progress indicator
$dotcolumn = $dotrow = 0;
# Main loop to run the simulations and tabulate the results.
# Print "." as progress indicator every $simulations_per_dot simulations, up to $dots_per_line dots per line.
for ($i=$number_of_runs_preloaded; $i<$numruns; $i++) {
$buckets[ &num_coincidences() ] ++;
if (($i % $simulations_per_dot) == $modulo_of_dot) {
# print a dot
if ($dotcolumn == $dots_per_line) {
$pctdone = ($i * 100) / $numruns;
printf("%5.1f%%\n", $pctdone);
$dotcolumn = 0;
$dotrow++;
$numruns_done = $i;
if ('' ne $savefile) {
&save_buckets($savefile);
}
}
print ".";
$dotcolumn++;
}
}
$numruns_done = $numruns;
if ($dotcolumn > 0) {
print "\n";
$dotcolumn = 0;
$dotrow++;
}
# save results one last time at the end
if ('' ne $savefile) {
&save_buckets($savefile);
}
# remind the user that the simulation results are also in the Savefile, if he specified '-sSavefile'
if ($debugmode && ('' ne $savefile)) {
print "Note: results of $numruns_done simulations were saved to '$savefile'\n";
}
# report the results:
print "First column is number of coincidences per 35,750 matches\n";
print " : second column is number of runs (out of $numruns_with_commas) which had that number of coincidences\n";
print " : third column is percentage of runs which had that number of coincidences\n";
&report_results(STDOUT);
# report the run-time:
$end_time = time();
$run_time = $end_time - $start_time; # in seconds
$run_time = &human_time($run_time);
if ($debugmode || ($number_of_runs_preloaded < $numruns)) {
print "Run time = $run_time\n";
}
exit 0;
__END__
Sample output:
First column is number of coincidences per 35,750 matches
: second column is number of runs (out of 25,000,000) which had that number of coincidences
: third column is percentage of runs which had that number of coincidences
0 : 700479 : 2.801916
1 : 2503288 : 10.013152
2 : 4469133 : 17.876532
3 : 5333504 : 21.334016
4 : 4768914 : 19.075656
5 : 3410612 : 13.642448
6 : 2031999 : 8.127996
7 : 1038296 : 4.153184
8 : 463046 : 1.852184
9 : 184655 : 0.738620
10 : 65817 : 0.263268
11 : 21470 : 0.085880
12 : 6339 : 0.025356
13 : 1851 : 0.007404
14 : 461 : 0.001844
15 : 103 : 0.000412
16 : 27 : 0.000108
17 : 5 : 0.000020
18 : 1 : 0.000004
Average = 3.57600
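The "Average" line can be cross-checked directly from the table: it is the sum of (coincidences x runs) divided by the total number of runs. The following is only a minimal sketch of that check, written for Perl 5. The helper name average_from_lines is hypothetical and is not part of test_voter_fraud_stats.pl; it assumes lines in the "k : count : percentage" layout shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of test_voter_fraud_stats.pl).
# Takes lines of the form "k : count : percentage" and returns
# (total number of runs, average number of coincidences per run).
sub average_from_lines {
    my @lines = @_;
    my ($runs, $weighted) = (0, 0);
    for my $line (@lines) {
        my @fields = split /\s*:\s*/, $line;
        next unless @fields == 3;            # skips "Average = ..." lines
        my ($k, $count) = @fields[0, 1];
        s/[\s,]//g for ($k, $count);         # delete whitespace and commas
        next unless $k =~ /^\d+$/ && $count =~ /^\d+$/;
        $runs     += $count;
        $weighted += $k * $count;
    }
    return ($runs, $runs ? $weighted / $runs : 0);
}

# The 25,000,000-run table from the sample output above.
my @table = split /\n/, <<'END_TABLE';
0 : 700479 : 2.801916
1 : 2503288 : 10.013152
2 : 4469133 : 17.876532
3 : 5333504 : 21.334016
4 : 4768914 : 19.075656
5 : 3410612 : 13.642448
6 : 2031999 : 8.127996
7 : 1038296 : 4.153184
8 : 463046 : 1.852184
9 : 184655 : 0.738620
10 : 65817 : 0.263268
11 : 21470 : 0.085880
12 : 6339 : 0.025356
13 : 1851 : 0.007404
14 : 461 : 0.001844
15 : 103 : 0.000412
16 : 27 : 0.000108
17 : 5 : 0.000020
18 : 1 : 0.000004
END_TABLE

my ($runs, $avg) = average_from_lines(@table);
printf "%s runs, average = %.5f\n", $runs, $avg;   # prints: 25000000 runs, average = 3.57600
```

The counts add up to exactly 25,000,000 runs, and the weighted sum is 89,400,116 coincidences, which gives the 3.57600 reported above.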
Simulated v. calculated distribution:
First column is number of coincidences, k, per 35,750 matches
: second & third columns are copied from the results of 25,000,000 simulations (above)
: fourth column is percentage calculated by my online binomial probability distribution calculator
: fifth column is cumulative percentage from the binomial calculator, ∑ ≤k
  k |       simulated       |                calculated
----+-----------------------+---------------------------------------------
  0 :   700479 :  2.801916% : 2.80004041934694%      : 2.80004041934694%
  1 :  2503288 : 10.013152  : 10.01214692855101      : 12.81218734789794
  2 :  4469133 : 17.876532  : 17.89979198583566      : 30.71197933373361
  3 :  5333504 : 21.334016  : 21.33365886209420      : 52.04563819582780
  4 :  4768914 : 19.075656  : 19.06917141786560      : 71.11480961369340
  5 :  3410612 : 13.642448  : 13.63565916189286      : 84.75046877558626
  6 :  2031999 :  8.127996  : 8.125068959489566      : 92.87553773507582
  7 :  1038296 :  4.153184  : 4.149722300002787      : 97.02526003507861
  8 :   463046 :  1.852184  : 1.854414935099515      : 98.87967497017813
  9 :   184655 :  0.738620  : 0.7365973040199915     : 99.61627227419812
 10 :    65817 :  0.263268  : 0.2633199064110674     : 99.87959218060919
 11 :    21470 :  0.085880  : 0.08557214583945469    : 99.96516432644864
 12 :     6339 :  0.025356  : 0.02549062245912742    : 99.99065494890777
 13 :     1851 :  0.007404  : 0.007008969989723296   : 99.99766391889749
 14 :      461 :  0.001844  : 0.001789497617543090   : 99.99945341651503
 15 :      103 :  0.000412  : 0.0004264151954425543  : 99.99987983171048
 16 :       27 :  0.000108  : 0.00009525622005113322 : 99.99997508793053
 17 :        5 :  0.000020  : 0.00002002686282731367 : 99.99999511479336
 18 :        1 :  0.000004  : 0.00000397646134453779 : 99.99999909125470
Predicted average = 35,750 / 9999 = 3.575357535753...
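The "calculated" columns can also be approximated offline. What follows is just a sketch for Perl 5, assuming the same model as the simulator: 35,750 independent trials, each with a 1-in-9999 chance of a last-4-SSN match. The name binomial_pmf is illustrative only; this is not the code behind the online calculator, whose implementation is not shown here. It uses the recurrence P(0) = (1-p)^n, P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative helper (an assumption, not the online calculator's code).
# Returns the binomial probabilities P(X = k) for k = 0..$kmax, using
#   P(0)   = (1-p)^n
#   P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p)
sub binomial_pmf {
    my ($n, $p, $kmax) = @_;
    my @pmf = ( exp($n * log(1 - $p)) );     # (1-p)^n without underflow worries
    for my $k (0 .. $kmax - 1) {
        push @pmf, $pmf[$k] * ($n - $k) / ($k + 1) * $p / (1 - $p);
    }
    return @pmf;
}

my $n = 35750;        # name & date-of-birth matches
my $p = 1 / 9999;     # chance that two random last-4-SSNs (0001..9999) agree

my @pmf = binomial_pmf($n, $p, 18);
my $cumulative = 0;
for my $k (0 .. $#pmf) {
    $cumulative += $pmf[$k];
    # columns: k : percentage for exactly k : cumulative percentage for <= k
    printf "%3d : %12.8f%% : %12.8f%%\n", $k, 100 * $pmf[$k], 100 * $cumulative;
}
```

With ordinary double-precision arithmetic the output should agree with the fourth and fifth columns of the table above to the eight decimal places printed, though the last digit may differ by rounding.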
Emails about this software: