perl test_voter_fraud_stats.pl -h

NC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator

USAGE:

   perl test_voter_fraud_stats.pl [options]

Where [options] is any or all of:

   '-d'          enable debug prints
   '-n12345678'  specify number of simulations to do (default = 1,000,000)
   '-sFilename'  specify file where intermediate results are periodically saved
   '-rFilename'  specify file(s) from which intermediate results should be loaded
                 ('-rFilename' may be repeated to load multiple sets of results)
   '-q'          "quick mode" - does half as many rand() calls and runs ~32% faster
   '-b'          just benchmark the computer, do no simulations
   '-h' or '-?'  print these instructions

EXAMPLES:

1. Run 1,500,000 simulations (instead of the default of one million):
perl test_voter_fraud_stats.pl -n1,500,000
(commas are optional.)

2. Use a save-file so that you can restart the simulations if the program
gets stopped before completion:
perl test_voter_fraud_stats.pl -n200000 -srun1.txt
(The data is saved at the end of each row of progress-dots.)

3. View result before it's done (or afterward):
type run1.txt
('type' is for Windows; use 'cat' on Linux.)

4. Restart the simulations from where they left off:
perl test_voter_fraud_stats.pl -n200000 -rrun1.txt
(Note: run1.txt must already exist.)

5. Restart the simulations where they left off, and periodically update the
save-file so you can restart the program again later:
perl test_voter_fraud_stats.pl -n200000 -rrun1.txt -srun1.txt
(Note: run1.txt need not already exist.)

6. Run four instances simultaneously (perhaps on a 4-core computer):
At 1st cmd prompt:
   perl test_voter_fraud_stats.pl -n250000 -rrun1.txt -srun1.txt
At 2nd cmd prompt:
   perl test_voter_fraud_stats.pl -n250000 -rrun2.txt -srun2.txt
At 3rd cmd prompt:
   perl test_voter_fraud_stats.pl -n250000 -rrun3.txt -srun3.txt
At 4th cmd prompt:
   perl test_voter_fraud_stats.pl -n250000 -rrun4.txt -srun4.txt

7. Make a combined report from the data from four different simulation runs:
perl test_voter_fraud_stats.pl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt

by Dave Burton
http://www.sealevel.info/
M: 919-244-3316

This is free, uncopyrighted, open source software.

*** To view this 'help' message one screen-full at a time, pipe it to 'more':
perl test_voter_fraud_stats.pl -h | more

#!/usr/bin/perl

# by David A. Burton
# Cary, NC USA
# +1 919-481-2183
# Email: http://www.burtonsys.com/email/
# This is free, uncopyrighted, open source software.
# However, as a courtesy, I ask that you please retain this notice in copies of the program.

# TLIB Version Control fills in the version information for us:
$version_str = "";
#--=>keyflag<=-- "&(#)%n, version %v, %d "
$version_str = "&(#)test_voter_fraud_stats.pl, version 25, 03-Dec-20 ";

# Number of simulations to run:
$numruns = 1000000;
# Note: On my 2011 Dell i5-2310 PC, 1 million simulations takes 170 minutes, or 115 minutes in Quick Mode

# The 2012 NC Interstate Crosscheck found 35750 cases w/ voter name & date-of-birth matching voters in other States
$number_of_name_and_dob_matches = 35750;

# This program is written for Perl 5.008, but works (more slowly) even with
# Perl 4.0036, if you delete this line:
use Time::HiRes 'time'; # for Perl 4 you'll need to delete this line

# immediate output of debug prints
$| = 1;

$numruns_defaulted = 1; # changed to 0 (false) if they specify '-n...'

# echo the command line
print "\nperl $0 " . join(' ',@ARGV) . "\n\n";

# What version of Perl is this?
$hasperl5 = 0;
$perlver = "3 or earlier";
if ($] =~ /\$\$Revision\:\s*([0-9.]+)\s/) {
  $perlver = $1; # probably 4.something
} elsif ($] =~ /([0-9][0-9.]*)/) {
  $perlver = $1; # probably 5.something or 6.something
  $hasperl5 = 1;
}
print "You are using Perl version $perlver\n";

$debugmode=0; # for debug prints
$start_time = time(); # for measuring the program's runtime
$savefile = ''; # file for periodically saving intermediate results
@restorefiles = (); # files from which to read previously calculated results
$benchmarkonly = 0; # 1 if '-b' was specified
$quickmode = 0; # 1 if '-q' was specified
$numruns_with_commas = &commafy($numruns); # initial value (in case user doesn't specify '-n...')

# parse command-line options
while (($#ARGV >= 0) && ('-' eq substr($ARGV[0],0,1))) {
  if ($ARGV[0] =~ /^\-(\-|)[h\?]/i) {
    &showhelp(); # display 'help' and exit
    exit 1;
  } elsif ($ARGV[0] =~ /^\-d$/i) {
    $debugmode++; # turn on debug prints; specify twice for extra verbosity
  } elsif ($ARGV[0] =~ /^\-b$/i) {
    $benchmarkonly++; # just benchmark the computer, do no simulations
  } elsif ($ARGV[0] =~ /^\-q$/i) {
    $quickmode++; # "quick mode" -- does half as many rand() calls
  } elsif ($ARGV[0] =~ /^\-n([0-9\,]+)$/i) {
    # specify number of simulations (default = 1 million)
    $numruns = $1;
    $numruns =~ s/,//g;
    if ($numruns <= 0) {
      $numruns = 1;
    }
    $numruns_defaulted = 0;
  } elsif ($ARGV[0] =~ /^\-s(.+)$/i) {
    $savefile = $1; # specify file into which results should periodically be saved
  } elsif ($ARGV[0] =~ /^\-r(.+)$/i) {
    push(@restorefiles,$1); # specify file(s) from which results should be restored
  } else {
    printf "\nERROR: unrecognized command-line option: '%s'\n\n", $ARGV[0];
    &showhelp(); # display 'help' and exit
    exit 1;
  }
  shift @ARGV;
}

# Detect whether HiRes is available for timing
$loResTimer = !$hasperl5; # Perl 4 never has HiRes available
if ($hasperl5) {
  # Perl 5 should have HiRes, but let's double-check
  if ((0.0+int($start_time)) == $start_time) {
    # start_time is an exact integer -- looks suspiciously like HiRes is unavailable
    &num_coincidences(); # do something which takes more than 1 millisecond, but less than 1 second
    $start_time = time();
    if ((0.0+int($start_time)) == $start_time) {
      # yep, HiRes is unavailable
      $loResTimer = 1;
    }
  }
}

# benchmark this computer
if ($loResTimer) {
  # special Perl4 benchmarking kluge, since Time::HiRes is unavailable; wait for clock to 'tick'
  do {
    $end_time = time();
  } while ($end_time == $start_time);
  $start_time = $end_time;
}
$cntr = 0;
do {
  # time a run of at least ten simulations
  &num_coincidences();
  $cntr++;
  $end_time = time();
} while (($cntr < 10) || ($end_time == $start_time));
$speed = ($cntr / ($end_time - $start_time));
if ($hasperl5) {
  $passmark = $speed / (94/1616); # My i5-2310 CPU has a Passmark rating of 1616, and it does 94 simulations / sec
} else {
  $passmark = $speed / (65/1616); # Perl 4 is slower than Perl 5
  $speed *= 0.95; # Perl 4 seems to slow down a bit for longer runs
}
if ($quickmode) {
  $passmark *= .684; # correct for fact that num_coincidences runs faster w/ $quickmode=1
}
$passmark = int($passmark + 0.5);
$speed = int($speed + 0.5);
$speed = &commafy($speed);
print "Speed = $speed simulations/second (estimated single-thread Passmark score $passmark)\n";

if (-1 == $#ARGV) {
  print "\n" . "- "x29 . "-\nNote: for instructions, ctrl-break or ctrl-C now, and run:\n perl $0 -h\n" . "- "x29 . "-\n\n";
}

sub showhelp {
  print "\nNC 2012 Interstate Crosscheck election fraud Monte Carlo statistical simulator\n" .
   "\n" .
   "USAGE:\n" .
   "\n" .
   " perl $0 [options]\n" .
   "\n" .
   "Where [options] is any or all of:\n" .
   "\n" .
   " '-d' enable debug prints\n" .
   " '-n12345678' specify number of simulations to do (default = $numruns_with_commas)\n" .
   " '-sFilename' specify file where intermediate results are periodically saved\n" .
   " '-rFilename' specify file(s) from which intermediate results should be loaded\n" .
   " ('-rFilename' may be repeated to load multiple sets of results)\n" .
   " '-q' \"quick mode\" - does half as many rand() calls and runs ~32% faster\n" .
   " '-b' just benchmark the computer, do no simulations\n" .
   " '-h' or '-?' print these instructions\n" .
   "\n" .
   "EXAMPLES:\n" .
   "\n" .
   "1. Run 1,500,000 simulations (instead of the default of one million):\n" .
   "perl test_voter_fraud_stats.pl -n1,500,000\n" .
   "(commas are optional.)\n" .
   "\n" .
   "2. Use a save-file so that you can restart the simulations if the program\n" .
   "gets stopped before completion:\n" .
   "perl test_voter_fraud_stats.pl -n200000 -srun1.txt\n" .
   "(The data is saved at the end of each row of progress-dots.)\n" .
   "\n" .
   "3. View result before it's done (or afterward):\n" .
   "type run1.txt\n" .
   "('type' is for Windows; use 'cat' on Linux.)\n" .
   "\n" .
   "4. Restart the simulations from where they left off:\n" .
   "perl test_voter_fraud_stats.pl -n200000 -rrun1.txt\n" .
   "(Note: run1.txt must already exist.)\n" .
   "\n" .
   "5. Restart the simulations where they left off, and periodically update the\n" .
   "save-file so you can restart the program again later:\n" .
   "perl test_voter_fraud_stats.pl -n200000 -rrun1.txt -srun1.txt\n" .
   "(Note: run1.txt need not already exist.)\n" .
   "\n" .
   "6. Run four instances simultaneously (perhaps on a 4-core computer):\n" .
   "At 1st cmd prompt:\n" .
   " perl test_voter_fraud_stats.pl -n250000 -rrun1.txt -srun1.txt\n" .
   "At 2nd cmd prompt:\n" .
   " perl test_voter_fraud_stats.pl -n250000 -rrun2.txt -srun2.txt\n" .
   "At 3rd cmd prompt:\n" .
   " perl test_voter_fraud_stats.pl -n250000 -rrun3.txt -srun3.txt\n" .
   "At 4th cmd prompt:\n" .
   " perl test_voter_fraud_stats.pl -n250000 -rrun4.txt -srun4.txt\n" .
   "\n" .
   "7. Make a combined report from the data from four different simulation runs:\n" .
   "perl test_voter_fraud_stats.pl -rrun1.txt -rrun2.txt -rrun3.txt -rrun4.txt\n" .
   "\n" .
   "by Dave Burton\n" .
   "http://www.sealevel.info/\n" .
   "M: 919-244-3316\n" .
   "\n" .
   "This is free, uncopyrighted, open source software.\n" .
   "\n" .
   "*** To view this 'help' message one screen-full at a time, pipe it to 'more':\n" .
   "perl $0 -h | more\n";
  exit(1);
}

# '-b' was specified, so exit after reporting benchmark results
if ($benchmarkonly) {
  exit 0;
}

if ($debugmode) {
  print "dbg: save file = '$savefile'\n";
  print "dbg: restore files = '" . join("','", @restorefiles) . "'\n";
}

# we don't actually use this
$num_args = $#ARGV+1;

# Initialize the buckets. bucket[N] keeps track of how many simulations
# had N innocent coincidences of Last4SSN matching.
@buckets = ();
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
  $buckets[$i] = 0;
}
# We go ahead and make 35751 buckets, even though less than 20 will ever be used,
# because it can't hurt, and it uses only an extra 1.8 MB of RAM and hardly affects
# performance at all.

# report the results, or save them to a file
sub report_results {
  local($outpfile) = shift;
  local($i, $highest_num, $sum, $percentage, $avg);
  $highest_num = 0;
  for ($i=0; $i <= $#buckets; $i++) { # '<=' so that the last bucket is checked, too
    if ($buckets[$i]) {
      $highest_num = $i;
    }
  }
  $sum = 0;
  for ($i=0; $i<=$highest_num; $i++) {
    $sum += ($buckets[$i] * $i);
    $percentage = 100 * ($buckets[$i] / $numruns);
    printf $outpfile "%3d :%8d : %10.6f\n", $i, $buckets[$i], $percentage;
  }
  $avg = $sum / $numruns_done;
  printf $outpfile "Average = %7.5f\n", $avg;
}

# save current (intermediate) results to a text file
sub save_buckets {
  local($outfile) = shift;
  open( OUTPUT, ">$outfile" ) || die "ERROR: could not write '$outfile', $!\n";
  &report_results(OUTPUT);
  close OUTPUT;
}

# Load intermediate results from a text file which was created by save_buckets().
# Note that this can be called multiple times to combine results from several files.
sub load_buckets {
  local($inpfile) = shift;
  local($sum) = 0;
  local($num,$cnt,@tmp);
  if (open(INPUT, "$inpfile")) {
    while (<INPUT>) {
      @tmp = split(/\s*\:\s*/, $_);
      if (2 == $#tmp) {
        ($num,$cnt,$pct) = @tmp;
        $num =~ s/[\s\,]//g; # delete whitespace and commas
        $cnt =~ s/[\s\,]//g;
        $buckets[$num] += $cnt;
        $sum += $cnt;
      }
    }
    close INPUT;
    print "Loaded $sum simulations from '$inpfile'\n";
  } elsif ($inpfile ne $savefile) {
    die "ERROR: could not read '$inpfile', $!\n";
  } # else if savefile and restorefile are identical, then it's okay if it doesn't initially exist
}

# if '-r...' was specified, then load initial buckets from file, to resume where we left off
for $fn (@restorefiles) {
  &load_buckets($fn);
}

$number_of_runs_preloaded = 0;
for ($i=0; $i <= $number_of_name_and_dob_matches; $i++) {
  if ($buckets[$i]) {
    $number_of_runs_preloaded += $buckets[$i];
  }
}

# $numruns_done is needed for calculating the average
$numruns_done = $number_of_runs_preloaded;

if ($numruns_defaulted && ($#restorefiles > 1) && ($number_of_runs_preloaded > 2)) {
  # we're just making a combined report, so don't default numruns to a million
  $numruns = $numruns_done;
}

if ($number_of_runs_preloaded > $numruns) {
  $numruns = $number_of_runs_preloaded;
} else {
  $remaining_numruns = $numruns - $number_of_runs_preloaded;
  $remaining_numruns_with_commas = &commafy($remaining_numruns);
  $estimated_runtime = $remaining_numruns / $speed;
  $readable_estimated_runtime = &human_time($estimated_runtime);
  print "Estimated run time = $readable_estimated_runtime for $remaining_numruns_with_commas simulations\n";
}

if (($estimated_runtime > (60*60)) && ('' eq $savefile)) {
  print "Note: for long simulation runs like this, you really should use '-sSavefile' so\nthat you can resume if it is interrupted.\n";
}

# put commas into an integer if it is > 4 digits long
sub commafy {
  local($number) = shift;
  local(@pieces) = ();
  $number .= '';
  # if ($debugmode) { print "dbg: number='$number'\n"; }
  if (length($number) > 4) {
    while (length($number) > 0) {
      if (length($number) <= 3) {
        # we could omit this 'if' clause for Perl 5, but Perl 4 needs it
        unshift(@pieces,$number);
        $number = '';
      } else {
        unshift(@pieces,substr($number,-3));
        substr($number,-3) = '';
      }
      # if ($debugmode) {
      #   $tm1 = join(',',@pieces);
      #   print "dbg: number='$number', pieces='$tm1'\n";
      # }
    }
    $number = join(',',@pieces);
  }
  # if ($debugmode) { print "dbg: number='$number'\n"; }
  return $number;
}

# Return a random integer between 1 and 9999, inclusive. (Won't return 0.)
sub rand10k {
  local($result);
  $result = rand(9999); # that's >= 0.0, and < 9999.0 (it can never return 9999)
  $result = 1 + int($result);
  return $result; # valid SSNs cannot end in 0000
}

# convert input in floating point seconds to nice, human-friendly time (e.g., "xx.x minutes")
sub human_time {
  local($seconds) = shift;
  local($result) = '';
  if ($seconds >= 600) {
    $minutes = $seconds / 60;
    $minutes = int(($minutes * 10) + 0.5) / 10.0;
    $result = sprintf("%3.1f minutes", $minutes);
  } elsif ($seconds >= 60) {
    $minutes = $seconds / 60;
    $minutes = int(($minutes * 100) + 0.5) / 100.0;
    $result = sprintf("%4.2f minutes", $minutes);
  } else {
    $result = sprintf("%4.2f seconds", $seconds);
  }
  return $result;
}

# Run one test: of 35,750 voters, how many match last-4-SSNs by innocent coincidence?
# The expected value, of course, is 35750/9999 = ~3.575
sub num_coincidences {
  local($coincidences) = 0;
  local($i);
  local($ssn1);
  local($ssn2);
  if ($quickmode) {
    $ssn1 = int(rand(9999)); # &rand10k(); -- 'inlined' for better performance
    for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
      $ssn2 = int(rand(9999)); # &rand10k();
      if ($ssn1 == $ssn2) {
        $coincidences++;
      }
    }
  } else {
    for ($i=0; $i<$number_of_name_and_dob_matches; $i++) {
      $ssn1 = int(rand(9999)); # &rand10k(); -- 'inlined' for better performance
      $ssn2 = int(rand(9999)); # &rand10k();
      if ($ssn1 == $ssn2) {
        $coincidences++;
      }
    }
  }
  return $coincidences;
}

$simulations_per_dot = 50;
$dots_per_line = 60;
$modulo_of_dot = int($dots_per_line/3);

if ($quickmode) {
  $calls2rand = $numruns * (1 + $number_of_name_and_dob_matches);
} else {
  $calls2rand = $numruns * 2 * $number_of_name_and_dob_matches;
}
$calls2rand = &commafy($calls2rand);
$numruns_with_commas = &commafy($numruns);
print "\n$numruns_with_commas simulations";
if (($numruns - $number_of_runs_preloaded) >= 1000) {
  print " requires $calls2rand calls to rand(), which takes a while!\n";
  print "So, after every $simulations_per_dot" . "th simulation it prints a dot ($dots_per_line/line), as a progress indicator.";
}
print "\n\n";

# for the progress indicator
$dotcolumn = $dotrow = 0;

# Main loop to run the simulations and tabulate the results.
# Print "." as progress indicator every $simulations_per_dot simulations, up to $dots_per_line dots per line.
for ($i=$number_of_runs_preloaded; $i<$numruns; $i++) {
  $buckets[ &num_coincidences() ] ++;
  if (($i % $simulations_per_dot) == $modulo_of_dot) {
    # print a dot
    if ($dotcolumn == $dots_per_line) {
      $pctdone = ($i * 100) / $numruns;
      printf("%5.1f%%\n", $pctdone);
      $dotcolumn = 0;
      $dotrow++;
      $numruns_done = $i;
      if ('' ne $savefile) {
        &save_buckets($savefile);
      }
    }
    print ".";
    $dotcolumn++;
  }
}
$numruns_done = $numruns;
if ($dotcolumn > 0) {
  print "\n";
  $dotcolumn = 0;
  $dotrow++;
}

# save results one last time at the end
if ('' ne $savefile) {
  &save_buckets($savefile);
}

# remind the user that the simulation results are also in the Savefile, if he specified '-sSavefile'
if ($debugmode && ('' ne $savefile)) {
  print "Note: results of $numruns simulations were saved to '$savefile'\n";
}

# report the results:
print "First column is number of coincidences per 35,750 matches\n";
print " : second column is number of runs (out of $numruns_with_commas) which had that number of coincidences\n";
print " : third column is percentage of runs which had that number of coincidences\n";
&report_results(STDOUT);

# report the run-time:
$end_time = time();
$run_time = $end_time - $start_time; # in seconds
$run_time = &human_time($run_time);
if ($debugmode || ($number_of_runs_preloaded < $numruns)) {
  print "Run time = $run_time\n";
}

exit 0;

__END__

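The help text says quick mode makes half as many rand() calls. The sketch below is one way to check that this should not change the statistics: whether the first value is drawn once per run or once per comparison, each comparison matches with the same probability, so the coincidence count follows the same binomial distribution. It is a separate illustration, not part of the program above. The names count_coincidences and mean_coincidences are hypothetical, and the small values (400 runs, 2000 pairs, range 100) are assumptions chosen only so that it finishes quickly.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): count matches among
# $pairs comparisons of integers drawn uniformly from 0 .. $range-1.
# If $quick is true, the first value is drawn once and reused for every
# comparison, mirroring the '-q' branch of num_coincidences().
sub count_coincidences {
    my ($pairs, $range, $quick) = @_;
    my $hits  = 0;
    my $fixed = int(rand($range));
    for (1 .. $pairs) {
        my $first  = $quick ? $fixed : int(rand($range));
        my $second = int(rand($range));
        $hits++ if $first == $second;
    }
    return $hits;
}

# Hypothetical helper: mean number of coincidences over $runs simulated runs.
sub mean_coincidences {
    my ($runs, $pairs, $range, $quick) = @_;
    my $total = 0;
    $total += count_coincidences($pairs, $range, $quick) for 1 .. $runs;
    return $total / $runs;
}

srand(12345);    # fixed seed, so that repeated runs of this sketch agree

# Illustrative sizes only (assumed); the real program uses 35750 pairs and range 9999.
my ($demo_runs, $demo_pairs, $demo_range) = (400, 2000, 100);
printf "expected mean   = %.2f\n", $demo_pairs / $demo_range;
printf "full mode mean  = %.2f\n", mean_coincidences($demo_runs, $demo_pairs, $demo_range, 0);
printf "quick mode mean = %.2f\n", mean_coincidences($demo_runs, $demo_pairs, $demo_range, 1);
```

Both simulated means should land close to the expected value, differing only by sampling noise.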
First column is number of coincidences per 35,750 matches
 : second column is number of runs (out of 25,000,000) which had that number of coincidences
 : third column is percentage of runs which had that number of coincidences
  0 :  700479 :   2.801916
  1 : 2503288 :  10.013152
  2 : 4469133 :  17.876532
  3 : 5333504 :  21.334016
  4 : 4768914 :  19.075656
  5 : 3410612 :  13.642448
  6 : 2031999 :   8.127996
  7 : 1038296 :   4.153184
  8 :  463046 :   1.852184
  9 :  184655 :   0.738620
 10 :   65817 :   0.263268
 11 :   21470 :   0.085880
 12 :    6339 :   0.025356
 13 :    1851 :   0.007404
 14 :     461 :   0.001844
 15 :     103 :   0.000412
 16 :      27 :   0.000108
 17 :       5 :   0.000020
 18 :       1 :   0.000004
Average = 3.57600

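The "Average" line can be re-derived from the table: it is the sum of (coincidences x runs) divided by the total number of runs. The following is a minimal sketch of that arithmetic, reading the same "k : count : percentage" layout that load_buckets() parses. The sub name summarize_report is hypothetical, and the code is written in modern Perl 5 style rather than the Perl 4 compatible style of the program itself.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "k : count : percentage" lines and return
# (total number of runs, mean number of coincidences per run).
sub summarize_report {
    my ($text) = @_;
    my ($runs, $weighted) = (0, 0);
    for my $line (split /\n/, $text) {
        my @fields = split /\s*:\s*/, $line;
        next unless @fields == 3;        # skips header lines and the "Average = ..." line
        my ($k, $count) = @fields;
        s/[\s,]//g for $k, $count;       # delete whitespace and commas, as load_buckets() does
        next unless $k =~ /^\d+$/ && $count =~ /^\d+$/;
        $runs     += $count;
        $weighted += $k * $count;
    }
    return ($runs, $runs ? $weighted / $runs : 0);
}

# The 25,000,000-run results shown above.
my $report = <<'END_REPORT';
  0 :  700479 :   2.801916
  1 : 2503288 :  10.013152
  2 : 4469133 :  17.876532
  3 : 5333504 :  21.334016
  4 : 4768914 :  19.075656
  5 : 3410612 :  13.642448
  6 : 2031999 :   8.127996
  7 : 1038296 :   4.153184
  8 :  463046 :   1.852184
  9 :  184655 :   0.738620
 10 :   65817 :   0.263268
 11 :   21470 :   0.085880
 12 :    6339 :   0.025356
 13 :    1851 :   0.007404
 14 :     461 :   0.001844
 15 :     103 :   0.000412
 16 :      27 :   0.000108
 17 :       5 :   0.000020
 18 :       1 :   0.000004
Average = 3.57600
END_REPORT

my ($total_runs, $mean) = summarize_report($report);
printf "runs = %d, average = %.5f\n", $total_runs, $mean;   # prints: runs = 25000000, average = 3.57600
```

The same sub could be pointed at the contents of a save-file such as run1.txt, since save_buckets() writes this layout.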
First column is number of coincidences, k, per 35,750 matches
 : second & third columns are copied from the results of 25,000,000 simulations (above)
 : fourth column is percentage calculated by my online binomial probability distribution calculator
 : fifth column is cumulative percentage from the binomial calculator, ∑ ≤k

  k  |       simulated        |                  calculated
-----+------------------------+-----------------------------------------------
  0  :  700479 :  2.801916%   :  2.80004041934694%       :  2.80004041934694%
  1  : 2503288 : 10.013152    : 10.01214692855101        : 12.81218734789794
  2  : 4469133 : 17.876532    : 17.89979198583566        : 30.71197933373361
  3  : 5333504 : 21.334016    : 21.33365886209420        : 52.04563819582780
  4  : 4768914 : 19.075656    : 19.06917141786560        : 71.11480961369340
  5  : 3410612 : 13.642448    : 13.63565916189286        : 84.75046877558626
  6  : 2031999 :  8.127996    :  8.125068959489566       : 92.87553773507582
  7  : 1038296 :  4.153184    :  4.149722300002787       : 97.02526003507861
  8  :  463046 :  1.852184    :  1.854414935099515       : 98.87967497017813
  9  :  184655 :  0.738620    :  0.7365973040199915      : 99.61627227419812
 10  :   65817 :  0.263268    :  0.2633199064110674      : 99.87959218060919
 11  :   21470 :  0.085880    :  0.08557214583945469     : 99.96516432644864
 12  :    6339 :  0.025356    :  0.02549062245912742     : 99.99065494890777
 13  :    1851 :  0.007404    :  0.007008969989723296    : 99.99766391889749
 14  :     461 :  0.001844    :  0.001789497617543090    : 99.99945341651503
 15  :     103 :  0.000412    :  0.0004264151954425543   : 99.99987983171048
 16  :      27 :  0.000108    :  0.00009525622005113322  : 99.99997508793053
 17  :       5 :  0.000020    :  0.00002002686282731367  : 99.99999511479336
 18  :       1 :  0.000004    :  0.00000397646134453779  : 99.99999909125470

Predicted average = 35,750 / 9999 = 3.575357535753...
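
The calculated columns can be cross-checked without the online calculator. The sketch below assumes the calculator evaluates a plain binomial distribution with n = 35750 trials and success probability p = 1/9999 (the same model as num_coincidences()), and builds the probabilities with the recurrence P(k) = P(k-1) * (n-k+1)/k * p/(1-p). The sub name binomial_pmf is hypothetical, and ordinary double-precision arithmetic is used, so only the leading digits should be expected to agree with the table.

```perl
use strict;
use warnings;

# Hypothetical helper: return P(X = k) for k = 0 .. $kmax, where X ~ Binomial($n, $p).
sub binomial_pmf {
    my ($n, $p, $kmax) = @_;
    my @pmf = (exp($n * log(1 - $p)));    # P(0) = (1-p)^n
    for my $k (1 .. $kmax) {
        # P(k) = P(k-1) * (n-k+1)/k * p/(1-p)
        push @pmf, $pmf[-1] * ($n - $k + 1) / $k * $p / (1 - $p);
    }
    return @pmf;
}

# Assumed model: 35750 name-and-DOB matches, 9999 equally likely last-4-SSN values.
my @pmf = binomial_pmf(35750, 1 / 9999, 18);
my $cumulative = 0;
for my $k (0 .. $#pmf) {
    $cumulative += $pmf[$k];
    # columns: k, percentage for exactly k, cumulative percentage for <= k
    printf "%3d : %12.8f : %12.8f\n", $k, 100 * $pmf[$k], 100 * $cumulative;
}
```

The mean of this distribution is n * p = 35750 / 9999, which is the predicted average quoted above.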