| Filename | gs.zip |
|---|---|
| Title | BASH Script for priming of Election Log Data |
| Version | 1.1 |
| Author / Vendor | Jason Page of Benford Bench |
| Homepage | https://benfordbench.org |
| Category | Utilities |
| File Size | 9.7 KB |
| Downloads | 2 |
| File Type | ZIP |
| Uploaded | 2026-04-11 15:00 by pagetelegram |
| Approved | 2026-04-11 15:01 by pagetelegram |
| License | GNU MIT |
# Ballot Scan Log CSV Generators

Two Bash scripts that parse ImageCast Precinct device logs, extract every "Ballot scanned successfully" event, and write a sorted CSV for analysis. Both scripts share the same structure. They differ only in the elapsed-time column they compute and in the name of the output file.

| Script | Elapsed-time column | Output filename |
|---|---|---|
| `gs_christ.sh` | `seconds_AD1`: seconds since Jan 1, year 1 AD | `<dirname>.csv` |
| `gs_zero.sh` | `seconds_day0`: seconds since midnight of the chosen date | `<dirname>_day0.csv` |

---

## Requirements

- Bash 4.0 or later
- GNU `grep` (PCRE support via `-P`)
- GNU `awk`
- GNU `date` (the `-d` flag for date parsing)
- GNU `sort`

All are standard on modern Linux distributions.

---

## Usage

Run either script from inside the directory that contains the `.txt` log files you want to process:

```bash
cd /path/to/log/directory
bash /path/to/gs_zero.sh
```

The script scans all `.txt` files in the current directory, lists every date that has at least one successful scan, and prompts you to pick one:

```
Dates with successful transactions:
[1] 19 Oct 2020 (5366 records)
[2] 03 Nov 2020 (25262 records)
Enter the number of the date to include in the CSV:
```

After you enter a number, the script processes the selected date and writes the CSV to the current directory.

---

## Output columns

Both scripts produce the same columns. Only the name of the elapsed-time column differs.

| Column | Description |
|---|---|
| `transaction` | Sequential scan number across all files, in ward-then-time order |
| `ward` | Zero-padded two-digit ward number (e.g. `01`, `08`, `48`) |
| `file` | Source `.txt` filename (spaces replaced with underscores) |
| `timestamp` | Date and time extracted from the log line |
| `seconds_AD1` *(gs_christ.sh only)* | Total seconds elapsed from Jan 1, year 1 AD (proleptic Gregorian) to this scan |
| `seconds_day0` *(gs_zero.sh only)* | Seconds elapsed since midnight (UTC) of the selected date |
| `delta` | Seconds between this scan and the previous row; blank for the first row |
| `gap_over_100` | Same as `delta`, but populated only when the gap exceeds 100 seconds; otherwise blank |

Rows are sorted first by **ward** (ascending), then by **timestamp** (ascending) within each ward.

---

## Ward number extraction

The ward number is determined per log file, using the following priority order:

1. **Path or filename**: if the directory name or filename contains a ward indicator, that value is used for every scan line in the file.
   - `WARD 01/`, `WARD_01/`, `Ward-01/`: directory name pattern
   - `w01p03.txt`, `w48p12.txt`: `w<ward>p<precinct>` filename pattern
2. **File content**: if no ward can be found in the path, the script searches the log lines themselves for:

   ```
   votingLocationName Value: Ward 08
   ```

   - **Raw log files**: the most recent `votingLocationName` line before each scan event is used. This handles large combined log files in which entries from multiple machines are concatenated.
   - **CSV log files** (where column 1 is a source filename such as `machinecontext_1_10101_ELOG.TXT`): a lookup table mapping each source filename to its ward is built, then applied to every scan row.
3. **Fallback**: if no ward can be determined by any method, the ward is recorded as `00`.
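A minimal Bash sketch of the priority order above follows. It assumes the directory and filename patterns can be matched with the regular expressions shown; the scripts' real patterns are not documented here. `ward_from_path` and `wards_from_content` are hypothetical names, and the content helper covers only the raw-log "most recent `votingLocationName`" rule, not the CSV lookup table.

```bash
#!/usr/bin/env bash
# Sketch only: ward_from_path and wards_from_content are hypothetical helpers
# illustrating the priority order; the scripts' actual patterns may differ.

# Steps 1 and 3: ward from a directory or file name, else the 00 fallback.
ward_from_path() {
  local path=$1 ward=0
  local re_dir='[Ww][Aa][Rr][Dd][ _-]([0-9]+)/'        # WARD 01/, WARD_01/, Ward-01/
  local re_file='(^|/)[Ww]([0-9]+)[Pp][0-9]+\.txt$'   # w01p03.txt, w48p12.txt
  if [[ $path =~ $re_dir ]]; then
    ward=${BASH_REMATCH[1]}
  elif [[ $path =~ $re_file ]]; then
    ward=${BASH_REMATCH[2]}
  fi
  # 10# forces base 10 so "08" is not rejected as an invalid octal number
  printf '%02d\n' "$((10#$ward))"
}

# Step 2 for raw logs: print one ward per "Ballot scanned successfully" line,
# taken from the most recent votingLocationName line before it (00 if none yet).
wards_from_content() {
  awk 'match($0, /votingLocationName Value: Ward [0-9]+/) {
         w = substr($0, RSTART, RLENGTH)
         sub(/.* /, "", w)                 # keep only the trailing number
       }
       /Ballot scanned successfully/ { printf "%02d\n", w + 0 }' "$1"
}
```

For example, running `ward_from_path "$PWD/w01p02.txt"` from inside `WARD 01/` prints `01`, because the directory pattern is checked before the filename pattern.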
---

## Supported log formats

The scripts recognise three timestamp formats automatically:

| Format | Example | Source |
|---|---|---|
| Old raw | `19 Oct 2020 13:04:04` at start of line | 2020 ImageCast logs (`wNNpNN.txt`) |
| New raw | `2024-11-05 05:21:57` at start of line | 2024 ImageCast logs (`machinecontext_*_ELOG.TXT`) |
| CSV column 2 | `machinecontext_...,2024-11-05 05:21:57.011,...` | Combined export files (`*_Election.txt`) |

Milliseconds in the new formats are stripped before processing. All timestamps are treated as UTC.

---

## Directory layouts

### Per-ward directories (2020)

```
CHI 20201103 Ver J_DeviceLogs/
  WARD 01/
    w01p01.txt
    w01p02.txt
    ...
  WARD 02/
    w02p01.txt
    ...
```

Run the script from inside a single ward directory. The ward number is taken from the directory name.

### Flat per-machine directory (2024)

```
CHI 20241105 Ver C_DeviceLogs.../
  machinecontext_1_10101_ELOG.TXT
  machinecontext_1_10102_ELOG.TXT
  ...
```

Run the script from this directory. Each file contains a `votingLocationName Value: Ward NN` line, which the script reads to determine the ward for the scans in that file.

### Combined export file

```
2024/
  2024_Election.txt   ← CSV, column 1 = source machine filename
2020/
  2020_Election.txt   ← raw concatenated log from all machines
```

Run the script from the `2024/` or `2020/` directory. The ward is resolved from the `votingLocationName` content as described under Ward number extraction.

---

## Output file location

The CSV is written to the **current directory** and named after that directory:

| Working directory | Output file |
|---|---|
| `WARD 01/` | `WARD_01.csv` (gs_christ.sh), `WARD_01_day0.csv` (gs_zero.sh) |
| `CHI 20241105.../` | `CHI_20241105..._day0.csv` (gs_zero.sh) |
| `2024/` | `2024_day0.csv` (gs_zero.sh) |

Spaces in the directory name are replaced with underscores in the filename.
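The format detection in the Supported log formats table and the naming rule above could be implemented roughly as follows. This is a sketch assuming GNU `date -u -d` for parsing; `to_epoch` and `output_name` are hypothetical helpers, and the regular expressions are illustrative rather than copied from the scripts.

```bash
#!/usr/bin/env bash
# Sketch only: to_epoch and output_name are hypothetical helpers, not code
# taken from gs_christ.sh or gs_zero.sh.

# Print Unix seconds (UTC) for a log line in any of the three supported
# formats, or return 1 if no timestamp is recognised.
to_epoch() {
  local line=$1
  local re_old='^([0-9]{2} [A-Z][a-z]{2} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2})'
  local re_new='^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})'
  local re_csv='^[^,]*,([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})'
  # Each pattern stops at whole seconds, so a trailing ".011" is dropped.
  if [[ $line =~ $re_old || $line =~ $re_new || $line =~ $re_csv ]]; then
    # -u makes GNU date read and print the time in UTC
    date -u -d "${BASH_REMATCH[1]}" +%s
  else
    return 1
  fi
}

# Build the CSV name from a directory path: spaces become underscores and an
# optional suffix (e.g. "_day0" for gs_zero.sh) is appended.
output_name() {
  local dir=${1%/} suffix=${2:-}
  dir=${dir##*/}
  printf '%s%s.csv\n' "${dir// /_}" "$suffix"
}
```

For example, `output_name "$PWD" _day0` run inside `WARD 01/` prints `WARD_01_day0.csv`. Trying the three patterns in a fixed order keeps detection automatic, as the table describes, without a per-file format flag.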
---

## seconds_AD1 calculation (gs_christ.sh)

The epoch offset used to convert Unix time to seconds since year 1 is **62,135,596,800**, so `seconds_AD1` = Unix time + 62,135,596,800. The offset is derived as follows:

- 1969 complete years × 365 days = 718,685 days
- Leap years in years 1–1969: ⌊1969/4⌋ − ⌊1969/100⌋ + ⌊1969/400⌋ = 492 − 19 + 4 = **477**
- Total days to the Unix epoch: 718,685 + 477 = **719,162**
- 719,162 days × 86,400 s/day = **62,135,596,800 seconds**
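The offset can be checked with Bash integer arithmetic. The sketch below also shows one way the `seconds_AD1`, `seconds_day0`, `delta`, and `gap_over_100` columns could be derived from Unix seconds, assuming GNU `date` and `awk`. The helper names are hypothetical and do not come from the scripts.

```bash
#!/usr/bin/env bash
# Sketch only: seconds_ad1, seconds_day0 and add_deltas are hypothetical
# helpers that mirror the column definitions; they are not taken from the scripts.

# Offset from 0001-01-01 to 1970-01-01 (proleptic Gregorian), as derived above
years=1969
leap=$(( years / 4 - years / 100 + years / 400 ))   # 492 - 19 + 4 = 477
days=$(( years * 365 + leap ))                       # 718685 + 477 = 719162
AD1_OFFSET=$(( days * 86400 ))                       # 62135596800

# seconds_AD1 column: Unix seconds shifted by the fixed offset
seconds_ad1() { echo $(( $1 + AD1_OFFSET )); }

# seconds_day0 column: Unix seconds minus midnight UTC of the selected date
seconds_day0() {
  local midnight
  midnight=$(date -u -d "$2 00:00:00" +%s)
  echo $(( $1 - midnight ))
}

# delta and gap_over_100 columns for sorted seconds read one per line on stdin;
# both are blank on the first row, and gap_over_100 is blank unless delta > 100.
add_deltas() {
  awk 'NR == 1 { print $1 ",,"; prev = $1; next }
       {
         d = $1 - prev
         g = (d > 100) ? d : ""
         print $1 "," d "," g
         prev = $1
       }'
}
```

For example, `seconds_ad1 "$(date -u -d '2020-10-19 13:04:04' +%s)"` prints `63738709444`, and `seconds_day0` for the same scan with date `2020-10-19` prints `47044`.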