# Project index — documents, scripts, results

*The navigable map of the investigation. Every script is standalone (`python3 <script>.py` after
`bash fetch_data.sh` or `bash fetch_data_mirrors.sh`); every number traces to `canonical.py`.
"Part N" = the dated entry in [REPORT.md](REPORT.md) where the result is derived and interpreted.*

## Read in this order

| Document | What it is |
|---|---|
| [README.md](README.md) | Mission, headline results, how to reproduce |
| [PAPER.md](PAPER.md) | Citation-ready manuscript draft (conservative claims only) |
| [REPORT.md](REPORT.md) | The full chronological lab notebook, Parts 1–62 |
| [SCENARIOS.md](SCENARIOS.md) | Every hypothesis, ranked against the evidence; updated after each round |
| [STATE_OF_THE_FIELD.md](STATE_OF_THE_FIELD.md) | Sourced survey: what is established vs refuted in the literature |
| [INTENTION.md](INTENTION.md) | Companion essay: reasoning from human motive |
| [ALCHEMY_BIOLOGY_READING.md](ALCHEMY_BIOLOGY_READING.md) | Galenic/alchemical interpretive layer (hypothesis-tier) |
| [ARCHIVAL_SEARCH.md](ARCHIVAL_SEARCH.md) | Where a key would physically be, ranked |
| [COMMUNITY_POST.md](COMMUNITY_POST.md) | Draft summary for voynich.ninja |
| [research/](research/) | Historical dossiers + the hostile independent review and responses |

## Infrastructure

| Artifact | Role |
|---|---|
| `canonical.py` | **Single source of truth** for parsing (IVTFF → token table with codicological metadata); sensitivity kwargs `alt=`, `commas=` |
| `generators.py` | Shared null models: fitted self-citation generator, class-Markov, word-bigram + the Part-10 grammar battery |
| `fetch_data.sh` / `fetch_data_mirrors.sh` | Data acquisition from primary hosts / pinned mirrors (CI-safe) |
| `build_cuman.py` | Builds the Cuman (Kipchak, c. 1330) lexicon + corpus from Codex Cumanicus OCR → `data/cuman_*.txt` (committed) |
| `site/` | Interactive explainer with in-browser Voynichese generator |

## Experiments by question

### Is it language, cipher, or gibberish? (surface statistics)
| Script | Part | One-line result |
|---|---|---|
| `analyze_voynich.py` | 1 | Zipf −1.04 (natural), glyph h2 ≈ 2.09 bits (far below any language) → not random, not a simple cipher |
| `generate_voynich.py` | 5 | Fitted 11-parameter self-citation generator matches surface stats to 3 decimals but underproduces types |
| `steelman_test.py` | 32 | A *meaningless* class-Markov process reproduces the grammar (r 0.51 vs 0.55) → grammar proves sequence, not meaning |
| `rugg_test.py` | 57 | Rugg's table-and-grille **excluded**: 214 vs 4,584 types, comb-shaped MI(d), wrong entropy and grammar shape |

### Does it have grammar?
| Script | Part | One-line result |
|---|---|---|
| `morphosyntax_test.py`, `morphosyntax_controls.py` | 10 | Suffix alternations shift following-word context across 60–86 stems (r ≈ 0.55, z > 4) |
| `audit_v101morph.py`, `audit_quireblock.py`, `audit_C/D/E/F.py` | 24–25, 31, 33 | Replicates in v101 (r 0.66–0.81), Currier A, Takahashi; survives quire-blocked holdout |
| `wordclass_test.py` | 11 | 8 induced word classes; class-transition MI excess 0.163 bits |
| `resegment_test.py` | 58 | **Grammar survives space-blind re-segmentation** (r → +0.54); scribal spaces are statistically redundant cut points (F1 0.74–0.77) |
| `sensitivity_test.py` | 55 | All headline numbers robust across the 2×2 parsing-choice grid |

### Does it have content/topics?
| Script | Part | One-line result |
|---|---|---|
| `topic_test.py`, `scribe_control.py` | 6 | Section-conditioned vocabulary (JSD ratio 1.29), survives within-scribe control |
| `longrange_test.py` | 56 | **No discourse-scale information**: all d ≥ 8 excess is page-level class composition + drift; prose disfavored, line-records possible |
| `label_test.py` | 21–23 | Labels are a uniform classifier register; no page-specific plant names |

### What kind of writing system?
| Script | Part | One-line result |
|---|---|---|
| `ordering_test.py` | 7 | Glyph-order concordance 0.84 → notation-like internals |
| `syllabary_test.py`, `bpe_test.py`, `reframe_test.py` | 34–36 | ~66 units cover 90% of word-mass at ~2/word: syllabary proportions; BPE merging shrinks the entropy gap (verbose-encoding signature) |
| `decipher_grid.py`, `decipher_phonotactics.py` | 38–40 | Sukhotin vowels {o,e,a,y}; complementary distribution; open-leaning CVC phonotactics |
| `kipchak_test.py` | 54 | **No vowel harmony at sign or unit level** (validated detector) → harmonic plaintexts (Turkic/Hungarian/Finnish) excluded under sign-level encodings; tokens carry ~half the units of real agglutinative words |
| `quantity_test.py`, `indus_test.py`, `analyze_lineara.py` | 3, 19 | Numeral-system and Indus/Linear-A comparisons |

### Who wrote it, in what order? (codicology)
| Script | Part | One-line result |
|---|---|---|
| `operator_page_test.py` | 26 | e/i run-length = production drift (page-stable r 0.90, folio-autocorrelated z 5.4), **not** semantics (retraction documented) |
| `misbinding_test.py`, `reorder_test.py` | 27–28 | Bifolium production confirmed (z −3.8); binding disorder quantified 0.59–0.89 |
| `continuum_test.py` | 29 | Currier A/B are attractor states on a production continuum |
| `burst_test.py`, `repeat_test.py` | 14, 16 | Line is the functional unit; recurrence collapses 5× at line breaks |

### Language identity and readings (mostly negative — documented)
| Script | Part | One-line result |
|---|---|---|
| `typology_test.py` | 12–13 | 13-language screen: inflection-rich tier closest (now constrained by Part 54's harmony exclusion) |
| `decipher_langmatch.py`, `decipher_agglutination.py`, `decipher_morphology.py` | 41–47 | Agglutination claim raised and **retracted** (Part 47) |
| `audit_crib_zodiac.py`, `audit_crib_B/C/D.py`, `decipher_zodiac.py` | 44, 49–52 | All internal and image-based cribs fail their null tests |
| `audit_hist_nullglyph.py`, `audit_hist_numsub.py`, `audit_hist_readgeom.py` | 25 | Null-glyph layer, numeral substitution, reading geometry: tested, LTR confirmed |
| `genre_test.py` | 20 | Medieval recipe-genre comparison |
| `humoral_test.py`, `iconographic_map.py` | 59–62 | Galenic-degree interpretive layer; alchemical ascending-stages reading rejected by its own discriminating test |
| `template.py`, `factorize.py` | various | Line-template and word-factorization utilities |

## Current verdict (one paragraph)

Voynichese is a rule-governed formal system whose grammar is real, transcription-independent, and
a property of the glyph stream itself — but whose only long-range information is drifting
production register, with no discourse signal, no vowel harmony, and no surviving internal or
image-based crib. Meaning, if present, lives in line-records under a verbose/structured encoding
whose key is external to the manuscript. Two readings remain live: a meaning-thin formal
convention with drifting habits (now positively evidenced), and a line-record formulary (the
Galenic frame is the historically best-fitting version). The falsification battery any claimed
decipherment must pass is in [PAPER.md](PAPER.md) §2.8–2.9.

## Gap ledger (what is genuinely untested)

See the end of [REPORT.md](REPORT.md): abbreviated-Latin verbose encoding, syllable-value search
on re-segmented units, Takahashi replication of Parts 54/56/58, Cuman OCR proofing, true medieval
inventory corpora.
