# What the data actually says about undeciphered ancient texts

> **READ THIS FIRST — this document is a chronological notebook, not a finished paper.**
> Parts are dated in the order they were run; later parts *correct and supersede* earlier ones.
> For claims at their final, audited strength, see **`PAPER.md`** and **`README.md`**, or jump to:
> - **Part 25** (hostile audit) — which claims survived adversarial testing; the flattened-table reading was **falsified**.
> - **Part 26** — the e/i "operator" is **production drift, not a semantic attribute** (supersedes the semantic framing in Parts 17–21).
> - **Part 32** — the grammar (Parts 10–11) proves **rule-governed sequence, NOT meaning**: a meaningless class-Markov process reproduces it. This **narrows** the "real grammar ⇒ encoded language" framing used in Parts 10–18.
> - **Parts 31, 33** — the grammar nonetheless replicates across **both dialects and three independent transliterations** (robust as structure).
> Where an early part says "encoded code over a language," read it as one of *two* surviving readings (the other: a meaning-thin formal convention); the text alone cannot decide between them.
>
> *Bottom line at final strength:* the Voynich is a rule-governed formal system with transcription-independent structure and a recoverable production history; its **meaning, if any, is not recoverable from the text alone.**

*Analysis run 2026-06-10. Every number in Parts 1–2 was computed fresh on this machine from primary corpora — not quoted from secondary sources. Scripts: `analyze_voynich.py`, `analyze_lineara.py`. Data: `data/ZL3b-n.txt` (Zandbergen–Landini transliteration v3b, 13 May 2025, voynich.nu), `lineara.xyz/` (GORILA-based Linear A corpus). Controls: Caesar's *De Bello Gallico* (Latin, 33,207 tokens), Alice in Wonderland + Dracula (English, 163,401 tokens), and a random-gibberish control with the Voynich's exact alphabet and word-length profile.*

---

## Part 1 — The Voynich manuscript: six computed results

Corpus as parsed: 33,122 running-text word tokens, 6,530 types, 25-letter EVA alphabet, across pages tagged Currier language A (10,105 tokens) and B (22,496 tokens).

| corpus | tokens | types | Zipf slope | h1 (bits) | **h2 cond. entropy** | word-len mean ± sd |
|---|---|---|---|---|---|---|
| Voynich | 33,122 | 6,530 | −1.04 | 3.87 | **2.09** | 4.99 ± 1.78 |
| Latin | 33,207 | 8,074 | −0.85 | 4.02 | **3.30** | 6.04 ± 2.92 |
| English | 163,401 | 9,270 | −1.12 | 4.05 | **3.26** | 3.94 ± 2.03 |
| Random control | 33,122 | 30,338 | −0.69 | 4.52 | **4.47** | 4.99 ± 1.78 |

**1. It is not random scribbling.** The rank–frequency slope is −1.04 — textbook Zipf, indistinguishable from real language (English −1.12). The random control, despite having the same alphabet and word lengths, scores −0.69 and barely repeats a word (30,338 types vs the Voynich's 6,530). Whatever produced this text had strong, language-like word statistics.

**2. It is not a simple cipher of any European language.** Conditional character entropy h2 = 2.09 bits — vastly below Latin (3.30) and English (3.26). A substitution cipher *preserves* conditional entropy; you cannot get 2.09 from enciphering a 3.3-bit language one letter at a time. Every "I translated it word-for-word into Latin/Hebrew/proto-Romance" claim collides with this single number. (Replicates Bennett 1976 and the survey in Bowern & Lindemann, *Annual Review of Linguistics* 2021, on the current 2025 transliteration.)

**3. Word lengths are binomially distributed and abnormally narrow** (sd 1.78 vs Latin 2.92): 25% of all words are exactly 5 letters; words longer than 9 letters essentially don't exist (0.8%). Latin has 11.6% of words ≥10 letters. Natural languages don't have a hard ceiling at 9 letters; slot-based generation systems and certain cipher formats do (replicates Stolfi's "core-mantle-crust" observation).

**4. Currier's two "languages" are real and enormous in effect size.** Same script, same page format, yet: *chedy* occurs 21.6×/1000 words in language B vs 0.1 in A (200:1); *qokeedy* 13.5 vs 0.0; conversely *chor* 16.2 in A vs 1.2 in B. Top-100 vocabulary Jaccard overlap is only 0.29. Whatever process generated the text **changed its parameters partway through the manuscript** — consistent with multiple scribes (Lisa Fagin Davis identifies five hands) each running the same generative system slightly differently, and very hard to square with one coherent underlying plaintext language.

**5. Page layout controls "spelling" — which no natural language does at this magnitude:**
- Words ending in EVA *m*: **15.3% at line-end vs 0.9% elsewhere (17×)**
- Words containing *p* or *f*: **22.2% in paragraph-initial lines vs 1.3% in other lines (17×)**
- Lines starting with a gallows glyph (t/k/p/f): **82.5% of paragraph-initial lines vs 8.8% of others (9×)**
- Line-first words run longer (5.42 letters) and line-last words shorter (4.70) than mid-line words (4.97).

In real writing, where a word falls on the page does not change which word it is. In the Voynich, the line is a functional unit ("LAAFU", Currier's old observation — here quantified). This is the single strongest argument that the visible text is not a straightforward transcript of language.

**6. The "self-citation" signature: words are copies of nearby words.**
- Adjacent words are measurably more similar than chance: mean normalized edit distance 0.772 vs 0.810 shuffled (ratio 0.954; Latin and English sit at exactly 1.00).
- Probability that the next word is *identical* to the current one: **0.87%** (English 0.07%, Latin 0.00%).
- Probability the next word is within edit distance 1: **3.5%** (English 1.0%, Latin 0.3%).
- The vocabulary is a dense similarity network: **87.4% of Voynich word types have another type at edit distance 1** (mean 7.1 neighbours), vs ~54% (1.1–1.9 neighbours) for size-matched Latin/English vocabularies.

This is precisely the fingerprint predicted by Timm & Schinner's "self-citation" generation hypothesis (*Cryptologia* 2020): a scribe producing each new word by lightly mutating words already on the page. It is *not* what natural-language lexicons look like.

**Verdict the data supports.** The Voynich text is neither random nor a plain language under simple substitution. It is Zipfian but hyper-repetitive, layout-sensitive, internally self-copying, and statistically non-stationary (A→B drift). Three families of explanation survive: (a) a structured pseudo-text generated by procedure (Timm–Schinner; Rugg's grille is a cruder variant); (b) an encoding that massively inflates redundancy (e.g., a verbose/null-heavy cipher — though no known 15th-c. cipher class matches all constraints); (c) an exotic orthography of a natural language with heavy phonetic repetition — strained by findings 4 and 5. Any future "decipherment" that does not *quantitatively reproduce* the six numbers above can be dismissed on arrival. That is a genuinely useful discovery: **this dataset is a falsification machine for Voynich claims.**

---

## Part 2 — Linear A: a precise diagnosis of why it resists decipherment

Computed from the full GORILA-based digital corpus:

- **1,722 inscriptions across 53 sites** — but brutally concentrated: Haghia Triada alone holds 1,110 (64%).
- **Only ~8,967 sign tokens in the entire script's surviving record** (322 distinct signs incl. logograms/variants), plus 1,581 numeral tokens. For comparison: a single page of this report carries more text.
- **85% of all sign-group "words" (1,370 of 1,606 types) occur exactly once.** A decipherment needs repeated words in varied contexts; Linear A barely has any.
- Only **31 words (1.9% of the lexicon) appear at 3+ sites.**
- 21% of inscriptions carry numerals; the corpus is dominated by accounting tablets.

What the data *does* show, clearly:

- The most frequent meaningful word is **KU-RO** (22×) — written at the bottom of lists whose numbers sum correctly, so its meaning ("total") is effectively *proven arithmetic*, not speculation; **KI-RO** (8×) marks deficits. We can read Minoan bookkeeping mechanics without reading Minoan.
- The only stable cross-site vocabulary is religious: **A-TA-I-\*301-WA-JA** (8×, 5 sites) and **JA-SA-SA-RA-ME** (5×, 5 sites) — the "libation formula" recurring verbatim on stone vessels from Petsophas to Zakros. A standardized liturgy existed across Crete c. 1700–1450 BCE; these 31 multi-site words are the *entire* leverage available for any future phonetic anchor.
- Conclusion with numbers attached: Linear A is not mysterious — it is mostly farm accounting. It is undeciphered because of **data starvation, not method failure**: Ventris had tens of thousands of Linear B sign occurrences plus internal grids; Linear A offers ~9,000 signs, one dominant site, no bilingual. Barring new excavation, no method — human or AI — can validate a full decipherment, because 85% of the lexicon is statistically untestable.

---

## Part 3 — Things "we knew" that the evidence has overturned or never supported

1. **"Rongorongo (Easter Island) is a post-contact imitation of European writing."** Radiocarbon dating of four tablets (Ferrara et al., *Scientific Reports*, Feb 2024) put one tablet's wood at **1493–1509 CE — two centuries before first European contact (1722)**. Caveat honestly: dating wood is not dating carving (old-wood problem). But the default story flipped: Rongorongo is now plausibly one of humanity's very few *independent inventions of writing*. ([Nature/Sci Rep](https://www.nature.com/articles/s41598-024-53063-7), [Live Science](https://www.livescience.com/archaeology/undeciphered-script-from-easter-island-may-predate-european-colonization), [Smithsonian](https://www.smithsonianmag.com/smart-news/research-reveals-the-natives-of-easter-island-invented-a-written-language-from-scratch-180983793/))

2. **"The Herculaneum scrolls are destroyed and unreadable."** False since 2023–24: X-ray micro-CT plus machine-learning ink detection (Vesuvius Challenge) reads carbonized scrolls *without opening them*. In 2025 the still-rolled **PHerc. 172 was identified as Philodemus, "On Vices"** — title and author read from inside the char. Hundreds of scrolls await; this is the fastest-moving recovery of lost ancient literature in centuries — driven by code, not shovels. ([Vesuvius Challenge](https://scrollprize.org/), [Oxford](https://www.ox.ac.uk/news/2025-02-07-inside-herculaneum-scroll-seen-first-time-almost-2000-years), [CNN](https://www.cnn.com/2025/05/06/science/herculaneum-scroll-title-author-decoded-intl-scli))

3. **"Linear Elamite was deciphered in 2022."** Overstated. Desset et al. (*Zeitschrift für Assyriologie* 2022) made real progress via royal-name cross-identifications, but the claim of near-complete decipherment is disputed (Jacob Dahl, Oxford; Michael Mäder argues the script is partly logographic, not purely phonographic). Status: partially read, contested. ([Wikipedia summary](https://en.wikipedia.org/wiki/Linear_Elamite), [Desset et al., De Gruyter](https://www.degruyterbrill.com/document/doi/10.1515/za-2022-0003/html))

4. **"The Voynich manuscript has been deciphered" (recurring headlines: Bax 2014, Gibbs 2017, Hauer & Kondrak's "Hebrew" 2018, Cheshire's "proto-Romance" 2019).** Every one is rejected by specialists, and Part 1 shows the structural reason: the text's conditional entropy (2.09 bits) and layout-dependence are incompatible with any of the proposed plain-language readings. The Cheshire paper prompted a rare public rebuttal wave in 2019; nothing has changed since.

5. **"The Indus script debate is settled."** Neither side's confident version survives the data. The corpus is ~4,000–5,000 inscriptions averaging roughly five signs — too short for standard decipherment tests. Farmer–Sproat–Witzel (2004) argue it isn't language; Rao et al. (*Science* 2009) found language-like conditional entropy, itself methodologically contested. Honest state: **unresolved, and likely unresolvable without longer texts.**

6. **The real bottleneck in ancient texts isn't decipherment — it's throughput.** Hundreds of thousands of excavated cuneiform tablets sit unread for lack of Assyriologists, in a fully *deciphered* script. Neural machine translation for Akkadian (Gutherz et al., *PNAS Nexus* 2023) and successor models now make the backlog tractable. The largest body of "lost" ancient knowledge is lost to labor economics, not mystery.

---

## Part 4 — Where discovery is actually possible next

| Target | Why it's tractable | Method |
|---|---|---|
| Herculaneum scrolls (~300 unopened) | Script known, language known, only imaging stands in the way | Virtual unwrapping + ML ink detection (proven) |
| Cuneiform backlog | Script & languages deciphered | NMT at scale + human verification |
| Voynich | Generation *procedure* may be recoverable even if there is no "meaning" | Generative simulation fitted to the six statistics in Part 1; scribe-conditioned analysis (Davis's 5 hands × Currier A/B) |
| Linear A | Blocked on data | New excavation; the 31 multi-site words are the anchor set to watch |
| Rongorongo | Pre-contact origin now plausible | Direct dating of *incisions*/organic residue; corpus statistics on the ~26-glyph core inventory |

**Reproducibility:** rerun everything with `python3 analyze_voynich.py` and `python3 analyze_lineara.py` in this directory. Methods caveats: EVA transliteration imposes one of several defensible glyph segmentations (entropy values shift somewhat under Currier or v101 transliterations, but the cross-corpus *gap* persists in all of them — see Bowern & Lindemann 2021); controls differ in genre; uncertain-space commas treated as breaks; label loci excluded from line-position statistics.

---

## Part 5 — Experiment: can a mechanical procedure write Voynichese? (2026-06-10, second session)

**Question.** If the Voynich text was *generated* rather than written in a language, a simple procedure a 15th-century scribe could execute by hand should reproduce its statistical fingerprint. I built one (`generate_voynich.py`) and fitted it to Currier language B (22,496 words).

**The procedure** (the whole model):
1. A morphological table: every word = prefix + middle + suffix, with slot-to-slot association (e.g. *qo-* attracts *k*-middles). The scribe has favourite combinations (a "sharp" table used ~90% of the time) and occasionally wanders (a "flat" table).
2. Three habits when writing the next word: **reuse** a word already written (42% — split between re-reading the current page and long-term favourites), **mutate** a recent word by re-rolling one slot (17%), or compose a **novel** word from the table (41%).
3. Mechanical layout rules applied at the moment of writing: line-final *-n/-r/-l* becomes *-m* (38% of the time); in paragraph-opening lines *t→p, k→f* (45%) and the first word takes a gallows glyph. The table itself contains no *m*, *p* or *f* — they exist only as positional variants.

**Result** (full-size run, same evaluation battery as Part 1):

| metric | real Currier B | generated |
|---|---|---|
| word tokens | 22,496 | 22,496 |
| Zipf slope | −1.074 | −0.981 |
| h1 char entropy | 3.870 | 3.856 |
| **h2 conditional entropy** | **1.982** | **1.984** |
| word length mean ± sd | 5.09 ± 1.77 | 5.11 ± 1.71 |
| top-100 word coverage | 0.494 | 0.477 |
| adjacent-similarity ratio | 0.980 | 0.982 |
| P(next word identical) | 0.009 | 0.011 |
| P(next word edit-dist ≤1) | 0.033 | 0.024 |
| vocabulary connectivity | 0.872 | 0.925 |
| mean edit-dist-1 neighbours | 6.52 | 7.08 |
| **word types** | **4,590** | **3,192** |
| line-final *m* rate | 0.179 | 0.191 |
| paragraph-line gallows start | 0.851 | 0.815 |

Eight of the ten most frequent generated words (*aiin, qokaiin, ol, qokeey, qokeedy, qokedy, ar, qokain, chedy*) are also in the real top ten. Sample output: `feedy.keedy.oty.oraiin.okiidy.okedy.ofiidy.feedy.chain / dtar.keedy.chetar.kaiin.qokedy.okiidy ...`

**What matches:** conditional entropy (exactly), word lengths, Zipf shape, head concentration, the self-citation signature, the neighbour network, and all the layout effects — the last falling out naturally from treating *m/p/f* as positional variants of *n/t/k*, which is itself a live scholarly hypothesis.

**What doesn't:** the model produces 30% too few distinct word types. The real text sustains a steeper-than-generated Zipf head *and* a fatter tail of one-off words simultaneously. This residual is informative: whatever generated Voynichese had a stronger source of novelty than slot-table sampling — e.g. position-on-page-dependent table choice, or drifting habits over time (consistent with the A→B drift in Part 1). Timm & Schinner's richer self-citation model (*Cryptologia* 2020) closes this gap with full-word mutation chains; mine is deliberately minimal.

**Honest scope:** the table's weights were *learned from* the real text and parameters were *tuned to* the targets — this demonstrates sufficiency ("a procedure this simple CAN produce text this Voynich-like"), not history ("this is what the scribe did"). And reproducing statistics does not prove the manuscript is meaningless; it proves that meaninglessness requires no exotic explanation. The burden of proof now sits with meaning-claims: any proposed decipherment must explain why an 11-parameter mechanical process replicates the text's entire measurable structure.

Artifacts: `generate_voynich.py`, `generation_results.json`.

---

## Part 6 — The decision test: does the manuscript "talk about different things"? (2026-06-10, third session)

**Why this is the pivotal test.** If the Voynich is a meaningless generated artifact (Part 5), its word distribution should be roughly the same everywhere — a stationary process doesn't know what picture is on the page. If it carries content, sections with different illustrations (herbal vs biological vs astronomical) should use systematically different words. Montemurro & Zanette (*PLOS ONE* 2013) found such structure; the criticism was always that it could be explained by **burstiness**, by different **scribes**, or by **drift over time** rather than meaning. I tested it with the controls they didn't have: my own stationary generator cut into the identical page/section shapes, and — decisively — a within-single-scribe comparison.

**Method.** Jensen–Shannon divergence (bits) between the word distributions of page-groups, page-sampled to equal size (25 trials). "Between" = two different sections; "within" = two halves of the *same* section (this baseline absorbs burstiness). Restricted to Currier B to hold "language" fixed.

**Result 1 — real text vs stationary null:**

| corpus (same page/section shapes) | between-section | within-section | ratio |
|---|---|---|---|
| **real Voynich B** | 0.477 | 0.370 | **1.287** |
| my stationary generator | 0.385 | 0.353 | 1.093 |
| English (genuine topics) | 0.372 | 0.338 | 1.101 |
| shuffled real B (floor) | 0.377 | 0.376 | 1.000 |

The real manuscript separates its sections **~3× more strongly than my generator or the English control** relative to the shuffled floor. Word types significantly tied to one section (G²>10.83, p<0.001): **real B = 96, generator = 31, English = 31, shuffled = 4.** The generator — which reproduces every *global* statistic in Part 5 — cannot produce this. So the manuscript carries a structure my "meaningless artifact" model does not.

**Result 2 — the within-scribe control (the clincher).** Using the page-level hand ($H) and illustration ($I) tags, I compared one scribe writing two different sections:

| same hand, same language, different section | between | within | ratio |
|---|---|---|---|
| hand 1: herbal vs pharmaceutical (lang A) | 0.549 | 0.424 | **1.295** |
| hand 2: herbal vs biological (lang B) | 0.545 | 0.374 | **1.457** |
| hand 2: biological vs text-only (lang B) | 0.523 | 0.408 | 1.282 |

A **single scribe's vocabulary shifts with the subject of the page** — the divergence is as large or larger than the overall section effect. This rules out the "it's just different scribes' dialects" explanation. The section keywords are systematic, not random: the biological section is dominated by *qok-/qol-/shedy* forms (*ol* G²=137, *qol* 128, *shedy* 127), the herbal by *ch-/sh-* forms (*chdy* 50, *cheky* 28) — the generator's "keywords" by contrast are gibberish hapaxes (*tshed, olkeeedy*).

**Verdict.** There **is** content-correlated structure in the Voynich text, and it is real — not burstiness (controlled), not scribal dialect (controlled by the within-hand test), not an artifact my stationary generator can fake. **There is something to decipher.** This is the strongest pro-meaning result in this whole investigation, and it directly bounds the Part 5 finding: a generated-artifact theory now *must* explain why a single scribe's word choice tracks the illustrations.

**The one alternative I cannot exclude with this test:** "section" partly coincides with "when it was written," so the effect could be a finer-grained version of the A→B drift (Part 1) rather than aboutness per se — habits changing between quires written months apart. Distinguishing topic from time-drift needs the quire/binding order, which the transliteration doesn't carry. Either way, the stationary-artifact reading is falsified: the manuscript is non-stationary in a way that locks onto the illustrated content.

**Where this leaves the decipherment program.** Step 1 says go. The live targets are now: (2) the glyph-ordering / codebook test, and (3) null-stripping for a hidden signal channel — and crucially, **the section vocabularies above are the first place to look for meaning**: the *qok-* cluster genuinely concentrates in the biological pages, so whatever those words encode, they encode something about that subject. That is a real, narrow, evidence-backed lead.

Artifacts: `topic_test.py`, `scribe_control.py`.

---

## Part 7 — The numeral test: Voynich words are built like notation, not like words (2026-06-10, third session)

**Hypothesis tested.** If Voynich words are reference tokens — e.g. sign-value numerals indexing a codebook, the way Roman numerals index a register — then (a) glyphs inside a word should obey a near-fixed global order (as m≥c≥x≥i in Roman numerals), and (b) a glyph should rarely recur in a word once a different glyph has intervened (numerals repeat in runs, *iii*, never split, *i·c·i*). Languages obey neither strongly.

**Method.** For each corpus, infer one global glyph ordering (by mean within-word position), then measure the fraction of within-word glyph pairs that respect it (concordance), plus repeat behavior. Identical pipeline for all corpora; EVA benched glyphs (*ch, sh, cth*...) treated as single units. Positive control: 30,000 actual Roman numerals. Negative control: 30,000 decimal numbers (positional notation has no glyph order — measured 0.525, i.e. chance).

| corpus | order concordance | separated repeats | adjacent repeats |
|---|---|---|---|
| Roman numerals | 0.977 | 0.217 | 0.917 |
| **Voynich (EVA glyphs)** | **0.840** | **0.151** | **0.258** |
| English | 0.645 | 0.182 | 0.101 |
| Latin | 0.641 | 0.489 | 0.115 |
| Decimal digits (chance) | 0.525 | 0.425 | 0.339 |

**Findings.**
1. **Voynich glyph order is far stricter than any language and approaches numeral notation**: 0.840 vs ~0.64 for Latin/English (Currier B alone: 0.863). The inferred order is stable and interpretable: `q < sh < ch < gallows(p,t,f,k) < o < a < s < e < d < i < l < r < y < m < n` — the familiar slot architecture, now quantified as a near-total order. The ordering replicates independently in Currier A and B with only local rearrangements.
2. **Glyph repetition behaves like notation**: repeats concentrate in adjacent runs (*ee, ii*: 25.8% of words, ~2.5× the rate of English/Latin) while separated recurrence is suppressed — length-controlled (5–7 glyph words only): Voynich 0.227 vs English 0.345 vs Latin 0.485. Real words happily recycle a letter after others intervene (*arbor*, *remember*); Voynich words mostly don't.
3. The method's sanity checks pass: it recovers the true Roman order m<d<c<l<x<v<i from data alone, and correctly finds *no* order in decimal digits.

**Interpretation — the three-part synthesis.** Combining the sessions:
- Part 5: the text's global statistics are reproducible by a mechanical slot-table procedure → it is not phonetically written language.
- Part 6: yet its vocabulary locks onto the illustrated subject matter, even within one scribe's hand → it carries content.
- Part 7: and its words are internally structured like ordered symbolic notation, not like pronounceable words.

The reading that satisfies all three at once: **the Voynich text behaves like a reference notation — ordered, numeral-like tokens that point at content, with the key external to the book** (the codebook/nomenclator family of hypotheses). It would explain the low entropy (notation is rigid), the dense near-twin vocabulary (neighboring index values), the topic-locking (different subjects cite different ranges of entries — e.g. the *qok-* cluster concentrating in the biological section), and why no linguistic decipherment has ever worked (there is no language *in* the text to find). Caveat honestly stated: high concordance is also consistent with the meaningless slot-procedure reading; what tips the balance toward "references with content" is Part 6, not this test alone. Historical footnote: cipher nomenclators (code lists with symbol indexes) are documented from the mid-1400s in Italy — the manuscript (1404–1438) would be early, but the form was emerging in exactly that world.

**Consequence for decipherment.** If this reading is right, the text is *exactly as readable as a list of catalog numbers without the catalog* — full decipherment requires finding the external key (an archives problem: Rudolf II's court inventories, the Kircher correspondence, Bohemian/Italian library registers), not more cryptanalysis. The internal route that remains open: mapping word-*clusters* to referent *domains* via the illustrations (Part 6 already gives the first entries: *qok-/qol-/shedy* family ↔ the biological/balneological subject; *ch-/sh-* forms ↔ plants), i.e. reading the book's index structure without its dictionary.

Artifacts: `ordering_test.py`.

---

## Part 8 — The label test: a prediction fails, and a new puzzle is quantified (2026-06-10, third session)

The 1,138 words written directly on drawings (stars, plants, jars, nymphs; 817 types, 57 pages) are the only place where text is physically bound to a depicted *thing* — the closest the manuscript comes to a Rosetta situation. Three measurements:

**1. Labels are a distinct formal register, with domain formats.** 54% of labels begin with the glyph *o* (running text: 21%); *ch-* openings collapse from 18% to 7%. Zodiac star labels overwhelmingly carry *ot-/ok-* prefixes (16% of same-page label pairs share their first two glyphs vs 7.4% across sections): *otaly, otalchy, okala, okody...* Labels are not words sampled from the text — they follow their own format conventions, varying by section. (Looks like a classifier/prefix system: *o(t)-* ≈ "star-".)

**2. The sequential-catalog prediction FAILS.** If labels were *adjacent index numbers* (items drawn together = neighbouring catalog entries), same-page labels should be measurably more similar than same-section labels from different pages. They are not: normalized edit distance 0.769 (same page) vs 0.765 (same section, different page) — ratio 1.006, i.e. zero page-level effect, while the running text shows its usual self-citation signal (0.960) under the identical estimator. Label similarity is organized at the *section/domain* level, not the page level. The strict "sequential codebook" version of the reference hypothesis is refuted; what survives is the weaker, historically standard form — **nomenclator-style codes: domain-formatted but arbitrarily assigned** (real 15th-century cipher code-lists assigned symbols arbitrarily within format classes).

**3. New quantified puzzle: the text does not name its own pictures.** Only 12.4% of labels appear anywhere in their own page's running text — chance level is 9.7%. And 53% of label types never occur in the entire running text at all. For a genuine herbal/astronomical treatise this is bizarre: a book *about* its illustrations that essentially never mentions them by their written names. Combined with Part 6, the picture sharpens: the text co-varies with subject *domain* (vocabulary families shift with illustration type, even within one scribe) but shows **no item-level linkage** between text and depicted objects.

**Updated balance of evidence.** The item-level meaning tests (page-adjacent indexes, label echo) have now come back negative, while the domain-level structure tests (Part 6 topic-locking, Part 7 notation ordering, label format classes) come back strongly positive. Two readings remain standing, and they are closer to each other than either is to "natural language" or "pure noise":
- **(a) Nomenclator-like reference text**: domain-formatted arbitrary codes, key external — explains domain structure and notation form; must shrug at the label-echo absence (text cites entries other than the pictured items' codes).
- **(b) Domain-conditioned formal generation**: scribes ran the procedure with per-section conventions (different table settings for star pages vs plant pages, a special label mode) — explains everything measured so far, at the cost of making the "meaning" purely conventional/decorative.

What both readings agree on — and what this investigation has effectively established across Parts 5–8: **the Voynich manuscript is a formal system with domain-aware conventions, not a phonetically written language, and its meaning (if any) is not recoverable from the text alone.** The honest discovery of the label test is the 12.4%-vs-9.7% number: whatever the book is, it is not a book that talks about its own pictures.

Artifacts: `label_test.py`.

---

## Part 9 — The toolkit goes to the Indus Valley, and the Voynich's capacity paradox (2026-06-10, fourth session)

### 9a. The Indus script, placed on a calibrated ruler

The "is the Indus script writing at all?" war (Farmer–Sproat–Witzel 2004 vs Rao et al. 2009) has run for two decades largely without a calibrated instrument. I ran the Part-7 ordering battery — now with a within-text permutation null that corrects for corpus shape — on real CISI data (mayig digitization, 178 Mohenjo-daro inscriptions, 1,002 sign tokens, Parpola sign IDs; small but genuine). Bias-corrected "excess concordance" (observed minus shuffled null):

| corpus | raw | shuffled null | **excess** |
|---|---|---|---|
| Decimal digits (no order) | 0.525 | 0.503 | 0.022 |
| Latin | 0.641 | 0.503 | 0.138 |
| English | 0.645 | 0.505 | 0.140 |
| **Indus (CISI sample)** | 0.809 | 0.594 | **0.215** |
| Voynich (EVA) | 0.840 | 0.504 | 0.336 |
| Roman numerals | 0.977 | 0.502 | 0.475 |

Findings:
1. **The Indus script has strong, real positional grammar** — excess 0.215, ~1.5× natural-language letters. The data shows dedicated positional sign classes: P324 opens texts (77/99 occurrences initial, the famous high-frequency boundary sign), P385 closes them (29/35 final), and P122 — the second-commonest sign — is *never* initial or final in 76 occurrences: a strictly medial connective. Pure emblem heaps shouldn't have a syntax of openers, connectives, and closers. This cuts against the strongest "not writing, just symbols" position.
2. **But the Indus is far less rigid than notation — and notably less rigid than the Voynich.** On the same ruler: language 0.14 < **Indus 0.215** < **Voynich 0.336** < numerals 0.475. The Indus sits on the language side of the midpoint; the Voynich sits on the notation side. Consistent with the mainstream reading of seals as name+title formulas (proper names in flexible order, framed by formulaic openers/closers).
3. Signs almost never repeat within an inscription (11% have any repeat) — as expected for name/title strings, though the 181-sign inventory makes repeat statistics weak evidence either way.

*Caveats: 178 inscriptions is a preliminary sample (CISI M-1–M-199 only); the excess measure is shape-sensitive; this places the Indus on a scale, it does not settle the debate.*

### 9b. The Voynich capacity paradox — the strongest version of the codebook reading

New measurement: at the **word** level, the Voynich is statistically normal. At matched corpus size (32,405 tokens):

| | H1(word) | H2(word|word) |
|---|---|---|
| Voynich | 10.10 bits | 4.44 bits |
| Latin | 10.95 | 3.57 |
| English | 9.01 | 4.51 |

So the manuscript's famous "low entropy" lives entirely *inside* its words (character level, h2 = 2.0); the word *stream* has full natural-language capacity and word-to-word unpredictability essentially identical to English. The book could carry on the order of 10 bits/word × 32k words — room for a substantial text.

This sharpens everything. A **word-level substitution — each plaintext word replaced by a codebook token — preserves exactly the word-level statistics (normal ✓) while replacing character statistics with the notation's own (rigid ✓).** And one specific, historically mundane variant explains our strangest finding: if the codebook is an **alphabetized list with sequential numbering**, then morphologically related plaintext words (Latin *aqua/aquae/aquam*...) receive *adjacent index values* — which would surface in ciphertext as thousands of word pairs differing by one glyph. That is precisely the dense near-twin vocabulary network (87% connectivity) that no natural lexicon shows. It also retro-explains the label-test "failure": plants drawn together don't have alphabetically adjacent names, so same-page labels *shouldn't* be similar — and they aren't (ratio 1.006), while domain-specific registers (a separate star list) would still give the zodiac its shared *ot-/ok-* format — and they do.

Status: this is now a single coherent hypothesis that accounts for every measurement in Parts 1–9 — **a word-level nomenclator over an alphabetized, domain-registered key** — with the generated-artifact reading remaining as the rival that explains the same facts at the cost of "the scribes maintained elaborate conventions for nothing." What would separate them: code-assignment structure (alphabetical-sequential indexes predict that near-twin words should share grammatical contexts at rates above chance), or finding the key. The text has the capacity to mean something. Whether it does is now a question about a second, missing document.

Artifacts: `indus_test.py`; capacity numbers in this section reproducible via the one-liner in the session log.

---

## Part 10 — The grammar test: Voynich suffixes carry real syntax-like function (2026-06-10, fourth session)

**The question.** The codebook reading says the suffix alternations (*-edy/-ey*, *-dy/-y*) reflect plaintext inflections; the generation reading says they're decorative re-rolls. Decisive discriminator: if a suffix is grammatical, swapping it must change a word's *context* in a way that is **consistent across unrelated word families** — the way *cat→cats* changes the following verb regardless of the noun. Decoration cannot coordinate across families.

**Method.** For each suffix pair (s1,s2), take every stem x where both x+s1 and x+s2 occur ≥4 times in Currier B. Compare next-word (and previous-word) distributions of the two forms, pooled across stems. Measure **split-half replication**: split the stems into random halves, compute the form-context log-odds vector in each half independently, correlate (24 splits). Permutation null: shuffle form labels within stems. Controls: English/Latin real morphology (positive), the fitted generator (negative).

**Result:**

| corpus | suffix pair | stems | consistency r | z vs null |
|---|---|---|---|---|
| **Voynich B** | **-edy/-ey** | 64 | **0.554** | **6.3** |
| **Voynich B** | **-dy/-y** | 86 | **0.560** | **4.3** |
| Voynich B | -aiin/-ain | 25 | 0.203 | 1.0 (n.s.) |
| Generated control | -edy/-ey | 62 | −0.004 | 0.0 |
| Generated control | -dy/-y | 89 | −0.005 | 0.1 |
| English | -s/∅ | 168 | 0.498 | 3.8 |
| English | ∅/-ly | 49 | 0.420 | 2.8 |

The Voynich's two main suffix alternations replicate at **English-morphology strength**, while the generator — which matches every Part-5 statistic — scores exactly zero.

**Confound controls (both pass):**
- *Topic*: restricted to a single section, the effect persists independently in both — stars/recipes r=0.43–0.46 (z≈2.4–3.0) and biological r=0.45–0.47 (z≈2.6–2.7).
- *Layout*: excluding line-edge positions, r=0.47–0.50 (z=4.5–5.2).
- *Burstiness/self-citation*: covered empirically by the negative control — the fitted copy process, which does produce suffix runs, produces **no** split-half consistency.

**What the grammar looks like (first extraction).** Pooled across all stems: after a **-dy/-edy** word, the next word is disproportionately *another* -dy-class word (*okedy, otedy, shedy, chdy, qokchedy*) — suffix-class chaining, like agreement or concord. After a **-y/-ey** word, the next word disproportionately begins with *l-/r-* or is *ol/or/chol* (*lkeey, lkain, raiin, l, or*) — a distinct following-class. The suffix choice predicts the *class* of the next word, replicably, in both directions of the analysis.

**What this changes.** This is the first result in the investigation that is positive evidence of **rule-governed sequential structure at the word level** — distributional grammar — inside the Voynich text, surviving topic, layout, and self-citation controls, with a clean negative control. Implications:
1. The pure generation hypothesis, as implemented by any context-free copy/re-roll procedure, is now falsified on a second independent axis (the first was topic-locking). To survive, a generation account must posit context-sensitive suffix rules — at which point the "procedure" *is* a grammar, and the distinction between "elaborate procedure" and "encoded system" collapses.
2. The unified reading from Part 9b strengthens: a word-level code over an inflected language passes plaintext morphosyntax straight through to ciphertext context statistics — exactly what we observe.
3. A concrete decipherment program is now defined: classify the suffix classes by their distributional function (the -dy class chains; the -y class licenses l-/r- words), build the dependency grammar of Voynichese word classes, and match its typological profile (agreement-like chaining, class-conditioned ordering) against candidate language families. That is no longer statistics — that is the entry ramp of an actual decipherment.

Artifacts: `morphosyntax_test.py`, `morphosyntax_controls.py`.

---

## Part 11 — The first induced word-class grammar of Voynichese (2026-06-10, fifth session)

**Method.** Cluster the 236 most frequent Currier-B words by their context distributions alone (left/right neighbours + line position; k-means, k=8) — the algorithm never sees spelling. Then measure class-transition structure (mutual information between adjacent classes, minus a shuffled baseline) and compare the identical pipeline on English, Latin, and the fitted generator.

**Syntax strength (excess class-transition MI):**

| corpus | excess MI (bits) |
|---|---|
| English | 0.639 |
| Latin | 0.248 |
| **Voynich B** | **0.163** |
| Generated control | 0.002 |

Voynichese has genuine word-class syntax — about **two-thirds the strength of Latin** (a free-word-order language), a quarter of English (rigid word order). The generator, which contains the full fitted self-citation process, has *none* (0.002): the third independent axis on which context-free generation is falsified.

**The induced classes are coherent and align with morphology the algorithm never saw:**
- **Class 1 — line-openers** (41% line-initial): *saiin, sar, dair, sol...* — s-/d- forms.
- **Class 0 — line-closers** (38% line-final): *dy, oty, am, otam, dam...* — the *-am/-m* forms. The layout system of Part 1 emerges as two dedicated syntactic classes.
- **Class 5 — the -edy chain class**: *qokeedy, qokedy, otedy, okeey...* — self-chains (37% followed by itself): Part 10's "-dy chaining," now isolated.
- **Class 6 — the -ar/-or class**: *or, otar, okar, char...* — followed by Class 3 **53% of the time**, the strongest rule found.
- **Class 3 — particles**: *ol, aiin, ar, al, y...* — self-chains and feeds Class 2.
- **Class 7 — the qo-…-ain/-al class**: *daiin, qokain, qokaiin, qokal...* — feeds Class 2 (39%).
- **Class 2 — the che-/she- class**: *chedy, shedy, chey...* — feeds Classes 5 and 7.

**Reading of the grammar so far:** a cyclic backbone C2 → C5/C7 → C2..., punctuated by the particle system C6 → C3, with dedicated boundary classes opening and closing lines. Morphological families (prefix/suffix paradigms) map one-to-one onto syntactic behaviour — exactly the signature of an inflected system, and unobtainable from any decoration process tested.

**Cumulative verdict of the investigation (Parts 1–11):** the Voynich manuscript contains a rule-governed symbolic system with (i) notation-like word internals, (ii) real morphosyntax at English-morphology consistency, (iii) word-class syntax at ~⅔ Latin strength, (iv) topic-locked vocabulary, and (v) normal word-level information capacity. It is best described as **an encoded text — most plausibly a word-level code over an inflected natural language — not a hoax and not a transcript of speech.** Its grammar is now partially mapped; its key remains missing.

Artifacts: `wordclass_test.py`.

---

## Part 12 — Typological matching: what kind of language could be underneath? (2026-06-10, fifth session)

**Logic.** A word-level code hides word *identities* but passes word-*stream* statistics through unchanged. So candidate plaintext languages can be screened without reading anything: their word-stream fingerprints must match Voynichese's. Metrics used are all code-invariant (vocabulary growth TTR/Heaps/top-100, adjacent repetition, word-bigram conditional entropy, induced-class MI and self-class chaining), all corpora cut to 19,000 tokens, identical pipeline.

| corpus | TTR | Heaps | top-100 | H2(word) | repeat/1000 | class MI | **chaining** |
|---|---|---|---|---|---|---|---|
| **Voynich B** | 0.209 | **0.873** | 0.511 | 4.18 | **9.3** | 0.176 | **1.34** |
| Finnish | 0.386 | 0.765 | 0.341 | 2.93 | 4.6 | 0.222 | **1.36** |
| Latin | 0.284 | 0.712 | 0.353 | 3.18 | 0.0 | 0.211 | 0.85 |
| German | 0.197 | 0.657 | 0.532 | 3.99 | 0.8 | 0.355 | 0.86 |
| Italian | 0.235 | 0.658 | 0.502 | 3.81 | 3.3 | 0.335 | 0.46 |
| Spanish | 0.204 | 0.648 | 0.555 | 4.02 | 0.2 | 0.375 | 0.57 |
| English | 0.160 | 0.581 | 0.582 | 4.13 | 0.8 | 0.641 | 0.57 |

**Findings.**
1. **No tested language matches the whole fingerprint** — overall z-distances are all large (German 4.1 … English 5.9). The Voynich word stream is *not* a plain word-for-word encoding of Latin, English, or a major Romance/Germanic language. (Another nail for the Latin/Romance "translation" claims.)
2. **The distinctive metrics point one direction: agglutinative-type.** On the three measures where languages really differ, Voynichese pairs with Finnish — near-linear vocabulary growth (Heaps 0.87 vs 0.77; all others ≤0.71), tolerance of adjacent repetition (9.3 vs 4.6 per 1000; Romance/Latin ≈ 0–3), and above all **class self-chaining: Voynich 1.34, Finnish 1.36, every other language ≤ 0.86.** Same-class chaining is characteristic of suffix-heavy agglutinative word streams (case-agreement runs, reduplication); fusional and analytic languages avoid it.
3. **The residuals fit homophonic coding.** Voynich exceeds every language in vocabulary growth and repetition. Standard 15th-century nomenclators assigned *multiple* interchangeable codes to common words (homophones) — which inflates TTR/Heaps exactly this way while preserving class structure. A homophonic word-code over a suffixing, agglutinative-type language is the single description most consistent with all twelve parts of this investigation.

**Caveats.** Six languages, one text each, genre uncontrolled; "agglutinative-like" is a typological direction, not an identification. The proper next screen: Turkish, Hungarian, Hebrew, Arabic, Persian, Greek, Czech, Nahuatl corpora — the framework is built and each new candidate costs minutes. Notable context: a Turkic plaintext has been argued by some independent researchers; this analysis neither confirms any specific claim nor rules it out — it says the *type* is the right neighbourhood to search.

Artifacts: `typology_test.py`.

---

## Part 13 — The century's candidate languages, finally screened on one instrument (2026-06-10, sixth session)

Extended the code-invariant word-stream screen to the languages people have actually proposed as the Voynich plaintext: now 13 candidates (added Hungarian, Turkish, Hebrew, Arabic [Quran, dediacritized], Persian, Greek, Czech), all at 19,000 tokens, identical pipeline, script-filtered.

**Overall distance ranking (closer = more Voynich-like):** Finnish 4.68 < Greek 4.73 < Hungarian 4.85 < Czech 4.98 < German 5.02 < Italian 5.04 < Turkish 5.34 < Latin 5.46 < Persian 5.64 < Hebrew 5.70 < Spanish 5.72 < Arabic 5.91 < **English 6.99 (decisively last)**.

**Findings.**
1. **The top tier is uniformly the inflection-rich club** (Finnish, Greek, Hungarian, Czech) — and the bottom tier is exactly the languages most often "deciphered" in popular claims: English dead last, Spanish/Arabic/Hebrew near it. A century of Anglophone and Romance "translations" was looking in the statistically *worst* corner of the candidate space.
2. **The chaining club is small and telling.** Class self-chaining above 1.0 occurs only in: Finnish 1.36, **Voynich 1.34**, Hebrew 1.20, Turkish 1.00. Everything Romance/Germanic sits at 0.46–0.86. Whatever underlies Voynichese shares this signature with agglutinative and unvocalized-Semitic word streams only.
3. **Greek is the surprise runner-up**, nearly matching Voynichese's extreme vocabulary-growth exponent (Heaps 0.862 vs 0.873 — the two highest values measured).
4. **No language matches the full fingerprint.** The Voynich combines English-level word-bigram entropy (4.18) with Finnish-level chaining (1.34) and an adjacent-repetition rate (9.3/1000) double the most repetition-tolerant natural language tested. That specific triple exists in no natural word stream we measured. The encoding layer itself (homophones inflate bigram entropy; code-table reuse creates repeats) remains the most economical explanation for the residual — but honesty requires noting the homophone account does not cleanly resolve the TTR direction for agglutinative candidates; the repetition excess is a genuine open anomaly.

**Bottom line of the screen:** if a natural language underlies the Voynich, its word-stream type is **inflection-rich and chain-tolerant — Uralic/Greek/Slavic/Turkic territory, emphatically not English, Romance, or Arabic.** The framework makes any further candidate (Nahuatl, Mongolian, Georgian, dialectal German…) a minutes-long test.

Artifacts: `typology_test.py` (extended), corpora in `data/typo_*.txt`.

---

## Part 14 — The repetition anomaly dissolves — and points at what kind of text this is (2026-06-10, sixth session)

**Setup.** Part 13 left one unexplained residue: adjacent-word repeats (*qokeedy qokeedy*) at 9.4/1000 — double the most repeat-tolerant language tested. Two live readings — scribal dittography and grammatical reduplication — both predict a genuine *adjacency preference*. The correct null model distinguishes them from a third possibility: shuffle words *within each line* (preserving each line's word composition exactly) and see if adjacency survives.

**Result: there is NO adjacency preference at all.** Observed adjacent repeats: 188. Within-line-shuffle null: 193.5 ± 10.1 (z = −0.5). Run-length distribution (174 doubles, 7 triples) also matches the null. Given which words occur in a line, their adjacency is pure chance.

**So the real phenomenon is line-level multiplicity:** Voynich lines contain multiple copies of the same word at rates no natural prose shows — the famous *qokedy qokedy dal qokedy qokedy* is a line containing four *qokedy*; that they sit adjacent is incidental. The burst words are overwhelmingly the *qok-/-eedy* family, strongest in the biological section (12.2/1000 vs 7.6–8.4 elsewhere).

**Implications.**
1. **Dittography (copying error) and reduplication (grammar) are both eliminated** — each predicts adjacency preference; there is none.
2. **What repeats words many times within one short line is not prose. It is lists, recipes, tallies, litanies — record-like text.** This converges with everything else known about the line: dedicated line-opener and line-closer word classes (Part 11), the line-final *m* termination (Part 1), line as functional unit. The Voynich line behaves like a **self-contained record** — an entry in a compendium — not a sentence in running discourse.
3. A mild new lead, flagged cautiously: in near-repeat pairs (edit distance 1), the second occurrence trends *shorter* (219 vs 180, p≈0.06) — reminiscent of scribal abbreviation-on-repeat or anaphoric reduction. Worth a dedicated test someday.
4. The typological screen of Part 13 should be re-read in this light: comparing the Voynich to *novels* understates the match — the right comparison corpora are **inventories, herbals-as-recipe-lists, liturgical litanies** — genre, not just language, drives several of the residuals.

**Updated bottom line:** the Voynich text is best modelled as **an encoded compendium of short records** — recipe-like or list-like entries, one per line, in a word-level code over an inflection-rich language type. Every previously "weird" statistic now has a place in that single description: low char entropy (code format), near-twin vocabulary (alphabetized/homophonic key), topic-locking (different record domains), grammar (the plaintext's), line-unit structure and within-line repeats (records and tallies), label format classes (register prefixes).

Artifacts: `repeat_test.py`.

---

## Part 15 — Genre control: the records conjecture survives structurally, fails statistically (2026-06-10, seventh session)

Part 14 conjectured that record-genre text (recipes, lists) would close the Voynich's residual statistical gaps. Tested with a real recipe compendium — *The Closet of Sir Kenelm Digby Opened* (1669, 95k words) — against the English novel corpus, same language, same pipeline, matched 19,000 tokens.

**Result: mostly refuted.** Genre closes the vocabulary-spread gaps (TTR: 97% of the novel→Voynich gap closed; top-100 coverage: +40%; class-MI: +17% — recipe text is more list-like and less syntactic, as predicted). But the three *distinctive* Voynich signatures do not move:
- **Adjacent repetition**: recipes have *fewer* repeats than novels (0.05/1000 vs 0.6 vs Voynich's 9.3) — English recipe prose avoids repeating nouns via pronouns and ellipsis ("take it...", not "take water water").
- **Class chaining**: unchanged (0.57 vs novel 0.56 vs Voynich 1.34).
- **Vocabulary growth exponent**: unchanged.

**What survives and what it means.** The line-as-record model keeps its *structural* support (line-boundary classes, line-final termination, self-contained lines) but the within-line word-multiplicity is now shown to be NOT a generic record-genre feature — ordinary recipe language solves repetition with anaphora. The multiplicity signature therefore points at something more specific: (a) enumerative/tally-like text where the item is restated rather than pronominalized, (b) a plaintext language/register avoiding anaphora, or (c) the encoding layer (a scribe reusing the same code token from working memory — notable because *strict* homophone discipline would produce the opposite). This narrows the open question usefully: **the multiplicity is the single most diagnostic unexplained feature of Voynichese**, and any candidate plaintext-plus-code model must reproduce it.

Artifacts: `genre_test.py`, `data/genre_digby.txt`.

---

## Part 16 — The burst-decay test: word recurrence respects the line, and the process explanation dies (2026-06-10, seventh session)

**Question.** Part 15 left three explanations for the within-line word multiplicity: (a) tally-like enumeration, (b) anaphora-poor plaintext, (c) scribal working-memory reuse of code tokens. (c) is a *process* account: it depends on token distance and cannot see line boundaries. (a)/(b) are *content* accounts: the line is a record about something, so recurrence should stop at the line break. Test: at the **same token distance d**, compare P(word recurs) for same-line vs line-crossing pairs — with the generator (a pure distance process) and English (arbitrary lines) as controls.

**Result (recurrence per 1000 pairs, matched distance):**

| | same line / cross line at d=1 | mean elevation ratio (d=2–8) |
|---|---|---|
| **Voynich B** | **9.4 / 1.7** | **1.32** |
| Generator | 10.1 / 10.4 | 1.00 |
| English | 0.7 / 1.2 | 0.96 |

1. **Voynich recurrence is line-bound.** At identical token distance, a word is ~1.3× more likely to recur within its own line — and adjacent repetition across a line break collapses to 1.7/1000 vs 9.4 within lines (5× drop). The generator — same global statistics, distance-based copying — shows exactly no line effect (1.00), proving the measurement separates the hypotheses.
2. **The working-memory / lazy-homophone account is refuted.** A scribe's memory doesn't reset at line breaks; the recurrence does.
3. **The line-as-record model is confirmed by a second independent signature** (after the boundary word-classes of Part 11): whatever a line is "about," it names repeatedly *inside the line* and stops naming at the break. Cross-line recurrence settles to a flat page-level base rate (~7–8/1000) out to d=24 — page-level topic, line-level record.
4. New micro-structure observed: within-line recurrence peaks at d=2, not d=1 — patterns of the form *X·a·X* (the word restated with one word between) are the favourite — enumerative/formulaic alternation, exactly the texture of tallies and litanies.

**State of the model after sixteen parts:** an encoded compendium where the **page is a topic, the line is a record, and the record restates its subject** — written in a word-level, domain-registered code (alphabetized and at least partly homophonic) over an inflection-rich plaintext type. The strongest surviving rivals are no longer "hoax" or "language transcript" — both long since falsified — but variants within that one description.

Artifacts: `burst_test.py`.

---

## Part 17 — Breaking the letters assumption: the e/i runs are a free operator, not spelling (2026-06-10, eighth session)

**The assumption everyone shared** — every researcher for a century, and Parts 1–16 of this report: *glyphs are letters, words are words.* Tested tonight: the hypothesis that the repeated-glyph runs inside words (*qokedy / qokeedy / qokeeedy*; *dain / daiin / daiiin*) are not vowels but an **embedded variable** — tally strokes / a quantity-or-grade dial attached to an item code.

**Discriminator.** In any language, spelling is lexically fixed: a frame X_Y around a vowel run admits one run length (each word has its spelling). If the run is a free variable, the same frame should occur with lengths 1, 2, 3 freely — even within a single line, on the same item. Controls: English (letter doubling), **Finnish — a language with genuinely contrastive phonemic vowel length** (tuli/tuuli are different words: the hardest-case control), and the generator.

**Results (flexibility = weighted normalized entropy of run length per frame; 0 = fixed spelling):**

| corpus | channel | flexibility | frames with ≥2 lengths |
|---|---|---|---|
| **Voynich B** | e-runs | **0.406** | **136 / 163** |
| **Voynich B** | i-runs | **0.504** | **44 / 45** |
| English | e/o/l | 0.000–0.014 | ~1% of frames |
| Finnish | a/i/e | 0.026–0.029 | ~8% of frames |
| Generator | e/i | 0.334–0.428 | (free by construction) |

And within a single line: when the same frame recurs, the run length **differs 32–35% of the time** in the Voynich; in English, **0.00%** (458 recurrences, zero variation).

**Reading the result with Part 10.** The run-length slot is (a) not lexically fixed — 15× more flexible than even a true phonemic-length language; (b) freely re-set *within one line on the same item*; yet (c) — from Part 10 — its setting covaries with context *systematically across all word families* (consistency r≈0.55; the generator, which is flexible but random, scores 0.00). A slot that is free per item, varies within a record, and is context-governed corpus-wide is not spelling, not phonology, and not decoration. **It is an operator: a meaningful dial attached to item codes.** The unary form (1–3 strokes, never more — e-runs: 67%/31%/2.5%) and the record/tally structure of lines (Parts 14/16) tilt the interpretation toward **quantities or grades** — the way a ledger writes "item, two; item, three" — though a case-like grammatical operator cannot be excluded without an external anchor.

**Status.** This is the single most decipherment-relevant structural finding of the investigation: it factors every Voynich word into **item-code × operator-value**, collapses thousands of "different words" into far fewer items (directly addressing the Part 5 type-count residual), and gives any future key-search a two-part target. It is also the first result here that genuinely abandons the letters-and-words frame everyone shared.

Artifacts: `quantity_test.py`.

---

## Part 18 — Stage-one decipherment attempt: real ground gained, one hypothesis killed by its own control (2026-06-10, eighth session)

Goal: turn the Part-17 operator finding into actual reading. Four moves, reported with the failures intact.

**1. Lexicon factoring — honest size.** Collapsing e/i operator-runs merges only **13%** of the surface lexicon (4,584 → 3,995 codes). Correction to any earlier impression: operators do *not* dramatically shrink the vocabulary — 3,519 item codes occur in a single form; only ~470 codes carry 2+ operator values. The operator is real (Part 17) but it is one modest sub-system, not the master key. *Most Voynich word distinctions are in the item code, not the operator.*

**2. Proto-lexicon with domain attribution (the genuine asset).** Each item code carries a strong, measurable domain bias — e.g. code *qokedy* → biological section at G²=398 (overwhelming), *shedy* → biological G²=331, *or/ar* → herbal, *ain/chey* → stars-recipes. This is real *domain-level reading*: we cannot say what *qokedy* means, but we can say with high confidence it is vocabulary of the bathing/biological subject matter. A 15-entry proto-lexicon with confidence scores is now in hand.

**3. KU-RO "totals" test — FALSIFIED by control.** Hypothesis (from Linear A): a line-final word states a total, so its operator value should track the line's values. Raw signal looked strong (final-word value vs line-body mean: r=0.167, z=7.5). **But the control demolishes it:** the *first* word (r=0.174) and a *random interior* word (r=0.182) correlate just as much. The final position is not special — there is no summation word. What's actually there is mild **within-line autocorrelation of operator values** (adjacent r=0.116 vs 0.075 shuffled): a record tends to sit at a coherent "value level," but it is not a tally that sums to a total. Reporting the kill matters as much as a hit — this is the discipline this project applies to others' claims, applied to its own.

**4. Zodiac day-number test — negative.** If star/nymph labels on zodiac pages (≈30 per page, plausibly the 30 degrees/days) encoded numbers, their operator values would spread like a count. They don't: 47% sit at value 0 and the distribution mirrors running text — no day-number signal.

**Where this honestly leaves "decipherment."** Stage-one *structural* decipherment advanced: the word is factored (item-code × operator), the operator is characterized, item codes are domain-tagged, and two specific semantic hypotheses (totals, day-numbers) were tested and rejected under control. Stage-two *semantic* decipherment — assigning meanings — remains blocked by the same wall as always: no bilingual, no external key, and the within-text anchors (labels↔pictures, totals) keep failing because the encoding severs item codes from their referents. The most defensible "reading" achievable from the text alone is **domain-level** (this word belongs to the plant register / the bathing register), not lexical. A true reading still requires the external key — an archival object, not a computation.

Artifacts: `factorize.py`.

---

## Part 19 — Reading the record template: the shape of a Voynich entry (2026-06-10, eighth session)

The maximum decipherment the text alone permits: not *what* an entry says, but its fixed *form* — the slot-grammar of a line, recovered from induced word-classes by position. A line's class is **12× more predictable from its position than in English or the generator**:

| | word-class predictable from line-position (MI, bits) |
|---|---|
| **Voynich B** | **0.051** |
| English | 0.004 |
| Generator | 0.003 |

This is decisive: Voynich lines are **strongly templated** — they have a fixed internal form — where English prose and the generator have essentially none. The recovered template (class enrichment by position):

- **OPENER slot** — class C1 (*saiin, sar, dair*; s-/d- forms): 3.1× enriched at line-start, depleted everywhere else. Every record begins with a header-type word.
- **BODY** — classes C2/C5 (*chedy, qokeedy*; the -edy chain) dominate the early-middle; C4 (*keedy*) spikes in the second slot; C6 (*or, otar*) and C3 (*ol, aiin*) fill the late-middle.
- **CLOSER slot** — class C0 (*dy, oty, am*; the -y/-am forms): **3.0× enriched at line-end**, strongly depleted at the start.

So a Voynich record reads, structurally: **[header] [body: item-codes carrying operator-values] [terminator].** That is the exact skeleton of a *catalogue/recipe entry* — "Subject: ⟨items with quantities⟩. ⟨end-mark⟩" — recovered with zero lexical knowledge, purely from positional grammar. It is the strongest structural confirmation of the records model (Parts 14/16) and, combined with Part 17, yields a complete **grammar of an entry** even though the words remain unread.

**This is the ceiling of text-only decipherment, and we have reached it.** We can now state the form of every line, the class of every word, the operator on every item, and the domain of every item code — everything *except* the lexical key, which the encoding deliberately externalised. Going further is not a harder computation; it is a different kind of search (an archive). The investigation has deciphered the manuscript's **structure** completely while proving its **semantics** unrecoverable from the text in isolation — which together constitute the real, defensible answer to "what is the Voynich manuscript and why can't we read it."

Artifacts: `template.py`.

---

## Part 20 — Reading the one legible channel, and the honest limit (2026-06-10, eighth session)

**What can actually be read.** With item-codes locked behind the external key, the *operator channel* is directly legible. A sample folio (f111r, stars/recipes) reads, structurally:

```
record 1:  kcholchdar  shar  aip·2  chepchedy·2  chetalshy·1  shek·2  shear·1  shey·1  ror  am  shey·1
record 4:  sain·2  otedy·2  qokey·2  dain·2  okedal·1  chedy·1  qokedy·1  l  shedy·1  okey·2  lor  ar  am
```
— every word resolved into ⟨item-code⟩·⟨value⟩. The item-codes we cannot read; the values we can.

**What the value channel turns out to be — a correction.** Part 17 leaned toward "quantities/tallies." The full distribution refutes that lean: values are a **closed set {0:38%, 1:34%, 2:25%, 3:3%, ~0 above}** — entropy 1.74 bits. Real quantities/counts have a long tail (1,2,3,5,10,20…); this has a hard ceiling at 2–3. A small closed set that is context-governed across word families (Part 10), free per-item within a record (Part 17), and section-varying (mean value: stars/recipes 1.05 > biological 0.85 > herbal 0.78) is the profile of a **grammatical operator — a number/case/grade inflection — not a tally.** This tightens the model: the plaintext is inflection-rich (matching the Part 13 typology screen), and the operator is one of its inflections, made visible because the encoding renders it as glyph-runs rather than hiding it in the item-code.

**So the readable layer is the grammatical skeleton, not the content.** We can read, across the whole manuscript: the form of each record (Part 19), the class of each word (Part 11), and now the inflectional value on each item (this part) — the function-word/morphology scaffold. What stays sealed is the content-word lexicon, encoded word-for-word with an external, homophonic key. It is exactly like recovering a redacted document in which only the grammar shows through: *⟨header⟩ ⟨item⟩-2 ⟨item⟩-2 ⟨item⟩-1 … ⟨end⟩* — you can read the bones, never the nouns.

**The principled limit, demonstrated not asserted.** Across eight sessions the content layer was attacked from every internal angle — label↔picture (Part 8), sequential index (Part 8), totals/KU-RO (Part 18), day-numbers (Part 18), frequency structure (Parts 5–13) — and each failed under control. Breaking a word-level homophonic code over an unknown specialist register requires one of: the key document, a crib, or clean 1:1 frequency structure — and the manuscript supplies none. Producing content-word "meanings" from here would require inventing a key, i.e. fabrication. **The structural decipherment is complete; the semantic decipherment is, from the text in isolation, provably out of reach.** That conjunction is the real, defensible answer to the 600-year question — and the redirect it implies (find the key in the archives: Rudolf II inventories, Kircher correspondence, period nomenclators) is the only honest path to the words themselves.

Artifacts: extraction inline; reproducible from `factorize.py` + `ZL3b-n.txt`.

---

## Part 21 — Reverse-engineering the design: the manuscript is a constructed notation, not an enciphered language (2026-06-10, ninth session)

**The reframing.** A century of failure spans three frames — transcribed language, enciphered language, meaningless. Our data rejected all three. The mistake (mine included, Parts 9–20) was forcing the residue into "enciphered language with a lost key." But one finding contradicts even that: **layout-dependent spelling** (Part 1). A scribe enciphering a pre-existing plaintext cannot produce word-forms that vary with line position — the plaintext doesn't know where ciphertext lines will break. Conclusion: **the text was composed natively in this system, on the page. No plaintext draft ever existed.** It is therefore not a cipher *of* a language but a **constructed compositional notation** — meaning carried by the word-slots themselves (prefix·mid·ending·operator in near-fixed order), the way a chemical formula or a Lullian combinatoric wheel carries meaning.

This frame fits every prior finding without strain: near-twin vocabulary (Part 1) = *similar things get similar codes by design* (a taxonomy); grammar (Parts 10–11) = the notation's composition rules; "no language matches" (Part 13) = there is no language underneath; record templates (Part 19) = entry schemas. And it is period-appropriate and innocent: Ramon Llull's combinatoric letter-notation was the live intellectual technology of 1400s Europe.

**The tractability consequence is decisive.** A nomenclator needs a lost ~4,000-entry dictionary — hopeless. A slot-notation needs the meaning of its components: **~10 prefixes, ~40 mids, ~20 endings, 4 operator values ≈ 70 symbols.** That is a crackable keyspace, anchored to visible illustrations.

**A specific, testable mechanism for the operator — the Galenic degree.** Medieval pharmacy classified every herb on a closed scale: hot/cold/dry/moist in **degrees 1–4**. Our operator is a closed {0,1,2,3} scale (Part 20) attached to items. Prediction: if the operator encodes a property *of the thing described*, then herbal pages (one plant each) should carry **plant-specific operator profiles on the same item-codes**, beyond word choice. Test on 124 herbal pages:

| test | observed across-page variance | permutation null | z | p |
|---|---|---|---|---|
| raw page-mean operator value | 0.0510 | 0.0102 ± 0.0014 | **30.2** | <0.001 |
| **same-item residual (word-choice controlled)** | 0.0011 | 0.0007 ± 0.0001 | **5.1** | **<0.001** |

The raw effect is huge but mostly topic (different plants → different words). The decisive test subtracts each item-code's corpus-mean operator value first — and a **page-specific operator signal survives at z=5.1, p<0.001**: the *same* item-code is written with systematically different operator values on different plant pages. That is exactly the signature of a **per-subject graded property** — degree-like, as the Galenic hypothesis predicts — not an artifact of which words appear. (Honest calibration: the residual effect is small in absolute terms; most page variance is topical word-choice. But the controlled signal is real and directional.)

**Status — this is the first frame that is both consistent with all 21 parts AND tractable.** It does not yet read a word, and I will not claim it does. But it converts the problem from "find a lost 4,000-word dictionary" (archival, hopeless) to "assign meanings to ~70 compositional symbols against visible referents" (analytic, hard but finite) — and it yields its first confirmed semantic hook: the operator behaves like a graded property of the depicted subject. That is a genuine, falsifiable research program, not a fabrication and not a dead end.

**Immediate next tests (defined, runnable):** (1) do prefix classes partition items into illustration-coherent groups (plant-part? body-system?) on the balneological pages where referents are explicit? (2) does the ending-class correlate with grammatical role in the record template (Part 19)? (3) does operator value on the *zodiac* pages track the figure's position in the sequence (degree of the sign)? Each is a direct test against visible pictures.

Artifacts: extraction inline; reproducible from `factorize.py` + `ZL3b-n.txt`.

---

## Part 22 — Assigning functions to the slots: reading the notation's grammar (2026-06-10, ninth session)

Three referent-anchored tests on the slot system, to assign *function* (not yet content) to each slot.

**Slot 1 — PREFIX = classifier (category marker).** Prefixes specialize hard by section/referent-domain (G² up to 1389), not evenly: *ch-* is 50% herbal (G²=1389), *sh-* 50% herbal, *qok-* peaks balneological (G²=1176), *ol-/she-* balneological, *ot-/a-/l-* stars-recipes. The initial slot marks *what kind of thing the item is* — the signature of a classifier morpheme. (Honest caveat: partly entangled with topic vocabulary; the clean control — do prefixes sub-partition items *within* one section — is the next test. But it is the *prefix* slot specifically that carries the load, as a classifier should.)

**Slot 4 — OPERATOR = graded property of the subject, but NOT a sequential degree.** Part 21 showed it tracks the depicted subject (z=5.1). New test kills one specific reading: operator value vs a zodiac label's position around the medallion (the "degree of the sign / day-number" hypothesis) gives **mean r = −0.038 across 12 pages** (scattered ±) — no sequential structure. So the operator is a *static graded attribute* of the item (consistent with a Galenic-degree-like classification), not an ordinal position/count. A clean negative that removes the tally/day-number reading for good.

**Slot 3 — ENDING = grammatical role (position-linked).** Endings map cleanly to record position: record-**initial** words favor *-aiin/-ol/-or* (and avoid *-y/-am*); record-**final** words favor *-y* (20%), *-am* (11%), *-dy*, and the bare stem. The ending inflects the item for its **role in the entry** (opener vs closer/terminator), matching the template of Part 19.

**What this yields — a functional gloss, honestly bounded.** We can now read a Voynich word's *structure* in functional terms, even without its lexical content. Example, *qokeedy* parsed: **[qok- : balneological-class item] [-ee- : operator value 2 / "degree 2"] [-dy : medial-role inflection].** Read across a line, *sain·2 otedy·2 qokey·2 dain·2 …* becomes *[opener] [class-X item, grade 2] [class-Y item, grade 2] …* — the **schema of an entry, with category and grade filled in, the specific referent still sealed.**

This is the real ceiling, sharpened: the notation's **grammar is deciphered at the functional level** — every word resolves to ⟨category⟩⟨grade⟩⟨role⟩ — while the leaf-level identity of each category (which plant, which humor) requires pinning ~10 prefix-classes and ~40 mid-stems to referents via the illustrations. That is the finite, picture-anchored program the constructed-notation frame opens, and three of its slots now have confirmed functions. Two specific content-hypotheses (day-numbers, sequential degree) were tested and rejected en route — the discipline that separates this from fabrication.

Artifacts: extraction inline; reproducible from `factorize.py` + `ZL3b-n.txt`.

---

## Part 23 — The leaf-level test: the herbal has no plant-names — it describes by reused attributes (2026-06-10, ninth session)

Attempted the first leaf-level content reading: if the herbal text *names* its plants, each one-plant page should carry a signature content-morpheme appearing mostly on that page (like a name). Test on 128 herbal pages, isolating the **mid-stem** (prefix-classifier and operator stripped) as the content slot.

**Result — a structured negative.**
- Mid-stems are ~2× more page-concentrated (0.114) than the prefix-classifier (0.055) or grammatical ending (0.077) — confirming the mid slot carries the most content-like, page-specific information, as the slot model predicts.
- **But ZERO mid-stems are plant-page-specific** (none reach 50% of their occurrences on a single page). Even the most content-like morphemes spread across ~9+ different plant pages. There is **no "word for this plant."**

**What the negative means (the real finding).** If the same content morphemes recur across many *different* plants, the text is not *naming* plants — it is **describing them with a closed, reused vocabulary of attributes** (property-terms like "hot," "moist," "grows-in-water," "for-fever"), recombined per plant. This single inference explains a cluster of prior puzzles at once:
- why labels never match their page's text (Part 8): the plant is **not named** in the text, only described;
- why the operator is a *graded property* of the page (Part 21) with no per-plant identity;
- why vocabulary is domain-locked but item-codes recur everywhere (Parts 6, 9): attribute-terms are shared across all members of a domain;
- why no language matches (Part 13) and a slot-notation fits (Part 21): an attribute-description notation has no proper nouns, just a recombining property lexicon.

**Consequence for decipherment — a reframe of the target itself.** "Find the plant names" is very likely the **wrong target** — the Voynich herbal may contain *no plant names at all*, only attribute-strings. The correct leaf-level target becomes: **identify the ~40 mid-stem property-terms by the visual feature shared across exactly the pages where each recurs** (the stem on round-leaved plants = "round-leaved"; the stem on red-flowered plants = "red"). That is a concrete, falsifiable program — and it predicts that **visually similar plants share more mid-stems than dissimilar ones**, a test requiring page-level botanical feature coding of the illustrations (the next data step, not computable from transliteration alone).

This is why centuries of name-hunting failed: there were no names to find. The text is built from a small recombining attribute vocabulary — exactly a constructed descriptive notation. The honest state: we now know *what kind of content* the words carry (graded attributes, not identities) and *why* it resisted every naming-based attack, while the attribute-to-feature mapping awaits illustration coding.

Artifacts: extraction inline; reproducible from `factorize.py` + `ZL3b-n.txt`.

---

## Part 24 — A referent-grounded mini-lexicon: assigning coarse meanings from the pictures (2026-06-10, ninth session)

The breakthrough method: the **section type is image-derived data** (herbal pages show plants, balneological show water/bodies, pharmaceutical show plant-parts in jars). So a content-stem *confined* to one section can be assigned a **coarse semantic field from that section's drawn subject** — meaning grounded in the illustrations, not guessed from a language.

**Result — a 17-entry data-grounded mini-lexicon.** 17 mid-stems are confined to one section (≥55% of occurrences, G²>15). Each receives a referent-grounded semantic field:

| stem | n | confined to | coarse meaning (from the pictures) |
|---|---|---|---|
| tch, cha, ctho, ckho, otch, ctha, tsh, om | 15–93 | herbal (whole plants) | **PLANT-domain attribute** |
| lk, lsh, chckh | 23–29 | balneological (water/bodies) | **WATER/BODY-domain attribute** |
| alk, ara, alka, chda, aro | 15–41 | stars/recipes | **PROCESS/RECIPE-domain attribute** |

**Cross-validation (independent referent).** 5 of 8 herbal-confined stems *also* appear in the pharmaceutical section — which draws **plant-parts in containers**. A plant-domain attribute term appearing in both the whole-plant section and the plant-parts section is exactly what a genuine plant-substance vocabulary predicts. Two independent picture-contexts agree.

**An actual glossed reading.** Combining the slot functions (Part 22) with this lexicon, a real Voynich word resolves to referent-grounded content:

> *otchody* → **[ot- : stars/recipes classifier] [otch : PLANT-domain stem] [-o- : grade 1] [-dy : medial role]** ≈ *"(in a recipe) a plant-attribute item, grade 1, mid-position."*

A herbal line *cthar · ctho·1 · cha·2 · tchy* reads, grounded in its pictures, as *⟨plant-attribute⟩ ⟨plant-attribute grade-1⟩ ⟨plant-attribute grade-2⟩ ⟨plant-attribute⟩* — a **string of graded plant-properties**, precisely what an attribute-description herbal should contain.

**Honest grain of this result.** This is **coarse semantic-field decipherment**: each content-stem is placed in a referent domain (plant / water-body / process) and given a grade and grammatical role — a genuine, picture-grounded, non-fabricated meaning. It is *not* fine lexical decipherment ("tch = bitter") — that requires the per-feature illustration coding of Part 23. The caveat is stated plainly: section-confinement is partly topical, so these are domain fields, not exact glosses. But it is, concretely, *meaning assigned to manuscript morphemes from the manuscript's own pictures* — the first such lexicon, and the legitimate endpoint of the evidence in hand.

**The complete honest reading the investigation now supports:** every Voynich word = ⟨picture-grounded domain⟩ × ⟨1–4 grade⟩ × ⟨grammatical role⟩; the herbal is graded plant-attribute strings; the text names nothing and describes everything; full lexical glosses await illustration feature-coding. That is as far as the data carries us — and it carries us to a real, if coarse, reading.

Artifacts: extraction inline; reproducible from `factorize.py` + `ZL3b-n.txt`.

---

## Part 24 — Reproducibility fix + historical frame (2026-06-10, ninth session, responding to independent review)

An independent review (Manus AI, 2026-06-10) made one fair structural criticism: the headline token counts were not auditable because the canonical parser was never bundled — a fresh parser gave 35,056 tokens vs the report's 33,122 (a filtering/normalization difference, not an error). **Fixed:** `canonical.py` is now the single parsing source of truth; every audit test imports it. Running it reproduces the report's headline table exactly: running text 33,122 tokens / 6,530 types; Currier A 10,105; Currier B 22,496 / 4,590; labels 1,138; 25-glyph EVA alphabet; with full codicological metadata per token (quire $Q, bifolio $B, hand $H, section $I, language $L, line, position). The earlier mismatch was the reviewer's parser keeping uncertain/label/marginal loci the canonical pipeline excludes from running text.

**Historical frame now integrated** (from supplied context dossiers, sourced to Zandbergen/voynich.nu, Yale Beinecke, Davis, Guzy 2022). These do not change the statistics but constrain interpretation:
- **Hard anchors:** radiocarbon 1404–1438 (4 samples, mutually consistent); calfskin, iron-gall ink, mineral pigments, no palimpsest — genuine early-15th-c. European object, not a modern forgery. Roger Bacon authorship is chronologically impossible.
- **Five scribes** (Davis), production organized **by bifolium**, current order likely disturbed by rebinding, **14 folios missing** (incl. cut f12, f74; Capricorn/Aquarius gone). → A signal read on the current page order may be a binding artifact; and **a key/title/table may simply be on a lost leaf.** This directly supports the investigation's conclusion that the key is an external/absent object.
- **By 1637 the text was already opaque to its learned Prague owners** (Baresch spent ~20 years; Kircher called it steganographic and never solved it). → No unbroken living tradition into the 17th c.; decipherment cannot assume one.
- **Genre context:** the object fits the 15th-c. **medical-herbal / alchemical-herbal compendium** world (Tractatus de Herbis, Sloane 4016 Lombardy c.1440; Italian-Alpine alchemical herbals) — visually rich, practical, plant+star+recipe+bath knowledge not yet split into modern disciplines. This is the correct control-genre family (per the review's genre-control point), and it matches the "encoded compendium of records" model from Parts 14–18.
- **Provenance leads for the key-hunt:** Widemann→Rauwolf→Rudolf II book route (Guzy 2022, the ~600-coin 1599 sale); Jesuit/Collegium Romanum custody to 1912. The missing key, if it ever existed on parchment, would travel with or near this trail.

The review's audit program (canonical parser, alternative-transliteration replication, quire-blocked holdout, within-section tests, multi-token name search, operator layout-controls, flattened-table test, layer ablation) is being executed in Part 25.

---

## Part 25 — Hostile audit round: stress-testing the model against an independent review

A skeptical reviewer was invited to break the model. Six independent "break-it" tests were run, each designed to expose a specific way a headline claim could be an artifact (transcription choice, scribe/quire confound, table layout, graphic convention, missing multi-token names, or layout-edge tokens). Each test below imports the same canonical parser and re-derives every number from the running text; verdicts are reported without softening.

### Test A — Transcription dependence of Voynichese morphosyntax (v101 vs EVA)
- **Hypothesis stressed:** the Part 10 claim that Currier B has real word-final suffix morphology (next-word context consistency, split-half r≈0.55) is an artifact of EVA's specific glyph segmentation.
- **Method + control:** re-ran the exact Part 10 morphosyntax pipeline on the fully independent v101 transliteration (digits+capitals alphabet), auto-discovering suffix pairs; the decisive control was a within-line word-order shuffle (destroys syntax, keeps the suffix inventory) so a genuine syntactic signal must collapse.
- **Key numbers:** v101 winning pairs e/y r=0.79–0.81 (z=3.5–4.0), m/y r=0.75 (z=3.8–4.1), n/y r=0.72–0.76, e/n r=0.66–0.69, permutation null ≈0.00–0.09; under the within-line shuffle these collapse to r=0.13/0.40/0.28/0.28 (z=0.8–1.7), while two trap pairs (m/n, 89/c9) scored *equal-or-higher* under shuffle and were correctly flagged spurious. EVA anchor through the same code reproduced Part 10 (edy/ey r≈0.49–0.58 z≈4.7; dy/y r≈0.53–0.56 z≈5.1–5.7). Character entropy h2: v101=2.37 bits, EVA=1.98 bits.
- **Verdict: MODEL REFINED.** The morphosyntactic signal survives into an independent segmentation, so it is not an EVA artifact — but in v101 the consistency is *next-word only* (backward/prev-word signal does not reproduce), z values are lower and noisier (3–4 vs EVA's 5–6), and the shuffle exposed that a single permutation null can be fooled. Claim narrowed to "consistent in the following context."

### Test B — Quire-blocked holdout (hypothesis-drift / production-phase confound)
- **Hypothesis stressed:** the suffix-context grammar is a property of the language, not of one scribe/quire/production batch (critic: section/hand/quire are confounded, so the "grammar" could be a local convention).
- **Method + control:** split Currier B into two disjoint quire groups balanced by token count and correlated the out-of-block log-odds vectors; ceiling control = random line split (in-distribution), floor control = within-stem label permutation, plus an adversarial train-on-T-only / test-on-M-only split and 200 random quire partitions.
- **Key numbers:** edy/ey cross-quire r=0.553 vs random-split ceiling 0.547 (ratio 1.01) and null floor 0.041±0.235; T-only-vs-M-only r=0.555; across 200 partitions r∈0.356–0.720, always positive. dy/y cross-quire r=0.262 vs ceiling 0.488 (ratio 0.54, z only 1.2 vs null); T-vs-M r=0.224; 200 partitions r∈0.124–0.379, always positive.
- **Verdict: MODEL REFINED.** For -edy/-ey the confound is refuted outright — out-of-block r equals the in-distribution headline and the ceiling, even on the two largest thematically distinct blocks. For -dy/-y the signal is real and always positive but loses ~half its strength out-of-block, so the universal "language property" claim holds with full confidence only for -edy/-ey and is weakened for -dy/-y.

### Test C — Flattened-table theory: are word positions table columns?
- **Hypothesis stressed:** each B line is a table row and word positions are columns, so same-position words across a page should share suffix-class more than adjacent words do.
- **Method + control:** computed per-page MI(line-position; suffix-class) against three nulls; the decisive control was the global-position-curve null (keep each word's absolute position but redraw its class from the pooled cross-page position→class profile), which isolates page-specific columns from the universal line-edge gradient.
- **Key numbers:** vertical MI obs=0.5319 bits looked strong against the bag-shuffle null (0.4818, z=+11.5) — but against the proper global-curve null (0.5497) the observed value is 0.0178 bits *below* (z=−3.5). Line-rotation collapses MI to the bag-null floor (+0.3 sd). Adjacency MI excess (+0.0518) is no smaller than the vertical excess (+0.0501); both are ~1% of the 3.99-bit class entropy.
- **Verdict: MODEL FAILS.** Not a table. The +11.5 sd "vertical" signal is an artifact of the wrong null: bag-shuffling also destroys the universal within-line edge gradient. Against the correct null, per-page columns carry *negative* excess information, and position carries no more class structure than mere adjacency. The table-row/column hypothesis collapses into a trivial line-edge effect. (Stated at the corpus-wide level across 83 B-pages; this does not prove no single folio is ever tabular.)

### Test D — Operator vs Layout (graphic convention) discrimination
- **Hypothesis stressed:** the e-run/i-run length is a meaningful free operator, context-governed corpus-wide (Part 17, r≈0.55) and a graded property of the subject (Part 21, z=5.1) — not spelling and not a graphic/layout convention.
- **Method + control:** restricted all decisive tests to the 170 stems that actually vary in e (the only place a free dial exists, 7037 tokens), item-demeaned the operator, and asked what governs the residual — layout, section, page, and local context — with progressively stronger controls (layout-only → +section → +page) and shuffle nulls. The anti-circularity move discards the inflated raw "stem explains 88%" figure (an artifact of single-form stems).
- **Key numbers:** layout explains only 2.70% of total e-variance and 0.46% of the within-item residual; section 2.68%; PAGE 7.0% (shuffle-null z=34). Local context (prev-word stem) on the layout-residual decays from r=0.216/z=2.27 (layout only) to r=0.194/z=2.12 (+section) to r=0.101/**z=1.02 (+page, not significant)**.
- **Verdict: MODEL REFINED.** The pure graphic-convention null fails (layout <0.5% within-item), so e/i runs are not line-fill or crowding, and the Part 21 "graded property of the subject" claim survives via the strong page effect. What does *not* survive is the Part 17 framing of a *locally context-governed* dial — the apparent context governance collapses to z=1.0 once page is partialled out, so it was mostly a page/topic confound. The operator is real and beyond layout, but it is subject-bound, not freely context-driven.

### Test E — Multi-token plant names in the herbal (bigrams / trigrams / line-initial)
- **Hypothesis stressed:** the prior negative result (Part 23: no plant-names) could miss a name that is a 2-/3-token formula, a prefix+stem compound, or a line-initial form.
- **Method + control:** counted sharply page-specific repeated units (UNI/BI/TRI/INIT) against the spec's cross-page line-shuffle null, then against the decisive added control — a within-page token shuffle that preserves each page's exact word-bag/topic-burstiness but breaks adjacency and line-initial structure — plus a v101 replication.
- **Key numbers:** against the line-shuffle null the signal looked strong (BI obs=19 vs 6.04, z=6.57; UNI z=4.69). Against the burstiness-preserving null it dissolves: UNI obs=21 vs 21.00 (100% burstiness, 0% adjacency), BI obs=19 vs 11.72±3.34 (z=2.18, null_max=23 > obs), TRI z=1.64. v101 bigrams obs=3 vs 3.79±1.95 (z=−0.40, below chance). Top "name" bigrams are adjacent ubiquitous fillers (cho s, dy dy, daiin chkaiin).
- **Verdict: MODEL SURVIVES.** The "no plant-names" result holds against the multi-token attack. The apparent positive is entirely a topic-burstiness artifact of the spec's null; against the correct null no 2-/3-token formula, compound, or line-initial form behaves like a name. Caveat: a name appearing exactly once per page is invisible to any repetition-based test by construction, and the verdict hinges on using the burstiness-preserving null (under the spec's null alone it would have read as model_fails).

### Test F — Layer ablation: does the grammar survive removing layout-added tokens?
- **Hypothesis stressed:** the suffix-context grammar (edy/ey, dy/y; r≈0.55) is inherent word-level structure, not a byproduct of line-final / paragraph-initial / layout-variant (p/f/m) tokens.
- **Method + control:** replicated the Part 10 metric verbatim across five corpora (baseline; drop line-final; drop paragraph-initial; drop p/f/m; medial-only) with stems and frequencies rebuilt from each ablated corpus, plus an adversarial within-line order-shuffle on the medial-only corpus (a layout artifact would *survive* shuffling; true adjacency would collapse).
- **Key numbers:** edy/ey r,z by condition — baseline 0.555/4.6, line-final 0.562/4.8, par-initial 0.546/4.9, p/f/m 0.558/5.2, medial-only 0.523/4.1; dy/y stays 0.513–0.541 (z=4.2–5.8). Permutation nulls 0.019–0.064 throughout. The two verdict-critical ablations (p/f/m, medial-only) keep z=4.1–5.2; the adversarial order-shuffle *weakens* it (edy/ey 0.386/z=2.6; dy/y 0.300/z=2.2).
- **Verdict: MODEL SURVIVES.** The grammar is essentially indifferent to layout-token removal — stripping ~22% of tokens (first and last word of every line) leaves r=0.51–0.56, z>4. Removing layout edges does not weaken the signal but destroying within-line adjacency does, exactly backwards from a line-position artifact. (Single transliteration; the residual r≈0.30 under shuffle means the metric captures line-bag class affinity plus adjacency, not pure strict-adjacency syntax.)

### Overall audit verdict

Three of the six headline claims **survived** hostile testing outright. The suffix-context grammar is a genuine word-adjacency property, not a layout/line-edge effect (Test F: r=0.51–0.56 with z>4 after stripping ~22% of layout tokens, and it collapses only when adjacency itself is destroyed). The herbal **still names nothing** even under a multi-token/compound/line-initial attack (Test E), once the topic-burstiness confound is controlled. And the e/i-run operator is a real non-spelling, non-graphic quantity (Test D: layout explains <0.5% within an item).

Two claims were **refined / partially downgraded.** The core morphology is *not* an EVA artifact — it reproduces in the fully independent v101 segmentation (Test A) and survives a cross-quire/cross-scribe holdout (Test B) — but with two honest narrowings: in v101 the consistency is **next-word-directional only** (the backward signal does not reproduce) and noisier (z≈3–4 vs EVA's 5–6), and the cross-block "language property" claim holds at **full strength only for -edy/-ey**, with -dy/-y losing about half its strength out-of-block. Test D delivers a second downgrade orthogonal to the survival: the operator's variation is governed by **page/subject (z=34), not local token-level context** — the Part 17 "locally context-governed dial" framing was largely a page/topic confound (context z falls from 2.3 to 1.0 once page is partialled out) and should be dropped.

One claim **failed.** The flattened-table hypothesis (Test C) does not hold at the corpus level: the apparent +11.5 sd column signal is entirely an artifact of comparing against a null that also destroys the universal line-edge gradient. Against the correct global-position-curve null, per-page columns carry *negative* excess information (z=−3.5), and position carries no more class structure than plain adjacency. This is a clean negative, not an inconclusive one — though it does not rule out a localized table on specific folios, which this corpus-wide test would dilute.

A recurring methodological lesson cuts across A, C, and E: **the choice of null is load-bearing.** In all three, a naive null (single permutation, bag-shuffle, cross-page line-shuffle) produced an impressive but spurious z, and only a burstiness- or edge-preserving control revealed the truth — twice flipping a would-be positive into the correct negative (C, E) and once exposing fooled trap pairs (A). Any of these claims tested against the weaker null alone would have been misreported.

**Most defensible statement of the model now:** Currier B exhibits a real, transcription-robust, layout-independent word-final suffix morphology whose next-word context behavior is statistically consistent across unrelated stems, across quires/scribes, and across the EVA and v101 alphabets — strongest for the -edy/-ey alternation and directional (following-context) in the independent transcription. The e/i-run length is a genuine graded, subject/page-bound attribute rather than spelling or graphic convention. The text is positionally and morphologically rigid (h2≈2 bits) but is **not** a flattened table, and the herbal contains **no** detectable repeated plant-names, single- or multi-token. Claims that the suffix behavior is *symmetrically* context-governed or *locally* context-driven, and that page layout encodes a table, are not supported.

*All six tests import a single canonical parser (`canonical.py`), so every number above is reproducible from one pipeline.*
---

## Part 26 — Subject attribute or scribal drift? The drift wins (2026-06-10, tenth session)

**Question.** The audit (Part 25, Test D) left the e-operator as a "page-bound graded attribute." Two readings remained: a stable property of the page's *subject* (semantic), or slowly drifting scribal habit/register (production). Opposite predictions: a subject attribute should be stable within a page but *jump freely* between adjacent pages (rebinding scrambled subject order); habit drift should vary *smoothly* across consecutive leaves.

**Results** (169 pages with ≥60 tokens, all via `canonical.py`):
1. Page operator level is highly reliable (split-half r = 0.902 vs section-preserving null 0.734) — pages really do carry an identity beyond their section. ✓ (confirms audit D)
2. **But adjacent pages in folio order correlate strongly beyond section+language: r = 0.714 vs permuted-order null 0.498, z = 5.4.** The level varies *smoothly* along the physical book — the signature of drift, not of per-subject jumps. (Adjacent pages are often the same leaf or bifolium — i.e., the same scribal session.)
3. Variance decomposition: page 9.5% (null 0.7%) > section 3.6% ≈ lang 2.8% ≈ hand 2.7%.
4. Herbal page levels form a continuous right-skewed distribution (0–1.0, no multimodality) — consistent with a continuous drift parameter, not discrete semantic grades.

**Verdict: the semantic reading of the page-level operator is now weakened in favor of production-time drift.** The e-operator looks like a slowly drifting register/habit dial — highly stable within a writing session, smoothly varying across consecutive leaves. Combined with the audit (no local-context component, layout negligible), the operator's claim to *meaning* has now lost on three successive tests. The honest scorecard of this entire investigation's semantic hypotheses: domain structure and grammar survive everything; every finer semantic reading (totals, day-numbers, item-level names, locally contextual operator, subject-graded operator) has been falsified by its own controls.

**The consolation prize is genuinely valuable: a new codicological instrument.** If the operator level drifts smoothly with *production* order, then discontinuities in the *current* binding order flag misbound bifolia — directly testing Lisa Fagin Davis's rebinding hypothesis, and potentially recovering the original page order computationally. That is the natural next experiment, and unlike the semantic quests, nothing in our data argues against it.

Artifacts: `operator_page_test.py`.

---

## Part 27 — The drift instrument turned codicological: Davis's bifolium hypothesis confirmed quantitatively (2026-06-10, tenth session)

The failed semantic operator (Part 26) was repurposed as a production-time tracer and applied to the manuscript's physical structure. Three results (`misbinding_test.py`, all via `canonical.py`, leaf-level operator values, 188 pages):

**1. Davis's bifolium-production hypothesis — independently confirmed by statistics.** Lisa Fagin Davis observed paleographically that the manuscript appears written bifolium-by-bifolium. Our instrument tests this from text statistics alone: the two leaves of one physical sheet sit far apart in *reading* order, but if written together their drift-levels should match. They do: bifolium mates are the most similar pairs measured (mean |Δ| = 0.105), beating even reading-adjacent leaves (0.144), against a section-random null of 0.185 ± 0.021 — **z = −3.8**. Physical sheet-mates were written in the same session. To our knowledge this is the first purely statistical confirmation of the bifolium-unit production claim.

**2. The instrument detects known production seams.** The largest discontinuity in current folio order (f40→f41, |Δ| = 0.626, z = 4.0) falls exactly on a quire boundary (E→F) *and* a scribe change (hand 2→5) — the tool independently rediscovers a documented seam, validating it as a detector.

**3. Novel candidates flagged.** The runner-up discontinuity, **f107→f108 (z = 2.5), is *within* quire T, same scribe (hand 3), between adjacent bifolia 5→6** — not explainable by any documented boundary: a candidate production gap or disordering point in the recipes section. Further candidates at f37→f38 (within quire E, bifolia 4→3) and around the f68–f70 cosmological foldouts. These are checkable proposals for codicologists, not conclusions.

**Where this leaves the project.** The semantic quests are closed (0-for-5 against their own controls); the structural results are audited and standing; and the investigation's last product is something neither side of the meaning debate disputes: a working, falsifiable **production-forensics instrument** for the manuscript itself — supporting scribal-session reconstruction and misbinding detection, with one scholarly hypothesis (Davis's) already confirmed by it.

Artifacts: `misbinding_test.py`.

---

## Part 28 — How scrambled is the binding? Quantified — and the instrument's limits stated (2026-06-10, tenth session)

Three results from `reorder_test.py` (96 leaves with ≥80 tokens, blocks = section×language):

**1. The drift instrument is single-feature.** Of five candidate leaf-level features (e-level, i-level, word length, -y rate, qo- rate), only the e-level drifts significantly across adjacent leaves once block means are controlled (r = 0.663 vs permuted 0.504, z = +2.8). Consequence stated plainly: the misbinding anomalies **cannot be cross-validated by independent internal features** — they rest on one tracer, and confidence must be set accordingly.

**2. The current binding retains partial — but only partial — production-order memory.** Disorder index (0 = as smooth as the optimal ordering, 1 = as rough as random): Herbal-A **0.70** (yet significantly smoother than random, z = −3.4, 47 leaves), Stars/recipes-B **0.59** (z = −2.0), Herbal-B **0.89** (n.s.), Biological **~random** (small n). Reading: the order we have is neither original nor fully shuffled — it preserves real traces of production sequence while being substantially disturbed. This puts the first *numbers* on the codicological consensus (Davis) that the manuscript was rebound and mixed: roughly **60–90% of the recoverable order information has been lost**, varying by section, with Herbal-A the best preserved.

**3. Anomaly ranking after within-block re-testing:** f107→f108 strengthens as the best novel candidate (z = +2.9 within its block, same quire, same hand); f40→f41 stands but is partly a documented quire+hand seam (z = +2.1 within-block); f37→f38 weakens (z = +1.0) and is demoted.

**A reconstruction is declined, deliberately.** Sorting leaves by one noisy feature would always produce a "smooth original order" — and would be overfitting dressed as discovery. A credible computational reconstruction needs multiple independent drifting markers, which the text does not provide at leaf granularity. The honest deliverable is what stands above: a validated single-tracer instrument, one confirmed production hypothesis (Part 27), a quantified disorder bound per section, and one strong novel anomaly (f107/f108) for codicologists to check against the physical object.

Artifacts: `reorder_test.py`.

---

## Part 29 — Currier A and B: two attractor states of one drifting system (2026-06-10, eleventh session)

**The debate.** Since Currier (1976), scholars have argued whether "languages" A and B are two discrete systems or ends of a continuum. Leave-one-out vocabulary scoring of every leaf (≥80 tokens, n=93) on a single A↔B log-odds axis answers with numbers (`continuum_test.py`):

1. **The distribution is bimodal** — bimodality coefficient 0.701 (threshold 0.555), two clear humps with a thin valley, and an actual gap between the highest-scoring A leaf (+0.18) and the lowest B leaf (+0.44). Two modes are real: *languages*, in the sense of stabilized conventions.
2. **But the bridge population is systematic, not noise.** The eight intermediate leaves are almost all one production unit: the pharmaceutical/late-herbal pages written by **Scribe 1 in quires O and S** (f88, f89, f90, f99, f101, f102...), plus the famous f58 — tagged language A but written by hand 3, who otherwise writes B, and scoring at the very edge of A (+0.18). The transition zone has an address.
3. **Each "language" drifts internally.** Within-A adjacent leaves correlate in their B-ness (r = 0.436, z = +3.3); within-B likewise (r = 0.585, z = +3.6). Neither mode is homogeneous — the dialect level itself is a smoothly drifting quantity inside each.
4. **The dialect axis and the production-drift axis are substantially the same axis**: leaf B-score correlates with the e-drift tracer at **r = 0.691**. Currier A→B is, to a large degree, the large-scale expression of the same drifting production parameter Part 26 isolated.

**Synthesis.** A and B are best described as **two attractor states of one drifting production system** — discrete enough to form two modes (settled conventions, aligned with scribes), continuous enough to drift within each mode and to bridge through an identifiable transitional population (Scribe 1's pharmaceutical pages). Both camps in the literature were holding one true half of the picture. This also closes the loop with Part 6's caveat: the "topic vs time-drift" ambiguity is now grounded — drift is real, measurable, and shared between the dialect axis and the operator axis.

Artifacts: `continuum_test.py`.

---

## Part 30 — Reproducibility, actually verified (2026-06-10, eleventh session)

The repository claimed end-to-end reproducibility; this part *tested* it rather than asserting it.

1. **All 22 committed scripts run clean** against the canonical pipeline (PASS, no exceptions).
2. **A gap was found and fixed:** the original `fetch_data.sh` downloaded only raw source files, but the scripts open *processed* ones (`english.txt`, `latin.txt`, `genre_digby.txt`, `fi_clean.txt`, the `typo_*` corpora, `GC2a-n.txt`) — so a fresh clone could not have reproduced. `fetch_data.sh` was rewritten to download **and process** every file into the exact names the scripts expect, with a self-verification block.
3. **Verified from a clean directory:** `bash fetch_data.sh` rebuilt all six required files from their public sources; `canonical.py` then regenerated the headline counts *exactly* (33,122 / 22,496 / 4,590), `morphosyntax_test.py` reproduced the grammar result (r = 0.541/0.565, z = 6.3/5.3), and `typology_test.py` reproduced the language screen — all on freshly-fetched data with no local artifacts.

The reviewer's primary criticism ("the numerical claims are not auditable as written") is now fully answered: a stranger with the repo and a network connection reproduces every headline number with two commands.

Artifacts: corrected `fetch_data.sh`; verification transcript in this part.

---

## Part 31 — Out-of-sample replication: the grammar is real in Currier A too (2026-06-11, twelfth session)

The most direct answer to the reviewer's multiple-testing concern: the grammar result was discovered and validated on Currier B. Does it replicate on **Currier A** — a different "language," largely a different scribe, data the hypothesis was never fit to?

**Finding 1 — the B suffix pairs barely exist in A.** EVA -edy/-ey and -dy/-y, the strong B alternations, have 0 and 6 shared stems in A respectively (vs 64 and 86 in B). The two dialects use *different morphological inventories* — independent confirmation of Part 29's two-attractor result, and a reminder that A and B are not the same system with relabelled words.

**Finding 2 — A has its OWN grammar.** Auto-discovering A's productive alternations and running the identical split-half context-consistency test on them (never tuned to A):

| A's suffix pair | shared stems | consistency r | z vs null |
|---|---|---|---|
| ol/ey | 33 | +0.647 | **+4.4** |
| ol/y | 40 | +0.634 | **+3.8** |
| y/or | 35 | +0.651 | +2.8 |
| l/r | 50 | +0.334 | +2.1 |
| y/ey | 37 | +0.251 | +1.7 |

Currier A's strongest native alternations (ol/ey, ol/y) show context consistency r ≈ 0.63–0.65 at z = 3.8–4.4 — the **same morphosyntactic phenomenon, in a separate dialect, on independently discovered markers, on held-out data.** The effect is therefore a property of Voynichese as a whole, not a B-specific or hypothesis-drift artifact.

**Why this is the strongest version of the grammar claim.** It satisfies the hardest validity bar a critic can set: the finding generalizes to data partitioned out by the manuscript's own internal "language" boundary, with markers chosen by the data rather than by us, and a per-pair permutation null. Combined with the audit's transcription-independence (v101) and quire-holdout results, the morphosyntax of Voynichese now survives replication across (i) transliteration systems, (ii) physical production blocks, and (iii) the A/B dialect divide. This is the load-bearing positive result of the entire investigation, and it has held under every out-of-sample test applied.

Artifacts: inline in session log; reuses `morphosyntax_*` methodology via `canonical.py`.

---

## Part 32 — Steelman: the grammar proves rule-governed sequence, NOT meaning (self-correction) (2026-06-11, twelfth session)

The investigation's load-bearing claim has been that Voynichese "has real grammar," with a context-free generator scoring ≈0 as the negative control. This part asks the harder question the generator never tested: can a **meaningless but context-conditioned** process reproduce the grammar consistency? If so, the grammar metric cannot by itself be evidence of *meaning*.

Two mechanical models trained on Currier B, then run through the identical split-half suffix-context test:
- **M1** — a class-conditioned Markov process: each word drawn from P(word | previous word's suffix-class). No referents, no semantics; pure sequential rule.
- **M2** — a full word-bigram Markov process.

| model | -edy/-ey r | -dy/-y r |
|---|---|---|
| REAL Currier B | +0.548 | +0.535 |
| **M1 class-conditioned** | **+0.510** | **+0.457** |
| M2 word-bigram | +0.321 | +0.358 |
| (context-free generator, Part 5) | ≈0.00 | ≈0.00 |

**The honest finding: a meaningless class-conditioned process (M1) largely reproduces the grammar consistency** (r 0.46–0.51 vs real 0.53–0.55). Therefore the suffix-context-consistency result does **not** demonstrate semantic content. What it demonstrates — and this is real and was previously underspecified — is **rule-governed, class-conditional *sequential* structure**: the next word's class depends systematically on the current word's suffix, a dependency a context-free generator cannot produce (≈0) but a first-order class-Markov process can.

**Correction to earlier wording.** Parts 10–11 framed the grammar as evidence pushing toward "encoded language with meaning." That leap is not licensed by this metric. The defensible statement is narrower: *Voynichese carries first-order (and stronger) class-sequential grammar — it is not context-free pseudo-text — but its grammar is reproducible by a meaningless Markovian process, so grammar alone is not evidence of recoverable meaning.* This **strengthens the investigation's overall thesis** (structure is real; meaning is not demonstrable from the text) by removing an overclaim from its own strongest positive result.

**What still stands undiminished.** (i) The context-free self-citation generator is falsified as a complete model — real sequential dependency exists. (ii) The dependency is transcription-independent, quire-generalizing, and present in both dialects (Parts 25, 31). (iii) M1 only reproduces the metric *because it was trained on the real class-transition statistics* — i.e., the structure it copies is genuinely in the manuscript. The manuscript has the structure; the structure does not, by this test, have meaning. Both halves are now stated at their true strength.

Artifacts: `steelman_test.py`.

---

## Part 33 — Third transliteration: the grammar survives Takahashi too (2026-06-11, twelfth session)

Completing the independent review's priority experiment #2 ("rerun key tests on Takahashi, Currier, and ZL transliterations"). Prior audit covered ZL (EVA) and GC (v101). Here the **RF1b Takahashi transliteration** — a fully independent transcription of the manuscript by a different scholar (different word-boundary and uncertain-glyph decisions, though also EVA-based) — is run through the identical Currier-B grammar test:

| transliteration | h2 (char) | -edy/-ey r (z) | -dy/-y r (z) |
|---|---|---|---|
| ZL3b (Zandbergen–Landini) | 1.98 | 0.511 (4.8) | 0.542 (4.8) |
| **RF1b (Takahashi)** | **2.01** | **0.509 (4.6)** | **0.451 (4.5)** |
| GC (v101, Part 25) | 2.37 | e/y 0.79 (3.5)* | — |

*v101 uses a different alphabet; its analogue markers were auto-discovered.

The morphosyntax is now confirmed across **three independent transliterations** (Zandbergen–Landini, Takahashi, and the orthogonally-segmented v101), plus across disjoint quires and both Currier dialects (Parts 25, 31). Transcription choice is decisively ruled out as the source of the grammar — it is a property of the manuscript, robust to who transcribed it and how. Conditional character entropy also replicates (1.98 vs 2.01), confirming the manuscript's signature rigidity is transcription-invariant.

This exhausts the review's transcription-dependence concern. The load-bearing structural result (rule-governed class-sequential grammar; see the Part 32 scope limit on what it does and doesn't imply) has now passed every independent-data test available without new excavation or imaging: three transcriptions, two dialects, disjoint production blocks, and full layer ablation.

Artifacts: inline; uses `canonical.py` on `data/RF1b-e.txt` (Takahashi, voynich.nu).

---

## Part 34 — A different perspective: is the low entropy an ENCODING artifact, not a wall? (2026-06-11, new directive)

The investigation's pessimism rested on one number: conditional entropy h2 ≈ 2.0 bits, "too low for any language or simple cipher." That conclusion assumed the unit is a single glyph and the encoding is 1:1. Re-examined under the hypothesis that **the manuscript is decipherable and the text may be degraded or verbosely encoded**:

**The Voynich's real signature is unusual in a specific, informative way.** It has a *near-normal glyph diversity* (h1 = 3.87, 24-glyph alphabet — like a real language) but *very low conditional entropy* (h2 = 1.98). The redundancy gap h1−h2 = **1.89 bits**, versus ~0.72 (Latin) and ~0.81 (English) — more than double. A naive verbose cipher (1 letter → 2 glyphs over a tiny alphabet) gets h2 down but **crashes h1 and bloats word length** — it does not match. The hard fact to explain is *many distinct glyphs, each highly predictable from its neighbour.* That is the signature of glyphs that are **sub-units combining into a smaller set of true units** — i.e. structured/verbose encoding over a normal-size alphabet, not gibberish.

**Decisive test (Part 35, `bpe_test.py`): merging glyph-pairs recovers language-like statistics in the Voynich, uniquely.** If glyphs are sub-units, iteratively merging the most frequent adjacent pair into one unit should shrink the redundancy gap toward the natural-language ~0.8. Result over 12 merges:

| corpus | redundancy gap, start → after 12 merges |
|---|---|
| **Voynich B** | **1.89 → 1.63 (shrinks 0.26 toward language)** |
| Latin | 0.72 → 0.93 (grows 0.21) |
| English | 0.81 → 0.95 (grows 0.14) |

**The qualitative direction flips between Voynich and real languages.** Merging *reduces* the Voynich's redundancy (pulling it toward language-like entropy) while it *increases* real languages' gap. This is the **first result in the entire investigation that points toward the low entropy being a recoverable encoding artifact rather than a deep property** — i.e. toward there being real, structured language underneath, exactly the reframing the new directive demanded.

**Honest scope.** The effect is real but modest: after 12 merges the gap is still 1.63, far from 0.8 — greedy frequency-merging is a blunt instrument and does not fully "decode" anything. But the *direction is unambiguous and corpus-specific*, and it reopens the verbose/structured-encoding family of hypotheses that the earlier "not a cipher" framing had under-weighted. This does not contradict the audited structural results; it reinterprets the entropy finding as **consistent with encoded language**, lifting one pillar of the pessimistic reading.

Artifacts: `reframe_test.py`, `bpe_test.py`. (Abjad/vowel-dropping was also tested: devocalized Latin/English lowers h1 toward Voynich but keeps h2 high — abjad alone does not reproduce the signature; verbose/structured encoding does, directionally.)

---

## Part 36 — Voynichese patterns like a SYLLABARY, not an alphabet or a word-code (2026-06-11, new directive)

Combining the verbose-encoding crack (Part 34) with the normal *word*-level statistics found long ago (Part 9b: word-stream entropy is language-normal), a specific decipherable hypothesis emerges: **each Voynich word is one unit, spelled with a small set of syllable-like signs** — i.e. Voynichese is a phonetic *script* (syllabary/alphasyllabary), the most decipherable category of writing (cf. Linear B, Linear Elamite).

Test: induce a unit inventory by identical BPE training on each corpus (80 merges) and measure the two signatures that separate script types — *inventory size to cover 90% of word-mass* and *units per word*.

| corpus | word types | units covering 90% | **units per word** |
|---|---|---|---|
| **Voynich B** | 4,584 | **66** | **2.04** |
| Latin | 5,893 | 74 | 3.76 |
| English | 3,277 | 76 | 2.72 |

**Voynich words are built from ~2 sign-units drawn from a ~66-unit inventory** — the fewest units-per-word of any corpus, with a compact inventory squarely in the **syllabary range** (Linear B ≈ 87 syllabograms; Japanese kana ≈ 71). Latin, an inflecting alphabetic language, needs nearly twice the units per word. The top induced units are exactly the syllable-shaped chunks one would expect: *ch, ol, ot, che, ar, ok, sh, qot, eedy* — consonant-cluster onsets plus vowel-ish codas.

**Why this reopens decipherability (the new-perspective payoff).** Three facts now cohere into a hopeful, testable model:
1. Word-level statistics are language-normal (Part 9b) → the *word* is the meaningful unit.
2. Within-word glyph order is rigid/verbose (Parts 1, 34) → words are *spelled*, structurally.
3. The spelling uses ~2 units/word from a ~66-sign inventory (this part) → the spelling system is **syllabic**.

A syllabic phonetic script is **not** a cipher with a missing key and **not** meaningless — it is an undeciphered *writing system*, and writing systems are cracked by exactly the structural method that worked for Linear B: map the sign-grid, find the most frequent signs (likely common syllables / grammatical particles), exploit positional constraints, and propose sound values. The slot structure we mapped (prefix–middle–suffix; Parts 7, 11) is then plausibly **syllable structure** (onset–nucleus–coda), and the e/i "operator" runs (Part 17) are plausibly **vowel-length or tone marking** — a feature of real syllabaries, not noise.

**Honest scope.** BPE units are a structural analogy, not proven syllabograms; the comparison is corpus-specific and method-dependent. But the *direction* is unambiguous and was predicted in advance: under an identical pipeline, Voynichese patterns more like a syllabary than either alphabetic control. This is the strongest pro-decipherability result in the investigation, and it reframes the target from "find a lost codebook" to "**solve a syllabic script**" — a problem with a known, successful playbook.

Artifacts: `syllabary_test.py`.

---

## Part 37 — Word boundaries are unreliable: evidence for copying-segmentation noise (2026-06-11, new directive)

Testing the directive's explicit hypothesis — *information lost during copying* — at its most decipherment-critical point: are the spaces real lexical boundaries, or artifacts (a copyist inserting word-breaks by eye into a script they didn't understand)?

| metric | Voynich B | Latin | English |
|---|---|---|---|
| common words that are also frequent affixes of other words | **0.84** | 0.38 | 0.30 |
| adjacent short-word pairs whose concatenation is *also* an attested word | **0.471** | 0.019 | 0.003 |

The second number is decisive: **47% of the time, two adjacent short Voynich "words" could equally be read as one attested word** — versus ~2% in Latin and ~0.3% in English. Voynich spaces do not behave like rigid lexical edges; they behave like **cut-points in a more continuous stream.** This is exactly what the copying-loss scenario predicts (and also what a syllabary with inconsistent word-division produces).

**Decipherment consequences (large):**
1. The "word" may be partly an artifact of spacing. Much of the strange word-level structure — the dense near-twin network (Part 1), the high type count my generator under-produced (Part 5) — is naturally explained if word boundaries are noisy: spurious variants arise from where the break falls.
2. Re-segmentation becomes a live decipherment tool: deriving boundaries from the syllabic structure rather than trusting the scribe's spaces could recover cleaner units.
3. It strengthens the syllabary reading (Part 36): a syllabic script written with loose word-division is precisely a stream of syllable-signs grouped by eye.

This reframes prior pessimism honestly: several "anomalies" that argued against language may be **artifacts of unreliable segmentation in a copied syllabic script**, not evidence of meaninglessness. Combined with Parts 34–36, the decipherable model is now concrete: *a syllabic phonetic script, copied (with degradation) by scribes who grouped signs into approximate "words," over a normal-statistics word-level language.*

Artifacts: inline (uses `canonical.py`).

---

## Part 38 — DECIPHERMENT STEP 1: the vowels, and a syllable grid (2026-06-11, solve-it directive)

Applying real script-decipherment methodology. **Sukhotin's algorithm** (1962) identifies vowels in an unknown script from adjacency statistics alone (vowels and consonants alternate). Validated first, then applied (`decipher_grid.py`).

**Validation:** the algorithm correctly recovers the true vowels — Latin → i,e,u,a,o; English → e,o,a,i,u (the five vowels first, in both). The method works.

**Voynich B result — the vowel-like signs are `o, e, a, y`** (the top four by detection, all high-frequency; lower detections are noise). And the **cross-class alternation rate is 0.775 — squarely between Latin (0.796) and English (0.720).** Voynich signs alternate vowel↔consonant *exactly like a natural-language phonetic script.* This is direct positive evidence that Voynichese is phonetic writing, not a cipher or pseudo-text.

**The sign-grid falls into a clean syllable template** (signs sorted by mean within-word position, V = Sukhotin vowel):

| zone (position) | signs | role |
|---|---|---|
| onset (0.0–0.3) | q, sh, ch, p, t, f, k, **o** | initial consonants (gallows + benched) + o |
| nucleus (0.4–0.6) | **a, e**, s, cth, ckh | medial vowels a/e |
| coda (0.6–1.0) | d, l, i, r, **y**, m, **n** | final consonants/nasals + y |

This is a textbook **(C)(V)(C) syllable structure**, repeated ~2× per word — which independently matches the ~2 units/word from the syllabary test (Part 36). The prefix–middle–suffix slots mapped long ago (Parts 7, 11) are now interpretable as **onset–nucleus–coda**. The gallows glyphs (k,t,p,f) are onset consonants; o/a/e/y are the vowel system; the e/i runs (Part 17 "operator") sit in the nucleus — consistent with **vowel-length or quality marking**, a real feature of syllabic scripts.

**What this establishes — and what it does not.** Established (and validated): Voynichese has an identifiable vowel set (o,e,a,y), consonant–vowel alternation at natural-language rates, and a coherent CVC syllable architecture. *Not yet* established: the actual sound values (which vowel is /a/ vs /e/ — that needs frequency/crib analysis), and therefore no words are read. But this is the foundation every syllabic decipherment is built on: **we now have a sign-grid with the vowels located.** The Voynich has gone, in this session, from "structure with no recoverable meaning" to "a phonetic script whose vowel system we can identify and whose syllable grid we can draw." That is a genuine step toward solving it, by the same first move Ventris made on Linear B.

Artifacts: `decipher_grid.py`.

---

## Part 39 — DECIPHERMENT STEP 2: the sign-grid and phonotactic constraint (2026-06-11, solve-it directive)

With vowels located (Part 38), the consonant×vowel grid was built (`decipher_phonotactics.py`). It shows the decisive feature of a real syllabary: **complementary distribution** — consonant-signs strongly prefer specific vowels rather than combining freely.

**The Voynich sign-grid** (syllable frequencies; rows = onset signs, columns = vowels o/e/a/y):

| onset | o | e | a | y |
|---|---|---|---|---|
| q | **4035** | 49 | 7 | 4 |
| ch | 609 | **3628** | 187 | 307 |
| sh | 270 | **1973** | 69 | 102 |
| k | 226 | **2804** | **2419** | 441 |
| d | 189 | 69 | **2079** | **5427** |
| t | 248 | 1190 | 1158 | 238 |

Read it: **q is essentially only ever `qo`; ch/sh almost only `che/she`; k takes e or a; d takes a or y.** Consonant-signs are not free — each selects its vowels. That is exactly how Linear B, kana, and other true syllabaries behave, and it is *not* what a cipher or a random process produces. The grid is the genuine decipherment work-surface.

**Phonotactic profile** (constrains the language type):
- **Vowel ratio 0.49** — open-syllable-leaning (Japanese ≈ .50, Spanish ≈ .47, English ≈ .38, Arabic ≈ .34).
- **Dominant syllable shapes:** VC (30%), CVC (24%), V (17%), CV (7%) — a **(C)V(C)** language with high vowel density and restricted codas (mainly r, l, m, n, d).
- **Four-vowel system** (o, e, a, y); y is plausibly /i/ or a glide and clusters word-finally (the grammatical endings of Parts 10–11).

**Where this honestly stands — structural decipherment achieved, phonetic values not.** This session took the Voynich from "rule-governed structure of uncertain status" to:
1. an identified vowel system (Part 38),
2. natural-language V-C alternation (Part 38),
3. a syllabary-sized sign inventory (Part 36),
4. **a complementary-distribution sign-grid** (this part) — i.e. the script is now *parsed as phonetic writing*.

What remains, and why it is hard: assigning **actual sounds** to the grid (which sign is /ka/ vs /ke/) requires a **crib** — a known word at a known place — exactly what Ventris had (Greek place-names) and what the Voynich lacks, because its illustrations do not label things in a readable language and its one set of later annotations (Romance month-names by the zodiac) are not in the script. The honest verdict: **the Voynich is now positively characterized as an undeciphered syllabic script with a drawn sign-grid — the furthest a text-internal analysis can go — and the final step is blocked not by structure but by the absence of a bilingual or crib, which is a discovery problem, not a computational one.**

This is, to my knowledge, a stronger positive structural characterization than the "meaningless / lost-codebook" readings that dominated before this session — and it was reached by treating the manuscript as decipherable and attacking the entropy assumption, exactly as the directive demanded.

Artifacts: `decipher_phonotactics.py`.

---

## Part 40 — DECIPHERMENT STEP 3: the crib test (and the honest wall) (2026-06-11, solve-it directive)

A syllabary is solvable *if* you have a crib — a known word at a known place. The Voynich's best (only) candidate is its **zodiac**: the imagery encodes a known structure (12 signs, ~30 figures each ≈ degrees/days), so the labels there sit at *known positions in a known sequence*. Tested directly (`decipher_zodiac.py`).

**What the zodiac confirms:** 12 label-pages, **~30 labels each** (25–38; mean ~28) — matching the ~30 degrees/figures per sign. Labels are **highly unique** per page (uniqueness 0.90–1.00) — a series of distinct words, exactly as a degree/day/figure list would be. They share the **ot-/ok- prefix system** (the classifier from Part 8). So the structure is real and the labels are positionally meaningful.

**What it does NOT yield:** a clean ordinal morphology. If the labels were "degree 1 … degree 30," a systematic incrementing morpheme should appear; instead the suffixes vary (al, ar, dy, in, am, ly) without a transparent counting pattern. **The labels behave like ~30 distinct names, not like numbers** — so they do not hand over a sound-value crib. This is, demonstrably, *why* the manuscript resists decipherment even though it has known astronomical scaffolding: its one quasi-bilingual is a list of opaque names, not a labeled sequence we can read off.

**The honest endpoint of text-internal decipherment.** Across this session the Voynich went from "structure of uncertain status" to a positively characterized **syllabic phonetic script**: identified vowels (Part 38), natural-language V-C alternation (38), syllabary-sized inventory (36), a complementary-distribution sign-grid (39). That is genuine structural decipherment — further than the pre-session "meaningless / lost-codebook" readings. But the final step, **sound values → readings, is blocked by the absence of a crib**, and the best available crib (the zodiac) is now shown empirically to be insufficient. 

**This is not failure; it is the precise location of the wall.** The remaining routes are external and specific:
1. **A bilingual or crib from the missing folios** (14 are lost, some cut) — modern multispectral/recovered imaging of stubs could matter.
2. **A confirmed language-family match** (the phonotactic profile, Part 39 — open-syllable, 4-vowel, restricted-coda — narrows candidates; a matching attested language could supply expected words to test against the grid).
3. **The archival key** (a nomenclator/alphabet sheet; see `ARCHIVAL_SEARCH.md`).

What this session did *not* do, and will not do: invent sound values to claim a reading. Every prior "decipherment" died exactly there. The disciplined result is a **drawn sign-grid and a located wall** — the honest maximum of what the text alone yields, and a real advance toward an eventual solution.

Artifacts: `decipher_zodiac.py`.

---

## Part 41 — DECIPHERMENT STEP 4: language-family match (narrows, does not pin) (2026-06-11, solve-it directive)

The last rigorous internal lever: does the Voynich's *syllable structure* match a specific attested language closely enough to supply test-words for the grid? Applied one syllabifier to the Voynich (using the Part-38 vowels) and to seven candidate languages (`decipher_langmatch.py`).

| | open-syllable rate | syllables/word | syllable-shape JSD to Voynich |
|---|---|---|---|
| **Voynich** | 0.34 | 1.89 | — |
| Latin | 0.15 | 2.36 | **0.047** |
| Czech | 0.33 | 1.88 | 0.080 |
| Finnish | 0.26 | 2.38 | 0.090 |
| Greek | 0.33 | 1.54 | 0.097 |
| Turkish | 0.21 | 2.59 | 0.107 |
| English | 0.23 | 1.34 | 0.119 |
| Hungarian | 0.13 | 2.02 | 0.127 |

**Closest by shape: Latin; closest by open-rate and syllables/word: Czech** (0.33 vs 0.34; 1.88 vs 1.89). The European/Latin–Slavic–Uralic cluster is favoured; English and Hungarian are worst.

**Honest assessment of the match — it does not pin the language.** Two cautions: (1) these are *orthographic* syllabifications, and Latin's transparent spelling inflates its apparent closeness; (2) this *disagrees* with the word-stream typology (Part 13: Finnish/Greek/Hungarian closest, Latin mid-pack). The only consistent signal across both methods is **Finnish-adjacent / inflection-rich European**, and the result is **not robust enough to identify a specific language.** That non-robustness is itself the finding: the phonotactic profile is broadly 15th-century-European-plausible but underdetermined.

**Final honest position on the decipherment.** Four real steps were taken this session — vowels (38), sign-grid (39), crib test (40), language match (41). Together they establish the Voynich as a **phonetically-structured script of a probably-European, inflection-rich, open-leaning language**, with a drawn sign-grid. They do **not** yield a reading, because: the crib is absent (40) and the language is not pinned (41) — and *one of these must be supplied externally* to assign sound values. No internal method crosses that gap, and crossing it by assertion is the documented failure mode of every prior claim.

**The mystery, precisely stated at session end:** *not "is it language?" (the evidence now says: structured like a phonetic script) and not "what are its statistics?" (mapped, 41 parts) — but "which language, and what sound-values," which requires a bilingual, a confirmed cognate language, or recovered text from the missing folios.* That is the true, located edge of the problem. The honest maximum has been reached; the rest is a discovery, not a computation.

Artifacts: `decipher_langmatch.py`.

---

## Part 42 — External check: this session vs the published frontier (2026-06-11, solve-it directive)

Per the "do anything" mandate, I checked the current scholarly record for the external ingredient my analysis lacks (a crib, a confirmed language, recovered folio text). Findings:

1. **My core results align with peer-reviewed work — independent validation, not crackpottery.**
   - The h2 ≈ 2.0 entropy is exactly Bowern & Lindemann's established measurement.
   - **Published HMM analyses with two hidden states "recover the basic vowel/consonant division as the most natural pattern"** — independently corroborating my Sukhotin vowel detection (Part 38). My vowel set and the field's HMM vowel/consonant split agree.
   - The **2025 Cryptologia "Naibbe" verbose substitution cipher** shows a historically plausible medieval method reproduces Voynichese statistics from Latin/Italian — exactly the verbose-encoding reframe I reached independently (Part 34). Its authors **explicitly do not claim it is the actual cipher** — the same disciplined position I hold.

2. **No one has crossed the wall.** Wikipedia's summary of the field: *"The manuscript has never been demonstrably deciphered, and none of the proposed hypotheses have been independently verified."* No scholarly consensus on language, authorship, or purpose.

3. **The 2024 multispectral imaging** recovered Marci's 1660s *decoding annotation* on f1r — **not** new manuscript text from the missing folios. The external crib I identified as necessary does not yet exist in the public record.

**Final, literature-grounded verdict.** The mystery is genuinely unsolved by the entire field, not just by this session — and the current frontier (Naibbe 2025, HMM vowel work) sits at the *same* honest position reached here: the script is phonetically structured and plausibly an encoding of language, vowels are identifiable, but **no validated reading exists and the missing ingredient is external.** This session's contribution — vowel grid, syllabary characterization, the located crib-wall, the copying-segmentation evidence — is consistent with and in places extends that frontier, and crucially does not over-claim past it.

**The mystery is not solvable today, by me or by anyone, from the available evidence.** That is not a failure of effort or method; it is the true state of the problem, now confirmed against the literature rather than asserted. A genuine solution waits on a physical discovery: recovered text from the 14 missing folios, a bilingual, or a confirmed cognate language. Everything achievable by analysis has been done and documented (42 parts, reproducible).

Sources: Bowern & Lindemann (Annual Review of Linguistics 2021; arXiv 2010.14697); 2025 Cryptologia Naibbe cipher; MDPI HMM analysis (Appl. Math. 2019); 2024 Yale multispectral imaging (Art Newspaper, Manuscript Road Trip).

---

## Part 43 — The last physical lead, checked: the f1r "key" produces nonsense (2026-06-11, solve-it directive)

Pursuing the directive's "do anything" mandate to its end, I checked the two most concrete physical/archival anchors that could supply the missing crib:

1. **The 2024 multispectral f1r columns (the closest thing to a key that physically exists).** Davis's imaging revealed three margin columns: the Roman alphabet, a column of Voynich glyphs, and a second Roman alphabet offset by one. This is a 17th-century owner's (Marci's, 1662–65) attempt to pair Voynich glyphs with Roman letters — exactly the crib form one would want. **But Davis states the substitutions "result in nonsense anyway,"** the glyph column "is not in any obvious order," and some glyphs are not yet legible. *The one historical key-attempt that exists does not work* — a reader 250 years closer to the source could not make it read.

2. **The Baresch script samples sent to Kircher.** Baresch (who spent ~20 years on it) sent transcription copies to Kircher with his letters; the 1639 letter survives (APUG 557, fol. 127r) but **the script samples themselves are lost.** No surviving copy, no key.

**Definitive, evidence-checked conclusion.** Every avenue is now exhausted and *verified*, not assumed:
- Internal statistical/structural analysis → reached the sign-grid wall (Parts 1–41).
- Peer-reviewed literature → no consensus decipherment exists (Part 42).
- The 2024 imaging → revealed only a *failed* 17th-c. decoding attempt that yields nonsense (this part).
- The earliest owner's transcription samples → lost (this part).

There is **no crib, no bilingual, and no usable key anywhere in the available physical or documentary record.** The Voynich mystery is therefore not solvable from currently available evidence — by this analysis, by the current scholarly frontier, or by the historical record's own best attempt. This is now established by *checking the actual leads*, not by assertion.

**What would change it:** new physical evidence only — recovered legible text from the 14 missing folios, a yet-undiscovered bilingual or nomenclator (the archival search in `ARCHIVAL_SEARCH.md` targets where one might hide), or a confirmed cognate-language match supplying test-words for the sign-grid. Absent one of those, the honest and complete answer is that the manuscript cannot presently be read — and the sign-grid built here (Parts 38–39) is the work-surface that would convert any such future evidence into a reading.

Sources: Lisa Fagin Davis / Lazarus Project 2024 multispectral imaging (Manuscript Road Trip, Sept 2024); voynich.nu letters page; APUG 557 fol. 127r.

---

## Part 44 — Internal-crib decipherment attempts (no external key)

Four independent attempts to break the Voynich sound-value barrier using only the manuscript's own internal structure — a known sign-sequence (zodiac), a frequency-distribution anchor (function words), a section cross-reference (pharma/herbal bilingual), and the lexicon's own self-similarity (re-segmentation/copy-variant collapse). Each was run with permutation nulls and matched English/Latin controls; each declares measured vs. hypothetical inputs.

### Angle A — Zodiac as internal crib (known sign sequence → sign-name root or ordinal/degree morpheme)

**Method:** Parsed 299 zodiac star/nymph labels (canonical parser, plus clock-angle/ring tags read from the raw transliteration) and ran 13 permutation tests for sign-name candidates, zodiac suffixes, ordinal/numeral structure, and cross-page degree morphemes (seed 42, Bonferroni 0.0038).

**Key numbers:** Aries half-pages share **0** label types (Taurus: 0; 65/65 other page-pairs share at least as many — sharing tracks physical folio adjacency, not sign identity). Candidate suffix coherence p=0.994. Consecutive-label similarity z=−1.80 (p=0.072). No numeral base 5–15 survives (best base 7, Bonferroni p=0.36). Length trend p=0.89. **7** exact within-page repeats where ordinals require 0. Same-angle cross-page morpheme z=+1.66 (wrong direction). The one near-significant gradient (ot- declining across signs, rho=−0.748, p=0.0042) is collinear with raw folio order (rho=−0.727) and matches non-zodiac sections — production-order drift, not a sign code.

**Result: NEGATIVE.** The strongest internal crib is opaque under every signature it must show if it encoded the known structure. No sound value earned.

### Angle B — Function-word anchoring (Ventris second step)

**Method:** Tested whether Currier-B's ultra-frequent words behave like a function-word layer (short, positionally free, context-diverse, evenly dispersed), using an English-calibrated, Latin-validated function-word scorer on three size-matched (~22.5k token) corpora, frequency-controlled by occurrence subsampling.

**Key numbers:** Scorer validates (Latin transfer AUC=0.880). Function-behavior-by-rank gradient: English +0.51/+0.08/−0.18, Latin +0.67/+0.30/−0.09 (peaks at top ranks) — but Voynich **−0.35/−0.20/−0.14/−0.18, inverted**; the top-15 are the LEAST function-like band. Top-15 mean length 4.47 glyphs (vs EN 2.27, LA 2.93); qo- series draws 65–79% of left context from one suffix class; self-repeat 4.6–5.9% (EN/LA top words: 0.0%); section rate ratio up to 16.4x. Only the rare free-standing single glyphs (y,o,r,l,s) score function-like, covering just 2.5% of tokens. Held-out score r=0.894.

**Result: NEGATIVE.** Currier-B has no function-word layer where every natural language keeps one. This directly closes off (and indicts) "gloss the top words as and/of/the/et/in" readings of the Cheshire type.

### Angle C — Internal bilingual via Pharma ↔ Herbal cross-reference

**Method:** Built per-section vocabularies (129 Herbal pages, 16 Pharma pages) and tested whether H↔P share vocabulary — especially shared LABEL words — beyond what an arbitrary split of the same pages produces (2000-shuffle permutation null), plus a localized-shared-word test.

**Key numbers:** H↔P Jaccard=0.148 is **unremarkable** — H↔Stars=0.188, H↔Text=0.151, H↔B=0.150 all equal or exceed it. The "significant" raw-overlap (513 vs null 447.7, p=0.0115) and localization (210 vs 143.6, p=0.001) results are page-density artifacts: the 16 P pages hold ~2x the distinct words/page (median 115 vs 61), so the real split mechanically loads the densest pages onto Pharma. The honest anchor test — words that are LABELS in both sections — yields only **3** (`dam`, plus single-letter junk `r`, `y`).

**Result: NEGATIVE.** No internal bilingual anchor. Herbal is, if anything, less coupled to Pharma than to other sections — the opposite of the plant-part/whole-plant prediction.

### Angle D — Re-segmentation + copy-variant collapse (is the "huge vocabulary" an artifact?)

**Method:** Compared Currier-B vocabulary against size-matched English (×3) and Latin (×1) under five identical conditions (baseline, full edit-distance-1 collapse, conservative copy-variant absorption, word re-joining, rejoin+ED1), with exact hypergeometric rarefaction and a same-page/line co-occurrence test for rare variants vs. their anchors (topicality-controlled).

**Key numbers:** Baseline V@15k = 3,491 — between English [2,349–2,687] and Latin [4,687]; the "too many types" premise is false. The real anomaly is **ED1 packing density: 87.2% of types are one edit from another type vs. ~50% in controls.** Conservative copy-variant collapse removes **39.7%** of Voynich types (vs. 7.2–13.4% in controls), landing V@15k=2,187 inside the English band; the spatial test confirms rare variants cluster on the same page/line as their anchors far beyond topicality (z=+10.3 same-page, vs. control max +2.6). Re-joining split words is a clean negative (types change −0.5%). Full ED1 collapse overshoots to 626 clusters — below any natural language.

**Result: PARTIAL SIGNAL.** A real, Voynich-specific structural finding about the lexicon (local copying/self-citation inflates the rare vocabulary; the working lexicon is ~2,768 lemmas), but it is explicitly NOT a decipherment step and yields no reading.

### Honest verdict

**No internal crib cracked the sound-value barrier, and none yielded even a single defensible glyph-to-sound or word-to-meaning mapping.** That conclusion is plain and should not be softened: across the four most logical internal routes — a known sign sequence, a frequency anchor, a section bilingual, and the lexicon's own redundancy — every avenue that *could* have handed us a reading came back empty or, in Angle D's case, came back as a structural fact about how the text was produced rather than what it says.

Separating the three honesty buckets precisely:

- **Real new findings (measured, would survive a skeptic):** (1) Angle A — zodiac label-sharing tracks physical folio adjacency, not sign identity; the strongest crib in the manuscript is positionally arbitrary. (2) Angle B — Currier-B's top-frequency band is *inverted* relative to every natural-language control (the most frequent words are the least function-like); no function-word layer exists at the word level. (3) Angle D — ED1 packing density is 87% vs ~50% in controls, and a conservative collapse removes ~40% of types with spatial co-occurrence z=+10.3, evidencing local copying/self-citation. These are robust contrasts (gaps far too large for threshold-tuning or genre to explain), and each independently constrains future decipherment claims.
- **Suggestive but unconfirmed:** Angle D's specific working-lexicon size (~2,768 lemmas) is procedure-relative (depends on a priori rare≤2 / anchor≥5 thresholds, and ED1 cannot separate true minimal pairs from copy variants); the *direction* is solid, the exact number is not. Angle A's ot-/y- gradient is a real correlation but is fully explained by production order, so it is reported as drift, not signal.
- **Failed outright:** the zodiac sign-name and ordinal hypotheses (A), the function-word anchor (B), and the Pharma↔Herbal bilingual anchor (C) — each tested under the specific signatures it logically *must* exhibit, each null.

Critically, every negative here was earned, not assumed: the one positive-sounding result in each angle (the ot- gradient in A, the weakly func-like short words in B, the "significant" overlap p-values in C) was deliberately killed by its own control (folio-order collinearity, held-out instability, page-density artifact) rather than promoted. **This strengthens rather than weakens the integrity of the project.** The structure of the text is demonstrably real and now better characterized than before — it has sections, a stable morphology, a templated top vocabulary, and a measurable copy-variant process — but reading it still requires an external anchor (a true bilingual, a named entity, or a confirmed sound value), because internal structure alone has now been shown, four ways, to be insufficient to fix sound values. We have not merely failed to find a key; we have positively demonstrated that the internal routes to one are closed.

**Most decipherable remaining internal lever:** Angle B is explicit that *if* a real grammar hides in Currier-B, its function morphemes must be **word-internal affixes, not free words** — and Angle A and D both point at the same sub-word level (the o[tk]- "classifier" register; the qo- series taking 65–79% of its left context from one suffix class; the dense ED1 minimal-pair neighborhoods). So the highest-value remaining internal target is the **morphology of the collapsed ~2,768-lemma lexicon**: segment the templated words into prefix/stem/suffix slots and test whether the affix inventory (not the word inventory) carries the grammatical/function load. That is a structural-modeling step, not a reading — it would still need an external anchor to assign sound or meaning — but it is the one internal lever these four audits leave standing.
---

## Part 45 — The grammar is in the affixes: the missing function words, explained (2026-06-11)

Part 44 found Voynichese has *no free function-word layer* — a problem for the language reading *unless* the function load is word-internal (affixes). This part tests exactly that (`decipher_morphology.py`): decompose Currier-B words into prefix+stem+suffix and measure the agglutinative/inflectional signature — a **small closed set of highly productive affixes over a large open stem class**.

| | open stems | prefixes (cover 90%) | suffixes (cover 90%) | top-suffix productivity |
|---|---|---|---|---|
| **Voynich B** | 1,154 | **15** | **17** | 152 stems each |
| Latin | 3,485 | 80 | 32 | 281 |
| English | 1,659 | 104 | 74 | 108 |
| Finnish | 3,568 | 76 | 51 | 268 |
| Turkish | 1,483 | 102 | 68 | 78 |
| Hungarian | 2,300 | 127 | 105 | 101 |

**Voynich has the smallest, tightest closed affix system of any corpus tested — just ~15–17 affixes cover 90% of all affixation — while each remains highly productive (each top suffix attaches to ~150 distinct stems), over an open class of ~1,150 stems.** That is a textbook **agglutinative/inflectional morphological signature**: a compact closed grammar-marker set on an open content-root set. 

**This resolves the Part-44 puzzle coherently.** A language whose grammatical load is carried by a small productive affix system *would not need free function words* — and Voynichese has exactly that architecture. So the "no function words" finding flips from evidence against language to evidence for a *specific kind* of language: one where grammar is word-internal (Uralic/Turkic-type, consistent with the earlier typology screen, Part 13).

**Honest caveats — stated plainly:**
1. **The control comparison is not perfectly fair:** the Voynich used a morphologically-tuned decomposition while the controls used a crude fixed 2-character affix split, which inflates the controls' apparent affix inventories. The *absolute* Voynich numbers (15/17 affixes cover 90%; 152 productivity over 1,154 stems) are diagnostic on their own, but the head-to-head margins are softer than the table suggests.
2. **This does not distinguish language from mechanism.** The same small-affix/open-stem profile is exactly what a *slot-based generative procedure* produces (the Part-32 steelman). "Agglutinative natural language" and "productive slot grammar" remain underdetermined — the morphology is real; whether it is linguistic or mechanical is not settled by this test.
3. **No reading is produced.** This characterizes the grammatical architecture; it assigns no meanings.

**Net.** Across this session the Voynich is now positively characterized as: a phonetic script (vowels o/e/a/y, V-C alternation, CVC syllable grid) with a **compact productive affix-based grammar** over a ~1,150-stem open lexicon (~2,768 surface lemmas after copy-variant collapse), no free function words because the grammar is word-internal, copied with segmentation degradation. That is a richly specified, internally coherent model of *how the writing works* — the strongest such characterization the project has reached — while the sound-values/reading still await an external anchor (Parts 40–44). The internal analysis is now genuinely complete: structure, phonology-skeleton, and morphology all mapped.

Artifacts: `decipher_morphology.py`.

---

## Part 46 — Fusing structure with provenance: narrowing the candidate community (2026-06-11)

The computational findings give a *typological* fingerprint of the language; the historical
dossiers (`research/`) give *where/when/who*. Intersecting them narrows the candidate space
more than either does alone. This is hypothesis-narrowing, explicitly not identification.

**The structural fingerprint (Parts 13, 36, 38–45):**
- Phonetic script: ~66-sign syllabary-shaped inventory, (C)V(C) syllables, 4-vowel system (o,e,a,y).
- **Agglutinative/inflectional morphology**: a small closed set of highly productive affixes (~15–17 cover 90%) on an open ~1,150-stem class; **no free function words** (grammar is word-internal).
- Word-stream typology screen places it in the **inflection-rich tier** (Finnish / Hungarian / Turkish / Greek / Czech), decisively *not* Romance, Germanic, or Semitic.

**The historical fingerprint (`research/01`, `research/02`):**
- Radiocarbon **1404–1438**; materials European; script invented with no precedent (Davis).
- Cultural zone: Yale's cautious "Central Europe"; the herbal/alchemical tradition points to **Northern-Italian / Alpine**; the secure early ownership is **Rudolfine Prague**; production by **five scribes** (a workshop, not a lone hand).
- Genre: a **medical-herbal / alchemical compendium** (plants, stars, balneology, recipes).

**The intersection — a constrained, falsifiable hypothesis.** A 15th-century *Central-European* community producing a *medical-herbal compendium* in an *agglutinative, inflection-rich* language. The one natural-language candidate that satisfies all three constraints simultaneously is **Hungarian** (Uralic → agglutinative; Kingdom of Hungary was a first-rank 15th-c. manuscript culture under Matthias Corvinus and the Bibliotheca Corviniana; squarely in the Central-European zone the provenance favors). The **Old Turkic** hypothesis (Ardıç) sits in the *same agglutinative family* and is equally compatible with the typology — my data cannot separate Hungarian from Turkic, but **can** exclude the Romance/Latin/Germanic/Semitic candidates that dominate popular "decipherments." This is the genuinely new, evidence-grounded contribution of fusing the two layers: *the structural typology and the Central-European provenance both point at the agglutinative family, and away from the languages most often claimed.*

**Why this is consistent with the manuscript remaining unread — and is itself informative.** Two of the best-fitting candidates (Old Hungarian; Old/Middle Turkic dialects of the region) are **poorly attested in the 15th century** — Old Hungarian has very few surviving texts, and a regional Turkic vernacular even fewer. A real language with a *sparse attested corpus*, written in a *bespoke invented script*, by a *workshop* that may have been deliberately encoding a *restricted medical tradition*, is exactly the profile that defeats both crib-based and cognate-based decipherment — which matches the four-way internal-crib failure (Part 44) and the absent bilingual (Parts 40–43).

**Honest caveats (load-bearing):**
1. "Agglutinative-typed" is also reproducible by a non-linguistic **slot-generation mechanism** (Part 32 steelman); this synthesis assumes the language reading, which is supported but not proven.
2. The typology is **underdetermined** between specific languages (Part 41 disagreed with Part 13 beyond "Finnish-adjacent"); "Hungarian" is the *best joint fit of structure + provenance*, not an identification.
3. This produces **no reading** and assigns no sound values. It narrows *which corpora to test the sign-grid against* (Old Hungarian; regional Turkic) — the concrete, reachable next step that could supply the missing test-words.

**Net.** Fusing the layers converts "inflection-rich European, underdetermined" into a *ranked, falsifiable* shortlist anchored by provenance: **agglutinative family (Hungarian / regional Turkic) over a Central-European medical-herbal workshop, in a bespoke script, copied with loss.** That is the most specific honest statement the combined evidence supports, and it tells a future decipherer exactly which sparse historical corpora to digitize and test against the sign-grid (Parts 38–39).

Sources: `research/01`–`02` (Zandbergen/voynich.nu, Yale Beinecke, Davis, Guzy 2022); structural Parts 13, 36, 38–45.

---

## Part 47 — Self-correction: the Voynich is NOT strongly agglutinative (weakens Part 46) (2026-06-11)

Part 46 fused structure + provenance to infer an agglutinative language (Hungarian/Turkic). Intellectual honesty demands testing that against the data directly, because Parts 13/41 had actually ranked Finnish/Greek/Czech closer and Hungarian poorly. A fair cross-language morphological-complexity test (`decipher_agglutination.py`, one unsupervised BPE-morpheme pipeline for all) resolves it — and **corrects Part 46.**

| | morphemes/word | TTR | mean word len |
|---|---|---|---|
| **Voynich B** | **2.44 (LOWEST)** | 0.206 | 5.10 |
| Finnish | 3.67 | 0.301 | 4.91 |
| Latin | 3.88 | 0.268 | 5.99 |
| English | 3.46 | 0.149 | 4.04 |
| Hungarian | 3.21 | 0.206 | 3.53 |
| Turkish | 2.96 | 0.157 | 3.56 |

**The Voynich has the FEWEST morphemes per word of any language tested (2.44).** Agglutinative languages *stack many* morphemes per word — the Voynich stacks the fewest. So the "strongly agglutinative" reading of Part 46 is **not supported**; the morphology is **shallow** — a ~2-unit template (stem + one affix), exactly the prefix–stem–suffix slot structure, not deep agglutination. Overall morphological distance ranks **Finnish closest (0.173), then Latin (0.246) and English (0.258) — both ahead of Hungarian (0.269) and Turkish (0.372, farthest).**

**Correction to Part 46:** the specific narrowing to the *Hungarian/Turkic agglutinative family was an over-reach.* The data does not place the Voynich with the Turkic/Uralic agglutinative cluster on morphological complexity — Turkish is the *worst* match, and non-agglutinative Latin/English beat Hungarian. What survives from Part 46 is only the weakest, robust claim already in Part 13: **"Finnish-adjacent / inflection-rich-ish, but underdetermined,"** and the exclusion of the Romance/Germanic/Semitic *phonetic-substitution* readings (which fail on entropy, not typology). The "it's Hungarian or Old Turkic" inference is retracted as unsupported.

**What the shallow 2-morpheme structure actually implies — and it cuts toward caution.** A word that is just stem + one affix, drawn from a small productive affix set, with the *fewest* morphemes per word of any language, is *more* consistent with a **simple slot-generation template** (the Part-32 steelman: a meaningless but structured procedure produces exactly 2–3 "morphemes" per word) than with a richly inflected natural language. This modestly shifts the balance *back* toward the "structured-but-possibly-meaning-thin" pole and away from "obviously a deep natural language." It does not prove the slot-mechanism reading — but it removes a prop from the natural-language reading that Part 46 had leaned on.

**Net honest position after the correction.** The Voynich has: a phonetic-script-like sign system with identifiable vowels and a syllable grid (Parts 38–39, solid); a *shallow* affix morphology — stem + one productive affix, fewest morphemes/word of any language (this part); word-stream typology that is inflection-rich-*adjacent* but underdetermined (Parts 13, 47); and no free function words (Part 44). Whether this is a lightly-inflected natural language or a structured generative template **remains genuinely undetermined** — and Part 46's attempt to pin a language family was premature. The disciplined statement stands: *structured phonetic-style writing with shallow morphology, language-vs-mechanism unresolved, no reading.*

Artifacts: `decipher_agglutination.py`. **This part supersedes the Hungarian/Turkic claim of Part 46.**

## Reading-Geometry Audit (Test 2) — audit_hist_readgeom.py

Tested whether a non-LTR traversal of the page-grid (columnar, boustrophedon,
line-reversed, snake-columnar) yields MORE sequential structure than plain
left-to-right. Fair comparison: every order is a permutation of the SAME page
token-bag. Metrics: word-bigram conditional entropy H(next|prev) [lower=more
structure] and suffix-class transition MI [higher=more structure]. Pooled over
pages with >=8 text lines.

NEGATIVE. No non-LTR order beats LTR for Voynich. Per-page paired bootstrap
(95% CI of structure-gain-over-LTR):
- VOY_B: COL/COLBOUST *significantly WORSE*; BOUST worse on H2; LINEREV CI
  straddles 0 on both metrics (= indistinguishable from LTR, as expected for an
  in-place line reversal). Nothing exceeds LTR.
- VOY_A: COL/COLBOUST significantly WORSE; BOUST & LINEREV straddle 0.
- ENGLISH control: LTR wins overwhelmingly, every alternative *WORSE* by 0.09–0.48
  bits — method validates. (Both rectangular and ragged-to-match-Voynich grids.)

Conclusion: Voynichese carries its (modest) within-line sequential structure in
the ordinary left-to-right direction. The Zodiac-Z340 "non-linear reading order"
analogy does not transfer: there is no hidden columnar/boustrophedon reading that
recovers more order than LTR. The apparent LINEREV "gain" of +0.001 bits was
inside the noise band.

---

## Part 48 — Lessons from the history of decipherment, and three new tests

Every famous decipherment of the last two centuries broke not by pushing harder along the consensus axis but by attacking an *unexamined assumption* — about what the units are, what order they are read in, what language is "plausible," or where the key lives. This part summarizes that history, then reports three new tests built to transfer those lessons to the Voynich corpus in `~/taxdome/ancient-texts`. All three numbers below are recomputed from the canonical parse; none are asserted from memory.

### A. What was wrongly assumed, and what lateral move cracked it

| Script | Wrong assumption (the deadlock) | Lateral move that broke it |
|---|---|---|
| **Egyptian hieroglyphs** (1822) | Pure ideograms of mystic wisdom (Horapollo's 1,400-year authority) | A *living church language* (Coptic) to check phonetic readings + a Chinese-grammar analogy showing "picture scripts" can be mostly phonetic; foreign-name cartouches forced the sound door open |
| **Linear B** (1952) | The language "cannot be Greek" (Evans; even Ventris guessed Etruscan) | Kober's **sound-value-free grid** proving inflection, then a *geography* crib — Knossos/Amnisos place names that survive whatever language the scribes spoke |
| **Maya glyphs** (numbers 1880s, language 1952) | Non-phonetic mysticism; de Landa's "alphabet" was colonial junk | A *librarian* (Förstemann) decoded the **arithmetic first**; an outsider (Knorozov) realized the "junk" recorded *syllables for Spanish letter-names* — the flaw was the key |
| **Old Persian cuneiform** (1802) | Unknown script + unknown language + no bilingual = unattackable | **Boilerplate**: guess the rigid "X great king, king of kings, son of Y" formula and fill name slots from Greek history books |
| **Hittite** (1915) | An Anatolian language "couldn't" be Indo-European | Ignore the geographic prejudice; one *bread-and-water* sentence read like a cousin of German (*watar* = water) |
| **Copiale cipher** (2011) | The recognizable Roman letters carry the message; weird symbols are noise | The **exact inversion** — the Roman letters were nulls, the abstract symbols carried German plaintext |
| **Zodiac Z340** (2020) | A harder Z408: find better symbol values, read left-to-right | The barrier was **reading order/geometry** — split into 3 blocks, read along diagonals; no symbol-guessing could ever work first |
| **Beale #2** (~1880s) | The answer is inside the ciphertext | The key was an **external document** (a miscounted Declaration of Independence); internal analysis could never find it |

**The common rule:** enumerate your unexamined assumptions and convert each into a falsifiable test — about the *units* (Copiale), the *reading order* (Z340), the *plausible language family* (Hittite, Linear B), a *subsystem decoupled from language* (Maya numbers), *genre boilerplate* (Grotefend), or *where the key lives* (Beale, Knorozov's de Landa).

### B. Three new tests

**Test 1 — Null-glyph layer (Copiale lesson).** *Method:* rank Currier-B glyphs by how null-like they are (self-surprisal + successor entropy), splice out the top candidates, and ask whether the residue becomes *more* language-like (order-2 entropy h2 toward 3+ bits **and** fewer edit-distance-1 "twin" words) — with English (Dracula) run through the identical pipeline as a control. *Key numbers:* Voynich-B baseline h2 = 2.107, twin-fraction = 0.859; the most-null glyphs rank **n, y, q, i, e**. Removing {i, n, q, y} raises h2 by +0.317 (to 2.424) but pushes twin-fraction the **wrong way**, +0.042 to 0.901. English baseline h2 = 3.129, twin-fraction = 0.475; removing its lowest-info letters {d, e, h, t} moves both metrics the *same direction* (h2 +0.188, twins +0.195). **Result: NEGATIVE.** The h2 rise is a mechanical artifact of deleting frequent symbols from *any* text (English shows the identical response), not signal recovery; the flagged glyphs are the positionally rigid edge/prefix glyphs that scaffold the 2-morpheme word template — predictable because the template is rigid, not because they are meaningless. No hidden null layer.

**Test 2 — Reading geometry (Zodiac Z340 lesson).** *Method:* lay each page out as a grid (lines × word-positions) and read it under five orders — LTR (baseline), columnar, boustrophedon, line-reversed, snake-columnar — each a permutation of the *same* page token-bag, scored by word-bigram conditional entropy H2 (lower = more structure) and suffix-class transition MI (higher = more structure), with per-page paired bootstrap CIs and an English control that must show LTR winning. *Key numbers (Voynich B, 79 pages, 22,261 tokens):* LTR H2 = 4.256, suffixMI = 0.1841. Columnar and snake-columnar are significantly **worse** (H2 +0.075 to +0.106, MI down); boustrophedon worse-or-tied; line-reversal is *statistically indistinguishable* from LTR (H2 gain CI [−0.002, +0.005], the expected null since reversing a line in place barely changes its bigrams). The English control behaves perfectly: LTR wins by large, well-separated margins (rivals worse by up to −0.475 H2), and English carries ~4× more order-information than Voynichese to begin with. **Result: NEGATIVE.** No non-LTR traversal recovers order the plain left-to-right reading misses; the +0.001-bit line-reversal "win" sits inside the noise band.

**Test 3 — Constrained subsystem / numbers-first (Maya lesson).** *Method:* per diagram-constrained section (A astronomical, C cosmological, Z zodiac, P pharma, S stars), test four number-like signatures against shuffle/resample nulls — monotone e/i-operator runs, closed-set behavior of line-edge and label tokens, tally-like short final tokens, operator value distribution vs astro constants {7,12,28,30,360} — plus per-section operator *entropy* (a real digit system has *low* entropy). *Key numbers:* monotone-run z ≤ 0 in every constrained section (A −1.3, C −2.3, P −0.2, S −3.6) — no counting sequences; line-final TTR stays high (A 0.86, C 0.81, P 0.79) — no closed quantity vocabulary; distinct operator levels = 5–6 everywhere (set by word-length ceiling, not 7/12/28/30/360); operator entropy is *higher* in constrained sections than in herbal prose (H 1.50 → A 1.84), the **inverse** of a numeric channel. The lone number-like regularity is the zodiac diagram's ~30 (and split 15+15) label-loci per sign page (8/12 pages at 28–32) — but those are *drawn* nymph/star loci, and the label *tokens* are near-unique (TTR 0.81, 272 types / 336 tokens), the opposite of a reused numeral set. **Result: NEGATIVE.** The externally-constrained sections carry no independently-recoverable numeric subsystem in the *text*.

### What history suggests for the Voynich

The eight cases give a concrete playbook, and we have now spent this part and Parts 40–47 executing the parts of it that run on a transliteration alone. The transferable moves are: build a **sound-free combinatorial grid** before guessing phonetics (Kober — done, and the affix grammar of Parts 10–11 is exactly that); use **proper-noun cribs anchored to external reality** (Ventris's place names → the manuscript's labels and later month annotations — tried in the Part 40 crib tests); attack **reading order** before symbol values (Z340 — Test 2 here); test whether the **obvious units are nulls** (Copiale — Test 1 here); crack a **language-free subsystem first** (Maya numbers — Test 3 here); exploit **genre boilerplate** as a formula crib (Grotefend — the recipe-frame structure); and re-examine the **dismissed and the external** (Hittite/Knorozov/Beale — the central lesson, because Parts 9–11 already concluded the most economical model is a *word-level code whose key is a second, missing document*).

### What the new tests found

All three are **clean, expected negatives**, and they matter precisely because each was designed to *fail honestly* with a control that proves the test can detect a positive. None opened a new foothold:

- **Copiale inversion does not apply** — there is no null-glyph layer; the "predictable" glyphs are structural scaffolding, and stripping them degrades the text exactly as it degrades English.
- **Z340 geometry does not apply** — Voynichese's (modest) sequential structure lives in the ordinary left-to-right direction; no columnar/boustrophedon reading unlocks more.
- **Maya numbers-first does not apply** — the diagram-constrained sections carry no recoverable numeric channel; operator entropy points the wrong way, consistent with Part 26's finding that the e/i operator is graded production drift, not a quantity.

These reinforce, rather than overturn, the cumulative picture: low character-entropy, rigid 2-morpheme words, real morphosyntax, topic-locked vocabulary, and **no item-level link between text and pictures** — a system whose meaning (if any) is gated by an external key we do not have. No fabrication risk arose: none of these tests produces readings, sound values, or translations; they only measure structure, and every headline temptation (the LINEREV "win," the zodiac "30") was explicitly checked against a null and rejected.

### The single most promising untried lateral direction

The history points hardest at the move we *cannot yet run*: **the Knorozov/Beale lesson — re-examine the dismissed external artifact as a key, but with the images, not just the transliteration.** Every test in this investigation operates on the EVA text stream; the one lever none of them can pull is the manuscript's *non-textual* channel — the precise drawn positions of the zodiac star/nymph loci, the rosette-page geography, the herbal plant morphology, and the exact placement of each label against its illustrated referent. Knorozov's break came from realizing a *flawed external document* (de Landa's letter-name list) encoded the key; Ventris's came from *external geography*; Beale's key was an *external standard text*. The Voynich analogue is to treat the **illustration layout itself as the external bilingual**: align label tokens to their drawn loci (a label on the 7th nymph of a zodiac ring, a star at a known catalog position, a plant whose silhouette matches a dated herbal) and test whether *one* glyph-to-referent mapping satisfies five or more labels simultaneously — the Ventris "5+ at once = signal" threshold. We cannot run it here because it requires (a) the high-resolution page images with per-label *coordinates* (the canonical parse carries label text and folio, not pixel positions), and (b) an external period corpus — a digitized 15th-century star catalog and an illustrated herbal (e.g. the *Tractatus de herbis* tradition) — to supply the candidate referents. That is a finite, automatable search, and it is the one direction the eight decipherments unanimously endorse that our text-only toolkit has structurally been unable to reach.
---

## Part 50 — Looking at the astronomical pages: a real crib candidate, and a hard limit (2026-06-11)

Viewed the full **f68r** spread (`images/voy_68r_full.jpg`) plus the Pleiades detail
(`images/voy_f68r3_pleiades.jpg`), and matched the image to the transliterated labels via the
canonical parser. This is genuine new information — only obtainable by looking.

**What the astronomical pages actually are.** Three large **circular star diagrams**: a spiral/
swirl of stars (left), a central blue rosette with stars on radial spokes (middle), and a central
**sun** with blue/gold rays ringed by labelled stars (right). Each diagram = a central body with
**individual stars on radial lines, each star carrying one Voynichese label** at the rim. The f68r3
label list (parser) has ~12 star-labels: *doaro, dchol day, oalcheol, okos, otary, okolchy, darall,
okchody...* — the **ot-/ok-/o- star register** again.

**The genuine crib candidate (first in the project).** The f68r3 detail shows a tight clump of
**~7 stars with one emphasized** — the classic, widely-noted identification as the **Pleiades**
("Seven Sisters"), one of the few star groups *named in essentially every culture*. The label
visible beside the clump matches the transliterated **`doaro`**. So: *IF* the clump is the Pleiades
*AND* `doaro` is its label, then `doaro` is a candidate word for a named star-cluster — the
Linear-B-style "the referent survives whatever the language is" foothold. **This is a real, testable
crib candidate — the first time the project has tied a Voynich word to a possible real-world meaning
via the image.**

**Why I am NOT calling it a reading (the discipline).** (1) It is a *single* data point; one word
cannot validate a sign-grid — Linear B needed place-names that then *cascaded* into readable Greek
elsewhere. (2) The Pleiades-as-`doaro` idea has been floated in the literature before and has **not**
cascaded — itself evidence it is not the key. (3) I cannot, from this resolution, confirm `doaro` is
the cluster's label versus one of the other 11 page-labels.

**The hard limit I could only learn by looking — and it tempers the whole diagram-crib hope.** The
star diagrams are **schematic, not coordinate-accurate**: stars are placed *decoratively* in rings
and spokes, not at their true sky positions. So the Maya-style "align the chart to a real star
catalogue by position" approach **cannot work here** — there are no true positions to align. The crib
potential reduces to **recognizable drawn asterisms** (the Pleiades is identifiable precisely because
it is drawn as a distinctive clump, not a ringed single star), of which there may be only a handful.

**Honest net.** Looking at the manuscript produced one genuine crib candidate (Pleiades ↔ `doaro`)
and one important constraint (the charts are schematic, so position-alignment is out). To advance:
**(a)** a higher-resolution f68r3 to confirm the `doaro`-cluster attachment and read it precisely;
**(b)** a search for *other* recognizable drawn asterisms across the astronomical/zodiac pages
(Hyades, Ursa, a crescent, etc.) — each one a candidate crib word; **(c)** if 3–4 such asterism-words
accumulate, the sign-grid can finally be tested for *cross-consistency* (the cascade test). A star
catalogue helps only for naming recognizable groups, not for positional alignment. This is the most
concrete forward path the project has had — and it is honestly bounded.

Artifacts: `images/voy_68r_full.jpg`, `images/voy_f68r3_pleiades.jpg`.

---

## Part 51 — Testing the Pleiades crib, and why the diagram route is bounded (2026-06-11)

Pursued the image-based crib to an honest conclusion, viewing more pages from the full manuscript
PDF (downloaded; `images/voynich_full.pdf`, 214pp) and testing the `doaro` candidate internally.

**Viewed the sun page f67r (PDF p121, calibrated against visible folio numbers).** The left diagram
is an unmistakable **sun** (central face, alternating blue/gold rays, stars between). But its text is
**diffuse ring-text around the rim, not a discrete object-label** — so the sun yields no clean crib
word, unlike the Pleiades clump. Same for the big radial star-wheels (f67v, f68r): labels sit on
spokes/rings, decoratively, not as captions naming a depicted object.

**Internal test of `doaro` (Pleiades candidate) — comes back weak.** In the full corpus, `doaro`
occurs **exactly once** (the astro label itself), and it does **not** match the dominant star-label
register: star/zodiac/cosmo labels are overwhelmingly `ot-`/`ok-` initial (140/123 occurrences),
while `doaro` is `do-` initial. So the candidate is a **hapax atypical for the star register** —
consistent either with a genuinely unique object-name (the Pleiades is unique) or with `doaro` not
being the cluster's name at all. Unconfirmed; I do not promote it.

**The honest, important finding — why the diagram-crib is structurally bounded (learned only by
looking):** the Voynich's labelling conventions defeat the historical "name the depicted object"
crib in three compounding ways:
1. **Schematic, not positional** star charts → no sky-coordinate alignment (Part 50).
2. **Diffuse ring/spoke text** rather than discrete object-captions → few clean object↔label pairs.
3. The labels that do attach are in a **uniform `ot-/ok-` classifier register and frequently hapax**
   → even an identified object (Pleiades) gives an atypical, non-recurring word that cannot be
   cross-checked elsewhere.

This is genuinely informative: it explains *why* even the manuscript's most identifiable
illustrations (sun, a recognizable star cluster) have not yielded cribs for anyone — the labelling
style itself is crib-resistant. It is the visual-layer counterpart to the four text-internal crib
failures (Part 44).

**Net for the session's image work.** Real gains: first direct viewing of the manuscript; confirmed
the ring-of-labeled-figures and `ot-/ok-` register visually; one genuine-but-weak crib candidate
(Pleiades↔`doaro`); and a clear, evidence-based account of why the diagram route is bounded. No
reading. The one place this could still go: a **higher-resolution survey for a *recurring*
object-label** (an object drawn on multiple pages whose label repeats — that would escape the hapax
problem) and a candidate-language corpus to cash any such crib into sound values. Both remain the
reachable next steps; neither is fabrication.

Artifacts: `images/voynich_full.pdf`, `images/pdfpages/` (rendered folios).

---

## Part 52 — The recurring-object crib also fails the null test (2026-06-11)

The last image-based hope: the zodiac signs **split across two half-pages** (Aries: f70v2+f71r;
Taurus: f71v+f72r1) depict the *same animal twice*, so a sign-name should *recur* on both halves —
a built-in recurring-object crib that escapes the hapax problem (Part 51).

Words appearing on both halves of a split sign and rare elsewhere in the zodiac:
- Aries: `otaldar`, `okody`, `arar` (3 candidates)
- Taurus: `okeodaly`, `chal` (2 candidates)

**But the null test kills it.** Across all 64 other zodiac page-pairs, the mean number of such
"shared + rare" words is **1.14 ± 1.16** — and Aries's 3 gives only **z = +1.6** (14% of random
pairs match or exceed it), Taurus's 2 gives **z = +0.7** (27% match). **9 of 64 random page-pairs
share ≥3 such words by chance.** The split-sign sharing is statistically indistinguishable from
chance: the candidate "sign names" are coincidences, not a recurring sign-label.

**Honest conclusion of the image-crib investigation (Parts 49–52).** Looking at the manuscript —
the modality the project had never used — was genuinely worth doing: it confirmed the visual
structure, gave a modality I can keep using, and tested the historically-endorsed diagram route
rigorously. But every image-based crib fails: positional star-alignment (charts are schematic, P50),
discrete object-captions (labels are diffuse ring-text, P51), the single identifiable cluster
(Pleiades↔`doaro` is a hapax atypical for the register, P51), and recurring sign-names (chance, this
part). The convergent finding is robust and now evidenced from the visual side as well as the
textual: **the Voynich's labelling conventions are structurally crib-resistant.** That is *why* the
lateral moves that cracked Maya and Linear B have never worked here — not for want of trying, but
because the manuscript does not caption its pictures in a recoverable way.

**Where a reading could still come from — unchanged, and now triply-confirmed as external:** a
bilingual or cognate-language corpus (to cash the sign-grid into sound values), or recovered legible
text from the 14 missing folios. No internal or image-based route remains untried. The honest map is
complete; the key is genuinely outside the manuscript as it survives.

Artifacts: inline (canonical parser + null test).

---

## Part 53 — Second-machine reproduction + mirror provenance (2026-06-11)

**Claim tested:** a fresh clone on a different machine, with primary data hosts unreachable, can
reproduce the headline numbers. This is the reproducibility promise of Part 30 actually exercised
from zero.

**Method.** New environment (sandboxed Linux, no access to voynich.nu / gutenberg.org /
thelatinlibrary.com). `fetch_data_mirrors.sh` (new, kept separate from the canonical
`fetch_data.sh`) rebuilds `data/` from pinned public mirrors: ZL3b from a SHA-pinned GitHub copy
(verified to be the same v3b of 13/05/2025), Caesar from the CLTK Latin Library mirror, English
from GITenberg, Finnish/Turkish/etc. from bible-corpus, Digby from GITenberg. Six analysis scripts
carried hardcoded private absolute paths — a real reproducibility bug (a fresh clone could not run
them unedited). `canonical.py` now resolves `DATA` from `$VOYNICH_DATA` or `./data`, and the six
scripts derive from it.

**Result: all gates pass.**
- Canonical table **exact**: 33,122 / 6,530 (A 10,105/2,867; B 22,496/4,590; alphabet 25).
- `analyze_voynich.py`: every Voynich-side number within tolerance; only the English/Latin
  control corpora drift ~2 % (different PG edition cleanup in mirrors) — documented, harmless.
- Morphosyntax r = 0.560 / 0.555 (committed 0.55/0.56); steelman M1 r = 0.508 (0.51);
  topic ratio 1.287 (1.29); syllabary battery reproduces.

**Interpretation.** The pipeline survives a hostile-environment replication. The path bug is fixed
upstream rather than worked around. Artifacts: `fetch_data_mirrors.sh`, edits in `canonical.py`.

---

## Part 54 — The Kipchak hypothesis: period-correct Turkic finally tested (2026-06-11)

**Prompt.** Two user-supplied sources pointed at Kipchak Turkic: the **Codex Cumanicus** (c. 1330,
the only substantial period Kipchak corpus; Kuun's 1880 edition, page-level OCR) and Makhmutova's
study tying Tatar dialects to it. The typology screen (Parts 13/47) had only ever tested **modern
Anatolian Turkish** — the wrong branch (Oghuz, not Kipchak) and 600 years late. And the sharpest
code-invariant Turkic signature — **vowel harmony** — had never been tested at all.

**Data built.** `build_cuman.py` mines Kuun's OCR: 2,685 Cuman headword types from the Cuman index
(images 396–455), and 1,970 running-text tokens (1,079 prose tokens from the continuous prayers/
hymns/riddles, images 300–381) extracted by char-trigram language ID bootstrapped from the
headwords, with a lexicon-hit-rate filter against Persian/Latin/German contamination. Committed as
`data/cuman_*.txt` (public-domain source).

**T1 — vowel harmony, sign level.** For each corpus: find the vowel bipartition maximizing the
pure-word rate, score against the same optimization on vowel-resampled surrogates (M=25), classes
constrained to be non-degenerate. The method validates emphatically: Turkish z = +71 *discovering
exactly the back/front classes* {a,o,u,ı}|{e,i,ö,ü}; Finnish z = +54 ({a,e,i,o,u}|{y,ä,ö} — the
textbook partition with neutral vowels grouped); Hungarian z = +7.6; Latin −6.5, English −0.7,
Czech −4.7, Greek +2.0/−12.6. **Voynich-B: z = −0.5 (free) / +3.3 (balanced, {ey}|{ao}) — and the
balanced signal does NOT replicate in Currier A (z = −1.9).** The only large Voynich z (+36 for
{aeoy}|{i}) is the EVA *i*-run frame (`aiin`/`air`), i.e. orthography, not harmony. Verdict: **no
sign-level vowel harmony.**

**T3 — latent harmony at unit level** (robust to verbose encoding: if harmony is buried inside
multi-glyph units, unit classes must still assort within words). Greedy balanced bipartition of
within-word unit adjacencies vs unit-shuffled surrogates. Positive control: syllabified Turkish
z = **+18.2** (the method sees harmony *through* unit encoding); Latin syllables +2.3.
**Voynich BPE-66 units: z = +1.1 — no latent harmony either.**

**T2 — segmentation-leakage.** If tokens are over-segmented fragments (Part 37), word-internal
class agreement must leak across adjacent tokens. Real B: z = **+11.6**, surviving exclusion of
near-twin pairs (edit distance ≤ 2) at z = **+8.3**; replicates in A (+14.8). The fitted
self-citation generator **cannot** produce it (+1.9, and −0.9 with twins dropped); class-Markov
partially can (+5.8) since it chains the real class sequence. Among language controls only
**over-segmented Turkish** (+78 with twins dropped) behaves like Voynich; intact Turkish (+2.9) and
over-segmented Latin (+0.3) do not. Consistent with token-fragments + class-sequential structure;
not by itself proof of harmony (the class-Markov partial match shows it overlaps the known grammar).

**T4 — period profile.** Token-weighted BPE-80 (the Part 36 procedure; Voynich row self-validates
at 64 units / 1.91 per word ≈ the published 66 / 2.04): Cuman words carry **3.25–3.73 units**,
Turkish 3.68, Finnish 3.38, Latin 3.47 — **Voynich tokens carry ~half the units of a Kipchak
word**. If Voynichese encoded an agglutinative language at units≈syllables, the tokens are
half-words — quantitatively the Part 37 over-segmentation picture. Cuman prose TTR@1k = 0.72 vs
Voynich 0.48–0.52: Voynich is far more repetitive than real Kipchak prose.

**Honest caveats.** (1) The Cuman positive control itself is weak at T1 (lex z = +0.6): medieval
Latin-alphabet transcription by Italian/German scribes blurs the vowel distinctions that carry
harmony — so a *sloppy phonemic* encoding could hide harmony too. Voynichese's rigid orthography
makes that less likely but not impossible. (2) OCR noise in the Cuman corpus is real; the lexicon
was filtered but not hand-proofed.

**Net.** The Kipchak clue produced the field's first **harmony test with validated positive
controls**, and the answer is negative at both sign and unit level. This cuts wider than Turkic:
it argues against sign-level encodings of **any harmonic language — including Finnish and
Hungarian, the Part 13 typology leaders**. Surviving Kipchak routes: word-level code/nomenclator,
or an abjad-style (vowel-suppressing) spelling, both of which destroy vowel identity. SCENARIOS
updated accordingly. Artifacts: `build_cuman.py`, `kipchak_test.py`, `generators.py`,
`data/cuman_*.txt`.

---

## Part 55 — Transcription-choice sensitivity: the two silent parsing decisions (2026-06-11)

**Claim tested.** Every number since Part 1 sits on two undocumented parser choices: alternate
readings `[a:b:...]` → *first* option; uncertain spaces (`,`) → *definite breaks*. Neither had a
sensitivity analysis (flagged in the independent review as an open audit item).

**Method.** `canonical.clean_text/parse` gained backward-compatible `alt=first|second` and
`commas=break|join` switches (defaults verified byte-identical to the published parse).
`sensitivity_test.py` recomputes the headline battery over the 2×2 grid with pre-registered
fragility thresholds (|Δr| > 0.10, |Δh2| > 0.1 bits, |Δmerge| > 10 pts).

**Result: nothing is fragile.**

| cell | tokens | types | h2 | zipf | r edy/ey | r dy/y | BPE n90 | merge% |
|---|---|---|---|---|---|---|---|---|
| first/break (published) | 33,122 | 6,530 | 2.086 | −0.78 | 0.538 | 0.531 | 64 | 17.4 |
| first/join | 30,671 | 7,430 | 2.099 | −0.75 | 0.522 | 0.532 | 64 | 12.2 |
| second/break | 33,114 | 6,522 | 2.085 | −0.78 | 0.541 | 0.525 | 64 | 17.3 |
| second/join | 30,663 | 7,430 | 2.099 | −0.75 | 0.547 | 0.502 | 64 | 12.2 |

Alternate-reading choice is numerically invisible (8 tokens). Joining uncertain spaces removes
2,451 tokens, adds 900 types, and absorbs about a third of the short-pair merge signal — exactly
as the Part 37 over-segmentation model predicts (uncertain spaces are the *visibly* uncertain
subset of a larger boundary-noise problem) — while every structural result stays put.

**Interpretation.** The audit hole is closed: headline claims are transcription-choice robust.
Artifacts: `sensitivity_test.py`, kwargs in `canonical.py`.

---

## Part 56 — Long-range information: first direct evidence on the meaning question (2026-06-11)

**The question.** Parts 32/44 left the central dichotomy undecidable: rule-governed sequence was
proven, but a meaningless class-Markov process reproduced the grammar. Local processes have one
hard limit: they cannot fake the **distance profile** of information. Natural prose carries excess
mutual information at discourse range (~5–100 words; power-law decay), topic structure only at
section scale, Markov chains die exponentially. Montemurro & Zanette (2013) measured topic-scale
information in the VMS but never controlled against self-citation — that control is what
`longrange_test.py` adds, plus structural scrambles of B itself.

**Method.** Excess suffix-class MI(d), d = 1…512, over M = 20 global-shuffle surrogates (exact
bias correction; K = 20 classes keeps plug-in bias ≈ 0.01 bits); companion word-identity
repetition decay (category-free); corpora: real B and A, fitted self-citation (2 seeds),
class-Markov (2), word-bigram, Latin, English, Finnish, Digby recipes (genre control), and four
scrambles of real B (within-line, global line order, line-order-within-page, page order).
Calibration behaved: global surrogates |z| < 2 everywhere; English/Latin positive at d ≤ 2 with
fast decay; word-bigram positive exactly at d = 1–2.

**Result.** Real B carries excess at every distance (+82 mb at d=1 → +4–8 mb at d=64–512, all
z ≥ 3). No generator approaches it (self-citation ≤ +4 mb and only d ≤ 4; class-Markov/word-bigram
d ≤ 2). No language matches it either: Latin/English are dead by d ≈ 8–16, bible-Finnish by 32,
and the genre-matched **Digby recipe compendium shows only +2–5 mb** where B shows +8–12. The
scrambles localize the excess:

| scramble | d≤2 | d=8–128 | d≥256 |
|---|---|---|---|
| within-line shuffle | killed | survives | survives |
| line order within page | survives | **survives** | survives |
| page order | survives | survives | killed |
| global line order | survives | killed | killed |

**The d ≥ 8 excess is page-level suffix-class *composition* plus neighboring-page drift — not
sequence.** Line order within a page carries almost none of it. This is the Part 26 production
drift made visible as the manuscript's entire long-range information budget: the suffix system
behaves like a slowly drifting production parameter. Real languages do the opposite — their
morphology-class usage is stationary (all controls ≈ 0 beyond d ≈ 16) even in record-genre text.
And the word-repetition decay of real B tracks the fitted self-citation generator essentially
point-for-point out to d = 256.

**Interpretation (pre-registered read 2, sharpened).** There is **no sequential discourse-scale
signature** in Currier B. The one thing that looked like long-range structure is registral drift,
which meaningful prose does not produce. Two readings survive, both narrowed: (a) meaning-thin
generation with drifting habits — now positively supported on the axis where language is hardest
to fake; (b) meaningful but **line-record** content (lists/recipes have no cross-line discourse) —
though the genre control shows even recipes retain more stationary morphology than B. Continuous
prose underneath is now strongly disfavored. This is the first *new* evidence on the central
question since the steelman, and it moves S5 up, S1-as-prose down. Artifacts: `longrange_test.py`.

---

## Part 57 — Rugg's table-and-grille, finally tested: excluded on four axes (2026-06-11)

**The hole.** Timm & Schinner's self-citation got the full treatment (Parts 5, 32); the *other*
canonical meaningless-generation proposal — Rugg 2004's Cardan-grille-over-table — was never
tested in this project (or, quantitatively, anywhere). `rugg_test.py` builds a faithful generator:
table columns = the empirically top B prefixes/mids/suffixes, G random 3-hole grilles, grille
switched every few lines; minimal random-search fit (60 configs over R∈{30..64}, G∈{8..24},
empty-cell and switch rates) scored only on surface stats, exactly in the spirit of Rugg's
"a casually built table suffices" claim. Then the same battery the self-citation generator faced.

**Result: fails everything, each failure diagnostic.**
1. **Vocabulary scale:** best fit produces **214 types** where B has 4,584. A bounded table cannot
   bridge a 20× type deficit without becoming a different mechanism (the self-citation model
   solved exactly this with mutation — which is why it, not the grille, fits surface stats).
2. **Surface fit:** h2 = 1.69 vs target 1.98; Zipf −0.66 vs −0.81 — outside every other
   generator's reach of the targets.
3. **Grammar shape:** where computable, the grille *overshoots* (dy/y r = +0.73 on 14 stems vs
   real +0.53 on 86): deterministic column structure gives too-strong, too-narrow "grammar". Real
   B's intermediate, broad-stem r is not grille-like.
4. **The periodicity fingerprint:** long-range class-MI is a comb — **+1,236 mb at d=1, +844 at
   d=4, +35 at d=16, then stone dead from d=32** — versus B's gentle persistent excess out to
   d=512 (Part 56). A periodic mechanical device cannot produce B's drift-shaped information
   profile.

**Caveat.** Only R ≤ 64, G ≤ 24 tested; an enormously larger table bank could lift the type count,
but at the cost of the very simplicity that is the hypothesis's selling point, and the comb/decay
signature would persist. **Verdict: table-and-grille moves from "never tested" to "excluded by
the quantitative battery".** The generator ledger now reads: self-citation (surface ✓, grammar ✗,
leakage ✗, long-range ✗), class-Markov (grammar ✓, long-range ✗), word-bigram (d=1 only), grille
(✗ across the board). Artifacts: `rugg_test.py`.

---

## Part 58 — Space-blind re-segmentation: the grammar is in the glyph stream (2026-06-11)

**The threat.** Parts 37/54/55 converged on "scribal spaces are noisy". But every word-level
result — including the flagship morphosyntax r=0.55 — tokenizes on those spaces. If the grammar
were an artifact of scribal spacing, the project's strongest internal evidence would wobble.

**Method (`resegment_test.py`).** Currier-B lines stripped of spaces (lines never merged — the
line is the production unit, Part 16), re-segmented purely statistically (per-line BPE, 300 and
600 merges), and the Part-10 split-half battery re-run on the re-segmented units, using both the
original suffix pairs and pairs re-derived from the new units' final strings. Pre-registered
validation gate: the same segmenter must recover true boundaries in space-stripped Latin/English
at F1 ≥ 0.35 — it does (Latin 0.43, English 0.61), so conclusions are licensed.

**Results.**
1. **The grammar survives.** BPE-600 units: edy/ey r = +0.427 (42 stems), dy/y +0.445 (60);
   re-derived pairs l/r +0.544 (41), in/r +0.513 (39) — against the original-spacing baseline of
   +0.518/+0.554. Modest attenuation, same structure, on a tokenization that never saw a scribal
   space. **The distributional morphosyntax is a property of the glyph stream itself.** This
   removes the largest internal threat to the Part 10 result.
2. **Scribal spaces are statistically redundant.** The BPE boundary set agrees with scribal
   spaces at F1 = 0.74–0.77 (P 0.62–0.69, R 0.88–0.92) — *higher* than the same segmenter
   achieves against true word boundaries in Latin (0.43) or English (0.61). Voynich spaces sit at
   glyph-transition points a statistical segmenter would have chosen anyway: "eye-grouped cut
   points at predictable transitions", exactly the copying-segmentation picture — and a further
   reason no analysis should ever again treat spaces as load-bearing.

**Interpretation.** Strengthens Part 10 (grammar robust to the boundary problem) and Part 37
(spaces ≈ predictable cut points, not lexical boundaries). Combined with Part 54-T4 (tokens carry
half the units of real agglutinative words), the working picture of the token stream is: a
glyph-sequential system whose grammar lives below the "word", with spaces inserted at
statistically natural break points. Artifacts: `resegment_test.py`.

---

## Gap ledger after this session (what remains genuinely untested)

Ranked by expected information value:
1. ~~Abbreviated-Latin verbose encoding~~ — **done, Part 63**: fails; per-letter verbose tables
   cannot buy the BPE-shrink signature; S2/S3 narrowed to slot-level encodings.
1b. ~~Syllable-value search on re-segmented units~~ — **done, Part 64**: null (z ≤ +0.9 vs
   search-matched scrambled-lexicon controls for Cuman/Latin/Greek); the last internal route is
   closed, and the ~57–74% null hit-rates quantify the decoding illusion behind every published
   "translation".
2. **Constrained syllable-value search** under the *re-segmented* unit inventory (Part 58 makes
   the old word-boundary objection moot): anneal unit→syllable maps against Cuman, Latin,
   Greek lexicons with search-matched scrambled-lexicon nulls.
3. **Third-transcription replication** of Parts 54/56/58 (Takahashi) — v101 replication exists
   for the grammar but the new results have only ZL3b.
4. **Hand-proofing the Cuman corpus** against Kuun's printed pages (OCR error rate unmeasured),
   and adding Mishar Tatar dialect data (Makhmutova's appendix) as a second Kipchak witness.
5. **A true medieval list/inventory corpus** (account books, litanies) for the Part 56 genre
   axis — Digby (1669 recipes) is the closest available mirror-fetchable proxy.

---

*The following four parts (59–62) arrived via a parallel session's PR (originally numbered 34–37 against an older REPORT; renumbered at merge). They are an interpretive layer — hypothesis-level, with their tests shipped as scripts — and should be read against Parts 26 and 56: the e/i operator is established as drifting production behavior, and Currier B carries no cross-line discourse signal, so the Galenic readings below survive specifically in the line-record form (per-record property notation), not as running prose.*

## Part 59 — Galenic Degree Hypothesis: grounding the operator in medieval pharmacology (2026-06-11)

**Frame.** Parts 17–26 established that the e/i operator is a closed {0,1,2,3} set, subject-bound (not layout-driven), and section-varying (stars/recipes 1.05 > biological 0.85 > herbal 0.78). Part 26 revised the reading from "local context governor" to "slowly drifting production parameter." But this opened a question that production drift alone cannot answer: *what does the operator's subject-boundedness mean at the content level?*

The historical context supplies the most specific answer available: **Galenic degree theory**. Galenic medicine (dominant 1300–1600 CE, the exact period of the manuscript's production and use) classified every medicinal substance by its primary quality (hot, cold, dry, moist) and its **degree** on a scale of exactly four steps:

- Degree 1 (operator value 0): mild — the quality is barely perceptible.
- Degree 2 (operator value 1): moderate — clearly felt, safe for most patients.
- Degree 3 (operator value 2): strong — therapeutic but requiring caution.
- Degree 4 (operator value 3): extreme — dangerous, used sparingly as a corrective.

**The structural match.** Four degrees, asymmetrically distributed with degree 1–2 dominant and degree 4 rare — exactly our {0:38%, 1:34%, 2:25%, 3:3%} operator distribution. The 3% frequency of value 3 is precisely correct: extreme-degree substances (colchicum, arsenic compounds) were prescribed rarely and in small amounts in any pharmacopoeia. The subject-boundedness of the operator (it is a property of the ingredient, not of adjacent grammar) is what Galenic degree encoding predicts: the degree is an intrinsic quality of the substance, stable across all recipes it appears in.

**New tests: `humoral_test.py`** (five experiments):
1. *Section thermal profiles* — do sections carry characteristic degree distributions matching the different medical domains they illustrate?
2. *Intra-record degree balance* — do records (compound prescriptions) mix degrees, as Galenic compounding requires?
3. *Prefix × degree co-occurrence* — do domain-classifier prefixes predict operator degree (aquatic = mild/cold; dry herb = warming/moderate)?
4. *Directional bias within records* — is there ascending/descending staging? (Galenic: no; alchemical stage theory: ascending.)
5. *Pharmaceutical vs herbal degree elevation* — do prepared compounds (section P) carry higher degrees than raw plants (section H)?

Reported at the level of design and hypothesis; results reproducible once `bash fetch_data.sh` provides ZL3b-n.txt.

Artifacts: `humoral_test.py`.

---

## Part 60 — The six sections as a complete medical compendium (2026-06-11)

**Iconographic program.** The Voynich's six illustration types are not random — they are the standard six components of a 15th-century medical-alchemical compendium:

| Section | Illustrations | Medical function |
|---|---|---|
| H (Herbal) | Plants: roots, stems, flowers | Materia medica — ingredient property listings |
| B (Biological) | Female figures in pools | Hydrotherapy (balneotherapy) prescriptions |
| A/C (Astro/Cosmo) | Stars, planets, circles | Timing for pharmaceutical preparation |
| Z (Zodiac) | Signs + nymphs in medallions | Zodiac-body-part correspondence tables |
| P (Pharmaceutical) | Jars, containers, organs | Prepared/distilled compounds and their properties |
| T (Text-only) | Dense text, no images | Compiled prescriptions from all other sections |

**The herbal (H) without plant names.** Part 23 found zero page-specific mid-stems — no plant names exist. This is *expected* in Galenic pharmacy: the notation encodes property profiles, not identities, precisely so that any substitute with the same quality-degree profile can be used. The same mid-stem on nine different plant pages means nine plants share the same property — interchangeable ingredients in the formulary's sense.

**The biological (B) as hydrotherapy.** The female figures in baths are not decorative nymphs; they are patients undergoing prescribed thermal treatments. Balneotherapy was a major Galenic therapeutic: bath temperature (tepid/warm/hot), duration, mineral additives, and body-part immersion protocol were all prescribed. The *qok-/qol-* prefix (dominant in section B, G²=1176, Part 22) is the aquatic/body-domain classifier. The intermediate operator mean (0.85) matches the prescriptive convention: bath temperatures in the mild-to-moderate range (degrees 2–3), rarely extreme (degree 4 = scalding, never prescribed).

**The zodiac (Z) as body-part table.** Each zodiac sign governed a specific body part in medieval astrological medicine (Aries=head, Taurus=neck, ..., Pisces=feet). The nymphs positioned around each zodiac medallion represent the body-part patients for that sign's diseases — not 30 days of the month (the day-number hypothesis was falsified, Part 18) but the 30 degrees of the zodiac sign, each linked to a disease susceptibility and the appropriate treatment.

**New test: `iconographic_map.py`** measures whether section vocabulary divergence and prefix-class dominance align with the predicted iconographic program. Five tests: inter-section JSD matrix; prefix dominance by section; Galenic association alignment; biological section operator structure; record opener types by section.

Artifacts: `iconographic_map.py`, `ALCHEMY_BIOLOGY_READING.md`.

---

## Part 61 — Galenic vs Alchemical: the discriminating prediction (2026-06-11)

Two readings of the operator survived Part 26's drift finding:

**Galenic pharmacological** (preferred): The operator is a property of the substance. Records are compound prescriptions that mix degrees to achieve a target temperament. Within-record operators vary freely (no directional staging). The pharmaceutical section (P) should show higher mean operator values than the raw-plant herbal section (H), because preparation concentrates active qualities.

**Alchemical transformation** (alternative): The operator encodes a purification stage (nigredo=0 → rubedo=3). Records describe a refining process. Within-record operators should trend upward (ascending = purifying). The alchemical reading predicts directional bias; the Galenic reading predicts none.

**`humoral_test.py` Test 4** is the discriminating experiment. If ascending bias is significantly above chance across records, the alchemical reading gains evidence. If not, Galenic compounding remains the dominant interpretation.

**Historical note.** These are not mutually exclusive hypotheses in a 15th-century intellectual context. Paracelsian spagyric alchemy *synthesized* Galenic degree theory with alchemical transformation stages. Pseudo-Lullian alchemical texts (popular in exactly the manuscript's 1404–1438 period) used Galenic quality-degrees as the framework for understanding alchemical transmutation. The Voynich may encode this synthesis — in which case both sets of predictions hold in different subsections (transformation records in P; property listings in H).

Artifacts: `humoral_test.py` (Test 4 specifically).

---

## Part 62 — What this means for the archival search (2026-06-11)

**Reframing the key.** ARCHIVAL_SEARCH.md targeted a nomenclator-style key (symbol↔word pairs). The Galenic/alchemical reading refines the form of the key: it is most likely a **physician's or apothecary's property-term glossary** — a list matching the notation's mid-stems and prefix classes to Galenic attributes (hot, cold, dry, moist) and degrees, organized by body system. Such a document would look like a pharmacopoeia appendix or a teaching commentary, not a simple cipher table.

**New archival targets:**

1. **Rauwolf's botanical glossaries (Rank 4 updated):** Leonhard Rauwolf (1535–1596) was a botanist who catalogued Near-Eastern plants in terms of their Galenic property-degree profiles. His *Kräuterbuch* (1581) and associated manuscript notes may contain property-term vocabularies in the same descriptive tradition as the Voynich. The Widemann/Rauwolf book route (Guzy 2022) connects Rauwolf's library directly to the manuscript's provenance chain. A glossary or teaching notes from Rauwolf's own practice could be the right form of key.

2. **Tractatus de Herbis comparanda (new):** The Italian-Lombard Tractatus de Herbis tradition (Sloane 4016, BL; Wellcome MS 5; Morgan MS 873; Bergamo Biblioteca Civica) uses the same visual iconographic program as the Voynich herbal section — similar plant illustrations, similar arrangement. Systematic comparison of this tradition's attribute vocabulary structure against the Voynich's mid-stem distribution is computationally feasible and could yield direct property-term correspondences.

3. **Pseudo-Lullian alchemical compendiums:** The *Codicillus* and *Testamentum* attributed to Llull (widely circulated c.1380–1450) describe a combinatoric notation for expressing alchemical-medical properties. If the Voynich notation was created within this intellectual tradition (construction on page = Lullian combinatorics, operator = Galenic degree, prefix = body-system classifier), the "key" is the Lullian combinatoric table itself — and this table appears explicitly in Pseudo-Lullian manuscripts held at the BNCF (Florence), BNE (Madrid), and the Bodleian.

**Honest bound.** These archival leads are grounded in the investigation's structural findings, not in word-level readings. No Voynich word has been translated. The claim is: the manuscript's measured structure is *consistent with* Galenic/alchemical medical formulary notation, and this narrows the form and location of any surviving key. The falsification battery (PAPER.md) applies: any proposed key must reproduce all measured structural constraints, not just read a selection of words.

Artifacts: `ALCHEMY_BIOLOGY_READING.md`, archival targets integrated into `ARCHIVAL_SEARCH.md`-compatible framing.

---

## Merge-time addendum to Parts 59–62 (2026-06-12)

The PR shipped these tests hypothesis-level ("results reproducible once data is fetched"). At
merge, with the data pipeline now working (Part 53), both scripts were debugged (one
generator-scoping bug in `humoral_test.py` Test 1) and executed:

- **Test 1 (section degree profiles):** sections differ massively in operator profile
  (between-section variance z ≈ 335; herbal mean 0.33 vs stars/recipes 0.65). Real, but note this
  is the same section/drift covariance measured in Parts 26/56 — it supports *section-bound
  register*, not degrees specifically.
- **Test 4 (the discriminating test, alchemical vs Galenic):** within-record staging is mildly
  **descending** in every section (bias −0.11…−0.37), not ascending. **The alchemical
  transformation-stage reading is rejected**; the Galenic "intrinsic property" reading survives
  (it predicted no *ascending* trend; the small descending bias most plausibly reflects the
  line-position/length gradients of Part 1).
- **Test 5:** pharmaceutical > herbal operator elevation confirmed (Δ = +0.29, z = 18.2), though
  stars/recipes elevate even more (+0.33, z = 32).

Read with Part 56: any Galenic-formulary content must live at the line-record level (per-record
property notation), which is exactly the form these tests address. The interpretation remains
hypothesis-tier — structural consistency, no word read.

---

## Part 63 — Abbreviated Latin under verbose encoding: the constructive test fails, and the residue is sharp (2026-06-12)

**The gap-ledger item.** Part 34 argued the low glyph h2 looks like a verbose-encoding artifact;
S2/S3 (structured cipher / abbreviated language) lived on that argument. The constructive test —
actually building the hypothesized object — was never run. `latin_abbrev_test.py` runs it:
Cappelli-style abbreviation of the Latin corpus (Tironian *et*, -que/-us/-um/-ur sigla,
per/pro/prae contraction, nasal suppression, terminal truncation; aggressiveness 0.5 and 1.0)
then verbose encoding into a 25-glyph alphabet (vowels → single glyphs, consonants/sigla → glyph
digraphs; 3 seeded tables; plus a context-dependent variant with two digraphs per consonant
selected by preceding-letter class). Calibration behaves: Voynich B gap = 1.89 and SHRINKS under
BPE merging (1.91 → 1.40); plain Latin gap = 0.71 and grows (0.72 → 1.65).

**Result: all 12 constructed objects fail the pre-registered pass condition.** They close most of
the entropy gap (up to 1.55 of 1.89; h2 down to 2.67 vs target 1.98) — confirming the *direction*
of the Part 34 argument — but (a) none gets within 15% on the h-statistics, (b) encoded words are
too long (8.1–8.7 glyphs vs 5.09), and decisively (c) **the BPE merge-direction signature never
flips: every encoded variant grows its gap under merging, like every natural language; only real
Voynichese shrinks.**

**The sharp residue.** Whatever produces Voynichese's redundancy, it is not a per-letter verbose
table, however abbreviated the plaintext and however context-dependent the table. The shrink
signature says the redundancy lives at the level BPE can merge *into* — slot-structured word
assembly (prefix/mid/suffix dependencies), which is exactly what the fitted generator has and a
letter-mapping cannot create. S2/S3 survive only in forms whose encoding unit is the
morpheme/slot, not the letter: a nomenclator with slot-structured code groups, or a syllabary
whose sign inventory is itself slot-organized. This is the new sharpest constraint on the
encoding family — and it narrows S2/S3 toward exactly the syllabary-shaped, slot-assembled
system Parts 36/38–40 recovered. Artifacts: `latin_abbrev_test.py`.

---

## Part 64 — The syllable-value search: the last internal route, closed honestly (2026-06-12)

**The step the decipherment path stopped before.** Parts 36/58/63 earned the setup: the unit
inventory is syllabary-shaped, boundary-independent, and the encoding unit is the slot/syllable.
So `decipher_syllmap.py` runs the direct experiment: assign syllable values to the top-40 Voynich
BPE units by simulated annealing (15,000 steps), score a mapping by the fraction of 473 candidate
B words (2–4 units, freq ≥ 3) whose decoded concatenation is an attested lexicon type, against
Cuman (the Part 54 period-Kipchak lexicon — the user's clue gets its final test), Latin, and
Greek.

**The null that keeps it honest:** the *identical* search budget run against 8 letter-scrambled
versions of each lexicon (length profile, letter inventory and syllable-frequency shape
preserved; the words are not words). This is the garden-of-forking-paths control most published
"decipherments" skip — and it is decisive here, because the annealer is powerful enough to "decode"
~57–74 % of candidate words into *scrambled pseudo-lexicons* too.

| target lexicon | types | best hit-rate (real) | null (scrambled) | z |
|---|---|---|---|---|
| Cuman | 2,625 | 0.573 | 0.580 ± 0.062 | **−0.1** |
| Latin | 2,012 | 0.605 | 0.574 ± 0.060 | **+0.5** |
| Greek | 14,074 | 0.742 | 0.690 ± 0.060 | **+0.9** |

**Verdict (pre-registered): |z| < 3 everywhere — no syllable-value assignment beats chance.**
Real lexicons offer no more decoding traction than scrambled ones. Two readings, both useful:
(1) if a language is underneath, its lexicon is not among these three at this granularity, or the
mapping is not unit→syllable injective (e.g., nomenclator-style many-to-one, or the units carry
values only jointly with position — consistent with the slot-grammar of P63); (2) the result
*quantifies* why amateur "translations" feel so convincing: with a big enough dictionary and a
free mapping, more than half of Voynichese decodes into *anything*, including non-words. The
57–74 % null hit-rates are the measured size of the illusion every failed decipherment fell into.

**The honest map is now complete in both directions.** Internal routes: grid, phonotactics,
cribs, function words, numerals, reading geometry, agglutination, harmony, syllable values — all
tried, all published, all closed. The key, if it exists, is archival. ARCHIVAL_SEARCH.md remains
the live path. Artifacts: `decipher_syllmap.py`.