Seven arrows, 429 trees: a Mermaid cleanup uncovered a 7-year naming bug

Composed-state naming in a Turing-machine engine was non-injective since 2019: one name, up to C_7 = 429 distinct machines. v7 fixes it with parens.

Two Turing machines walked into a diagram. They had the same name.

What I really mean is: two distinct wrapper states in @turing-machine-js/machine, constructed differently, with different runtime behavior, and with byte-identical state.name. I’ve been developing the library as a hobby since 2019, and I’d missed this for seven years.

I caught it sideways, starting from what looked like a Mermaid cleanup ticket. It ended in a redesign of how state composition is encoded in the name. And along the way I learned that tucked inside that redesign is the number $C_{7} = 429$ .

Archer with bow drawn, contemplating which tree of a forest his arrow will hit

Where it started

Composition in the engine is one primitive: bareState.withOverrodeHaltState(overrideState). It returns a copy of bareState whose would-be halt transitions fall through to overrideState. From here on I’ll call bareState the bare state and the returned copy a wrapper over it. Composition via wrapping is the only way to build a bigger machine out of smaller halt-on-completion subroutines.

The wrapper’s name, in the old API, was formed like this:

state.#name = `${this.name}>${override.name}`;

So A.withOverrodeHaltState(B) is named A>B. Simple, readable: “A, then B.”

That line shipped in 2019 and hadn’t been touched since. It’s the name logs use, the name error messages use, the name tests compare against, and the name any user-facing diagnostic surfaces.

The investigation began with the Mermaid output. turing-machine-js#138 asked for a cleaner emit. Before v7, toMermaid rendered a wrapper as a separate node whose transitions were duplicated from the bare — visually it looked like the wrapper had a life of its own. But the wrapper’s transitions are the bare’s — even #symbolToDataMap is the same reference, not a copy. What it does own is call-frame semantics: push the override onto the halt-stack on entry, pop on the bare’s halt. Behaviorally, the wrapper adds one thing on top of the bare: where halt goes.

Roughly, this is what it looked like:

A and A>B are two nodes with literally the same outgoing transition to A1, plus the wrapper gets a dashed onHalt arrow to B. The reader sees “two distinct machines,” when behaviorally it’s one machine with a single point of divergence — where halt goes.

The first fix attempt looked reasonable: render the wrapper as a single node that reuses the bare’s transitions, with the override-halt as a separate arrow:

One A>B node, transitions reused from the bare’s graph, override-halt as a separate dashed arrow from halt to B. Clean, compact, no visual noise. Almost worked.

Almost worked — then broke

Take minusOne from library-binary-numbers — it’s built by nested wrapping:

const minusOne = invertNumber
  .withOverrodeHaltState(
    plusOne
      .withOverrodeHaltState(
        invertNumber
          .withOverrodeHaltState(normalizeNumber),
      ),
  );

Composite name: invertNumber>plusOne>invertNumber>normalizeNumber — three > separators in real code, and already one base (invertNumber) appears twice. When a name gets longer, it’s tempting to read it as a walking instruction. Take A>B>B>A>A>B>B>A (seven separators): each letter is a bare state you land in after the previous one halts. The sequence of bases visited should equal the name.

Except that’s wishful reading. The name doesn’t formally determine the traversal. It resembles it. And only if the chain was constructed the one specific way I happened to picture when I wrote the naming rule in 2019.

The catch

Take a smaller example: A>B>A. How many ways to build it?

At least two:

// Construction 1
const c1 = A.withOverrodeHaltState(B.withOverrodeHaltState(A));

// Construction 2
const c2 = A.withOverrodeHaltState(B).withOverrodeHaltState(A);

Both flatten to the string A>B>A under ${this.name}>${override.name}. But the wrapper topologies differ — each construction builds its own binary tree (internal • nodes are withOverrodeHaltState applications, leaves are plain State instances):

And the runtime behavior differs too:

Construction 1 (A.wohs(B.wohs(A))): A’s halt goes to a wrapper “B with halt redirected to A”. Runtime walk: A → B → A.
Construction 2 (A.wohs(B).wohs(A)): the wrapper “A with halt redirected to B” is wrapped once more — halt now goes to A. Runtime walk: A → A. The inner B override is dead: only the outermost .wohs()’s override makes it onto the halt stack.

Same name. Just because the flat > separator doesn’t remember where in the construction tree each arrow sat. And there was a tree there.

The tree we forgot

withOverrodeHaltState is a binary operation: a left argument (this) and a right argument (override). Each application is a new node in the tree. The node has two children: the bare and the override. If a child is itself a wrapper, the tree continues there with another node; otherwise it’s a plain State — a leaf. After N applications: N wrapper-nodes and N+1 leaves — standard binary-tree property. For minusOne above: 3 applications (3 wrapper-nodes), 4 leaves (invertNumber, plusOne, invertNumber, normalizeNumber). Here’s the tree explicitly:

The name, meanwhile, is flat: a concatenation joined by >. A binary tree gets serialized into a string, and the string can be reversed back into the tree only if the naming convention is bijective. The flat > notation isn’t.

How many trees can correspond to one name with N separators? The standard answer from combinatorics: the Catalan number $C_{N}$ .

$C_{n} = \frac{1}{n + 1} (\binom{2 n}{n})$ — the sequence that counts: the number of binary trees with n+1 leaves, the number of balanced-paren strings of length 2n, the number of Dyck paths of length 2n, the number of triangulations of an (n+2)-gon, and a dozen other isomorphic combinatorial objects. The kind of number that turns up everywhere.

The most direct angle for our case: parenthesization. For an arbitrary binary operator •, the number of ways to parenthesize a • b • c • ... (with n+1 operands) is $C_{n}$ . withOverrodeHaltState is exactly such an operator — each parenthesization corresponds to a distinct composition tree.

The first eight values: $C_{0} = 1$ , $C_{1} = 1$ , $C_{2} = 2$ , $C_{3} = 5$ , $C_{4} = 14$ , $C_{5} = 42$ , $C_{6} = 132$ , $𝑪_{7} = 429$ .

For our A>B>B>A>A>B>B>A, that means 429 different construction trees, each with its own parent-child wiring, all collapsing to the same string. And 429 isn’t a theoretical upper bound — it’s the exact number of distinct structures hiding behind that one name.

A smaller example — A>B>C>D (three separators). By the formula, $C_{3} = 5$ ; here are the five construction trees, written as code:

// 1. fully left-associative
((A.wohs(B)).wohs(C)).wohs(D)

// 2.
(A.wohs(B.wohs(C))).wohs(D)

// 3.
(A.wohs(B)).wohs(C.wohs(D))

// 4.
A.wohs((B.wohs(C)).wohs(D))

// 5. fully right-associative
A.wohs(B.wohs(C.wohs(D)))

All five collapse to the same flat string A>B>C>D. Under the paren notation v7 will ship, each gets a distinct name — more on that below.

The fix is parens

If a flat separator throws the tree away, parens keep it. v7 (turing-machine-js#148) changes the naming rule to:

state.#name = `${this.name}(${override.name})`;

A.withOverrodeHaltState(B) is now named A(B). Back to our two A>B>A constructions:

// Construction 1: A(B(A))
const c1 = A.withOverrodeHaltState(B.withOverrodeHaltState(A));

// Construction 2: A(B)(A) — in pure paren notation
const c2 = A.withOverrodeHaltState(B).withOverrodeHaltState(A);

Same characters, different brackets, different trees. Names distinguish them. Bijection!

And this isn’t an invented convention. Balanced-paren strings of length 2n are the canonical encoding of binary trees with n+1 leaves in combinatorics — the same bijection underlying one of the foundational counting results for Catalan numbers. A better convention wasn’t designed — what gets recovered is the structure the > notation was always discarding.

The cost: state names from users can no longer contain ( or ) — the State constructor throws. Parens are reserved for composition. In exchange, > is no longer reserved and is fine in user-provided names again.

The other half of the story

The clean paren bijection above is about naming. v7 also cleans up the runtime alongside. turing-machine-js#176 adds chain-collapse: inside withOverrodeHaltState, we now do bare = this.#bareState ?? this. So A.wohs(B).wohs(A) is recognized as “wrapping an already-wrapped A” — the inner override (B) is dead at runtime anyway (only the outermost wrapper’s override gets pushed onto the halt-stack), and instead of A(B)(A) you get A(A).

So in actual v7, Construction 2 doesn’t give you A(B)(A) — at construction time it collapses to A(A). Paren naming on its own is bijective (one tree, one name); chain-collapse adds a step on top, removing trees that are indistinguishable at runtime. The composite name in v7 describes how the machine behaves, not how it was built.

And while we were touching the operator: the method gets renamed. withOverrodeHaltState → withOverriddenHaltState (turing-machine-js#149). Past-tense overrode never fit the idiom — it should have been the past participle overridden. A seven-year grammar typo on a public API. Hard cutover, no alias.

Other places this shows up

Non-injective serialization is a general pattern. If your codebase has a “name” formed by flat-separator concatenation of something tree-shaped, the odds of a C_n bug aren’t zero. A few examples:

File-extension chains — .tar.gz, .spec.ts, .d.ts. Works by convention because the extensions are well-known. Put a dot in the filename and the ambiguity is immediate: is my.app.tar.gz the basename my.app with extension .tar.gz, or the basename my.app.tar with extension .gz? Different tools answer differently.
DB column aliases with _ separators: user_address_country. If user_address is already a table name (underscore is part of the identifier), then user_address_country is ambiguous: is it user.address_country or user_address.country? Fix via qualified namespacing or explicit aliases.
Module paths in Java/Python — com.example.foo.Bar. Could be class Bar in package com.example.foo or nested class foo.Bar in package com.example. Resolved by convention (uppercase = class), not by the naming itself.

Code-review heuristic: if your “identifier” is a flat-separator concatenation of something tree-shaped, look for the C_n bug. It’s almost always latent: in production the names happen not to collide on the actual data, but architecturally the hole is always there.

In my library the C_n bug hid for seven years because no one was trying to catch the engine on a name collision. To find it, you’d have to know to construct two structurally distinct trees and check whether their names differ. Nobody writes that test until they hear the collision is possible.

What v7 ships

Three tickets that were planned as three separate changes but, on closer look, are the same thought:

Paren-based naming (turing-machine-js#148). The bijection between composition trees and their names.
Cleaner Mermaid emit (turing-machine-js#138, turing-machine-js#174). Wrappers emit as separate [[composite-name]] nodes; the bare’s reachable subtree becomes a subgraph callable subtree of NAME block. The structural clarity of the name shows up as visual clarity in the diagram:

Wrapper — call site; bare’s body — callable subtree; halt — return. Composition in the engine literally became a function call.
withOverrodeHaltState → withOverriddenHaltState (turing-machine-js#149). Grammar on a public API.

All three are one idea: composition is tree-shaped, and its representations should keep the tree, not smear it.

Closing

A visualization cleanup turned into a naming overhaul. That’s the same pattern I described in “Two majors, one README, one demo”: a design review tests can’t run catches what tests can’t see. There it was docs and the first consumer. Here it’s visualization. The genre is the same: put yourself in front of the task of showing the structure, and if the structure is bent, it starts protruding. Tests don’t do that — tests check behavior, and structural bentness is usually visible only in the representation.

Seven arrows, 429 trees. A name that looked like a traversal trace turned out to be one of 429 possible ones. v7 gives each its own name — and lets the visualization show how that tree actually grew. That the wrapper IS a call-frame and deserves its own type — that’s another story (turing-machine-js#213).