Two majors, one README, one demo: two cheap design reviews

Of the three majors covered by the previous article, two never surfaced in the test suite. They surfaced in the two design reviews tests can’t run.

The previous article ended on this teaser: “Writing the docs is what surfaced both mistakes. There’s a meta-lesson in there about how docs are the cheapest design review you can run, but that’s another post.”

This is “that other post” — and the first thing it has to do is correct the teaser. The teaser oversells. Docs caught one of the two mistakes. The other was caught by the first real consumer of the API, which I was building in parallel. The two reviews worked in tandem: docs review the shape of an API, the consumer reviews the use of it. Together they catch what tests can’t see.

If you ship anything behind an interface (a library, a CLI, any component behind a contract), these are the two reviews you don’t want to skip.

Screenshot of demo.machines.mellonis.ru running a Brainfuck UTM: multi-tape view on the left, engine source on the right

Recap

“Three majors, two mistakes” covers the engine and the v4 pause API — onStep, onDebugBreak, per-state debug flags — in detail. I’ll lean on it here without rehashing it. The v4 → v6 trajectory shipped three breaking majors: a hook rename, a halt-semantics hardening, and a dispatch-tick collapse. The first two surfaced in the demo. The third surfaced in the docs.

The demo case: v4 → v5

While shipping v4 I started building machines-demo — an interactive Turing-machine debugger and the engine's first non-test client.

The demo is a natural first consumer because it serves two purposes. The product goal is public distribution and showing the engine in action. The technical goal is road-testing changes and validating concepts against a live API surface. Both goals make building the demo an integral part of the release cycle, not an optional add-on.

The demo used both hooks at once: onStep populated a per-iteration command buffer for the trace UI; onDebugBreak drove the pause/resume cycle.
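As a rough illustration of that wiring, here is a minimal sketch of a consumer that registers both hooks. The `Hooks`, `StepInfo`, and `DemoController` names and the callback signatures are assumptions made for this example, not the actual turing-machine-js or machines-demo API.

```typescript
// Hypothetical shapes for illustration; the real turing-machine-js types may differ.
type Phase = "before" | "after";

interface StepInfo {
  iteration: number;
  command: string;
}

interface Hooks {
  onStep: (info: StepInfo) => void;
  onDebugBreak: (phase: Phase, info: StepInfo) => void; // v4 name
}

// Demo-side controller: a trace buffer for the UI plus a pause flag.
class DemoController {
  trace: StepInfo[] = [];
  paused = false;
  lastBreak: { phase: Phase; info: StepInfo } | null = null;

  // The object a consumer would hand to the engine when starting a run.
  hooks(): Hooks {
    return {
      onStep: (info) => {
        this.trace.push(info); // feeds the trace UI
      },
      onDebugBreak: (phase, info) => {
        this.lastBreak = { phase, info };
        this.paused = true; // the UI stays paused until the user resumes
      },
    };
  }

  resume(): void {
    this.paused = false;
  }
}
```

Both callbacks receive step data in this sketch, which is where the overlap between the two hooks starts to show.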

The demo built. The tests passed. But it was uncomfortable to write, for two reasons.

First, onDebugBreak’s after-fire arrived with data from the previous yield, the same data the previous onStep had already shown. The demo processed the same thing twice, and in a way it was the code itself, not me, asking “why two hooks for one event?” I filed turing-machine-js#109 as an RFC about the relationship between these hooks, listing four sketches; the resolution narrowed to naming. onDebugBreak framed the hook’s purpose as “debugging”, while the consumer’s verb was “pause”. A shameless rename to onPause, no alias. v5.

Second, the demo’s UI gained a “pause before halt” scenario — letting the user glimpse the machine’s final state before it shuts down. The natural implementation: a debug flag on haltState itself. The first test case set haltState.debug = { before: true, after: true } because the symmetry looked right. Only before fired. Worse: the after-fire on the iteration that led to halt never reached the consumer — the loop exited as soon as state.isHalt became true. turing-machine-js#108 split it in two: restore the lost after-fire (bug); throw on assignment to haltState.debug.after (API).
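The API half of that fix can be sketched as a guarded setter. This is a simplified model under assumptions: the `State` class, the `DebugFlags` type, and the error message are hypothetical and only mirror the behavior described above, not the engine’s real implementation.

```typescript
// Hypothetical model of per-state debug flags; not the library's actual classes.
interface DebugFlags {
  before?: boolean;
  after?: boolean;
}

class State {
  readonly name: string;
  readonly isHalt: boolean;
  private flags: Readonly<DebugFlags> = Object.freeze({});

  constructor(name: string, isHalt: boolean = false) {
    this.name = name;
    this.isHalt = isHalt;
  }

  get debug(): Readonly<DebugFlags> {
    return this.flags;
  }

  set debug(next: Readonly<DebugFlags>) {
    // Only `before` can fire on a halt state, so reject `after` loudly
    // instead of silently accepting a flag that never fires.
    if (this.isHalt && next.after) {
      throw new Error(`"${this.name}" is a halt state: debug.after is not supported`);
    }
    // Store a frozen copy so the flags cannot be mutated around the check.
    this.flags = Object.freeze({ ...next });
  }
}
```

Throwing at assignment time moves the surprise from “my after-pause never fired” to the line that asked for it.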

Both complaints came from the consumer side. What the demo surfaced was less a code bug than a mismatch between the API and how it was used. The names didn’t fit the use. Permissive input didn’t match what the user was trying to do. The tests verified behavior against the engine’s own internal model and came back green. The demo verified behavior against the consumer’s mental model and produced two specific complaints.

The docs case: v5 → v6

v5 shipped. The README needed updating. The new dispatch-order section had to explain — in words — when each of onPause(before), onStep, and onPause(after) fired relative to the iteration they described.

The first honest paragraph went something like:

onPause(after, K) fires on iteration K+1’s yield, with the payload substituted from iteration K’s snapshot, before onPause(before, K+1) or onStep(K+1) fire.

I stared at that sentence for a while. There was no shorter version. The reader wasn’t supposed to need a sentence about substitution.

The code worked. The tests passed. The demo consumed the hooks correctly. The mistake wasn’t in any of those — it was in the shape of the dispatch, and that shape was only visible when you had to put it into words.

The fix collapsed the lifecycle: before(K) → step(K) → after(K) on the same yield. No substitution. No cross-iteration scheduling. No final drain for the halting iteration (in “Three majors, two mistakes” I called it “post-loop drain”; I’d probably shorten it to “final drain” now). The README paragraph now reads:

On iteration K’s yield, hooks fire in lifecycle order.

And that’s the kind of sentence the reader glides past without effort. turing-machine-js#119 shipped as v6.
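To make the difference concrete, here is a toy model of the two dispatch orders. It is a sketch under assumptions: `dispatchV5` and `dispatchV6` are hypothetical helpers, every iteration is assumed to have both pause flags set, and each inner array stands for the hooks fired on one yield.

```typescript
// Toy model only: each inner array lists the hooks fired on one yield.

// v5-style: after(K) is deferred to iteration K+1's yield, and the last
// iteration's after-fire needs a separate drain once the loop has ended.
function dispatchV5(iterations: number): string[][] {
  const ticks: string[][] = [];
  for (let k = 0; k < iterations; k++) {
    const tick: string[] = [];
    if (k > 0) tick.push(`after(${k - 1})`); // payload comes from the previous snapshot
    tick.push(`before(${k})`, `step(${k})`);
    ticks.push(tick);
  }
  if (iterations > 0) ticks.push([`after(${iterations - 1})`]); // final drain
  return ticks;
}

// v6-style: all three hooks for iteration K fire on iteration K's yield.
function dispatchV6(iterations: number): string[][] {
  const ticks: string[][] = [];
  for (let k = 0; k < iterations; k++) {
    ticks.push([`before(${k})`, `step(${k})`, `after(${k})`]);
  }
  return ticks;
}

// dispatchV5(2) → [["before(0)","step(0)"], ["after(0)","before(1)","step(1)"], ["after(1)"]]
// dispatchV6(2) → [["before(0)","step(0)","after(0)"], ["before(1)","step(1)","after(1)"]]
```

In this model the flattened order of events is the same in both versions. What changes is which yield each after-fire belongs to, and that grouping is exactly what the v5 paragraph had to spend words on.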

A direct quote from the previous article:

The code worked, the tests passed, and the docs were correct. The shape was just wrong.

What I’d add: the docs were correct only on the condition that the reader accepted three explanatory sentences they shouldn’t have had to read. That’s not “correct docs” — that’s docs apologizing for the shape.

The docs review caught the shape, not the use. The demo worked. The tests passed. This time, only prose pressure uncovered the problem hiding in the design.

Three reviews, three layers

Here’s what each one checks:

Tests verify code against itself. Internal consistency. The engine yields what the engine should yield. Green tests are the baseline.
The first real consumer verifies code against a consumer’s mental model. Does the API fit what the user is trying to do? Reviewing real interaction with the API forced two API changes: a rename and a halt-after restriction.
The docs as prose (not JSDoc — a connected README narrative) verify code against the author’s own explanation. Does the design have an honest one-paragraph description? Without one, we had to look hard at the substitution dance — and abandon it.

Each review catches what the layer below misses. Tests don’t catch shape; the consumer doesn’t catch interaction problems it can skillfully bypass; the docs don’t catch UX wrinkles they don’t have to mention.

The cost is asymmetric. Building a real consumer is the most expensive of the three reviews; writing the docs is the cheapest. Still, even the most expensive costs less than reworking the API after the problems surface for users, hence “cheap” in the title. The real consumer has to be built before the major is cut, because it’s the only review that tests whether the API is comfortable to use. The docs then catch what’s left.

Heuristics for the next major

Build the first real consumer before cutting the major. Not a test fixture — a consumer with its own mental model. The difficulties it encounters are the same difficulties your users will encounter later.
Write the dispatch-order paragraph before locking the dispatch order. If you can’t describe in one sentence what fires when, the dispatch is wrong. Decisions of this kind belong at the design stage, before a single line of code has been written, not at the documentation stage.
Read the docs back as a stranger. If a paragraph reads as an apology for the shape, the docs are working — they’ve revealed a shape mistake. Fix the shape, not the paragraph.

Three majors. One README. One demo. The tests had nothing to say.

Code: turing-machine-js (engine) and machines-demo (the first non-test consumer where v4 → v5 surfaced).