pith. machine review for the scientific record.

arxiv: 2605.01560 · v1 · submitted 2026-05-02 · 💻 cs.PL · cs.HC

Recognition: unknown

FlowBook: Enforcing Reproducibility in Computational Notebooks

Cormac Flanagan, Emery D. Berger, Eunice Jun, Stephen N. Freund


Pith reviewed 2026-05-09 17:29 UTC · model grok-4.3

classification 💻 cs.PL cs.HC
keywords reproducibility, flowbook, notebooks, cell, cells, computational, dependency, notebook

The pith

FlowBook enforces notebook reproducibility by checking whether top-to-bottom execution from an empty state matches recorded outputs, using dynamic read/write tracking with near-zero overhead.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Computational notebooks allow cells to be executed in any order, which is useful for exploration but creates hidden state and implicit dependencies. This often means that running the same notebook interactively produces different results than a clean top-to-bottom execution. The paper defines a notebook as reproducible exactly when executing every cell from top to bottom starting from an empty store produces the same outputs currently shown. FlowBook implements this check through a dynamic analysis that records what each cell reads and writes at its boundaries. It uses these sets to identify stale cells whose outputs may no longer be valid and to prevent user actions that would break the reproducibility guarantee. The system reports a median latency overhead of only 70 milliseconds. This approach deliberately avoids building precise dependency graphs or forcing a reactive dataflow model, trading some precision for simplicity and performance.
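The boundary-level read/write tracking described above can be illustrated with a small Python model. This is a minimal sketch under our own assumptions, not FlowBook's implementation: `TrackingNamespace`, `Notebook`, `run`, and `stale` are hypothetical names, cells are plain top-level statements executed with `exec`, and staleness is approximated with per-name version counters (a cell counts as stale once any name it read has been rewritten since it last ran). Functions defined inside a cell look names up in the separate globals dictionary, so this toy model only tracks top-level code.

```python
import builtins


class TrackingNamespace(dict):
    """Kernel namespace that logs the names a cell reads and writes.

    Passed as the *locals* mapping to exec(); for dict subclasses, CPython
    routes top-level name loads/stores through __getitem__/__setitem__.
    """

    def __init__(self):
        super().__init__()
        self.reads, self.writes = set(), set()

    def __getitem__(self, name):
        value = super().__getitem__(name)  # KeyError -> exec falls back to builtins
        if name not in self.writes:        # only count values that existed before the cell
            self.reads.add(name)
        return value

    def __setitem__(self, name, value):
        self.writes.add(name)
        super().__setitem__(name, value)

    def __delitem__(self, name):
        self.writes.add(name)
        super().__delitem__(name)


class Notebook:
    def __init__(self, sources):
        self.sources = list(sources)
        self.ns = TrackingNamespace()
        self.version = {}   # name -> number of writes so far
        self.observed = {}  # cell index -> {name: version seen when the cell last ran}

    def run(self, i):
        """Execute cell i against the shared kernel state, tracking its boundary."""
        self.ns.reads, self.ns.writes = set(), set()
        exec(self.sources[i], {"__builtins__": builtins}, self.ns)
        for name in self.ns.writes:
            self.version[name] = self.version.get(name, 0) + 1
        # Record versions *after* this cell's own writes, so `x += 1` is not
        # reported stale merely for updating its own input.
        self.observed[i] = {n: self.version.get(n, 0) for n in self.ns.reads}

    def stale(self):
        """Cells whose read inputs were overwritten by some later execution."""
        return [i for i, seen in sorted(self.observed.items())
                if any(self.version.get(n, 0) != v for n, v in seen.items())]


nb = Notebook(["x = 1", "y = x + 1", "x = 10"])
nb.run(0)
nb.run(1)
print(nb.stale())  # []
nb.run(2)          # overwrite x after cell 1 consumed it
print(nb.stale())  # [1]
```

Bumping versions before recording a cell's observations is a deliberate choice in this sketch: a cell such as `x += 1` reads and writes the same name and should not be flagged by its own update.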

Core claim

A notebook is reproducible if and only if executing its cells in top-to-bottom order from an empty store produces exactly the outputs currently recorded. We formalize this notion of reproducibility and present FlowBook, which implements a dynamic analysis that enforces reproducibility by tracking read and write sets at cell boundaries.
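The definition itself translates almost directly into a brute-force check. The sketch below assumes that a cell's "output" is just the text it prints to stdout (real notebooks also record expression values and rich displays), and `run_top_to_bottom` and `is_reproducible` are hypothetical names. Re-running everything is only the reference semantics of the definition; FlowBook is described as enforcing the property through read/write tracking at cell boundaries instead.

```python
import contextlib
import io


def run_top_to_bottom(sources):
    """Run every cell in order, starting from an empty store; return each cell's stdout."""
    store = {}  # empty store: no state carried over from interactive execution
    outputs = []
    for src in sources:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(src, store)
        outputs.append(buf.getvalue())
    return outputs


def is_reproducible(sources, recorded):
    """Reproducible iff a clean top-to-bottom run yields exactly the recorded outputs."""
    try:
        return run_top_to_bottom(sources) == list(recorded)
    except Exception:  # e.g. NameError from a cell that depends on a later cell
        return False


cells = ["x = 1", "print(x)", "x = 2"]
# Interactive order 0, 2, 1 left cell 1 showing "2"; a clean run prints "1".
print(is_reproducible(cells, ["", "2\n", ""]))  # False
print(is_reproducible(cells, ["", "1\n", ""]))  # True
```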

Load-bearing premise

That tracking read and write sets only at cell boundaries is sufficient to accurately detect all stale cells and prevent all reproducibility violations without missing hidden dependencies or introducing unacceptable false positives.
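The premise matters because name-level tracking can miss effects that flow through shared objects. The hypothetical tracker below (`NameTracker` and `run_cell` are invented names; this is not FlowBook's implementation) records the top-level names each cell loads and stores. When one cell aliases a list and another mutates it through the alias, the mutating cell's write set is empty even though the value bound to `a` has changed. Whether FlowBook's boundary-level analysis accounts for such cases is exactly what this premise asks a reader to check; the sketch says nothing about how the paper handles it.

```python
import builtins


class NameTracker(dict):
    """Hypothetical name-level tracker: logs names loaded/stored at top level."""

    def __init__(self):
        super().__init__()
        self.reads, self.writes = set(), set()

    def __getitem__(self, name):
        value = super().__getitem__(name)
        if name not in self.writes:  # a read of pre-existing state
            self.reads.add(name)
        return value

    def __setitem__(self, name, value):
        self.writes.add(name)
        super().__setitem__(name, value)


def run_cell(ns, src):
    """Execute one cell and return its (read set, write set)."""
    ns.reads, ns.writes = set(), set()
    exec(src, {"__builtins__": builtins}, ns)
    return set(ns.reads), set(ns.writes)


ns = NameTracker()
run_cell(ns, "a = [1]")
run_cell(ns, "b = a")                        # b aliases the list bound to a
reads, writes = run_cell(ns, "b.append(2)")  # mutates that shared list in place
print(reads, writes)                         # {'b'} set()
print(dict.__getitem__(ns, "a"))             # [1, 2]: a changed, yet no cell "wrote" a
```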

Original abstract

Computational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from clean top-to-bottom runs. Prior approaches either employ dependency analyses or enforce reactive dataflow models that face fundamental tradeoffs among expressiveness, precision, and performance. This paper exploits the insight that reproducibility can be enforced without precise dependency tracking: a notebook is reproducible if and only if executing its cells in top-to-bottom order from an empty store produces exactly the outputs currently recorded. We formalize this notion of reproducibility and present FlowBook, which implements a dynamic analysis that enforces reproducibility by tracking read and write sets at cell boundaries. FlowBook detects stale cells whose recorded outputs may no longer reflect the current notebook state and prevents operations that would violate reproducibility. FlowBook incurs near-imperceptible latency overhead (median: 70 ms).
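The abstract says FlowBook prevents operations that would violate reproducibility without stating the specific rules. As one illustrative rule (our assumption, not necessarily FlowBook's): in a top-to-bottom run a cell can only observe values written by cells above it, so a guard can reject an execution whose read set includes a name last written by the same or a later cell. `OrderGuard`, `allows`, and `commit` are hypothetical names, and because read sets come from dynamic tracking, a real tool would have to check before committing a cell's effects. The condition is necessary but not sufficient: running cell 0 and then cell 2 while skipping an intermediate cell 1 that also writes `x` would pass it.

```python
from dataclasses import dataclass, field


@dataclass
class OrderGuard:
    """Hypothetical guard built on dynamically observed read/write sets."""

    last_writer: dict = field(default_factory=dict)  # name -> index of the cell that last wrote it

    def allows(self, cell, reads):
        # In a top-to-bottom run, `cell` can only see values produced by cells
        # strictly above it, so every name it reads must come from such a cell.
        return all(n in self.last_writer and self.last_writer[n] < cell for n in reads)

    def commit(self, cell, writes):
        """Record that `cell` executed and wrote the given names."""
        for n in writes:
            self.last_writer[n] = cell


g = OrderGuard()
g.commit(0, {"x"})         # cell 0: x = 1
print(g.allows(1, {"x"}))  # True: x comes from a cell above
g.commit(2, {"x"})         # cell 2: x = 2
print(g.allows(1, {"x"}))  # False: re-running cell 1 would read cell 2's x
```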

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on a newly introduced domain assumption about what constitutes reproducibility and on the effectiveness of boundary-level read/write tracking.

axioms (1)
  • domain assumption: A notebook is reproducible if and only if top-to-bottom execution from an empty store matches the recorded outputs.
    This equivalence is the central formalization presented in the abstract.
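One way to write this axiom formally, with notation of our own choosing (the paper's formalization may differ): model each cell $c_i$ as a function $\mathrm{run}$ that takes a store and returns an updated store together with the output it displays, and compare the outputs of a clean run against the recorded ones.

```latex
N = (c_1, \dots, c_n) \text{ with recorded outputs } (o_1, \dots, o_n), \qquad
\sigma_0 = \varnothing, \qquad
(\sigma_i, o'_i) = \mathrm{run}(c_i, \sigma_{i-1}) \quad (1 \le i \le n)

\mathrm{Reproducible}(N) \iff \forall i \in \{1, \dots, n\}.\; o'_i = o_i
```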

pith-pipeline@v0.9.0 · 5454 in / 1251 out tokens · 55261 ms · 2026-05-09T17:29:34.524436+00:00 · methodology
