arxiv: 2604.11021 · v2 · submitted 2026-04-13 · 💻 cs.PL

Recognition: unknown

Emulation-Completeness of Programming Languages

Gregory Morse , Tam\'as Kozsik

Authors on Pith no claims yet

Pith reviewed 2026-05-10 16:15 UTC · model grok-4.3

classification 💻 cs.PL

keywords emulation-completenessself-emulationTuring-completenessruntime behaviorErlangreflective executionsandboxingprogramming language semantics

0 comments

The pith

A language can emulate its own programs only by reproducing every guest-visible state, not just final results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Turing-completeness falls short for self-emulation because programs depend on observable details such as control flow, exceptions, callbacks, timing, memory use, and runtime metadata like stack traces. It introduces a taxonomy that separates source-level evaluation from compiled-code emulation and weak from strong completeness based on how much runtime behavior must stay visible. Requirements are split into language-side needs, which require the semantics to be representable inside the language itself, and emulator-side needs, which require the emulator to mask or reproduce observations faithfully. Concrete mismatches in Erlang illustrate the gaps. The framework supplies a shared vocabulary for language designers working on sandboxes, decompilers, and reflective systems.

Core claim

Emulation-completeness requires a self-emulator to compute the guest program's result while also accounting for all guest-visible state, including control flow, exceptions, callbacks, timing, memory usage, and runtime metadata such as stack traces or line numbers. The paper defines syntactic and compiled-code variants of this property, distinguishes weak from strong versions according to preserved observable behavior, and organizes the necessary conditions into language-side representability and emulator-side faithfulness.

What carries the argument

Emulation-completeness, the property that a language hosts a self-emulator reproducing both computation results and all guest-visible runtime observations without delegating to the host evaluator.

If this is right

Language designers must expose enough internal semantics so that exceptions and callbacks can be represented explicitly inside the language.
Emulators must be able to mask or reproduce every observation that the guest program could inspect at runtime.
Gaps appear in practice when languages impose limits on arguments, bitstring matching, or message queues that the emulator cannot mirror.
The taxonomy guides construction of secure sandboxes and decompilers that aim to hide their own presence from the guest.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the classification holds, reflective execution environments could be built without hidden host dependencies.
The same split between language-side and emulator-side requirements may apply to verifying isolation properties in virtual machines.
Testing the framework on additional languages could expose hidden assumptions in how their runtimes expose metadata.
A concrete next step is to attempt self-emulator construction for a small language and measure fidelity on timing and exception observations.

Load-bearing premise

Every relevant guest-visible state and behavior can be classified cleanly into language-side representability or emulator-side faithfulness without leaving out real-world execution details.

What would settle it

A program in Erlang that produces observably different stack traces, timing, or message reception when run directly versus when executed by a self-emulator written in Erlang.

read the original abstract

We study when a programming language can emulate programs written in that same language without delegating the guest program back to the host evaluator or compiler. We call this property emulation-completeness. The central observation is that Turing-completeness by itself is not enough: a self-emulator must not only compute the guest program's result, but must also account for the guest-visible state on which realistic programs depend, including control flow, exceptions, callbacks, timing, memory usage, and runtime metadata such as stack traces or line numbers. This paper is a systematization paper. Its contribution is not a new emulator implementation, but a precise vocabulary and a structured taxonomy for reasoning about self-emulation. We distinguish source-level evaluation from compiled-code emulation, define syntactic and compiled-code emulation-completeness, and separate weak from strong emulation-completeness according to how much observable runtime behavior must be preserved. We then organize the requirements into two classes: language-side requirements, which determine whether the guest semantics can be represented explicitly inside the language, and emulator-side requirements, which determine whether the resulting emulator can faithfully mask or reproduce relevant observations. The discussion is grounded by concrete examples, including publicly documented details from Erlang, where argument limits, bitstring pattern matching, and message reception expose subtle mismatches between direct execution and self-emulation. The resulting framework is intended as guidance for language designers, implementers of evaluators and emulators, and researchers interested in secure sandboxing, decompilation, and reflective execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines emulation-completeness and shows Turing-completeness alone is not enough for realistic self-emulation, with a taxonomy that separates source versus compiled and weak versus strong cases.

read the letter

The main thing to know is that this paper treats self-emulation as more than just running the same computations. It argues a language needs to expose and reproduce guest-visible details like control flow, exceptions, callbacks, timing, memory use, and metadata such as stack traces. Turing-completeness does not guarantee that coverage, and the authors use Erlang cases on argument limits, bitstring matching, and message reception to illustrate the gaps.

Referee Report

2 major / 3 minor

Summary. The paper claims that Turing-completeness is insufficient for self-emulation in programming languages because a self-emulator must preserve all guest-visible states, including control flow, exceptions, callbacks, timing, memory usage, and runtime metadata. It contributes a systematization via a taxonomy that distinguishes source-level evaluation from compiled-code emulation, defines syntactic and compiled-code emulation-completeness, separates weak from strong variants, and organizes requirements into language-side representability versus emulator-side faithfulness, with the discussion grounded in Erlang examples such as argument limits, bitstring pattern matching, and message reception.

Significance. If the taxonomy is adopted, it supplies a structured vocabulary that could usefully inform language designers, emulator implementers, and work on sandboxing, decompilation, and reflective execution. The paper earns credit for its explicit separation of concerns and for using publicly documented Erlang details to illustrate mismatches rather than relying solely on abstract argument.

major comments (2)

[Section 3] The definitions of syntactic emulation-completeness and compiled-code emulation-completeness (introduced after the central observation) remain informal and lack a precise mathematical formulation or inductive clauses; this is load-bearing because the claimed precision of the vocabulary and the distinction between weak and strong variants rest on these definitions.
[Section 5] The Erlang examples (argument limits, bitstring matching, message reception) are invoked to ground the claim that Turing-completeness is insufficient, yet they are presented at a high level without concrete code fragments, observable state differences, or a systematic check against the language-side versus emulator-side classification; this weakens the evidential support for the taxonomy's practical utility.

minor comments (3)

[Abstract] The abstract states that the framework is 'intended as guidance' but does not list the concrete applications (sandboxing, decompilation, reflective execution) until the final paragraph; moving this forward would improve readability.
[Section 4] Notation for the two classes of requirements (language-side and emulator-side) is introduced without an accompanying table or diagram that cross-references them to the weak/strong and source/compiled distinctions.
[Section 2] A few sentences contain repeated phrasing (e.g., 'guest-visible state' appears in consecutive paragraphs without variation); minor rewording would enhance flow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, the positive overall assessment, and the constructive suggestions for improving the precision and evidential support of the taxonomy. We address each major comment below and will incorporate the revisions in the next version of the manuscript.

read point-by-point responses

Referee: [Section 3] The definitions of syntactic emulation-completeness and compiled-code emulation-completeness (introduced after the central observation) remain informal and lack a precise mathematical formulation or inductive clauses; this is load-bearing because the claimed precision of the vocabulary and the distinction between weak and strong variants rest on these definitions.

Authors: We agree that the current presentation of the definitions would benefit from greater formality to support the claimed precision of the vocabulary. In the revised manuscript we will supply inductive definitions for both syntactic emulation-completeness and compiled-code emulation-completeness. The clauses will explicitly capture the preservation (or masking) of guest-visible states for the weak and strong variants, and will make the separation between language-side representability and emulator-side faithfulness part of the formal statement. This change directly addresses the load-bearing concern while remaining consistent with the systematization focus of the paper. revision: yes
Referee: [Section 5] The Erlang examples (argument limits, bitstring matching, message reception) are invoked to ground the claim that Turing-completeness is insufficient, yet they are presented at a high level without concrete code fragments, observable state differences, or a systematic check against the language-side versus emulator-side classification; this weakens the evidential support for the taxonomy's practical utility.

Authors: We accept that the examples would be more effective if presented with greater concreteness. In the revision we will add short, self-contained code fragments for each of the three Erlang cases, describe the precise observable differences (for instance, the exception raised on argument-limit violation versus the behavior under self-emulation), and include an explicit mapping of each mismatch to the language-side versus emulator-side requirements. These additions will strengthen the grounding of the taxonomy without altering the overall narrative or length of the section. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a systematization paper whose contribution consists entirely of definitions, distinctions (source-level vs. compiled-code, weak vs. strong), and a taxonomy separating language-side representability from emulator-side faithfulness. No equations, fitted parameters, or derivation steps appear; the central observation that Turing-completeness is insufficient is illustrated by external Erlang examples rather than derived from prior results by the same authors. All claims are therefore self-contained and do not reduce to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on domain assumptions about what counts as observable runtime state and introduces a new conceptual entity without independent empirical or formal evidence beyond illustrative examples.

axioms (1)

domain assumption Guest-visible state includes control flow, exceptions, callbacks, timing, memory usage, and runtime metadata such as stack traces or line numbers
Invoked as the basis for defining what a self-emulator must preserve.

invented entities (1)

emulation-completeness no independent evidence
purpose: To classify the self-emulation capability of programming languages
Newly coined term whose utility depends on the taxonomy being adopted; no external falsifiable prediction is supplied.

pith-pipeline@v0.9.0 · 5561 in / 1281 out tokens · 43526 ms · 2026-05-10T16:15:38.118494+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

8 extracted references

[1]

A history of erlang

Joe Armstrong. A history of erlang. InProceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, HOPL III, pages 6–1–6–26, New York, NY, USA,
[2]

Erlang reference manual - erlang module.http://erlang.org/doc/ man/erlang.html#bump_reductions-1, 2018

Erlang/OTP Team. Erlang reference manual - erlang module.http://erlang.org/doc/ man/erlang.html#bump_reductions-1, 2018. Accessed: 2018-10-30

2018
[3]

Erlang runtime system primative receive evaluation module source code.https://github.com/erlang/otp/blob/master/erts/preloaded/src/prim_eval

Erlang/OTP Team. Erlang runtime system primative receive evaluation module source code.https://github.com/erlang/otp/blob/master/erts/preloaded/src/prim_eval. S, 2018. Accessed: 2018-10-30

2018
[4]

Erlang standard library bitstring evaluation module source code

Erlang/OTP Team. Erlang standard library bitstring evaluation module source code. https://github.com/erlang/otp/blob/master/lib/stdlib/src/eval_bits.erl, 2018. Accessed: 2018-10-30

2018
[5]

Erlang standard library evaluation module source code.https:// github.com/erlang/otp/blob/master/lib/stdlib/src/erl_eval.erl, 2018

Erlang/OTP Team. Erlang standard library evaluation module source code.https:// github.com/erlang/otp/blob/master/lib/stdlib/src/erl_eval.erl, 2018. Accessed: 2018-10-30

2018
[6]

Detecting environment-sensitive malware

Martina Lindorfer, Clemens Kolbitsch, and Paolo Milani Comparetti. Detecting environment-sensitive malware. In Robin Sommer, Davide Balzarotti, and Gregor Maier, editors,Recent Advances in Intrusion Detection, pages 338–357, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg

2011
[7]

Detecting system emulators

Thomas Raffetseder, Christopher Kruegel, and Engin Kirda. Detecting system emulators. In Juan A. Garay, Arjen K. Lenstra, Masahiro Mambo, and René Peralta, editors,Information Security, pages 1–18, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. 12

2007
[8]

Beard, and Naga Praveen Kumar Katta

Jeff Terrace, Stephen R. Beard, and Naga Praveen Kumar Katta. Javascript in javascript (js.js): Sandboxing third-party scripts. InPresented as part of the 3rd USENIX Conference on Web Application Development (WebApps 12), pages 95–100, Boston, MA, 2012. USENIX. 13

2012