Recognition: unknown
Emulation-Completeness of Programming Languages
Pith reviewed 2026-05-10 16:15 UTC · model grok-4.3
The pith
A language can emulate its own programs only by reproducing every guest-visible state, not just final results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Emulation-completeness requires a self-emulator to compute the guest program's result while also accounting for all guest-visible state, including control flow, exceptions, callbacks, timing, memory usage, and runtime metadata such as stack traces or line numbers. The paper defines syntactic and compiled-code variants of this property, distinguishes weak from strong versions according to preserved observable behavior, and organizes the necessary conditions into language-side representability and emulator-side faithfulness.
What carries the argument
Emulation-completeness, the property that a language hosts a self-emulator reproducing both computation results and all guest-visible runtime observations without delegating to the host evaluator.
If this is right
- Language designers must expose enough internal semantics so that exceptions and callbacks can be represented explicitly inside the language.
- Emulators must be able to mask or reproduce every observation that the guest program could inspect at runtime.
- Gaps appear in practice when languages impose limits on arguments, bitstring matching, or message queues that the emulator cannot mirror.
- The taxonomy guides construction of secure sandboxes and decompilers that aim to hide their own presence from the guest.
Where Pith is reading between the lines
- If the classification holds, reflective execution environments could be built without hidden host dependencies.
- The same split between language-side and emulator-side requirements may apply to verifying isolation properties in virtual machines.
- Testing the framework on additional languages could expose hidden assumptions in how their runtimes expose metadata.
- A concrete next step is to attempt self-emulator construction for a small language and measure fidelity on timing and exception observations.
Load-bearing premise
Every relevant guest-visible state and behavior can be classified cleanly into language-side representability or emulator-side faithfulness without leaving out real-world execution details.
What would settle it
A program in Erlang that produces observably different stack traces, timing, or message reception when run directly versus when executed by a self-emulator written in Erlang.
read the original abstract
We study when a programming language can emulate programs written in that same language without delegating the guest program back to the host evaluator or compiler. We call this property emulation-completeness. The central observation is that Turing-completeness by itself is not enough: a self-emulator must not only compute the guest program's result, but must also account for the guest-visible state on which realistic programs depend, including control flow, exceptions, callbacks, timing, memory usage, and runtime metadata such as stack traces or line numbers. This paper is a systematization paper. Its contribution is not a new emulator implementation, but a precise vocabulary and a structured taxonomy for reasoning about self-emulation. We distinguish source-level evaluation from compiled-code emulation, define syntactic and compiled-code emulation-completeness, and separate weak from strong emulation-completeness according to how much observable runtime behavior must be preserved. We then organize the requirements into two classes: language-side requirements, which determine whether the guest semantics can be represented explicitly inside the language, and emulator-side requirements, which determine whether the resulting emulator can faithfully mask or reproduce relevant observations. The discussion is grounded by concrete examples, including publicly documented details from Erlang, where argument limits, bitstring pattern matching, and message reception expose subtle mismatches between direct execution and self-emulation. The resulting framework is intended as guidance for language designers, implementers of evaluators and emulators, and researchers interested in secure sandboxing, decompilation, and reflective execution.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that Turing-completeness is insufficient for self-emulation in programming languages because a self-emulator must preserve all guest-visible states, including control flow, exceptions, callbacks, timing, memory usage, and runtime metadata. It contributes a systematization via a taxonomy that distinguishes source-level evaluation from compiled-code emulation, defines syntactic and compiled-code emulation-completeness, separates weak from strong variants, and organizes requirements into language-side representability versus emulator-side faithfulness, with the discussion grounded in Erlang examples such as argument limits, bitstring pattern matching, and message reception.
Significance. If the taxonomy is adopted, it supplies a structured vocabulary that could usefully inform language designers, emulator implementers, and work on sandboxing, decompilation, and reflective execution. The paper earns credit for its explicit separation of concerns and for using publicly documented Erlang details to illustrate mismatches rather than relying solely on abstract argument.
major comments (2)
- [Section 3] The definitions of syntactic emulation-completeness and compiled-code emulation-completeness (introduced after the central observation) remain informal and lack a precise mathematical formulation or inductive clauses; this is load-bearing because the claimed precision of the vocabulary and the distinction between weak and strong variants rest on these definitions.
- [Section 5] The Erlang examples (argument limits, bitstring matching, message reception) are invoked to ground the claim that Turing-completeness is insufficient, yet they are presented at a high level without concrete code fragments, observable state differences, or a systematic check against the language-side versus emulator-side classification; this weakens the evidential support for the taxonomy's practical utility.
minor comments (3)
- [Abstract] The abstract states that the framework is 'intended as guidance' but does not list the concrete applications (sandboxing, decompilation, reflective execution) until the final paragraph; moving this forward would improve readability.
- [Section 4] Notation for the two classes of requirements (language-side and emulator-side) is introduced without an accompanying table or diagram that cross-references them to the weak/strong and source/compiled distinctions.
- [Section 2] A few sentences contain repeated phrasing (e.g., 'guest-visible state' appears in consecutive paragraphs without variation); minor rewording would enhance flow.
Simulated Author's Rebuttal
We thank the referee for the careful reading, the positive overall assessment, and the constructive suggestions for improving the precision and evidential support of the taxonomy. We address each major comment below and will incorporate the revisions in the next version of the manuscript.
read point-by-point responses
-
Referee: [Section 3] The definitions of syntactic emulation-completeness and compiled-code emulation-completeness (introduced after the central observation) remain informal and lack a precise mathematical formulation or inductive clauses; this is load-bearing because the claimed precision of the vocabulary and the distinction between weak and strong variants rest on these definitions.
Authors: We agree that the current presentation of the definitions would benefit from greater formality to support the claimed precision of the vocabulary. In the revised manuscript we will supply inductive definitions for both syntactic emulation-completeness and compiled-code emulation-completeness. The clauses will explicitly capture the preservation (or masking) of guest-visible states for the weak and strong variants, and will make the separation between language-side representability and emulator-side faithfulness part of the formal statement. This change directly addresses the load-bearing concern while remaining consistent with the systematization focus of the paper. revision: yes
-
Referee: [Section 5] The Erlang examples (argument limits, bitstring matching, message reception) are invoked to ground the claim that Turing-completeness is insufficient, yet they are presented at a high level without concrete code fragments, observable state differences, or a systematic check against the language-side versus emulator-side classification; this weakens the evidential support for the taxonomy's practical utility.
Authors: We accept that the examples would be more effective if presented with greater concreteness. In the revision we will add short, self-contained code fragments for each of the three Erlang cases, describe the precise observable differences (for instance, the exception raised on argument-limit violation versus the behavior under self-emulation), and include an explicit mapping of each mismatch to the language-side versus emulator-side requirements. These additions will strengthen the grounding of the taxonomy without altering the overall narrative or length of the section. revision: yes
Circularity Check
No significant circularity
full rationale
This is a systematization paper whose contribution consists entirely of definitions, distinctions (source-level vs. compiled-code, weak vs. strong), and a taxonomy separating language-side representability from emulator-side faithfulness. No equations, fitted parameters, or derivation steps appear; the central observation that Turing-completeness is insufficient is illustrated by external Erlang examples rather than derived from prior results by the same authors. All claims are therefore self-contained and do not reduce to their own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Guest-visible state includes control flow, exceptions, callbacks, timing, memory usage, and runtime metadata such as stack traces or line numbers
invented entities (1)
-
emulation-completeness
no independent evidence
Reference graph
Works this paper leans on
-
[1]
A history of erlang
Joe Armstrong. A history of erlang. InProceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, HOPL III, pages 6–1–6–26, New York, NY, USA,
-
[2]
Erlang reference manual - erlang module.http://erlang.org/doc/ man/erlang.html#bump_reductions-1, 2018
Erlang/OTP Team. Erlang reference manual - erlang module.http://erlang.org/doc/ man/erlang.html#bump_reductions-1, 2018. Accessed: 2018-10-30
2018
-
[3]
Erlang runtime system primative receive evaluation module source code.https://github.com/erlang/otp/blob/master/erts/preloaded/src/prim_eval
Erlang/OTP Team. Erlang runtime system primative receive evaluation module source code.https://github.com/erlang/otp/blob/master/erts/preloaded/src/prim_eval. S, 2018. Accessed: 2018-10-30
2018
-
[4]
Erlang standard library bitstring evaluation module source code
Erlang/OTP Team. Erlang standard library bitstring evaluation module source code. https://github.com/erlang/otp/blob/master/lib/stdlib/src/eval_bits.erl, 2018. Accessed: 2018-10-30
2018
-
[5]
Erlang standard library evaluation module source code.https:// github.com/erlang/otp/blob/master/lib/stdlib/src/erl_eval.erl, 2018
Erlang/OTP Team. Erlang standard library evaluation module source code.https:// github.com/erlang/otp/blob/master/lib/stdlib/src/erl_eval.erl, 2018. Accessed: 2018-10-30
2018
-
[6]
Detecting environment-sensitive malware
Martina Lindorfer, Clemens Kolbitsch, and Paolo Milani Comparetti. Detecting environment-sensitive malware. In Robin Sommer, Davide Balzarotti, and Gregor Maier, editors,Recent Advances in Intrusion Detection, pages 338–357, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg
2011
-
[7]
Detecting system emulators
Thomas Raffetseder, Christopher Kruegel, and Engin Kirda. Detecting system emulators. In Juan A. Garay, Arjen K. Lenstra, Masahiro Mambo, and René Peralta, editors,Information Security, pages 1–18, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. 12
2007
-
[8]
Beard, and Naga Praveen Kumar Katta
Jeff Terrace, Stephen R. Beard, and Naga Praveen Kumar Katta. Javascript in javascript (js.js): Sandboxing third-party scripts. InPresented as part of the 3rd USENIX Conference on Web Application Development (WebApps 12), pages 95–100, Boston, MA, 2012. USENIX. 13
2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.