pith. machine review for the scientific record.

arxiv: 2604.13675 · v1 · submitted 2026-04-15 · 💻 cs.PL · cs.CR

Recognition: unknown

Erlang Binary and Source Code Obfuscation

Gregory Morse, Tamás Kozsik

Pith reviewed 2026-05-10 12:27 UTC · model grok-4.3

classification 💻 cs.PL cs.CR
keywords Erlang · obfuscation · BEAM bytecode · decompilation · reverse engineering · code transformation · virtual machine

The pith

Erlang obfuscation succeeds by exploiting gaps between the language's clean semantics and the concrete constraints of its BEAM compiler and runtime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines transformations applied to Erlang source, syntax trees, assembly, and bytecode that aim to block reverse engineering and decompilation. It argues that the most reliable methods do not rely on arbitrary corruption but instead use specific mismatches between what the high-level Erlang language guarantees and what the lower-level toolchain and virtual machine actually accept. By cataloging opcode dependency tricks, receive-loop encodings, irregular control flow, performance-oriented mutability changes, and self-modifying modules loaded at runtime, the work shows how programs can retain their original observable behavior while becoming far harder to reconstruct. A reader would care because these gaps are inherent to the Erlang execution model and therefore affect any attempt to protect or analyze deployed code.
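A receive-based loop encoding can be sketched at source level as follows. The same sum 1 + 2 + ... + N is computed once by ordinary recursion and once by sending the loop condition to the process's own mailbox and branching inside a receive, roughly in the style of the paper's structured-receive figure. The module name recv_loop is hypothetical, and this is an illustration under those assumptions, not the authors' implementation.

```erlang
%% Hypothetical module: a source-level sketch of a receive-based loop encoding.
-module(recv_loop).
-export([sum_to_n/1, sum_to_n_plain/1]).

%% Reference version: plain recursion computing 1 + 2 + ... + N.
sum_to_n_plain(0) -> 0;
sum_to_n_plain(N) when is_integer(N), N > 0 -> N + sum_to_n_plain(N - 1).

%% Obfuscated version: the loop test is sent to the process itself and the
%% branch is taken inside a receive, so no ordinary recursive guard remains.
sum_to_n(N) when is_integer(N), N >= 0 ->
    Tag = erlang:unique_integer(),   % unique tag: unrelated messages never match
    Self = self(),
    Self ! {Tag, N =:= 0},           % pre-test: is the loop already finished?
    Loop = fun Step(M, Acc) ->
                   receive
                       {Tag, true} ->
                           Acc;
                       {Tag, false} ->
                           P = M - 1,
                           Self ! {Tag, P =:= 0},
                           Step(P, Acc + M)
                   end
           end,
    Loop(N, 0).
```

Each iteration sends exactly one tagged message and consumes it, so the mailbox is left as it was and unrelated messages are never touched; a decompiler, however, sees message passing where the source had a counted loop.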

Core claim

The paper claims that effective obfuscation arises from targeted exploitation of representational gaps between high-level Erlang semantics and the lower-level execution model accepted by the compiler, validator, loader, and virtual machine. It categorizes five families of transformations—opcode-level dependency tricks, receive-based loop encodings, irregular control-flow constructions, mutability-oriented performance obfuscation, and self-modifying code enabled by dynamic module loading—and shows that each preserves correct runtime behavior while complicating decompilation and recompilation.
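The self-modifying family depends on a running node being able to compile and load code that exists only in memory. A minimal sketch, assuming nothing beyond the standard erl_scan, erl_parse, compile, and code modules; the module names dyn_load and dyn_hello are hypothetical.

```erlang
%% Hypothetical module: compile and load Erlang source held in memory.
-module(dyn_load).
-export([load_from_source/1, demo/0]).

%% Each string must contain exactly one form terminated by a dot.
load_from_source(FormStrings) ->
    Forms = [parse_form(S) || S <- FormStrings],
    {ok, Mod, Bin} = compile:forms(Forms, [binary]),
    code:purge(Mod),                 % drop any old version so reloading succeeds
    %% No .beam file is written; "nofile" is only a label for the loader.
    {module, Mod} = code:load_binary(Mod, "nofile", Bin),
    Mod.

parse_form(Src) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Src),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

demo() ->
    Mod = load_from_source(["-module(dyn_hello).",
                            "-export([answer/0]).",
                            "answer() -> 42."]),
    Mod:answer().
```

Since no .beam file for dyn_hello is ever written, inspecting the files shipped with a release does not reveal it.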

What carries the argument

The central mechanism is the set of transformations that deliberately widen the distance between Erlang's abstract semantics and the concrete representation accepted by the BEAM toolchain and runtime.

If this is right

  • Obfuscated modules continue to load and execute correctly under the standard Erlang virtual machine.
  • Decompilers and disassemblers encounter increased structural obstacles from the irregular control flow and dependency encodings.
  • Self-modifying code remains possible through dynamic loading without violating loader rules.
  • Performance-oriented mutations can be introduced without changing observable results.
  • The same gap-exploitation pattern can be applied at source, AST, assembly, and bytecode levels.
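At the AST level, a transformation pass is a function from forms to forms. The sketch below assumes consistent variable renaming as a deliberately simple stand-in for such a pass: it replaces every variable name with an opaque one, recompiles, and checks that behaviour is unchanged. The module names ast_rename and ren_demo are hypothetical; the traversal handles plain variable nodes and named funs, and leaves underscore-prefixed variables alone.

```erlang
%% Hypothetical module: consistent variable renaming on abstract forms.
-module(ast_rename).
-export([rename_vars/1, demo/0]).

%% Rename every non-underscore variable to V1, V2, ...; the mapping is
%% injective, so the binding structure (and hence behaviour) is preserved.
rename_vars(Forms) ->
    Names = lists:sort(sets:to_list(collect(Forms, sets:new()))),
    New = [list_to_atom("V" ++ integer_to_list(I))
           || I <- lists:seq(1, length(Names))],
    subst(Forms, maps:from_list(lists:zip(Names, New))).

%% Gather variable names from an arbitrary term (forms are tuples and lists).
collect({var, _Anno, Name}, Acc) when is_atom(Name) -> add(Name, Acc);
collect({named_fun, _Anno, Name, Clauses}, Acc) ->
    collect(Clauses, add(Name, Acc));
collect(T, Acc) when is_tuple(T) -> collect(tuple_to_list(T), Acc);
collect([H | T], Acc) -> collect(T, collect(H, Acc));
collect(_, Acc) -> Acc.

add(Name, Acc) ->
    case atom_to_list(Name) of
        [$_ | _] -> Acc;                 % keep '_' and '_Foo' unchanged
        _ -> sets:add_element(Name, Acc)
    end.

%% Rewrite the term, substituting mapped names and leaving the rest intact.
subst({var, A, Name}, Map) when is_atom(Name) ->
    {var, A, maps:get(Name, Map, Name)};
subst({named_fun, A, Name, Clauses}, Map) ->
    {named_fun, A, maps:get(Name, Map, Name), subst(Clauses, Map)};
subst(T, Map) when is_tuple(T) -> list_to_tuple(subst(tuple_to_list(T), Map));
subst([H | T], Map) -> [subst(H, Map) | subst(T, Map)];
subst(X, _Map) -> X.

parse_form(Src) ->
    {ok, Tokens, _} = erl_scan:string(Src),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

demo() ->
    Forms = [parse_form(S) || S <- ["-module(ren_demo).",
                                    "-export([f/2]).",
                                    "f(Width, Height) -> Area = Width * Height, Area + Width."]],
    Renamed = rename_vars(Forms),
    {ok, Mod, Bin} = compile:forms(Renamed, [binary]),
    code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, "nofile", Bin),
    {Mod:f(3, 4), lists:flatten(erl_pp:form(lists:last(Renamed)))}.
```

Renaming alone is a weak, purely cosmetic transformation. It is shown only to illustrate the forms-in, forms-out shape that AST-level passes share, together with the behaviour check (f(3, 4) still returns 15) that each pass should be held to.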

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar gap-based methods may apply to other virtual-machine languages where high-level semantics diverge from the concrete execution model.
  • Protecting distributed Erlang systems could rely on these constructions rather than external encryption layers.
  • Decompiler authors would need to model the exact loader and validator constraints to recover clean code from such modules.
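One input to such a model is simply whether a module still carries its abstract code. A sketch with hypothetical module names: beam_lib:chunks/2 reports no_abstract_code when a binary has no debug information, and beam_lib:strip/1 removes those chunks while keeping what the loader needs.

```erlang
%% Hypothetical module: compare a BEAM binary with and without debug info.
-module(debug_info_check).
-export([has_abstract_code/1, demo/0]).

%% true if a tool could read the original abstract forms directly.
has_abstract_code(Beam) ->
    {ok, {_Mod, [{abstract_code, AC}]}} = beam_lib:chunks(Beam, [abstract_code]),
    AC =/= no_abstract_code.

parse_form(Src) ->
    {ok, Tokens, _} = erl_scan:string(Src),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

demo() ->
    Forms = [parse_form(S) || S <- ["-module(dbg_demo).",
                                    "-export([g/1]).",
                                    "g(X) -> X * 2."]],
    {ok, dbg_demo, WithDebug} = compile:forms(Forms, [binary, debug_info]),
    {ok, dbg_demo, Plain} = compile:forms(Forms, [binary]),
    %% strip/1 keeps only the chunks needed for loading.
    {ok, {dbg_demo, Stripped}} = beam_lib:strip(WithDebug),
    {has_abstract_code(WithDebug), has_abstract_code(Plain),
     has_abstract_code(Stripped)}.
```

Once the abstract code is gone, a decompiler has to work from the bytecode itself, which is where the loader and validator constraints above come into play.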

Load-bearing premise

These specific transformations preserve the original program's observable behavior while making reverse engineering and decompilation substantially harder.

What would settle it

Successful reconstruction of the original source or logic from one of the described obfuscated BEAM modules by a standard decompiler or disassembler without extra information would show the claimed resistance does not hold.
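As a baseline for such a test: when a module is compiled with debug_info, its original forms can be pretty-printed straight back to source with no decompiler at all. The sketch below (hypothetical module names) shows that baseline, which suggests that any resistance claim should be evaluated on binaries built without debug information.

```erlang
%% Hypothetical module: recover source text from a BEAM binary's debug info.
-module(recover_source).
-export([recover/1, demo/0]).

recover(Beam) ->
    case beam_lib:chunks(Beam, [abstract_code]) of
        {ok, {_Mod, [{abstract_code, {raw_abstract_v1, Forms}}]}} ->
            {ok, lists:flatten([erl_pp:form(F) || F <- Forms])};
        {ok, {_Mod, [{abstract_code, no_abstract_code}]}} ->
            no_debug_info
    end.

parse_form(Src) ->
    {ok, Tokens, _} = erl_scan:string(Src),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

demo() ->
    Forms = [parse_form(S) || S <- ["-module(rec_demo).",
                                    "-export([h/1]).",
                                    "h(X) -> X + 1."]],
    {ok, rec_demo, WithDebug} = compile:forms(Forms, [binary, debug_info]),
    {ok, rec_demo, Plain} = compile:forms(Forms, [binary]),
    {recover(WithDebug), recover(Plain)}.
```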

Figures

Figures reproduced from arXiv: 2604.13675 by Gregory Morse, Tamás Kozsik.

Figure 1: Transformation graph between Erlang source, AST forms, BEAM assembly, and …
Figure 2: Erlang pattern matching example
Figure 3: BEAM code for the pattern matching example
Figure 4: Validation error after removing the result-preserving move
Figure 5: Validator-compliant idiom used as a patch source
Figure 6: Pattern-matching idiom prepared for binary patching
Figure 7: BEAM opcode sequence dependency example
  Surrounding text: "Element initialization opcodes for binaries include bs_put_string, bs_put_utf8, bs_put_utf16, bs_put_utf32, bs_put_integer, bs_put_binary, and bs_put_float. The string version takes a list of …"
Figure 8: Erlang simple recursion example
Figure 9: Erlang recursion via structured receive

    sum_to_n_loop(N) ->
        U = erlang:unique_integer(), S = self(),
        Counter = 0, Sum = 0, S ! {U, N =:= 0},
        fun SumReceive(C, M, Sm) ->
                receive {U, X} ->
                    if X -> Sm;
                       true -> P = M - 1, S ! {U, P =:= 0},
                               SumReceive(C + 1, P, Sm + M)
                    end
                end
        end(Counter, N, Sum).

  Surrounding text: "… times the loop body executed. This example demonstrates a for-loop or pre-test while loop. A post-test while loop would simply substitute false for the condition N =:= 0 in the first iteration. The process itself is stored in a variable to send messages, while a unique identifier is used to prevent any interference; using the library unique integer function should b…"
Figure 10: Mailbox cleanup keyed by a unique identifier
Figure 11: Erlang dynamic module compilation and loading code
Figure 12: Direct BEAM-assembly generation for mutable tuple updates
Figure 13: Erlang mutable merge sort
  Surrounding text: "… static view of the program incomplete. This is especially problematic for decompilers and reverse-engineering tools that assume code identity is stable once a module has been loaded. In practice, such self-modifying techniques also provide a natural bridge to staged obfuscation. A program may begin from a relatively benign initial representation, derive a transformed succes…"
Figure 14: Random-access reads and writes with a mutable tuple
Figure 15: Self-modifying Erlang function via dynamic recompilation
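A self-modifying function in the spirit of Figure 15 can be approximated as follows: each call to selfmod:step/0 returns a counter and then replaces its own module with a freshly compiled successor that returns the next value. This is a hedged sketch, not the authors' code; it assumes a separate generator module (the hypothetical selfmod_gen) and builds each version from source strings rather than from BEAM assembly.

```erlang
%% Hypothetical generator: installs version N of the module selfmod,
%% whose step/0 returns N after installing version N + 1.
-module(selfmod_gen).
-export([install/1]).

install(N) when is_integer(N) ->
    StepSrc = lists:flatten(
                io_lib:format("step() -> selfmod_gen:install(~b), ~b.",
                              [N + 1, N])),
    Forms = [parse_form(S) || S <- ["-module(selfmod).",
                                    "-export([step/0]).",
                                    StepSrc]],
    {ok, selfmod, Bin} = compile:forms(Forms, [binary]),
    %% At most two versions may coexist; drop the oldest before loading.
    code:purge(selfmod),
    {module, selfmod} = code:load_binary(selfmod, "nofile", Bin),
    ok.

parse_form(Src) ->
    {ok, Tokens, _} = erl_scan:string(Src),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

The running call finishes in the version that was current when it started, while the next external call to selfmod:step/0 reaches the new version; after every call, the loaded code differs from any earlier static snapshot of the node.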
Original abstract

This paper studies obfuscation techniques for Erlang programs at the source, abstract syntax tree, BEAM assembly, and BEAM bytecode levels. We focus on transformations that complicate reverse engineering, decompilation, and recompilation while remaining grounded in the actual behavior of the Erlang compiler, validator, loader, and virtual machine. The paper categorizes opcode-level dependency tricks, receive-based loop encodings, irregular control-flow constructions, mutability-oriented performance obfuscation, and self-modifying code enabled by dynamic module loading. A recurring theme is that effective obfuscation in BEAM often arises not from arbitrary corruption, but from exploiting representational gaps between high-level Erlang semantics and the lower-level execution model accepted by the toolchain and runtime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. This paper studies obfuscation techniques for Erlang programs at the source, abstract syntax tree, BEAM assembly, and BEAM bytecode levels. It categorizes five classes of transformations—opcode-level dependency tricks, receive-based loop encodings, irregular control-flow constructions, mutability-oriented performance obfuscation, and self-modifying code enabled by dynamic module loading—while emphasizing that effective obfuscation exploits representational gaps between high-level Erlang semantics and the lower-level execution model accepted by the compiler, validator, loader, and VM.

Significance. If the categorized transformations can be shown to preserve observable program behavior while demonstrably raising the cost of reverse engineering and decompilation, the work would supply a useful taxonomy of BEAM-specific obfuscation methods. It could inform both the design of code-protection tools for Erlang systems and the improvement of decompilers that must handle toolchain-accepted but semantically irregular constructions.

major comments (2)
  1. Abstract: the central claim that the five categories of transformations 'complicate reverse engineering, decompilation, and recompilation while remaining grounded in the actual behavior' is unsupported by any concrete before/after examples, validator/loader/VM test cases, or measurements of decompilation effort. Without such evidence the exploitation-of-gaps thesis remains unverified.
  2. Description of the five categories (opcode dependency tricks through self-modifying modules): the manuscript asserts that these constructions preserve original program behavior yet increase reverse-engineering cost, but supplies no equivalence arguments, test-suite results, or error analysis for any category, leaving the semantics-preservation and effectiveness claims without demonstrated support.
minor comments (1)
  1. The categorization would be easier to follow if the paper included a summary table listing each technique, the level(s) at which it applies (source/AST/BEAM), and the specific representational gap it exploits.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight the need for stronger empirical grounding of our claims, which we address below by outlining targeted revisions while preserving the paper's focus as a taxonomy of BEAM-specific obfuscation techniques.

Point-by-point responses
  1. Referee: Abstract: the central claim that the five categories of transformations 'complicate reverse engineering, decompilation, and recompilation while remaining grounded in the actual behavior' is unsupported by any concrete before/after examples, validator/loader/VM test cases, or measurements of decompilation effort. Without such evidence the exploitation-of-gaps thesis remains unverified.

    Authors: We agree that the abstract would benefit from more immediate support. The body of the manuscript already details how each category exploits documented gaps (e.g., opcode dependencies accepted by the loader but invisible to high-level decompilers, or dynamic module loading for self-modification). In revision we will expand the abstract to reference these mechanisms briefly and add a dedicated 'Illustrative Examples' subsection containing before/after source and BEAM snippets for each of the five categories, together with short validator and runtime execution checks confirming acceptance by the toolchain. Quantitative measurement of decompilation effort is inherently difficult to standardize; we will instead supply qualitative reasoning tied to the specific irregularities introduced, noting that full empirical user studies lie outside the scope of this work. revision: partial

  2. Referee: Description of the five categories (opcode dependency tricks through self-modifying modules): the manuscript asserts that these constructions preserve original program behavior yet increase reverse-engineering cost, but supplies no equivalence arguments, test-suite results, or error analysis for any category, leaving the semantics-preservation and effectiveness claims without demonstrated support.

    Authors: The manuscript grounds preservation in the fact that all presented constructions are accepted by the official Erlang compiler, validator, loader, and VM without modification, thereby inheriting the same observable semantics by construction. For example, receive-based loop encodings and irregular control flow remain valid under the BEAM execution model. We acknowledge the value of explicit test cases. In the revised version we will append a small test-suite summary (with input/output pairs) for representative examples from each category, plus a brief error-analysis paragraph discussing known edge cases such as hot-code loading interactions. A complete formal equivalence proof would require a verified semantics of the entire BEAM instruction set and is beyond the taxonomy-oriented contribution of the paper; we will make this scope limitation explicit. revision: partial
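An input/output summary of this kind can be produced with a small differential harness that runs the original and the transformed function side by side and treats raised exceptions as part of the observable behaviour. A sketch with hypothetical names; the closed-form "transformation" is only a stand-in, chosen because it agrees with the original on non-negative inputs and diverges on negative ones, the sort of edge case an error-analysis paragraph should catch.

```erlang
%% Hypothetical module: differential testing of an original vs transformed fun.
-module(diff_check).
-export([equivalent/3, demo/0]).

%% Return ok, or {mismatch, {Input, OriginalResult, TransformedResult}}.
equivalent(F, G, Inputs) ->
    Results = [{X, run(F, X), run(G, X)} || X <- Inputs],
    case [R || {_X, A, B} = R <- Results, A =/= B] of
        [] -> ok;
        [First | _] -> {mismatch, First}
    end.

%% Exceptions count as observable behaviour, so capture them as values.
run(F, X) ->
    try {value, F(X)}
    catch Class:Reason -> {raised, Class, Reason}
    end.

demo() ->
    Original = fun(N) -> lists:sum(lists:seq(1, N)) end,
    Transformed = fun(N) -> N * (N + 1) div 2 end,
    {equivalent(Original, Transformed, lists:seq(0, 100)),
     equivalent(Original, Transformed, [-5])}.
```

The first comparison passes on 0..100, while the second reports that lists:seq/2 raises for N = -5 whereas the closed form quietly returns 10.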

standing simulated objections not resolved
  • Objective, reproducible quantification of 'increased reverse-engineering cost' would require controlled experiments with professional decompilers or human subjects, which exceeds the resources and scope of the current taxonomy paper.

Circularity Check

0 steps flagged

No circularity: purely descriptive taxonomy with no derivations or self-referential claims

full rationale

The paper is a categorization of Erlang/BEAM obfuscation techniques (opcode dependencies, receive loops, irregular control flow, mutability tricks, self-modifying modules) without equations, predictions, fitted parameters, or formal derivations. The abstract and structure frame the work as observational taxonomy exploiting representational gaps between Erlang semantics and BEAM execution, but supply no load-bearing steps that reduce to self-definition, self-citation, or renaming of inputs. No uniqueness theorems, ansatzes, or prior-author results are invoked to force conclusions. The central theme is presented as a recurring observation rather than a derived result, rendering the paper self-contained against external benchmarks with no circular reduction possible.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are present because the paper is a descriptive study of practical obfuscation techniques rather than a formal or mathematical derivation.

pith-pipeline@v0.9.0 · 5409 in / 1120 out tokens · 41877 ms · 2026-05-10T12:27:14.802227+00:00 · methodology

