Recognition: unknown
Neural architectures for resolving references in program code
Pith reviewed 2026-05-10 13:28 UTC · model grok-4.3
The pith
New sequence-to-sequence architectures for reference resolution in code handle examples ten times longer than the best baseline and cut the error rate in a decompilation task by 42%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures are struggling on these benchmarks. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples that are ten times longer compared to the best baseline. We measure the impact of our architecture in the real-world task of decompiling switch statements, which has an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%.
What carries the argument
Custom sequence-to-sequence architectures for direct and indirect indexing by permutation that abstract and solve reference rewriting in program code.
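The abstract names the two tasks but does not define them here, so the following is a hedged sketch of one plausible reading: in direct indexing the output at position i reads the input at perm[i], while in indirect indexing the input at position i is written to output position perm[i] (equivalently, indexing by the inverse permutation). The task definitions and the generator are assumptions for illustration, not the paper's specification.

```python
import random

def direct_index(values, perm):
    # Direct indexing: output position i reads values[perm[i]].
    return [values[p] for p in perm]

def indirect_index(values, perm):
    # Indirect indexing: values[i] is written to output position perm[i],
    # i.e. indexing by the inverse permutation.
    out = [None] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out

def make_example(n, seed=0):
    # A tiny synthetic instance in the spirit of the paper's benchmarks.
    rng = random.Random(seed)
    values = [rng.randint(0, 9) for _ in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)
    return values, perm

values, perm = make_example(5)
# The two operations are inverses of each other under the same permutation.
assert direct_index(indirect_index(values, perm), perm) == values
```

Under this reading, a sequence-to-sequence model is trained to emit the rewritten sequence given the values and the permutation as input, and the solvers above supply exact ground truth for free.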
If this is right
- The architectures process inputs ten times longer than the strongest baseline while remaining accurate.
- Error rates fall by 42% when decompiling switch statements using the extended model.
- Ablation experiments demonstrate that each component of the architectures is necessary.
- Common sequence-to-sequence models lack robustness and scalability on these permutation tasks.
Where Pith is reading between the lines
- These permutation-focused designs may extend to other code manipulation problems like register allocation or loop invariant code motion.
- If the benchmarks represent real code well, the methods could enhance tools for binary analysis and malware detection.
- Integrating the architectures into larger transformer-based models might further boost performance on full programs.
Load-bearing premise
The synthetic benchmarks for direct and indirect indexing by permutation faithfully capture the distribution and difficulty of reference-resolution subtasks that appear in real program decompilation.
What would settle it
Running the new architectures on a larger set of real decompiled programs with diverse reference patterns and finding no improvement in accuracy or scalability would disprove the main claims.
Original abstract
Resolving and rewriting references is fundamental in programming languages. Motivated by a real-world decompilation task, we abstract reference rewriting into the problems of direct and indirect indexing by permutation. We create synthetic benchmarks for these tasks and show that well-known sequence-to-sequence machine learning architectures are struggling on these benchmarks. We introduce new sequence-to-sequence architectures for both problems. Our measurements show that our architectures outperform the baselines in both robustness and scalability: our models can handle examples that are ten times longer compared to the best baseline. We measure the impact of our architecture in the real-world task of decompiling switch statements, which has an indexing subtask. According to our measurements, the extended model decreases the error rate by 42%. Multiple ablation studies show that all components of our architectures are essential.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper abstracts reference rewriting in code to direct and indirect indexing by permutation. It generates synthetic benchmarks where standard seq2seq models struggle, introduces new architectures claimed to handle inputs ten times longer than baselines, and reports a 42% error-rate reduction when applied to decompiling switch statements, with ablations showing all components are essential.
Significance. If the results hold, the work offers targeted architectures for long-range reference resolution in code, which could improve decompilation and program analysis tools. The combination of isolating the indexing subtask synthetically and validating on a real decompilation task, plus ablation studies, is a positive aspect of the experimental design.
Major comments (2)
- [Abstract] Abstract: the headline claims (10x longer inputs, 42% error drop on switch decompilation) are presented without details on how the synthetic benchmarks are generated, the precise model equations or architectural modifications, or any statistical tests/error bars. These omissions are load-bearing because the central claims of superior robustness and scalability cannot be assessed or reproduced from the given information.
- [Synthetic benchmarks] Synthetic benchmarks section: no comparison is provided between the distribution of reference patterns (aliasing, scoping, control-flow noise) in the generated permutation-indexing data and those appearing in actual decompiled binaries. Without this, the assumption that superior performance on the synthetic tasks implies transfer to the motivating switch-decompilation application remains unverified and weakens the real-world claim.
Minor comments (1)
- [Abstract] The abstract refers to 'well-known sequence-to-sequence machine learning architectures' as baselines but does not name them; adding the specific models (e.g., standard LSTM or Transformer variants) would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, clarifying details present in the full manuscript and indicating revisions where they strengthen the presentation without altering the core claims.
Point-by-point responses
Referee: [Abstract] Abstract: the headline claims (10x longer inputs, 42% error drop on switch decompilation) are presented without details on how the synthetic benchmarks are generated, the precise model equations or architectural modifications, or any statistical tests/error bars. These omissions are load-bearing because the central claims of superior robustness and scalability cannot be assessed or reproduced from the given information.
Authors: The abstract is intentionally concise as a high-level summary of contributions. Complete details on synthetic benchmark generation appear in Section 3, the model architectures with equations and modifications in Section 4, and all quantitative results including error bars and statistical tests in Section 5. These sections enable full assessment and reproduction. We have added a single sentence to the abstract directing readers to the relevant sections for the supporting details. revision: partial
Referee: [Synthetic benchmarks] Synthetic benchmarks section: no comparison is provided between the distribution of reference patterns (aliasing, scoping, control-flow noise) in the generated permutation-indexing data and those appearing in actual decompiled binaries. Without this, the assumption that superior performance on the synthetic tasks implies transfer to the motivating switch-decompilation application remains unverified and weakens the real-world claim.
Authors: The synthetic benchmarks isolate the core permutation-indexing operations that define the reference-resolution subtask. While we did not include an explicit statistical comparison of pattern distributions (e.g., aliasing or scoping frequencies), the transferability is directly evidenced by the 42% error-rate reduction achieved when the same architectures are applied to the real switch-decompilation task. We have inserted a short paragraph in the synthetic benchmarks section explaining the design choices and their relation to real code patterns. revision: partial
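The indexing subtask in switch decompilation that the rebuttal leans on can be made concrete: compilers typically lower a dense C switch to a jump table, so dispatch is literally a table lookup, and the decompiler must invert that lookup to recover case labels. The sketch below is illustrative only; the labels and the lowering strategy are invented, not taken from the paper.

```python
def compile_switch(cases, default):
    # cases: {case_value: handler_label}. Lower to a base offset plus a
    # flat jump table, the way a compiler handles a dense switch.
    base = min(cases)
    size = max(cases) - base + 1
    table = [cases.get(base + i, default) for i in range(size)]
    return base, table

def dispatch(base, table, default, x):
    # Runtime behavior of the compiled switch: indirect indexing
    # through the jump table, with a bounds check for the default.
    i = x - base
    return table[i] if 0 <= i < len(table) else default

base, table = compile_switch({2: "L_two", 3: "L_three", 5: "L_five"}, "L_def")
assert dispatch(base, table, "L_def", 3) == "L_three"
assert dispatch(base, table, "L_def", 4) == "L_def"  # gap between cases
assert dispatch(base, table, "L_def", 9) == "L_def"  # out of table range
# A decompiler sees only (base, table) and must recover the original
# case-to-label mapping: resolving references through an index.
```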
Circularity Check
No circularity: results are measured empirical performance on held-out data.
Full rationale
The paper defines synthetic benchmarks for direct/indirect permutation indexing, trains sequence-to-sequence models on them, and reports measured accuracy/scalability on held-out test sets plus a separate real decompilation task. No equations, fitted parameters, or self-citations are shown that reduce the reported 10x length handling or 42% error reduction to quantities defined by the same inputs. Ablations and baseline comparisons are independent experimental outcomes. The motivating assumption that synthetic permutations proxy real reference patterns is a validity claim, not a definitional reduction.
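The length-generalization claim the rationale describes rests on a simple protocol: train on short inputs and evaluate on held-out inputs far longer than anything seen in training, so success cannot come from memorizing training lengths. A minimal sketch of such a split, with placeholder lengths and sizes (the paper's actual values are not given here):

```python
import random

def make_split(train_len, test_factor, n_train=100, n_test=20, seed=0):
    # Build a train/test split of random permutations where every test
    # input is test_factor x longer than every training input.
    rng = random.Random(seed)

    def sample(n, length):
        out = []
        for _ in range(n):
            perm = list(range(length))
            rng.shuffle(perm)
            out.append(perm)
        return out

    train = sample(n_train, train_len)
    test = sample(n_test, train_len * test_factor)
    return train, test

train, test = make_split(train_len=8, test_factor=10)
assert all(len(x) == 8 for x in train)
assert all(len(x) == 80 for x in test)  # 10x longer, never seen in training
```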