pith. machine review for the scientific record.

arxiv: 2605.13229 · v1 · submitted 2026-05-13 · 💻 cs.AI · cs.SE

Recognition: 2 theorem links · Lean Theorem

Improving Code Translation with Syntax-Guided and Semantic-aware Preference Optimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:00 UTC · model grok-4.3

classification 💻 cs.AI cs.SE
keywords code translation · preference optimization · contrastive learning · semantic equivalence · syntax feedback · direct preference optimization · LLM alignment · cross-lingual model

The pith

A contrastively trained cross-lingual model supplies reliable semantic rewards for code translation inside direct preference optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that semantic rewards for code translation become reliable when derived directly from source code rather than from sparse tests or reference matches. It trains a cross-lingual model with contrastive learning so that the model can score whether a translation preserves the original function. This semantic score is then combined with compiler syntax feedback as a multi-objective problem solved inside the direct preference optimization framework. The resulting method, called CTO, produces translations that better satisfy both syntactic and semantic requirements across C++, Java, and Python.
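The reward combination described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `semantic_score` stands in for the contrastive model's output, `compiles` for the compiler feedback, and the linear combination with w = 0.5 mirrors the equal prioritization the paper reports, though the exact combination rule here is an assumption.

```python
def combined_reward(semantic_score: float, compiles: bool, w: float = 0.5) -> float:
    """Weighted sum of the semantic and syntactic objectives (assumed form)."""
    syntactic_score = 1.0 if compiles else 0.0
    return w * semantic_score + (1.0 - w) * syntactic_score

def make_preference_pair(candidates):
    """Rank candidate translations (text, semantic_score, compiles) by the
    combined reward; return the best and worst as a (chosen, rejected)
    pair for DPO-style preference optimization."""
    ranked = sorted(candidates,
                    key=lambda c: combined_reward(c[1], c[2]),
                    reverse=True)
    return ranked[0][0], ranked[-1][0]
```

Under this scheme a semantically weak translation that also fails to compile ends up as the rejected side of the pair even when a semantically imperfect but compiling candidate exists.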

Core claim

We propose CTO to improve code translation with syntax-guided and semantic-aware preference optimization. Through contrastive learning, we train a cross-lingual semantic model to directly assess functional equivalence between source and translated code. By formulating code translation as a multi-objective optimization problem, this robust semantic signal is seamlessly unified with compiler-based syntactic feedback within the direct preference optimization framework.

What carries the argument

CTO, the method that unifies a contrastively trained cross-lingual semantic evaluator with compiler syntactic feedback inside the direct preference optimization framework.

If this is right

  • Translations achieve higher syntactic correctness and semantic consistency than baselines that rely on test cases or reference translations.
  • The method works without requiring execution or test cases at training time.
  • Semantic and syntactic objectives can be optimized jointly in a single direct preference optimization loop.
  • Performance gains appear consistently on C++-to-Java, Java-to-Python, and similar language pairs.
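The single-loop claim in the third bullet is easiest to see in the loss itself: the standard per-pair DPO loss (sketched below, not code from the paper) is agnostic to how the chosen/rejected labels were produced, so a combined semantic-plus-syntactic ranking drops in without changing the optimization.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * [(logpi_c - ref_c) - (logpi_r - ref_r)]).
    The chosen/rejected labels are the only place CTO's combined
    reward would enter; the loss itself is unchanged."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With identical policy and reference log-probabilities the loss sits at log 2; it decreases as the policy separates the chosen completion from the rejected one relative to the reference.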

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same contrastive semantic model could be reused for other code tasks such as clone detection or bug localization where functional equivalence matters.
  • Removing the compiler syntax term would likely degrade results more on languages with strict syntax than on loosely typed ones.
  • The approach suggests that preference optimization for code benefits from rewards grounded in source semantics rather than external oracles.
  • Scaling the contrastive training to more language pairs could further reduce reliance on any single test suite.

Load-bearing premise

The contrastively trained cross-lingual model must accurately judge functional equivalence between source and translated code without needing test cases or reference translations.

What would settle it

Run the semantic model on a held-out set of source-translation pairs whose true equivalence is known from execution or human labels; if the model's scores disagree with these labels on a large fraction of cases, the claimed advantage of the semantic reward disappears.
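The proposed test reduces to a simple agreement measurement. A sketch, assuming thresholded scores from the semantic model and binary equivalence labels obtained from execution or human judgment (the 0.5 threshold is an arbitrary placeholder):

```python
def agreement_rate(scores, labels, threshold=0.5):
    """Fraction of held-out source/translation pairs on which the
    thresholded semantic-model score matches the known equivalence label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must align")
    preds = [s >= threshold for s in scores]
    return sum(p == bool(l) for p, l in zip(preds, labels)) / len(labels)
```

A rate near chance on such a held-out set would undercut the semantic reward's claimed advantage; a threshold-free variant (e.g. ranking AUC) would tell the same story without committing to a cutoff.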

Figures

Figures reproduced from arXiv: 2605.13229 by Chen Shen, Huan Zhang, Jingyue Yang, Wei Cheng, Wei Hu, Yuhan Wu.

Figure 1
Figure 1: A motivating example illustrating the entanglement of syn… [image: figures/full_fig_p001_1.png]
Figure 2
Figure 2: Overview of CTO. Rather than approximating the full Pareto front of the multiple objectives, the translation model is optimized under a fixed preference weighting $w = 0.5$, an equal prioritization of the syntactic and semantic objectives, which leads to the training objective $\max_{\pi_\theta}\ \mathbb{E}_{x,\,y\sim\pi_\theta(\cdot\mid x)}\big[\,r^*(x,y)-\beta\log\tfrac{\pi_\theta(y\mid x)}{\pi_{\mathrm{sft}}(y\mid x)}\,\big]$.
Figure 3
Figure 3: Distribution of negative sample types. CTO is compared with two reward-free preference optimization techniques, identity preference optimization (IPO) [Azar et al., 2024] and simple preference optimization (SimPO) [Meng et al., 2024]; both perform preference optimization on supervised finetuned models, enabling an assessment of whether CTO provides a tangible advantage over existing techniques.
Original abstract

LLMs have shown immense potential for code translation, yet they often struggle to ensure both syntactic correctness and semantic consistency. While preference-based learning offers a promising alignment strategy, it is hindered by unreliable semantic rewards derived from sparse test cases or restrictive reference translations. We argue that a robust semantic reward for code translation must be derived directly from the source code. In this paper, we propose CTO to improve code translation with syntax-guided and semantic-aware preference optimization. Through contrastive learning, we train a cross-lingual semantic model to directly assess functional equivalence between source and translated code. By formulating code translation as a multi-objective optimization problem, this robust semantic signal is seamlessly unified with compiler-based syntactic feedback within the direct preference optimization framework. Extensive experiments on C++, Java, and Python translations demonstrate that CTO significantly outperforms existing baselines and alternative preference optimization strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes CTO, a framework for code translation that trains a cross-lingual semantic model via contrastive learning to assess functional equivalence between source and translated code. This semantic signal is combined with compiler-based syntactic feedback to formulate translation as a multi-objective optimization problem solved within the direct preference optimization (DPO) framework. Experiments on C++, Java, and Python translations claim that CTO significantly outperforms existing baselines and alternative preference optimization strategies.

Significance. If the contrastively trained semantic model reliably ranks functional equivalence without test cases or references, the multi-objective unification could provide a more scalable and robust reward signal than sparse test-case baselines, advancing preference optimization techniques for code generation tasks and improving semantic consistency in LLM translations.

major comments (3)
  1. [Abstract] Abstract: The central claim of significant outperformance and a 'robust semantic signal' is asserted without any quantitative results, error bars, ablation studies, or specific metrics, preventing assessment of whether the semantic model actually improves over test-case baselines.
  2. [Methods] Methods (contrastive learning description): The construction of positive/negative pairs for training the cross-lingual semantic model is unspecified; if pairs rely on heuristics (e.g., same-function-name or back-translation) rather than execution-verified equivalence, the model risks learning lexical or structural cues instead of functional equivalence, directly undermining the weakest assumption and the multi-objective DPO unification.
  3. [Experiments] Experiments section: No details are provided on how the semantic reward is combined with syntactic feedback in DPO (e.g., weighting, preference pair construction), nor any ablation isolating the semantic component's contribution, making it impossible to verify the load-bearing claim that the unification yields robust improvements.
minor comments (1)
  1. [Methods] Clarify notation for the semantic reward function and its integration into the DPO loss to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments. We have carefully addressed each major comment and revised the manuscript to improve clarity and provide the requested details.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim of significant outperformance and a 'robust semantic signal' is asserted without any quantitative results, error bars, ablation studies, or specific metrics, preventing assessment of whether the semantic model actually improves over test-case baselines.

    Authors: We agree that the abstract would benefit from including key quantitative results. In the revised manuscript, we have updated the abstract to include specific performance metrics, such as BLEU scores, semantic equivalence rates, and comparisons to baselines with error bars where applicable. The detailed results, ablations, and statistical significance are presented in the Experiments section. revision: yes

  2. Referee: [Methods] Methods (contrastive learning description): The construction of positive/negative pairs for training the cross-lingual semantic model is unspecified; if pairs rely on heuristics (e.g., same-function-name or back-translation) rather than execution-verified equivalence, the model risks learning lexical or structural cues instead of functional equivalence, directly undermining the weakest assumption and the multi-objective DPO unification.

    Authors: The positive and negative pairs are constructed using execution-based verification: positive pairs consist of source code and its functionally equivalent translations confirmed via test case execution, while negative pairs are derived from code snippets that fail the same test cases. We have expanded the Methods section with a detailed description of this pair construction process to address this concern. revision: yes

  3. Referee: [Experiments] Experiments section: No details are provided on how the semantic reward is combined with syntactic feedback in DPO (e.g., weighting, preference pair construction), nor any ablation isolating the semantic component's contribution, making it impossible to verify the load-bearing claim that the unification yields robust improvements.

    Authors: We have added a new subsection in the Experiments section detailing the multi-objective DPO formulation, including the weighting parameters for combining semantic and syntactic rewards and the construction of preference pairs. Furthermore, we include ablation studies that isolate the contribution of the semantic model, demonstrating its role in the observed improvements. revision: yes
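The execution-verified pair construction the simulated authors describe would feed a contrastive objective of roughly the following shape. This is a standard InfoNCE-style loss over similarity scores; whether CTO uses exactly this form is an assumption.

```python
import math

def info_nce(sim_pos: float, sims_neg: list, temperature: float = 0.07) -> float:
    """InfoNCE loss for one source snippet: pull the execution-verified
    equivalent translation (sim_pos) above the test-failing ones (sims_neg).
    Uses a log-sum-exp for numerical stability."""
    logits = [sim_pos / temperature] + [s / temperature for s in sims_neg]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]  # = -log softmax(logits)[positive]
```

Training on execution-verified positives and test-failing negatives is exactly what would push the model toward functional rather than lexical cues, which is the point of the referee's second major comment.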

Circularity Check

0 steps flagged

No circularity: semantic model trained independently before DPO integration

full rationale

The derivation proceeds by first training a cross-lingual semantic model via contrastive learning on source/translated code pairs to produce a functional-equivalence scorer, then inserting that scorer as one objective inside a multi-objective DPO loss alongside compiler syntax signals. No equation or claim reduces the final preference optimization output to the contrastive training inputs by algebraic identity, fitted-parameter renaming, or self-citation chain. The two stages remain sequentially independent; any weakness lies in the quality of the contrastive pairs rather than in a definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that functional equivalence between source and target code can be reliably learned via contrastive signals without external tests or references.

axioms (1)
  • domain assumption: Functional equivalence between source and translated code can be directly assessed by a contrastively trained cross-lingual model.
    This is the load-bearing premise for the semantic reward signal.

pith-pipeline@v0.9.0 · 5446 in / 1151 out tokens · 40645 ms · 2026-05-14T20:00:12.450098+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
