arxiv: 2605.00610 · v1 · submitted 2026-05-01 · 💻 cs.LG

Recognition: unknown

Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

Chaohao Yuan, Chenghao Xiao, Hong Cheng, Long-Kai Huang, Yu Rong

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:00 UTC · model grok-4.3

classification 💻 cs.LG

keywords task vectorsSFTRLVRLLM post-trainingmathematical reasoningtest-time synthesisdecoupled training

0 comments

The pith

SFT and RLVR task vectors can be synthesized at test time to match joint-training performance without parameter updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that supervised fine-tuning and reinforcement learning post-training produce task vectors with incompatible structures, including large magnitude gaps and opposing signs, which cause failures in sequential or joint training. These structural differences indicate that the two methods update partly complementary parts of the model. The proposed approach trains the two paradigms independently, then combines their effects only at inference through sparsified vector addition and a search for mixing weights on a few unlabeled examples. This delivers results on mathematical reasoning benchmarks that match or exceed full retraining methods while using far less computation and generalizing to new domains.

Core claim

Task vector analysis reveals three properties behind integration failures: a 30-fold magnitude disparity, 45-fold sign interference, and heterogeneous module-wise updates. These properties show SFT and RLVR modify partly complementary components. Therefore, independent training followed by test-time synthesis via selective sparsification with norm-preserving rescaling and Bayesian optimization of combination coefficients recovers joint-training performance without updating any parameters.

What carries the argument

Decoupled test-time synthesis of task vectors using selective sparsification with norm-preserving rescaling and Bayesian optimization to select coefficients on the Pareto frontier of consistency and perplexity.

If this is right

Matches or exceeds the performance of training-based SFT-RLVR integration methods across multiple mathematical reasoning benchmarks.
Incurs only about 3 percent of the computational cost of training-based integration methods.
Surpasses state-of-the-art models when applied to stronger post-trained checkpoints.
Generalizes to out-of-domain benchmarks without requiring any re-tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Task vectors from different post-training regimes may behave as modular components that can be mixed at deployment rather than during training.
The same decoupling approach could apply to other pairs of conflicting objectives, such as instruction tuning combined with preference optimization.
Reducing reliance on large joint training runs might allow more efficient exploration of many specialized checkpoints before final combination.

Load-bearing premise

The three structural properties of the task vectors mean that SFT and RLVR affect sufficiently complementary parts of the model for their combination through sparsified addition to recover joint-training results.

What would settle it

A side-by-side evaluation where the test-time synthesized model performs worse than a jointly trained SFT-RLVR model on the mathematical reasoning benchmarks, or fails to match performance on out-of-domain tests, would show the claim is incorrect.

Figures

Figures reproduced from arXiv: 2605.00610 by Chaohao Yuan, Chenghao Xiao, Hong Cheng, Long-Kai Huang, Yu Rong.

**Figure 1.** Figure 1: Empirical analysis of SFT and RLVR task vectors. (a) Layer-wise L2 norms show an approximately 30× magnitude gap between SFT and RLVR. (b) SFT sparsification substantially reduces sign interference with the sparsified RLVR task vector. The y-axis measures the fraction of parameters with opposite signs among the non-zero entries of the sparsified RLVR task vector. Without sparsification, 44.91% of parameter… view at source ↗

**Figure 2.** Figure 2: Overview of the DoTS framework. (A) Selective Sparsification. Each task vector is pruned to retain only its top-k% entries by magnitude and then rescaled to preserve its original L2 norm. (B) Difficulty-Aware Data Selection. A small set of unlabeled queries is stratified by difficulty, estimated from the answer consistency of the SFT and RLVR source models, to provide a reliable signal for coefficient sear… view at source ↗

**Figure 3.** Figure 3: Performance vs. number of adaptation samples. DoTS surpasses the SFT+RL sequential baseline with 16 unlabeled queries, and performance stabilizes around 64 queries. (SFT + DPO) achieves an average score of 30.3, outperforming SFT, DPO, and DFT. In the LUFFY setting, DoTS (SFT + On-Policy RL) reaches 40.0, outperforming both source models and LUFFY. These results show that DoTS remains effective at a small… view at source ↗

**Figure 4.** Figure 4: Pareto frontiers learned by DoTS across different backbones. Each point corresponds to one candidate coefficient pair. We select the final coefficients by maximum consistency for Qwen-series models and by the knee point for LLaMA3.1-8B. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗

read the original abstract

SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling in distinct dimensions. SFT expands knowledge breadth while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential training can cause catastrophic forgetting, and joint optimization often suffers from severe gradient conflicts. We analyze SFT and RLVR through the lens of task vectors and reveal three structural properties behind these failures: a 30* magnitude disparity, 45* sign interference, and heterogeneous module-wise update distributions. These findings show SFT and RLVR are difficult to integrate directly, but they also suggest that the two paradigms modify partly complementary components of the model. Motivated by these observations, we propose Decoupled Test-time Synthesis (DoTS), a post-hoc framework allows SFT and RLVR checkpoints to be trained independently and synthesizes their capabilities only at inference time via task vector arithmetic, without updating model parameters. To reduce interference, DOTS applies selective sparsification with norm-preserving rescaling. It then uses Bayesian optimization on a small set of unlabeled queries to search for combination coefficients on the Pareto frontier of consistency and perplexity. Empirically, \ours matches or exceeds the performance of training-based SFT--RLVR integration methods across multiple mathematical reasoning benchmarks, incurring only $\sim$3\% of the computational cost. When applied to stronger post-trained checkpoints, DOTS surpasses SOTA models and generalizes to out-of-domain benchmarks without re-tuning. Code is available at https://github.com/chaohaoyuan/DoTS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DoTS gives a workable test-time way to mix SFT and RLVR via sparsified task vectors plus Bayesian tuning, but the ablations leave open whether the sparsification step is doing real work or if the search is carrying the results.

read the letter

The core result is that you can train an SFT model and an RLVR model on their own, then at inference time add a sparsified, norm-preserved version of their task vectors whose weights are picked by a quick Bayesian search over a handful of unlabeled queries. This gets performance on math reasoning benchmarks that matches or beats the usual joint-training or sequential baselines while using roughly 3% of the compute. When started from stronger checkpoints it even beats some current SOTA numbers and holds up out of domain without retuning.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Decoupled Test-time Synthesis (DoTS), a framework for integrating SFT and RLVR capabilities in LLMs at test time without parameter updates. Through analysis of task vectors, it identifies three structural properties—30× magnitude disparity, 45× sign interference, and heterogeneous module-wise updates—that hinder direct integration. DoTS applies selective sparsification with norm-preserving rescaling to task vectors and uses Bayesian optimization on a small set of unlabeled queries to determine combination coefficients optimizing consistency and perplexity. The paper claims that DoTS matches or exceeds training-based integration methods on mathematical reasoning benchmarks at approximately 3% of the computational cost, surpasses SOTA on stronger checkpoints, and generalizes to out-of-domain tasks.

Significance. If the empirical claims hold, the work provides a low-cost, post-hoc alternative to joint training or sequential fine-tuning for combining the breadth of SFT with the depth of RLVR. The release of code at the provided GitHub link is a strength that facilitates reproducibility and further investigation. This approach could have practical implications for efficient LLM post-training pipelines by avoiding catastrophic forgetting and gradient conflicts.

major comments (2)

[Task vector analysis and motivation] The inference that the three structural properties (magnitude disparity, sign interference, heterogeneous updates) imply complementary components modifiable via sparsified addition is load-bearing for the central claim. However, no ablation is presented that isolates the effect of sparsification and rescaling from the Bayesian optimization step (e.g., comparing direct addition of full task vectors using the same coefficient search procedure). Without this control, it remains possible that the performance gains are primarily driven by the external Bayesian search rather than mitigation of the identified interferences.
[Empirical evaluation] The reported benchmark improvements and ~3% cost claim lack error bars, multiple runs, or statistical tests. Additionally, details on the choice of sparsification threshold, the specific unlabeled queries used for Bayesian optimization, and controls for query selection bias are not provided, which undermines assessment of the robustness of the central empirical result.

minor comments (2)

[Abstract] The abstract mentions '∼3% of the computational cost' but does not specify the baseline computation being compared (e.g., full joint training FLOPs).
[Introduction] The use of 'RLVR' should be defined at first use if it is not standard (presumably Reinforcement Learning with Verifiable Rewards).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the motivation from task vector analysis and the robustness of the empirical results. We address each major comment below and will revise the manuscript to incorporate additional ablations, statistical details, and implementation clarifications.

read point-by-point responses

Referee: [Task vector analysis and motivation] The inference that the three structural properties (magnitude disparity, sign interference, heterogeneous updates) imply complementary components modifiable via sparsified addition is load-bearing for the central claim. However, no ablation is presented that isolates the effect of sparsification and rescaling from the Bayesian optimization step (e.g., comparing direct addition of full task vectors using the same coefficient search procedure). Without this control, it remains possible that the performance gains are primarily driven by the external Bayesian search rather than mitigation of the identified interferences.

Authors: We agree that isolating the contribution of sparsification and rescaling is important to substantiate the motivation. The three properties were identified via direct comparison of SFT and RLVR task vectors (Section 3), demonstrating that full-vector addition incurs interference. Sparsification with norm-preserving rescaling is intended to retain complementary updates while attenuating conflicts. To address the concern directly, we will add an ablation in the revised manuscript that applies the identical Bayesian optimization procedure to (i) full task vectors, (ii) sparsified vectors without rescaling, and (iii) the complete DoTS pipeline. This control will quantify the incremental benefit of the sparsification step beyond coefficient search alone. revision: yes
Referee: [Empirical evaluation] The reported benchmark improvements and ~3% cost claim lack error bars, multiple runs, or statistical tests. Additionally, details on the choice of sparsification threshold, the specific unlabeled queries used for Bayesian optimization, and controls for query selection bias are not provided, which undermines assessment of the robustness of the central empirical result.

Authors: We acknowledge that the current presentation lacks sufficient statistical detail and implementation transparency. In the revision we will (1) report all main results as means and standard deviations over five independent runs of Bayesian optimization with varied random seeds, including error bars and basic statistical comparisons; (2) explicitly document the sparsification threshold (top-5% norm components selected to align with the observed 30× magnitude disparity, with a sensitivity plot added); (3) specify the unlabeled query set (50 held-out GSM8K-style problems with the exact prompt template provided in the appendix); and (4) include a control varying the query set to demonstrate stability against selection bias. These additions will be placed in the main text and appendix. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical validation remains external to inputs

full rationale

The paper observes three task-vector properties (magnitude disparity, sign interference, module heterogeneity) from SFT/RLVR checkpoints, uses them motivationally to justify selective sparsification plus norm-preserving rescaling, then applies Bayesian optimization over combination coefficients on a small unlabeled query set using consistency/perplexity objectives. Performance is subsequently measured on separate mathematical reasoning benchmarks. No equation or claim reduces the reported benchmark gains to the optimization inputs by construction; the coefficient search is an external, falsifiable procedure whose outputs are tested out-of-domain without re-tuning. No self-definitional loops, fitted-input-as-prediction, or load-bearing self-citations appear in the derivation. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The method rests on the domain assumption that task vectors remain additive after sparsification and that a small unlabeled query set suffices to locate a useful Pareto point; no new physical entities or free parameters are introduced beyond the sparsification ratio and the Bayesian objective weights.

free parameters (2)

sparsification ratio
Chosen to remove low-magnitude interference while preserving update norm; value not stated in abstract.
Bayesian objective weights
Trade-off between consistency and perplexity; tuned on unlabeled queries.

axioms (1)

domain assumption SFT and RLVR produce task vectors whose linear combination can recover joint-training performance when interference is reduced by sparsification.
Invoked to justify test-time synthesis instead of joint training.

pith-pipeline@v0.9.0 · 5595 in / 1463 out tokens · 34462 ms · 2026-05-09T19:00:43.687567+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

56 extracted references · 18 canonical work pages · 8 internal anchors

[1]

Hugging Face repository , volume=

Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions , author=. Hugging Face repository , volume=
[2]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[3]

Measuring Mathematical Problem Solving With the

Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt , booktitle=. Measuring Mathematical Problem Solving With the. 2021 , url=

2021
[4]

The Eleventh International Conference on Learning Representations , year=

Editing models with task arithmetic , author=. The Eleventh International Conference on Learning Representations , year=
[5]

When Scaling Meets

Biao Zhang and Zhongtao Liu and Colin Cherry and Orhan Firat , booktitle=. When Scaling Meets. 2024 , url=

2024
[6]

Nature , volume=

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning , author=. Nature , volume=. 2025 , publisher=

2025
[7]

2025 , url=

Tianzhe Chu and Yuexiang Zhai and Jihan Yang and Shengbang Tong and Saining Xie and Dale Schuurmans and Quoc V Le and Sergey Levine and Yi Ma , booktitle=. 2025 , url=

2025
[8]

GPT-4 Technical Report

Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities , author=. arXiv preprint arXiv:2507.06261 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Qwen3 Technical Report

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Qwen2. 5-math technical report: Toward mathematical expert model via self-improvement , author=. arXiv preprint arXiv:2409.12122 , year=

work page internal anchor Pith review arXiv
[12]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Learning to Reason under Off-Policy Guidance , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
[13]

2025 , publisher=

Open r1: A fully open reproduction of deepseek-r1 , author=. 2025 , publisher=

2025
[14]

Learning what reinforcement learning can’t: Interleaved online fine-tuning for hardest questions, 2026

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions , author=. arXiv preprint arXiv:2506.07527 , year=

work page arXiv
[15]

2023 , url=

Prateek Yadav and Derek Tam and Leshem Choshen and Colin Raffel and Mohit Bansal , booktitle=. 2023 , url=

2023
[16]

On the generalization of sft: A reinforcement learning perspective with reward rectification, 2026

On the generalization of sft: A reinforcement learning perspective with reward rectification , author=. arXiv preprint arXiv:2508.05629 , year=

work page arXiv
[17]

The Thirteenth International Conference on Learning Representations , year=

Efficient Model Editing with Task-Localized Sparse Fine-tuning , author=. The Thirteenth International Conference on Learning Representations , year=
[18]

Weihao Zeng and Yuzhen Huang and Qian Liu and Wei Liu and Keqing He and Zejun MA and Junxian He , booktitle=. Simple. 2025 , url=

2025
[19]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
[20]

Process Reinforcement through Implicit Rewards

Process reinforcement through implicit rewards , author=. arXiv preprint arXiv:2502.01456 , year=

work page internal anchor Pith review arXiv
[21]

Conference on Language Modeling (COLM) , year=

Understanding r1-zero-like training: A critical perspective , author=. Conference on Language Modeling (COLM) , year=
[22]

Supervised fine tuning on curated data is reinforcement learning (and can be improved), 2025

Supervised fine tuning on curated data is reinforcement learning (and can be improved) , author=. arXiv preprint arXiv:2507.12856 , year=

work page arXiv
[23]

The Thirteenth International Conference on Learning Representations , year=

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers , author=. The Thirteenth International Conference on Learning Representations , year=
[24]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track) , pages=

Light-r1: Curriculum sft, dpo and rl for long cot from scratch and beyond , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track) , pages=
[25]

Right Question is Already Half the Answer: Fully Unsupervised

Qingyang Zhang and Haitao Wu and Changqing Zhang and Peilin Zhao and Yatao Bian , booktitle=. Right Question is Already Half the Answer: Fully Unsupervised. 2025 , url=

2025
[26]

Visual generation without guidance.Forty-second international conference on machine learning, 2025a

Bridging supervised learning and reinforcement learning in math reasoning , author=. arXiv preprint arXiv:2505.18116 , year=

work page arXiv
[27]

Srft: A single-stage method with supervised and reinforcement fine-tuning for reasoning.arXiv preprint arXiv:2506.19767, 2025

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning , author=. arXiv preprint arXiv:2506.19767 , year=

work page arXiv
[28]

Akiba, Takuya and Sano, Shotaro and Yanase, Toshihiko and Ohta, Takeru and Koyama, Masanori , booktitle=
[29]

Transactions on Machine Learning Research , issn=

Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic , author=. Transactions on Machine Learning Research , issn=. 2025 , url=

2025
[30]

European Conference on Computer Vision , pages=

Model breadcrumbs: Scaling multi-task model merging with sparse masks , author=. European Conference on Computer Vision , pages=. 2024 , organization=

2024
[31]

PopulAtion Parameter Averaging (

Alexia Jolicoeur-Martineau and Emy Gervais and Kilian FATRAS and Yan Zhang and Simon Lacoste-Julien , journal=. PopulAtion Parameter Averaging (. 2024 , url=

2024
[32]

Advances in Neural Information Processing Systems , volume=

Model fusion via optimal transport , author=. Advances in Neural Information Processing Systems , volume=
[33]

Advances in neural information processing systems , volume=

Algorithms for hyper-parameter optimization , author=. Advances in neural information processing systems , volume=
[34]

2025 , url=

Xuechen Zhang and Zijian Huang and Yingcong Li and Chenshun Ni and Jiasi Chen and Samet Oymak , booktitle=. 2025 , url=

2025
[35]

Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them , author=

Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them , author=. arXiv preprint arXiv:2507.10616 , year=

work page arXiv
[36]

OpenAI o1 System Card

Openai o1 system card , author=. arXiv preprint arXiv:2412.16720 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[37]

Forty-first International Conference on Machine Learning , year=

Language models are super mario: Absorbing abilities from homologous models as a free lunch , author=. Forty-first International Conference on Machine Learning , year=
[38]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=
[39]

The Twelfth International Conference on Learning Representations , year=

The False Promise of Imitating Proprietary Language Models , author=. The Twelfth International Conference on Learning Representations , year=
[40]

The Twelfth International Conference on Learning Representations , year=

Let's Verify Step by Step , author=. The Twelfth International Conference on Learning Representations , year=
[41]

The Thirteenth International Conference on Learning Representations , year=

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws , author=. The Thirteenth International Conference on Learning Representations , year=
[42]

Training Verifiers to Solve Math Word Problems

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[43]

arXiv preprint arXiv:2511.08567 , year=

The path not taken: Rlvr provably learns off the principals , author=. arXiv preprint arXiv:2511.08567 , year=

work page arXiv
[44]

Advances in neural information processing systems , volume=

Direct preference optimization: Your language model is secretly a reward model , author=. Advances in neural information processing systems , volume=
[45]

arXiv preprint arXiv:2509.21128 , year=

Rl squeezes, sft expands: A comparative study of reasoning llms , author=. arXiv preprint arXiv:2509.21128 , year=

work page arXiv
[46]

Templaterl: Structured template-guided reinforcement learning for llm reasoning.arXiv preprint arXiv:2505.15692, 2025

Thought-augmented policy optimization: Bridging external guidance and internal capabilities , author=. arXiv preprint arXiv:2505.15692 , volume=

work page arXiv
[47]

Forty-second International Conference on Machine Learning , year=

Whoever Started the interference Should End It: Guiding Data-Free Model Merging via Task Vectors , author=. Forty-second International Conference on Machine Learning , year=
[48]

The Twelfth International Conference on Learning Representations , year=

AdaMerging: Adaptive Model Merging for Multi-Task Learning , author=. The Twelfth International Conference on Learning Representations , year=
[49]

ACM Computing Surveys , year=

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications, and Opportunities , author=. ACM Computing Surveys , year=
[50]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

Mitigating the alignment tax of rlhf , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

2024
[51]

arXiv preprint arXiv:2406.16768 , year=

Warp: On the benefits of weight averaged rewarded policies , author=. arXiv preprint arXiv:2406.16768 , year=

work page arXiv
[52]

2024 , url=

Alexandre Rame and Nino Vieillard and Leonard Hussenot and Robert Dadashi and Geoffrey Cideron and Olivier Bachem and Johan Ferret , booktitle=. 2024 , url=

2024
[53]

The Llama 3 Herd of Models

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[54]

Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

Efficient Memory Management for Large Language Model Serving with PagedAttention , author=. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=
[55]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) , address=

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) , address=. 2024 , url=

2024
[56]

Wong and Yu Cheng , booktitle=

Runzhe Zhan and Yafu Li and Zhi Wang and Xiaoye Qu and Dongrui Liu and Jing Shao and Derek F. Wong and Yu Cheng , booktitle=. Ex. 2026 , url=

2026