On the impact of fine-tuning on chain-of-thought reasoning. arXiv preprint arXiv:2411.15382
4 Pith papers cite this work.
Citing papers explorer
-
UNIPO: Unified Interactive Visual Explanation for RL Fine-Tuning Policy Optimization
UNIPO is the first unified interactive visualization tool exposing token-level training dynamics of RL fine-tuning algorithms for LLMs through high-level overviews, step inspectors, and side-by-side comparisons.
-
Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents
RLVER agents improve emotional responsiveness under adversarial user behaviors but exhibit no measurable gains in tracking emotional states compared to untuned base models.
-
SeLaR: Selective Latent Reasoning in Large Language Models
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
-
Semantic Communication with an LLM-enabled Knowledge Base
SC-LMKB uses LLM-generated data with cross-domain fusion to cut hallucinations and delivers up to 72.6% gains on cross-modality retrieval tasks over standard semantic communication.