On the impact of fine-tuning on chain-of-thought reasoning. arXiv preprint arXiv:2411.15382
4 Pith papers cite this work.
Citing papers explorer
-
UNIPO: Unified Interactive Visual Explanation for RL Fine-Tuning Policy Optimization
UNIPO is the first unified interactive visualization tool exposing token-level training dynamics of RL fine-tuning algorithms for LLMs through high-level overviews, step inspectors, and side-by-side comparisons.
-
Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents
RLVER agents improve emotional responsiveness under adversarial user behaviors but exhibit no measurable gains in tracking emotional states compared to untuned base models.
-
SeLaR: Selective Latent Reasoning in Large Language Models
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
-
Semantic Communication with an LLM-enabled Knowledge Base
SC-LMKB uses LLM-generated data with cross-domain fusion to cut hallucinations and delivers up to 72.6% gains on cross-modality retrieval tasks over standard semantic communication.