arXiv preprint arXiv:2404.00213
6 Pith papers cite this work. Polarity classification is still indexing.
Citing papers

- Crafting Reversible SFT Behaviors in Large Language Models
  LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.
- JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
  JUDO enhances large multimodal models for industrial anomaly QA by juxtaposing query images with normal ones for visual comparison and using SFT plus GRPO with tailored rewards to inject domain knowledge, outperforming Qwen2.5-VL-7B and GPT-4o on the MMAD benchmark.
- SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
  Correcting DeepSpeed optimizer and OpenRLHF loss bugs reveals SFT-then-RL outperforms mixed-policy methods by 3.8-22.2 points on math benchmarks.
- Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates
  FINCH is a loss-adaptive learning-rate schedule that reduces forgetting by 93% on average during LLM fine-tuning while matching standard task performance across several benchmarks.
- BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards
  BV-Blend blends prompt-local and semantic-cluster historical reward statistics via SEM-derived weights to stabilize critic-free RL advantage estimation.
- Towards EnergyGPT: A Large Language Model Specialized for the Energy Sector
  Fine-tuned LLaMA 3.1-8B variants for the energy sector outperform the base model on domain QA benchmarks, with LoRA delivering similar gains at lower training cost.