Canonical reference

arXiv preprint arXiv:2505.00949

Llama-Nemotron: Efficient Reasoning Models · 2025 · arXiv:2505.00949

Canonical reference. 80% of the classified citations from Pith papers (4 of 5) cite this work as background.

35 Pith papers citing it

Background: 80% of classified citations


citation-role summary

background: 4 · baseline: 1

citation-polarity summary

background: 4 · baseline: 1

representative citing papers

Know When to Stop: Segment-Level Credit Assignment for Reducing Overthinking

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

DASH assigns segment-level credit in reasoning traces using drift toward ground-truth answers, yielding 50.8% accuracy on AIME25 versus 45.4% for GRPO while reducing overthinking behaviors.

Cybersecurity AI (CAI) Dataset

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

CAI Dataset is presented as the largest described corpus of LLM-driven hacker trajectories, with the claim that operator data concentration in frontier-model providers creates a major security risk best addressed by on-premise specialized LLMs.

Learnability-Informed Fine-Tuning of Diffusion Language Models

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

LIFT is a learnability-informed SFT algorithm for diffusion LMs that aligns token difficulty with diffusion time steps, yielding up to 3x gains on AIME'24 and AIME'25 over standard SFT baselines.

Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.

Action-guided generation of 3D functionality segmentation data

cs.CV · 2025-11-28 · unverdicted · novelty 7.0

SynthFun3D generates synthetic 3D functionality segmentation data from action descriptions via object retrieval and scene arrangement, yielding consistent gains of +2.2 mAP, +6.3 mAR, and +5.7 mIoU when augmenting real data for VLM training.

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

cs.CL · 2025-10-21 · unverdicted · novelty 7.0

ProfBench is a new multi-domain benchmark with human-expert rubrics for judging LLM responses on professional tasks, showing top models reach only 65.9% performance while providing cheap LLM judges that reduce evaluation cost by orders of magnitude.

Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

Soft-RLVR converts prompts to checklists for item-level LLM scoring to create soft RL rewards, with a stabilized self-verifying variant, yielding up to 11.1 point gains on IFEval.

MobileMoE: Scaling On-Device Mixture of Experts

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

MobileMoE introduces on-device MoE LLMs that match dense models with 2-4x fewer FLOPs and provide efficient smartphone inference.

Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

cs.LG · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

Learned Relay Representations add a differentiable per-token channel to masked diffusion models so they can propagate latent information across iterative denoising steps, yielding better coding performance and up to 32% lower latency on Fast-dLLM v2 than standard supervised finetuning.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

Post-Trained MoE Can Skip Half Experts via Self-Distillation

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

ZEDA turns post-trained static MoE models into dynamic ones via zero-output expert injection and two-stage self-distillation, cutting over 50% expert FLOPs on Qwen3-30B-A3B and GLM-4.7-Flash with small accuracy drops across 11 benchmarks.

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

cs.CV · 2026-04-27 · unverdicted · novelty 6.0 · 2 refs

Tuna-2 shows that direct pixel embeddings can replace vision encoders in unified multimodal models, achieving competitive generation and stronger understanding at scale.

Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Infection-Reasoner, a 4B VLM, reaches 86.8% accuracy on wound infection classification while producing rationales rated mostly correct by experts, via GPT-5.1 distillation followed by reinforcement learning.

Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Adversarial competition between attacker and defender teams generates diverse multi-turn conversational data that improves LLM performance on secure code generation benchmarks by 18-29%.

Characterizing Model-Native Skills

cs.AI · 2026-04-19 · conditional · novelty 6.0

Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.

Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion

cs.CL · 2026-04-07 · conditional · novelty 6.0

Attention Editing converts pre-trained LLMs to new attention architectures through layer-wise teacher-forced optimization and model-level distillation, preserving performance with efficiency gains.

Flexible Entropy Control in RLVR with a Gradient-Preserving Perspective

cs.LG · 2026-02-10 · unverdicted · novelty 6.0

Dynamic clipping strategies based on importance sampling regions enable precise entropy management in RLVR, mitigating collapse and improving benchmark performance.

The Signal is in the Steps: Local Scoring for Reasoning Data Selection

cs.LG · 2025-10-05 · unverdicted · novelty 6.0

LALP scores local reasoning steps rather than full trajectories to improve selection of training data from diverse teacher models for distilling long-form reasoning.

Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

cs.CR · 2025-07-02 · unverdicted · novelty 6.0

Standard deviation distributions of attention matrices in LLMs remain distinctive and stable after continued training, enabling fingerprinting to trace model lineage and detect potential plagiarism such as in Pangu Pro MoE.

Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems

cs.LG · 2025-06-11 · unverdicted · novelty 6.0

Introduces a Bayesian framework viewing LLM prompts as textual parameters and proposes MHLP, a novel MCMC algorithm using LLM proposals, to perform inference and improve accuracy plus uncertainty quantification on benchmarks.

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs

cs.CV · 2025-05-21 · unverdicted · novelty 6.0

Chain-of-Focus enables VLMs to adaptively search and zoom on important image areas via a two-stage SFT and RL pipeline on a custom 3K-sample dataset, yielding 5% gains on the V* benchmark across resolutions from 224 to 4K.

The Hidden Power of Scaling Factor in LoRA Optimization

cs.AI · 2026-06-11 · unverdicted · novelty 5.0

Alpha in LoRA outperforms learning-rate scaling, follows a square-root law with rank, and enables a minimalist LoRA-alpha method that improves performance across tasks.

citing papers explorer

Showing 35 of 35 citing papers.

Know When to Stop: Segment-Level Credit Assignment for Reducing Overthinking · cs.CL · 2026-07-01 · unverdicted · none · ref 72
DASH assigns segment-level credit in reasoning traces using drift toward ground-truth answers, yielding 50.8% accuracy on AIME25 versus 45.4% for GRPO while reducing overthinking behaviors.
Cybersecurity AI (CAI) Dataset · cs.CR · 2026-05-27 · unverdicted · none · ref 54
CAI Dataset is presented as the largest described corpus of LLM-driven hacker trajectories, with the claim that operator data concentration in frontier-model providers creates a major security risk best addressed by on-premise specialized LLMs.
Learnability-Informed Fine-Tuning of Diffusion Language Models · cs.CL · 2026-05-21 · unverdicted · none · ref 3
LIFT is a learnability-informed SFT algorithm for diffusion LMs that aligns token difficulty with diffusion time steps, yielding up to 3x gains on AIME'24 and AIME'25 over standard SFT baselines.
Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL · cs.CL · 2026-04-22 · unverdicted · none · ref 11
Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.
Action-guided generation of 3D functionality segmentation data · cs.CV · 2025-11-28 · unverdicted · none · ref 6
SynthFun3D generates synthetic 3D functionality segmentation data from action descriptions via object retrieval and scene arrangement, yielding consistent gains of +2.2 mAP, +6.3 mAR, and +5.7 mIoU when augmenting real data for VLM training.
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge · cs.CL · 2025-10-21 · unverdicted · none · ref 1
ProfBench is a new multi-domain benchmark with human-expert rubrics for judging LLM responses on professional tasks, showing top models reach only 65.9% performance while providing cheap LLM judges that reduce evaluation cost by orders of magnitude.
Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards · cs.CL · 2026-05-27 · unverdicted · none · ref 1
Soft-RLVR converts prompts to checklists for item-level LLM scoring to create soft RL rewards, with a stabilized self-verifying variant, yielding up to 11.1 point gains on IFEval.
MobileMoE: Scaling On-Device Mixture of Experts · cs.LG · 2026-05-26 · unverdicted · none · ref 3
MobileMoE introduces on-device MoE LLMs that match dense models with 2-4x fewer FLOPs and provide efficient smartphone inference.
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models · cs.LG · 2026-05-21 · unverdicted · none · ref 22 · 2 links
Learned Relay Representations add a differentiable per-token channel to masked diffusion models so they can propagate latent information across iterative denoising steps, yielding better coding performance and up to 32% lower latency on Fast-dLLM v2 than standard supervised finetuning.
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning · cs.AI · 2026-05-21 · unverdicted · none · ref 8
SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.
Post-Trained MoE Can Skip Half Experts via Self-Distillation · cs.LG · 2026-05-18 · unverdicted · none · ref 18 · 2 links
ZEDA turns post-trained static MoE models into dynamic ones via zero-output expert injection and two-stage self-distillation, cutting over 50% expert FLOPs on Qwen3-30B-A3B and GLM-4.7-Flash with small accuracy drops across 11 benchmarks.
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models · cs.CL · 2026-05-17 · unverdicted · none · ref 42
PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge · cs.AI · 2026-05-11 · unverdicted · none · ref 2
RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation · cs.CV · 2026-04-27 · unverdicted · none · ref 6 · 2 links
Tuna-2 shows that direct pixel embeddings can replace vision encoders in unified multimodal models, achieving competitive generation and stronger understanding at scale.
Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning · cs.CV · 2026-04-21 · unverdicted · none · ref 7
Infection-Reasoner, a 4B VLM, reaches 86.8% accuracy on wound infection classification while producing rationales rated mostly correct by experts, via GPT-5.1 distillation followed by reinforcement learning.
Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition · cs.AI · 2026-04-20 · unverdicted · none · ref 63
Adversarial competition between attacker and defender teams generates diverse multi-turn conversational data that improves LLM performance on secure code generation benchmarks by 18-29%.
Characterizing Model-Native Skills · cs.AI · 2026-04-19 · conditional · none · ref 9
Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.
Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion · cs.CL · 2026-04-07 · conditional · none · ref 19
Attention Editing converts pre-trained LLMs to new attention architectures through layer-wise teacher-forced optimization and model-level distillation, preserving performance with efficiency gains.
Flexible Entropy Control in RLVR with a Gradient-Preserving Perspective · cs.LG · 2026-02-10 · unverdicted · none · ref 2
Dynamic clipping strategies based on importance sampling regions enable precise entropy management in RLVR, mitigating collapse and improving benchmark performance.
The Signal is in the Steps: Local Scoring for Reasoning Data Selection · cs.LG · 2025-10-05 · unverdicted · none · ref 1
LALP scores local reasoning steps rather than full trajectories to improve selection of training data from diverse teacher models for distilling long-form reasoning.
Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model! · cs.CR · 2025-07-02 · unverdicted · none · ref 1
Standard deviation distributions of attention matrices in LLMs remain distinctive and stable after continued training, enabling fingerprinting to trace model lineage and detect potential plagiarism such as in Pangu Pro MoE.
Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems · cs.LG · 2025-06-11 · unverdicted · none · ref 3
Introduces a Bayesian framework viewing LLM prompts as textual parameters and proposes MHLP, a novel MCMC algorithm using LLM proposals, to perform inference and improve accuracy plus uncertainty quantification on benchmarks.
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs · cs.CV · 2025-05-21 · unverdicted · none · ref 6
Chain-of-Focus enables VLMs to adaptively search and zoom on important image areas via a two-stage SFT and RL pipeline on a custom 3K-sample dataset, yielding 5% gains on the V* benchmark across resolutions from 224 to 4K.
The Hidden Power of Scaling Factor in LoRA Optimization · cs.AI · 2026-06-11 · unverdicted · none · ref 108
Alpha in LoRA outperforms learning-rate scaling, follows a square-root law with rank, and enables a minimalist LoRA-alpha method that improves performance across tasks.
SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking · cs.AI · 2026-04-09 · unverdicted · none · ref 2
SAT reduces reasoning tokens by up to 40% across multiple large reasoning models and benchmarks by adaptively pruning steps based on difficulty while maintaining or improving accuracy.
NVIDIA Nemotron 3: Efficient and Open Intelligence · cs.CL · 2025-12-24 · unverdicted · none · ref 142
NVIDIA releases the Nemotron 3 model family with hybrid Mamba-Transformer architecture, LatentMoE, NVFP4 training, MTP layers, and multi-environment RL post-training for reasoning and agentic tasks.
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement · cs.CL · 2025-07-14 · unverdicted · none · ref 8
SMCS coordinates 15 open-source LLMs via retrieval-based prior selection and exploration-exploitation posterior enhancement, outperforming GPT-4.1 by 5.36% and GPT-o3-mini by 5.28% on eight benchmarks.
Multi-Agentic System Leveraging Open-Source LLMs to Mitigate Disinformation Threats · cs.CL · 2026-06-29 · unverdicted · none · ref 4
Multi-agent LLM system with consensus and hierarchy outperforms individual models on disinformation detection tasks across English, Polish, Slovak, and Bulgarian datasets.
AI-Model Network: Concept, Current State and Future · cs.AI · 2026-05-25 · unverdicted · none · ref 13
The paper introduces the concept, vision, and hierarchical architecture of a worldwide AI-model network (AI-ModelNet) for model interconnection, sharing, and collaboration, validated via a prototype.
CLaC at SemEval-2026 Task 6: Response Clarity Detection in Political Discourse · cs.CL · 2026-05-04 · unverdicted · none · ref 16
An LLM ensemble reached 80 macro-F1 on 3-class clarity detection and 59 on 9-class evasion detection, with partial layer unfreezing and multilingual ensembles improving encoder results while enriched context helped only LLMs.
XekRung Technical Report · cs.CR · 2026-04-30 · unverdicted · none · ref 18
XekRung achieves state-of-the-art performance on cybersecurity benchmarks among same-scale models via tailored data synthesis and multi-stage training while retaining strong general capabilities.
A Survey of Reinforcement Learning for Large Reasoning Models · cs.CL · 2025-09-10 · accept · none · ref 34
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment · cs.CL · 2026-01-20 · unreviewed · ref 4
Scaling Latent Reasoning via Looped Language Models · cs.CL · 2025-10-29 · unreviewed · ref 53
Reinforcement Learning from Human Feedback · cs.LG · 2025-04-16 · unreviewed · ref 171
