hub Mixed citations

NVIDIA Nemotron 3: Efficient and Open Intelligence

NVIDIA: Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala · 2025 · cs.CL · arXiv 2512.20856

Mixed citation behavior. Most common role is background (44%).

18 Pith papers citing it

Background 44% of classified citations

open full Pith review browse 18 citing papers arXiv PDF

abstract

We introduce the Nemotron 3 family of models - Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra models are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained using multi-environment reinforcement learning enabling reasoning, multi-step tool use, and support granular reasoning budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance. Nano is released together with its technical report and this white paper, while Super and Ultra will follow in the coming months. We will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 method 2 baseline 1 dataset 1 other 1

citation-polarity summary

background 4 use method 2 baseline 1 unclear 1 use dataset 1

representative citing papers

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

cs.CL · 2025-12-01 · conditional · novelty 7.0

Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error especially for near-maximal values.

Timesteps of Mamba Align with Human Reading Times

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

Mamba's per-word timesteps significantly predict human reading times beyond GPT-2 surprisal in a naturalistic dataset.

Nemotron-Labs-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

TwoTower decouples autoregressive context from diffusion denoising using a frozen AR tower and bidirectional diffusion tower, retaining 98.7% baseline quality with 2.42X throughput on a 30B model trained on 2.1T tokens.

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.

Shepherd: Enabling Programmable Meta-Agents via Reversible Agentic Execution Traces

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Shepherd provides a reversible execution trace substrate for LLM agents that enables meta-agents to inspect and transform runs, yielding reported gains on coding and terminal benchmarks via supervision, counterfactual repair, and RL credit assignment.

Structured Recurrent Mixers for Massively Parallelized Sequence Generation

cs.CL · 2026-05-09 · unverdicted · novelty 6.0 · 3 refs

Structured Recurrent Mixers provide a dual parallel-recurrent representation for sequence models, claiming superior training efficiency, information capacity, and inference throughput over linear complexity alternatives.

Priming: Hybrid State Space Models From Pre-trained Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional tokens and showing Gated KalmaNet outperforming Mamba-2 on long-context reasoning at 32B scale.

Hypothesis generation and updating in large language models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.

Evaluation Awareness in Language Models Has Limited Effect on Behaviour

cs.CL · 2026-05-07 · conditional · novelty 6.0

Verbalised evaluation awareness in large reasoning models has only small effects on their outputs across safety and alignment tests.

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

cs.CL · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

A new benchmark shows LLM first-answer accuracy on procedural arithmetic drops from 63% (5 steps) to 20% (95 steps) due to execution failures like skipped steps and premature answers.

AVISE: Framework for Evaluating the Security of AI Systems

cs.CR · 2026-04-22 · unverdicted · novelty 6.0

AVISE provides a new framework and automated SET that identifies jailbreak vulnerabilities in language models with 92% accuracy, finding all nine tested models vulnerable to an augmented Red Queen attack.

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

cs.LG · 2026-04-14 · unverdicted · novelty 6.0 · 2 refs

Lightning OPD is an offline on-policy distillation method that matches standard OPD performance at 4x efficiency by enforcing teacher consistency between SFT and distillation phases.

How Transformers Learn to Plan via Multi-Token Prediction

cs.LG · 2026-04-13 · conditional · novelty 6.0

Multi-token prediction induces a two-stage reverse reasoning process in Transformers via gradient decoupling, improving planning on synthetic and realistic tasks.

Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

Claw-Eval is a new trajectory-aware benchmark for LLM agents that records execution traces, audit logs, and environment snapshots to evaluate completion, safety, and robustness across 300 tasks, revealing that opaque grading misses 44% of safety issues.

Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving

cs.DC · 2026-05-07 · unverdicted · novelty 5.0

Irminsul recovers up to 83% of prompt tokens above exact-prefix matching and delivers 63% prefill energy savings per cache hit on MLA-MoE models by content-hashing CDC chunks and applying closed-form kr correction.

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

cs.DC · 2026-02-10

citing papers explorer

Showing 18 of 18 citing papers.

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models cs.LG · 2026-05-10 · unverdicted · none · ref 71 · internal anchor
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling cs.CL · 2025-12-01 · conditional · none · ref 22 · internal anchor
Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error especially for near-maximal values.
Timesteps of Mamba Align with Human Reading Times cs.CL · 2026-06-29 · unverdicted · none · ref 43 · internal anchor
Mamba's per-word timesteps significantly predict human reading times beyond GPT-2 surprisal in a naturalistic dataset.
Nemotron-Labs-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context cs.CL · 2026-06-25 · unverdicted · none · ref 9 · internal anchor
TwoTower decouples autoregressive context from diffusion denoising using a frozen AR tower and bidirectional diffusion tower, retaining 98.7% baseline quality with 2.42X throughput on a 30B model trained on 2.1T tokens.
PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior cs.CR · 2026-05-12 · unverdicted · none · ref 30 · internal anchor
PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.
Shepherd: Enabling Programmable Meta-Agents via Reversible Agentic Execution Traces cs.AI · 2026-05-11 · unverdicted · none · ref 27 · 2 links · internal anchor
Shepherd provides a reversible execution trace substrate for LLM agents that enables meta-agents to inspect and transform runs, yielding reported gains on coding and terminal benchmarks via supervision, counterfactual repair, and RL credit assignment.
Structured Recurrent Mixers for Massively Parallelized Sequence Generation cs.CL · 2026-05-09 · unverdicted · none · ref 9 · 3 links · internal anchor
Structured Recurrent Mixers provide a dual parallel-recurrent representation for sequence models, claiming superior training efficiency, information capacity, and inference throughput over linear complexity alternatives.
Priming: Hybrid State Space Models From Pre-trained Transformers cs.LG · 2026-05-08 · unverdicted · none · ref 9 · internal anchor
Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional tokens and showing Gated KalmaNet outperforming Mamba-2 on long-context reasoning at 32B scale.
Hypothesis generation and updating in large language models cs.LG · 2026-05-07 · unverdicted · none · ref 1 · internal anchor
LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
Evaluation Awareness in Language Models Has Limited Effect on Behaviour cs.CL · 2026-05-07 · conditional · none · ref 6 · internal anchor
Verbalised evaluation awareness in large reasoning models has only small effects on their outputs across safety and alignment tests.
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models cs.CL · 2026-05-01 · unverdicted · none · ref 5 · 2 links · internal anchor
A new benchmark shows LLM first-answer accuracy on procedural arithmetic drops from 63% (5 steps) to 20% (95 steps) due to execution failures like skipped steps and premature answers.
AVISE: Framework for Evaluating the Security of AI Systems cs.CR · 2026-04-22 · unverdicted · none · ref 49 · internal anchor
AVISE provides a new framework and automated SET that identifies jailbreak vulnerabilities in language models with 92% accuracy, finding all nine tested models vulnerable to an augmented Red Queen attack.
Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation cs.LG · 2026-04-21 · unverdicted · none · ref 185 · internal anchor
Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation cs.LG · 2026-04-14 · unverdicted · none · ref 4 · 2 links · internal anchor
Lightning OPD is an offline on-policy distillation method that matches standard OPD performance at 4x efficiency by enforcing teacher consistency between SFT and distillation phases.
How Transformers Learn to Plan via Multi-Token Prediction cs.LG · 2026-04-13 · conditional · none · ref 2 · internal anchor
Multi-token prediction induces a two-stage reverse reasoning process in Transformers via gradient decoupling, improving planning on synthetic and realistic tasks.
Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents cs.AI · 2026-04-07 · unverdicted · none · ref 5 · internal anchor
Claw-Eval is a new trajectory-aware benchmark for LLM agents that records execution traces, audit logs, and environment snapshots to evaluate completion, safety, and robustness across 300 tasks, revealing that opaque grading misses 44% of safety issues.
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving cs.DC · 2026-05-07 · unverdicted · none · ref 22 · internal anchor
Irminsul recovers up to 83% of prompt tokens above exact-prefix matching and delivers 63% prefill energy savings per cache hit on MLA-MoE models by content-hashing CDC chunks and applying closed-form kr correction.
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding cs.DC · 2026-02-10 · unreviewed · ref 5 · internal anchor

NVIDIA Nemotron 3: Efficient and Open Intelligence

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer