NVIDIA Nemotron Nano V2 VL

Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Guo Chen, et al · 2025 · arXiv 2511.03929

5 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

cs.CV · 2026-04-10 · accept · novelty 8.0

HM-Bench is the first benchmark for MLLMs on hyperspectral images, showing models struggle with complex spatial-spectral reasoning and perform better with visual PCA images than textual reports.

MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

MAVEN pipeline generates multi-scale spatio-temporal event descriptions from videos using agentic adaptation and refinement, then produces training data that lets a fine-tuned 8B model outperform Gemini baselines on private CCTV and AccidentBench tasks.

Multimodal Data Curation Through Ranked Retrieval

cs.IR · 2026-05-01 · unverdicted · novelty 7.0

Symmetric Nucleus Subsampling and Expert Embedding Engine reduce modality gaps in multimodal embeddings by over 90% and outperform baselines in data curation for downstream models.

LinMU: Multimodal Understanding Made Linear

cs.CV · 2026-01-04 · conditional · novelty 6.0

LinMU achieves linear-complexity multimodal understanding by replacing self-attention with an M-MATE dual-branch block and distilling from a frozen teacher VLM, matching accuracy with up to 2.7x faster time to first token (TTFT) and 9x higher throughput.

GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning

cs.CV · 2026-05-10 · unverdicted · novelty 5.0

A neuro-symbolic engine generates GeoSym127K, a 127K-question dataset with symbolic ground truths and verified CoT pairs, yielding a +22.21% gain on MathVerse Vision-Only after SFT on Qwen3-VL-8B.

citing papers explorer

HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing cs.CV · 2026-04-10 · accept · none · ref 11
MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks cs.CV · 2026-05-21 · unverdicted · none · ref 10
Multimodal Data Curation Through Ranked Retrieval cs.IR · 2026-05-01 · unverdicted · none · ref 19
LinMU: Multimodal Understanding Made Linear cs.CV · 2026-01-04 · conditional · none · ref 5
GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning cs.CV · 2026-05-10 · unverdicted · none · ref 18