Distills 3D spatial reasoning from a 7B teacher VLM to a 2.29B student using VGGT encoder, multi-task losses, and Hidden CoT latent tokens, yielding 8.7x lower latency with 54-72% performance retention on ScanNet and 3D-FRONT.
Visual instruction tuning
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
DUAL-BLADE uses a dual-path KV-cache framework with NVMe-direct access to reduce prefill and decode latency by up to 33% and 42% while improving SSD utilization 2.2x under tight memory budgets.
UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.
citing papers explorer
-
Distilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoT
Distills 3D spatial reasoning from a 7B teacher VLM to a 2.29B student using VGGT encoder, multi-task losses, and Hidden CoT latent tokens, yielding 8.7x lower latency with 54-72% performance retention on ScanNet and 3D-FRONT.
-
DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference
DUAL-BLADE uses a dual-path KV-cache framework with NVMe-direct access to reduce prefill and decode latency by up to 33% and 42% while improving SSD utilization 2.2x under tight memory budgets.
-
UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning
UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.