Chain-of-thought compression should not be blind: V-skip for efficient multimodal reasoning via dual-path anchoring

Dongxu Zhang, Yiding Sun, Cheng Tan, Wenbiao Yan, Ning Yang, Jihua Zhu, Haijun Zhang, “Chain-of-thought compression should not be blind: V-skip for efficient multimodal reasoning via dual-path anchoring,” arXiv preprint arXiv:2601.13879, 2026.

3 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning

cs.AI · 2026-04-13 · unverdicted · novelty 6.0

CFMS is a coarse-to-fine framework that uses MLLMs to create a multi-perspective knowledge tuple as a reasoning map for symbolic table operations, yielding competitive accuracy on WikiTQ and TabFact.

Sparsity-Aware Voxel Attention and Foreground Modulation for 3D Semantic Scene Completion

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

VoxSAMNet introduces sparsity-aware deformable attention via a dummy node and foreground modulation with dropout plus text-guided filtering to reach new state-of-the-art mIoU of 18.2% on SemanticKITTI and 20.2% on SSCBench-KITTI-360 for monocular 3D scene completion.

Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck

cs.SD · 2026-04-07 · unverdicted · novelty 5.0

A singing voice conversion system with boundary-aware information bottleneck and high-frequency augmentation achieves the best naturalness in SVCC2025 subjective tests while using less extra data than competitors.

citing papers explorer

Showing 3 of 3 citing papers.

CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning
cs.AI · 2026-04-13 · unverdicted · none · ref 18

Sparsity-Aware Voxel Attention and Foreground Modulation for 3D Semantic Scene Completion
cs.CV · 2026-04-07 · unverdicted · none · ref 48

Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck
cs.SD · 2026-04-07 · unverdicted · none · ref 24
