Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2025 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
SVL pretraining enables SNNs to reach 85.4% top-1 accuracy on zero-shot 3D classification while outperforming prior SNNs on detection, segmentation, and action recognition with added open-world QA capability.
citing papers explorer
- VGR: Visual Grounded Reasoning
  VGR introduces a visual-grounded reasoning MLLM that detects and replays image regions during inference, achieving gains on visual benchmarks with 30% fewer image tokens than the LLaVA-NeXT-7B baseline.
- SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding
  SVL pretraining enables SNNs to reach 85.4% top-1 accuracy on zero-shot 3D classification while outperforming prior SNNs on detection, segmentation, and action recognition with added open-world QA capability.