Mini-internvl: A flexible-transfer pocket multimodal model with 5% parameters and 90% performance

Gao, Z · 2024 · arXiv 2410.16261

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 3 baseline 1

citation-polarity summary

background 3 baseline 1

representative citing papers

The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Fine-tuning VLMs for driving erodes pre-trained world knowledge, but shifting adaptation to prompt space via the Drive Expert Adapter preserves generalization while improving task performance.

LightSTAR: Efficient Visual Document Retrieval via Lightweight Selection with Vision-Adaptive Refinement

cs.CV · 2026-06-22 · unverdicted · novelty 6.0

LightSTAR achieves state-of-the-art accuracy in visual document retrieval by decomposing the task into LLM-free high-recall candidate selection and vision-adaptive semantic refinement on candidates, cutting end-to-end latency several-fold.

EventDrive: Event Cameras for Vision-Language Driving Intelligence

cs.CV · 2026-06-16 · unverdicted · novelty 6.0

EventDrive supplies a multi-task benchmark and EventDrive-VLM architecture that fuses event data, RGB, and language supervision, reporting gains in temporal precision and motion awareness for driving intelligence.

A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability

cs.DC · 2026-05-18 · unverdicted · novelty 6.0

RRFP introduces a readiness-driven runtime for pipeline parallelism that uses schedules as hints and ready-set arbitration to improve utilization under runtime variability, reporting up to 2.77x speedup on multimodal workloads.

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

cs.CV · 2025-08-25 · unverdicted · novelty 6.0

InternVL3.5 advances open-source multimodal models with Cascade RL for +16% reasoning gains and ViR for 4x inference speedup, with the 241B model reaching SOTA among open-source MLLMs on multimodal, reasoning, and agentic tasks.

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

cs.CV · 2025-04-14 · conditional · novelty 6.0

InternVL3-78B sets a new open-source SOTA of 72.2 on MMMU via native joint multimodal pre-training, V2PE, MPO, and test-time scaling while remaining competitive with proprietary models.

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

cs.CV · 2024-12-06 · unverdicted · novelty 6.0

InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.

citing papers explorer

Showing 7 of 7 citing papers.

The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models cs.CV · 2026-04-06 · unverdicted · none · ref 15
Fine-tuning VLMs for driving erodes pre-trained world knowledge, but shifting adaptation to prompt space via the Drive Expert Adapter preserves generalization while improving task performance.
LightSTAR: Efficient Visual Document Retrieval via Lightweight Selection with Vision-Adaptive Refinement cs.CV · 2026-06-22 · unverdicted · none · ref 11
LightSTAR achieves state-of-the-art accuracy in visual document retrieval by decomposing the task into LLM-free high-recall candidate selection and vision-adaptive semantic refinement on candidates, cutting end-to-end latency several-fold.
EventDrive: Event Cameras for Vision-Language Driving Intelligence cs.CV · 2026-06-16 · unverdicted · none · ref 21
EventDrive supplies a multi-task benchmark and EventDrive-VLM architecture that fuses event data, RGB, and language supervision, reporting gains in temporal precision and motion awareness for driving intelligence.
A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability cs.DC · 2026-05-18 · unverdicted · none · ref 15
RRFP introduces a readiness-driven runtime for pipeline parallelism that uses schedules as hints and ready-set arbitration to improve utilization under runtime variability, reporting up to 2.77x speedup on multimodal workloads.
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency cs.CV · 2025-08-25 · unverdicted · none · ref 37
InternVL3.5 advances open-source multimodal models with Cascade RL for +16% reasoning gains and ViR for 4x inference speedup, with the 241B model reaching SOTA among open-source MLLMs on multimodal, reasoning, and agentic tasks.
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models cs.CV · 2025-04-14 · conditional · none · ref 41
InternVL3-78B sets a new open-source SOTA of 72.2 on MMMU via native joint multimodal pre-training, V2PE, MPO, and test-time scaling while remaining competitive with proprietary models.
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling cs.CV · 2024-12-06 · unverdicted · none · ref 71
InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.

Mini-internvl: A flexible-transfer pocket multimodal model with 5% parameters and 90% performance

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer