MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference

· 2026 · cs.DC · arXiv 2604.02945

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Multimodal large language models (MLLMs) enable powerful cross-modal reasoning capabilities but impose substantial computational and latency burdens, posing critical challenges for deployment on resource-constrained edge devices. In this paper, we propose MSAO, an adaptive modality sparsity-aware offloading framework with edge-cloud collaboration for efficient MLLM Inference. First, a lightweight heterogeneous modality-aware via fine-grained sparsity module performs spatial-temporal-modal joint analysis to compute the Modality Activation Sparsity (MAS) metric, which quantifies the necessity of each modality with minimal computational overhead. Second, an adaptive speculative edge-cloud collaborative offloading mechanism dynamically schedules workloads between edge and cloud based on the derived MAS scores and real-time system states, leveraging confidence-guided speculative execution to hide communication latency. Extensive experiments on VQAv2 and MMBench benchmarks demonstrate that MSAO achieves a 30% reduction in end-to-end latency and 30%-65% decrease in resource overhead, while delivering a throughput improvement of 1.5x to 2.3x compared to traditional approaches, all without compromising competitive accuracy.

representative citing papers

Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents

cs.AI · 2026-06-02 · unverdicted · novelty 5.0

PRPF uses a lightweight Multimodal Proactive Perceptor for intervention gating and context compression, activating the Proactive Agent Reasoner only when needed, reducing false trigger rates and improving efficiency on the ProactiveMobile benchmark.

citing papers explorer

Showing 1 of 1 citing paper.

Perceive Before Reasoning: A Pre-Reasoning Perception Framework for Efficient and Reliable Proactive Mobile Agents cs.AI · 2026-06-02 · unverdicted · none · ref 48 · internal anchor
PRPF uses a lightweight Multimodal Proactive Perceptor for intervention gating and context compression, activating the Proactive Agent Reasoner only when needed, reducing false trigger rates and improving efficiency on the ProactiveMobile benchmark.

MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference

fields

years

verdicts

representative citing papers

citing papers explorer