Title resolution pending

Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen · 2025

2 Pith papers cite this work. Polarity classification is still being indexed.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

cs.AR · 2026-04-28 · unverdicted · novelty 6.0

NVLLM offloads FFN computations to integrated 3D NAND flash with page-level access and keeps attention in DRAM, delivering 16.7x-37.9x speedups over GPU out-of-core baselines for models up to 30B parameters.

MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference

cs.DC · 2026-04-03 · unverdicted · novelty 5.0

MSAO cuts end-to-end latency by 30% and resource overhead by 30-65% for multimodal LLM inference through sparsity-aware edge-cloud offloading while preserving accuracy.

citing papers explorer

Showing 2 of 2 citing papers.

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

cs.AR · 2026-04-28 · unverdicted · none · ref 39

MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference

cs.DC · 2026-04-03 · unverdicted · none · ref 56
