Large language model inference acceleration: A comprehensive hardware perspective
5 Pith papers cite this work.
Representative citing papers:
- Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration
  SPEX delivers a 1.2-3x speedup on Tree-of-Thought (ToT) algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding (a speculative-search sketch follows this list).
- A Little Rank Goes a Long Way: Random Scaffolds with LoRA Adapters Are All You Need
  Frozen random backbones with low-rank LoRA adapters recover 96-100% of fully trained performance across diverse architectures while training only 0.5-40% of the parameters (see the adapter sketch after this list).
- Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling
  A new end-to-end modeling approach for latency-sensitive many-core architectures with a globally shared L1 scratchpad memory (SPM) tracks RTL golden models within 7% error while running up to 115x faster, and it supports profiling for design optimization (see the bank-contention sketch after this list).
- Secure eFPGA-Enabled Edge LLM Inference: Architectural and Hardware Countermeasures
  A hybrid ASIC+eFPGA architecture is proposed to add adaptive security mechanisms to edge LLM inference while retaining ASIC efficiency.
- Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models
  The paper surveys hardware-software co-design techniques, including mixed-precision quantization, structural pruning, speculative decoding, and transformer accelerators, to speed up multimodal foundation models, with examples from medical and code tasks (see the quantization sketch after this list).
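The SPEX entry names three mechanisms but reproduces none of the paper's interfaces, so the Python sketch below is a hypothetical illustration of speculative Tree-of-Thought search: a cheap draft evaluator pre-ranks candidate thoughts and the expensive reward model is invoked only to verify promising paths. The function names, thresholds, and budget policy are all assumptions, not the SPEX algorithm itself.

```python
import heapq
from itertools import count

def speculative_tot_search(root, propose, cheap_score, expensive_score,
                           total_budget=64, max_nodes=256,
                           verify_margin=0.1, stop_score=0.9):
    """Hypothetical speculative Tree-of-Thought search (not the SPEX algorithm).

    propose(node, width)  -> list of candidate child thoughts (an LLM call)
    cheap_score(node)     -> fast draft estimate of path quality in [0, 1]
    expensive_score(node) -> costly reward-model evaluation in [0, 1]
    """
    tie = count()  # tie-breaker so the heap never compares node objects
    frontier = [(-cheap_score(root), next(tie), root)]
    best_score, best_node = float("-inf"), root
    budget = total_budget  # counts expensive reward-model calls only

    for _ in range(max_nodes):
        if not frontier or budget <= 0:
            break
        neg_draft, _, node = heapq.heappop(frontier)
        draft = -neg_draft

        # Speculative path selection: trust the cheap draft estimate and
        # pay for reward-model verification only when a path looks decisive.
        if draft >= stop_score - verify_margin:
            score = expensive_score(node)
            budget -= 1
            if score >= stop_score:
                return node  # adaptive early termination on a verified win
        else:
            score = draft

        if score > best_score:
            best_score, best_node = score, node

        # Dynamic budget allocation: branch wide while verification budget
        # is plentiful, fall back to near-greedy expansion as it runs out.
        width = 3 if budget > total_budget // 2 else 1
        for child in propose(node, width):
            heapq.heappush(frontier, (-cheap_score(child), next(tie), child))

    return best_node
```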
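The random-scaffold result rests on the standard LoRA pattern: freeze the backbone weights and train only a low-rank update. The PyTorch sketch below shows that pattern on a single linear layer; the layer size, rank, and scaling are illustrative choices, not the paper's configuration. At this size the trainable fraction comes out around 3%, inside the 0.5-40% range quoted above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen, randomly initialized linear layer with a trainable
    rank-r LoRA update. Sizes, rank, and alpha are illustrative."""

    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # the frozen random scaffold
        # Trainable low-rank factors; B starts at zero so the adapter
        # contributes nothing at initialization.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(512, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # ~3.0% at rank 8
```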
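The simulation entry centers on modeling access latency for a globally shared L1 SPM. As a toy illustration of the kind of bank-contention modeling such a simulator performs, here is a hypothetical cycle-approximate model that serializes concurrent accesses to the same SPM bank; the bank count, latencies, and trace format are invented for the example and are not the paper's model.

```python
from collections import defaultdict

BANKS = 8          # number of SPM banks (illustrative)
HIT_LATENCY = 2    # cycles for an uncontended SPM access (illustrative)

def simulate_spm_trace(trace):
    """Cycle-approximate model of a shared, banked L1 SPM.

    trace: list of (issue_cycle, core_id, address) requests.
    Each bank can start one access per cycle, so requests to the same
    bank are serialized. Returns per-request completion cycles and a
    simple per-bank stall profile of the sort used for optimization.
    """
    bank_free_at = defaultdict(int)  # cycle at which each bank is next free
    stalls = defaultdict(int)
    completions = []
    for issue, core, addr in sorted(trace):
        bank = addr % BANKS
        start = max(issue, bank_free_at[bank])
        stalls[bank] += start - issue   # profiling: contention stall cycles
        bank_free_at[bank] = start + 1  # bank accepts one new access per cycle
        completions.append((core, start + HIT_LATENCY))
    return completions, dict(stalls)

# Two cores hammering one bank versus accesses spread across banks.
trace = [(0, 0, 0), (0, 1, 8), (1, 0, 16), (1, 1, 3)]
done, profile = simulate_spm_trace(trace)
print(done)     # addresses 0, 8, 16 all map to bank 0 and serialize
print(profile)  # stall cycles per bank: the data a profiler would expose
```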
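Of the co-design techniques in the focus-session entry, mixed-precision quantization is the easiest to show in a few lines. The sketch below uses symmetric per-tensor int8 quantization and a hypothetical mean-squared-error budget to decide which layers may drop to int8; the thresholding rule and layer statistics are assumptions for illustration, not a method from the paper.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization, one standard scheme."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def mixed_precision_plan(layers, mse_budget=1e-7):
    """Keep a layer in int8 only if its round-trip quantization error
    stays under the budget; otherwise leave it in fp16. The budget and
    the per-layer rule are illustrative, not a published recipe."""
    plan = {}
    for name, w in layers.items():
        q, scale = quantize_int8(w)
        mse = np.mean((w - q.astype(np.float32) * scale) ** 2)
        plan[name] = "int8" if mse < mse_budget else "fp16"
    return plan

rng = np.random.default_rng(0)
layers = {
    "attn.qkv": rng.normal(0, 0.02, (64, 64)).astype(np.float32),  # narrow range
    "mlp.up":   rng.normal(0, 0.20, (64, 64)).astype(np.float32),  # wide range
}
print(mixed_precision_plan(layers))  # the wide-range layer falls back to fp16
```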