Flexgen: High-throughput generative inference of large language models with a single gpu

· 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference

cs.IT · 2026-04-20 · unverdicted · novelty 7.0

WISV uses a channel-aware semantic acceptance policy on hidden representations to boost accepted sequence length by up to 60.8% and cut interaction rounds by 37.3% in distributed speculative decoding, with under 1% accuracy loss.

Fast Cross-Operator Optimization of Attention Dataflow

cs.AR · 2026-04-03 · unverdicted · novelty 7.0

MMEE encodes dataflow decisions in matrix form for fast exhaustive search, delivering 40-69% lower latency and energy use than prior methods while running 64-343x faster.

LLM-Powered AI Agent Systems and Their Applications in Industry

cs.AI · 2025-05-22 · unverdicted · novelty 2.0

A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.

citing papers explorer

Showing 3 of 3 citing papers.

WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference cs.IT · 2026-04-20 · unverdicted · none · ref 36
WISV uses a channel-aware semantic acceptance policy on hidden representations to boost accepted sequence length by up to 60.8% and cut interaction rounds by 37.3% in distributed speculative decoding, with under 1% accuracy loss.
Fast Cross-Operator Optimization of Attention Dataflow cs.AR · 2026-04-03 · unverdicted · none · ref 68
MMEE encodes dataflow decisions in matrix form for fast exhaustive search, delivering 40-69% lower latency and energy use than prior methods while running 64-343x faster.
LLM-Powered AI Agent Systems and Their Applications in Industry cs.AI · 2025-05-22 · unverdicted · none · ref 96
A survey categorizing LLM-powered agent systems into software-based, physical, and hybrid types, covering industrial applications and challenges such as latency and security.

Flexgen: High-throughput generative inference of large language models with a single gpu

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer