Big-little transformer decoder for optimal inference-time cost

2023 · arXiv 2302.07030

1 Pith paper cites this work.

representative citing papers

RLM-Cascade: Response-Level Speculative Decoding for Cost-Efficient LLM API Serving

cs.LG · 2026-06-22 · unverdicted · novelty 6.0 · ref 10

RLM-Cascade applies response-level speculative decoding with a complexity router to reduce LLM API costs by 45.8% on agentic coding tasks while also lowering latency and matching or exceeding baseline quality.
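To make the idea of a response-level cascade concrete, here is a minimal sketch. It assumes a router that scores each request's complexity, a cheap "small" model that drafts a full response, and an acceptance check that decides whether the draft is good enough or the request must escalate to the expensive "large" model. Everything here is hypothetical and chosen only for illustration: the names `Model`, `complexity_score`, `cascade`, and `accept`, the keyword heuristic, the 0.6 threshold, and the flat per-call prices. It is not RLM-Cascade's actual router or the Big-little decoder's token-level fallback policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model:
    name: str
    cost_per_call: float  # hypothetical flat price per request
    generate: Callable[[str], str]


@dataclass
class CascadeResult:
    response: str
    cost: float
    used: List[str]  # models called, in order


def complexity_score(prompt: str) -> float:
    """Hypothetical router heuristic: longer prompts and 'hard task' keywords score higher."""
    keywords = ("refactor", "debug", "multi-file", "architecture")
    score = min(len(prompt) / 500, 1.0)
    score += 0.5 * sum(k in prompt.lower() for k in keywords)
    return min(score, 1.0)


def cascade(
    prompt: str,
    small: Model,
    large: Model,
    accept: Callable[[str, str], bool],
    threshold: float = 0.6,
) -> CascadeResult:
    # Complex request: skip the draft and pay for the large model only once.
    if complexity_score(prompt) >= threshold:
        return CascadeResult(large.generate(prompt), large.cost_per_call, [large.name])

    # Simple request: let the small model draft a complete response first.
    draft = small.generate(prompt)
    if accept(prompt, draft):
        return CascadeResult(draft, small.cost_per_call, [small.name])

    # Draft rejected: fall back to the large model, paying for both calls.
    return CascadeResult(
        large.generate(prompt),
        small.cost_per_call + large.cost_per_call,
        [small.name, large.name],
    )
```

For example, with `small = Model("small", 0.1, lambda p: "short answer")` and an `accept` that always returns `True`, the prompt `"What does len() return?"` is answered by the small model alone at cost 0.1. A prompt containing "refactor" and "debug" is routed straight to the large model. The design trade-off is that a rejected draft costs more than calling the large model directly, so savings depend on the router sending only requests the small model is likely to handle.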