Markus N. Rabe and Charles Staats · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes

cs.LG · 2025-06-10 · unverdicted · novelty 6.0

BSA-TNP is a new neural process model with KRBlocks and biased scan attention that claims to match top accuracy while scaling inference to over 1M points in under a minute on a single GPU and supporting translation invariance.

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

cs.LG · 2023-08-31 · unverdicted · novelty 6.0

SARATHI uses chunked prefills and decode-maximal batching to let decode steps ride along with prefill compute, delivering up to 10x higher decode throughput and 1.91x end-to-end throughput on models including LLaMA-13B and GPT-3.

citing papers explorer

Showing 2 of 2 citing papers.

Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes cs.LG · 2025-06-10 · unverdicted · none · ref 32
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills cs.LG · 2023-08-31 · unverdicted · none · ref 40
