pith. machine review for the scientific record.

Fast inference from transformers via speculative decoding

15 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Speculative Decoding for Autoregressive Video Generation

cs.CV · 2026-04-19 · conditional · novelty 7.0

A training-free speculative decoding method for block-based autoregressive video diffusion uses a quality router on worst-frame ImageReward scores to accept drafter proposals, achieving up to 2.09x speedup at 95.7% quality retention.

Micro Language Models Enable Instant Responses

cs.CL · 2026-04-21 · conditional · novelty 6.0

Ultra-compact 8-30M parameter models start contextually grounded responses on-device while cloud models seamlessly continue them, enabling responsive AI on power-constrained hardware.

EdgeFM: Efficient Edge Inference for Vision-Language Models

cs.CV · 2026-04-30 · unverdicted · novelty 5.0

EdgeFM is an agent-driven framework that strips non-essential features from VLMs and packages reusable optimized kernels, achieving up to 1.49x speedup over TensorRT-Edge-LLM on NVIDIA Orin while enabling the first end-to-end deployment on Horizon Journey hardware.
