pith. machine review for the scientific record. sign in

hub

Spinquant: Llm quantization with learned rotations

17 Pith papers cite this work. Polarity classification is still indexing.

17 Pith papers citing it

hub tools

years

2026 17

representative citing papers

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

cs.LG · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

OSAQ suppresses weight outliers in LLMs via a closed-form additive transformation from the Hessian's stable null space, improving 2-bit quantization perplexity by over 40% versus vanilla GPTQ with no inference overhead.

QuantClaw: Precision Where It Matters for OpenClaw

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.

MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

MCAP uses load-time Monte Carlo profiling to estimate layer importance, enabling dynamic quantization (W4A8 vs W4A16) and memory tiering (GPU/RAM/SSD) that delivers 1.5-1.8x higher decode throughput than llama-cpp Q4_0 on NVIDIA T4 while fitting models into previously infeasible memory budgets.

citing papers explorer

Showing 17 of 17 citing papers.