Title resolution pending

· 2018

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

Apple MPS transformer decoding shows abrupt latency spikes up to 21x in narrow decoding-budget intervals due to KV cache and execution regime shifts, absent on CPU and CUDA.

The Power of Scale for Parameter-Efficient Prompt Tuning

cs.CL · 2021-04-18 · unverdicted · novelty 7.0

Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.

citing papers explorer

Showing 3 of 3 citing papers.

Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes cs.LG · 2026-05-09 · unverdicted · none · ref 4 · 2 links
Apple MPS transformer decoding shows abrupt latency spikes up to 21x in narrow decoding-budget intervals due to KV cache and execution regime shifts, absent on CPU and CUDA.
The Power of Scale for Parameter-Efficient Prompt Tuning cs.CL · 2021-04-18 · unverdicted · none · ref 97
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics cs.LG · 2026-05-21 · unverdicted · none · ref 198
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer