LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
3 Pith papers cite this work.
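The decomposition the summary describes can be sketched in a few lines of NumPy: feature columns whose magnitudes exceed a threshold (the paper uses around 6.0) are kept in 16-bit, the remaining columns go through vector-wise absmax int8 quantization, and the two partial matmuls are summed. This is a minimal illustration under assumed names and shapes, not the reference bitsandbytes implementation.

```python
import numpy as np

def int8_matmul_with_outliers(X, W, outlier_threshold=6.0):
    """Approximate X @ W (X: [tokens, features], W: [features, out], float32)
    by splitting the feature dimension into outlier columns (fp16) and
    regular columns (vector-wise int8). Illustrative sketch only."""
    # 1. Identify systematic outlier feature dimensions by column magnitude.
    col_max = np.abs(X).max(axis=0)
    outlier_cols = col_max > outlier_threshold
    regular_cols = ~outlier_cols

    # 2. Outlier part: plain 16-bit matmul over the few outlier columns.
    out_fp16 = X[:, outlier_cols].astype(np.float16) @ W[outlier_cols, :].astype(np.float16)

    # 3. Regular part: vector-wise absmax quantization
    #    (one scale per token row of X, one per output column of W).
    Xr, Wr = X[:, regular_cols], W[regular_cols, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0   # [tokens, 1]
    sw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0   # [1, out]
    Xq = np.clip(np.round(Xr / np.maximum(sx, 1e-8)), -127, 127).astype(np.int8)
    Wq = np.clip(np.round(Wr / np.maximum(sw, 1e-8)), -127, 127).astype(np.int8)

    # 4. Int8 matmul accumulated in int32, dequantized by the outer product
    #    of the two scale vectors.
    acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
    out_int8 = acc.astype(np.float32) * (sx * sw)

    # 5. Sum the two partial results; the vast majority of values take the
    #    int8 path, only the outlier dimensions stay in 16-bit.
    return out_int8.astype(np.float16) + out_fp16
```

On random inputs whose extreme values are confined to a handful of columns, this reproduces the full-precision product to within quantization error, which is the regime the paper identifies in large transformers.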
citing papers explorer
-
Towards Understanding Continual Factual Knowledge Acquisition of Language Models: From Theory to Algorithm
Theoretical analysis of continual factual knowledge acquisition shows that data replay stabilizes pretrained knowledge by shifting convergence dynamics, while regularization only slows forgetting; this motivates STOC, an attention-based replay-selection method.
-
Vision Transformers Need Registers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
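The mechanism this entry describes is straightforward to illustrate: a few extra learnable tokens are prepended to the patch sequence, attend normally through the transformer, and are discarded before any dense prediction head. Below is a minimal PyTorch sketch under assumed dimensions and module names, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Toy ViT encoder with register tokens (illustrative, not the paper's code)."""
    def __init__(self, dim=768, num_registers=4, num_layers=12, num_heads=12):
        super().__init__()
        # Learnable register tokens, shared across the batch.
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        nn.init.trunc_normal_(self.registers, std=0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.num_registers = num_registers

    def forward(self, patch_tokens):              # patch_tokens: [B, N_patches, D]
        B = patch_tokens.shape[0]
        reg = self.registers.expand(B, -1, -1)    # registers join the sequence
        x = torch.cat([reg, patch_tokens], dim=1)
        x = self.encoder(x)
        return x[:, self.num_registers:, :]       # drop registers before dense heads
```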