Title resolution pending
4 Pith papers cite this work, all published in 2026. Polarity classification is still indexing, so all four verdicts are currently UNVERDICTED.
Citing papers explorer
- AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
AAAC uses two adaptive 64-byte codebooks per layer for 4-bit LLM weight quantization, choosing the optimal one per group to minimize activation-weighted error with zero storage overhead and fast runtime.
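A minimal NumPy sketch of the per-group selection step described above. The function names, codebook contents, and the use of per-weight activation statistics as the error weighting are assumptions for illustration, not AAAC's implementation; only the core idea (quantize each group with both 16-entry codebooks, keep whichever gives lower activation-weighted error) comes from the summary.

```python
import numpy as np

def quantize_group(group, codebook):
    """Snap each weight to its nearest codebook entry (a 4-bit index per weight)."""
    idx = np.abs(group[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx]

def choose_codebook(group, act_scale, codebooks):
    """Pick whichever codebook minimizes activation-weighted squared error.

    act_scale carries per-weight activation statistics (e.g. input-channel
    norms), so errors on weights that multiply large activations cost more.
    """
    errs = [np.sum((act_scale * (group - quantize_group(group, cb))) ** 2)
            for cb in codebooks]
    best = int(np.argmin(errs))
    return best, quantize_group(group, codebooks[best])

# Toy usage: two 16-entry codebooks (16 fp32 entries = 64 bytes each),
# weights quantized in groups of 64; `which` is the per-group codebook choice.
rng = np.random.default_rng(0)
w, a = rng.normal(size=256), np.abs(rng.normal(size=256)) + 0.1
codebooks = [np.linspace(-3.0, 3.0, 16), 3.0 * np.tanh(np.linspace(-2.0, 2.0, 16))]
for g in range(0, w.size, 64):
    which, wq = choose_codebook(w[g:g + 64], a[g:g + 64], codebooks)
```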
- Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
Recursive generative retraining with pluralistic preferences converges to a stable diverse distribution that satisfies a weighted Nash bargaining solution.
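For reference, a weighted Nash bargaining solution is standardly defined as below; the utilities, disagreement points, and weights here are generic placeholders, since the paper's exact setup is not given in this summary.

```latex
% Standard weighted Nash bargaining objective (notation assumed, not the paper's):
% p ranges over distributions in the simplex \Delta, U_k is group k's utility,
% d_k its disagreement point, and w_k >= 0 the weights with \sum_k w_k = 1.
\[
p^\star \in \arg\max_{p \in \Delta} \; \prod_{k=1}^{K} \bigl(U_k(p) - d_k\bigr)^{w_k}
\quad \Longleftrightarrow \quad
p^\star \in \arg\max_{p \in \Delta} \; \sum_{k=1}^{K} w_k \log\bigl(U_k(p) - d_k\bigr).
\]
```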
- Search Your Block Floating Point Scales!
ScaleSearch optimizes block floating point scales via fine-grained search, cutting quantization error by 27% for NVFP4, improving post-training quantization (PTQ) accuracy by up to 15 points on MATH500 for Qwen3-8B, and lowering attention PPL by 0.77 on Llama 3.1 70B.
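A minimal sketch of the scale-search idea for one NVFP4-style block, again with assumed details: the candidate range, the plain MSE objective, and the E2M1 grid are illustrative stand-ins for whatever ScaleSearch actually searches and scores.

```python
import numpy as np

# E2M1 (FP4) magnitudes; in NVFP4, each small block of values shares one scale.
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_POS[:0:-1], FP4_POS])

def quantize_block(block, scale):
    """Round block / scale to the nearest FP4 grid point, then rescale."""
    idx = np.abs(block[:, None] / scale - FP4_GRID[None, :]).argmin(axis=1)
    return FP4_GRID[idx] * scale

def search_scale(block, n_candidates=64):
    """Try scales around the max-abs baseline and keep the MSE minimizer.

    The baseline maps the block's largest magnitude onto FP4's largest
    representable value (6.0); shrinking the scale trades clipping of the
    outlier against finer resolution for the rest of the block.
    """
    base = max(np.abs(block).max(), 1e-12) / 6.0
    candidates = base * np.linspace(0.5, 1.2, n_candidates)
    errs = [np.mean((block - quantize_block(block, s)) ** 2) for s in candidates]
    return candidates[int(np.argmin(errs))]

block = np.random.default_rng(1).normal(size=16)
s = search_scale(block)           # fine-grained per-block scale
deq = quantize_block(block, s)    # FP4 grid values times the chosen scale
```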
- Optimizer-Induced Mode Connectivity: From AdamW to Muon
Optimizer choice induces distinct connected regions in the loss landscape of two-layer ReLU networks, with AdamW and Muon sometimes separated by provable barriers.
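A sketch of how such barriers are typically measured empirically: evaluate the loss along the straight line between two trained parameter vectors and report the bump above the endpoints. The two-layer ReLU loss and this barrier definition are standard conventions, not necessarily the paper's exact protocol (which may, for instance, align hidden-unit permutations first).

```python
import numpy as np

def mlp_loss(params, X, y):
    """Mean squared loss of a two-layer ReLU net: y_hat = relu(X @ W1.T) @ W2.T."""
    W1, W2 = params
    return float(np.mean((np.maximum(X @ W1.T, 0.0) @ W2.T - y) ** 2))

def linear_barrier(params_a, params_b, X, y, steps=21):
    """Loss bump along the straight line between two solutions.

    Returns max-along-path minus max-of-endpoints: ~0 means the two optima
    are linearly mode-connected; a clearly positive value is a barrier.
    """
    ts = np.linspace(0.0, 1.0, steps)
    path = [mlp_loss([(1 - t) * a + t * b
                      for a, b in zip(params_a, params_b)], X, y)
            for t in ts]
    return max(path) - max(path[0], path[-1]), path

# params_a / params_b would be (W1, W2) pairs from, say, AdamW- and Muon-trained runs.
```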