Pointer sentinel mixture models

Tool reference. 80% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.

12 Pith papers citing it
Method reference 80% of classified citations

citation-role summary

dataset: 4 · other: 1

representative citing papers

Locking Pretrained Weights via Deep Low-Rank Residual Distillation

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

DLR-Lock locks open-weight LLMs against unauthorized fine-tuning by swapping MLPs for deep low-rank residual networks that inflate backprop memory and complicate optimization, yet preserve original capabilities via module-wise distillation.

STS: Efficient Sparse Attention with Speculative Token Sparsity

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

STS repurposes draft-model attention scores from speculative decoding to build token-and-head-wise sparsity masks, delivering 2.67x speedup at ~90% sparsity on NarrativeQA with negligible accuracy loss.

Training Transformers for KV Cache Compressibility

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.

Parcae: Scaling Laws For Stable Looped Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

TrainMover: An Interruption-Resilient Runtime for ML Training

cs.DC · 2024-12-17 · unverdicted · novelty 6.0

TrainMover achieves ~20s downtime for interruptions in 1024-GPU LLM training via two-phase delta-based communication setup, communication-free sandboxed warmup, and general standby design, projecting 55% reduction in wasted GPU hours.

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

cs.AI · 2025-01-15 · unverdicted · novelty 4.0

Agentic RAG embeds agents with reflection, planning, tool use, and collaboration into retrieval pipelines to overcome static RAG limitations, and the survey offers a taxonomy by agent count, control, autonomy, and knowledge representation plus applications and open challenges.
