
arXiv preprint arXiv:2104.08378

12 Pith papers cite this work. Polarity classification is still indexing.


citation summary

roles: background 1

polarities: background 1

years: 2026 (11), 2025 (1)

verdicts: unverdicted 12


representative citing papers

Small LLMs: Pruning vs. Training from Scratch

cs.LG · 2026-06-12 · unverdicted · novelty 5.0

With equal training tokens, initializations pruned from an 8B model outperform random starts; with full token budgets, fine-grained pruning retains its advantage while coarse structured pruning does not.

Pruning Deep Neural Networks via the Marchenko–Pastur Distribution

cs.LG · 2026-05-23 · unverdicted · novelty 5.0

Marchenko-Pastur random-matrix pruning of DNNs yields theoretical certificates for accuracy preservation under small fine-tuning and empirical ImageNet results with 50-60% MAC reduction and sub-2pp accuracy drops on ViT and CNN models.

Adaptive Norm-Based Regularization for Neural Networks

stat.ML · 2026-04-30 · unverdicted · novelty 5.0

Covariance-aware ridge and combined l1-l2 regularizers for neural networks yield better predictive performance and complexity control than standard penalties in simulations and applications to cooling-load prediction and leukemia classification.

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

cs.DC · 2026-04-18 · unverdicted · novelty 5.0

HieraSparse delivers a hierarchical semi-structured sparse KV attention system that achieves 1.2x KV compression and 4.57x decode attention speedup versus prior unstructured sparsity methods at equivalent sparsity, plus up to 1.85x prefill speedup and 1.37x/1.77x speedups with magnitude pruning and

citing papers explorer


  • HieraSparse: Hierarchical Semi-Structured Sparse KV Attention cs.DC · 2026-04-18 · unverdicted · none · ref 40
