citation dossier
The lottery ticket hypothesis: Finding sparse, trainable neural networks
why this work matters in Pith
Pith has found this work cited in 18 reviewed papers. Its strongest current cluster is cs.LG (8 papers). The largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.
Lesioning a shared core in multilingual LLMs drops whole-brain fMRI encoding correlation by 60.32%, while language-specific lesions selectively weaken predictions only for the matched native language.
A vector-quantized autoencoder learns minimal control codebooks for forward invariance in sampled-data control, achieving 157x reduction over grid baselines on a 12D quadrotor model.
Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on most Matbench Discovery metrics and downstream tasks.
XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.
A dynamic training framework for 3D Gaussian Splatting alternates incremental pruning and adaptive growing of primitives to maintain high rendering quality at up to 80% lower peak memory than standard 3DGS.
Introduces integration, metastability, and dynamical stability index measures computed from layer activations, and reports patterns that distinguish CIFAR-10 from CIFAR-100 difficulty, along with early convergence signals, across ResNet variants, DenseNet, MobileNetV2, VGG-16, and a Vision Transformer.
SubFLOT uses optimal transport to generate data-aware personalized submodels via server-side pruning and scaling-based adaptive regularization to mitigate parametric divergence in heterogeneous federated learning.
SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.
GShard supplies automatic sharding and conditional computation support that enabled training a 600-billion-parameter multilingual translation model on thousands of TPUs with superior quality.
SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.
Widthwise pruning of LVLM language backbones, combined with supervised finetuning and hidden-state distillation, recovers over 95% of performance using just 5% of the data across 3B-7B models.
FRAMP generates client-specific models from compact descriptors in federated learning, trains tailored submodels, and aligns representations to balance personalization with global consistency.
SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improve accuracy by 12.7% on average and by up to 18% under sensor dropout, while cutting memory by 28.2% and delivering up to 1.63x lower latency across multimodal edge models.
SSR uses static random filters and iterative competitive sparse mechanisms to explicitly enforce sparsity in recommendation models, outperforming dense baselines on public and billion-scale industrial datasets.
The prune-quantize-distill ordering produces a better accuracy-size-latency frontier on CIFAR-10/100 than any single technique or other orderings, with INT8 QAT providing the main runtime gain.
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
citing papers explorer
-
TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals
TENNOR enables efficient private training of wide neural networks in TEEs by recasting sparsification as doubly oblivious LSH retrievals and introducing MP-WTA to cut hash table memory by 50x while preserving accuracy.
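The dossier doesn't spell out MP-WTA, but winner-take-all hashing, the LSH family that retrieval-based sparsification schemes build on, can be sketched in a few lines. Everything below (names, dimensions, seeds) is illustrative background, not TENNOR's actual construction:

```python
import random

def wta_hash(vec, perms, k):
    """Classic winner-take-all hash: for each random permutation, look at
    the first k permuted coordinates and record the index of the largest.
    Vectors with similar coordinate rankings collide on many codes."""
    codes = []
    for perm in perms:
        window = [vec[i] for i in perm[:k]]
        codes.append(max(range(k), key=window.__getitem__))
    return tuple(codes)

random.seed(0)
dim, n_hashes, k = 8, 4, 3
perms = [random.sample(range(dim), dim) for _ in range(n_hashes)]

v = [0.1, 0.9, -0.3, 0.4, 0.0, 0.7, -0.8, 0.2]
u = [0.1, 0.8, -0.2, 0.5, 0.1, 0.6, -0.7, 0.2]  # near-identical ranking
print(wta_hash(v, perms, k), wta_hash(u, perms, k))
```

In an LSH-based sparsification scheme, codes like these index hash tables whose buckets hold the neurons to activate for a given input, so only a small slice of a wide layer is ever touched.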
-
Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution
A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.
-
Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment
Lesioning a shared core in multilingual LLMs drops whole-brain fMRI encoding correlation by 60.32%, while language-specific lesions selectively weaken predictions only for the matched native language.
-
Minimal Information Control Invariance via Vector Quantization
A vector-quantized autoencoder learns minimal control codebooks for forward invariance in sampled-data control, achieving 157x reduction over grid baselines on a 12D quadrotor model.
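As background for the vector-quantization step, the nearest-codeword lookup at the heart of any VQ bottleneck can be sketched minimally; the toy 2-D codebook below stands in for the learned control codebook and is purely illustrative:

```python
def quantize(x, codebook):
    """Map a continuous vector to the index of the nearest codeword
    under squared Euclidean distance, as in a VQ bottleneck."""
    def sqdist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    idx = min(range(len(codebook)), key=lambda i: sqdist(codebook[i]))
    return idx, codebook[idx]

# toy 2-D codebook standing in for the learned control codebook
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
idx, code = quantize((0.9, 0.2), codebook)
print(idx, code)  # nearest codeword: index 1, (1.0, 0.0)
```

A minimal codebook in this sense means the controller only ever needs to distinguish a handful of discrete control codes, which is where the reported reduction over dense grid baselines comes from.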
-
Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning
Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on most Matbench Discovery metrics and downstream tasks.
-
XPERT: Expert Knowledge Transfer for Effective Training of Language Models
XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.
-
Gaussians on a Diet: High-Quality Memory-Bounded 3D Gaussian Splatting Training
A dynamic training framework for 3D Gaussian Splatting alternates incremental pruning and adaptive growing of primitives to maintain high rendering quality at up to 80% lower peak memory than standard 3DGS.
-
Training Deep Visual Networks Beyond Loss and Accuracy Through a Dynamical Systems Approach
Introduces integration, metastability, and dynamical stability index measures from layer activations and reports patterns distinguishing CIFAR-10 from CIFAR-100 difficulty plus early convergence signals across ResNet variants, DenseNet, MobileNetV2, VGG-16, and a Vision Transformer.
-
SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport
SubFLOT uses optimal transport to generate data-aware personalized submodels via server-side pruning and scaling-based adaptive regularization to mitigate parametric divergence in heterogeneous federated learning.
-
SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
SLaB compresses LLM weights via sparse-lowrank-binary decomposition guided by activation-aware scores, achieving up to 36% lower perplexity than prior methods at 50% compression on Llama models.
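SLaB's exact algorithm isn't given here; a partial sketch covering only the sparse and binary terms (the low-rank factor is omitted, and all names and values below are illustrative) conveys the idea of splitting a weight vector into an exact sparse part plus a 1-bit scaled residual:

```python
def sparse_binary_decompose(w, n_sparse):
    """Partial sketch of a sparse + binary split (SLaB's low-rank term
    is omitted): keep the n_sparse largest-|w| entries exactly, then
    approximate the residual with sign(residual) * mean(|residual|)."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:n_sparse])
    sparse = [w[i] if i in keep else 0.0 for i in range(len(w))]
    resid = [wi - si for wi, si in zip(w, sparse)]
    alpha = sum(abs(r) for r in resid) / len(resid)
    binary = [alpha if r >= 0 else -alpha for r in resid]
    return sparse, binary, alpha

w = [0.9, -0.1, 0.05, -0.8, 0.2, -0.15]
sparse, binary, alpha = sparse_binary_decompose(w, 2)
```

The sparse term costs a few full-precision values, while the binary term costs one bit per entry plus a single scale, which is the storage trade such decompositions exploit.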
-
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
GShard supplies automatic sharding and conditional computation support that enabled training a 600-billion-parameter multilingual translation model on thousands of TPUs with superior quality.
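GShard's conditional computation centers on top-2 expert gating; a simplified sketch (omitting expert capacity limits, random second-expert dispatch, and the auxiliary load-balancing loss, with illustrative function names) looks like this:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def top2_dispatch(gate_logits):
    """Route a token to its top-2 experts, with the softmax gate
    values renormalized over the chosen pair."""
    probs = softmax(gate_logits)
    top2 = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:2]
    z = probs[top2[0]] + probs[top2[1]]
    return [(e, probs[e] / z) for e in top2]

# one token's gate logits over 4 experts
routing = top2_dispatch([1.2, -0.3, 2.0, 0.1])
print(routing)  # experts 2 and 0, weights summing to 1
```

Because each token activates only two experts, compute per token stays roughly constant as the expert count (and total parameter count) grows, which is what made the 600B-parameter scale tractable.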
-
Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models
SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.
-
Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency
Widthwise pruning of LVLM language backbones, combined with supervised finetuning and hidden-state distillation, recovers over 95% of performance using just 5% of the data across 3B-7B models.
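Widthwise (structural) pruning can be illustrated with a generic magnitude criterion: rank output channels by the norm of their weight rows and keep the top fraction. This is a sketch of the general technique, not necessarily the paper's exact scoring rule:

```python
def prune_width(weight_rows, keep_ratio):
    """Widthwise pruning sketch: rank output channels by the L1 norm
    of their weight row and keep only the top keep_ratio fraction,
    shrinking the layer's width rather than scattering zeros."""
    norms = [sum(abs(w) for w in row) for row in weight_rows]
    n_keep = max(1, int(len(weight_rows) * keep_ratio))
    keep = sorted(sorted(range(len(norms)), key=norms.__getitem__,
                         reverse=True)[:n_keep])
    return keep, [weight_rows[i] for i in keep]

# toy 4-channel weight matrix; rows 1 and 3 carry the most mass
W = [[0.1, -0.2], [0.9, 0.8], [-0.05, 0.0], [0.4, -0.6]]
kept_idx, W_small = prune_width(W, 0.5)
print(kept_idx)  # indices of the two highest-norm rows
```

Unlike unstructured sparsity, the pruned layer is genuinely smaller, which is what makes recovery finetuning and distillation on a small data fraction attractive.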
-
Representation-Aligned Multi-Scale Personalization for Federated Learning
FRAMP generates client-specific models from compact descriptors in federated learning, trains tailored submodels, and aligns representations to balance personalization with global consistency.
-
Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference
SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improve accuracy by 12.7% on average and by up to 18% under sensor dropout, while cutting memory by 28.2% and delivering up to 1.63x lower latency across multimodal edge models.
-
Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
SSR uses static random filters and iterative competitive sparse mechanisms to explicitly enforce sparsity in recommendation models, outperforming dense baselines on public and billion-scale industrial datasets.
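An "iterative competitive sparse mechanism" suggests a winner-take-all step in which activations compete and only the strongest survive; a generic top-k sketch (not necessarily SSR's exact rule) looks like:

```python
def topk_wta(activations, k):
    """Competitive sparsity sketch: only the k largest activations
    'win' and pass through; the rest are zeroed out."""
    winners = set(sorted(range(len(activations)),
                         key=lambda i: activations[i], reverse=True)[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

h = [0.2, 1.5, -0.3, 0.9, 0.1]
print(topk_wta(h, 2))  # [0.0, 1.5, 0.0, 0.9, 0.0]
```

Applied iteratively during training, a step like this enforces sparsity explicitly rather than hoping regularization induces it, which matters at billion-scale industrial embedding tables.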
-
Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression
The prune-quantize-distill ordering produces a better accuracy-size-latency frontier on CIFAR-10/100 than any single technique or other orderings, with INT8 QAT providing the main runtime gain.
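The first two stages of the ordering can be sketched with magnitude pruning followed by symmetric per-tensor INT8 quantization. This toy version uses post-training rounding rather than the paper's QAT, and all values are illustrative; the dequantized weights are what the distillation stage would then refine:

```python
def prune(weights, sparsity):
    """Stage 1 sketch: magnitude pruning zeroes the smallest-|w| fraction."""
    k = int(len(weights) * sparsity)
    thresh = sorted(abs(w) for w in weights)[k - 1] if k else float("-inf")
    return [0.0 if abs(w) <= thresh else w for w in weights]

def quantize_int8(weights):
    """Stage 2 sketch: symmetric per-tensor INT8 with a single scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

w = [0.8, -0.05, 0.3, -0.9, 0.02, 0.41]
w_p = prune(w, 0.5)           # smallest half zeroed first
q, s = quantize_int8(w_p)     # then quantized
w_hat = [qi * s for qi in q]  # dequantized weights for the distill stage
```

Pruning first means the quantizer's scale is set by the surviving large weights, which is one intuition for why this ordering can beat quantize-then-prune.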
-
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.