arXiv preprint arXiv:2310.06694 (2023)

Xia, M · 2023 · arXiv 2310.06694

Canonical reference. 80% of the classified citations from citing Pith papers (4 of 5) use this work as background.

20 Pith papers citing it

Background 80% of classified citations

citation-role summary

background: 4 · method: 1

citation-polarity summary

background: 4 · use method: 1

representative citing papers

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Star Elastic trains N nested submodels in a single post-training job on a parent reasoning LLM, supporting elastic budget control that matches or exceeds independent baselines while cutting training compute by up to 360x.

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

cs.CL · 2025-12-27 · unverdicted · novelty 7.0

Width pruning in Llama-3.2 models reduces parametric knowledge while enhancing instruction-following and preserving reasoning.

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

cs.LG · 2025-07-21 · unverdicted · novelty 7.0

An RL agent learns domain re-weighting policies from evaluation feedback to improve balanced performance in continual pre-training of LLMs across source and target domains.

Federated Co-tuning Framework for Large and Small Language Models

cs.CL · 2024-11-18 · unverdicted · novelty 7.0

FedCoLLM is a parameter-efficient federated co-tuning framework that improves client SLMs via server LLMs and enriches LLMs with client domain insights using adapters on NLP text generation tasks.

SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

SAFE-SVD introduces a sensitivity-aware fidelity-enforcing SVD framework for compressing physics foundation models that maintains higher accuracy than standard methods at greater compression ratios.

Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on most Matbench Discovery metrics and downstream tasks.

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

cs.CV · 2025-11-18 · conditional · novelty 6.0

OmniZip introduces an audio-guided dynamic token compression framework that achieves a 3.42x inference speedup and a 1.4x memory reduction for omnimodal LLMs without any training.

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

cs.CV · 2024-09-06 · unverdicted · novelty 6.0

VILA-U unifies visual understanding and generation inside one autoregressive next-token prediction model, removing separate diffusion components while claiming near state-of-the-art results.

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

cs.CL · 2024-04-09 · conditional · novelty 6.0

MiniCPM 1.2B and 2.4B models reach parity with 7B-13B LLMs via model wind-tunnel scaling and a WSD scheduler that yields a higher optimal data-to-model ratio than Chinchilla scaling.

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

cs.CL · 2023-09-07 · conditional · novelty 6.0

DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

cs.LG · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.

Light-FMP: Lightweight Feature and Model Pruning for Enhanced Deep Recommender Systems

cs.IR · 2026-05-07 · unverdicted · novelty 5.0

Light-FMP prunes features and model parameters in deep recommender systems by pretraining a hard-concrete masking layer on data subsets, then retraining the reduced model to improve both efficiency and accuracy over prior methods.

Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency

cs.CL · 2026-04-27 · conditional · novelty 5.0

Widthwise pruning of LVLM language backbones combined with supervised finetuning and hidden-state distillation recovers over 95% performance using just 5% of data across 3B-7B models.

On the Limits of Layer Pruning for Generative Reasoning in Large Language Models

cs.LG · 2026-02-02 · unverdicted · novelty 5.0

Layer pruning preserves classification performance in LLMs but fundamentally limits recovery of generative reasoning capabilities even after extensive self-supervised finetuning.

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

cs.LG · 2025-10-26 · unverdicted · novelty 5.0

TALE selectively prunes task-detrimental layers in LLMs at inference time to match or exceed baseline performance with lower computational cost across multiple models and tasks.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)

cs.CL · 2025-01-03 · unverdicted · novelty 2.0

A literature survey of Small Language Models (1-8B parameters) that can perform comparably or better than larger models, covering general-purpose and task-specific approaches plus creation techniques.

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

cs.LG · 2026-05-15

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

cs.LG · 2026-05-14

citing papers explorer

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling · cs.LG · 2026-05-14 · unverdicted · none · ref 269
Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control · cs.LG · 2026-05-08 · unverdicted · none · ref 7
Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2 · cs.CL · 2025-12-27 · unverdicted · none · ref 17
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training · cs.LG · 2025-07-21 · unverdicted · none · ref 42
Federated Co-tuning Framework for Large and Small Language Models · cs.CL · 2024-11-18 · unverdicted · none · ref 19
SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models · cs.LG · 2026-05-18 · unverdicted · none · ref 28
Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning · cs.LG · 2026-05-09 · unverdicted · none · ref 14
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models · cs.CV · 2025-11-18 · conditional · none · ref 50
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation · cs.CV · 2024-09-06 · unverdicted · none · ref 19
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies · cs.CL · 2024-04-09 · conditional · none · ref 43
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models · cs.CL · 2023-09-07 · conditional · none · ref 93
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training · cs.LG · 2026-05-09 · unverdicted · none · ref 68 · 2 links
Light-FMP: Lightweight Feature and Model Pruning for Enhanced Deep Recommender Systems · cs.IR · 2026-05-07 · unverdicted · none · ref 46
Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency · cs.CL · 2026-04-27 · conditional · none · ref 32
On the Limits of Layer Pruning for Generative Reasoning in Large Language Models · cs.LG · 2026-02-02 · unverdicted · none · ref 31
TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination · cs.LG · 2025-10-26 · unverdicted · none · ref 21
A Survey on Efficient Inference for Large Language Models · cs.CL · 2024-04-22 · accept · none · ref 175
Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026) · cs.CL · 2025-01-03 · unverdicted · none · ref 138
Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs · cs.LG · 2026-05-15 · unreviewed · ref 36
TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability · cs.LG · 2026-05-14 · unreviewed · ref 52
