pith. sign in

hub

Llm pruning and distillation in practice: The minitron approach

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

hub tools

citation-role summary

background 1

citation-polarity summary

roles

background 1

polarities

background 1

representative citing papers

FTerViT: Fully Ternary Vision Transformer

cs.CV · 2026-05-20 · conditional · novelty 7.0

FTerViT introduces fully ternary Vision Transformers with TernaryBitConv2d and TernaryLayerNorm operators, achieving 82.43% ImageNet top-1 at 6.09 MB with 15x compression.

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Contribution Weights combine attention, value magnitude, and directional alignment to measure token influence more faithfully than attention alone, and show attention sinks actively suppress information via a convex sink-rate to output-norm relationship.

Small LLMs: Pruning vs. Training from Scratch

cs.LG · 2026-06-12 · unverdicted · novelty 5.0

Pruned initializations from an 8B model outperform random starts with equal training tokens, but with full token budgets fine-grained pruning retains advantage while coarse structured pruning does not.

NVIDIA Nemotron 3: Efficient and Open Intelligence

cs.CL · 2025-12-24 · unverdicted · novelty 5.0

NVIDIA releases the Nemotron 3 model family with hybrid Mamba-Transformer architecture, LatentMoE, NVFP4 training, MTP layers, and multi-environment RL post-training for reasoning and agentic tasks.

Ministral 3

cs.CL · 2026-01-13 · unverdicted · novelty 4.0

Ministral 3 releases 3B/8B/14B parameter-efficient language models with base, instruction, and reasoning variants derived via iterative pruning and distillation, including image understanding capabilities.

citing papers explorer

Showing 15 of 15 citing papers.