Torchao: Pytorch- native training-to-serving model optimization.arXiv preprint arXiv:2507.16099

Andrew Or, Apurva Jain, Daniel Vega-Myhre, Jesse Cai, Charles David Hernandez, Zhenrui Zheng, Driss Guessous, Vasiliy Kuznetsov, Christian Puhrsch, Mark Saroufim, Supriya Rao, Thien Tran, Aleksandar Samardžić · 2025 · arXiv 2507.16099

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Zero-Shot Quantization via Weight-Space Arithmetic

cs.CV · 2026-04-03 · unverdicted · novelty 8.0

A quantization vector derived from a donor model via weight-space arithmetic can be added to a receiver model to improve post-PTQ Top-1 accuracy by up to 60 points in 3-bit settings without receiver-side QAT or data.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.

torchtune: PyTorch native post-training library

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.

citing papers explorer

Showing 4 of 4 citing papers.

Zero-Shot Quantization via Weight-Space Arithmetic cs.CV · 2026-04-03 · unverdicted · none · ref 6
A quantization vector derived from a donor model via weight-space arithmetic can be added to a receiver model to improve post-PTQ Top-1 accuracy by up to 60 points in 3-bit settings without receiver-side QAT or data.
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale cs.LG · 2026-05-11 · unverdicted · none · ref 64 · 2 links
LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models cs.LG · 2026-04-16 · unverdicted · none · ref 29
StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.
torchtune: PyTorch native post-training library cs.LG · 2026-05-20 · unverdicted · none · ref 62
torchtune is a modular PyTorch library for LLM post-training that delivers competitive performance and memory efficiency while supporting rapid research iteration through hackable components.

Torchao: Pytorch- native training-to-serving model optimization.arXiv preprint arXiv:2507.16099

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer