PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation

· 2024 · arXiv 0665.364036

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Demystifying the Silence of Correctness Bugs in PyTorch Compiler

cs.SE · 2026-04-09 · conditional · novelty 8.0

First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.

Locking Pretrained Weights via Deep Low-Rank Residual Distillation

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

DLR-Lock locks open-weight LLMs against unauthorized fine-tuning by swapping MLPs for deep low-rank residual networks that inflate backprop memory and complicate optimization, yet preserve original capabilities via module-wise distillation.

When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

cs.AI · 2026-05-11 · conditional · novelty 7.0

State-conditioned commitment depth in a vision-language policy Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, raising solve rates by up to 12.5 points while using 25% fewer actions and beating larger models.

End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor

cs.LG · 2026-05-10 · conditional · novelty 7.0

An FPGA implementation of a neuromorphic auditory sensor plus graph neural network achieves 87.43% accuracy on Google Speech Commands v2 with sub-35 µs latency and 1.12 W power.

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.

VNN-LIB 2.0: Rigorous Foundations for Neural Network Verification

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

VNN-LIB 2.0 defines a network theory abstraction, formal query syntax, type system over numeric domains, and Agda-mechanized semantics to provide rigorous foundations for neural network verification independent of evolving model formats.

Sarus Suite: Cloud-native Containers for HPC

cs.DC · 2026-04-18 · unverdicted · novelty 7.0

Sarus Suite shows HPC can match production container performance using an unmodified Podman engine plus explicit system layers for scheduling, scalable images, and host integration.

Neuro-Symbolic ODE Discovery with Latent Grammar Flow

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by data fit and constraints.

Learning Minimally Rigid Graphs with High Realization Counts

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Reinforcement learning with graph neural networks finds minimally rigid graphs that match known planar realization optima and set new records for spherical realization counts.

ShardTensor: Domain Parallelism for Scientific Machine Learning

cs.DC · 2026-05-11 · unverdicted · novelty 6.0

ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

LoKA enables practical FP8 use in numerically sensitive large recommendation models via profiling, model adaptations, and runtime kernel orchestration.

Doubly Robust Proxy Causal Learning with Neural Mean Embeddings

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

A neural doubly robust proxy causal learning framework using mean embeddings for treatment bridges provides consistent estimators for causal dose-response functions under unobserved confounding for continuous and structured treatments.

ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

ExecuTorch is a unified PyTorch-native deployment framework that enables seamless on-device execution of AI models across heterogeneous hardware while preserving original PyTorch semantics.

TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.

Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models

cs.AR · 2026-04-04 · unverdicted · novelty 6.0

Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.

The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations

cs.MA · 2026-04-30 · unverdicted · novelty 5.0

The base LLM choice dominates simulation outcomes in LLM-based social networks, while other design parameters show either additive or complex interactive effects.

Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML

cs.DC · 2026-04-19 · unverdicted · novelty 5.0

Flint generates compiler-derived workload graphs that support cluster-free design space exploration for distributed machine learning systems.

Can Muon Fine-tune Adam-Pretrained Models?

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience

cs.DC · 2026-04-14 · unverdicted · novelty 3.0

Apertus, a 70B open multilingual foundation model, was pre-trained on the Alps supercomputer, with details on adapting HPC infrastructure into a resilient ML platform.

Quantum-inspired tensor networks in machine learning models

cs.LG · 2026-04-15 · unverdicted · novelty 2.0

Tensor networks developed for quantum states are reviewed as tools for machine learning models, with assessment of their potential computational, explanatory, and privacy advantages alongside remaining challenges.

citing papers explorer

Showing 20 of 20 citing papers.

Demystifying the Silence of Correctness Bugs in PyTorch Compiler · cs.SE · 2026-04-09 · conditional · none · ref 1
Locking Pretrained Weights via Deep Low-Rank Residual Distillation · cs.LG · 2026-05-11 · unverdicted · none · ref 1
When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning · cs.AI · 2026-05-11 · conditional · none · ref 3
End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor · cs.LG · 2026-05-10 · conditional · none · ref 2
Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval · cs.CV · 2026-05-08 · unverdicted · none · ref 40
VNN-LIB 2.0: Rigorous Foundations for Neural Network Verification · cs.LG · 2026-05-08 · unverdicted · partial · ref 1
Sarus Suite: Cloud-native Containers for HPC · cs.DC · 2026-04-18 · unverdicted · none · ref 2
Neuro-Symbolic ODE Discovery with Latent Grammar Flow · cs.LG · 2026-04-17 · unverdicted · none · ref 36
Learning Minimally Rigid Graphs with High Realization Counts · cs.LG · 2026-05-12 · unverdicted · none · ref 57
ShardTensor: Domain Parallelism for Scientific Machine Learning · cs.DC · 2026-05-11 · unverdicted · none · ref 67
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale · cs.LG · 2026-05-11 · unverdicted · none · ref 6
Doubly Robust Proxy Causal Learning with Neural Mean Embeddings · cs.LG · 2026-05-10 · unverdicted · none · ref 51
ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device · cs.LG · 2026-05-05 · unverdicted · none · ref 9
TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches · cs.CV · 2026-04-10 · unverdicted · none · ref 1
Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models · cs.AR · 2026-04-04 · unverdicted · none · ref 53
The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations · cs.MA · 2026-04-30 · unverdicted · none · ref 1
Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML · cs.DC · 2026-04-19 · unverdicted · none · ref 2
Can Muon Fine-tune Adam-Pretrained Models? · cs.LG · 2026-05-11 · unverdicted · none · ref 89
An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience · cs.DC · 2026-04-14 · unverdicted · none · ref 19
Quantum-inspired tensor networks in machine learning models · cs.LG · 2026-04-15 · unverdicted · none · ref 5
