hub Mixed citations

PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation

Zihan Liu, Wentao Ni, Jingwen Leng, Yu Feng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu · 2024 · arXiv 0665.364036

Mixed citation behavior. The most common role is background (57% of classified citations).

61 Pith papers cite it.


citation-role summary

background 8 · method 6

citation-polarity summary

background 8 · use method 6

representative citing papers

StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation

cs.LG · 2026-06-18 · unverdicted · novelty 8.0

StreamKL is the first fused GPU primitive for attention KL divergence that reduces memory from O(N_Q N_K) to O(1) via an online one-pass formulation and tile-wise recomputation.

Demystifying the Silence of Correctness Bugs in PyTorch Compiler

cs.SE · 2026-04-09 · conditional · novelty 8.0

First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.

Training a Predictive Coding Network on ImageNet using Equilibrium Propagation

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

A VGG10 predictive coding network is trained on ImageNet via equilibrium propagation to 13.23% top-5 error, close to the 12.2% backpropagation baseline, marking the first such demonstration at this scale.

RT-RkNN: Reverse k Nearest Neighbor Queries as a Graphics Ray Casting Problem

cs.DB · 2026-05-26 · unverdicted · novelty 7.0

Reformulates RkNN queries as graphics ray casting to leverage GPU ray-tracing cores, claiming better performance than prior methods in challenging spatial database scenarios.

AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

AIGaitor is the first claimed end-to-end on-device monocular motion-capture and deep-learning gait analysis pipeline demonstrated on consumer smartphones.

Locking Pretrained Weights via Deep Low-Rank Residual Distillation

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

DLR-Lock locks open-weight LLMs against unauthorized fine-tuning by swapping MLPs for deep low-rank residual networks that inflate backprop memory and complicate optimization, yet preserve original capabilities via module-wise distillation.

End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor

cs.LG · 2026-05-10 · conditional · novelty 7.0

An FPGA implementation of a neuromorphic auditory sensor plus graph neural network achieves 87.43% accuracy on Google Speech Commands v2 with sub-35 µs latency and 1.12 W power.

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.

VNN-LIB 2.0: Rigorous Foundations for Neural Network Verification

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

VNN-LIB 2.0 defines a network theory abstraction, formal query syntax, type system over numeric domains, and Agda-mechanized semantics to provide rigorous foundations for neural network verification independent of evolving model formats.

Sarus Suite: Cloud-native Containers for HPC

cs.DC · 2026-04-18 · unverdicted · novelty 7.0

Sarus Suite shows HPC can match production container performance using an unmodified Podman engine plus explicit system layers for scheduling, scalable images, and host integration.

Neuro-Symbolic ODE Discovery with Latent Grammar Flow

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by data fit and constraints.

A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data

q-bio.GN · 2026-03-25 · unverdicted · novelty 7.0

A large benchmark finds traditional imputation methods for scRNA-seq data generally outperform deep learning ones, but numerical recovery does not reliably improve biological downstream analyses and no method wins across all settings.

In Situ Training of Implicit Neural Compressors for Scientific Simulations via Sketch-Based Regularization

cs.LG · 2025-11-04 · unverdicted · novelty 7.0

Sketch-based regularization allows in situ training of implicit neural compressors to approximately match offline performance on 2D/3D simulation data at high compression rates.

MALOQ: Massively Accelerated Learning of Operators for Quantum Transport

cs.LG · 2026-06-27 · unverdicted · novelty 6.0

MALOQ introduces a scalable SO(2)-equivariant ML framework with custom kernels and edge-wise graph distribution for predicting large-scale quantum transport operators.

Finding Compiler-Platform Interaction Bugs in Deep Learning Pipelines via Cross-Layer Constraints

cs.SE · 2026-06-16 · unverdicted · novelty 6.0

XCheck extracts cross-layer constraints to generate test models and monitor behaviors, revealing 2,034 compiler-platform interaction bugs in three DL compilers.

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

cs.DC · 2026-06-11 · unverdicted · novelty 6.0

GF-DiT introduces elastic GPU parallelism scheduling for DiT serving via asynchronous trajectory tasks and group-free collectives, reporting up to 6.01x throughput gains over static configurations.

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

The paper constructs an SCPI dataset via LLM-based annotation and trains classifiers to detect sensitive personal information in Japanese pre-training corpora, claiming this is the first such exploration.

WHET: Welding Homomorphic Encryption to Accelerator Architectures

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

WHET applies fine-grained coefficient-to-slot transforms, plaintext compression, and modulus raising plus lightweight hardware tweaks to FHE accelerators, delivering 1.38-8.74x per-area gains and sub-millisecond CKKS bootstrapping.

Optimal Post-Training Quantization Scales and Where to Find Them

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

PiSO computes exact optimal channel-wise quantization scales for PTQ by partitioning the scale search space into intervals admitting closed-form minimizers, with extensions to group-wise quantization and error correction.

ANNS-AMP: Accelerating Approximate Nearest Neighbor Search via Adaptive Mixed-Precision Computing

cs.PF · 2026-06-05 · unverdicted · novelty 6.0

ANNS-AMP adapts distance-computation precision to vector-space regions via a lightweight cluster-level predictor and a bit-serial accelerator, delivering 163.76x/10.57x/2.06x average speedups and 1100x/39.41x/6.66x energy reductions versus CPU/GPU/custom baselines with <2.7% accuracy loss.

KForge: LLM-Driven Cross-Platform Kernel Generation for AI Accelerators

cs.LG · 2026-06-01 · unverdicted · novelty 6.0

KForge uses dual LLM agents for cross-platform kernel generation, reporting 2.12% throughput gain on NVIDIA B200 vs TensorRT-LLM and 5.13x geometric mean speedup on Intel Arc B580 vs PyTorch on 37 workloads.

PINNs Failure Modes are Overfitting

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

PINN failure modes are overfitting to collocation points; regularization and double backpropagation over full residuals fix them, achieving SOTA with up to 23x fewer points on standard benchmarks.

Specular gradient methods for nonsmooth convex optimization in Euclidean spaces: a subgradient selection strategy

math.OC · 2026-05-25 · unverdicted · novelty 6.0

Introduces specular gradients and three convergent subgradient selection methods for nonsmooth convex optimization in Euclidean spaces.

Courant: a State-Adaptive Perceiver-Based Neural Surrogate with Local Support and Interpretable Field Decomposition

cs.LG · 2026-05-24 · unverdicted · novelty 6.0

Courant is a state-adaptive Perceiver encoder-processor-decoder surrogate trained with L2 loss that yields interpretable, multiscale, locally supported latent features acting as time-evolving spatial basis functions.

citing papers explorer

Showing 11 of 61 citing papers.

The Role and Relationship of Initialization and Densification in 3D Gaussian Splatting

cs.CV · 2026-03-21 · unverdicted · none · ref 1

Current densification methods in 3D Gaussian Splatting do not significantly benefit from dense initializations and perform similarly to sparse SfM-based ones.

Hyperdimensional Decoding of Spiking Neural Networks

cs.AI · 2025-11-11 · unverdicted · none · ref 52

SNN-HDC decoding delivers better accuracy, lower latency, and 1.24x-3.67x lower estimated energy than standard methods on DvsGesture and SL-Animals-DVS while detecting 100% of samples from an untrained class.

A Study of Parallel Continuous Local Search

cs.AI · 2026-06-04 · unverdicted · none · ref 8

Empirical study of parallel continuous local search for SAT finds redundant constraints can slow convergence, CLS works as a hybrid sub-solver, and search stabilizes quickly due to saddle-dense objectives.

Viability of Tensor Train Methods for Geophysical Fluid Dynamics

physics.flu-dyn · 2026-05-18 · unverdicted · none · ref 30

Tensor train methods compress and accelerate simple GFD flows but struggle to represent complex realistic states in shallow water equation tests.

Can Muon Fine-tune Adam-Pretrained Models?

cs.LG · 2026-05-11 · unverdicted · none · ref 89

Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

Evaluating Artificial Intelligence Algorithms for the Standardization of Transtibial Prosthetic Socket Shape Design

cs.LG · 2025-04-30 · unverdicted · none · ref 22

Random forest predicting prosthetist adaptations from limb scans achieves median surface-to-surface error of 1.24 mm, outperforming direct socket shape prediction and other models.

An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience

cs.DC · 2026-04-14 · unverdicted · none · ref 19

Apertus, a 70B open multilingual foundation model, was pre-trained on the Alps supercomputer, with details on adapting HPC infrastructure into a resilient ML platform.

Benchmarking Quantum Red TEA on CPUs, GPUs, and TPUs

quant-ph · 2024-09-05 · unverdicted · none · ref 1

Benchmarking of variational tensor network ground-state searches reports 34x CPU speedup via parameter tuning and an additional 2.76x gain when moving to GPUs.

Quantum-inspired tensor networks in machine learning models

cs.LG · 2026-04-15 · unverdicted · none · ref 5

Tensor networks developed for quantum states are reviewed as tools for machine learning models, with assessment of their potential computational, explanatory, and privacy advantages alongside remaining challenges.

GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2

cs.PL · 2025-09-17 · unreviewed · ref 2

Synthetic Homes: A Multimodal Generative AI Pipeline for Residential Building Data Generation under Data Scarcity

cs.AI · 2025-09-11 · unreviewed · ref 31
