TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
citation dossier
preprint, arXiv:1803.05407
why this work matters in Pith
Pith has found this work cited by 20 reviewed papers. Its strongest current cluster is cs.LG (6 papers). The largest review-status bucket among citing papers is UNVERDICTED (19 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
Post-processing via random selection or linear combination generates differentially private models for arbitrary privacy parameters from pre-trained models on the same dataset.
Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
A neural decoder for quantum LDPC codes achieves a ~10^-10 logical error rate at 0.1% physical error rate with a 17x improvement and high throughput, enabling practical fault tolerance at modest code sizes.
The Spatial Adapter equips frozen predictors with a spatially regularized orthonormal basis for residuals and derives a closed-form low-rank-plus-noise covariance for spatial prediction and kriging.
TopoGeoScore combines a torsion-inspired Laplacian log-determinant, Ollivier-Ricci curvature, and higher-order topological summaries from source embeddings, with weights learned via self-supervised invariance to geometry-preserving views, to rank checkpoints by expected OOD robustness.
MELD is a multi-task AI-text detector using auxiliary heads, uncertainty-weighted losses, EMA distillation, and pairwise ranking that reaches 99.9% TPR at 1% FPR on a new held-out benchmark while remaining competitive on the RAID leaderboard.
CPCANet deep-unfolds Common PCA to learn domain-invariant subspaces, achieving state-of-the-art zero-shot domain generalization on standard benchmarks.
Perturb-and-Correct generates epistemically diverse predictors from a single pretrained network via hidden-layer perturbations followed by affine least-squares corrections that enforce agreement on calibration data.
Graph neural networks can identify and remove unwanted beam background depositions in the Belle II calorimeter to improve hadronic clustering and reduce fake photon clusters.
The FastAT Benchmark standardizes evaluation of over twenty fast adversarial training methods under unified conditions, showing that well-designed single-step approaches can match or exceed PGD-AT robustness at lower training cost on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
Training at the edge of stability causes neural network optimizers to converge on fractal attractors whose effective dimension, measured via a new sharpness dimension from the Hessian spectrum, bounds generalization error in a way not captured by prior trace or norm measures.
The Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
CrossAbSense oracles using frozen PLM encoders plus self- or cross-attention decoders improve prediction accuracy by 12-20% on three of five developability assays for therapeutic IgGs, with architecture choices revealing that aggregation depends on single-chain signals while stability requires heavy
MOMO merges sensor-specific models from three Mars orbital instruments at matched validation loss stages to form a foundation model that outperforms ImageNet, Earth observation, sensor-specific, and supervised baselines on nine Mars-Bench tasks.
The paper proposes Trajectory Regularized Merging (TRM) to enable storage-free model merging in continual learning by optimizing in an augmented trajectory subspace with task alignment, prediction consistency, and gradient responsiveness objectives, claiming state-of-the-art results.
A new neural network stabilizes features for rare chest X-ray diseases via momentum anchoring and multi-scale fusion on EfficientNet, achieving 0.8682 AUC on ChestX-ray14.
Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.
LLMs struggle with abstract meaning comprehension on SemEval-2021 Task 4 more than fine-tuned models, and a new bidirectional attention classifier yields small accuracy gains of 3-4%.
citing papers explorer
- A foundation model of vision, audition, and language for in-silico neuroscience
  TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
- Differentially Private Model Merging
  Post-processing via random selection or linear combination generates differentially private models for arbitrary privacy parameters from pre-trained models on the same dataset.
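The mechanism lends itself to a compact illustration. A minimal sketch, assuming a pool of models that were each trained with differential privacy on the same dataset: by the post-processing property of DP, any function of their parameters, including a random convex combination, inherits the guarantee. The function name and the Dirichlet mixing choice are illustrative, not the paper's exact procedure.

```python
import numpy as np

def merge_dp_models(weight_vectors, rng=None):
    """Randomly combine pre-trained DP model weights.

    Assumes each entry of weight_vectors came from a DP training run;
    post-processing (here, a random convex combination) preserves DP.
    """
    if rng is None:
        rng = np.random.default_rng()
    alphas = rng.dirichlet(np.ones(len(weight_vectors)))  # random mixing weights
    return sum(a * w for a, w in zip(alphas, weight_vectors))
```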
- Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
  Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
- Scalable Neural Decoders for Practical Fault-Tolerant Quantum Computation
  A neural decoder for quantum LDPC codes achieves a ~10^-10 logical error rate at 0.1% physical error rate with a 17x improvement and high throughput, enabling practical fault tolerance at modest code sizes.
- Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors
  The Spatial Adapter equips frozen predictors with a spatially regularized orthonormal basis for residuals and derives a closed-form low-rank-plus-noise covariance for spatial prediction and kriging.
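The closed-form covariance is easy to state. A sketch under stated assumptions: B is an n x k basis with orthonormal columns over n spatial locations, lam holds factor variances, and sigma2 is the noise variance; all names are hypothetical, and the paper's estimator may differ in detail.

```python
import numpy as np

def lowrank_plus_noise_cov(B, lam, sigma2):
    """Sigma = B diag(lam) B^T + sigma2 * I, a low-rank-plus-noise covariance."""
    n = B.shape[0]
    return B @ np.diag(lam) @ B.T + sigma2 * np.eye(n)

def simple_kriging_predict(Sigma, c, residuals):
    """Simple-kriging prediction c^T Sigma^{-1} r at a new location,
    where c is the cross-covariance to the observed locations."""
    return c @ np.linalg.solve(Sigma, residuals)
```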
- TopoGeoScore: A Self-Supervised Source-Only Geometric Framework for OOD Checkpoint Selection
  TopoGeoScore combines a torsion-inspired Laplacian log-determinant, Ollivier-Ricci curvature, and higher-order topological summaries from source embeddings, with weights learned via self-supervised invariance to geometry-preserving views, to rank checkpoints by expected OOD robustness.
- MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
  MELD is a multi-task AI-text detector using auxiliary heads, uncertainty-weighted losses, EMA distillation, and pairwise ranking that reaches 99.9% TPR at 1% FPR on a new held-out benchmark while remaining competitive on the RAID leaderboard.
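As one plausible reading of "uncertainty-weighted losses", here is the standard homoscedastic-uncertainty weighting of Kendall and Gal (2018), sketched in PyTorch; MELD's exact formulation may differ, and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine per-task losses with learnable log-variances s_i:
    total = sum_i exp(-s_i) * L_i + s_i  (precision weight plus regularizer)."""

    def __init__(self, num_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for loss, s in zip(task_losses, self.log_vars):
            total = total + torch.exp(-s) * loss + s
        return total
```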
- CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization
  CPCANet deep-unfolds Common PCA to learn domain-invariant subspaces, achieving state-of-the-art zero-shot domain generalization on standard benchmarks.
- Perturb and Correct: Post-Hoc Ensembles using Affine Redundancy
  Perturb-and-Correct generates epistemically diverse predictors from a single pretrained network via hidden-layer perturbations followed by affine least-squares corrections that enforce agreement on calibration data.
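A minimal numpy sketch of the perturb-then-correct recipe, assuming a frozen two-layer network f(x) = W2 @ relu(W1 @ x); all names are hypothetical, and the paper's correction may be applied per layer rather than only at the output.

```python
import numpy as np

def perturb_and_correct(W1, W2, X_cal, noise_scale=0.05, rng=None):
    """Perturb the hidden layer, then fit an affine output correction by
    least squares so the new ensemble member agrees with the original
    network on the calibration set X_cal."""
    if rng is None:
        rng = np.random.default_rng()
    relu = lambda z: np.maximum(z, 0.0)
    y_ref = (W2 @ relu(W1 @ X_cal.T)).T                  # original predictions
    W1_p = W1 + noise_scale * rng.standard_normal(W1.shape)
    H = relu(W1_p @ X_cal.T).T                           # perturbed hidden features
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])        # append bias column
    A, *_ = np.linalg.lstsq(H1, y_ref, rcond=None)       # affine least squares
    return W1_p, A  # member predicts relu(W1_p @ x) @ A[:-1] + A[-1]
```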
- Using Graph Neural Networks for hadronic clustering and to reduce beam background in the Belle II electromagnetic calorimeter
  Graph neural networks can identify and remove unwanted beam background depositions in the Belle II calorimeter to improve hadronic clustering and reduce fake photon clusters.
- FastAT Benchmark: A Comprehensive Framework for Fair Evaluation of Fast Adversarial Training Methods
  The FastAT Benchmark standardizes evaluation of over twenty fast adversarial training methods under unified conditions, showing that well-designed single-step approaches can match or exceed PGD-AT robustness at lower training cost on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
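For context, the single-step family the benchmark covers is built on updates like the following FGSM-with-random-init step (in the style of Wong et al., 2020); this is a generic sketch, not the benchmark's code.

```python
import torch
import torch.nn.functional as F

def fast_at_step(model, x, y, eps, alpha, optimizer):
    """One single-step adversarial training update (FGSM with random init).
    Input-range clipping (e.g., to [0, 1] for images) is omitted."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    F.cross_entropy(model(x + delta), y).backward()       # gradient w.r.t. delta
    with torch.no_grad():
        delta = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
    optimizer.zero_grad()                                 # drop stale param grads
    adv_loss = F.cross_entropy(model(x + delta), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```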
- Generalization at the Edge of Stability
  Training at the edge of stability causes neural network optimizers to converge on fractal attractors whose effective dimension, measured via a new sharpness dimension from the Hessian spectrum, bounds generalization error in a way not captured by prior trace or norm measures.
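The sharpness dimension itself is new to that paper, but its raw ingredient, leading Hessian eigenvalues of the training loss, can be estimated with Hessian-vector products and power iteration; a generic sketch:

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    via power iteration on Hessian-vector products."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = torch.tensor(0.0)
    for _ in range(iters):
        norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
        v = [vi / norm for vi in v]
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))      # g . v
        hv = torch.autograd.grad(gv, params, retain_graph=True)  # H v
        eig = sum((h * vi).sum() for h, vi in zip(hv, v))        # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig.item()
```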
- Benchmarking Optimizers for MLPs in Tabular Deep Learning
  The Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
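Muon's distinctive step is orthogonalizing the momentum update before applying it. A sketch of the quintic Newton-Schulz iteration used for this, with coefficients as in the public Muon reference implementation; treat it as illustrative rather than the benchmarked code.

```python
import torch

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately map a 2D update matrix G to its nearest
    semi-orthogonal matrix via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)           # normalize so the iteration converges
    transpose = G.size(0) > G.size(1)
    if transpose:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transpose else X
```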
- Vision Transformers Need Registers
  Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
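The mechanism is simple to state in code: learnable extra tokens are appended to the patch sequence before the transformer blocks and discarded at the output. A minimal PyTorch sketch; names and dimensions are illustrative, and positional handling is assumed to live inside the encoder.

```python
import torch
import torch.nn as nn

class WithRegisters(nn.Module):
    """Wrap a sequence-to-sequence ViT trunk with learnable register tokens."""

    def __init__(self, encoder, dim, num_registers=4):
        super().__init__()
        self.encoder = encoder
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))

    def forward(self, patch_tokens):  # (batch, num_patches, dim)
        reg = self.registers.expand(patch_tokens.size(0), -1, -1)
        x = torch.cat([patch_tokens, reg], dim=1)    # append registers
        x = self.encoder(x)
        return x[:, : patch_tokens.size(1)]          # drop register outputs
```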
- Biologically-Grounded Multi-Encoder Architectures as Developability Oracles for Antibody Design
  CrossAbSense oracles using frozen PLM encoders plus self- or cross-attention decoders improve prediction accuracy by 12-20% on three of five developability assays for therapeutic IgGs, with architecture choices revealing that aggregation depends on single-chain signals while stability requires heavy
- MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications
  MOMO merges sensor-specific models from three Mars orbital instruments at matched validation loss stages to form a foundation model that outperforms ImageNet, Earth observation, sensor-specific, and supervised baselines on nine Mars-Bench tasks.
- Revitalizing the Beginning: Avoiding Storage Dependency for Model Merging in Continual Learning
  The paper proposes Trajectory Regularized Merging (TRM) to enable storage-free model merging in continual learning by optimizing in an augmented trajectory subspace with task alignment, prediction consistency, and gradient responsiveness objectives, claiming state-of-the-art results.
- Momentum-Anchored Multi-Scale Fusion Model for Long-Tailed Chest X-Ray Classification
  A new neural network stabilizes features for rare chest X-ray diseases via momentum anchoring and multi-scale fusion on EfficientNet, achieving 0.8682 AUC on ChestX-ray14.
- Phoenix-VL 1.5 Medium Technical Report
  Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.
- LLMs Struggle with Abstract Meaning Comprehension More Than Expected
  LLMs struggle with abstract meaning comprehension on SemEval-2021 Task 4 more than fine-tuned models, and a new bidirectional attention classifier yields small accuracy gains of 3-4%.