Deep residual learning for image recognition

Jian Sun, Kaiming He, Shaoqing Ren, Xiangyu Zhang · 2016 · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) · DOI 10.1109/cvpr.2016.90

Mixed citation behavior. The most common role is method (46% of classified citations).

193 Pith papers cite it

164.2k external citations (Crossref)

citation-role summary

method 18 · background 14 · baseline 2 · dataset 1

citation-polarity summary

use method 16 · background 14 · baseline 2 · unclear 2 · use dataset 1
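
As a rough cross-check, the counts in the two summaries can be converted into shares. The sketch below is illustrative only: the variable and function names are hypothetical, and it assumes each percentage is taken over the 35 citations listed in a summary. Under that assumption, the 46% figure quoted for "method" matches the 16 "use method" contexts in the polarity summary (16/35), whereas the 18 method-role contexts would give about 51%.

```python
# Counts copied from the two summaries; all names here are hypothetical.
role_counts = {"method": 18, "background": 14, "baseline": 2, "dataset": 1}
polarity_counts = {
    "use method": 16,
    "background": 14,
    "baseline": 2,
    "unclear": 2,
    "use dataset": 1,
}


def shares(counts):
    """Return each label's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


role_shares = shares(role_counts)
polarity_shares = shares(polarity_counts)

# Both summaries cover the same number of classified citations.
print(sum(role_counts.values()), sum(polarity_counts.values()))  # 35 35
print(role_shares["method"], polarity_shares["use method"])      # 51 46
```

Whether the page actually derives its headline percentage from the polarity counts is not stated; the match is only suggestive.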

claims ledger

method These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod
method Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological challenges. Convolutional neural networks [16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machi
method Instead of binary classification, our model classifies into four states (LL, L, H, HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction. The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea
dataset historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II: DATASET CONFIGURATIONS (Dataset | Max Length | Description): MLVU [19] | 703s | multi-task long video; LongVideoBench [20] | 468s | long-term multi-modal video; VideoMME [21] | 1,018s | full-spectrum multi-modal video; RVS
background Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities
method histopathological images [2], [4], [5], [6]. CNNs have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-
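
Several of the excerpts above lean on the residual-learning idea, in which a block adds its input back to the output of its layers. A minimal sketch of that idea follows. It is a deliberate simplification, not the cited paper's architecture: it uses small fully connected layers in plain Python instead of convolutions, omits batch normalization, and all function names are hypothetical.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Compute relu(x + W2 * relu(W1 * x)) with an identity shortcut."""
    hidden = relu(matvec(w1, x))
    residual = matvec(w2, hidden)
    # The shortcut adds the unchanged input to the learned residual.
    return relu([xi + ri for xi, ri in zip(x, residual)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights the residual branch contributes nothing,
# so the block passes the (non-negative) input straight through.
identity_out = residual_block(x, zeros, zeros)
print(identity_out)  # [1.0, 2.0, 3.0]
```

The zero-weight case illustrates why the shortcut is helpful: a block can fall back to the identity mapping without having to learn it through its weight layers.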

authors

Jian Sun, Kaiming He, Shaoqing Ren, Xiangyu Zhang

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

representative citing papers

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes

cs.CL · 2026-06-01 · conditional · novelty 8.0

FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

HASTE: A Framework for Training-Free, Dynamic, and Steerable Compression of Pre-Trained Convolutional Neural Networks

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

HASTE enables training-free dynamic compression of pre-trained CNNs by patch-wise LSH-based merging of redundant channels, reporting 46.2% FLOPs reduction on ResNet34 CIFAR-10 with 1.25% accuracy drop.

Event-based Gaze Control System for Accurate Real-time Spin Estimation in Professional Ball Games

cs.CV · 2026-06-25 · unverdicted · novelty 7.0 · 2 refs

An event-camera system with active gaze control and contrast-maximization spin estimation achieves real-time performance in table tennis with 8.8% magnitude error, 6.4° axis error, 3 ms latency, and 750 Hz throughput.

MATCH: Flow Matching for Multi-View Anomaly Detection

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.

Multi-channel Optical Vision Model

physics.optics · 2026-06-08 · unverdicted · novelty 7.0

Spatial multiplexing in optical neural networks is repurposed as a trainable representational coordinate, demonstrated in multi-layer architectures for image classification, regression, and hybrid vision-language captioning with over one million optical phase parameters.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework

astro-ph.EP · 2026-05-28 · conditional · novelty 7.0

DELOS applies contrastive learning to phase-folded light curves to detect shallow intermediate-to-long period transits, reporting 15.5% and 11.25% gains in combined precision-recall over BLS and TLS in low-SNR tests plus 3-80x speedups.

SDM: A Powerful Tool for Evaluating Model Robustness

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.

Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

Navigating Potholes with Geometry-Aware Sharpness Minimization

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.

MorphoHELM: A Comprehensive Benchmark for Evaluating Representations for Microscopy-Based Morphology Assays

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

MorphoHELM is a new benchmark for Cell Painting morphology representations that tests methods across increasing batch effect levels and finds classic computer vision strategies remain the strongest general-purpose performers.

VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

VCR learns valid contextual representations for incomplete wearable signals via orthogonal disentanglement and missing-aware mixture-of-experts, improving robustness across full and missing-modality settings.

Martingale-Consistent Self-Supervised Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

The paper develops a martingale-consistent SSL framework enforcing expected coherence between coarse and refined predictions via new objectives and a Monte Carlo estimator, improving robustness under partial observations.

Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.

GPROF-IR: An Improved Single-Channel Infrared Precipitation Retrieval for Merged Satellite Precipitation Products

physics.ao-ph · 2026-05-08 · unverdicted · novelty 7.0

GPROF-IR is a CNN-based retrieval that uses temporal context in geostationary IR observations to produce precipitation estimates with lower error than prior IR methods and climatological consistency with PMW retrievals for integration into IMERG V08.

Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

The paper introduces the VODA setting for domain adaptation from scratch using vision-language models and presents TS-DRD, which achieves competitive performance on standard benchmarks without source models.

GEODE: Angle-Adaptive OOD Detection with Universal Scorer Compatibility

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

GEODE uses per-sample cosine-similarity scaling in a norm loss to preserve feature geometry for universal scorer-compatible OOD detection, matching or exceeding OE performance on CIFAR benchmarks.

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.

Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning

cs.CV · 2026-04-23 · accept · novelty 7.0

Trust-SSL introduces additive-residual trust weights in SSL to selectively handle corruptions in aerial imagery, yielding higher linear-probe accuracy and larger gains under severe degradations than SimCLR or VICReg.

FRTSearch: Unified Detection and Parameter Inference of Fast Radio Transients using Instance Segmentation

astro-ph.IM · 2026-04-14 · unverdicted · novelty 7.0

FRTSearch reframes fast radio transient detection as instance segmentation on dynamic spectra and uses the segmented shapes to infer dispersion measure and time of arrival, achieving 98% recall with over 99.9% fewer false positives than traditional methods.

CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction

cs.AR · 2026-04-13 · accept · novelty 7.0

CapBench is a new multi-PDK dataset of post-layout 3D windows with high-fidelity capacitance labels and multiple ML-ready representations, plus baseline results showing CNN accuracy versus GNN speed trade-offs.

citing papers explorer

Showing 50 of 193 citing papers.

WHET: Welding Homomorphic Encryption to Accelerator Architectures · cs.CR · 2026-06-10 · unverdicted · none · ref 48
WHET applies fine-grained coefficient-to-slot transforms, plaintext compression, and modulus raising plus lightweight hardware tweaks to FHE accelerators, delivering 1.38-8.74x per-area gains and sub-millisecond CKKS bootstrapping.
A Comprehensive Inference-Time Augmentation Framework in Physiological Signals: Application to PPG-Based AF Detection · cs.LG · 2026-06-09 · unverdicted · none · ref 32 · 2 links
A unified inference-time augmentation framework with 13 methods and Bayesian-optimized parameters improves AUROC up to 8.5% and reduces false positives in PPG-based AF detection across five datasets.
Mechanical Field Networks: Structured Neural Dynamics for Multivariate Systems · cs.LG · 2026-06-08 · unverdicted · none · ref 10
MF-Net learns a shared field state and mechanical transition rule from trajectories to deliver competitive forecasting and recoverable relation matrices on Lorenz-96 and real systems.
DN-Hypo-Pipeline: An AI-Driven Workflow for Generating Hypotheses using Large Language Models and Scientific Explanations · cs.AI · 2026-06-07 · unverdicted · none · ref 53
DN-Hypo-Pipeline operationalizes three philosophy-of-science accounts to direct LLMs toward principle-based hypothesis generation, claims superior performance over direct prompting, and derives two new transformer algorithms from the resulting hypotheses.
Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation · cs.LG · 2026-06-06 · unverdicted · none · ref 18
FLaG is a frequency-domain module using FFT, latent queries, and gating that improves token aggregation and shows gains on ESM2 AMP prediction and CIFAR-100 image classification while staying competitive on text tasks.
Neutrino Fingerprints: Image-Based Encodings of IceCube Events for CNN Direction Reconstruction · astro-ph.IM · 2026-06-01 · unverdicted · none · ref 13
IceCube events are encoded as 72x72x3 images and processed by ResNet18 to reach 1.10 rad mean angular error in neutrino direction reconstruction.
Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification · cs.SD · 2026-06-01 · unverdicted · none · ref 41
A parameter-efficient dual-encoder model with differentiable Choquet integral fusion improves underwater acoustic classification accuracy over single-encoder baselines on DeepShip and ShipsEar datasets.
Multimodal Action Diffusion for Robust End-to-End Autonomous Driving · cs.CV · 2026-06-01 · unverdicted · none · ref 17
Action Diffusion Transformer generates multimodal driving actions via diffusion and nearest-neighbor selection, claiming SOTA on Bench2Drive with 10x lower latency.
Deep Psychovisual Image Representations · cs.CV · 2026-05-28 · unverdicted · none · ref 3
Proposes a psychovisual-inspired deep learning method that encodes images in learned frequency sub-bands for interpretable semantic structures and reduced depth dependence.
Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images · q-bio.NC · 2026-05-27 · unverdicted · none · ref 11
Backpropagated gradients from vision models predict higher visual cortex signals but diverge from brain hierarchies in spatial and temporal organization.
SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training · cs.LG · 2026-05-26 · unverdicted · none · ref 1
SparseOpt is a new optimizer that counters batch normalization's gradient skew in dynamic sparse training, yielding faster convergence and better accuracy on ResNet models for CIFAR-100 and ImageNet.
Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation · cs.LG · 2026-05-21 · unverdicted · none · ref 5
IRNO augments neural operators with learned fixed-point iterative refinement modules and a progressive spectral loss, achieving up to 56% error reduction on turbulent flow and large drops in high-frequency normalized errors on active matter.
ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models · cs.LG · 2026-05-21 · unverdicted · none · ref 17 · 3 links
ARC-STAR reduces velocity rollout error by at least 36x over raw Poseidon across all tested regime cells via auditable global and local correction stages on five flow benchmarks.
Ultra-High-Definition Image Quality Assessment via Graph Representation Learning · cs.CV · 2026-05-21 · unverdicted · none · ref 46
UHD-GCN-BIQA models structural dependencies among sampled patches via a hybrid kNN graph and residual graph convolutions to achieve competitive PLCC and SRCC with the lowest RMSE on the UHD-IQA benchmark for blind ultra-high-definition image quality assessment.
Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference · astro-ph.IM · 2026-05-21 · unverdicted · none · ref 11 · 3 links
Two-stage LLM framework infers stellar parameters and ~20 elemental abundances from spectra, showing performance gains with increasing data volume.
VBT-MPC: Vision-Based Tactile MPC for Contour Following · cs.RO · 2026-05-19 · unverdicted · none · ref 21
VBT-MPC performs robotic contour following by running MPC directly in vision-based tactile contour feature space and is tested on varied geometries in simulation and real experiments.
Chessformer: A Unified Architecture for Chess Modeling · cs.LG · 2026-05-18 · unverdicted · none · ref 7
Chessformer is a unified encoder-only transformer for chess that uses square tokens, geometric attention bias, and an attention-based policy head to set new records in human move prediction accuracy, playing strength, and interpretability.
Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis · eess.SP · 2026-05-16 · unverdicted · none · ref 270
Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex · cs.CV · 2026-05-15 · unverdicted · none · ref 58
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.
Successive convex optimization for transformer encoder model predictive control · math.OC · 2026-05-14 · unverdicted · none · ref 5
A successive convex programming framework embeds transformer encoders into MPC by deriving DC representations of attention, guaranteeing recursive feasibility and convergence to local optima under mild assumptions.
Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization · math.OC · 2026-05-13 · unverdicted · none · ref 19
Adam-SHANG is a convergent Adam variant for stochastic smooth convex optimization that uses a stable lagged-preconditioner update and a computable trace-ratio stepsize rule.
OUIDecay: Adaptive Layer-wise Weight Decay for CNNs Using Online Activation Patterns · cs.LG · 2026-05-11 · unverdicted · none · ref 6
OUIDecay adaptively rescales layer-wise weight decay in CNNs using an online activation-based Overfitting-Underfitting Indicator and outperforms fixed decay in 7 of 8 tested settings.
Hybrid Machine Learning and Physical Modeling of Feedstock Deformation During Robotic 3D Printing of Continuous Fiber Thermoplastic Composites · cs.CE · 2026-05-04 · unverdicted · none · ref 33
A hybrid Kelvin-Voigt viscoelastic and stabilized neural ODE model, identified from DMA and DSC experiments, predicts composite prepreg deformation in robotic 3D printing and generalizes beyond training temperatures.
Gradient-Discrepancy Acquisition for Pool-Based Active Learning · cs.LG · 2026-05-04 · unverdicted · none · ref 6
Introduces a gradient-discrepancy acquisition criterion for active learning, derived from the generalization bound of Luo et al. (2022).
MooD: Perception-Enhanced Efficient Affective Image Editing via Continuous Valence-Arousal Modeling · cs.CV · 2026-05-04 · unverdicted · none · ref 8
MooD introduces continuous valence-arousal modeling with VA-aware retrieval and perception-enhanced guidance for efficient, controllable affective image editing, plus a new AffectSet dataset.
Model Merging: Foundations and Algorithms · cs.LG · 2026-05-02 · unverdicted · none · ref 72
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
PrismAgent: Illuminating Harm in Memes via a Zero-Shot Interpretable Multi-Agent Framework · cs.LG · 2026-05-01 · unverdicted · none · ref 14
PrismAgent deploys four specialized LLM agents in sequence to analyze meme intent, gather context, make preliminary judgments, and deliver a final harm verdict, outperforming prior zero-shot methods on three public datasets.
Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts · cs.SE · 2026-04-25 · unverdicted · none · ref 30
A broad empirical benchmark shows how 15 existing test selection metrics perform for fault detection, performance estimation, and retraining under corrupted, adversarial, temporal, natural, and label shifts across image, text, and Android data.
CAHAL: Clinically Applicable resolution enHAncement for Low-resolution MRI scans · cs.CV · 2026-04-20 · unverdicted · none · ref 103
CAHAL introduces a physics-informed mixture-of-experts super-resolution network for clinical MRI that conditions on resolution and anisotropy and uses edge-penalised, Fourier, and segmentation-guided losses to reduce hallucinations compared with prior generative methods.
Physics-Informed Tracking (PIT) · cs.CV · 2026-04-18 · unverdicted · none · ref 2
PIT uses a neural autoencoder with a differentiable physics module and a new Physics-Informed Landmark Loss to track single particles in video, achieving sub-pixel accuracy in supervised and unsupervised modes.
Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition · cs.AR · 2026-04-17 · unverdicted · none · ref 15
A co-design framework using approximate matrix decomposition and genetic algorithms delivers 33% average latency reduction in TinyML CNN FPGA accelerators with 1.3% average accuracy loss versus standard systolic arrays.
Generative Modeling of Complex-Valued Brain MRI Data · eess.IV · 2026-04-16 · unverdicted · none · ref 20
A cVAE plus flow-matching model generates realistic complex-valued brain MRI that preserves phase coherence above 0.997 and yields synthetic data that trains abnormality classifiers to 0.880 AUROC, beating the 0.842 real-data baseline on fastMRI.
Preventing Latent Rehearsal Decay in Online Continual SSL with SOLAR · cs.LG · 2026-04-12 · unverdicted · none · ref 23
SOLAR prevents latent rehearsal decay in online continual SSL by adaptively managing replay buffers with deviation proxies and an explicit overlap loss, delivering both fast convergence and state-of-the-art final accuracy on vision benchmarks.
LSST Strong Lensing Systems Dark Matter Sensitivity Analysis with Neural Ratio Estimators · astro-ph.CO · 2026-04-08 · conditional · none · ref 16
Simulations indicate that 2500 LSST strong lenses can exclude 74% and 36% of the prior volume on halo mass function parameters at 3σ and 5σ, with sensitivity from both high- and low-mass halos plus line-of-sight contributions.
Extraction of linearized models from pre-trained networks via knowledge distillation · cs.LG · 2026-04-08 · unverdicted · none · ref 35
Koopman theory plus knowledge distillation yields linearized models from pre-trained nets that outperform standard least-squares Koopman approximations on MNIST and Fashion-MNIST in accuracy and stability.
Variational Feature Compression for Model-Specific Representations · cs.CV · 2026-04-08 · unverdicted · none · ref 11
A variational latent bottleneck with KL regularization and a dynamic binary mask based on saliency produces model-specific features that keep high accuracy for one classifier but drop others below 2% on CIFAR-100 with over 45x suppression.
Toward Unified Fine-Grained Vehicle Classification and Automatic License Plate Recognition · cs.CV · 2026-04-07 · accept · none · ref 26
UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value of joint task use.
Physical Knot Classification Beyond Accuracy: A Benchmark and Diagnostic Study · cs.CV · 2026-03-24 · unverdicted · none · ref 24
New knot classification benchmark and topology-aware supervision methods yield small specificity gains but confirm that appearance bias remains the dominant failure mode.
SHANG++: Robust Stochastic Acceleration under Multiplicative Noise · math.OC · 2026-03-10 · unverdicted · none · ref 29
SHANG++ delivers faster convergence and stronger robustness to multiplicative noise in stochastic optimization for both convex and strongly convex problems, with explicit parameters and competitive deep-learning results.
Positive-Unlabelled Active Learning to Curate a Dataset for Orca Resident Interpretation · cs.LG · 2026-02-10 · unverdicted · none · ref 25
Curates over 900 hours of SRKW acoustic data plus other marine mammal recordings via positive-unlabeled active learning, releasing transformer classifiers that report AUROC 0.58-0.77 and species top-1 accuracy of 53.2% on held-out benchmarks.
Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD · cs.LG · 2026-01-15 · unverdicted · none · ref 39
Shuffled DP-SGD requires σ ≥ 1/√(2 ln M) or κ ≥ (1/√8)(1 - 1/√(4π ln M)) to limit adversarial advantage, preventing strong privacy and high utility simultaneously.
Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information · cs.CV · 2025-12-29 · unverdicted · none · ref 52
Holi-DETR improves fashion item detection by integrating co-occurrence probabilities, inter-item spatial arrangements, and body keypoint relationships into the DETR architecture.
HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability · cs.LG · 2025-12-08 · unverdicted · none · ref 32
HOLE applies persistent homology to latent embeddings in neural networks and uses visualizations such as cluster flow diagrams to reveal patterns of class separation, feature disentanglement, and robustness.
Closed-Form Last Layer Optimization · cs.LG · 2025-10-06 · unverdicted · none · ref 5
A method that alternates gradient steps on a neural network backbone with closed-form optimal updates to the final linear layer under squared loss, including an SGD adaptation and NTK-regime convergence analysis.
Bayesian E(3)-Equivariant Interatomic Potential with Iterative Restratification of Many-body Message Passing · cs.LG · 2025-10-03 · unverdicted · none · ref 60
Bayesian E(3)-equivariant MLPs with joint energy-force NLL loss achieve competitive accuracy while enabling uncertainty-guided active learning, OOD detection, and calibration.
Perceptual implications of automatic anonymization in pathological speech · eess.AS · 2025-05-01 · conditional · none · ref 62
Listeners detect automatic anonymization in pathological speech at 91-93% accuracy with a 30-point perceived quality drop, yet clinical severity ratings stay nearly unchanged for dysarthria, dysglossia, and dysphonia.
Neural Mean-Field Games: Extending Mean-Field Game Theory with Neural Stochastic Differential Equations · cs.LG · 2025-04-17 · unverdicted · none · ref 46
Neural mean-field games integrate mean-field game theory with neural SDEs to learn strategic interactions from data in a model-free way, demonstrated on games and viral dynamics.
A Generalist Model for Diverse Text-Guided Medical Image Synthesis · cs.CV · 2024-05-16 · unverdicted · none · ref 69
MediSyn is a generalist latent diffusion model that synthesizes text-guided medical images across multiple specialties and modalities from public data and improves downstream classifiers in low-data settings.
Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition · cs.LG · 2024-04-03 · unverdicted · none · ref 84
Introduces Generative Privacy Funnel (GenPF) and deep variational PF (DVPF) models that extend the privacy funnel to generative settings and provide a controllable privacy-utility trade-off with reduced sensitive attribute leakage in face recognition.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations · cs.CL · 2023-05-23 · conditional · none · ref 63
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.