Deep residual learning for image recognition

Jian Sun, Kaiming He, Shaoqing Ren, Xiangyu Zhang · 2016 · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) · DOI 10.1109/cvpr.2016.90

Mixed citation behavior. Most common role is method (46%).

193 Pith papers citing it

164.2k external citations · Crossref

citation-role summary

method 18 · background 14 · baseline 2 · dataset 1

citation-polarity summary

use method 16 · background 14 · baseline 2 · unclear 2 · use dataset 1

claims ledger

method These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod
method Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological challenges. Convolutional neural networks [16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machi
method Instead of binary classification, our model classifies into four states (LL, L, H, HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction. The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea
dataset historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II. DATASET CONFIGURATIONS (Dataset, Max Length, Description): MLVU [19], 703s, multi-task long video; LongVideoBench [20], 468s, long-term multi-modal video; VideoMME [21], 1,018s, full-spectrum multi-modal video; RVS
background Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities
method histopathological images [2], [4], [5], [6]. CNNs have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-

authors

Jian Sun Kaiming He Shaoqing Ren Xiangyu Zhang

representative citing papers

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes

cs.CL · 2026-06-01 · conditional · novelty 8.0

FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.

Traces of Helium Detected in Type Ic Supernova 2014L

astro-ph.HE · 2026-03-31 · accept · novelty 8.0

Quantitative Bayesian inference using a deep-learning emulator detects 0.018-0.020 M_sun of helium in the Type Ic supernova 2014L.

HASTE: A Framework for Training-Free, Dynamic, and Steerable Compression of Pre-Trained Convolutional Neural Networks

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

HASTE enables training-free dynamic compression of pre-trained CNNs by patch-wise LSH-based merging of redundant channels, reporting 46.2% FLOPs reduction on ResNet34 CIFAR-10 with 1.25% accuracy drop.

Event-based Gaze Control System for Accurate Real-time Spin Estimation in Professional Ball Games

cs.CV · 2026-06-25 · unverdicted · novelty 7.0 · 2 refs

An event-camera system with active gaze control and contrast-maximization spin estimation achieves real-time performance in table tennis with 8.8% magnitude error, 6.4° axis error, 3 ms latency, and 750 Hz throughput.

MATCH: Flow Matching for Multi-View Anomaly Detection

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.

Multi-channel Optical Vision Model

physics.optics · 2026-06-08 · unverdicted · novelty 7.0

Spatial multiplexing in optical neural networks is repurposed as a trainable representational coordinate, demonstrated in multi-layer architectures for image classification, regression, and hybrid vision-language captioning with over one million optical phase parameters.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework

astro-ph.EP · 2026-05-28 · conditional · novelty 7.0

DELOS applies contrastive learning to phase-folded light curves to detect shallow intermediate-to-long period transits, reporting 15.5% and 11.25% gains in combined precision-recall over BLS and TLS in low-SNR tests plus 3-80x speedups.

SDM: A Powerful Tool for Evaluating Model Robustness

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.

Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

Navigating Potholes with Geometry-Aware Sharpness Minimization

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.

MorphoHELM: A Comprehensive Benchmark for Evaluating Representations for Microscopy-Based Morphology Assays

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

MorphoHELM is a new benchmark for Cell Painting morphology representations that tests methods across increasing batch effect levels and finds classic computer vision strategies remain the strongest general-purpose performers.

VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

VCR learns valid contextual representations for incomplete wearable signals via orthogonal disentanglement and missing-aware mixture-of-experts, improving robustness across full and missing-modality settings.

Martingale-Consistent Self-Supervised Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

The paper develops a martingale-consistent SSL framework enforcing expected coherence between coarse and refined predictions via new objectives and a Monte Carlo estimator, improving robustness under partial observations.

Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.

GPROF-IR: An Improved Single-Channel Infrared Precipitation Retrieval for Merged Satellite Precipitation Products

physics.ao-ph · 2026-05-08 · unverdicted · novelty 7.0

GPROF-IR is a CNN-based retrieval that uses temporal context in geostationary IR observations to produce precipitation estimates with lower error than prior IR methods and climatological consistency with PMW retrievals for integration into IMERG V08.

Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

The paper introduces the VODA setting for domain adaptation from scratch using vision-language models and presents TS-DRD, which achieves competitive performance on standard benchmarks without source models.

GEODE: Angle-Adaptive OOD Detection with Universal Scorer Compatibility

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

GEODE uses per-sample cosine-similarity scaling in a norm loss to preserve feature geometry for universal scorer-compatible OOD detection, matching or exceeding OE performance on CIFAR benchmarks.

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.

Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning

cs.CV · 2026-04-23 · accept · novelty 7.0

Trust-SSL introduces additive-residual trust weights in SSL to selectively handle corruptions in aerial imagery, yielding higher linear-probe accuracy and larger gains under severe degradations than SimCLR or VICReg.

FRTSearch: Unified Detection and Parameter Inference of Fast Radio Transients using Instance Segmentation

astro-ph.IM · 2026-04-14 · unverdicted · novelty 7.0

FRTSearch reframes fast radio transient detection as instance segmentation on dynamic spectra and uses the segmented shapes to infer dispersion measure and time of arrival, achieving 98% recall with over 99.9% fewer false positives than traditional methods.

CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction

cs.AR · 2026-04-13 · accept · novelty 7.0

CapBench is a new multi-PDK dataset of post-layout 3D windows with high-fidelity capacitance labels and multiple ML-ready representations, plus baseline results showing CNN accuracy versus GNN speed trade-offs.

citing papers explorer

Showing 50 of 193 citing papers.

Multimodal Chain-of-Thought Reasoning in Language Models · cs.CL · 2023-02-02 · accept · none · ref 16
Multimodal-CoT achieves state-of-the-art on ScienceQA by using a two-stage process that incorporates vision into chain-of-thought rationale generation for models under 1 billion parameters.
Generalizing from a few environments in safety-critical reinforcement learning · cs.LG · 2019-07-02 · unverdicted · none · ref 14
RL agents fail dangerously on unseen environments; ensembles reduce catastrophes in gridworld but not CoinRun, with uncertainty enabling intervention prediction.
From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training · cs.LG · 2026-07-01 · unverdicted · none · ref 21
MTCL learns multi-scale temporal correlations in videos via contrastive learning to produce more informative representations that improve sample efficiency and performance in downstream RL tasks.
Leveraging Multimodality for Real-Time Classification of Transients and Variables found by the Zwicky Transient Facility · astro-ph.IM · 2026-06-30 · unverdicted · none · ref 37
ORACLE-2 multimodal classifiers raise macro F1 from 0.52-0.66 (light-curve only) to 0.73 on ZTF Bright Transient Survey data and reach 0.88 on simulated ELAsTiCC data.
Discovering Collaboration from Novelty: Random Network Distillation for Clustered Federated Learning · cs.LG · 2026-06-29 · unverdicted · none · ref 18
Random Network Distillation enables pre-training discovery of client clusters in federated learning via local novelty signals, supporting autonomous grouping under non-IID data without a priori cluster count.
FedLAS: Feature-Modulated Bidirectional Label Smoothing for Neural Network Calibration · cs.CV · 2026-06-26 · unverdicted · none · ref 11
FedLAS adds feature-norm based confidence detection and bidirectional gating to label smoothing losses to reduce calibration error on vision benchmarks while preserving accuracy.
Improving Adversarial Robustness via Activation Amplification and Attenuation · cs.CV · 2026-06-26 · unverdicted · none · ref 15
A3 is a learnable activation scaling module that trains on amplified adversarial signals via contrastive losses to improve robustness when the same parameters are used in attenuation mode.
Compression-Driven Anomaly Detection in Brain MRI Using an Interpretable Quantum Autoencoder · quant-ph · 2026-06-25 · unverdicted · none · ref 118
A variational quantum autoencoder detects anomalies in brain MRI by scoring resistance to compression, reporting slice-level ROC-AUC of 0.95 and outperforming classical autoencoders and PCA on public datasets.
First Results from the LSST Shadow Survey: The Restless Luminous Blue Variable AT2017des in the Virgo-Cluster Galaxy, NGC4532 · astro-ph.HE · 2026-06-22 · unverdicted · none · ref 70
The DECam Shadow Survey has detected variable LBV eruptions in AT2017des with peaks brightening by ~0.05 mag per year, reaching luminosities similar to extreme SN impostors.
Frequency-Domain Neural ODEs for Modeling Non-Linear Dynamical Systems · cs.LG · 2026-06-20 · unverdicted · none · ref 29
FNODE projects Neural ODE dynamics into the frequency domain via FFT and reports better generalization and convergence stability than GRUs, LSTMs, and ANODE on Lotka-Volterra, forced Duffing, Van der Pol, and Lorenz systems.
A Controlled Study of CLIP-Based Body-Scene Fusion for Emotion Recognition in Context · cs.CV · 2026-06-20 · unverdicted · none · ref 16
Controlled study finds CLIP-based body-scene fusion model for emotion recognition on EMOTIC is not improved by context debiasing or rare-class training, with best mAP of 34.52%.
Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow · cs.CV · 2026-06-17 · unverdicted · none · ref 7
Flow-matching TTA with histogram matching to synthetic reference trajectories and time-independent flow achieves SOTA segmentation of AMD biomarkers in OCT.
SketchXplain: Intuitive Visual Explanations of Image Classifiers with Sketches · cs.HC · 2026-06-16 · unverdicted · none · ref 131
SketchXplain produces sketch-based explanations for image classifiers that users interpret faster and more coherently than saliency maps on face expression and skin lesion tasks.
Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis · cs.CV · 2026-06-05 · unverdicted · none · ref 79
Multi-FRuGaL is a decomposition-aware gated fusion framework for multimodal cancer data that maintains performance under missing modalities and reports AUC gains on two head-and-neck cancer cohorts.
Signed Spiking Neuron Enabled by an Orthogonal-Easy-Axis Magnetic Tunnel Junction · cs.NE · 2026-06-02 · unverdicted · none · ref 19
An MTJ device with orthogonal easy axes is proposed to realize signed LIF neurons, with LLG simulations confirming the behavior and network tests showing 91.06% CIFAR-10 accuracy.
From Local Training to Large-Scale Mapping: A Comparative Assessment of Machine Learning and Deep Learning for Transferable Satellite-Derived Bathymetry · cs.CV · 2026-06-01 · unverdicted · none · ref 38
Deep CNNs with spatial continuity preservation and a new weighted loss function outperform Random Forest in cross-regional transfer for satellite-derived bathymetry, achieving low RMSE on independent tests and a public benchmark.
Pretrained Approximators for Low-Thrust Trajectory Cost and Reachability · cs.LG · 2026-05-26 · unverdicted · none · ref 65 · 2 links
Neural surrogates trained with scaling laws and self-similar transformations accurately approximate low-thrust trajectory costs and reachability while generalizing across orbital parameters.
Deep Learning-Enabled Prediction of Geoeffective CMEs Using SOHO and SDO Observations · astro-ph.SR · 2026-05-23 · unverdicted · none · ref 26
A CNN-based fusion model trained on multi-instrument solar observations predicts geoeffective CMEs, achieving mean TSS of 0.703 and Brier score of 0.095 via five-fold cross-validation.
Rethinking Federated Unlearning via the Lens of Memorization · cs.LG · 2026-05-23 · unverdicted · none · ref 20
Introduces Grouped Memorization Evaluation and FedMemPrune to remove unique memorized information in federated unlearning while preserving overlapping knowledge.
OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025 · cs.CV · 2026-05-21 · accept · none · ref 33
The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.
Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework · cs.LG · 2026-05-20 · unverdicted · none · ref 34
AWA patterns from PD pulse amplitude, width, and area enable CNNs to classify single and mixed partial discharge sources under switching voltage with over 96% test accuracy.
Position: Age Estimation Models Do Not Process Biometric Data · cs.CY · 2026-05-17 · unverdicted · none · ref 17
Empirical evaluation shows age estimation models perform orders of magnitude below identification thresholds on face verification benchmarks, indicating they do not extract identity-discriminative representations.
Soft Learning · cs.LG · 2026-05-16 · unverdicted · none · ref 10
Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.
When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing · cs.CV · 2026-05-15 · unverdicted · none · ref 16
Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.
Rethinking the Good Enough Embedding for Easy Few-Shot Learning · cs.CV · 2026-05-13 · conditional · none · ref 7
Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.
Venus-DeFakerOne: Unified Fake Image Detection & Localization · cs.CV · 2026-05-13 · unverdicted · none · ref 66
DeFakerOne is a unified foundation model for joint image-level fake image detection and pixel-level localization that reports SOTA results on 39 detection and 9 localization benchmarks.
Dynamical Predictive Modelling of Cardiovascular Disease Progression Post-Myocardial Infarction via ECG-Trained Artificial Intelligence Model · cs.LG · 2026-05-13 · unverdicted · none · ref 8
A contrastive-learning ECG foundation model with multitask heads predicts post-MI outcomes better than training from scratch (AUC 0.794 vs 0.608).
Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations · cs.AR · 2026-05-12 · unverdicted · none · ref 107 · 2 links
BMRUs enable analog recurrent neural network hardware via discrete outputs that suppress noise 20-fold, with one-to-one parameter-to-circuit mapping and linear power scaling for recurrence.
Multi-Narrow Transformation as a Single-Model Ensemble: Boundary Conditions, Mechanisms, and Failure Modes · cs.LG · 2026-05-12 · unverdicted · none · ref 9
Multi-narrow single-model ensembles outperform wide baselines in low-data image classification by learning diverse features but underperform in data-rich settings where training favors few paths.
Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction · cs.DC · 2026-05-09 · unverdicted · none · ref 13
A generative compression model using historical priors for Earth observation data achieves up to 10,000x reduction after exascale training on an Armv9 supercomputer.
LAMES: A Large-Scale and Artisanal Mining Environmental Segmentation Dataset · cs.CV · 2026-05-08 · conditional · none · ref 17
LAMES is a new annotated remote-sensing dataset covering 150 large-scale mining sites and 870 km² of artisanal mining for environmental segmentation and monitoring tasks.
A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images · cs.CV · 2026-05-08 · unverdicted · none · ref 32
A TransUNet-based segmentation followed by texture comparison classifies fatty pancreas in ultrasound with 89.7% accuracy on a small clinical dataset.
Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers · cs.CV · 2026-05-07 · unverdicted · none · ref 36
LipB-ViT adds bi-Lipschitz Bayesian layers to vision transformers and uses uncertainty-aware fusion to identify corrupted labels with over 93% recall at 15% noise, beating kNN baselines.
The autoPET3 Challenge: Automated Lesion Segmentation in Whole-Body PET/CT – Multitracer Multicenter Generalization · cs.CV · 2026-05-07 · unverdicted · none · ref 55
The autoPET3 challenge finds that leading AI models reach a mean Dice score of 0.66 for multitracer PET/CT lesion segmentation, with compositional generalization to unseen tracer-center pairs remaining an open problem driven by volume overestimation and case heterogeneity.
Cool-chic 5.0: Faster Encoding and Inter-Feature Entropy Modeling for Overfitted Image Compression · eess.IV · 2026-05-04 · unverdicted · none · ref 38
Cool-chic 5.0 delivers 11% lower rate than H.266/VVC and matches modern autoencoders like MLIC++ with 250 times lower decoding complexity through an updated decoder architecture and faster optimization for overfitted codecs.
Online Generalised Predictive Coding · stat.ML · 2026-05-04 · unverdicted · none · ref 31
Online generalised predictive coding (ODEM) tracks latent states in nonlinear and chaotic generative models by separating temporal scales for fast Bayesian belief updating and slow parameter learning.
Geometric and dynamical analysis of attractor boundaries and storage limits in kernel Hopfield networks · cs.NE · 2026-05-01 · unverdicted · none · ref 12 · 4 links
KLR Hopfield networks reach P/N storage of ~16 for random patterns and ~20 for structured data, with limits set by dynamical instability against noise rather than geometric separability per Cover's theorem.
H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading · cs.CV · 2026-04-25 · unverdicted · none · ref 20
H-SemiS decomposes multi-class KOA severity grading into binary sub-tasks in a semi-supervised setup with self-supervision and quantum-inspired mixing, outperforming baselines on two multi-class and two binary datasets.
EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads · cs.AR · 2026-04-22 · unverdicted · none · ref 45
EnergAIzer predicts module-level GPU utilization from structured kernel patterns and feeds it into a power model to estimate dynamic power with 8% error on Ampere GPUs and 7% on H100 forecasts.
Seeing Candidates at Scale: Multimodal LLMs for Visual Political Communication on Instagram · cs.CV · 2026-04-21 · unverdicted · none · ref 111
GPT-4o achieves macro F1 scores of 0.89 for politician face recognition and 0.86 for person counting in election Instagram stories, outperforming FaceNet512, RetinaFace, and Google Cloud Vision.
Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction · cs.CV · 2026-04-18 · unverdicted · none · ref 24
Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
Learning to Look before Learning to Like: Incorporating Human Visual Cognition into Aesthetic Quality Assessment · cs.CV · 2026-04-17 · unverdicted · none · ref 2
AestheticNet improves aesthetic quality assessment by fusing a gaze-aligned visual encoder pre-trained on eye-tracking data with semantic encoders via cross-attention, yielding consistent gains over semantic-only baselines.
Weak-to-Strong Knowledge Distillation Accelerates Visual Learning · cs.CV · 2026-04-16 · unverdicted · none · ref 14
Weak-to-strong knowledge distillation applied early and then turned off accelerates convergence to target performance in visual learning tasks by factors of 1.7-4.8x.
Enhancing Event Reconstruction in Hyper-Kamiokande with Machine Learning: A ResNet Implementation · hep-ex · 2026-04-15 · conditional · none · ref 17
ResNet models classify four particle types and regress vertex, direction, and momentum in Hyper-Kamiokande with resolutions matching likelihood methods but at 30,000-50,000x faster inference on GPU.
Predicting Associations between Solar Flares and Coronal Mass Ejections Using SDO/HMI Magnetograms and a Hybrid Neural Network · astro-ph.SR · 2026-04-11 · unverdicted · none · ref 23
Hybrid neural network predicts eruptive versus confined solar flares from SDO/HMI magnetogram sequences, reports good performance, and links results to magnetic flux cancellation in polarity inversion lines.
Prototype-Guided Robust Learning against Backdoor Attacks · cs.CR · 2025-09-03 · unverdicted · none · ref 32
PGRL defends ML models from backdoor attacks by using a few verified clean samples to guide removal of suspicious training data and unlearning of backdoor features during fine-tuning, outperforming prior defenses in experiments.
DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation · eess.IV · 2025-08-21 · unverdicted · none · ref 65
DoSReMC improves cross-domain generalization in mammography classification by fine-tuning only batch normalization and fully connected layers of pretrained CNNs while preserving convolutional filters, combined with adversarial training.
Automatic Road Subsurface Distress Recognition from Ground Penetrating Radar Images using Deep Learning-based Cross-verification · cs.CV · 2025-07-15 · unverdicted · none · ref 43
A cross-verification strategy using three YOLO models trained on distinct views of a 2134-sample 3D GPR dataset detects road subsurface distress with over 98.6 percent recall on field data.
Automated Description Generation of Cytologic Findings for Lung Cytological Images Using a Pretrained Vision Model and Dual Text Decoders: Preliminary Study · eess.IV · 2024-03-26 · unverdicted · none · ref 15
A CNN classifies lung cytology patches as benign or malignant at 100% sensitivity and 96.4% specificity, then routes to one of two Transformer decoders to generate findings text achieving BLEU-4 of 0.828 on 801 images.
General Inverse Design of Thin-Film Metamaterials With Convolutional Neural Networks · physics.comp-ph · 2021-03-29 · unverdicted · none · ref 51
Convolutional neural networks are shown to perform inverse design of thin-film metamaterial stacks by learning the mapping from structure to ellipsometric and reflectance/transmittance spectra, with efficiency gains over traditional optimization as layer count increases.