State-of-the-art convolutional networks easily memorize random labels and unstructured noise images, indicating that generalization in deep learning cannot be explained by traditional capacity or regularization arguments.
Rethinking the Inception Architecture for Computer Vision
17 Pith papers cite this work, alongside 21,437 external citations. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
authors
co-cited works
roles
background 4representative citing papers
The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.
SemanticBridge provides a new 3D dataset for bridge component segmentation and quantifies sensor-induced domain gaps that drop model performance by up to 11.4% mIoU.
Matching in semantic SSL feature space via Sinkhorn divergence enables effective one-step generation on ImageNet by inducing compact geometry for distribution matching, with training and evaluation features best kept distinct.
AuxPath-FM extends flow matching to arbitrary auxiliary distributions while preserving the continuity equation and marginal training objective.
P-Guide achieves single-pass classifier-free guidance in flow matching by modulating the initial latent state and is equivalent to standard CFG under a first-order approximation while cutting latency by half.
Explicit dropout reformulates stochastic dropout as deterministic loss penalties for Transformers, matching or exceeding standard performance with independent control per component.
RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.
GAP3D is a diffusion-based alignment technique that maps VLM latents to dense patch embeddings from image encoders, enabling modular VLM conditioning for 3D generation without 3D training data.
HeartBeatAI reports 98% Macro F1 under intra-source testing on four ECG datasets but shows significant degradation on rare anomalies under leave-one-domain-out evaluation.
Contradictions across CNN studies for Vis-NIR chemometrics are expected outcomes of uncontrolled variables in spectral physics and validation design, motivating a conditional rather than universal design framework.
PD36-C is a 1.25 million parameter CNN achieving 99.53% average test accuracy on 38 plant disease classes from the New Plant Diseases Dataset, with a Qt-based app enabling edge deployment.
AlphaQuanter introduces a single-agent tool-augmented RL framework for stock trading that learns dynamic policies over a transparent decision workflow and reports state-of-the-art financial metrics.
Thesis uses statistical mechanics to study DAM and RBM models for understanding memorization, low-dimensional learning, and adversarial robustness in neural networks.
A DenseNet201 base model trained on a constructed plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning with less data than general models.
Benchmark of twelve models finds hybrid CNN-transformer architectures and a SigLIP vision-language model deliver the strongest overall performance on skin cancer detection using the PAD-UFES-20 dataset.
citing papers explorer
-
Understanding deep learning requires rethinking generalization
State-of-the-art convolutional networks easily memorize random labels and unstructured noise images, indicating that generalization in deep learning cannot be explained by traditional capacity or regularization arguments.
-
Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.
-
SemanticBridge - A Dataset for 3D Semantic Segmentation of Bridges and Domain Gap Analysis
SemanticBridge provides a new 3D dataset for bridge component segmentation and quantifies sensor-induced domain gaps that drop model performance by up to 11.4% mIoU.
-
Generate in Reconstruction Space, Match in Semantic Space: Transport Geometry for One-Step Generation
Matching in semantic SSL feature space via Sinkhorn divergence enables effective one-step generation on ImageNet by inducing compact geometry for distribution matching, with training and evaluation features best kept distinct.
-
Flow Matching with Arbitrary Auxiliary Paths
AuxPath-FM extends flow matching to arbitrary auxiliary distributions while preserving the continuity equation and marginal training objective.
-
P-Guide: Parameter-Efficient Prior Steering for Single-Pass CFG Inference
P-Guide achieves single-pass classifier-free guidance in flow matching by modulating the initial latent state and is equivalent to standard CFG under a first-order approximation while cutting latency by half.
-
Explicit Dropout: Deterministic Regularization for Transformer Architectures
Explicit dropout reformulates stochastic dropout as deterministic loss penalties for Transformers, matching or exceeding standard performance with independent control per component.
-
Rotary Masked Autoencoders are Versatile Learners
RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.
-
GAP3D: Generative Alignment of VLM Latents to Patch-Level Embeddings for 3D Generation
GAP3D is a diffusion-based alignment technique that maps VLM latents to dense patch embeddings from image encoders, enabling modular VLM conditioning for 3D generation without 3D training data.
-
HeartBeatAI: An Interpretable and Robust Deep Learning Framework for Multi-Label ECG Arrhythmia Detection
HeartBeatAI reports 98% Macro F1 under intra-source testing on four ECG datasets but shows significant degradation on rare anomalies under leave-one-domain-out evaluation.
-
CNNs for Vis-NIR Chemometrics: From Contradiction to Conditional Design
Contradictions across CNN studies for Vis-NIR chemometrics are expected outcomes of uncontrolled variables in spectral physics and validation design, motivating a conditional rather than universal design framework.
-
A Compact and Efficient 1.251 Million Parameter Machine Learning CNN Model PD36-C for Plant Disease Detection: A Case Study
PD36-C is a 1.25 million parameter CNN achieving 99.53% average test accuracy on 38 plant disease classes from the New Plant Diseases Dataset, with a Qt-based app enabling edge deployment.
-
AlphaQuanter: An End-to-End Tool-Augmented Agentic Reinforcement Learning Framework for Stock Trading
AlphaQuanter introduces a single-agent tool-augmented RL framework for stock trading that learns dynamic policies over a transparent decision workflow and reports state-of-the-art financial metrics.
-
Explaining Machine Learning and Memorization with Statistical Mechanics
Thesis uses statistical mechanics to study DAM and RBM models for understanding memorization, low-dimensional learning, and adversarial robustness in neural networks.
-
Developing a Strong Pre-Trained Base Model for Plant Leaf Disease Classification
A DenseNet201 base model trained on a constructed plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning with less data than general models.
-
CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection
Benchmark of twelve models finds hybrid CNN-transformer architectures and a SigLIP vision-language model deliver the strongest overall performance on skin cancer detection using the PAD-UFES-20 dataset.
- Semi supervised GAN for smart microscopy, fast and data efficient cell cycle classification