Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mixed citation behavior. Most common role is background (67%).
abstract
Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
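The compound coefficient mentioned in the abstract can be illustrated with a short sketch. It assumes the formulation in the EfficientNet paper, where depth, width, and resolution are scaled by alpha^phi, beta^phi, and gamma^phi. The default constants (1.2, 1.1, 1.15) are the values reported in that paper for the baseline network, not values stated on this page. The names `ScaleFactors`, `compound_scale`, and `flops_multiplier` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleFactors:
    """Multipliers applied to a baseline network (hypothetical helper type)."""
    depth: float
    width: float
    resolution: float


def compound_scale(phi: float, alpha: float = 1.2, beta: float = 1.1,
                   gamma: float = 1.15) -> ScaleFactors:
    """Scale depth, width, and resolution together with one coefficient phi.

    The defaults are the constants reported in the EfficientNet paper;
    they are assumptions here, not values taken from this page.
    """
    return ScaleFactors(depth=alpha ** phi,
                        width=beta ** phi,
                        resolution=gamma ** phi)


def flops_multiplier(factors: ScaleFactors) -> float:
    """Approximate FLOPS growth of a ConvNet under the given multipliers.

    Convolution cost grows roughly linearly with depth and quadratically
    with width and input resolution.
    """
    return factors.depth * factors.width ** 2 * factors.resolution ** 2


if __name__ == "__main__":
    for phi in range(4):
        f = compound_scale(phi)
        print(phi, round(f.depth, 3), round(f.width, 3),
              round(f.resolution, 3), round(flops_multiplier(f), 3))
```

With these constants, alpha * beta^2 * gamma^2 is close to 2, so each unit increase in phi roughly doubles the approximate FLOPS of the scaled network.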
representative citing papers
iSAGE achieves near-dense mIoU performance in remote sensing semantic segmentation using iterative expert clicks on confident model errors with an error-weighted loss, using only 0.011-0.04% of pixels.
A dual-encoder deepfake detector pairs a frozen specialist with a LoRA-tuned MLLM, trained first via binary alignment then via RL to reward explain-then-classify behavior, yielding improved cross-dataset performance and interpretability.
PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.
QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.
The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.
SMCNet applies a complex-valued CNN to mmWave radar IQ data for high-accuracy surface material classification across multiple and unseen sensing distances.
Presents the ev-CIVIL dataset and benchmark showing that event-based cameras can support real-time detection of cracks and spalling in civil infrastructure under challenging lighting.
The DFDC dataset is the largest public collection of face-swapped videos and supports detectors that generalize to in-the-wild deepfakes.
Adapts Flow Matching from generative AI to probabilistic inversion, evaluated on a simple 2D velocity model and the OpenFWI seismic dataset.
Cross-dataset testing of nearest-neighbor and Mahalanobis anomaly detectors on CLIP, DINOv2, ResNet-50 and EfficientNet embeddings shows same-dataset AUC averaging 0.704 dropping to 0.499 on other datasets, with false-alarm rates around 31,931 per hour at usable operating points.
Cross-AUC averages per-domain AUCs with a polarization term from Wasserstein distance on score distributions to assess deepfake detector generalization under domain shift more realistically than isolated AUC.
LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.
The ITW-SM dataset and targeted optimization of detector design choices yield a 26.87% average AUC improvement for state-of-the-art AI-generated image detectors under real-world social media conditions.
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.
SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.
CoughPhase-CLR uses cough physiological phases to build contrastive positive pairs, outperforming random cropping on downstream tasks including COVID-19 detection and COPD classification.
Multi-FRuGaL is a decomposition-aware gated fusion framework for multimodal cancer data that maintains performance under missing modalities and reports AUC gains on two head-and-neck cancer cohorts.
A preprocessor of Gaussian noise plus bilateral filtering yields supralinear adversarial robustness in CNNs and, when paired with adversarial training, ranks near the top of RobustBench while using far less compute, parameters, epochs, and data than prior defenses.
A multimodal alignment pipeline decodes EEG signals recorded during natural image viewing into image retrieval (86.3% Top-1) and reconstruction (CLIP 0.903) tasks.
Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.
citing papers explorer
- iSAGE: A Human-in-the-Loop Framework for Remote Sensing Semantic Segmentation via Sparse Point Supervision
  iSAGE achieves near-dense mIoU performance in remote sensing semantic segmentation using iterative expert clicks on confident model errors with an error-weighted loss, using only 0.011-0.04% of pixels.
- The Regularizing Power of Language-Training Deepfake Detectors
  A dual-encoder deepfake detector pairs a frozen specialist with a LoRA-tuned MLLM, trained first via binary alignment then via RL to reward explain-then-classify behavior, yielding improved cross-dataset performance and interpretability.
- Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
  The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.
- Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark
  Presents the ev-CIVIL dataset and benchmark showing that event-based cameras can support real-time detection of cracks and spalling in civil infrastructure under challenging lighting.
- The DeepFake Detection Challenge (DFDC) Dataset
  The DFDC dataset is the largest public collection of face-swapped videos and supports detectors that generalize to in-the-wild deepfakes.
- Benchmark AUC Is Not Deployable Reliability: A Cross-Dataset Audit of Off-the-Shelf Features for Surveillance Video Anomaly Detection
  Cross-dataset testing of nearest-neighbor and Mahalanobis anomaly detectors on CLIP, DINOv2, ResNet-50 and EfficientNet embeddings shows same-dataset AUC averaging 0.704 dropping to 0.499 on other datasets, with false-alarm rates around 31,931 per hour at usable operating points.
- When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift
  Cross-AUC averages per-domain AUCs with a polarization term from Wasserstein distance on score distributions to assess deepfake detector generalization under domain shift more realistically than isolated AUC.
- LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection
  LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.
- Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?
  The ITW-SM dataset and targeted optimization of detector design choices yield a 26.87% average AUC improvement for state-of-the-art AI-generated image detectors under real-world social media conditions.
- Vision Transformers Need Registers
  Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
- Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis
  Multi-FRuGaL is a decomposition-aware gated fusion framework for multimodal cancer data that maintains performance under missing modalities and reports AUC gains on two head-and-neck cancer cohorts.
- Brain-to-Image Retrieval and Reconstruction via Multimodal EEG Alignment
  A multimodal alignment pipeline decodes EEG signals recorded during natural image viewing into image retrieval (86.3% Top-1) and reconstruction (CLIP 0.903) tasks.
- When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing
  Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.
- Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification
  Inpainting auxiliary task improves clustering of embeddings for individual zebrafish identification based on skin patterns.
- Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments
  Models predicting human authenticity judgments produce inconsistent attribution maps across architectures, showing that explanations are non-identifiable.
- Generalizable Deepfake Detection Based on Forgery-aware Layer Masking and Multi-artifact Subspace Decomposition
  FMSD improves cross-dataset generalization in deepfake detection by using gradient-based layer masking to select forgery-sensitive weights and SVD to split them into preserved semantic and multiple learnable artifact subspaces with orthogonality constraints.
- Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions
  STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most 2% accuracy drop and 40% tokens halted early.
- Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection
  A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher model to task-specific students.
- DYMAPIA: A Multi-Domain Framework for Detecting AI-based Video Manipulation
  DYMAPIA builds dynamic anomaly masks from Fourier spectra, texture, edges, and optical flow to guide a lightweight DistXCNet classifier, reporting over 99% accuracy and F1 on FF++, Celeb-DF, and VDFD.
- A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data
  Describes a camera-radar fusion network that uses raw RD spectra and BEV-polar camera features for BEV object detection, evaluated for accuracy and compute on the RADIal dataset.
- Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift
  EfficientNetB5 with CBAM reaches 93.3% accuracy on a 1,366-image peach leaf damage dataset and EfficientNetB3 with CBAM reaches 93% macro F1 after transfer to a 180-image local domain.
- A Multiscale Network with Supervised Contrastive Learning for Real-Time Facial Emotion Recognition
  A multiscale network combined with supervised contrastive learning is trained on a standard dataset to perform real-time facial emotion recognition from video, yielding satisfactory outcomes.
- Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation
  RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.
- Towards Accurate and Efficient Waste Image Classification: A Hybrid Deep Learning and Machine Learning Approach
  A hybrid deep learning plus classical ML pipeline for waste image classification reaches up to 100% accuracy on TrashNet and a corrected household dataset while cutting feature dimensionality by over 95%.
- Classifying galaxies in the Galaxy10 DECals dataset using Inception and Residual CNNs
  ResNet101 and InceptionV4 both reach approximately 90 percent accuracy on ten-class galaxy classification in Galaxy10 DECals, with ResNet101 superior on performance metrics.
- Robust Deepfake Detection, NTIRE 2026 Challenge: Report
  The NTIRE 2026 challenge finds that large foundation models combined with ensembles and degradation-aware training produce the most robust deepfake detectors.
- Introduction to Camera Pose Estimation with Deep Learning
  A survey of deep learning approaches for regressing absolute camera pose from single RGB images, covering key methods, trends, cross-comparisons, and reproducibility notes.