ImageNet Large Scale Visual Recognition Challenge

Aditya Khosla, Alexander C. Berg, Andrej Karpathy, Hao Su, Jia Deng, Jonathan Krause + 2 more · 2015 · International Journal of Computer Vision · DOI 10.1007/s11263-015-0816-y

50 Pith papers cite this work, alongside 30,004 external citations (Crossref). Polarity classification is still indexing.


citation-role summary

background 2 · dataset 2

citation-polarity summary

background 2 · use dataset 2

claims ledger

dataset: Table 4. Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol: ∼ indicates approximate counts.
Dataset | Year | Modality | #Images | #Cls | #Anno | Role | Domain Shift
PASCAL VOC [95] | 2007-2012 | RGB | ∼16.5K | ∼20 | ∼40K | S/T | mild scene shift
MS COCO [96] | 2014 | RGB | ∼330K | ∼80 | ∼2.5M | S | scene diversity
ImageNet DET [97] | 2013 | RGB | ∼450K | ∼200 | ∼500K | S | fine-grained category
Cityscapes [98] | 2016 | RGB | ∼3.0K | ∼8 | ∼65K | T | urban sce
dataset: Finally, if $g_1$ and $g_2$ both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: $dU_t = \alpha_1(t)\,\Delta U_t\,dt + \alpha_2(t)\,dW_t$ for all $t \in I$. (20) I Numerical simulation. For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space $\Lambda$ as $\Lambda = (0, d_1) \times (0, d_2)$ and decomposed the boundary $\partial\Lambda$ according to $\partial_L \Lambda := \{0\} \times [0, d_2)$ (21); $\partial_T \Lambda := [0, d_1) \times \{d_2\}$ (22); $\partial_R \Lambda := \{d_1\} \times (0, d_2]$ (23); $\partial_B \Lambda := (0, d_1] \times \{0\}$ (24) into its left, top, right and bottom
background: ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3 (2015), 211-252. doi:10.1007/s11263-015-0816-y [41] Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition. 567-576. [42] Alex Tamkin, Mike Wu, and Noah D. Goodman. 2020. Viewmaker Networks: Learning Views for Unsupervised Representation Learning. ArXivab
background: 1 Introduction. In recent years, the emergence and evolution of auto-regressive models [18, 44, 66] and diffusion models [32, 61, 16, 50, 58, 55, 56] have led to AI-generated content (AIGC) becoming increasingly realistic and widely applied across industries, bringing convenience to fields such as entertainment [51, 2, 63], advertising [39, 17], and medicine [60, 83]. This progress is particularly evident in AI-synthesized images, which have seen gradual improvements in resolution and semantic

authors

Aditya Khosla, Alexander C. Berg, Andrej Karpathy, Hao Su, Jia Deng, Jonathan Krause, Li Fei-Fei, Michael Bernstein, Olga Russakovsky, Sanjeev Satheesh, Sean Ma, Zhiheng Huang

representative citing papers

GPUBreach: Privilege Escalation Attacks on GPUs using Rowhammer

cs.CR · 2026-05-05 · unverdicted · novelty 8.0

Unprivileged CUDA kernels can use Rowhammer to tamper with GPU page tables for targeted privilege escalation, leaking cryptographic keys and escalating to CPU root access by bypassing IOMMU.

Understanding deep learning requires rethinking generalization

cs.LG · 2016-11-10 · accept · novelty 8.0

State-of-the-art convolutional networks easily memorize random labels and unstructured noise images, indicating that generalization in deep learning cannot be explained by traditional capacity or regularization arguments.

Structure Before Collapse: Transient semantic geometry in next-token prediction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.

Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

DiTo shifts token reduction in DiTs to output token similarity, reusing prior-step matches across timesteps with PMR scheduling and frequency-aware penalties to raise PSNR at given speedups.

ImageAttributionBench: How Far Are We from Generalizable Attribution?

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

cs.PF · 2026-05-09 · accept · novelty 7.0 · 2 refs

Single-thread JPEG benchmarks misrank decoders for ML DataLoader use, with rankings changing across CPUs and worker counts; torchvision and simplejpeg perform best in measured DataLoader tiers.

Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.

Representational Alignment Across Model Layers and Brain Regions with Multi-Level Optimal Transport

cs.LG · 2025-10-02 · accept · novelty 7.0

Multi-Level Optimal Transport (MOT) jointly infers soft layer couplings and neuron transport plans to produce global alignment scores and structured hierarchical correspondences between networks of varying depths.

ClusterMark: Towards Robust Watermarking for Autoregressive Image Generators with Visual Token Clustering

cs.CV · 2025-08-08 · unverdicted · novelty 7.0

ClusterMark applies visual token clustering to create robust in-generation watermarks for autoregressive image models, improving detectability under perturbations compared to direct token biasing while preserving quality.

SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples

cs.CV · 2025-07-10 · conditional · novelty 7.0

SCOOTER supplies best-practice guidelines, open tools, and a 3K-image benchmark with 34K+ human ratings showing that six tested unrestricted attacks produce images humans can detect as fake.

LAION-5B: An open large-scale dataset for training next generation image-text models

cs.CV · 2022-10-16 · accept · novelty 7.0

LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.

Pose Estimation for Non-Cooperative Rendezvous Using Neural Networks

cs.CV · 2019-06-24 · unverdicted · novelty 7.0

SPN is a CNN that detects a spacecraft bounding box, classifies then regresses attitude, and optimizes position via Gauss-Newton, achieving degree-level attitude and cm-level position errors on real images after training only on synthetic data.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Full spectrum Unlearnable Examples via Spectral Equalization

cs.CV · 2026-06-25 · unverdicted · novelty 6.0

FUSE creates full-spectrum unlearnable perturbations using random spectral masking during training and cross-band guidance to enforce consistency between frequency components.

Radial Basis Function Networks as Projection Heads in Self-Supervised Learning

cs.CV · 2026-06-19 · unverdicted · novelty 6.0

RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.

Jaguar: Fast Private CNN Inference with Power-of-Two Homomorphic Arithmetic

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

Jaguar replaces prime-modulus HE with power-of-two arithmetic to enable coefficient-domain convolution and local-shift truncation, reporting 2-3.7x lower latency than Cheetah and Rhombus on ResNet-18/50 and MobileNetV2.

CSFlow: Aligning Flow Matching with Human Contrast Sensitivity

cs.CV · 2026-06-07 · unverdicted · novelty 6.0

CSFlow derives inference-time timestep weights for flow matching by matching per-step frequency content to human CSF, yielding 4.7% FID reduction and smaller gains on IS and GenEval.

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

C-GSPN scales 2D spatial propagation to foundation vision encoders via a fast CUDA kernel, compressed blocks, and two-stage distillation, matching ViT performance with 15% fewer parameters and 4x block speedup at 2K resolution.

The Trust Paradox: How CS Researchers Engage LLM Leaderboards

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

CS researchers show pragmatic skepticism toward LLM leaderboards, using them despite distrust while preferring peer networks, arena leaderboards, and cost transparency as a key missing feature.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Multi-Scale Generative Modeling with Heat Dissipation Flow Matching

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

HDFM adds a continuous heat-dissipation (blur) process to flow matching, aligns an interpolated path to fix ill-posed inverse heat dissipation, and uses x-prediction to ease high-dimensional regression, yielding better performance than most baselines on image datasets.

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.

From DES to KiDS: Domain adaptation for cross-survey detection of low-surface-brightness galaxies

astro-ph.GA · 2026-05-13 · unverdicted · novelty 6.0

Domain adaptation with an ensemble of CNN and transformer models trained on DES detects 20,180 LSBGs and 434 UDGs in KiDS DR5, with structural parameters and environmental trends consistent with known samples.

Score-Based Generative Modeling through Anisotropic Stochastic Partial Differential Equations

cs.CE · 2026-05-09 · unverdicted · novelty 6.0

Anisotropic SPDEs preserve geometric data structure over longer timescales in score-based generative modeling, yielding better image quality than standard SDE baselines and flow matching in unconditional and conditional tasks.

citing papers explorer

Showing 50 of 50 citing papers.

GPUBreach: Privilege Escalation Attacks on GPUs using Rowhammer · cs.CR · 2026-05-05 · unverdicted · none · ref 45
Unprivileged CUDA kernels can use Rowhammer to tamper with GPU page tables for targeted privilege escalation, leaking cryptographic keys and escalating to CPU root access by bypassing IOMMU.
Understanding deep learning requires rethinking generalization · cs.LG · 2016-11-10 · accept · none · ref 5
State-of-the-art convolutional networks easily memorize random labels and unstructured noise images, indicating that generalization in deep learning cannot be explained by traditional capacity or regularization arguments.
Structure Before Collapse: Transient semantic geometry in next-token prediction · cs.LG · 2026-06-25 · unverdicted · none · ref 262
Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.
Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness · cs.CV · 2026-05-21 · unverdicted · none · ref 35
DiTo shifts token reduction in DiTs to output token similarity, reusing prior-step matches across timesteps with PMR scheduling and frequency-aware penalties to raise PSNR at given speedups.
ImageAttributionBench: How Far Are We from Generalizable Attribution? · cs.CV · 2026-05-13 · unverdicted · none · ref 60
ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.
Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders · cs.PF · 2026-05-09 · accept · none · ref 13 · 2 links
Single-thread JPEG benchmarks misrank decoders for ML DataLoader use, with rankings changing across CPUs and worker counts; torchvision and simplejpeg perform best in measured DataLoader tiers.
Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation · cs.CV · 2026-05-07 · unverdicted · none · ref 50
Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.
Representational Alignment Across Model Layers and Brain Regions with Multi-Level Optimal Transport · cs.LG · 2025-10-02 · accept · none · ref 12
Multi-Level Optimal Transport (MOT) jointly infers soft layer couplings and neuron transport plans to produce global alignment scores and structured hierarchical correspondences between networks of varying depths.
ClusterMark: Towards Robust Watermarking for Autoregressive Image Generators with Visual Token Clustering · cs.CV · 2025-08-08 · unverdicted · none · ref 23
ClusterMark applies visual token clustering to create robust in-generation watermarks for autoregressive image models, improving detectability under perturbations compared to direct token biasing while preserving quality.
SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples · cs.CV · 2025-07-10 · conditional · none · ref 57
SCOOTER supplies best-practice guidelines, open tools, and a 3K-image benchmark with 34K+ human ratings showing that six tested unrestricted attacks produce images humans can detect as fake.
LAION-5B: An open large-scale dataset for training next generation image-text models · cs.CV · 2022-10-16 · accept · none · ref 68
LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.
Pose Estimation for Non-Cooperative Rendezvous Using Neural Networks · cs.CV · 2019-06-24 · unverdicted · none · ref 35
SPN is a CNN that detects a spacecraft bounding box, classifies then regresses attitude, and optimizes position via Gauss-Newton, achieving degree-level attitude and cm-level position errors on real images after training only on synthetic data.
Mixed Precision Training · cs.AI · 2017-10-10 · accept · none · ref 29
Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
Full spectrum Unlearnable Examples via Spectral Equalization · cs.CV · 2026-06-25 · unverdicted · none · ref 11
FUSE creates full-spectrum unlearnable perturbations using random spectral masking during training and cross-band guidance to enforce consistency between frequency components.
Radial Basis Function Networks as Projection Heads in Self-Supervised Learning · cs.CV · 2026-06-19 · unverdicted · none · ref 41
RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.
Jaguar: Fast Private CNN Inference with Power-of-Two Homomorphic Arithmetic · cs.CR · 2026-06-10 · unverdicted · none · ref 41
Jaguar replaces prime-modulus HE with power-of-two arithmetic to enable coefficient-domain convolution and local-shift truncation, reporting 2-3.7x lower latency than Cheetah and Rhombus on ResNet-18/50 and MobileNetV2.
CSFlow: Aligning Flow Matching with Human Contrast Sensitivity · cs.CV · 2026-06-07 · unverdicted · none · ref 6
CSFlow derives inference-time timestep weights for flow matching by matching per-step frequency content to human CSF, yielding 4.7% FID reduction and smaller gains on IS and GenEval.
Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders · cs.CV · 2026-05-30 · unverdicted · none · ref 169
C-GSPN scales 2D spatial propagation to foundation vision encoders via a fast CUDA kernel, compressed blocks, and two-stage distillation, matching ViT performance with 15% fewer parameters and 4x block speedup at 2K resolution.
The Trust Paradox: How CS Researchers Engage LLM Leaderboards · cs.CL · 2026-05-27 · unverdicted · none · ref 33
CS researchers show pragmatic skepticism toward LLM leaderboards, using them despite distrust while preferring peer networks, arena leaderboards, and cost transparency as a key missing feature.
Uncovering the Latent Potential of Deep Intermediate Representations · cs.LG · 2026-05-21 · unverdicted · none · ref 57
Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.
Multi-Scale Generative Modeling with Heat Dissipation Flow Matching · cs.CV · 2026-05-19 · unverdicted · none · ref 29
HDFM adds a continuous heat-dissipation (blur) process to flow matching, aligns an interpolated path to fix ill-posed inverse heat dissipation, and uses x-prediction to ease high-dimensional regression, yielding better performance than most baselines on image datasets.
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex · cs.CV · 2026-05-15 · unverdicted · none · ref 74
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.
From DES to KiDS: Domain adaptation for cross-survey detection of low-surface-brightness galaxies · astro-ph.GA · 2026-05-13 · unverdicted · none · ref 266
Domain adaptation with an ensemble of CNN and transformer models trained on DES detects 20,180 LSBGs and 434 UDGs in KiDS DR5, with structural parameters and environmental trends consistent with known samples.
Score-Based Generative Modeling through Anisotropic Stochastic Partial Differential Equations · cs.CE · 2026-05-09 · unverdicted · none · ref 23
Anisotropic SPDEs preserve geometric data structure over longer timescales in score-based generative modeling, yielding better image quality than standard SDE baselines and flow matching in unconditional and conditional tasks.
Response Time Enhances Alignment with Heterogeneous Preferences · cs.LG · 2026-05-07 · unverdicted · none · ref 233
Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
ViTok-v2: Scaling Native Resolution Auto-Encoders to 5 Billion Parameters · cs.CV · 2026-05-06 · unverdicted · none · ref 10
ViTok-v2 is a 5B-parameter native-resolution image autoencoder using NaFlex and DINOv3 loss that matches or exceeds prior tokenizers at 256p and outperforms them at 512p and above while advancing the Pareto frontier in joint scaling with generators.
Detecting Adversarial Data via Provable Adversarial Noise Amplification · cs.LG · 2026-05-04 · unverdicted · none · ref 30
A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.
Efficient Adversarial Training via Criticality-Aware Fine-Tuning · cs.CV · 2026-04-14 · unverdicted · none · ref 46
CAAT selects critical parameters for adversarial robustness in ViTs and applies PEFT to tune only those, yielding a 4.3% robustness drop versus full AT while using ~6% of parameters.
On the Robustness of Watermarking for Autoregressive Image Generation · cs.CV · 2026-04-13 · unverdicted · none · ref 29
Watermarking schemes for autoregressive image generation fail against removal and forgery attacks, enabling false detections and undermining synthetic content filtering.
EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models · cs.AI · 2026-04-13 · unverdicted · none · ref 42 · 2 links
EmergentBridge enhances zero-shot cross-modal performance on unpaired modalities by learning noisy bridge anchors from existing alignments and enforcing proxy alignment only in the orthogonal subspace to avoid gradient interference.
Complex Facial Expression Recognition Using Deep Knowledge Distillation of Basic Features · cs.CV · 2023-08-11 · unverdicted · none · ref 22
Continual learning via knowledge distillation achieves SOTA 74.28% accuracy on new compound facial expression classes and 100% in one-shot learning.
Learning Effective Loss Functions Efficiently · cs.LG · 2019-06-28 · unverdicted · none · ref 17
An anytime algorithm for learning loss functions that is asymptotically optimal in the worst case and experimentally faster than prior methods for hyperparameter tuning.
OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025 · cs.CV · 2026-05-21 · accept · none · ref 67
The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.
Accelerating Vision Foundation Models with Drop-in Depthwise Convolution · cs.CV · 2026-05-21 · unverdicted · none · ref 35
Replacing selected attention heads in pretrained ViTs with depthwise convolutions, identified by simple strategies and recovered via fine-tuning, delivers 17-20% inference speedup on image tasks with minimal accuracy loss.
A generalised pre-training strategy for deep learning networks in semantic segmentation of remotely sensed images · cs.CV · 2026-04-30 · unverdicted · none · ref 7
A novel pre-training strategy for ImageNet-initialized models achieves state-of-the-art semantic segmentation performance on four remote sensing datasets (iSAID, MFNet, PST900, Potsdam) by reducing domain-specific feature learning during pre-training.
Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer · cs.CV · 2026-04-08 · unverdicted · none · ref 68
The OG-ReG Transformer achieves state-of-the-art results on Kinetics-400, Something-Something v2, and Diving-48 by combining global glance and local gaze processing paths.
CHiQPM: Calibrated Hierarchical Interpretable Image Classification · cs.LG · 2025-11-25 · unverdicted · none · ref 50
CHiQPM is a hierarchical interpretable image classifier that maintains 99% of non-interpretable model accuracy while supplying contrastive global explanations, human-like hierarchical paths, and calibrated interpretable set predictions via conformal prediction.
Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions · cs.CV · 2025-09-17 · unverdicted · none · ref 62
STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most 2% accuracy drop and 40% tokens halted early.
TOAST: Transformer Optimization using Adaptive and Simple Transformations · cs.LG · 2024-10-07 · unverdicted · none · ref 28
TOAST approximates full transformer blocks in pretrained models via lightweight closed-form mappings to cut parameters and FLOPs without retraining or finetuning.
Adversarially Trained Deep Neural Semantic Hashing Scheme for Subjective Search in Fashion Inventory · cs.CV · 2019-06-30 · unverdicted · none · ref 11
Adversarial deep semantic hashing for fashion retrieval achieves 90.65% mAP, outperforming prior deep Cauchy hashing at 53.26%.
A Utility-Preserving GAN for Face Obscuration · cs.CV · 2019-06-27 · unverdicted · none · ref 21
UP-GAN uses a GAN to obscure faces while preserving utility attributes like age, gender, pose, and expression better than blurring or pixelation.
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD · cs.LG · 2019-06-26 · unverdicted · none · ref 19
GNC convolves stochastic gradient noise to smooth sharp minima in large-batch SGD, outperforming isotropic noise for better generalization in distributed deep learning.
Formal Concept Lattices are Good Semantic Scaffolds for Concept-Based Learning · cs.CV · 2026-06-03 · unverdicted · none · ref 32
Formal concept lattices guide staged, hierarchical concept learning in deep networks to produce more interpretable and semantically structured representations.
CoarseSoundNet: Building a reliable model for ecological soundscape analysis · cs.SD · 2026-05-20 · unverdicted · none · ref 115 · 2 links
The paper introduces CoarseSoundNet, a deep learning model for classifying biophony, geophony, and anthropophony in passive acoustic monitoring recordings, reporting performance gains from additional similar data, a silence class, and decision thresholds, plus a case study on acoustic index trends.
Debunking Grad-ECLIP: A Comprehensive Study on Its Incorrectness and Fundamental Principles for Model Interpretation · cs.CV · 2026-05-13 · unverdicted · none · ref 28
Grad-ECLIP is an equivalent but flawed variant of attention-based interpretation, with two principles proposed to ensure model explanations reflect the original model.
Opportunistic Bone-Loss Screening from Routine Knee Radiographs Using a Multi-Task Deep Learning Framework with Sensitivity-Constrained Threshold Optimization · cs.CV · 2026-04-22 · unverdicted · none · ref 18
STR-Net achieves AUROC of 0.933 for binary bone-loss screening and 0.801 correlation for T-score estimation from knee X-rays on a held-out test set.
Robustness Analysis of USmorph: II. Optimizing Feature Extraction, Dimensionality Reduction, and Clustering for Unsupervised Galaxy Morphology Classification · astro-ph.GA · 2026-05-20 · unverdicted · none · ref 128
Optimizes ImageNet-pretrained AlexNet, UMAP, and a bagging multi-cluster voting scheme with K-means, Birch and Agg for unsupervised galaxy morphology classification, reporting improved stability and consistency with galaxy evolution expectations.
Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges · cs.CV · 2026-04-09 · unverdicted · none · ref 97
A survey that organizes methods for cross-domain object detection into a taxonomy, analyzes domain shift across detection stages, and outlines persistent challenges.
RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques · cs.CV · 2019-07-22 · unverdicted · none · ref 73
A survey of RGB-D object detection from traditional hand-crafted features with machine learning to deep learning techniques.
Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels · cs.LG · 2026-05-19 · unreviewed · ref 31
