ImageNet Large Scale Visual Recognition Challenge

Aditya Khosla, Alexander C. Berg, Andrej Karpathy, Hao Su, Jia Deng, Jonathan Krause + 6 more · 2015 · International Journal of Computer Vision · DOI 10.1007/s11263-015-0816-y

52 Pith papers cite this work, alongside 30,004 external citations. Polarity classification is still indexing.

30k external citations · Crossref

citation-role summary

background: 2 · dataset: 2

citation-polarity summary

background: 2 · use dataset: 2

claims ledger

dataset Table 4: Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol: ∼ indicates approximate counts.
Dataset | Year | Modality | #Images | #Cls | #Anno | Role | Domain Shift
PASCAL VOC [95] | 2007-2012 | RGB | ∼16.5K | ∼20 | ∼40K | S/T | mild scene shift
MS COCO [96] | 2014 | RGB | ∼330K | ∼80 | ∼2.5M | S | scene diversity
ImageNet DET [97] | 2013 | RGB | ∼450K | ∼200 | ∼500K | S | fine-grained category
Cityscapes [98] | 2016 | RGB | ∼3.0K | ∼8 | ∼65K | T | urban sce
dataset Finally, if $g_1$ and $g_2$ both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: $dU_t = \alpha_1(t)\,\Delta U_t\,dt + \alpha_2(t)\,dW_t$ for all $t \in I$. (20) I Numerical simulation: For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space $\Lambda$ as $\Lambda = (0, d_1) \times (0, d_2)$ and decomposed the boundary $\partial\Lambda$ according to $\partial_L\Lambda := \{0\} \times [0, d_2)$; (21) $\partial_T\Lambda := [0, d_1) \times \{d_2\}$; (22) $\partial_R\Lambda := \{d_1\} \times (0, d_2]$; (23) $\partial_B\Lambda := (0, d_1] \times \{0\}$ (24) into its left, top, right and bottom
background ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3 (2015), 211-252. doi:10.1007/s11263-015-0816-y [41] Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition. 567-576. [42] Alex Tamkin, Mike Wu, and Noah D. Goodman. 2020. Viewmaker Networks: Learning Views for Unsupervised Representation Learning. ArXiv ab
background 1 Introduction: In recent years, the emergence and evolution of auto-regressive models [18, 44, 66] and diffusion models [32, 61, 16, 50, 58, 55, 56] have led to AI-generated content (AIGC) becoming increasingly realistic and widely applied across industries, bringing convenience to fields such as entertainment [51, 2, 63], advertising [39, 17], and medicine [60, 83]. This progress is particularly evident in AI-synthesized images, which have seen gradual improvements in resolution and semantic

authors

Aditya Khosla, Alexander C. Berg, Andrej Karpathy, Hao Su, Jia Deng, Jonathan Krause, Li Fei-Fei, Michael Bernstein, Olga Russakovsky, Sanjeev Satheesh, Sean Ma, Zhiheng Huang

representative citing papers

GPUBreach: Privilege Escalation Attacks on GPUs using Rowhammer

cs.CR · 2026-05-05 · unverdicted · novelty 8.0

Unprivileged CUDA kernels can use Rowhammer to tamper with GPU page tables for targeted privilege escalation, leaking cryptographic keys and escalating to CPU root access by bypassing IOMMU.

Understanding deep learning requires rethinking generalization

cs.LG · 2016-11-10 · accept · novelty 8.0

State-of-the-art convolutional networks easily memorize random labels and unstructured noise images, indicating that generalization in deep learning cannot be explained by traditional capacity or regularization arguments.

HASTE: A Framework for Training-Free, Dynamic, and Steerable Compression of Pre-Trained Convolutional Neural Networks

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

HASTE enables training-free dynamic compression of pre-trained CNNs by patch-wise LSH-based merging of redundant channels, reporting 46.2% FLOPs reduction on ResNet34 CIFAR-10 with 1.25% accuracy drop.

Structure Before Collapse: Transient semantic geometry in next-token prediction

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.

Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

DiTo shifts token reduction in DiTs to output token similarity, reusing prior-step matches across timesteps with PMR scheduling and frequency-aware penalties to raise PSNR at given speedups.

ImageAttributionBench: How Far Are We from Generalizable Attribution?

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.

Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

cs.PF · 2026-05-09 · accept · novelty 7.0 · 2 refs

Single-thread JPEG benchmarks misrank decoders for ML DataLoader use, with rankings changing across CPUs and worker counts; torchvision and simplejpeg perform best in measured DataLoader tiers.

Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.

Representational Alignment Across Model Layers and Brain Regions with Multi-Level Optimal Transport

cs.LG · 2025-10-02 · accept · novelty 7.0

Multi-Level Optimal Transport (MOT) jointly infers soft layer couplings and neuron transport plans to produce global alignment scores and structured hierarchical correspondences between networks of varying depths.

ClusterMark: Towards Robust Watermarking for Autoregressive Image Generators with Visual Token Clustering

cs.CV · 2025-08-08 · unverdicted · novelty 7.0

ClusterMark applies visual token clustering to create robust in-generation watermarks for autoregressive image models, improving detectability under perturbations compared to direct token biasing while preserving quality.

SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples

cs.CV · 2025-07-10 · conditional · novelty 7.0

SCOOTER supplies best-practice guidelines, open tools, and a 3K-image benchmark with 34K+ human ratings showing that six tested unrestricted attacks produce images humans can detect as fake.

LAION-5B: An open large-scale dataset for training next generation image-text models

cs.CV · 2022-10-16 · accept · novelty 7.0

LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.

Pose Estimation for Non-Cooperative Rendezvous Using Neural Networks

cs.CV · 2019-06-24 · unverdicted · novelty 7.0

SPN is a CNN that detects a spacecraft bounding box, classifies then regresses attitude, and optimizes position via Gauss-Newton, achieving degree-level attitude and cm-level position errors on real images after training only on synthetic data.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Full spectrum Unlearnable Examples via Spectral Equalization

cs.CV · 2026-06-25 · unverdicted · novelty 6.0

FUSE creates full-spectrum unlearnable perturbations using random spectral masking during training and cross-band guidance to enforce consistency between frequency components.

Vision-driven Preference Synthesis for Mitigating Hallucinations in VLMs

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

ViPSy constructs policy-aligned and visually grounded preference pairs for VLMs via visual cues from image variants, yielding SOTA hallucination reductions of 35.7% on AMBER and 24.5% on Object HalBench.

Radial Basis Function Networks as Projection Heads in Self-Supervised Learning

cs.CV · 2026-06-19 · unverdicted · novelty 6.0

RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.

Jaguar: Fast Private CNN Inference with Power-of-Two Homomorphic Arithmetic

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

Jaguar replaces prime-modulus HE with power-of-two arithmetic to enable coefficient-domain convolution and local-shift truncation, reporting 2-3.7x lower latency than Cheetah and Rhombus on ResNet-18/50 and MobileNetV2.

CSFlow: Aligning Flow Matching with Human Contrast Sensitivity

cs.CV · 2026-06-07 · unverdicted · novelty 6.0

CSFlow derives inference-time timestep weights for flow matching by matching per-step frequency content to human CSF, yielding 4.7% FID reduction and smaller gains on IS and GenEval.

Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

C-GSPN scales 2D spatial propagation to foundation vision encoders via a fast CUDA kernel, compressed blocks, and two-stage distillation, matching ViT performance with 15% fewer parameters and 4x block speedup at 2K resolution.

The Trust Paradox: How CS Researchers Engage LLM Leaderboards

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

CS researchers show pragmatic skepticism toward LLM leaderboards: they use them despite distrust, prefer peer networks and arena leaderboards, and name cost transparency as a key missing feature.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Symmetrizing cross-entropy produces the unique convex multi-class unhinged loss, which locally approximates other symmetric losses, and enables new interpolating losses SGCE and alpha-MAE with competitive performance on noisy-label benchmarks.

Multi-Scale Generative Modeling with Heat Dissipation Flow Matching

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

HDFM adds a continuous heat-dissipation (blur) process to flow matching, aligns an interpolated path to fix ill-posed inverse heat dissipation, and uses x-prediction to ease high-dimensional regression, yielding better performance than most baselines on image datasets.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Mixed Precision Training · cs.AI · 2017-10-10 · accept · none · ref 29
Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models · cs.AI · 2026-04-13 · unverdicted · none · ref 42 · 2 links
EmergentBridge enhances zero-shot cross-modal performance on unpaired modalities by learning noisy bridge anchors from existing alignments and enforcing proxy alignment only in the orthogonal subspace to avoid gradient interference.
