The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Abstract
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
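The metric this abstract motivates was published as LPIPS. A minimal sketch of the core idea, assuming PyTorch and torchvision are available, is shown below: run both images through an ImageNet-trained VGG16, unit-normalize the activations at a few layers along the channel dimension, and average the squared differences. The layer choice and the omission of the paper's learned per-channel weights and input preprocessing are simplifications, not the published metric.

```python
# Sketch of a deep-feature perceptual distance in the spirit of LPIPS.
# Simplifications: no learned per-channel calibration weights and no
# ImageNet mean/std input normalization, both of which the paper uses.
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

# Indices of the relu1_2 ... relu5_3 activations inside vgg16.features.
LAYERS = (3, 8, 15, 22, 29)

def deep_feature_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Perceptual distance between two image batches (N, 3, H, W) in [-1, 1]."""
    dist = torch.zeros(x.shape[0])
    fx, fy = x, y
    for i, layer in enumerate(vgg):
        fx, fy = layer(fx), layer(fy)
        if i in LAYERS:
            # Unit-normalize each feature map in the channel dimension,
            # then average squared differences over channels and space.
            nx = fx / (fx.norm(dim=1, keepdim=True) + 1e-10)
            ny = fy / (fy.norm(dim=1, keepdim=True) + 1e-10)
            dist = dist + ((nx - ny) ** 2).mean(dim=(1, 2, 3))
        if i == LAYERS[-1]:
            break
    return dist

with torch.no_grad():
    a = torch.rand(1, 3, 64, 64) * 2 - 1
    b = torch.rand(1, 3, 64, 64) * 2 - 1
    print(deep_feature_distance(a, b))  # larger = more perceptually different
```

For the calibrated metric, the authors' reference implementation is distributed as the `lpips` package (`pip install lpips`), where `lpips.LPIPS(net='vgg')` returns a module that scores image pairs in [-1, 1] directly.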
Forward citations
Cited by 16 Pith papers
- HairGPT: Strand-as-Language Autoregressive Modeling for Realistic 3D Hairstyle Synthesis
  HairGPT reframes 3D hairstyle synthesis as dual-decoupled autoregressive strand sequence modeling with geometric tokenization for semantic control and rare style generation.
- Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
  Text-to-3D models lose prompt sensitivity for out-of-distribution shapes due to sink traps but retain geometric diversity via unconditional priors, enabling a decoupled inversion method for robust editing.
- FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
  FIT is a large-scale dataset of 1.13M try-on triplets with exact size data, plus a synthetic generation pipeline that enables training of virtual try-on models capable of depicting realistic garment fit, including ill-f...
- GS-Surrogate: Deformable Gaussian Splatting for Parameter Space Exploration of Ensemble Simulations
  GS-Surrogate creates a canonical Gaussian field that is sequentially deformed by simulation parameters to enable real-time, controllable 3D exploration of ensemble data while separating simulation variations from visu...
- GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow
  GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS r...
- FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
  Linear mappings in feature space can reconstruct a wide range of image manipulations, including semantic edits, suggesting that feature representations are approximately linearly organized.
- Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement
  Hero-Mamba combines parallel spatial-spectral Mamba processing with a background-light-guided ColorFusion block to enhance underwater images, reporting a PSNR of 25.802 dB and an SSIM of 0.913 on the LSUI benchmark.
- PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
  PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture, and RGB components, modulating them temporally in a diffusion model, and applying conditional ...
- Zero-shot World Models Are Developmentally Efficient Learners
  A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
- Emu3: Next-Token Prediction is All You Need
  Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.
- CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation
  CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditi...
- Consistency Regularised Gradient Flows for Inverse Problems
  A consistency-regularized Euclidean-Wasserstein-2 gradient flow performs joint posterior sampling and prompt optimization in latent space for efficient low-NFE inverse problem solving with diffusion models.
- Photometric Super-Resolution for Improving Galaxy Morphological Measurements using Conditional Generative Adversarial Networks
  Neo, a cGAN, super-resolves HSC images to HST-like quality and improves the accuracy of galaxy morphological parameters by factors of 2-10.
- UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement
  UniCSG adds staged semantic disentanglement and frequency-aware reconstruction to DiT diffusion models to improve content preservation and style fidelity in both text- and reference-guided generation.
- Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan
  A new wildlife-specific hazy-image dataset and the IncepDehazeGan model, which together report state-of-the-art dehazing metrics and more than double downstream animal detection performance.
- Generating Satellite Imagery Data for Wildfire Detection through Mask-Conditioned Generative AI
  A pre-trained Earth Observation diffusion model generates realistic post-wildfire Sentinel-2 imagery from burn masks via inpainting, achieving a Burn IoU of 0.456 and improved saliency over full generation.