The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Abstract
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
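The metric this abstract motivates was published as LPIPS. A minimal sketch of the core idea, assuming PyTorch and torchvision are available, is shown below: run both images through an ImageNet-trained VGG16, unit-normalize the activations at a few layers along the channel dimension, and average the squared differences. The layer choice and the omission of the paper's learned per-channel weights and input preprocessing are simplifications, not the published metric.

```python
# Sketch of a deep-feature perceptual distance in the spirit of LPIPS.
# Simplifications: no learned per-channel calibration weights and no
# ImageNet mean/std input normalization, both of which the paper uses.
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

# Indices of the relu1_2 ... relu5_3 activations inside vgg16.features.
LAYERS = (3, 8, 15, 22, 29)

def deep_feature_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Perceptual distance between two image batches (N, 3, H, W) in [-1, 1]."""
    dist = torch.zeros(x.shape[0])
    fx, fy = x, y
    for i, layer in enumerate(vgg):
        fx, fy = layer(fx), layer(fy)
        if i in LAYERS:
            # Unit-normalize each feature map in the channel dimension,
            # then average squared differences over channels and space.
            nx = fx / (fx.norm(dim=1, keepdim=True) + 1e-10)
            ny = fy / (fy.norm(dim=1, keepdim=True) + 1e-10)
            dist = dist + ((nx - ny) ** 2).mean(dim=(1, 2, 3))
        if i == LAYERS[-1]:
            break
    return dist

with torch.no_grad():
    a = torch.rand(1, 3, 64, 64) * 2 - 1
    b = torch.rand(1, 3, 64, 64) * 2 - 1
    print(deep_feature_distance(a, b))  # larger = more perceptually different
```

For the calibrated metric, the authors' reference implementation is distributed as the `lpips` package (`pip install lpips`), where `lpips.LPIPS(net='vgg')` returns a module that scores image pairs in [-1, 1] directly.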
Forward citations
Cited by 16 Pith papers
- HairGPT: Strand-as-Language Autoregressive Modeling for Realistic 3D Hairstyle Synthesis
  HairGPT reframes 3D hairstyle synthesis as dual-decoupled autoregressive strand sequence modeling with geometric tokenization for semantic control and rare style generation.
- Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
  Text-to-3D models lose prompt sensitivity for out-of-distribution shapes due to sink traps but retain geometric diversity via unconditional priors, enabling a decoupled inversion method for robust editing.
- FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
  FIT is a large-scale dataset of 1.13M try-on triplets with exact size data, plus a synthetic generation pipeline that enables training of virtual try-on models capable of depicting realistic garment fit, including ill-f...
- GS-Surrogate: Deformable Gaussian Splatting for Parameter Space Exploration of Ensemble Simulations
  GS-Surrogate creates a canonical Gaussian field that is sequentially deformed by simulation parameters to enable real-time, controllable 3D exploration of ensemble data while separating simulation variations from visu...
- GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow
  GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS r...
- FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
  Linear mappings in feature space can reconstruct a wide range of image manipulations, including semantic edits, suggesting that feature representations are approximately linearly organized.
- Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement
  Hero-Mamba combines parallel spatial-spectral Mamba processing with a background-light-guided ColorFusion block to enhance underwater images, reporting a PSNR of 25.802 dB and an SSIM of 0.913 on the LSUI benchmark.
- PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios
  PostureObjectStitch generates assembly-aware anomaly images by decoupling multi-view features into high-frequency, texture, and RGB components, modulating them temporally in a diffusion model, and applying conditional ...
- Zero-shot World Models Are Developmentally Efficient Learners
  A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
- Emu3: Next-Token Prediction is All You Need
  Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.
- CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation
  CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditi...
- Consistency Regularised Gradient Flows for Inverse Problems
  A consistency-regularized Euclidean-Wasserstein-2 gradient flow performs joint posterior sampling and prompt optimization in latent space for efficient low-NFE inverse problem solving with diffusion models.
- Photometric Super-Resolution for Improving Galaxy Morphological Measurements using Conditional Generative Adversarial Networks
  Neo, a cGAN, super-resolves HSC images to HST-like quality and improves the accuracy of galaxy morphological parameters by factors of 2-10.
- UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement
  UniCSG adds staged semantic disentanglement and frequency-aware reconstruction to DiT diffusion models to improve content preservation and style fidelity in both text- and reference-guided generation.
- Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan
  A new wildlife-specific hazy-image dataset and the IncepDehazeGan model, which together report state-of-the-art dehazing metrics and more than double downstream animal detection performance.
- Generating Satellite Imagery Data for Wildfire Detection through Mask-Conditioned Generative AI
  A pre-trained Earth Observation diffusion model generates realistic post-wildfire Sentinel-2 imagery from burn masks via inpainting, achieving a Burn IoU of 0.456 and improved saliency over full generation.