Recognition: unknown
Instance Normalization: The Missing Ingredient for Fast Stylization
read the original abstract
It this paper we revisit the fast stylization method introduced in Ulyanov et. al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and to apply the latter both at training and testing times. The resulting method can be used to train high-performance architectures for real-time image generation. The code will is made available on github at https://github.com/DmitryUlyanov/texture_nets. Full paper can be found at arXiv:1701.02096.
This paper has not been read by Pith yet.
Forward citations
Cited by 16 Pith papers
-
QuadNorm: Resolution-Robust Normalization for Neural Operators
QuadNorm uses quadrature-based moments instead of uniform averaging in normalization layers, achieving O(h²) consistency across resolutions and better cross-resolution transfer in neural operators.
-
Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
Every fixed finite feedforward neural network definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting.
-
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
A normalize-process-denormalize wrapper enforces normalization equivariance on arbitrary backbones, improving robustness to distribution shift in image denoising with no overhead.
-
StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
StyleID supplies human-perception-aligned benchmarks and fine-tuned encoders that improve facial identity recognition robustness across stylization types and strengths.
-
High-Speed Full-Color HDR Imaging via Unwrapping Modulo-Encoded Spike Streams
An exposure-decoupled modulo formulation and iteration-free diffusion-prior unwrapping enable 1000 FPS full-color HDR imaging on spike cameras while cutting bandwidth from 20 Gbps to 6 Gbps.
-
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
PatchTST uses subseries patching and channel-independent Transformers to deliver significantly better long-term multivariate time series forecasting and strong self-supervised transfer performance.
-
Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver
The CARM module boosts neural routing solvers by adaptively modulating embeddings with constraint variables, enabling better use of global observations and improved performance on constrained VRPs.
-
Linearizing Vision Transformer with Test-Time Training
Using Test-Time Training's structural match to Softmax attention plus key normalization and locality modules allows inheriting pretrained weights and fine-tuning Stable Diffusion 3.5 in one hour to match quality while...
-
Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?
Natural-domain foundation models provide competitive and more robust priors than task-specific models for accelerated cardiac MRI reconstruction in cross-domain settings.
-
A fast and Generic Energy-Shifting Transformer for Hybrid Monte Carlo Radiotherapy Calculation
A hybrid Transformer-UNet model with energy-shifting inputs generates 6 MV LINAC dose maps from monoenergetic data, achieving over 98% gamma passing rate (3%/3mm) versus full Monte Carlo for prostate radiotherapy.
-
Time-Domain Voice Identity Morphing (TD-VIM): A Signal-Level Approach to Morphing Attacks on Speaker Verification Systems
TD-VIM creates signal-level morphed voice samples that achieve G-MAP attack success rates up to 99.74% against deep-learning and commercial speaker verification systems.
-
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
-
USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation
USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and superior segmentation accuracy than pure CNN or Mamba models across medi...
-
Style-Based Neural Architectures for Real-Time Weather Classification
Three style-based neural architectures are proposed for real-time weather classification from images, with two truncated ResNet variants claimed to outperform prior methods and generalize across public datasets.
-
Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift
Reversible Residual Normalization (RRN) introduces spatially-aware invertible residual blocks that combine center normalization with spectral-constrained graph convolutions to mitigate spatio-temporal distribution shi...
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.