Deep Exemplar-based Video Colorization

Amine Bermak; Bo Zhang; Dong Chen; Jing Liao; Lu Yuan; Mingming He; Pedro V. Sander

arXiv: 1906.09909 · v1 · pith:WA6WWBMDnew · submitted 2019-06-24 · 💻 cs.CV · cs.AI · cs.LG

Bo Zhang, Mingming He, Jing Liao, Pedro V. Sander, Lu Yuan, Amine Bermak, Dong Chen

Pith reviewed 2026-05-25 17:41 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG

keywords exemplar-based video colorization · recurrent network · temporal consistency · semantic correspondence · color propagation · deep learning · computer vision

The pith

A recurrent network unifies semantic matching and color propagation to colorize video sequences from one reference image while maintaining temporal consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the first end-to-end network for exemplar-based video colorization. It introduces a recurrent framework that combines the tasks of finding semantic correspondences between the reference and each frame with propagating colors across the sequence. Video frames are processed in order using prior colorization results, and a temporal consistency loss is applied during training to enforce coherence. This setup is intended to reduce error accumulation that occurs when correspondence and propagation are handled in separate stages.
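The sequential processing described above can be sketched as a plain loop. This is a minimal illustration, not the paper's implementation: `colorize_video`, `colorize_frame`, and `toy_step` are hypothetical names, and the per-frame step is left as a caller-supplied function because the network itself is not described here.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")  # grayscale input frame
Color = TypeVar("Color")  # colorized output frame


def colorize_video(
    gray_frames: Sequence[Frame],
    reference: Color,
    colorize_frame: Callable[[Frame, Color, Optional[Color]], Color],
) -> List[Color]:
    """Colorize frames in order; every step sees the reference and the previous output."""
    outputs: List[Color] = []
    previous: Optional[Color] = None  # no colorization history before the first frame
    for gray in gray_frames:
        current = colorize_frame(gray, reference, previous)
        outputs.append(current)
        previous = current  # becomes the history for the next frame
    return outputs


seen_history = []


def toy_step(gray, reference, previous):
    # Hypothetical stand-in for the network: tint the frame with the reference
    # colour, averaged with the previous output when one exists.
    seen_history.append(previous)
    tint = reference if previous is None else 0.5 * (reference + previous)
    return gray * tint


demo_outputs = colorize_video([1.0, 0.5, 1.0], 0.8, toy_step)
```

The point of the sketch is the data flow: the reference is passed to every step, while the history is only the immediately preceding output.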

Core claim

By training a recurrent network end-to-end that unifies semantic correspondence and color propagation steps, with both steps guided by the reference image and reinforced by a temporal consistency loss, realistic videos can be produced that remain faithful to the reference style and exhibit good temporal stability.

What carries the argument

The recurrent framework that unifies semantic correspondence and color propagation, allowing the reference image to guide colorization of every frame based on colorization history.
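One way to picture reference-guided correspondence is a soft, attention-style match between per-position features of the frame and of the reference. The sketch below is an assumption made for illustration, since this page does not specify how the paper computes correspondences; `warp_reference_colors`, the cosine similarity, and the `temperature` value are all hypothetical choices.

```python
import numpy as np


def warp_reference_colors(frame_feat, ref_feat, ref_color, temperature=0.01):
    """Soft-match each frame position to reference positions and borrow colour.

    frame_feat: (N, C) features for N frame positions
    ref_feat:   (M, C) features for M reference positions
    ref_color:  (M, 2) chrominance at the reference positions
    Returns the warped colour (N, 2) and a per-position confidence (N,).
    """
    # Cosine similarity between every frame position and every reference position.
    f = frame_feat / (np.linalg.norm(frame_feat, axis=1, keepdims=True) + 1e-8)
    r = ref_feat / (np.linalg.norm(ref_feat, axis=1, keepdims=True) + 1e-8)
    similarity = f @ r.T  # (N, M)

    # Softmax over reference positions; subtract the row max for stability.
    logits = similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    warped = weights @ ref_color          # (N, 2) colour borrowed from the reference
    confidence = similarity.max(axis=1)   # quality of the best match per position
    return warped, confidence


# Toy data: frame position 0 resembles reference position 1, and vice versa.
demo_frame_feat = np.array([[1.0, 0.0], [0.0, 1.0]])
demo_ref_feat = np.array([[0.0, 2.0], [3.0, 0.0]])
demo_ref_color = np.array([[10.0, 20.0], [30.0, 40.0]])
demo_warped, demo_confidence = warp_reference_colors(
    demo_frame_feat, demo_ref_feat, demo_ref_color
)
```

A low confidence value would flag positions where the reference offers no good match, which is where a propagation step would have to rely on history instead.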

If this is right

Each frame receives guidance from the reference through the unified correspondence and propagation steps.
Sequential processing based on colorization history reduces accumulated propagation errors.
The temporal consistency loss enforces coherency across the entire sequence.
The resulting videos are claimed to be superior to prior methods in both quantitative metrics and visual quality.
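The temporal consistency loss is not written out on this page. A common form for such a loss, assumed here purely for illustration, penalizes the colour difference between the current output and the previous output after it has been aligned to the current frame, ignoring occluded pixels. The function name, the mask convention, and the omission of the alignment step itself are all assumptions.

```python
import numpy as np


def temporal_consistency_loss(current, warped_previous, valid_mask):
    """Mean absolute colour difference over pixels where the alignment is trusted.

    current:         (H, W, C) colorized frame t
    warped_previous: (H, W, C) colorized frame t-1, already aligned to frame t
    valid_mask:      (H, W) 1 where the alignment is reliable, 0 at occlusions
    """
    mask = valid_mask[..., None].astype(float)  # broadcast over colour channels
    diff = np.abs(current - warped_previous) * mask
    # Normalise by the number of trusted values; guard against an empty mask.
    denom = max(float(mask.sum()) * current.shape[-1], 1.0)
    return float(diff.sum() / denom)


# Toy example: one pixel changes colour between frames.
demo_current = np.zeros((2, 2, 2))
demo_current[0, 0] = [0.2, 0.4]
demo_previous = np.zeros((2, 2, 2))

loss_all = temporal_consistency_loss(demo_current, demo_previous, np.ones((2, 2)))

occluded = np.ones((2, 2))
occluded[0, 0] = 0  # treat the changed pixel as occluded
loss_masked = temporal_consistency_loss(demo_current, demo_previous, occluded)
```

Masking matters because a pixel that is newly revealed has no valid counterpart in the previous frame, so penalizing it would punish legitimate change.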

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same recurrent unification might apply to other reference-guided video tasks such as style transfer or segmentation.
Efficiency improvements would be needed before the method could run on long sequences in real time.
Performance on videos with rapid motion or lighting changes would need separate verification beyond the reported experiments.

Load-bearing premise

Training the recurrent network end-to-end with the temporal consistency loss will produce realistic videos with good temporal stability without introducing new artifacts or drifting from the reference style across long sequences.

What would settle it

A test on a long video sequence showing either accumulated color drift away from the reference or visible flickering despite the temporal loss would falsify the central claim.
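A rough version of this test can be sketched with two per-frame statistics: drift relative to the first frame and flicker between consecutive frames. The metric definitions and the name `drift_and_flicker` are hypothetical, chosen for illustration rather than taken from the paper, and they ignore motion, so they are only meaningful on static or near-static regions.

```python
import numpy as np


def drift_and_flicker(chroma_frames):
    """chroma_frames: (T, H, W, 2) chrominance of the colorized video.

    drift[t]   = mean absolute chroma change of frame t relative to frame 0
    flicker[t] = mean absolute chroma change between frames t and t+1
    """
    frames = np.asarray(chroma_frames, dtype=float)
    drift = np.abs(frames - frames[0]).mean(axis=(1, 2, 3))          # (T,)
    flicker = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2, 3))   # (T-1,)
    return drift, flicker


# Toy sequence whose colour shifts by a constant step every frame.
demo_frames = np.stack([np.full((2, 2, 2), 0.1 * t) for t in range(4)])
demo_drift, demo_flicker = drift_and_flicker(demo_frames)
```

In the toy sequence the flicker stays small and constant while the drift keeps growing, which is the pattern a gradual colour change over a long clip would produce.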

Figures

Figures reproduced from arXiv: 1906.09909 by Amine Bermak, Bo Zhang, Dong Chen, Jing Liao, Lu Yuan, Mingming He, Pedro V. Sander.

**Figure 2.** The detailed diagram of the proposed network. The correspondence subnet finds the correspondence of source image…

**Figure 3.** Augmented training images from ImageNet dataset.

**Figure 4.** First row: nearest neighbor matching. Second row:…

**Figure 5.** Ablation study for different loss functions. Please refer to the supplementary material for the quantitative comparisons.

**Figure 6.** Comparison with image colorization with state-of-the-art methods.

**Figure 7.** Quantitative comparison with video color propagation.

**Figure 8.** User study results.

**Figure 9.** Comparison with video color propagation. With a given color frame as start, colors are propagated to the succeeding video…

**Figure 10.** Comparison with automatic video colorization.

**Figure 11.** Multi-modal colorization according to the user reference.

**Figure 12.** Multi-modal colorization according to the user reference.

**Figure 13.** Multi-modal colorization according to the user reference.

**Figure 14.** Multi-modal colorization according to the user reference.

**Figure 15.** Colorization on legacy videos.

Appendix E. User study. We conduct two user studies: one to measure the video colorization quality and another for video propagation. For the first study, we first compare our video colorization with three methods of per-frame automatic video colorization: Larsson et al. [16], Zhang et al. [17] and Iizuka et al. [15]. We use 19 videos randomly selected from the video test dat…

**Figure 16.** Limitation: our method cannot assure long-term temporal consistency. The color of the train gradually changes (from red to…

Original abstract

This paper presents the first end-to-end network for exemplar-based video colorization. The main challenge is to achieve temporal consistency while remaining faithful to the reference style. To address this issue, we introduce a recurrent framework that unifies the semantic correspondence and color propagation steps. Both steps allow a provided reference image to guide the colorization of every frame, thus reducing accumulated propagation errors. Video frames are colorized in sequence based on the colorization history, and its coherency is further enforced by the temporal consistency loss. All of these components, learned end-to-end, help produce realistic videos with good temporal stability. Experiments show our result is superior to the state-of-the-art methods both quantitatively and qualitatively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Recurrent unification of correspondence and propagation is the core new idea, but thin experimental detail leaves the no-drift claim unverified.

This paper claims to deliver the first end-to-end network for exemplar-based video colorization. The key move is a recurrent framework that folds semantic correspondence and color propagation into one loop, so the reference image keeps guiding every frame instead of letting errors pile up through separate steps. A temporal consistency loss is added on top to enforce coherence across the sequence. That unification is the part that has not been reported before for this exact task, and it directly targets the practical headache of style drift and flickering in video restoration pipelines. The approach is sensible on paper: by learning the whole thing end-to-end, the network can in principle learn to maintain reference fidelity without manual tuning of propagation rules.

The abstract positions the method as superior both quantitatively and qualitatively, which is the usual claim in this area. The citation pattern looks standard, building on prior static-image colorization work without obvious gaps in the referenced literature. The soft spot is the experimental evidence, which the abstract summarizes without dataset sizes, error bars, ablation tables, or long-sequence tests. The central promise, that end-to-end training with the recurrent state will avoid new artifacts and reference drift over extended clips, rests on that unshown evidence. If those results hold up in the full paper, the method is worth attention; if they are only short-clip demos, the advantage shrinks.

Readers working on video colorization or temporal consistency in generative models would find the recurrent unification worth reading. The work is coherent enough on its own terms to merit a serious referee, though any review would need to press for the missing ablations and longer-sequence validation before acceptance.

Referee Report

2 major / 0 minor

Summary. This paper claims to present the first end-to-end recurrent network for exemplar-based video colorization. The recurrent framework unifies semantic correspondence and color propagation so a single reference image guides colorization of every frame, reducing accumulated propagation errors. Video frames are processed sequentially using colorization history, with a temporal consistency loss enforcing coherency; all components are learned end-to-end to produce realistic videos with good temporal stability. Experiments are said to demonstrate quantitative and qualitative superiority over state-of-the-art methods.

Significance. If the experimental claims hold with proper validation, the unified recurrent approach would represent a meaningful advance in exemplar-based video colorization by addressing error accumulation and temporal instability in a single learned model, potentially outperforming prior separate-stage pipelines.

major comments (2)

[Abstract] The central claim of quantitative and qualitative superiority (and of reduced propagation errors via the recurrent unification) is asserted without error bars, dataset details, ablation results, or long-sequence experiments, so the experimental summary cannot be verified and the claim that end-to-end training prevents reference drift remains untested.
[Abstract] No loss equation, recurrence-depth analysis, or ablation isolating the unification of semantic correspondence and color propagation is supplied, leaving unsupported the assumption that the recurrent state maintains reference fidelity without introducing new artifacts or style drift across long sequences.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each point below and indicate where revisions to the abstract or supporting text are feasible.

Referee: [Abstract] The central claim of quantitative and qualitative superiority (and of reduced propagation errors via the recurrent unification) is asserted without error bars, dataset details, ablation results, or long-sequence experiments, so the experimental summary cannot be verified and the claim that end-to-end training prevents reference drift remains untested.

Authors: The abstract is a concise summary; full dataset descriptions appear in Section 4.1, quantitative/qualitative results and comparisons in Section 4.2, and ablations in Section 5. Error bars were not reported in the submitted version. The recurrent unification is motivated in Section 3 as a means to reduce propagation drift, with temporal stability shown on the evaluated sequences. We will revise the abstract to qualify the superiority claim and add a forward reference to the experimental sections.
revision: partial

Referee: [Abstract] No loss equation, recurrence-depth analysis, or ablation isolating the unification of semantic correspondence and color propagation is supplied, leaving unsupported the assumption that the recurrent state maintains reference fidelity without introducing new artifacts or style drift across long sequences.

Authors: The temporal consistency loss is formalized in the method section (with the relevant equation). The recurrent state and unification of correspondence and propagation are detailed in Section 3, and component ablations appear in Section 5. A dedicated recurrence-depth study and explicit long-sequence drift measurements are not present; we can add a brief reference to the loss equation in the abstract and note the design rationale for fidelity preservation.
revision: partial

Circularity Check

0 steps flagged

No circularity: empirical ML method with external benchmarks

The paper proposes a recurrent end-to-end neural network architecture for exemplar-based video colorization, trained with a temporal consistency loss and evaluated on external datasets against prior methods. No derivation chain, equations, or first-principles results are presented that reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. Claims rest on learned behavior and quantitative/qualitative experiments, which are self-contained against independent benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on standard supervised learning assumptions (gradient-based optimization of a neural network) and the untested premise that the proposed architecture and loss suffice for temporal stability.

axioms (1)

domain assumption: Gradient-based optimization can jointly learn semantic correspondence, color propagation, and temporal consistency from paired training data.
Implicit in any end-to-end deep learning claim; location: entire abstract description of training.

pith-pipeline@v0.9.0 · 5655 in / 1170 out tokens · 22855 ms · 2026-05-25T17:41:49.006445+00:00 · methodology

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 17 internal anchors

[1] A. Levin, D. Lischinski, and Y. Weiss, “Colorization using optimization,” in ACM Transactions on Graphics (TOG), vol. 23, pp. 689–694, ACM, 2004.

[2] L. Yatziv and G. Sapiro, “Fast image and video colorization using chrominance blending,” 2004.

[3] Y.-C. Huang, Y.-S. Tung, J.-C. Chen, S.-W. Wang, and J.-L. Wu, “An adaptive edge detection based colorization algorithm and its applications,” in Proceedings of the 13th annual ACM international conference on Multimedia, pp. 351–354, ACM, 2005.

[4] Y. Qu, T.-T. Wong, and P.-A. Heng, “Manga colorization,” in ACM Transactions on Graphics (TOG), vol. 25, pp. 1214–1220, ACM, 2006.

[5] Q. Luan, F. Wen, D. Cohen-Or, L. Liang, Y.-Q. Xu, and H.-Y. Shum, “Natural image colorization,” in Proceedings of the 18th Eurographics conference on Rendering Techniques, pp. 309–320, Eurographics Association, 2007.

[6] T. Welsh, M. Ashikhmin, and K. Mueller, “Transferring color to greyscale images,” in ACM Transactions on Graphics (TOG), vol. 21, pp. 277–280, ACM, 2002.

[7] A. Bugeau, V.-T. Ta, and N. Papadakis, “Variational exemplar-based image colorization,” IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 298–307, 2014.

[8] X. Liu, L. Wan, Y. Qu, T.-T. Wong, S. Lin, C.-S. Leung, and P.-A. Heng, “Intrinsic colorization,” in ACM Transactions on Graphics (TOG), vol. 27, p. 152, ACM, 2008.

[9] A. Y.-S. Chia, S. Zhuo, R. K. Gupta, Y.-W. Tai, S.-Y. Cho, P. Tan, and S. Lin, “Semantic colorization with internet images,” in ACM Transactions on Graphics (TOG), vol. 30, p. 156, ACM, 2011.

[10] R. K. Gupta, A. Y.-S. Chia, D. Rajan, E. S. Ng, and H. Zhiyong, “Image colorization using similar images,” in Proceedings of the 20th ACM international conference on Multimedia, pp. 369–378, ACM, 2012.

[11] G. Charpiat, M. Hofmann, and B. Schölkopf, “Automatic image colorization via multimodal predictions,” in European conference on computer vision, pp. 126–139, Springer, 2008.

[12] R. Ironi, D. Cohen-Or, and D. Lischinski, “Colorization by example,” in Rendering Techniques, pp. 201–210, Citeseer,

[13] Y.-W. Tai, J.-Y. Jia, and C.-K. Tang, “Local color transfer via probabilistic segmentation by expectation-maximization,” in IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2005.

[14] Z. Cheng, Q. Yang, and B. Sheng, “Deep colorization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 415–423, 2015.

[15] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification,” ACM Transactions on Graphics (TOG), vol. 35, no. 4, p. 110, 2016.

[16] G. Larsson, M. Maire, and G. Shakhnarovich, “Learning representations for automatic colorization,” in European Conference on Computer Vision, pp. 577–593, Springer, 2016.

[17] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in European Conference on Computer Vision, pp. 649–666, Springer, 2016.

[18] A. Deshpande, J. Rock, and D. Forsyth, “Learning large-scale automatic image colorization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 567–575, 2015.

[19] J. Zhao, L. Liu, C. G. Snoek, J. Han, and L. Shao, “Pixel-level semantics guided image colorization,” arXiv preprint arXiv:1808.01597, 2018.

[20] F. Baldassarre, D. G. Morín, and L. Rodés-Guirao, “Deep koalarization: Image colorization using cnns and inception-resnet-v2,” arXiv preprint arXiv:1712.03400, 2017.

[21] N. Bonneel, J. Tompkin, K. Sunkavalli, D. Sun, S. Paris, and H. Pfister, “Blind video temporal consistency,” ACM Transactions on Graphics (TOG), vol. 34, no. 6, p. 196, 2015.

[22] W.-S. Lai, J.-B. Huang, O. Wang, E. Shechtman, E. Yumer, and M.-H. Yang, “Learning blind video temporal consistency,” arXiv preprint arXiv:1808.00449, 2018.

[23] B. Sheng, H. Sun, M. Magnor, and P. Li, “Video colorization using parallel optimization in feature space,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 3, pp. 407–417, 2014.

[24] P. Doğan, T. O. Aydın, N. Stefanoski, and A. Smolic, “Key-frame based spatiotemporal scribble propagation,” in Proceedings of the Eurographics Workshop on Intelligent Cinematography and Editing, pp. 13–20, Eurographics Association, 2015.

[25] S. Paul, S. Bhattacharya, and S. Gupta, “Spatiotemporal colorization of video using 3d steerable pyramids,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 8, pp. 1605–1619, 2017.

[26] V. Jampani, R. Gadde, and P. V. Gehler, “Video propagation networks,” in Proc. CVPR, vol. 6, p. 7, 2017.

[27] C. Vondrick, A. Shrivastava, A. Fathi, S. Guadarrama, and K. Murphy, “Tracking emerges by colorizing videos,” in Proc. ECCV, 2018.

[28] S. Liu, G. Zhong, S. De Mello, J. Gu, V. Jampani, M.-H. Yang, and J. Kautz, “Switchable temporal propagation network,” arXiv preprint arXiv:1804.08758, 2018.

[29] S. Meyer, V. Cornillère, A. Djelouah, C. Schroers, and M. Gross, “Deep video color propagation,” arXiv preprint arXiv:1808.03232, 2018.

[30] M. He, D. Chen, J. Liao, P. V. Sander, and L. Yuan, “Deep exemplar-based colorization,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, p. 47, 2018.

[31] J. Liao, Y. Yao, L. Yuan, G. Hua, and S. B. Kang, “Visual attribute transfer through deep image analogy,” arXiv preprint arXiv:1705.01088, 2017.

[32] R. Zhang, J.-Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, and A. A. Efros, “Real-time user-guided image colorization with learned deep priors,” arXiv preprint arXiv:1705.02999,

[33] M. He, J. Liao, L. Yuan, and P. V. Sander, “Neural color transfer between images,” arXiv preprint arXiv:1710.00756,

[34] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” arXiv preprint, 2017.

[35] A. Deshpande, J. Lu, M.-C. Yeh, M. J. Chong, and D. A. Forsyth, “Learning diverse image colorization,” in CVPR, pp. 2877–2885, 2017.

[36] S. Messaoud, D. Forsyth, and A. G. Schwing, “Structural consistency and controllability for diverse colorization,” arXiv preprint arXiv:1809.02129, 2018.

[37] S. Guadarrama, R. Dahl, D. Bieber, M. Norouzi, J. Shlens, and K. Murphy, “Pixcolor: Pixel recursive colorization,” arXiv preprint arXiv:1705.07208, 2017.

[38] A. Royer, A. Kolesnikov, and C. H. Lampert, “Probabilistic image colorization,” arXiv preprint arXiv:1705.04258, 2017.

[39] V. G. Jacob and S. Gupta, “Colorization of grayscale images and videos using a semiautomatic approach,” in Image Processing (ICIP), 2009 16th IEEE International Conference on, pp. 1653–1656, IEEE, 2009.

[40] N. Ben-Zrihem and L. Zelnik-Manor, “Approximate nearest neighbor fields in video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5233–5242, 2015.

[41] S. Xia, J. Liu, Y. Fang, W. Yang, and Z. Guo, “Robust and automatic video colorization via multiframe reordering refinement,” in Image Processing (ICIP), 2016 IEEE International Conference on, pp. 4017–4021, IEEE, 2016.

[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

[43] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” arXiv preprint arXiv:1711.07971, vol. 10,

[44] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, pp. 694–711, Springer,

[45] R. Mechrez, I. Talmi, and L. Zelnik-Manor, “The contextual loss for image transformation with non-aligned data,” arXiv preprint arXiv:1803.02077, 2018.

[46] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” in ACM Transactions on Graphics (TOG), vol. 27, p. 67, ACM, 2008.

[47] A. Jolicoeur-Martineau, “The relativistic discriminator: a key element missing from standard gan,” arXiv preprint arXiv:1807.00734, 2018.

[48] D. Chen, J. Liao, L. Yuan, N. Yu, and G. Hua, “Coherent online video style transfer,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1105–1114,

[49] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” arXiv preprint arXiv:1805.08318, 2018.

[50] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018.

[51] “Videvo.” https://www.videvo.net/.

[52] M. Marszałek, I. Laptev, and C. Schmid, “Actions in context,” in IEEE Conference on Computer Vision & Pattern Recognition, 2009.

[53] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in IEEE conference on computer vision and pattern recognition (CVPR), vol. 2, p. 6, 2017.

[54] M. Ruder, A. Dosovitskiy, and T. Brox, “Artistic style transfer for videos,” in German Conference on Pattern Recognition, pp. 26–36, Springer, 2016.

[55] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” in Advances in Neural Information Processing Systems, pp. 6626–6637, 2017.

[56] D. Hasler and S. E. Suesstrunk, “Measuring colorfulness in natural images,” in Human vision and electronic imaging VIII, vol. 5007, pp. 87–96, International Society for Optics and Photonics, 2003.

[57] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, “A benchmark dataset and evaluation methodology for video object segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 724–732, 2016.

Appendix A. Details of network architecture. The overall network consists of two sub-m...

[1] [1]

Colorization us- ing optimization,

A. Levin, D. Lischinski, and Y . Weiss, “Colorization us- ing optimization,” in ACM transactions on graphics (TOG), vol. 23, pp. 689–694, ACM, 2004. 1, 2

work page 2004

[2] [2]

Fast image and video colorization using chrominance blending,

L. Yatziv and G. Sapiro, “Fast image and video colorization using chrominance blending,” 2004. 1, 2

work page 2004

[3] [3]

An adaptive edge detection based colorization algo- rithm and its applications,

Y .-C. Huang, Y .-S. Tung, J.-C. Chen, S.-W. Wang, and J.- L. Wu, “An adaptive edge detection based colorization algo- rithm and its applications,” inProceedings of the 13th annual ACM international conference on Multimedia, pp. 351–354, ACM, 2005. 1, 2

work page 2005

[4] [4]

Manga colorization,

Y . Qu, T.-T. Wong, and P.-A. Heng, “Manga colorization,” in ACM Transactions on Graphics (TOG) , vol. 25, pp. 1214– 1220, ACM, 2006. 1, 2

work page 2006

[5] [5]

Natural image colorization,

Q. Luan, F. Wen, D. Cohen-Or, L. Liang, Y .-Q. Xu, and H.- Y . Shum, “Natural image colorization,” in Proceedings of the 18th Eurographics conference on Rendering Techniques, pp. 309–320, Eurographics Association, 2007. 1, 2

work page 2007

[6] [6]

Transferring color to greyscale images,

T. Welsh, M. Ashikhmin, and K. Mueller, “Transferring color to greyscale images,” in ACM Transactions on Graph- ics (TOG), vol. 21, pp. 277–280, ACM, 2002. 1, 2

work page 2002

[7] [7]

Variational exemplar-based image colorization,

A. Bugeau, V .-T. Ta, and N. Papadakis, “Variational exemplar-based image colorization,” IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 298–307, 2014. 1, 2

work page 2014

[8] [8]

Intrinsic colorization,

X. Liu, L. Wan, Y . Qu, T.-T. Wong, S. Lin, C.-S. Leung, and P.-A. Heng, “Intrinsic colorization,” inACM Transactions on Graphics (TOG), vol. 27, p. 152, ACM, 2008. 1, 2

work page 2008

[9] [9]

Semantic colorization with internet im- ages,

A. Y .-S. Chia, S. Zhuo, R. K. Gupta, Y .-W. Tai, S.-Y . Cho, P. Tan, and S. Lin, “Semantic colorization with internet im- ages,” in ACM Transactions on Graphics (TOG) , vol. 30, p. 156, ACM, 2011. 1, 2

work page 2011

[10] [10]

Image colorization using similar images,

R. K. Gupta, A. Y .-S. Chia, D. Rajan, E. S. Ng, and H. Zhiy- ong, “Image colorization using similar images,” in Proceed- ings of the 20th ACM international conference on Multime- dia, pp. 369–378, ACM, 2012. 1, 2

work page 2012

[11] [11]

Automatic im- age colorization via multimodal predictions,

G. Charpiat, M. Hofmann, and B. Sch¨olkopf, “Automatic im- age colorization via multimodal predictions,” in European conference on computer vision, pp. 126–139, Springer, 2008. 1, 2

[12] R. Ironi, D. Cohen-Or, and D. Lischinski, “Colorization by example,” in Rendering Techniques, pp. 201–210, Citeseer,

[13] Y.-W. Tai, J.-Y. Jia, and C.-K. Tang, “Local color transfer via probabilistic segmentation by expectation-maximization,” in IEEE Conference on Computer Vision & Pattern Recognition (CVPR), 2005. 1, 2

[14] Z. Cheng, Q. Yang, and B. Sheng, “Deep colorization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 415–423, 2015. 1, 2

[15] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification,” ACM Transactions on Graphics (TOG), vol. 35, no. 4, p. 110, 2016. 1, 2, 6, 7, 8, 14

[16] G. Larsson, M. Maire, and G. Shakhnarovich, “Learning representations for automatic colorization,” in European Conference on Computer Vision, pp. 577–593, Springer, 2016. 1, 2, 6, 7, 8, 14

[17] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in European Conference on Computer Vision, pp. 649–666, Springer, 2016. 1, 2, 6, 7, 8, 14

[18] A. Deshpande, J. Rock, and D. Forsyth, “Learning large-scale automatic image colorization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 567–575, 2015. 1, 2

[19] J. Zhao, L. Liu, C. G. Snoek, J. Han, and L. Shao, “Pixel-level semantics guided image colorization,” arXiv preprint arXiv:1808.01597, 2018. 1, 2

[20] F. Baldassarre, D. G. Morín, and L. Rodés-Guirao, “Deep koalarization: Image colorization using CNNs and Inception-ResNet-v2,” arXiv preprint arXiv:1712.03400, 2017. 1, 2

[21] N. Bonneel, J. Tompkin, K. Sunkavalli, D. Sun, S. Paris, and H. Pfister, “Blind video temporal consistency,” ACM Transactions on Graphics (TOG), vol. 34, no. 6, p. 196, 2015. 1, 2

[22] W.-S. Lai, J.-B. Huang, O. Wang, E. Shechtman, E. Yumer, and M.-H. Yang, “Learning blind video temporal consistency,” arXiv preprint arXiv:1808.00449, 2018. 1, 2, 7

[23] B. Sheng, H. Sun, M. Magnor, and P. Li, “Video colorization using parallel optimization in feature space,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 3, pp. 407–417, 2014. 1, 2

[24] P. Doğan, T. O. Aydın, N. Stefanoski, and A. Smolic, “Key-frame based spatiotemporal scribble propagation,” in Proceedings of the Eurographics Workshop on Intelligent Cinematography and Editing, pp. 13–20, Eurographics Association, 2015. 1, 2

[25] S. Paul, S. Bhattacharya, and S. Gupta, “Spatiotemporal colorization of video using 3D steerable pyramids,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 8, pp. 1605–1619, 2017. 1, 2

[26] V. Jampani, R. Gadde, and P. V. Gehler, “Video propagation networks,” in Proc. CVPR, vol. 6, p. 7, 2017. 1, 2, 7, 8, 14

[27] C. Vondrick, A. Shrivastava, A. Fathi, S. Guadarrama, and K. Murphy, “Tracking emerges by colorizing videos,” in Proc. ECCV, 2018. 1, 2

[28] S. Liu, G. Zhong, S. De Mello, J. Gu, V. Jampani, M.-H. Yang, and J. Kautz, “Switchable temporal propagation network,” arXiv preprint arXiv:1804.08758, 2018. 1, 2, 7, 8, 14

[29] S. Meyer, V. Cornillère, A. Djelouah, C. Schroers, and M. Gross, “Deep video color propagation,” arXiv preprint arXiv:1808.03232, 2018. 1, 2

[30] M. He, D. Chen, J. Liao, P. V. Sander, and L. Yuan, “Deep exemplar-based colorization,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, p. 47, 2018. 1, 2, 6, 7

[31] J. Liao, Y. Yao, L. Yuan, G. Hua, and S. B. Kang, “Visual attribute transfer through deep image analogy,” arXiv preprint arXiv:1705.01088, 2017. 1, 2, 4

[32] R. Zhang, J.-Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, and A. A. Efros, “Real-time user-guided image colorization with learned deep priors,” arXiv preprint arXiv:1705.02999,

[33] M. He, J. Liao, L. Yuan, and P. V. Sander, “Neural color transfer between images,” arXiv preprint arXiv:1710.00756,

[34] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” arXiv preprint, 2017. 2

[35] A. Deshpande, J. Lu, M.-C. Yeh, M. J. Chong, and D. A. Forsyth, “Learning diverse image colorization,” in CVPR, pp. 2877–2885, 2017. 2

[36] S. Messaoud, D. Forsyth, and A. G. Schwing, “Structural consistency and controllability for diverse colorization,” arXiv preprint arXiv:1809.02129, 2018. 2

[37] S. Guadarrama, R. Dahl, D. Bieber, M. Norouzi, J. Shlens, and K. Murphy, “PixColor: Pixel recursive colorization,” arXiv preprint arXiv:1705.07208, 2017. 2

[38] A. Royer, A. Kolesnikov, and C. H. Lampert, “Probabilistic image colorization,” arXiv preprint arXiv:1705.04258, 2017. 2

[39] V. G. Jacob and S. Gupta, “Colorization of grayscale images and videos using a semiautomatic approach,” in Image Processing (ICIP), 2009 16th IEEE International Conference on, pp. 1653–1656, IEEE, 2009. 2

[40] N. Ben-Zrihem and L. Zelnik-Manor, “Approximate nearest neighbor fields in video,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5233–5242, 2015. 2

[41] S. Xia, J. Liu, Y. Fang, W. Yang, and Z. Guo, “Robust and automatic video colorization via multiframe reordering refinement,” in Image Processing (ICIP), 2016 IEEE International Conference on, pp. 4017–4021, IEEE, 2016. 2

[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014. 3

[43] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” arXiv preprint arXiv:1711.07971, vol. 10,

[44] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, pp. 694–711, Springer,

[45] R. Mechrez, I. Talmi, and L. Zelnik-Manor, “The contextual loss for image transformation with non-aligned data,” arXiv preprint arXiv:1803.02077, 2018. 4

[46] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” in ACM Transactions on Graphics (TOG), vol. 27, p. 67, ACM, 2008. 4

[47] A. Jolicoeur-Martineau, “The relativistic discriminator: a key element missing from standard GAN,” arXiv preprint arXiv:1807.00734, 2018. 5

[48] D. Chen, J. Liao, L. Yuan, N. Yu, and G. Hua, “Coherent online video style transfer,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1105–1114,

[49] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” arXiv preprint arXiv:1805.08318, 2018. 5

[50] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018. 5

[51] “Videvo.” https://www.videvo.net/. 5

[52] M. Marszałek, I. Laptev, and C. Schmid, “Actions in context,” in IEEE Conference on Computer Vision & Pattern Recognition, 2009. 5

[53] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in IEEE conference on computer vision and pattern recognition (CVPR), vol. 2, p. 6, 2017. 5

[54] M. Ruder, A. Dosovitskiy, and T. Brox, “Artistic style transfer for videos,” in German Conference on Pattern Recognition, pp. 26–36, Springer, 2016. 5

[55] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” in Advances in Neural Information Processing Systems, pp. 6626–6637, 2017. 6

[56] D. Hasler and S. E. Suesstrunk, “Measuring colorfulness in natural images,” in Human vision and electronic imaging VIII, vol. 5007, pp. 87–96, International Society for Optics and Photonics, 2003. 6

[57] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, “A benchmark dataset and evaluation methodology for video object segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 724–732, 2016. 8

Appendix A. Details of network architecture

The overall network consists of two sub-m...
