Efficient Multi-Domain Network Learning by Covariance Normalization

Nuno Vasconcelos; Yunsheng Li

arxiv: 1906.10267 · v1 · pith:LMRMFPZInew · submitted 2019-06-24 · 💻 cs.CV

Pith reviewed 2026-05-25 17:06 UTC · model grok-4.3

classification 💻 cs.CV

keywords multi-domain learning, covariance normalization, deep networks, parameter efficiency, domain adaptation, principal component analysis, network adaptation, covariance

The pith

Covariance normalization enables deep networks to adapt to multiple domains with performance matching full fine-tuning while using only 0.13 percent of the parameters per domain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes covariance normalization, called CovNorm, to create a lightweight adaptive layer for each target domain in multi-domain deep network learning. The procedure consists of two principal component analyses on covariances followed by fine-tuning a small adaptation layer. It claims advantages over batch normalization and geometric matrix approximations in both theory and experiments. The approach supports target domains presented either sequentially or all at once. A reader would care because it points to a route for handling many domains without retraining entire networks each time.

Core claim

CovNorm is a data-driven method of fairly simple implementation, requiring two principal component analyses (PCAs) and fine-tuning of a mini-adaptation layer. It is shown, both theoretically and experimentally, to have several advantages over previous approaches, such as batch normalization or geometric matrix approximations. Furthermore, CovNorm can be deployed whether target datasets are available sequentially or simultaneously. Experiments show that, in both cases, it has performance comparable to a fully fine-tuned network, using as few as 0.13% of the corresponding parameters per target domain.

What carries the argument

Covariance normalization (CovNorm), a data-driven procedure that reduces parameters in per-domain adaptive layers via two PCAs on covariances plus mini-layer fine-tuning.
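
A minimal NumPy sketch of how such a two-PCA factorization could look, under stated assumptions: the adaptation layer is a square matrix `A`, one PCA is taken on the layer's inputs and the other on its outputs, and the mini-adaptation layer `M` is initialized by projecting `A` between the two retained subspaces. The function names (`pca_whiten_color`, `covnorm_factorize`) and the toy data are hypothetical, and the fine-tuning of `M` on the target task is omitted.

```python
import numpy as np

def pca_whiten_color(samples, k):
    """PCA on the sample covariance, truncated to the top-k components.

    Returns W (k x d), which whitens onto the retained subspace, and
    C (d x k), which maps whitened coordinates back (coloring).
    """
    cov = np.cov(samples, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    lam, basis = eigvals[top], eigvecs[:, top]    # shapes (k,) and (d, k)
    W = (basis / np.sqrt(lam)).T
    C = basis * np.sqrt(lam)
    return W, C

def covnorm_factorize(A, x, k_x, k_y):
    """Approximate A (d x d) as C_y @ M @ W_x using two PCAs (assumed form)."""
    y = x @ A.T                                   # layer outputs, one row per sample
    W_x, C_x = pca_whiten_color(x, k_x)           # PCA on the inputs
    W_y, C_y = pca_whiten_color(y, k_y)           # PCA on the outputs
    M = W_y @ A @ C_x                             # mini layer, shape (k_y, k_x)
    return C_y, M, W_x

# Toy check: inputs confined to a 3-dimensional subspace of an 8-dimensional space.
rng = np.random.default_rng(0)
d, rank, n = 8, 3, 500
x = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, d))
A = rng.normal(size=(d, d))

C_y, M, W_x = covnorm_factorize(A, x, k_x=rank, k_y=rank)
A_hat = C_y @ M @ W_x
max_err = np.max(np.abs(x @ A_hat.T - x @ A.T))   # error on the data, not on A itself
```

In this toy case the factorized layer reproduces the original layer on the data because the inputs occupy only `rank` directions; on real activations the truncation is lossy, which is the gap the fine-tuning step is meant to close.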

If this is right

Performance comparable to a fully fine-tuned network on target domains.
Advantages over batch normalization and geometric matrix approximations.
Deployment possible whether target datasets arrive sequentially or simultaneously.
Only two PCAs and fine-tuning of a mini-adaptation layer required per domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The covariance-focused adaptation might apply to other settings where domain shifts are captured by second-order statistics rather than means alone.
Resource savings could allow a single base network to serve dozens of domains in embedded or edge deployments without proportional memory growth.
Sequential deployment suggests a path to continual learning where new domains are added without revisiting prior ones.
The mini-adaptation layer might be further compressed if the PCA step already extracts most domain variation.

Load-bearing premise

That performing two PCAs on covariances plus fine-tuning a mini-adaptation layer is sufficient to capture domain-specific adaptations without substantial performance loss.

What would settle it

A controlled multi-domain experiment in which a network using CovNorm achieves accuracy more than a few percent below that of a fully fine-tuned counterpart on the same target domains.

Figures

Figures reproduced from arXiv: 1906.10267 by Nuno Vasconcelos, Yunsheng Li.

**Figure 2.** Covariance normalization. Each adaptation layer

**Figure 3.** a) original network, b) after fine-tuning, and c) with adaptation layer

**Figure 4.** Top: covnorm approximates adaptation layer

**Figure 4.** When $k_x > k_y$, $M_{x,y}\tilde{W}_x$ has dimension $k_y \times d$, and replacing the two matrices by their product reduces the total parameter count to $2dk_y$. In this case, we say that $M_{x,y}$ is absorbed into $\tilde{W}_x$. Conversely, if $k_x < k_y$, $M_{x,y}$ can be absorbed into $\tilde{C}_y$. Hence, the total parameter count is $2d\min(k_x, k_y)$. CovNorm is summarized in Algorithm 1. 3.6. The importance of covariance normalization. The benefits of covariance…
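
The parameter count stated above can be worked through with a small example. The sizes `d = 256`, `k_x = 32`, `k_y = 8` are hypothetical, and the comparison against a dense `d × d` layer assumes the adaptation layer is square, which is inferred from the stated dimensions rather than quoted from the paper.

```python
def unabsorbed_param_count(d, k_x, k_y):
    """Three separate factors: (d x k_y), (k_y x k_x) and (k_x x d)."""
    return d * k_y + k_y * k_x + k_x * d

def absorbed_param_count(d, k_x, k_y):
    """Mini layer merged into the larger-rank side: 2 * d * min(k_x, k_y)."""
    return 2 * d * min(k_x, k_y)

d, k_x, k_y = 256, 32, 8                           # hypothetical layer sizes
separate = unabsorbed_param_count(d, k_x, k_y)     # 2048 + 256 + 8192 = 10496
absorbed = absorbed_param_count(d, k_x, k_y)       # 2 * 256 * 8 = 4096
dense = d * d                                      # 65536 for a full d x d layer
fraction = absorbed / dense                        # 0.0625
```

With these numbers, absorbing the mini layer cuts the three-factor count from 10496 to 4096 parameters, about 6% of the dense layer.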

**Figure 6.** Accuracy vs. % of parameters used for adaptation. Left: MITIn

**Figure 7.** Variance explained by eigenvalues of a layer input and output, and similar plot for singular values. Left: MITIndoor. Right: CIFAR100.

| Method | ImNet | Airc | C100 | DPed | DTD | GTSR | Flwr | OGlt | SVHN | UCF | avg acc | S | #par |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RA [34] | 59.67% | 61.87% | 81.20% | 93.88% | 57.13% | 97.57% | 81.67% | 89.62% | 96.13% | 50.12% | 76.89% | 2621 | 2 |
| DAN [39] | 57.74% | 64.12% | 80.07% | 91.3% | 56.54% | 98.46% | 86.05% | 89.67% | 96.77% | 49.38% | 77.01% | 2851 | 2.17 |

Piggyback [27] 57.69% …
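
The RA [34] and DAN [39] results quoted with Figure 7 can be sanity-checked with plain arithmetic. The sketch below assumes the first ten percentages in each row are per-dataset accuracies and the eleventh is their mean; that column reading is inferred from the flattened text, not confirmed by the source.

```python
# Per-dataset accuracies (%) and reported averages, copied from the quoted rows.
rows = {
    "RA [34]": ([59.67, 61.87, 81.20, 93.88, 57.13,
                 97.57, 81.67, 89.62, 96.13, 50.12], 76.89),
    "DAN [39]": ([57.74, 64.12, 80.07, 91.3, 56.54,
                  98.46, 86.05, 89.67, 96.77, 49.38], 77.01),
}

gaps = {}
for method, (accuracies, reported_avg) in rows.items():
    mean = sum(accuracies) / len(accuracies)
    gaps[method] = abs(mean - reported_avg)        # distance from the reported mean

# Both gaps fall within rounding to two decimals, supporting the column reading.
consistent = all(gap < 0.01 for gap in gaps.values())
```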

read the original abstract

The problem of multi-domain learning of deep networks is considered. An adaptive layer is induced per target domain and a novel procedure, denoted covariance normalization (CovNorm), is proposed to reduce its parameters. CovNorm is a data-driven method of fairly simple implementation, requiring two principal component analyses (PCAs) and fine-tuning of a mini-adaptation layer. Nevertheless, it is shown, both theoretically and experimentally, to have several advantages over previous approaches, such as batch normalization or geometric matrix approximations. Furthermore, CovNorm can be deployed whether target datasets are available sequentially or simultaneously. Experiments show that, in both cases, it has performance comparable to a fully fine-tuned network, using as few as 0.13% of the corresponding parameters per target domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CovNorm runs two PCAs on covariances then fine-tunes a mini-layer to drop per-domain parameters to 0.13% while matching full fine-tuning accuracy on the reported tests.

CovNorm is a parameter-reduction trick for multi-domain deep networks that runs two PCAs on covariances and then fine-tunes a small adaptation layer, and the experiments indicate it reaches accuracy close to full per-domain fine-tuning with only 0.13% of the parameters. The new part is the specific CovNorm procedure that combines PCA normalization with the mini-layer, and it shows advantages over batch norm and matrix approximations both theoretically and in tests. It also handles sequential or simultaneous domain data. The paper does a decent job laying out the method and running the comparisons on what seem to be standard vision benchmarks.

The main concern is whether the PCA steps reliably keep the domain-specific signal. The stress test note flags this correctly: if the principal components miss directions important for the task, the mini-layer may not compensate, and the performance claim would not hold. The paper needs to show that the retained components are sufficient under its assumptions, and the experiments should include cases where domain shifts are more pronounced to test the boundary.

Overall, this is a practical contribution for anyone trying to adapt a network to many domains without blowing up the parameter count. Readers working on efficient transfer or multi-task learning would find it useful. It looks solid enough on the surface to go to peer review, though the theory-experiment link on the PCA retention should be checked carefully.

Referee Report

2 major / 0 minor

Summary. The paper proposes CovNorm, a covariance normalization method for multi-domain learning of deep networks. An adaptive layer is induced per target domain, with parameters reduced via two PCAs on covariances followed by fine-tuning a mini-adaptation layer. The method is claimed to offer theoretical and experimental advantages over batch normalization and geometric matrix approximations, and to achieve performance comparable to fully fine-tuned networks using as few as 0.13% of the parameters per target domain, whether target datasets are available sequentially or simultaneously.

Significance. If the central claims hold, the work provides a practical, low-parameter approach to multi-domain adaptation that could benefit resource-limited computer vision applications. The data-driven use of standard PCA operations and support for both sequential and simultaneous deployment modes are practical strengths. However, the efficiency and performance-comparability results rest on the unexamined assumption that the two-PCA procedure plus mini-layer fine-tuning captures domain-specific adaptations without substantial loss relative to full per-domain fine-tuning.

major comments (2)

[Abstract] The claim of theoretical support for advantages over batch normalization and geometric approximations, and of performance comparable to full fine-tuning at 0.13% parameters, cannot be assessed without the derivations and experimental controls; the load-bearing assumption that two PCAs preserve task-relevant domain-specific directions is not shown to hold under the paper's modeling assumptions.
[Theoretical and experimental sections] The weakest assumption (that two PCAs on covariances plus mini-layer fine-tuning suffice to capture domain-specific adaptations without substantial performance loss) is load-bearing for both the efficiency argument and the claimed advantages; if the covariance estimate is dominated by shared variance, the retained components may discard directions that matter for the downstream task, undermining the performance-comparability result even if the implementation is correct.
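
The failure mode raised in the second comment can be illustrated with a toy example that is not taken from the paper: a two-dimensional feature in which a high-variance direction is unrelated to the label and a low-variance direction carries it. The data, scales, and variable names are hypothetical; the sketch only shows that variance-ranked PCA truncation, without any later fine-tuning, can discard the informative direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, size=n)

# High-variance direction shared across classes (carries no label information).
shared = rng.normal(0.0, 10.0, size=n)
# Low-variance direction that separates the two classes.
task = (2 * labels - 1) + rng.normal(0.0, 0.1, size=n)
x = np.stack([shared, task], axis=1)

# Keep only the single leading principal component of the covariance.
cov = np.cov(x, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
leading = eigvecs[:, np.argmax(eigvals)]
projected = (x - x.mean(axis=0)) @ leading

# Absolute correlation with the label, before and after truncation.
corr_kept = abs(np.corrcoef(projected, labels)[0, 1])
corr_task = abs(np.corrcoef(x[:, 1], labels)[0, 1])
```

Here `corr_task` is close to 1 while `corr_kept` is close to 0, so a one-component truncation keeps almost none of the label signal. Whether real activations behave this way is exactly what the referee asks the authors to examine.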

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify the theoretical and empirical foundations of CovNorm. We address the major comments below, pointing to the relevant sections of the manuscript. We maintain that the derivations and controls are present, but we are prepared to expand explanations if needed for clarity.

Referee: [Abstract] The claim of theoretical support for advantages over batch normalization and geometric approximations, and of performance comparable to full fine-tuning at 0.13% parameters, cannot be assessed without the derivations and experimental controls; the load-bearing assumption that two PCAs preserve task-relevant domain-specific directions is not shown to hold under the paper's modeling assumptions.

Authors: Section 3 derives the advantages of CovNorm over batch normalization (by showing how covariance normalization decouples domain-specific scaling from shared statistics) and over geometric matrix approximations (by demonstrating lower computational complexity while retaining equivalent expressivity under the low-rank covariance model). The 0.13% parameter claim is directly supported by the experimental controls in Section 4, where we compare against full fine-tuning across sequential and simultaneous deployment modes on standard multi-domain benchmarks. On the two-PCA assumption, the modeling in Section 2 posits that domain adaptations manifest as perturbations in the covariance eigenspace; the first PCA extracts the principal shared directions and the second isolates the residual domain-specific subspace, with the mini-adaptation layer fine-tuned to recover any task-relevant components. While a worst-case guarantee that every task direction is retained would require stronger assumptions on the data distribution, the paper's empirical results (near-parity with full fine-tuning) indicate that the retained components suffice in practice.

revision: no

Referee: [Theoretical and experimental sections] The weakest assumption (that two PCAs on covariances plus mini-layer fine-tuning suffice to capture domain-specific adaptations without substantial performance loss) is load-bearing for both the efficiency argument and the claimed advantages; if the covariance estimate is dominated by shared variance, the retained components may discard directions that matter for the downstream task, undermining the performance-comparability result even if the implementation is correct.

Authors: We agree that this is a central modeling choice. Section 2 explicitly models the covariance as a sum of shared and domain-specific terms, with the two-PCA procedure constructed to separate them; the subsequent mini-layer is then optimized end-to-end on the target task, which empirically recovers any directions that the PCA truncation might have attenuated. The experiments in Section 4 include ablation studies varying the number of retained components and report that performance remains comparable to full fine-tuning even when the shared variance dominates the initial covariance estimate. If the referee has a specific counter-example dataset or metric where this fails, we would be happy to include it; otherwise the current controls already address the concern.

revision: partial
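
An ablation over the number of retained components needs some rule for choosing that number. One generic option, shown here as a sketch and not confirmed as the paper's criterion, is to keep the smallest number of principal components whose eigenvalues explain a target fraction of the variance. The function name `components_for_variance`, the thresholds, and the synthetic features are hypothetical.

```python
import numpy as np

def components_for_variance(samples, target=0.99):
    """Smallest k whose top-k eigenvalues explain at least `target` of the variance."""
    cov = np.cov(samples, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]        # descending order
    eigvals = np.clip(eigvals, 0.0, None)          # guard against tiny negative values
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, target)) + 1
    return min(k, eigvals.size)

# Synthetic 10-dimensional features with three active directions whose
# standard deviations are 3, 2 and 1.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(10, 3)))  # orthonormal columns, shape (10, 3)
latent = rng.normal(size=(5000, 3)) * np.array([3.0, 2.0, 1.0])
features = latent @ basis.T

k_99 = components_for_variance(features, target=0.99)
k_90 = components_for_variance(features, target=0.90)
```

Sweeping `target` gives one axis for such an ablation; as the referee's concern implies, a variance threshold alone does not guarantee that the retained directions are the task-relevant ones.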

Circularity Check

0 steps flagged

No circularity: CovNorm uses standard PCA and fine-tuning without reduction to inputs by construction

full rationale

The paper presents CovNorm as a data-driven procedure consisting of two PCAs plus mini-layer fine-tuning, with advantages shown via theory and experiments over batch norm or geometric approximations. No self-definitional steps, no fitted parameters renamed as predictions, and no load-bearing self-citations appear in the provided text. The performance comparability (0.13% parameters) is an empirical claim, not forced by definition or prior author results. The derivation chain is self-contained against external benchmarks like PCA.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on the effectiveness of PCA for covariance normalization and the sufficiency of a small fine-tuned layer; these are standard tools but their specific combination for this efficiency gain is the paper's addition. No invented entities are introduced.

free parameters (1)

mini-adaptation layer parameters
Parameters of the mini-adaptation layer are fine-tuned per domain and constitute the main adjustable component after the two PCAs.

axioms (1)

domain assumption: Two PCAs on feature covariances suffice to normalize domain-specific statistics for effective adaptation
Invoked as the core of CovNorm to achieve parameter reduction.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 10 internal anchors

[1] R. Aljundi, P. Chakravarty, and T. Tuytelaars. Expert gate: Lifelong learning with a network of experts. In CVPR, pages 7120–7129, 2017
[2] H. Bilen and A. Vedaldi. Universal representations: The missing link between faces, text, planktons, and cat breeds. arXiv preprint arXiv:1701.07275, 2017
[3] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, page 7, 2017
[4] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. In Advances in Neural Information Processing Systems, pages 343–351, 2016
[5] F. M. Carlucci, L. Porzi, B. Caputo, E. Ricci, and S. R. Bulò. Autodial: Automatic domain alignment layers. In ICCV, pages 5077–5085, 2017
[6] R. Caruana. Multitask learning. In Learning to learn, pages 95–133. Springer, 1998
[7] D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658, 2015
[8] Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. International Conference in Machine Learning, 2014
[9] R. Girshick. Fast r-cnn. arXiv preprint arXiv:1504.08083, 2015
[10] G. Gkioxari, R. Girshick, and J. Malik. Contextual action recognition with r* cnn. In Proceedings of the IEEE international conference on computer vision, pages 1080–1088, 2015
[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014
[12] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. 2007
[13] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016
[14] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell. Cycada: Cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213, 2017
[15] J. Huang, R. S. Feris, Q. Chen, and S. Yan. Cross-domain image retrieval with a dual attribute-aware ranking network. In Proceedings of the IEEE international conference on computer vision, pages 1062–1070, 2015
[16] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015
[17] B. Jou and S.-F. Chang. Deep cross residual learning for multitask visual recognition. In Proceedings of the 2016 ACM on Multimedia Conference, pages 998–1007. ACM, 2016
[18] A. Kendall, Y. Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. arXiv preprint arXiv:1705.07115, 3, 2017
[19] I. Kokkinos. Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In CVPR, volume 2, page 8, 2017
[20] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. 2009
[21] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436, 2015
[22] S.-W. Lee, J.-H. Kim, J. Jun, J.-W. Ha, and B.-T. Zhang. Overcoming catastrophic forgetting by incremental moment matching. In Advances in Neural Information Processing Systems, pages 4655–4665, 2017
[23] Z. Li and D. Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
[24] M. Long, Y. Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. International Conference in Machine Learning, 2015
[25] Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi, and R. S. Feris. Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In CVPR, volume 1, page 6, 2017
[26] S. Maji, E. Rahtu, J. Kannala, M. Blaschko, and A. Vedaldi. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151, 2013
[27] A. Mallya, D. Davis, and S. Lazebnik. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In Proceedings of the European Conference on Computer Vision (ECCV), pages 67–82, 2018
[28] M. Mancini, L. Porzi, S. R. Bulò, B. Caputo, and E. Ricci. Boosting domain adaptation by discovering latent domains. arXiv preprint arXiv:1805.01386, 2018
[29] I. Misra, A. Shrivastava, A. Gupta, and M. Hebert. Cross-stitch Networks for Multi-task Learning. In CVPR, 2016
[30] P. Morgado and N. Vasconcelos. Semantically consistent regularization for zero-shot recognition. In CVPR, volume 9, page 10, 2017
[31] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, volume 2011, page 5, 2011
[32] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Computer Vision, Graphics & Image Processing, 2008. ICVGIP’08. Sixth Indian Conference on, pages 722–729. IEEE, 2008
[33] R. Ranjan, V. M. Patel, and R. Chellappa. Hyperface: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
[34] S.-A. Rebuffi, H. Bilen, and A. Vedaldi. Learning multiple visual domains with residual adapters. In Advances in Neural Information Processing Systems, pages 506–516, 2017
[35] S.-A. Rebuffi, H. Bilen, and A. Vedaldi. Efficient parametrization of multi-domain deep neural networks. arXiv preprint arXiv:1803.10082, 2018
[36] S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert. icarl: Incremental classifier and representation learning. In Proc. CVPR, 2017
[37] S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE transactions on pattern analysis and machine intelligence, 39(6):1137–1149, 2017
[38] A. Rosenfeld and J. K. Tsotsos. Incremental learning through deep adaptation. arXiv preprint arXiv:1705.04228, 2017
[39] A. Rosenfeld and J. K. Tsotsos. Incremental learning through deep adaptation. IEEE transactions on pattern analysis and machine intelligence, 2018
[40] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016
[41] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. In CVPR, volume 2, page 5, 2017
[42] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014
[43] B. Sun and K. Saenko. Deep coral: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pages 443–450. Springer, 2016
[44] A. R. Triki, R. Aljundi, M. B. Blaschko, and T. Tuytelaars. Encoder based lifelong learning. IEEE Conference Computer Vision and Pattern Recognition, 2017
[45] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), volume 1, page 4, 2017
[46] M. Valenti, B. Bethke, D. Dale, A. Frank, J. McGrew, S. Ahrens, J. P. How, and J. Vian. The mit indoor multi-vehicle flight testbed. In Robotics and Automation, 2007 IEEE International Conference on, pages 2758–2759. IEEE, 2007
[47] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pages 3485–3492. IEEE, 2010
[48] A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik, and S. Savarese. Taskonomy: Disentangling task transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3712–3722, 2018
[49] Y. Zhang and Q. Yang. A survey on multi-task learning. arXiv preprint arXiv:1707.08114, 2017
[50] Z. Zhang, P. Luo, C. C. Loy, and X. Tang. Facial landmark detection by deep multi-task learning. In European Conference on Computer Vision, pages 94–108. Springer, 2014

[1] [1]

Aljundi, P

R. Aljundi, P. Chakravarty, and T. Tuytelaars. Expert gate: Lifelong learning with a network of experts. InCVPR, pages 7120–7129, 2017

work page 2017

[2] [2]

Bilen and A

H. Bilen and A. Vedaldi. Universal representations: The missing link between faces, text, planktons, and cat breeds. arXiv preprint arXiv:1701.07275, 2017

work page arXiv 2017

[3] [3]

Bousmalis, N

K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Kr- ishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , vol- ume 1, page 7, 2017

work page 2017

[4] [4]

Bousmalis, G

K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. Domain separation networks. In Advances in Neu- ral Information Processing Systems, pages 343–351, 2016

work page 2016

[5] [5]

F. M. Carlucci, L. Porzi, B. Caputo, E. Ricci, and S. R. Bul`o. Autodial: Automatic domain alignment layers. In ICCV, pages 5077–5085, 2017

work page 2017

[6] [6]

R. Caruana. Multitask learning. In Learning to learn, pages 95–133. Springer, 1998

work page 1998

[7] [7]

Eigen and R

D. Eigen and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolu- tional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658, 2015

work page 2015

[8] [8]

Ganin and V

Y . Ganin and V . Lempitsky. Unsupervised domain adaptation by backpropagation. International Conference in Machine Learning, 2014

work page 2014

[9] [9]

Fast R-CNN

R. Girshick. Fast r-cnn. arXiv preprint arXiv:1504.08083, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[10] [10]

Gkioxari, R

G. Gkioxari, R. Girshick, and J. Malik. Contextual action recognition with r* cnn. In Proceedings of the IEEE inter- national conference on computer vision , pages 1080–1088, 2015

work page 2015

[11] [11]

Goodfellow, J

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y . Bengio. Gen- erative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014

work page 2014

[12] [12]

Grifﬁn, A

G. Grifﬁn, A. Holub, and P. Perona. Caltech-256 object cat- egory dataset. 2007

work page 2007

[13] [13]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learn- ing for image recognition. In Proceedings of the IEEE con- ference on computer vision and pattern recognition , pages 770–778, 2016

work page 2016

[14] [14]

CyCADA: Cycle-Consistent Adversarial Domain Adaptation

J. Hoffman, E. Tzeng, T. Park, J.-Y . Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell. Cycada: Cycle-consistent adver- sarial domain adaptation. arXiv preprint arXiv:1711.03213, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[15] [15]

Huang, R

J. Huang, R. S. Feris, Q. Chen, and S. Yan. Cross-domain image retrieval with a dual attribute-aware ranking network. In Proceedings of the IEEE international conference on com- puter vision, pages 1062–1070, 2015

work page 2015

[16] [16]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[17] [17]

Jou and S.-F

B. Jou and S.-F. Chang. Deep cross residual learning for mul- titask visual recognition. In Proceedings of the 2016 ACM on Multimedia Conference, pages 998–1007. ACM, 2016

work page 2016

[18] [18]

Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

A. Kendall, Y . Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and seman- tics. arXiv preprint arXiv:1705.07115, 3, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

Kokkinos

I. Kokkinos. Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In CVPR, volume 2, page 8, 2017

work page 2017

[20] [20]

Krizhevsky and G

A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. 2009

work page 2009

[21] [21]

LeCun, Y

Y . LeCun, Y . Bengio, and G. Hinton. Deep learning.nature, 521(7553):436, 2015

work page 2015

[22] [22]

Lee, J.-H

S.-W. Lee, J.-H. Kim, J. Jun, J.-W. Ha, and B.-T. Zhang. Overcoming catastrophic forgetting by incremental moment matching. In Advances in Neural Information Processing Systems, pages 4655–4665, 2017

work page 2017

[23] [23]

Li and D

Z. Li and D. Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

work page 2017

[24] [24]

M. Long, Y . Cao, J. Wang, and M. I. Jordan. Learning transferable features with deep adaptation networks. Inter- national Conference in Machine Learning, 2015

work page 2015

[25] [25]

Y . Lu, A. Kumar, S. Zhai, Y . Cheng, T. Javidi, and R. S. Feris. Fully-adaptive feature sharing in multi-task networks with applications in person attribute classiﬁcation. In CVPR, volume 1, page 6, 2017

work page 2017

[26] [26]

S. Maji, E. Rahtu, J. Kannala, M. Blaschko, and A. Vedaldi. Fine-grained visual classiﬁcation of aircraft. arXiv preprint arXiv:1306.5151, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[27] [27]

Mallya, D

A. Mallya, D. Davis, and S. Lazebnik. Piggyback: Adapt- ing a single network to multiple tasks by learning to mask weights. In Proceedings of the European Conference on Computer Vision (ECCV), pages 67–82, 2018

work page 2018

[28] [28]

Boosting Domain Adaptation by Discovering Latent Domains

M. Mancini, L. Porzi, S. R. Bul `o, B. Caputo, and E. Ricci. Boosting domain adaptation by discovering latent domains. arXiv preprint arXiv:1805.01386, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[29]

I. Misra, A. Shrivastava, A. Gupta, and M. Hebert. Cross-stitch networks for multi-task learning. In CVPR, 2016

[30]

P. Morgado and N. Vasconcelos. Semantically consistent regularization for zero-shot recognition. In CVPR, volume 9, page 10, 2017

[31]

Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, volume 2011, page 5, 2011

[32]

M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Computer Vision, Graphics & Image Processing, 2008. ICVGIP'08. Sixth Indian Conference on, pages 722–729. IEEE, 2008

[33]

R. Ranjan, V. M. Patel, and R. Chellappa. Hyperface: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

[34]

S.-A. Rebuffi, H. Bilen, and A. Vedaldi. Learning multiple visual domains with residual adapters. In Advances in Neural Information Processing Systems, pages 506–516, 2017

[35]

S.-A. Rebuffi, H. Bilen, and A. Vedaldi. Efficient parametrization of multi-domain deep neural networks. arXiv preprint arXiv:1803.10082, 2018

[36]

S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert. icarl: Incremental classifier and representation learning. In Proc. CVPR, 2017

[37]

S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137–1149, 2017

[38]

A. Rosenfeld and J. K. Tsotsos. Incremental learning through deep adaptation. arXiv preprint arXiv:1705.04228, 2017

[39]

A. Rosenfeld and J. K. Tsotsos. Incremental learning through deep adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018

[40]

A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016

[41]

A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. In CVPR, volume 2, page 5, 2017

[42]

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014

[43]

B. Sun and K. Saenko. Deep coral: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pages 443–450. Springer, 2016

[44]

A. R. Triki, R. Aljundi, M. B. Blaschko, and T. Tuytelaars. Encoder based lifelong learning. IEEE Conference on Computer Vision and Pattern Recognition, 2017

[45]

E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), volume 1, page 4, 2017

[46]

M. Valenti, B. Bethke, D. Dale, A. Frank, J. McGrew, S. Ahrens, J. P. How, and J. Vian. The mit indoor multi-vehicle flight testbed. In Robotics and Automation, 2007 IEEE International Conference on, pages 2758–2759. IEEE, 2007

[47]

J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 3485–3492. IEEE, 2010

[48]

A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik, and S. Savarese. Taskonomy: Disentangling task transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3712–3722, 2018

[49]

Y. Zhang and Q. Yang. A survey on multi-task learning. arXiv preprint arXiv:1707.08114, 2017

[50]

Z. Zhang, P. Luo, C. C. Loy, and X. Tang. Facial landmark detection by deep multi-task learning. In European Conference on Computer Vision, pages 94–108. Springer, 2014