Recognition: 2 theorem links
Progressive Neural Networks
Pith reviewed 2026-05-12 16:07 UTC · model grok-4.3
The pith
Progressive neural networks learn sequences of tasks without forgetting by adding task-specific columns with lateral connections to prior features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Progressive networks are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features, outperforming common baselines based on pretraining and finetuning across a wide variety of reinforcement learning tasks in Atari and 3D maze games.
What carries the argument
The progressive network architecture, consisting of task-specific columns linked by lateral connections to features in all earlier columns.
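The per-column computation can be sketched concretely: in a minimal reading of the architecture, layer i of a new column combines its own weights with lateral projections of the same-depth features from every frozen earlier column. The NumPy sketch below uses hypothetical names (`column_forward`, plain dense ReLU layers) and omits the adapter nonlinearities the full architecture may use.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def column_forward(x, weights, laterals, prev_activations):
    """Forward pass through one column of a progressive network (sketch).

    weights[i] is this column's weight matrix for layer i; laterals[i]
    maps an earlier-column index j to the lateral matrix projecting
    column j's layer-i input activations into this column's layer i.
    prev_activations[j] is the activation list from running column j
    (whose parameters are frozen) on the same input.
    """
    h = x
    activations = [h]
    for i, W in enumerate(weights):
        pre = W @ h
        for j, U in laterals[i].items():
            # Read the frozen features of earlier column j at the same depth.
            pre = pre + U @ prev_activations[j][i]
        h = relu(pre)
        activations.append(h)
    return activations
```

Freezing earlier columns then amounts to treating `prev_activations` as constants during training, so gradients reach only the new column's `weights` and `laterals`.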
If this is right
- The network can accumulate skills across a sequence of tasks without interference between them.
- Transfer of knowledge occurs at both low-level sensory features and high-level control policies.
- The approach outperforms standard pretraining and finetuning on Atari games and 3D navigation tasks.
- A sensitivity measure confirms the locations of useful feature reuse within the policy.
Where Pith is reading between the lines
- This column-based design may extend to domains outside reinforcement learning where tasks arrive over time.
- It could reduce the need to restart training from scratch when environments or goals change gradually.
- Scaling the number of columns might eventually require mechanisms to manage computational cost.
Load-bearing premise
Lateral connections between columns will reliably produce positive transfer across tasks without introducing harmful interference.
What would settle it
If progressive networks exhibit significant forgetting of prior tasks or underperform fine-tuning on a sequence of reinforcement learning tasks, the central claim would be falsified.
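One way to make "significant forgetting" measurable is the standard continual-learning forgetting score: compare each task's best performance before the final task with its performance after all training ends. The sketch below is illustrative rather than from the paper; `scores` is a hypothetical lower-triangular table of per-task evaluations.

```python
def average_forgetting(scores):
    """Average forgetting over all but the last task (sketch).

    scores[t][i] is the score on task i after training on tasks 0..t.
    Forgetting of task i is its best score before the final task minus
    its final score; freezing columns should keep this near zero.
    """
    final = len(scores) - 1
    per_task = []
    for i in range(final):
        best_earlier = max(scores[t][i] for t in range(i, final))
        per_task.append(best_earlier - scores[final][i])
    return sum(per_task) / len(per_task)
```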
Original abstract
Learning to solve complex sequences of tasks--while both leveraging transfer and avoiding catastrophic forgetting--remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces progressive neural networks for continual learning in RL: a new column is added per task, prior columns are frozen to prevent forgetting, and lateral connections from previous columns to the new one enable transfer of features. The architecture is evaluated on Atari games and 3D maze navigation tasks, with claims of outperformance over pretraining and finetuning baselines plus a sensitivity analysis showing transfer at sensory and control layers.
Significance. If the performance gains can be shown to arise from the lateral transfer mechanism rather than capacity scaling, the approach offers a concrete, scalable architecture for avoiding catastrophic forgetting while reusing knowledge across tasks. This would be a useful contribution to multi-task and lifelong RL, with the sensitivity measure providing a starting point for analyzing where transfer occurs.
Major comments (2)
- [Evaluation] The central claim that lateral connections enable positive transfer (and thus outperformance) is not isolated from the fact that total model capacity grows linearly with the number of tasks. No capacity-matched baseline (e.g., a single larger network with equivalent total parameters) or lateral-connection ablation is reported, so the evidence that gains are due to transfer rather than extra parameters remains indirect.
- [Sensitivity analysis] The sensitivity measure is introduced to quantify cross-column influence, but without reported numerical values, error bars, or controls for task difficulty, it is unclear how strongly it supports the claim of transfer at both low- and high-level layers.
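The capacity concern above is easy to make concrete: with identical fully connected columns, a progressive net's own weights grow linearly in the number of tasks while its lateral weights grow quadratically, so a fair baseline must match total parameters. A back-of-envelope counter, with hypothetical layer sizes, biases and adapters omitted:

```python
def progressive_param_count(layer_sizes, n_columns):
    """Weight count for a progressive net of identical MLP columns (sketch).

    Each new column adds its own weights (linear growth in tasks) plus
    one lateral matrix per hidden/output layer from every earlier
    column (quadratic growth).
    """
    per_layer = [a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
    own = n_columns * sum(per_layer)
    laterals = sum(per_layer[1:]) * n_columns * (n_columns - 1) // 2
    return own + laterals
```

For hypothetical sizes [4, 8, 3], one column has 56 weights while three columns have 240, so a capacity-matched single-network baseline would need roughly four times the parameters of the single-column baseline used for pretraining and finetuning.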
Minor comments (2)
- [Abstract] The abstract states outperformance but supplies no quantitative metrics, task counts, or statistical details; these should be summarized with key numbers and error bars for readers.
- [Methods] Notation for the lateral connections and column indexing could be clarified with a single diagram or equation set early in the methods.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, acknowledging where the concerns are valid and indicating the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [Evaluation] The central claim that lateral connections enable positive transfer (and thus outperformance) is not isolated from the fact that total model capacity grows linearly with the number of tasks. No capacity-matched baseline (e.g., a single larger network with equivalent total parameters) or lateral-connection ablation is reported, so the evidence that gains are due to transfer rather than extra parameters remains indirect.
Authors: We agree that the current evaluation does not fully isolate the contribution of lateral connections from the increase in total model capacity, as progressive networks add new columns (and thus parameters) for each task. The pretraining and finetuning baselines use fixed-capacity networks equivalent to a single column, which is the standard comparison in this setting, but a capacity-matched single-network baseline would indeed provide stronger evidence. We will add a dedicated discussion of this limitation in the revised manuscript and include an ablation or capacity-matched comparison where feasible with existing compute resources. This revision will clarify the role of the lateral transfer mechanism while preserving the core result that the architecture avoids catastrophic forgetting. revision: partial
- Referee: [Sensitivity analysis] The sensitivity measure is introduced to quantify cross-column influence, but without reported numerical values, error bars, or controls for task difficulty, it is unclear how strongly it supports the claim of transfer at both low- and high-level layers.
Authors: The sensitivity analysis is presented via figures in the manuscript showing relative influence across layers. To address this, we will revise the relevant section to explicitly report the numerical sensitivity values, include error bars from multiple runs, and add a brief discussion of how task difficulty was accounted for in the analysis. These additions will provide quantitative support for the observation that transfer occurs at both sensory and control layers. revision: yes
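The sensitivity analysis under discussion is perturbation-based; absent its exact definition here, a simplified version injects Gaussian noise at one layer's activations and records the average score drop. The sketch below assumes a pure function `evaluate` that resumes the forward pass from given activations; it illustrates the idea and is not the paper's exact measure.

```python
import numpy as np

def perturbation_sensitivity(evaluate, activations, layer, sigma,
                             n_trials=20, seed=0):
    """Average performance drop from noise injected at one layer (sketch).

    Larger drops at a given noise scale suggest the policy relies more
    heavily on that layer's features.
    """
    rng = np.random.default_rng(seed)
    base = evaluate(activations)
    drops = []
    for _ in range(n_trials):
        noisy = [a.copy() for a in activations]
        noisy[layer] += sigma * rng.standard_normal(noisy[layer].shape)
        drops.append(base - evaluate(noisy))
    return float(np.mean(drops))
```

Sweeping `sigma` per layer and per column (and averaging over episodes) would yield the kind of per-layer sensitivity profile the referee asks to see reported numerically.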
Circularity Check
No circularity: empirical architecture evaluated on external benchmarks
Full rationale
The paper introduces progressive neural networks as an architecture for continual RL, with lateral connections for transfer and frozen columns to prevent forgetting. It reports performance on Atari and 3D maze tasks against pretraining/finetuning baselines, plus a sensitivity analysis for transfer. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text or abstract. The central claims rest on external empirical comparisons rather than internal definitions or tautological reductions, satisfying the self-contained benchmark criterion.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- HierarchyEmergence: hierarchy_emergence_forces_phi contradicts "the addition of new capacity alongside pretrained networks gives these models the flexibility to both reuse old computations and learn new ones"
Forward citations
Cited by 33 Pith papers
- ReConText3D: Replay-based Continual Text-to-3D Generation
  ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
- LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
  LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
- KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks
  KAN-CL cuts catastrophic forgetting by 88-93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T by anchoring KAN parameters at per-knot granularity while matching baseline accuracy.
- MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
  MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sk...
- Dynamic Full-body Motion Agent with Object Interaction via Blending Pre-trained Modular Controllers
  A two-stage framework augments HOI data with dynamic priors and blends pre-trained dynamic motion and static interaction agents via a composer network to enable long-term dynamic human-object interactions with higher ...
- Beyond Forgetting in Continual Medical Image Segmentation: A Comprehensive Benchmark Study
  Benchmark experiments in continual medical image segmentation reveal that no single method satisfies all clinical requirements, with replay-based approaches offering the best stability-plasticity trade-off while forwa...
- Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay
  A structure-aware VAE generates realistic FC matrices for replay, combined with multi-level knowledge distillation and hierarchical contextual bandit sampling, to enable continual fMRI-based brain disorder diagnosis a...
- EMBER: Autonomous Cognitive Behaviour from Learned Spiking Neural Network Dynamics in a Hybrid LLM Architecture
  A hybrid SNN-LLM system uses learned spiking dynamics and lateral STDP propagation to trigger LLM actions without external prompts, producing the first autonomous action after 7 exchanges from a clean start.
- SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
  SafeAdapt certifies a Rashomon set of safe policies from demonstration data and projects updates from arbitrary RL algorithms onto it to guarantee preservation of safety on source tasks.
- A Generalist Agent
  Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
- Dota 2 with Large Scale Deep Reinforcement Learning
  OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
- DIMoE-Adapters: Dynamic Expert Evolution for Continual Learning in Vision-Language Models
  DIMoE-Adapters uses self-calibrated expert evolution and prototype-guided selection to dynamically grow and allocate experts, outperforming prior continual learning methods on vision-language models.
- Shortcut Solutions Learned by Transformers Impair Continual Compositional Reasoning
  BERT learns shortcut solutions that impair generalization and forward transfer in continual LEGO, while ALBERT learns loop-like solutions for better performance, yet both fail at cross-experience composition, with ALB...
- MILE: Mixture of Incremental LoRA Experts for Continual Semantic Segmentation across Domains and Modalities
  MILE combines incremental LoRA experts with prototype-guided gating to support continual semantic segmentation across domains and modalities while adding only a small number of parameters per task.
- Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
  Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
- NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning
  NORACL dynamically grows network capacity via neurogenesis-inspired signals to achieve oracle-level continual learning performance without pre-specifying architecture size.
- Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks
  FTN achieves near-zero forgetting on continual learning benchmarks by isolating task subnetworks via self-organizing binary masks generated through gradient descent, smoothing, and k-winner-take-all.
- Learning Without Losing Identity: Capability Evolution for Embodied Agents
  Embodied agents maintain a persistent identity while evolving capabilities via modular ECMs, raising simulated task success from 32.4% to 91.3% over 20 iterations with zero policy drift or safety violations.
- Information as Structural Alignment: A Dynamical Theory of Continual Learning
  IBF achieves near-zero forgetting and positive backward transfer in continual learning by driving configurations toward coherence through motion and modification dynamics without storing raw data.
- When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs
  MRCKG combines a multimodal-structural curriculum, cross-modal preservation, and contrastive replay to let multimodal knowledge graphs learn new entities and relations over time without catastrophic forgetting.
- Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments
  AMC models memory consolidation via a Liquid-Glass-Crystal process governed by an SDE with proven convergence to a Beta distribution, yielding 34-43% better forward transfer and 67-80% less forgetting on standard cont...
- FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning
  FLAME is an MoE architecture using modality-specific routers and low-rank compression of expert knowledge to support efficient continual multimodal multi-task learning while reducing catastrophic forgetting.
- Learning Material-Aware Hamiltonian Risk Fields for Safe Navigation
  A learned context-energy term in port-Hamiltonian policies creates selective risk navigation that activates evasive forces only when safer paths are available.
- A Domain Incremental Continual Learning Benchmark for ICU Time Series Model Transportability
  Proposes a domain incremental continual learning benchmark for ICU time series model transportability across US regions and evaluates data replay and EWC methods.
- Task Switching Without Forgetting via Proximal Decoupling
  Operator splitting separates task optimization from proximal stability enforcement to achieve forgetting-free continual learning with SOTA benchmark results.
- Failure Ontology: A Lifelong Learning Framework for Blind Spot Detection and Resilience Design
  Failure Ontology offers a four-type taxonomy of blind spots, five failure patterns, and a theorem claiming failure-based learning is more sample-efficient than success-based learning under limited data.
- Neural Computers
  Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives f...
- Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
  Enhanced EWC for LVLMs cuts forgetting rates by 78% versus naive training and keeps visual-textual alignment with 15% extra compute.
- Revitalizing the Beginning: Avoiding Storage Dependency for Model Merging in Continual Learning
  The paper proposes Trajectory Regularized Merging (TRM) to enable storage-free model merging in continual learning by optimizing in an augmented trajectory subspace with task alignment, prediction consistency, and gra...
- MPCS: Neuroplastic Continual Learning via Multi-Component Plasticity and Topology-Aware EWC
  MPCS integrates eleven plasticity mechanisms and reaches a Normalized Efficiency Score of 94.2 on a 31-task benchmark, with ablations showing that removing EWC and Hebbian updates yields higher performance at lower cost.
- Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting
  Self-distillation fine-tuning recovers LLM capabilities by aligning the student's high-dimensional hidden-layer manifold with the teacher's, as quantified by CKA correlation with performance gains.
- Multi-Faceted Continual Knowledge Graph Embedding for Semantic-Aware Link Prediction
  MF-CKGE separates temporal old and new knowledge into distinct embedding spaces with semantic decoupling and adaptive importance scoring to improve continual link prediction.
- Face-D(^2)CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection
  Face-D²CL fuses spatial and frequency features and uses dual continual learning to reduce forgetting while adapting to new DeepFakes, cutting average error rates by 60.7% and raising unseen-domain AUC by 7.9% over prior SOTA.
Reference graph
Works this paper leans on
[1] Forest Agostinelli, Michael R. Anderson, and Honglak Lee. Adaptive multi-column deep neural networks with application to robust image denoising. In Advances in Neural Information Processing Systems, 2013.
[2] Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 1998.
[3] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research (JAIR), 47:253–279, 2013.
[4] Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In JMLR: Workshop on Unsupervised and Transfer Learning, 2012.
[5] Dan C. Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. In Conf. on Computer Vision and Pattern Recognition, 2012.
[6] Scott E. Fahlman and Christian Lebiere. The cascade-correlation learning architecture. In Advances in Neural Information Processing Systems, 1990.
[7] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006.
[8] Geoff Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015.
[9] Yann LeCun, John S. Denker, and Sara A. Solla. Optimal brain damage. In Advances in Neural Information Processing Systems, 1990.
[10] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. In Proc. of Int'l Conference on Learning Representations (ICLR), 2013.
[11] G. Mesnil, Y. Dauphin, X. Glorot, S. Rifai, Y. Bengio, I. Goodfellow, E. Lavoie, X. Muller, G. Desjardins, D. Warde-Farley, P. Vincent, A. Courville, and J. Bergstra. Unsupervised and transfer learning challenge: a deep learning approach. In JMLR W&CP: Proc. of the Unsupervised and Transfer Learning challenge and workshop, volume 27, 2012.
[12] V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[13] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Int'l Conf. on Machine Learning (ICML), 2016.
[14] Emilio Parisotto, Lei Jimmy Ba, and Ruslan Salakhutdinov. Actor-mimic: Deep multitask and transfer reinforcement learning. In Proc. of Int'l Conference on Learning Representations (ICLR), 2016.
[15] Mark B. Ring. Continual Learning in Reinforcement Environments. R. Oldenbourg Verlag, 1995.
[16] Artem Rozantsev, Mathieu Salzmann, and Pascal Fua. Beyond sharing weights for deep domain adaptation. CoRR, abs/1603.06432, 2016.
[17] A. Rusu, S. Colmenarejo, Ç. Gülçehre, G. Desjardins, J. Kirkpatrick, R. Pascanu, V. Mnih, K. Kavukcuoglu, and R. Hadsell. Policy distillation. CoRR, abs/1511.06295, 2016.
[18] Paul Ruvolo and Eric Eaton. ELLA: An efficient lifelong learning algorithm. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), June 2013.
[19] Daniel L. Silver, Qiang Yang, and Lianghao Li. Lifelong machine learning systems: Beyond learning algorithms. In AAAI Spring Symposium: Lifelong Machine Learning, 2013.
[20] Matthew E. Taylor and Peter Stone. An introduction to inter-task transfer for reinforcement learning. AI Magazine, 32(1):15–34, 2011.
[21] Alexander V. Terekhov, Guglielmo Montone, and J. Kevin O'Regan. Knowledge Transfer in Deep Block-Modular Neural Networks, pages 268–279. Springer International Publishing, Cham, 2015.
[22] C. Tessler, S. Givony, T. Zahavy, D. J. Mankowitz, and S. Mannor. A Deep Hierarchical Approach to Lifelong Learning in Minecraft. ArXiv e-prints, 2016.
[23] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.
[24] Guanyu Zhou, Kihyuk Sohn, and Honglak Lee. Online incremental feature learning with denoising autoencoders. In Proc. of Int'l Conf. on Artificial Intelligence and Statistics (AISTATS), pages 1453–1461, 2012.