arxiv: 1610.01644 · v4 · submitted 2016-10-05 · 📊 stat.ML · cs.LG

Recognition: 2 theorem links

Understanding intermediate layers using linear classifier probes

Guillaume Alain , Yoshua Bengio

Pith reviewed 2026-05-11 18:25 UTC · model grok-4.3

keywords linear probes · intermediate layers · neural networks · feature separability · Inception v3 · ResNet-50 · model interpretability · layer-wise analysis

The pith

Linear probes show that feature separability increases monotonically along the depth of neural networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors propose training independent linear classifiers, called probes, on the activations from each layer of a neural network to measure how well those features support the classification task. This allows tracking the evolution of useful information as it passes through the model without changing how the network was trained. When applied to Inception v3 and ResNet-50, the probes demonstrate that accuracy rises steadily with layer depth. Such monitoring can help diagnose issues in models and clarify the function of intermediate layers.

Core claim

Training linear probes independently on each layer's activations in popular models like Inception v3 and ResNet-50 shows that the probes' classification accuracy increases monotonically with depth. This establishes that the features become progressively more linearly separable for the downstream task.

What carries the argument

The linear classifier probe: a simple linear model trained separately on a layer's activations to quantify the linear separability of features for the target classes.
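As a rough illustration of the idea, the sketch below fits a softmax linear classifier on fixed activation vectors and scores it on held-out vectors. It is a minimal standard-library sketch, not the authors' implementation: the function names, the SGD settings, and the synthetic "activations" (two Gaussian blobs whose separation stands in for depth) are all assumptions made for the example.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_probe(acts, labels, n_classes, lr=0.1, epochs=50, seed=0):
    """Fit softmax(W h + b) on frozen activations with plain SGD on cross-entropy."""
    rng = random.Random(seed)
    dim = len(acts[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(acts)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            h, y = acts[i], labels[i]
            logits = [sum(w * x for w, x in zip(W[c], h)) + b[c] for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # gradient of cross-entropy w.r.t. logit c
                b[c] -= lr * g
                for j in range(dim):
                    W[c][j] -= lr * g * h[j]
    return W, b

def probe_accuracy(W, b, acts, labels):
    correct = 0
    for h, y in zip(acts, labels):
        logits = [sum(w * x for w, x in zip(Wc, h)) + bc for Wc, bc in zip(W, b)]
        correct += int(max(range(len(logits)), key=logits.__getitem__) == y)
    return correct / len(labels)

def make_activations(n, sep, seed):
    """Hypothetical stand-in for one layer's activations: two Gaussian blobs."""
    rng = random.Random(seed)
    acts, labels = [], []
    for i in range(n):
        y = i % 2
        center = sep if y == 1 else -sep
        acts.append([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return acts, labels

# "Layers" are simulated by increasing class separation; real use would feed
# activations recorded from a frozen network instead.
results = {}
for name, sep in [("shallow", 0.2), ("middle", 1.0), ("deep", 3.0)]:
    train_x, train_y = make_activations(400, sep, seed=1)
    test_x, test_y = make_activations(200, sep, seed=2)
    W, b = train_probe(train_x, train_y, n_classes=2)
    results[name] = probe_accuracy(W, b, test_x, test_y)
    print(name, round(results[name], 3))
```

The network itself never appears in the training loop: the probe only sees recorded activations and labels, which is what keeps the measurement independent of how the model was trained.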

If this is right

The deeper layers of the network hold features that are more readily usable by a linear classifier for the task.
Breaks in the monotonic increase of probe accuracy can indicate locations where the model may have training problems or suboptimal feature extraction.
The method provides a layer-by-layer view that can inform decisions about network architecture and where to focus debugging efforts.
Similar probes could be used to study how information is transformed in other deep learning models.
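One way to act on the second point is to scan the per-layer probe accuracies for drops. The helper below is a hypothetical sketch: the function name, the tolerance parameter, and the accuracy values are invented for illustration and are not taken from the paper.

```python
def find_monotonicity_breaks(accuracies, tolerance=0.0):
    """Return (layer_index, drop) for each layer whose probe accuracy
    falls below the previous layer's by more than `tolerance`."""
    breaks = []
    for k in range(1, len(accuracies)):
        drop = accuracies[k - 1] - accuracies[k]
        if drop > tolerance:
            breaks.append((k, drop))
    return breaks

# Made-up per-layer probe accuracies, ordered from input to output.
layer_acc = [0.31, 0.42, 0.55, 0.51, 0.68, 0.74]
for layer, drop in find_monotonicity_breaks(layer_acc, tolerance=0.01):
    print(f"probe accuracy drops by {drop:.2f} at layer index {layer}")
```

A small tolerance helps avoid flagging run-to-run noise as a break; what counts as a meaningful drop is a judgment call that depends on the variability of the probes.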

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the monotonic increase is general, it would support the view that depth allows for successive refinement of representations.
The approach could be used to evaluate the quality of individual layers for purposes like model pruning or transfer learning.
One might investigate whether the same pattern holds when using non-linear probes or in different domains such as natural language processing.

Load-bearing premise

The accuracy achieved by a linear probe trained on a layer's activations is a reliable indicator of how informative and useful those activations are for solving the classification problem.
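A toy case shows where this premise can fail. In the sketch below, the label is fully determined by two input coordinates (it is 1 when their signs agree), yet a linear probe on those coordinates should land near chance; adding the product of the two coordinates as a third feature makes the same information linearly readable. Everything here is an assumption made for illustration (data, names, hyperparameters); it is not an experiment from the paper.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_linear_probe(X, y, lr=0.5, epochs=500):
    """Binary logistic-regression probe trained by full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for h, t in zip(X, y):
            err = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) - t
            gb += err
            for j in range(dim):
                gw[j] += err * h[j]
        b -= lr * gb / n
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
    return w, b

def accuracy(w, b, X, y):
    hits = 0
    for h, t in zip(X, y):
        pred = int(sum(wi * hi for wi, hi in zip(w, h)) + b > 0)
        hits += int(pred == t)
    return hits / len(y)

def sample(n, seed):
    """XOR-like data: label is 1 when the two coordinates share a sign."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return pts, [int(u * v > 0) for u, v in pts]

def raw(pts):
    return [[u, v] for u, v in pts]

def with_product(pts):
    return [[u, v, u * v] for u, v in pts]

train_pts, train_y = sample(400, seed=0)
test_pts, test_y = sample(400, seed=1)  # held-out points

w, b = fit_linear_probe(raw(train_pts), train_y)
acc_raw = accuracy(w, b, raw(test_pts), test_y)

w, b = fit_linear_probe(with_product(train_pts), train_y)
acc_product = accuracy(w, b, with_product(test_pts), test_y)

print("raw coordinates:", round(acc_raw, 3))
print("with product feature:", round(acc_product, 3))
```

Both feature sets carry the same information about the label, so a low linear-probe score on the first says something about how the information is encoded, not whether it is present.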

What would settle it

A counterexample would be a trained neural network in which the accuracy of linear probes trained on deeper layers is lower than on shallower layers, despite the model achieving high overall performance on the task.

Original abstract

Neural network models have a reputation for being black boxes. We propose to monitor the features at every layer of a model and measure how suitable they are for classification. We use linear classifiers, which we refer to as "probes", trained entirely independently of the model itself. This helps us better understand the roles and dynamics of the intermediate layers. We demonstrate how this can be used to develop a better intuition about models and to diagnose potential problems. We apply this technique to the popular models Inception v3 and Resnet-50. Among other things, we observe experimentally that the linear separability of features increase monotonically along the depth of the model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

Linear probes give a clean, independent way to measure rising class separability layer by layer in modern CNNs like Inception and ResNet.

The main point is that you can use simple linear probes, trained independently on each layer's activations, to track how class information builds up in a network. The experiments on Inception v3 and ResNet-50 show this separability rising monotonically with depth. What stands out is the clean setup: probes are trained separately, not reusing the model's weights, and results come from held-out data. This gives a repeatable diagnostic without the usual circularity problems. They apply it to two standard models and get consistent patterns, which is useful for getting intuition about what intermediate layers are doing.

The soft spot is the reliance on linear separability as the measure. It might underestimate features that are useful only after nonlinear processing. The paper doesn't test alternatives like kernel probes or mutual information estimates, so the claim stays narrow to what linear classifiers can pick up. They also don't show the method leading to specific fixes or better architectures in the reported work.

This is for interpretability researchers and anyone wanting to inspect their trained models layer by layer. The technique is straightforward enough to adopt. I'd recommend sending it for peer review. The experiments are solid and the idea fills a gap in practical analysis tools.

Referee Report

0 major / 3 minor

Summary. The paper proposes training linear classifiers, termed 'probes,' independently on the frozen activations of each layer in a neural network to measure how linearly separable the features are for the target classification task. Applied to Inception-v3 and ResNet-50, the central experimental result is that probe accuracy increases monotonically with network depth; the method is further shown to provide diagnostic value for understanding layer roles and identifying model issues.

Significance. The probe technique supplies a simple, reproducible diagnostic that requires no changes to the original model and yields direct empirical observations about feature evolution across depth. The monotonic separability finding on two standard architectures offers a concrete, falsifiable insight into how task-relevant information accumulates in deep networks. Strengths include the independent training protocol on held-out activations and the absence of post-hoc fitting or circular definitions, making the approach broadly applicable for interpretability studies.

minor comments (3)

Abstract: the sentence 'the linear separability of features increase monotonically' contains a subject-verb agreement error ('increase' should be 'increases').
The description of probe training (independent linear classifiers on layer activations) would benefit from an explicit statement of the loss function and optimizer used for the probes, even if standard cross-entropy and SGD are implied.
Figure captions and axis labels for the accuracy-vs-depth plots should include the number of probe training runs or error bars to convey variability in the reported monotonic trend.
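The variability requested in the last comment could be reported along these lines. This is a hypothetical sketch using only the Python standard library: the layer names and accuracy values are invented, and the paper does not state that probes were trained multiple times.

```python
import statistics

def summarize_runs(acc_by_layer):
    """Map each layer to (mean, sample standard deviation) over repeated probe runs."""
    summary = {}
    for layer, accs in acc_by_layer.items():
        summary[layer] = (statistics.mean(accs), statistics.stdev(accs))
    return summary

# Made-up accuracies from three hypothetical probe runs per layer.
runs = {
    "layer_1": [0.50, 0.60, 0.70],
    "layer_2": [0.80, 0.82, 0.84],
}
summary = summarize_runs(runs)
for layer, (mean, std) in summary.items():
    print(f"{layer}: {mean:.3f} +/- {std:.3f} over {len(runs[layer])} runs")
```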

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, accurate summary of the work, and recommendation to accept. The referee correctly identifies the core contribution of the linear probe technique and the monotonic separability observation on Inception-v3 and ResNet-50.

Circularity Check

0 steps flagged

No significant circularity; experimental measurements are independent

The paper presents an empirical method of training linear probes independently on frozen layer activations from held-out data to measure linear separability. The central observation—that probe accuracy increases monotonically with depth—is a direct experimental result on Inception-v3 and ResNet-50 with no equations, fitted parameters, or self-referential definitions that reduce the reported quantities back to the inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the derivation chain. The method is self-contained and externally verifiable via standard supervised training on activations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work is empirical and rests on standard machine-learning assumptions about feature representations rather than introducing new free parameters, axioms, or entities.

axioms (1)

domain assumption: Linear classifier accuracy on layer activations measures the linear separability of those features for the classification task.
This assumption underpins the claim that probe performance indicates feature quality at each layer.

Forward citations

Cited by 58 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dissecting Jet-Tagger Through Mechanistic Interpretability
hep-ph 2026-05 accept novelty 8.0

A Particle Transformer jet tagger contains a sparse six-head circuit whose source-relay-readout structure recovers most performance and whose residual stream preferentially encodes 2-prong energy correlators.
Slot Machines: How LLMs Keep Track of Multiple Entities
cs.CL 2026-04 unverdicted novelty 8.0

LLM activations encode current and prior entities in orthogonal slots, but models only use the current slot for explicit factual retrieval despite prior-slot information being linearly decodable.
Do Audio-Visual Large Language Models Really See and Hear?
cs.AI 2026-04 unverdicted novelty 8.0

AVLLMs encode audio semantics in middle layers but suppress them in final text outputs when audio conflicts with vision, due to training that largely inherits from vision-language base models.
Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers
cs.CV 2026-05 unverdicted novelty 7.0

Text embeddings in MM-DiTs contain a detectable omission signal for missing concepts, and amplifying it via OSI reduces concept omission in generated images on FLUX.1-Dev and SD3.5-Medium.
Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2
cs.LG 2026-05 unverdicted novelty 7.0

Projecting LLM hidden states onto F2 algebra with 42 pairs yields 93% zero-shot accuracy on logical relations and identifies prompt-preventable late-layer collapse.
Deep Minds and Shallow Probes
cs.LG 2026-05 unverdicted novelty 7.0

Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.
From Mechanistic to Compositional Interpretability
cs.LG 2026-05 unverdicted novelty 7.0

Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaran...
Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection
cs.CV 2026-05 unverdicted novelty 7.0

A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
cs.LG 2026-05 unverdicted novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships ...
Inference Time Causal Probing in LLMs
cs.AI 2026-05 unverdicted novelty 7.0

HDMI is a new probe-free technique that steers LLM hidden states via margin objectives to achieve more reliable causal interventions than prior probe-based methods on standard benchmarks.
Understanding Performance Collapse in Layer-Pruned Large Language Models via Decision Representation Transitions
cs.CL 2026-05 unverdicted novelty 7.0

Performance collapse in layer-pruned LLMs stems from disrupting the Silent Phase of decision-making, which blocks the transition to correct predictions, while the later Decisive Phase is robust to pruning.
Logic-Regularized Verifier Elicits Reasoning from LLMs
cs.CL 2026-05 unverdicted novelty 7.0

LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.
The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences
cs.CL 2026-05 unverdicted novelty 7.0

The primary axis of psychometric variation among LLMs is the degree to which they represent themselves as loci of phenomenal experience rather than systems of behavioral responses.
LUMINA: A Grid Foundation Model for Benchmarking AC Optimal Power Flow Surrogate Learning
cs.LG 2026-05 unverdicted novelty 7.0

LUMINA-Bench is a standardized evaluation framework for ACOPF surrogate models that tests generalization across multiple grid topologies using accuracy and physics-constraint metrics.
Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations
cs.LG 2026-05 unverdicted novelty 7.0

Transformer activations show spectral anti-concentration for concepts in the tail while syntax prefers high-variance directions, forming a dual geometry.
Knowing when to trust machine-learned interatomic potentials
cs.LG 2026-05 unverdicted novelty 7.0

PROBE recasts MLIP uncertainty quantification as selective classification by training a compact discriminative classifier on frozen per-atom backbone embeddings, yielding a reliability probability that tracks actual e...
Latent Space Probing for Adult Content Detection in Video Generative Models
cs.CV 2026-04 unverdicted novelty 7.0

Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
cs.CV 2026-04 unverdicted novelty 7.0

Medical MLLMs degrade on image classification due to four failure modes in visual representation quality, connector projection fidelity, LLM comprehension, and semantic mapping alignment, quantified by feature probing...
The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
cs.LG 2026-03 unverdicted novelty 7.0

The grokking delay in encoder-decoder models on one-step Collatz prediction stems from decoder inability to use early-learned encoder representations of parity and residue structure, with numeral base acting as a stro...
Synthetic Designed Experiments for Diagnosing Vision Model Failure
cs.CV 2026-03 unverdicted novelty 7.0

SDRS uses designed experiments and ANOVA decomposition on synthetic data to identify Type I coverage gaps and Type II spurious dependencies in vision models, then generates targeted data to improve performance.
Eliciting Latent Predictions from Transformers with the Tuned Lens
cs.LG 2023-03 accept novelty 7.0

Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology
cs.LG 2026-05 unverdicted novelty 6.0

Training installs a depth-dependent spectral gradient and low-rank bottleneck in LLM residual streams whose amplification or suppression of graph communities is predicted by local operator type.
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
cs.LG 2026-05 unverdicted novelty 6.0

Sparse autoencoders on EEG transformers identify three regimes of clinical concept encoding and reveal entanglements such as age-pathology confounding via a new steering selectivity metric.
When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction
cs.AI 2026-05 unverdicted novelty 6.0

Attention to goal tokens declines in multi-turn LLM interactions while residual representations often retain decodable goal information, and the gap between these predicts whether goal-conditioned behavior survives.
When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel
cs.AI 2026-05 unverdicted novelty 6.0

CoT traces align with internal answer commitment in only 61.9% of steps on average, dominated by confabulated continuations after commitment has stabilized.
A Controlled Counterexample to Strong Proxy-Based Explanations of OOD Performance: in a Fixed Pretraining-and-Probing Setup
cs.LG 2026-05 unverdicted novelty 6.0

Proxy rankings of pretraining datasets by learned structure can reverse the actual OOD accuracy rankings in a synthetic sequence modeling task.
Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal
cs.CL 2026-05 unverdicted novelty 6.0

LLMs detect CoT reasoning errors in hidden states with 0.95 AUROC but cannot use this awareness to correct them via steering, patching, or self-correction, indicating the signal is diagnostic not causal.
The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations
cs.AI 2026-05 unverdicted novelty 6.0

Temporal knowledge drift is encoded as a geometrically orthogonal direction in LLM residual streams, independent of correctness and uncertainty.
Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation
cs.LG 2026-05 unverdicted novelty 6.0

Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.
Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces
cs.LG 2026-05 unverdicted novelty 6.0

Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.
The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks
cs.LG 2026-05 unverdicted novelty 6.0

Gradient descent in deep networks implicitly drives features toward target-linear structure as captured by the weight Gram matrix and a derived virtual covariance.
On the Blessing of Pre-training in Weak-to-Strong Generalization
cs.LG 2026-05 unverdicted novelty 6.0

Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
cs.LG 2026-05 unverdicted novelty 6.0

FAAST analytically compiles labeled examples into fast weights via a single forward pass, matching backprop adaptation performance with over 90% less time and up to 95% less memory than memory-based methods.
Intermediate Representations are Strong AI-Generated Image Detectors
cs.CV 2026-05 unverdicted novelty 6.0

Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.
Differentiable Kernel Ridge Regression for Deep Learning Pipelines
cs.LG 2026-05 unverdicted novelty 6.0

Sparse Kernels turn kernel ridge regression into end-to-end differentiable PyTorch layers that support training-free transfer, nonlinear probing, and hybrid models while matching or augmenting neural readouts in some ...
GeoSAE: Geometric Prior-Guided Layer-Wise Sparse Autoencoder Annotation of Brain MRI Foundation Models
cs.CV 2026-05 unverdicted novelty 6.0

GeoSAE extracts a compact, interpretable feature set from frozen brain MRI foundation models that predicts MCI-to-AD conversion (AUC 0.746) with age-deconfounded annotations and replicates across cohorts.
Lost in State Space: Probing Frozen Mamba Representations
cs.CL 2026-04 unverdicted novelty 6.0

Frozen Mamba patch-boundary readouts do not outperform mean pooling for sentence representations on SST-2, CoLA, MRPC, STS-B, and IMDb due to anisotropy (cosine similarity ~0.9999) and representational collapse (MCC=0...
Debiasing Reward Models via Causally Motivated Inference-Time Intervention
cs.CL 2026-04 unverdicted novelty 6.0

Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.
AttriBE: Quantifying Attribute Expressivity in Body Embeddings for Recognition and Identification
cs.CV 2026-04 unverdicted novelty 6.0

Transformer-based ReID embeddings encode BMI most strongly in deeper layers, followed by pitch, gender, and yaw, with pose peaking in middle layers and BMI increasing with depth; cross-spectral settings shift reliance...
Contextual Linear Activation Steering of Language Models
cs.CL 2026-04 unverdicted novelty 6.0

CLAS dynamically adapts linear activation steering strengths to context, outperforming fixed-strength steering and matching or exceeding ReFT and LoRA on eleven benchmarks across four model families with limited labeled data.
MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis
q-bio.NC 2026-04 unverdicted novelty 6.0

MoDAl discovers complementary neurolinguistic modalities via contrastive-decorrelation objectives, cutting brain-to-text word error rate from 26.3% to 21.6% by incorporating area 44 signals.
Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models
cs.AI 2026-04 unverdicted novelty 6.0

Omni-modal LLMs exhibit visual preference that emerges in mid-to-late layers, enabling hallucination detection without task-specific training.
Class Unlearning via Depth-Aware Removal of Forget-Specific Directions
cs.CV 2026-04 unverdicted novelty 6.0

DAMP performs one-shot class unlearning by extracting and projecting out forget-specific residual directions at each network depth using class prototypes and a separability-derived scaling rule.
Preventing Latent Rehearsal Decay in Online Continual SSL with SOLAR
cs.LG 2026-04 unverdicted novelty 6.0

SOLAR prevents latent rehearsal decay in online continual SSL by adaptively managing replay buffers with deviation proxies and an explicit overlap loss, delivering both fast convergence and state-of-the-art final accu...
Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies
cs.CV 2026-04 unverdicted novelty 6.0

A method learns synthetic-to-real parameter corrections from source languages and transfers them to target languages without any real target data, improving HTR across five languages and six models.
Exact Unlearning from Proxies Induces Closeness Guarantees on Approximate Unlearning
cs.LG 2026-05 unverdicted novelty 5.0

Inferring data distributions precisely allows distilling exact unlearning signals, yielding KL divergence bounds to the retrained model and outperforming competitors in three forgetting scenarios.
Towards Effective Theory of LLMs: A Representation Learning Approach
cs.LG 2026-05 unverdicted novelty 5.0

RET learns temporally consistent macrovariables from LLM activations via self-supervised learning to support interpretability, early behavioral prediction, and causal intervention.
HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory
cs.AI 2026-05 unverdicted novelty 5.0

HyperLens reveals that deeper transformer layers magnify small confidence changes into fine-grained trajectories, allowing quantification of cognitive effort where complex tasks demand more and standard SFT can reduce it.
Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
cs.AI 2026-05 unverdicted novelty 5.0

Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
cs.LG 2026-05 unverdicted novelty 5.0

FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings ver...
Trust, but Verify: Peeling Low-Bit Transformer Networks for Training Monitoring
cs.LG 2026-05 unverdicted novelty 5.0

A layer-wise peeling framework creates reference bounds to diagnose under-optimized layers in trained decoder-only transformers, including low-bit and quantized versions.
Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification
eess.AS 2026-04 unverdicted novelty 5.0

Dual-LoRA with a language-anchored adversary achieves 0.91% EER on the TidyVoice benchmark for cross-lingual speaker verification by targeting true linguistic cues while preserving speaker discriminability.
Probing for Reading Times
cs.CL 2026-04 unverdicted novelty 5.0

Early layers of language models predict early-pass human reading times better than surprisal, with surprisal superior for late-pass measures and strong variation by language.
A Model of Understanding in Deep Learning Systems
cs.AI 2026-04 unverdicted novelty 5.0

Deep learning systems achieve systematic understanding through internal models tracking regularities but exhibit fractured understanding due to symbolic misalignment, lack of explicit reduction, and weak unification.
Beyond the Black Box: Interpretability of Agentic AI Tool Use
cs.AI 2026-05 unverdicted novelty 4.0

A mechanistic interpretability toolkit with SAEs and probes enables pre-action inference of tool decisions in AI agents trained on function-calling trajectories.
Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning
cs.LG 2026-04 unverdicted novelty 4.0

A zero-hidden-layer Topological Auditor prunes linear shortcuts, forcing networks to higher geometric capacity (N>=16) for fairer representations and cutting counterfactual gender vulnerability from 21.18% to 7.66%.
Probing Classifiers: Promises, Shortcomings, and Advances
cs.CL 2021-02 unverdicted novelty 3.0

Probing classifiers are a common but limited method for analyzing linguistic knowledge in neural NLP models, and this review outlines their promises, methodological shortcomings, and recent advances.
High-Dimensional Statistics: Reflections on Progress and Open Problems
math.ST 2026-05 unverdicted novelty 2.0

A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.
