The Description Length of Deep Learning Models

2018 · cs.LG · arXiv 1802.07044

1 Pith paper cites this work. Polarity classification is still being indexed.


abstract

Solomonoff's general theory of inference and the Minimum Description Length principle formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks. Unexpectedly, we found that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff's approach.
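The incremental encoding idea mentioned in the abstract can be illustrated with a toy sketch. The code below is not the paper's method or code: it assumes a Laplace-smoothed frequency model as a stand-in for a neural network refitted on growing prefixes of the data, and the function names `uniform_codelength_bits` and `prequential_codelength_bits` are hypothetical. It only shows the accounting: each label is charged `-log2 p` bits under the model fitted on the earlier labels, so no parameters ever need to be transmitted.

```python
import math
from collections import Counter


def uniform_codelength_bits(labels, num_classes):
    """Baseline: send every label with a uniform code, log2(K) bits each."""
    return len(labels) * math.log2(num_classes)


def prequential_codelength_bits(labels, num_classes):
    """Incremental (prequential) codelength of a label sequence, in bits.

    Assumption: the predictor is a Laplace-smoothed frequency estimate,
    used here only as a simple stand-in for a learned model. The receiver
    can rebuild the same predictor from the labels already decoded, so
    the model itself costs nothing to describe.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, y in enumerate(labels):
        # Probability of the next label given only the labels seen so far.
        p = (counts[y] + 1) / (seen + num_classes)
        total_bits += -math.log2(p)
        counts[y] += 1
    return total_bits


if __name__ == "__main__":
    labels = [0] * 9 + [1]  # toy, highly regular sequence
    print("uniform bits:    ", uniform_codelength_bits(labels, 2))
    print("prequential bits:", prequential_codelength_bits(labels, 2))
```

On a regular sequence like the toy one above, the incremental code comes out shorter than the uniform code because the predictor improves as it sees more data. In this sketch the savings come from label frequencies alone; a model that also conditions on inputs would follow the same bookkeeping.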

representative citing papers

How can embedding models bind concepts?

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

CLIP relies on high-complexity additive binding that prevents generalization to unseen concept combinations, whereas transformers trained from scratch develop low-complexity multiplicative binding functions that enable systematic generalization with sufficient data.
