The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

25 Pith papers cite this work.

abstract

The reparameterization trick enables optimizing large-scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low-variance unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack useful reparameterizations due to the discontinuous nature of discrete states. In this work we introduce Concrete random variables: continuous relaxations of discrete random variables. The Concrete distribution is a new family of distributions with closed-form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation graph can be refactored into a one-hot bit representation that is treated continuously, Concrete stochastic nodes can be used with automatic differentiation to produce low-variance biased gradients of objectives (including objectives that depend on the log-probability of latent stochastic nodes) on the corresponding discrete graph. We demonstrate the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks.
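
The reparameterization described in the abstract can be sketched in plain Python. This is a minimal, non-authoritative illustration assuming the Gumbel-perturbed softmax form of a Concrete sample, X_k = exp((log α_k + G_k)/λ) / Σ_i exp((log α_i + G_i)/λ), with G_k drawn i.i.d. from Gumbel(0, 1), location parameters α_k > 0 and temperature λ > 0, together with the closed-form density p(x) = (n−1)! λ^(n−1) ∏_k [α_k x_k^(−λ−1) / Σ_i α_i x_i^(−λ)] on the open simplex. The function names concrete_sample and concrete_log_density are hypothetical, and plain floats stand in for the tensors an automatic-differentiation framework would use, so no gradients are computed here.

```python
import math
import random
from typing import Sequence


def concrete_sample(log_alpha: Sequence[float], temperature: float,
                    rng=random) -> list:
    """Draw one Concrete(alpha, temperature) sample (hypothetical helper).

    All randomness comes from fixed-distribution Gumbel(0, 1) noise, so the
    result is a deterministic, differentiable function of log_alpha and
    temperature: the reparameterization the abstract describes.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Gumbel(0, 1) noise via G = -log(-log U), U ~ Uniform(0, 1);
    # U is clamped away from 0 so the inner log is always finite.
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12)))
               for _ in log_alpha]
    # Perturb the log-locations and divide by the temperature.
    logits = [(la + g) / temperature for la, g in zip(log_alpha, gumbels)]
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def concrete_log_density(x: Sequence[float], alpha: Sequence[float],
                         temperature: float) -> float:
    """Log of the closed-form Concrete density at x (hypothetical helper).

    Assumes every x_i > 0 and sum(x) == 1 (a point in the open simplex).
    """
    n = len(x)
    lam = temperature
    # log[(n - 1)! * lam^(n - 1)]; lgamma(n) equals log((n - 1)!).
    log_norm = math.lgamma(n) + (n - 1) * math.log(lam)
    # log of the shared denominator sum_i alpha_i * x_i^(-lam).
    log_denom = math.log(sum(a * xi ** (-lam) for a, xi in zip(alpha, x)))
    # Sum over k of log[alpha_k * x_k^(-lam - 1) / denominator].
    terms = sum(math.log(a) - (lam + 1) * math.log(xi) - log_denom
                for a, xi in zip(alpha, x))
    return log_norm + terms
```

Because the noise is drawn independently of α and λ, the same computation written with framework tensors lets gradients flow into log_alpha. At any temperature, the position of the largest component of a sample is distributed as Categorical(α / Σα), and as λ → 0 samples approach one-hot vectors.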

citation-role summary

background 1 · method 1

verdicts

UNVERDICTED 25

representative citing papers

Attention-based optimizer for symmetry finding

quant-ph · 2026-05-28 · unverdicted · novelty 7.0

A Set-Transformer architecture with self-attention encodes Pauli-string correlations, optimizes via a commutation objective, and finds symmetries with near-deterministic success on physical models such as the Ising model and the toric code.

Rethinking the Harmonic Loss via Non-Euclidean Distance Layers

cs.LG · 2026-03-10 · unverdicted · novelty 7.0

Non-Euclidean distance variants of harmonic loss improve accuracy, gradient stability, and energy efficiency over cross-entropy and Euclidean harmonic loss in vision backbones and large language models.

Approximation-Free Differentiable Oblique Decision Trees

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

DTSemNet gives an exact, invertible neural-network encoding of hard oblique decision trees that supports direct gradient training for both classification and regression without probabilistic softening or quantized estimators.

LumiMotion: Improving Gaussian Relighting with Scene Dynamics

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

LumiMotion improves albedo estimation and scene relighting in dynamic scenes by leveraging motion to separate lighting effects from surface appearance in a dynamic 2D Gaussian Splatting representation.

The Power of Order: Fooling LLMs with Adversarial Table Permutations

cs.LG · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.

SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

SWAN is presented as the first adaptive multimodal network that meets variable compute budgets, optimizes layer use by sample complexity, and drops irrelevant features, cutting FLOPs by up to 49% in 3D object detection with minimal accuracy loss.