A Set-Transformer architecture with self-attention encodes Pauli-string correlations, optimizes via commutation objective, and finds symmetries with near-deterministic success on physical models like Ising and Toric code.
hub
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
25 Pith papers cite this work. Polarity classification is still indexing.
abstract
The reparameterization trick enables optimizing large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low variance unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack useful reparameterizations due to the discontinuous nature of discrete states. In this work we introduce Concrete random variables---continuous relaxations of discrete random variables. The Concrete distribution is a new family of distributions with closed form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation graph can be refactored into a one-hot bit representation that is treated continuously, Concrete stochastic nodes can be used with automatic differentiation to produce low-variance biased gradients of objectives (including objectives that depend on the log-probability of latent stochastic nodes) on the corresponding discrete graph. We demonstrate the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 25representative citing papers
INSHAPE discovers instance-specific non-overlapping shapelets, models their temporal dependencies, and aggregates them bottom-up into population-level prototypes for improved accuracy and interpretability in time-series classification.
FLDD learns non-Markovian marginal and posterior distributions for the forward process so a factorized reverse process can match the target better and produce higher-quality samples in fewer steps.
HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
Non-Euclidean distance variants of harmonic loss improve accuracy, gradient stability, and energy efficiency over cross-entropy and Euclidean harmonic loss in vision backbones and large language models.
MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.
LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapting to non-stationary dynamics.
DTSemNet gives an exact, invertible neural-network encoding of hard oblique decision trees that supports direct gradient training for both classification and regression without probabilistic softening or quantized estimators.
AS-LoRA adaptively chooses which LoRA factor to update per layer and round using a curvature-aware second-order score, eliminating reconstruction error floors and improving performance in DP federated learning.
ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
LumiMotion improves albedo estimation and scene relighting in dynamic scenes by leveraging motion to separate lighting effects from surface appearance in a dynamic 2D Gaussian Splatting representation.
LMFT enables state-of-the-art performance in video unsupervised domain adaptation by focusing on motion-rich tokens and reducing computational overhead.
SNMPP builds a product-form neural influence kernel from a signed interaction network over event classes and a delay-aware monotonic temporal network to enable explicit discovery of inter-event relationships alongside strong prediction.
GSQ uses Gumbel-Softmax to optimize scalar quantization grids for LLMs, closing most of the accuracy gap to vector methods like QTIP at 2-3 bits per parameter while using symmetric scalar grids compatible with existing kernels.
A differentiable logic programming approach optimizes continuous gate switches to discover and adapt quantum circuits while satisfying user-defined logical axioms.
LILogicNet trains compact logic-gate networks with learnable sparse connectivity via Top-K selection, reaching 98.45% MNIST accuracy with 8k gates and 60.98% CIFAR-10 accuracy with 256k gates while using far fewer gates than prior logic models.
A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.
CapsID uses probabilistic capsule routing and confidence-based termination to generate variable-length semantic IDs, improving recall by 9.6% over strong baselines with half the latency of dual-representation systems.
GRE-MC retrieves relevant subgraphs and uses a graph transformer plus sparse codebook to complete missing modalities, outperforming prior methods on recommendation benchmarks.
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
SWAN is the first adaptive multimodal network that meets variable compute budgets, optimizes layer use by sample complexity, and drops irrelevant features, cutting FLOPs up to 49% in 3D object detection with minimal accuracy loss.
Quantum annealing combined with a Neural Hash Function lets generative models create molecules that are more drug-like than classical versions or the training set itself.
ARMIN introduces auto-addressing via hidden states and a novel RNN cell to produce a lighter recurrent memory network with lower overhead than existing MANNs or vanilla LSTMs.
citing papers explorer
-
Attention-based optimizer for symmetry finding
A Set-Transformer architecture with self-attention encodes Pauli-string correlations, optimizes via commutation objective, and finds symmetries with near-deterministic success on physical models like Ising and Toric code.
-
INSHAPE: Instance-Level Shapelets for Interpretable Time-Series Classification
INSHAPE discovers instance-specific non-overlapping shapelets, models their temporal dependencies, and aggregates them bottom-up into population-level prototypes for improved accuracy and interpretability in time-series classification.
-
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
FLDD learns non-Markovian marginal and posterior distributions for the forward process so a factorized reverse process can match the target better and produce higher-quality samples in fewer steps.
-
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.
-
In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
-
Rethinking the Harmonic Loss via Non-Euclidean Distance Layers
Non-Euclidean distance variants of harmonic loss improve accuracy, gradient stability, and energy efficiency over cross-entropy and Euclidean harmonic loss in vision backbones and large language models.
-
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.
-
LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling
LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapting to non-stationary dynamics.
-
Approximation-Free Differentiable Oblique Decision Trees
DTSemNet gives an exact, invertible neural-network encoding of hard oblique decision trees that supports direct gradient training for both classification and regression without probabilistic softening or quantized estimators.
-
Adaptive Selection of LoRA Components in Privacy-Preserving Federated Learning
AS-LoRA adaptively chooses which LoRA factor to update per layer and round using a curvature-aware second-order score, eliminating reconstruction error floors and improving performance in DP federated learning.
-
Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events
ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
-
LumiMotion: Improving Gaussian Relighting with Scene Dynamics
LumiMotion improves albedo estimation and scene relighting in dynamic scenes by leveraging motion to separate lighting effects from surface appearance in a dynamic 2D Gaussian Splatting representation.
-
Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation
LMFT enables state-of-the-art performance in video unsupervised domain adaptation by focusing on motion-rich tokens and reducing computational overhead.
-
Structured Neural Marked Point Processes for Interpretable Event Interaction Modeling
SNMPP builds a product-form neural influence kernel from a signed interaction network over event classes and a delay-aware monotonic temporal network to enable explicit discovery of inter-event relationships alongside strong prediction.
-
GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
GSQ uses Gumbel-Softmax to optimize scalar quantization grids for LLMs, closing most of the accuracy gap to vector methods like QTIP at 2-3 bits per parameter while using symmetric scalar grids compatible with existing kernels.
-
Differentiable Logical Programming for Quantum Circuit Discovery and Optimization
A differentiable logic programming approach optimizes continuous gate switches to discover and adapt quantum circuits while satisfying user-defined logical axioms.
-
LILogic Net: Compact Logic Gate Networks with Learnable Connectivity for Efficient Hardware Deployment
LILogicNet trains compact logic-gate networks with learnable sparse connectivity via Top-K selection, reaching 98.45% MNIST accuracy with 8k gates and 60.98% CIFAR-10 accuracy with 256k gates while using far fewer gates than prior logic models.
-
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.
-
CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation
CapsID uses probabilistic capsule routing and confidence-based termination to generate variable-length semantic IDs, improving recall by 9.6% over strong baselines with half the latency of dual-representation systems.
-
Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion
GRE-MC retrieves relevant subgraphs and uses a graph transformer plus sparse codebook to complete missing modalities, outperforming prior methods on recommendation benchmarks.
-
The Power of Order: Fooling LLMs with Adversarial Table Permutations
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
-
SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations
SWAN is the first adaptive multimodal network that meets variable compute budgets, optimizes layer use by sample complexity, and drops irrelevant features, cutting FLOPs up to 49% in 3D object detection with minimal accuracy loss.
-
Molecular Design beyond Training Data with Novel Extended Objective Functionals of Generative AI Models Driven by Quantum Annealing Computer
Quantum annealing combined with a Neural Hash Function lets generative models create molecules that are more drug-like than classical versions or the training set itself.
-
ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network
ARMIN introduces auto-addressing via hidden states and a novel RNN cell to produce a lighter recurrent memory network with lower overhead than existing MANNs or vanilla LSTMs.
-
Bayesian Neural Networks: An Introduction and Survey
A survey introducing Bayesian Neural Networks and comparing approximate inference methods to enable uncertainty quantification in neural network predictions.