AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.
Neural Combinatorial Optimization with Reinforcement Learning
Canonical reference. 100% of citing Pith papers cite this work as background.
abstract
This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent network using a policy gradient method. We compare learning the network parameters on a set of training graphs against learning them on individual test graphs. Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. Applied to the KnapSack, another NP-hard problem, the same method obtains optimal solutions for instances with up to 200 items.
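The reward signal described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. It assumes random permutations as a stand-in for samples from the recurrent policy, and a batch-mean baseline. The function names `tour_length` and `reinforce_weights` are hypothetical. Each returned weight is the factor that would multiply the gradient of the log-probability of the corresponding sampled tour in a policy gradient update.

```python
import math
import random


def tour_length(cities, perm):
    """Euclidean length of the closed tour that visits `cities` in `perm` order."""
    total = 0.0
    n = len(perm)
    for i in range(n):
        x1, y1 = cities[perm[i]]
        x2, y2 = cities[perm[(i + 1) % n]]  # wrap around to close the tour
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def reinforce_weights(cities, sampled_perms):
    """Baseline-subtracted rewards, one per sampled tour.

    Reward is the negative tour length, as in the abstract. The batch-mean
    baseline is an assumption made here for illustration only.
    """
    rewards = [-tour_length(cities, p) for p in sampled_perms]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Assumption: uniform random permutations stand in for samples from the policy.
rng = random.Random(0)
cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
perms = [rng.sample(range(len(cities)), len(cities)) for _ in range(8)]
weights = reinforce_weights(cities, perms)
```

Tours shorter than the batch average receive a positive weight, so an update along the weighted log-probability gradient would make them more likely. Longer tours receive a negative weight.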
fields
cs.LG 21 cs.AI 8 quant-ph 4 cs.CL 2 cs.CV 2 math.OC 2 cond-mat.dis-nn 1 cs.IR 1 cs.MA 1 cs.RO 1
roles
background 5
polarities
background 5
representative citing papers
A reinforcement learning policy for the vertex-guard art gallery problem encodes sufficient geometric information in its encoder to allow a simple classifier to achieve high coverage feasibility out of distribution.
EPB distills NCO models into evolving program portfolios via LLM-driven textual-numerical optimization, matching original performance while exposing stage-dependent heuristic-like behavior.
TriSearch is an RL framework that optimizes triangulations of polytopes using bistellar flips with a circuit-supported subtriangulation action representation, generalizing zero-shot to larger instances and outperforming prior samplers in 3D and 4D.
SPACE framework unifies symmetric and asymmetric VRPs via bidirectional Frechet representations and weight-decomposed decoding for zero-shot generalization across 110 variants.
MEMOIR adds branch-local and global memory with a reflection step to tree search for LLM solver synthesis, reaching 96.7% solution validity and 7.3-point score gains over baselines on seven CO problems with lower run-to-run variance.
TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
Linear decision trees can represent optimal solution policies for families of integer linear programs, enabling polynomial-time queries after offline synthesis for fixed feasible sets.
PLMA combines cross-graph attention EBMs with short warm-started MCMC chains to reach near-zero average optimality gaps on QAPLIB and strong robustness on hard Taixxeyy instances.
VaP-CSMV uses a cross-semantic encoder and multi-view decoder to unify DRL solving of HFVRP variants, outperforming prior neural solvers while matching heuristics at much lower inference time and generalizing zero-shot to unseen scales.
NCP trains a neural network to predict certificate-level dual prices for CO problems, enabling structured primal recovery with a local second-order error guarantee when consistency holds.
AtomTreeSearch embeds a neutral-atom quantum MWIS subroutine inside Monte Carlo Tree Search and matches or exceeds OR-Tools and simulated annealing on TSP instances up to 100 cities.
GeoRouteNet improves non-autoregressive neural TSP solvers via geometric inductive biases and MCS-RL training, reporting 0.32% gap on TSP50, 1.26% on TSP100, and 3.60% on TSPLIB instances with higher throughput than Concorde and LKH3.
RACL lets a reasoning agent discover and apply control rules to a metaheuristic by observing operational memory and testing bounded interventions, shown on vehicle routing with reported cost improvements over baselines.
MViewRouter internalizes D4 geometric equivariance for routing via Multi-view Alternating Attention and Collective Policy Gradient Aggregation, yielding competitive solutions and strong generalization on TSP/CVRP benchmarks.
PARCEL is a new visual tokenization architecture combining pool-anchored resampling with conditioned elastic queries to enhance performance-efficiency tradeoffs in LVLMs over prior matryoshka methods.
AlphaTransit pairs MCTS with a learned policy-value network to reach 54.6% and 82.1% service rates on a Bloomington transit benchmark, outperforming plain RL and plain MCTS baselines.
Deep Boltzmann Quantum States with natural-gradient optimization and annealing-like training match exact or best-known solutions for large infinite-range Ising spin glasses and solve job shop scheduling instances.
SCOPE-BENCH shows state-of-the-art molecular models suffer up to 8x higher errors under extreme OOD, while POMA reduces mean absolute error by up to 11.2% via target-aware source selection and dual-scale adaptation.
Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.
ECO uses supervised warm-up plus iterative batched DPO on a Mamba backbone to reach top neural performance on TSP and CVRP while lowering memory growth and raising throughput.
A differentiable MPNN approximates uniform facility location with provable guarantees and outperforms standard approximation algorithms while closing the gap to exact ILP solutions.
QARMA applies transformer-augmented reinforcement learning to qubit allocation and reuse in modular quantum systems, reporting up to 86% average reduction in inter-core communications versus optimized Qiskit baselines.
citing papers explorer
- Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models
Large vision-language models applied to multi-scale remote sensing imagery can generate recommendations on built environment design, constructability, land use, and risks for smart city decision-making.
- Gemma 2: Improving Open Language Models at a Practical Size
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.