IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.
hub
PonderNet: Learning to ponder.arXiv preprint arXiv:2106.01345
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Graph transformer RL for dynamic RMSA supports up to 13% more traffic than benchmarks on networks up to 143 nodes and 362 links.
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
Gradient-boosted attention applies a corrective second attention pass within a single layer, mapping to Friedman's gradient boosting and improving perplexity by 5.6-6.0% on WikiText-103 and OpenWebText subsets over standard attention.
RankQ adds a self-supervised ranking loss to Q-learning to learn structured action orderings, yielding competitive or better performance than prior methods on D4RL benchmarks and large gains in vision-based robot fine-tuning.
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
Lightweight numerical bandits on text embeddings match or exceed LLM accuracy in contextual bandits at a fraction of the cost, with an embedding-based diagnostic to choose between them.
ARL lifts states into signature-augmented manifolds and employs self-consistent proxies of future path-laws to enable deterministic expected-return evaluation while preserving contraction mappings in jump-diffusion environments.
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
A comprehensive benchmark study of offline imitation learning methods on multi-stage robot manipulation tasks identifies key sensitivities to algorithm design, data quality, and stopping criteria while releasing all datasets and code.
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
Large vision-language models applied to multi-scale remote sensing imagery can generate recommendations on built environment design, constructability, land use, and risks for smart city decision-making.
citing papers explorer
-
Offline Reinforcement Learning with Implicit Q-Learning
IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.
-
Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks
Graph transformer RL for dynamic RMSA supports up to 13% more traffic than benchmarks on networks up to 143 nodes and 362 links.
-
Latent State Design for World Models under Sufficiency Constraints
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
-
Gradient Boosting within a Single Attention Layer
Gradient-boosted attention applies a corrective second attention pass within a single layer, mapping to Friedman's gradient boosting and improving perplexity by 5.6-6.0% on WikiText-103 and OpenWebText subsets over standard attention.
-
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking
RankQ adds a self-supervised ranking loss to Q-learning to learn structured action orderings, yielding competitive or better performance than prior methods on D4RL benchmarks and large gains in vision-based robot fine-tuning.
-
A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
-
When Do We Need LLMs? A Diagnostic for Language-Driven Bandits
Lightweight numerical bandits on text embeddings match or exceed LLM accuracy in contextual bandits at a fraction of the cost, with an embedding-based diagnostic to choose between them.
-
Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions
ARL lifts states into signature-augmented manifolds and employs self-consistent proxies of future path-laws to enable deterministic expected-return evaluation while preserving contraction mappings in jump-diffusion environments.
-
A General Language Assistant as a Laboratory for Alignment
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
-
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
A comprehensive benchmark study of offline imitation learning methods on multi-stage robot manipulation tasks identifies key sensitivities to algorithm design, data quality, and stopping criteria while releasing all datasets and code.
-
Galactica: A Large Language Model for Science
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
-
Built Environment Reasoning from Remote Sensing Imagery Using Large Vision--Language Models
Large vision-language models applied to multi-scale remote sensing imagery can generate recommendations on built environment design, constructability, land use, and risks for smart city decision-making.