MIT Press, ??? (2018)

· 1998 · arXiv 1998.712192

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Proper Scoring Rules for Agentic Uncertainty Quantification

cs.AI · 2026-05-23 · unverdicted · novelty 7.0

Introduces Trajectory Proper Score (TPS) as a strictly proper family of trajectory-level scoring rules that elicits the complete prefix-conditioned success probability process.

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

math.NA · 2026-06-09 · unverdicted · novelty 6.0

Dmsh is a new multi-agent RL framework that formulates mesh generation as an MDP and uses three coordinated agents plus curriculum learning to produce globally conforming all-quad meshes without post-processing.

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.

DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.

Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production

cs.AI · 2026-04-14 · unverdicted · novelty 5.0

PF-CD3Q uses online particle filtering to estimate fatigue parameters and constrains a deep Q-learning agent to solve fatigue-aware human-robot task planning as a CMDP.

Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making

cs.CY · 2025-02-17 · unverdicted · novelty 5.0

A reinforcement learning model is ethically fine-tuned using aggregated feedback from LLMs embodying five moral principles via Belief Jensen-Shannon Divergence and Dempster-Shafer Theory.

Benchmark Data Contamination of Large Language Models: A Survey

cs.CL · 2024-06-06 · unverdicted · novelty 3.0

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.

citing papers explorer

Proper Scoring Rules for Agentic Uncertainty Quantification
cs.AI · 2026-05-23 · unverdicted · none · ref 29

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
math.NA · 2026-06-09 · unverdicted · none · ref 23

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control
cs.LG · 2026-05-12 · unverdicted · none · ref 38 · 2 links

DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions
cs.LG · 2025-09-23 · unverdicted · none · ref 26

Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production
cs.AI · 2026-04-14 · unverdicted · none · ref 72

Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making
cs.CY · 2025-02-17 · unverdicted · none · ref 60

Benchmark Data Contamination of Large Language Models: A Survey
cs.CL · 2024-06-06 · unverdicted · none · ref 140
