GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.
Tooltree: Efficient llm agent tool planning via dual-feedback monte carlo tree search and bidirectional pruning
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 2polarities
background 2representative citing papers
NeuroSymActive combines soft-unification symbolic modules, a neural path evaluator, and Monte-Carlo-style active exploration to reach strong answer accuracy on KGQA benchmarks while cutting graph lookups and model calls versus standard retrieval baselines.
ActorMind is a four-agent chain-of-thought framework that emulates human actors to produce spontaneous, emotion-infused speech responses for role-playing scenarios.
Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.
citing papers explorer
-
GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation
GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.
-
NeuroSymActive: Differentiable Neural-Symbolic Reasoning with Active Exploration for Knowledge Graph Question Answering
NeuroSymActive combines soft-unification symbolic modules, a neural path evaluator, and Monte-Carlo-style active exploration to reach strong answer accuracy on KGQA benchmarks while cutting graph lookups and model calls versus standard retrieval baselines.
-
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing
ActorMind is a four-agent chain-of-thought framework that emulates human actors to produce spontaneous, emotion-infused speech responses for role-playing scenarios.
-
Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.