Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.
hub Canonical reference
Findings of the Association for Computational Linguistics: ACL 2023 , month = jul, year =
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
roles
background 6polarities
background 6representative citing papers
Uncertainty trace profiles from LM reasoning traces predict correct final answers with AUROC up to 0.807 and enable early error detection using only initial tokens.
ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.
RMCT matches the rate of target behaviors like bias-following across input perturbations to reduce sycophancy in LLMs while preserving verbalization of bias cues.
Teachers' views on AI benefits and risks vary widely across 55 countries, but LLMs compress these differences, overestimate both sides, and show little improvement from country prompting or better reasoning.
Math-Shepherd is an automatically trained process reward model that scores solution steps to verify and reinforce LLMs, lifting Mistral-7B from 77.9% to 89.1% on GSM8K and 28.6% to 43.5% on MATH.
CLASP combines TP-KMPs with VLMs for language-guided skill selection, covariance-weighted composition, and active learning requests, reporting 73.3-100% success on a 7-DoF manipulator.
Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.
GRC unifies generation, retrieval, and compression in LLMs via meta latent tokens for single-pass execution with modular flexibility.
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.
LLM accuracy on reasoning tasks differs significantly by question type, with step-by-step reasoning accuracy often uncorrelated to final answer selection.
A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
A literature survey synthesizing benchmarks, architectures, training strategies, and evaluation methods for mathematical reasoning in LLMs, based on roughly 120 papers.
citing papers explorer
-
CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning
CLASP combines TP-KMPs with VLMs for language-guided skill selection, covariance-weighted composition, and active learning requests, reporting 73.3-100% success on a 7-DoF manipulator.