Natural Questions: A Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics, 7:453–466.
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: dataset 1, use dataset 1
citation-polarity summary: UNVERDICTED 5
citing papers explorer
- Group-in-Group Policy Optimization for LLM Agent Training
GiGPO adds a hierarchical grouping mechanism to group-based RL so that LLM agents receive both global trajectory-level and local step-level credit signals, yielding >12% gains on ALFWorld and >9% on WebShop over GRPO while keeping the same rollout and memory footprint (a sketch of the step-level grouping idea follows this list).
- Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
The LRD framework, using Frenet, NRS, and GFMI metrics, shows that layer-wise structure across 31 models provides a usable signal for model selection and pruning on MTEB tasks.
- Benchmarking Safety Risks of Knowledge-Intensive Reasoning under Malicious Knowledge Editing
EditRisk-Bench demonstrates that malicious knowledge editing reliably induces incorrect or unsafe reasoning in LLMs while largely preserving general capabilities.
- PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning
PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models (a potential-based shaping sketch follows this list).
- SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting
SpecBlock achieves 8-19% higher speedup than EAGLE-3 in LLM speculative decoding by using repeated block expansions with hidden-state inheritance, a dynamic rank head, and a valid-prefix training mask.
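Below is a minimal sketch of the hierarchical credit-assignment idea described in the GiGPO entry: trajectory-level advantages computed GRPO-style within a rollout group, plus step-level advantages computed by grouping steps that share an identical anchor state across trajectories. The function names, the z-score normalization, and the anchor/return bookkeeping are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of group-in-group (hierarchical) advantage estimation.
# Assumptions: anchor states are hashable, returns are z-scored within
# each group, names are illustrative rather than GiGPO's implementation.
from collections import defaultdict
import numpy as np

def trajectory_advantages(total_rewards):
    """GRPO-style advantage: z-score each trajectory's return within its group."""
    r = np.asarray(total_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def step_advantages(steps, gamma=0.95):
    """steps: (traj_id, t, anchor_state, reward) tuples from one rollout group.
    Returns {(traj_id, t): step-level advantage} via anchor-state micro-groups."""
    # Discounted return-to-go per step, computed per trajectory.
    by_traj = defaultdict(list)
    for traj_id, t, anchor, reward in steps:
        by_traj[traj_id].append((t, anchor, reward))
    returns = {}
    for traj_id, items in by_traj.items():
        items.sort()
        g = 0.0
        for t, anchor, reward in reversed(items):
            g = reward + gamma * g
            returns[(traj_id, t)] = (anchor, g)
    # Micro-group: steps (across trajectories) that share the same anchor state.
    groups = defaultdict(list)
    for key, (anchor, g) in returns.items():
        groups[anchor].append((key, g))
    adv = {}
    for anchor, members in groups.items():
        gs = np.array([g for _, g in members])
        mean, std = gs.mean(), gs.std() + 1e-8
        for key, g in members:
            adv[key] = (g - mean) / std
    return adv
```

In GiGPO-style training the two signals would typically be combined per step, e.g. A_total = A_traj + omega * A_step with a weighting hyperparameter omega (an assumption here), before the usual clipped policy-gradient update.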
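And a minimal sketch of the kind of trajectory-aware step guidance the PiCA entry describes: a potential function over the agent's search state, derived from pivot sub-queries taken from historical trajectories, turned into dense step rewards via standard potential-based shaping. The pivot-matching heuristic, the function names, and the toy example are assumptions for illustration, not the paper's exact method.

```python
# Hedged sketch of pivot-based potential shaping: r'_t = r_t + gamma*phi(s_{t+1}) - phi(s_t).
# The potential here is simply the fraction of pivot sub-queries covered so far (assumption).

def pivot_potential(issued_subqueries, pivots):
    """Potential = fraction of pivot sub-queries the agent has already covered."""
    if not pivots:
        return 0.0
    covered = sum(any(p in q for q in issued_subqueries) for p in pivots)
    return covered / len(pivots)

def shaped_step_rewards(subquery_prefixes, env_rewards, pivots, gamma=1.0):
    """subquery_prefixes[t]: sub-queries issued up to and including step t;
    env_rewards[t]: sparse environment reward at step t (often 0 until the end)."""
    shaped = []
    prev_phi = 0.0
    for prefix, r in zip(subquery_prefixes, env_rewards):
        phi = pivot_potential(prefix, pivots)
        shaped.append(r + gamma * phi - prev_phi)
        prev_phi = phi
    return shaped

# Toy example: two pivots, a three-step search episode with a terminal reward of 1.
pivots = ["capital of france", "population of paris"]
prefixes = [
    ["capital of france"],
    ["capital of france", "weather today"],
    ["capital of france", "population of paris"],
]
print(shaped_step_rewards(prefixes, [0.0, 0.0, 1.0], pivots))  # [0.5, 0.0, 1.5]
```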