In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1417–1436, Abu Dhabi, United Arab Emirates

Li, Y · 2025 · arXiv 2505.23794

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

CARL trains a critic for segment-level credit assignment from binary outcomes in LLM tool-use trajectories, yielding 6.7-9.7 point accuracy gains and 53% fewer calls on solvable questions across five benchmarks.

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

AdaptR1 uses fully RL-based training with a quality-gated efficiency reward for step-wise adaptive reasoning in multi-hop QA, reducing think tokens by 69.71% on average and 90.35% on HotpotQA with comparable or better performance.

Integrating Chain-of-Thought into Generative Retrieval: A Preliminary Study

cs.IR · 2026-05-21 · unverdicted · novelty 6.0

ThinkGR interleaves chain-of-thought with docid generation using hybrid decoding and two-phase training to achieve state-of-the-art results on multi-hop retrieval benchmarks.

