pith. machine review for the scientific record. sign in

hub

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3102–3116

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

hub tools

citation-role summary

background 1

citation-polarity summary

years

2026 9 2025 2

roles

background 1

polarities

background 1

representative citing papers

Gradient Extrapolation-Based Policy Optimization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

GXPO approximates longer local lookahead in GRPO training via gradient extrapolation from two optimizer steps using three backward passes total, improving pass@1 accuracy by 1.65-5.00 points over GRPO and delivering up to 4x step speedup.

Cost-Aware Learning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Cost-aware SGD achieves target error with lower total sampling cost than standard methods, and Cost-Aware GRPO reduces token usage by up to 30% in LLM reinforcement learning while matching baseline performance.

Can LLMs Learn to Reason Robustly under Noisy Supervision?

cs.LG · 2026-04-05 · conditional · novelty 6.0

Online Label Refinement lets LLMs learn robust reasoning from noisy supervision by correcting labels when majority answers show rising rollout success and stable history, delivering 3-4% gains on math and reasoning benchmarks even at high noise levels.

ToolRL: Reward is All Tool Learning Needs

cs.LG · 2025-04-16 · conditional · novelty 6.0

A principled reward design for tool selection and application in RL-trained LLMs delivers 17% gains over base models and 15% over SFT across benchmarks.

citing papers explorer

Showing 11 of 11 citing papers.