**Originality (Innovative approach to the prompt):**

* *High Score:* The author interpreted the prompt in a way that is fresh or subverts expectations.
* *Low Score:* The author took the most literal or common interpretation of the prompt.

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

cs.CL · 2026-04-13 · unverdicted · novelty 6.0

Policy Split bifurcates LLM policies into normal and high-entropy modes with dual-mode entropy regularization to enhance exploration while preserving task accuracy.

citing papers explorer

Showing 1 of 1 citing paper.

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

cs.CL · 2026-04-13 · unverdicted · none · ref 7

