Sub-network Laplace approximations always underestimate full-model predictive variance, and two new gradient-based and greedy selection rules provide theoretically grounded improvements.
Efficient exploration for llms.arXiv preprint arXiv:2402.00396
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
UG-TTT adds epistemic uncertainty measured by adapter disagreement as an exploration bonus in RL for LLMs, raising maximum reward and diversity on scientific discovery benchmarks.
citing papers explorer
-
Optimality of Sub-network Laplace Approximations: New Results and Methods
Sub-network Laplace approximations always underestimate full-model predictive variance, and two new gradient-based and greedy selection rules provide theoretically grounded improvements.
-
Epistemic Uncertainty for Test-Time Discovery
UG-TTT adds epistemic uncertainty measured by adapter disagreement as an exploration bonus in RL for LLMs, raising maximum reward and diversity on scientific discovery benchmarks.