Dynamic Boundary Evaluation adaptively identifies each LLM's performance boundary on a shared difficulty scale using a calibrated item bank and a search algorithm.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.AI 2years
2026 2representative citing papers
Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.
citing papers explorer
-
Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models
Dynamic Boundary Evaluation adaptively identifies each LLM's performance boundary on a shared difficulty scale using a calibrated item bank and a search algorithm.
-
Characterizing Model-Native Skills
Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.