RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.
To prevent models from refusing to answer our questions, we append an answer prefix to the input to elicit model responses.
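The answer-prefix technique described above can be sketched as follows. This is a minimal illustration, assuming a simple concatenation-style prompt; the function name, template text, and prefix string are assumptions, not the paper's exact templates:

```python
def build_prompt(context: str, question: str, answer_prefix: str = "Answer: ") -> str:
    """Assemble a long-context prompt that ends with an answer prefix.

    Ending the prompt with a prefix like "Answer: " nudges the model to
    continue directly with the answer rather than refusing or adding
    meta-commentary before responding.
    """
    return f"{context}\n\nQuestion: {question}\n{answer_prefix}"


prompt = build_prompt("The vault code is 7421.", "What is the vault code?")
```

With this construction, the model's next tokens are conditioned to complete the `Answer: ` line directly.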
1 Pith paper cites this work; polarity classification is still indexing.
Fields: cs.CL · Years: 2024 · Verdicts: ACCEPT · Representative citing papers: 1
RULER: What's the Real Context Size of Your Long-Context Language Models?