One thousand and one pairs: A "novel" challenge for long-context language models

Marzena Karpinska, Katherine Thai, Kyle Lo, Tanya Goyal, Mohit Iyyer · 2024 · DOI 10.18653/v1/2024.emnlp-main.948

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

LLM-written summaries of novels emphasize endings more than human-written ones do, as measured by aligning summary sentences to the chapters they reference.

citing papers explorer

Showing 2 of 2 citing papers.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
cs.CL · 2026-05-12 · unverdicted · none · ref 74
Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries
cs.CL · 2026-04-07 · unverdicted · none · ref 14
