In Findings of the Association for Computational Linguistics: ACL 2025, pages 18632–18702, Vienna, Austria

MultiChallenge: A realistic multi-turn conversation evaluation benchmark challenging to frontier LLMs · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Memora benchmark and FAMA metric show that LLMs and memory agents frequently reuse invalid memories and struggle to reconcile evolving information in long-term interactions.
