Yohei Nakajima
5 Pith papers cite this work.
Citing papers by year: 2026 (5). Verdicts: unverdicted (5).
Citing papers
-
TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory
TOKI types four common contradiction-resolution heuristics as bitemporal operators on a dual-row schema, supplies soundness theorems, and shows via a verdict matrix that it alone avoids three write-time anomalies while retaining a language-model judge.
-
Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems
AgingBench demonstrates multi-dimensional degradation in deployed AI agents through four aging mechanisms diagnosed by temporal graphs and counterfactual probes across hundreds of runs.
-
MemConflict: Evaluating Long-Term Memory Systems Under Memory Conflicts
MemConflict provides a benchmark for testing LLM long-term memory systems under dynamic, static, and conditional conflicts involving temporal validity, factual correctness, and contextual applicability.
-
When Should Models Change Their Minds? Contextual Belief Management in Large Language Models
Introduces the BeliefTrack benchmark, which diagnoses three contextual belief management (CBM) failures in LLMs, and shows that RL with belief-state rewards cuts failure rates by 70.9% while representation steering cuts them by 46.1%.
-
Always-On Agents: A Survey of Persistent Memory, State, and Governance in LLM Agents
Survey mapping persistent state in LLM agents along six axes and proposing the AOEP-v0 protocol to evaluate governance and recovery obligations.