
arXiv: 2604.19771 · v1 · submitted 2026-03-27 · 💻 cs.CL · cs.AI · cs.IR

Cognis: Context-Aware Memory for Conversational AI Agents

Pith reviewed 2026-05-14 23:27 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.IR
keywords memory architecture · conversational agents · persistent memory · retrieval pipeline · context awareness · long-term recall · benchmark evaluation · LLM agents

The pith

A dual-store memory pipeline with context-aware ingestion lets conversational AI agents retain details across sessions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Cognis as a memory architecture that gives LLM agents persistent storage so conversations do not reset between sessions. It combines keyword matching and vector search in a dual backend, fuses their results, and adds an ingestion step that retrieves existing memories before extracting new ones, so that updates are recorded as versions rather than as conflicting entries. Temporal signals and a final reranker improve relevance for time-sensitive or complex queries. Evaluated on two benchmarks (LoCoMo and LongMemEval) with eight answer generation models, the system is reported to achieve state-of-the-art results on both. If the approach works as described, agents could build reliable long-term user context without relying solely on the model's context window.

Core claim

Cognis is a unified memory architecture for conversational AI agents that pairs a dual-store backend of BM25 keyword search and Matryoshka vector similarity, fused by reciprocal rank fusion, with a context-aware ingestion pipeline that retrieves prior memories before extraction to enable version tracking. Temporal boosting and a BGE-2 cross-encoder reranker further refine outputs. When evaluated on the LoCoMo and LongMemEval benchmarks using eight different answer generation models, the architecture achieves state-of-the-art performance on both.
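The Matryoshka half of the dual store is described in the Figure 8 caption as two-stage retrieval, with truncated 256D embeddings used for fast shortlisting. The following is a minimal sketch of that idea, not the authors' implementation. It assumes cosine similarity and assumes the second stage rescores the shortlist with the full-length vectors, which is the usual Matryoshka pattern but is not confirmed by the text available here. The names `two_stage_search`, `short_dim`, and `shortlist_size` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_search(
    query: Sequence[float],
    store: Dict[str, Sequence[float]],
    short_dim: int = 256,       # prefix length used for the cheap first pass
    shortlist_size: int = 50,   # hypothetical shortlist size
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1: rank every memory using only the first `short_dim` dimensions.
    q_short = query[:short_dim]
    shortlist = sorted(
        store,
        key=lambda mem_id: cosine(q_short, store[mem_id][:short_dim]),
        reverse=True,
    )[:shortlist_size]

    # Stage 2 (assumed): rescore the shortlist with the full-length vectors.
    rescored = [(mem_id, cosine(query, store[mem_id])) for mem_id in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]
```

The design trade-off is that the first pass touches every stored vector but only a prefix of it, while the more expensive full-dimension comparison runs on the shortlist alone.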

What carries the argument

The multi-stage retrieval pipeline that runs context-aware ingestion before memory extraction, then fuses BM25 keyword matching with Matryoshka vector search via reciprocal rank fusion.
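Reciprocal rank fusion combines ranked lists by summing, for each document, a term of the form 1 / (k + rank) over the lists in which it appears. The sketch below is a weighted variant under stated assumptions: the Figure 7 caption mentions "RRF fusion (70%+30%)", and the code assumes the 0.7 weight goes to the vector list and 0.3 to BM25, which the available text does not spell out. The constant `k = 60` is the commonly used default, not a value taken from the paper, and `weighted_rrf` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    rankings: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    k: int = 60,
) -> List[str]:
    """Fuse ranked ID lists; each list contributes weight / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for source, ranked_ids in rankings.items():
        weight = weights.get(source, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fused = weighted_rrf(
    rankings={
        "vector": ["m3", "m1", "m2"],  # assumed to carry the 70% weight
        "bm25": ["m1", "m4", "m3"],    # assumed to carry the 30% weight
    },
    weights={"vector": 0.7, "bm25": 0.3},
)
print(fused)
```

Because RRF uses ranks rather than raw scores, it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.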

If this is right

  • Agents maintain consistent memory history through version tracking during each new ingestion.
  • Time-sensitive queries gain improved ranking from the temporal boosting step.
  • Final result quality increases after the cross-encoder reranker reorders candidates.
  • Performance gains hold across multiple underlying answer generation models.
  • The open-source implementation supports direct integration into production agent systems.
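The version-tracking behavior in the first bullet can be illustrated with a small in-memory store. This is a sketch under assumptions, not the paper's code. The operation fields `action`, `fact`, and `replaces_id` follow the JSON excerpt in the Figure 5 caption. The `target_id` field for DELETE, the soft-delete behavior, and the class and attribute names (`VersionedStore`, `previous_id`, `is_current`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Memory:
    id: int
    fact: str
    version: int = 1
    previous_id: Optional[int] = None  # link to the version this one replaced
    is_current: bool = True


class VersionedStore:
    def __init__(self) -> None:
        self.records: Dict[int, Memory] = {}
        self._next_id = 1

    def _new_id(self) -> int:
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def apply(self, op: dict) -> Optional[Memory]:
        """Apply one ADD / UPDATE / DELETE operation emitted by the extractor."""
        action = op["action"]
        if action == "ADD":
            mem = Memory(id=self._new_id(), fact=op["fact"])
        elif action == "UPDATE":
            old = self.records[op["replaces_id"]]
            old.is_current = False  # keep the old version, but retire it
            mem = Memory(self._new_id(), op["fact"], old.version + 1, old.id)
        elif action == "DELETE":
            # Assumed soft delete: history stays, record is no longer current.
            self.records[op["target_id"]].is_current = False
            return None
        else:
            raise ValueError(f"unknown action: {action}")
        self.records[mem.id] = mem
        return mem

    def history(self, memory_id: int) -> List[str]:
        """Walk the version chain from the given memory back to the original."""
        chain: List[str] = []
        current = self.records.get(memory_id)
        while current is not None:
            chain.append(current.fact)
            if current.previous_id is None:
                break
            current = self.records.get(current.previous_id)
        return chain
```

Retiring old versions instead of overwriting them is what lets a store answer both "what is true now" (filter on `is_current`) and "what was true before" (walk `history`).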

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dual-store plus versioned ingestion pattern could be adapted for agents that must remember tool-use histories over extended tasks.
  • Accumulated memories might eventually support more accurate personalization without expanding the model's context window.
  • Production deployment implies the pipeline scales to handle concurrent users without frequent store corruption.
  • Extending the temporal boost to include user-specified priorities could further tailor recall in personal assistant settings.

Load-bearing premise

The two benchmarks capture the memory demands that arise in real, open-ended conversations spanning many sessions.

What would settle it

A controlled test in which users hold repeated multi-session conversations with agents and measure how accurately each agent recalls earlier details when using Cognis versus standard context-only baselines.

Figures

Figures reproduced from arXiv: 2604.19771 by Jithin George, Khush Patel, Parshva Daftari, Shreyas Kapale, Siva Surendira.

Figure 1: LoCoMo benchmark F1 scores across four question types. Cognis achieves the highest…

Figure 2: Cross-system accuracy on LongMemEval across six question types. Cognis (orange)…

Figure 3: Dual-store architecture: OpenSearch (documents + native BM25) and VDB (Ma…

Figure 4: Two-panel ingestion pipeline. Left: Immediate storage enables recall before extraction. Right: Context-aware extraction retrieves similar memories, LLM decides ADD/UPDATE/DELETE, version tracking maintains history.
    4.2 Speaker Identification: Messages follow a structured format that enables speaker identification:
    [2024-05-08 10:30:00] James: I just got a new job at Google!
    [2024-05-08 10:30:15] Ass…

Figure 5: Context-aware extraction comparison. Left: Without context, the LLM creates conflicting memories (existing memory ignored). Right: With context retrieval, the LLM issues UPDATE operations maintaining consistency.
    { "operations": [
        { "action": "UPDATE",
          "fact": "James works at Google as a Senior Engineer",
          "replaces_id": 42,
          "category": "professional",
          "event_date": "2024-05-08" },
        …

Figure 6: Version chaining for memory history. Each UPDATE creates a new version linked…

Figure 7: Retrieval pipeline: Query analysis → parallel Vector/BM25 search → RRF fusion (70%+30%) → temporal boost, dedup → BGE-2 rerank → results.
    5.1 Query Analysis: Before search execution, we analyze the query to determine retrieval strategy:
    Temporal Intent Detection: Keywords like “when”, “yesterday”, “last week”, “on May 8th” trigger temporal boosting. The system extracts the time reference and calculates appr…

Figure 8: Matryoshka two-stage retrieval: truncated 256D embeddings enable fast shortlisting…
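The message format quoted alongside the Figure 4 caption (a bracketed timestamp, a speaker name, a colon, then the utterance) is simple to parse. The sketch below assumes exactly the layout of the quoted example line after whitespace repair; the regular expression, the `Message` type, and `parse_message` are hypothetical and are not taken from the Cognis codebase.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout: "[YYYY-MM-DD HH:MM:SS] Speaker: text"
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*([^:]+?)\s*:\s*(.*)$"
)


class Message(NamedTuple):
    timestamp: datetime
    speaker: str
    text: str


def parse_message(line: str) -> Optional[Message]:
    """Return the parsed message, or None if the line does not fit the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, speaker, text = match.groups()
    return Message(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), speaker, text)


msg = parse_message("[2024-05-08 10:30:00] James: I just got a new job at Google!")
print(msg.speaker, msg.timestamp.isoformat(), msg.text)
```

Carrying the parsed timestamp with each utterance is what would let a later extraction step attach an `event_date` to a fact, as in the Figure 5 excerpt.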
Original abstract

LLM agents lack persistent memory, causing conversations to reset each session and preventing personalization over time. We present Lyzr Cognis, a unified memory architecture for conversational AI agents that addresses this limitation through a multi-stage retrieval pipeline. Cognis combines a dual-store backend pairing OpenSearch BM25 keyword matching with Matryoshka vector similarity search, fused via Reciprocal Rank Fusion. Its context-aware ingestion pipeline retrieves existing memories before extraction, enabling intelligent version tracking that preserves full memory history while keeping the store consistent. Temporal boosting enhances time-sensitive queries, and a BGE-2 cross-encoder reranker refines final result quality. We evaluate Cognis on two independent benchmarks -- LoCoMo and LongMemEval -- across eight answer generation models, demonstrating state-of-the-art performance on both. The system is open-source and deployed in production serving conversational AI applications.
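The temporal boosting step can be sketched in two parts: detecting temporal intent in the query, then raising the score of memories whose event date lies near the referenced date. The trigger phrases come from the Figure 7 caption excerpt ("when", "yesterday", "last week", "on May 8th"). Everything else is an assumption made for illustration: the regular expression, the exponential-decay boost formula, and the parameters `weight` and `half_life_days` are hypothetical, since the text describing the actual calculation is truncated.

```python
import re
from datetime import date
from typing import List, Optional, Tuple

# Trigger phrases modeled on the examples quoted in the Figure 7 excerpt.
TEMPORAL_RE = re.compile(
    r"\b(?:when|yesterday|today|last\s+(?:week|month|year)"
    r"|on\s+[a-z]+\s+\d{1,2}(?:st|nd|rd|th)?)\b",
    re.IGNORECASE,
)


def has_temporal_intent(query: str) -> bool:
    return TEMPORAL_RE.search(query) is not None


def temporal_boost(
    candidates: List[Tuple[str, float, Optional[date]]],
    reference: date,
    weight: float = 0.5,           # hypothetical maximum relative boost
    half_life_days: float = 30.0,  # hypothetical decay rate
) -> List[Tuple[str, float]]:
    """Multiply each score by 1 + weight * 0.5 ** (gap_days / half_life_days)."""
    boosted = []
    for mem_id, score, event_date in candidates:
        if event_date is not None:
            gap_days = abs((event_date - reference).days)
            score *= 1.0 + weight * 0.5 ** (gap_days / half_life_days)
        boosted.append((mem_id, score))
    boosted.sort(key=lambda pair: pair[1], reverse=True)
    return boosted
```

A multiplicative boost keeps undated memories in the ranking at their original score, so the step reorders candidates rather than filtering them.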

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Lyzr Cognis, a unified memory architecture for conversational AI agents. It uses a dual-store backend (OpenSearch BM25 keyword matching paired with Matryoshka vector search, fused via Reciprocal Rank Fusion), a context-aware ingestion pipeline that retrieves existing memories before extraction to enable version tracking, temporal boosting for time-sensitive queries, and a BGE-2 cross-encoder reranker. The system is evaluated on the LoCoMo and LongMemEval benchmarks across eight answer generation models and claims state-of-the-art performance on both; the implementation is released as open-source and deployed in production.

Significance. If the performance claims are robust, Cognis provides a practical, deployable solution to the persistent-memory gap in LLM agents, supporting personalization and consistency over multiple sessions. The engineering focus on a multi-stage retrieval pipeline with explicit versioning and temporal handling addresses real deployment needs. The open-source release and production deployment are concrete strengths that enable reproducibility and external validation.

major comments (2)
  1. [Evaluation] Evaluation section: The SOTA claim on LoCoMo and LongMemEval is presented as evidence that the architecture solves persistent memory for real-world conversational agents, yet the paper provides no discussion or evidence that these benchmarks contain multi-session traces with explicit fact updates, preference drift, or cross-session retrieval demands; without this, superior retrieval performance cannot be attributed to the context-aware ingestion and versioning components that form the central novelty.
  2. [Results] Results and Methods sections: No ablation studies or component-wise breakdowns are reported to quantify the contribution of the context-aware ingestion pipeline versus the retrieval stack (BM25 + Matryoshka + RRF + reranker); this omission makes it impossible to determine whether the reported gains stem from the novel memory-management features or from standard retrieval improvements.
minor comments (2)
  1. [Abstract] Abstract: The claim of evaluation 'across eight answer generation models' does not list the specific models; this information should be added for reproducibility.
  2. [Introduction] The manuscript would benefit from a short related-work subsection contrasting Cognis with prior memory-augmented agent systems (e.g., those using vector stores or episodic memory) to clarify the precise incremental contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review. We respond to the major comments point-by-point below.

  1. Referee: [Evaluation] Evaluation section: The SOTA claim on LoCoMo and LongMemEval is presented as evidence that the architecture solves persistent memory for real-world conversational agents, yet the paper provides no discussion or evidence that these benchmarks contain multi-session traces with explicit fact updates, preference drift, or cross-session retrieval demands; without this, superior retrieval performance cannot be attributed to the context-aware ingestion and versioning components that form the central novelty.

    Authors: We thank the referee for highlighting this point. The original benchmark papers describe LoCoMo and LongMemEval as testing long-context and long-term memory in multi-turn conversations, but we agree that an explicit analysis of fact updates and preference drift is missing. Context-aware ingestion enables version tracking, which is crucial for consistency in such settings. In the revision, we will add a subsection discussing the benchmarks' characteristics and how our components address them. revision: partial

  2. Referee: [Results] Results and Methods sections: No ablation studies or component-wise breakdowns are reported to quantify the contribution of the context-aware ingestion pipeline versus the retrieval stack (BM25 + Matryoshka + RRF + reranker); this omission makes it impossible to determine whether the reported gains stem from the novel memory-management features or from standard retrieval improvements.

    Authors: We acknowledge the value of ablation studies. The full pipeline was evaluated as a unified system, but to isolate the impact of the context-aware ingestion, we will conduct and report additional experiments comparing the system with and without the ingestion pipeline, as well as breakdowns of the retrieval components. revision: yes

Circularity Check

0 steps flagged

No circularity: engineering system evaluated on external benchmarks


The paper presents a memory architecture for LLM agents using standard components (BM25, Matryoshka embeddings, RRF, cross-encoder reranker) and a context-aware ingestion pipeline. Claims of SOTA performance rest entirely on results from two independent external benchmarks (LoCoMo, LongMemEval) across eight models. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claims do not reduce to inputs by construction; they are empirical outcomes on held-out benchmarks. This is the expected non-finding for a system-description paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard information-retrieval assumptions rather than new axioms or fitted parameters; the Cognis system itself is the primary invented entity.

axioms (2)
  • [domain assumption] BM25 keyword matching and Matryoshka vector similarity can be fused effectively via Reciprocal Rank Fusion
    Invoked in the dual-store backend description; standard in IR literature
  • [domain assumption] Retrieving existing memories before extraction enables consistent version tracking
    Core to the context-aware ingestion pipeline
invented entities (1)
  • Lyzr Cognis architecture [no independent evidence]
    purpose: Provide persistent, context-aware memory for conversational agents
    New integrated system proposed in the paper



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Cognis combines a dual-store backend pairing OpenSearch BM25 keyword matching with Matryoshka vector similarity search, fused via Reciprocal Rank Fusion... context-aware ingestion pipeline retrieves existing memories before extraction, enabling intelligent version tracking

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
