Hierarchical Long-Term Semantic Memory for LinkedIn's Hiring Agent

Emir Poyraz; Karthik Ramgopal; Praveen Kumar Bodigutla; Shangjin Zhang; Xiaofeng Wang; Xiaoyang Gu; Xie Lu; Ye Jin; Yvonne Li; Zhentao Xu

arxiv: 2604.26197 · v3 · pith:PPMBUTRRnew · submitted 2026-04-29 · 💻 cs.IR · cs.LG

Pith reviewed 2026-05-07 13:28 UTC · model grok-4.3

keywords: long-term memory, semantic memory, hierarchical structure, LLM agents, personalization, retrieval, schema alignment


The pith

A schema-aligned hierarchical memory tree lets LLM agents store and retrieve long-term semantic knowledge with over 10% gains in correctness and retrieval quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Hierarchical Long-Term Semantic Memory framework as a way to handle noisy longitudinal user data for LLM agents that need personalized, context-aware responses. It structures this data into a tree where each node follows a predefined schema, allowing knowledge to sit at different levels of detail so that broad patterns and specific facts can both be accessed quickly. The design adds an adaptation step that tunes the tree for new domains without rebuilding from scratch. Evaluations in a hiring-agent setting report more than 10% better answer correctness and retrieval F1 scores, plus a better balance between query time and indexing cost. The system is now running in production for core personalization tasks.

Core claim

The paper claims that representing long-term semantic memory as a schema-aligned tree that holds knowledge at multiple granularities, combined with an adaptation mechanism, solves the joint problems of scalable ingestion, privacy-aware storage, low-latency retrieval, and observable provenance, producing more than 10% higher answer correctness and retrieval F1 while moving the query-versus-indexing latency frontier outward.

What carries the argument

The schema-aligned memory tree that stores semantic knowledge at multiple levels of granularity and incorporates an adaptation mechanism for cross-domain use.

If this is right

Ingestion of noisy longitudinal behavioral data becomes scalable because the tree grows incrementally along schema paths.
Storage can remain privacy-aware since only the structured nodes, not raw logs, need to be kept.
Retrieval latency drops because queries can target the appropriate granularity level instead of scanning everything.
Provenance stays transparent because every retrieved fact traces back to its originating node and schema.
The adaptation mechanism allows the same tree structure to be reused in new domains with limited additional tuning.
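The points above can be made concrete with a minimal sketch, written under stated assumptions rather than taken from the paper. The names `MemoryNode`, `MemoryTree`, `insert`, and `retrieve`, the nested-dict schema, and the sample facts are all hypothetical. The sketch shows three of the properties only: facts are added incrementally along schema paths, a query chooses how deep to read, and every returned fact carries the schema path and source id it came from. It does not model the paper's aggregation or adaptation steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MemoryNode:
    """One node of a hypothetical schema-aligned memory tree."""
    name: str
    # Each fact is stored with the id of the record it was derived from.
    facts: List[Tuple[str, str]] = field(default_factory=list)
    children: Dict[str, "MemoryNode"] = field(default_factory=dict)


class MemoryTree:
    def __init__(self, schema: Dict[str, dict]) -> None:
        # Assumption: the schema is a nested dict of allowed path segments.
        self.schema = schema
        self.root = MemoryNode("root")

    def insert(self, path: List[str], fact: str, source_id: str) -> None:
        """Add a fact at `path`, creating only the nodes along that path."""
        allowed = self.schema
        node = self.root
        for segment in path:
            if segment not in allowed:
                raise KeyError(f"path segment {segment!r} is not in the schema")
            allowed = allowed[segment]
            node = node.children.setdefault(segment, MemoryNode(segment))
        node.facts.append((fact, source_id))

    def retrieve(self, path: List[str], max_depth: int = 0) -> List[Tuple[str, str, str]]:
        """Return (schema path, fact, source id) at `path` and up to `max_depth` levels below it."""
        node = self.root
        for segment in path:
            if segment not in node.children:
                return []
            node = node.children[segment]
        results: List[Tuple[str, str, str]] = []
        stack = [(node, list(path), 0)]
        while stack:
            current, current_path, depth = stack.pop()
            for fact, source_id in current.facts:
                results.append(("/".join(current_path), fact, source_id))
            if depth < max_depth:
                for child in current.children.values():
                    stack.append((child, current_path + [child.name], depth + 1))
        return sorted(results)


# Made-up schema and facts, for illustration only.
schema = {"preferences": {"location": {}, "seniority": {}}}
tree = MemoryTree(schema)
tree.insert(["preferences"], "Favors candidates open to hybrid roles", "session-12")
tree.insert(["preferences", "location"], "Passed on candidates outside the target metro area", "session-07")

coarse = tree.retrieve(["preferences"])                 # broad pattern only
detailed = tree.retrieve(["preferences"], max_depth=1)  # broad pattern plus specifics
```

Reading with `max_depth=0` touches a single node, which is the intuition behind the latency point; the source id on each result is the intuition behind the provenance point.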

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The tree structure might naturally support selective forgetting or data minimization, which could help satisfy stricter privacy rules without extra engineering.
Similar hierarchical organization could be tested in other agent settings that accumulate user history, such as personal scheduling or customer-support assistants.
The latency gains suggest that indexing cost might stay manageable even as the number of users grows, provided the schema remains stable.
If the adaptation step can be made fully automatic, the framework could reduce the need for per-domain engineering effort.

Load-bearing premise

The schema-aligned memory tree and adaptation mechanism will work across many different applications and the reported gains on internal data will hold when baselines and data splits are chosen independently.

What would settle it

Applying the same tree construction and retrieval procedure to an independent, publicly available long-term memory benchmark and measuring no improvement in correctness or in the latency trade-off would falsify the central performance claim.
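A replication of this kind would need to recompute the retrieval metric independently. The text available here does not give the paper's exact F1 definition, so the following is a sketch that assumes the common set-based form: precision and recall over retrieved versus relevant item ids, combined as their harmonic mean. The function name `retrieval_prf` and the example ids are hypothetical.

```python
from typing import Iterable, Tuple


def retrieval_prf(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up ids: 2 of 4 retrieved items are relevant, and 2 of 3 relevant items are found.
precision, recall, f1 = retrieval_prf(["a", "b", "c", "d"], ["a", "b", "e"])
```

Per-query scores like these would then be averaged over the benchmark's queries for each system before comparing them.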

Figures

Figures reproduced from arXiv: 2604.26197 by Emir Poyraz, Karthik Ramgopal, Praveen Kumar Bodigutla, Shangjin Zhang, Xiaofeng Wang, Xiaoyang Gu, Xie Lu, Ye Jin, Yvonne Li, Zhentao Xu.

**Figure 1.** LinkedIn Hiring Assistant with HLTM: a recruiter initiates a hiring project; the hiring assistant queries HLTM in natural language to retrieve preference signals, then uses the returned information to update structured hiring requirements. [Plot labels captured with this caption: Query Latency (s), Answer Correctness, HLTM (ours).]

**Figure 2.** Performance–latency trade-off across evaluated …

**Figure 3.** Overview of Hierarchical Long-Term Semantic Memory ( …

**Figure 4.** Lossless incremental nearline indexing in …

**Figure 5.** Query vs. indexing latency: HLTM advances the Pareto frontier. [Text captured with this caption: the footnote "Results may vary in production environments or with different datasets." and, from Section 4.6, Ablation Study and Analysis: "We conduct an ablation study to quantify the contributions of tree aggregation, adaptation, and each memory representation …"]

**Figure 6.** Hyperparameter analysis shows HLTM has no early peak and quickly plateaus at small 𝑘, indicating robustness to 𝑘 beyond a small threshold. [Diagram labels captured with this caption: User Setup, Supervisor Planner, Supervisor Replanner, Task List.]

**Figure 7.** HLTM’s Production Use Case in Hiring Assistant. [Text captured with this caption, from Section 5, Production Use Case: LinkedIn Hiring Assistant (LiHA) [8] is an AI agent for recruiters, powered by LinkedIn’s dynamic talent network, that helps recruiters discover and engage candidates with greater speed and scale. Architecturally, LiHA is a plan-and-execute system centered on a supervisor agent that interprets recruiter intent and orchestrates specializ…]

Original abstract

Large Language Model (LLM) agents are increasingly used in real-world products, where personalized and context-aware user interactions are essential. A central enabler of such capabilities is the agent's long-term semantic memory system, which extracts implicit and explicit signals from noisy longitudinal behavioral data, stores them in a structured form, and supports low-latency retrieval. Building industrial-grade long-term memory for LLM agents raises five challenges: scalability, low-latency retrieval, privacy constraints, adaptability, and observability. We introduce the Hierarchical Long-Term Semantic Memory (HLTM) framework, which organizes textual data into a schema-aligned memory tree that captures semantic knowledge at multiple levels of granularity, enabling scalable ingestion, privacy-aware storage, low-latency retrieval, and transparent provenance; HLTM further incorporates an adaptation mechanism to generalize across diverse use cases. Extensive evaluations on LinkedIn's Hiring Assistant show that HLTM improves answer correctness and retrieval F1 by more than 10%, while significantly advancing the Pareto frontier between query and indexing latency. HLTM has been fully deployed in LinkedIn's Hiring Assistant to power core personalization features in production hiring workflows.
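The phrase "advancing the Pareto frontier between query and indexing latency" has a precise meaning that a short sketch can illustrate. A system is on the frontier if no other system is at least as fast on both latencies and strictly faster on one. The function name `pareto_frontier`, the system names, and all latency numbers below are made up for illustration; none come from the paper.

```python
from typing import Dict, List, Tuple


def pareto_frontier(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of systems not dominated on (query latency, indexing latency); lower is better."""
    frontier = []
    for name, (query, indexing) in points.items():
        dominated = any(
            other_query <= query
            and other_indexing <= indexing
            and (other_query < query or other_indexing < indexing)
            for other, (other_query, other_indexing) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)


# Hypothetical (query latency in s, indexing latency in s) per system.
systems = {
    "system_a": (2.0, 50.0),
    "system_b": (5.0, 10.0),
    "system_c": (6.0, 60.0),
    "system_d": (3.0, 20.0),
}
frontier = pareto_frontier(systems)
```

Here `system_c` is dominated by `system_a`, so it falls off the frontier. A new method "advances" the frontier when adding it causes at least one previously non-dominated system to become dominated, or when it occupies a region no earlier system reached.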

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HLTM gives a concrete, deployed example of hierarchical memory for production agents, but the >10% gains rest on internal data without disclosed baselines or ablations.

This paper presents Hierarchical Long-Term Semantic Memory, a tree-based system that stores user signals from LinkedIn's Hiring Assistant at multiple granularities and adds an adaptation step to handle new domains. The authors focus on five practical constraints—scalability, latency, privacy, generalizability, and observability—and show how a schema-aligned tree plus provenance tracking addresses them in a live product. That framing and the fact of production deployment are the clearest contributions here; many academic memory papers skip these constraints entirely.

Referee Report

2 major / 0 minor

Summary. The paper introduces the Hierarchical Long-Term Semantic Memory (HLTM) framework for LLM agents, which structures longitudinal behavioral data into a schema-aligned memory tree supporting multi-granularity semantic knowledge. This addresses industrial challenges including scalability, low-latency retrieval, privacy, cross-domain generalizability, and observability, with an adaptation mechanism for diverse use cases. Evaluations on LinkedIn's Hiring Assistant data report >10% gains in answer correctness and retrieval F1, plus Pareto improvements in query/indexing latency; the system is deployed in production for personalization in hiring workflows.

Significance. If the empirical results hold under scrutiny, the work is significant for industrial information retrieval and LLM agent systems. It offers a deployable solution to long-term memory challenges with explicit attention to privacy and latency trade-offs, backed by real-world production use at LinkedIn. This provides a concrete reference point for similar personalization tasks in hiring and recommendation domains.

major comments (2)

[Abstract / Evaluation] Abstract and Evaluation section: The central claims of >10% improvements in answer correctness and retrieval F1 (plus Pareto frontier advance) are stated without any reported details on test-set size, query distribution, baseline definitions (e.g., standard RAG or prior memory systems), statistical tests, ablation results, or data characteristics. This omission is load-bearing because the headline performance gains cannot be verified as robust rather than artifacts of data selection or weak baselines.
[HLTM Framework] Framework description (likely §3): The adaptation mechanism is asserted to enable generalization across use cases, yet no cross-domain, hold-out, or external validation experiments are described to support this claim, leaving the generalizability assertion unsupported by evidence.
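To make the request for statistical tests concrete, here is one common option, sketched under assumptions: a paired bootstrap confidence interval on the per-query difference in correctness between a baseline and a candidate system. This is not the paper's procedure, which is not described in the text available here. The function name `paired_bootstrap_ci`, its parameters, and the per-query scores are hypothetical.

```python
import random
from typing import List, Tuple


def paired_bootstrap_ci(
    baseline: List[float],
    candidate: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed mean difference, CI lower bound, CI upper bound)."""
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    # Pairing: both systems are scored on the same queries, so resample differences.
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    observed = sum(diffs) / n
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return observed, lower, upper


# Made-up per-query correctness (1.0 = correct, 0.0 = incorrect) on the same 10 queries.
baseline_scores = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
candidate_scores = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0]
observed, lower, upper = paired_bootstrap_ci(baseline_scores, candidate_scores)
```

If the interval excludes zero, the gain is unlikely to be a sampling artifact of the chosen queries. It says nothing about whether the baseline itself was a fair one, which is the referee's other concern.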

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback, which highlights important areas for strengthening the presentation of our empirical results and the generalizability discussion. We have revised the manuscript to provide additional context and clarifications while respecting the proprietary constraints of the LinkedIn production data.

Referee: [Abstract / Evaluation] Abstract and Evaluation section: The central claims of >10% improvements in answer correctness and retrieval F1 (plus Pareto frontier advance) are stated without any reported details on test-set size, query distribution, baseline definitions (e.g., standard RAG or prior memory systems), statistical tests, ablation results, or data characteristics. This omission is load-bearing because the headline performance gains cannot be verified as robust rather than artifacts of data selection or weak baselines.

Authors: We agree that greater transparency on the experimental setup is warranted. In the revised manuscript, we have expanded the Evaluation section (and updated the abstract for consistency) to report test-set size, high-level query characteristics, explicit baseline definitions (including standard RAG and prior memory systems), statistical significance testing, and ablation studies isolating the contribution of the hierarchical structure and adaptation mechanism. Due to privacy and proprietary constraints, we report aggregated statistics rather than raw query distributions or individual examples. revision: yes
Referee: [HLTM Framework] Framework description (likely §3): The adaptation mechanism is asserted to enable generalization across use cases, yet no cross-domain, hold-out, or external validation experiments are described to support this claim, leaving the generalizability assertion unsupported by evidence.

Authors: We acknowledge that the generalizability claim would be strengthened by additional empirical validation. The current work evaluates HLTM on LinkedIn's Hiring Assistant, a complex production setting. The adaptation mechanism is presented in Section 3 as a modular, schema-driven component intended to support diverse domains. In the revision, we have added a new subsection in the Discussion that explicitly addresses design choices supporting generalization, outlines how the mechanism can be applied to other use cases, and states the limitations of validating only within the hiring domain. revision: partial

standing simulated objections not resolved

Full raw query distributions and per-user data characteristics remain undisclosed; they cannot be released under LinkedIn's privacy policies and data protection regulations.

Circularity Check

0 steps flagged

No circularity: claims rest on empirical system evaluation without self-referential derivations

The paper describes the HLTM framework as a hierarchical memory tree with schema alignment and an adaptation mechanism for LLM agents in hiring workflows. Central performance claims (>10% gains in correctness and F1, plus Pareto latency improvements) are presented as results of extensive evaluations on LinkedIn's internal Hiring Assistant data and production deployment. No equations, fitted parameters, predictions, or uniqueness theorems appear in the abstract or described structure that could reduce to inputs by construction. No self-citations are invoked as load-bearing justification for core premises. The claims therefore stand as an engineering contribution supported by empirical evaluation rather than by derivations that depend on their own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5530 in / 1131 out tokens · 78680 ms · 2026-05-07T13:28:13.899470+00:00 · methodology
