Holographic Memory for Zero-Shot Compositional Reasoning in Knowledge Graphs: A Mechanistic Study of Where and Why It Fails

Randhir Kumar

arxiv: 2606.24948 · v1 · pith:QGL5BR4Cnew · submitted 2026-06-23 · 💻 cs.LG · cs.AI



Pith reviewed 2026-06-26 01:01 UTC · model grok-4.3


keywords: knowledge graph embedding · holographic reduced representations · zero-shot compositional reasoning · memory interference · compositional queries · Hopfield cleanup · knowledge graphs


The pith

Holographic memory fails zero-shot composition because facts in chains are intrinsically harder to retrieve even at single hops

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests holographic reduced representations, which bind symbols by circular convolution, on knowledge graphs to see whether they can answer zero-shot multi-hop queries whose relation chains never appear in training. Single-hop retrieval is competitive, and a first-hop probe recovers the correct intermediate entity with high fidelity. Yet composition stays at chance level. Standalone probes on the ground-truth second-hop facts, posed without any composition, recover them at only 0.26 to 0.48 times average atomic accuracy, showing that the bottleneck is already present in single-fact retrieval under superposition. A secondary result proves that the phase-only variant's cleanup is not phase-equivariant, which adds error on some chains.

Core claim

Both HRR and FHRR are competitive single-hop retrievers on FB15k-237, yet their accuracy on zero-shot compositional queries stays at chance (effectively zero) across all cleanup temperatures. The bind-unbind algebra and the cleanup are not the cause: the first hop succeeds, and even verified-correct intermediates do not enable the second hop. Posing the ground-truth second-hop fact as a standalone atomic query already yields only 26-48 percent of average atomic accuracy, uniformly across relation fan-out. The author concludes that the facts compositional chains pass through are intrinsically harder for the superposed memory to retrieve, a capacity and interference effect already present at a single hop. Lemma 4.1 shows that FHRR's softmax cleanup is not phase-equivariant, which compounds the primary failure on the minority of chains where hop 1 itself errs.
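The 0.26 to 0.48 times figure is a ratio of two top-1 accuracies. A minimal sketch of how such a ratio might be computed per fan-out bucket follows. It assumes per-query hit indicators (1 when the top-ranked entity is correct) and a fan-out value per query are already available. The helper name, the bucketing with `np.digitize`, and the choice to compare each bucket against atomic queries from the same bucket are illustrative assumptions, not the paper's stated protocol.

```python
import numpy as np

def ratio_by_fanout(chain_hits, chain_fanout, atomic_hits, atomic_fanout, edges):
    """Hypothetical helper: per fan-out bucket, accuracy of standalone
    second-hop probes divided by accuracy of ordinary atomic queries."""
    chain_hits = np.asarray(chain_hits, dtype=float)
    atomic_hits = np.asarray(atomic_hits, dtype=float)
    chain_bucket = np.digitize(chain_fanout, edges)    # bucket index per chain fact
    atomic_bucket = np.digitize(atomic_fanout, edges)  # bucket index per atomic query
    ratios = {}
    for b in np.unique(chain_bucket):
        mask = atomic_bucket == b
        if not mask.any():
            continue  # no atomic baseline in this bucket
        atomic_acc = atomic_hits[mask].mean()
        if atomic_acc > 0:  # avoid dividing by zero when every atomic query misses
            ratios[int(b)] = float(chain_hits[chain_bucket == b].mean() / atomic_acc)
    return ratios

# Toy indicators, not data from the paper: fan-out below 5 -> bucket 0, otherwise bucket 1.
print(ratio_by_fanout([1, 0, 1, 0], [1, 1, 10, 10], [1, 1, 1, 0], [1, 1, 10, 10], edges=[5]))
# {0: 0.5, 1: 1.0}
```

A ratio near 1 in every bucket would mean chain facts are no harder than typical facts; the paper reports values well below 1 regardless of fan-out.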

What carries the argument

Holographic Reduced Representations using circular convolution to bind and unbind symbols into a superposed memory, paired with modern Hopfield cleanup, where retrieval capacity is uneven across facts that appear in potential compositional chains.
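A minimal NumPy sketch of this machinery follows. It makes three assumptions: random Gaussian vectors stand in for learned embeddings, a plain relation vector stands in for the paper's ρ_r, and the cleanup is a single softmax step in the style of modern Hopfield networks. All function names and sizes are illustrative. Binding is circular convolution computed with the FFT. Unbinding convolves with Plate's approximate inverse (the involution), so a query returns the stored tail plus cross-talk from every other superposed fact.

```python
import numpy as np

def bind(a, b):
    """Circular convolution a ⊛ b, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=a.shape[-1])

def involution(a):
    """Plate's approximate inverse a*, with a*[j] = a[-j mod D]."""
    return np.roll(a[..., ::-1], 1, axis=-1)

def unbind(c, a):
    """Approximate unbinding c ⊛ a*, i.e. circular correlation with a."""
    return bind(c, involution(a))

def hopfield_cleanup(x, codebook, beta):
    """One softmax attention step over the codebook (modern-Hopfield style)."""
    scores = beta * (codebook @ x)
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return p @ codebook, p

rng = np.random.default_rng(0)
D, n_ent, n_rel, n_facts = 1024, 50, 5, 20
E = rng.normal(0.0, 1.0 / np.sqrt(D), (n_ent, D))  # random stand-ins for entity embeddings
R = rng.normal(0.0, 1.0 / np.sqrt(D), (n_rel, D))  # random stand-ins for relation vectors
heads = rng.choice(n_ent, n_facts, replace=False)  # distinct heads, so each (h, r) has one tail
rels = rng.integers(0, n_rel, n_facts)
tails = rng.integers(0, n_ent, n_facts)

# Superpose every fact into one memory vector: M = Σ e_h ⊛ r ⊛ e_t
M = sum(bind(bind(E[h], R[r]), E[t]) for h, r, t in zip(heads, rels, tails))

def atomic_query(h, r, beta=20.0):
    noisy = unbind(M, bind(E[h], R[r]))  # ≈ e_t plus cross-talk from the other facts
    _, p = hopfield_cleanup(noisy, E, beta)
    return int(np.argmax(p))

acc = np.mean([atomic_query(h, r) == t for h, r, t in zip(heads, rels, tails)])
print(f"atomic accuracy: {acc:.2f}")
```

With only 20 facts in 1024 dimensions the cross-talk is small and retrieval should be close to perfect. The interference the paper studies appears as more facts share the same memory vector.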

If this is right

Single-hop performance does not guarantee zero-shot compositional ability in superposed holographic memories.
Facts that appear in multi-hop chains experience systematically higher interference than average facts.
The bind-unbind operations succeed on the first hop, so redesigning the algebra will not fix composition.
Because FHRR's softmax cleanup is not phase-equivariant, it compounds the error on chains where the first hop already errs.
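The non-equivariance can be illustrated numerically. The sketch below assumes unit-modulus phasor codebooks, a cleanup that applies a softmax to the real parts of Hermitian similarities, and equivariance taken with respect to binding by a phasor. These are illustrative assumptions, because the exact cleanup and the statement of Lemma 4.1 are not reproduced here. Binding a clean query with a constant π/2 phasor sends the target's real similarity to zero. As a result, "bind then clean" returns a diffuse mixture of codebook entries, while "clean then bind" returns the rotated target.

```python
import numpy as np

rng = np.random.default_rng(1)
D, n_ent, beta = 512, 40, 50.0
E = np.exp(1j * rng.uniform(-np.pi, np.pi, (n_ent, D)))  # random unit-modulus phasor codebook

def bind(a, b):
    return a * b  # FHRR binding: elementwise product, so phases add

def unbind(c, a):
    return c * np.conj(a)  # FHRR unbinding: multiply by the conjugate, so phases subtract

def softmax_cleanup(x, codebook, beta):
    # Assumed cleanup: softmax over Re<e_j, x>/D, then a probability-weighted sum of codebook rows
    scores = beta * np.real(codebook.conj() @ x) / x.shape[-1]
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return p @ codebook

def phasor_cos(a, b):
    # Real part of the normalised Hermitian inner product
    return float(np.real(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))

x = E[0]                                # a query that already equals a stored entity
u = np.full(D, np.exp(1j * np.pi / 2))  # constant phasor: binding with it rotates every phase by π/2
clean_then_bind = bind(softmax_cleanup(x, E, beta), u)
bind_then_clean = softmax_cleanup(bind(x, u), E, beta)

print(np.allclose(unbind(bind(E[0], E[1]), E[1]), E[0]))  # True: conjugate unbinding is exact here
print(phasor_cos(clean_then_bind, bind(x, u)))       # close to 1: cleaning first keeps the target
print(phasor_cos(bind_then_clean, clean_then_bind))  # far below 1: the two orders disagree
```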

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The capacity effect may appear in other vector-symbolic or binding-based reasoning systems that store facts in superposition.
Rebalancing training data or loss to favor facts that occur in chains could be tested as a way to improve composition.
Uniform retrieval across all facts may be a necessary precondition for reliable multi-hop reasoning in memory models.

Load-bearing premise

The measured drop in standalone second-hop retrieval accuracy directly explains the compositional failure without confounding from the specific choice of relations, the probe construction, or dataset statistics on FB15k-237.

What would settle it

Measure whether raising embedding dimension or memory size increases standalone second-hop accuracy on chain facts and simultaneously increases zero-shot compositional accuracy; if composition stays at chance despite better single-fact retrieval, the central claim is false.
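A toy version of the first half of this test, standalone retrieval as the dimension grows, can be sketched as follows. The sketch uses random Gaussian vectors rather than trained embeddings and one superposed memory of 100 facts. It therefore shows only the generic HRR capacity trend: cross-talk on each similarity scales roughly as the square root of (number of stored facts / D). It says nothing about whether composition would follow, which is the part that would actually settle the claim. The function name and all sizes are hypothetical.

```python
import numpy as np

def retrieval_accuracy(D, n_facts=100, n_rel=10, seed=0):
    """Top-1 accuracy of atomic (h, r, ?) queries against one superposed HRR memory,
    using random Gaussian vectors instead of learned embeddings (an idealisation)."""
    rng = np.random.default_rng(seed)
    E = rng.normal(0.0, 1.0 / np.sqrt(D), (n_facts, D))  # as many entities as facts, for simplicity
    R = rng.normal(0.0, 1.0 / np.sqrt(D), (n_rel, D))
    h = rng.permutation(n_facts)           # distinct heads, so every (h, r) query has one answer
    r = rng.integers(0, n_rel, n_facts)
    t = rng.integers(0, n_facts, n_facts)
    FE, FR = np.fft.rfft(E), np.fft.rfft(R)
    Fkey = FE[h] * FR[r]                   # spectra of the keys e_h ⊛ r
    FM = (Fkey * FE[t]).sum(axis=0)        # spectrum of M = Σ e_h ⊛ r ⊛ e_t
    # Unbinding: for real vectors, the involution's spectrum is the complex conjugate
    noisy = np.fft.irfft(FM[None, :] * np.conj(Fkey), n=D)
    pred = np.argmax(noisy @ E.T, axis=1)  # same argmax as any softmax cleanup with beta > 0
    return float(np.mean(pred == t))

for D in (128, 1024, 8192):
    print(D, retrieval_accuracy(D))
```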

Figures

Figures reproduced from arXiv: 2606.24948 by Randhir Kumar.

**Figure 1.** Complex-phasor binding and unbinding in FHRR. Left: binding adds phases. Right: unbinding subtracts them via conjugate multiplication, recovering the original entity up to cross-talk noise from other superposed facts. Memory construction. Both models superpose training triples into a single memory vector, $M = \sum_{(h,r,t) \in \mathcal{T}_{\mathrm{tr}}} e_h \circledast \rho_r \circledast e_t$ (6), where $e_x \in \mathbb{R}^D$ (HRR) or $e_x \in \mathbb{C}^D$ (FHRR) are learned entity embedding…

**Figure 2.** Holographic memory architecture. (a) An atomic query binds the head with the relation, unbinds from the superposed memory, and projects via the Hopfield cleanup. (b) A two-hop query repeats bind-unbind-cleanup. Hop 1 (left cleanup) succeeds with high fidelity; the dashed red box marks where failure actually occurs: in retrieving the second-hop fact from the superposed memory, not in the correctness of m̂ o…

**Figure 3.** Cleanup temperature sweep (real HRR, representative seed). Zero-shot accuracy stays near chance across all β; no temperature elicits composition. Section 6.3, Hop-1 Retrieval Succeeds: Localising the Failure to Hop 2. The most obvious explanation for compositional failure is that the model never recovers a usable intermediate entity at hop 1, and that two-hop failure is just hop-1 failure propagated forward. The hop…

**Figure 4.** FHRR phase coherence per seed: phasor cosine of the cleaned two-hop representation to the true target (green) vs. a random entity (red). Both hover near zero.

**Figure 5.** Phase error propagation. Mean absolute phase error sits at the uninformative π/2 baseline at both hops; circular variance approaches 1. Renormalisation and hard cleanup. Explicitly re-normalising intermediate and final representations to unit modulus gives zero-shot accuracy $< 1 \times 10^{-5}$ (zero correct in 69,855 pairs). Replacing soft cleanup with hard argmax at both hops likewise gives zero correct answers.…
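The quantities in Figures 4 and 5 can be computed from per-component phase errors. The sketch below uses standard circular statistics together with an assumed definition of phasor cosine: the mean cosine of the phase error, which equals the normalised real Hermitian inner product for unit-modulus vectors. The paper's exact definitions may differ.

```python
import numpy as np

def phase_diff(pred, target):
    # Wrapped phase difference in (-π, π], computed as the angle of pred · conj(target)
    return np.angle(pred * np.conj(target))

def mean_abs_phase_error(pred, target):
    return float(np.mean(np.abs(phase_diff(pred, target))))

def circular_variance(pred, target):
    # 1 - |mean resultant vector| of the phase errors: 0 = perfectly coherent, 1 = uniform
    return float(1.0 - np.abs(np.mean(np.exp(1j * phase_diff(pred, target)))))

def phasor_cosine(pred, target):
    # Assumed definition: mean cosine of the per-component phase error
    return float(np.mean(np.cos(phase_diff(pred, target))))

rng = np.random.default_rng(0)
D = 100_000
target = np.exp(1j * rng.uniform(-np.pi, np.pi, D))
unrelated = np.exp(1j * rng.uniform(-np.pi, np.pi, D))
for name, f in [("mean |phase error|", mean_abs_phase_error),
                ("circular variance", circular_variance),
                ("phasor cosine", phasor_cosine)]:
    print(name, round(f(target, target), 3), round(f(unrelated, target), 3))
```

For identical vectors the three metrics give 0, 0 and 1. For independent, uniformly random phases they tend to π/2 ≈ 1.571, 1 and 0, which is the uninformative baseline described in the Figure 5 caption.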

Original abstract

Knowledge graph embedding (KGE) models predict single-hop links well but have no mechanism for zero-shot compositional queries: multi-hop questions whose relation chains never appeared during training. Holographic Reduced Representations (HRR), which bind and unbind symbols via circular convolution, are a theoretically attractive candidate, since binding is approximately invertible and associative. We test whether this promise holds. We study two holographic memory variants, real-valued HRR and phase-only Fourier HRR (FHRR), each with a modern Hopfield cleanup, on FB15k-237 over five seeds. Four findings follow. First, both are competitive single-hop retrievers (filtered MRR 0.358 ± 0.002 for HRR, 0.350 ± 0.021 for FHRR). Second, neither composes zero-shot: accuracy stays at chance across all cleanup temperatures. Third, and this is the main contribution, we localise the failure mechanistically. A hop-1 probe shows the memory recovers the correct intermediate entity with high fidelity (MRR 0.896 ± 0.002 for HRR), yet composition still fails even with a verified-correct intermediate. A second probe shows why: posing the ground-truth second-hop fact as a standalone atomic query, bypassing composition entirely, already recovers it at only 0.26 to 0.48× average atomic accuracy, uniformly across relation fan-out. The bottleneck is not the bind-unbind algebra or the cleanup; it is that the facts compositional chains pass through are intrinsically harder for the superposed memory to retrieve, a capacity and interference effect present already at a single hop. Fourth, we prove (Lemma 4.1) that FHRR's softmax cleanup is not phase-equivariant, compounding the primary failure on the minority of chains where hop-1 itself errs. Fixing zero-shot composition requires improving retrieval capacity under superposition, not just redesigning the cleanup.
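Filtered MRR is the standard link-prediction metric behind the single-hop numbers quoted above. A minimal sketch for tail prediction follows. It assumes a dense score vector per query and a dictionary of all known true tails. The function name is hypothetical, and the optimistic tie-breaking rule is a choice the paper may not share.

```python
import numpy as np

def filtered_mrr(scores, test_triples, known_tails):
    """Filtered mean reciprocal rank for tail prediction.

    scores[i]        -- scores over all entities for the query (h_i, r_i, ?)
    test_triples[i]  -- the (h, r, t) triple being evaluated
    known_tails      -- dict mapping (h, r) to every true tail in train/valid/test
    """
    reciprocal_ranks = []
    for s, (h, r, t) in zip(scores, test_triples):
        s = np.array(s, dtype=float)  # copy so the caller's scores are not modified
        other_true = np.array([e for e in known_tails.get((h, r), ()) if e != t], dtype=int)
        s[other_true] = -np.inf       # "filtered": other correct answers cannot outrank t
        rank = 1 + int(np.sum(s > s[t]))  # ties resolved optimistically (an assumption)
        reciprocal_ranks.append(1.0 / rank)
    return float(np.mean(reciprocal_ranks))

scores = [[0.1, 0.9, 0.5]]
print(filtered_mrr(scores, [(0, 0, 2)], {(0, 0): {1, 2}}))  # 1.0: entity 1 is another true tail
print(filtered_mrr(scores, [(0, 0, 2)], {(0, 0): {2}}))     # 0.5: entity 1 now counts as a miss
```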

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main contribution is localizing HRR compositional failure to intrinsically harder single-hop retrieval on the second facts in chains, backed by two probes and a lemma, though the selection of those chains leaves room for bias questions.


The punchline is that HRR and FHRR do fine on single-hop KG links but fail zero-shot composition because the second facts along the chains are already harder to retrieve even when posed as standalone atomic queries. The two probes make this concrete: hop-1 recovery works well, yet the ground-truth second hop only hits 0.26-0.48x the usual atomic accuracy.

What the paper does well is the mechanistic breakdown. Reporting filtered MRR with standard deviations across five seeds on FB15k-237 gives some reproducibility. The hop-1 probe plus the standalone second-hop measurement goes beyond just saying composition fails. The lemma that FHRR softmax cleanup is not phase-equivariant is a clean formal point that accounts for the extra error on the minority of chains where hop-1 itself errs. These pieces together shift the diagnosis away from the bind-unbind algebra toward capacity and interference under superposition.

The soft spot is the possibility that the chosen compositional chains are not representative. The abstract notes uniformity across fan-out but does not compare the second-hop facts on frequency, entity degree, or relation rarity against the full training distribution. If those facts systematically differ, the measured drop could partly reflect dataset selection rather than a general limit on superposed memory. The probe construction details would also need checking to rule out subtle differences from the main single-hop evaluation. Without those controls the central claim is suggestive but not fully locked down.

This work is aimed at people building or analyzing memory-augmented models for multi-hop KG reasoning. It has enough new localization and a formal lemma to deserve a serious referee, even if the bias concern requires extra experiments in revision.

Referee Report

2 major / 0 minor

Summary. The paper claims that HRR and FHRR with Hopfield cleanup achieve competitive single-hop MRR on FB15k-237 (0.358 and 0.350) but fail at zero-shot compositional queries (chance accuracy). Two probes localize the failure: hop-1 recovers intermediates well (MRR 0.896), yet even verified intermediates do not enable composition; a standalone second-hop probe recovers ground-truth facts at only 0.26-0.48x atomic accuracy uniformly across fan-out. The bottleneck is thus intrinsic single-hop retrieval difficulty under superposition (capacity/interference), not bind-unbind or cleanup. Lemma 4.1 shows FHRR softmax cleanup is not phase-equivariant, compounding errors on some chains.

Significance. If the probe-based localization holds, the result is significant for redirecting work on holographic KGE from algebraic fixes to capacity under superposition. Strengths include multi-seed reporting with standard deviations, explicit probe construction for mechanistic diagnosis, and the formal lemma on non-equivariance.

major comments (2)

[Abstract] The claim that compositional failure reduces to intrinsic hardness of second-hop facts (0.26-0.48x atomic accuracy in the standalone probe) is load-bearing for the central mechanistic conclusion. The manuscript reports uniformity across relation fan-out but provides no comparison of the selected compositional-chain facts against the full FB15k-237 distribution on entity degree, relation frequency, or other statistics; without this, the measured drop could reflect selection bias rather than a general capacity limit.
[Abstract] Reported MRR values include standard deviations across five seeds, but the absence of hyperparameter details, exact probe vector construction/normalization, and data exclusion rules prevents verification that post-hoc choices do not affect the 0.26-0.48x ratio or the hop-1 vs. composition contrast.
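The control requested in the first comment can be sketched as a comparison of summary statistics between the facts that chains pass through and the training set as a whole. The helper below is hypothetical. It assumes triples are (head, relation, tail) tuples, measures entity degree in the training graph, and reports medians only. A full control would compare whole distributions, for example with quantiles or a two-sample test.

```python
from collections import Counter

import numpy as np

def fact_statistics(facts, train):
    """Median head degree, tail degree and relation frequency of `facts`,
    with degrees and frequencies counted over the training triples."""
    degree = Counter()
    for h, _, t in train:
        degree[h] += 1
        degree[t] += 1
    rel_freq = Counter(r for _, r, _ in train)
    return {
        "head_degree": float(np.median([degree[h] for h, _, _ in facts])),
        "tail_degree": float(np.median([degree[t] for _, _, t in facts])),
        "relation_frequency": float(np.median([rel_freq[r] for _, r, _ in facts])),
    }

# Toy graph, not FB15k-237.
train = [(0, "a", 1), (0, "a", 2), (0, "b", 3), (1, "b", 2)]
chain_facts = [(1, "b", 2)]  # stands in for the second-hop facts used by the standalone probe
print(fact_statistics(chain_facts, train))
# {'head_degree': 2.0, 'tail_degree': 2.0, 'relation_frequency': 2.0}
print(fact_statistics(train, train))
# {'head_degree': 3.0, 'tail_degree': 2.0, 'relation_frequency': 2.0}
```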

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate the suggested additions in the revised manuscript.


Referee: [Abstract] The claim that compositional failure reduces to intrinsic hardness of second-hop facts (0.26-0.48x atomic accuracy in the standalone probe) is load-bearing for the central mechanistic conclusion. The manuscript reports uniformity across relation fan-out but provides no comparison of the selected compositional-chain facts against the full FB15k-237 distribution on entity degree, relation frequency, or other statistics; without this, the measured drop could reflect selection bias rather than a general capacity limit.

Authors: We agree that explicitly ruling out selection bias strengthens the central claim. In the revision we will add a new table (and accompanying text) that reports the distributions of entity in-degree/out-degree, relation frequency, and triple count for the facts appearing in the zero-shot compositional chains versus the full FB15k-237 training set. This comparison will be performed on the exact same set of chains used for the 0.26–0.48× probe results. revision: yes
Referee: [Abstract] Reported MRR values include standard deviations across five seeds, but the absence of hyperparameter details, exact probe vector construction/normalization, and data exclusion rules prevents verification that post-hoc choices do not affect the 0.26-0.48x ratio or the hop-1 vs. composition contrast.

Authors: We acknowledge that the current manuscript does not contain the full set of implementation details required for independent verification. In the revised version we will expand Section 3 (Experimental Setup) with (i) the complete hyperparameter table for both HRR and FHRR training, (ii) the precise vector-construction and normalization steps used for each probe, and (iii) the exact data-exclusion criteria applied when forming the compositional-chain test set. We will also release the full experimental code upon acceptance. revision: yes

Circularity Check

0 steps flagged

No circularity: all central claims are direct empirical measurements on FB15k-237, with no fitted inputs renamed as predictions and no self-referential derivations.


The paper's load-bearing results (single-hop MRR, zero-shot composition failure, hop-1 probe fidelity, and standalone second-hop accuracy ratios) are reported as measured quantities on a fixed benchmark across seeds. No equation or lemma reduces a target quantity to a parameter fitted from that same quantity. Lemma 4.1 is presented as a mathematical proof of non-equivariance and does not rely on the empirical claims. No self-citations are invoked as uniqueness theorems or to justify ansatzes. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is purely empirical and introduces no new free parameters, axioms, or invented entities beyond standard properties of HRR drawn from prior literature.

pith-pipeline@v0.9.1-grok · 5892 in / 1164 out tokens · 52133 ms · 2026-06-26T01:01:04.245374+00:00 · methodology


Reference graph

Works this paper leans on

19 extracted references

[1] E. Arakelyan, D. Daza, P. Minervini, and M. Cochez. Complex query answering with neural link predictors. In ICLR, 2021.

[2] A. Bordes, N. Usunier, A. Garcia-Durán, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In NeurIPS, 2013.

[3] T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel. Convolutional 2D knowledge graph embeddings. In AAAI, 2018.

[4] E. P. Frady, S. J. Kent, B. A. Olshausen, and F. T. Sommer. Resonator networks, 1: An efficient solution for factoring high-dimensional, distributed representations of data structures. Neural Computation, 32(12):2311–2331, 2020.

[5] K. Guu, J. Miller, and P. Liang. Traversing knowledge graphs in vector space. In EMNLP, 2015.

[6] P. Kanerva. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1(2):139–159, 2009.

[7] T. A. Plate. Holographic Reduced Representation: Distributed Representation for Cognitive Structures. CSLI Publications, 2003.

[8] M. Qu, J. Chen, L.-P. Xhonneux, Y. Bengio, and J. Tang. RNNLogic: Learning logic rules for reasoning on knowledge graphs. In ICLR, 2021.

[9] H. Ramsauer, B. Schäfl, J. Lehner, P. Seidl, M. Widrich, et al. Hopfield networks is all you need. In ICLR, 2021.

[10] H. Ren and J. Leskovec. Beta embeddings for multi-hop logical reasoning in knowledge graphs. In NeurIPS, 2020.

[11] H. Ren, W. Hu, and J. Leskovec. Query2box: Reasoning over knowledge graphs in vector space using box embeddings. In ICLR, 2020.

[12] K. Schlegel, P. Neubert, and P. Protzel. A comparison of vector symbolic architectures. Artificial Intelligence Review, 55:4523–4555, 2022.

[13] P. Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1–2):159–216, 1990.

[14] Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. In ICLR, 2019.

[15] K. Toutanova, D. Chen, P. Pantel, H. Poon, P. Choudhury, and M. Gamon. Representing text for joint embedding of text and knowledge bases. In EMNLP, 2015.

[16] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard. Complex embeddings for simple link prediction. In ICML, 2016.

[17] W. Xiong, T. Hoang, and W. Y. Wang. DeepPath: A reinforcement learning method for knowledge graph reasoning. In EMNLP, 2017.

[18] B. Yang, W. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. In ICLR, 2015.

[19] F. Yang, Z. Yang, and W. W. Cohen. Differentiable learning of logical rules for knowledge base reasoning. In NeurIPS, 2017.
