Holographic Memory for Zero-Shot Compositional Reasoning in Knowledge Graphs: A Mechanistic Study of Where and Why It Fails
Pith reviewed 2026-06-26 01:01 UTC · model grok-4.3
The pith
Holographic memory fails zero-shot composition because facts in chains are intrinsically harder to retrieve even at single hops
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Both HRR and FHRR are competitive single-hop retrievers on FB15k-237, yet both stay at chance-level accuracy on zero-shot compositional queries at every cleanup temperature tested. The bind-unbind algebra and the cleanup are not the cause: the first hop succeeds, and even a verified-correct intermediate does not enable the second hop. Posing the ground-truth second-hop fact as a standalone atomic query already yields only 26-48 percent of average atomic accuracy, uniformly across relation fan-out. The authors conclude that the facts through which compositional chains pass are intrinsically harder for the superposed memory to retrieve, a capacity and interference effect present already at a single hop. Lemma 4.1 shows that FHRR's softmax cleanup is not phase-equivariant, which compounds the primary failure on the minority of chains where hop-1 itself errs.
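The probes in this claim can be written out explicitly. The equations below are a sketch under an assumed encoding, since this review does not give the paper's storage scheme: each triple (h, r, t) is stored as (h ⊛ r) ⊛ t and summed into one memory M, C is the matrix whose columns are the entity vectors, and β is the inverse cleanup temperature. The verified-correct-intermediate probe replaces \hat{t}_1 with the true t_1 in the hop-2 line, and ρ is the standalone-probe ratio the authors report. For FHRR, ⊛ becomes element-wise multiplication of phasors and † becomes complex conjugation.

```latex
% Assumed storage scheme (hypothetical; not specified in this review)
M = \sum_{(h,r,t) \in \mathcal{F}} (\mathbf{h} \circledast \mathbf{r}) \circledast \mathbf{t},
\qquad (\mathbf{a} \circledast \mathbf{b})_j = \sum_{k=0}^{d-1} a_k \, b_{(j-k) \bmod d},
\qquad a^{\dagger}_j = a_{(-j) \bmod d}

% Modern Hopfield cleanup over the entity codebook C
\operatorname{clean}(\mathbf{y}) = C \, \operatorname{softmax}\!\big(\beta \, C^{\top} \mathbf{y}\big)

% Two-hop composition
\hat{\mathbf{t}}_1 = \operatorname{clean}\big((\mathbf{h} \circledast \mathbf{r}_1)^{\dagger} \circledast M\big),
\qquad
\hat{\mathbf{t}}_2 = \operatorname{clean}\big((\hat{\mathbf{t}}_1 \circledast \mathbf{r}_2)^{\dagger} \circledast M\big)

% Standalone second-hop probe
\rho = \frac{\operatorname{Acc}\big(\text{second-hop facts } (t_1, r_2, t_2) \text{ queried atomically}\big)}
            {\operatorname{Acc}\big(\text{all atomic facts}\big)} \in [0.26,\ 0.48]
```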
What carries the argument
Holographic Reduced Representations using circular convolution to bind and unbind symbols into a superposed memory, paired with modern Hopfield cleanup, where retrieval capacity is uneven across facts that appear in potential compositional chains.
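The machinery named above (circular-convolution binding, superposition into one trace, Hopfield cleanup) can be sketched in plain Python. This is a toy illustration under assumed choices, not the authors' code: Gaussian N(0, 1/d) vectors, an assumed (h ⊛ r) ⊛ t storage scheme, three hypothetical facts, and a hand-picked β. The O(d²) convolution is written out directly for readability; practical implementations use an FFT.

```python
import math
import random


def random_vector(d, rng):
    """HRR vector with i.i.d. N(0, 1/d) entries (Plate's standard initialisation)."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]


def bind(a, b):
    """Circular convolution: (a * b)_j = sum_k a_k b_{(j - k) mod d}. O(d^2), fine for a demo."""
    d = len(a)
    return [sum(a[k] * b[(j - k) % d] for k in range(d)) for j in range(d)]


def involution(a):
    """Approximate inverse a_dagger_j = a_{(-j) mod d}; convolving with it unbinds."""
    d = len(a)
    return [a[(-j) % d] for j in range(d)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def hopfield_cleanup(y, codebook, beta):
    """One modern-Hopfield update: softmax(beta * C^T y) weights, plus the weighted sum C @ w."""
    scores = [beta * dot(c, y) for c in codebook]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    cleaned = [sum(w * c[j] for w, c in zip(weights, codebook)) for j in range(len(y))]
    return cleaned, weights


rng = random.Random(0)
d = 512
entities = [random_vector(d, rng) for _ in range(10)]
relations = [random_vector(d, rng) for _ in range(3)]

# Hypothetical toy facts as (head, relation, tail) indices.
facts = [(0, 0, 5), (1, 1, 7), (2, 2, 9)]

# Superpose all facts into one d-dimensional memory trace.
memory = [0.0] * d
for h, r, t in facts:
    trace = bind(bind(entities[h], relations[r]), entities[t])
    memory = [m + x for m, x in zip(memory, trace)]


def query_tail(h, r, beta=20.0):
    """Unbind the (h, r) key from memory, clean up, and return the best entity index."""
    key = bind(entities[h], relations[r])
    noisy = bind(involution(key), memory)
    _, weights = hopfield_cleanup(noisy, entities, beta)
    return max(range(len(weights)), key=weights.__getitem__)


print([query_tail(h, r) for h, r, _ in facts])
```

With only three facts superposed in d = 512 dimensions, each query should recover its true tail. Every extra fact adds a crosstalk term to the unbound vector, so retrieval degrades as the trace fills up, which is the kind of capacity effect the paper points to.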
If this is right
- Single-hop performance does not guarantee zero-shot compositional ability in superposed holographic memories.
- Facts that appear in multi-hop chains experience systematically higher interference than average facts.
- The bind-unbind operations succeed on the first hop, so redesigning the algebra will not fix composition.
- Because FHRR's cleanup lacks phase-equivariance, it compounds the error on chains where the first hop already errs.
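The non-equivariance in Lemma 4.1 can be illustrated numerically. The sketch assumes that FHRR binds by element-wise phasor multiplication, that the cleanup is a softmax over the real part of the normalised inner product, and that "phase-equivariant" means commuting with a global phase rotation y ↦ e^{iθ}y. That last definition is our reading; the lemma's exact statement may differ. Because the softmax weights are real, positive, and depend on Re⟨y, c⟩, rotating the input reweights the codebook instead of rotating the output. This is an illustration, not a proof.

```python
import cmath
import math
import random


def random_phasor(d, rng):
    """FHRR vector: unit-modulus complex entries with i.i.d. uniform phases."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(d)]


def bind(a, b):
    """FHRR binding: element-wise product, i.e. phases add."""
    return [x * y for x, y in zip(a, b)]


def unbind(a, b):
    """Multiply by the conjugate (phases subtract); exact inverse for unit phasors."""
    return [x * y.conjugate() for x, y in zip(a, b)]


def similarity(u, v):
    """Real part of the normalised Hermitian inner product."""
    return sum((x * y.conjugate()).real for x, y in zip(u, v)) / len(u)


def softmax_cleanup(y, codebook, beta):
    """Softmax-weighted sum of codebook vectors (modern-Hopfield-style cleanup)."""
    scores = [beta * similarity(y, c) for c in codebook]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * c[j] for e, c in zip(exps, codebook)) for j in range(len(y))]


rng = random.Random(1)
d, beta = 64, 8.0
values = [random_phasor(d, rng) for _ in range(8)]
keys = [random_phasor(d, rng) for _ in range(2)]

# Two facts in superposition; unbinding key 0 yields value 0 plus crosstalk.
memory = [p + q for p, q in zip(bind(keys[0], values[0]), bind(keys[1], values[1]))]
noisy = unbind(memory, keys[0])

# Equivariance check under a global phase rotation y -> e^{i theta} y.
rot = cmath.exp(1j * math.pi / 3)
lhs = softmax_cleanup([rot * x for x in noisy], values, beta)
rhs = [rot * x for x in softmax_cleanup(noisy, values, beta)]
gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(f"max |clean(e^(i theta) y) - e^(i theta) clean(y)| = {gap:.3f}")
```

Binding itself is exactly invertible here; the mismatch comes entirely from the cleanup step, which matches the review's point that the algebra is not the culprit.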
Where Pith is reading between the lines
- The capacity effect may appear in other vector-symbolic or binding-based reasoning systems that store facts in superposition.
- Rebalancing training data or loss to favor facts that occur in chains could be tested as a way to improve composition.
- Uniform retrieval across all facts may be a necessary precondition for reliable multi-hop reasoning in memory models.
Load-bearing premise
The measured drop in standalone second-hop retrieval accuracy directly explains the compositional failure without confounding from the specific choice of relations, the probe construction, or dataset statistics on FB15k-237.
What would settle it
Measure whether raising embedding dimension or memory size increases standalone second-hop accuracy on chain facts and simultaneously increases zero-shot compositional accuracy; if composition stays at chance despite better single-fact retrieval, the central claim is false.
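The proposed experiment can be prototyped on synthetic data before touching FB15k-237. The sketch assumes random Gaussian HRR vectors, an assumed (h ⊛ r) ⊛ t storage scheme, hypothetical sizes, and nearest-neighbour cleanup (the β → ∞ limit of a softmax cleanup, which has the same argmax for any β > 0). It only reproduces the generic capacity trend: accuracy falls as more facts share one trace and recovers as d grows. It says nothing about whether chain facts are hit harder, which is the claim actually under test.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def ifft(spectrum):
    """Inverse FFT via the conjugation trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def bind(a, b):
    """Circular convolution computed as ifft(fft(a) * fft(b))."""
    return [v.real for v in ifft([p * q for p, q in zip(fft(a), fft(b))])]


def involution(a):
    return [a[(-j) % len(a)] for j in range(len(a))]


def retrieval_accuracy(d, n_facts, n_entities=100, n_relations=8, seed=0):
    """Fraction of stored (h, r, t) facts whose tail is the nearest entity after unbinding."""
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]

    entities = [vec() for _ in range(n_entities)]
    relations = [vec() for _ in range(n_relations)]
    # Distinct (head, relation) pairs so every query has a single correct tail.
    pairs = rng.sample(range(n_entities * n_relations), n_facts)
    facts = [(p // n_relations, p % n_relations, rng.randrange(n_entities)) for p in pairs]
    keys = [bind(entities[h], relations[r]) for h, r, _ in facts]
    memory = [0.0] * d
    for key, (_, _, t) in zip(keys, facts):
        memory = [m + v for m, v in zip(memory, bind(key, entities[t]))]
    correct = 0
    for key, (_, _, t) in zip(keys, facts):
        noisy = bind(involution(key), memory)
        scores = [sum(x * y for x, y in zip(noisy, e)) for e in entities]
        correct += max(range(n_entities), key=scores.__getitem__) == t
    return correct / n_facts


for d in (64, 256):
    for n in (4, 64):
        print(f"d={d:4d}  facts={n:3d}  accuracy={retrieval_accuracy(d, n):.2f}")
```

Running the same sweep separately on chain facts and on all facts, and checking whether the gap between them closes as d grows, would be the synthetic analogue of the proposed test.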
Original abstract
Knowledge graph embedding (KGE) models predict single-hop links well but have no mechanism for zero-shot compositional queries: multi-hop questions whose relation chains never appeared during training. Holographic Reduced Representations (HRR), which bind and unbind symbols via circular convolution, are a theoretically attractive candidate, since binding is approximately invertible and associative. We test whether this promise holds. We study two holographic memory variants, real-valued HRR and phase-only Fourier HRR (FHRR), each with a modern Hopfield cleanup, on FB15k-237 over five seeds. Four findings follow. First, both are competitive single-hop retrievers (filtered MRR 0.358 ± 0.002 for HRR, 0.350 ± 0.021 for FHRR). Second, neither composes zero-shot: accuracy stays at chance across all cleanup temperatures. Third, and this is the main contribution, we localise the failure mechanistically. A hop-1 probe shows the memory recovers the correct intermediate entity with high fidelity (MRR 0.896 ± 0.002 for HRR), yet composition still fails even with a verified-correct intermediate. A second probe shows why: posing the ground-truth second-hop fact as a standalone atomic query, bypassing composition entirely, recovers it at only 0.26 to 0.48x the average atomic accuracy, uniformly across relation fan-out. The bottleneck is not the bind-unbind algebra or the cleanup; it is that the facts through which compositional chains pass are intrinsically harder for the superposed memory to retrieve, a capacity and interference effect present already at a single hop. Fourth, we prove (Lemma 4.1) that FHRR's softmax cleanup is not phase-equivariant, which compounds the primary failure on the minority of chains where hop-1 itself errs. Fixing zero-shot composition requires improving retrieval capacity under superposition, not just redesigning the cleanup.
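The single-hop figures in the abstract are filtered MRR. A sketch of the usual filtered-ranking metric follows, assuming tail prediction only, optimistic tie-breaking, and a `known` set holding every true triple from all splits. The paper's exact protocol (head and tail queries, tie handling) is not stated here, and the scores are made-up toy values.

```python
def filtered_rank(scores, h, r, t, known):
    """1-based rank of true tail t, skipping candidates t' that form other known-true triples (h, r, t').

    Candidates tied with the true tail are not counted (optimistic tie-breaking).
    """
    target = scores[t]
    better = sum(
        1
        for cand, s in enumerate(scores)
        if cand != t and s > target and (h, r, cand) not in known
    )
    return 1 + better


def filtered_mrr(test_triples, score_fn, known):
    """Mean reciprocal filtered rank over tail-prediction queries."""
    ranks = [filtered_rank(score_fn(h, r), h, r, t, known) for h, r, t in test_triples]
    return sum(1.0 / k for k in ranks) / len(ranks)


# Toy example with 4 entities and hypothetical model scores.
toy_scores = {(0, 0): [0.1, 0.5, 0.9, 0.3], (1, 0): [0.8, 0.2, 0.1, 0.4]}
known = {(0, 0, 1), (0, 0, 2), (1, 0, 3)}  # every true triple (train + valid + test)
queries = [(0, 0, 1), (1, 0, 3)]
print(filtered_mrr(queries, lambda h, r: toy_scores[(h, r)], known))  # 0.75
```

In the first query, entity 2 outscores the true tail but is itself a known answer, so it is filtered out and the true tail ranks first; the second query ranks second, giving (1 + 1/2) / 2.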
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that HRR and FHRR with Hopfield cleanup achieve competitive single-hop MRR on FB15k-237 (0.358 and 0.350) but fail at zero-shot compositional queries (chance accuracy). Two probes localize the failure: hop-1 recovers intermediates well (MRR 0.896), yet even verified intermediates do not enable composition; a standalone second-hop probe recovers ground-truth facts at only 0.26-0.48x atomic accuracy uniformly across fan-out. The bottleneck is thus intrinsic single-hop retrieval difficulty under superposition (capacity/interference), not bind-unbind or cleanup. Lemma 4.1 shows FHRR softmax cleanup is not phase-equivariant, compounding errors on some chains.
Significance. If the probe-based localization holds, the result is significant for redirecting work on holographic KGE from algebraic fixes to capacity under superposition. Strengths include multi-seed reporting with standard deviations, explicit probe construction for mechanistic diagnosis, and the formal lemma on non-equivariance.
Major comments (2)
- [Abstract] Abstract: The claim that compositional failure reduces to intrinsic hardness of second-hop facts (0.26-0.48x atomic accuracy in the standalone probe) is load-bearing for the central mechanistic conclusion. The manuscript reports uniformity across relation fan-out but provides no comparison of the selected compositional-chain facts against the full FB15k-237 distribution on entity degree, relation frequency, or other statistics; without this, the measured drop could reflect selection bias rather than a general capacity limit.
- [Abstract] Abstract: Reported MRR values include standard deviations across five seeds, but the absence of hyperparameter details, exact probe vector construction/normalization, and data exclusion rules prevents verification that post-hoc choices do not affect the 0.26-0.48x ratio or the hop-1 vs. composition contrast.
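The comparison requested in the first major comment can be computed directly from the triple lists. A minimal sketch, assuming triples are (head, relation, tail) tuples and that degree and relation frequency are measured on the full graph; the function names and toy data are hypothetical. A fuller check would compare whole distributions (for example with a two-sample test), not only means.

```python
from collections import Counter
from statistics import mean


def graph_counts(triples):
    """Entity degree (head + tail occurrences) and relation frequency over a triple list."""
    degree, rel_freq = Counter(), Counter()
    for h, r, t in triples:
        degree[h] += 1
        degree[t] += 1
        rel_freq[r] += 1
    return degree, rel_freq


def compare_subset(all_triples, subset):
    """Summary statistics (measured on the full graph) for all facts vs. a subset of them."""
    degree, rel_freq = graph_counts(all_triples)

    def summarize(triples):
        return {
            "mean_head_degree": mean(degree[h] for h, _, _ in triples),
            "mean_tail_degree": mean(degree[t] for _, _, t in triples),
            "mean_relation_freq": mean(rel_freq[r] for _, r, _ in triples),
        }

    return summarize(all_triples), summarize(subset)


# Hypothetical toy graph; chain_facts stands in for the second-hop facts of the probe.
all_triples = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r1", "c"), ("d", "r3", "a")]
chain_facts = [("b", "r2", "c")]
full, chains = compare_subset(all_triples, chain_facts)
print(full)
print(chains)
```

If the chain facts sit on systematically rarer relations or lower-degree entities than the full graph, the 0.26-0.48x drop could partly reflect that selection rather than a general capacity limit.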
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate the suggested additions in the revised manuscript.
- Referee: [Abstract] Abstract: The claim that compositional failure reduces to intrinsic hardness of second-hop facts (0.26-0.48x atomic accuracy in the standalone probe) is load-bearing for the central mechanistic conclusion. The manuscript reports uniformity across relation fan-out but provides no comparison of the selected compositional-chain facts against the full FB15k-237 distribution on entity degree, relation frequency, or other statistics; without this, the measured drop could reflect selection bias rather than a general capacity limit.
Authors: We agree that explicitly ruling out selection bias strengthens the central claim. In the revision we will add a new table (and accompanying text) that reports the distributions of entity in-degree/out-degree, relation frequency, and triple count for the facts appearing in the zero-shot compositional chains versus the full FB15k-237 training set. This comparison will be performed on the exact same set of chains used for the 0.26–0.48× probe results. revision: yes
- Referee: [Abstract] Abstract: Reported MRR values include standard deviations across five seeds, but the absence of hyperparameter details, exact probe vector construction/normalization, and data exclusion rules prevents verification that post-hoc choices do not affect the 0.26-0.48x ratio or the hop-1 vs. composition contrast.
Authors: We acknowledge that the current manuscript does not contain the full set of implementation details required for independent verification. In the revised version we will expand Section 3 (Experimental Setup) with (i) the complete hyperparameter table for both HRR and FHRR training, (ii) the precise vector-construction and normalization steps used for each probe, and (iii) the exact data-exclusion criteria applied when forming the compositional-chain test set. We will also release the full experimental code upon acceptance. revision: yes
Circularity Check
No circularity: all central claims are direct empirical measurements on FB15k-237 with no fitted inputs renamed as predictions or self-referential derivations
full rationale
The paper's load-bearing results (single-hop MRR, zero-shot composition failure, hop-1 probe fidelity, and standalone second-hop accuracy ratios) are reported as measured quantities on a fixed benchmark across seeds. No equation or lemma reduces a target quantity to a parameter fitted from that same quantity. Lemma 4.1 is presented as a mathematical proof of non-equivariance and does not rely on the empirical claims. No self-citations are invoked as uniqueness theorems or to justify ansatzes. The derivation chain is therefore self-contained against external benchmarks.