arxiv: 2604.11274 · v1 · submitted 2026-04-13 · 💻 cs.LG · cs.IR

Recognition: unknown

Mycelium-Index: A Streaming Approximate Nearest Neighbor Index with Myelial Edge Decay, Traffic-Driven Reinforcement, and Adaptive Living Hierarchy

Anton Pakhunov

Authors on Pith no claims yet

Pith reviewed 2026-05-10 15:50 UTC · model grok-4.3

classification 💻 cs.LG cs.IR

keywords approximate nearest neighborstreaming indexdynamic graphhigh-dimensional vectorsmycelium-inspiredrecall optimizationmemory efficiencytopological repair

0 comments

The pith

Mycelium-inspired streaming ANN index matches recall of top systems with over five times less RAM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces mycelium-index, a streaming approximate nearest neighbor index that adapts its topology dynamically through mechanisms modeled on fungal mycelium growth. By employing myelial edge decay, traffic-driven reinforcement, and an adaptive living hierarchy, the index maintains high recall while drastically reducing memory requirements compared to existing methods like FreshDiskANN and HNSW. A sympathetic reader would care because high-dimensional similarity search underpins many machine learning applications, and lower resource demands could make advanced search feasible on smaller hardware. The work also shows that topological repair strategies succeed in high dimensions where geometric ones fail, pointing to a general principle for maintaining dynamic graphs.

Core claim

Mycelium-index is a streaming ANN index for high-dimensional vector spaces that continuously adapts its topology through myelial edge decay and reinforcement, a traffic-driven living hierarchy, and hybrid deletion combining O(1) bypass for cold nodes with O(k) beam-search repair for hub nodes. On SIFT-1M, it achieves 0.927 recall@5 under 100% turnover, matching FreshDiskANN within error bars, at 5.7x less RAM and 4.7x higher QPS; on static, it matches HNSW recall at 5.2x less RAM. A study of ten repair mechanisms reveals that topological mechanisms succeed in high dimensions while geometric heuristics fail.

What carries the argument

The myelial graph structure with edge decay based on inactivity, traffic-driven reinforcement of connections, and adaptive living hierarchy that reorganizes nodes based on query patterns.

If this is right

Comparable recall can be achieved in streaming settings with substantially reduced memory usage.
Higher query throughput is possible through the adaptive mechanisms and performance optimizations.
Topological repair invariance holds in high-dimensional ANN graphs, favoring certain maintenance strategies.
Hybrid deletion mechanisms enable efficient node removal without full rebuilds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach may generalize to other dynamic graph problems beyond nearest neighbor search, such as in recommendation systems or network routing.
Memory savings could allow deployment of large-scale vector search on edge devices or in cloud environments with cost constraints.
The failure of geometric heuristics suggests that high-dimensional geometry behaves differently for graph repair than expected.

Load-bearing premise

That the combination of myelial edge decay, traffic-driven reinforcement, adaptive living hierarchy, and hybrid deletion mechanisms will produce the reported performance gains without hidden post-hoc tuning or dataset-specific optimizations that do not generalize.

What would settle it

Re-running the FreshDiskANN 100%-turnover benchmark protocol on the SIFT-1M dataset and finding that mycelium's recall@5 falls significantly below the 0.927 +/- 0.028 interval or its RAM usage exceeds 88 MB substantially.

read the original abstract

We present mycelium-index, a streaming approximate nearest neighbor (ANN) index for high-dimensional vector spaces, inspired by the adaptive growth patterns of biological mycelium. The system continuously adapts its topology through myelial edge decay and reinforcement, a traffic-driven living hierarchy, and hybrid deletion combining O(1) bypass for cold nodes with O(k) beam-search repair for hub nodes. Experimental evaluation on SIFT-1M demonstrates that mycelium achieves 0.927 +/- 0.028 recall@5 under FreshDiskANN's 100%-turnover benchmark protocol -- within the measurement confidence interval of FreshDiskANN's ~0.95 -- while using 5.7x less RAM (88 MB vs. >500 MB) and achieving 4.7x higher QPS (2,795 vs. ~600). On the static index, at ef=192, mycelium matches HNSW M=16 recall (0.962 vs. 0.965) at 5.2x less RAM (163 MB vs. 854 MB). Performance optimizations including NEON SIMD distance computation, Vec-backed node storage, and bitset visited tracking yield a cumulative 2.7x QPS improvement. A systematic study of ten streaming repair mechanisms finds that geometric heuristics universally fail in high dimensions, while topological mechanisms succeed -- a principle we term the topological repair invariance of high-dimensional ANN graphs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Mycelium-Index gets near FreshDiskANN recall on SIFT-1M with 5x less RAM via new decay and repair mechanisms, but all results are on one dataset with no sensitivity checks.

read the letter

This paper's main contribution is a new streaming ANN index called Mycelium that gets close to FreshDiskANN's recall on SIFT-1M while using far less memory and running faster, thanks to some fungus-inspired graph adaptations. The novel parts are the myelial edge decay and reinforcement based on traffic, the adaptive living hierarchy, and the hybrid deletion that uses quick bypass for cold nodes and beam search for hubs. They also ran a study comparing ten different ways to repair the graph during streaming and found that topological fixes work reliably in high dimensions while geometric ones don't. That's a useful observation. The practical optimizations with SIMD and bitsets are straightforward but effective, adding up to a 2.7x QPS boost. The numbers are concrete and include some error bars. On the streaming benchmark with full turnover, it hits 0.927 recall at 88 MB RAM and 2795 QPS, compared to over 500 MB and 600 QPS for the baseline. Static case is similar, matching HNSW recall at lower memory. If these hold up, it's a solid engineering win for memory efficiency. The soft spot is the single dataset. All results are on SIFT-1M, with no tests on other common benchmarks or different dimensionalities. The abstract doesn't show any parameter sensitivity sweeps or confirm that the same settings worked without adjustment for this data. That leaves open the chance that the gains are partly from careful tuning rather than the mechanisms alone. The full paper might address this, but it's not clear from the summary. This is aimed at people who build or use approximate nearest neighbor indexes for large vector data, especially in streaming or dynamic settings. Engineers working on recommendation systems or search would find the memory savings interesting. The repair study could be useful for others designing dynamic graphs. It deserves peer review to verify the implementation details, check for any hidden tuning, and see if the results generalize. The claims are empirical and specific enough that referees could test them.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Mycelium-Index, a biologically-inspired streaming ANN index that adapts its graph topology using myelial edge decay, traffic-driven reinforcement, an adaptive living hierarchy, and hybrid deletion strategies. Experimental results on the SIFT-1M dataset show that under a 100% turnover streaming benchmark, it achieves a recall@5 of 0.927 ± 0.028, which is within the confidence interval of FreshDiskANN's performance, while consuming 5.7 times less RAM and delivering 4.7 times higher queries per second. For static indexes, it matches HNSW recall at ef=192 with 5.2 times less memory. The paper also includes a comparative study of ten streaming repair mechanisms, leading to the observation that topological mechanisms succeed where geometric heuristics fail in high-dimensional spaces.

Significance. Should the performance claims prove robust and generalizable beyond SIFT-1M, this work would offer a valuable contribution to the design of memory-efficient dynamic ANN indexes for high-dimensional vector search. The emphasis on topological repair mechanisms and the empirical demonstration of low-memory streaming performance could influence future index designs. The inclusion of a systematic study on repair mechanisms provides a useful empirical principle for the community.

major comments (2)

The central performance claims (recall@5 of 0.927 ± 0.028, 5.7x RAM reduction to 88 MB, 4.7x QPS improvement under 100% turnover) are supported only by results on SIFT-1M. The manuscript provides no multi-dataset experiments or ablation studies on the sensitivity of myelial decay and reinforcement parameters, which is necessary to confirm that the reported gains are intrinsic to the proposed mechanisms rather than resulting from dataset-specific tuning.
The abstract and experimental description provide no explicit statement on whether the same hyperparameter set was used without dataset-specific adjustment, nor details on how the ten repair mechanisms were selected or implemented, raising the possibility that the headline efficiency numbers depend on post-hoc choices tuned to SIFT-1M's 128-D distribution.

minor comments (2)

The abstract mentions 'NEON SIMD distance computation, Vec-backed node storage, and bitset visited tracking' yielding a cumulative 2.7x QPS improvement; a brief explanation of these optimizations in the main text would enhance clarity for readers unfamiliar with low-level implementation techniques.
The term 'topological repair invariance of high-dimensional ANN graphs' is introduced as a principle but would benefit from a more precise definition or supporting reference when first presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the potential contribution of Mycelium-Index to memory-efficient dynamic ANN indexes. We address each major comment below, indicating where revisions will be made to the manuscript.

read point-by-point responses

Referee: The central performance claims (recall@5 of 0.927 ± 0.028, 5.7x RAM reduction to 88 MB, 4.7x QPS improvement under 100% turnover) are supported only by results on SIFT-1M. The manuscript provides no multi-dataset experiments or ablation studies on the sensitivity of myelial decay and reinforcement parameters, which is necessary to confirm that the reported gains are intrinsic to the proposed mechanisms rather than resulting from dataset-specific tuning.

Authors: We acknowledge the limitation of evaluating exclusively on SIFT-1M. This dataset was selected as it is a standard benchmark for high-dimensional ANN search and enables direct comparison with prior streaming methods like FreshDiskANN under the 100% turnover protocol. To address sensitivity to parameters, we will incorporate ablation studies on the myelial decay rate and traffic-driven reinforcement threshold in the revised manuscript. These studies will show the performance stability across a range of parameter values on SIFT-1M. While we do not have additional dataset results at this time, we will add a discussion section noting that the topological mechanisms are designed to be dataset-agnostic in high dimensions and outline future multi-dataset validation. revision: partial
Referee: The abstract and experimental description provide no explicit statement on whether the same hyperparameter set was used without dataset-specific adjustment, nor details on how the ten repair mechanisms were selected or implemented, raising the possibility that the headline efficiency numbers depend on post-hoc choices tuned to SIFT-1M's 128-D distribution.

Authors: We confirm that a single, fixed set of hyperparameters was employed for all experiments reported in the manuscript, with no per-dataset tuning. The specific values for myelial decay, reinforcement, hierarchy adaptation, and other parameters are listed in the experimental setup. The ten repair mechanisms were selected to include representative examples of geometric heuristics (such as distance-threshold based) and topological approaches (such as connectivity-based repair), drawn from related work in graph-based ANN and streaming indexes. Full implementation details, including pseudocode, are provided in the appendix. We will update the abstract and the experimental section to explicitly state the fixed hyperparameter usage and provide a brief rationale for the selection of the repair mechanisms in the main text. revision: yes

Circularity Check

0 steps flagged

No circularity detected; purely empirical claims with no derivations or self-referential reductions

full rationale

The paper reports experimental benchmark results for a streaming ANN index on SIFT-1M, including recall, RAM, and QPS metrics under specific protocols. No equations, parameter fits, or derivation chains are described that reduce any claimed result to its own inputs by construction. The 'topological repair invariance' is presented as an empirical observation from comparing ten mechanisms, not a mathematical principle imported via self-citation or ansatz. All quantitative claims are direct measurements rather than predictions derived from fitted parameters or renamed known results. This matches the default expectation for an empirical systems paper with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, or new physical entities are introduced; the work is an algorithmic design with empirical validation. The mycelium analogy is a source of inspiration rather than a formal axiom or invented entity with independent evidence.

pith-pipeline@v0.9.0 · 5559 in / 1264 out tokens · 46664 ms · 2026-05-10T15:50:34.723904+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 4 canonical work pages

[1]

EnhanceGraph: A continuously enhanced graph-based index for high-dimensional approximate nearest neighbor search, 2025

Ant Group et al. EnhanceGraph: A continuously enhanced graph-based index for high-dimensional approximate nearest neighbor search, 2025. arXiv:2506.13144

work page arXiv 2025
[2]

Graph reordering for cache-efficient near neighbor search

Benjamin Coleman, Richard Baraniuk, and Anshumali Shrivastava. Graph reordering for cache-efficient near neighbor search. In Advances in Neural Information Processing Systems, volume 35, 2022

2022
[3]

AntNet: Distributed stigmergetic control for commu- nications networks.Journal of Artificial In- telligence Research, 9:317–365, 1998

Gianni Di Caro and Marco Dorigo. AntNet: Distributed stigmergetic control for commu- nications networks.Journal of Artificial In- telligence Research, 9:317–365, 1998

1998
[4]

Ant colony optimization theory: A survey.The- oretical Computer Science, 344(2–3):243– 278, 2005

Marco Dorigo and Christian Blum. Ant colony optimization theory: A survey.The- oretical Computer Science, 344(2–3):243– 278, 2005

2005
[5]

Fast approximate nearest neigh- bor search with the navigating spreading- out graph.Proceedings of the VLDB En- dowment, 12(5):461–474, 2019

Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neigh- bor search with the navigating spreading- out graph.Proceedings of the VLDB En- dowment, 12(5):461–474, 2019

2019
[6]

Optimization of indexing based on k-nearest neighbor graph for proximity search in high- dimensional data, 2018

Masajiro Iwasaki and Daisuke Miyazaki. Optimization of indexing based on k-nearest neighbor graph for proximity search in high- dimensional data, 2018. arXiv:1810.07355

work page arXiv 2018
[7]

Billion-scale similarity search with GPUs.IEEE Transactions on Big Data, 7(3):535–547, 2019

Jeff Johnson, Matthijs Douze, and Herv´ e J´ egou. Billion-scale similarity search with GPUs.IEEE Transactions on Big Data, 7(3):535–547, 2019

2019
[8]

Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs,

Yury A. Malkov and Dmitry A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2020. arXiv:1603.09320

work page arXiv 2020
[9]

Freshdiskann: A fast and accurate graph-based ann index for streaming similarity search,

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Har- sha Vardhan Simhadri. FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search, 2021. arXiv:2105.09613

work page arXiv 2021
[10]

DiskANN: Fast accurate billion-point near- est neighbor search on a single node

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar 8 Krishnaswamy, and Rohan Kadekodi. DiskANN: Fast accurate billion-point near- est neighbor search on a single node. In Advances in Neural Information Processing Systems, volume 32, 2019. 9 A Full Parameter Table Table 10: Full parameter table for SIFT-1M experiments. Parameter Va...

2019