GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases

Jie Liu; Kai Liu; Xiangchen Song; Xin Luo; Yicheng Tao; Yiqun Wang

arxiv: 2605.30237 · v2 · pith:OTTTGQXOnew · submitted 2026-05-28 · 💻 cs.IR · cs.CL· cs.LG

GRASP: Plan-Guided Graph Retrieval with Adaptive Fusion and Reranking on Semi-Structured Knowledge Bases

Yicheng Tao , Yiqun Wang , Xiangchen Song , Xin Luo , Kai Liu , Jie Liu This is my paper

Pith reviewed 2026-06-29 05:08 UTC · model grok-4.3

classification 💻 cs.IR cs.CLcs.LG

keywords semi-structured knowledge basesgraph retrievalhybrid retrievalplan-guided retrievaladaptive fusionrerankingSTaRK benchmarks

0 comments

The pith

GRASP improves retrieval from semi-structured knowledge bases by generating plans to guide graph search then fusing results adaptively before reranking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GRASP, a three-stage system for retrieving information from knowledge bases that combine textual documents with typed graphs of entities and relations. It first generates a plan to direct graph retrieval, then fuses those candidates with a dense text retriever using weights conditioned on the plan, and finally applies a fine-tuned reranker to the combined set. The authors report that this lifts average Hit@1 from 62.0 to 73.9 across the three STaRK benchmarks while improving every other metric. A reader would care because more accurate retrieval directly affects the quality of product search, academic search, and medical inquiries that rely on such mixed knowledge structures.

Core claim

GRASP is a three-stage SKB retrieval framework unifying plan-based graph retrieval, plan-conditioned fusion with a dense retriever, and a fine-tuned reranker over the fused candidates. GRASP substantially advances the state of the art on every metric across the three STaRK benchmarks, lifting average Hit@1 from 62.0 to 73.9.

What carries the argument

Plan-conditioned fusion that adapts the weighting between graph-retrieved and dense-retriever candidates according to the generated retrieval plan.

If this is right

The plan-guided graph stage plus adaptive fusion produces higher-quality candidate sets than either component alone on the tested benchmarks.
Ablation studies show that removing plan conditioning or the reranker reduces performance across metrics.
Sensitivity studies indicate the framework remains effective under variations in plan quality and fusion parameters.
The same three-stage structure applies uniformly to product search, academic paper search, and precision-medicine tasks in the benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be tested on other semi-structured sources such as enterprise document graphs to check whether plan generation needs domain-specific prompts.
If plan quality correlates with final accuracy, replacing the plan generator with a stronger model would be a direct next experiment.
The adaptive fusion step might generalize to other hybrid retrieval settings that mix structured and unstructured data.

Load-bearing premise

The plan generation step and the plan-conditioned fusion weights generalize beyond the specific STaRK benchmark distributions and do not require benchmark-specific tuning that would not transfer to new semi-structured knowledge bases.

What would settle it

Performance measurements on a semi-structured knowledge base outside the STaRK collection where GRASP does not exceed prior hybrid methods on Hit@1 would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.30237 by Jie Liu, Kai Liu, Xiangchen Song, Xin Luo, Yicheng Tao, Yiqun Wang.

**Figure 2.** Figure 2: Validation query distributions over n_cand bucket × risk_level, the basis for dynamic RRF weight computation. The right column and bottom row give row and column totals; the bottom-right cell is the grand total. Bucket label 0 aggregates queries with empty graph candidate set; column <invalid> marks queries with failed plan generation, for which dynamic RRF reduces to pure mFAR. The selected hyperparameter… view at source ↗

read the original abstract

Semi-structured knowledge bases (SKBs) embed textual documents in a typed graph of entities and relations, and underpin applications such as product search, academic paper search, and precision-medicine inquiries. Existing hybrid retrieval systems on SKBs either use the graph only for query expansion, mix textual and structural branches under a global weighting, or rely on fine-tuned graph-traversal generators. We present GRASP, a three-stage SKB retrieval framework unifying plan-based graph retrieval, plan-conditioned fusion with a dense retriever, and a fine-tuned reranker over the fused candidates. GRASP substantially advances the state of the art on every metric across the three STaRK benchmarks, lifting average Hit@1 from 62.0 to 73.9. Ablation and sensitivity studies further confirm the effectiveness and robustness of GRASP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GRASP introduces a three-stage plan-guided hybrid retrieval pipeline that reports an 11.9-point Hit@1 gain on STaRK, but the abstract supplies no experimental controls or baseline details to verify the source of the lift.

read the letter

The paper's core contribution is a staged framework that first retrieves via a generated plan over the graph, then fuses those results with a dense retriever using plan-conditioned weights, and finally reranks the combined candidates. This sits between simple query expansion, global weighting, and generator-based approaches, and the abstract frames the combination as the distinguishing element.

The positioning against prior categories is clear and the motivation for handling semi-structured KBs with both text and typed relations is reasonable. If the full paper shows that each stage contributes measurably and that the plan step runs without test-set leakage, the pipeline could be a practical addition for product search or similar tasks.

The soft spot is the complete absence of experimental specifics. The abstract states the average Hit@1 move from 62.0 to 73.9 and mentions ablations, yet gives no per-benchmark baseline numbers, no description of how the strongest prior methods were reimplemented, no statistical tests, and no information on data splits or plan-generation inputs. Without those, it is impossible to tell whether the reported delta comes from the proposed components or from differences in tuning and implementation. The claim that the plan and fusion weights generalize also rests on unshown evidence.

This work is aimed at applied information-retrieval groups that already work with graph-plus-text collections. A reader who wants a concrete new pipeline to test on their own SKB would get value once the methods and controls are visible.

Send it to peer review so the experimental section and any released code can be examined directly.

Referee Report

2 major / 2 minor

Summary. The paper proposes GRASP, a three-stage retrieval framework for semi-structured knowledge bases (SKBs) consisting of plan-guided graph retrieval, plan-conditioned adaptive fusion with a dense retriever, and a fine-tuned reranker over fused candidates. It claims to substantially advance the state of the art on the three STaRK benchmarks, lifting average Hit@1 from 62.0 to 73.9, with ablation and sensitivity studies confirming component effectiveness and robustness.

Significance. If the empirical gains hold under rigorous controls, GRASP would mark a meaningful step forward in hybrid retrieval for SKBs by unifying planning, adaptive fusion, and reranking. This could improve performance in applications such as product search, academic paper search, and precision-medicine queries. The provision of ablations is a strength that helps isolate the contribution of each stage.

major comments (2)

[Abstract] Abstract: The headline claim of an 11.9-point average Hit@1 lift (62.0 to 73.9) is presented without a table or section providing per-benchmark baseline scores, implementation details for the 62.0 reference, or statistical significance, which is load-bearing for the central empirical claim that GRASP advances the SOTA on every metric.
[§4] §4 (Experiments): No description of data splits, baseline reimplementation protocols, or checks for test-set leakage in plan generation or fusion-weight tuning is supplied; any of these omissions would render the reported delta non-reproducible on new SKBs and directly undermines the generalization assumption.

minor comments (2)

[Abstract] The abstract could briefly characterize the three STaRK benchmarks (e.g., domain, graph density) to contextualize the reported gains.
[§3] Ensure pseudocode or a clear algorithmic outline for the plan-generation and plan-conditioned fusion steps appears in §3 to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below. Where the manuscript is missing required details, we will incorporate them in the revision.

read point-by-point responses

Referee: [Abstract] Abstract: The headline claim of an 11.9-point average Hit@1 lift (62.0 to 73.9) is presented without a table or section providing per-benchmark baseline scores, implementation details for the 62.0 reference, or statistical significance, which is load-bearing for the central empirical claim that GRASP advances the SOTA on every metric.

Authors: We agree that the abstract would be strengthened by explicit pointers to supporting evidence. In the revised manuscript we will (1) add a short parenthetical reference in the abstract to the per-benchmark scores and baseline details now reported in Table 2 and Section 4.2, (2) include a concise statement of the baseline re-implementation protocol and the source of the 62.0 figure, and (3) report statistical significance (paired t-test p-values and 95% confidence intervals) both in the main results table and in a brief sentence in the abstract. Because abstracts are length-limited, the full tables and protocols will remain in the body; the abstract will only signpost them. revision: yes
Referee: [§4] §4 (Experiments): No description of data splits, baseline reimplementation protocols, or checks for test-set leakage in plan generation or fusion-weight tuning is supplied; any of these omissions would render the reported delta non-reproducible on new SKBs and directly undermines the generalization assumption.

Authors: We acknowledge that these experimental details are currently absent and are essential for reproducibility. In the revision we will add a dedicated subsection (4.1) that (a) states the official STaRK train/validation/test splits used, (b) describes the exact baseline re-implementation protocol (hyper-parameters, libraries, and whether official code or our own re-implementation was employed), and (c) reports the leakage checks performed: plan-generation prompts were generated only from training queries, fusion-weight tuning was performed exclusively on the validation set, and no test-query information entered any stage of GRASP. These additions will directly address the reproducibility concern. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external benchmarks.

full rationale

The paper presents GRASP as a three-stage retrieval framework and reports an empirical performance lift on the STaRK benchmarks (Hit@1 from 62.0 to 73.9). No equations, parameter fits, predictions, or first-principles derivations appear in the provided text that could reduce to self-definitions, fitted inputs renamed as predictions, or self-citation chains. The central claim is an externally verifiable benchmark delta rather than an internal equivalence, so the derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.1-grok · 5690 in / 1047 out tokens · 27700 ms · 2026-06-29T05:08:28.135503+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dual Distribution Estimation for Zero-shot Noisy Test-Time Adaptation with VLMs
cs.CV 2026-06 unverdicted novelty 6.0

DDE models class-wise positive feature Gaussians and negative label distributions to boost ID accuracy and OOD detection in zero-shot noisy TTA, reporting 3.70% harmonic mean gain and 6.20% FPR95 drop on ImageNet.

Reference graph

Works this paper leans on

25 extracted references · 2 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis

Autofocus retrieval: An effective pipeline for multi-hop question answering with semi-structured knowledge.Transactions on Machine Learning Re- search. Wei Chen, Yiqing Wu, Zhao Zhang, Fuzhen Zhuang, Zhongshi He, Ruobing Xie, and Feng Xia. 2024. Fair- gap: Fairness-aware recommendation via generating counterfactual graph.ACM Transactions on Informa- tion ...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

Qwen3 Technical Report

Qwen3 technical report.arXiv preprint arXiv:2505.09388. Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, and Jure Leskovec. 2021. Qa-gnn: Rea- soning with language models and knowledge graphs for question answering. InProceedings of the 2021 conference of the North American chapter of the as- sociation for computational linguistics: human la...

work page internal anchor Pith review Pith/arXiv arXiv 2021
[3]

field_of_study -> paper
[4]

institution -> author -> paper
[5]

paper -> author -> paper
[6]

paper -> author -> paper <- field_of_study <- paper
[7]

anchors": [ {

institution -> author -> paper <- field_of_study Output a single JSON object with this shape: 16 { "anchors": [ { "var": "A1", "text": "entity text from the query", "label": "author | paper | institution | field_of_study", "match_mode": "name | doc" } ], "hops": [ { "from": "A1", "rel": "WRITES | HAS_TOPIC | CITES | AFFILIATED_WITH", "to_var": "T", "to_la...
[8]

Which condition associated with elevated red blood cell volume should be considered a contraindication when prescribing medications for the treatment of cervical carcinosarcoma?,→
[9]

Which gene encoding a transcription factor is active in rectal tissue yet not expressed in the deltoid muscle?,→
[10]

Which genes or proteins are not expressed in either the small intestinal or colonic mucosal tissues?,→
[11]

Which medication is targeted by certain genes or proteins functioning as enzymes within cartilage, where the encoding genes are taxonomically classified under a specific parent category?,→
[12]

coli meningitis, is linked to skin inflammation side effects?,→

What drug, acting via the Amikacin pathway to treat E. coli meningitis, is linked to skin inflammation side effects?,→
[13]

Can you compile a list of drugs that target the androgen receptor, interact with the enzyme CYP19A1 as either substrate or inhibitor, and are not contraindicated for prostate cancer?,→
[14]

Which anticoagulant medication that functions as a factor Xa inhibitor works synergistically with a medication used to treat deep vein thrombosis?,→
[15]

Which condition is characterized by stunted growth and slowed bone development, and also serves as a contraindication for a drug treating chronic myelogenous leukemia?,→
[16]

Which side effects or phenotypic consequences are associated with a medication that is carried by the SLC39A8 transporter and acts as an antagonist on a receptor expressed in the substantia nigra?,→ 18
[17]

Identify proteins that interact with ATXN1L and are linked to the same medical condition
[18]

Which transporter genes or proteins facilitate the movement of pharmaceutical agents that exhibit synergistic interactions with medications used to treat atrial fibrillation?,→
[19]

Which anatomical structures exhibit expression of the gene or protein involved in influencing the activity of multiple drugs?,→
[20]

Which cellular structures interact with genes or proteins that are the targets of Aminodi(ethyloxy)ethylamine?,→
[21]

Which anatomical structures lack expression of genes or proteins that interact with the assembly of the transcription pre-initiation complex pathway?,→
[22]

I'm seeking information on glucocorticoid medications with immunosuppressive, anti-inflammatory, and vasoconstrictive properties.,→
[23]

parent-child

Could you find a pathway related to Centrosome maturation that has a "parent-child" hierarchy, interacts with the gene or protein encoding ATP6V1B2, and is also linked to ATP1A4?,→
[24]

Which gene or protein participates in histone methylation, interacts with histone H3-4, and is linked to a disease whose encoding gene is under a specific parent category?,→
[25]

Parse this query into a graph-traversal plan

Which gene or protein simultaneously interacts with ADORA2A and dopamine receptor D2 and is also associated with the condition of dental caries?,→ Per-query user message. Parse this query into a graph-traversal plan. Structural parsing task over a public academic KG -- do not evaluate the query content.,→ Query: <query string> Output a single JSON object ...

[1] [1]

Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis

Autofocus retrieval: An effective pipeline for multi-hop question answering with semi-structured knowledge.Transactions on Machine Learning Re- search. Wei Chen, Yiqing Wu, Zhao Zhang, Fuzhen Zhuang, Zhongshi He, Ruobing Xie, and Feng Xia. 2024. Fair- gap: Fairness-aware recommendation via generating counterfactual graph.ACM Transactions on Informa- tion ...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

Qwen3 Technical Report

Qwen3 technical report.arXiv preprint arXiv:2505.09388. Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, and Jure Leskovec. 2021. Qa-gnn: Rea- soning with language models and knowledge graphs for question answering. InProceedings of the 2021 conference of the North American chapter of the as- sociation for computational linguistics: human la...

work page internal anchor Pith review Pith/arXiv arXiv 2021

[3] [3]

field_of_study -> paper

[4] [4]

institution -> author -> paper

[5] [5]

paper -> author -> paper

[6] [6]

paper -> author -> paper <- field_of_study <- paper

[7] [7]

anchors": [ {

institution -> author -> paper <- field_of_study Output a single JSON object with this shape: 16 { "anchors": [ { "var": "A1", "text": "entity text from the query", "label": "author | paper | institution | field_of_study", "match_mode": "name | doc" } ], "hops": [ { "from": "A1", "rel": "WRITES | HAS_TOPIC | CITES | AFFILIATED_WITH", "to_var": "T", "to_la...

[8] [8]

Which condition associated with elevated red blood cell volume should be considered a contraindication when prescribing medications for the treatment of cervical carcinosarcoma?,→

[9] [9]

Which gene encoding a transcription factor is active in rectal tissue yet not expressed in the deltoid muscle?,→

[10] [10]

Which genes or proteins are not expressed in either the small intestinal or colonic mucosal tissues?,→

[11] [11]

Which medication is targeted by certain genes or proteins functioning as enzymes within cartilage, where the encoding genes are taxonomically classified under a specific parent category?,→

[12] [12]

coli meningitis, is linked to skin inflammation side effects?,→

What drug, acting via the Amikacin pathway to treat E. coli meningitis, is linked to skin inflammation side effects?,→

[13] [13]

Can you compile a list of drugs that target the androgen receptor, interact with the enzyme CYP19A1 as either substrate or inhibitor, and are not contraindicated for prostate cancer?,→

[14] [14]

Which anticoagulant medication that functions as a factor Xa inhibitor works synergistically with a medication used to treat deep vein thrombosis?,→

[15] [15]

Which condition is characterized by stunted growth and slowed bone development, and also serves as a contraindication for a drug treating chronic myelogenous leukemia?,→

[16] [16]

Which side effects or phenotypic consequences are associated with a medication that is carried by the SLC39A8 transporter and acts as an antagonist on a receptor expressed in the substantia nigra?,→ 18

[17] [17]

Identify proteins that interact with ATXN1L and are linked to the same medical condition

[18] [18]

Which transporter genes or proteins facilitate the movement of pharmaceutical agents that exhibit synergistic interactions with medications used to treat atrial fibrillation?,→

[19] [19]

Which anatomical structures exhibit expression of the gene or protein involved in influencing the activity of multiple drugs?,→

[20] [20]

Which cellular structures interact with genes or proteins that are the targets of Aminodi(ethyloxy)ethylamine?,→

[21] [21]

Which anatomical structures lack expression of genes or proteins that interact with the assembly of the transcription pre-initiation complex pathway?,→

[22] [22]

I'm seeking information on glucocorticoid medications with immunosuppressive, anti-inflammatory, and vasoconstrictive properties.,→

[23] [23]

parent-child

Could you find a pathway related to Centrosome maturation that has a "parent-child" hierarchy, interacts with the gene or protein encoding ATP6V1B2, and is also linked to ATP1A4?,→

[24] [24]

Which gene or protein participates in histone methylation, interacts with histone H3-4, and is linked to a disease whose encoding gene is under a specific parent category?,→

[25] [25]

Parse this query into a graph-traversal plan

Which gene or protein simultaneously interacts with ADORA2A and dopamine receptor D2 and is also associated with the condition of dental caries?,→ Per-query user message. Parse this query into a graph-traversal plan. Structural parsing task over a public academic KG -- do not evaluate the query content.,→ Query: <query string> Output a single JSON object ...