TTFT-Aware Graph Chain-of-Thought:Distance-Indexed Neural A* for Low-Hallucination Multi-Hop Medical Reasoning

Bechir Dardouri; Ka\"is Zhioua; Yassine Msaddak

arxiv: 2606.23108 · v1 · pith:6PXMRVF3new · submitted 2026-06-22 · 💻 cs.AI

TTFT-Aware Graph Chain-of-Thought:Distance-Indexed Neural A* for Low-Hallucination Multi-Hop Medical Reasoning

Bechir Dardouri , Ka\"is Zhioua , Yassine Msaddak This is my paper

Pith reviewed 2026-06-26 08:42 UTC · model grok-4.3

classification 💻 cs.AI

keywords GraphRAGmedical reasoninghallucination reductionA* searchPruned Landmark Labelingmulti-hop reasoningfertilityknowledge graph

0 comments

The pith

Hybrid PLL and AStarNet navigation on a 700K-node medical graph constrains LLM outputs to scored paths, yielding lower TTFT and fewer hallucinations than text-only RAG on fertility queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a production GraphRAG system that restricts LLM answers to explicit chains of paths drawn from a heterogeneous medical knowledge graph. A Pruned Landmark Labeling oracle supplies exact distances for rapid feasibility checks and path listing, while an AStarNet model guides expansions inside those corridors using clinical plausibility signals. Selected paths are scored on semantic overlap, length, and provenance before being packed into compact prompts for generation. On fertility queries this hybrid produces a stronger latency-recall trade-off, faster first-token times, and lower rates of clinician-detected hallucinations than pure text retrieval or single-component baselines.

Core claim

By indexing the medical graph with directed PLL for sub-millisecond distance oracles and restricting a lightweight neural A* heuristic to the resulting corridors, the method enumerates a small diverse set of paths that are scored and used to condition generation; the resulting system improves the latency-recall Pareto frontier, reduces TTFT, and lowers audited hallucinations relative to text-only RAG while preserving explanation clarity.

What carries the argument

The directed Pruned Landmark Labeling (PLL) oracle for exact distances and simple-path enumeration, paired with an AStarNet heuristic that stays inside the PLL corridor to prioritize expansions.

If this is right

The hybrid establishes a better latency/recall Pareto frontier than text-only RAG and single-component baselines on fertility queries.
It lowers TTFT through compact path-conditioned prompts.
It reduces clinician-audited hallucinations while preserving explanation clarity.
Paths are selected using CUI/semantic-type overlap, length prior, and provenance priors before generation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same corridor-constrained search could be tested on non-fertility medical queries to check domain transfer.
If graph completeness varies across specialties, path coverage gaps would directly limit reasoning quality on those topics.
Adding provenance weighting as an explicit tunable parameter might allow clinicians to trade explanation length against source reliability.
The approach could be combined with existing retrieval methods to further tighten the observed Pareto frontier.

Load-bearing premise

Paths enumerated and scored from the medical knowledge graph correspond to clinically valid and sufficient reasoning steps for the target queries.

What would settle it

A head-to-head clinician audit on the same fertility queries that finds equal or higher hallucination rates for the PLL+AStarNet system than for text-only RAG.

Figures

Figures reproduced from arXiv: 2606.23108 by Bechir Dardouri, Ka\"is Zhioua, Yassine Msaddak.

**Figure 1.** Figure 1: End-to-end pipeline: PLL bounds search to the shortest-distance corridor; A*Net focuses expansions within that corridor; scoring/diversity/packing yield compact, auditable evidence for citation-enforced decoding. Non-goals. We do not solve open-domain text retrieval; we do not learn the KG; and we do not modify the generator architecture beyond citation-enforcing decoding. 3.2 Design overview and rational… view at source ↗

**Figure 2.** Figure 2: Latency–recall frontier (lower-right is better). Hybrid (PLL+AStarNet) improves the Pareto frontier relative to Text RAG, AStarNet-only, and PLL-only at k∈ {3, 5} [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: TTFT distributions across methods (lower is better). The hybrid lowers median TTFT and tightens the tail (p95) vs. all baselines [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Cumulative expansions vs. time (lower is better). The hybrid explores far fewer nodes/edges than BFS/Bi-BFS and AStarNet-only, aligning with observed latency gains [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

read the original abstract

Hallucinations and opaque reasoning remain unacceptable failure modes for clinical LLMs. We present a production-grade GraphRAG stack that constrains answers to verifiable graph chain-of-thought paths in a heterogeneous, ~700K-node medical knowledge graph powering a fertility assistant. The core idea is targeted navigation: a directed Pruned Landmark Labeling (PLL) oracle provides exact distances for sub-millisecond feasibility checks and simple-path enumeration, while a lightweight AStarNet heuristic operates strictly within the PLL corridor to prioritize clinically plausible expansions. We score and pack a small, diverse set of paths (CUI/semantic-type overlap, length prior, provenance priors) to condition generation, yielding compact prompts and improved Time to First Token (TTFT). On fertility-focused queries, the hybrid (PLL+AStarNet) establishes a better latency/recall Pareto frontier than text-only RAG and single-component baselines, lowers TTFT, and reduces clinician-audited hallucinations while preserving explanation clarity. The result is a practical recipe for explainable, low-hallucination multi-hop medical reasoning ready for real-world deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This describes a GraphRAG pipeline using PLL distances and neural A* to constrain medical LLM paths for a fertility assistant, but the abstract supplies no results or validation.

read the letter

The paper's main contribution is a concrete engineering recipe for a GraphRAG system in a fertility assistant. It uses pruned landmark labeling to get exact distances for fast path feasibility, then runs a lightweight neural A* heuristic inside that corridor to prioritize expansions before scoring and packing a few paths into the prompt.

The integration targets two practical problems at once: keeping generation explainable by tying it to graph chains and improving TTFT by limiting prompt size. Scoring paths on CUI overlap, length, and provenance priors is a straightforward way to select diverse candidates.

The description of how the components fit together is clear enough that someone building similar constrained systems could take the high-level design and adapt it.

The central problem is that the abstract states gains on latency, recall, TTFT, and clinician-audited hallucinations versus text-only RAG and single-component baselines, yet gives no numbers, no protocol, no baselines, and no details on the audit. Claims cannot be checked.

The assumption that PLL-enumerated paths from the 700K-node graph are clinically sufficient and complete for fertility queries also goes untested. No coverage metrics or comparison to guidelines appear.

This is aimed at engineers who need a starting template for graph-constrained medical generation. Readers wanting validated methods or reproducible results will not find enough here.

I would not bring it to a reading group or cite it. It does not yet deserve peer review; the experimental section needs to be added first.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces TTFT-Aware Graph Chain-of-Thought, a method that uses a Pruned Landmark Labeling (PLL) oracle for exact distance queries and a neural A* heuristic (AStarNet) to enumerate and score paths in a ~700K-node medical knowledge graph. These paths condition LLM generation to reduce hallucinations in multi-hop medical reasoning for a fertility assistant. The hybrid approach is claimed to achieve a superior latency/recall Pareto frontier, lower Time to First Token (TTFT), and fewer clinician-audited hallucinations compared to text-only RAG and single-component baselines while maintaining explanation clarity.

Significance. If the performance claims and clinical validity of the selected paths hold, the work could provide a practical, deployable framework for explainable medical reasoning that balances efficiency and accuracy in production settings. The combination of exact graph oracles with learned heuristics for path prioritization is a notable technical contribution for constrained generation.

major comments (2)

[Abstract] Abstract: The claims of establishing a better latency/recall Pareto frontier, lowering TTFT, and reducing hallucinations are presented without any quantitative results, baselines, error bars, or details of the experimental protocol, which prevents evaluation of the central empirical claims.
[Method] Path scoring and enumeration description: The assumption that PLL-enumerated paths scored by CUI/semantic-type overlap, length prior, and provenance priors are clinically valid and sufficient reasoning steps for fertility queries is load-bearing for the low-hallucination claim but unsupported by coverage metrics, external validation against clinical guidelines, or inter-clinician agreement on path adequacy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below and indicate planned revisions.

read point-by-point responses

Referee: [Abstract] Abstract: The claims of establishing a better latency/recall Pareto frontier, lowering TTFT, and reducing hallucinations are presented without any quantitative results, baselines, error bars, or details of the experimental protocol, which prevents evaluation of the central empirical claims.

Authors: We agree that the abstract would benefit from including key quantitative results. In the revised version we will insert concise metrics (e.g., latency/recall deltas versus text-only RAG and single-component baselines, TTFT reduction, and hallucination-rate improvement with clinician audit) while preserving brevity. revision: yes
Referee: [Method] Path scoring and enumeration description: The assumption that PLL-enumerated paths scored by CUI/semantic-type overlap, length prior, and provenance priors are clinically valid and sufficient reasoning steps for fertility queries is load-bearing for the low-hallucination claim but unsupported by coverage metrics, external validation against clinical guidelines, or inter-clinician agreement on path adequacy.

Authors: The scoring priors are derived directly from the curated medical graph (CUI overlap, semantic-type constraints, provenance, and length). The clinician-audited hallucination reduction provides indirect empirical support for path utility. We will add coverage statistics and an expanded justification of the priors in the method section; however, a full external clinical-guideline validation study lies outside the scope of the present work. revision: partial

Circularity Check

0 steps flagged

No circularity; method relies on external oracles and independent scoring

full rationale

The paper describes a GraphRAG pipeline that combines an external Pruned Landmark Labeling (PLL) oracle for exact distances, a separate lightweight AStarNet heuristic, and path scoring via CUI/semantic-type overlap plus length/provenance priors. These components are presented as inputs to constrained generation rather than derived from the target metrics (TTFT, recall, hallucination rate). No equations, fitted parameters, or self-citations are shown that would reduce any claimed result to a tautology or renamed input. The performance claims on fertility queries are empirical comparisons against baselines, not predictions forced by construction from the same data used to tune the system. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on abstract only; the central claim rests on the unstated premise that the ~700K-node medical knowledge graph is sufficiently accurate and complete for the queries, plus that path scoring (CUI overlap, length prior, provenance) selects clinically useful chains.

axioms (1)

domain assumption The medical knowledge graph accurately and completely represents the clinical relationships needed for fertility queries
Implicit in the use of graph paths as the sole source of verifiable reasoning

pith-pipeline@v0.9.1-grok · 5736 in / 1253 out tokens · 31580 ms · 2026-06-26T08:42:57.216830+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 10 canonical work pages

[1]

In: Proceedings of the 20th Annual European Symposium on Algorithms (ESA)

Abraham, I., Delling, D., Goldberg, A.V., Werneck, R.F.: Hierarchical hub labelings for shortest paths. In: Proceedings of the 20th Annual European Symposium on Algorithms (ESA). pp. 24–35 (2012). https://doi.org/10.1007/ 978-3-642-33090-2_3

2012
[2]

In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. pp. 349–360 (2013). https://doi.org/10.1145/2463676.2465312, https://arxiv.org/abs/1304.4661

work page doi:10.1145/2463676.2465312 2013
[3]

In: Companion Proceedings of the 23rd International Conference on World Wide Web (WWW Companion)

Akiba, T., Iwata, Y., Yoshida, Y.: Dynamic and historical shortest-path distance queries on large evolving networks by pruned landmark labeling. In: Companion Proceedings of the 23rd International Conference on World Wide Web (WWW Companion). pp. 237–238 (2014). https://doi.org/10.1145/2567948.2579222

work page doi:10.1145/2567948.2579222 2014
[4]

doi:10.1038/s41746-025-01670-7 , issn =

Asgari, E., Montaña-Brown, N., Dubois, M., Khalil, S., Balloch, J., Au Ye- ung, J., Pimenta, D.: A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation. npj Digital Medicine8(274), 1– 15 (2025). https://doi.org/10.1038/s41746-025-01670-7, https://www.nature.com/ articles/s41746-025-01670-7

work page doi:10.1038/s41746-025-01670-7 2025
[5]

Nucleic Acids Research32(suppl_1), D267–D270 (2004)

Bodenreider, O.: The unified medical language system (UMLS): Integrating biomedical terminology. Nucleic Acids Research32(suppl_1), D267–D270 (2004). https://doi.org/10.1093/nar/gkh061

work page doi:10.1093/nar/gkh061 2004
[6]

Scientific Data10(67), 1–16 (2023)

Chandak, P., Huang, K., Zitnik, M.: Building a knowledge graph to enable pre- cision medicine. Scientific Data10(67), 1–16 (2023). https://doi.org/10.1038/ s41597-023-01960-3, https://www.nature.com/articles/s41597-023-01960-3 14 B. Dardouri et al

2023
[7]

GitHub repository (2024), https://github.com/DeepGraphLearning/AStarNet

DeepGraphLearning: AStarNet: Official implementation of a* networks. GitHub repository (2024), https://github.com/DeepGraphLearning/AStarNet

2024
[8]

arXiv preprint (2024), https://arxiv.org/abs/2404.16130

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Larson, J.: From local to global: A graph RAG approach to query-focused summarization. arXiv preprint (2024), https://arxiv.org/abs/2404.16130

Pith/arXiv arXiv 2024
[9]

SIAM Journal on Computing28(2), 652–673 (1999)

Eppstein, D.: Finding the k shortest paths. SIAM Journal on Computing28(2), 652–673 (1999). https://doi.org/10.1137/S0097539795290477

work page doi:10.1137/s0097539795290477 1999
[10]

ExplosionAI:ScispaCyEntityLinkerdocumentation.Documentation(2024),https: //scispacy.readthedocs.io/en/latest/linking.html

2024
[11]

In: Proceedings of the 16th Annual ACM–SIAM Symposium on DiscreteAlgorithms (SODA)

Goldberg, A.V., Harrelson, C.: Computing the shortest path: A* search meets graph theory. In: Proceedings of the 16th Annual ACM–SIAM Symposium on DiscreteAlgorithms (SODA). pp.156–165(2005), https://dl.acm.org/doi/10.5555/ 1070432.1070455

arXiv 2005
[12]

Hart, Nils J

Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determina- tion of minimum cost paths. IEEE Transactions on Systems Science and Cybernet- ics4(2), 100–107 (1968). https://doi.org/10.1109/TSSC.1968.300136

work page doi:10.1109/tssc.1968.300136 1968
[13]

Hershberger, J., Suri, S.: Vickrey prices and shortest paths: What is an edge worth? In: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS). pp. 252–259 (2003). https://doi.org/10.1109/SFCS.2003.1238196

work page doi:10.1109/sfcs.2003.1238196 2003
[14]

GitHub repository (2013), https://github.com/iwiwi/pruned-landmark-labeling

Iwata, Y.: Pruned landmark labeling (PLL): Reference implementation. GitHub repository (2013), https://github.com/iwiwi/pruned-landmark-labeling

2013
[15]

Artificial Intelligence in Medicine117, 102083 (2021)

Kraljevic, Z., Searle, T., Shek, A., Roguski, L., Noor, K., Bean, D., Mascio, A., Zhu, L., Folarin, A.A., Roberts, A., Bendayan, R., Richardson, M.P., Stewart, R., Shah, A.D., Wong, W.K., Ibrahim, Z., Teo, J.T., Dobson, R.J.B.: Multi- domain clinical natural language processing with MedCAT: The medical con- cept annotation toolkit. Artificial Intelligence...

work page doi:10.1016/j.artmed.2021.102083 2021
[16]

Product documentation (2025), https://neo4j.com/docs/

Neo4j, Inc.: Neo4j graph database documentation. Product documentation (2025), https://neo4j.com/docs/

2025
[17]

Qi, X., Zeng, Y ., Xie, T., Chen, P.-Y ., Jia, R., Mittal, P., and Henderson, P

Neumann, M., King, D., Beltagy, I., Ammar, W.: ScispaCy: Fast and robust mod- els for biomedical natural language processing. In: Proceedings of the 18th BioNLP Workshop and Shared Task. pp. 319–327 (2019). https://doi.org/10.18653/v1/ W19-5034, https://aclanthology.org/W19-5034/

work page doi:10.18653/v1/ 2019
[18]

Product documentation (2020), https://docs.nvidia.com/dgx/pdf/dgx-a100-user-guide.pdf

NVIDIA: NVIDIA DGX A100 system: User guide. Product documentation (2020), https://docs.nvidia.com/dgx/pdf/dgx-a100-user-guide.pdf

2020
[19]

NVIDIA NIM Benchmarking Documentation (2024), https: //docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html

NVIDIA: LLM inference benchmarking: Fundamental concepts (time to first token and related metrics). NVIDIA NIM Benchmarking Documentation (2024), https: //docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html

2024
[20]

NVIDIA NIM Benchmarking Documentation (2024), https://docs.nvidia.com/ nim/benchmarking/llm/latest/parameters.html

NVIDIA: LLM inference benchmarking: Parameters and best practices. NVIDIA NIM Benchmarking Documentation (2024), https://docs.nvidia.com/ nim/benchmarking/llm/latest/parameters.html

2024
[21]

Management Science 17(11), 712–716 (1971)

Yen, J.Y.: Finding the k shortest loopless paths in a network. Management Science 17(11), 712–716 (1971). https://doi.org/10.1287/mnsc.17.11.712

work page doi:10.1287/mnsc.17.11.712 1971
[22]

Databricks Blog (2023), https://www.databricks

Zaharia,M.,Lee,J.,Wendell,P.,Chien,C.,Graves,P.:Timetofirsttoken(TTFT): What it is and how to improve it. Databricks Blog (2023), https://www.databricks. com/blog/time-first-token-ttft-what-it-and-how-improve-it

2023
[23]

In: Advances in Neural Information Processing Systems (NeurIPS)

Zhu, Z., Yuan, X., Galkin, M., Xhonneux, S., Zhang, M., Gazeau, M., Tang, J.: A*net: A scalable path-based reasoning approach for knowledge graphs. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 36 (2023), https://arxiv.org/abs/2206.04798

arXiv 2023

[1] [1]

In: Proceedings of the 20th Annual European Symposium on Algorithms (ESA)

Abraham, I., Delling, D., Goldberg, A.V., Werneck, R.F.: Hierarchical hub labelings for shortest paths. In: Proceedings of the 20th Annual European Symposium on Algorithms (ESA). pp. 24–35 (2012). https://doi.org/10.1007/ 978-3-642-33090-2_3

2012

[2] [2]

In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. pp. 349–360 (2013). https://doi.org/10.1145/2463676.2465312, https://arxiv.org/abs/1304.4661

work page doi:10.1145/2463676.2465312 2013

[3] [3]

In: Companion Proceedings of the 23rd International Conference on World Wide Web (WWW Companion)

Akiba, T., Iwata, Y., Yoshida, Y.: Dynamic and historical shortest-path distance queries on large evolving networks by pruned landmark labeling. In: Companion Proceedings of the 23rd International Conference on World Wide Web (WWW Companion). pp. 237–238 (2014). https://doi.org/10.1145/2567948.2579222

work page doi:10.1145/2567948.2579222 2014

[4] [4]

doi:10.1038/s41746-025-01670-7 , issn =

Asgari, E., Montaña-Brown, N., Dubois, M., Khalil, S., Balloch, J., Au Ye- ung, J., Pimenta, D.: A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation. npj Digital Medicine8(274), 1– 15 (2025). https://doi.org/10.1038/s41746-025-01670-7, https://www.nature.com/ articles/s41746-025-01670-7

work page doi:10.1038/s41746-025-01670-7 2025

[5] [5]

Nucleic Acids Research32(suppl_1), D267–D270 (2004)

Bodenreider, O.: The unified medical language system (UMLS): Integrating biomedical terminology. Nucleic Acids Research32(suppl_1), D267–D270 (2004). https://doi.org/10.1093/nar/gkh061

work page doi:10.1093/nar/gkh061 2004

[6] [6]

Scientific Data10(67), 1–16 (2023)

Chandak, P., Huang, K., Zitnik, M.: Building a knowledge graph to enable pre- cision medicine. Scientific Data10(67), 1–16 (2023). https://doi.org/10.1038/ s41597-023-01960-3, https://www.nature.com/articles/s41597-023-01960-3 14 B. Dardouri et al

2023

[7] [7]

GitHub repository (2024), https://github.com/DeepGraphLearning/AStarNet

DeepGraphLearning: AStarNet: Official implementation of a* networks. GitHub repository (2024), https://github.com/DeepGraphLearning/AStarNet

2024

[8] [8]

arXiv preprint (2024), https://arxiv.org/abs/2404.16130

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Larson, J.: From local to global: A graph RAG approach to query-focused summarization. arXiv preprint (2024), https://arxiv.org/abs/2404.16130

Pith/arXiv arXiv 2024

[9] [9]

SIAM Journal on Computing28(2), 652–673 (1999)

Eppstein, D.: Finding the k shortest paths. SIAM Journal on Computing28(2), 652–673 (1999). https://doi.org/10.1137/S0097539795290477

work page doi:10.1137/s0097539795290477 1999

[10] [10]

ExplosionAI:ScispaCyEntityLinkerdocumentation.Documentation(2024),https: //scispacy.readthedocs.io/en/latest/linking.html

2024

[11] [11]

In: Proceedings of the 16th Annual ACM–SIAM Symposium on DiscreteAlgorithms (SODA)

Goldberg, A.V., Harrelson, C.: Computing the shortest path: A* search meets graph theory. In: Proceedings of the 16th Annual ACM–SIAM Symposium on DiscreteAlgorithms (SODA). pp.156–165(2005), https://dl.acm.org/doi/10.5555/ 1070432.1070455

arXiv 2005

[12] [12]

Hart, Nils J

Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determina- tion of minimum cost paths. IEEE Transactions on Systems Science and Cybernet- ics4(2), 100–107 (1968). https://doi.org/10.1109/TSSC.1968.300136

work page doi:10.1109/tssc.1968.300136 1968

[13] [13]

Hershberger, J., Suri, S.: Vickrey prices and shortest paths: What is an edge worth? In: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS). pp. 252–259 (2003). https://doi.org/10.1109/SFCS.2003.1238196

work page doi:10.1109/sfcs.2003.1238196 2003

[14] [14]

GitHub repository (2013), https://github.com/iwiwi/pruned-landmark-labeling

Iwata, Y.: Pruned landmark labeling (PLL): Reference implementation. GitHub repository (2013), https://github.com/iwiwi/pruned-landmark-labeling

2013

[15] [15]

Artificial Intelligence in Medicine117, 102083 (2021)

Kraljevic, Z., Searle, T., Shek, A., Roguski, L., Noor, K., Bean, D., Mascio, A., Zhu, L., Folarin, A.A., Roberts, A., Bendayan, R., Richardson, M.P., Stewart, R., Shah, A.D., Wong, W.K., Ibrahim, Z., Teo, J.T., Dobson, R.J.B.: Multi- domain clinical natural language processing with MedCAT: The medical con- cept annotation toolkit. Artificial Intelligence...

work page doi:10.1016/j.artmed.2021.102083 2021

[16] [16]

Product documentation (2025), https://neo4j.com/docs/

Neo4j, Inc.: Neo4j graph database documentation. Product documentation (2025), https://neo4j.com/docs/

2025

[17] [17]

Qi, X., Zeng, Y ., Xie, T., Chen, P.-Y ., Jia, R., Mittal, P., and Henderson, P

Neumann, M., King, D., Beltagy, I., Ammar, W.: ScispaCy: Fast and robust mod- els for biomedical natural language processing. In: Proceedings of the 18th BioNLP Workshop and Shared Task. pp. 319–327 (2019). https://doi.org/10.18653/v1/ W19-5034, https://aclanthology.org/W19-5034/

work page doi:10.18653/v1/ 2019

[18] [18]

Product documentation (2020), https://docs.nvidia.com/dgx/pdf/dgx-a100-user-guide.pdf

NVIDIA: NVIDIA DGX A100 system: User guide. Product documentation (2020), https://docs.nvidia.com/dgx/pdf/dgx-a100-user-guide.pdf

2020

[19] [19]

NVIDIA NIM Benchmarking Documentation (2024), https: //docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html

NVIDIA: LLM inference benchmarking: Fundamental concepts (time to first token and related metrics). NVIDIA NIM Benchmarking Documentation (2024), https: //docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html

2024

[20] [20]

NVIDIA NIM Benchmarking Documentation (2024), https://docs.nvidia.com/ nim/benchmarking/llm/latest/parameters.html

NVIDIA: LLM inference benchmarking: Parameters and best practices. NVIDIA NIM Benchmarking Documentation (2024), https://docs.nvidia.com/ nim/benchmarking/llm/latest/parameters.html

2024

[21] [21]

Management Science 17(11), 712–716 (1971)

Yen, J.Y.: Finding the k shortest loopless paths in a network. Management Science 17(11), 712–716 (1971). https://doi.org/10.1287/mnsc.17.11.712

work page doi:10.1287/mnsc.17.11.712 1971

[22] [22]

Databricks Blog (2023), https://www.databricks

Zaharia,M.,Lee,J.,Wendell,P.,Chien,C.,Graves,P.:Timetofirsttoken(TTFT): What it is and how to improve it. Databricks Blog (2023), https://www.databricks. com/blog/time-first-token-ttft-what-it-and-how-improve-it

2023

[23] [23]

In: Advances in Neural Information Processing Systems (NeurIPS)

Zhu, Z., Yuan, X., Galkin, M., Xhonneux, S., Zhang, M., Gazeau, M., Tang, J.: A*net: A scalable path-based reasoning approach for knowledge graphs. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 36 (2023), https://arxiv.org/abs/2206.04798

arXiv 2023