Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis
Pith reviewed 2026-05-10 15:36 UTC · model grok-4.3
The pith
Encoding changes between pre- and post-surgery brain scans allows retrieval of similar cases to predict and explain epilepsy outcomes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework distils pre-to-post-operative MRI changes into a compact trajectory vector, retrieves historically similar surgical trajectories from a population archive via nearest-neighbour search, and synthesises a natural-language prognosis grounded in the retrieved evidence. Trajectory-based classifiers separate outcome groups more effectively than single-timepoint baselines, while the reasoning agent matches their grouping performance and supplies structured justifications with no observed hallucinations under the audit protocol.
What carries the argument
A three-stage architecture that encodes longitudinal MRI differences as a trajectory vector, retrieves comparable cases by nearest-neighbour search, and then synthesises explanations with a reasoning agent.
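The retrieval stage can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cosine metric, the `retrieve_similar` helper, the tuple layout of the archive, and the three-dimensional toy vectors (standing in for 512-dimensional ones) are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_similar(query, archive, k=3):
    """Return the k archive entries most similar to the query vector.

    `archive` is a list of (case_id, vector, label) tuples. The names and the
    choice of cosine similarity are hypothetical, chosen only for illustration.
    """
    scored = [
        (cosine_similarity(query, vector), case_id, label)
        for case_id, vector, label in archive
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Toy archive of trajectory vectors with made-up identifiers and labels.
archive = [
    ("case_a", [1.0, 0.0, 0.0], "temporal"),
    ("case_b", [0.9, 0.1, 0.0], "temporal"),
    ("case_c", [0.0, 1.0, 0.0], "non-temporal"),
    ("case_d", [0.0, 0.9, 0.1], "non-temporal"),
]
neighbours = retrieve_similar([1.0, 0.05, 0.0], archive, k=2)
```

The retrieved tuples (similarity, case identifier, label) are the kind of evidence a downstream agent could be asked to ground its prognosis in.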
If this is right
- Classifiers operating in trajectory space separate outcome groups more effectively than those limited to one pre-operative scan.
- The agent produces structured, auditable justifications alongside its group assignments.
- An ensemble of trajectory-space classifiers can reach higher grouping performance without language-model overhead.
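To make "auditable" concrete, here is one check such an audit could include. The paper's actual audit protocol is not described on this page, so everything below is an assumption: the `[case:ID]` citation format, the function names, and the pass rule (at least one citation, none outside the retrieved set) are hypothetical.

```python
import re

def cited_case_ids(justification):
    """Extract case identifiers written as [case:ID] (a hypothetical citation format)."""
    return set(re.findall(r"\[case:([A-Za-z0-9_\-]+)\]", justification))

def audit_justification(justification, retrieved_ids):
    """Flag any cited case that was not among the retrieved neighbours."""
    cited = cited_case_ids(justification)
    unsupported = sorted(cited - set(retrieved_ids))
    return {
        "cited": sorted(cited),
        "unsupported": unsupported,
        # Passing requires at least one citation and no unsupported ones.
        "passes": bool(cited) and not unsupported,
    }

retrieved = ["a12", "b07", "c33"]
grounded = audit_justification("Resembles [case:a12] and [case:b07].", retrieved)
ungrounded = audit_justification("Resembles [case:z99].", retrieved)
```

A check like this only verifies that citations point at retrieved evidence. It says nothing about whether the reasoning drawn from that evidence is clinically sound.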
Where Pith is reading between the lines
- Clinicians could use the retrieved cases to discuss concrete parallels with patients when weighing surgical options.
- Collecting actual seizure-freedom records instead of proxy labels would allow testing whether the trajectories capture genuine prognostic signals.
- The same change-vector approach might apply to other longitudinal imaging problems where the sequence of structural shifts carries predictive value.
Load-bearing premise
The evaluation uses resection type as a proxy for seizure-freedom outcomes, so the system may be learning anatomical location features rather than true prognostic morphology.
What would settle it
A direct test on a cohort with verified post-surgical seizure-freedom labels to check whether trajectory predictions align with actual clinical outcomes better than single-scan baselines.
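Such a test reduces to comparing AUC values computed against verified labels. The sketch below computes AUC as the probability that a randomly chosen positive case outscores a randomly chosen negative one. The labels and scores are made-up toy values, not results from the paper, and the variable names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data only: 1 = verified seizure-free, 0 = not seizure-free.
verified = [1, 1, 1, 0, 0, 0]
trajectory_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
single_scan_scores = [0.6, 0.4, 0.5, 0.7, 0.3, 0.45]

auc_trajectory = roc_auc(verified, trajectory_scores)   # 8 of 9 pairs ordered correctly
auc_single_scan = roc_auc(verified, single_scan_scores)  # 5 of 9 pairs ordered correctly
```

On a real cohort the comparison would also need confidence intervals, since a gap of this kind on a few hundred cases can be within noise.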
Original abstract
Predicting post-surgical seizure outcomes in pharmacoresistant epilepsy is a clinical challenge. Conventional deep-learning approaches operate on static, single-timepoint pre-operative scans, omitting longitudinal morphological changes. We propose \emph{Neuro-Oracle}, a three-stage framework that: (i) distils pre-to-post-operative MRI changes into a compact 512-dimensional trajectory vector using a 3D Siamese contrastive encoder; (ii) retrieves historically similar surgical trajectories from a population archive via nearest-neighbour search; and (iii) synthesises a natural-language prognosis grounded in the retrieved evidence using a quantized Llama-3-8B reasoning agent. Evaluations are conducted on the public EPISURG dataset ($N{=}268$ longitudinally paired cases) using five-fold stratified cross-validation. Since ground-truth seizure-freedom scores are unavailable, we utilize a clinical proxy label based on the resection type. We acknowledge that the network representations may potentially learn the anatomical features of the resection cavities (i.e., temporal versus non-temporal locations) rather than true prognostic morphometry. Our current evaluation thus serves mainly as a proof-of-concept for the trajectory-aware retrieval architecture. Trajectory-based classifiers achieve AUC values between 0.834 and 0.905, compared with 0.793 for a single-timepoint ResNet-50 baseline. The Neuro-Oracle agent (M5) matches the AUC of purely discriminative trajectory classifiers (0.867) while producing structured justifications with zero observed hallucinations under our audit protocol. A Siamese Diversity Ensemble (M6) of trajectory-space classifiers attains an AUC of 0.905 without language-model overhead.
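The five-fold stratified cross-validation mentioned in the abstract can be illustrated with a small standard-library sketch. The `stratified_folds` helper is hypothetical, and the 168/100 class split is invented for the example; the abstract gives only the total of 268 cases, not the class balance.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps roughly the same class mix."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so it is spread evenly across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds

# Toy proxy labels: 268 cases with an assumed (not reported) class split.
labels = [1] * 168 + [0] * 100
folds = stratified_folds(labels)

# Each fold serves once as the held-out set; the rest form the training set.
splits = [
    ([i for j, fold in enumerate(folds) if j != held_out for i in fold], folds[held_out])
    for held_out in range(len(folds))
]
```

Stratification matters here because an unbalanced fold could shift the reported AUC simply through a different class mix in the held-out set.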
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Neuro-Oracle, a three-stage agentic RAG framework for predicting post-surgical seizure outcomes in epilepsy. It distills pre-to-post MRI changes into 512-dimensional trajectory vectors using a 3D Siamese contrastive encoder, retrieves similar historical trajectories from a population archive, and generates natural-language prognoses via a quantized Llama-3-8B reasoning agent. Evaluations on the public EPISURG dataset (N=268 longitudinally paired cases) use five-fold stratified cross-validation and a resection-type proxy label, reporting AUCs of 0.834–0.905 for trajectory-based classifiers versus 0.793 for a single-timepoint ResNet-50 baseline; the Neuro-Oracle agent matches 0.867 AUC with structured justifications and zero observed hallucinations.
Significance. If the trajectory vectors can be shown to capture prognostic morphometry independent of resection anatomy, the combination of contrastive trajectory encoding, nearest-neighbour retrieval, and hallucination-audited agentic generation would represent a meaningful advance in interpretable AI for epilepsy surgical planning. The public dataset, the explicit cross-validation protocol, and the hallucination audit are concrete strengths that support reproducibility and clinical trust.
major comments (1)
- [Abstract] Abstract (evaluation paragraph): The reported AUC gains (0.834–0.905 versus 0.793 baseline) and the Neuro-Oracle agent's 0.867 AUC rest on a proxy label defined by resection type. Because the 512-dimensional trajectory vector is produced by a Siamese encoder on paired pre- and post-operative scans, the vector necessarily encodes resection-cavity geometry (temporal versus non-temporal). Any downstream classifier or retriever can therefore achieve high performance by recovering the resection label rather than learning independent prognostic features. This circularity, which the manuscript itself flags, is load-bearing for the central claim of trajectory-aware surgical prognosis.
minor comments (2)
- [Abstract] Abstract: The notation 'N{=}268' is a typesetting artifact and should read 'N=268'.
- [Methods] Methods (Siamese encoder description): The precise contrastive loss, margin value, and projection head architecture used to produce the 512-dimensional trajectory vector are not fully specified; adding these details would aid reproducibility.
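For orientation, the kind of detail this comment asks for looks like the following. This is the classic pairwise margin-based contrastive loss, shown only as a generic example: the manuscript's actual loss, margin, and projection head are not specified here, and the function name and default margin are assumptions.

```python
import math

def contrastive_loss(embedding_a, embedding_b, same_class, margin=1.0):
    """Pairwise contrastive loss on two embeddings.

    Similar pairs are pulled together (squared distance); dissimilar pairs are
    pushed apart until they are at least `margin` away. The margin of 1.0 is an
    arbitrary illustrative default, not a value taken from the paper.
    """
    distance = math.dist(embedding_a, embedding_b)
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

pull_together = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True)
already_apart = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False)
push_apart = contrastive_loss([0.0, 0.0], [0.6, 0.8], same_class=False, margin=2.0)
```

Reported results can depend on the margin and on whether a supervised or self-supervised variant is used, which is why the referee asks for these to be stated.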
Simulated Author's Rebuttal
We thank the referee for the constructive critique and for recognizing the strengths in reproducibility and the zero-hallucination audit. We agree that the proxy-label evaluation introduces a circularity that limits the strength of claims about independent prognostic morphometry, and we will revise the manuscript to foreground this limitation more explicitly while preserving the proof-of-concept demonstration of the trajectory-aware architecture.
- Referee: [Abstract] Abstract (evaluation paragraph): The reported AUC gains (0.834–0.905 versus 0.793 baseline) and the Neuro-Oracle agent's 0.867 AUC rest on a proxy label defined by resection type. Because the 512-dimensional trajectory vector is produced by a Siamese encoder on paired pre- and post-operative scans, the vector necessarily encodes resection-cavity geometry (temporal versus non-temporal). Any downstream classifier or retriever can therefore achieve high performance by recovering the resection label rather than learning independent prognostic features. This circularity, which the manuscript itself flags, is load-bearing for the central claim of trajectory-aware surgical prognosis.
Authors: We appreciate the referee's precise identification of this issue. The manuscript already states in the abstract that 'the network representations may potentially learn the anatomical features of the resection cavities (i.e., temporal versus non-temporal locations) rather than true prognostic morphometry' and positions the evaluation 'mainly as a proof-of-concept for the trajectory-aware retrieval architecture.' The resection-type proxy is required because the public EPISURG dataset does not contain ground-truth seizure-freedom labels. Although the Siamese trajectory vectors necessarily encode resection geometry, the consistent AUC improvement over the single-timepoint ResNet-50 baseline (0.793 vs. 0.834–0.905) indicates that longitudinal change information contributes signal beyond static anatomy alone. We do not interpret the current results as evidence of fully independent prognostic morphometry. In revision we will (i) expand the abstract evaluation paragraph to restate the proxy limitation upfront, (ii) add a dedicated limitations subsection that quantifies the circularity risk, and (iii) outline concrete next steps for validation on datasets that supply actual post-surgical outcome labels.
Revision: partial
- The public EPISURG dataset lacks ground-truth post-surgical seizure outcome labels, which prevents a direct, non-circular test of whether the trajectory vectors capture prognostic morphometry independent of resection anatomy.
Circularity Check
Proxy seizure-freedom label derived from resection type, directly encoded in pre-to-post trajectory vector
specific steps
- self-definitional [Abstract]
"Since ground-truth seizure-freedom scores are unavailable, we utilize a clinical proxy label based on the resection type. We acknowledge that the network representations may potentially learn the anatomical features of the resection cavities (i.e., temporal versus non-temporal locations) rather than true prognostic morphometry. ... Trajectory-based classifiers achieve AUC values between 0.834 and 0.905, compared with 0.793 for a single-timepoint ResNet-50 baseline."
The proxy label is resection type. The trajectory vector is derived from pre-to-post MRI pairs via Siamese encoder, which includes the post-operative resection cavity. Any classifier on this vector can achieve high AUC by recovering the resection label already present in the input features, rather than learning independent prognostic morphometry. The single-timepoint baseline lacks this information and shows lower AUC, confirming the circularity.
full rationale
The paper's central performance claims (AUC 0.834–0.905 for trajectory classifiers vs. 0.793 baseline, Neuro-Oracle at 0.867) rest on a proxy label defined by resection type. The 512-dim trajectory is produced by a Siamese encoder on paired pre- and post-operative scans, so the vector necessarily encodes resection cavity geometry (temporal vs. non-temporal). Classifiers or retrievers in this space recover the label from features that contain it, rendering the reported 'prognostic' gains tautological. The paper explicitly acknowledges the risk but still presents the metrics as evidence of trajectory-aware value, with the evaluation framed only as a proof-of-concept after the fact.
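The circularity can be made concrete with a toy example. All data, field names, and classifiers below are invented for illustration. The point is only that when the label is defined by resection type, any model that can see the post-operative cavity recovers the label by construction, while a model restricted to pre-operative input cannot.

```python
def proxy_label(case):
    """Proxy outcome defined purely by resection type (toy stand-in)."""
    return 1 if case["resection_type"] == "temporal" else 0

def trajectory_classifier(case):
    """Sees post-operative information, so it can read the cavity location directly."""
    return 1 if case["cavity_lobe"] == "temporal" else 0

def preoperative_classifier(case):
    """Sees only a pre-operative feature; the 0.5 threshold is arbitrary."""
    return 1 if case["pre_op_feature"] > 0.5 else 0

def accuracy(classifier, cases):
    """Fraction of cases where the classifier agrees with the proxy label."""
    hits = sum(classifier(case) == proxy_label(case) for case in cases)
    return hits / len(cases)

# Hypothetical cases: the cavity lobe always matches the resection type.
cases = [
    {"resection_type": "temporal", "cavity_lobe": "temporal", "pre_op_feature": 0.8},
    {"resection_type": "temporal", "cavity_lobe": "temporal", "pre_op_feature": 0.3},
    {"resection_type": "frontal", "cavity_lobe": "frontal", "pre_op_feature": 0.6},
    {"resection_type": "parietal", "cavity_lobe": "parietal", "pre_op_feature": 0.2},
]

leaky_accuracy = accuracy(trajectory_classifier, cases)
pre_op_accuracy = accuracy(preoperative_classifier, cases)
```

In this toy setting the gap between the two classifiers reflects access to the label-defining feature, not better prognostic modelling, which is the concern raised above.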
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: A 3D Siamese contrastive encoder produces a compact trajectory vector that captures clinically relevant pre-to-post MRI morphological changes.
- Domain assumption: Nearest-neighbour search in trajectory space retrieves historically relevant cases for prognosis synthesis.
invented entities (1)
- 512-dimensional trajectory vector: no independent evidence