FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History
Pith reviewed 2026-05-10 12:52 UTC · model grok-4.3
The pith
Fragata retrieves relevant HPC support tickets even when queries are phrased differently, contain typos, or are written in another language.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Fragata is a semantic ticket retrieval system that combines modern information retrieval techniques with the complete Request Tracker history from the Galician Supercomputing Center. The architecture delivers matches that remain effective across language differences, typos, and rephrased queries. It runs on local infrastructure with incremental updates that avoid service interruption and shifts the heaviest processing stages to the FinisTerrae III supercomputer. Early tests indicate a clear qualitative gain over the native search engine built into Request Tracker.
What carries the argument
The hybrid RAG pipeline that merges semantic embeddings with traditional retrieval methods to index and query the full ticket corpus.
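The paper does not publish its fusion code, but the hybrid idea it describes is commonly implemented with reciprocal rank fusion (RRF), the method of the cited Cormack et al. (2009) reference. A minimal sketch, with illustrative ticket IDs and rankings that are assumptions, not data from the paper:

```python
# Hypothetical sketch: fusing a lexical (BM25-style) ranking with a dense
# (embedding-similarity) ranking via reciprocal rank fusion (RRF).
# Ticket IDs and both ranked lists are illustrative only.

def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked lists of ticket IDs into one fused ranking.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    with 1-based ranks; k=60 is the constant from Cormack et al. (2009).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["RT-102", "RT-417", "RT-033"]   # keyword-match order
dense   = ["RT-417", "RT-033", "RT-981"]   # embedding-similarity order
fused = reciprocal_rank_fusion([lexical, dense])
print(fused)  # "RT-417" ranks first: it is near the top of both lists
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the lexical and dense retrievers, which is one reason it is a common default for hybrid pipelines.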
If this is right
- Support staff can locate historical solutions using natural phrasing instead of precise keywords.
- Twenty years of resolved incidents become reusable even when new queries contain errors or use different languages.
- The system stays current through incremental updates that do not interrupt ongoing operations.
- Heavy computation is offloaded to supercomputing resources to keep response times practical.
Where Pith is reading between the lines
- The same hybrid approach could be adapted to other long-running IT support databases that accumulate similar volumes of incident data.
- Pairing the retrieval step with generative models might allow automatic drafting of responses based on the most relevant past tickets.
- Adding quantitative relevance scores and error analysis would enable direct comparisons against alternative search techniques.
Load-bearing premise
That the hybrid retrieval pipeline will deliver consistent useful matches on real ticket data without requiring detailed quantitative metrics or comparisons to other retrieval methods.
What would settle it
A side-by-side test in which support staff judge the relevance of tickets returned by Fragata against those from native Request Tracker search on the same set of real-world queries; no measurable advantage for Fragata would refute the claim.
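Such a study is straightforward to score. A minimal sketch, assuming per-query preference judgments (system labels and counts are hypothetical, not from the paper):

```python
# Hypothetical scoring of a side-by-side relevance study: for each query,
# a support-staff judge records which system's result list was more useful.
# The judgment labels below are illustrative only.

def win_rate(judgments, system):
    """Fraction of non-tied queries where `system` was preferred."""
    decided = [j for j in judgments if j != "tie"]
    if not decided:
        return 0.0
    return sum(1 for j in decided if j == system) / len(decided)

judgments = ["fragata", "fragata", "rt", "tie", "fragata"]
print(win_rate(judgments, "fragata"))  # 0.75 -> preferred on 3 of 4 decided queries
```

A win rate indistinguishable from 0.5 on a sufficiently large query set would be the "no measurable advantage" outcome described above.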
Original abstract
The technical support team of a supercomputing centre accumulates, over the course of decades, a large volume of resolved incidents that constitute critical operational knowledge. At the Galician Supercomputing Center (CESGA) this history has been managed for over twenty years with Request Tracker (RT), whose built-in search engine has significant limitations that hinder knowledge reuse by the support staff. This paper presents Fragata, a semantic ticket search system that combines modern information retrieval techniques with the full RT history. The system can find relevant past incidents regardless of language, the presence of typos, or the specific wording of the query. The architecture is deployed on CESGA's infrastructure, supports incremental updates without service interruption, and offloads the most expensive stages to the FinisTerrae III supercomputer. Preliminary results show a substantial qualitative improvement over RT's native search.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes FRAGATA, a hybrid RAG-based semantic retrieval system deployed over 20 years of Request Tracker (RT) support tickets at the Galician Supercomputing Center (CESGA). It combines embeddings, reranking, and other modern IR techniques to enable retrieval of relevant past incidents irrespective of language, typos, or query wording. The system is integrated with existing RT infrastructure, supports incremental updates without downtime, and offloads compute to the FinisTerrae III supercomputer. Preliminary qualitative results are presented as showing substantial improvement over RT's native search.
Significance. If the robustness claims were supported by rigorous evaluation, the work would offer practical value for knowledge reuse in HPC support environments by improving access to long-term incident history. The engineering focus on production deployment, incremental indexing, and hybrid integration with legacy RT is a strength for applied IR. However, the current reliance on unquantified preliminary results limits the assessed significance and generalizability of the contribution.
major comments (2)
- [Abstract] Abstract: The central claims that the system retrieves relevant tickets 'regardless of language, the presence of typos, or the specific wording of the query' and delivers 'substantial qualitative improvement' over RT native search rest exclusively on preliminary qualitative results. No quantitative metrics (e.g., precision, recall, nDCG), evaluation protocol, test-set construction, or head-to-head baselines (RT search, BM25, dense retrieval) are reported, preventing verification of the robustness assertion on real ticket data.
- [Evaluation] Evaluation section (or equivalent): There is no error analysis, discussion of failure modes for HPC-specific elements (domain terminology, code snippets, non-English tickets), or comparison against alternative retrieval methods. This absence makes the generalization claims load-bearing but unsupported, requiring a dedicated quantitative evaluation with reproducible metrics before the improvement can be assessed.
minor comments (2)
- [System Architecture] The manuscript would benefit from an architecture diagram illustrating the hybrid RAG pipeline stages, data flow, and offloading to FinisTerrae III to improve clarity for readers.
- [Related Work] Consider adding references to standard RAG and semantic retrieval benchmarks or prior work on ticket retrieval systems for better context.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which correctly identifies the need for stronger empirical support. We will perform a major revision to address both points.
Point-by-point responses
-
Referee: [Abstract] Abstract: The central claims that the system retrieves relevant tickets 'regardless of language, the presence of typos, or the specific wording of the query' and delivers 'substantial qualitative improvement' over RT native search rest exclusively on preliminary qualitative results. No quantitative metrics (e.g., precision, recall, nDCG), evaluation protocol, test-set construction, or head-to-head baselines (RT search, BM25, dense retrieval) are reported, preventing verification of the robustness assertion on real ticket data.
Authors: We agree that the abstract phrasing is too strong given the current evidence. In the revised manuscript we will rewrite the abstract to state that the system 'demonstrates promising qualitative improvements in initial tests' and explicitly note that a full quantitative evaluation with metrics and baselines is presented in the new Evaluation section. revision: yes
-
Referee: [Evaluation] Evaluation section (or equivalent): There is no error analysis, discussion of failure modes for HPC-specific elements (domain terminology, code snippets, non-English tickets), or comparison against alternative retrieval methods. This absence makes the generalization claims load-bearing but unsupported, requiring a dedicated quantitative evaluation with reproducible metrics before the improvement can be assessed.
Authors: We accept this criticism. The original submission emphasized system design and deployment over evaluation. For the revision we will add a dedicated Evaluation section containing: (i) a reproducible test-set construction protocol using real CESGA tickets, (ii) quantitative metrics (precision@K, recall, nDCG) on that set, (iii) head-to-head comparisons against RT native search, BM25, and dense-only retrieval, and (iv) error analysis covering domain terminology, code snippets, and non-English content. We will also discuss limitations of the evaluation. revision: yes
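The metrics the rebuttal promises are standard and easy to pin down. A minimal sketch of precision@K and nDCG@K from binary relevance labels; the example labels are illustrative, not measurements from the system:

```python
import math

# Hypothetical sketch of the metrics listed in the rebuttal (precision@K,
# nDCG@K), computed from binary relevance labels for one query's results.

def precision_at_k(relevances, k):
    """Fraction of the top-k retrieved tickets that are relevant (0/1 labels)."""
    return sum(relevances[:k]) / k

def ndcg_at_k(relevances, k):
    """Normalized discounted cumulative gain over the top-k results."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Relevance of the top 5 tickets returned for one query (1 = relevant).
rels = [1, 0, 1, 1, 0]
print(precision_at_k(rels, 5))        # 0.6
print(round(ndcg_at_k(rels, 5), 3))   # 0.906
```

Averaging these per-query values over a labeled test set would yield exactly the head-to-head numbers (Fragata vs. RT native search vs. BM25 vs. dense-only) that the referee asks for.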
Circularity Check
No circularity: system description with no derivations or self-referential predictions
Full rationale
The paper is a system deployment report describing FRAGATA, a hybrid RAG pipeline over RT ticket history. It makes qualitative claims about robustness to language/typos/wording and improvement over native search, but presents no equations, fitted parameters, predictions, ansatzes, or uniqueness theorems. No load-bearing self-citations or reductions of outputs to inputs by construction appear in the provided text. The work is self-contained as an engineering description evaluated via preliminary qualitative results, with no mathematical derivation chain to inspect.
Reference graph
Works this paper leans on
- [1] Best Practical Solutions, "Request Tracker (RT)," https://bestpractical.com/request-tracker
- [2] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela, "Retrieval-augmented generation for knowledge-intensive NLP tasks," in Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [3] Stephen Robertson and Hugo Zaragoza, "The probabilistic relevance framework: BM25 and beyond," Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009.
- [4] Rodrigo Nogueira and Kyunghyun Cho, "Passage re-ranking with BERT," arXiv preprint arXiv:1901.04085, 2019.
- [5] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
- [6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of NAACL-HLT, 2019.
- [7] Nils Reimers and Iryna Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," in Proceedings of EMNLP, 2019.
- [8] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih, "Dense passage retrieval for open-domain question answering," in Proceedings of EMNLP, 2020.
- [9] Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and Rodrigo Nogueira, "Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations," in Proceedings of SIGIR, 2021.
- [10] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang, "Retrieval-augmented generation for large language models: A survey," arXiv preprint arXiv:2312.10997, 2024.
- [11] Rahul Potharaju, Navendu Jain, and Cristina Nita-Rotaru, "Juggling the jigsaw: Towards automated problem inference from network trouble tickets," in Proceedings of NSDI, 2013.
- [12] Wenjun Zhou, Wei Xue, Ramesh Baral, Hongyuan Zha, and Robert Welch, "Smart ticket routing by multi-criteria learning," in Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2016.
- [13] Jeff Johnson, Matthijs Douze, and Hervé Jégou, "Billion-scale similarity search with GPUs," IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2021.
- [14] Luiz Henrique Bonifácio, Hugo Abonizio, Marzieh Fadaee, and Rodrigo Nogueira, "mMARCO: A multilingual version of the MS MARCO passage ranking dataset," arXiv preprint arXiv:2108.13897, 2022.
- [15] Gordon V. Cormack, Charles L. A. Clarke, and Stefan Büttcher, "Reciprocal rank fusion outperforms Condorcet and individual rank learning methods," in Proceedings of SIGIR, 2009.