FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History
Pith reviewed 2026-05-10 12:52 UTC · model grok-4.3
The pith
Fragata retrieves relevant HPC support tickets even when queries are phrased differently, contain typos, or are written in another language.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Fragata is a semantic ticket retrieval system that combines modern information retrieval techniques with the complete Request Tracker history from the Galician Supercomputing Center. The architecture delivers matches that remain effective across language differences, typos, and rephrased queries. It runs on local infrastructure with incremental updates that avoid service interruption and shifts the heaviest processing stages to the FinisTerrae III supercomputer. Early tests indicate a clear qualitative gain over the native search engine built into Request Tracker.
What carries the argument
The hybrid RAG pipeline that merges semantic embeddings with traditional retrieval methods to index and query the full ticket corpus.
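The paper does not publish its fusion code, but the hybrid idea it describes is commonly implemented with reciprocal rank fusion (RRF), the method of the cited Cormack et al. (2009) reference. A minimal sketch, with illustrative ticket IDs and rankings that are assumptions, not data from the paper:

```python
# Hypothetical sketch: fusing a lexical (BM25-style) ranking with a dense
# (embedding-similarity) ranking via reciprocal rank fusion (RRF).
# Ticket IDs and both ranked lists are illustrative only.

def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked lists of ticket IDs into one fused ranking.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    with 1-based ranks; k=60 is the constant from Cormack et al. (2009).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["RT-102", "RT-417", "RT-033"]   # keyword-match order
dense   = ["RT-417", "RT-033", "RT-981"]   # embedding-similarity order
fused = reciprocal_rank_fusion([lexical, dense])
print(fused)  # "RT-417" ranks first: it is near the top of both lists
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the lexical and dense retrievers, which is one reason it is a common default for hybrid pipelines.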
If this is right
- Support staff can locate historical solutions using natural phrasing instead of precise keywords.
- Twenty years of resolved incidents become reusable even when new queries contain errors or use different languages.
- The system stays current through incremental updates that do not interrupt ongoing operations.
- Heavy computation is offloaded to supercomputing resources to keep response times practical.
Where Pith is reading between the lines
- The same hybrid approach could be adapted to other long-running IT support databases that accumulate similar volumes of incident data.
- Pairing the retrieval step with generative models might allow automatic drafting of responses based on the most relevant past tickets.
- Adding quantitative relevance scores and error analysis would enable direct comparisons against alternative search techniques.
Load-bearing premise
That the hybrid retrieval pipeline will deliver consistent useful matches on real ticket data without requiring detailed quantitative metrics or comparisons to other retrieval methods.
What would settle it
A side-by-side test in which support staff judge the relevance of tickets returned by Fragata against those from native Request Tracker search on the same set of real-world queries; no measurable advantage for Fragata would refute the claim.
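Such a study is straightforward to score. A minimal sketch, assuming per-query preference judgments (system labels and counts are hypothetical, not from the paper):

```python
# Hypothetical scoring of a side-by-side relevance study: for each query,
# a support-staff judge records which system's result list was more useful.
# The judgment labels below are illustrative only.

def win_rate(judgments, system):
    """Fraction of non-tied queries where `system` was preferred."""
    decided = [j for j in judgments if j != "tie"]
    if not decided:
        return 0.0
    return sum(1 for j in decided if j == system) / len(decided)

judgments = ["fragata", "fragata", "rt", "tie", "fragata"]
print(win_rate(judgments, "fragata"))  # 0.75 -> preferred on 3 of 4 decided queries
```

A win rate indistinguishable from 0.5 on a sufficiently large query set would be the "no measurable advantage" outcome described above.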
Original abstract
The technical support team of a supercomputing centre accumulates, over the course of decades, a large volume of resolved incidents that constitute critical operational knowledge. At the Galician Supercomputing Center (CESGA) this history has been managed for over twenty years with Request Tracker (RT), whose built-in search engine has significant limitations that hinder knowledge reuse by the support staff. This paper presents Fragata, a semantic ticket search system that combines modern information retrieval techniques with the full RT history. The system can find relevant past incidents regardless of language, the presence of typos, or the specific wording of the query. The architecture is deployed on CESGA's infrastructure, supports incremental updates without service interruption, and offloads the most expensive stages to the FinisTerrae III supercomputer. Preliminary results show a substantial qualitative improvement over RT's native search.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes FRAGATA, a hybrid RAG-based semantic retrieval system deployed over 20 years of Request Tracker (RT) support tickets at the Galician Supercomputing Center (CESGA). It combines embeddings, reranking, and other modern IR techniques to enable retrieval of relevant past incidents irrespective of language, typos, or query wording. The system is integrated with existing RT infrastructure, supports incremental updates without downtime, and offloads compute to the FinisTerrae III supercomputer. Preliminary qualitative results are presented as showing substantial improvement over RT's native search.
Significance. If the robustness claims were supported by rigorous evaluation, the work would offer practical value for knowledge reuse in HPC support environments by improving access to long-term incident history. The engineering focus on production deployment, incremental indexing, and hybrid integration with legacy RT is a strength for applied IR. However, the current reliance on unquantified preliminary results limits the assessed significance and generalizability of the contribution.
major comments (2)
- [Abstract] Abstract: The central claims that the system retrieves relevant tickets 'regardless of language, the presence of typos, or the specific wording of the query' and delivers 'substantial qualitative improvement' over RT native search rest exclusively on preliminary qualitative results. No quantitative metrics (e.g., precision, recall, nDCG), evaluation protocol, test-set construction, or head-to-head baselines (RT search, BM25, dense retrieval) are reported, preventing verification of the robustness assertion on real ticket data.
- [Evaluation] Evaluation section (or equivalent): There is no error analysis, discussion of failure modes for HPC-specific elements (domain terminology, code snippets, non-English tickets), or comparison against alternative retrieval methods. This absence makes the generalization claims load-bearing but unsupported, requiring a dedicated quantitative evaluation with reproducible metrics before the improvement can be assessed.
minor comments (2)
- [System Architecture] The manuscript would benefit from an architecture diagram illustrating the hybrid RAG pipeline stages, data flow, and offloading to FinisTerrae III to improve clarity for readers.
- [Related Work] Consider adding references to standard RAG and semantic retrieval benchmarks or prior work on ticket retrieval systems for better context.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which correctly identifies the need for stronger empirical support. We will perform a major revision to address both points.
Point-by-point responses
-
Referee: [Abstract] Abstract: The central claims that the system retrieves relevant tickets 'regardless of language, the presence of typos, or the specific wording of the query' and delivers 'substantial qualitative improvement' over RT native search rest exclusively on preliminary qualitative results. No quantitative metrics (e.g., precision, recall, nDCG), evaluation protocol, test-set construction, or head-to-head baselines (RT search, BM25, dense retrieval) are reported, preventing verification of the robustness assertion on real ticket data.
Authors: We agree that the abstract phrasing is too strong given the current evidence. In the revised manuscript we will rewrite the abstract to state that the system 'demonstrates promising qualitative improvements in initial tests' and explicitly note that a full quantitative evaluation with metrics and baselines is presented in the new Evaluation section. revision: yes
-
Referee: [Evaluation] Evaluation section (or equivalent): There is no error analysis, discussion of failure modes for HPC-specific elements (domain terminology, code snippets, non-English tickets), or comparison against alternative retrieval methods. This absence makes the generalization claims load-bearing but unsupported, requiring a dedicated quantitative evaluation with reproducible metrics before the improvement can be assessed.
Authors: We accept this criticism. The original submission emphasized system design and deployment over evaluation. For the revision we will add a dedicated Evaluation section containing: (i) a reproducible test-set construction protocol using real CESGA tickets, (ii) quantitative metrics (precision@K, recall, nDCG) on that set, (iii) head-to-head comparisons against RT native search, BM25, and dense-only retrieval, and (iv) error analysis covering domain terminology, code snippets, and non-English content. We will also discuss limitations of the evaluation. revision: yes
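The metrics the rebuttal promises are standard and easy to pin down. A minimal sketch of precision@K and nDCG@K from binary relevance labels; the example labels are illustrative, not measurements from the system:

```python
import math

# Hypothetical sketch of the metrics listed in the rebuttal (precision@K,
# nDCG@K), computed from binary relevance labels for one query's results.

def precision_at_k(relevances, k):
    """Fraction of the top-k retrieved tickets that are relevant (0/1 labels)."""
    return sum(relevances[:k]) / k

def ndcg_at_k(relevances, k):
    """Normalized discounted cumulative gain over the top-k results."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Relevance of the top 5 tickets returned for one query (1 = relevant).
rels = [1, 0, 1, 1, 0]
print(precision_at_k(rels, 5))        # 0.6
print(round(ndcg_at_k(rels, 5), 3))   # 0.906
```

Averaging these per-query values over a labeled test set would yield exactly the head-to-head numbers (Fragata vs. RT native search vs. BM25 vs. dense-only) that the referee asks for.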
Circularity Check
No circularity: system description with no derivations or self-referential predictions
Full rationale
The paper is a system deployment report describing FRAGATA, a hybrid RAG pipeline over RT ticket history. It makes qualitative claims about robustness to language/typos/wording and improvement over native search, but presents no equations, fitted parameters, predictions, ansatzes, or uniqueness theorems. No load-bearing self-citations or reductions of outputs to inputs by construction appear in the provided text. The work is self-contained as an engineering description evaluated via preliminary qualitative results, with no mathematical derivation chain to inspect.
Reference graph
Works this paper leans on
- [1] Best Practical Solutions, "Request Tracker (RT)," https://bestpractical.com/request-tracker
- [2] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela, "Retrieval-augmented generation for knowledge-intensive NLP tasks," in Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [3] Stephen Robertson and Hugo Zaragoza, "The probabilistic relevance framework: BM25 and beyond," Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009.
- [4] Rodrigo Nogueira and Kyunghyun Cho, "Passage re-ranking with BERT," arXiv preprint arXiv:1901.04085, 2019.
- [5] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
- [6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of NAACL-HLT, 2019.
- [7] Nils Reimers and Iryna Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," in Proceedings of EMNLP, 2019.
- [8] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih, "Dense passage retrieval for open-domain question answering," in Proceedings of EMNLP, 2020.
- [9] Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and Rodrigo Nogueira, "Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations," in Proceedings of SIGIR, 2021.
- [10] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang, "Retrieval-augmented generation for large language models: A survey," arXiv preprint arXiv:2312.10997, 2024.
- [11] Rahul Potharaju, Navendu Jain, and Cristina Nita-Rotaru, "Juggling the jigsaw: Towards automated problem inference from network trouble tickets," in Proceedings of NSDI, 2013.
- [12] Wenjun Zhou, Wei Xue, Ramesh Baral, Hongyuan Zha, and Robert Welch, "Smart ticket routing by multi-criteria learning," in Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2016.
- [13] Jeff Johnson, Matthijs Douze, and Hervé Jégou, "Billion-scale similarity search with GPUs," IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2021.
- [14] Luiz Henrique Bonifácio, Hugo Abonizio, Marzieh Fadaee, and Rodrigo Nogueira, "mMARCO: A multilingual version of the MS MARCO passage ranking dataset," arXiv preprint arXiv:2108.13897, 2022.
- [15] Gordon V. Cormack, Charles L. A. Clarke, and Stefan Büttcher, "Reciprocal rank fusion outperforms Condorcet and individual rank learning methods," in Proceedings of SIGIR, 2009.