pith. machine review for the scientific record.

arxiv: 2604.13721 · v2 · submitted 2026-04-15 · 💻 cs.IR · cs.AI

Recognition: unknown

FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History

Jorge Fernández-Fabeiro, José Carlos Mouriño-Gallego, Nicolás Filloy-Montesino, Santiago Paramés-Estévez

Pith reviewed 2026-05-10 12:52 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords semantic retrieval · hybrid RAG · support tickets · HPC operations · knowledge reuse · Request Tracker · information retrieval · ticket search

The pith

Fragata retrieves relevant HPC support tickets even when queries use a different language or wording than past records, or contain typos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Fragata, a semantic search system that processes over twenty years of resolved incidents stored in Request Tracker at a supercomputing center. Traditional keyword search in the existing tool often fails to surface useful historical cases because staff phrase new queries differently from past records. Fragata applies a hybrid retrieval approach to locate matching tickets regardless of those variations, enabling better reuse of accumulated operational knowledge. A sympathetic reader would care because quicker access to similar past incidents can reduce the time needed to resolve new support requests in high-performance computing environments.

Core claim

Fragata is a semantic ticket retrieval system that combines modern information retrieval techniques with the complete Request Tracker history from the Galician Supercomputing Center. The architecture delivers matches that remain effective across language differences, typos, and rephrased queries. It runs on local infrastructure with incremental updates that avoid service interruption and shifts the heaviest processing stages to the FinisTerrae III supercomputer. Early tests indicate a clear qualitative gain over the native search engine built into Request Tracker.

What carries the argument

The hybrid RAG pipeline that merges semantic embeddings with traditional retrieval methods to index and query the full ticket corpus.
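
To make that concrete, here is a minimal sketch of such a hybrid retriever: BM25 for lexical matching, a multilingual sentence encoder for semantic matching, and reciprocal rank fusion (RRF) to merge the two rankings. BM25, Sentence-BERT-style encoders, and RRF all appear in the paper's reference list, but the libraries, model name, and toy tickets below are illustrative assumptions, not the authors' actual stack.

```python
# Hybrid lexical + semantic retrieval fused with reciprocal rank fusion.
# Libraries, model, and data are assumptions for illustration only.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

tickets = [
    "User cannot submit jobs to the SLURM queue after quota change",
    "InfiniBand link flapping on compute node c6601",
    "Python module load fails with a GLIBC version error",
]

# Lexical side: BM25 over whitespace-tokenized ticket text.
bm25 = BM25Okapi([t.lower().split() for t in tickets])

# Semantic side: multilingual sentence embeddings, so Galician, Spanish,
# and English phrasings of the same incident land near each other.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
ticket_vecs = encoder.encode(tickets, convert_to_tensor=True)

def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[int]:
    """Rank ticket indices by reciprocal rank fusion of both retrievers."""
    bm25_scores = bm25.get_scores(query.lower().split())
    bm25_rank = sorted(range(len(tickets)), key=lambda i: -bm25_scores[i])
    sims = util.cos_sim(encoder.encode(query, convert_to_tensor=True),
                        ticket_vecs)[0]
    dense_rank = sorted(range(len(tickets)), key=lambda i: -float(sims[i]))
    # RRF: score(d) = sum over rankers of 1 / (rrf_k + rank(d)).
    fused = {i: 0.0 for i in range(len(tickets))}
    for ranking in (bm25_rank, dense_rank):
        for pos, i in enumerate(ranking):
            fused[i] += 1.0 / (rrf_k + pos + 1)
    return sorted(fused, key=fused.get, reverse=True)[:k]

# A typo-ridden query still surfaces the SLURM ticket via the dense side.
print([tickets[i] for i in hybrid_search("cant submt slurm jobs")])
```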

If this is right

  • Support staff can locate historical solutions using natural phrasing instead of precise keywords.
  • Twenty years of resolved incidents become reusable even when new queries contain errors or use different languages.
  • The system stays current through incremental updates that do not interrupt ongoing operations (one plausible mechanism is sketched after this list).
  • Heavy computation is offloaded to supercomputing resources to keep response times practical.
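
The incremental-update property (third point above) is stated but not mechanized in the paper. Below is a minimal sketch of one plausible design, assuming a periodic pull of changed tickets from RT's database and an atomic swap of the index reference; fetch_changed_since and embed are hypothetical stand-ins.

```python
# One plausible zero-downtime update scheme: queries read from a live index
# reference while deltas are merged off to the side and swapped in atomically.
# The paper claims the property but does not describe its mechanism.
import threading
import time

class LiveIndex:
    def __init__(self) -> None:
        self._vectors: dict[int, list[float]] = {}   # ticket id -> embedding
        self._write_lock = threading.Lock()

    def snapshot(self) -> dict[int, list[float]]:
        return self._vectors           # readers always see a complete index

    def upsert(self, delta: dict[int, list[float]]) -> None:
        with self._write_lock:         # single writer; readers never block
            merged = {**self._vectors, **delta}
            self._vectors = merged     # atomic reference swap in CPython

def refresh_loop(index: LiveIndex, fetch_changed_since, embed,
                 interval_s: float = 300.0) -> None:
    """Poll RT for tickets changed since the last pass and upsert them."""
    watermark = 0.0
    while True:
        now = time.time()
        changed = fetch_changed_since(watermark)   # [(ticket_id, text), ...]
        if changed:
            index.upsert({tid: embed(text) for tid, text in changed})
        watermark = now
        time.sleep(interval_s)
```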

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same hybrid approach could be adapted to other long-running IT support databases that accumulate similar volumes of incident data.
  • Pairing the retrieval step with generative models might allow automatic drafting of responses based on the most relevant past tickets (sketched after this list).
  • Adding quantitative relevance scores and error analysis would enable direct comparisons against alternative search techniques.
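
The second extension is easy to sketch. This is purely illustrative, since the paper stops at retrieval: it reuses hybrid_search and tickets from the retrieval sketch above, and llm is a hypothetical callable wrapping whatever local generation model might be available.

```python
# Editorial extrapolation, not something the paper implements: feed the top
# retrieved tickets to a generative model to draft a candidate reply.
def draft_reply(query: str, llm) -> str:
    context = "\n\n".join(tickets[i] for i in hybrid_search(query, k=3))
    prompt = (
        "You are an HPC support assistant. Using only these resolved tickets "
        f"as context:\n\n{context}\n\n"
        f"Draft a concise reply to this new request: {query}"
    )
    return llm(prompt)
```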

Load-bearing premise

That the hybrid retrieval pipeline will deliver consistent useful matches on real ticket data without requiring detailed quantitative metrics or comparisons to other retrieval methods.

What would settle it

A side-by-side test in which support staff judge the relevance of tickets returned by Fragata against those from native Request Tracker search on the same set of real-world queries, with no measurable advantage for Fragata.
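
Were such a study run, the paired staff judgments could be summarized with a simple sign test; a small p-value would be the "measurable advantage" in question. A hedged sketch follows, with invented toy grades; nothing here comes from the paper.

```python
# Paired sign test over per-query relevance grades for Fragata vs. native RT
# search. The judgment data below is invented for illustration.
from math import comb

def sign_test(pairs: list[tuple[int, int]]) -> float:
    """Two-sided sign-test p-value for paired (fragata, rt) grades."""
    wins = sum(f > r for f, r in pairs)
    losses = sum(f < r for f, r in pairs)
    n = wins + losses                  # ties carry no information
    if n == 0:
        return 1.0
    extreme = max(wins, losses)
    tail = sum(comb(n, k) for k in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

judgments = [(2, 1), (3, 1), (1, 1), (2, 0), (3, 2)]   # toy 0-3 grades
print(f"sign test p = {sign_test(judgments):.3f}")
```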

Figures

Figures reproduced from arXiv: 2604.13721 by Jorge Fernández-Fabeiro, José Carlos Mouriño-Gallego, Nicolás Filloy-Montesino, Santiago Paramés-Estévez.

Figure 1. Ticket processing pipeline: from SQL extraction of the RT history to the generation of … (caption truncated; full figure at source)
Figure 2. Deployment topology: the virtual machine hosts the web service and the API, while expensive indexing stages are … (caption truncated; full figure at source)
Original abstract

The technical support team of a supercomputing centre accumulates, over the course of decades, a large volume of resolved incidents that constitute critical operational knowledge. At the Galician Supercomputing Center (CESGA) this history has been managed for over twenty years with Request Tracker (RT), whose built-in search engine has significant limitations that hinder knowledge reuse by the support staff. This paper presents Fragata, a semantic ticket search system that combines modern information retrieval techniques with the full RT history. The system can find relevant past incidents regardless of language, the presence of typos, or the specific wording of the query. The architecture is deployed on CESGA's infrastructure, supports incremental updates without service interruption, and offloads the most expensive stages to the FinisTerrae III supercomputer. Preliminary results show a substantial qualitative improvement over RT's native search.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes FRAGATA, a hybrid RAG-based semantic retrieval system deployed over 20 years of Request Tracker (RT) support tickets at the Galician Supercomputing Center (CESGA). It combines embeddings, reranking, and other modern IR techniques to enable retrieval of relevant past incidents irrespective of language, typos, or query wording. The system is integrated with existing RT infrastructure, supports incremental updates without downtime, and offloads compute to the FinisTerrae III supercomputer. Preliminary qualitative results are presented as showing substantial improvement over RT's native search.

Significance. If the robustness claims were supported by rigorous evaluation, the work would offer practical value for knowledge reuse in HPC support environments by improving access to long-term incident history. The engineering focus on production deployment, incremental indexing, and hybrid integration with legacy RT is a strength for applied IR. However, the current reliance on unquantified preliminary results limits the assessed significance and generalizability of the contribution.

major comments (2)
  1. [Abstract] The central claims that the system retrieves relevant tickets 'regardless of language, the presence of typos, or the specific wording of the query' and delivers 'substantial qualitative improvement' over RT's native search rest exclusively on preliminary qualitative results. No quantitative metrics (e.g., precision, recall, nDCG), evaluation protocol, test-set construction, or head-to-head baselines (RT search, BM25, dense retrieval) are reported, preventing verification of the robustness assertion on real ticket data.
  2. [Evaluation] There is no error analysis, no discussion of failure modes for HPC-specific elements (domain terminology, code snippets, non-English tickets), and no comparison against alternative retrieval methods. This absence leaves the generalization claims load-bearing but unsupported; a dedicated quantitative evaluation with reproducible metrics is needed before the claimed improvement can be assessed.
minor comments (2)
  1. [System Architecture] The manuscript would benefit from an architecture diagram illustrating the hybrid RAG pipeline stages, data flow, and offloading to FinisTerrae III to improve clarity for readers.
  2. [Related Work] Consider adding references to standard RAG and semantic retrieval benchmarks or prior work on ticket retrieval systems for better context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which correctly identifies the need for stronger empirical support. We will perform a major revision to address both points.

Point-by-point responses
  1. Referee: [Abstract] The central claims that the system retrieves relevant tickets 'regardless of language, the presence of typos, or the specific wording of the query' and delivers 'substantial qualitative improvement' over RT's native search rest exclusively on preliminary qualitative results. No quantitative metrics (e.g., precision, recall, nDCG), evaluation protocol, test-set construction, or head-to-head baselines (RT search, BM25, dense retrieval) are reported, preventing verification of the robustness assertion on real ticket data.

    Authors: We agree that the abstract phrasing is too strong given the current evidence. In the revised manuscript we will rewrite the abstract to state that the system 'demonstrates promising qualitative improvements in initial tests' and explicitly note that a full quantitative evaluation with metrics and baselines is presented in the new Evaluation section. revision: yes

  2. Referee: [Evaluation] There is no error analysis, no discussion of failure modes for HPC-specific elements (domain terminology, code snippets, non-English tickets), and no comparison against alternative retrieval methods. This absence leaves the generalization claims load-bearing but unsupported; a dedicated quantitative evaluation with reproducible metrics is needed before the claimed improvement can be assessed.

    Authors: We accept this criticism. The original submission emphasized system design and deployment over evaluation. For the revision we will add a dedicated Evaluation section containing: (i) a reproducible test-set construction protocol using real CESGA tickets, (ii) quantitative metrics (precision@K, recall, nDCG) on that set, (iii) head-to-head comparisons against RT native search, BM25, and dense-only retrieval, and (iv) error analysis covering domain terminology, code snippets, and non-English content. We will also discuss limitations of the evaluation. revision: yes
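
For concreteness, here are standard definitions of two metrics the rebuttal commits to, precision@K and nDCG@K. This is a generic sketch of textbook formulas, not code from the paper or its planned evaluation.

```python
# Standard ranked-retrieval metrics. `ranked` is a system's ordered list of
# ticket ids for one query; `grades` maps ticket id to a human relevance
# grade for that query; `relevant` is the set of tickets judged relevant.
import math

def precision_at_k(ranked: list[int], relevant: set[int], k: int) -> float:
    return sum(doc in relevant for doc in ranked[:k]) / k

def ndcg_at_k(ranked: list[int], grades: dict[int, int], k: int) -> float:
    dcg = sum(grades.get(doc, 0) / math.log2(pos + 2)
              for pos, doc in enumerate(ranked[:k]))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0
```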

Circularity Check

0 steps flagged

No circularity: system description with no derivations or self-referential predictions

Full rationale

The paper is a system deployment report describing FRAGATA, a hybrid RAG pipeline over RT ticket history. It makes qualitative claims about robustness to language/typos/wording and improvement over native search, but presents no equations, fitted parameters, predictions, ansatzes, or uniqueness theorems. No load-bearing self-citations or reductions of outputs to inputs by construction appear in the provided text. The work is self-contained as an engineering description evaluated via preliminary qualitative results, with no mathematical derivation chain to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the system builds on standard RAG components assumed to function as described.

pith-pipeline@v0.9.0 · 5474 in / 942 out tokens · 36666 ms · 2026-05-10T12:52:12.843422+00:00 · methodology


Reference graph

Works this paper leans on

15 extracted references · 3 canonical work pages · 2 internal anchors

  [1] Best Practical Solutions, "Request Tracker (RT)," https://bestpractical.com/request-tracker
  [2] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela, "Retrieval-augmented generation for knowledge-intensive NLP tasks," in Advances in Neural Information Processing Systems (NeurIPS), 2020.
  [3] Stephen Robertson and Hugo Zaragoza, "The probabilistic relevance framework: BM25 and beyond," Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009.
  [4] Rodrigo Nogueira and Kyunghyun Cho, "Passage re-ranking with BERT," arXiv preprint arXiv:1901.04085, 2019.
  [5] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
  [6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of NAACL-HLT, 2019.
  [7] Nils Reimers and Iryna Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," in Proceedings of EMNLP, 2019.
  [8] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih, "Dense passage retrieval for open-domain question answering," in Proceedings of EMNLP, 2020.
  [9] Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and Rodrigo Nogueira, "Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations," in Proceedings of SIGIR, 2021.
  [10] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang, "Retrieval-augmented generation for large language models: A survey," arXiv preprint arXiv:2312.10997, 2024.
  [11] Rahul Potharaju, Navendu Jain, and Cristina Nita-Rotaru, "Juggling the jigsaw: Towards automated problem inference from network trouble tickets," in Proceedings of NSDI, 2013.
  [12] Wenjun Zhou, Wei Xue, Ramesh Baral, Hongyuan Zha, and Robert Welch, "Smart ticket routing by multi-criteria learning," in Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), 2016.
  [13] Jeff Johnson, Matthijs Douze, and Hervé Jégou, "Billion-scale similarity search with GPUs," IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2021.
  [14] Luiz Henrique Bonifácio, Hugo Abonizio, Marzieh Fadaee, and Rodrigo Nogueira, "mMARCO: A multilingual version of the MS MARCO passage ranking dataset," arXiv preprint arXiv:2108.13897, 2022.
  [15] Gordon V. Cormack, Charles L. A. Clarke, and Stefan Büttcher, "Reciprocal rank fusion outperforms Condorcet and individual rank learning methods," in Proceedings of SIGIR, 2009.