Pith · machine review for the scientific record

arxiv: 2604.21125 · v1 · submitted 2026-04-22 · 💻 cs.DC

Recognition: unknown

A Cloud-Native Architecture for Human-in-Control LLM-Assisted OpenSearch in Investigative Settings

Benjamin Puhani, Kai Brehmer, Malte Prieß


Pith reviewed 2026-05-09 22:41 UTC · model grok-4.3

classification 💻 cs.DC
keywords: cloud-native architecture · large language models · human-in-control · OpenSearch · investigative search · hybrid retrieval · microservices · natural language to DSL

The pith

A cloud-native system uses large language models to turn natural-language investigative queries into precise OpenSearch commands under continuous human oversight.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a design for a cloud-native microservice architecture that incorporates large language models to bridge the gap between everyday investigative language and the technical syntax required for searching large unstructured evidence collections. Users express their search needs in plain language, after which the LLM generates corresponding OpenSearch Domain-Specific Language expressions that a human operator must review and approve before execution. The architecture combines this controlled translation step with a hybrid retrieval method inside OpenSearch that merges traditional keyword-based ranking with vector embeddings for semantic similarity. A working prototype demonstrates technical feasibility using the Enron Email Dataset as a stand-in for restricted investigative material, while outlining a path for later rigorous testing. The overall goal is to enable scalable, secure deployments in private-cloud environments where data sensitivity and audit requirements are high.
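The hybrid retrieval step described above can be pictured as a single OpenSearch request body carrying both a BM25 clause and a k-NN clause. A minimal sketch, assuming OpenSearch's `hybrid` query type and hypothetical field names (`body`, `body_embedding`) — the paper does not specify its index mapping:

```python
import json

def build_hybrid_query(user_text: str, embedding: list, k: int = 10) -> dict:
    """Combine BM25 lexical matching with approximate k-NN vector
    similarity in one OpenSearch request body (illustrative fields)."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical leg: classic BM25 scoring over the message body.
                    {"match": {"body": {"query": user_text}}},
                    # Semantic leg: nearest-neighbour search over a dense
                    # embedding of the same text.
                    {"knn": {"body_embedding": {"vector": embedding, "k": k}}},
                ]
            }
        },
    }

q = build_hybrid_query("payments routed through shell companies", [0.1] * 384)
print(json.dumps(q, indent=2))
```

In a live cluster the `hybrid` query is paired with a search pipeline that normalizes and merges the two score distributions; that configuration is omitted here.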

Core claim

The proposed system integrates Large Language Models into a Human-in-Control workflow that translates natural-language queries into syntactically valid OpenSearch Domain-Specific Language expressions, supported by a hybrid BM25-plus-vector retrieval strategy inside OpenSearch and implemented as a cloud-native microservice architecture suitable for private-cloud investigative deployments.

What carries the argument

The Human-in-Control workflow, in which an LLM proposes but a human reviews and validates OpenSearch DSL queries before they are executed against the evidence index.
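A minimal sketch of that gate, assuming a simple propose → review → execute flow; the `ProposedQuery` shape and the `search_client` interface are illustrative, not details taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class ProposedQuery:
    natural_language: str   # the investigator's original request
    dsl: dict               # OpenSearch DSL proposed by the LLM
    approved: bool = False  # flipped only by a human operator

def review(proposal: ProposedQuery, operator_approves: bool) -> ProposedQuery:
    """Record the human decision; the LLM never sets this flag itself."""
    proposal.approved = operator_approves
    return proposal

def execute(proposal: ProposedQuery, search_client):
    """Run the query against the evidence index only after approval."""
    if not proposal.approved:
        raise PermissionError("query not approved by a human operator")
    return search_client.search(body=proposal.dsl)
```

The design point the paper leans on is that `execute` is the only path to the index, so every search carries an explicit human decision.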

If this is right

  • Investigators without deep query-language expertise can explore large evidence sets more quickly while retaining final authority over every search.
  • The microservice design separates LLM inference from the search engine, allowing independent scaling and security controls in private clouds.
  • Hybrid lexical-plus-vector retrieval can surface both exact matches and conceptually related items within the same result list.
  • The outlined evaluation methodology using a public proxy dataset provides a reproducible baseline for measuring later performance on actual restricted corpora.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar controlled-translation patterns could be applied to other specialized query languages beyond OpenSearch.
  • The architecture may reduce the cognitive load on analysts who currently must translate investigative intent into technical syntax manually.
  • Extending the LLM prompt engineering to include domain-specific investigative terminology could improve translation accuracy in real cases.
  • Deployment in actual law-enforcement settings would require additional audit logging and access-control layers not detailed in the prototype.

Load-bearing premise

Large language models can reliably generate syntactically correct and semantically useful OpenSearch queries that human operators can effectively oversee and refine.
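One way to operationalise that premise is a cheap syntactic pre-check on each generated query before it reaches the human reviewer, so the operator only ever sees well-formed candidates. A minimal sketch; the clause whitelist is an assumption for illustration, and a real deployment could additionally ask the cluster itself to validate the query:

```python
# Top-level query clauses we accept from the LLM (illustrative subset).
KNOWN_CLAUSES = {"match", "match_all", "term", "range", "bool", "knn", "hybrid"}

def is_plausible_dsl(candidate) -> bool:
    """Structural sanity check on an LLM-generated OpenSearch request body.
    Passing is necessary, not sufficient: it filters malformed output
    early, before a human spends attention on it."""
    if not isinstance(candidate, dict):
        return False
    query = candidate.get("query")
    if not isinstance(query, dict) or len(query) != 1:
        return False
    return next(iter(query)) in KNOWN_CLAUSES
```

Semantic usefulness — whether the query actually expresses the investigator's intent — still requires the human review step; no structural check can supply that.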

What would settle it

A controlled test in which the prototype processes a set of realistic investigative queries: the premise fails if the LLM outputs contain invalid syntax, or miss key relevant documents, in more than a small fraction of cases even after human review.
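The syntactic half of such a test reduces to one metric: the fraction of translated queries that survive validation. A sketch, assuming stand-in `translate` and `validate` callables (measuring missed relevant documents would need a second, recall-style metric over a labelled set):

```python
from typing import Callable, Iterable

def validity_rate(nl_queries: Iterable[str],
                  translate: Callable[[str], dict],
                  validate: Callable[[dict], bool]) -> float:
    """Fraction of natural-language queries whose LLM translation passes
    validation. Where the refutation threshold sits ("a small fraction")
    is a policy choice, not something this function decides."""
    queries = list(nl_queries)
    if not queries:
        return 0.0
    passed = sum(1 for q in queries if validate(translate(q)))
    return passed / len(queries)
```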

Figures

Figures reproduced from arXiv: 2604.21125 by Benjamin Puhani, Kai Brehmer, Malte Prieß.

Figure 1: Schematic representation of the modular architecture and data flow.
Original abstract

Complex criminal investigations are often hindered by large volumes of unstructured evidence and by the semantic gap between natural language investigative intent and technical search logic. To address this challenge, we present a design and feasibility study of a cloud-native microservice architecture tailored to private-cloud deployments, contributing to research in secure cloud computing and leveraging modern cloud paradigms under high security and scalability requirements. The proposed system integrates Large Language Models into a "Human-in-Control" workflow that translates natural-language queries into syntactically valid OpenSearch Domain-Specific Language expressions. We describe the implementation of a hybrid retrieval strategy within OpenSearch that combines BM25-based lexical search with nested semantic vector embeddings. The paper focuses on system design and preliminary functional validation, establishing an architectural baseline for future empirical evaluation. Technical feasibility is demonstrated through a functional prototype, and a rigorous evaluation methodology is outlined using the Enron Email Dataset as a structural proxy for restricted investigative corpora.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard domain assumptions about LLM reliability under human supervision and the benefits of hybrid retrieval; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption: LLMs can be guided by prompts and human review to produce syntactically valid OpenSearch DSL queries
    Central to the human-in-control workflow described in the abstract.

pith-pipeline@v0.9.0 · 5460 in / 1171 out tokens · 133854 ms · 2026-05-09T22:41:15.672133+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 9 canonical work pages · 1 internal anchor

  1. M. Skipanes, G. Demartini, K. Franke, and A. B. Nissen, "Information analysis in criminal investigations: Methods, challenges, and computational opportunities processing unstructured text," Policing: A Journal of Policy and Practice, vol. 19, paaf005, Mar. 2025, ISSN: 1752-4520. DOI: 10.1093/police/paaf005

  2. OpenSearch Project, OpenSearch, version 3.4, https://opensearch.org/ [visited: 2026-03-13], The Linux Foundation, 2025

  3. B. Klimt and Y. Yang, "The Enron corpus: A new dataset for email classification research," in Machine Learning: ECML 2004, J.-F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 217–226, ISBN: 978-3-540-30115-8. DOI: 10.1007/978-3-540-30115-8_22

  4. Y. Zhu et al., "Large language models for information retrieval: A survey," ACM Transactions on Information Systems, vol. 44, no. 1, pp. 1–54, 2026, ISSN: 1046-8188. DOI: 10.1145/3748304

  5. L. Shi, Z. Tang, N. Zhang, X. Zhang, and Z. Yang, "A survey on employing large language models for text-to-SQL tasks," ACM Computing Surveys, vol. 58, no. 2, pp. 1–37, 2026, ISSN: 0360-0300. DOI: 10.1145/3737873

  6. Y. Gao et al., Retrieval-augmented generation for large language models: A survey, 2024. DOI: 10.48550/arXiv.2312.10997

  7. N. F. Liu et al., "Lost in the middle: How language models use long contexts," Transactions of the Association for Computational Linguistics, vol. 12, pp. 157–173, 2024. DOI: 10.1162/tacl_a_00638

  8. J. Wei et al., "Chain-of-thought prompting elicits reasoning in large language models," in Advances in Neural Information Processing Systems, S. Koyejo et al., Eds., vol. 35, Curran Associates, Inc., 2022, pp. 24824–24837

  9. Langflow AI, Langflow, version 1.6.8, https://github.com/langflow-ai/langflow [visited: 2026-03-13], 2025

  10. N. Reimers and I. Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. DOI: 10.48550/arXiv.1908.10084

  11. C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008, ISBN: 9780521865715. DOI: 10.1017/CBO9780511809071

  12. S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford, "Okapi at TREC-3," in Overview of the Third Text REtrieval Conference (TREC-3), NIST, 1995, pp. 109–126

  13. Y. A. Malkov and D. A. Yashunin, "Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 824–836, 2020. DOI: 10.1109/TPAMI.2018.2889473

  14. P. Lewis et al., "Retrieval-Augmented Generation for knowledge-intensive NLP tasks," in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., 2020, pp. 9459–9474