Pith · machine review for the scientific record

arxiv: 2604.21125 · v1 · submitted 2026-04-22 · 💻 cs.DC

Recognition: unknown

A Cloud-Native Architecture for Human-in-Control LLM-Assisted OpenSearch in Investigative Settings

Benjamin Puhani, Kai Brehmer, Malte Prieß


Pith reviewed 2026-05-09 22:41 UTC · model grok-4.3

classification 💻 cs.DC
keywords: cloud-native architecture · large language models · human-in-control · OpenSearch · investigative search · hybrid retrieval · microservices · natural language to DSL

The pith

A cloud-native system uses large language models to turn natural-language investigative queries into precise OpenSearch commands under continuous human oversight.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a design for a cloud-native microservice architecture that incorporates large language models to bridge the gap between everyday investigative language and the technical syntax required for searching large unstructured evidence collections. Users express their search needs in plain language, after which the LLM generates corresponding OpenSearch Domain-Specific Language expressions that a human operator must review and approve before execution. The architecture combines this controlled translation step with a hybrid retrieval method inside OpenSearch that merges traditional keyword-based ranking with vector embeddings for semantic similarity. A working prototype demonstrates technical feasibility using the Enron Email Dataset as a stand-in for restricted investigative material, while outlining a path for later rigorous testing. The overall goal is to enable scalable, secure deployments in private-cloud environments where data sensitivity and audit requirements are high.
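The hybrid retrieval step described above can be pictured as a single OpenSearch request body carrying both a BM25 clause and a k-NN clause. A minimal sketch, assuming OpenSearch's `hybrid` query type and hypothetical field names (`body`, `body_embedding`) — the paper does not specify its index mapping:

```python
import json

def build_hybrid_query(user_text: str, embedding: list, k: int = 10) -> dict:
    """Combine BM25 lexical matching with approximate k-NN vector
    similarity in one OpenSearch request body (illustrative fields)."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical leg: classic BM25 scoring over the message body.
                    {"match": {"body": {"query": user_text}}},
                    # Semantic leg: nearest-neighbour search over a dense
                    # embedding of the same text.
                    {"knn": {"body_embedding": {"vector": embedding, "k": k}}},
                ]
            }
        },
    }

q = build_hybrid_query("payments routed through shell companies", [0.1] * 384)
print(json.dumps(q, indent=2))
```

In a live cluster the `hybrid` query is paired with a search pipeline that normalizes and merges the two score distributions; that configuration is omitted here.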

Core claim

The proposed system integrates Large Language Models into a Human-in-Control workflow that translates natural-language queries into syntactically valid OpenSearch Domain-Specific Language expressions, supported by a hybrid BM25-plus-vector retrieval strategy inside OpenSearch and implemented as a cloud-native microservice architecture suitable for private-cloud investigative deployments.

What carries the argument

The Human-in-Control workflow, in which an LLM proposes but a human reviews and validates OpenSearch DSL queries before they are executed against the evidence index.
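A minimal sketch of that gate, assuming a simple propose → review → execute flow; the `ProposedQuery` shape and the `search_client` interface are illustrative, not details taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class ProposedQuery:
    natural_language: str   # the investigator's original request
    dsl: dict               # OpenSearch DSL proposed by the LLM
    approved: bool = False  # flipped only by a human operator

def review(proposal: ProposedQuery, operator_approves: bool) -> ProposedQuery:
    """Record the human decision; the LLM never sets this flag itself."""
    proposal.approved = operator_approves
    return proposal

def execute(proposal: ProposedQuery, search_client):
    """Run the query against the evidence index only after approval."""
    if not proposal.approved:
        raise PermissionError("query not approved by a human operator")
    return search_client.search(body=proposal.dsl)
```

The design point the paper leans on is that `execute` is the only path to the index, so every search carries an explicit human decision.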

If this is right

  • Investigators without deep query-language expertise can explore large evidence sets more quickly while retaining final authority over every search.
  • The microservice design separates LLM inference from the search engine, allowing independent scaling and security controls in private clouds.
  • Hybrid lexical-plus-vector retrieval can surface both exact matches and conceptually related items within the same result list.
  • The outlined evaluation methodology using a public proxy dataset provides a reproducible baseline for measuring later performance on actual restricted corpora.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar controlled-translation patterns could be applied to other specialized query languages beyond OpenSearch.
  • The architecture may reduce the cognitive load on analysts who currently must translate investigative intent into technical syntax manually.
  • Extending the LLM prompt engineering to include domain-specific investigative terminology could improve translation accuracy in real cases.
  • Deployment in actual law-enforcement settings would require additional audit logging and access-control layers not detailed in the prototype.

Load-bearing premise

Large language models can reliably generate syntactically correct and semantically useful OpenSearch queries that human operators can effectively oversee and refine.
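One way to operationalise that premise is a cheap syntactic pre-check on each generated query before it reaches the human reviewer, so the operator only ever sees well-formed candidates. A minimal sketch; the clause whitelist is an assumption for illustration, and a real deployment could additionally ask the cluster itself to validate the query:

```python
# Top-level query clauses we accept from the LLM (illustrative subset).
KNOWN_CLAUSES = {"match", "match_all", "term", "range", "bool", "knn", "hybrid"}

def is_plausible_dsl(candidate) -> bool:
    """Structural sanity check on an LLM-generated OpenSearch request body.
    Passing is necessary, not sufficient: it filters malformed output
    early, before a human spends attention on it."""
    if not isinstance(candidate, dict):
        return False
    query = candidate.get("query")
    if not isinstance(query, dict) or len(query) != 1:
        return False
    return next(iter(query)) in KNOWN_CLAUSES
```

Semantic usefulness — whether the query actually expresses the investigator's intent — still requires the human review step; no structural check can supply that.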

What would settle it

A controlled test in which the prototype processes a set of realistic investigative queries: the premise fails if the LLM outputs contain invalid syntax, or miss key relevant documents, in more than a small fraction of cases even after human review.
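The syntactic half of such a test reduces to one metric: the fraction of translated queries that survive validation. A sketch, assuming stand-in `translate` and `validate` callables (measuring missed relevant documents would need a second, recall-style metric over a labelled set):

```python
from typing import Callable, Iterable

def validity_rate(nl_queries: Iterable[str],
                  translate: Callable[[str], dict],
                  validate: Callable[[dict], bool]) -> float:
    """Fraction of natural-language queries whose LLM translation passes
    validation. Where the refutation threshold sits ("a small fraction")
    is a policy choice, not something this function decides."""
    queries = list(nl_queries)
    if not queries:
        return 0.0
    passed = sum(1 for q in queries if validate(translate(q)))
    return passed / len(queries)
```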

Figures

Figures reproduced from arXiv: 2604.21125 by Benjamin Puhani, Kai Brehmer, Malte Prieß.

Figure 1: Schematic representation of the modular architecture and data flow.
Original abstract

Complex criminal investigations are often hindered by large volumes of unstructured evidence and by the semantic gap between natural language investigative intent and technical search logic. To address this challenge, we present a design and feasibility study of a cloud-native microservice architecture tailored to private-cloud deployments, contributing to research in secure cloud computing and leveraging modern cloud paradigms under high security and scalability requirements. The proposed system integrates Large Language Models into a "Human-in-Control" workflow that translates natural-language queries into syntactically valid OpenSearch Domain-Specific Language expressions. We describe the implementation of a hybrid retrieval strategy within OpenSearch that combines BM25-based lexical search with nested semantic vector embeddings. The paper focuses on system design and preliminary functional validation, establishing an architectural baseline for future empirical evaluation. Technical feasibility is demonstrated through a functional prototype, and a rigorous evaluation methodology is outlined using the Enron Email Dataset as a structural proxy for restricted investigative corpora.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard domain assumptions about LLM reliability under human supervision and the benefits of hybrid retrieval; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption: LLMs can be guided by prompts and human review to produce syntactically valid OpenSearch DSL queries
    Central to the human-in-control workflow described in the abstract.

pith-pipeline@v0.9.0 · 5460 in / 1171 out tokens · 133854 ms · 2026-05-09T22:41:15.672133+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 9 canonical work pages · 1 internal anchor

  1. M. Skipanes, G. Demartini, K. Franke, and A. B. Nissen, "Information analysis in criminal investigations: Methods, challenges, and computational opportunities processing unstructured text," Policing: A Journal of Policy and Practice, vol. 19, paaf005, Mar. 2025, ISSN: 1752-4520. DOI: 10.1093/police/paaf005

  2. OpenSearch Project, OpenSearch, version 3.4, https://opensearch.org/ [visited: 2026-03-13], The Linux Foundation, 2025

  3. B. Klimt and Y. Yang, "The Enron corpus: A new dataset for email classification research," in Machine Learning: ECML 2004, J.-F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 217–226, ISBN: 978-3-540-30115-8. DOI: 10.1007/978-3-540-30115-8_22

  4. Y. Zhu et al., "Large language models for information retrieval: A survey," ACM Transactions on Information Systems, vol. 44, no. 1, pp. 1–54, 2026, ISSN: 1046-8188. DOI: 10.1145/3748304

  5. L. Shi, Z. Tang, N. Zhang, X. Zhang, and Z. Yang, "A survey on employing large language models for text-to-SQL tasks," ACM Computing Surveys, vol. 58, no. 2, pp. 1–37, 2026, ISSN: 0360-0300. DOI: 10.1145/3737873

  6. Y. Gao et al., Retrieval-augmented generation for large language models: A survey, 2024. DOI: 10.48550/arXiv.2312.10997

  7. N. F. Liu et al., "Lost in the middle: How language models use long contexts," Transactions of the Association for Computational Linguistics, vol. 12, pp. 157–173, 2024. DOI: 10.1162/tacl_a_00638

  8. J. Wei et al., "Chain-of-thought prompting elicits reasoning in large language models," in Advances in Neural Information Processing Systems, S. Koyejo et al., Eds., vol. 35, Curran Associates, Inc., 2022, pp. 24824–24837

  9. Langflow AI, Langflow, version 1.6.8, https://github.com/langflow-ai/langflow [visited: 2026-03-13], 2025

  10. N. Reimers and I. Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2019. DOI: 10.48550/arXiv.1908.10084

  11. C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008, ISBN: 9780521865715. DOI: 10.1017/CBO9780511809071

  12. S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford, "Okapi at TREC-3," in Overview of the Third Text REtrieval Conference (TREC-3), NIST, 1995, pp. 109–126

  13. Y. A. Malkov and D. A. Yashunin, "Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 824–836, 2020. DOI: 10.1109/TPAMI.2018.2889473

  14. P. Lewis et al., "Retrieval-Augmented Generation for knowledge-intensive NLP tasks," in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., 2020, pp. 9459–9474