TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data
Pith reviewed 2026-05-09 20:58 UTC · model grok-4.3
The pith
An agentic system with twelve domain tools turns mixed drilling reports and measurements into evidence-based answers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TADI formalizes agent behavior as sequential tool selection over a dual-store architecture and shows that this produces grounded analytical intelligence from heterogeneous wellsite data, with the Evidence Grounding Score serving as a compliance check based on measurements, attributed quotations, and required answer sections.
What carries the argument
Twelve domain-specialized tools orchestrated by iterative LLM function calling across a DuckDB structured store and a ChromaDB semantic store.
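As described, the orchestration is an iterative function-calling loop: the model picks a tool, observes its result, and repeats until it judges the evidence sufficient. A minimal sketch of that loop, with placeholder tools standing in for the DuckDB and ChromaDB stores (the tool names and the `select_tool` helper are illustrative stand-ins, not the paper's actual API):

```python
# Minimal sketch of iterative tool calling over a dual store.
# Tool names and select_tool() are illustrative placeholders.

def run_agent(question, tools, select_tool, max_steps=8):
    """Repeatedly pick a tool, call it, and accumulate evidence."""
    evidence = []
    for _ in range(max_steps):
        choice = select_tool(question, evidence)  # in TADI, an LLM function call
        if choice is None:  # model decides it has enough evidence
            break
        name, args = choice
        result = tools[name](**args)
        evidence.append((name, args, result))
    return evidence

# Toy stand-ins for the two stores:
tools = {
    "query_structured": lambda sql: f"rows for: {sql}",      # DuckDB-style table query
    "search_reports": lambda text: f"passages for: {text}",  # ChromaDB-style semantic search
}

def select_tool(question, evidence):
    # Deterministic stand-in for the LLM: one structured query, then one search.
    if not evidence:
        return ("query_structured", {"sql": "SELECT depth FROM ddr"})
    if len(evidence) == 1:
        return ("search_reports", {"text": question})
    return None

trace = run_agent("What caused the stuck pipe on well F-15?", tools, select_tool)
```

The point of the sketch is the control flow, not the tools: grounding comes from forcing every answer to be assembled from the accumulated `trace` rather than from the model's parametric memory.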
If this is right
- The system parses every daily drilling report XML file without errors and reconciles three incompatible well naming conventions automatically.
- It is backed by 95 automated tests and a 130-question stress taxonomy spanning six operational categories.
- Analytical quality stems primarily from the design of the twelve domain tools rather than from increasing the size of the underlying language model.
- The full implementation is reproducible from the public Volve dataset plus an API key.
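Reconciling three naming conventions presumably reduces to mapping each variant onto one canonical key before joining across stores. A minimal sketch, assuming Volve-style variants such as "NO 15/9-F-15", "15/9-F-15", and "F-15" (these specific conventions are guesses for illustration, not the paper's actual rules):

```python
import re

# Illustrative reconciliation of well-name variants to one canonical key.
# The three conventions shown are assumed Volve-style forms, not the
# paper's documented rules.

def canonical_well(name: str) -> str:
    """Map variants like 'NO 15/9-F-15', '15/9-F-15', or 'f-15' to 'F-15'."""
    name = name.strip().upper()
    name = re.sub(r"^NO\s+", "", name)  # drop country prefix
    name = re.sub(r"^15/9-", "", name)  # drop quadrant/block prefix
    return name

for variant in ["NO 15/9-F-15", "15/9-F-15", "f-15"]:
    assert canonical_well(variant) == "F-15"
```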
Where Pith is reading between the lines
- Similar tool-augmented setups could be adapted to other technical domains that combine numeric logs with free-text reports.
- Real-time extensions might allow the same orchestration to run against live streaming data feeds during active drilling.
- Explicit comparison runs against larger models on the same question set would provide a quantitative test of whether tool specialization truly dominates scale.
Load-bearing premise
The language model will consistently pick and chain the right tools for multi-step questions without adding ungrounded claims.
What would settle it
A new set of drilling queries where the system frequently selects wrong tools or produces answers lacking required measurements and report quotes would show the approach does not hold.
original abstract
We present TADI (Tool-Augmented Drilling Intelligence), an agentic AI system that transforms drilling operational data into evidence-based analytical intelligence. Applied to the Equinor Volve Field dataset, TADI integrates 1,759 daily drilling reports, selected WITSML real-time objects, 15,634 production records, formation tops, and perforations into a dual-store architecture: DuckDB for structured queries over 12 tables with 65,447 rows, and ChromaDB for semantic search over 36,709 embedded documents. Twelve domain-specialized tools, orchestrated by a large language model via iterative function calling, support multi-step evidence gathering that cross-references structured drilling measurements with daily report narratives. The system parses all 1,759 DDR XML files with zero errors, handles three incompatible well naming conventions, and is backed by 95 automated tests plus a 130-question stress-question taxonomy spanning six operational categories. We formalize the agent's behavior as a sequential tool-selection problem and propose the Evidence Grounding Score (EGS) as a simple grounding-compliance proxy based on measurements, attributed DDR quotations, and required answer sections. The complete 6,084-line, framework-free implementation is reproducible given the public Volve download and an API key, and the case studies and qualitative ablation analysis suggest that domain-specialized tool design, rather than model scale alone, is the primary driver of analytical quality in technical operations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents TADI, an agentic LLM-orchestrated system that integrates 1,759 daily drilling reports (DDRs), WITSML objects, production records, and other Volve Field data into a dual-store architecture (DuckDB for 12 structured tables and ChromaDB for semantic search). Twelve domain-specialized tools are called iteratively by the LLM to support multi-step evidence gathering that cross-references measurements with narrative text. The work reports zero XML parsing errors, a fully reproducible 6,084-line implementation, 95 automated tests, a 130-question stress taxonomy across six operational categories, and the Evidence Grounding Score (EGS) as a grounding-compliance proxy. Based on case studies and qualitative ablation analysis, the authors suggest that domain-specialized tool design, rather than model scale, is the primary driver of analytical quality.
Significance. If the qualitative findings hold, the manuscript offers a concrete, reproducible demonstration of agentic LLM systems applied to heterogeneous technical data in drilling operations. Strengths include explicit handling of naming/format incompatibilities, public code, and the introduction of EGS as a simple proxy metric. This could inform tool-augmented agent design in other engineering domains where structured and unstructured data must be combined without parameter fitting.
major comments (2)
- [Case Studies and Qualitative Ablation Analysis] The central suggestion that domain-specialized tool design is the primary driver rests on case studies and qualitative ablation, yet the manuscript reports no numerical EGS values, tool-selection success rates, or ablation deltas across the 130-question taxonomy. Without these, the magnitude and robustness of the claimed effect cannot be assessed from the provided evidence.
- [Tool Orchestration and Stress-Question Taxonomy] The system description assumes the LLM will consistently select and chain the twelve tools correctly without ungrounded content, but no quantitative evaluation of tool-calling accuracy or failure modes (e.g., over the stress-question taxonomy) is supplied. This leaves the weakest assumption untested in the reported results.
minor comments (2)
- [Abstract and Methods] The abstract states that the system 'parses all 1,759 DDR XML files with zero errors' and is 'backed by 95 automated tests,' but neither the methods nor results sections detail the coverage of those tests or the specific failure modes they address.
- [Methods] The definition of the Evidence Grounding Score (EGS) is described only at a high level as a 'simple grounding-compliance proxy based on measurements, attributed DDR quotations, and required answer sections.' A precise formula or pseudocode would improve reproducibility.
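One plausible reading of EGS, consistent with the abstract's description but with the weights, regex patterns, and section names invented here for illustration (none come from the paper), is an equally weighted fraction of passed grounding checks:

```python
import re

# Hypothetical sketch of an Evidence Grounding Score: the fraction of
# grounding checks an answer passes. Patterns and section names are
# assumptions, not the paper's definition.

REQUIRED_SECTIONS = ("Answer", "Evidence", "Sources")  # assumed names

def evidence_grounding_score(answer: str) -> float:
    checks = [
        bool(re.search(r"\d+(\.\d+)?\s*(m|ft|bar|psi)", answer)),  # has a measurement
        bool(re.search(r'"[^"]+"\s*\(DDR', answer)),               # attributed DDR quote
        all(s in answer for s in REQUIRED_SECTIONS),               # required sections
    ]
    return sum(checks) / len(checks)

sample = (
    "Answer: Losses began at 2345 m.\n"
    'Evidence: "mud losses observed" (DDR 2008-06-14).\n'
    "Sources: DDR archive."
)
```

Under this reading, `sample` scores 1.0 and an answer with no measurement, quote, or sections scores 0.0; a published formula or pseudocode would pin down which variant the authors actually compute.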
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation of minor revision. We address each major comment below and will strengthen the manuscript with additional quantitative evaluations as outlined.
point-by-point responses
Referee: [Case Studies and Qualitative Ablation Analysis] The central suggestion that domain-specialized tool design is the primary driver rests on case studies and qualitative ablation, yet the manuscript reports no numerical EGS values, tool-selection success rates, or ablation deltas across the 130-question taxonomy. Without these, the magnitude and robustness of the claimed effect cannot be assessed from the provided evidence.
Authors: We agree that the absence of numerical EGS values, tool-selection success rates, and ablation deltas limits the ability to quantify the effect size. The current manuscript presents only qualitative ablation and case studies to support the suggestion that tool design is the primary driver. In the revised version, we will compute and report EGS scores across the full 130-question taxonomy, include tool-selection success rates, and provide ablation deltas (e.g., performance with vs. without specific tools) to allow readers to assess the magnitude and robustness of the findings. revision: yes
Referee: [Tool Orchestration and Stress-Question Taxonomy] The system description assumes the LLM will consistently select and chain the twelve tools correctly without ungrounded content, but no quantitative evaluation of tool-calling accuracy or failure modes (e.g., over the stress-question taxonomy) is supplied. This leaves the weakest assumption untested in the reported results.
Authors: The manuscript does not currently include quantitative metrics on tool-calling accuracy or failure modes over the stress-question taxonomy. We acknowledge this leaves an important assumption untested in the reported results. In the revision, we will add a quantitative evaluation of tool-selection accuracy, including success rates and categorized failure modes (e.g., incorrect tool choice, chaining errors, or ungrounded outputs) evaluated against the 130-question taxonomy. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper describes a fully specified, reproducible implementation (6,084-line codebase, zero-error parsing of 1,759 public Volve DDR files, 95 tests, dual-store architecture with explicit incompatibility handling) whose central suggestion—that domain-specialized tool design drives quality—is drawn from case studies and qualitative ablation rather than any fitted parameter, self-defined metric, or load-bearing self-citation. The proposed EGS is introduced as an observable proxy based on measurements, quotations, and required sections, not derived from the system's own outputs or prior author results. No equations, uniqueness theorems, or ansatzes reduce the claims to their inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can reliably perform iterative tool selection and function calling for multi-step evidence gathering in technical domains.
invented entities (1)
- Evidence Grounding Score (EGS): no independent evidence
Reference graph
Works this paper leans on
- [1] Equinor. Volve field data set. Equinor Open Data, 2018. Multi-terabyte dataset from the Volve field, Norwegian North Sea, comprising approximately 40,000 files.
- [2] Eugenio Ferrigno, M. Rodriguez, and E. Davidsson. Revolutionizing drilling operations: Next-gen LLM-AI for real-time support in well construction control rooms. In SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, USA, 2024. Society of Petroleum Engineers. SPE-220798-MS.
- [3] Prateek Kumar and Sanjay Kathuria. Large language models (LLMs) for natural language processing (NLP) of oil and gas drilling data. In SPE Annual Technical Conference and Exhibition, San Antonio, Texas, USA, 2023. Society of Petroleum Engineers.
- [4] G. Bhatia, A. Yadav, D. Nanda, D. Goyal, S. Perumalla, A. Shinde, B. C. Jha, and D. Upreti. Digitization of daily drilling reports using LLMs. In SPE Middle East Oil, Gas and Geosciences Show, Manama, Bahrain, 2025. Society of Petroleum Engineers. SPE-227059-MS.
- [5] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023.
- [6] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- [7] Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- [8] Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- [9] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789, 2023.
- [10] Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, and Yongbin Li. API-Bank: A comprehensive benchmark for tool-augmented LLMs. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3102–3116, Singapore, 2023. Association for Computational Linguistics.
- [11] Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, and Yang Liu. StableToolBench: Towards stable large-scale benchmarking on tool learning of large language models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 11143–11156, Bangkok, Thailand, 2024. Association for Computational Linguistics.
- [13] Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, et al. Tool learning with foundation models. arXiv preprint arXiv:2304.08354, 2023.
- [14] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Ji-Rong Wen. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345, 2024.
- [15] Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O'Sullivan, and Hoang D. Nguyen. Multi-agent collaboration mechanisms: A survey of LLMs. arXiv preprint arXiv:2501.06322, 2025.
- [16] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680, 2024.
- [17] Maria Antoniak, Jeff Dalgliesh, Marc Verkruyse, and Jonathan Lo. Natural language processing techniques on oil and gas drilling data. In SPE Intelligent Energy International Conference and Exhibition, Aberdeen, Scotland, UK, 2016. Society of Petroleum Engineers. SPE-181015-MS.
- [18] Júlio Hoffimann, Youli Mao, Avinash Wesley, and Aimee Taylor. Sequence mining and pattern analysis in drilling reports with deep natural language processing. In SPE Annual Technical Conference and Exhibition, Dallas, Texas, USA, 2018. Society of Petroleum Engineers. SPE-191505-MS.
- [19] Michael Yi, Kamil Ceglinski, Pradeepkumar Ashok, Michael Behounek, Spencer White, Trey Peroyea, and Taylor Thetford. Applications of large language models in well construction planning and real-time operation. In IADC/SPE International Drilling Conference and Exhibition, Galveston, Texas, USA, 2024. Society of Petroleum Engineers. IADC/SPE-217700-MS.
- [20] Felix J. Pacis, Sergey Alyaev, Gilles Pelfrene, and Tomasz Wiktorski. Enhancing information retrieval in the drilling domain: Zero-shot learning with large language models for question-answering. In IADC/SPE International Drilling Conference and Exhibition, Galveston, Texas, USA, 2024. Society of Petroleum Engineers. IADC/SPE-217671-MS.
- [21] Liang Zhang, Felix James Pacis, Sergey Alyaev, and Tomasz Wiktorski. Cloud-free question answering from internal knowledge bases: Building an AI for drilling applications. First Break, 43(2):43–49, 2025.
- [22] Oluwatosin Ogundare, Srinath Madasu, and Nathanial Wiggins. Industrial engineering with large language models: A case study of ChatGPT's performance on oil & gas problems. arXiv preprint arXiv:2304.14354, 2023.
- [23] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 9459–9474, 2020.
- [24] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2023.
- [25] Shailja Gupta, Rajesh Ranjan, and Surya Narayan Singh. A comprehensive survey of retrieval-augmented generation (RAG): Evolution, current landscape and future directions. arXiv preprint arXiv:2410.12837, 2024.
- [26] Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. arXiv preprint arXiv:2310.11511, 2023.
- [27] Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, and Dhagash Mehta. HybridRAG: Integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction. arXiv preprint arXiv:2408.04948, 2024.
- [28] Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers. MTEB: Massive text embedding benchmark. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 2014–2037, Dubrovnik, Croatia, 2023.
- [29] Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, and Yuyu Luo. A survey of text-to-SQL in the era of LLMs: Where are we, and where are we going? arXiv preprint arXiv:2408.05109, 2024.
- [30] Andrzej T. Tunkiel, Tomasz Wiktorski, and Dan Sui. Drilling dataset exploration, processing and interpretation using Volve field data. In Proceedings of the ASME 2020 39th International Conference on Ocean, Offshore and Arctic Engineering (OMAE), volume 11, page V011T11A076, Virtual, Online, 2020. ASME.
- [31] Nikolay O. Nikitin, Ilia Revin, Alexander Hvatov, Pavel Vychuzhanin, and Anna V. Kalyuzhnaya. Hybrid and automated machine learning approaches for oil fields development: The case study of Volve field, North Sea. Computers & Geosciences, 161:105061, 2022.
- [32] Cuthbert Shang Wui Ng, Ashkan Jahanbani Ghahfarokhi, and Menad Nait Amar. Well production forecast in Volve field: Application of rigorous machine learning techniques and metaheuristic algorithm. Journal of Petroleum Science and Engineering, 208:109468, 2022.
- [33] Sankhajit Saha, Vikram Vishal, Bankim Mahanta, and Sarada Prasad Pradhan. Geomechanical model construction to resolve field stress profile and reservoir rock properties of Jurassic Hugin Formation, Volve field, North Sea. Geomechanics and Geophysics for Geo-Energy and Geo-Resources, 8(2):59, 2022.
- [34] Olalere Oloruntobi et al. Petrophysical property prediction from seismic inversion attributes using rock physics and machine learning: Volve field, North Sea. Applied Sciences, 14(4):1345, 2024.
- [35] Energistics. WITSML data standards. Energistics Consortium, 2011. Version 1.4.1.1. Wellsite Information Transfer Standard Markup Language.
- [36] Suranga C. H. Geekiyanage, Andrzej T. Tunkiel, and Dan Sui. Drilling data quality improvement and information extraction with case studies. Journal of Petroleum Exploration and Production Technology, 11:819–837, 2021.
- [37] Mark Raasveldt and Hannes Mühleisen. DuckDB: An embeddable analytical database. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD), pages 1981–1984, Amsterdam, Netherlands, 2019.
- [38] Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, et al. The prompt report: A systematic survey of prompting techniques. arXiv preprint arXiv:2406.06608, 2024.
- [39] Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, and Aman Chadha. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927, 2024.
- [40] Saibo Geng et al. JSONSchemaBench: A rigorous benchmark of structured outputs for language models. arXiv preprint arXiv:2501.10868, 2025.
- [41] Botao Lin, Yan Jin, Qianwen Cao, Han Meng, Huiwen Pang, and Shiming Wei. Developing a large language model for oil- and gas-related rock mechanics: Progress and challenges. Natural Gas Industry B, 12(2):110–122, 2025.
- [42] Edwin Benito Mitacc Meza, Dalton Garcia Borges de Souza, Alessandro Copetti, Ana Paula Barbosa Sobral, Guido Vaz Silva, Iara Tammela, and Rodolfo Cardoso. Tools, technologies and frameworks for digital twins in the oil and gas industry: An in-depth analysis. Sensors, 24(19):6457, 2024.