From Task-Guided Conversational Graphs to Goal-Oriented Dialogue Runtimes

Mariano Garralda-Barrio

arXiv: 2606.23797 · v1 · pith:YOJQUADJnew · submitted 2026-06-22 · 💻 cs.SE · cs.AI · cs.CL · cs.MA

Pith reviewed 2026-06-26 07:07 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.CL · cs.MA

keywords goal-oriented dialogue runtime · conversational continuity · LLM orchestration · design pattern · task frames · runtime objects · multi-domain conversations · interruption handling

The pith

GODR elevates goals, task frames, and resumption contracts to first-class runtime objects to maintain continuity across suspended and interdependent objectives in complex LLM conversations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that graph-based and multi-agent orchestration frameworks for large language model workflows do not, by themselves, solve conversational continuity when users pursue several interdependent objectives that can be suspended, resumed, revised, or invalidated by actions in other goals. It proposes the Goal-Oriented Dialogue Runtime as a framework-neutral design pattern that makes goals, task frames, lifecycle state, invalidation rules, and resumption contracts explicit first-class runtime objects, while leaving bounded execution to existing graph runtimes, agents, tools, or APIs. The pattern targets the high-complexity end of the design space, where objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone. A sympathetic reader would care because this addresses a practical limitation in building reliable, interruptible, multi-domain dialogue systems that current workflow graphs do not cover.

Core claim

The paper claims that the Goal-Oriented Dialogue Runtime (GODR) is a framework-neutral design pattern that treats goals, task frames, lifecycle state, invalidation rules, and resumption contracts as first-class runtime objects while delegating bounded execution to graph runtimes, agents, tools, or APIs, intended for complex, multi-domain, interruptible conversations where objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone.

What carries the argument

The Goal-Oriented Dialogue Runtime (GODR) design pattern, which elevates goals, task frames, lifecycle state, invalidation rules, and resumption contracts to first-class runtime objects.
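As a rough illustration of what "first-class runtime objects" could look like in code, the sketch below models the five named concepts as plain Python data structures. The paper's own schema is not reproduced on this page, so every class, field, and state name here (`LifecycleState`, `TaskFrame`, `ResumptionContract`, `Goal`, `InvalidationRule`) is a hypothetical assumption, not the author's definition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class LifecycleState(Enum):
    # Hypothetical state set; the paper may define different states.
    ACTIVE = "active"
    SUSPENDED = "suspended"
    COMPLETED = "completed"
    INVALIDATED = "invalidated"


@dataclass
class TaskFrame:
    # Slots collected so far for one goal (assumed structure).
    slots: dict[str, str] = field(default_factory=dict)


@dataclass
class ResumptionContract:
    # What must be present, and what to say, when a goal is resumed.
    required_slots: tuple[str, ...]
    resume_prompt: str

    def missing(self, frame: TaskFrame) -> list[str]:
        # Required slots that the task frame does not yet contain.
        return [s for s in self.required_slots if s not in frame.slots]


@dataclass
class Goal:
    goal_id: str
    frame: TaskFrame
    contract: ResumptionContract
    state: LifecycleState = LifecycleState.ACTIVE


@dataclass
class InvalidationRule:
    # Declares that `target_id` becomes invalid when `condition`
    # holds for the goal identified by `source_id`.
    source_id: str
    target_id: str
    condition: Callable[[Goal], bool]


# Illustrative goal: a trip with a destination but no date yet.
booking = Goal(
    goal_id="travel_booking",
    frame=TaskFrame(slots={"destination": "Lisbon"}),
    contract=ResumptionContract(
        required_slots=("destination", "date"),
        resume_prompt="Shall we continue with your trip?",
    ),
)
missing_slots = booking.contract.missing(booking.frame)
```

Because the contract is data rather than prompt text, a runtime built this way could inspect `missing_slots` before resuming a goal instead of inferring the gap from chat history.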

If this is right

Goals can be suspended and resumed across interruptions without depending on chat history or current graph position.
Actions in one goal can invalidate or revise other goals through explicit invalidation rules.
The pattern applies only to high-complexity cases and does not replace workflow graphs for simple guided processes.
Evaluation is positioned as an agenda for future empirical validation rather than a current performance measurement.
Resumption contracts and lifecycle state become inspectable and portable across different underlying execution engines.
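The first two consequences listed above can be sketched as a small goal registry. This is a minimal illustration under stated assumptions: the `GoalRuntime` class, its method names, the string-valued states, and the example rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Goal:
    goal_id: str
    slots: dict[str, str] = field(default_factory=dict)
    state: str = "active"  # assumed states: active, suspended, invalidated


# An invalidation rule: (source goal id, target goal id, predicate on source).
Rule = tuple[str, str, Callable[[Goal], bool]]


class GoalRuntime:
    """Hypothetical registry that tracks goal state outside the chat history."""

    def __init__(self, rules: list[Rule]) -> None:
        self.goals: dict[str, Goal] = {}
        self.rules = rules

    def start(self, goal: Goal) -> None:
        self.goals[goal.goal_id] = goal

    def suspend(self, goal_id: str) -> None:
        self.goals[goal_id].state = "suspended"

    def resume(self, goal_id: str) -> Goal:
        goal = self.goals[goal_id]
        if goal.state != "suspended":
            raise ValueError(f"{goal_id} is {goal.state}, not resumable")
        goal.state = "active"
        return goal

    def update(self, goal_id: str, slot: str, value: str) -> None:
        source = self.goals[goal_id]
        source.slots[slot] = value
        # Apply every rule whose source is the goal that just changed.
        for src, target, predicate in self.rules:
            if src == goal_id and target in self.goals and predicate(source):
                self.goals[target].state = "invalidated"


# Assumed rule: closing the account invalidates a pending order change.
rules: list[Rule] = [
    ("account_update", "order_change", lambda g: g.slots.get("status") == "closed"),
]
runtime = GoalRuntime(rules)
runtime.start(Goal("order_change", {"item": "laptop"}))
runtime.suspend("order_change")
runtime.start(Goal("account_update"))
runtime.update("account_update", "email", "new@example.com")

# Resumption restores the stored slots without consulting any transcript.
resumed_slots = dict(runtime.resume("order_change").slots)

# A later action in the other goal invalidates the order change.
runtime.update("account_update", "status", "closed")
final_state = runtime.goals["order_change"].state
```

In this sketch, attempting to resume `order_change` after the account is closed raises a `ValueError`, which is one way a runtime could make cross-goal invalidation explicit rather than silent.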

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The pattern could be layered on top of existing graph frameworks to add explicit goal tracking without replacing their execution engines.
It would make state inspection and debugging easier in systems where multiple parallel user objectives run simultaneously.
A natural test would involve building a multi-domain customer-support dialogue that handles concurrent requests such as order changes and account updates.
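One way to picture the layering idea in the first extension above is a thin goal-tracking layer that delegates each bounded step to an opaque executor. The sketch below is an assumption-laden illustration: `GoalLayer`, `TrackedGoal`, the `Executor` signature, and the toy `record_request` function are hypothetical, and no real graph framework API is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed interface for any bounded executor (graph run, agent turn, tool call):
# it receives the goal's slots plus the user input and returns updated slots.
Executor = Callable[[dict[str, str], str], dict[str, str]]


@dataclass
class TrackedGoal:
    goal_id: str
    executor: Executor
    slots: dict[str, str] = field(default_factory=dict)
    state: str = "active"


class GoalLayer:
    """Hypothetical goal-tracking layer; execution stays with the executors."""

    def __init__(self) -> None:
        self.goals: dict[str, TrackedGoal] = {}
        self.focus: Optional[str] = None

    def _suspend_focus(self) -> None:
        if self.focus is not None:
            self.goals[self.focus].state = "suspended"

    def open(self, goal: TrackedGoal) -> None:
        # A new goal takes focus; the previous one is suspended, not discarded.
        self._suspend_focus()
        self.goals[goal.goal_id] = goal
        self.focus = goal.goal_id

    def switch_to(self, goal_id: str) -> None:
        if goal_id not in self.goals:
            raise KeyError(goal_id)
        self._suspend_focus()
        self.goals[goal_id].state = "active"
        self.focus = goal_id

    def step(self, user_input: str) -> None:
        if self.focus is None:
            raise RuntimeError("no goal has focus")
        goal = self.goals[self.focus]
        # Delegate one bounded step; the layer only stores what comes back.
        goal.slots = goal.executor(dict(goal.slots), user_input)


def record_request(slots: dict[str, str], user_input: str) -> dict[str, str]:
    # Toy executor standing in for a real graph runtime or agent.
    slots["last_request"] = user_input
    return slots


layer = GoalLayer()
layer.open(TrackedGoal("order_change", record_request))
layer.step("swap the laptop for a tablet")
layer.open(TrackedGoal("account_update", record_request))
layer.step("change my email")
layer.switch_to("order_change")
states = {gid: g.state for gid, g in layer.goals.items()}
```

The executor never sees other goals or their lifecycle state, which mirrors the separation the pith describes: goal tracking in one layer, bounded execution in another.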

Load-bearing premise

Objective continuity in high-complexity conversations cannot be recovered reliably from agent identity, chat history, or execution-graph position alone.

What would settle it

A concrete implementation or simulation of a complex multi-domain conversation in which all suspended and interdependent objectives can be fully reconstructed and resumed using only chat history and execution-graph position without any explicit goal objects would falsify the central motivation for GODR.

Figures

Figures reproduced from arXiv: 2606.23797 by Mariano Garralda-Barrio.

**Figure 2.** Local control moves in a process-guided dialogue. The runtime retries the current node, advances …

**Figure 3.** Objective continuity across an interruption. The travel booking remains resumable while the …

**Figure 4.** A GC-4 procurement goal graph. Solid arrows encode subgoal and required-for relations; dashed …

**Figure 5.** Three-layer separation for goal-oriented conversational architecture. GODR separates goal …

**Figure 6.** Reference implementation architecture for GODR. The runtime owns goal registry, policy, state, …

**Figure 7.** Trace of the event-registration example. The runtime suspends the registration goal, serves the side …

Original abstract

Graph and multi-agent orchestration frameworks make production large language model (LLM) workflows practical, but they do not by themselves solve conversational continuity when users maintain several interdependent objectives. This conceptual systems paper focuses on the high-complexity end of that design space, where goals can be suspended, resumed, revised, and invalidated by actions in other goals. We introduce the Goal-Oriented Dialogue Runtime (GODR), a framework-neutral design pattern that treats goals, task frames, lifecycle state, invalidation rules, and resumption contracts as first-class runtime objects while delegating bounded execution to graph runtimes, agents, tools, or application programming interfaces (APIs). GODR is not proposed as a replacement for workflow graphs in simple guided processes; it is intended for complex, multi-domain, interruptible conversations where objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone. The paper formalizes the problem, proposes runtime objects and architecture-selection criteria, and frames evaluation as an agenda for future empirical validation rather than as a measured performance claim.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Conceptual proposal for a GODR design pattern to handle complex multi-goal LLM continuity, but no evidence or formalization is provided.

The main takeaway is that this paper proposes GODR, a framework-neutral design pattern that elevates goals, task frames, lifecycle state, invalidation rules, and resumption contracts to first-class runtime objects for interruptible multi-domain conversations. It does not claim to replace graph-based systems for simpler cases.

What the paper does reasonably is scope the problem and describe the intended architecture. It identifies scenarios where chat history or execution-graph position alone may not suffice for objective continuity across suspended and resumed goals, and it suggests criteria for choosing this approach over lighter mechanisms. The framing stays neutral on underlying runtimes, which keeps the idea portable.

The soft spots are central rather than minor. The work is a design proposal with no formal model, no worked example, no pseudocode, and no data. Evaluation is explicitly left for future work, so there is nothing to assess whether the added runtime objects actually improve reliability or introduce unacceptable overhead. The motivating assumption about continuity limits is stated but not tested against concrete alternatives.

Citation patterns are typical for a conceptual piece and do not overclaim prior results. The paper is honest about its scope.

This is for engineers and researchers already working on production LLM orchestration who want to think through runtime structures for long-running, interruptible tasks. A reader looking for measured improvements or reusable artifacts will not find them.

I would not send this to peer review in its current form. It needs at least a prototype or small case study before it would be worth referee time.

Referee Report

1 major / 1 minor

Summary. The manuscript is a conceptual systems paper that identifies limitations in graph and multi-agent orchestration frameworks for maintaining conversational continuity in complex, multi-domain dialogues where goals can be suspended, resumed, revised, or invalidated. It introduces the Goal-Oriented Dialogue Runtime (GODR) as a framework-neutral design pattern that elevates goals, task frames, lifecycle state, invalidation rules, and resumption contracts to first-class runtime objects while delegating bounded execution to existing graph runtimes, agents, tools, or APIs. The paper formalizes the continuity problem, proposes architecture-selection criteria, and explicitly frames empirical validation as future work rather than a current claim.

Significance. If the proposed design pattern can be shown to improve objective continuity in interruptible conversations, it would address a practical gap in production LLM workflow engineering by providing explicit mechanisms beyond reliance on chat history or execution position. The contribution lies in its framing of runtime objects for goal management, which could inform the design of more robust conversational systems if accompanied by implementation guidance or case studies.

major comments (1)

Abstract and Introduction: The central motivation, that objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone, is asserted without concrete examples, failure cases, or references to prior work demonstrating this limitation. This assumption is load-bearing for the claim that first-class goal objects are required.

minor comments (1)

The manuscript would benefit from at least one detailed illustrative scenario showing how GODR objects interact during goal suspension and resumption, to make the architecture-selection criteria more concrete.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript to incorporate concrete examples and references.

Referee: Abstract and Introduction: The central motivation, that objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone, is asserted without concrete examples, failure cases, or references to prior work demonstrating this limitation. This assumption is load-bearing for the claim that first-class goal objects are required.

Authors: We agree that the motivation is presented at a high level without explicit failure cases or citations in the abstract and introduction. As a conceptual systems paper, the manuscript focuses on formalizing the continuity problem and proposing runtime objects rather than empirical validation. To address this, the revised manuscript will expand the Introduction with illustrative failure cases (e.g., interleaved multi-domain goals where a support interruption invalidates a prior booking task frame in ways not recoverable from history or graph position alone) and add references to prior work on dialogue state tracking and goal management. This will make the design rationale more concrete while preserving the paper's scope.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper is a purely conceptual systems proposal that defines GODR as a design pattern for interruptible multi-goal conversations. It contains no equations, fitted parameters, predictions, or derivations that could reduce to inputs by construction. Evaluation is explicitly deferred to future work, and the central motivation (objective continuity not recoverable from history or graph position) is stated as an assumption rather than derived from prior results. No self-citations or ansatzes are invoked as load-bearing steps. The derivation chain is self-contained as a definitional framework.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The proposal rests on the domain assumption that existing graph and multi-agent frameworks leave conversational continuity unsolved for interdependent objectives; no free parameters or invented physical entities are introduced.

axioms (2)

domain assumption: Graph and multi-agent orchestration frameworks do not by themselves solve conversational continuity when users maintain several interdependent objectives.
Opening sentence of the abstract; used to motivate the need for first-class goal objects.
domain assumption: Objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone in high-complexity cases.
Stated as the condition under which GODR is intended to be used.

invented entities (1)

Goal-Oriented Dialogue Runtime (GODR) · no independent evidence
purpose: Runtime layer that elevates goals, task frames, lifecycle state, invalidation rules, and resumption contracts to first-class objects.
New named design pattern introduced in the paper; no independent evidence or falsifiable prediction supplied.

