
arXiv: 2606.20041 · v1 · pith:B4XLVNUAnew · submitted 2026-06-18 · econ.GN · cs.AI · cs.LG · q-fin.EC · q-fin.GN

AI Economist Agent: An Agentic Framework for Model-Grounded Economic Analysis with RAG, Knowledge Graphs, and Large Language Models

Pith reviewed 2026-06-26 15:12 UTC · model grok-4.3

classification: econ.GN · cs.AI · cs.LG · q-fin.EC · q-fin.GN
keywords: AI economist · agentic framework · RAG · knowledge graphs · large language models · economic analysis · model-grounded · inflation persistence

The pith

An AI economist agent uses LLM agents and knowledge graphs to ground economic narratives in explicit model computations and retrieved evidence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a framework where AI agents plan economic analyses, retrieve information from knowledge graphs of theory and data, choose and run economic models, and then produce narratives tied to those computations. The goal is to move beyond fluent but ungrounded text from language models alone to reports that economists can trace back to specific models and evidence. The framework is applied to creating reports on U.S. inflation persistence and Federal Reserve policy as well as generating narratives for bank stress tests involving commercial real estate refinancing. A reader would care if this approach makes AI assistance in economics more trustworthy by enforcing grounding in established theory and real data.

Core claim

The paper claims that an agentic RAG-based framework, the AI economist agent, can generate economic reports by having LLM agents orchestrate three stages: retrieval from knowledge graphs, model selection and computation, and evidence-linked narrative generation. The inflation and stress-test applications are offered as evidence that this grounding improves economic coherence and traceability.

What carries the argument

LLM-based agents that plan the analysis, retrieve relevant evidence using RAG from knowledge graphs of economic data and theory, select appropriate models, execute the computations, and generate reports linked to the evidence.
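
The stages named above (plan, retrieve, select a model, execute, report) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the paper's implementation: every name here (`TRIPLES`, `plan`, `retrieve`, `select_model`, `run_ar1`, `analyze`) is hypothetical, the "planner" is a keyword match standing in for an LLM agent, and the model is a toy AR(1) persistence estimate rather than the DSGE-lite model mentioned in the figure captions.

```python
from dataclasses import dataclass, field

# Hypothetical toy knowledge graph: (subject, relation, object) triples.
TRIPLES = [
    ("inflation_persistence", "modeled_by", "ar1"),
    ("inflation_persistence", "measured_by", "cpi_yoy"),
]

@dataclass
class Report:
    text: str
    evidence: list = field(default_factory=list)  # triples backing the narrative
    outputs: dict = field(default_factory=dict)   # numbers produced by the model run

def plan(question):
    # Stand-in for an LLM planner: map the question to a graph topic.
    if "inflation" in question.lower():
        return "inflation_persistence"
    raise ValueError("no plan for this question in the toy graph")

def retrieve(topic):
    # Return every triple whose subject is the planned topic.
    return [t for t in TRIPLES if t[0] == topic]

def select_model(evidence):
    # Pick the model that the retrieved evidence links to the topic.
    return next(obj for _, rel, obj in evidence if rel == "modeled_by")

def run_ar1(series):
    # Least-squares slope of x_t on x_{t-1} without an intercept (toy model).
    num = sum(cur * prev for cur, prev in zip(series[1:], series[:-1]))
    den = sum(prev * prev for prev in series[:-1])
    return {"persistence": round(num / den, 3)}

MODELS = {"ar1": run_ar1}

def analyze(question, series):
    topic = plan(question)
    evidence = retrieve(topic)
    model_name = select_model(evidence)
    outputs = MODELS[model_name](series)
    # The narrative only quotes numbers that came out of the model run.
    text = f"Estimated persistence is {outputs['persistence']} (model: {model_name})."
    return Report(text=text, evidence=evidence, outputs=outputs)

report = analyze("How persistent is U.S. inflation?", [4.0, 2.0, 1.0, 0.5])
print(report.text)  # Estimated persistence is 0.5 (model: ar1).
```

The design point the sketch tries to capture is that the report text is assembled from `outputs` and `evidence`, so every number in it can be traced to a model run and a graph entry.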

If this is right

  • Grounding prevents the language model from producing quantitative claims on its own.
  • The approach leads to reports with better economic coherence in the tested scenarios.
  • Traceability to retrieved evidence and model computations is achieved in applications like inflation analysis and stress testing.
  • The framework supports scenario analysis without direct reliance on LLM-generated numbers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could allow economists to verify AI outputs more easily by checking the linked models and data.
  • Extending the knowledge graphs with more diverse economic theories might broaden the range of analyses possible.
  • Testing the system on additional applications beyond inflation and banking stress could reveal its general applicability.

Load-bearing premise

LLM-based agents are able to accurately plan analyses, retrieve evidence, select models, and generate coherent reports without errors, provided the knowledge graphs contain sufficiently accurate and complete economic theory and data.

What would settle it

A demonstration that the generated reports in the U.S. inflation or bank stress-test cases contain claims inconsistent with the underlying model computations or evidence from the knowledge graphs would show the framework does not achieve the claimed grounding.
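
One simple way to look for such inconsistencies is to extract every number from a generated narrative and flag those that match no model output. The sketch below is an assumption-laden illustration, not a check described in the paper: the function name `ungrounded_numbers`, the flat `model_outputs` dictionary, and the tolerance are all hypothetical, and the regular expression would misread dates or hyphenated ranges such as `2026-06` as negative numbers.

```python
import re

def ungrounded_numbers(narrative, model_outputs, tol=1e-6):
    """Return numbers in the narrative that match no model output within tol."""
    found = [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", narrative)]
    allowed = list(model_outputs.values())
    return [x for x in found if not any(abs(x - v) <= tol for v in allowed)]

narrative = "Persistence is 0.82, implying a half-life of 3.5 quarters."
print(ungrounded_numbers(narrative, {"persistence": 0.82}))  # [3.5]
```

Here the half-life figure is flagged because it appears in the text but not among the model outputs; a non-empty result on a real report would be a candidate counterexample to the grounding claim.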

Figures

Figures reproduced from arXiv: 2606.20041 by Masahiro Kato.

Figure 1: Application 1 post-ModelRun GraphRAG paths. The figure displays selected paths from the returned graph result after model execution. Nodes are rendered as circular markers
Figure 2: Application 1 executed DSGE-lite model path.
Figure 3: Application 1 judge scores by condition.
Figure 4: Application 2 post-ModelRun GraphRAG paths. The figure displays selected paths from the returned graph result after model execution. Nodes are rendered as circular markers
Figure 5: Application 2 executed regime-switching model path.
Figure 6: Application 2 bank stress metrics from model execution.
Figure 7: Application 2 judge scores by condition.

Original abstract

We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often required to make economic claims grounded by economic theory and real-world data. Based on this motivation, this study proposes an RAG-based AI economist, which utilizes knowledge graphs including economic data and theory and LLM-based agents to plan the analysis, retrieve relevant evidence, select appropriate models, and generate reports. In our framework, we do not produce quantitative claims directly with the language model alone; instead, we generate narratives grounded in explicit model-based computations and linked to the retrieved evidence via AI agents. We refer to our framework as an AI economist agent. We evaluate the AI economist agent in two applications: economist report generation for U.S. inflation persistence and Federal Reserve policy, and bank stress-test narrative generation for U.S. commercial real estate refinancing stress. The results illustrate how grounding the generated reports improves their economic coherence and traceability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an 'AI economist agent' framework that integrates LLMs with RAG and knowledge graphs containing economic theory and data. LLM-based agents plan analyses, retrieve evidence, select models, and generate reports; narratives are produced from explicit model-based computations rather than direct LLM outputs. The framework is evaluated on two applications: economist report generation for U.S. inflation persistence and Federal Reserve policy, and bank stress-test narrative generation for U.S. commercial real estate refinancing stress. The central claim is that this grounding improves economic coherence and traceability.

Significance. If the agentic pipeline reliably executes without material errors and the claimed improvements can be demonstrated quantitatively, the work could provide a practical template for model-grounded LLM use in economics. The emphasis on explicit model computations and evidence linking addresses a recognized limitation of standalone LLMs in domain-specific analysis. However, the absence of any metrics, baselines, ablation studies, or error analysis in the manuscript makes it impossible to assess whether the framework delivers the asserted gains.

major comments (2)
  1. [Abstract] Abstract: the claim that the framework 'improves their economic coherence and traceability' is presented without any quantitative metrics, baselines, error rates, ablation results, or human evaluation details. This absence directly undermines the central empirical claim of the paper.
  2. [Applications / Evaluation] Evaluation description (applications section): the manuscript states that the AI economist agent was evaluated on U.S. inflation persistence and bank stress-test tasks but supplies no information on how coherence or traceability were measured, what comparison systems were used, or what the observed differences were.
minor comments (1)
  1. The manuscript introduces several new terms ('AI economist agent', 'model-grounded RAG-based AI economist') without a clear glossary or consistent usage across sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the need for stronger empirical support. We agree that the manuscript would benefit from explicit quantitative evaluation details and will revise accordingly.

  1. Referee: [Abstract] Abstract: the claim that the framework 'improves their economic coherence and traceability' is presented without any quantitative metrics, baselines, error rates, ablation results, or human evaluation details. This absence directly undermines the central empirical claim of the paper.

    Authors: We acknowledge the validity of this observation. The current abstract and results section rely on illustrative case studies rather than formal metrics. In the revised manuscript we will (i) moderate the abstract claim to reflect the qualitative nature of the presented evidence and (ii) add a new evaluation subsection that reports expert-rated coherence scores, traceability accuracy (percentage of narrative claims correctly linked to retrieved evidence and model outputs), and direct comparisons against a non-agentic LLM baseline on the same tasks. revision: yes

  2. Referee: [Applications / Evaluation] Evaluation description (applications section): the manuscript states that the AI economist agent was evaluated on U.S. inflation persistence and bank stress-test tasks but supplies no information on how coherence or traceability were measured, what comparison systems were used, or what the observed differences were.

    Authors: We agree that the applications section is insufficiently detailed on measurement. The original evaluation consisted of end-to-end pipeline walkthroughs demonstrating model selection and evidence grounding. The revision will expand this section to specify: (a) the coherence and traceability metrics employed (expert annotation protocol and automated link-verification rate), (b) the baseline systems (vanilla LLM prompting and simple RAG without agentic planning), and (c) the observed differences (quantitative deltas and qualitative examples of improved economic consistency). revision: yes
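
The "traceability accuracy" and "automated link-verification rate" promised in the simulated rebuttal could be computed along the following lines. This is a minimal sketch under assumptions: the rebuttal gives no protocol, so the claim structure (a dict with a `links` list), the function name `traceability_accuracy`, and the rule that a claim counts only if it has at least one link and all of its links resolve are hypothetical choices.

```python
def traceability_accuracy(claims, evidence_ids, output_keys):
    """Share of claims whose links all resolve to retrieved evidence or model outputs."""
    if not claims:
        return 0.0
    valid = set(evidence_ids) | set(output_keys)
    # A claim with no links at all is treated as untraceable.
    ok = sum(1 for c in claims if c["links"] and set(c["links"]) <= valid)
    return ok / len(claims)

claims = [
    {"text": "Inflation is persistent.", "links": ["ev1", "persistence"]},
    {"text": "Rents drive core CPI.", "links": ["ev9"]},  # ev9 was never retrieved
    {"text": "Policy will ease soon.", "links": []},      # unlinked claim
    {"text": "CPI is the reference series.", "links": ["ev2"]},
]
print(traceability_accuracy(claims, {"ev1", "ev2"}, {"persistence"}))  # 0.5
```

Running the same function over the agentic system and over the baselines named in response 2 would give the kind of quantitative delta the referee asks for.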

Circularity Check

0 steps flagged

No significant circularity detected

The paper proposes a new agentic framework combining RAG, knowledge graphs, and LLMs for model-grounded economic analysis. No equations, derivations, fitted parameters, or quantitative predictions appear in the abstract or described structure. The central claim of improved coherence and traceability is presented as a property of the novel construction itself rather than a result reduced to prior inputs by definition or self-citation. No load-bearing steps match any of the enumerated circularity patterns; the work is self-contained as a descriptive system proposal evaluated on two applications.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the unverified assumption that the agentic orchestration can reliably ground outputs and that the knowledge graphs are adequate; no free parameters or invented physical entities are introduced.

axioms (2)
  • [domain assumption] LLM-based agents can reliably perform planning, retrieval, model selection, and report generation tasks.
    Invoked throughout the framework description as the mechanism for analysis.
  • [domain assumption] Knowledge graphs can store and provide accurate economic data and theory for retrieval.
    Central to the RAG component of the proposed system.
invented entities (1)
  • AI economist agent [no independent evidence]
    Purpose: orchestrates planning, retrieval, model selection, and grounded report generation.
    New named framework introduced to combine the components.

pith-pipeline@v0.9.1-grok · 5730 in / 1406 out tokens · 35752 ms · 2026-06-26T15:12:20.004685+00:00 · methodology
