DataSTORM: Deep Research on Large-Scale Databases using Exploratory Data Analysis and Data Storytelling

Camila Nicollier Sanchez; David Fernando Castro Pena; Monica S. Lam; Sajid Farook; Shicheng Liu; Yucheng Jiang

arXiv: 2604.06474 · v1 · submitted 2026-04-07 · 💻 cs.CL

Shicheng Liu, Yucheng Jiang, Sajid Farook, Camila Nicollier Sanchez, David Fernando Castro Pena, Monica S. Lam

Pith reviewed 2026-05-10 18:48 UTC · model grok-4.3

classification 💻 cs.CL

keywords LLM agents, deep research, exploratory data analysis, data storytelling, structured databases, agentic AI, InsightBench, ACLED dataset

The pith

DataSTORM reframes deep research on structured databases as an autonomous thesis-driven process using exploratory data analysis and storytelling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DataSTORM, an LLM agent that conducts deep research over large-scale structured databases and web sources. It treats analysis as a thesis-driven process: discover candidate theses from the data, validate them through iterative cross-source checks, and develop them into coherent narratives. The method draws on exploratory data analysis and data storytelling to meet the demands of quantitative reasoning over structured schemas. On InsightBench it sets a new state of the art, and on a new ACLED-based dataset it outperforms proprietary systems such as ChatGPT Deep Research on both automated metrics and human evaluations.

Core claim

DataSTORM is an LLM-based agentic system that autonomously performs deep research across large-scale structured databases and internet sources by discovering candidate theses from data, validating them iteratively, and developing them into analytical narratives grounded in exploratory data analysis and data storytelling principles.

What carries the argument

The thesis-driven analytical process that discovers candidate theses from data, validates them through iterative cross-source investigation, and develops them into coherent narratives.
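
The three stages named above (discover, validate, develop) can be sketched as a small loop. This is a minimal illustration under stated assumptions, not DataSTORM's implementation: the `events` table, its rows, the `Thesis` class, and the rule-based `discover`, `validate`, and `narrate` functions are all hypothetical stand-ins for steps the paper assigns to an LLM agent.

```python
import sqlite3
from dataclasses import dataclass, field

# Invented toy rows: (region, year, n_events). Not taken from any real dataset.
ROWS = [("A", 2022, 10), ("A", 2023, 25), ("B", 2022, 30), ("B", 2023, 28)]


@dataclass
class Thesis:
    claim: str                 # natural-language statement to test
    sql: str                   # query whose result should support the claim
    params: tuple = ()         # bound parameters for the query
    evidence: list = field(default_factory=list)
    supported: bool = False


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, year INTEGER, n_events INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)
    return conn


def discover(conn: sqlite3.Connection) -> list:
    """Stage 1: propose candidate theses (hypothetical rule: one per region)."""
    regions = [r for (r,) in conn.execute(
        "SELECT DISTINCT region FROM events ORDER BY region")]
    return [
        Thesis(
            claim=f"Events in region {r} increased from 2022 to 2023",
            sql="SELECT year, n_events FROM events WHERE region = ? ORDER BY year",
            params=(r,),
        )
        for r in regions
    ]


def validate(conn: sqlite3.Connection, thesis: Thesis) -> Thesis:
    """Stage 2: run the query and keep the thesis only if the data agrees."""
    rows = conn.execute(thesis.sql, thesis.params).fetchall()
    thesis.evidence = rows
    thesis.supported = len(rows) >= 2 and rows[-1][1] > rows[0][1]
    return thesis


def narrate(theses: list) -> str:
    """Stage 3: turn the supported theses into a short evidence-backed text."""
    kept = [t for t in theses if t.supported]
    return " ".join(
        f"{t.claim} ({t.evidence[0][1]} -> {t.evidence[-1][1]})." for t in kept
    )


def run() -> str:
    conn = build_db()
    theses = [validate(conn, t) for t in discover(conn)]
    return narrate(theses)


if __name__ == "__main__":
    print(run())
```

In this toy setup only the thesis about region A survives validation, because region B's count falls from 30 to 28. A real agent would also revise rejected theses and consult web sources, which this sketch omits.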

If this is right

DataSTORM achieves a 19.4% relative improvement in insight-level recall and 7.2% in summary-level score on InsightBench.
It outperforms ChatGPT Deep Research on a new ACLED-based dataset in automated metrics and human evaluations.
The system handles both structured databases and unstructured internet sources in a unified way.
Effective data research requires iterative hypothesis generation and quantitative reasoning over schemas rather than just retrieval.
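
To make the first figure concrete, a relative improvement is the gain divided by the baseline score, not a difference in percentage points. The sketch below assumes a hypothetical baseline recall of 0.500, which is not a number from the paper, to show what a 19.4% relative gain would mean in absolute terms.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return (new - baseline) / baseline, the relative gain over the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Hypothetical scores for illustration only; the paper's baseline is not given here.
baseline_recall = 0.500
new_recall = 0.597

rel = relative_improvement(baseline_recall, new_recall)
abs_gain = new_recall - baseline_recall
print(f"relative gain: {rel:.1%}, absolute gain: {abs_gain:.3f}")
```

Under these assumed numbers, a 19.4% relative gain corresponds to 9.7 points of absolute recall, so the same headline figure can describe quite different absolute changes depending on the baseline.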

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If LLM agents can execute this process reliably, they could automate much of the initial exploratory phase in data analysis projects.
This approach might extend to scientific databases where hypothesis testing from large datasets is key.
Integration with more advanced quantitative tools could further strengthen the validation step.
Human oversight might still be needed for final narrative refinement in high-stakes domains.

Load-bearing premise

LLM agents can perform reliable iterative hypothesis generation, quantitative reasoning, and narrative convergence on structured data without substantial human guidance.

What would settle it

Running DataSTORM autonomously on the ACLED dataset or a similar complex database and finding that it produces incoherent narratives or quantitative insights that do not match expert analysis.

Figures

Figures reproduced from arXiv: 2604.06474 by Camila Nicollier Sanchez, David Fernando Castro Pena, Monica S. Lam, Sajid Farook, Shicheng Liu, Yucheng Jiang.

**Figure 2.** Overview of the Final Report Generation Module.

**Figure 3.** Pairwise preference rates in human evaluation. For each comparison, the bar shows the ...

**Figure 4.** Consent page shown to participants before the human evaluation. Identifying information ...

**Figure 5.** Screenshot of the custom web interface used for human evaluation. Participants reviewed ...

Original abstract

Deep research with Large Language Model (LLM) agents is emerging as a powerful paradigm for multi-step information discovery, synthesis, and analysis. However, existing approaches primarily focus on unstructured web data, while the challenges of conducting deep research over large-scale structured databases remain relatively underexplored. Unlike web-based research, effective data-centric research requires more than retrieval and summarization and demands iterative hypothesis generation, quantitative reasoning over structured schemas, and convergence toward a coherent analytical narrative. In this paper, we present DataSTORM, an LLM-based agentic system capable of autonomously conducting research across both large-scale structured databases and internet sources. Grounded in principles from Exploratory Data Analysis and Data Storytelling, DataSTORM reframes deep research over structured data as a thesis-driven analytical process: discovering candidate theses from data, validating them through iterative cross-source investigation, and developing them into coherent analytical narratives. We evaluate DataSTORM on InsightBench, where it achieves a new state-of-the-art result with a 19.4% relative improvement in insight-level recall and 7.2% in summary-level score. We further introduce a new dataset built on ACLED, a real-world complex database, and demonstrate that DataSTORM outperforms proprietary systems such as ChatGPT Deep Research across both automated metrics and human evaluations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DataSTORM builds LLM agents around EDA and data storytelling for structured databases and reports SOTA gains on InsightBench plus a new ACLED dataset, but the evidence for the agent's reliability on quantitative steps is thin.

DataSTORM tries to move LLM agents beyond web scraping into real work on large structured databases. It borrows from exploratory data analysis and data storytelling to set up a loop where the agent finds candidate theses in the data, checks them across sources, and builds them into narratives. The paper reports a 19.4% relative lift in insight-level recall on InsightBench and better results than ChatGPT Deep Research on both automated scores and human ratings for a new dataset drawn from ACLED conflict data.

That dataset itself is a practical addition worth having around for future tests on complex real-world tables. The framing also makes sense: structured data research really does need iterative hypothesis work and quantitative checks rather than just retrieval and summary. The authors give credit to prior agentic systems while pointing out the gap for schema-heavy cases. Human evaluations add a bit of grounding beyond pure metrics.

The soft spot is the lack of detail on how the agent keeps quantitative fidelity across iterations. The abstract lists the performance numbers but does not show experimental setup, error rates on aggregations, or any verification steps for schema traversal and calculations. The stress-test concern about unproven LLM reliability for autonomous reasoning holds up from what is visible; without ablations or step logs in the full text, it is hard to tell whether the gains reflect the EDA structure or just prompt luck. No circular math or invented entities appear.

This is for people working on agentic data tools or applied analytics in domains like enterprise or monitoring. It has enough concrete pieces, including the dataset, to deserve referee time so the methods can be checked for reproducibility and robustness.

Referee Report

2 major / 2 minor

Summary. The paper introduces DataSTORM, an LLM-based agentic system for autonomous deep research over large-scale structured databases and web sources. Grounded in Exploratory Data Analysis and Data Storytelling, it reframes the task as a thesis-driven process of discovering candidate theses from data, validating them via iterative cross-source investigation, and synthesizing them into coherent analytical narratives. The central empirical claims are a new state-of-the-art on InsightBench (19.4% relative gain in insight-level recall, 7.2% in summary-level score) and outperformance versus ChatGPT Deep Research on a newly introduced ACLED-derived dataset, measured by both automated metrics and human evaluations.

Significance. If the results hold under rigorous scrutiny, the work would be a meaningful contribution by extending LLM agents beyond unstructured web retrieval into quantitative reasoning over structured schemas, an underexplored area. The release of a new ACLED-based dataset is a concrete positive that could support future benchmarking. The framing around EDA and data storytelling provides a principled conceptual anchor, though its translation into reliable agent behavior remains to be demonstrated.

major comments (2)

[Abstract / Evaluation] Abstract and Evaluation section: The SOTA claim of a 19.4% relative improvement in insight-level recall on InsightBench is presented without any description of the experimental setup, baselines, number of trials, statistical significance testing, or controls for prompt sensitivity and model version. This information is load-bearing for the central performance claim and must be supplied before the result can be assessed.
[Method / Evaluation] The manuscript's core assumption—that the agentic loop (hypothesis generation, quantitative schema traversal, cross-source validation, and narrative convergence) operates reliably without substantial human guidance or post-hoc tuning—is not supported by any ablation, failure-mode analysis, or quantitative checks on aggregation accuracy and statistical validity across iterations. This directly affects the validity of both the InsightBench and ACLED results.
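
One common way to address the significance question raised in the first comment is a paired bootstrap over per-task scores. The sketch below is an assumption about how such a check could look, not a procedure described in the paper; the function name `paired_bootstrap` and the score lists in the usage note are made up for illustration.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Estimate how often system A fails to beat system B under resampling.

    scores_a and scores_b are per-task scores for the same tasks, in the same
    order. Returns the fraction of bootstrap resamples in which the mean
    paired difference (A minus B) is zero or negative.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        # Resample tasks with replacement, keeping each A/B pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return not_better / n_resamples
```

For example, `paired_bootstrap([0.6, 0.7, 0.8], [0.5, 0.6, 0.7])` returns 0.0 because system A wins on every task, while two identical score lists return 1.0. A small value suggests the improvement is unlikely to be an artifact of which tasks happened to be sampled; it says nothing about prompt sensitivity or model version, which need separate controls.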

minor comments (2)

[Abstract] The abstract would be clearer if it briefly stated the underlying LLM(s) and any external tools or verification mechanisms used by the agent.
[Dataset section] Notation for the new ACLED dataset (size, schema complexity, query types) should be introduced earlier to help readers contextualize the human-evaluation results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make.

Referee: [Abstract / Evaluation] Abstract and Evaluation section: The SOTA claim of a 19.4% relative improvement in insight-level recall on InsightBench is presented without any description of the experimental setup, baselines, number of trials, statistical significance testing, or controls for prompt sensitivity and model version. This information is load-bearing for the central performance claim and must be supplied before the result can be assessed.

Authors: We agree that the abstract, being a high-level summary, does not include these details, and the Evaluation section would benefit from greater elaboration. In the revised version, we will expand the Evaluation section to provide a full description of the experimental setup, including the specific baselines compared against, the number of trials or runs performed, results of statistical significance testing, and measures taken to control for prompt sensitivity and model version variations. This will allow readers to better assess the robustness of the reported SOTA results.
Revision: yes
Referee: [Method / Evaluation] The manuscript's core assumption—that the agentic loop (hypothesis generation, quantitative schema traversal, cross-source validation, and narrative convergence) operates reliably without substantial human guidance or post-hoc tuning—is not supported by any ablation, failure-mode analysis, or quantitative checks on aggregation accuracy and statistical validity across iterations. This directly affects the validity of both the InsightBench and ACLED results.

Authors: This is a valid concern. The current manuscript emphasizes the end-to-end performance on the benchmarks but does not include dedicated ablations or failure analyses. We will add a new subsection in the Evaluation or Method section that includes ablations on the key components of the agentic loop (e.g., impact of hypothesis generation and cross-source validation), a discussion of observed failure modes with examples, and quantitative metrics on the accuracy of data aggregation and statistical validity checks across iterations. This will strengthen the evidence for the reliability of the system.
Revision: yes
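
A quantitative check on aggregation accuracy, of the kind promised in this response, could compare each aggregate the agent reports against an independent recomputation from raw rows. The following is a minimal sketch under that assumption; the `events` table, its rows, and all function names are hypothetical and do not come from the paper or from ACLED.

```python
import sqlite3
from collections import Counter


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory table with invented rows (not ACLED data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (country TEXT, event_type TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("X", "protest"), ("X", "battle"), ("Y", "protest")],
    )
    return conn


def sql_counts(conn: sqlite3.Connection) -> dict:
    """Aggregate inside the database, as an agent-issued query would."""
    return dict(conn.execute(
        "SELECT country, COUNT(*) FROM events GROUP BY country"))


def recomputed_counts(conn: sqlite3.Connection) -> dict:
    """Recompute the same aggregate from raw rows, without GROUP BY."""
    return dict(Counter(c for (c,) in conn.execute("SELECT country FROM events")))


def aggregation_error_rate(reported: dict, reference: dict) -> float:
    """Share of groups whose reported value differs from the reference value."""
    keys = set(reported) | set(reference)
    wrong = sum(1 for k in keys if reported.get(k) != reference.get(k))
    return wrong / len(keys) if keys else 0.0
```

With the toy rows, `aggregation_error_rate(sql_counts(conn), recomputed_counts(conn))` is 0.0, while a misreported value such as `{"X": 3, "Y": 1}` yields 0.5. Tracking this rate across iterations would give one concrete number for the reliability question the referee raises.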

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents DataSTORM as an LLM agent system for database research, grounded in established EDA and data storytelling principles. Its central claims are empirical: a new state-of-the-art result on InsightBench (a 19.4% relative gain in insight-level recall) and outperformance on a new ACLED dataset against baselines including ChatGPT Deep Research, supported by automated metrics and human evaluations. No equations, fitted parameters, derivations, or predictions appear in the text. No self-citations are invoked as load-bearing justifications for the method or results. The evaluation relies on external benchmarks and independent human assessment rather than any self-referential reduction of outputs to inputs. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be extracted from the given text.

pith-pipeline@v0.9.0 · 5554 in / 1147 out tokens · 34650 ms · 2026-05-10T18:48:20.587353+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: multi-agent exploration framework ... planner-executor decomposition ... query consistency module ... thesis generation module ... inductive statistical discovery with deductive LLM reasoning

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Grounded in principles from Exploratory Data Analysis and Data Storytelling

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

