MetaGraph: A Large-Scale Meta-Analysis of GenAI in Financial NLP (2022-2025)

Enrico Santus; Leslie Barrett; Nathan Jessurun; Paolo Pedinotti; Peter Baumann

arxiv: 2509.09544 · v3 · pith:HSQP5ETKnew · submitted 2025-09-11 · 💻 cs.CL


Paolo Pedinotti, Peter Baumann, Nathan Jessurun, Leslie Barrett, Enrico Santus

Pith reviewed 2026-05-18 17:38 UTC · model grok-4.3

classification 💻 cs.CL

keywords: MetaGraph · GenAI · financial NLP · knowledge graphs · meta-analysis · trend analysis · LLM extraction · ontology-guided


The pith

MetaGraph uses ontology-guided LLM extraction to turn 681 papers into a knowledge graph that maps three phases of GenAI development in financial NLP from 2022 to 2025.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MetaGraph as a method to extract typed knowledge graphs from large collections of scientific papers through ontology-guided large language model processing. Applied to a corpus of 681 papers on generative AI in finance, the resulting graph shows an initial burst of new tasks and datasets, followed by greater attention to limitations and risks, and then a turn toward modular system designs such as retrieval-augmented setups. A reader would care because manual narrative reviews cannot keep pace with this research area, whereas structured extraction offers a repeatable way to track changes over time. The authors also release the extracted resource so others can reproduce or extend the analysis.

Core claim

MetaGraph is a methodology for extracting typed knowledge graphs from scientific corpora using ontology-guided LLM extraction to enable structured, large-scale trend analysis. Applied to 681 papers on GenAI in Finance (2022-2025), MetaGraph reveals three phases: early LLM-driven expansion of tasks and datasets, growing emphasis on limitations and risk, and a shift toward modular, system-oriented methods (e.g., retrieval-augmented designs).

What carries the argument

Ontology-guided LLM extraction that builds typed knowledge graphs from paper text to support structured meta-analysis of trends and relations.
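To make "ontology-guided" concrete, here is a minimal sketch of one plausible guard step, assuming the LLM returns candidate triples as dicts: a triple is admitted to the graph only if its (head type, relation, tail type) signature appears in a manually defined ontology. The entity types loosely follow the categories the paper's ontology is described as covering (tasks, techniques, models, datasets, limitations); the relation names, field names, and example records are hypothetical, not the authors' schema.

```python
from dataclasses import dataclass

# Hypothetical ontology: the (head type, relation, tail type) signatures an
# extracted triple may have. Relation names are illustrative assumptions.
ONTOLOGY = {
    ("Paper", "ADDRESSES_TASK", "Task"),
    ("Paper", "USES_TECHNIQUE", "Technique"),
    ("Paper", "USES_MODEL", "Model"),
    ("Paper", "USES_DATASET", "Dataset"),
    ("Paper", "REPORTS_LIMITATION", "Limitation"),
}


@dataclass(frozen=True)
class Triple:
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


def validate(raw_triples: list[dict]) -> tuple[list[Triple], list[dict]]:
    """Split raw LLM output into ontology-conforming triples and rejects."""
    kept, rejected = [], []
    for raw in raw_triples:
        try:
            triple = Triple(**raw)
        except TypeError:  # missing or unexpected fields
            rejected.append(raw)
            continue
        signature = (triple.head_type, triple.relation, triple.tail_type)
        if signature in ONTOLOGY:
            kept.append(triple)
        else:
            rejected.append(raw)
    return kept, rejected


raw_output = [
    {"head": "paper_001", "head_type": "Paper", "relation": "USES_MODEL",
     "tail": "GPT-4", "tail_type": "Model"},
    # Type clash: a dataset cannot be the tail of USES_MODEL, so it is rejected.
    {"head": "paper_001", "head_type": "Paper", "relation": "USES_MODEL",
     "tail": "FinQA", "tail_type": "Dataset"},
    {"head": "paper_001", "relation": "USES_MODEL"},  # malformed: missing fields
]
kept, rejected = validate(raw_output)
print(len(kept), len(rejected))  # 1 2
```

Rejecting type clashes rather than silently coercing them keeps the graph's typing consistent, which is what makes later counts per entity type comparable across years.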

If this is right

The field can now be monitored with reproducible, graph-based snapshots instead of ad-hoc narrative surveys.
Researchers gain access to a released resource of extracted entities and relations for further study.
Future papers can be added incrementally to track whether the shift toward modular designs continues.
The three-phase pattern supplies a baseline for comparing GenAI progress in finance against other application domains.
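Incremental updates of this kind are straightforward if each paper's subgraph is stored under a stable paper ID. The sketch below assumes a hypothetical in-memory store mapping paper ID to (publication year, set of edges); re-ingesting a paper replaces its subgraph rather than double-counting it, and per-year indicator shares can be recomputed after every batch. The store layout and edge labels are illustrative assumptions.

```python
from collections import Counter

# Hypothetical store: paper ID -> (publication year, set of (relation, tail) edges).
Graph = dict[str, tuple[int, frozenset[tuple[str, str]]]]


def add_papers(graph: Graph, batch: Graph) -> Graph:
    """Merge newly extracted paper subgraphs; keying by paper ID means a
    re-extracted paper overwrites its old subgraph instead of duplicating it."""
    merged = dict(graph)
    merged.update(batch)
    return merged


def yearly_share(graph: Graph, edge: tuple[str, str]) -> dict[int, float]:
    """Fraction of papers in each year whose subgraph contains `edge`."""
    totals, hits = Counter(), Counter()
    for year, edges in graph.values():
        totals[year] += 1
        hits[year] += int(edge in edges)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


rag = ("USES_TECHNIQUE", "retrieval-augmented generation")
g = {"p1": (2023, frozenset()), "p2": (2024, frozenset({rag}))}
# "p2" is re-ingested alongside the new paper "p3"; it is not counted twice.
g = add_papers(g, {"p3": (2025, frozenset({rag})), "p2": (2024, frozenset({rag}))})
print(yearly_share(g, rag))  # {2023: 0.0, 2024: 1.0, 2025: 1.0}
```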

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same extraction approach could be applied to other fast-moving technical literatures such as medical AI or robotics to detect comparable phase shifts.
If the phases prove stable, they might guide funding or regulatory priorities by highlighting when risk discussions overtake capability expansion.
Periodic re-runs of the pipeline on new papers could create an early-warning system for emerging methodological trends.

Load-bearing premise

The ontology-guided LLM extraction process accurately and consistently identifies the relevant entities, relations, and trends across the 681 papers without substantial errors, omissions, or biases introduced by the model or ontology choices.

What would settle it

If re-running the same extraction pipeline on the identical 681 papers with a different large language model, or with a modified ontology, produced markedly different phases or trend patterns, that would show the method is not reliable.
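One minimal way to run this test, assuming each extraction run has already been reduced to per-year shares for a set of indicators (for example, the share of papers reporting a limitation), is to compare the direction of every year-over-year change across the two runs. The indicator names, the numbers, and the agreement score are illustrative assumptions, not a procedure or data from the paper; a low score would suggest the phase narrative depends on the model or ontology choice.

```python
def trend_directions(series: dict[int, float]) -> list[int]:
    """Sign (+1, 0, -1) of each year-over-year change in one indicator."""
    years = sorted(series)
    return [
        (series[cur] > series[prev]) - (series[cur] < series[prev])
        for prev, cur in zip(years, years[1:])
    ]


def direction_agreement(run_a: dict[str, dict[int, float]],
                        run_b: dict[str, dict[int, float]]) -> float:
    """Fraction of (indicator, year step) pairs whose trend direction matches.

    Assumes both runs cover the same years for each shared indicator.
    """
    matches = total = 0
    for name in run_a.keys() & run_b.keys():
        for a, b in zip(trend_directions(run_a[name]), trend_directions(run_b[name])):
            matches += int(a == b)
            total += 1
    return matches / total if total else float("nan")


# Hypothetical per-year shares from two extraction runs (e.g. two different LLMs).
run_a = {"limitation": {2022: 0.1, 2023: 0.3, 2024: 0.5},
         "retrieval_augmented": {2022: 0.0, 2023: 0.1, 2024: 0.4}}
run_b = {"limitation": {2022: 0.2, 2023: 0.4, 2024: 0.3},
         "retrieval_augmented": {2022: 0.0, 2023: 0.2, 2024: 0.5}}
print(direction_agreement(run_a, run_b))  # 0.75
```

Comparing directions rather than raw shares tolerates a uniform calibration offset between models while still catching reversals that would change the phase story.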

Figures

Figures reproduced from arXiv: 2509.09544 by Enrico Santus, Leslie Barrett, Nathan Jessurun, Paolo Pedinotti, Peter Baumann.

**Figure 1.** Example of paper subgraph. Accompanying text (truncated): We introduce MetaGraph, a methodology for automated Knowledge Graph (KG) construction from research papers using LLMs, to address this gap. MetaGraph involves the manual definition of an ontology of information that is relevant to tracking research evolution (such as paper metadata, motivations and limitations, tasks approached, techniques, models, and datasets), and the …

**Figure 2.** Increasing Focus on Financial QA.

**Figure 3.** New datasets by period. Accompanying text (Section 4.1, A Growing Awareness; truncated): LLMs have lowered key barriers to both adoption and data processing. On one hand, they remove data format constraints, enabling the processing of unstructured data. On the other, they support synthetic data generation, helping mitigate challenges such as cost, scarcity, and domain bias …

**Figure 4.** Reported limitations by period.

**Figure 5.** Technique evolution over time.

**Figure 6.** Share of papers using open-source models.

**Figure 7.** LLMs’ usage distribution over time.

**Figure 8.** Open-source LLMs’ sizes over time. Accompanying text (truncated): … and computed the relative proportion of financial QA instances, open-model instances, new datasets (datasets created after 2022), and created datasets (datasets created by the same authors who use them). Industry moved faster, dominating financial QA and driving dataset innovation to stay competitive. Academia responded more cautiously, focusing on estab…

**Figure 11.** Latest trends in financial NLP. Accompanying text (truncated): … lens on the field’s changing priorities and a reusable toolkit for data-driven meta-analysis. From Section 6 (Limitations): Our approach relies on a manually defined ontology, which introduces an inductive bias in how entities and relations are categorized. While this provides structure and interpretability, it may also limit flexibility and overlook alternative or emergent conceptualiz…

**Figure 12.** Geography of Financial NLP. The intensity of colors indicates the frequency of contributions.

**Figure 13.** Structure of the graph generated by applying MetaGraph to financial NLP papers. Nodes …

**Figure 14.** Proportion of datasets that explicitly reference prior literature, relative to all datasets used in published studies, over time. The trend suggests a clear shift in data reuse practices: following an initial phase characterized by widespread creation of new datasets, r…

**Figure 15.** Attitudes toward LLMs have evolved over time.

Original abstract

Financial NLP has evolved rapidly since late 2022, outpacing narrative surveys. We introduce MetaGraph, a methodology for extracting typed knowledge graphs from scientific corpora using ontology-guided LLM extraction to enable structured, large-scale trend analysis. Applied to 681 papers on GenAI in Finance (2022-2025), MetaGraph reveals three phases: early LLM-driven expansion of tasks and datasets, growing emphasis on limitations and risk, and a shift toward modular, system-oriented methods (e.g., retrieval-augmented designs). We release the resulting resource and artifacts to support reproducible meta-analysis and future monitoring of the field.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces MetaGraph, a methodology for extracting typed knowledge graphs from scientific corpora using ontology-guided LLM extraction to enable structured, large-scale trend analysis. Applied to 681 papers on GenAI in Finance (2022-2025), it reveals three phases: early LLM-driven expansion of tasks and datasets, growing emphasis on limitations and risk, and a shift toward modular, system-oriented methods (e.g., retrieval-augmented designs). The resulting resource and artifacts are released to support reproducible meta-analysis.

Significance. If the extraction process is shown to be reliable, MetaGraph provides a scalable framework for meta-analysis in rapidly evolving fields, moving beyond narrative surveys. The public release of the knowledge graph and artifacts is a clear strength that enables reproducibility and ongoing field monitoring.

major comments (2)

[Methodology] The central claims about the three observed phases rest on the accuracy of the ontology-guided LLM extraction from the 681 papers. The manuscript provides no quantitative validation of this step, such as precision/recall on a held-out sample, inter-annotator agreement with experts, or error analysis stratified by year (see Methodology section on the extraction pipeline). Without these, it is impossible to rule out systematic biases from the LLM or ontology choices influencing the reported trends.
[Results] The derivation of the three phases from the extracted graph lacks detail on the quantitative process used (e.g., how changes in entity/relation frequencies or modular method mentions were aggregated over time to identify phase boundaries). This makes the narrative in the Results section difficult to assess for robustness.

minor comments (2)

[Abstract] The abstract would benefit from briefly noting the corpus size (681 papers) and the public release of the resource to better convey the work's scope.
[Figures] Figure captions for the knowledge graph visualizations should include more detail on node/edge types and temporal encoding to improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our work. We address the major comments point-by-point below, indicating the revisions we plan to make to improve the manuscript's methodological transparency and robustness.

Point-by-point responses

Referee: [Methodology] The central claims about the three observed phases rest on the accuracy of the ontology-guided LLM extraction from the 681 papers. The manuscript provides no quantitative validation of this step, such as precision/recall on a held-out sample, inter-annotator agreement with experts, or error analysis stratified by year (see Methodology section on the extraction pipeline). Without these, it is impossible to rule out systematic biases from the LLM or ontology choices influencing the reported trends.

Authors: We agree that demonstrating the reliability of the extraction process is essential to support the validity of the identified phases. While the original submission emphasized the use of a carefully designed ontology to guide the LLM and reduce hallucinations, we did not include quantitative metrics. In the revised manuscript, we will add a dedicated validation subsection. This will report results from a held-out sample of 100 papers where we compute precision and recall by comparing LLM extractions to expert annotations, along with inter-annotator agreement scores. We will also provide a year-stratified error analysis to assess potential temporal biases. These additions will directly address concerns about systematic biases. revision: yes
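The promised validation reduces to two standard computations. The sketch below assumes extractions are compared with expert annotations as exact-match (paper, relation, entity) triples and that inter-annotator agreement is measured with Cohen's kappa on per-item labels; exact matching is a strong simplification (a real evaluation would likely need entity normalization), and the triples and labels are hypothetical.

```python
from collections import Counter


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 of extracted triples vs. expert gold."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa between two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    # Chance agreement of 1 means both annotators used one identical label;
    # kappa is undefined there, so 1.0 is returned by convention.
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


gold = {("p1", "USES_MODEL", "GPT-4"), ("p1", "USES_DATASET", "FinQA"),
        ("p1", "ADDRESSES_TASK", "financial QA"),
        ("p1", "REPORTS_LIMITATION", "hallucination")}
predicted = {("p1", "USES_MODEL", "GPT-4"), ("p1", "USES_DATASET", "FinQA"),
             ("p1", "ADDRESSES_TASK", "sentiment analysis")}
p, r, f1 = precision_recall_f1(predicted, gold)  # p = 2/3, r = 1/2, f1 = 4/7

# Two annotators marking whether each of four candidate triples is correct.
kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])  # 0.5
```

Computing these per publication year, rather than once over the whole sample, is what would expose the temporal bias the referee asks about.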
Referee: [Results] The derivation of the three phases from the extracted graph lacks detail on the quantitative process used (e.g., how changes in entity/relation frequencies or modular method mentions were aggregated over time to identify phase boundaries). This makes the narrative in the Results section difficult to assess for robustness.

Authors: We acknowledge that greater detail on the phase identification process would enhance the transparency and allow for better evaluation of the results. The phases were derived by analyzing temporal trends in the frequencies of key entities and relations in the knowledge graph, such as increases in 'limitation' and 'risk' mentions, and the emergence of modular architectures. In the revision, we will expand the Results section to describe the quantitative aggregation method, including the use of time-binned frequency plots, normalization procedures, and the specific criteria (e.g., inflection points in multiple indicators) used to delineate the phase boundaries. Supporting figures and a step-by-step description will be added to facilitate reproducibility. revision: yes
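A minimal sketch of the kind of aggregation the response describes, assuming indicator mention counts are already grouped into half-year bins. Each indicator's count is normalized by the number of papers in the bin; the bin where that share rises fastest serves as a discrete proxy for the inflection point of an S-shaped adoption curve; and a bin is accepted as a phase onset only when at least `min_votes` indicators agree. Bin labels, indicator names, the voting rule, and all counts are hypothetical, not the authors' procedure or data.

```python
from collections import Counter
from typing import Optional


def normalized_shares(mentions: dict[str, int], papers: dict[str, int]) -> list[tuple[str, float]]:
    """Share of papers in each time bin that mention the indicator, in bin order."""
    return [(b, mentions.get(b, 0) / papers[b]) for b in sorted(papers)]


def fastest_growth_bin(shares: list[tuple[str, float]]) -> Optional[str]:
    """Bin in which the share rises most steeply (None if fewer than two bins)."""
    best_bin, best_delta = None, float("-inf")
    for (_, prev), (b, cur) in zip(shares, shares[1:]):
        if cur - prev > best_delta:
            best_bin, best_delta = b, cur - prev
    return best_bin


def phase_onsets(indicators: dict[str, dict[str, int]], papers: dict[str, int],
                 min_votes: int = 2) -> list[str]:
    """Bins where at least `min_votes` indicators show their fastest growth."""
    votes = Counter(fastest_growth_bin(normalized_shares(m, papers))
                    for m in indicators.values())
    return sorted(b for b, v in votes.items() if b is not None and v >= min_votes)


# Toy counts, invented for illustration (not the paper's data).
papers = {"2022H2": 20, "2023H1": 50, "2023H2": 100, "2024H1": 150, "2024H2": 200}
indicators = {
    "new_task": {"2022H2": 3, "2023H1": 25, "2023H2": 40, "2024H1": 45, "2024H2": 40},
    "new_dataset": {"2022H2": 2, "2023H1": 20, "2023H2": 45, "2024H1": 50, "2024H2": 50},
    "limitation": {"2022H2": 1, "2023H1": 5, "2023H2": 40, "2024H1": 70, "2024H2": 90},
    "risk": {"2023H1": 4, "2023H2": 35, "2024H1": 60, "2024H2": 80},
    "retrieval_augmented": {"2023H2": 5, "2024H1": 60, "2024H2": 100},
    "multi_agent": {"2023H2": 2, "2024H1": 30, "2024H2": 50},
}
print(phase_onsets(indicators, papers))  # ['2023H1', '2023H2', '2024H1']
```

Normalizing by papers per bin matters here because the corpus itself grows quickly; raw counts would make every indicator look like it accelerates at the same time.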

Circularity Check

0 steps flagged

No significant circularity; trends are outputs from external corpus processing

full rationale

The derivation applies the MetaGraph extraction methodology to an independent corpus of 681 papers and reports the resulting three-phase narrative as an empirical finding. No equations, parameters, or premises reduce by construction to the target trends; the ontology-guided LLM step is a processing tool whose outputs are not presupposed in its definition, and no self-citation or fitted-input patterns are present in the provided description.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the reliability of the LLM extraction step and the completeness of the chosen ontology; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: An ontology can be defined that comprehensively captures the key concepts, tasks, methods, and risks relevant to GenAI in financial NLP.
The extraction process is guided by this ontology; if it misses important categories the resulting graph and phase analysis would be incomplete.

pith-pipeline@v0.9.0 · 5641 in / 1366 out tokens · 55720 ms · 2026-05-18T17:38:47.030742+00:00 · methodology
