pith. machine review for the scientific record.

arxiv: 2605.03701 · v1 · submitted 2026-05-05 · 💻 cs.CL · cs.AI

Recognition: unknown

SERE: Structural Example Retrieval for Enhancing LLMs in Event Causality Identification

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 04:00 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords event causality identification · large language models · structural retrieval · few-shot learning · causal hallucination · ConceptNet · tree edit distance · example selection

The pith

Structural retrieval of examples reduces causal bias in LLMs for event causality identification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SERE, a framework to enhance large language models' performance on event causality identification by retrieving structurally similar examples for few-shot prompting. LLMs tend to overpredict causal relationships between events due to inherent biases, limiting their reliability in tasks requiring accurate causal reasoning. SERE addresses this by combining three metrics: measuring conceptual connections via edit distances in ConceptNet, assessing syntactic similarity with tree edit distances, and filtering for matching causal patterns using the LLM itself. If effective, this would make LLMs more accurate guides for determining causality in text without additional training. Readers should care because better causal understanding in AI systems supports improved decision-making in fields like journalism, law, and scientific analysis.
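
The syntactic side of that combination rests on tree edit distance over parse trees. As a toy illustration, here is a simplified top-down edit distance for ordered labeled trees; the paper's exact algorithm is not specified in this summary (a full Zhang-Shasha-style computation is likely), so this sketch and the miniature parse trees are illustrative assumptions, not the authors' implementation.

```python
def size(t):
    # number of nodes in a (label, children) tree
    label, kids = t
    return 1 + sum(size(k) for k in kids)

def tree_edit_distance(a, b):
    """Simplified top-down edit distance for ordered labeled trees:
    roots are aligned, children are aligned by a sequence-edit DP,
    and deleting or inserting a child costs its whole subtree size.
    (A coarser relative of the Zhang-Shasha algorithm.)"""
    (la, ka), (lb, kb) = a, b
    root_cost = 0 if la == lb else 1
    m, n = len(ka), len(kb)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(ka[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(kb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + size(ka[i - 1]),
                           dp[i][j - 1] + size(kb[j - 1]),
                           dp[i - 1][j - 1] + tree_edit_distance(ka[i - 1], kb[j - 1]))
    return root_cost + dp[m][n]

# Toy constituency-style parse skeletons for two sentences
a = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
b = ("S", [("NP", []), ("VP", [("V", [])])])
print(tree_edit_distance(a, b))  # → 1 (one NP leaf deleted)
```

Two sentences with near-identical syntactic skeletons score a small distance even when their vocabularies barely overlap, which is exactly the signal semantic embedding retrieval would miss.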

Core claim

SERE integrates three structural retrieval strategies—Conceptual Path Metric using ConceptNet edit distance, Syntactic Metric via tree edit distance, and Causal Pattern Filtering with LLMs—to select relevant examples that guide LLMs in ECI tasks. This selection mitigates causal reasoning biases and improves accuracy compared to standard retrieval methods, as validated on multiple ECI datasets.

What carries the argument

The structural example retrieval mechanism using conceptual path measurement in ConceptNet, syntactic tree similarity, and causal pattern filtering to choose high-quality few-shot examples for LLMs.
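
Concretely, the selection step might look like the sketch below. This summary does not say how the three signals are aggregated, so the hard pattern filter, min-max normalization, and equal weighting are assumptions for illustration, not the authors' method.

```python
def normalize(xs):
    # min-max scale a list of distances into [0, 1]; constant lists map to 0
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def rank_examples(candidates, concept_dist, syntax_dist, matches_pattern):
    """Order candidate demonstrations: drop candidates whose causal
    pattern does not match the query, then sort the rest by the sum of
    normalized conceptual and syntactic distances (lower = closer)."""
    pool = [c for c in candidates if matches_pattern(c)]
    cd = normalize([concept_dist(c) for c in pool])
    sd = normalize([syntax_dist(c) for c in pool])
    order = sorted(zip(pool, cd, sd), key=lambda t: t[1] + t[2])
    return [c for c, _, _ in order]

# Toy run with hypothetical per-candidate distances
candidates = ["ex_a", "ex_b", "ex_c"]
concept = {"ex_a": 3, "ex_b": 1, "ex_c": 2}
syntax = {"ex_a": 5, "ex_b": 2, "ex_c": 9}
keep = {"ex_a": True, "ex_b": True, "ex_c": False}
print(rank_examples(candidates, concept.get, syntax.get, keep.get))
# → ['ex_b', 'ex_a']  (ex_c filtered out; ex_b closest on both metrics)
```

The top-ranked survivors would then be formatted as few-shot demonstrations in the ECI prompt.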

If this is right

  • LLMs guided by SERE examples show improved accuracy and reduced overprediction of causality on ECI datasets.
  • The combination of conceptual, syntactic, and pattern-based metrics selects more relevant examples than standard semantic retrieval.
  • Performance gains hold across multiple standard ECI benchmarks.
  • Structural knowledge from external resources like ConceptNet enhances the relevance of retrieved examples for causal reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could be adapted for other LLM tasks prone to reasoning biases, such as temporal ordering or counterfactual inference.
  • Combining SERE with other techniques like chain-of-thought might yield further gains in causal accuracy.
  • Validating the metrics on diverse languages or domains would test their robustness beyond English ECI datasets.

Load-bearing premise

The structural metrics based on ConceptNet paths, syntax trees, and causal patterns can consistently identify examples that better reduce the LLM's tendency to hallucinate causal relationships than alternative retrieval strategies.

What would settle it

Demonstrating that SERE does not yield higher accuracy or lower causal overprediction rates than random or embedding-based example selection on held-out ECI test sets would undermine the claim.

Figures

Figures reproduced from arXiv: 2605.03701 by Boyan Xu, Guimin Hu, Junhao Lu, Keli Zhang, Ruichu Cai, Shengyin Yu, Zhifeng Hao, Zhongjie Chen.

Figure 1: The middle instance is the sample to be …
Figure 2: Overview of the SERE framework.
Figure 3: The Causal Pattern extraction results for the …
read the original abstract

Event Causality Identification (ECI) requires models to determine whether a given pair of events in a context exhibits a causal relationship. While Large Language Models (LLMs) have demonstrated strong performance across various NLP tasks, their effectiveness in ECI remains limited due to biases in causal reasoning, often leading to overprediction of causal relationships (causal hallucination). To mitigate these issues and enhance LLM performance in ECI, we propose SERE, a structural example retrieval framework that leverages LLMs' few-shot learning capabilities. SERE introduces an innovative retrieval mechanism based on three structural concepts: (i) Conceptual Path Metric, which measures the conceptual relationship between events using edit distance in ConceptNet; (ii) Syntactic Metric, which quantifies structural similarity through tree edit distance on syntactic trees; and (iii) Causal Pattern Filtering, which filters examples based on predefined causal structures using LLMs. By integrating these structural retrieval strategies, SERE selects more relevant examples to guide LLMs in causal reasoning, mitigating bias and improving accuracy in ECI tasks. Extensive experiments on multiple ECI datasets validate the effectiveness of SERE. The source code is publicly available at https://github.com/DMIRLAB-Group/SERE.
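
The abstract's Conceptual Path Metric "measures the conceptual relationship between events using edit distance in ConceptNet." One plausible toy reading: represent each event pair by the sequence of relation labels along its ConceptNet path, then compare the query's path to a candidate's with Levenshtein distance. The relation sequences below are invented for illustration; the paper's exact formulation may differ.

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance over two sequences
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # substitute
    return dp[m][n]

# Hypothetical ConceptNet relation paths linking two event pairs
query_path = ["Causes", "RelatedTo"]
cand_path = ["Causes", "RelatedTo", "IsA"]
print(levenshtein(query_path, cand_path))  # → 1 (one extra hop)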

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes SERE, a structural example retrieval framework to improve large language models' performance on Event Causality Identification (ECI). SERE combines three retrieval metrics—Conceptual Path Metric (edit distance over ConceptNet paths), Syntactic Metric (tree edit distance on dependency parses), and Causal Pattern Filtering (LLM-based selection of examples matching predefined causal structures)—to select few-shot examples that reduce causal hallucination and bias. The central claim is that this structural approach outperforms standard retrieval baselines, with the claim resting on experimental results across multiple ECI datasets and the public release of code at https://github.com/DMIRLAB-Group/SERE.

Significance. If the reported gains hold under rigorous controls, the work offers a practical and reproducible method for mitigating LLM biases in causal reasoning via external structural knowledge. The combination of ConceptNet, syntactic parsing, and LLM filtering is a clear strength, and the public code release directly supports reproducibility and extension by the community.

minor comments (3)
  1. [Abstract] The abstract and introduction would be strengthened by naming the specific ECI datasets used and reporting the absolute F1 or accuracy deltas against the strongest baseline, rather than only stating that improvements were observed.
  2. [§3] In the method section, the precise weighting or aggregation rule for combining the three metrics into a single retrieval score is described at a high level; an explicit formula or pseudocode would remove ambiguity about how ties or conflicting signals are resolved.
  3. [§4] The experimental section should include error bars or standard deviations across multiple random seeds for the few-shot runs, as LLM prompting variance can be substantial and is needed to assess whether the reported gains are robust.
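
On minor comment 3, the requested reporting is cheap to produce with the standard library. The F1 values below are invented to show the format, not taken from the paper:

```python
from statistics import mean, stdev

# Illustrative F1 scores from five few-shot runs with different seeds
f1_runs = [0.612, 0.598, 0.631, 0.604, 0.620]
print(f"F1 = {mean(f1_runs):.3f} ± {stdev(f1_runs):.3f}")
# → F1 = 0.613 ± 0.013
```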

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. The summary accurately reflects the core of SERE: a structural retrieval framework that combines Conceptual Path Metric (ConceptNet edit distance), Syntactic Metric (tree edit distance), and Causal Pattern Filtering to select few-shot examples that reduce causal bias and hallucination in LLMs for Event Causality Identification. We appreciate the recognition of the practical value, the combination of external knowledge sources, and the public code release at https://github.com/DMIRLAB-Group/SERE for reproducibility. No specific major comments were provided, so we will focus on minor revisions to improve clarity, presentation, and any potential ambiguities in the experimental details.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The SERE framework defines its three structural retrieval metrics (Conceptual Path Metric via ConceptNet edit distance, Syntactic Metric via tree edit distance, and Causal Pattern Filtering via LLM) using external, independently available resources and tools. These definitions precede and are logically independent of the downstream ECI accuracy measurements; no parameter is fitted to target-task performance and then relabeled as a prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. Experimental comparisons to standard retrieval baselines therefore rest on external validation rather than tautological reduction to the paper's own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach depends on standard external tools (ConceptNet, syntactic parsers) and the assumption that LLMs can reliably perform causal pattern filtering. No new physical entities or forces are postulated.

free parameters (1)
  • Metric thresholds and filtering criteria
    Hyperparameters likely control which examples are selected or filtered; abstract provides no specific values or fitting procedure.
axioms (1)
  • domain assumption: Structural similarity measured by ConceptNet paths, syntactic trees, and causal patterns correlates with example usefulness for LLM causal reasoning
    The framework is built on the premise that these metrics select better examples than alternatives for reducing hallucination.

pith-pipeline@v0.9.0 · 5537 in / 1324 out tokens · 81936 ms · 2026-05-07T04:00:44.527187+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence

    cs.CL 2026-05 unverdicted novelty 4.0

    The authors introduce a validation framework showing LLMs can pull causal links from disaster social media but require checks against post-event evidence to avoid relying on model priors.

Reference graph

Works this paper leans on

26 extracted references · 2 canonical work pages · cited by 1 Pith paper


    Finally, analyze all the following patterns ONE-BY-ONE to determine whether the given text and events satisfy. If no pattern rules are met, give "No" - Pattern Rules: Direct:If the text explicitly states a causal relationship between X and Y without involving any mediating event (Z), then the causal connection is "Direct". This means that X directly influ...