EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents
Pith reviewed 2026-05-09 21:36 UTC · model grok-4.3
The pith
A new large manually annotated dataset for open-domain event extraction supports models that generalize across geographical contexts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We create EVENT5Ws, a large, manually annotated, and statistically verified open-domain event extraction dataset. Using EVENT5Ws, we evaluate state-of-the-art pre-trained large language models and establish a benchmark for future research. We further show that models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts.
What carries the argument
The EVENT5Ws dataset, produced by a systematic annotation pipeline that records the central elements of events from text documents.
If this is right
- Models trained on EVENT5Ws can be applied to event extraction tasks in other geographical settings and still perform effectively.
- The dataset supplies a standard benchmark for comparing future event extraction algorithms.
- The documented annotation process and lessons learned supply guidance for constructing other large-scale open-domain datasets.
Where Pith is reading between the lines
- Better cross-region performance could support more reliable automated analysis during emergencies that span multiple areas.
- The same verification approach might be reused to improve label quality in other information-extraction datasets.
- Open-domain resources of this scale could reduce the need for domain-specific retraining when event extraction systems are deployed in new locations.
Load-bearing premise
The systematic annotation pipeline and statistical verification produce labels that are accurate, consistent, and representative of open-domain events without significant bias or coverage gaps.
What would settle it
If models trained on EVENT5Ws show no improvement over models trained on prior datasets when tested on event data from new geographical regions, the generalization benefit would be refuted.
Original abstract
Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore, it is necessary to develop automated event extraction approaches. However, existing datasets for algorithm development have limitations, including limited coverage of event types in closed-domain settings and a lack of large, manually verified datasets in open-domain settings. To address these limitations, we create EVENT5Ws, a large, manually annotated, and statistically verified open-domain event extraction dataset. We design a systematic annotation pipeline to create the dataset and provide empirical insights into annotation complexity. Using EVENT5Ws, we evaluate state-of-the-art pre-trained large language models and establish a benchmark for future research. We further show that models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts, which demonstrates its potential for developing generalizable algorithms. Finally, we summarize the lessons learned during the dataset development and provide recommendations to support future large-scale dataset development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EVENT5Ws, a large manually annotated and statistically verified open-domain event extraction dataset based on the 5Ws framework. It describes a systematic annotation pipeline with empirical insights into annotation complexity, benchmarks state-of-the-art pre-trained LLMs to establish performance baselines, demonstrates that models trained on EVENT5Ws generalize effectively to event extraction datasets from different geographical contexts, and summarizes lessons learned with recommendations for future large-scale dataset development.
Significance. If the annotation quality and generalization results hold, this dataset would be a valuable contribution to open-domain event extraction research, filling gaps in coverage and scale compared to existing closed-domain resources. The cross-geographical generalization experiments are a notable strength, as they provide evidence for developing more robust algorithms. The inclusion of annotation complexity insights and practical recommendations further enhances the paper's utility for the community.
major comments (1)
- [Annotation Pipeline section] The central claim that EVENT5Ws is 'manually annotated and statistically verified' requires explicit quantitative details on the verification process, including inter-annotator agreement metrics (e.g., Cohen's kappa or Fleiss' kappa), specific verification statistics, and exclusion criteria. These are load-bearing for assessing label accuracy, consistency, and representativeness; their absence prevents full evaluation of the dataset's quality.
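For readers unfamiliar with the agreement metric named in this comment, the following is a minimal sketch of Cohen's kappa for two annotators. It is not the authors' verification procedure, which the paper does not specify here; the function name `cohens_kappa` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two annotators' marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up toy labels, not taken from EVENT5Ws.
annotator_1 = ["Who", "What", "Who", "Where"]
annotator_2 = ["Who", "What", "What", "Where"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

Here the annotators agree on three of four items (p_o = 0.75), but chance agreement is 0.3125, so kappa is noticeably lower than the raw agreement rate. That gap is why the comment asks for kappa rather than a simple percentage.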
minor comments (3)
- [Abstract] Include at least one key statistic (e.g., number of documents or annotated events) to immediately convey the dataset's scale.
- [Generalization experiments section] Clearly name the specific external datasets used for the geographical context tests and report their key characteristics (e.g., size, domain) to support replication and assessment of the generalization claim.
- Notation and tables: Ensure consistent use of event component labels (Who, What, When, Where, Why) across tables and figures to avoid reader confusion.
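One way to enforce the label consistency requested in the last comment is to normalize every component label to a single canonical spelling before it reaches a table or figure. The sketch below assumes a hypothetical record layout; the actual EVENT5Ws file format is not described on this page, and `CANONICAL_LABELS`, `normalize_label`, and `to_record` are illustrative names.

```python
# Canonical spellings of the five event components named in the report.
CANONICAL_LABELS = ("Who", "What", "When", "Where", "Why")


def normalize_label(raw):
    """Map variants such as ' WHERE: ' or 'who?' to the canonical label."""
    cleaned = raw.strip().rstrip("?:").strip().lower()
    for label in CANONICAL_LABELS:
        if cleaned == label.lower():
            return label
    raise ValueError(f"unknown event component label: {raw!r}")


def to_record(raw_fields):
    """Build a record with all five canonical keys (hypothetical layout).

    Components that were not annotated are kept as None so that every
    record has the same shape.
    """
    record = {label: None for label in CANONICAL_LABELS}
    for key, value in raw_fields.items():
        record[normalize_label(key)] = value
    return record


# Made-up example values, not taken from the dataset.
print(to_record({"WHO": "city council", "where:": "Springfield"}))
```

Rejecting unknown labels with an error, rather than passing them through, surfaces inconsistent spellings at load time instead of in a published table.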
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the potential value of EVENT5Ws in advancing open-domain event extraction research. We address the major comment below.
- Referee: [Annotation Pipeline section] The central claim that EVENT5Ws is 'manually annotated and statistically verified' requires explicit quantitative details on the verification process, including inter-annotator agreement metrics (e.g., Cohen's kappa or Fleiss' kappa), specific verification statistics, and exclusion criteria. These are load-bearing for assessing label accuracy, consistency, and representativeness; their absence prevents full evaluation of the dataset's quality.
  Authors: We agree that the Annotation Pipeline section requires additional quantitative details to fully substantiate the claim of statistical verification. While the manuscript describes the systematic annotation pipeline and notes that annotations were manually performed with verification steps, it does not report specific inter-annotator agreement metrics, detailed verification statistics, or exclusion criteria. In the revised version, we will expand this section to include Cohen's kappa (or Fleiss' kappa) scores for agreement among annotators, verification statistics such as the number and percentage of annotations reviewed, agreement rates during verification, and the explicit exclusion criteria applied for low-quality or inconsistent annotations. These additions will directly address the concern and allow readers to better evaluate label accuracy, consistency, and representativeness.
  Revision: yes
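The response leaves open whether Cohen's or Fleiss' kappa will be reported. When more than two annotators label each item, Fleiss' kappa is the usual choice. The sketch below is an illustration under the assumption that every item receives the same number of ratings; `fleiss_kappa` and the toy data are hypothetical and do not reflect the authors' actual setup.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters per item.

    `ratings` is a list of items; each item is the list of labels
    assigned by the raters (same length for every item).
    """
    if not ratings:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    n_items = len(ratings)
    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: how many raters gave item i the j-th category.
    counts = [[item.count(c) for c in categories] for item in ratings]

    # Per-item agreement: share of rater pairs that agree on the item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    chance = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(len(categories))
    )

    if chance == 1.0:
        # Only one category was ever used.
        return 1.0
    return (mean_agreement - chance) / (1 - chance)


# Made-up toy ratings from three raters, not taken from EVENT5Ws.
toy = [
    ["Who", "Who", "Who"],
    ["What", "What", "Who"],
    ["Where", "Where", "Where"],
]
print(round(fleiss_kappa(toy), 3))  # 0.654
```

Unlike Cohen's kappa, this statistic does not require the same individuals to rate every item, only the same number of ratings per item, which tends to suit large annotation efforts with rotating annotators.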
Circularity Check
No significant circularity
The paper describes the creation of the EVENT5Ws dataset via a systematic annotation pipeline, provides statistical verification, benchmarks LLMs on it, and tests cross-geographical generalization. No mathematical derivations, equations, fitted parameters, or predictions appear in the argument structure. All claims rest on externally verifiable elements: the released dataset, the documented annotation process, and empirical benchmark results. No self-citation chains, self-definitional steps, or reductions of outputs to inputs by construction are present.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Manual annotation via the described pipeline yields high-quality, consistent labels suitable for benchmarking.