pith. machine review for the scientific record.

arxiv: 2604.19779 · v1 · submitted 2026-03-29 · 💻 cs.CL

Recognition: 2 theorem links

· Lean Theorem

ESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:29 UTC · model grok-4.3

classification 💻 cs.CL
keywords ESG reports · RAG · LLM embeddings · GRI standards · score regression · environmental pillar · PDF processing

The pith

ESGLens extracts GRI-aligned information from ESG reports and predicts environmental scores from ChatGPT embeddings with 0.48 Pearson correlation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ESGLens as a framework that combines retrieval-augmented generation with prompt-engineered extraction to handle long, unstructured ESG reports. It segments PDFs into typed chunks, retrieves and synthesizes content according to specific GRI standards, supports traceable question answering, and feeds extracted summaries into a regression model. On roughly 300 reports from major indices, ChatGPT embeddings passed through a neural network regressor reach 0.48 correlation with LSEG environmental scores, while other embedding-regressor pairs perform worse. This matters because manual review of heterogeneous reports is slow and inconsistent, so even modest automation could scale analysis without losing source traceability.
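The flow described above can be sketched in miniature. Everything below, including the function names, chunk types, and keyword matching in place of embedding-based retrieval, is a hypothetical stand-in, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    kind: str   # "text" | "table" | "chart" (the paper's typed chunks)
    body: str

def segment_report(raw_text: str) -> list[Chunk]:
    # Stand-in for PDF segmentation: one text chunk per paragraph.
    return [Chunk("text", p) for p in raw_text.split("\n\n") if p.strip()]

def retrieve_for_standard(chunks: list[Chunk], gri_keyword: str) -> list[Chunk]:
    # Stand-in for retrieval aligned with a specific GRI standard;
    # the real framework uses embedding similarity, not keyword match.
    return [c for c in chunks if gri_keyword.lower() in c.body.lower()]

report = "We reduced Scope 1 emissions by 12%.\n\nBoard diversity improved."
chunks = segment_report(report)
hits = retrieve_for_standard(chunks, "emissions")
print(len(chunks), len(hits))  # prints: 2 1
```

Retrieved chunks like these would then be summarized per standard and passed to the scoring stage.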

Core claim

ESGLens shows that embeddings produced by a general-purpose LLM from GRI-guided summaries of ESG reports contain enough signal to regress against LSEG environmental pillar scores, yielding a Pearson correlation of 0.48 (R² approximately 0.23) on a set of about 300 reports from QQQ, S&P 500, and Russell 1000 companies for fiscal year 2022.
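A quick sanity check on the headline numbers (not from the paper's code): for a predictor evaluated by Pearson correlation, the R² of the optimally scaled predictor equals r², so 0.48 squares to roughly the reported 0.23.

```python
# Consistency check: squared Pearson correlation matches the reported R².
r = 0.48
print(round(r * r, 2))  # prints: 0.23
```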

What carries the argument

The scoring module that converts GRI-aligned extracted summaries into LLM embeddings and trains a regressor (ChatGPT embeddings plus neural network) against LSEG reference scores.
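The shape of that step can be sketched with synthetic data. Ridge least squares stands in for the paper's neural network, and random vectors stand in for ChatGPT embeddings and LSEG scores, so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 300, 64                        # ~300 reports; low-dim toy embeddings
X = rng.normal(size=(n, d))           # stand-in for summary embeddings
y = X @ rng.normal(size=d) * 0.1 + rng.normal(size=n)  # stand-in scores

lam = 1.0                             # ridge penalty (assumed, not from paper)
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
pred = X @ w
r = np.corrcoef(pred, y)[0, 1]        # Pearson correlation, as reported
print(round(float(r), 2))
```

The paper's pipeline differs in the featurizer (GRI-guided summaries embedded by an LLM) and the regressor (a neural network or LightGBM), but the fit-then-correlate structure is the same.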

Load-bearing premise

Embeddings from a general-purpose LLM on GRI-guided extractions carry predictive information about ESG performance that is not already present in the LSEG scores used to train the regressor.

What would settle it

A replication on a fresh set of several hundred reports from a later fiscal year that yields correlation near zero or below would show the observed signal does not generalize.

Figures

Figures reproduced from arXiv: 2604.19779 by Meng-Chi Chen, Tsung-Yu Yang.

Figure 1: Comparison between General AI-powered PDF tools and the proposed Interactive Question …
Figure 2: Detailed process framework of ESGLens, illustrating the five-stage pipeline. …
Figure 3: (a) Training loss of NN. (b) Correlation of predicted and actual ESG scores using ChatGPT, …
read the original abstract

Environmental, Social, and Governance (ESG) reports are central to investment decision-making, yet their length, heterogeneous content, and lack of standardized structure make manual analysis costly and inconsistent. We present ESGLens, a proof-of-concept framework combining retrieval-augmented generation (RAG) with prompt-engineered extraction to automate three tasks: (1) structured information extraction guided by Global Reporting Initiative (GRI) standards, (2) interactive question-answering with source traceability, and (3) ESG score prediction via regression on LLM-generated embeddings. ESGLens is purpose-built for the domain: a report-processing module segments heterogeneous PDF content into typed chunks (text, tables, charts); a GRI-guided extraction module retrieves and synthesizes information aligned with specific standards; and a scoring module embeds extracted summaries and feeds them to a regression model trained against London Stock Exchange Group (LSEG) reference scores. We evaluate the framework on approximately 300 reports from companies in the QQQ, S&P 500, and Russell 1000 indices (fiscal year 2022). Among three embedding methods (ChatGPT, BERT, RoBERTa) and two regressors (Neural Network, LightGBM), ChatGPT embeddings with a Neural Network achieve a Pearson correlation of 0.48 (R² ≈ 0.23) against LSEG ground-truth scores, a modest but statistically meaningful signal given the ~300-report training set and restriction to the environmental pillar. A traceability audit shows that 8 of 10 extracted claims verify against the source document, with two failures attributable to few-shot example leakage. We discuss limitations including dataset size and restriction to environmental indicators, and release the code to support reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. ESGLens is an LLM-based RAG framework for ESG report analysis that performs GRI-guided structured extraction, interactive question answering with source traceability, and ESG score prediction by training regressors on LLM-generated embeddings from extracted summaries. Evaluated on ~300 reports from major indices, it achieves a Pearson correlation of 0.48 (R² ≈ 0.23) for environmental pillar scores using ChatGPT embeddings and a neural network, with 8/10 claims traceable in an audit.

Significance. The work addresses a practical problem in ESG analysis by combining retrieval-augmented generation with domain-specific prompting. The modest correlation indicates some predictive signal in the embeddings, and the code release is a strength for reproducibility. However, the restriction to the environmental pillar and small dataset size mean the significance is primarily as a proof-of-concept rather than a robust predictive tool. Stronger validation would increase its value for the field.

major comments (2)
  1. Abstract: The Pearson correlation of 0.48 is reported for the ∼300-report training set without any held-out test performance, cross-validation results, or details on regularization for the neural network regressor applied to 1536-dimensional embeddings. This is load-bearing for the claim of a 'statistically meaningful signal' in ESG score prediction, as overfitting cannot be ruled out.
  2. Evaluation section: The comparison is limited to three embedding models and two regressors; no baseline using non-LLM features (e.g., keyword counts or report metadata) is provided to establish that the LLM embeddings add value beyond information already captured in the LSEG reference scores used for training.
minor comments (2)
  1. Abstract: The exact number of reports used for the correlation calculation should be stated precisely rather than 'approximately 300'.
  2. The paper would benefit from a table summarizing performance metrics for all embedding-regressor combinations rather than highlighting only the best result.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects of validation for the score prediction component. We agree that the current presentation of results on the full training set requires strengthening to better support claims of a meaningful signal, and we will revise the manuscript to address both major comments through additional experiments and details.

read point-by-point responses
  1. Referee: [—] Abstract: The Pearson correlation of 0.48 is reported for the ∼300-report training set without any held-out test performance, cross-validation results, or details on regularization for the neural network regressor applied to 1536-dimensional embeddings. This is load-bearing for the claim of a 'statistically meaningful signal' in ESG score prediction, as overfitting cannot be ruled out.

    Authors: We agree that performance reported only on the training set without cross-validation leaves open the possibility of overfitting, particularly with high-dimensional embeddings. In the revised manuscript, we will add 5-fold cross-validation results for the ChatGPT + Neural Network model, reporting mean Pearson correlation and R² with standard deviations across folds. We will also expand the methods section to detail the neural network (two hidden layers of 512 and 256 units with ReLU, dropout rate 0.3, and L2 regularization with lambda=0.01) to address regularization concerns. These changes will be reflected in both the abstract and evaluation sections. revision: yes

  2. Referee: [—] Evaluation section: The comparison is limited to three embedding models and two regressors; no baseline using non-LLM features (e.g., keyword counts or report metadata) is provided to establish that the LLM embeddings add value beyond information already captured in the LSEG reference scores used for training.

    Authors: We acknowledge that non-LLM baselines are needed to isolate the contribution of LLM embeddings. While LSEG scores draw from multiple external sources, we will add in the revised evaluation a baseline using non-LLM features extracted from the reports themselves, including TF-IDF vectors on the full text, report length, and counts of GRI-aligned sections. These will be evaluated with the same regressors (Neural Network and LightGBM) and compared directly to the embedding-based results. This will clarify whether the embeddings provide incremental predictive value. revision: yes
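The proposed non-LLM baseline can be sketched with a hand-rolled TF-IDF featurizer. This illustrates the feature type only; it is not the authors' planned implementation, and the toy documents below are invented.

```python
import math
from collections import Counter

def tfidf(docs):
    # Minimal TF-IDF: term frequency scaled by inverse document frequency.
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    vocab = sorted(df)
    n = len(docs)
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] / len(doc) * math.log(n / df[t]) for t in vocab])
    return vocab, rows

docs = ["emissions fell sharply", "water usage fell", "emissions rose"]
vocab, X = tfidf(docs)
print(len(vocab), len(X))  # prints: 6 3
```

Rows like these, optionally alongside report length and GRI-section counts, would feed the same regressors as the embedding features for a like-for-like comparison.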

Circularity Check

1 step flagged

ESG score 'prediction' reduces to in-sample NN regressor fit on training embeddings

specific steps
  1. fitted input called prediction [Abstract]
    "ChatGPT embeddings with a Neural Network achieve a Pearson correlation of 0.48 (R² ≈ 0.23) against LSEG ground-truth scores, a modest but statistically meaningful signal given the ∼300-report training set and restriction to the environmental pillar."

    The neural-network regressor is trained on the embeddings from the identical ~300-report collection described as the training set to match the LSEG reference scores; the reported Pearson correlation is therefore the in-sample training fit, not an out-of-sample prediction.

full rationale

The paper's central quantitative result is the reported Pearson 0.48 correlation for environmental-pillar score prediction. This is produced by training a neural-network regressor directly on the ChatGPT embeddings extracted from the same ~300-report collection that is explicitly labeled the training set, with the LSEG scores as targets. The abstract presents this correlation as evidence of a 'statistically meaningful signal' without any reference to held-out data, cross-validation, or regularization. Consequently the quoted performance metric is the training-set fit by construction rather than an independent prediction. The RAG extraction and GRI-guided modules supply the embeddings but do not alter the fact that the regression step is a direct fit; no self-citation or ansatz smuggling appears in the load-bearing claim.
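The audit's point can be demonstrated on synthetic data: with 1536-dimensional features and roughly 300 samples, even pure-noise embeddings fit the training set almost perfectly, while cross-validated correlation stays near zero. Ridge regression stands in for the paper's neural network; nothing below uses the authors' data or code.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 300, 1536, 1.0
X = rng.normal(size=(n, d))          # noise "embeddings": no real signal
y = rng.normal(size=n)               # independent "scores"

def ridge_fit(Xt, yt):
    return np.linalg.solve(Xt.T @ Xt + lam * np.eye(Xt.shape[1]), Xt.T @ yt)

# In-sample fit: train and evaluate on the full set, as the paper reports.
r_in = np.corrcoef(X @ ridge_fit(X, y), y)[0, 1]

# 5-fold cross-validation: evaluate only on held-out folds.
idx = np.arange(n)
preds = np.empty(n)
for f in range(5):
    test = idx[f::5]
    train = np.setdiff1d(idx, test)
    preds[test] = X[test] @ ridge_fit(X[train], y[train])
r_cv = np.corrcoef(preds, y)[0, 1]
print(r_in > 0.9, abs(r_cv) < 0.3)   # prints: True True
```

An in-sample correlation near 1.0 from features with no signal at all is why the reported 0.48 cannot be read as predictive performance without held-out evaluation.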

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard LLM capabilities for extraction and embedding plus a fitted regression model; no new physical constants or invented entities are introduced.

free parameters (1)
  • Regression model parameters = Trained on ~300 reports
    Neural network or LightGBM weights are optimized to fit the LSEG reference scores on the 300-report training set.
axioms (1)
  • domain assumption LLM embeddings from GRI-guided summaries capture semantically relevant ESG information
    Invoked when feeding ChatGPT embeddings into the regressor as predictive features.

pith-pipeline@v0.9.0 · 5626 in / 1387 out tokens · 119949 ms · 2026-05-14T22:29:52.817253+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 3 internal anchors

  1. [1]

    A brief history of ESG: From pioneer to mainstream

    Dan Byrne. A brief history of ESG: From pioneer to mainstream. The Corporate Governance Institute,

  2. [2]

    Accessed: May 22, 2024

    URL https://www.thecorporategovernanceinstitute.com/insights/guides/a-brief-history-of-esg-from-pioneer-to-mainstream/. Accessed: May 22, 2024

  3. [3]

    The history of ESG: A journey towards sustainable investing

    Tom Krantz. The history of ESG: A journey towards sustainable investing. IBM Think, 2024. URL https://www.ibm.com/think/topics/environmental-social-and-governance-history . Accessed: March 27, 2026

  4. [4]

    Environmental, social, and corporate governance: A history of ESG standardization from 1970s to the present, 4 2023

    Minzhi Luna Wang. Environmental, social, and corporate governance: A history of ESG standardization from 1970s to the present, 4 2023. URL https://sites.asit.columbia.edu/historydept/wp-content/uploads/sites/29/2023/05/Wang-Luna_thesis.pdf. Seminar Advisor: Elizabeth Blackmar; Second Reader: Kimberly Phillips-Fein

  5. [5]

    Evolution of esg reporting frameworks

    Satyajit Bose. Evolution of esg reporting frameworks. Values at work: Sustainable investing and ESG reporting, pages 13–33, 2020. URL https://theesgexchange.org/wp-content/uploads/2023/03/Evolution-of-ESG-Reporting-Frameworks.pdf

  6. [6]

    The evolution of esg reports and the role of voluntary standards

    Ethan Rouen, Kunal Sachdeva, and Aaron Yoon. The evolution of esg reports and the role of voluntary standards. Available at SSRN 4227934, 2023. URL https://www.hbs.edu/ris/Publication%20Files/23-024_5d9ec300-5c37-4cac-9edb-bcf59650ceb4.pdf

  7. [7]

    Esg standards: Looming challenges and pathways forward

    Todd Cort and Daniel Esty. Esg standards: Looming challenges and pathways forward. Organization & Environment, 33(4):491–510, 2020. doi: 10.1177/1086026620945342. URL https://www.jstor.org/stable/27001593

  8. [8]

    The impact of unstandardized data on ESG reporting

    Julian Göbel. The impact of unstandardized data on ESG reporting. Envoria Insights, May 2022. URL https://envoria.com/insights-news/the-impact-of-unstandardized-data-on-esg-reporting . Accessed: March 27, 2026

  9. [9]

    The 5 Main Challenges of ESG Reporting and Best Practices

    EcoActive. The 5 Main Challenges of ESG Reporting and Best Practices. EcoActive Blog, February

  10. [10]

    Accessed: March 27, 2026

    URL https://ecoactivetech.com/the-5-main-challenges-of-esg-reporting-and-best-practices/. Accessed: March 27, 2026

  11. [11]

    Euleresg: Automating esg disclosure analysis with llms, 2025

    Yi Ding, Xushuo Tang, Zhengyi Yang, Wenqian Zhang, Simin Wu, Yuxin Huang, Lingjing Lan, Weiyuan Li, Yin Chen, Mingchen Ju, Wenke Yang, Thong Hoang, Mykhailo Klymenko, Xiwei Xu, and Wenjie Zhang. Euleresg: Automating esg disclosure analysis with llms, 2025. URL https://arxiv.org/abs/2511.21712

  12. [12]

    Developing retrieval augmented generation (rag) based llm systems from pdfs: An experience report,

    Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell, Jussi Rasku, and Pekka Abrahamsson. Developing retrieval augmented generation (rag) based llm systems from pdfs: An experience report,

  13. [13]

    URL https://arxiv.org/abs/2410.15944

  14. [14]

    Curiousllm: Elevating multi-document question answering with llm-enhanced knowledge graph reasoning, 2025

    Zukang Yang, Zixuan Zhu, and Xuan Zhu. Curiousllm: Elevating multi-document question answering with llm-enhanced knowledge graph reasoning, 2025. URL https://arxiv.org/abs/2404.09077

  15. [15]

    Large language models for sustainability reporting: A systematic review and research agenda

    Seyed Alireza Mousavian Anaraki, Danilo Croce, and Roberto Basili. Large language models for sustainability reporting: A systematic review and research agenda. Sustainable Futures, 10:101494, 2025. doi: 10.1016/j.sftr.2025.101494. URL https://doi.org/10.1016/j.sftr.2025.101494

  16. [16]

    Chatreport: Democratizing sustainability disclosure analysis through llm-based tools

    Jingwei Ni, Julia Bingler, Chiara Colesanti-Senni, Mathias Kraus, Glen Gostlow, Tobias Schimanski, Dominik Stammbach, Saeid Ashraf Vaghefi, Qian Wang, Nicolas Webersinke, et al. Chatreport: Democratizing sustainability disclosure analysis through llm-based tools. arXiv preprint arXiv:2307.15770,

  17. [17]
  18. [18]

    Esgreveal: An llm-based approach for extracting structured data from esg reports

    Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, ZongXiong Lei, Zihan Zeng, Shiming Yang, HongXiang Tong, Lei Xiao, and Wenwen Zhou. Esgreveal: An llm-based approach for extracting structured data from esg reports. arXiv preprint arXiv:2312.17264, 2023. doi: 10.48550/arXiv.2312.17264. URL https://doi.org/10.48550/arXiv.2312.17264

  19. [19]

    Retrieval-augmented generation for knowledge-intensive NLP tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, ...

  20. [20]

    REALM: Retrieval-augmented language model pre-training

    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. REALM: Retrieval-augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 3929–3938. PMLR, 2020. URL http://proceedings.mlr.press/v119/guu20a.html

  21. [21]

    Esg reporting lifecycle management with large language models and ai agents,

    Thong Hoang, Mykhailo Klymenko, Xiwei Xu, Shidong Pan, Yi Ding, Xushuo Tang, Zhengyi Yang, Jieke Shi, and David Lo. Esg reporting lifecycle management with large language models and ai agents,

  22. [22]

    URL https://arxiv.org/abs/2603.10646

  23. [23]

    Susgen-gpt: A data-centric llm for financial nlp and sustainability report generation, 2024

    Qilong Wu, Xiaoneng Xiang, Hejia Huang, Xuan Wang, Yeo Wei Jie, Ranjan Satapathy, Ricardo Shirota Filho, and Bharadwaj Veeravalli. Susgen-gpt: A data-centric llm for financial nlp and sustainability report generation, 2024. URL https://arxiv.org/abs/2412.10906

  24. [24]

    Automatic GRI-SDG annotation and LLM-based filtering for sustainability reports

    Seyed Alireza Mousavian Anaraki, Danilo Croce, and Roberto Basili. Automatic GRI-SDG annotation and LLM-based filtering for sustainability reports. In Cristina Bosco, Elisabetta Jezek, Marco Polignano, and Manuela Sanguinetti, editors, Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), pages 775–784, Cagliari, Ital...

  25. [25]

    Attention Is All You Need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, pages 5998–6008. Curran Associates, Inc., 2017. URL https://arxiv.org/abs/1706.03762

  26. [26]

    Leveraging BERT language models for multi-lingual ESG issue identification

    Elvys Linhares Pontes, Mohamed Benjannet, and Lam Kim Ming. Leveraging BERT language models for multi-lingual ESG issue identification. In Proceedings of the Multi-Lingual ESG Issue Identification (ML-ESG) Shared Task. arXiv, 2023. URL https://arxiv.org/abs/2309.02189 . arXiv preprint arXiv:2309.02189

  27. [27]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019. doi: 10.48550/arXiv.1907.11692. URL https://arxiv.org/abs/1907.11692

  28. [28]

    Dynamicesg: A dataset for dynamically unearthing esg ratings from news articles

    Yu-Min Tseng, Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. Dynamicesg: A dataset for dynamically unearthing esg ratings from news articles. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 5412–5416, 2023. doi: 10.1145/3583780.361511. URL https://doi.org/10.1145/3583780.361511

  29. [29]

    Using pre-trained language model for accurate ESG prediction

    Lei Xia, Mingming Yang, and Qi Liu. Using pre-trained language model for accurate ESG prediction. In Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, pages 1–22, Jeju, South Korea, 3 August 2024. URL https://aclanthology.org/2024.finnlp-2.1

  30. [30]

    ESGBERT: Language model to help with classification tasks related to companies’ environmental, social, and governance practices

    Srishti Mehra, Robert Louka, and Yixun Zhang. ESGBERT: Language model to help with classification tasks related to companies' environmental, social, and governance practices. In CS & IT Conference Proceedings, volume 12, pages 183–190, 2022. doi: 10.5121/csit.2022.120616. URL https://arxiv.org/abs/2203.16788

  31. [31]

    Chatpdf: Specialized ai tool for pdf interaction and summarization

    ChatPDF GmbH. Chatpdf: Specialized ai tool for pdf interaction and summarization. Berlin, Germany,

  32. [32]

    Version 1.4.7 (May 2024)

    URL https://www.chatpdf.com/. Version 1.4.7 (May 2024). Accessed: 2026-03-27

  33. [33]

    Chatdoc: Ai-powered document analysis and insight extraction, March 2023

    ChatDOC. Chatdoc: Ai-powered document analysis and insight extraction, March 2023. URL https: //chatdoc.com/. Initial release March 15, 2023. Terms updated August 2024. Accessed: 2026-03-27

  34. [34]

    Pdf.ai: Chat with your pdf documents, 2024

    PDF.ai. Pdf.ai: Chat with your pdf documents, 2024. URL https://pdf.ai/. Accessed: 2026-03-27

  35. [35]

    LangChain

    Harrison Chase. LangChain. https://github.com/langchain-ai/langchain, 2022. URL https://github.com/langchain-ai/langchain. Accessed: 2024-12-01

  36. [36]

    Billion-scale similarity search with GPUs

    Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2021. doi: 10.1109/TBDATA.2019.2921572. URL https://arxiv.org/abs/1702.08734

  37. [37]

    GRI universal standards 2021

    Global Reporting Initiative. GRI universal standards 2021. Technical report, Global Reporting Initiative, Amsterdam, The Netherlands, 2021. URL https://www.globalreporting.org/standards/

  38. [38]

    BERT: Pre- training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019.

  39. [39]

    LightGBM: A highly efficient gradient boosting decision tree

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, volume 30, pages 3146–3154. Curran Associates, Inc., 2017. URL https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-...

  40. [40]

    Environmental, social and governance scores from LSEG

    London Stock Exchange Group. Environmental, social and governance scores from LSEG. Technical report, London Stock Exchange Group, October 2024. URL https://www.lseg.com/content/dam/data-analytics/en_us/documents/methodology/lseg-esg-scores-methodology.pdf

  41. [41]

    Proposing an integrated approach to analyzing ESG data via machine learning and deep learning algorithms

    Ook Lee, Hanseon Joo, Hayoung Choi, and Minjong Cheon. Proposing an integrated approach to analyzing ESG data via machine learning and deep learning algorithms. Sustainability, 14(14):8745,

  42. [42]

    doi: 10.3390/su14148745. URL https://www.mdpi.com/2071-1050/14/14/8745

  43. [43]

    Esgenius: Benchmarking llms on environmental, social, and governance (esg) and sustainability knowledge

    Chaoyue He, Xin Zhou, Yi Wu, Xinjia Yu, Yan Zhang, Lei Zhang, Di Wang, Shengfei Lyu, Hong Xu, Wang Xiaoqiao, Wei Liu, and Chunyan Miao. Esgenius: Benchmarking llms on environmental, social, and governance (esg) and sustainability knowledge. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, page 14623–14664. Associa...

  44. [44]

    ISBN 979-8-89176-332-6

    doi: 10.18653/v1/2025.emnlp-main.739. URL http://dx.doi.org/10.18653/v1/2025.emnl p-main.739

  45. [45]

    To appear in Proceedings of the 33rd ACM International Conference on Multimedia (MM ’25), Dublin, Ireland

    Lei Zhang, Xin Zhou, Chaoyue He, Di Wang, Yi Wu, Hong Xu, Wei Liu, and Chunyan Miao. Mmesgbench: Pioneering multimodal understanding and complex reasoning benchmark for esg tasks, 2025. URL https://arxiv.org/abs/2507.18932