Recognition: 2 theorem links
· Lean TheoremESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction
Pith reviewed 2026-05-14 22:29 UTC · model grok-4.3
The pith
ESGLens extracts GRI-aligned information from ESG reports and predicts environmental scores from ChatGPT embeddings with 0.48 Pearson correlation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ESGLens shows that embeddings produced by a general-purpose LLM from GRI-guided summaries of ESG reports contain enough signal to regress against LSEG environmental pillar scores, yielding a Pearson correlation of 0.48 (R² approximately 0.23) on a set of about 300 reports from QQQ, S&P 500, and Russell 1000 companies for fiscal year 2022.
What carries the argument
The scoring module that converts GRI-aligned extracted summaries into LLM embeddings and trains a regressor (ChatGPT embeddings plus neural network) against LSEG reference scores.
Load-bearing premise
Embeddings from a general-purpose LLM on GRI-guided extractions carry predictive information about ESG performance that is not already present in the LSEG scores used to train the regressor.
What would settle it
A replication on a fresh set of several hundred reports from a later fiscal year that yields correlation near zero or below would show the observed signal does not generalize.
Figures
read the original abstract
Environmental, Social, and Governance (ESG) reports are central to investment decision-making, yet their length, heterogeneous content, and lack of standardized structure make manual analysis costly and inconsistent. We present ESGLens, a proof-of-concept framework combining retrieval-augmented generation (RAG) with prompt-engineered extraction to automate three tasks: (1)~structured information extraction guided by Global Reporting Initiative (GRI) standards, (2)~interactive question-answering with source traceability, and (3)~ESG score prediction via regression on LLM-generated embeddings. ESGLens is purpose-built for the domain: a report-processing module segments heterogeneous PDF content into typed chunks (text, tables, charts); a GRI-guided extraction module retrieves and synthesizes information aligned with specific standards; and a scoring module embeds extracted summaries and feeds them to a regression model trained against London Stock Exchange Group (LSEG) reference scores. We evaluate the framework on approximately 300 reports from companies in the QQQ, S\&P~500, and Russell~1000 indices (fiscal year 2022). Among three embedding methods (ChatGPT, BERT, RoBERTa) and two regressors (Neural Network, LightGBM), ChatGPT embeddings with a Neural Network achieve a Pearson correlation of 0.48 ($R^{2} \approx 0.23$) against LSEG ground-truth scores -- a modest but statistically meaningful signal given the ${\sim}300$-report training set and restriction to the environmental pillar. A traceability audit shows that 8 of 10 extracted claims verify against the source document, with two failures attributable to few-shot example leakage. We discuss limitations including dataset size and restriction to environmental indicators, and release the code to support reproducibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. ESGLens is an LLM-based RAG framework for ESG report analysis that performs GRI-guided structured extraction, interactive question answering with source traceability, and ESG score prediction by training regressors on LLM-generated embeddings from extracted summaries. Evaluated on ~300 reports from major indices, it achieves a Pearson correlation of 0.48 (R² ≈ 0.23) for environmental pillar scores using ChatGPT embeddings and a neural network, with 8/10 claims traceable in an audit.
Significance. The work addresses a practical problem in ESG analysis by combining retrieval-augmented generation with domain-specific prompting. The modest correlation indicates some predictive signal in the embeddings, and the code release is a strength for reproducibility. However, the restriction to the environmental pillar and small dataset size mean the significance is primarily as a proof-of-concept rather than a robust predictive tool. Stronger validation would increase its value for the field.
major comments (2)
- Abstract: The Pearson correlation of 0.48 is reported for the ∼300-report training set without any held-out test performance, cross-validation results, or details on regularization for the neural network regressor applied to 1536-dimensional embeddings. This is load-bearing for the claim of a 'statistically meaningful signal' in ESG score prediction, as overfitting cannot be ruled out.
- Evaluation section: The comparison is limited to three embedding models and two regressors; no baseline using non-LLM features (e.g., keyword counts or report metadata) is provided to establish that the LLM embeddings add value beyond information already captured in the LSEG reference scores used for training.
minor comments (2)
- Abstract: The exact number of reports used for the correlation calculation should be stated precisely rather than 'approximately 300'.
- The paper would benefit from a table summarizing performance metrics for all embedding-regressor combinations rather than highlighting only the best result.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects of validation for the score prediction component. We agree that the current presentation of results on the full training set requires strengthening to better support claims of a meaningful signal, and we will revise the manuscript to address both major comments through additional experiments and details.
read point-by-point responses
-
Referee: [—] Abstract: The Pearson correlation of 0.48 is reported for the ∼300-report training set without any held-out test performance, cross-validation results, or details on regularization for the neural network regressor applied to 1536-dimensional embeddings. This is load-bearing for the claim of a 'statistically meaningful signal' in ESG score prediction, as overfitting cannot be ruled out.
Authors: We agree that performance reported only on the training set without cross-validation leaves open the possibility of overfitting, particularly with high-dimensional embeddings. In the revised manuscript, we will add 5-fold cross-validation results for the ChatGPT + Neural Network model, reporting mean Pearson correlation and R² with standard deviations across folds. We will also expand the methods section to detail the neural network (two hidden layers of 512 and 256 units with ReLU, dropout rate 0.3, and L2 regularization with lambda=0.01) to address regularization concerns. These changes will be reflected in both the abstract and evaluation sections. revision: yes
-
Referee: [—] Evaluation section: The comparison is limited to three embedding models and two regressors; no baseline using non-LLM features (e.g., keyword counts or report metadata) is provided to establish that the LLM embeddings add value beyond information already captured in the LSEG reference scores used for training.
Authors: We acknowledge that non-LLM baselines are needed to isolate the contribution of LLM embeddings. While LSEG scores draw from multiple external sources, we will add in the revised evaluation a baseline using non-LLM features extracted from the reports themselves, including TF-IDF vectors on the full text, report length, and counts of GRI-aligned sections. These will be evaluated with the same regressors (Neural Network and LightGBM) and compared directly to the embedding-based results. This will clarify whether the embeddings provide incremental predictive value. revision: yes
Circularity Check
ESG score 'prediction' reduces to in-sample NN regressor fit on training embeddings
specific steps
-
fitted input called prediction
[Abstract]
"ChatGPT embeddings with a Neural Network achieve a Pearson correlation of 0.48 ($R^{2} ≈ 0.23$) against LSEG ground-truth scores -- a modest but statistically meaningful signal given the ∼300-report training set and restriction to the environmental pillar."
The neural-network regressor is trained on the embeddings from the identical ~300-report collection described as the training set to match the LSEG reference scores; the reported Pearson correlation is therefore the in-sample training fit, not an out-of-sample prediction.
full rationale
The paper's central quantitative result is the reported Pearson 0.48 correlation for environmental-pillar score prediction. This is produced by training a neural-network regressor directly on the ChatGPT embeddings extracted from the same ~300-report collection that is explicitly labeled the training set, with the LSEG scores as targets. The abstract presents this correlation as evidence of a 'statistically meaningful signal' without any reference to held-out data, cross-validation, or regularization. Consequently the quoted performance metric is the training-set fit by construction rather than an independent prediction. The RAG extraction and GRI-guided modules supply the embeddings but do not alter the fact that the regression step is a direct fit; no self-citation or ansatz smuggling appears in the load-bearing claim.
Axiom & Free-Parameter Ledger
free parameters (1)
- Regression model parameters =
Trained on ~300 reports
axioms (1)
- domain assumption LLM embeddings from GRI-guided summaries capture semantically relevant ESG information
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
ChatGPT embeddings with a Neural Network achieve a Pearson correlation of 0.48 (R² ≈ 0.23) against LSEG ground-truth scores
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
GRI-guided extraction module retrieves and synthesizes information aligned with specific standards
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A brief history of ESG: From pioneer to mainstream
Dan Byrne. A brief history of ESG: From pioneer to mainstream. The Corporate Governance Institute,
-
[2]
URL https://www.thecorporategovernanceinstitute.com/insights/guides/a-brief-his tory-of-esg-from-pioneer-to-mainstream/ . Accessed: May 22, 2024
work page 2024
-
[3]
The history of ESG: A journey towards sustainable investing
Tom Krantz. The history of ESG: A journey towards sustainable investing. IBM Think, 2024. URL https://www.ibm.com/think/topics/environmental-social-and-governance-history . Accessed: March 27, 2026
work page 2024
-
[4]
Minzhi Luna Wang. Environmental, social, and corporate governance: A history of ESG standardization from 1970s to the present, 4 2023. URL https://sites.asit.columbia.edu/historydept/wp-con tent/uploads/sites/29/2023/05/Wang-Luna_thesis.pdf . Seminar Advisor: Elizabeth Blackmar; Second Reader: Kimberly Phillips-Fein
work page 2023
-
[5]
Evolution of esg reporting frameworks
Satyajit Bose. Evolution of esg reporting frameworks. Values at work: Sustainable investing and ESG reporting, pages 13–33, 2020. URL https://theesgexchange.org/wp-content/uploads/2023/03/E volution-of-ESG-Reporting-Frameworks.pdf
work page 2020
-
[6]
The evolution of esg reports and the role of voluntary standards
Ethan Rouen, Kunal Sachdeva, and Aaron Yoon. The evolution of esg reports and the role of voluntary standards. A vailableat SSRN 4227934, 2023. URL https://www.hbs.edu/ris/Publication%20File s/23-024_5d9ec300-5c37-4cac-9edb-bcf59650ceb4.pdf
work page 2023
-
[7]
Esg standards: Looming challenges and pathways forward
Todd Cort and Daniel Esty. Esg standards: Looming challenges and pathways forward. Organization & Environment, 33(4):491–510, 2020. doi: 10.1177/1086026620945342. URL https://www.jstor.or g/stable/27001593
-
[8]
The impact of unstandardized data on ESG reporting
Julian Göbel. The impact of unstandardized data on ESG reporting. Envoria Insights, May 2022. URL https://envoria.com/insights-news/the-impact-of-unstandardized-data-on-esg-reporting . Accessed: March 27, 2026
work page 2022
-
[9]
The 5 Main Challenges of ESG Reporting and Best Practices
EcoActive. The 5 Main Challenges of ESG Reporting and Best Practices. EcoActive Blog, February
-
[10]
URL https://ecoactivetech.com/the-5-main-challenges-of-esg-reporting-and-best-p ractices/. Accessed: March 27, 2026
work page 2026
-
[11]
Euleresg: Automating esg disclosure analysis with llms, 2025
Yi Ding, Xushuo Tang, Zhengyi Yang, Wenqian Zhang, Simin Wu, Yuxin Huang, Lingjing Lan, Weiyuan Li, Yin Chen, Mingchen Ju, Wenke Yang, Thong Hoang, Mykhailo Klymenko, Xiwei Zu, and Wenjie Zhang. Euleresg: Automating esg disclosure analysis with llms, 2025. URL https://arxiv.org/abs/ 2511.21712
-
[12]
Developing retrieval augmented generation (rag) based llm systems from pdfs: An experience report,
Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell, Jussi Rasku, and Pekka Abrahamsson. Developing retrieval augmented generation (rag) based llm systems from pdfs: An experience report,
- [13]
-
[14]
Zukang Yang, Zixuan Zhu, and Xuan Zhu. Curiousllm: Elevating multi-document question answering with llm-enhanced knowledge graph reasoning, 2025. URL https://arxiv.org/abs/2404.09077
-
[15]
Large language models for sus- tainability reporting: A systematic review and research agenda
Seyed Alireza Mousavian Anaraki, Danilo Croce, and Roberto Basili. Large language models for sus- tainability reporting: A systematic review and research agenda. Sustainable Futures, 10:101494, 2025. doi: 10.1016/j.sftr.2025.101494. URL https://doi.org/10.1016/j.sftr.2025.101494
-
[16]
Chatreport: De- mocratizing sustainability disclosure analysis through llm-based tools
Jingwei Ni, Julia Bingler, Chiara Colesanti-Senni, Mathias Kraus, Glen Gostlow, Tobias Schimanski, Dominik Stammbach, Saeid Ashraf Vaghefi, Qian Wang, Nicolas Webersinke, et al. Chatreport: De- mocratizing sustainability disclosure analysis through llm-based tools. arXiv preprint arXiv:2307.15770,
-
[17]
Chatreport: De- mocratizing sustainability disclosure analysis through llm-based tools
doi: 10.48550/arXiv.2307.15770. URL https://doi.org/10.48550/arXiv.2307.15770
-
[18]
Esgreveal: An llm-based approach for extracting structured data from esg reports
Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, ZongXiong Lei, Zihan Zeng, Shiming Yang, HongX- iang Tong, Lei Xiao, and Wenwen Zhou. Esgreveal: An llm-based approach for extracting structured data from esg reports. arXiv preprint arXiv:2312.17264, 2023. doi: 10.48550/arXiv.2312.17264. URL https://doi.org/10.48550/arXiv.2312.17264
-
[19]
Retrieval-augmented generation for knowledge-intensive NLP tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, ...
work page 2020
-
[20]
REALM: Retrieval- augmented language model pre-training
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. REALM: Retrieval- augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 3929–3938. PMLR, 2020. URL http://proceedings.mlr.press/v119/guu20a.html
work page 2020
-
[21]
Esg reporting lifecycle management with large language models and ai agents,
Thong Hoang, Mykhailo Klymenko, Xiwei Xu, Shidong Pan, Yi Ding, Xushuo Tang, Zhengyi Yang, Jieke Shi, and David Lo. Esg reporting lifecycle management with large language models and ai agents,
- [22]
-
[23]
Susgen-gpt: A data-centric llm for financial nlp and sustainability report generation, 2024
Qilong Wu, Xiaoneng Xiang, Hejia Huang, Xuan Wang, Yeo Wei Jie, Ranjan Satapathy, Ricardo Shirota Filho, and Bharadwaj Veeravalli. Susgen-gpt: A data-centric llm for financial nlp and sustainability report generation, 2024. URL https://arxiv.org/abs/2412.10906
-
[24]
Automatic GRI-SDG annotation and LLM-based filtering for sustainability reports
Seyed Alireza Mousavian Anaraki, Danilo Croce, and Roberto Basili. Automatic GRI-SDG annotation and LLM-based filtering for sustainability reports. In Cristina Bosco, Elisabetta Jezek, Marco Polignano, and Manuela Sanguinetti, editors, Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), pages 775–784, Cagliari, Ital...
work page 2025
-
[25]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, pages 5998–6008. Curran Associates, Inc., 2017. URL https://arxiv.org/abs/ 1706.03762
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[26]
Leveraging BERT language models for multi-lingual ESG issue identification
Elvys Linhares Pontes, Mohamed Benjannet, and Lam Kim Ming. Leveraging BERT language models for multi-lingual ESG issue identification. In Proceedings of the Multi-Lingual ESG Issue Identification (ML-ESG) Shared Task. arXiv, 2023. URL https://arxiv.org/abs/2309.02189 . arXiv preprint arXiv:2309.02189
-
[27]
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019. doi: 10.48550/arXiv.1907.11692. URL https://arxiv.org/ab s/1907.11692
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1907.11692 1907
-
[28]
Dynamicesg: A dataset for dynamically unearthing esg ratings from news articles
Yu-Min Tseng, Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. Dynamicesg: A dataset for dynamically unearthing esg ratings from news articles. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 5412–5416, 2023. doi: 10.1145/358378 0.361511. URL https://doi.org/10.1145/3583780.361511
-
[29]
Using pre-trained language model for accurate ESG prediction
Lei Xia, Mingming Yang, and Qi Liu. Using pre-trained language model for accurate ESG prediction. In Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, pages 1–22, Jeju, South Korea, 3 August 2024. URL https://aclantholo gy.org/2024.finnlp-2.1
work page 2024
-
[30]
Srishti Mehra, Robert Louka, and Yixun Zhang. ESGBERT: Language model to help with classification tasks related to companies’ environmental, social, and governance practices. In CS & IT Conference Proceedings, volume 12, pages 183–190, 2022. doi: 10.5121/csit.2022.120616. URL https://arxiv.or g/abs/2203.16788
-
[31]
Chatpdf: Specialized ai tool for pdf interaction and summarization
ChatPDF GmbH. Chatpdf: Specialized ai tool for pdf interaction and summarization. Berlin, Germany,
-
[32]
URL https://www.chatpdf.com/. Version 1.4.7 (May 2024). Accessed: 2026-03-27
work page 2024
-
[33]
Chatdoc: Ai-powered document analysis and insight extraction, March 2023
ChatDOC. Chatdoc: Ai-powered document analysis and insight extraction, March 2023. URL https: //chatdoc.com/. Initial release March 15, 2023. Terms updated August 2024. Accessed: 2026-03-27
work page 2023
-
[34]
Pdf.ai: Chat with your pdf documents, 2024
PDF.ai. Pdf.ai: Chat with your pdf documents, 2024. URL https://pdf.ai/. Accessed: 2026-03-27
work page 2024
- [35]
-
[36]
Billion-scale similarity search with GPUs
Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2021. doi: 1 0 . 1 1 0 9 / T B D A T A . 2 0 1 9 . 2 9 2 1 5 7 2. URL https://arxiv.org/abs/1702.08734
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[37]
Global Reporting Initiative. GRI universal standards 2021. Technical report, Global Reporting Initiative, Amsterdam, The Netherlands, 2021. URL https://www.globalreporting.org/standards/
work page 2021
-
[38]
BERT: Pre- training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidi- rectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 19 ESGLens: An LLM-Based RAG Framework for Interactive ESG R...
-
[39]
LightGBM: A highly efficient gradient boosting decision tree
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, volume 30, pages 3146–3154. Curran Associates, Inc., 2017. URL https://pape rs.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-...
work page 2017
-
[40]
Environmental, social and governance scores from LSEG
London Stock Exchange Group. Environmental, social and governance scores from LSEG. Technical report, London Stock Exchange Group, October 2024. URL https://www.lseg.com/content/dam/d ata-analytics/en_us/documents/methodology/lseg-esg-scores-methodology.pdf
work page 2024
-
[41]
Ook Lee, Hanseon Joo, Hayoung Choi, and Minjong Cheon. Proposing an integrated approach to analyzing ESG data via machine learning and deep learning algorithms. Sustainability, 14(14):8745,
-
[42]
URL https://www.mdpi.com/2071-1050/14/14/8745
doi: 10.3390/su14148745. URL https://www.mdpi.com/2071-1050/14/14/8745
-
[43]
Chaoyue He, Xin Zhou, Yi Wu, Xinjia Yu, Yan Zhang, Lei Zhang, Di Wang, Shengfei Lyu, Hong Xu, Wang Xiaoqiao, Wei Liu, and Chunyan Miao. Esgenius: Benchmarking llms on environmental, social, and governance (esg) and sustainability knowledge. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, page 14623–14664. Associa...
work page 2025
-
[44]
doi: 10.18653/v1/2025.emnlp-main.739. URL http://dx.doi.org/10.18653/v1/2025.emnl p-main.739
-
[45]
Lei Zhang, Xin Zhou, Chaoyue He, Di Wang, Yi Wu, Hong Xu, Wei Liu, and Chunyan Miao. Mmes- gbench: Pioneering multimodal understanding and complex reasoning benchmark for esg tasks, 2025. URL https://arxiv.org/abs/2507.18932. 20
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.