Evaluation of Large Language Model Performance and Reliability for Citations and References in Scholarly Writing: Cross-Disciplinary Study
3 Pith papers cite this work.
citing papers
-
GIScholarBench: Benchmarking LLM Overconfidence in GIS Research
GIScholarBench shows that LLMs exhibit consistent overconfidence across three scholarly tasks in GIS, with the overconfidence manifesting differently in factual retrieval, citation expansion, and idea generation.
-
AI-Augmented Bibliometric Framework: A Paradigm Shift with Agentic AI for Dynamic, Snippet-Based Research Analysis
A multi-agent framework uses natural-language prompts to generate and execute Python code for dynamic bibliometric analysis, including networks, clustering, and automated reports.
-
Acceptance-Test-Driven Evaluation Protocols for Business-Centric LLM Systems
The paper introduces a red-train-green lifecycle and governance metric stack that adapts acceptance testing to LLM systems for business use.