Journal of Clinical Epidemiology 181 (2025)

Judith-Lisa Lieberum, Markus Toews, Maria-Inti Metzendorf, et al · 2025 · arXiv 2025.111746

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

cs.CL · 2026-06-15 · unverdicted · novelty 7.0

LLM agents reach 90.9% retrieval recall at K=200 but recover at most 52.7% of ground-truth included studies because they cannot reliably apply PI/ECO eligibility criteria to topically similar distractors.

A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

The paper introduces a reproducible optimization protocol for prompt-based LLM workflows in evidence synthesis that separates task definitions from prompt harnesses, optimizes the harness against metrics and examples, and preserves the result as an inspectable artefact.

Understanding LLMs in Title-Abstract Screening: From Disagreements to Recommendations

cs.SE · 2026-06-16 · unverdicted · novelty 4.0

Analysis of LLM vs human disagreements in six software engineering systematic reviews reveals recurring causes like term ambiguity and proposes recommendations for LLM deployment.

citing papers explorer

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio · cs.CL · 2026-06-15 · unverdicted · none · ref 25
A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis · cs.LG · 2026-05-07 · unverdicted · none · ref 4
Understanding LLMs in Title-Abstract Screening: From Disagreements to Recommendations · cs.SE · 2026-06-16 · unverdicted · none · ref 15
