Title resolution pending

A Reproducibility and Generalizability Study of Large Language Models for Query Generation · 2024 · arXiv 3791.369843

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Hybrid Pooling with LLMs via Relevance Context Learning

cs.IR · 2026-02-09 · unverdicted · novelty 7.0

Relevance Context Learning generates explicit relevance narratives from judged examples to guide LLM assessors, outperforming zero-shot and standard in-context learning for IR relevance judgments.

Guidelines for Empirical Studies in Software Engineering involving Large Language Models

cs.SE · 2025-08-21 · accept · novelty 7.0 · 2 refs

The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.

Formalized Information Needs Improve Large-Language-Model Relevance Judgments

cs.IR · 2026-04-05 · conditional · novelty 6.0

Synthetically formalizing information needs into topics with descriptions and narratives improves LLM relevance assessor agreement with humans and reduces over-labeling of relevant documents on TREC Deep Learning and Robust04.

LLMs as Assessors: Right for the Right Reason?

cs.IR · 2026-01-13 · unverdicted · novelty 5.0

LLMs judge document relevance at a level comparable to humans but frequently highlight different passages, indicating they are often not right for the right reasons and cannot fully replace human assessors.

citing papers explorer

Showing 4 of 4 citing papers.

Hybrid Pooling with LLMs via Relevance Context Learning · cs.IR · 2026-02-09 · unverdicted · none · ref 3
Guidelines for Empirical Studies in Software Engineering involving Large Language Models · cs.SE · 2025-08-21 · accept · none · ref 127 · 2 links
Formalized Information Needs Improve Large-Language-Model Relevance Judgments · cs.IR · 2026-04-05 · conditional · none · ref 3
LLMs as Assessors: Right for the Right Reason? · cs.IR · 2026-01-13 · unverdicted · none · ref 3
