Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Outcome evidence improves LLM accuracy on scientific feasibility assessment more consistently than experiment descriptions, which introduce brittleness under partial context.
Citing papers explorer:
- Correct codes for the wrong reasons? validating LLMs as measurement instruments for theoretical constructs
  Grain calibration decomposes theoretical constructs into clause-level components, tests each with extractive evidence, and combines results through explicit theory-derived rules to validate LLM coding beyond agreement with human annotators.
- Experiments or Outcomes? Probing Scientific Feasibility in Large Language Models
  Outcome evidence improves LLM accuracy on scientific feasibility assessment more consistently than experiment descriptions, which introduce brittleness under partial context.