Hybrid reasoning based on large language models for autonomous car driving
2 Pith papers cite this work. Polarity classification of these citations is still being indexed.
Citing papers by year: 2026 (2)
Verdicts: 2 unverdicted

Representative citing papers:
An LLM-integrated semantic framework for V2X (vehicle-to-everything) communication claims a 33.54% average reduction in transmitted data volume in a multilane traffic simulation.
The Score Granularity Gap in Black-Box LLM Classification: A Comparative Study of Confidence Constructions
Comparative evaluation of seven confidence constructions across 25 LLM-dataset pairs reveals that verbalized scores provide good ranking but coarse granularity for thresholding, while multi-query aggregation helps weak models but can harm strong ones.
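The gap between good ranking and coarse granularity can be illustrated with a toy sketch. It is not the study's method or data: the labels and scores below are hypothetical, verbalized confidences are assumed to be reported in steps of 0.1, AUROC is used as the ranking measure, and the set of acceptance rates reachable by a rule "accept if score >= t" is used as a simple proxy for thresholding granularity (the helper names `auroc` and `achievable_coverages` are illustrative only).

```python
from itertools import product


def auroc(scores, labels):
    """Probability that a random correct answer outscores a random wrong one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def achievable_coverages(scores):
    """Fractions of inputs that 'accept if score >= t' can keep, over every threshold t."""
    n = len(scores)
    levels = {sum(1 for s in scores if s >= v) / n for v in set(scores)}
    levels.add(0.0)  # a threshold above every score accepts nothing
    return sorted(levels)


# Hypothetical toy data: 1 = the model's answer was correct, 0 = incorrect.
labels = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
# Hypothetical verbalized confidences, snapped to multiples of 0.1.
verbalized = [0.9, 0.9, 0.8, 0.9, 0.8, 0.8, 0.7, 0.6, 0.9, 0.7]
# Hypothetical finer-grained scores for the same answers (assumed construction).
fine = [0.93, 0.91, 0.84, 0.88, 0.81, 0.79, 0.74, 0.62, 0.95, 0.69]

if __name__ == "__main__":
    for name, scores in [("verbalized", verbalized), ("fine", fine)]:
        print(name, f"AUROC={auroc(scores, labels):.3f}",
              "coverage levels:", achievable_coverages(scores))
```

In this toy setup both score lists rank the answers equally well (AUROC 23/24, about 0.958), yet the verbalized scores allow only the acceptance rates 0.0, 0.4, 0.7, 0.9 and 1.0, so a target such as "accept exactly 80% of inputs" cannot be met, whereas the finer scores reach every multiple of 0.1.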