Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework

2025 · cs.CL · arXiv 2509.21267

2 Pith papers cite this work. Polarity classification is still indexing.



abstract

Large language models often generate homogeneous outputs, but whether this is problematic depends on the specific task. For objective math tasks, responses may vary in problem-solving strategy but should arrive at the same verifiable answer. For creative writing tasks, by contrast, we often expect variation in key narrative components (e.g., plot and setting) beyond mere vocabulary diversity. Prior work on homogenization rarely conceptualizes diversity in a task-dependent way. We address this gap with four contributions: (1) a task taxonomy with distinct notions of functional diversity, i.e., whether a user would perceive two responses as meaningfully different for a given task; (2) a small user study validating that the taxonomy aligns with human perception of functional diversity; (3) a task-dependent sampling technique that increases diversity only where homogenization is undesired; and (4) evidence challenging the perceived diversity-quality trade-off, showing that it may stem from mis-conceptualizing both diversity and quality in a task-agnostic way.

representative citing papers

Cognitive offloading and the speedup illusion in human-AI interaction

cs.CY · 2026-05-22 · unverdicted · novelty 6.0

Preregistered behavioral study identifies a speedup illusion where users overestimate time savings from AI assistance on cognitive tasks despite no actual difference in completion times.

Where does output diversity collapse in post-training?

cs.CL · 2026-04-17 · unverdicted · novelty 6.0

Diversity collapse in post-trained LLMs is driven by data composition during training, occurs at stages like supervised fine-tuning, and is embedded in model weights rather than imposed by generation format.

citing papers explorer

Showing 2 of 2 citing papers.

Cognitive offloading and the speedup illusion in human-AI interaction · cs.CY · 2026-05-22 · unverdicted · none · ref 62 · internal anchor
Where does output diversity collapse in post-training? · cs.CL · 2026-04-17 · unverdicted · none · ref 3 · internal anchor
