Surprisal Minimisation over Goal-directed Alternatives Predicts Production Choice in Dialogue
Pith reviewed 2026-05-09 19:52 UTC · model grok-4.3
The pith
Surprisal minimization relative to goal-directed alternatives best predicts speakers' production choices in dialogue.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Utterance production is best explained as cost-sensitive choice in which cost is surprisal computed relative to goal-directed alternatives that realize a fixed communicative intent. These alternatives, together with goal-agnostic alternatives based only on contextual plausibility, are produced by language models. Under both deterministic and probabilistic cost minimization, this surprisal account supplies the strongest prediction of observed production choices, while uniform information density and length-based costs exhibit weaker and less consistent performance.
What carries the argument
Surprisal minimization over LM-generated goal-directed alternatives, which serves as the cost measure that separates speaker-oriented from listener-oriented interpretations of production choice.
Load-bearing premise
The language-model-generated goal-directed and goal-agnostic alternative sets accurately reflect the contextual alternatives available to speakers and listeners in naturalistic dialogue.
What would settle it
A new dialogue dataset in which minimizing utterance length or uniform information density predicts observed choices more accurately than minimizing surprisal over goal-directed alternatives.
Original abstract
We model utterance production as probabilistic cost-sensitive choice over contextual alternatives, using information-theoretic notions of cost. We distinguish between goal-directed alternatives that realise a fixed communicative intent and goal-agnostic alternatives defined only by contextual plausibility, allowing us to derive speaker- and listener-oriented interpretations of different cost measures. We present a procedure to generate both types of alternative sets using language models. Analysing production choices in open-ended dialogue under both deterministic and probabilistic cost minimisation, we find that surprisal minimisation relative to goal-directed alternatives provides the strongest predictive account under both analyses. By contrast, uniform information density and length-based costs exhibit weaker and less consistent predictive power across conditions. More broadly, our study suggests that alternative-conditioned optimisation with LM-generated alternatives provides a principled framework for studying speaker and listener pressures in naturalistic language production.
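The choice rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The per-token surprisal values are invented toy numbers, and the function names are hypothetical. Three things are assumptions rather than details taken from the paper: using the variance of per-token surprisal as the uniform-information-density cost, using a softmax over negative cost as the probabilistic rule, and the rationality parameter `beta`.

```python
import math
from typing import Callable, Dict, List

# Toy per-token surprisals (in nats) for three alternatives that realise the
# same intent. These numbers are invented; in practice they would come from a
# language model conditioned on the dialogue context.
alternatives: Dict[str, List[float]] = {
    "sure, sounds good": [2.0, 1.0, 1.5, 0.5],
    "ok": [6.0],
    "yes that would be fine": [1.0, 2.0, 1.5, 1.5, 1.5],
}


def total_surprisal(token_surprisals: List[float]) -> float:
    """Surprisal cost: sum of per-token surprisals."""
    return sum(token_surprisals)


def length_cost(token_surprisals: List[float]) -> float:
    """Length cost: number of tokens."""
    return float(len(token_surprisals))


def uid_cost(token_surprisals: List[float]) -> float:
    """Assumed UID cost: variance of per-token surprisal around its mean."""
    if not token_surprisals:
        raise ValueError("utterance must contain at least one token")
    mean = sum(token_surprisals) / len(token_surprisals)
    return sum((s - mean) ** 2 for s in token_surprisals) / len(token_surprisals)


def deterministic_choice(
    alts: Dict[str, List[float]], cost: Callable[[List[float]], float]
) -> str:
    """Deterministic cost minimisation: pick the lowest-cost alternative."""
    return min(alts, key=lambda u: cost(alts[u]))


def choice_probabilities(
    alts: Dict[str, List[float]],
    cost: Callable[[List[float]], float],
    beta: float = 1.0,
) -> Dict[str, float]:
    """Probabilistic cost minimisation: softmax over negative scaled cost."""
    scores = {u: -beta * cost(s) for u, s in alts.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {u: math.exp(v - top) for u, v in scores.items()}
    normaliser = sum(weights.values())
    return {u: w / normaliser for u, w in weights.items()}


for name, cost_fn in [("surprisal", total_surprisal),
                      ("length", length_cost),
                      ("uid", uid_cost)]:
    print(name, "->", deterministic_choice(alternatives, cost_fn))
print(choice_probabilities(alternatives, total_surprisal))
```

On this toy set the cost measures disagree about the preferred utterance, which is the kind of divergence the comparison in the paper relies on. A single-token utterance has zero surprisal variance, so the assumed UID cost trivially favours it.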
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper models utterance production in dialogue as probabilistic cost-sensitive choice over contextual alternatives generated by language models. It distinguishes goal-directed alternatives (realising a fixed communicative intent) from goal-agnostic alternatives (defined by contextual plausibility), derives speaker- and listener-oriented interpretations of cost measures, and reports that surprisal minimisation over goal-directed alternatives provides the strongest predictive account of production choices under both deterministic and probabilistic analyses, outperforming uniform information density and length-based costs. The work proposes LM-generated alternative-conditioned optimisation as a framework for studying pressures in naturalistic dialogue production.
Significance. If the result holds after addressing the modelling assumptions, the paper offers a principled, scalable method for incorporating contextual alternatives into information-theoretic models of production. The distinction between goal-directed and goal-agnostic sets, combined with the use of LMs for generation, could help unify speaker- and listener-oriented accounts and provide falsifiable predictions for dialogue data. The approach is innovative in its application to open-ended dialogue and could influence future work on cost-sensitive choice in language use.
major comments (2)
- [Methods section on alternative set generation] Methods, alternative generation procedure: The central claim that surprisal minimisation over goal-directed alternatives outperforms other costs requires that the LM-generated sets faithfully proxy the alternatives speakers and listeners actually entertain. The manuscript describes the generation procedure (fixing intent and sampling realisations) but reports no human validation, rating study, or sensitivity analysis comparing LM outputs to human pragmatic alternatives. Without this, the comparative predictive superiority may reflect LM-specific biases rather than production pressures.
- [Results section] Results, predictive comparisons: The abstract and results state that surprisal minimisation provides the strongest account under both analyses, yet the manuscript must include explicit details on sample sizes, exclusion criteria, statistical tests for model comparisons, and confidence intervals or error bars on the metrics. These are load-bearing for evaluating whether the evidence supports the superiority claim over uniform information density and length costs.
minor comments (2)
- The abstract would benefit from a brief mention of the specific predictive metrics (e.g., accuracy, likelihood) and sample characteristics to allow readers to assess the comparative claims without immediately consulting the full methods.
- [Introduction or Methods] Notation for the cost measures and alternative sets should be clarified in the main text or a table to distinguish speaker-oriented vs. listener-oriented interpretations more explicitly.
Simulated Author's Rebuttal
Thank you for the constructive review and for recognizing the potential of our framework. We address each major comment below with planned revisions to improve clarity and transparency while preserving the core contributions.
- Referee: Methods, alternative generation procedure: The central claim that surprisal minimisation over goal-directed alternatives outperforms other costs requires that the LM-generated sets faithfully proxy the alternatives speakers and listeners actually entertain. The manuscript describes the generation procedure (fixing intent and sampling realisations) but reports no human validation, rating study, or sensitivity analysis comparing LM outputs to human pragmatic alternatives. Without this, the comparative predictive superiority may reflect LM-specific biases rather than production pressures.
Authors: We agree that the absence of direct human validation leaves open the possibility that LM-specific biases influence the results. Our generation procedure is designed to produce controlled, intent-conditioned alternatives at scale for open-ended dialogue, following precedents in computational models of pragmatics. In revision we will add a sensitivity analysis varying the number of alternatives sampled and the decoding temperature, plus an explicit subsection in Methods discussing the proxy assumptions, potential biases, and the fact that the same alternative sets are used uniformly across all compared cost measures. Because every cost measure is evaluated on the same sets, relative comparisons are preserved even if absolute fidelity to human alternatives is imperfect. A full human rating study is not feasible within the current revision timeline but will be noted as a limitation.
Revision: partial
- Referee: Results, predictive comparisons: The abstract and results state that surprisal minimisation provides the strongest account under both analyses, yet the manuscript must include explicit details on sample sizes, exclusion criteria, statistical tests for model comparisons, and confidence intervals or error bars on the metrics. These are load-bearing for evaluating whether the evidence supports the superiority claim over uniform information density and length costs.
Authors: We accept this point. The original manuscript reported the primary metrics but omitted some supporting statistical details. We will revise the Results section to state the exact sample size (number of utterances retained after preprocessing), the exclusion criteria applied to the dialogue corpus, the model-comparison procedures (including any likelihood-ratio or information-criterion tests), and to add confidence intervals or bootstrapped error bars on all reported metrics. These additions will make the superiority claims fully evaluable.
Revision: yes
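The bootstrapped error bars mentioned in the rebuttal could take the following form. This is a minimal sketch of a paired percentile bootstrap, not the authors' procedure. The function name, the 0/1 per-utterance correctness encoding, and the toy data are all hypothetical. The paper's actual metrics and resampling unit are not given in this summary, so resampling individual utterances is an assumption.

```python
import random
from typing import List, Tuple


def bootstrap_accuracy_diff(
    correct_a: List[int],
    correct_b: List[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Paired percentile bootstrap for the accuracy difference A minus B.

    correct_a[i] and correct_b[i] are 1 if the respective cost model picked
    the observed utterance for item i, else 0. Items are resampled jointly
    so the pairing between the two models is kept.
    """
    if not correct_a or len(correct_a) != len(correct_b):
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(correct_a)
    observed = (sum(correct_a) - sum(correct_b)) / n
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return observed, lower, upper


# Invented toy data: model A is correct on 70 of 100 items, model B on 50.
toy_a = [1] * 70 + [0] * 30
toy_b = [1] * 50 + [0] * 50
observed, lower, upper = bootstrap_accuracy_diff(toy_a, toy_b)
print(f"difference = {observed:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes zero would support a claim that one cost measure predicts choices better than another on the sampled items. If utterances from the same dialogue are correlated, resampling whole dialogues rather than utterances would be the more conservative choice.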
Circularity Check
No significant circularity; derivation is an external empirical comparison.
The paper generates goal-directed and goal-agnostic alternative sets via a fixed LM procedure, then compares the predictive power of different cost functions (including surprisal minimisation over goal-directed alternatives) against observed production choices in dialogue data under deterministic and probabilistic choice models. This constitutes a standard out-of-sample model comparison where the alternatives and costs are computed independently of any parameters fitted to the target production data itself. No equations, self-citations, or definitions are presented that reduce the central claim to a tautology or to a fit on the same data. The approach remains self-contained against the external benchmark of human production choices.