Efficient Elicitation of Collective Disagreements
Pith reviewed 2026-05-20 06:13 UTC · model grok-4.3
The pith
Many disagreement measures in voting require information from groups of three alternatives rather than pairwise comparisons alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that the plurality matrix, which records first-place probabilities within every subset of alternatives, suffices to compute each disagreement measure once subsets up to that measure's level are included. In particular, they prove that measures such as rank-variance and divisiveness sit at level 3, so pairwise comparisons alone cannot distinguish structural disagreement from noise. They also design elicitation methods that estimate the matrix while controlling both the number of participants and the cognitive demand placed on each.
What carries the argument
The plurality matrix, a table that for every subset S of alternatives gives the probability that each alternative in S is ranked first by a random voter within S, together with the level of a disagreement measure defined as the smallest subset size sufficient to express that measure.
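As an illustration of the definition above, here is a minimal sketch that builds a plurality matrix from a list of full rankings. The function name `plurality_matrix`, the `max_size` parameter, and the four-voter profile are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def plurality_matrix(rankings, max_size):
    """Map each subset S (2 <= |S| <= max_size) to
    {a: share of voters who rank a first within S}."""
    alternatives = sorted(rankings[0])
    n = len(rankings)
    matrix = {}
    for size in range(2, max_size + 1):
        for subset in combinations(alternatives, size):
            members = set(subset)
            # A voter's top choice within S is the first element
            # of their ranking that belongs to S.
            tops = Counter(next(x for x in r if x in members) for r in rankings)
            matrix[frozenset(subset)] = {a: tops[a] / n for a in subset}
    return matrix


# Hypothetical profile: each tuple is one voter's ranking, best first.
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "c", "a"), ("c", "b", "a")]
pm = plurality_matrix(profile, max_size=3)
print(pm[frozenset("ab")])   # size-2 entries are ordinary pairwise shares
print(pm[frozenset("abc")])  # size-3 entries add first-place shares in the triple
```

Restricting `max_size` to 2 gives the usual pairwise comparison data, while `max_size=3` adds the triple entries that level-3 measures rely on.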
If this is right
- Existing disagreement measures can be classified by their level to determine the minimal survey complexity required.
- Elicitation protocols can focus on subsets of size three to accurately compute level-3 measures without full rankings.
- The trade-off between participant count and question difficulty allows tailored survey designs for different applications.
- Theoretical analysis shows the value of higher-level information for capturing collective disagreement more precisely.
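The claim that pairwise data cannot separate structural disagreement from noise can be checked on a toy case. The sketch below compares two hypothetical profiles over three alternatives: one uniform over all six rankings (pure noise) and one split evenly between a ranking and its reverse (polarized). The helper `first_place_probs` and both profiles are illustrative assumptions, not constructions from the paper.

```python
from itertools import combinations, permutations


def first_place_probs(profile, subset):
    """Share of voters ranking each alternative first within `subset`."""
    members = set(subset)
    counts = {a: 0 for a in subset}
    for ranking in profile:
        counts[next(x for x in ranking if x in members)] += 1
    return {a: c / len(profile) for a, c in counts.items()}


noise = list(permutations("abc"))                # every ranking equally likely
polarized = [("a", "b", "c"), ("c", "b", "a")]   # two opposed camps

# Every pairwise entry is 1/2 in both profiles, so size-2 data cannot tell them apart.
for pair in combinations("abc", 2):
    assert first_place_probs(noise, pair) == first_place_probs(polarized, pair)

# The size-3 entries differ: b is never first in the polarized profile.
print(first_place_probs(noise, "abc"))
print(first_place_probs(polarized, "abc"))
```

Any measure that is a function of pairwise shares alone must assign these two profiles the same value, which is the intuition behind placing measures that separate them at level 3.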
Where Pith is reading between the lines
- This approach could improve preference aggregation in AI systems by incorporating deeper disagreement signals.
- Social scientists might adopt triplet-based questions in large-scale surveys to get more reliable disagreement estimates.
- The framework might generalize to other collective choice problems like ranking aggregation or consensus finding.
Load-bearing premise
The plurality matrix entries for subsets of size three can be estimated reliably from sampled voter responses without systematic bias from question ordering or participant fatigue.
What would settle it
A controlled experiment where full rankings are collected from all participants and then compared to estimates from the proposed protocols to check if the computed disagreement measures match within statistical error.
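A simulated version of such a check might look like the sketch below. It assumes a synthetic population of full rankings, asks a random sample of voters a single triple question, and compares the estimate with the ground truth using a Hoeffding-style tolerance. The population, the sample size `n`, and the failure probability `delta` are all assumptions chosen for illustration; the paper's protocols and bounds may differ.

```python
import math
import random


def top_in(ranking, subset):
    """Return the voter's most preferred alternative within `subset`."""
    return next(x for x in ranking if x in subset)


rng = random.Random(0)
alternatives = ["a", "b", "c", "d"]

# Hypothetical population: two opposed camps plus uniformly random voters.
population = [["a", "b", "c", "d"]] * 3000 + [["d", "c", "b", "a"]] * 3000
population += [rng.sample(alternatives, 4) for _ in range(4000)]

triple = {"a", "b", "d"}

# Ground truth from full rankings: share of voters ranking "a" first in the triple.
truth = sum(top_in(r, triple) == "a" for r in population) / len(population)

# Estimate from a sample of voters who each answer one triple question.
n = 2000
sampled = rng.sample(population, n)
estimate = sum(top_in(r, triple) == "a" for r in sampled) / n

# Hoeffding tolerance: |estimate - truth| <= eps with probability at least 1 - delta.
delta = 1e-6
eps = math.sqrt(math.log(2 / delta) / (2 * n))
print(f"truth={truth:.3f} estimate={estimate:.3f} tolerance={eps:.3f}")
print("within tolerance:", abs(estimate - truth) <= eps)
```

A real experiment would repeat this for every triple and then compare the disagreement measures computed from the estimated entries with those computed from the full rankings.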
Original abstract
We analyze the structure of the disagreement among a population of voters over a set of alternatives. Surveys typically ask either for pairwise comparisons, simple and intuitive for participants, or full rankings over alternatives, eliciting the entire voters' preferences. Building on the observation that pairwise comparisons cannot distinguish structural disagreement from noise, we propose a stratified framework to identify the minimal aggregated preference information needed to compute a number of disagreement measures from the literature. Specifically, we introduce the plurality matrix, a generalization of pairwise comparisons that records, for every subset $S$ of alternatives, the probability that each $a \in S$ ranks first in $S$. We define the level of a disagreement measure as the smallest subset size needed to express it, showing that many existing notions, including rank-variance and divisiveness, sit at level $3$, proving that pairwise comparisons are not enough. In addition, we demonstrate the interest of going beyond level $3$ both theoretically and experimentally. To make these results actionable, we design two elicitation protocols to estimate the plurality matrix, exploring the trade-off between the number of required participants and the cognitive load requested to each of them.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the plurality matrix as a generalization of pairwise comparisons that records first-place probabilities for every subset S of alternatives. It defines the level of a disagreement measure as the smallest subset size needed to express the measure, shows that several existing measures (including rank-variance and divisiveness) are at level 3, and argues that pairwise data are therefore insufficient to distinguish structural disagreement from noise. The authors further demonstrate the value of information beyond level 3 both theoretically and experimentally, and design two elicitation protocols that trade off the number of participants against the cognitive load per participant.
Significance. If the central claims hold, the work provides a principled, minimal-information approach to preference elicitation for disagreement analysis in social choice and AI systems. The theoretical reduction of multiple measures to the plurality matrix for triples, together with the experimental validation on synthetic and real data, supplies a concrete tool for survey design that avoids both under-elicitation and unnecessary cognitive burden. The independence of the level definitions from fitted parameters is a notable strength.
major comments (2)
- [Elicitation protocol design and experimental validation sections] The claim that level-3 measures can be reliably recovered rests on the assumption that plurality-matrix entries for |S|=3 can be estimated from sampled rankings without systematic bias induced by question order or participant fatigue. No explicit diagnostic or robustness check for such bias is provided; any ordering or fatigue effect would render the computed level-3 values (and thus the distinction from noise) unreliable even if the theoretical reduction to the matrix is correct.
- [Experimental results] While synthetic and real-data experiments illustrate the elicitation trade-offs, the manuscript does not report full statistical tests (e.g., confidence intervals or hypothesis tests) for all protocol variants. This weakens the empirical support for the practical advantage of the proposed protocols over simpler baselines.
minor comments (2)
- [Definition of the plurality matrix] Notation for the plurality matrix could be clarified with an explicit small example showing how entries for |S|=2 recover ordinary pairwise probabilities and how |S|=3 entries extend them.
- [Introduction] A few sentences in the introduction repeat the motivation for moving beyond pairwise comparisons; tightening this paragraph would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We respond to each major comment below and describe the revisions we will incorporate to address the concerns raised.
- Referee: [Elicitation protocol design and experimental validation sections] The claim that level-3 measures can be reliably recovered rests on the assumption that plurality-matrix entries for |S|=3 can be estimated from sampled rankings without systematic bias induced by question order or participant fatigue. No explicit diagnostic or robustness check for such bias is provided; any ordering or fatigue effect would render the computed level-3 values (and thus the distinction from noise) unreliable even if the theoretical reduction to the matrix is correct.
  Authors: We agree that the absence of explicit diagnostics for ordering or fatigue effects represents a gap in the current experimental validation. While the protocols described in the manuscript randomize question presentation and are explicitly designed to control cognitive load per participant, we did not report targeted robustness checks such as response-time correlations or cross-order comparisons. In the revised manuscript we will add these diagnostics, including an analysis of potential fatigue effects in the real-data experiments and a sensitivity check across different question orderings, to confirm that the recovered level-3 estimates remain stable.
  Revision: yes
- Referee: [Experimental results] While synthetic and real-data experiments illustrate the elicitation trade-offs, the manuscript does not report full statistical tests (e.g., confidence intervals or hypothesis tests) for all protocol variants. This weakens the empirical support for the practical advantage of the proposed protocols over simpler baselines.
  Authors: We accept that the experimental results would be strengthened by systematic statistical reporting. The current version presents point estimates and qualitative comparisons but omits confidence intervals and formal hypothesis tests for the protocol variants. We will revise the experimental section to include bootstrap confidence intervals for all key metrics and paired statistical tests comparing each proposed protocol against the simpler baselines, thereby providing quantitative evidence for the reported trade-offs.
  Revision: yes
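The bootstrap confidence intervals mentioned in the rebuttal could be computed along the following lines. This is a generic percentile bootstrap for one plurality-matrix entry, not the authors' procedure; the function `bootstrap_ci`, its parameters, and the answer counts are hypothetical.

```python
import random


def bootstrap_ci(answers, target, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the share of `target` among `answers`."""
    rng = random.Random(seed)
    n = len(answers)
    stats = []
    for _ in range(n_boot):
        # Resample the observed answers with replacement.
        resample = rng.choices(answers, k=n)
        stats.append(sum(x == target for x in resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical answers to "which of a, b, c do you rank first?" from 200 participants.
answers = ["a"] * 120 + ["b"] * 50 + ["c"] * 30
low, high = bootstrap_ci(answers, "a")
print(f"point estimate 0.600, 95% interval [{low:.3f}, {high:.3f}]")
```

Applying the same resampling to a downstream disagreement measure, rather than to a single entry, would give intervals for the quantities the protocols are compared on.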
Circularity Check
Theoretical levels defined via matrix expressions; independent of sampling estimates
The paper defines the plurality matrix directly from first-place probabilities over subsets and defines the level of each disagreement measure as the minimal subset size k sufficient to express the measure using those matrix entries. It then shows algebraically that rank-variance and divisiveness require k=3. These steps are definitional and algebraic reductions from the original measure formulas; they do not depend on any fitted parameters or sampled data. The elicitation protocols are introduced afterward as a practical tool to recover the matrix entries, but the level-3 claim and the conclusion that pairwise comparisons are insufficient are established before, and without reference to, the estimation procedure. No self-citation chain, ansatz smuggling, or renaming of known results is used to support the core theoretical result. The derivation is therefore self-contained, and the only circularity score it receives is a minor one reflecting the presence of an empirical component.
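To illustrate the kind of algebraic reduction described here, the derivation below shows why size-3 entries are enough for one natural reading of rank-variance. It assumes that rank-variance means the variance of the position $r_a$ of an alternative $a$ in a random voter's ranking; the paper's exact definition may differ, and the symbols $X_b$ and $\pi_S(\cdot)$ (the plurality-matrix entry for subset $S$) are notation introduced for this sketch.

```latex
\begin{align*}
r_a &= 1 + \sum_{b \neq a} X_b, \qquad X_b = \mathbf{1}[b \succ a],\\
\mathbb{E}[r_a] &= 1 + \sum_{b \neq a} \pi_{\{a,b\}}(b),\\
\mathbb{E}\big[(r_a - 1)^2\big] &= \sum_{b \neq a} \pi_{\{a,b\}}(b)
  + \sum_{\substack{b \neq c \\ b,\, c \neq a}} \Pr[X_b = 1,\, X_c = 1],\\
\Pr[X_b = 1,\, X_c = 1] &= \pi_{\{a,b,c\}}(a) - 1 + \pi_{\{a,b\}}(b) + \pi_{\{a,c\}}(c),\\
\operatorname{Var}(r_a) &= \mathbb{E}\big[(r_a - 1)^2\big] - \big(\mathbb{E}[r_a] - 1\big)^2.
\end{align*}
```

The fourth line is inclusion-exclusion applied to the event that $a$ beats both $b$ and $c$, whose probability is $\pi_{\{a,b,c\}}(a)$. Under the stated assumption, every term is an entry for a subset of size 2 or 3, which shows that size-3 information is sufficient; it does not by itself show that size-2 information is insufficient.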
Axiom & Free-Parameter Ledger
free parameters (1)
- sample size per subset
axioms (1)
- domain assumption: voters possess complete, transitive rankings over the full set of alternatives