Generalized Priority-Aware Shapley Value

Kiljae Lee; Weijing Tang; Yuan Zhang; Ziqi Liu

arxiv: 2605.15018 · v1 · pith:XJGIKCYRnew · submitted 2026-05-14 · 💻 cs.LG · cs.AI

Generalized Priority-Aware Shapley Value

Kiljae Lee , Ziqi Liu , Weijing Tang , Yuan Zhang This is my paper

Pith reviewed 2026-06-30 21:30 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords Shapley valuepriority-aware valuationcooperative game theorycyclic graphsLLM ensemblesrandom order valuespreference graphs

0 comments

The pith

The generalized priority-aware Shapley value extends random-order valuation to arbitrary directed weighted graphs by penalizing order violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GPASV to handle priority information in valuation tasks where preferences form cycles or have weights, as in aggregated human judgments or multi-criterion comparisons. Existing Shapley methods break on such data because they assume binary and acyclic priorities. GPASV modifies the random order approach so that violating an edge incurs a penalty instead of being impossible. This lets the method apply to practical cases like ranking LLMs from preference data. The authors provide axioms, algorithms, and demonstrate that different priority settings yield distinct results on the same dataset.

Core claim

We introduce the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs, in which pairwise edges penalize rather than forbid order violations. GPASV covers a range of classical models as boundary cases. We establish GPASV through an axiomatic characterization, develop the associated computational methods, and introduce a priority sweeping diagnostic extending PASV's. We apply GPASV to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, illustrating that priority-aware valuation is not a one-button operation: different balances of pairwise graph priority versus individual soft priority produce substantivel

What carries the argument

GPASV, defined as a random order value on directed weighted priority graphs where edges impose penalties on order violations.

If this is right

GPASV applies to cyclic graphs from human preferences without requiring acyclicity.
It recovers classical Shapley and priority-aware variants as special cases.
Computational methods exist for calculating the value.
A priority sweeping diagnostic helps analyze the impact of priorities.
Different weightings of graph priority versus soft priority lead to different ensemble valuations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach may extend valuation techniques to other domains with conflicting constraints, such as resource allocation with soft priorities.
Testing GPASV on additional cyclic datasets could reveal how penalty magnitudes affect fairness in attributions.
The method suggests that priority information in real data is inherently graded rather than absolute, which could influence how preferences are modeled in decision systems.

Load-bearing premise

That penalizing order violations within the random-order formulation correctly captures priority information on cyclic graphs without introducing structural inconsistencies or unintended biases in the resulting valuations.

What would settle it

Applying GPASV to the Chatbot Arena cyclic graph and checking whether the resulting LLM valuations shift substantially when the balance between graph priority and soft priority is altered would test whether the penalization mechanism accurately reflects the input priorities.

Figures

Figures reproduced from arXiv: 2605.15018 by Kiljae Lee, Weijing Tang, Yuan Zhang, Ziqi Liu.

**Figure 1.** Figure 1: Illustration of priority structures. Dashed arrows in (b): [35, 46] require special precedence structures. Shapley value [38] and its variants are widely used in machine learning to accredit content contributors, including training examples, data providers, features, or model components [16, 32, 40]. Their appeal lies in being both principled (uniquely determined by a small set of axioms) and model-agnos… view at source ↗

**Figure 2.** Figure 2: Simulation 1: mixing time, greedy (solid) [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Accuracy and acceleration by caching (case 2); full grids are in Appendix [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Surrogate-assisted methods (case 2). Full results [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Simulation 3: priority sweeping [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Top-valued models under different priority balances. Colors: paid vs open-source. [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Group-sum GPASV for paid and opensource models across priority sweeps. The α sweep strengthens open-source node priority, the β sweep strengthens the preference graph priority, and the joint sweep increases both together. 2. α lifts the favored group uniformly. Along the β = 0 slice, the paid and open-source groupsum curves cross near α ≈ 2 and then separate sharply, with open-source climbing and paid co… view at source ↗

**Figure 8.** Figure 8: Example showing that cyclic GPASV does not reduce to PASV; (a) the support of the [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗

**Figure 9.** Figure 9: Practical mixing-time diagnostic for the adjacent-swap MH chain on GPASV. Solid curves [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗

**Figure 10.** Figure 10: Monte Carlo accuracy of the direct permutation estimator, reported as [PITH_FULL_IMAGE:figures/full_fig_p035_10.png] view at source ↗

**Figure 11.** Figure 11: Number of distinct coalition subsets evaluated by the direct permutation estimator, over [PITH_FULL_IMAGE:figures/full_fig_p035_11.png] view at source ↗

**Figure 12.** Figure 12: Final ARE under matched utility-evaluation budgets for the direct and the two surrogate [PITH_FULL_IMAGE:figures/full_fig_p036_12.png] view at source ↗

**Figure 13.** Figure 13: Priority sweeping results. Rows: mean forward position, SOU group-sum, and SOR [PITH_FULL_IMAGE:figures/full_fig_p037_13.png] view at source ↗

**Figure 14.** Figure 14: Per-player GPASV ψi(Uens) along the three sweeps, averaged over the 80 MT-Bench prompts. Left: α-only sweep (β = 0). Middle: β-only sweep (α = 0). Right: joint sweep (α = β). Solid lines are open-source models; dashed lines are paid (gpt-4, gpt-3.5-turbo, claude-v1, claude-instant-v1, palm-2). 41 [PITH_FULL_IMAGE:figures/full_fig_p041_14.png] view at source ↗

**Figure 15.** Figure 15: Per-player rank heatmaps over the supported [PITH_FULL_IMAGE:figures/full_fig_p042_15.png] view at source ↗

**Figure 16.** Figure 16: Top-8-valued models across all 19 priority regimes. Rows: temperature values (baseline (0, 0) together with {1, 2, 4, 8, 16, 32}). Columns: α-only sweep (β = 0), β-only sweep (α = 0), joint sweep (α = β). Bar color encodes paid vs. open-source, matching the color scheme of [PITH_FULL_IMAGE:figures/full_fig_p043_16.png] view at source ↗

read the original abstract

Shapley value and its priority-aware extensions are widely used for valuation in machine learning, but existing methods require pairwise priority to be binary and acyclic, a restriction spectacularly violated in real-data examples such as aggregated human preferences and multi-criterion comparisons. We introduce the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs, in which pairwise edges penalize rather than forbid order violations. GPASV covers a range of classical models as boundary cases. We establish GPASV through an axiomatic characterization, develop the associated computational methods, and introduce a priority sweeping diagnostic extending PASV's. We apply GPASV to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, illustrating that priority-aware valuation is not a one-button operation: different balances of pairwise graph priority versus individual soft priority produce substantively different valuations of the same data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper generalizes priority-aware Shapley to arbitrary weighted directed graphs by penalizing order violations, which addresses a real data restriction, though cycle consistency still needs verification.

read the letter

The main takeaway is that this work relaxes the binary acyclic priority requirement in prior Shapley extensions by defining a random-order value on weighted directed graphs where edges add penalties for violations. That directly targets cases like aggregated human preferences that routinely produce cycles.

It does a few things cleanly. The construction is positioned to recover classical models as boundary cases, and the Chatbot Arena application shows that varying the balance between graph priority and individual soft priority changes the resulting valuations in substantive ways. Extending the priority sweeping diagnostic adds a practical tool for users. The claim of an axiomatic characterization is the grounding step, and the abstract treats the penalization approach as the mechanism that makes the extension work.

The soft spots sit in the details of that penalization on cycles. The random-order formulation with penalties could introduce sampling-dependent effects or non-transitive interactions when cycles are present and weights are non-binary; the stress-test note flags exactly this as the least secure step. Without the explicit axioms, the derivation, or quantitative checks on uniqueness and bias in the full text, it is hard to confirm the values remain well-defined and stable. The abstract states the characterization exists, but the lack of derivation or error analysis keeps soundness low for now.

This is for people working on attribution and valuation methods in machine learning who deal with preference data. A reader focused on Shapley variants or ensemble evaluation would get concrete value from the generalization and the example. It deserves a serious referee because the idea targets a genuine practical limitation even if the cyclic case requires closer scrutiny on consistency.

Referee Report

3 major / 2 minor

Summary. The paper introduces the generalized priority-aware Shapley value (GPASV) as a random-order value on arbitrary directed weighted priority graphs. Pairwise edges impose additive or multiplicative penalties on order violations rather than hard constraints. The work claims an axiomatic characterization that recovers classical Shapley and priority-aware models as boundary cases, supplies associated computational methods and a priority-sweeping diagnostic, and applies the construction to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, demonstrating sensitivity to the balance parameter between graph priority and individual soft priority.

Significance. If the axiomatic characterization is rigorous and the penalization mechanism yields well-defined, cycle-consistent valuations without sampling-dependent bias, the result would materially extend priority-aware valuation beyond the binary acyclic restriction that currently limits applicability to aggregated preferences and multi-criterion data. The explicit demonstration that different balance parameters produce substantively different rankings on the same data set is a useful practical contribution. No machine-checked proofs or fully reproducible code artifacts are mentioned.

major comments (3)

[§3] §3 (definition of GPASV): the random-order formulation with pairwise penalties is asserted to remain well-defined and axiomatically unique on graphs containing cycles and non-binary weights, yet the manuscript supplies neither an explicit expression for the penalized permutation probability nor a proof that the resulting value is independent of the sampling procedure used to approximate it. This is load-bearing for the central claim that GPASV extends classical models without introducing structural inconsistencies.
[§4] §4 (axiomatic characterization): the uniqueness theorem is stated but the derivation is not reproduced; it is therefore impossible to verify whether the chosen axioms continue to pin down a unique value once the penalty function is permitted to be non-monotonic across cycles. A concrete counter-example or invariance check on a small cyclic graph would be required.
[§5] §5 (computational methods): the Monte-Carlo estimator for the penalized random-order value is described at a high level; no variance bound or convergence rate is given for the cyclic case, leaving open the possibility that the reported LLM valuations are dominated by sampling noise rather than by the priority signal.

minor comments (2)

The abstract refers to 'an axiomatic characterization' without naming the axioms; a one-sentence list in the abstract would improve readability.
Figure captions for the Chatbot Arena results should explicitly state the numerical values of the balance parameter used in each panel.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We address each major comment below and will incorporate the requested clarifications and additions into the revised manuscript.

read point-by-point responses

Referee: [§3] §3 (definition of GPASV): the random-order formulation with pairwise penalties is asserted to remain well-defined and axiomatically unique on graphs containing cycles and non-binary weights, yet the manuscript supplies neither an explicit expression for the penalized permutation probability nor a proof that the resulting value is independent of the sampling procedure used to approximate it. This is load-bearing for the central claim that GPASV extends classical models without introducing structural inconsistencies.

Authors: We agree that an explicit expression for the penalized permutation probability and a proof of sampling independence are required. In the revision we will supply the closed-form expression (the probability of a permutation is proportional to the product over all pairs of the penalty factor raised to the indicator of an order violation) and prove that the resulting expectation is independent of any particular Monte-Carlo sampling scheme because it coincides with the unique solution of the axiomatic system. revision: yes
Referee: [§4] §4 (axiomatic characterization): the uniqueness theorem is stated but the derivation is not reproduced; it is therefore impossible to verify whether the chosen axioms continue to pin down a unique value once the penalty function is permitted to be non-monotonic across cycles. A concrete counter-example or invariance check on a small cyclic graph would be required.

Authors: The full derivation of the uniqueness theorem will be reproduced in an appendix. We will also add an explicit invariance verification on a directed 3-cycle with non-monotonic penalties, confirming that the axioms continue to characterize a unique value. revision: yes
Referee: [§5] §5 (computational methods): the Monte-Carlo estimator for the penalized random-order value is described at a high level; no variance bound or convergence rate is given for the cyclic case, leaving open the possibility that the reported LLM valuations are dominated by sampling noise rather than by the priority signal.

Authors: We will augment §5 with a variance bound (Hoeffding-type, using the boundedness of the penalty function) and a convergence-rate statement that applies directly to graphs containing cycles. This will quantify the number of samples needed to ensure the reported valuations are not dominated by Monte-Carlo noise. revision: yes

Circularity Check

0 steps flagged

No circularity: axiomatic characterization provides independent grounding for the new definition.

full rationale

The paper introduces GPASV via an explicit random-order construction on weighted directed graphs with penalization of violations, then separately establishes it through axiomatic characterization. No equations reduce the output to a fitted input or self-referential quantity by construction. No load-bearing self-citations or uniqueness theorems imported from the authors' prior work are referenced in the provided text. The coverage of classical models as boundary cases is presented as a consequence of the definition rather than a renaming or smuggling of an ansatz. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central addition is the GPASV definition itself; the abstract indicates reliance on standard Shapley axioms extended by the penalization rule and on the random-order representation.

free parameters (1)

balance parameter between graph priority and individual soft priority
Abstract states that different balances produce substantively different valuations on the same data, implying a tunable weighting factor.

axioms (1)

domain assumption Standard Shapley axioms continue to hold under the penalization extension to weighted directed graphs
The axiomatic characterization is invoked to establish the new value.

pith-pipeline@v0.9.1-grok · 5681 in / 1233 out tokens · 32858 ms · 2026-06-30T21:30:06.290554+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 7 canonical work pages · 3 internal anchors

[1]

J. F. Banzhaf III. Weighted voting doesn’t work: A mathematical analysis.Rutgers L. Rev., 19: 317, 1964

1964
[2]

Black et al

D. Black et al. The theory of committees and elections. 1958

1958
[3]

R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons.Biometrika, 39(3/4):324–345, 1952

1952
[4]

Bubley and M

R. Bubley and M. Dyer. Faster random generation of linear extensions.Discrete mathematics, 201(1-3):81–88, 1999

1999
[5]

G. T. Cantwell and C. Moore. Belief propagation for permutations, rankings, and partial orders. Physical Review E, 105(5):L052303, 2022

2022
[6]

Castro, D

J. Castro, D. Gómez, and J. Tejada. Polynomial calculation of the shapley value based on sampling.Computers & operations research, 36(5):1726–1730, 2009

2009
[7]

Chiang, L

W.-L. Chiang, L. Zheng, Y . Sheng, A. N. Angelopoulos, T. Li, D. Li, B. Zhu, H. Zhang, M. Jordan, J. E. Gonzalez, et al. Chatbot arena: An open platform for evaluating llms by human preference. InForty-first International Conference on Machine Learning, 2024

2024
[8]

J. A. M. N. C. Condorcet et al. Essai sur l’application de l’analyse à la probabilité des décisions, rendues à la pluralité des voix/| c par m. le marquis de condorcet..., 1785
[9]

J. Derks. A new proof for weber’s characterization of the random order values.Mathematical Social Sciences, 49(3):327–334, 2005

2005
[10]

Dubey, A

P. Dubey, A. Neyman, and R. J. Weber. Value theory without efficiency.Mathematics of Operations Research, 6(1):122–128, 1981

1981
[11]

Faigle and W

U. Faigle and W. Kern. The shapley value for cooperative games under precedence constraints. International Journal of Game Theory, 21(3):249–266, 1992

1992
[12]

P. C. Fishburn. Condorcet social choice functions.SIAM Journal on applied Mathematics, 33 (3):469–489, 1977

1977
[13]

C. Frye, C. Rowat, and I. Feige. Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.Advances in neural information processing systems, 33: 1229–1239, 2020

2020
[14]

An Odd Estimator for Shapley Values

F. Fumagalli, L. Butler, J. S. Kang, K. Ramchandran, and R. T. Witter. An odd estimator for shapley values.arXiv preprint arXiv:2602.01399, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[15]

Fumagalli, R

F. Fumagalli, R. T. Witter, and C. Musco. PolySHAP: Extending kernelSHAP with interaction- informed polynomial regression. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id=M19J8UGguq. 10

2026
[16]

Ghorbani and J

A. Ghorbani and J. Zou. Data shapley: Equitable valuation of data for machine learning. In International conference on machine learning, pages 2242–2251. PMLR, 2019

2019
[17]

M. E. Glickman. Parameter estimation in large dynamic paired comparison experiments.Journal of the Royal Statistical Society Series C: Applied Statistics, 48(3):377–394, 1999

1999
[18]

Hesterberg

T. Hesterberg. Weighted average importance sampling and defensive mixture distributions. Technometrics, 37(2):185–194, 1995

1995
[19]

A. B. Kahn. Topological sorting of large networks.Communications of the ACM, 5(11): 558–562, 1962

1962
[20]

Kalai and D

E. Kalai and D. Samet. On weighted Shapley values.International journal of game theory, 16 (3):205–222, 1987

1987
[21]

Kangas, T

K. Kangas, T. Hankala, T. M. Niinimäki, and M. Koivisto. Counting linear extensions of sparse posets. InIJCAI, volume 16, pages 603–609, 2016

2016
[22]

Karzanov and L

A. Karzanov and L. Khachiyan. On the conductance of order markov chains.Order, 8(1):7–15, 1991

1991
[23]

A. Kong, J. S. Liu, and W. H. Wong. Sequential imputations and bayesian missing data problems. Journal of the American statistical association, 89(425):278–288, 1994

1994
[24]

W. Kwon, Z. Li, S. Zhuang, Y . Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, and I. Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th symposium on operating systems principles, pages 611–626, 2023

2023
[25]

Kwon and J

Y . Kwon and J. Zou. Beta shapley: a unified and noise-reduced data valuation framework for machine learning.arXiv preprint arXiv:2110.14049, 2021

work page arXiv 2021
[26]

K. Lee, Z. Liu, W. Tang, and Y . Zhang. Priority-aware shapley value.arXiv preprint arXiv:2602.09326, 2026

work page arXiv 2026
[27]

Li and Y

W. Li and Y . Yu. Robust data valuation with weighted banzhaf values.Advances in Neural Information Processing Systems, 36:60349–60383, 2023

2023
[28]

Li and Y

W. Li and Y . Yu. One sample fits all: Approximating all probabilistic values simultaneously and efficiently.Advances in Neural Information Processing Systems, 37:58309–58340, 2024

2024
[29]

K. Liu, Q. Long, Z. Shi, W. J. Su, and J. Xiao. Statistical impossibility and possibility of aligning llms with human preferences: From condorcet paradox to nash equilibrium.arXiv preprint arXiv:2503.10990, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

Z. Liu, K. Lee, Y . Zhang, and W. Tang. First-order efficiency for probabilistic value estimation via a statistical viewpoint.arXiv preprint arXiv:2605.02827, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[31]

R. D. Luce et al.Individual choice behavior, volume 4. Wiley New York, 1959

1959
[32]

S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions.Advances in neural information processing systems, 30, 2017

2017
[33]

C. L. Mallows. Non-null ranking models. i.Biometrika, 44(1/2):114–130, 1957

1957
[34]

Neurips 2026 main track handbook, 2026

NeurIPS. Neurips 2026 main track handbook, 2026. URL https://neurips.cc/ Conferences/2026/MainTrackHandbook

2026
[35]

A. S. Nowak and T. Radzik. On axiomatizations of the weighted shapley values.Games and Economic Behavior, 8(2):389–405, 1995

1995
[36]

R. L. Plackett. The analysis of permutations.Journal of the Royal Statistical Society Series C: Applied Statistics, 24(2):193–202, 1975

1975
[37]

Qwen3.5: Towards native multimodal agents, February 2026

Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https: //qwen.ai/blog?id=qwen3.5

2026
[38]

L. S. Shapley. A value for n-person games.Contributions to the Theory of Games, 2, 1953

1953
[39]

Talvitie, T

T. Talvitie, T. M. Niinimäki, and M. Koivisto. The mixing of markov chains on linear extensions in practice. InIJCAI, pages 524–530, 2017

2017
[40]

J. T. Wang and R. Jia. Data Banzhaf: A robust data valuation framework for machine learning. InInternational Conference on Artificial Intelligence and Statistics, pages 6388–6421. PMLR, 2023. 11

2023
[41]

R. J. Weber.Probabilistic values for games, page 101–120. Cambridge University Press, 1988

1988
[42]

D. B. Wilson. Mixing times of lozenge tiling and card shuffling markov chains.The Annals of Applied Probability, 14(1):274–325, 2004

2004
[43]

R. T. Witter, Y . Liu, and C. Musco. Regression-adjusted monte carlo estimators for shapley values and probabilistic values. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URLhttps://openreview.net/forum?id=Qabko39AS5

2025
[44]

Y . Xu, L. Ruis, T. Rocktäschel, and R. Kirk. Investigating non-transitivity in llm-as-a-judge. arXiv preprint arXiv:2502.14074, 2025

work page arXiv 2025
[45]

Zheng, W.-L

L. Zheng, W.-L. Chiang, Y . Sheng, S. Zhuang, Z. Wu, Y . Zhuang, Z. Lin, Z. Li, D. Li, E. Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023

2023
[46]

λπt exp{−β·V ω(πt;S t)}P k∈St λk exp{−β·V ω(k;S t)} · X k∈St exp{−β·V ω(k;S t)} # ∝ nY t=1

X. Zheng, Y . Huang, X. Chang, R. Jia, and Y . Tan. Rethinking data value: Asymmetric data Shapley for structure-aware valuation in data markets and machine learning pipelines.arXiv preprint arXiv:2511.12863, 2025. 12 Computer Code Computer code is available at:https://github.com/KiljaeL/GPASV A Limitations A.1 Utility Calls The main computational cost of...

work page arXiv 2025
[52]

After synthesizing, output ONLY the final aggregated answer, with no extra commentary

Two-turn consistency: if the prompt is multi-turn, treat the turn-1 aggregated answer as the assistant’s previous message when producing the turn-2 aggregated answer. After synthesizing, output ONLY the final aggregated answer, with no extra commentary. [Question] {x1} <|The Start of Candidates|> [The Start of Candidate 1 Answer] {candidate answer 1} [The...
[53]

Follow the user’s requested format, constraints, and style exactly
[54]

You may add minimal connective wording for readability

Use only information explicitly stated in the candidates. You may add minimal connective wording for readability
[55]

If candidates disagree and you cannot resolve it from the candidates, omit the claim or mark it as uncertain
[56]

Remove redundancy and irrelevant parts; choose the clearest phrasing among candidates
[57]

If JSON/code/strict format is required, keep it valid and do not introduce new APIs/libraries not present in candidates
[58]

[[rating ]]

Two-turn consistency: if the prompt is multi-turn, treat the turn-1 aggregated answer as the assistant’s previous message when producing the turn-2 aggregated answer. After synthesizing, output ONLY the final aggregated answer, with no extra commentary. <|The Start of Previous Conversation with User|> ### User: {x1} ### Assistant: {y1S} ### User: {x2} <|T...

[1] [1]

J. F. Banzhaf III. Weighted voting doesn’t work: A mathematical analysis.Rutgers L. Rev., 19: 317, 1964

1964

[2] [2]

Black et al

D. Black et al. The theory of committees and elections. 1958

1958

[3] [3]

R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons.Biometrika, 39(3/4):324–345, 1952

1952

[4] [4]

Bubley and M

R. Bubley and M. Dyer. Faster random generation of linear extensions.Discrete mathematics, 201(1-3):81–88, 1999

1999

[5] [5]

G. T. Cantwell and C. Moore. Belief propagation for permutations, rankings, and partial orders. Physical Review E, 105(5):L052303, 2022

2022

[6] [6]

Castro, D

J. Castro, D. Gómez, and J. Tejada. Polynomial calculation of the shapley value based on sampling.Computers & operations research, 36(5):1726–1730, 2009

2009

[7] [7]

Chiang, L

W.-L. Chiang, L. Zheng, Y . Sheng, A. N. Angelopoulos, T. Li, D. Li, B. Zhu, H. Zhang, M. Jordan, J. E. Gonzalez, et al. Chatbot arena: An open platform for evaluating llms by human preference. InForty-first International Conference on Machine Learning, 2024

2024

[8] [8]

J. A. M. N. C. Condorcet et al. Essai sur l’application de l’analyse à la probabilité des décisions, rendues à la pluralité des voix/| c par m. le marquis de condorcet..., 1785

[9] [9]

J. Derks. A new proof for weber’s characterization of the random order values.Mathematical Social Sciences, 49(3):327–334, 2005

2005

[10] [10]

Dubey, A

P. Dubey, A. Neyman, and R. J. Weber. Value theory without efficiency.Mathematics of Operations Research, 6(1):122–128, 1981

1981

[11] [11]

Faigle and W

U. Faigle and W. Kern. The shapley value for cooperative games under precedence constraints. International Journal of Game Theory, 21(3):249–266, 1992

1992

[12] [12]

P. C. Fishburn. Condorcet social choice functions.SIAM Journal on applied Mathematics, 33 (3):469–489, 1977

1977

[13] [13]

C. Frye, C. Rowat, and I. Feige. Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.Advances in neural information processing systems, 33: 1229–1239, 2020

2020

[14] [14]

An Odd Estimator for Shapley Values

F. Fumagalli, L. Butler, J. S. Kang, K. Ramchandran, and R. T. Witter. An odd estimator for shapley values.arXiv preprint arXiv:2602.01399, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[15] [15]

Fumagalli, R

F. Fumagalli, R. T. Witter, and C. Musco. PolySHAP: Extending kernelSHAP with interaction- informed polynomial regression. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id=M19J8UGguq. 10

2026

[16] [16]

Ghorbani and J

A. Ghorbani and J. Zou. Data shapley: Equitable valuation of data for machine learning. In International conference on machine learning, pages 2242–2251. PMLR, 2019

2019

[17] [17]

M. E. Glickman. Parameter estimation in large dynamic paired comparison experiments.Journal of the Royal Statistical Society Series C: Applied Statistics, 48(3):377–394, 1999

1999

[18] [18]

Hesterberg

T. Hesterberg. Weighted average importance sampling and defensive mixture distributions. Technometrics, 37(2):185–194, 1995

1995

[19] [19]

A. B. Kahn. Topological sorting of large networks.Communications of the ACM, 5(11): 558–562, 1962

1962

[20] [20]

Kalai and D

E. Kalai and D. Samet. On weighted Shapley values.International journal of game theory, 16 (3):205–222, 1987

1987

[21] [21]

Kangas, T

K. Kangas, T. Hankala, T. M. Niinimäki, and M. Koivisto. Counting linear extensions of sparse posets. InIJCAI, volume 16, pages 603–609, 2016

2016

[22] [22]

Karzanov and L

A. Karzanov and L. Khachiyan. On the conductance of order markov chains.Order, 8(1):7–15, 1991

1991

[23] [23]

A. Kong, J. S. Liu, and W. H. Wong. Sequential imputations and bayesian missing data problems. Journal of the American statistical association, 89(425):278–288, 1994

1994

[24] [24]

W. Kwon, Z. Li, S. Zhuang, Y . Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, and I. Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th symposium on operating systems principles, pages 611–626, 2023

2023

[25] [25]

Kwon and J

Y . Kwon and J. Zou. Beta shapley: a unified and noise-reduced data valuation framework for machine learning.arXiv preprint arXiv:2110.14049, 2021

work page arXiv 2021

[26] [26]

K. Lee, Z. Liu, W. Tang, and Y . Zhang. Priority-aware shapley value.arXiv preprint arXiv:2602.09326, 2026

work page arXiv 2026

[27] [27]

Li and Y

W. Li and Y . Yu. Robust data valuation with weighted banzhaf values.Advances in Neural Information Processing Systems, 36:60349–60383, 2023

2023

[28] [28]

Li and Y

W. Li and Y . Yu. One sample fits all: Approximating all probabilistic values simultaneously and efficiently.Advances in Neural Information Processing Systems, 37:58309–58340, 2024

2024

[29] [29]

K. Liu, Q. Long, Z. Shi, W. J. Su, and J. Xiao. Statistical impossibility and possibility of aligning llms with human preferences: From condorcet paradox to nash equilibrium.arXiv preprint arXiv:2503.10990, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[30] [30]

Z. Liu, K. Lee, Y . Zhang, and W. Tang. First-order efficiency for probabilistic value estimation via a statistical viewpoint.arXiv preprint arXiv:2605.02827, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[31] [31]

R. D. Luce et al.Individual choice behavior, volume 4. Wiley New York, 1959

1959

[32] [32]

S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions.Advances in neural information processing systems, 30, 2017

2017

[33] [33]

C. L. Mallows. Non-null ranking models. i.Biometrika, 44(1/2):114–130, 1957

1957

[34] [34]

Neurips 2026 main track handbook, 2026

NeurIPS. Neurips 2026 main track handbook, 2026. URL https://neurips.cc/ Conferences/2026/MainTrackHandbook

2026

[35] [35]

A. S. Nowak and T. Radzik. On axiomatizations of the weighted shapley values.Games and Economic Behavior, 8(2):389–405, 1995

1995

[36] [36]

R. L. Plackett. The analysis of permutations.Journal of the Royal Statistical Society Series C: Applied Statistics, 24(2):193–202, 1975

1975

[37] [37]

Qwen3.5: Towards native multimodal agents, February 2026

Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https: //qwen.ai/blog?id=qwen3.5

2026

[38] [38]

L. S. Shapley. A value for n-person games.Contributions to the Theory of Games, 2, 1953

1953

[39] [39]

Talvitie, T

T. Talvitie, T. M. Niinimäki, and M. Koivisto. The mixing of markov chains on linear extensions in practice. InIJCAI, pages 524–530, 2017

2017

[40] [40]

J. T. Wang and R. Jia. Data Banzhaf: A robust data valuation framework for machine learning. InInternational Conference on Artificial Intelligence and Statistics, pages 6388–6421. PMLR, 2023. 11

2023

[41] [41]

R. J. Weber.Probabilistic values for games, page 101–120. Cambridge University Press, 1988

1988

[42] [42]

D. B. Wilson. Mixing times of lozenge tiling and card shuffling markov chains.The Annals of Applied Probability, 14(1):274–325, 2004

2004

[43] [43]

R. T. Witter, Y . Liu, and C. Musco. Regression-adjusted monte carlo estimators for shapley values and probabilistic values. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URLhttps://openreview.net/forum?id=Qabko39AS5

2025

[44] [44]

Y . Xu, L. Ruis, T. Rocktäschel, and R. Kirk. Investigating non-transitivity in llm-as-a-judge. arXiv preprint arXiv:2502.14074, 2025

work page arXiv 2025

[45] [45]

Zheng, W.-L

L. Zheng, W.-L. Chiang, Y . Sheng, S. Zhuang, Z. Wu, Y . Zhuang, Z. Lin, Z. Li, D. Li, E. Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023

2023

[46] [46]

λπt exp{−β·V ω(πt;S t)}P k∈St λk exp{−β·V ω(k;S t)} · X k∈St exp{−β·V ω(k;S t)} # ∝ nY t=1

X. Zheng, Y . Huang, X. Chang, R. Jia, and Y . Tan. Rethinking data value: Asymmetric data Shapley for structure-aware valuation in data markets and machine learning pipelines.arXiv preprint arXiv:2511.12863, 2025. 12 Computer Code Computer code is available at:https://github.com/KiljaeL/GPASV A Limitations A.1 Utility Calls The main computational cost of...

work page arXiv 2025

[47] [52]

After synthesizing, output ONLY the final aggregated answer, with no extra commentary

Two-turn consistency: if the prompt is multi-turn, treat the turn-1 aggregated answer as the assistant’s previous message when producing the turn-2 aggregated answer. After synthesizing, output ONLY the final aggregated answer, with no extra commentary. [Question] {x1} <|The Start of Candidates|> [The Start of Candidate 1 Answer] {candidate answer 1} [The...

[48] [53]

Follow the user’s requested format, constraints, and style exactly

[49] [54]

You may add minimal connective wording for readability

Use only information explicitly stated in the candidates. You may add minimal connective wording for readability

[50] [55]

If candidates disagree and you cannot resolve it from the candidates, omit the claim or mark it as uncertain

[51] [56]

Remove redundancy and irrelevant parts; choose the clearest phrasing among candidates

[52] [57]

If JSON/code/strict format is required, keep it valid and do not introduce new APIs/libraries not present in candidates

[53] [58]

[[rating ]]

Two-turn consistency: if the prompt is multi-turn, treat the turn-1 aggregated answer as the assistant’s previous message when producing the turn-2 aggregated answer. After synthesizing, output ONLY the final aggregated answer, with no extra commentary. <|The Start of Previous Conversation with User|> ### User: {x1} ### Assistant: {y1S} ### User: {x2} <|T...