pith. sign in

arxiv: 2605.15018 · v1 · pith:XJGIKCYRnew · submitted 2026-05-14 · 💻 cs.LG · cs.AI

Generalized Priority-Aware Shapley Value

Pith reviewed 2026-06-30 21:30 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Shapley valuepriority-aware valuationcooperative game theorycyclic graphsLLM ensemblesrandom order valuespreference graphs
0
0 comments X

The pith

The generalized priority-aware Shapley value extends random-order valuation to arbitrary directed weighted graphs by penalizing order violations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GPASV to handle priority information in valuation tasks where preferences form cycles or have weights, as in aggregated human judgments or multi-criterion comparisons. Existing Shapley methods break on such data because they assume binary and acyclic priorities. GPASV modifies the random order approach so that violating an edge incurs a penalty instead of being impossible. This lets the method apply to practical cases like ranking LLMs from preference data. The authors provide axioms, algorithms, and demonstrate that different priority settings yield distinct results on the same dataset.

Core claim

We introduce the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs, in which pairwise edges penalize rather than forbid order violations. GPASV covers a range of classical models as boundary cases. We establish GPASV through an axiomatic characterization, develop the associated computational methods, and introduce a priority sweeping diagnostic extending PASV's. We apply GPASV to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, illustrating that priority-aware valuation is not a one-button operation: different balances of pairwise graph priority versus individual soft priority produce substantivel

What carries the argument

GPASV, defined as a random order value on directed weighted priority graphs where edges impose penalties on order violations.

If this is right

  • GPASV applies to cyclic graphs from human preferences without requiring acyclicity.
  • It recovers classical Shapley and priority-aware variants as special cases.
  • Computational methods exist for calculating the value.
  • A priority sweeping diagnostic helps analyze the impact of priorities.
  • Different weightings of graph priority versus soft priority lead to different ensemble valuations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach may extend valuation techniques to other domains with conflicting constraints, such as resource allocation with soft priorities.
  • Testing GPASV on additional cyclic datasets could reveal how penalty magnitudes affect fairness in attributions.
  • The method suggests that priority information in real data is inherently graded rather than absolute, which could influence how preferences are modeled in decision systems.

Load-bearing premise

That penalizing order violations within the random-order formulation correctly captures priority information on cyclic graphs without introducing structural inconsistencies or unintended biases in the resulting valuations.

What would settle it

Applying GPASV to the Chatbot Arena cyclic graph and checking whether the resulting LLM valuations shift substantially when the balance between graph priority and soft priority is altered would test whether the penalization mechanism accurately reflects the input priorities.

Figures

Figures reproduced from arXiv: 2605.15018 by Kiljae Lee, Weijing Tang, Yuan Zhang, Ziqi Liu.

Figure 1
Figure 1. Figure 1: Illustration of prior￾ity structures. Dashed arrows in (b): [35, 46] require special precedence structures. Shapley value [38] and its variants are widely used in machine learning to accredit content contributors, including training exam￾ples, data providers, features, or model components [16, 32, 40]. Their appeal lies in being both principled (uniquely determined by a small set of axioms) and model-agnos… view at source ↗
Figure 2
Figure 2. Figure 2: Simulation 1: mixing time, greedy (solid) [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Accuracy and acceleration by caching (case 2); full grids are in Appendix [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Surrogate-assisted methods (case 2). Full results [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Simulation 3: priority sweeping [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Top-valued models under different priority balances. Colors: paid vs open-source. [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Group-sum GPASV for paid and open￾source models across priority sweeps. The α sweep strengthens open-source node priority, the β sweep strengthens the preference graph priority, and the joint sweep increases both together. 2. α lifts the favored group uniformly. Along the β = 0 slice, the paid and open-source group￾sum curves cross near α ≈ 2 and then separate sharply, with open-source climbing and paid co… view at source ↗
Figure 8
Figure 8. Figure 8: Example showing that cyclic GPASV does not reduce to PASV; (a) the support of the [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Practical mixing-time diagnostic for the adjacent-swap MH chain on GPASV. Solid curves [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Monte Carlo accuracy of the direct permutation estimator, reported as [PITH_FULL_IMAGE:figures/full_fig_p035_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Number of distinct coalition subsets evaluated by the direct permutation estimator, over [PITH_FULL_IMAGE:figures/full_fig_p035_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Final ARE under matched utility-evaluation budgets for the direct and the two surrogate [PITH_FULL_IMAGE:figures/full_fig_p036_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Priority sweeping results. Rows: mean forward position, SOU group-sum, and SOR [PITH_FULL_IMAGE:figures/full_fig_p037_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Per-player GPASV ψi(Uens) along the three sweeps, averaged over the 80 MT-Bench prompts. Left: α-only sweep (β = 0). Middle: β-only sweep (α = 0). Right: joint sweep (α = β). Solid lines are open-source models; dashed lines are paid (gpt-4, gpt-3.5-turbo, claude-v1, claude-instant-v1, palm-2). 41 [PITH_FULL_IMAGE:figures/full_fig_p041_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Per-player rank heatmaps over the supported [PITH_FULL_IMAGE:figures/full_fig_p042_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Top-8-valued models across all 19 priority regimes. Rows: temperature values (baseline (0, 0) together with {1, 2, 4, 8, 16, 32}). Columns: α-only sweep (β = 0), β-only sweep (α = 0), joint sweep (α = β). Bar color encodes paid vs. open-source, matching the color scheme of [PITH_FULL_IMAGE:figures/full_fig_p043_16.png] view at source ↗
read the original abstract

Shapley value and its priority-aware extensions are widely used for valuation in machine learning, but existing methods require pairwise priority to be binary and acyclic, a restriction spectacularly violated in real-data examples such as aggregated human preferences and multi-criterion comparisons. We introduce the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs, in which pairwise edges penalize rather than forbid order violations. GPASV covers a range of classical models as boundary cases. We establish GPASV through an axiomatic characterization, develop the associated computational methods, and introduce a priority sweeping diagnostic extending PASV's. We apply GPASV to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, illustrating that priority-aware valuation is not a one-button operation: different balances of pairwise graph priority versus individual soft priority produce substantively different valuations of the same data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the generalized priority-aware Shapley value (GPASV) as a random-order value on arbitrary directed weighted priority graphs. Pairwise edges impose additive or multiplicative penalties on order violations rather than hard constraints. The work claims an axiomatic characterization that recovers classical Shapley and priority-aware models as boundary cases, supplies associated computational methods and a priority-sweeping diagnostic, and applies the construction to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, demonstrating sensitivity to the balance parameter between graph priority and individual soft priority.

Significance. If the axiomatic characterization is rigorous and the penalization mechanism yields well-defined, cycle-consistent valuations without sampling-dependent bias, the result would materially extend priority-aware valuation beyond the binary acyclic restriction that currently limits applicability to aggregated preferences and multi-criterion data. The explicit demonstration that different balance parameters produce substantively different rankings on the same data set is a useful practical contribution. No machine-checked proofs or fully reproducible code artifacts are mentioned.

major comments (3)
  1. [§3] §3 (definition of GPASV): the random-order formulation with pairwise penalties is asserted to remain well-defined and axiomatically unique on graphs containing cycles and non-binary weights, yet the manuscript supplies neither an explicit expression for the penalized permutation probability nor a proof that the resulting value is independent of the sampling procedure used to approximate it. This is load-bearing for the central claim that GPASV extends classical models without introducing structural inconsistencies.
  2. [§4] §4 (axiomatic characterization): the uniqueness theorem is stated but the derivation is not reproduced; it is therefore impossible to verify whether the chosen axioms continue to pin down a unique value once the penalty function is permitted to be non-monotonic across cycles. A concrete counter-example or invariance check on a small cyclic graph would be required.
  3. [§5] §5 (computational methods): the Monte-Carlo estimator for the penalized random-order value is described at a high level; no variance bound or convergence rate is given for the cyclic case, leaving open the possibility that the reported LLM valuations are dominated by sampling noise rather than by the priority signal.
minor comments (2)
  1. The abstract refers to 'an axiomatic characterization' without naming the axioms; a one-sentence list in the abstract would improve readability.
  2. Figure captions for the Chatbot Arena results should explicitly state the numerical values of the balance parameter used in each panel.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We address each major comment below and will incorporate the requested clarifications and additions into the revised manuscript.

read point-by-point responses
  1. Referee: [§3] §3 (definition of GPASV): the random-order formulation with pairwise penalties is asserted to remain well-defined and axiomatically unique on graphs containing cycles and non-binary weights, yet the manuscript supplies neither an explicit expression for the penalized permutation probability nor a proof that the resulting value is independent of the sampling procedure used to approximate it. This is load-bearing for the central claim that GPASV extends classical models without introducing structural inconsistencies.

    Authors: We agree that an explicit expression for the penalized permutation probability and a proof of sampling independence are required. In the revision we will supply the closed-form expression (the probability of a permutation is proportional to the product over all pairs of the penalty factor raised to the indicator of an order violation) and prove that the resulting expectation is independent of any particular Monte-Carlo sampling scheme because it coincides with the unique solution of the axiomatic system. revision: yes

  2. Referee: [§4] §4 (axiomatic characterization): the uniqueness theorem is stated but the derivation is not reproduced; it is therefore impossible to verify whether the chosen axioms continue to pin down a unique value once the penalty function is permitted to be non-monotonic across cycles. A concrete counter-example or invariance check on a small cyclic graph would be required.

    Authors: The full derivation of the uniqueness theorem will be reproduced in an appendix. We will also add an explicit invariance verification on a directed 3-cycle with non-monotonic penalties, confirming that the axioms continue to characterize a unique value. revision: yes

  3. Referee: [§5] §5 (computational methods): the Monte-Carlo estimator for the penalized random-order value is described at a high level; no variance bound or convergence rate is given for the cyclic case, leaving open the possibility that the reported LLM valuations are dominated by sampling noise rather than by the priority signal.

    Authors: We will augment §5 with a variance bound (Hoeffding-type, using the boundedness of the penalty function) and a convergence-rate statement that applies directly to graphs containing cycles. This will quantify the number of samples needed to ensure the reported valuations are not dominated by Monte-Carlo noise. revision: yes

Circularity Check

0 steps flagged

No circularity: axiomatic characterization provides independent grounding for the new definition.

full rationale

The paper introduces GPASV via an explicit random-order construction on weighted directed graphs with penalization of violations, then separately establishes it through axiomatic characterization. No equations reduce the output to a fitted input or self-referential quantity by construction. No load-bearing self-citations or uniqueness theorems imported from the authors' prior work are referenced in the provided text. The coverage of classical models as boundary cases is presented as a consequence of the definition rather than a renaming or smuggling of an ansatz. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central addition is the GPASV definition itself; the abstract indicates reliance on standard Shapley axioms extended by the penalization rule and on the random-order representation.

free parameters (1)
  • balance parameter between graph priority and individual soft priority
    Abstract states that different balances produce substantively different valuations on the same data, implying a tunable weighting factor.
axioms (1)
  • domain assumption Standard Shapley axioms continue to hold under the penalization extension to weighted directed graphs
    The axiomatic characterization is invoked to establish the new value.

pith-pipeline@v0.9.1-grok · 5681 in / 1233 out tokens · 32858 ms · 2026-06-30T21:30:06.290554+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

53 extracted references · 7 canonical work pages · 3 internal anchors

  1. [1]

    J. F. Banzhaf III. Weighted voting doesn’t work: A mathematical analysis.Rutgers L. Rev., 19: 317, 1964

  2. [2]

    Black et al

    D. Black et al. The theory of committees and elections. 1958

  3. [3]

    R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons.Biometrika, 39(3/4):324–345, 1952

  4. [4]

    Bubley and M

    R. Bubley and M. Dyer. Faster random generation of linear extensions.Discrete mathematics, 201(1-3):81–88, 1999

  5. [5]

    G. T. Cantwell and C. Moore. Belief propagation for permutations, rankings, and partial orders. Physical Review E, 105(5):L052303, 2022

  6. [6]

    Castro, D

    J. Castro, D. Gómez, and J. Tejada. Polynomial calculation of the shapley value based on sampling.Computers & operations research, 36(5):1726–1730, 2009

  7. [7]

    Chiang, L

    W.-L. Chiang, L. Zheng, Y . Sheng, A. N. Angelopoulos, T. Li, D. Li, B. Zhu, H. Zhang, M. Jordan, J. E. Gonzalez, et al. Chatbot arena: An open platform for evaluating llms by human preference. InForty-first International Conference on Machine Learning, 2024

  8. [8]

    J. A. M. N. C. Condorcet et al. Essai sur l’application de l’analyse à la probabilité des décisions, rendues à la pluralité des voix/| c par m. le marquis de condorcet..., 1785

  9. [9]

    J. Derks. A new proof for weber’s characterization of the random order values.Mathematical Social Sciences, 49(3):327–334, 2005

  10. [10]

    Dubey, A

    P. Dubey, A. Neyman, and R. J. Weber. Value theory without efficiency.Mathematics of Operations Research, 6(1):122–128, 1981

  11. [11]

    Faigle and W

    U. Faigle and W. Kern. The shapley value for cooperative games under precedence constraints. International Journal of Game Theory, 21(3):249–266, 1992

  12. [12]

    P. C. Fishburn. Condorcet social choice functions.SIAM Journal on applied Mathematics, 33 (3):469–489, 1977

  13. [13]

    C. Frye, C. Rowat, and I. Feige. Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.Advances in neural information processing systems, 33: 1229–1239, 2020

  14. [14]

    An Odd Estimator for Shapley Values

    F. Fumagalli, L. Butler, J. S. Kang, K. Ramchandran, and R. T. Witter. An odd estimator for shapley values.arXiv preprint arXiv:2602.01399, 2026

  15. [15]

    Fumagalli, R

    F. Fumagalli, R. T. Witter, and C. Musco. PolySHAP: Extending kernelSHAP with interaction- informed polynomial regression. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id=M19J8UGguq. 10

  16. [16]

    Ghorbani and J

    A. Ghorbani and J. Zou. Data shapley: Equitable valuation of data for machine learning. In International conference on machine learning, pages 2242–2251. PMLR, 2019

  17. [17]

    M. E. Glickman. Parameter estimation in large dynamic paired comparison experiments.Journal of the Royal Statistical Society Series C: Applied Statistics, 48(3):377–394, 1999

  18. [18]

    Hesterberg

    T. Hesterberg. Weighted average importance sampling and defensive mixture distributions. Technometrics, 37(2):185–194, 1995

  19. [19]

    A. B. Kahn. Topological sorting of large networks.Communications of the ACM, 5(11): 558–562, 1962

  20. [20]

    Kalai and D

    E. Kalai and D. Samet. On weighted Shapley values.International journal of game theory, 16 (3):205–222, 1987

  21. [21]

    Kangas, T

    K. Kangas, T. Hankala, T. M. Niinimäki, and M. Koivisto. Counting linear extensions of sparse posets. InIJCAI, volume 16, pages 603–609, 2016

  22. [22]

    Karzanov and L

    A. Karzanov and L. Khachiyan. On the conductance of order markov chains.Order, 8(1):7–15, 1991

  23. [23]

    A. Kong, J. S. Liu, and W. H. Wong. Sequential imputations and bayesian missing data problems. Journal of the American statistical association, 89(425):278–288, 1994

  24. [24]

    W. Kwon, Z. Li, S. Zhuang, Y . Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, and I. Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th symposium on operating systems principles, pages 611–626, 2023

  25. [25]

    Kwon and J

    Y . Kwon and J. Zou. Beta shapley: a unified and noise-reduced data valuation framework for machine learning.arXiv preprint arXiv:2110.14049, 2021

  26. [26]

    K. Lee, Z. Liu, W. Tang, and Y . Zhang. Priority-aware shapley value.arXiv preprint arXiv:2602.09326, 2026

  27. [27]

    Li and Y

    W. Li and Y . Yu. Robust data valuation with weighted banzhaf values.Advances in Neural Information Processing Systems, 36:60349–60383, 2023

  28. [28]

    Li and Y

    W. Li and Y . Yu. One sample fits all: Approximating all probabilistic values simultaneously and efficiently.Advances in Neural Information Processing Systems, 37:58309–58340, 2024

  29. [29]

    K. Liu, Q. Long, Z. Shi, W. J. Su, and J. Xiao. Statistical impossibility and possibility of aligning llms with human preferences: From condorcet paradox to nash equilibrium.arXiv preprint arXiv:2503.10990, 2025

  30. [30]

    Z. Liu, K. Lee, Y . Zhang, and W. Tang. First-order efficiency for probabilistic value estimation via a statistical viewpoint.arXiv preprint arXiv:2605.02827, 2026

  31. [31]

    R. D. Luce et al.Individual choice behavior, volume 4. Wiley New York, 1959

  32. [32]

    S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions.Advances in neural information processing systems, 30, 2017

  33. [33]

    C. L. Mallows. Non-null ranking models. i.Biometrika, 44(1/2):114–130, 1957

  34. [34]

    Neurips 2026 main track handbook, 2026

    NeurIPS. Neurips 2026 main track handbook, 2026. URL https://neurips.cc/ Conferences/2026/MainTrackHandbook

  35. [35]

    A. S. Nowak and T. Radzik. On axiomatizations of the weighted shapley values.Games and Economic Behavior, 8(2):389–405, 1995

  36. [36]

    R. L. Plackett. The analysis of permutations.Journal of the Royal Statistical Society Series C: Applied Statistics, 24(2):193–202, 1975

  37. [37]

    Qwen3.5: Towards native multimodal agents, February 2026

    Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https: //qwen.ai/blog?id=qwen3.5

  38. [38]

    L. S. Shapley. A value for n-person games.Contributions to the Theory of Games, 2, 1953

  39. [39]

    Talvitie, T

    T. Talvitie, T. M. Niinimäki, and M. Koivisto. The mixing of markov chains on linear extensions in practice. InIJCAI, pages 524–530, 2017

  40. [40]

    J. T. Wang and R. Jia. Data Banzhaf: A robust data valuation framework for machine learning. InInternational Conference on Artificial Intelligence and Statistics, pages 6388–6421. PMLR, 2023. 11

  41. [41]

    R. J. Weber.Probabilistic values for games, page 101–120. Cambridge University Press, 1988

  42. [42]

    D. B. Wilson. Mixing times of lozenge tiling and card shuffling markov chains.The Annals of Applied Probability, 14(1):274–325, 2004

  43. [43]

    R. T. Witter, Y . Liu, and C. Musco. Regression-adjusted monte carlo estimators for shapley values and probabilistic values. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URLhttps://openreview.net/forum?id=Qabko39AS5

  44. [44]

    Y . Xu, L. Ruis, T. Rocktäschel, and R. Kirk. Investigating non-transitivity in llm-as-a-judge. arXiv preprint arXiv:2502.14074, 2025

  45. [45]

    Zheng, W.-L

    L. Zheng, W.-L. Chiang, Y . Sheng, S. Zhuang, Z. Wu, Y . Zhuang, Z. Lin, Z. Li, D. Li, E. Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023

  46. [46]

    λπt exp{−β·V ω(πt;S t)}P k∈St λk exp{−β·V ω(k;S t)} · X k∈St exp{−β·V ω(k;S t)} # ∝ nY t=1

    X. Zheng, Y . Huang, X. Chang, R. Jia, and Y . Tan. Rethinking data value: Asymmetric data Shapley for structure-aware valuation in data markets and machine learning pipelines.arXiv preprint arXiv:2511.12863, 2025. 12 Computer Code Computer code is available at:https://github.com/KiljaeL/GPASV A Limitations A.1 Utility Calls The main computational cost of...

  47. [52]

    After synthesizing, output ONLY the final aggregated answer, with no extra commentary

    Two-turn consistency: if the prompt is multi-turn, treat the turn-1 aggregated answer as the assistant’s previous message when producing the turn-2 aggregated answer. After synthesizing, output ONLY the final aggregated answer, with no extra commentary. [Question] {x1} <|The Start of Candidates|> [The Start of Candidate 1 Answer] {candidate answer 1} [The...

  48. [53]

    Follow the user’s requested format, constraints, and style exactly

  49. [54]

    You may add minimal connective wording for readability

    Use only information explicitly stated in the candidates. You may add minimal connective wording for readability

  50. [55]

    If candidates disagree and you cannot resolve it from the candidates, omit the claim or mark it as uncertain

  51. [56]

    Remove redundancy and irrelevant parts; choose the clearest phrasing among candidates

  52. [57]

    If JSON/code/strict format is required, keep it valid and do not introduce new APIs/libraries not present in candidates

  53. [58]

    [[rating ]]

    Two-turn consistency: if the prompt is multi-turn, treat the turn-1 aggregated answer as the assistant’s previous message when producing the turn-2 aggregated answer. After synthesizing, output ONLY the final aggregated answer, with no extra commentary. <|The Start of Previous Conversation with User|> ### User: {x1} ### Assistant: {y1S} ### User: {x2} <|T...