arXiv: 2605.25172 · v1 · pith:BBYYT6KPnew · submitted 2026-05-24 · 📊 stat.AP · cs.DL · cs.LG

Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

Pith reviewed 2026-06-29 23:32 UTC · model grok-4.3

classification 📊 stat.AP · cs.DL · cs.LG
keywords peer review · statistical estimation · Isotonic Mechanism · ICML 2023 · author self-assessment · generative AI · equity concerns · rejoinder
The pith

Rejoinder organizes defense of ICML 2023 ranking experiment around four themes

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This rejoinder addresses practical and theoretical points raised by discussants on the original paper about author self-assessment in ML/AI peer review. It structures the entire response around four core themes to defend the experimental approach. A sympathetic reader would care because the themes clarify how to treat peer review as an estimation task, handle fairness issues, add new signals, and adapt the process to generative AI.

Core claim

The authors address the discussants' points by organizing their response around four core themes: formulating peer review as a statistical estimation problem; mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; incorporating complementary signals such as reviewer rankings and structured metadata; and exploring a human-centered framework for peer review in the era of generative AI.

What carries the argument

The four core themes used to structure the rejoinder and address discussants' concerns.

If this is right

  • Peer review can be treated as a statistical estimation problem to improve ranking accuracy.
  • The Isotonic Mechanism can be deployed after adding mitigations for equity and strategic behavior.
  • Reviewer rankings and structured metadata can serve as useful complementary signals.
  • A human-centered framework can guide peer review adaptations in the presence of generative AI.
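
For readers unfamiliar with the Isotonic Mechanism named in the second bullet, the following is a minimal sketch and not the authors' implementation. It assumes the basic single-author form described in the Isotonic Mechanism papers listed in the reference graph: raw review scores are replaced by the least-squares fit that respects the author's claimed ranking. The function name, its inputs, and the example numbers are hypothetical.

```python
from typing import List, Sequence


def isotonic_mechanism(scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """Adjust raw review scores so they agree with an author's ranking.

    scores[i] is the raw (e.g. averaged) review score of paper i.
    ranking lists paper indices from best to worst, as claimed by the author.
    Returns the scores closest to `scores` in squared error that are
    non-increasing along `ranking`.
    """
    if sorted(ranking) != list(range(len(scores))):
        raise ValueError("ranking must be a permutation of the paper indices")

    # Pool Adjacent Violators: each block stores [sum of scores, count].
    # Whenever a lower-ranked block has a higher mean than the block
    # ranked just above it, the two blocks are merged and share one mean.
    blocks: List[list] = []
    for idx in ranking:
        blocks.append([float(scores[idx]), 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    # Write each block's mean back to the papers it covers.
    adjusted = [0.0] * len(scores)
    position = 0
    for total, count in blocks:
        for _ in range(count):
            adjusted[ranking[position]] = total / count
            position += 1
    return adjusted


# Hypothetical example: the author ranks paper 0 above paper 1 above paper 2,
# but reviewers scored paper 1 higher than paper 0.
print(isotonic_mechanism([5.0, 7.0, 3.0], [0, 1, 2]))  # [6.0, 6.0, 3.0]
```

In this toy case the two papers whose review scores contradict the ranking are pooled to their average, while the third score is left unchanged. The equity and strategic mitigations discussed in the rejoinder are not modeled here.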

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same four-theme structure might be reusable for rejoinders in other statistical studies of conference review processes.
  • Testing the equity mitigations in a follow-up experiment at a different conference would provide direct evidence of their effectiveness.
  • Integrating the human-centered AI framework could connect peer-review research to broader questions of automation in academic evaluation.

Load-bearing premise

That organizing the response around these four themes is sufficient to resolve the discussants' practical and theoretical concerns without requiring new empirical data, formal proofs, or direct rebuttals to specific counter-arguments.

What would settle it

A specific concern raised by one of the discussants that falls outside all four themes and is left unaddressed in the rejoinder.

Figures

Figures reproduced from arXiv: 2605.25172 by Aaron Roth, Buxin Su, Didong Li, Jianqing Fan, Jiayao Zhang, Kyunghyun Cho, Natalie Collina, Weijie Su, Yuling Yan.

Figure 1. MSE and MAE averaged over ICML 2023 authors who submitted rankings of the same …
Original abstract

This article is the rejoinder to ``The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review,'' to appear in the Journal of the American Statistical Association with discussion. To address the practical and theoretical points raised by the discussants, we organize our response around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a rejoinder to the discussion of "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" (to appear in JASA). It states that the response to the discussants' practical and theoretical points is organized around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.

Significance. If the thematic organization maps explicitly to and resolves the discussants' specific concerns, the rejoinder could usefully structure ongoing conversation on statistical framing and deployment issues in peer review. Its contribution is primarily organizational, however, as the abstract indicates no new empirical data, formal proofs, or direct rebuttals to individual counter-arguments.

major comments (1)
  1. [Abstract] The claim that organizing the response around the four themes addresses the discussants' points assumes that broad thematic discussion is sufficient in place of targeted mapping to specific counter-arguments, new empirical results, or formal analysis; the abstract provides no indication that such mapping or substantiation occurs, leaving the central claim unsubstantiated.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We address it directly below and agree that a revision to clarify the mapping would strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim that organizing the response around the four themes addresses the discussants' points assumes that broad thematic discussion is sufficient in place of targeted mapping to specific counter-arguments, new empirical results, or formal analysis; the abstract provides no indication that such mapping or substantiation occurs, leaving the central claim unsubstantiated.

    Authors: We agree that the abstract could more explicitly signal how the four themes correspond to clusters of discussant comments. The full rejoinder text does organize responses to the practical and theoretical points raised, with each theme addressing groups of related concerns (e.g., statistical estimation framing covers modeling critiques; equity and strategic issues address deployment objections). However, the abstract itself does not detail this correspondence. We will revise the abstract to include a brief sentence noting that the themes are chosen to group and respond to specific classes of discussant feedback. As this is a rejoinder, we do not introduce new empirical data or formal proofs; the contribution remains organizational and synthetic.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: thematic rejoinder contains no derivations or fitted claims

Full rationale

The paper is a rejoinder that organizes discussion around four listed themes without any equations, statistical predictions, parameter fitting, or derivation chains. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear, as the content is purely textual response to discussants rather than a mathematical or empirical claim that could reduce to its own inputs by construction. The structure is self-contained as a discussion piece and does not invoke uniqueness theorems or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new mathematical claims, parameters, or entities are introduced; the document is a discussion response relying on the prior experiment and discussant comments.

pith-pipeline@v0.9.1-grok · 5673 in / 1037 out tokens · 28021 ms · 2026-06-29T23:32:38.758676+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 5 canonical work pages

  1. [1]

    Aziz, H., Lev, O., Mattei, N., Rosenschein, J., and Walsh, T. (2016). Strategyproof peer selection: Mechanisms, analyses, and experiments. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30

  2. [2]

    Down, A. (2025). Artificial intelligence research has a slop problem, academics say: 'it's a mess'. The Guardian. Published December 6, 2025; last modified December 9, 2025

  3. [3]

    Goldberg, A., Stelmakh, I., Cho, K., Oh, A., Agarwal, A., Belgrave, D., and Shah, N. B. (2025). Peer reviews of peer reviews: A randomized controlled trial and other experiments. PLOS ONE, 20(4):e0320444

  4. [4]

    He, C., Wang, F., and Zhu, L. (2026). Emerging knowledge trend in statistical research: A content-based analysis using covariate-assisted dynamic topic model. Journal of the American Statistical Association, pages 1--14

  5. [5]

    Kim, J., Lee, Y., and Lee, S. (2025). Position: The AI conference peer review crisis demands author feedback and reviewer rewards. In International Conference on Machine Learning, pages 81634--81651. PMLR

  6. [6]

    Liang, W., Zhang, Y., Cao, H., Wang, B., Ding, D., Yang, X., Vodrahalli, K., He, S., Smith, D., Yin, Y., McFarland, D., and Zou, J. (2023). Can large language models provide useful feedback on research papers? a large-scale empirical analysis. arXiv preprint arXiv:2310.01783

  7. [7]

    Pearson, H., Ledford, H., Hutson, M., and Van Noorden, R. (2025). Exclusive: the most-cited papers of the twenty-first century. Nature, 640(8059):588--592

  8. [8]

    Rastogi, C., Stelmakh, I., Beygelzimer, A., Dauphin, Y. N., Liang, P., Vaughan, J. W., Xue, Z., Daumé III, H., Pierson, E., and Shah, N. B. (2022). How do authors' perceptions of their papers compare with co-authors' perceptions and peer-review decisions? arXiv preprint arXiv:2211.12966

  9. [9]

    Shah, N., Tabibian, B., Muandet, K., Guyon, I., and Von Luxburg, U. (2018). Design and analysis of the NIPS 2016 review process. Journal of Machine Learning Research, 19:1--34

  10. [10]

    Stelmakh, I., Shah, N. B., Singh, A., and Daumé III, H. (2021). A novice-reviewer experiment to address scarcity of qualified reviewers in large conferences. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 4785--4793

  11. [11]

    Su, B., Collina, N., Wen, G., Li, D., Cho, K., Fan, J., Zhao, B., and Su, W. (2025a). How to find fantastic AI papers: Self-rankings as a powerful predictor of scientific impact beyond peer review. arXiv preprint arXiv:2510.02143

  12. [12]

    Su, B., Zhang, J., Collina, N., Yan, Y., Li, D., Cho, K., Fan, J., Roth, A., and Su, W. (2025b). The ICML 2023 ranking experiment: Examining author self-assessment in ML/AI peer review. Journal of the American Statistical Association, pages 1--12

  13. [13]

    Su, W. (2026). You are the best reviewer of your own papers: The isotonic mechanism. Operations Research, 74(2):804--824

  14. [14]

    Su, W. J. (2021). You are the best reviewer of your own papers: An owner-assisted scoring mechanism. Advances in Neural Information Processing Systems, 34:27929--27939

  15. [15]

    Wen, G. G., Su, B., Collina, N., Deng, Z., and Su, W. (2026). Recommending best paper awards for ML/AI conferences via the isotonic mechanism. arXiv preprint arXiv:2601.15249

  16. [16]

    Wu, J., Xu, H., Guo, Y., and Su, W. J. (2023). An isotonic mechanism for overlapping ownership. arXiv preprint arXiv:2306.11154

  17. [17]

    Xu, Y., Jecmen, S., Song, Z., and Fang, F. (2023). A one-size-fits-all approach to improving randomness in paper assignment. Advances in Neural Information Processing Systems, 36:14445--14468

  18. [18]

    Yan, Y., Su, W. J., and Fan, J. (2025). Isotonic mechanism for exponential family estimation in machine learning peer review. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(5):1422--1456

  19. [19]

    Yuan, W., Liu, P., and Neubig, G. (2022). Can we automate scientific reviewing? Journal of Artificial Intelligence Research, 75:171--212