Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

Aaron Roth; Buxin Su; Didong Li; Jianqing Fan; Jiayao Zhang; Kyunghyun Cho; Natalie Collina; Weijie Su; Yuling Yan

arXiv: 2605.25172 · v1 · pith:BBYYT6KPnew · submitted 2026-05-24 · 📊 stat.AP · cs.DL · cs.LG

Pith reviewed 2026-06-29 23:32 UTC · model grok-4.3

classification 📊 stat.AP · cs.DL · cs.LG

keywords peer review · statistical estimation · Isotonic Mechanism · ICML 2023 · author self-assessment · generative AI · equity concerns · rejoinder

The pith

Rejoinder organizes defense of ICML 2023 ranking experiment around four themes

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This rejoinder addresses practical and theoretical points raised by discussants on the original paper about author self-assessment in ML/AI peer review. It structures the entire response around four core themes to defend the experimental approach. A sympathetic reader would care because the themes clarify how to treat peer review as an estimation task, handle fairness issues, add new signals, and adapt the process to generative AI.

Core claim

The authors address the discussants' points by organizing their response around four core themes: formulating peer review as a statistical estimation problem; mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; incorporating complementary signals such as reviewer rankings and structured metadata; and exploring a human-centered framework for peer review in the era of generative AI.

What carries the argument

The four core themes used to structure the rejoinder and address discussants' concerns.

If this is right

Peer review can be treated as a statistical estimation problem to improve ranking accuracy.
The Isotonic Mechanism can be deployed after adding mitigations for equity and strategic behavior.
Reviewer rankings and structured metadata can serve as useful complementary signals.
A human-centered framework can guide peer review adaptations in the presence of generative AI.
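
For readers unfamiliar with the Isotonic Mechanism named above, here is a minimal sketch of its basic form as described in the cited work (Su, 2021): raw review scores are projected, in the least-squares sense, onto the set of score vectors consistent with the author's own ranking. This is an illustration under that assumption, not the procedure used in the ICML 2023 experiment or the rejoinder. The function names, the input format (paper indices listed best to worst), and the scores are hypothetical.

```python
from typing import List, Sequence


def pava_nonincreasing(values: Sequence[float]) -> List[float]:
    """Least-squares fit of a non-increasing sequence (pool-adjacent-violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean is below the current one,
        # since that would violate the non-increasing constraint.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted: List[float] = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def isotonic_adjust(raw_scores: Sequence[float],
                    author_ranking: Sequence[int]) -> List[float]:
    """Adjust raw scores to agree with the author's ranking.

    author_ranking lists paper indices from best to worst
    (a hypothetical input format chosen for this sketch).
    """
    ordered = [raw_scores[i] for i in author_ranking]
    fitted = pava_nonincreasing(ordered)
    adjusted = [0.0] * len(raw_scores)
    for position, paper in enumerate(author_ranking):
        adjusted[paper] = fitted[position]
    return adjusted


# Hypothetical example: the author ranks paper 0 above paper 1,
# but reviewers scored paper 1 higher, so the two scores are pooled.
print(isotonic_adjust([5.0, 7.0, 3.0], [0, 1, 2]))
```

When the raw scores already agree with the ranking, the adjustment leaves them unchanged; when they disagree, the conflicting scores are averaged, so the total score across an author's papers is preserved.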

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same four-theme structure might be reusable for rejoinders in other statistical studies of conference review processes.
Testing the equity mitigations in a follow-up experiment at a different conference would provide direct evidence of their effectiveness.
Integrating the human-centered AI framework could connect peer-review research to broader questions of automation in academic evaluation.

Load-bearing premise

That organizing the response around these four themes is sufficient to resolve the discussants' practical and theoretical concerns without requiring new empirical data, formal proofs, or direct rebuttals to specific counter-arguments.

What would settle it

A specific concern raised by one of the discussants that falls outside all four themes and is left unaddressed in the rejoinder.

Original abstract

This article is the rejoinder to "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review," to appear in the Journal of the American Statistical Association with discussion. To address the practical and theoretical points raised by the discussants, we organize our response around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Rejoinder organizes replies to discussants around four themes but adds no new data, experiments, or formal analysis.

The main takeaway is that this is a rejoinder to the ICML 2023 ranking experiment paper. It structures the authors' responses to the discussants around four themes but introduces no new empirical results, derivations, or validations.

The authors group their replies by framing peer review as a statistical estimation problem, addressing equity and strategic issues with the Isotonic Mechanism, suggesting complementary signals such as reviewer rankings and metadata, and considering a human-centered approach in the generative AI era. This organization is clear and helps connect their positions back to the original work.

The paper does a reasonable job of laying out their thinking on these points in one place. It shows they have considered the practical and theoretical angles raised in the discussion.

The soft spot is that the response relies on thematic discussion without new data or targeted evidence to back up how the themes resolve specific criticisms. If discussants pointed to concrete gaps in the original analysis, this format may leave those points open rather than closing them with additional substantiation.

There are no new equations or parameters, so issues like circularity or overfitting do not arise. Citations stay within the prior paper and the discussion.

This is for readers already following the original experiment and the JASA discussion on peer review. It is too narrow and incremental for a general audience or for someone seeking fresh methods.

I would not bring this to a reading group on its own. I would not cite it in my own work. As part of a journal discussion, though, it should go to peer review rather than desk rejection so the exchange can be properly recorded.

Referee Report

1 major / 0 minor

Summary. The manuscript is a rejoinder to the discussion of "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" (to appear in JASA). It states that the response to the discussants' practical and theoretical points is organized around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.

Significance. If the thematic organization maps explicitly to and resolves the discussants' specific concerns, the rejoinder could usefully structure ongoing conversation on statistical framing and deployment issues in peer review. Its contribution is primarily organizational, however, as the abstract indicates no new empirical data, formal proofs, or direct rebuttals to individual counter-arguments.

major comments (1)

[Abstract] The claim that organizing the response around the four themes addresses the discussants' points assumes that broad thematic discussion can stand in for targeted mapping to specific counter-arguments, new empirical results, or formal analysis. The abstract gives no indication that such mapping or substantiation occurs, which leaves the central claim unsubstantiated.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We address it directly below and agree that a revision to clarify the mapping would strengthen the manuscript.

Referee: [Abstract] The claim that organizing the response around the four themes addresses the discussants' points assumes that broad thematic discussion can stand in for targeted mapping to specific counter-arguments, new empirical results, or formal analysis. The abstract gives no indication that such mapping or substantiation occurs, which leaves the central claim unsubstantiated.

Authors: We agree that the abstract could more explicitly signal how the four themes correspond to clusters of discussant comments. The full rejoinder text does organize responses to the practical and theoretical points raised, with each theme addressing groups of related concerns (e.g., statistical estimation framing covers modeling critiques; equity and strategic issues address deployment objections). However, the abstract itself does not detail this correspondence. We will revise the abstract to include a brief sentence noting that the themes are chosen to group and respond to specific classes of discussant feedback. As this is a rejoinder, we do not introduce new empirical data or formal proofs; the contribution remains organizational and synthetic.

Revision: yes

Circularity Check

0 steps flagged

No circularity: thematic rejoinder contains no derivations or fitted claims

The paper is a rejoinder that organizes discussion around four listed themes without any equations, statistical predictions, parameter fitting, or derivation chains. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear, as the content is purely textual response to discussants rather than a mathematical or empirical claim that could reduce to its own inputs by construction. The structure is self-contained as a discussion piece and does not invoke uniqueness theorems or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new mathematical claims, parameters, or entities are introduced; the document is a discussion response relying on the prior experiment and discussant comments.

Reference graph

Works this paper leans on

19 extracted references · 5 canonical work pages

[1] Aziz, H., Lev, O., Mattei, N., Rosenschein, J., and Walsh, T. (2016). Strategyproof peer selection: Mechanisms, analyses, and experiments. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30.

[2] Down, A. (2025). Artificial intelligence research has a slop problem, academics say: 'it's a mess'. The Guardian. Published December 6, 2025; last modified December 9, 2025.

[3] Goldberg, A., Stelmakh, I., Cho, K., Oh, A., Agarwal, A., Belgrave, D., and Shah, N. B. (2025). Peer reviews of peer reviews: A randomized controlled trial and other experiments. PLoS ONE, 20(4):e0320444.

[4] He, C., Wang, F., and Zhu, L. (2026). Emerging knowledge trend in statistical research: A content-based analysis using covariate-assisted dynamic topic model. Journal of the American Statistical Association, pages 1–14.

[5] Kim, J., Lee, Y., and Lee, S. (2025). Position: The AI conference peer review crisis demands author feedback and reviewer rewards. In International Conference on Machine Learning, pages 81634–81651. PMLR.

[6] Liang, W., Zhang, Y., Cao, H., Wang, B., Ding, D., Yang, X., Vodrahalli, K., He, S., Smith, D., Yin, Y., McFarland, D., and Zou, J. (2023). Can large language models provide useful feedback on research papers? A large-scale empirical analysis. arXiv preprint arXiv:2310.01783.

[7] Pearson, H., Ledford, H., Hutson, M., and Van Noorden, R. (2025). Exclusive: the most-cited papers of the twenty-first century. Nature, 640(8059):588–592.

[8] Rastogi, C., Stelmakh, I., Beygelzimer, A., Dauphin, Y. N., Liang, P., Vaughan, J. W., Xue, Z., Daumé III, H., Pierson, E., and Shah, N. B. (2022). How do authors' perceptions of their papers compare with co-authors' perceptions and peer-review decisions? arXiv preprint arXiv:2211.12966.

[9] Shah, N., Tabibian, B., Muandet, K., Guyon, I., and Von Luxburg, U. (2018). Design and analysis of the NIPS 2016 review process. Journal of Machine Learning Research, 19:1–34.

[10] Stelmakh, I., Shah, N. B., Singh, A., and Daumé III, H. (2021). A novice-reviewer experiment to address scarcity of qualified reviewers in large conferences. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 4785–4793.

[11] Su, B., Collina, N., Wen, G., Li, D., Cho, K., Fan, J., Zhao, B., and Su, W. (2025a). How to find fantastic AI papers: Self-rankings as a powerful predictor of scientific impact beyond peer review. arXiv preprint arXiv:2510.02143.

[12] Su, B., Zhang, J., Collina, N., Yan, Y., Li, D., Cho, K., Fan, J., Roth, A., and Su, W. (2025b). The ICML 2023 ranking experiment: Examining author self-assessment in ML/AI peer review. Journal of the American Statistical Association, pages 1–12.

[13] Su, W. (2026). You are the best reviewer of your own papers: The isotonic mechanism. Operations Research, 74(2):804–824.

[14] Su, W. J. (2021). You are the best reviewer of your own papers: An owner-assisted scoring mechanism. Advances in Neural Information Processing Systems, 34:27929–27939.

[15] Wen, G. G., Su, B., Collina, N., Deng, Z., and Su, W. (2026). Recommending best paper awards for ML/AI conferences via the isotonic mechanism. arXiv preprint arXiv:2601.15249.

[16] Wu, J., Xu, H., Guo, Y., and Su, W. J. (2023). An isotonic mechanism for overlapping ownership. arXiv preprint arXiv:2306.11154.

[17] Xu, Y., Jecmen, S., Song, Z., and Fang, F. (2023). A one-size-fits-all approach to improving randomness in paper assignment. Advances in Neural Information Processing Systems, 36:14445–14468.

[18] Yan, Y., Su, W. J., and Fan, J. (2025). Isotonic mechanism for exponential family estimation in machine learning peer review. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(5):1422–1456.

[19] Yuan, W., Liu, P., and Neubig, G. (2022). Can we automate scientific reviewing? Journal of Artificial Intelligence Research, 75:171–212.
