ScreenSearch: Uncertainty-Aware OS Exploration

Justin Wagle; Michael Solodko

arxiv: 2605.16024 · v1 · pith:UARXUOVJnew · submitted 2026-05-15 · 💻 cs.AI

ScreenSearch: Uncertainty-Aware OS Exploration

Michael Solodko , Justin Wagle This is my paper

Pith reviewed 2026-05-20 18:12 UTC · model grok-4.3

classification 💻 cs.AI

keywords GUI agentsOS explorationpartial observabilityambiguity reductionscreen deduplicationPUCT searchdesktop applicationsexploration policies

0 comments

The pith

Ambiguity reduction alone does not suffice as an exploration objective for desktop GUI agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Desktop GUI agents encounter partial observability because visually similar screens can represent different workflow states, so actions that look locally good can produce very different outcomes. ScreenSearch tackles this by maintaining a shared deduplicated state graph built from structural screen features and by defining an ambiguity signal from how often matched actions lead to different next states. The system then uses this signal together with frontier rewards inside a PUCT graph-bandit to decide when to keep probing versus when to commit. Large-scale runs across eleven applications produce over a million screenshots and thirty thousand unique states, yet replay evaluations reveal a clear novelty-ambiguity trade-off: policies that cut ambiguity fastest often discover little new territory. The results indicate that state identity, proposal quality, and ambiguity-aware search must all be considered together.

Core claim

ScreenSearch combines structural screen retrieval and deduplication with an ambiguity-aware PUCT graph-bandit. The central ambiguity signal is matched-action outcome dispersion on the deduplicated state graph: when similar screens produce different next states under the same action signature, the state is treated as unresolved and scheduled for further probing. Across eleven desktop applications the system collects over one million screenshots and over thirty thousand deduplicated states. On a fixed replay-start slice, policies that reduce ambiguity quickly discover little frontier, showing that ambiguity reduction by itself is not a sufficient exploration objective.

What carries the argument

An ambiguity signal defined as matched-action outcome dispersion on a deduplicated state graph, used inside a PUCT graph-bandit to balance probing and committing.

Load-bearing premise

The ambiguity signal based on matched-action outcome dispersion accurately reflects true workflow uncertainty rather than noise from retrieval or execution.

What would settle it

Re-run the same exploration with a different screen-retrieval or deduplication method and check whether the novelty-ambiguity trade-off disappears and whether ambiguity-only policies then discover substantial new frontier states.

Figures

Figures reproduced from arXiv: 2605.16024 by Justin Wagle, Michael Solodko.

**Figure 1.** Figure 1: Complementary exploration signals: novelty expands coverage, while ambiguity reduction resolves aliased states before commitment. Let 𝐺𝑡 = (𝑉𝑡 , 𝐸𝑡) denote the global deduplicated state graph accumulated up to search time 𝑡, where 𝑉𝑡 is the set of discovered deduplicated states and 𝐸𝑡 is the set of observed labeled transitions. We represent a realized graph edge as a transition triple (𝑠, 𝜎, 𝑠′ ), where 𝜎 … view at source ↗

**Figure 2.** Figure 2: Screen representation components: shared discrete universes (left) and feature-set extraction [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: System overview of the data generation pipeline distributed across VMs. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 5.** Figure 5: Ambiguity change on the same subset. Lower Δ𝑢𝑡 indicates stronger disambiguation, not greater frontier growth. We compare four reactive baselines that act directly from the current screen, and the uncertainty-guided PUCT graph-bandit with the default uniform prior. Every episode uses a fixed 50-action budget (steps 0–49 in the plots), starts from the same replayed state, and receives the same public inform… view at source ↗

**Figure 6.** Figure 6: Overview of screen similarity and retrieval process. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: Cumulative number of unique states discovered over wall-clock time. [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Per-step action decision time over total wall-clock time. [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

**Figure 9.** Figure 9: Maximum fraction of changed pixels within each deduplicated screen state. [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗

read the original abstract

Desktop GUI agents operate under partial observability: visually similar screens can correspond to different underlying workflow states, so locally plausible actions can lead to sharply different outcomes. We frame this as a problem of computer/OS state exploration, where effective behavior requires both expanding the reachable frontier and reducing ambiguity before committing. We present ScreenSearch, a system that combines structural screen retrieval and deduplication with an ambiguity-aware PUCT graph-bandit for large-scale desktop exploration. The retrieval layer converts UIA trees into location-aware structural features, indexes related screens through sparse token search and metadata filters, and maintains a shared deduplicated state graph across VM workers. On top of this graph, we define a scalable ambiguity signal based on matched-action outcome dispersion. If similar screens produce different next states under the same action signature, the state should be probed further rather than treated as resolved. We use this signal together with frontier rewards to drive large-scale exploration and replay-start policy evaluation over the shared graph. Across 11 desktop applications, ScreenSearch collects over 1M screenshots and over 30K deduplicated states, yielding large exploration corpora with substantial cross-application and within-application diversity. On a fixed replay-start slice, we observe a clear novelty--ambiguity trade-off: some policies reduce ambiguity quickly while discovering little frontier. Ambiguity reduction alone is therefore not a sufficient exploration objective. Appendix ablations show that stronger proposal priors can materially improve unique-state discovery during corpus building. These results suggest that state identity, proposal quality, and ambiguity-aware search all matter when deciding when to probe and when to commit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ScreenSearch builds a large GUI exploration corpus with a deduplicated graph and ambiguity signal, but the dispersion measure lacks validation against noise.

read the letter

I read the ScreenSearch paper on uncertainty-aware OS exploration for desktop GUI agents. The main takeaway is a working system that scales state collection using UIA tree structural retrieval, a shared deduplicated graph across VM workers, and an ambiguity-aware PUCT bandit driven by outcome dispersion on matched actions. They collected over a million screenshots and thirty thousand unique states from eleven applications, with noted cross-app and within-app diversity. The policy comparisons on a replay-start slice show a novelty-ambiguity trade-off, and the appendix ablations indicate that stronger proposal priors improve unique-state discovery. This gives a concrete pipeline and dataset for handling partial observability in GUI agents. The engineering integration stands out as the solid part: location-aware features from UIA trees, sparse search with metadata filters, and the graph that supports large-scale replay and exploration. The observation that you need to balance state identity, proposal quality, and the ambiguity signal rather than optimizing for ambiguity reduction alone is a reasonable pushback on simpler objectives. The soft spots center on the ambiguity signal and the evidence base. Dispersion is defined as variation in next states for the same action on similar screens, but the paper does not report checks for whether this tracks real workflow differences or instead reflects VM timing noise, async events, or imperfect deduplication. Without that grounding, the reported trade-off could be an artifact of the measurement. The results stay at corpus statistics and qualitative policy differences, with no quantitative baselines or error bars in the main claims. This work is aimed at researchers building desktop automation tools and GUI agents who need practical exploration methods and data. Readers focused on system building or dataset creation will get value from the retrieval details and scale. It deserves a serious referee because the implementation is concrete and the problem is relevant to current AI automation efforts. I recommend sending it to peer review with requests for validation experiments on the dispersion signal and tighter quantitative comparisons.

Referee Report

2 major / 2 minor

Summary. The paper introduces ScreenSearch, a system for large-scale desktop OS exploration under partial observability. It combines UIA-tree structural retrieval and deduplication with an ambiguity signal (matched-action outcome dispersion on a shared state graph) and an ambiguity-aware PUCT graph-bandit. The work reports collecting >1M screenshots and >30K deduplicated states across 11 applications, and empirically observes a novelty-ambiguity trade-off across policies on replay-start slices, concluding that ambiguity reduction alone is not a sufficient exploration objective and that state identity, proposal quality, and ambiguity-aware search must be jointly considered.

Significance. If the central empirical observations hold, the manuscript contributes a large-scale, deduplicated exploration corpus and concrete evidence that single-objective strategies are inadequate for GUI agents. The scale of data collection and the shared-graph architecture are clear strengths that could support follow-on work; the explicit trade-off demonstration is a useful falsifiable claim for the community.

major comments (2)

[Abstract (ambiguity signal paragraph) and §4 (method)] The ambiguity signal (described in the abstract as 'matched-action outcome dispersion on the deduplicated state graph') is load-bearing for the novelty-ambiguity trade-off claim. No explicit equation, pseudocode, or threshold definition is provided, nor is there validation that dispersion correlates with human-judged workflow differences rather than VM timing/async noise or retrieval artifacts. Without this, the trade-off could be an artifact of the measurement.
[Results / replay-start evaluation] Results on the fixed replay-start slice report a 'clear novelty-ambiguity trade-off' but supply no quantitative baselines (e.g., random walk, standard PUCT without ambiguity term), error bars, or statistical tests. This makes it impossible to judge the magnitude or reliability of the observed differences that support the multi-objective conclusion.

minor comments (2)

[Abstract] The abstract states 'over 1M screenshots and over 30K deduplicated states' but does not summarize per-application diversity metrics or deduplication false-positive rates.
[Method] Notation for the PUCT exploration constant and ambiguity threshold is introduced without a consolidated table of all free parameters.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the scale of the exploration corpus and the value of the observed trade-off. We address each major comment below and describe the revisions we will incorporate.

read point-by-point responses

Referee: [Abstract (ambiguity signal paragraph) and §4 (method)] The ambiguity signal (described in the abstract as 'matched-action outcome dispersion on the deduplicated state graph') is load-bearing for the novelty-ambiguity trade-off claim. No explicit equation, pseudocode, or threshold definition is provided, nor is there validation that dispersion correlates with human-judged workflow differences rather than VM timing/async noise or retrieval artifacts. Without this, the trade-off could be an artifact of the measurement.

Authors: We agree that an explicit formulation is required for clarity and reproducibility. In the revised manuscript we will add a formal definition of the ambiguity signal in §4, including the equation for outcome dispersion (computed as the entropy of the distribution over deduplicated successor states reached by identical action signatures from structurally matched screens) together with pseudocode for its incremental update on the shared graph. We will also state any thresholds applied during policy selection. The current work does not contain a human validation study correlating dispersion scores with workflow differences; we will add an explicit limitations paragraph acknowledging that distinguishing genuine state ambiguity from retrieval or timing artifacts would benefit from such validation in follow-on work. We note that the UIA-tree structural features and metadata filters were chosen precisely to reduce sensitivity to VM timing noise, and the trade-off appears consistently across 11 applications, but we accept that stronger empirical grounding for the signal itself is desirable. revision: partial
Referee: [Results / replay-start evaluation] Results on the fixed replay-start slice report a 'clear novelty-ambiguity trade-off' but supply no quantitative baselines (e.g., random walk, standard PUCT without ambiguity term), error bars, or statistical tests. This makes it impossible to judge the magnitude or reliability of the observed differences that support the multi-objective conclusion.

Authors: We agree that the absence of explicit baselines and reliability measures weakens the interpretability of the reported trade-off. In the revision we will augment the replay-start evaluation with direct comparisons to random-walk and standard (ambiguity-agnostic) PUCT policies on the same fixed slices. We will report means and standard deviations across multiple independent runs to supply error bars and will include statistical significance tests (paired t-tests) either in the main results table or in the appendix. These additions will allow readers to assess both the magnitude and reliability of the novelty-ambiguity differences. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical corpus statistics and policy comparisons

full rationale

The paper defines an ambiguity signal directly from observed matched-action outcome dispersion on the deduplicated state graph and reports large-scale empirical results (1M+ screenshots, 30K states) plus observed novelty-ambiguity trade-offs across policies. No equations, derivations, or self-citations are invoked that reduce a claimed prediction or uniqueness result to a fitted parameter or prior ansatz defined in terms of the target outcome. The central claim that multiple objectives matter rests on direct data comparisons rather than any self-referential construction.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The approach rests on standard graph-bandit machinery plus domain assumptions about UIA trees and VM replay; no new invented entities or heavily fitted parameters are introduced in the abstract.

free parameters (1)

PUCT exploration constant and ambiguity threshold
Standard bandit hyperparameters that must be chosen or tuned to produce the reported novelty-ambiguity trade-off.

axioms (2)

domain assumption UIA trees yield location-aware structural features sufficient to distinguish workflow states across applications
Invoked in the retrieval layer description as the basis for indexing and deduplication.
domain assumption Outcome dispersion under matched actions is a reliable proxy for state ambiguity
Central to the ambiguity signal definition and the decision to probe further.

pith-pipeline@v0.9.0 · 5809 in / 1398 out tokens · 50460 ms · 2026-05-20T18:12:44.702709+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We define a scalable ambiguity signal based on matched-action outcome dispersion... u(s) := ρ(s)D(s) + (1−ρ(s))u0
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ambiguity reduction alone is not a sufficient exploration objective

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages · 1 internal anchor

[2]

URLhttps://arxiv.org/abs/2204.01691. R. Bajcsy, Y. Aloimonos, and J. K. Tsotsos. Revisiting active perception.Autonomous Robots, 42 (2):177–196,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

URLhttps://doi.org/10.1007/ s10514-017-9615-3

doi: 10.1007/s10514-017-9615-3. URLhttps://doi.org/10.1007/ s10514-017-9615-3. Y. Burda, H. Edwards, A. J. Storkey, and O. Klimov. Exploration by random network distillation. In7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9,

work page doi:10.1007/s10514-017-9615-3 2019
[4]

URLhttps://dblp.org/rec/conf/ aaai/Chrisman92. X. Deng, Y. Gu, B. Zheng, S. Chen, S. Stevens, B. Wang, H. Sun, and Y. Su. Mind2web: Towards a generalist agent for the web. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing ...

work page 2023
[6]

URLhttps://arxiv.org/abs/2402.03610. E. Z. Liu, K. Guu, P. Pasupat, T. Shi, and P. Liang. Reinforcement learning on web interfaces using workflow-guided exploration. In6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net,

work page arXiv 2018
[7]

URLhttps://openreview.net/forum?id=ryTp3f-0-. D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. Curiosity-driven exploration by self-supervised prediction. In D. Precup and Y. W. Teh, editors,Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Lear...

work page 2017
[8]

Sekar, O

R. Sekar, O. Rybkin, K. Daniilidis, P. Abbeel, D. Hafner, and D. Pathak. Planning to explore via self-supervised world models. InProceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 ofProceedings of Machine Learning Research, pages 8583–8592. PMLR,

work page 2020
[9]

press/v119/sekar20a.html

URLhttp://proceedings.mlr. press/v119/sekar20a.html. N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao. Reflexion: language agents with verbal reinforcement learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on NeuralInformationProcess...

work page 2023
[10]

URLhttps://dl.acm.org/doi/10.1145/3583068

doi: 10.1145/3583068. URLhttps://dl.acm.org/doi/10.1145/3583068. T. Xie, D. Zhang, J. Chen, X. Li, S. Zhao, R. Cao, T. J. Hua, Z. Cheng, D. Shin, F. Lei, Y. Liu, Y. Xu, S. Zhou, S. Savarese, C. Xiong, V. Zhong, and T. Yu. Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments. In A. Glober- sons, L. Mackey, D. Belgrave,...

work page doi:10.1145/3583068 2024
[11]

URL http://papers.nips.cc/paper_files/paper/ 2024/hash/5d413e48f84dc61244b6be550f1cd8f5-Abstract-Datasets_ and_Benchmarks_Track.html. S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao. React: Synergizing reasoning and acting in language models. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rw...

work page 2024
[12]

URL https://openreview.net/pdf?id=WE_vluYUL-X. A. Zhou, K. Yan, M. Shlapentokh-Rothman, H. Wang, and Y. Wang. Language agent tree search unifies reasoning, acting, and planning in language models. InForty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27,

work page 2024
[13]

URLhttps://openreview.net/forum?id=njwv9BsGHF

OpenReview.net, 2024a. URLhttps://openreview.net/forum?id=njwv9BsGHF. S.Zhou,F.F.Xu,H.Zhu,X.Zhou,R.Lo,A.Sridhar,X.Cheng,T.Ou,Y.Bisk,D.Fried,U.Alon,and G.Neubig. Webarena: Arealisticwebenvironmentforbuildingautonomousagents. InTheTwelfth InternationalConferenceonLearningRepresentations,ICLR2024,Vienna,Austria,May7-11,2024. OpenReview.net, 2024b. URLhttps:/...

work page 2024

[1] [2]

URLhttps://arxiv.org/abs/2204.01691. R. Bajcsy, Y. Aloimonos, and J. K. Tsotsos. Revisiting active perception.Autonomous Robots, 42 (2):177–196,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [3]

URLhttps://doi.org/10.1007/ s10514-017-9615-3

doi: 10.1007/s10514-017-9615-3. URLhttps://doi.org/10.1007/ s10514-017-9615-3. Y. Burda, H. Edwards, A. J. Storkey, and O. Klimov. Exploration by random network distillation. In7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9,

work page doi:10.1007/s10514-017-9615-3 2019

[3] [4]

URLhttps://dblp.org/rec/conf/ aaai/Chrisman92. X. Deng, Y. Gu, B. Zheng, S. Chen, S. Stevens, B. Wang, H. Sun, and Y. Su. Mind2web: Towards a generalist agent for the web. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing ...

work page 2023

[4] [6]

URLhttps://arxiv.org/abs/2402.03610. E. Z. Liu, K. Guu, P. Pasupat, T. Shi, and P. Liang. Reinforcement learning on web interfaces using workflow-guided exploration. In6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net,

work page arXiv 2018

[5] [7]

URLhttps://openreview.net/forum?id=ryTp3f-0-. D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. Curiosity-driven exploration by self-supervised prediction. In D. Precup and Y. W. Teh, editors,Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Lear...

work page 2017

[6] [8]

Sekar, O

R. Sekar, O. Rybkin, K. Daniilidis, P. Abbeel, D. Hafner, and D. Pathak. Planning to explore via self-supervised world models. InProceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 ofProceedings of Machine Learning Research, pages 8583–8592. PMLR,

work page 2020

[7] [9]

press/v119/sekar20a.html

URLhttp://proceedings.mlr. press/v119/sekar20a.html. N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao. Reflexion: language agents with verbal reinforcement learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems 36: Annual Conference on NeuralInformationProcess...

work page 2023

[8] [10]

URLhttps://dl.acm.org/doi/10.1145/3583068

doi: 10.1145/3583068. URLhttps://dl.acm.org/doi/10.1145/3583068. T. Xie, D. Zhang, J. Chen, X. Li, S. Zhao, R. Cao, T. J. Hua, Z. Cheng, D. Shin, F. Lei, Y. Liu, Y. Xu, S. Zhou, S. Savarese, C. Xiong, V. Zhong, and T. Yu. Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments. In A. Glober- sons, L. Mackey, D. Belgrave,...

work page doi:10.1145/3583068 2024

[9] [11]

URL http://papers.nips.cc/paper_files/paper/ 2024/hash/5d413e48f84dc61244b6be550f1cd8f5-Abstract-Datasets_ and_Benchmarks_Track.html. S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao. React: Synergizing reasoning and acting in language models. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rw...

work page 2024

[10] [12]

URL https://openreview.net/pdf?id=WE_vluYUL-X. A. Zhou, K. Yan, M. Shlapentokh-Rothman, H. Wang, and Y. Wang. Language agent tree search unifies reasoning, acting, and planning in language models. InForty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27,

work page 2024

[11] [13]

URLhttps://openreview.net/forum?id=njwv9BsGHF

OpenReview.net, 2024a. URLhttps://openreview.net/forum?id=njwv9BsGHF. S.Zhou,F.F.Xu,H.Zhu,X.Zhou,R.Lo,A.Sridhar,X.Cheng,T.Ou,Y.Bisk,D.Fried,U.Alon,and G.Neubig. Webarena: Arealisticwebenvironmentforbuildingautonomousagents. InTheTwelfth InternationalConferenceonLearningRepresentations,ICLR2024,Vienna,Austria,May7-11,2024. OpenReview.net, 2024b. URLhttps:/...

work page 2024