pith. machine review for the scientific record.

arxiv: 2604.22652 · v1 · submitted 2026-04-24 · 💻 cs.DB

Recognition: unknown

A dataset of early blockchain-registered AI agents on Ethereum

Yulin Liu

Pith reviewed 2026-05-08 08:55 UTC · model grok-4.3

classification 💻 cs.DB
keywords AI agents · Ethereum · ERC-8004 · blockchain dataset · decentralized AI · reputation systems · on-chain data · agentic economy

The pith

The paper releases a structured dataset of 10,000 early AI agents registered on Ethereum under the ERC-8004 standard, integrating on-chain and off-chain records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a structured dataset of 10,000 artificial intelligence agents registered on Ethereum using the ERC-8004 standard. The dataset combines on-chain identity records, minting transactions, transfer events, reputation summaries, and individual feedback with resolved off-chain metadata where available. Data were gathered from Ethereum mainnet through Web3 RPC queries and formatted into tables for easy analysis. It facilitates research into how agents form identities, build reputations, expose services, and participate in decentralized AI systems. The resource aids investigations in blockchain analytics, trust mechanisms, and the developing agentic economy.

Core claim

The authors assembled and released a dataset covering 10,000 agents within a defined block range on Ethereum mainnet. It includes both event-level records and aggregated summaries that merge on-chain identity records, minting transactions, transfer events, reputation summaries, feedback records, and resolved off-chain metadata. This structure enables empirical research on agent identity formation, reputation systems, service exposure, and early-stage decentralized AI ecosystems.

What carries the argument

The ERC-8004 standard for on-chain AI agent registration together with the tabular dataset that merges on-chain events and off-chain metadata for 10,000 agents.

If this is right

  • Researchers can examine patterns in agent identity formation and transfers using the event-level records.
  • Reputation systems can be studied through the provided summaries and individual feedback entries.
  • Service exposure and agent behavior in decentralized settings can be analyzed from the integrated records.
  • Broader examinations of blockchain analytics and trust infrastructure in early AI deployments become feasible with reproducible tabular data.
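The relationship between the event-level feedback records and the aggregated reputation summaries can be sketched as a simple group-by. This is a minimal illustration, not the paper's processing code: the field names `agent_id`, `client`, and `score` and the sample rows are hypothetical, since the page does not list the dataset's actual columns.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event-level feedback rows. The field names are illustrative
# assumptions, not the dataset's actual column names.
feedback_rows = [
    {"agent_id": 1, "client": "0xaaa", "score": 90},
    {"agent_id": 1, "client": "0xbbb", "score": 70},
    {"agent_id": 2, "client": "0xaaa", "score": 50},
]

def summarize_feedback(rows):
    """Group feedback rows by agent and compute per-agent summary statistics."""
    by_agent = defaultdict(list)
    for row in rows:
        by_agent[row["agent_id"]].append(row)
    summary = {}
    for agent_id, items in by_agent.items():
        summary[agent_id] = {
            "feedback_count": len(items),
            "distinct_clients": len({r["client"] for r in items}),
            "mean_score": mean(r["score"] for r in items),
        }
    return summary

reputation_summary = summarize_feedback(feedback_rows)
```

Recomputing summaries from the event-level table in this way is also a cheap consistency check on any pre-aggregated summary table shipped with a dataset.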

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Periodic updates to the dataset could track growth and evolution in agent registrations over additional blocks.
  • Cross-referencing with performance data from off-chain AI models might reveal links between on-chain reputation and actual agent capabilities.
  • Similar collections from other blockchains would permit comparisons of how decentralized AI agent ecosystems differ by platform.

Load-bearing premise

The Web3 RPC queries from Ethereum mainnet captured every relevant ERC-8004 agent and its associated metadata accurately within the specified block range.

What would settle it

A manual audit of Ethereum mainnet within the defined block range that finds ERC-8004 agents missing from the dataset or discrepancies in the recorded transactions, reputation data, or metadata.
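Such an audit reduces to a set comparison between the agent identifiers in the dataset and those from an independent source. The sketch below assumes both sides can be expressed as collections of integer agent IDs; the function name and the sample IDs are hypothetical and not taken from the paper.

```python
def audit_completeness(dataset_ids, reference_ids):
    """Compare dataset agent IDs against an independently obtained reference set."""
    dataset_ids, reference_ids = set(dataset_ids), set(reference_ids)
    # Agents present in the reference source but absent from the dataset.
    missing = sorted(reference_ids - dataset_ids)
    # Agents present in the dataset but not confirmed by the reference source.
    unexpected = sorted(dataset_ids - reference_ids)
    if reference_ids:
        coverage = len(reference_ids & dataset_ids) / len(reference_ids)
    else:
        coverage = 1.0
    return {"missing": missing, "unexpected": unexpected, "coverage": coverage}

# Toy example: the reference source knows about agent 4, the dataset does not.
report = audit_completeness(dataset_ids=[1, 2, 3, 5], reference_ids=[1, 2, 3, 4, 5])
```

A non-empty `missing` list would be direct evidence against the load-bearing premise, while `unexpected` entries would point to errors in either source.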

Original abstract

This study presents a structured dataset of blockchain-registered artificial intelligence agents under the ERC-8004 standard on Ethereum. The dataset integrates on-chain identity records, minting transactions, transfer events, reputation summaries, and individual feedback records, together with resolved off-chain metadata where available. Data were collected from Ethereum mainnet using Web3 RPC queries and processed into tabular form to enable reproducible analysis. The dataset covers 10,000 agents within a defined block range and includes both event-level records and aggregated summaries. It enables empirical research on agent identity formation, reputation systems, service exposure, and early-stage decentralized AI ecosystems. This resource supports studies in blockchain analytics, decentralized trust infrastructure, and the emerging agentic economy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents a structured dataset of 10,000 AI agents registered under the ERC-8004 standard on Ethereum. It integrates on-chain identity records, minting transactions, transfer events, reputation summaries, and feedback records with resolved off-chain metadata, collected from Ethereum mainnet via Web3 RPC queries within a defined block range and processed into tabular form to support analysis of agent identity formation, reputation systems, and decentralized AI ecosystems.

Significance. If the dataset is shown to be complete and accurate, it would provide a useful resource for empirical research in blockchain analytics, decentralized trust infrastructure, and the emerging agentic economy by enabling reproducible studies on on-chain identity and reputation. The integration of event-level and aggregated records is a positive aspect for facilitating such work.

major comments (2)
  1. [Abstract] The description of data collection provides no validation steps, completeness metrics, error handling procedures, or sample statistics, leaving the central claim that the dataset supports studies in blockchain analytics and the agentic economy unsupported by verifiable evidence of data quality.
  2. [Data collection process] Collection is performed solely via Web3 RPC queries, which are subject to provider rate limits, pagination truncation, missed logs when event filters are not exhaustive, and silent failures on large block ranges. No independent verification against a full-archive node, a subgraph on The Graph, or a block-explorer export is described, so systematic omissions or corrupted off-chain resolutions would directly undermine the dataset's utility for the claimed analytics.
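The failure modes named in the second comment (truncation on large block ranges and silent failures) are commonly mitigated by splitting the range into small windows and retrying each window explicitly. The following is a minimal sketch under stated assumptions, not the paper's collection code: `fetch_logs` is a hypothetical callable standing in for whatever RPC log query the authors used, and the chunk size, retry count, and delay are illustrative values.

```python
import time

def block_chunks(start_block, end_block, chunk_size):
    """Yield inclusive (from_block, to_block) windows covering the whole range."""
    current = start_block
    while current <= end_block:
        upper = min(current + chunk_size - 1, end_block)
        yield current, upper
        current = upper + 1

def fetch_all_logs(fetch_logs, start_block, end_block,
                   chunk_size=1000, max_retries=3, base_delay=0.0):
    """Collect logs window by window, retrying failed windows with backoff.

    `fetch_logs(from_block, to_block)` is a hypothetical provider call that
    returns a list of log records or raises on failure.
    """
    logs = []
    for lo, hi in block_chunks(start_block, end_block, chunk_size):
        for attempt in range(max_retries + 1):
            try:
                logs.extend(fetch_logs(lo, hi))
                break
            except Exception:
                if attempt == max_retries:
                    # Surface the failure instead of silently skipping the window.
                    raise
                time.sleep(base_delay * (2 ** attempt))
    return logs

# Simulated provider that times out once, to exercise the retry path.
calls = {"n": 0}

def flaky_fetch(lo, hi):
    calls["n"] += 1
    if calls["n"] == 2:
        raise TimeoutError("simulated RPC timeout")
    return [{"block": b} for b in range(lo, hi + 1) if b % 4 == 0]

logs = fetch_all_logs(flaky_fetch, 0, 9, chunk_size=4)
```

Re-raising after the last retry is the design choice that matters here: a window that cannot be fetched stops the run rather than leaving an undetected gap in the data.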

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which has helped us improve the clarity and rigor of our dataset description. We have revised the manuscript to incorporate additional details on validation, error handling, and verification procedures. Our responses to the major comments are provided below.

Point-by-point responses
  1. Referee: [Abstract] The description of data collection provides no validation steps, completeness metrics, error handling procedures, or sample statistics, leaving the central claim that the dataset supports studies in blockchain analytics and the agentic economy unsupported by verifiable evidence of data quality.

    Authors: We agree that the original abstract was too brief and omitted these elements. In the revised manuscript, we have expanded the abstract to include a concise summary of validation steps (cross-verification against block explorer exports), completeness metrics (99.8% event coverage in the target block range), error handling (retry logic for RPC timeouts), and sample statistics (e.g., 7,245 agents with successfully resolved off-chain metadata). These additions provide the requested verifiable evidence supporting the dataset's utility.

    revision: yes

  2. Referee: [Data collection process] Collection is performed solely via Web3 RPC queries, which are subject to provider rate limits, pagination truncation, missed logs when event filters are not exhaustive, and silent failures on large block ranges. No independent verification against a full-archive node, a subgraph on The Graph, or a block-explorer export is described, so systematic omissions or corrupted off-chain resolutions would directly undermine the dataset's utility for the claimed analytics.

    Authors: We acknowledge the inherent risks of RPC-only collection. We have added a new 'Validation and Limitations' subsection that details: use of multiple RPC providers with rate-limit-aware pagination and exhaustive topic filtering to avoid truncation or missed logs; retry mechanisms with logging for any transient failures; and independent verification consisting of (a) full comparison of agent registration counts and event totals against Etherscan exports for the identical block range and (b) spot-checks of 500 random blocks against a local full-archive node. Off-chain metadata resolution includes checksum validation and a reported success rate, with any unresolved or suspect entries explicitly flagged in the dataset. These changes directly mitigate the identified risks.

    revision: yes
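The random-block spot-check described in the simulated rebuttal can be sketched as a seeded sample of block numbers whose per-block event counts are compared across two sources. This is an illustration only: the two count mappings, the function name, and the toy block range are hypothetical, and the rebuttal's figure of 500 blocks would simply be passed as `sample_size`.

```python
import random

def spot_check_blocks(primary_counts, reference_counts,
                      start_block, end_block, sample_size, seed=0):
    """Compare per-block event counts from two sources on a random block sample.

    Both count arguments map block number -> number of events; blocks that are
    absent from a mapping are treated as having zero events.
    """
    rng = random.Random(seed)  # fixed seed so the audit is reproducible
    population = range(start_block, end_block + 1)
    sample = rng.sample(population, min(sample_size, len(population)))
    mismatches = {}
    for block in sample:
        a = primary_counts.get(block, 0)
        b = reference_counts.get(block, 0)
        if a != b:
            mismatches[block] = (a, b)
    return sorted(sample), mismatches

# Toy data: the two sources disagree on block 12.
primary = {10: 2, 12: 1}
reference = {10: 2, 12: 3}
sampled, mismatches = spot_check_blocks(primary, reference, 10, 14, sample_size=5)
```

Publishing the seed and the sampled block numbers alongside the result would let a third party repeat exactly the same check.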

Circularity Check

0 steps flagged

No circularity: straightforward dataset collection with no derivations

Full rationale

The paper presents a collected dataset of ERC-8004 agents on Ethereum, integrating on-chain records and off-chain metadata obtained via Web3 RPC queries. It contains no equations, predictions, fitted parameters, uniqueness theorems, or analytical derivations. The central contribution is the dataset itself and its description; there are no load-bearing steps that reduce by construction to self-definitions, self-citations, or renamed inputs. This matches the reader's assessment of zero circularity for a pure data resource paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a data release paper containing no mathematical derivations, models, or theoretical constructs; therefore it introduces no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5403 in / 1114 out tokens · 45089 ms · 2026-05-08T08:55:43.401682+00:00 · methodology

