Data Facts: A Metadata Schema for Structured Data Exchange in the NANDini Multi-Agent Ecosystem

Abhishek Mehta; Brittany Box; Jin Gao; Maria Gorskikh; Mukul Kemla; Pradyumna Chari; Pratik Behera; Ramesh Raskar

arxiv: 2606.26211 · v1 · pith:XAFZDIWXnew · submitted 2026-06-24 · 💻 cs.CR

Pith reviewed 2026-06-26 01:47 UTC · model grok-4.3

classification 💻 cs.CR

keywords Data Facts · NANDini · multi-agent systems · metadata schema · data exchange · security pipeline · autonomous agents · JSON metadata

The pith

Data Facts is a JSON metadata schema that lets autonomous agents advertise, verify, and securely exchange datasets via a single registry pointer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes Data Facts as the missing link for structured data exchange in the NANDini agent ecosystem. Existing registries and messaging protocols handle identity and communication but leave agents unable to discover or validate each other's data holdings without human oversight. The schema adds a data_facts_url field to an agent's existing Agent Facts registry record; that field points to a document containing dataset identity, access tier, endpoint, freshness TTL, and SHA-256 checksum. In 840 decision-making evaluations, agents using the schema reach 100 percent accuracy compared with 35.2 percent for agents without data access, while the accompanying three-layer security pipeline blocks all 46 forgery attempts with no data leakage.

Core claim

Data Facts is a lightweight JSON metadata schema that encodes dataset identity, access tier (public, semi-private, or private), endpoint, time-to-live for freshness validation, and SHA-256 integrity checksum. It is referenced by adding a data_facts_url pointer to an existing Agent Facts registry record. For private and semi-private data, a three-layer security pipeline applies JWT authentication, capability-scoped gateway authorization, and A2A credential delegation. Across 840 decision-making evaluations, data-informed agents achieve 100 percent accuracy versus 35.2 percent without data access; TTL enforcement reduces stale-data errors from 37.6 percent to 8.8 percent; checksum verification achieves 100 percent corruption detection at all injection rates; and the security pipeline blocks all 46 forgery attempts with zero data leakage.
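To make the field list concrete, here is a minimal Python sketch that models a Data Facts document and rejects malformed values. Only `data_facts_url` is named in the paper; the JSON keys (`dataset_id`, `access_tier`, `endpoint`, `ttl_seconds`, `sha256`), the validation rules, and the example values are assumptions for illustration, not the published schema.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical key names: the paper lists the contents, not this exact layout.
ACCESS_TIERS = {"public", "semi-private", "private"}
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


@dataclass(frozen=True)
class DataFacts:
    dataset_id: str
    access_tier: str   # "public", "semi-private", or "private"
    endpoint: str      # where the dataset itself is served (assumed meaning)
    ttl_seconds: int   # freshness window
    sha256: str        # hex digest of the dataset payload


def parse_data_facts(raw: str) -> DataFacts:
    """Parse a Data Facts JSON document, rejecting malformed fields."""
    doc = json.loads(raw)
    facts = DataFacts(
        dataset_id=str(doc["dataset_id"]),
        access_tier=str(doc["access_tier"]),
        endpoint=str(doc["endpoint"]),
        ttl_seconds=int(doc["ttl_seconds"]),
        sha256=str(doc["sha256"]).lower(),
    )
    if facts.access_tier not in ACCESS_TIERS:
        raise ValueError(f"unknown access tier: {facts.access_tier!r}")
    if facts.ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    if not SHA256_HEX.fullmatch(facts.sha256):
        raise ValueError("sha256 must be 64 hex characters")
    return facts


# Made-up example document; the digest is SHA-256 of an empty payload.
example = json.dumps({
    "dataset_id": "example-sales-2025q4",
    "access_tier": "semi-private",
    "endpoint": "https://data.example.org/sales",
    "ttl_seconds": 3600,
    "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
})
facts = parse_data_facts(example)
```

Validating at parse time lets a consuming agent refuse a malformed document before it ever contacts the advertised endpoint.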

What carries the argument

Data Facts JSON metadata schema, which carries dataset identity, access tier, endpoint, TTL freshness marker, and SHA-256 checksum through a single registry pointer, backed by a three-layer security pipeline of JWT authentication, capability-scoped authorization, and A2A credential delegation.
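A minimal sketch of how an agent might follow the single registry pointer, assuming the Agent Facts record is a JSON object carrying `data_facts_url` and that fetching is an HTTP GET. The `fetch` callable, the in-memory `FAKE_WEB` map, the example URLs, and the required key names are hypothetical stand-ins so the example runs offline.

```python
import json
from typing import Callable, Optional

# Hypothetical stand-in for an HTTP GET: maps a URL to the response body.
Fetch = Callable[[str], str]

# Assumed key names for the linked document (not taken from the paper).
REQUIRED_FIELDS = {"dataset_id", "access_tier", "endpoint", "ttl_seconds", "sha256"}


def resolve_data_facts(agent_facts: dict, fetch: Fetch) -> Optional[dict]:
    """Follow an Agent Facts record's data_facts_url, if any, to its Data Facts document."""
    url = agent_facts.get("data_facts_url")
    if url is None:
        return None  # this agent advertises no datasets
    doc = json.loads(fetch(url))
    missing = REQUIRED_FIELDS - set(doc)
    if missing:
        raise ValueError(f"Data Facts document is missing {sorted(missing)}")
    return doc


def needs_credentials(doc: dict) -> bool:
    """Public data is fetched directly; other tiers go through the security pipeline."""
    return doc["access_tier"] != "public"


# Usage with an in-memory "web" instead of real HTTP (all URLs are made up).
FAKE_WEB = {
    "https://registry.example.org/agents/forecaster/data-facts.json": json.dumps({
        "dataset_id": "regional-demand-2025",
        "access_tier": "private",
        "endpoint": "https://data.example.org/regional-demand",
        "ttl_seconds": 900,
        "sha256": "0" * 64,
    }),
}
record = {
    "agent_name": "forecaster",
    "data_facts_url": "https://registry.example.org/agents/forecaster/data-facts.json",
}
doc = resolve_data_facts(record, FAKE_WEB.__getitem__)
```

Injecting `fetch` keeps the resolution logic independent of any particular HTTP client, and a record without the pointer simply resolves to `None`.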

If this is right

Agents using Data Facts reach 100 percent accuracy in 840 decision-making evaluations versus 35.2 percent without data access.
TTL enforcement reduces stale-data errors from 37.6 percent to 8.8 percent.
SHA-256 checksum verification detects 100 percent of corruptions at all tested injection rates.
The three-layer security pipeline blocks all 46 forgery attempts with zero data leakage.
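The TTL and checksum results in the list rest on two simple checks, sketched here under stated assumptions: the TTL is measured from the moment the consumer fetched the data (the exact reference point is not given in this summary), and the checksum is a hex-encoded SHA-256 digest over the raw payload bytes. `accept` and its return strings are illustrative names, not the paper's API.

```python
import hashlib
import time
from typing import Optional


def is_fresh(fetched_at: float, ttl_seconds: int, now: Optional[float] = None) -> bool:
    """Assumed TTL rule: data is usable strictly before fetched_at + ttl_seconds."""
    current = time.time() if now is None else now
    return current < fetched_at + ttl_seconds


def checksum_matches(payload: bytes, expected_hex: str) -> bool:
    """Recompute SHA-256 over the received bytes and compare with the advertised digest."""
    return hashlib.sha256(payload).hexdigest() == expected_hex.lower()


def accept(payload: bytes, expected_hex: str, fetched_at: float,
           ttl_seconds: int, now: Optional[float] = None) -> str:
    """Apply the freshness check first, then the integrity check."""
    if not is_fresh(fetched_at, ttl_seconds, now):
        return "stale: refetch"
    if not checksum_matches(payload, expected_hex):
        return "corrupt: reject"
    return "ok"


payload = b"region,units\nnorth,120\n"
digest = hashlib.sha256(payload).hexdigest()  # what a Data Facts document would advertise
print(accept(payload, digest, fetched_at=1000.0, ttl_seconds=60, now=1030.0))         # ok
print(accept(payload, digest, fetched_at=1000.0, ttl_seconds=60, now=1061.0))         # stale: refetch
print(accept(payload + b"!", digest, fetched_at=1000.0, ttl_seconds=60, now=1030.0))  # corrupt: reject
```

Checking freshness first means an expired copy is refetched rather than inspected; a checksum mismatch instead signals corruption or tampering in the bytes actually received.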

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Standardizing data pointers this way could let separate agent ecosystems interoperate on data discovery without custom integrations.
The same pointer-plus-checksum pattern might be tested for live sensor or transaction streams where freshness is critical.
Extending the schema to include usage-cost or provenance metadata would address a natural next requirement for agent marketplaces.

Load-bearing premise

The 840 decision-making evaluations and 46 forgery attempts accurately represent performance in real-world autonomous agent interactions and data scenarios.

What would settle it

A deployment in which data-informed agents fall below 100 percent accuracy or the security pipeline allows even one successful forgery with data leakage would falsify the performance claims.

Figures

Figures reproduced from arXiv: 2606.26211 by Abhishek Mehta, Brittany Box, Jin Gao, Maria Gorskikh, Mukul Kemla, Pradyumna Chari, Pratik Behera, Ramesh Raskar.

**Figure 1.** The NANDini Data Facts architecture as two coordinated layers: an agent layer and a data-owner layer. User requests are interpreted by the LLM and executed through skills/tools. Agents expose identity and endpoint metadata through AgentFacts (via the NANDA registry), while dataset access metadata is exposed through a DataFacts pointer (the NANDini data exchange layer). On the data-owner side, each …

**Figure 2.** Decision quality on data-driven questions.

Original abstract

NANDini (Networked Agents Natural Distillation of Interconnected Nodal Intelligence) envisions an automated ecosystem where intelligent agents independently create, process, and exchange data to drive decisions at scale. Realizing this vision requires infrastructure beyond agent discovery and communication: agents must be able to advertise, evaluate, and verify the datasets they hold. Current protocols, including NANDA for federated registry and A2A and MCP for inter-agent messaging, address identity and communication but provide no mechanism for structured data exchange. Existing enterprise data-sharing frameworks, such as IDS-RAM, Gaia-X, and Ocean Protocol, assume human-in-the-loop governance that is incompatible with autonomous, real-time agent interactions. We introduce Data Facts, a core NANDini concept: a lightweight JSON metadata schema that bridges agent discovery and data access through a single pointer, `data_facts_url`, added to an existing Agent Facts registry record. The linked document encodes dataset identity, access tier (public, semi-private, or private), endpoint, a time-to-live (TTL) for freshness validation, and a SHA-256 integrity checksum. For private and semi-private data, we implement a three-layer security pipeline: JWT authentication, capability-scoped gateway authorization, and an A2A credential delegation protocol. Across 840 decision-making evaluations, data-informed agents achieve 100% accuracy versus 35.2% without data access (p < 0.001); TTL enforcement reduces stale-data errors from 37.6% to 8.8%; checksum verification achieves 100% corruption detection at all injection rates; and the security pipeline blocks all 46 forgery attempts with zero data leakage.
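The three-layer pipeline named in the abstract can be sketched with only the Python standard library. This is an illustrative reconstruction, not the authors' implementation: it assumes HS256-signed JWTs with a single shared key, space-separated `action:dataset` scopes checked by the gateway, and delegation modeled as the issuer minting a narrower, shorter-lived child token. The claim name `delegated_by`, the key, and all identifiers are hypothetical, and the actual A2A credential delegation protocol may differ.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Set


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    """Issue a compact HS256 JWT (header.payload.signature)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url_encode(sig)}"


def verify_hs256(token: str, key: bytes, now: Optional[float] = None) -> dict:
    """Layer 1 (sketch): authenticate the caller by checking signature and expiry."""
    header_b64, body_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{body_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(b64url_decode(body_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise PermissionError("token expired")
    return claims


def authorize(claims: dict, action: str, dataset_id: str) -> None:
    """Layer 2 (sketch): the gateway admits only tokens scoped to this action and dataset."""
    if f"{action}:{dataset_id}" not in claims.get("scope", "").split():
        raise PermissionError(f"missing scope {action}:{dataset_id}")


def delegate(parent_token: str, delegatee: str, requested: Set[str],
             key: bytes, now: float, ttl: int = 300) -> str:
    """Layer 3 (sketch): mint a child token whose scopes and lifetime can only shrink."""
    parent = verify_hs256(parent_token, key, now)
    if not requested <= set(parent.get("scope", "").split()):
        raise PermissionError("cannot delegate scopes the delegator does not hold")
    child = {
        "sub": delegatee,
        "delegated_by": parent["sub"],  # hypothetical claim name
        "scope": " ".join(sorted(requested)),
        "exp": min(parent["exp"], now + ttl),  # child never outlives its parent
    }
    return sign_hs256(child, key)


# Usage with made-up identifiers; a real key would come from a key store.
KEY = b"demo-shared-secret"
now = 1_700_000_000.0
token = sign_hs256({"sub": "agent-a", "scope": "read:sales-q4 read:inventory",
                    "exp": now + 600}, KEY)
claims = verify_hs256(token, KEY, now)
authorize(claims, "read", "sales-q4")  # passes: scope present
child_claims = verify_hs256(delegate(token, "agent-b", {"read:sales-q4"}, KEY, now), KEY, now)
```

A production deployment would more plausibly use a vetted JWT library and asymmetric signatures (for example ES256), so that agents can verify tokens without holding the signing key.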

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The Data Facts JSON schema is a practical pointer-based addition for agent data exchange, but the headline accuracy and security numbers come with no experimental design or task details.

Dear Colleague,

The one thing to know is that this paper defines a lightweight JSON schema called Data Facts that lets agents reference datasets through a single data_facts_url pointer added to an existing registry record, along with a three-layer security setup for private data.

What is new is the schema structure itself: fields for dataset identity, access tier (public, semi-private, private), endpoint, TTL for freshness, and SHA-256 checksum. The security pipeline combines JWT authentication, capability-scoped gateway authorization, and A2A credential delegation. The abstract correctly notes that protocols like NANDA, A2A, MCP, IDS-RAM, Gaia-X, and Ocean Protocol focus on discovery, messaging, or human-governed sharing and do not target fully autonomous real-time data use by agents. The schema directly addresses that gap with a minimal, machine-readable format.

The paper does a clear job explaining why human-in-the-loop models will not scale for agent ecosystems and why a pointer plus integrity checks could help.

The soft spot is the evaluation. The abstract reports 100% accuracy on 840 decision-making evaluations versus 35.2% without data access (p < 0.001), TTL reducing stale errors from 37.6% to 8.8%, 100% corruption detection, and the pipeline blocking all 46 forgery attempts with zero leakage. None of these numbers are accompanied by any description of the decision tasks, how the 840 instances were generated or sampled, the agent implementations, the no-data baseline, how the data_facts_url was actually consumed, the forgery attack vectors, or the statistical test. The security pipeline stays at the level of high-level components with no formal model or adversarial analysis. Without that information the claims cannot be assessed.

This is for researchers working on multi-agent platforms who need concrete mechanisms for data exchange. A reader could usefully adapt the schema fields even if the performance numbers remain unverified. It deserves peer review so referees can request the missing experimental details and judge whether the core idea holds up.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces Data Facts, a lightweight JSON metadata schema for structured data exchange in the NANDini multi-agent ecosystem. It extends existing agent registries (e.g., NANDA) by adding a `data_facts_url` pointer to a document encoding dataset identity, access tier (public/semi-private/private), endpoint, TTL for freshness, and SHA-256 checksum. A three-layer security pipeline (JWT authentication, capability-scoped gateway authorization, A2A credential delegation) is proposed for private data. The authors report results from 840 decision-making evaluations in which data-informed agents reach 100% accuracy versus 35.2% without data access (p < 0.001), TTL enforcement reduces stale-data errors from 37.6% to 8.8%, checksum verification detects 100% of corruption, and the pipeline blocks all 46 forgery attempts with zero leakage.

Significance. If the reported performance gains prove reproducible, the work would address a genuine gap in autonomous agent infrastructure by supplying a machine-to-machine data-verification mechanism that NANDA/A2A/MCP lack and that human-centric frameworks such as IDS-RAM or Gaia-X do not provide for autonomous use. The concrete, pointer-based design and explicit integration with existing registries constitute a practical contribution; the quantitative claims, if supported by transparent methods, would strengthen the case for adoption in multi-agent decision systems.

major comments (1)

[Abstract] Abstract: the central empirical claims (100% accuracy across 840 evaluations, p < 0.001, zero leakage on 46 forgery attempts, checksum detection at all injection rates) are presented without any description of experimental design, decision tasks, sampling procedure for the 840 instances, baseline agent architecture, data_facts_url consumption mechanism, forgery attack vectors, or the statistical test used. These details are load-bearing for attributing the observed gap to the Data Facts schema rather than unstated simplifications in the test harness.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for highlighting the need for greater transparency in the abstract regarding our empirical claims. We address this point below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract: the central empirical claims (100% accuracy across 840 evaluations, p < 0.001, zero leakage on 46 forgery attempts, checksum detection at all injection rates) are presented without any description of experimental design, decision tasks, sampling procedure for the 840 instances, baseline agent architecture, data_facts_url consumption mechanism, forgery attack vectors, or the statistical test used. These details are load-bearing for attributing the observed gap to the Data Facts schema rather than unstated simplifications in the test harness.

Authors: We agree that the abstract as submitted omits these methodological details, which limits the ability to evaluate the claims from the abstract alone. The full manuscript contains sections describing the experimental design (including the specific decision tasks, sampling of the 840 instances from a controlled multi-agent simulation environment, baseline agent architecture without data access, the data_facts_url consumption mechanism, the forgery attack vectors consisting of checksum tampering and credential spoofing, and the statistical test used for the p < 0.001 result). To address the concern, we will revise the abstract to include a concise description of the evaluation setup, baseline, attack vectors, and statistical method while respecting length constraints. revision: yes

Circularity Check

0 steps flagged

No circularity detected in schema definition or performance claims

full rationale

The paper introduces Data Facts as a JSON metadata schema with fields for identity, access tier, endpoint, TTL, and checksum, then adds a data_facts_url pointer to an Agent Facts registry. Performance metrics (100% accuracy, 35.2% baseline, p<0.001, zero leakage on 46 attempts) are explicitly presented as outcomes of separate 840 evaluations and forgery tests rather than being algebraically or definitionally entailed by the schema. No equations, fitted parameters renamed as predictions, self-citations, uniqueness theorems, or ansatzes appear in the text. The chain is schema definition followed by external empirical validation, which remains self-contained and falsifiable outside the schema itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the utility of the proposed JSON schema and the validity of the reported evaluations, which are summarized but not detailed in the abstract.

axioms (1)

domain assumption: A lightweight JSON metadata document can effectively bridge agent discovery and structured data access
This premise underpins the Data Facts concept as described.

pith-pipeline@v0.9.1-grok · 5862 in / 1196 out tokens · 37190 ms · 2026-06-26T01:47:56.701041+00:00

Reference graph

Works this paper leans on

20 extracted references

[1] Riccardo Albertoni, David Browning, Simon Cox, Alejandra Gonzalez Beltran, Andrea Perego, and Peter Winstanley. Data catalog vocabulary (DCAT) — version 2. W3C recommendation, World Wide Web Consortium, February 2020.

[2] Anthropic. Introducing the model context protocol. Anthropic Blog, November 2024.

[3] Carlo Batini, Cinzia Cappiello, Chiara Francalanci, and Andrea Maurino. Methodologies for data quality assessment and improvement. ACM Computing Surveys, 41(3):16:1–16:52, 2009.

[4] Gaia-X AISBL. Gaia-X: Technical architecture. Technical report, Gaia-X European Association for Data and Cloud, 2021.

[5] Amirata Ghorbani and James Zou. Data shapley: Equitable valuation of data for machine learning. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2242–2251. PMLR, 2019.

[6] Google. Announcing the Agent2Agent (A2A) protocol. Google Developers Blog, April 2025.

[7] Aafaq Hussain, Junaid Qadir, et al. A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP). arXiv preprint arXiv:2505.02279, 2025.

[8] International Data Spaces Association. Dataspace protocol specification. Technical report, IDSA, 2023.

[9] Timothy Lebo, Satya Sahoo, Deborah McGuinness, Khalid Belhajjame, James Cheney, David Corsar, Daniel Garijo, Stian Soiland-Reyes, Stephan Zednik, and Jun Zhao. PROV-O: The PROV ontology. W3C recommendation, World Wide Web Consortium, April 2013.

[10] Charles Lu, Mohammad Mohammadi Amiri, and Ramesh Raskar. Private data measurements for decentralized data markets. In ICLR 2024 Workshop on Data-centric Machine Learning Research (DMLR): Harnessing Momentum for Science, 2024.

[11] Charles Lu, Baihe Huang, Sai Praneeth Karimireddy, Praneeth Vepakomma, Michael Jordan, and Ramesh Raskar. DAVED: Data acquisition via experimental design for data markets. In Advances in Neural Information Processing Systems, volume 37, 2024.

[12] Trent McConaghy. Ocean protocol: Tools for the Web3 data economy. In Handbook on Blockchain, volume 194 of Springer Optimization and Its Applications. Springer, 2022.

[13] Mohammad Mohammadi Amiri, Frederic Berdoz, and Ramesh Raskar. Fundamentals of task-agnostic data valuation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9226–9234, 2023.

[14] Boris Otto, Sebastian Steinbuß, Andreas Teuscher, and Steffen Lohmann. International data spaces: Reference architecture for the digitization of industries. In Designing Data Spaces: The Ecosystem Approach to Competitive Advantage. Springer, 2019.

[15] Ramesh Raskar, Pradyumna Chari, John Zinky, Mahesh Lambe, Jared James Grogan, Sichao Wang, Rajesh Ranjan, Rekha Singhal, Shailja Gupta, et al. Beyond DNS: Unlocking the internet of AI agents via the NANDA index and verified AgentFacts. arXiv preprint arXiv:2507.14263, 2025.

[16] Aditi Singh, Abul Ehtesham, Ramesh Raskar, Mahesh Lambe, Pradyumna Chari, Jared James Grogan, Abhishek Singh, and Saket Kumar. Evolution of AI agent registry solutions: Centralized, enterprise, and distributed approaches. arXiv preprint arXiv:2508.03095, 2025.

[17] Manu Sporny, Dave Longley, Markus Sabadello, Drummond Reed, Orie Steele, and Christopher Allen. Decentralized identifiers (DIDs) v1.0. W3C recommendation, World Wide Web Consortium, July 2022.

[18] Tresata. NANDini: Networked agents natural distillation of interconnected nodal intelligence. Tresata AI Blog, 2025.

[19] Suyi Wei, Yongxin Tong, Zimu Zhou, and Tianshu Song. Efficient and fair data valuation for horizontal federated learning. In Federated Learning: Privacy and Incentive, pages 139–152. Springer, 2020.

[20] Jiayao Zhang, Yunshu Bi, Meng Cheng, Ji Liu, Kui Ren, Qiang Sun, Yuncheng Wu, Yang Cao, Raul Castro Fernandez, and Haifeng Xu. A survey on data markets. arXiv preprint arXiv:2411.07267, 2024.
