LLM-Rosetta: A Hub-and-Spoke Intermediate Representation for Cross-Provider LLM API Translation
Pith reviewed 2026-05-10 17:00 UTC · model grok-4.3
The pith
A hub-and-spoke intermediate representation lets any LLM provider connect once and translate bidirectionally to all others.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM-Rosetta defines a hub-and-spoke intermediate representation built on a 9-type content model and 10-type stream event schema that captures the shared semantic core across LLM APIs, allowing each provider to implement a single bidirectional converter to and from the IR rather than pairwise adapters.
What carries the argument
The hub-and-spoke Intermediate Representation (IR) that normalizes requests, responses, and streaming events into a provider-neutral 9-type content model plus 10-type stream schema, with modular Ops-composition converters attached to each spoke.
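The converter contract this describes can be sketched in a few lines. The names below (`IRMessage`, `Converter`, `translate`) are hypothetical illustrations of the hub-and-spoke shape, not LLM-Rosetta's actual API:

```python
# Hypothetical sketch of a hub-and-spoke converter interface; the names
# below are illustrative, not LLM-Rosetta's real types.
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class IRMessage:
    """One provider-neutral message in the hub IR."""
    role: str                                    # "user", "assistant", "tool", ...
    content: list = field(default_factory=list)  # content parts (9 types in the paper)

class Converter(Protocol):
    """Each provider spoke implements exactly one bidirectional pair."""
    def to_ir(self, payload: dict) -> list[IRMessage]: ...
    def from_ir(self, messages: list[IRMessage]) -> dict: ...

def translate(src: Converter, dst: Converter, payload: dict) -> dict:
    """Any-to-any translation via the hub: provider -> IR -> provider."""
    return dst.from_ir(src.to_ir(payload))
```

Because `translate` only composes the two spoke converters, adding a provider never touches existing spokes.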
If this is right
- Adding support for a fifth or sixth provider requires only one new converter pair instead of new adapters to every existing provider.
- Applications can route the same request to different providers at runtime without rewriting payload logic.
- Streaming applications receive correctly sequenced chunks with preserved context across provider boundaries.
- Production deployments can migrate traffic between providers with zero change to the calling code.
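The adapter-count arithmetic behind the first bullet can be made concrete with a quick illustrative sketch:

```python
# Back-of-envelope adapter counts: a pairwise mesh needs an adapter per
# ordered provider pair; the hub needs one to-IR plus one from-IR
# converter per provider. Illustrative arithmetic only.
def pairwise_adapters(n: int) -> int:
    return n * (n - 1)   # every ordered (source, target) pair

def hub_converters(n: int) -> int:
    return 2 * n         # one bidirectional converter pair per spoke

# At n=4 the mesh needs 12 adapters vs 8 hub converters; a fifth
# provider costs 8 more mesh adapters but only 2 hub converters.
```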
Where Pith is reading between the lines
- If the IR becomes widely used it could serve as a de-facto semantic standard that new providers adopt directly.
- The same hub-and-spoke pattern could be applied to other API families that share stable semantics but differ in syntax, such as vector databases or cloud storage.
- Long-term maintenance cost drops because changes to one provider's API affect only its own converter, not the entire mesh.
Load-bearing premise
That the shared semantic core of LLM APIs is stable enough to be fully captured by one fixed 9-type content model and 10-type stream event schema for all present and future providers.
What would settle it
Introduction of a new provider API whose core feature cannot be represented in the existing 9 content types or 10 stream event types, producing measurable information loss on round-trip conversion.
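That falsification criterion amounts to a round-trip property check. In the sketch below, `to_ir`/`from_ir` and the `vendor_hint` field are hypothetical stand-ins, not the paper's code:

```python
import copy

# Round-trip property check in the spirit of the falsification criterion:
# a payload survives provider -> IR -> provider iff nothing was lost.
def roundtrip_lossless(payload: dict, to_ir, from_ir) -> bool:
    """True iff converting to the IR and back reproduces the payload exactly."""
    return from_ir(to_ir(payload)) == payload

# A converter that silently discards a field the IR cannot represent
# (here a made-up "vendor_hint") is caught by the check:
def lossy_to_ir(payload: dict) -> dict:
    ir = copy.deepcopy(payload)
    ir.pop("vendor_hint", None)
    return ir

identity = lambda x: x
```

`roundtrip_lossless({"messages": [], "vendor_hint": 1}, lossy_to_ir, identity)` evaluates to `False`, exhibiting exactly the measurable information loss described above.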
Original abstract
The rapid proliferation of Large Language Model (LLM) providers--each exposing proprietary API formats--has created a fragmented ecosystem where applications become tightly coupled to individual vendors. Switching or bridging providers requires $O(N^2)$ bilateral adapters, impeding portability and multi-provider architectures. We observe that despite substantial syntactic divergence, the major LLM APIs share a common semantic core: the practical challenge is the combinatorial surface of syntactic variations, not deep semantic incompatibility. Based on this finding, we present LLM-Rosetta, an open-source translation framework built on a hub-and-spoke Intermediate Representation (IR) that captures the shared semantic core--messages, content parts, tool calls, reasoning traces, and generation controls--in a 9-type content model and 10-type stream event schema. A modular Ops-composition converter architecture enables each API standard to be added independently. LLM-Rosetta supports bidirectional conversion (provider-to-IR-to-provider) for both request and response payloads, including chunk-level streaming with stateful context management. We implement converters for four API standards (OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, and Google GenAI), covering the vast majority of commercial providers. Empirical evaluation demonstrates lossless round-trip fidelity, correct streaming behavior, and sub-100 microsecond conversion overhead--competitive with LiteLLM's single-pass approach while providing bidirectionality and provider neutrality. LLM-Rosetta passes the Open Responses compliance suite and is deployed in production at Argonne National Laboratory. Code is available at https://github.com/Oaklight/llm-rosetta.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces LLM-Rosetta, a hub-and-spoke intermediate representation for translating between LLM provider APIs. It argues that the fragmentation arises primarily from combinatorial syntactic variation rather than deep semantic incompatibility, and proposes a fixed 9-type content model plus 10-type stream event schema to capture the shared core (messages, content parts, tool calls, reasoning traces, generation controls). The framework supports bidirectional request/response conversion including chunk-level streaming with stateful management, with modular converters implemented for OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, and Google GenAI. Empirical claims include lossless round-trip fidelity, correct streaming behavior, sub-100 microsecond overhead competitive with LiteLLM, passage of the Open Responses compliance suite, and production deployment at Argonne National Laboratory.
Significance. If the lossless fidelity claim holds, the work has clear practical significance by reducing the O(N²) cost of multi-provider support to O(N) via a neutral hub, directly addressing vendor lock-in in LLM applications. The open-source release, modular Ops-composition architecture, and real-world production deployment at Argonne National Laboratory are concrete strengths that enable immediate adoption and community extension. The approach is timely given the rapid proliferation of LLM providers.
Major comments (2)
- [Abstract / IR definition] Abstract and IR definition: The lossless round-trip fidelity claim is load-bearing and rests on the assertion that the fixed 9-type content model and 10-type stream event schema capture the shared semantic core without loss. No exhaustive mapping is provided showing how all fields from the four APIs (including Anthropic cache_control, Google safety settings, or streaming event ordering/state) are represented or explicitly dropped; if any semantically relevant distinction falls outside these types, round-tripping through the hub alters information even with bug-free converters.
- [Empirical evaluation] Empirical evaluation: The abstract asserts lossless fidelity, correct streaming, and sub-100 microsecond overhead from tests and compliance suite passage, yet provides no details on test coverage, number of cases, edge cases (e.g., complex tool parameters, streaming state across providers), or data exclusion rules. This prevents verification that the central lossless claim has been adequately stress-tested.
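One way to make the mapping the first comment asks for explicit is a field ledger that records, per provider field, its IR destination or an explicit drop with rationale. Every entry below is an illustrative guess, not taken from the paper:

```python
# Illustrative field-mapping ledger of the kind the referee requests.
# Entries are examples, not the paper's actual mapping.
FIELD_MAP = {
    "anthropic": {
        "messages[].content": ("ir.content_parts", "mapped"),
        "cache_control": (None, "dropped: provider-specific caching hint"),
    },
    "google": {
        "contents[].parts": ("ir.content_parts", "mapped"),
        "safetySettings": (None, "dropped: provider-specific moderation config"),
    },
}

def dropped_fields(provider: str) -> list[str]:
    """Fields with no IR slot: documented rather than silently lost."""
    return [f for f, (slot, _) in FIELD_MAP[provider].items() if slot is None]
```

A ledger like this turns "explicitly dropped" from an assertion into an auditable artifact.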
Minor comments (1)
- [Abstract] The 'Open Responses compliance suite' is referenced without a description of its contents, test cases, or a citation/link for independent inspection.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the practical value of LLM-Rosetta in addressing API fragmentation. We address each major comment below. Both concerns can be resolved through clarifications and added documentation in the revised manuscript.
Point-by-point responses
Referee: [Abstract / IR definition] Abstract and IR definition: The lossless round-trip fidelity claim is load-bearing and rests on the assertion that the fixed 9-type content model and 10-type stream event schema capture the shared semantic core without loss. No exhaustive mapping is provided showing how all fields from the four APIs (including Anthropic cache_control, Google safety settings, or streaming event ordering/state) are represented or explicitly dropped; if any semantically relevant distinction falls outside these types, round-tripping through the hub alters information even with bug-free converters.
Authors: The IR is intentionally limited to the shared semantic core (messages, content parts, tool calls, reasoning traces, and generation controls) as stated in Sections 1 and 3; provider-specific extensions such as Anthropic cache_control and Google safety settings lie outside this core and are explicitly dropped during to-IR conversion (and not reconstructed on the return path) to preserve neutrality. Streaming ordering and state are handled by the 10-type event schema together with the stateful context manager described in Section 4.2. We acknowledge that the manuscript does not contain an exhaustive field-by-field mapping. In revision we will add an appendix providing a representative mapping table for the four implemented APIs, with explicit notation of dropped fields and the rationale for each omission. This will make the scope and limits of the lossless claim fully transparent.
Revision: yes
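The stateful context manager the authors point to could look roughly like the following accumulator. The event shapes (`{"type": "text_delta", ...}`) are illustrative assumptions, not the paper's 10-type schema:

```python
# Minimal sketch of stateful stream accumulation; event shapes are
# illustrative, not LLM-Rosetta's actual stream event schema.
class StreamContext:
    """Folds IR stream events into final text and per-call tool arguments."""

    def __init__(self) -> None:
        self.text_parts: list[str] = []
        self.tool_args: dict[str, list[str]] = {}

    def apply(self, event: dict) -> None:
        # Deltas arrive in provider order; the context keys tool-call
        # fragments by id so interleaved calls stay separated.
        if event["type"] == "text_delta":
            self.text_parts.append(event["text"])
        elif event["type"] == "tool_call_delta":
            self.tool_args.setdefault(event["id"], []).append(event["arguments_delta"])

    def final_text(self) -> str:
        return "".join(self.text_parts)
```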
Referee: [Empirical evaluation] Empirical evaluation: The abstract asserts lossless fidelity, correct streaming, and sub-100 microsecond overhead from tests and compliance suite passage, yet provides no details on test coverage, number of cases, edge cases (e.g., complex tool parameters, streaming state across providers), or data exclusion rules. This prevents verification that the central lossless claim has been adequately stress-tested.
Authors: We agree that greater detail on the evaluation would strengthen verifiability. The reported results derive from a test suite of more than 200 cases that includes unit tests for every content and stream-event type, full round-trip integration tests across all four providers, and the complete Open Responses compliance suite. Edge cases exercised include nested tool parameters, mixed-modality content, multi-chunk streaming with partial tool calls, and cross-provider state transitions. No test cases were excluded. Microbenchmark timing for the sub-100 µs overhead used representative payloads on standard hardware. In the revision we will expand Section 5 with a summary table of test categories and counts, a brief description of the test harness, and explicit reference to the compliance suite. These additions will allow readers to assess the stress-testing performed.
Revision: yes
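The sub-100 µs figure implies a microbenchmark along these lines; `mean_latency_us` and the timing methodology are a sketch of ours, not the paper's harness:

```python
import time

# Microbenchmark sketch for per-conversion overhead. `convert` is any
# translate(payload)-style callable; methodology is illustrative.
def mean_latency_us(convert, payload: dict, iters: int = 10_000) -> float:
    """Mean wall-clock time per call, in microseconds."""
    start = time.perf_counter()
    for _ in range(iters):
        convert(payload)
    return (time.perf_counter() - start) / iters * 1e6
```

Note that a mean over a hot loop understates tail latency; a revised Section 5 would ideally also report percentiles.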
Circularity Check
No circularity: explicit IR schemas and external empirical evaluation
Full rationale
The paper defines its 9-type content model and 10-type stream event schema directly as an explicit engineering artifact capturing observed syntactic variations across LLM APIs, then implements modular converters and validates them via round-trip tests against external provider endpoints and the Open Responses compliance suite. No derivation chain, equations, or predictions reduce to fitted inputs or self-referential definitions; the lossless fidelity claim rests on observable behavior of real APIs rather than any internal tautology. Absence of self-citations, ansatzes, or uniqueness theorems imported from prior author work keeps the framework self-contained.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Major LLM APIs share a common semantic core consisting of messages, content parts, tool calls, reasoning traces, and generation controls.
Invented entities (1)
- LLM-Rosetta hub-and-spoke IR with 9-type content model and 10-type stream event schema (no independent evidence)