JD Oxygen AI Item Center (Oxygen AIIC) V1: An Industrial-Scale LLM/VLM-Centric Solution for Item Understanding, Management, and Applications

Chan Long; Chaofan Chen; Chaohui Dong; Chao Liu; Chunyuan Guo; Danping Liu; Debin Liu; Deping Xiang; Fulai Xu; Guangyue Liu

arxiv: 2606.28070 · v2 · pith:ARTSCHSJnew · submitted 2026-06-26 · 💻 cs.AI


Oxygen AIIC, Chan Long, Chao Liu, Chaofan Chen, Chaohui Dong, Chunyuan Guo, Danping Liu, Debin Liu, Deping Xiang, Fulai Xu, Guangyue Liu, Hao Li, Huichun Hu, Jian Yang, Jianan Wang, Jianbo Zhao, Jiaoyang Li, Jiaxing Wang, Jinglong Li, Jinjin Guo, Jun Fang, Jun Liu, Kai Zhou, Li Wang, Lili Gao, Liying Chen, Luning Yang, Mengdi Zhou, Pengzhang Liu, Qi Lv, Qianyun Wang, Qixia Jiang, Ruyue Li, Shimu Liang, Shuxing Wang, Sijie Zhang, Siqi Li, Tianhao Gao, Wang Ke, Weihu Huang, Wencan Lai, Wenjie Zhang, Xiaohui Zhang, Xiaojing Dong, Ya Liu, Yifeng Zhang, Yixiang Wang, Yongtai Zhang, Yongyi Liao, Zhaoru Chen, Zhen Chen, Zhiyong Ma, Zhiyuan Liu, Zhongwei Liu, Ziyan Xing


Pith reviewed 2026-06-29 04:07 UTC · model grok-4.3

classification 💻 cs.AI

keywords: LLM, VLM, item knowledge production, ontology engineering, self-evolving models, S2D architecture, industrial-scale AI, e-commerce catalog


The pith

An industrial platform uses self-evolving LLMs and VLMs to generate structured knowledge for tens of billions of product items.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a system for producing and serving item knowledge at the scale of tens of billions of SKUs and hundreds of millions of daily updates. It addresses three challenges (fast-emerging concepts, high-quality production at massive scale, and diverse downstream uses) through four pillars built around LLMs and VLMs: human-AI ontology collaboration, a "Semantic Search then Discrimination" (S2D) pipeline, self-evolving models, and a unified data-service tunnel. Reported outcomes include 94.2 percent precision and 82.8 percent recall in knowledge production, along with business metrics such as 80.4 percent search-traffic coverage and a 37 percent reduction in item-information quality issues. A sympathetic reader would care because the system claims to turn raw catalog data into reliable structured knowledge that supports search, recommendation, and operations without proportional growth in manual effort.

Core claim

The central claim is that the S2D knowledge identification architecture, when paired with self-evolving item-understanding LLMs and VLMs, enables stable and controllable model improvement that produces item knowledge at 94.2 percent precision and 82.8 percent recall while supporting dynamic ontology evolution with millions of entries and a unified item tunnel for service delivery across core business scenarios.
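For readers who want a single summary figure, the two reported numbers can be combined into an F1 score. The abstract does not report F1; the snippet below is only the standard harmonic-mean arithmetic applied to the published precision and recall, and the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract; F1 itself is NOT reported by the authors.
reported_precision = 0.942
reported_recall = 0.828

derived_f1 = f1_score(reported_precision, reported_recall)
print(f"derived F1 = {derived_f1:.3f}")  # derived F1 = 0.881
```

The derived value sits closer to the recall than to the precision, which is the usual behavior of a harmonic mean: the weaker of the two numbers dominates.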

What carries the argument

The Semantic Search then Discrimination (S2D) architecture that identifies and discriminates item knowledge at high throughput when combined with self-evolving LLMs and VLMs.
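The two-stage idea behind the name can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the bag-of-words "embedding", the `keyword_judge` stand-in for an LLM/VLM call, and the four-entry ontology are hypothetical and are not taken from the paper, which does not disclose its implementation on this page.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_search(item_text: str, ontology: list[str], top_k: int = 3) -> list[str]:
    """Stage 1: shortlist the ontology entries most similar to the item text."""
    query = embed(item_text)
    ranked = sorted(ontology, key=lambda entry: cosine(query, embed(entry)), reverse=True)
    return ranked[:top_k]


def discriminate(item_text: str, candidates: list[str],
                 judge: Callable[[str, str], bool]) -> list[str]:
    """Stage 2: keep only the shortlisted entries that the judge accepts."""
    return [c for c in candidates if judge(item_text, c)]


def keyword_judge(item_text: str, entry: str) -> bool:
    """Hypothetical stand-in for an LLM/VLM yes/no decision:
    accept an entry only if every word of it appears in the item text."""
    words = set(item_text.lower().split())
    return all(tok in words for tok in entry.lower().split())


# Hypothetical ontology entries and item title.
ontology = ["running shoes", "leather shoes", "trail running", "wireless earbuds"]
item = "lightweight mesh running shoes for men"

shortlist = semantic_search(item, ontology, top_k=3)
accepted = discriminate(item, shortlist, keyword_judge)
print(shortlist)
print(accepted)
```

The design point the sketch tries to capture is cost: a cheap similarity search narrows a very large ontology to a handful of candidates, so the expensive judgment runs on only a few pairs per item.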

If this is right

The ontology can expand quickly to millions of entries through ongoing human-AI collaboration.
Knowledge production scales to hundreds of millions of item updates per day on Huawei Ascend NPUs.
Search-traffic coverage reaches 80.4 percent, and the automated fill rate for core attributes during item listing exceeds 80 percent.
Item information quality issues decrease by 37 percent when the system is deployed in search, recommendation, and operations.
Hundreds of billions of item-knowledge assets accumulate over time as the platform runs.
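To put the daily-update figure in per-second terms, a quick back-of-the-envelope conversion helps. The paper says only "hundreds of millions" of updates per day, so the three volumes below are illustrative assumptions, not reported numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rate(updates_per_day: float) -> float:
    """Average updates per second needed to clear a given daily volume."""
    return updates_per_day / SECONDS_PER_DAY


# Illustrative volumes only; the source gives no exact daily figure.
for daily in (100_000_000, 300_000_000, 900_000_000):
    rate = sustained_rate(daily)
    print(f"{daily:>11,} updates/day is about {rate:,.0f} updates/s on average")
```

These are averages over a full day; real traffic is rarely uniform, so peak load would be higher than the figures printed here.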

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same four-pillar structure could be tested on other large, frequently updated catalogs such as scientific publications or supply-chain inventories.
If the self-evolving property holds, long-term operating costs might decline as manual oversight requirements shrink relative to catalog size.
The unified tunnel might allow direct real-time feedback loops from downstream applications back into model improvement.
Extending the ontology engineering pillar to incorporate user-generated content could further accelerate concept emergence handling.

Load-bearing premise

The self-evolving LLMs and VLMs together with the S2D architecture maintain their stated precision and recall across tens of billions of SKUs and hundreds of millions of daily updates without degradation or heavy manual intervention.

What would settle it

Observation of precision falling below 90 percent or recall below 75 percent on a production sample of several hundred million updates after initial deployment would falsify the performance claim.
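A threshold test like this needs to allow for sampling noise, since an audit covers only part of the traffic. One way to do that, sketched below, is to declare a shortfall only when the upper end of a confidence interval sits under the threshold. The Wilson score interval is a swapped-in standard technique, not something the paper describes, and the audit counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def falls_below(successes: int, trials: int, threshold: float) -> bool:
    """True only when the whole interval lies under the threshold."""
    _, upper = wilson_interval(successes, trials)
    return upper < threshold


# Hypothetical audit counts, NOT data from the paper.
tp, fp, fn = 8_900, 1_100, 2_400

precision_low = falls_below(tp, tp + fp, threshold=0.90)  # precision = TP / (TP + FP)
recall_low = falls_below(tp, tp + fn, threshold=0.75)     # recall = TP / (TP + FN)
print(precision_low, recall_low)
```

With these made-up counts, sampled precision is 0.89 and its interval lies entirely below 0.90, while sampled recall is about 0.79 and stays above 0.75, so only the precision claim would be contradicted.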

Figures

Figures reproduced from arXiv: 2606.28070 by Chan Long, Chaofan Chen, Chaohui Dong, Chao Liu, Chunyuan Guo, Danping Liu, Debin Liu, Deping Xiang, Fulai Xu, Guangyue Liu, Hao Li, Huichun Hu, Jianan Wang, Jianbo Zhao, Jian Yang, Jiaoyang Li, Jiaxing Wang, Jinglong Li, Jinjin Guo, Jun Fang, Jun Liu, Kai Zhou, Lili Gao, Li Wang, Liying Chen, Luning Yang, Mengdi Zhou, Oxygen AIIC, Pengzhang Liu, Qianyun Wang, Qi Lv, Qixia Jiang, Ruyue Li, Shimu Liang, Shuxing Wang, Sijie Zhang, Siqi Li, Tianhao Gao, Wang Ke, Weihu Huang, Wencan Lai, Wenjie Zhang, Xiaohui Zhang, Xiaojing Dong, Ya Liu, Yifeng Zhang, Yixiang Wang, Yongtai Zhang, Yongyi Liao, Zhaoru Chen, Zhen Chen, Zhiyong Ma, Zhiyuan Liu, Zhongwei Liu, Ziyan Xing.

**Figure 1.** Typical failure cases in traditional item knowledge systems across the demand, supply, and operations sides.

**Figure 2.** Overview of Oxygen AIIC across the item lifecycle. Ontology, and AI Item Library jointly support category planning, merchant workflows, user understanding, search, recommendation, and platform operations.

**Figure 3.** Overall architecture of JD Oxygen AI Item Center. Oxygen AIIC integrates ontology engineering, AI Item Library, the item understanding LLMs/VLMs, the item tunnel, and the application matrix into a closed-loop industrial system.

**Figure 4.** Human–AI collaborative ontology engineering. Human experts establish the fundamental ontology backbone, while an automated pipeline dynamically discovers, fuses, and validates emerging concepts from multi-source heterogeneous data.

**Figure 5.** Production architecture of the AI Item Library. Taking item data and a dynamically evolving ontology as input, the pipeline first mitigates computational redundancy across the SKU and attribute dimensions, and then performs precise item-to-ontology recognition through a two-stage “Semantic Search then Discrimination” (S2D) engine, powered by the item understanding LLMs/VLMs.

**Figure 6.** Overview of the Oxygen AIIC framework for the item understanding LLMs/VLMs. Constructed upon a unified multi-task item understanding foundation model, the framework supports incremental capability expansion, incorporates instruction-following knowledge representation, and implements a closed-loop model self-evolution mechanism to continuously enhance model performance and data quality.

**Figure 7.** Incremental adaptation based on LoRAM experts and adaptive expert composition. A frozen SFT backbone is combined with multiple lightweight expert updates, and GRPO optimizes expert composition via task feedback.

**Figure 8.** Instruction-following knowledge representation training. The framework transfers reasoning capability through latent chain-of-thought (Latent CoT) distillation and enhances representational robustness via adaptive feature-space perturbation.

**Figure 9.** System-level view of the self-evolution loop and its four modules. Panel labels: Module 1, Data Evaluation (Hardcase Set, Consistency, Confidence, Stability, Badcase Set); Module 2, Data Analysis (Item Understanding LLMs/VLMs, Boundary Confusion, Hallucination, Expression Deviation); Module 3, Data Synthesis (Factual Constraints, Boundary Calibration, Fact Alignment); Module 4, Data Selection …

Original abstract

JD.com, one of the world's largest e-commerce platforms, serves over 700 million active users and millions of merchants, with a catalog of tens of billions of SKUs. At this scale, high-quality, structured item knowledge underpins a better consumer experience, lower management costs, and higher operational efficiency, yet producing and serving it poses three industrial-scale challenges: fast-emerging concepts, high-quality knowledge production for massive SKUs, and diverse downstream requirements. To address these challenges, we present the JD Oxygen AI Item Center (Oxygen AIIC), an industrial-scale platform built on LLMs/VLMs for item-knowledge production and service. Oxygen AIIC is built around four core pillars: (i) ontology engineering driven by efficient human-AI collaboration, which supports the dynamic evolution and agile expansion of an ontology with millions of entries; (ii) a "Semantic Search then Discrimination" (S2D) knowledge identification architecture that, combined with throughput improvement strategies, enables scalable, extensible, and high-throughput AI Item Library production for tens of billions of SKUs; (iii) self-evolving item-understanding LLMs/VLMs that improve in a stable and controllable manner, enabling knowledge production with 94.2% precision and 82.8% recall; and (iv) a unified item tunnel that serves as the data and service hub. Oxygen AIIC now covers tens of thousands of JD categories and processes hundreds of millions of item updates per day on Huawei Ascend NPUs. It has accumulated hundreds of billions of item-knowledge assets. Deployed across core business scenarios, including search, recommendation, operations, and category planning, Oxygen AIIC has delivered measurable gains at scale: search-traffic coverage reaches 80.4%, item-information quality issues drop by 37%, and the automated fill rate of core attributes during item listing exceeds 80%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

JD's paper describes a large deployed LLM system for e-commerce items but states performance numbers without any evaluation details or baselines.


This paper is a system description from JD.com on their Oxygen AIIC platform for item knowledge at tens of billions of SKUs. The main takeaway is that they report 94.2% precision and 82.8% recall from self-evolving models plus real business gains, yet supply no protocol for how those numbers were obtained.

The work covers four pillars: human-AI ontology engineering for millions of dynamic entries, an S2D architecture for scalable knowledge extraction, the self-evolving LLMs/VLMs themselves, and a unified item tunnel as the service hub. It runs on Ascend NPUs, handles hundreds of millions of daily updates, and claims integration into search, recommendation, and operations with 80.4% search coverage and a 37% drop in quality issues.

What it does reasonably is lay out how the pieces connect in a production setting at genuine industrial volume. The business metrics show the system is live and tied to measurable outcomes rather than isolated benchmarks.

The soft spot is the complete absence of evaluation details. No test sets, inter-annotator agreement, held-out data, or non-evolving baselines appear for the precision/recall figures. The claim that the models improve in a stable, controllable way across that scale rests on assertion alone. If the full paper adds logs or internal validation, that would help; otherwise the numbers stay untestable.
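To make one of those missing items concrete: inter-annotator agreement on a labeled test set is commonly summarized with Cohen's kappa. The sketch below shows the standard two-annotator calculation. The labels are hypothetical, and nothing here reflects how JD actually builds or validates its ground truth.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical material labels assigned to ten items by two annotators.
annotator_1 = ["cotton", "cotton", "wool", "silk", "cotton",
               "wool", "silk", "cotton", "wool", "cotton"]
annotator_2 = ["cotton", "cotton", "wool", "silk", "wool",
               "wool", "silk", "cotton", "cotton", "cotton"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.677
```

Here the annotators agree on 8 of 10 items, but chance alone would produce 38 percent agreement given their label frequencies, so kappa comes out lower than the raw 80 percent figure.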

This is for engineers and applied teams working on LLM deployments in retail or similar domains who want architecture patterns from a real catalog. It offers little for readers seeking new methods or reproducible experiments.

I would send it to peer review because the scale is substantial and the structure could be informative to others, though it would need added methodology to stand as a research contribution.

Referee Report

1 major / 0 minor

Summary. The paper introduces the JD Oxygen AI Item Center (Oxygen AIIC), an industrial platform for item knowledge production and service using LLMs and VLMs at JD.com's scale of tens of billions of SKUs. It is structured around four pillars: (i) ontology engineering via human-AI collaboration for dynamic ontology evolution, (ii) Semantic Search then Discrimination (S2D) architecture for scalable knowledge identification, (iii) self-evolving LLMs/VLMs achieving 94.2% precision and 82.8% recall, and (iv) a unified item tunnel for data and services. The system processes hundreds of millions of updates daily and reports business impacts like 80.4% search traffic coverage and 37% reduction in quality issues.

Significance. If the performance claims are substantiated, this work would be significant as one of the largest reported deployments of LLM/VLM systems in e-commerce, demonstrating practical solutions to challenges of scale, dynamic concepts, and diverse requirements. It provides a case study in integrating AI for knowledge management with measurable operational benefits.

major comments (1)

[Abstract, pillar (iii)] The assertion that self-evolving item-understanding LLMs/VLMs improve in a stable and controllable manner, enabling knowledge production with 94.2% precision and 82.8% recall, is presented without any description of the evaluation methodology, test sets, baselines, ground-truth labeling process, or protocols for measuring stability across catalog size and update rates. This is central to validating the core technical contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of our industrial deployment. We address the single major comment below.


Referee: [Abstract, pillar (iii)] The assertion that self-evolving item-understanding LLMs/VLMs improve in a stable and controllable manner, enabling knowledge production with 94.2% precision and 82.8% recall, is presented without any description of the evaluation methodology, test sets, baselines, ground-truth labeling process, or protocols for measuring stability across catalog size and update rates. This is central to validating the core technical contribution.

Authors: We agree that the evaluation methodology, test sets, baselines, ground-truth labeling process, and stability protocols are essential to substantiate the reported 94.2% precision and 82.8% recall. The current manuscript does not provide these details. In the revised version we will add a dedicated subsection under pillar (iii) that describes: (1) the construction and size of the held-out test sets, (2) the multi-stage human-expert ground-truth labeling protocol, (3) the baselines against which the self-evolving models are compared, and (4) the quantitative stability and controllability measurements across catalog scale and daily update volume. These additions will directly address the concern while preserving the industrial confidentiality constraints on proprietary data.

Revision: yes

Circularity Check

0 steps flagged

No circularity: high-level industrial system description with no derivations or self-referential predictions


The paper is a descriptive account of an industrial platform (Oxygen AIIC) built around four pillars, with performance numbers (94.2% precision, 82.8% recall, 80.4% search coverage) asserted as outcomes of the deployed system. No equations, formal derivations, fitted parameters, or predictive claims appear in the provided text. The architecture claims (S2D, self-evolving models, ontology engineering) are presented as engineering choices rather than results derived from prior steps within the paper. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The numerical claims rest on undisclosed evaluation protocols rather than any internal reduction to inputs, so no circular step exists by the enumerated criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review is based solely on the abstract, which contains no mathematical content, free parameters, axioms, or invented entities. The system description assumes standard LLM capabilities scale to the stated industrial volumes.

pith-pipeline@v0.9.1-grok · 6093 in / 1122 out tokens · 73996 ms · 2026-06-29T04:07:30.510126+00:00 · methodology
