MOTOR: Learning ID-free Item Representation with Token Crossing for Embedding-based Multimodal Recommendation

Jianghao Lin; Jiarui Jin; Kangning Zhang; Ruilong Su; Weinan Zhang; Yingjie Qin; Yong Yu


While multimodal recommendation models have effectively integrated visual and textual information, their reliance on unique ID embeddings constitutes a fundamental performance bottleneck. Specifically, ID-based paradigms suffer from three limitations: (1) Information Isolation, where unique IDs prevent semantic information exchange among related items; (2) Cold-Start Vulnerability, as ID embeddings are difficult to optimize with sparse interactions; and (3) Storage Inefficiency, where parameter costs scale linearly with item quantity. To overcome these challenges, we propose MOTOR, a novel ID-free MultimOdal TOken Representation scheme. MOTOR replaces explicit item IDs with learnable, shared multimodal tokens, fundamentally transforming the recommender into an ID-free framework. Methodologically, we first employ product quantization to discretize raw multimodal features into compact token IDs. These tokens serve as implicit item features, which are then synthesized via a novel Token Cross Network (TCN) to capture high-order interaction patterns. This "discretize-and-interact" mechanism enables semantic sharing across items and significantly compresses the model size without introducing complex auxiliary losses. Extensive experiments across nine mainstream models demonstrate the significant performance improvement achieved by MOTOR. Further, MOTOR improves the capability of these models to recommend items in cold-start scenarios.
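The "discretize-and-interact" idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the internals of the Token Cross Network, so the crossing step below is a stand-in (a sum of token embeddings plus pairwise element-wise products), and every name and size in it (`kmeans`, `product_quantize`, `item_representation`, 4 subspaces, 8 codes per subspace) is an assumption chosen for illustration. Random arrays stand in for pretrained multimodal features and for learned token embeddings.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid: (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)


def product_quantize(features, num_subspaces, codebook_size, seed=0):
    """Split each feature vector into sub-vectors and cluster each subspace.

    Returns integer token IDs of shape (num_items, num_subspaces).
    """
    dim = features.shape[1]
    assert dim % num_subspaces == 0, "feature dim must divide evenly"
    chunks = np.split(features, num_subspaces, axis=1)
    codes = [kmeans(c, codebook_size, seed=seed + m)[1] for m, c in enumerate(chunks)]
    return np.stack(codes, axis=1)


def item_representation(codes, token_tables):
    """Build ID-free item vectors from shared token embeddings.

    token_tables has shape (num_subspaces, codebook_size, emb_dim).
    The crossing is an illustrative stand-in, NOT the paper's TCN:
    first-order sum plus all pairwise element-wise products.
    """
    num_subspaces = codes.shape[1]
    tokens = np.stack(
        [token_tables[s][codes[:, s]] for s in range(num_subspaces)], axis=1
    )  # (num_items, num_subspaces, emb_dim)
    first_order = tokens.sum(1)
    # sum_{i<j} t_i * t_j = 0.5 * ((sum t)^2 - sum t^2), computed element-wise.
    second_order = 0.5 * (first_order ** 2 - (tokens ** 2).sum(1))
    return first_order + second_order


rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))             # stand-in for multimodal features
codes = product_quantize(features, num_subspaces=4, codebook_size=8)
token_tables = rng.normal(scale=0.1, size=(4, 8, 6))  # stand-in for learned tokens
item_vecs = item_representation(codes, token_tables)

id_params = features.shape[0] * 6                 # one 6-d ID embedding per item
token_params = token_tables.size                  # shared across all items
print(codes.shape, item_vecs.shape, id_params, token_params)
```

In this toy setting the shared token tables hold 4 × 8 × 6 = 192 parameters regardless of how many items exist, whereas per-item ID embeddings of the same width would need 200 × 6 = 1200. Items that receive the same token IDs also receive the same representation, which is the sense in which tokens let related items share information.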
