MOTOR: Learning ID-free Item Representation with Token Crossing for Embedding-based Multimodal Recommendation
While multimodal recommendation models have effectively integrated visual and textual information, their reliance on unique ID embeddings constitutes a fundamental performance bottleneck. Specifically, ID-based paradigms suffer from three limitations: (1) \textbf{Information Isolation}, where unique IDs prevent semantic information exchange among related items; (2) \textbf{Cold-Start Vulnerability}, as ID embeddings are difficult to optimize with sparse interactions; and (3) \textbf{Storage Inefficiency}, where parameter costs scale linearly with the number of items. To overcome these challenges, we propose \textbf{MOTOR}, a novel \textbf{ID-free MultimOdal TOken Representation} scheme. MOTOR replaces explicit item IDs with learnable, shared multimodal tokens, transforming the recommender into an ID-free framework. Methodologically, we first employ product quantization to discretize raw multimodal features into compact token IDs. These tokens serve as implicit item features, which are then combined by a novel \textbf{Token Cross Network (TCN)} to capture high-order interaction patterns. This "discretize-and-interact" mechanism enables semantic sharing across items and significantly compresses the model without introducing complex auxiliary losses. Extensive experiments applying MOTOR to nine mainstream models demonstrate significant performance improvements. Furthermore, MOTOR improves these models' ability to recommend items in cold-start scenarios.
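The "discretize-and-interact" pipeline can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give codebook sizes, whether each modality is quantized separately, or the exact form of the Token Cross Network. This sketch assumes per-modality product quantization with plain k-means codebooks, a single shared token-embedding table, and a DCN-v2-style cross layer as a hypothetical stand-in for the TCN. All function names (`fit_pq_codebooks`, `pq_encode`, `item_repr_from_tokens`, `cross_network`) are illustrative.

```python
import numpy as np

def fit_pq_codebooks(x, num_subspaces, codebook_size, iters=10, seed=0):
    """Fit one k-means codebook per feature subspace (plain Lloyd iterations)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    assert d % num_subspaces == 0, "feature dim must split evenly into subspaces"
    sub_d = d // num_subspaces
    codebooks = []
    for m in range(num_subspaces):
        xs = x[:, m * sub_d:(m + 1) * sub_d]
        cents = xs[rng.choice(n, codebook_size, replace=False)].copy()
        for _ in range(iters):
            assign = ((xs[:, None] - cents[None]) ** 2).sum(-1).argmin(1)
            for k in range(codebook_size):
                if np.any(assign == k):  # keep the old centroid if a cluster empties
                    cents[k] = xs[assign == k].mean(0)
        codebooks.append(cents)
    return codebooks

def pq_encode(x, codebooks):
    """Map each item to one token ID per subspace: (n_items, num_subspaces)."""
    sub_d = codebooks[0].shape[1]
    codes = [((x[:, m * sub_d:(m + 1) * sub_d][:, None] - c[None]) ** 2).sum(-1).argmin(1)
             for m, c in enumerate(codebooks)]
    return np.stack(codes, axis=1)

def item_repr_from_tokens(codes, token_table, codebook_size):
    """Look up shared token embeddings (no per-item ID rows) and concatenate them."""
    # Offset each token column so columns use disjoint rows of one shared table.
    offsets = np.arange(codes.shape[1]) * codebook_size
    emb = token_table[codes + offsets]           # (n_items, M, emb_dim)
    return emb.reshape(codes.shape[0], -1)       # (n_items, M * emb_dim)

def cross_network(x0, weights, biases):
    """Hypothetical TCN stand-in, DCN-v2 style: x_{l+1} = x0 * (W_l x_l + b_l) + x_l."""
    x = x0
    for w, b in zip(weights, biases):
        x = x0 * (x @ w.T + b) + x
    return x

rng = np.random.default_rng(0)
n_items, K, emb_dim = 200, 16, 8
visual = rng.normal(size=(n_items, 32))          # placeholder visual features
text = rng.normal(size=(n_items, 24))            # placeholder textual features
codes = np.concatenate([
    pq_encode(visual, fit_pq_codebooks(visual, 4, K)),
    pq_encode(text, fit_pq_codebooks(text, 4, K)),
], axis=1)                                       # 8 token IDs per item
M = codes.shape[1]
# Table size is M * K rows, independent of n_items (unlike an ID-embedding table).
token_table = rng.normal(scale=0.1, size=(M * K, emb_dim))
h0 = item_repr_from_tokens(codes, token_table, K)
D = h0.shape[1]
Ws = [rng.normal(scale=0.1, size=(D, D)) for _ in range(2)]
bs = [np.zeros(D) for _ in range(2)]
item_vecs = cross_network(h0, Ws, bs)
print(codes.shape, item_vecs.shape)              # (200, 8) (200, 64)
```

Because items with similar features fall into the same quantization cells, they share token embeddings. This is the semantic-sharing effect the abstract describes, and it also means a new item gets a representation from its features alone. In a real model, `token_table` and the cross-layer weights would be trained end-to-end with the recommendation loss rather than left random as they are here.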