{"work":{"id":"0135787d-f4bc-4966-8b6d-8c2e1f6bd31f","openalex_id":null,"doi":null,"arxiv_id":"2203.03850","raw_key":null,"title":"UniXcoder: Unified Cross-Modal Pre-training for Code Representation","authors":null,"authors_text":"Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin","year":2022,"venue":"cs.CL","abstract":"Pre-trained models for programming languages have recently demonstrated great success on code intelligence. To support both code-related understanding and generation tasks, recent works attempt to pre-train unified encoder-decoder models. However, such encoder-decoder framework is sub-optimal for auto-regressive tasks, especially code completion that requires a decoder-only manner for efficient inference. In this paper, we present UniXcoder, a unified cross-modal pre-trained model for programming language. The model utilizes mask attention matrices with prefix adapters to control the behavior of the model and leverages cross-modal contents like AST and code comment to enhance code representation. To encode AST that is represented as a tree in parallel, we propose a one-to-one mapping method to transform AST in a sequence structure that retains all structural information from the tree. Furthermore, we propose to utilize multi-modal contents to learn representation of code fragment with contrastive learning, and then align representations among programming languages using a cross-modal generation task. We evaluate UniXcoder on five code-related tasks over nine datasets. To further evaluate the performance of code fragment representation, we also construct a dataset for a new task, called zero-shot code-to-code search. Results show that our model achieves state-of-the-art performance on most tasks and analysis reveals that comment and AST can both enhance UniXcoder.","external_url":"https://arxiv.org/abs/2203.03850","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-07-04T00:59:19.003745+00:00","pith_arxiv_id":"2203.03850","created_at":"2026-05-09T06:35:19.931707+00:00","updated_at":"2026-07-04T00:59:19.003745+00:00","title_quality_ok":true,"display_title":"UniXcoder: Unified Cross-Modal Pre-training for Code Representation","render_title":"UniXcoder: Unified Cross-Modal Pre-training for Code Representation"},"hub":{"state":{"work_id":"0135787d-f4bc-4966-8b6d-8c2e1f6bd31f","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":28,"external_cited_by_count":null,"distinct_field_count":7,"first_pith_cited_at":"2024-02-02T13:42:50+00:00","last_pith_cited_at":"2026-07-01T06:46:13+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-07-04T15:16:50.960593+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":5},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":5},{"context_polarity":"use_method","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}