pith.

arxiv: 2501.12919 · v2 · pith:UCQYEUD2 · submitted 2025-01-22 · 💻 cs.LG · cond-mat.mtrl-sci

Bridging Text and Crystal Structures: Literature-driven Contrastive Learning for Materials Science

classification 💻 cs.LG cond-mat.mtrl-sci
keywords crystal, structures, materials, clasp, embedding, spaces, capture, contrastive
read the original abstract

Understanding structure-property relationships is an essential yet challenging aspect of materials discovery and development. To facilitate this process, recent studies in materials informatics have sought latent embedding spaces of crystal structures to capture their similarities based on properties and functionalities. However, abstract feature-based embedding spaces are human-unfriendly and prevent intuitive and efficient exploration of the vast materials space. Here we introduce Contrastive Language-Structure Pre-training (CLaSP), a learning paradigm for constructing cross-modal embedding spaces between crystal structures and texts. CLaSP aims to achieve material embeddings that 1) capture property- and functionality-related similarities between crystal structures and 2) allow intuitive retrieval of materials via user-provided description texts as queries. To compensate for the lack of sufficient datasets linking crystal structures with textual descriptions, CLaSP leverages a dataset of over 400,000 published crystal structures and corresponding publication records, including paper titles and abstracts, for training. We demonstrate the effectiveness of CLaSP through text-based crystal structure screening and embedding space visualization.
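The abstract does not spell out the training objective. Because the name follows the "Contrastive Language-Image Pre-training" (CLIP) pattern, the sketch below assumes a CLIP-style symmetric InfoNCE loss. Each crystal structure is paired with the text (title or abstract) of the publication it came from, and every other text in the batch serves as a negative. The same shared space then supports text-query retrieval by cosine similarity. The structure and text encoders are out of scope here: the embeddings are toy vectors. The function names (`clip_style_loss`, `retrieve`) and the temperature of 0.07 (CLIP's usual initial value) are hypothetical choices, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_style_loss(struct_emb: Sequence[Sequence[float]],
                    text_emb: Sequence[Sequence[float]],
                    temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss over a batch of (structure, text) pairs.

    Row i of struct_emb and row i of text_emb are assumed to come from the same
    publication (the positive pair); all other rows in the batch are negatives.
    """
    s = [l2_normalize(v) for v in struct_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(s)
    # logits[i][j]: temperature-scaled cosine similarity of structure i and text j
    logits = [[dot(s[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Structure-to-text: cross-entropy of each row against its diagonal entry
    loss_s2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text-to-structure: the same cross-entropy computed over columns
    loss_t2s = sum(
        logsumexp([logits[i][j] for i in range(n)]) - logits[j][j] for j in range(n)
    ) / n
    return 0.5 * (loss_s2t + loss_t2s)


def retrieve(query_emb: Sequence[float],
             struct_emb: Sequence[Sequence[float]],
             top_k: int = 3) -> List[Tuple[int, float]]:
    """Rank structures by cosine similarity to an already-encoded text query."""
    q = l2_normalize(query_emb)
    scores = [(i, dot(q, l2_normalize(v))) for i, v in enumerate(struct_emb)]
    scores.sort(key=lambda p: p[1], reverse=True)
    return scores[:top_k]


# Toy usage: three structures and their matching (aligned) text embeddings.
structs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
texts_shuffled = [texts_aligned[1], texts_aligned[2], texts_aligned[0]]

# Correctly paired batches should yield the lower contrastive loss.
print(clip_style_loss(structs, texts_aligned), clip_style_loss(structs, texts_shuffled))
# A query embedding close to structure 1 should rank it first.
print(retrieve([0.05, 0.95, 0.0], structs, top_k=1))
```

The symmetric form penalizes mismatches in both directions (structure to text and text to structure). Under this assumption, that symmetry is what lets one shared space serve both similarity search between structures and retrieval from free-text descriptions.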

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Atomistic Language Models Understand and Generate Materials

    cs.LG 2026-06 unverdicted novelty 7.0

    ALMs unify a pretrained atomistic encoder, an LLM, and a denoising diffusion model through continuous projectors and staged training, achieving state-of-the-art results on text-conditioned crystal prediction and de novo generation.