pith. machine review for the scientific record. sign in

arxiv: 2507.21990 · v4 · submitted 2025-07-29 · 💻 cs.CE · cs.AI

Recognition: unknown

ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge

Authors on Pith no claims yet
classification 💻 cs.CE cs.AI
keywords chemicalreasoningatomizedknowledgechemdfm-rllmsmodelfunctional
0
0 comments X
read the original abstract

Atomized chemical knowledge, such as functional group information of molecules and reactions, plays a pivotal intermediate role in the reasoning process that connects molecular structures with their properties and reactivities. While large language models (LLMs) have achieved impressive progress, the absence of atomized chemical knowledge results in their superficial understanding of chemistry and limited chemical reasoning capabilities. In this work, to tackle this problem, we develop a Chemical Reasoning LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized chemical knowledge, ChemFG, annotating the presence of functional groups in molecules and the changes of functional groups during chemical reactions, to enhance the model's understanding of the fundamental principles and internal logic of chemistry. Then, we propose a mixed-source distillation method that initializes the model's reasoning capability with limited distilled data, and develop a four-stage training pipeline to equip the model with atomized chemical knowledge and chemical reasoning logic. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs, surpassing both the general-domain LLMs and domain-specific chemical LLMs. Moreover, ChemDFM-R achieves comparable or superior performance compared with cutting-edge commercial LLMs, such as o4-mini. Further case studies illustrate how explicit reasoning chains significantly improve the model's reliability, transparency, and practicality in real-world human-AI collaboration scenarios.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MolDeTox: Evaluating Language Model's Stepwise Fragment Editing for Molecular Detoxification

    cs.AI 2026-05 unverdicted novelty 6.0

    MolDeTox is a new benchmark that shows fragment-level stepwise editing by LLMs and VLMs improves structural validity and detoxification quality over prior toxicity-focused evaluations.

  2. Mol-Debate: Multi-Agent Debate Improves Structural Reasoning in Molecular Design

    cs.AI 2026-04 unverdicted novelty 6.0

    Mol-Debate applies multi-agent debate in an iterative loop with perspective orchestration to achieve state-of-the-art text-guided molecular design, scoring 59.82% exact match on ChEBI-20 and 50.52% weighted success on...