CatFlow: Co-generation of Slab-Adsorbate Systems via Flow Matching

Honghui Kim; Minkyu Kim; Nayoung Kim; Sungsoo Ahn

arxiv: 2602.05372 · v2 · pith:O575LETZnew · submitted 2026-02-05 · ❄️ cond-mat.mtrl-sci

Pith reviewed 2026-05-21 14:15 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci

keywords: heterogeneous catalysis · generative modeling · flow matching · slab-adsorbate interfaces · catalyst design · structure generation · Open Catalyst dataset

The pith

CatFlow generates higher-fidelity slab-adsorbate catalyst structures than autoregressive and sequential baselines by applying flow matching to a factorized primitive-cell representation that cuts the number of learnable variables by a factor of 9.2 on average.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CatFlow, a flow matching-based framework for generating slab-adsorbate systems in heterogeneous catalyst design. It uses a primitive cell-based factorized representation that reduces learnable variables by an average of 9.2 times while encoding surface orientation explicitly. This addresses the coupling between surface geometry and adsorbate interactions that previous methods struggled with. Experiments on the Open Catalyst 2020 dataset show superior structural fidelity compared to autoregressive and sequential baselines. The generated structures also accurately capture adsorption energy distributions and lie closer to thermodynamic local minima.

Core claim

CatFlow is a flow matching model for the co-generation of slab and adsorbate structures using a primitive cell-based factorized representation of the slab-adsorbate complex. This representation reduces the number of learnable variables by an average of 9.2x and explicitly encodes the surface orientation, resulting in generated catalysts with improved structural fidelity and adsorption energies that better match those of physically plausible interfaces.

What carries the argument

The primitive cell-based factorized representation of the slab-adsorbate complex that preserves intrinsic coupling between surface geometry and adsorbate interactions.

Load-bearing premise

The primitive cell-based factorized representation preserves the intrinsic coupling between surface geometry and adsorbate interactions without significant information loss.

What would settle it

A calculation or experiment demonstrating that CatFlow-generated structures do not lie closer to thermodynamic local minima than those from baseline models when evaluated with independent high-accuracy simulations.

Figures

Figures reproduced from arXiv: 2602.05372 by Honghui Kim, Minkyu Kim, Nayoung Kim, Sungsoo Ahn.

**Figure 1.** Visualization of the co-generation trajectory conditioned on the adsorbate. We illustrate the synchronized evolution of the slab-adsorbate system from the initial noise distribution (t = 0) to the final structure (t = 1) for de novo generation (top) and structure prediction (bottom). The framework jointly generates the components of the factorized representation to construct the slab-adsorbate system stru…

**Figure 2.** Histogram of atom counts in catalyst structures. We compare the histograms of atom counts for slab structures (blue) and their corresponding primitive cells (green). The primitive cells require fewer atoms than the slab structures, reducing the number of learnable variables for the generative model. configurations. Our factorized representation is designed to resolve redundancies in the slab-adsorbate syst…

**Figure 3.** Conceptual description of factorized representation. The primitive cell $S_{\mathrm{prim}}$ (top left) is transformed by the transformation matrix $M$ to construct the slab lattice $L_{\mathrm{slab}} = M L_{\mathrm{prim}}$ (top right). The slab structure is generated by replicating primitive cell atoms at all translation vectors lying within $L_{\mathrm{slab}}$ (bottom right). The vacuum scaling factor $k_{\mathrm{vac}}$ extends $L_{\mathrm{slab}}$ along the vertical axis to create the s…

**Figure 4.** Visualization of generated slab-adsorbate structures. We present generated samples for (a) de novo generation and (b) structure prediction tasks. The multi-view renderings (perspective, top, and left) illustrate that CatFlow constructs geometrically precise structures capable of accommodating diverse and bulky adsorbates. The accompanying adsorption energies further confirm that these generated configurati…

**Figure 5.** Adsorption energy histograms in de novo generation. Comparison of adsorption energy distributions for representative adsorbates, randomly selected to demonstrate diverse cases. Each panel shows the kernel density estimation (KDE) plots for the validation set in distribution (Val ID) (blue), CatGPT (orange), and CatFlow (green). The vertical dashed lines indicate the mean adsorption energies for each distri…

**Figure 6.** Architecture of the atom attention encoder. The encoder processes noisy coordinates of the primitive cell and adsorbate, along with global lattice parameters, to generate a joint latent representation. B.1. Metadata Extraction and Slab Analysis We first extract the raw atomic structures, energies, and system identifiers from the OC20 LMDB database. To enable the decomposition of surface structures, we dete…

**Figure 7.** Architecture of the token transformer (trunk). This module refines the joint latent representations of primitive cell and adsorbate tokens through a series of DiT blocks, conditioned on the diffusion timestep. adsorbate. Only samples where the RMSD is strictly less than 1 × 10⁻⁵ for all three stages are included in the final dataset. Samples failing this criterion or raising calculation errors (~0.8%) are…

**Figure 8.** Architecture of the atom attention decoder. The decoder transforms the refined latent representations into specific updates for atom positions, lattice parameters, and the transformation matrix via specialized projection heads. adsorbate positioning do not inflate the score. The procedure is as follows: 1. Slab isolation: For every generated sample, we isolate the surface structure by removing the adsorbat…

original abstract

Discovering heterogeneous catalysts tailored for specific reaction intermediates remains a fundamental bottleneck in materials science. While traditional trial-and-error methods and recent generative models have shown promise, they struggle to capture the intrinsic coupling between surface geometry and adsorbate interactions. To address this limitation, we propose CatFlow, a flow matching-based framework for de novo design and structure prediction of heterogeneous catalysts. Our model operates on a primitive cell-based factorized representation of the slab-adsorbate complex, reducing the number of learnable variables by an average of 9.2x while explicitly encoding the surface orientation of the slab-adsorbate interface. Experiments on the Open Catalyst 2020 dataset demonstrate that CatFlow significantly improves the structural fidelity of generated catalysts compared to autoregressive and sequential baselines. Further experiments show that the generated structures accurately capture the adsorption energy distributions of physically plausible interfaces and lie closer to thermodynamic local minima.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces CatFlow, a flow-matching generative framework for co-generating slab-adsorbate complexes in heterogeneous catalysis. It employs a primitive cell-based factorized representation that reduces learnable variables by an average of 9.2x while encoding surface orientation, and reports improved structural fidelity and closer matching to adsorption-energy distributions of physically plausible interfaces on the Open Catalyst 2020 dataset relative to autoregressive and sequential baselines.

Significance. If the empirical results hold after addressing validation gaps, the work could advance efficient de novo catalyst design by explicitly modeling geometry-adsorbate couplings with substantially fewer parameters. The flow-matching approach and reported proximity to thermodynamic minima represent potential strengths for generating realistic interfaces.

major comments (2)

The central empirical claims of improved structural fidelity and accurate capture of adsorption-energy distributions (Abstract) rest on the primitive cell-based factorized representation preserving intrinsic couplings. However, no quantitative error bars, ablation studies on long-range interactions, or explicit description of how structural fidelity and energy distributions were measured are provided, preventing full verification of the performance gains over baselines.
The assumption that the primitive-cell factorization (Abstract) reduces variables by 9.2x while preserving all relevant relaxations is load-bearing for the energy-fidelity claims. On OC20 surfaces with adsorbate-induced reconstructions or periodic effects beyond one cell, this could lead to generated structures whose DFT-relaxed energies deviate from true local minima, directly affecting the reported distribution-matching results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of CatFlow. We address each major comment below with clarifications and proposed revisions.

point-by-point responses

Referee: The central empirical claims of improved structural fidelity and accurate capture of adsorption-energy distributions (Abstract) rest on the primitive cell-based factorized representation preserving intrinsic couplings. However, no quantitative error bars, ablation studies on long-range interactions, or explicit description of how structural fidelity and energy distributions were measured are provided, preventing full verification of the performance gains over baselines.

Authors: We agree that additional statistical details and methodological transparency are needed. In the revision we will add quantitative error bars (standard deviations across 5 independent runs) to all fidelity and energy-distribution metrics. We will also insert a dedicated subsection explicitly defining the structural fidelity measures (e.g., per-atom RMSD after alignment, adsorbate-surface distance histograms) and the adsorption-energy protocol (single-point DFT on unrelaxed generated structures using the same settings as OC20). Regarding long-range interaction ablations, the primitive-cell factorization is motivated by locality of adsorption; we will add a short discussion of this design choice and its implications, though a full ablation would require new large-supercell experiments beyond the current scope.
revision: partial
Referee: The assumption that the primitive-cell factorization (Abstract) reduces variables by 9.2x while preserving all relevant relaxations is load-bearing for the energy-fidelity claims. On OC20 surfaces with adsorbate-induced reconstructions or periodic effects beyond one cell, this could lead to generated structures whose DFT-relaxed energies deviate from true local minima, directly affecting the reported distribution-matching results.

Authors: The 9.2x reduction is an empirical average computed over the OC20 training set; the factorization explicitly retains surface orientation and adsorbate placement within the primitive cell, allowing the flow-matching model to learn the joint distribution of geometry and adsorbate interactions. While we acknowledge that adsorbate-induced reconstructions extending beyond a single primitive cell are possible in principle, the OC20 dataset predominantly features local relaxations that are captured by our representation. Generated structures are shown to lie closer to DFT-relaxed minima than baselines, supporting that the learned distribution approximates the relevant thermodynamic landscape. We will expand the methods and discussion sections to clarify these points and to note the assumption's scope.
revision: yes

Circularity Check

0 steps flagged

No circularity: derivation is self-contained on external data

full rationale

The CatFlow framework introduces a flow-matching model on a primitive-cell factorized representation of slab-adsorbate systems and evaluates it directly on the public Open Catalyst 2020 dataset against standard autoregressive and sequential baselines. No equation, loss term, or central claim reduces by construction to a fitted parameter or self-citation whose authors overlap with the present work; the reported gains in structural fidelity and adsorption-energy distribution matching are measured outcomes on held-out external structures rather than tautological restatements of the input representation or prior author results. The factorization is presented as an explicit modeling choice whose validity is tested empirically, not assumed via self-referential definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard generative-modeling assumptions plus the untested premise that the chosen factorization retains all physically relevant couplings; no new physical entities are introduced.

axioms (1)

Domain assumption: Flow matching can model the joint distribution of slab geometry and adsorbate placement when the system is expressed in a primitive-cell factorized form.
This is the core modeling choice that enables the reported reduction in variables and the claimed fidelity gains.

pith-pipeline@v0.9.0 · 5685 in / 1292 out tokens · 49290 ms · 2026-05-21T14:15:04.330103+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "Our model operates on a primitive cell-based factorized representation of the slab-adsorbate complex, reducing the number of learnable variables by an average of 9.2× while explicitly encoding the surface orientation"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] Primitive cell extraction: We isolate the surface atoms (tags 0 and 1) and remove the vacuum layer by compressing the unit cell along the z-axis based on the ratio $n_{\mathrm{slab}}/(n_{\mathrm{slab}} + n_{\mathrm{vac}})$, resulting in a slab without vacuum. We then identify the primitive unit cell of this slab structure without a vacuum using a symmetry tolerance of 0.1 Å. This resulting s...

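The vacuum-removal part of this step can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the function name `compress_vacuum`, the row-vector lattice convention, and the choice to keep Cartesian positions fixed while shortening the third cell vector are all assumptions. Primitive-cell detection with the 0.1 Å tolerance would need a symmetry library and is not shown.

```python
import numpy as np

def compress_vacuum(lattice, cart_coords, n_slab, n_vac):
    """Shrink the third lattice vector by n_slab / (n_slab + n_vac).

    Assumptions (not from the source): rows of `lattice` are cell vectors,
    the third row points along z, and Cartesian positions stay untouched.
    """
    ratio = n_slab / (n_slab + n_vac)
    compressed = np.array(lattice, dtype=float)
    compressed[2] *= ratio
    # Fractional coordinates change because the cell got shorter.
    frac = np.asarray(cart_coords, dtype=float) @ np.linalg.inv(compressed)
    return compressed, frac

# Toy cell: 30 Å tall, of which one third is assumed to be slab.
lattice = np.diag([5.0, 5.0, 30.0])
cart = np.array([[0.0, 0.0, 1.0], [2.5, 2.5, 9.0]])
compressed, frac = compress_vacuum(lattice, cart, n_slab=1.0, n_vac=2.0)
```

After compression the atoms fill the cell along z, which is the form a primitive-cell search would operate on.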
[2] Transformation matrix calculation: We compute the affine transformation required to map the primitive cell back to the slab as the supercell. This is represented by an integer transformation matrix $M \in \mathbb{Z}^{3 \times 3}$, calculated via the dot product of the slab lattice matrix and the inverse of the primitive cell lattice matrix.

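A minimal sketch of this calculation, assuming lattice vectors are stored as matrix rows so that the slab lattice equals M times the primitive lattice. The function name `transformation_matrix` and the tolerance `tol` are hypothetical choices for illustration.

```python
import numpy as np

def transformation_matrix(slab_lattice, prim_lattice, tol=1e-3):
    """Return the integer M with slab_lattice ≈ M @ prim_lattice."""
    slab_lattice = np.asarray(slab_lattice, dtype=float)
    prim_lattice = np.asarray(prim_lattice, dtype=float)
    m_float = slab_lattice @ np.linalg.inv(prim_lattice)
    m_int = np.rint(m_float).astype(int)
    # Guard: a true supercell must give (numerically) integer entries.
    if not np.allclose(m_float, m_int, atol=tol):
        raise ValueError("slab lattice is not an integer supercell of the primitive lattice")
    return m_int

# Toy example: a 2 x 3 x 1 supercell of a hexagonal-like primitive cell.
prim = np.array([[2.5, 0.0, 0.0], [1.25, 2.165, 0.0], [0.0, 0.0, 6.0]])
M_true = np.array([[2, 0, 0], [0, 3, 0], [0, 0, 1]])
slab = M_true @ prim
M = transformation_matrix(slab, prim)
```

The absolute determinant of M gives the number of primitive cells in the slab, which is where the reduction in variables comes from.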
[3] Vacuum scaling: To recover the original vacuum spacing in the system lattice cell, we calculate the vacuum scaling factor $k_{\mathrm{vac}} = (n_{\mathrm{slab}} + n_{\mathrm{vac}})/n_{\mathrm{slab}}$. This scalar value represents the expansion required along the surface normal to restore the simulation cell dimensions.
4. Adsorbate positioning: The adsorbate atoms (tag 2) are separated from the surface. Th...

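The vacuum scaling factor is the reciprocal of the compression ratio used when the vacuum was removed, so applying one after the other restores the original cell height. A small worked sketch, with hypothetical values for the slab and vacuum extents:

```python
def vacuum_scaling_factor(n_slab, n_vac):
    """k_vac = (n_slab + n_vac) / n_slab, as defined in the excerpt above."""
    if n_slab <= 0 or n_vac < 0:
        raise ValueError("n_slab must be positive and n_vac non-negative")
    return (n_slab + n_vac) / n_slab

# Hypothetical extents: the vacuum is twice as thick as the slab.
n_slab, n_vac = 1.0, 2.0
k_vac = vacuum_scaling_factor(n_slab, n_vac)
compression_ratio = n_slab / (n_slab + n_vac)

c_without_vacuum = 10.0            # cell height after vacuum removal
c_restored = k_vac * c_without_vacuum
```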
[4] Primitive cell instantiation: The primitive cell $S_{\mathrm{prim}}$ is first constructed using the predicted lattice parameters $(a, b, c, \alpha, \beta, \gamma)$ and the Cartesian atomic coordinates of the primitive cell. The atomic species are assigned based on the input atomic numbers, and padding masks are applied to filter out invalid entries.

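One way to picture this step is to convert the six lattice parameters into a lattice matrix and then drop padded atoms. The sketch below assumes a common crystallographic convention (a along x, b in the xy-plane, angles in degrees); the paper may use a different orientation, and both function names are hypothetical.

```python
import numpy as np

def lattice_from_parameters(a, b, c, alpha, beta, gamma):
    """Build a 3x3 lattice matrix (rows are vectors) from lengths and angles."""
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz_sq = c**2 - cx**2 - cy**2
    if cz_sq <= 0:
        raise ValueError("lattice parameters do not define a valid cell")
    return np.array([
        [a, 0.0, 0.0],
        [b * np.cos(ga), b * np.sin(ga), 0.0],
        [cx, cy, np.sqrt(cz_sq)],
    ])

def drop_padding(coords, atomic_numbers, mask):
    """Keep only the atoms whose mask entry is True."""
    mask = np.asarray(mask, dtype=bool)
    return np.asarray(coords)[mask], np.asarray(atomic_numbers)[mask]

lattice = lattice_from_parameters(3.0, 3.0, 5.0, 90.0, 90.0, 120.0)
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.866, 2.5], [0.0, 0.0, 0.0]])
numbers = np.array([29, 29, 0])          # last entry is padding
mask = np.array([True, True, False])
coords_valid, numbers_valid = drop_padding(coords, numbers, mask)
```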
[5] Supercell transformation: The primitive cell is expanded into the lattice of the slab as the specific surface supercell configuration. This transformation utilizes the predicted transformation matrix $M \in \mathbb{R}^{3 \times 3}$. The elements of $M$ are rounded to the nearest integers to ensure a valid periodic transformation.

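The rounding and replication can be sketched as below. This is an illustrative implementation under the row-vector convention, not the authors' code: `build_supercell` and its enumeration strategy (scan a bounding box of integer translations and keep those whose slab-fractional coordinates fall in [0, 1)) are assumptions.

```python
import itertools
import numpy as np

def build_supercell(prim_lattice, prim_cart, m_pred, eps=1e-8):
    """Replicate primitive-cell atoms over the supercell given by round(m_pred)."""
    prim_lattice = np.asarray(prim_lattice, dtype=float)
    prim_cart = np.asarray(prim_cart, dtype=float)
    m = np.rint(m_pred).astype(int)              # snap predicted entries to integers
    if abs(int(round(np.linalg.det(m)))) == 0:
        raise ValueError("rounded transformation matrix is singular")
    slab_lattice = m @ prim_lattice              # rows are lattice vectors
    m_inv = np.linalg.inv(m)
    # Bounding box of integer translations n = f @ M for f in [0, 1)^3.
    lo = np.minimum(m, 0).sum(axis=0)
    hi = np.maximum(m, 0).sum(axis=0)
    ranges = [range(int(l), int(h) + 1) for l, h in zip(lo, hi)]
    shifts = []
    for n in itertools.product(*ranges):
        frac = np.array(n) @ m_inv               # translation in slab fractional coords
        if np.all(frac > -eps) and np.all(frac < 1.0 - eps):
            shifts.append(np.array(n) @ prim_lattice)
    shifts = np.array(shifts)
    cart = (shifts[:, None, :] + prim_cart[None, :, :]).reshape(-1, 3)
    return slab_lattice, cart

prim_lattice = np.diag([2.5, 2.5, 6.0])
prim_cart = np.array([[0.0, 0.0, 0.0], [1.25, 1.25, 3.0]])
m_pred = [[2.04, -0.03, 0.0], [0.02, 2.9, 0.01], [0.0, 0.0, 1.0]]  # noisy prediction
slab_lattice, cart = build_supercell(prim_lattice, prim_cart, m_pred)
```

The replicated positions are not wrapped back into the slab cell here; they are equivalent under the slab's periodicity.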
[6] Vacuum addition: To recover the lattice cell of the system required for the energy calculations with simulations, the lattice of the slab is expanded along the surface normal vector. The third unit cell vector $c$ is scaled by the predicted vacuum scaling factor $k_{\mathrm{vac}}$.
4. System integration and tagging: The adsorbate atoms are introduced into the reconstructe...

[7] Slab isolation: For every generated sample, we isolate the surface structure by removing the adsorbate atoms. This is achieved by filtering atoms based on their tags, where atoms with the tag 2 (representing the adsorbate) are excluded.

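The tag filter amounts to a boolean mask. A minimal sketch with plain NumPy arrays; the function name `isolate_slab` and the array-based layout are assumptions, and only the tag convention (2 marks the adsorbate) comes from the text.

```python
import numpy as np

ADSORBATE_TAG = 2  # tags 0 and 1 are slab atoms, per the excerpts above

def isolate_slab(positions, atomic_numbers, tags):
    """Return positions, atomic numbers, and tags of non-adsorbate atoms."""
    tags = np.asarray(tags)
    keep = tags != ADSORBATE_TAG
    return np.asarray(positions)[keep], np.asarray(atomic_numbers)[keep], tags[keep]

positions = np.array([
    [0.0, 0.0, 0.0],
    [1.3, 1.3, 2.0],
    [0.0, 0.0, 4.0],
    [0.0, 0.0, 6.0],   # adsorbate atom
    [0.0, 0.0, 7.1],   # adsorbate atom
])
atomic_numbers = np.array([29, 29, 78, 6, 8])
tags = np.array([0, 1, 1, 2, 2])
slab_pos, slab_numbers, slab_tags = isolate_slab(positions, atomic_numbers, tags)
```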
[8] Structure grouping: We employ the StructureMatcher class from the pymatgen library to identify structural duplicates. This algorithm groups structures that are equivalent under translation, rotation, and periodic boundary conditions using default tolerances (ltol = 0.2, stol = 0.3, angle_tol = 5).

[9] Metric calculation: The uniqueness score is defined as the ratio of the number of unique structure groups ($N_{\mathrm{unique}}$) to the total number of valid generated samples ($N_{\mathrm{total}}$):

$\mathrm{Uniqueness} = \dfrac{N_{\mathrm{unique}}}{N_{\mathrm{total}}}$ (8)

D.2. Compositional Diversity

To quantify the chemical variety of the generated structures, we calculate the mean pairwise distance between their composit...

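The uniqueness score of Eq. (8) can be sketched without any structure library by plugging in an equivalence test. In the sketch below, `is_equivalent` is a hypothetical stand-in for a structure matcher, and the greedy grouping against each group's first member is an assumption rather than the exact grouping algorithm the paper relies on.

```python
def group_duplicates(items, is_equivalent):
    """Greedily group items; each item joins the first group it matches."""
    groups = []
    for item in items:
        for group in groups:
            if is_equivalent(group[0], item):
                group.append(item)
                break
        else:
            groups.append([item])
    return groups

def uniqueness(items, is_equivalent):
    """Number of unique groups divided by the number of valid samples."""
    if not items:
        raise ValueError("need at least one valid sample")
    return len(group_duplicates(items, is_equivalent)) / len(items)

# Toy stand-in: two "structures" match if they contain the same characters.
samples = ["CuPt", "PtCu", "Ag", "CuPt", "Au"]
same_chars = lambda s, t: sorted(s) == sorted(t)
score = uniqueness(samples, same_chars)
```

Here three groups are found among five samples, so the toy score is 0.6.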
[10] Featurization: We represent the chemical composition of each structure using the ElementProperty featurizer from the matminer library. Specifically, we employ the Magpie preset, which maps a composition to a 132-dimensional vector. This vector contains weighted statistics (mean, average deviation, range, etc.) of fundamental elemental properties such as at...

[11] Normalization: To ensure that all feature dimensions contribute equally to the distance metric, we standardize the feature vectors across the entire dataset. We apply a standard scaler to transform the distribution of each feature to have a mean of 0 and a variance of 1.

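Standardization of this kind is a per-column z-score. The sketch below uses plain NumPy with the population standard deviation; treating constant columns by dividing by 1 instead of 0 is an assumption about how degenerate features would be handled, and `standardize` is a hypothetical helper name.

```python
import numpy as np

def standardize(features):
    """Shift each column to mean 0 and scale it to variance 1."""
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)                     # population std (ddof=0)
    std = np.where(std == 0.0, 1.0, std)    # avoid dividing by zero
    return (x - mean) / std

# Toy feature matrix: three samples, three features (last one constant).
features = np.array([[1.0, 10.0, 5.0],
                     [2.0, 20.0, 5.0],
                     [3.0, 30.0, 5.0]])
z = standardize(features)
```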
[12] Metric calculation: We define the compositional diversity $D_{\mathrm{comp}}$ as the average Euclidean distance between the normalized feature vectors ($x$) of all unique pairs in the dataset:

$D_{\mathrm{comp}} = \dfrac{2}{N(N-1)} \sum_{i<j} \lVert x_i - x_j \rVert_2$

where $N$ is the total number of valid samples.
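The compositional diversity is the mean Euclidean distance over all unordered pairs. A minimal NumPy sketch follows; the function name is hypothetical, and the dense pairwise array is an implementation choice that suits small sample counts only.

```python
import numpy as np

def compositional_diversity(x):
    """Mean pairwise Euclidean distance over all unique pairs i < j."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.triu_indices(n, k=1)          # indices of pairs with i < j
    return 2.0 / (n * (n - 1)) * dists[i, j].sum()

# Toy vectors: two identical points and one at distance 5 from both.
x = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
d_comp = compositional_diversity(x)
```

For the toy input the three pair distances are 5, 0, and 5, giving a mean of 10/3.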
