pith. machine review for the scientific record.

arxiv: 2604.13520 · v1 · submitted 2026-04-15 · 💻 cs.LG

Recognition: unknown

LEGO-MOF: Equivariant Latent Manipulation for Editable, Generative, and Optimizable MOF Design

Chaoran Zhang, Dongxu Ji, Guangyao Li


Pith reviewed 2026-05-10 13:03 UTC · model grok-4.3

classification 💻 cs.LG
keywords MOF design · equivariant latent space · generative models · test-time optimization · carbon capture · LinkerVAE · SE(3) equivariance · material optimization

The pith

An equivariant latent space turns discrete MOF graphs into continuously editable and optimizable designs for carbon capture.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that embedding MOF linker structures into a smooth SE(3)-equivariant latent space makes geometry-aware edits and property optimization possible in a fully differentiable way. Traditional generative approaches for these materials rely on fixed building-block libraries and non-differentiable steps that break the information flow from target properties back to structure. By contrast, the proposed mapping supports implicit style transfer, zero-shot isoreticular expansion, and test-time optimization that raises pure CO2 uptake by an average relative 147.5 percent while preserving structural validity. This creates a scalable route from automated discovery through targeted editing to full MOF assembly.
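
The phrase "average relative boost" admits two readings, and the source text does not say which one produced the 147.5 percent figure. A minimal sketch, using hypothetical uptake values, contrasts the mean of per-structure relative gains with the relative gain of the mean uptake; when low-uptake parents improve the most, the first figure can be much larger than the second.

```python
from statistics import mean


def mean_relative_gain(before, after):
    """Average of per-structure relative gains, in percent."""
    if len(before) != len(after):
        raise ValueError("before and after must be paired")
    return 100.0 * mean((a - b) / b for b, a in zip(before, after))


def ratio_of_means_gain(before, after):
    """Relative gain of the mean uptake, in percent."""
    return 100.0 * (mean(after) - mean(before)) / mean(before)


# Hypothetical pure CO2 uptakes (same units) for three parent MOFs,
# before and after latent optimization.
before = [0.5, 1.0, 2.0]
after = [1.5, 1.5, 2.2]

print(round(mean_relative_gain(before, after), 1))   # mean of per-structure gains
print(round(ratio_of_means_gain(before, after), 1))  # gain of the mean
```

Here the mean of per-structure gains is about 86.7 percent while the gain of the mean uptake is about 48.6 percent, so the definition matters when headline numbers are compared across papers.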

Core claim

LinkerVAE maps discrete 3D chemical graphs of MOF linkers into a continuous SE(3)-equivariant latent space that supports geometry-preserving manipulations such as chemical style transfer and isoreticular expansion. A surrogate model then guides test-time optimization of latent codes drawn from existing MOFs, producing new designs whose pure CO2 uptake rises by an average relative 147.5 percent while structural validity is strictly preserved. The same latent representations integrate with a latent diffusion model and rigid-body assembly to construct complete, functional MOFs, establishing an end-to-end differentiable pipeline for editable and optimizable material design.
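
The test-time optimization step can be pictured as gradient ascent on a differentiable surrogate, starting from the latent code of an existing MOF. The sketch below is illustrative, not the authors' method: the two-dimensional latent, the quadratic surrogate, and the proximity penalty weight `lam` (which pulls the code back toward its parent `z0`) are all hypothetical stand-ins for LinkerVAE codes and the paper's trained property model.

```python
from typing import Callable, List

Vector = List[float]


def tto(z0: Vector, grad_surrogate: Callable[[Vector], Vector],
        lam: float = 1.0, lr: float = 0.1, steps: int = 200) -> Vector:
    """Gradient ascent on surrogate(z) - lam * ||z - z0||^2, starting at z0."""
    z = list(z0)
    for _ in range(steps):
        g = grad_surrogate(z)
        # Gradient of the penalized objective: surrogate gradient minus 2*lam*(z - z0).
        z = [zi + lr * (gi - 2.0 * lam * (zi - z0i))
             for zi, gi, z0i in zip(z, g, z0)]
    return z


# Hypothetical surrogate: predicted uptake peaks at z_star.
z_star = [2.0, -1.0]


def surrogate(z: Vector) -> float:
    return -sum((zi - si) ** 2 for zi, si in zip(z, z_star))


def grad_surrogate(z: Vector) -> Vector:
    return [-2.0 * (zi - si) for zi, si in zip(z, z_star)]


z0 = [0.0, 0.0]  # latent code of a hypothetical parent MOF
z_opt = tto(z0, grad_surrogate, lam=1.0)
print(z_opt, surrogate(z0), surrogate(z_opt))
```

With `lam=1.0` the iterate settles halfway between the parent code and the surrogate's optimum, at `[1.0, -0.5]`; with `lam=0.0` it would run all the way to `z_star`, which is exactly the regime where the surrogate's accuracy away from the training data starts to matter.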

What carries the argument

LinkerVAE encodes discrete 3D chemical graphs into a continuous SE(3)-equivariant latent manifold that serves as the differentiable substrate for all edits, expansions, and property optimizations.

If this is right

  • Existing MOFs can be refined in latent space for substantially higher CO2 uptake without discrete library search or post-hoc fixes.
  • Zero-shot isoreticular expansion and style transfer become direct operations on the latent codes rather than manual building-block swaps.
  • The full pipeline remains scalable because latent diffusion plus rigid-body assembly converts optimized codes into complete crystal structures.
  • Property-targeted design no longer severs the gradient path from performance objective back to atomic coordinates.
  • Structural validity is maintained end-to-end because all operations stay inside the learned manifold of valid graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same latent manipulation approach could be applied to other porous or crystalline materials whose design spaces are currently handled with discrete enumeration.
  • Multi-objective optimization (for example balancing uptake with selectivity or thermal stability) could be performed inside the same continuous space without retraining the encoder.
  • High-throughput experimental validation loops could be closed by feeding measured properties back into the surrogate to refine the latent optimization further.
  • The equivariant structure of the latent space may expose symmetry-based design rules that are invisible when working directly with discrete graphs.

Load-bearing premise

Small continuous shifts inside the learned latent space decode to chemically valid and synthesizable MOF structures, and the surrogate model gives accurate property predictions for points never seen in training.
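
One way to probe the second half of this premise is to bin surrogate errors by how far each evaluated latent point lies from the training set. A minimal sketch, assuming the Euclidean distance to the nearest training code as a proxy for distance from the training manifold; the points, errors, and bin edges are hypothetical.

```python
import math


def nearest_train_distance(z, train):
    """Euclidean distance from latent point z to its nearest training code."""
    return min(math.dist(z, t) for t in train)


def error_vs_distance(test_points, errors, train, bin_edges):
    """Mean absolute surrogate error per distance bin [edge_i, edge_{i+1}).

    Points beyond the last edge are ignored; empty bins return None.
    """
    n_bins = len(bin_edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for z, e in zip(test_points, errors):
        d = nearest_train_distance(z, train)
        for i in range(n_bins):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                sums[i] += abs(e)
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


train = [[0.0, 0.0], [1.0, 0.0]]
test_points = [[0.0, 0.1], [1.0, 0.2], [0.0, 2.0], [3.0, 0.0]]
errors = [0.1, -0.2, 0.9, 1.5]  # hypothetical (predicted - reference) uptake
curve = error_vs_distance(test_points, errors, train, [0.0, 0.5, 1.5, 3.0])
print(curve)
```

A curve that rises with distance, as in this toy data, would indicate that predictions at points reached by long optimization trajectories deserve less trust than held-out accuracy alone suggests.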

What would settle it

Decoding a batch of test-time-optimized latent codes into full MOFs and computing their actual CO2 uptake (by simulation or measurement) would falsify the claim if the realized uptake values fall far short of the surrogate predictions or if a large fraction of the structures violate bonding or stability rules.
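
This test reduces to three numbers per batch: the fraction of decoded structures that pass validity checks, the surrogate's error against realized uptake, and the fraction of valid structures whose realized uptake falls far short of the prediction. A minimal sketch, assuming hypothetical values and a hypothetical threshold `shortfall_tol` under which a realized value below half the prediction counts as "far short".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    predicted: float  # surrogate-predicted CO2 uptake
    realized: float   # uptake from simulation (e.g. GCMC) or measurement, same units
    valid: bool       # passed bonding/stability checks after decoding


def audit(cands: List[Candidate], shortfall_tol: float = 0.5) -> dict:
    """Summarize a batch of decoded, test-time-optimized structures."""
    valid = [c for c in cands if c.valid]
    n_valid = len(valid)
    mae = (sum(abs(c.predicted - c.realized) for c in valid) / n_valid
           if n_valid else float("nan"))
    short = sum(1 for c in valid
                if c.realized < (1.0 - shortfall_tol) * c.predicted)
    return {
        "validity_rate": n_valid / len(cands),
        "mae": mae,  # computed on valid structures only
        "shortfall_rate": short / n_valid if n_valid else float("nan"),
    }


cands = [
    Candidate(3.0, 2.8, True),
    Candidate(4.0, 1.0, True),
    Candidate(2.5, 2.4, True),
    Candidate(5.0, 0.0, False),
]
report = audit(cands)
print(report)
```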

Original abstract

Metal-organic frameworks (MOFs) are highly promising for carbon capture, yet navigating their vast design space remains challenging. Recent deep generative models enable de novo MOF design but primarily act as feed-forward structure generators. By heavily relying on predefined building block libraries and non-differentiable post-optimization, they fundamentally sever the information flow required for continuous structural editing. Here, we propose a target-driven generative framework focused on continuous structural manipulation. At its core is LinkerVAE, which maps discrete 3D chemical graphs into a continuous, SE(3)-equivariant latent space. This smooth manifold unlocks geometry-aware manipulations, including implicit chemical style transfer and zero-shot isoreticular expansion. Building upon this, we introduce a test-time optimization (TTO) strategy, utilizing an accurate surrogate model to continuously optimize the latent graphs of existing MOFs toward desired properties. This approach systematically enhances carbon capture performance, achieving a striking average relative boost of 147.5% in pure CO2 uptake while strictly preserving structural validity. Integrated with a latent diffusion model and rigid-body assembly for full MOF construction, our framework establishes a scalable, fully differentiable pathway for both the automated discovery, targeted optimization and editing of functional materials.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces LEGO-MOF, a target-driven generative framework for MOF design centered on LinkerVAE, which encodes discrete 3D chemical graphs into a continuous SE(3)-equivariant latent space. This enables geometry-aware manipulations such as chemical style transfer and isoreticular expansion. The framework incorporates test-time optimization (TTO) using a surrogate model to optimize latent representations for enhanced properties, claiming an average 147.5% relative boost in pure CO2 uptake while preserving structural validity. It also integrates a latent diffusion model and rigid-body assembly for complete MOF construction.

Significance. If the claims hold, particularly the surrogate's ability to guide optimization in latent space to valid, improved MOFs, this work could significantly advance editable and optimizable generative models for materials discovery, offering a differentiable alternative to traditional library-based approaches in carbon capture applications. The equivariant latent space and TTO strategy represent potentially impactful contributions to the field of machine learning for chemistry.

major comments (3)
  1. [Abstract] Abstract: The headline claim of a 147.5% average relative boost in pure CO2 uptake is obtained exclusively via TTO against a surrogate model, yet the manuscript provides no description of the surrogate's training data, architecture, accuracy on held-out data, or error on out-of-distribution latent vectors produced by gradient steps. Without these, the reported gain cannot be distinguished from in-sample fitting or extrapolation artifacts.
  2. [Methods (TTO subsection)] Methods (TTO subsection): No quantitative OOD generalization curves, latent-space coverage analysis, or post-optimization validity statistics (e.g., charge balance, geometric realizability, or synthesizability scores) are supplied beyond the qualitative statement that structural validity is 'strictly preserved.' These omissions are load-bearing for the central performance claim.
  3. [Experiments/Results] Experiments/Results: The abstract and results report precise performance numbers without baselines, error bars, cross-validation protocol for the surrogate, or comparison against non-latent optimization methods, making it impossible to evaluate whether the TTO improvements exceed what could be achieved by simpler approaches.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'accurate surrogate model' is used without supporting metrics; consider qualifying it or moving the accuracy claim to the methods section with explicit numbers.
  2. [Methods] Notation: The SE(3)-equivariance of the latent space is asserted but the precise group action and how it is enforced in the VAE loss are not detailed in the provided text; a short equation or diagram would improve clarity.
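
For reference, one common way to state the property the referee asks about: for g = (R, t) in SE(3), with R a proper rotation and t a translation, an encoder that splits its output into an invariant part h and an equivariant part v should satisfy Enc(R X + t) = (h, R v + t). The sketch below checks this numerically for a hypothetical toy encoder (sorted pairwise distances as h, centroid as v); it is not LinkerVAE and says nothing about how the paper enforces equivariance.

```python
import math
from itertools import combinations


def rotation(axis, angle):
    """Proper rotation matrix (det = +1) about `axis` by `angle`, via Rodrigues' formula."""
    n = math.sqrt(sum(a * a for a in axis))
    kx, ky, kz = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + kx * kx * C, kx * ky * C - kz * s, kx * kz * C + ky * s],
        [ky * kx * C + kz * s, c + ky * ky * C, ky * kz * C - kx * s],
        [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C],
    ]


def transform(R, t, p):
    """Apply the SE(3) element g = (R, t) to a 3D point: R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def toy_encoder(points):
    """Hypothetical encoder returning (invariant h, equivariant v)."""
    h = sorted(math.dist(a, b) for a, b in combinations(points, 2))  # pairwise distances
    v = tuple(sum(p[i] for p in points) / len(points) for i in range(3))  # centroid
    return h, v


X = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0), (1.0, 2.5, 0.3)]
R = rotation((1.0, 2.0, 3.0), 0.7)
t = (0.5, -1.0, 2.0)

h1, v1 = toy_encoder(X)
h2, v2 = toy_encoder([transform(R, t, p) for p in X])
print(max(abs(a - b) for a, b in zip(h1, h2)))                     # invariance error
print(max(abs(a - b) for a, b in zip(transform(R, t, v1), v2)))   # equivariance error
```

Both printed errors should be at floating-point round-off level; a learned encoder would be checked the same way over many random g.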

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The concerns about the surrogate model details, validation metrics, and experimental rigor are well-taken and directly impact the interpretability of our central claims. We address each major comment below and have prepared revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The headline claim of a 147.5% average relative boost in pure CO2 uptake is obtained exclusively via TTO against a surrogate model, yet the manuscript provides no description of the surrogate's training data, architecture, accuracy on held-out data, or error on out-of-distribution latent vectors produced by gradient steps. Without these, the reported gain cannot be distinguished from in-sample fitting or extrapolation artifacts.

    Authors: We agree that the abstract and main text currently provide insufficient detail on the surrogate to fully substantiate the TTO results. The Methods section describes the surrogate at a high level as an SE(3)-equivariant GNN but omits training corpus size, architecture hyperparameters, held-out accuracy, and OOD analysis. In the revised manuscript we will expand the TTO subsection with these elements, including dataset provenance, validation MAE, and a short study of prediction error under latent perturbations comparable to those used in optimization. This addition will allow readers to assess whether the reported gains exceed fitting or extrapolation effects. revision: yes

  2. Referee: [Methods (TTO subsection)] Methods (TTO subsection): No quantitative OOD generalization curves, latent-space coverage analysis, or post-optimization validity statistics (e.g., charge balance, geometric realizability, or synthesizability scores) are supplied beyond the qualitative statement that structural validity is 'strictly preserved.' These omissions are load-bearing for the central performance claim.

    Authors: The current manuscript indeed relies on a qualitative statement of validity preservation. We will revise the TTO subsection to include quantitative post-optimization statistics: fraction of structures satisfying charge neutrality, geometric realizability after rigid-body assembly, and a synthesizability heuristic score. We will also add a latent-space coverage plot and a short OOD generalization curve showing surrogate error as a function of distance from the training manifold. These additions directly address the load-bearing nature of the claim. revision: yes

  3. Referee: [Experiments/Results] Experiments/Results: The abstract and results report precise performance numbers without baselines, error bars, cross-validation protocol for the surrogate, or comparison against non-latent optimization methods, making it impossible to evaluate whether the TTO improvements exceed what could be achieved by simpler approaches.

    Authors: We acknowledge the absence of these controls. The revised Results section will report error bars computed over five independent optimization runs, specify the 5-fold cross-validation protocol used to train and select the surrogate, and add two baseline comparisons: (1) direct gradient-based optimization in the original chemical graph space and (2) random latent-space sampling followed by the same validity filter. These additions will allow direct assessment of whether TTO outperforms simpler non-latent alternatives. revision: yes
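
The evaluation protocol promised in the third response can be sketched as disjoint cross-validation folds plus a mean and sample standard deviation over repeated runs. This is a minimal sketch: the seed, the interleaved fold assignment, and the helper names `kfold_indices` and `summarize` are illustrative assumptions, not details taken from the manuscript.

```python
import random
from statistics import mean, stdev


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 once, then deal them into k disjoint test folds.

    Returns a list of (train_indices, test_indices) pairs.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [j for j in range(n) if j not in held_out]
        splits.append((train, fold))
    return splits


def summarize(scores):
    """Mean and sample standard deviation (n - 1 denominator), e.g. over five runs."""
    return mean(scores), stdev(scores)


splits = kfold_indices(23, k=5, seed=1)
print([len(test) for _, test in splits])  # fold sizes differ by at most one
```

Each index appears in exactly one test fold, so every MOF contributes to held-out error once; reporting `summarize` over the five fold scores (or five optimization seeds) gives the requested error bars.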

Circularity Check

1 step flagged

Surrogate optimization boost reduces to in-sample fit without verified OOD generalization

specific steps
  1. fitted input called prediction [Abstract]
    "utilizing an accurate surrogate model to continuously optimize the latent graphs of existing MOFs toward desired properties. This approach systematically enhances carbon capture performance, achieving a striking average relative boost of 147.5% in pure CO2 uptake while strictly preserving structural validity."

    The reported 147.5% relative boost is the direct output of optimizing latent codes to maximize the surrogate's predicted CO2 uptake. If the surrogate was fitted to the same uptake data (or closely related splits) against which the final structures are evaluated, the gain is statistically forced by the fit rather than an independent prediction of material improvement.

full rationale

The central performance claim (147.5% CO2 uptake boost) is obtained exclusively by test-time gradient optimization of latent codes against a surrogate model. The abstract presents this as an empirical enhancement of carbon capture performance, yet provides no independent verification that the surrogate retains accuracy on the continuously edited latent points (which can lie outside the original training support) or that post-optimization structures remain valid under ground-truth simulation. If the surrogate was trained on the same property data used to compute the reported gain, the optimization step becomes a fitted-input prediction by construction, matching the "fitted input called prediction" pattern flagged above. No other circular steps are identifiable from the given text.
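
The mechanism behind this concern appears even with an unbiased surrogate: whichever candidate an optimizer picks tends to be one whose predicted value was inflated by error, so the predicted gain overstates the realized one (often called the optimizer's curse). The toy simulation below assumes Gaussian true values and independent Gaussian surrogate error, and uses an argmax over random candidates as a simplified stand-in for gradient-based TTO.

```python
import random
from statistics import mean


def selection_gap(n_candidates=200, noise=0.5, trials=500, seed=0):
    """Mean of (surrogate score - true score) for the surrogate's top pick."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        true_vals = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
        # Hypothetical surrogate: truth plus independent, zero-mean error.
        predicted = [t + rng.gauss(0.0, noise) for t in true_vals]
        best = max(range(n_candidates), key=predicted.__getitem__)
        gaps.append(predicted[best] - true_vals[best])
    return mean(gaps)


print(selection_gap(noise=0.0, trials=20))   # perfect surrogate: no gap
print(selection_gap(noise=0.5, trials=300))  # noisy surrogate: positive gap
```

With `noise=0.0` the gap is exactly zero; with `noise=0.5` it is clearly positive, which is why realized uptake from an independent simulator, rather than the surrogate's own score, is needed to support the reported gain.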

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on the unproven assumption that an SE(3)-equivariant latent space learned from discrete graphs supports continuous, validity-preserving edits and that a surrogate model trained on existing MOF data generalizes to the optimized latent points. No explicit free parameters are named, but training of both VAE and surrogate necessarily introduces fitted weights. No new physical entities are postulated.

free parameters (1)
  • VAE and surrogate training hyperparameters
    Weights and hyperparameters of LinkerVAE and the property surrogate are fitted to data; their values are not reported.
axioms (2)
  • domain assumption: A continuous SE(3)-equivariant latent manifold exists for discrete 3D chemical graphs of MOF linkers and supports geometry-aware manipulations that preserve chemical validity.
    Invoked to justify implicit chemical style transfer and zero-shot isoreticular expansion.
  • domain assumption: The surrogate model provides accurate property predictions for points reached by test-time optimization.
    Required for the reported 147.5% uptake improvement to be meaningful.

pith-pipeline@v0.9.0 · 5522 in / 1593 out tokens · 49182 ms · 2026-05-10T13:03:59.863279+00:00 · methodology

