SMILES, a chemical language and information system
9 Pith papers cite this work, alongside 6,035 external citations. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED (9)
roles: background (1)
polarities: background (1)
representative citing papers: see the citing papers explorer below
citing papers explorer
-
From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models
Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.
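A minimal sketch of the checkpoint-probing idea behind this finding: score stereochemistry accuracy at every saved checkpoint and look for the sharp jump after the plateau. The load_model and translate helpers, the checkpoint paths, and the evaluation pairs are placeholders, not the paper's actual code.

from rdkit import Chem

def chiral_match(pred_smiles, ref_smiles):
    """True if the prediction reproduces the reference molecule, chirality included."""
    pred = Chem.MolFromSmiles(pred_smiles)
    ref = Chem.MolFromSmiles(ref_smiles)
    if pred is None or ref is None:
        return False
    # Canonical isomeric SMILES keeps the @/@@ stereocenter tags in the comparison.
    return (Chem.MolToSmiles(pred, isomericSmiles=True)
            == Chem.MolToSmiles(ref, isomericSmiles=True))

def probe_checkpoints(checkpoint_paths, eval_pairs, load_model, translate):
    """Accuracy-vs-checkpoint curve; an abrupt rise marks the reorganization point."""
    curve = []
    for path in checkpoint_paths:
        model = load_model(path)
        hits = sum(chiral_match(translate(model, src), ref) for src, ref in eval_pairs)
        curve.append((path, hits / len(eval_pairs)))
    return curve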
-
Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models
SCPT creates similarity-constrained preference triplets from scaffolds to train LLMs as conditional molecular editors that improve properties while keeping scaffolds intact.
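A rough sketch, assuming RDKit, of what a similarity-constrained triplet could look like: candidates must share the anchor's Murcko scaffold and clear a Tanimoto floor, and a property scorer picks the chosen and rejected completions. The threshold and the score function are illustrative, not the paper's settings.

from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs
from rdkit.Chem.Scaffolds import MurckoScaffold

def same_scaffold(smi_a, smi_b):
    """Compare canonical Murcko scaffolds of two molecules."""
    a, b = Chem.MolFromSmiles(smi_a), Chem.MolFromSmiles(smi_b)
    return (Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(a))
            == Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(b)))

def tanimoto(smi_a, smi_b):
    """Tanimoto similarity on 2048-bit Morgan fingerprints (radius 2)."""
    fa = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smi_a), 2, nBits=2048)
    fb = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smi_b), 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fa, fb)

def make_triplet(anchor, candidates, score, sim_min=0.4):
    """(anchor, chosen, rejected): both edits keep the scaffold and stay similar;
    'chosen' improves the target property more than 'rejected'."""
    pool = [c for c in candidates
            if same_scaffold(anchor, c) and tanimoto(anchor, c) >= sim_min]
    if len(pool) < 2:
        return None
    ranked = sorted(pool, key=score, reverse=True)
    return anchor, ranked[0], ranked[-1]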
-
Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces
Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.
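A minimal numpy sketch of the residualization step: project confound covariates (say, SELFIES length and token-frequency statistics) out of both the latents and the property before fitting the steering direction, so the direction cannot merely encode the confound. Shapes and names are assumptions.

import numpy as np

def steering_direction(Z, y, C):
    """Z: (n, d) latents; y: (n,) property such as cLogP or TPSA; C: (n, k)
    confound covariates. Returns a unit vector in latent space fit on
    confound-residualized quantities."""
    Bz, *_ = np.linalg.lstsq(C, Z, rcond=None)   # confound -> latent fit
    Z_res = Z - C @ Bz
    by, *_ = np.linalg.lstsq(C, y, rcond=None)   # confound -> property fit
    y_res = y - C @ by
    w, *_ = np.linalg.lstsq(Z_res, y_res, rcond=None)
    return w / np.linalg.norm(w)

def traversal_points(z, direction, steps=(-2, -1, 0, 1, 2)):
    """Latent points to decode and re-score, checking the steering is real."""
    return [z + t * direction for t in steps]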
-
FRIGID: Scaling Diffusion-Based Molecular Generation from Mass Spectra at Training and Inference Time
FRIGID scales a diffusion-based model for de novo molecular structure generation from mass spectra, reaching over 18% top-1 accuracy on MassSpecGym and tripling prior bests on NPLIB1 through large-scale training on unlabeled data and inference-time fragmentation refinement that scales log-linearly with compute.
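A hypothetical sketch of the inference-time refinement loop: draw candidate structures from the generator, then re-rank them by how well their in-silico fragment masses explain the observed peaks. sample_candidates and fragment_masses stand in for the diffusion model and a fragmentation routine; the mass tolerance is illustrative.

import numpy as np

def peak_coverage(peaks_mz, frag_masses, tol=0.01):
    """Fraction of observed peaks within tol Da of some predicted fragment mass."""
    frag = np.asarray(sorted(frag_masses))
    hits = 0
    for mz in peaks_mz:
        i = np.searchsorted(frag, mz)
        near = [frag[j] for j in (i - 1, i) if 0 <= j < len(frag)]
        hits += any(abs(mz - m) <= tol for m in near)
    return hits / len(peaks_mz)

def refine(spectrum_peaks, sample_candidates, fragment_masses, n=128):
    """Best-of-n selection by fragmentation agreement with the spectrum."""
    candidates = sample_candidates(n)  # n candidate SMILES from the generator
    return max(candidates,
               key=lambda s: peak_coverage(spectrum_peaks, fragment_masses(s)))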
-
Bolek: A Multimodal Language Model for Molecular Reasoning
Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.
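A minimal PyTorch sketch of the injection mechanism as summarized: project a Morgan fingerprint into the language model's embedding space and prepend the result as soft tokens ahead of the text embeddings. Dimensions, the number of soft tokens, and the surrounding LM interface are assumptions.

import torch
import torch.nn as nn

class FingerprintAdapter(nn.Module):
    """Maps a 2048-bit Morgan fingerprint to a few soft tokens for the LM."""
    def __init__(self, fp_bits=2048, hidden=512, embed_dim=4096, n_tokens=4):
        super().__init__()
        self.n_tokens, self.embed_dim = n_tokens, embed_dim
        self.proj = nn.Sequential(
            nn.Linear(fp_bits, hidden), nn.GELU(),
            nn.Linear(hidden, n_tokens * embed_dim),
        )

    def forward(self, fp, text_embeds):
        """fp: (B, fp_bits) binary fingerprints; text_embeds: (B, T, embed_dim).
        Returns (B, n_tokens + T, embed_dim) to feed the LM's forward pass."""
        mol_tokens = self.proj(fp.float()).view(-1, self.n_tokens, self.embed_dim)
        return torch.cat([mol_tokens, text_embeds], dim=1)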
-
Heterogeneous Scientific Foundation Model Collaboration
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
-
Galactica: A Large Language Model for Science
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
-
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
A large benchmark shows that classical ML models and GNNs outperform pretrained large models on most of 22 drug-discovery endpoints under strict cross-validation.
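An illustrative sketch, assuming RDKit and scikit-learn, of one strict protocol of this kind: scaffold-grouped folds so that close analogues never straddle the train/test boundary, evaluated with the sort of classical ECFP-plus-random-forest baseline such benchmarks favor. Dataset loading and hyperparameters are placeholders.

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.Scaffolds import MurckoScaffold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

def ecfp(smiles, bits=2048):
    """2048-bit Morgan fingerprint (radius 2) as a numpy array."""
    return np.array(AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 2, nBits=bits))

def scaffold_cv_auc(smiles_list, labels, n_splits=5):
    """Mean ROC-AUC under scaffold-grouped cross-validation."""
    X = np.stack([ecfp(s) for s in smiles_list])
    y = np.asarray(labels)
    groups = [Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(Chem.MolFromSmiles(s)))
              for s in smiles_list]
    aucs = []
    for tr, te in GroupKFold(n_splits=n_splits).split(X, y, groups):
        clf = RandomForestClassifier(n_estimators=500, n_jobs=-1).fit(X[tr], y[tr])
        aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return float(np.mean(aucs))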
-
Advancing Ligand-based Virtual Screening and Molecular Generation with Pretrained Molecular Embedding Distance
Pretrained molecular embedding distances provide an effective similarity metric for ligand-based virtual screening and molecular generation without task-specific training.
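A minimal numpy sketch of the screening use: given embeddings from any pretrained molecular encoder (the embedding step is assumed to happen upstream), rank a library by best-match cosine similarity to a set of query actives.

import numpy as np

def rank_library(query_embeds, library_embeds, library_ids):
    """Score each library molecule by its maximum cosine similarity to any
    query active; return (id, score) pairs from most to least similar."""
    q = query_embeds / np.linalg.norm(query_embeds, axis=1, keepdims=True)
    lib = library_embeds / np.linalg.norm(library_embeds, axis=1, keepdims=True)
    scores = (lib @ q.T).max(axis=1)  # best-match similarity per library molecule
    order = np.argsort(-scores)
    return [(library_ids[i], float(scores[i])) for i in order]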