arXiv preprint arXiv:2405.15682
8 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 8
roles: background 1
polarities: background 1
representative citing papers
citing papers explorer
- In-Context Multiple Instance Learning
  A model pretrained on synthetic bag-structured data performs in-context learning for new MIL tasks from a handful of examples and outperforms task-specific supervised baselines on twelve benchmarks.
- From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models
  Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.
- Training Deep Learning Models with Norm-Constrained LMOs
  Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.
- Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training
  Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.
- CoAction: Cross-task Correlation-aware Pareto Set Learning
  CoAction introduces a task-aware transformer that simultaneously learns Pareto optimal solutions across multiple tasks by capturing inter-task correlations via self-attention and task embeddings.
- Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting
  Baguan-solar integrates Baguan weather foundation model forecasts with geostationary satellite data via a decoupled two-stage multimodal framework to deliver kilometer-scale 24-hour solar irradiance predictions, cutting RMSE by 16% versus baselines over East Asia.
- Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models
  HDET lets data-parallel replicas explore a spread of learning rates independently before averaging parameters, with an auto-LR controller driven by inter-replica loss differences to produce a self-adapting schedule without extra sweeps.
- Image-Based Malware Type Classification on MalNet-Image Tiny: Effects of Multi-Scale Fusion, Transfer Learning, Data Augmentation, and Schedule-Free Optimization
  Pretraining plus Mixup/TrivialAugment and a feature pyramid network lift macro-F1 from 0.65 to 0.69 on 43-class malware image classification while cutting training epochs from 96 to 10.