arXiv preprint arXiv:2405.15682

The Road Less Scheduled · 2024 · arXiv 2405.15682

8 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

In-Context Multiple Instance Learning

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

A model pretrained on synthetic bag-structured data performs in-context learning for new MIL tasks from a handful of examples and outperforms task-specific supervised baselines on twelve benchmarks.

From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.

Training Deep Learning Models with Norm-Constrained LMOs

cs.LG · 2025-02-11 · unverdicted · novelty 7.0

Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.

Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.

CoAction: Cross-task Correlation-aware Pareto Set Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0 · 2 refs

CoAction introduces a task-aware transformer that simultaneously learns Pareto optimal solutions across multiple tasks by capturing inter-task correlations via self-attention and task embeddings.

Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting

cs.LG · 2026-03-16 · unverdicted · novelty 6.0

Baguan-solar integrates Baguan weather foundation model forecasts with geostationary satellite data via a decoupled two-stage multimodal framework to deliver kilometer-scale 24-hour solar irradiance predictions, cutting RMSE by 16% versus baselines over East Asia.

Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models

cs.LG · 2026-04-27 · unverdicted · novelty 5.0

HDET lets data-parallel replicas explore a spread of learning rates independently before averaging parameters, with an auto-LR controller driven by inter-replica loss differences to produce a self-adapting schedule without extra sweeps.

Image-Based Malware Type Classification on MalNet-Image Tiny: Effects of Multi-Scale Fusion, Transfer Learning, Data Augmentation, and Schedule-Free Optimization

cs.CR · 2026-04-22 · unverdicted · novelty 2.0

Pretraining plus Mixup/TrivialAugment and a feature pyramid network lift macro-F1 from 0.65 to 0.69 on 43-class malware image classification while cutting training epochs from 96 to 10.

citing papers explorer

Showing 8 of 8 citing papers.

In-Context Multiple Instance Learning · cs.LG · 2026-06-04 · unverdicted · none · ref 40
A model pretrained on synthetic bag-structured data performs in-context learning for new MIL tasks from a handful of examples and outperforms task-specific supervised baselines on twelve benchmarks.
From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models · cs.LG · 2026-05-11 · unverdicted · none · ref 52
Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.
Training Deep Learning Models with Norm-Constrained LMOs · cs.LG · 2025-02-11 · unverdicted · none · ref 173
Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.
Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training · cs.CL · 2026-06-04 · unverdicted · none · ref 25
Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.
CoAction: Cross-task Correlation-aware Pareto Set Learning · cs.LG · 2026-05-03 · unverdicted · none · ref 27 · 2 links
CoAction introduces a task-aware transformer that simultaneously learns Pareto optimal solutions across multiple tasks by capturing inter-task correlations via self-attention and task embeddings.
Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting · cs.LG · 2026-03-16 · unverdicted · none · ref 11
Baguan-solar integrates Baguan weather foundation model forecasts with geostationary satellite data via a decoupled two-stage multimodal framework to deliver kilometer-scale 24-hour solar irradiance predictions, cutting RMSE by 16% versus baselines over East Asia.
Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models · cs.LG · 2026-04-27 · unverdicted · none · ref 3
HDET lets data-parallel replicas explore a spread of learning rates independently before averaging parameters, with an auto-LR controller driven by inter-replica loss differences to produce a self-adapting schedule without extra sweeps.
Image-Based Malware Type Classification on MalNet-Image Tiny: Effects of Multi-Scale Fusion, Transfer Learning, Data Augmentation, and Schedule-Free Optimization · cs.CR · 2026-04-22 · unverdicted · none · ref 17
Pretraining plus Mixup/TrivialAugment and a feature pyramid network lift macro-F1 from 0.65 to 0.69 on 43-class malware image classification while cutting training epochs from 96 to 10.
