Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, November 2018.
11 Pith papers cite this work.
representative citing papers
IceCube events are encoded as 72x72x3 images and processed by ResNet18 to reach 1.10 rad mean angular error in neutrino direction reconstruction.
MoE-dqINR factorizes INR-based MRI reconstruction into shared spatial experts plus state-conditioned routing to unify dynamic and quantitative reconstruction at roughly 30 seconds per scan.
Double metric learning learns two embeddings per node to build directed graphs with chain connections, yielding better performance than single metric learning for high-pT particles and accurate edge direction prediction in ATLAS ITk simulations.
Gradient descent on wide shallow models with bounded nonlinearities converges globally in the mean-field limit as non-global critical points are unstable under the dynamics.
ANTIC reduces storage for large-scale PDE simulations by orders of magnitude through adaptive temporal snapshot selection combined with continual neural-field residual compression while preserving physics accuracy.
Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.
Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
KAN noise robustness in star/galaxy/quasar classification arises from implicit C2-spline regularization rather than architecture, as weight-decay-tuned MLPs match performance on SDSS and DESI data.
Neural networks learn via sparse retrospective updates triggered internally when prediction error exceeds a threshold derived from recent error statistics, leading to stepwise parameter changes in simulations.
A DenseNet201 base model trained on a constructed plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning with less data than general models.
citing papers explorer
- Bolek: A Multimodal Language Model for Molecular Reasoning
- Internally triggered retrospective learning in neural networks