Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, November 2018
10 Pith papers cite this work.
citing papers explorer
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
- Neutrino Fingerprints: Image-Based Encodings of IceCube Events for CNN Direction Reconstruction
  IceCube events are encoded as 72x72x3 images and processed by ResNet18 to reach 1.10 rad mean angular error in neutrino direction reconstruction.
- MoE-dqINR: A Unified Mixture-of-Experts Implicit Neural Representation Framework for Scan-Specific Dynamic and Quantitative MRI Reconstruction
  MoE-dqINR factorizes INR-based MRI reconstruction into shared spatial experts plus state-conditioned routing to unify dynamic and quantitative reconstruction at roughly 30 seconds per scan.
- Double Metric Learning for Building Directed Graphs with Chain Connections for the ATLAS ITk Detector
  Double metric learning learns two embeddings per node to build directed graphs with chain connections, yielding better performance than single metric learning for high-pT particles and accurate edge direction prediction in ATLAS ITk simulations.
- On the global convergence of gradient descent for wide shallow models with bounded nonlinearities
  Gradient descent on wide shallow models with bounded nonlinearities converges globally in the mean-field limit as non-global critical points are unstable under the dynamics.
- ANTIC: Adaptive Neural Temporal In-situ Compressor
  ANTIC reduces storage for large-scale PDE simulations by orders of magnitude through adaptive temporal snapshot selection combined with continual neural-field residual compression while preserving physics accuracy.
- Bolek: A Multimodal Language Model for Molecular Reasoning
  Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.
- Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction
  Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
- Internally triggered retrospective learning in neural networks
  Neural networks learn via sparse retrospective updates triggered internally when prediction error exceeds a threshold derived from recent error statistics, leading to stepwise parameter changes in simulations.
- Developing a Strong Pre-Trained Base Model for Plant Leaf Disease Classification
  A DenseNet201 base model trained on a constructed plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning with less data than general models.