Sigmoid-weighted linear units for neural network function approximation in reinforcement learning
7 Pith papers cite this work.
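For reference, the activation this paper is named for, the sigmoid-weighted linear unit (SiLU), is simply the input multiplied by the logistic sigmoid. A minimal NumPy sketch:

```python
import numpy as np

def silu(x):
    """Sigmoid-weighted linear unit (SiLU): x * sigmoid(x)."""
    return x * (1.0 / (1.0 + np.exp(-x)))

# Near-linear for large positive x; smoothly gates small and negative inputs.
print(silu(np.array([-2.0, 0.0, 2.0])))
```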
citing papers explorer
-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
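Stripped of the extra heads and tree attention, Medusa-style acceleration rests on a verify step: the base model checks a drafted continuation and keeps the longest agreeing prefix, emitting several tokens per pass. A toy greedy-acceptance sketch (Medusa itself uses a looser "typical acceptance" criterion, and `base_next_token` here is a stand-in, not the paper's API):

```python
def accept_prefix(base_next_token, context, draft):
    """Speculative verification: walk the drafted tokens left to right and
    keep the longest prefix the base model would itself have produced."""
    accepted = []
    for tok in draft:
        if base_next_token(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    # The base model's own next token is appended regardless, so at least
    # one token is generated per verification pass.
    accepted.append(base_next_token(context + accepted))
    return accepted

# Toy "base model": deterministically continues the sequence 0, 1, 2, ...
base = lambda seq: (seq[-1] + 1) if seq else 0
print(accept_prefix(base, [0, 1], [2, 3, 9]))  # → [2, 3, 4]
```

Three tokens land in one pass here: the drafts 2 and 3 are accepted, 9 is rejected, and the base model's own next token 4 is appended.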
-
On the global convergence of gradient descent for wide shallow models with bounded nonlinearities
Gradient descent on wide shallow models with bounded nonlinearities converges globally in the mean-field limit because non-global critical points are unstable under the dynamics.
-
ANTIC: Adaptive Neural Temporal In-situ Compressor
ANTIC reduces storage for large-scale PDE simulations by orders of magnitude through adaptive temporal snapshot selection combined with continual neural-field residual compression while preserving physics accuracy.
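The adaptive-snapshot half of this idea can be illustrated without the neural-field compressor: greedily keep a timestep only when it drifts too far from the last kept one, so slowly varying stretches are stored sparsely. The decaying-wave "simulation" and tolerance below are my own stand-ins, not the paper's setup:

```python
import math

def select_snapshots(frames, tol=0.1):
    """Greedy adaptive temporal selection: keep a frame only when its max
    abs deviation from the last kept frame exceeds tol."""
    kept = [0]
    for t in range(1, len(frames)):
        residual = max(abs(a - b) for a, b in zip(frames[t], frames[kept[-1]]))
        if residual > tol:
            kept.append(t)
    return kept

# Hypothetical "simulation": a decaying wave on 8 grid points, 100 timesteps.
frames = [[math.exp(-0.05 * t) * math.sin(0.3 * t + x) for x in range(8)]
          for t in range(100)]
kept = select_snapshots(frames)
print(len(kept), "of", len(frames), "frames kept")
```

By construction every dropped frame stays within `tol` of its preceding kept frame, so the error of nearest-kept-snapshot reconstruction is bounded.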
-
Bolek: A Multimodal Language Model for Molecular Reasoning
Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary classification endpoints while generalizing to unseen tasks.
-
Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction
Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
-
Internally triggered retrospective learning in neural networks
Neural networks learn via sparse retrospective updates triggered internally when prediction error exceeds a threshold derived from recent error statistics, leading to stepwise parameter changes in simulations.
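The triggering rule described above can be sketched with a toy one-parameter learner. Everything concrete here (the mean + k·std threshold form, the noise scale, the single environment shift) is my assumption for illustration, not the paper's experiment:

```python
import random

def run(steps=400, shift_at=200, window=30, k=3.0, seed=0):
    """Toy learner that updates its single parameter only when the current
    prediction error exceeds mean + k*std of its recent errors."""
    random.seed(seed)
    w, updates, errors = 0.0, 0, []
    for t in range(steps):
        target = 0.0 if t < shift_at else 1.0      # environment shifts once
        obs = target + random.gauss(0.0, 0.02)     # noisy training signal
        err = abs(obs - w)
        if len(errors) >= window:
            recent = errors[-window:]
            mean = sum(recent) / window
            std = (sum((e - mean) ** 2 for e in recent) / window) ** 0.5
            if err > mean + k * std:               # internally triggered
                w = obs                            # stepwise parameter change
                updates += 1
        errors.append(err)
    return w, updates

w, updates = run()
print(w, updates)
```

The error spike at the shift exceeds the statistics of the quiet preceding window, so the parameter jumps in a few discrete steps while the hundreds of unsurprising timesteps trigger almost no updates.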
-
Developing a Strong Pre-Trained Base Model for Plant Leaf Disease Classification
A DenseNet201 base model trained on a purpose-built plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning from less data than general-purpose pretrained models.