Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

10 papers cite this work. Polarity classification is still indexing.

10 representative citing papers (2026)

- Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
  Star Elastic trains N nested submodels in a single post-training job on a parent reasoning LLM, supporting elastic budget control that matches or exceeds independently trained baselines while cutting training compute by up to 360x.
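
  A minimal sketch of the many-in-one idea under stated assumptions: one shared weight matrix whose leading slice serves smaller submodels, with a width budget sampled per training step. The nesting scheme, budget schedule, and objective here are illustrative, not Star Elastic's actual recipe.

      # Illustrative nested-submodel training step (assumed scheme, not the paper's).
      import torch
      import torch.nn as nn

      class NestedLinear(nn.Module):
          # One parameter matrix; smaller submodels use its leading rows.
          def __init__(self, d_in, d_out):
              super().__init__()
              self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)

          def forward(self, x, frac):
              k = max(1, int(self.weight.shape[0] * frac))
              return x @ self.weight[:k].T

      layer = NestedLinear(64, 64)
      opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
      for step in range(300):
          frac = (0.25, 0.5, 1.0)[step % 3]      # elastic budget sampled per step
          x = torch.randn(8, 64)
          loss = layer(x, frac).pow(2).mean()    # placeholder objective
          opt.zero_grad(); loss.backward(); opt.step()
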
- Can an MLP Absorb Its Own Skip Connection?
  Skip-connected MLPs and residual-free MLPs of equal width represent generically disjoint function classes for common activations, with explicit impossibility proofs and a non-generic absorption condition for ReLU and GELU.
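
  A toy numerical check of the disjointness claim (not the paper's proof): fit a residual-free ReLU MLP of equal width to a generic skip-connected one and watch the fit error stay bounded away from zero.

      # Try to absorb f(x) = x + W2 relu(W1 x) into an equal-width plain MLP.
      import torch
      import torch.nn as nn

      torch.manual_seed(0)
      d, h = 4, 4
      W1, W2 = torch.randn(h, d), torch.randn(d, h)

      def f(x):
          return x + torch.relu(x @ W1.T) @ W2.T

      g = nn.Sequential(nn.Linear(d, h, bias=False), nn.ReLU(),
                        nn.Linear(h, d, bias=False))
      opt = torch.optim.Adam(g.parameters(), lr=1e-2)
      x = torch.randn(4096, d)
      for _ in range(2000):
          loss = (g(x) - f(x)).pow(2).mean()
          opt.zero_grad(); loss.backward(); opt.step()
      print(f"final fit MSE: {loss.item():.4f}")  # generically far from 0
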
- MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation
  MetaGAI is a new large-scale benchmark for automated model and data card generation, constructed via semantic triangulation and multi-agent generation with human-in-the-loop verification.
- PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
  PARD-2 uses Confidence-Adaptive Token optimization to align draft model training with acceptance length in speculative decoding, enabling dual-mode operation and up to 6.94x lossless speedup on Llama3.1-8B.
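
  A generic speculative-decoding accept loop with a confidence-adaptive draft length, for orientation only; PARD-2's training objective, parallel drafting, and dual-mode mechanics are not reproduced here.

      # Draft until confidence drops, then verify with min(1, p/q) acceptance.
      import numpy as np

      rng = np.random.default_rng(0)
      V = 50  # toy vocabulary

      def draft_probs(ctx):   # stand-in for the draft model
          return rng.dirichlet(np.ones(V))

      def target_probs(ctx):  # stand-in for the target model
          return rng.dirichlet(np.ones(V))

      def speculate(ctx, conf=0.05, max_k=8):
          toks, qs = [], []
          for _ in range(max_k):             # adaptive draft length
              q = draft_probs(ctx + toks)
              t = int(rng.choice(V, p=q))
              toks.append(t); qs.append(q)
              if q[t] < conf:
                  break
          out = []
          for t, q in zip(toks, qs):         # lossless verification
              p = target_probs(ctx + out)
              if rng.random() < min(1.0, p[t] / q[t]):
                  out.append(t)
              else:                          # rejected: resample from residual
                  r = np.maximum(p - q, 0); r /= r.sum()
                  out.append(int(rng.choice(V, p=r)))
                  break
          return out

      print(speculate([1, 2, 3]))
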
- Priming: Hybrid State Space Models From Pre-trained Transformers
  Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional tokens and showing Gated KalmaNet outperforming Mamba-2 on long-context reasoning at 32B scale.
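
  A sketch of priming-style initialization under an assumed weight layout: parameters the hybrid shares with the Transformer teacher (embeddings, MLPs, norms) are copied over, while the SSM mixers that replace attention keep a fresh init before a short continued-training run. The actual mapping in the paper may be more involved.

      # Copy every shape-matched parameter from teacher to hybrid (assumed mapping).
      import torch.nn as nn

      def prime(hybrid: nn.Module, teacher: nn.Module) -> None:
          t = dict(teacher.named_parameters())
          for name, p in hybrid.named_parameters():
              if name in t and t[name].shape == p.shape:
                  p.data.copy_(t[name].data)   # shared weights carry over
              # remaining (SSM) parameters keep their fresh initialization,
              # then the hybrid trains briefly on a small token budget
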
- Normalized Architectures are Natively 4-Bit
  nGPT's hypersphere constraint makes dot-product signal accumulate constructively under 4-bit quantization while noise averages out, enabling native low-precision training.
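
  A toy illustration of the noise-averaging argument (not nGPT's training recipe): quantize unit vectors to a 4-bit grid and watch the dot-product error shrink as dimension grows, since per-coordinate quantization noise is roughly zero-mean.

      import numpy as np

      rng = np.random.default_rng(0)

      def quant4(x, scale):
          # 4-bit symmetric uniform quantizer: 16 levels around [-scale, scale].
          step = 2 * scale / 15
          return np.clip(np.round(x / step), -8, 7) * step

      for d in (64, 256, 1024, 4096):
          u = rng.normal(size=d); u /= np.linalg.norm(u)  # hypersphere constraint
          v = rng.normal(size=d); v /= np.linalg.norm(v)
          s = 4 / np.sqrt(d)                              # coords are O(1/sqrt(d))
          err = abs(quant4(u, s) @ quant4(v, s) - u @ v)
          print(d, f"{err:.5f}")                          # error falls with d
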
- EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce
  EPM-RL applies PEFT followed by RL with agent-based rewards from judge models to build a trainable in-house product mapping model; it improves on fine-tuning alone and beats API baselines on quality-cost trade-offs while keeping deployment private.
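
  A sketch of a judge-model reward for the RL stage, with a hypothetical judge interface; the paper's prompts, judge ensemble, and RL algorithm are not specified here.

      # Binary reward from an LLM judge; 'judge' is any callable returning text.
      def judge_reward(source_item: str, predicted_match: str, judge) -> float:
          verdict = judge(
              "Do these describe the same product? Answer YES or NO.\n"
              f"A: {source_item}\nB: {predicted_match}"
          )
          return 1.0 if verdict.strip().upper().startswith("YES") else 0.0
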
- SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference
  SpikingBrain2.0 is a 5B hybrid spiking-Transformer that recovers most base model performance while delivering 10x TTFT speedup at 4M context and supporting over 10M tokens on limited GPUs via dual sparse attention and dual quantization paths.
- Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
  Nemotron 3 Super is an open 120B hybrid Mamba-Attention MoE model with a new LatentMoE architecture and MTP layers that matches the accuracy of similar models while delivering up to 7.5x higher inference throughput.
- A Model Context Protocol Server for Quantum Execution in Hybrid Quantum-HPC Environments
  An MCP server framework lets LLM agents run quantum primitives like sampling and expectation value computation on hybrid platforms by interpreting prompts and invoking tools for OpenQASM and CUDA-Q.
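
  A minimal sketch of what such a tool surface could look like, assuming the FastMCP interface from the MCP Python SDK; the tool name and the run_backend stub are hypothetical, not the paper's API or a real CUDA-Q call.

      from mcp.server.fastmcp import FastMCP

      mcp = FastMCP("quantum-exec")

      def run_backend(qasm: str, shots: int) -> dict:
          # Hypothetical stub: a real server would route the OpenQASM program
          # to a simulator or QPU (e.g., via CUDA-Q) and return counts.
          return {"00": shots // 2, "11": shots - shots // 2}

      @mcp.tool()
      def sample_circuit(qasm: str, shots: int = 1000) -> dict:
          """Sample measurement outcomes from an OpenQASM circuit."""
          return run_backend(qasm, shots)

      if __name__ == "__main__":
          mcp.run()  # serve the tool over stdio to an MCP-capable LLM agent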