Misa-akmc: Achieve kinetic Monte Carlo simulation of 20 quadrillion atoms on GPU clusters
4 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (4). Verdicts: 4 unverdicted.
Citing papers explorer
- Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation
  Tessera performs kernel-granularity disaggregation on heterogeneous GPUs, achieving up to 2.3x higher throughput and 1.6x better cost efficiency for large-model inference while generalizing beyond prior methods.
- Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales
  AtomWorld enables the first direct atomistic simulation of RPV steel at year-and-meter scales, handling ten-quintillion-atom systems and simulating one service year in 1.71 days with 92-97% scaling efficiency on leadership supercomputers.
- RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving
  Joint resource allocation and routing for multi-model LLM serving can produce up to 87% variation in achievable output quality across setups on the same GPU cluster.
- Wave-Based Dispatch for Circuit Cutting in Hybrid HPC–Quantum Systems
  DQR enables efficient scheduling and failover for cut quantum circuit fragments across local QPUs and remote simulators on real HPC hardware with low coordination overhead.