Helix: Serving large language models over heterogeneous GPUs and network via max-flow

2025 · arXiv 9940.370721

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation

cs.DC · 2026-04-11 · unverdicted · novelty 8.0

Tessera performs kernel-granularity disaggregation on heterogeneous GPUs, achieving up to 2.3x throughput and 1.6x cost efficiency gains for large model inference while generalizing beyond prior methods.

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

cs.PL · 2026-04-19 · unverdicted · novelty 7.0

A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.

Feedback-Driven Execution for LLM-Based Binary Analysis

cs.CR · 2026-04-16 · unverdicted · novelty 7.0

FORGE uses a reasoning-action-observation loop and a Dynamic Forest of Agents to perform scalable LLM-based binary analysis, finding 1,274 vulnerabilities across 591 of 3,457 real-world firmware binaries at 72.3% precision, with broader coverage than prior methods.

Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization

cs.DC · 2026-04-22 · unverdicted · novelty 5.0

BloomBee is a distributed LLM inference system that achieves up to 1.76x higher throughput and 43.2% lower latency than prior decentralized systems by optimizing communication across multiple dimensions in low-bandwidth internet settings.

citing papers explorer

Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation

cs.DC · 2026-04-11 · unverdicted · none · ref 55

Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization

cs.DC · 2026-04-22 · unverdicted · none · ref 25
