Terabyte-scale analytics in the blink of an eye
3 Pith papers cite this work.
citing papers explorer
- veScale-FSDP: Flexible and High-Performance FSDP at Scale
  veScale-FSDP uses RaggedShard and structure-aware planning to support block-wise quantization and non-element-wise optimizers while delivering 5-66% higher throughput and 16-30% lower memory than prior FSDP systems at massive scale.
- PystachIO: Efficient Distributed GPU Query Processing with PyTorch over Fast Networks & Fast Storage
  PystachIO is a PyTorch-based distributed OLAP engine that delivers up to 3x end-to-end speedups for storage-resident queries by combining fast RDMA networks, NVMe storage, and I/O-computation overlap optimizations.
- To GPU or Not to GPU: Vector Search in Relational Engines
  Relational engines achieve faster SQL+vector-search queries on GPU than CPU when using compact vector indexes and fast interconnects, reversing the CPU-only design in current systems.