Canonical reference

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency

75% of classified citations from Pith papers (6 of 8) cite this work as background.

31 Pith papers citing it

citation-role summary

background 6 · dataset 1 · method 1

years

2026: 23 · 2025: 8

representative citing papers

Enabling AI ASICs for Zero Knowledge Proof

cs.AR · 2026-04-20 · conditional · novelty 8.0

MORPH reformulates ZKP MSM and NTT kernels into GEMM operations for TPUs using a new Big-T complexity model, achieving up to 10x NTT throughput over GZKP.

Enhancing Instruction Prefetching via Cache and TLB Management

cs.AR · 2026-05-12 · unverdicted · novelty 7.0

IP-CaT jointly optimizes TLB and cache management for L1I prefetching via a translation prefetch buffer and trimodal replacement policy, yielding 8.7% geomean speedup over EPI across 105 server workloads.

WHET: Welding Homomorphic Encryption to Accelerator Architectures

cs.CR · 2026-06-10 · unverdicted · novelty 6.0 · 5 refs

WHET applies fine-grained coefficient-to-slot transforms, plaintext compression, and modulus raising plus lightweight hardware tweaks to FHE accelerators, delivering 1.38-8.74x per-area gains and sub-millisecond CKKS bootstrapping.

Don't Let a Few Network Failures Slow the Entire AllReduce

cs.DC · 2026-06-01 · unverdicted · novelty 6.0

OptCC is a pipelined AllReduce algorithm that completes within 2-6% of fault-free NCCL performance under up to 50% bandwidth loss by approaching a new lower bound showing O(1/p) unavoidable overhead for p GPUs.

Designing Datacenter Power Delivery Hierarchies for the AI Era

cs.DC · 2026-05-15 · unverdicted · novelty 6.0

Develops a simulation framework showing multi-resource stranding changes deployable capacity and effective costs in AI datacenters, arguing the key metric is deployable capacity over time rather than installed megawatts.

EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices

cs.OS · 2026-04-10 · unverdicted · novelty 6.0

EdgeFlow reduces mobile LLM cold-start latency up to 4.07x versus llama.cpp, MNN, and llm.npu by NPU-aware adaptive quantization, SIMD-friendly packing, and synergistic granular CPU-NPU pipelining at comparable accuracy.

The Energy Cost of Execution-Idle in GPU Clusters

cs.DC · 2026-04-06 · unverdicted · novelty 6.0

Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.

PICO: Performance Insights for Collective Operations

cs.DC · 2025-08-22 · unverdicted · novelty 6.0

PICO is a benchmarking framework for collective operations that decouples portable setup from platform execution and supplies reference MPI implementations. It shows that default choices can be up to 5x slower, with end-to-end training time reductions of up to 44% in simulator replays.

RAP: Runtime Adaptive Pruning for LLM Inference

cs.LG · 2025-05-22 · unverdicted · novelty 5.0

RAP is a reinforcement learning framework for runtime-adaptive pruning of LLMs that jointly optimizes model weights and KV-cache usage under varying memory budgets.
