In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding · 2023 · arXiv 6546.2023

6 Pith papers cite this work.

representative citing papers

Logical Compilation for Multi-Qubit Iceberg Patches

quant-ph · 2026-04-10 · unverdicted · novelty 8.0

A new heuristic compiler for multi-qubit iceberg patches reduces circuit depth by 34 percent, cuts gate counts, and improves fidelity metrics on 71 benchmarks compared with naive mapping.

Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods

cs.DC · 2026-04-02 · unverdicted · novelty 7.0

A simulation study shows that cold TLB misses in reverse address translation dominate latency for small collectives in multi-GPU pods, causing up to 1.4x degradation, while the impact diminishes for larger collectives.

ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

ELMoE-3D achieves 6.6x average speedup and 4.4x energy efficiency gain for MoE serving on 3D hardware by scaling expert and bit elasticity for elastic self-speculative decoding.

The Energy Cost of Execution-Idle in GPU Clusters

cs.DC · 2026-04-06 · unverdicted · novelty 6.0

Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.

AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

cs.CR · 2026-04-03 · unverdicted · novelty 6.0 · 3 refs

AEGIS reduces inter-GPU communication by up to 81.3% in self-attention and reaches 96.62% scaling efficiency with 3.86x speedup on four GPUs for 2048-token encrypted Transformer inference.

Efficient Page Migration in Hybrid Memory Systems

cs.AR · 2026-04-21 · unverdicted · novelty 5.0

Duon eliminates TLB shootdown and cache invalidation costs during page migration in flat-address hybrid memory systems by updating mappings in-place, delivering 3.87% IPC gains over prior methods.
