Tacos: Topology-aware collective algorithm synthesizer for distributed machine learning

Canonical reference. 26 Pith papers cite this work; all classified citations (8 of 8) cite it as background.

Citation summary

Roles: background 8

Polarities: background 8

Citing papers by year: 2026: 25; 2025: 1

representative citing papers

Enabling AI ASICs for Zero Knowledge Proof

cs.AR · 2026-04-20 · conditional · novelty 8.0

MORPH reformulates ZKP MSM and NTT kernels into GEMM operations for TPUs using a new Big-T complexity model, achieving up to 10x NTT throughput over GZKP.

Unprivileged Topology Certificates for Cloud GPU Attestation

cs.CR · 2026-06-22 · unverdicted · novelty 7.0

CUDA latency matrix measurements produce unprivileged certificates that fingerprint individual GPU dies, recover cross-generation topology, and bind to datacenter location via public network probes.

Enhancing Instruction Prefetching via Cache and TLB Management

cs.AR · 2026-05-12 · unverdicted · novelty 7.0

IP-CaT jointly optimizes TLB and cache management for L1I prefetching via a translation prefetch buffer and a trimodal replacement policy, yielding an 8.7% geomean speedup over EPI across 105 server workloads.

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators

cs.AI · 2025-11-05 · unverdicted · novelty 7.0

SnapStream deploys sparse KV attention in a production inference system on dataflow accelerators, delivering 4x on-chip memory savings for DeepSeek-671B at 128k context with up to 1832 tokens/sec and minimal accuracy loss on LongBench-v2, AIME24, and LiveCodeBench.

Residual GPU Cache State on Apple M4 Pro

cs.AR · 2026-06-25 · unverdicted · novelty 6.0

Characterizes a reproducible post-GPU cache-displacement window on M4 Pro and quantifies a one-pass CPU recovery mechanism via Metal experiments and PMU data.

WHET: Welding Homomorphic Encryption to Accelerator Architectures

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

WHET applies fine-grained coefficient-to-slot transforms, plaintext compression, and modulus raising, together with lightweight hardware tweaks, to FHE accelerators, delivering 1.38-8.74x per-area gains and sub-millisecond CKKS bootstrapping.

Stencil Computations on Cerebras Wafer-Scale Engine

cs.DC · 2026-05-08 · unverdicted · novelty 6.0

CStencil on the WSE-3 achieves up to 342x speedup for 2D stencils versus an adapted single-precision GPU solver and saturates both compute and on-chip memory bandwidth.

Proxics: an efficient programming model for far memory accelerators

cs.OS · 2026-04-20 · conditional · novelty 6.0

Proxics introduces lightweight virtual processors and low-latency communication channels as portable OS abstractions for programming near-data processing accelerators, demonstrated on real hardware for memory-intensive workloads.
