pith. machine review for the scientific record.

archive

Every paper Pith has read. Search by title, abstract, or pith.

115 papers in cs.PF · page 1

  1. quant-ph 2026-05-14 reviewed
    Cache reorganization lifts GPU speedups for 28-qubit simulations on laptops

    Accelerating State-Vector Quantum Simulation on Integrated GPUs via Cache Locality Optimization: A Cross-Architecture Evaluation

    Eduarda Rodrigues Monteiro +4

  2. cs.OS 2026-05-14 reviewed
    LLM tunes Linux knobs for 72 percent stable gain over defaults

    SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

    Georgios Liargkovas +3

  3. cs.DC 2026-05-13 reviewed
    Heterogeneous solvers up to 32% faster than GPU-only for big matrices

    Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL

    Alexander Strack +2

  4. cs.LG 2026-05-12 reviewed
    Block-scale search cuts quantization error 27% in BFP

    Search Your Block Floating Point Scales!

    Austin Silveria +12

  5. cs.PF 2026-05-12 reviewed
    Adaptive packed layouts enable efficient VLA ML code

    Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation

    Ege Beysel +2

  6. cs.AR 2026-05-12 reviewed
    Joint TLB-cache tweaks boost instruction prefetching 8.7%

    Enhancing Instruction Prefetching via Cache and TLB Management

    Alexandre Valentin Jamet +4

  7. cs.IT 2026-05-12 reviewed
    Node failures scale wireless capacity and delay with sqrt of reliable nodes

    On Capacity and Delay of Wireless Networks with Node Failures

    Jiandong Li +3

  8. cs.DC 2026-05-12 reviewed
    Power capping leaves LLM decode energy untouched

    The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

    Ayesha Afzal +3

  9. cs.DC 2026-05-11 reviewed
    Chakra standardizes graph traces for AI workload benchmarking

    MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

    Andy Balogh +27

  10. cs.LG 2026-05-11 reviewed
    DMI-Lib cuts LLM internal observability overhead to 0.4-6.8 percent

    Enabling Performant and Flexible Model-Internal Observability for LLM Inference

    Nengneng Yu +4

  11. cs.DC 2026-05-11 reviewed
    Edge micro-agent fixes failures safely with no destructive actions

    An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum

    Alaa Saleh +4

  12. cs.GR 2026-05-11 reviewed
    Inverted culling speeds dynamic LiDAR ray tracing

    Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation

    Joonas Haapala +2

  13. cs.CR 2026-05-11 reviewed
    KEM-IES upgrades ECIES with PQC KEM and Ascon

    Key Encapsulation Mechanism-Based Integrated Encryption Scheme (KEM-IES)

    Abel C. H. Chen

  14. cs.RO 2026-05-11 reviewed
    Caching reuses diffusion steps for 4.6x faster robot plans

    Muninn: Your Trajectory Diffusion Model But Faster

    Gokul Puthumanaillam +6

  15. cs.CR 2026-05-11 reviewed
    Mamba-2 classifies network bursts directly from raw bytes

    MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining

    Gayan K. Kulatilleke +3

  16. cs.DC 2026-05-10 reviewed
    Cloud trace decomposition predicts performance at 2% error

    Cloud Performance Decomposition for Long-Term Performance Engineering: A Case Study

    Donald Lien +4

  17. cs.DC 2026-05-10 reviewed
    Adaptive DNN splits cut energy by 27-36% on real edge-cloud hardware

    Adaptive DNN Partitioning and Offloading in Heterogeneous Edge-Cloud Continuum

    Akuen Akoi Deng +3

  18. cs.LG 2026-05-09 reviewed
    Apple MPS shows 21x latency spikes in narrow decoding ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  19. cs.LG 2026-05-09 reviewed
    MPS decoding latency spikes up to 21x in narrow ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  20. cs.PF 2026-05-09 reviewed
    4.46× jump in quantum sim time at 29 qubits on M4 Pro

    A Controlled Study of Memory Hierarchy Transitions in Quantum Circuit Simulation on Apple M4 Pro Unified Memory Architecture

    Gyan Pratipat

  21. cs.PF 2026-05-09 reviewed
    GPU speedups reach 10x despite 1.85x bandwidth limit in quantum simulation

    A Controlled Study of Memory Hierarchy Transitions in Quantum Circuit Simulation on Apple M4 Pro Unified Memory Architecture

    Gyan Pratipat

  22. cs.PF 2026-05-09 reviewed
    Single-thread JPEG benchmarks misrank decoders for DataLoaders

    Single-Thread JPEG Decoder Benchmarks Mis-Evaluate ML Data Loaders

    Vladimir Iglovikov

  23. cs.AR 2026-05-09 reviewed
    DDR5 single sub-channel matches cache lines but loses 40-60% bandwidth

    Single 32-bit Sub-Channel DDR5 DIMMs: Architecture, Performance Bounds, and Standardisation

    Chih-Hua Ke

  24. cs.LG 2026-05-08 reviewed
    Cyclic tuning raises RAG quality by up to 54 percent

    CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

    Pengzhou Chen +1

  25. cs.LG 2026-05-08 reviewed
    Unified runtime delivers 2.55x decode speedup for low-rank transformers

    FlashSVD v1.5: Making Low-Rank Transformers Inference Actually Fast

    Danyang Zhuo +7

  26. cs.LG 2026-05-08 reviewed
    Fluxion speeds long-context inference 1.5x-3.7x via CPU-GPU hybrid sparse attention

    An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference

    Feiyu Yao +5

  27. cs.LG 2026-05-08 reviewed
    First benchmark supplies real data for LLM hyperparameter tuning

    LLMSYS-HPOBench: Hyperparameter Optimization Benchmark Suite for Real-World LLM Systems

    Gangda Xiong +5

  28. cs.DC 2026-05-07 reviewed
    AD replaces finite differences in INLA for 4-8x gradient speedups

    ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations

    Afif Boudaoud +8

  29. cs.AR 2026-05-07 reviewed
    Pipeline speeds power-of-two DNNs on edge FPGAs by up to 3.6x

    PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs

    David Kaeli +4

  30. cs.AR 2026-05-07 reviewed
    LLMs automate FPGA accelerator design space exploration

    LLM-Driven Design Space Exploration of FPGA-based Accelerators

    José Cano +3

  31. cs.PF 2026-05-07 reviewed
    Int4 KV cache outruns fp16 on Apple Silicon

    When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon

    Mohamed Amine Bergach

  32. cs.LG 2026-05-06 reviewed
    Task category explains 3x more variance than method in LLM kernel correctness

    KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

    Han Wang +5

  33. cs.LG 2026-05-06 reviewed
    Task category predicts LLM kernel success far better than generation method

    KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

    Han Wang +5

  34. cs.GR 2026-05-06 reviewed
    Algebraic coarsening delivers 3x speedup in GPU contact solves

    AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC

    Kemeng Huang +4

  35. cs.PF 2026-05-06 reviewed
    LLM agents turn GPU profiles into optimization advice

    KEET: Explaining Performance of GPU Kernels Using LLM Agents

    Aadit Nilay +7

  36. cs.GT 2026-05-05 reviewed
    Light storage limits turn content-provider competition into a potential game

    Decentralized Edge Caching under Budget and Storage Constraints: A Game-Theoretic Approach

    Danilo Ardagna +3

  37. cs.AR 2026-05-05 reviewed
    4-5 workloads preserve 96-99% of SPEC CPU2026 behavior

    SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison

    Andrew Jacob +3

  38. cs.AR 2026-05-05 reviewed
    SPEC CPU2026 increases instruction volume and cache pressure

    SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison

    Andrew Jacob +3

  39. cs.DC 2026-05-05 reviewed
    GPU speeds exascale trace analysis by 314 times

    Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

    Dragana Grbic +1

  40. cs.DC 2026-05-05 reviewed
    GPU layer speeds exascale trace analysis by up to 314x

    Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics

    Dragana Grbic +1

  41. cs.PF 2026-05-04 reviewed
    Same LLM name produces different services by host

    When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

    Dongsheng Liu +9

  42. cs.PF 2026-05-04 reviewed
    Same model name yields different speed

    When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

    Dongsheng Liu +9

  43. cs.LG 2026-05-04 reviewed
    Streaming top-k runs CSA indexer to 1M tokens on 6 GB

    StreamIndex: Memory-Bounded Compressed Sparse Attention via Streaming Top-k

    Jaber Jaber +1

  44. cs.CR 2026-05-04 reviewed
    Two post-quantum signatures pass Australia's payment speed test

    Post-Quantum Cryptography Migration in Australian Real-Time Payment Infrastructure: A Monte Carlo Simulation Study of the New Payments Platform

    Nazmus Salehin Sammo

  45. cs.PF 2026-05-02 reviewed
    SPEC CPU 2026 standardizes mixed-workload CPU benchmarking

    SPEC CPU: The Next Generation

    Allen Lee +33

  46. cs.PF 2026-05-02 reviewed
    Response time distributions derived for priority queues with preemption overhead

    Priority Scheduling in the M/G/1 with Preemption Overhead

    Edwin Peng +2

  47. cs.PL 2026-05-01 reviewed
    Compiler splits recursive datatypes into separate field buffers

    SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes

    Artem Pelenitsyn +5

  48. cs.DC 2026-05-01 reviewed
    Fixed-core approach yields 211x higher efficiency for edge GEMM

    Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

    J. Núñez-Yáñez +1

  49. cs.PF 2026-05-01 reviewed
    Apple Silicon runs 80B LLMs at 23x Nvidia energy efficiency

    Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference

    Abdurrahman Javat +1

  50. stat.ME 2026-05-01 reviewed
    Workflow turns raw measurements into defensible ECE/CS results

    How to Do Statistical Evaluations in ECE/CS Papers: A Practical Playbook for Defensible Results

    Bhaskar Krishnamachari