pith. machine review for the scientific record.

archive

Every paper Pith has read.

344 papers in cs.AR

  1. quant-ph 2026-05-14 reviewed
    Cache reorganization lifts GPU speedups for 28-qubit simulations on laptops

    Accelerating State-Vector Quantum Simulation on Integrated GPUs via Cache Locality Optimization: A Cross-Architecture Evaluation

    Eduarda Rodrigues Monteiro +4

  2. cs.ET 2026-05-13 reviewed
    Time-domain near-memory MAC reaches 7.62 TOPS/W

    Time Domain Near Memory Computing Engine

    Sarthak Antal +1

  3. cs.CV 2026-05-13 reviewed
    ViTs reach 84% accuracy by replacing layer norm with evolved scalars

    Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation

    Amirhossein Sadough +3

  4. cs.AR 2026-05-13 reviewed
    End-to-end DVS-memristor system is the missing piece for low-power vision

    Memristor Technologies for Dynamic Vision Sensors: A Critical Assessment and Research Roadmap

    Edris Zaman Farsa +3

  5. cs.AR 2026-05-13 reviewed
    FPGA accelerator skips sparse beams for 2x faster MIMO localization

    Efficient Implementation of an Adaptive Transformer Accelerator for Massive MIMO Outdoor Localization

    Ilayda Yaman +3

  6. cs.AR 2026-05-13 reviewed
    7B model surpasses 671B baselines on SVA generation

    Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation

    Bingsheng He +4

  7. cs.AR 2026-05-13 reviewed
    FPGA lock agents boost OLTP throughput 51X over CPUs

    FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration

    Gustavo Alonso +1

  8. cs.AR 2026-05-13 reviewed
    PoisonCap gives CHERI strict use-after-free at zero overhead

    PoisonCap: Efficient Hierarchical Temporal Safety for CHERI

    Alexandre Joannou +7

  9. cs.LG 2026-05-12 reviewed
    Block-scale search cuts quantization error 27% in BFP

    Search Your Block Floating Point Scales!

    Austin Silveria +12

  10. cs.AR 2026-05-12 reviewed
    Joint TLB-cache tweaks boost instruction prefetching 8.7%

    Enhancing Instruction Prefetching via Cache and TLB Management

    Alexandre Valentin Jamet +4

  11. cs.AR 2026-05-12 reviewed
    FPGA SoC matches silicon SNN accuracy for neuromorphic edge tasks

    Heterogeneous SoC Integrating an Open-Source Recurrent SNN Accelerator for Neuromorphic Edge Computing on FPGA

    Enrico Macii +3

  12. quant-ph 2026-05-12 reviewed
    Calibration feedback control cuts optimization gaps in local and tight-loop regimes

    Runtime Calibration as State-Trajectory Feedback Control in Quantum-Classical Workflows

    Xiaolong Deng

  13. cs.LG 2026-05-12 reviewed
    Cumulative updates fix gradient flow in low-power RNNs

    Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

    Arthur Fyon +3

  14. cs.AR 2026-05-11 reviewed
    Dynamic scheduler lifts MoE inference 1.3-1.6x on PIM hardware

    Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models

    Christos Kozyrakis +7

  15. cs.AR 2026-05-11 reviewed
    Triton gains direct warp-group control for modern GPU hardware

    TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments

    Daohang Shi +12

  16. cs.AR 2026-05-11 reviewed
    TLX adds MIMW warp-group control to Triton for modern GPUs

    TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments

    Daohang Shi +12

  17. cs.CR 2026-05-11 reviewed
    LLMs automate chip design but create security risks

    LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

    Johann Knechtel +2

  18. cs.CR 2026-05-11 reviewed
    LLMs generate hardware code but introduce security risks

    LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

    Johann Knechtel +2

  19. cs.AR 2026-05-11 reviewed
    Hybrid chip runs GNN at 2.94M events/sec for physics triggers

    Reconfigurable Computing Challenge: Real-Time Graph Neural Networks for Online Event Selection in Big Science

    Fabio Papagno +5

  20. cs.AR 2026-05-11 reviewed
    Error profiles detect stolen approximate circuit IP despite mimicry

    ObfAx: Obfuscation and IP Piracy Detection in Approximate Circuits

    Lukas Sekanina +1

  21. cs.AR 2026-05-11 reviewed
    Piezoelectric sensors turn desk vibrations into six-gesture commands

    Towards an End-To-End System for Real-Time Gesture Recognition from Surface Vibrations

    Andreas Erbslöh +5

  22. cs.AI 2026-05-11 reviewed
    Hardware assertion sets reduced by 76 percent

    Arcane: An Assertion Reduction Framework through Semantic Clustering and MCTS-Guided Rule Exploring

    Hongqin Lyu +4

  23. cs.AR 2026-05-11 reviewed
    LLM agents size RF amplifiers via resource allocation

    RFAmpDesigner: A Self-Evolving Multi-Agent LLM Framework for Automated Radio Frequency Amplifier Design

    Chunyi Song +11

  24. cs.AR 2026-05-10 reviewed
    KV-cache movement regularization cuts static-graph LLM latency spikes

    KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

    Bolun Sun +5

  25. cs.AR 2026-05-10 reviewed
    Wafer integration of three 2D devices decides next computing decade

    Emerging 2D Materials for Beyond von Neumann Computing: A Perspective

    Yaser Banad

  26. cs.CL 2026-05-10 reviewed
    LLM accuracy depends only on evicted tokens

    Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

    Aojie Yuan +2

  27. cs.AR 2026-05-10 reviewed
    ReRAM-on-logic chip reaches 14-136 tokens per second on LLMs

    31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding

    Chi-Ying Tsui +15

  28. quant-ph 2026-05-10 reviewed
    Memoized heuristics scale ion-trap qubit mapping

    Scaling Qubit Mapping and Routing With Position Graph Abstraction and Memoization

    Bao Bach +3

  29. cs.LG 2026-05-09 reviewed
    Apple MPS shows 21x latency spikes in narrow decoding ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  30. cs.LG 2026-05-09 reviewed
    MPS decoding latency spikes up to 21x in narrow ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  31. cs.AR 2026-05-09 reviewed
    New cache bypass method meets deadlines while boosting heterogeneous system speed

    HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

    Anannya Mathur +2

  32. cs.AR 2026-05-09 reviewed
    HyDRA balances accelerator deadlines with cache reuse via clustering

    HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

    Anannya Mathur +2

  33. eess.SP 2026-05-09 reviewed
    Low-complexity denoiser matches heavy mmWave MIMO methods

    Low-Complexity Beamspace Channel Denoiser for mmWave Massive MIMO with Low-Resolution ADCs

    Eunho Kim +2

  34. cs.AR 2026-05-09 reviewed
    Reconfigurable multiplier cuts power 44-68% in RISC-V core

    A Reconfigurable Multiplier Architecture for Error-Resilient Applications in RISC-V Core

    B. Srinivasu +2

  35. cs.AR 2026-05-09 reviewed
    DDR5 single sub-channel matches cache lines but loses 40-60% bandwidth

    Single 32-bit Sub-Channel DDR5 DIMMs: Architecture, Performance Bounds, and Standardisation

    Chih-Hua Ke

  36. cs.AR 2026-05-09 reviewed
    Edge processor hits 109 TFLOPS/W on DeepSeek

    DSPE: An Energy-Efficient Edge Processor for DeepSeek Inference with MerkleTree-based Incremental Pruning, Multi-Stage Boothing Lookup and Dynamic Adaptive Posit Processing

    10) +36

  37. cs.AR 2026-05-09 reviewed
    Coprime test vectors localize faulty rows in systolic arrays after one pass

    FLARE: One-Shot PE-Level Fault Localization in Systolic Arrays via Algebraic Test Vectors

    Logashree Venkatasubramanian +2

  38. cs.AR 2026-05-08 reviewed
    Static checker decides barrier sufficiency for accelerator races

    AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs

    Depei Qian +2

  39. cs.AR 2026-05-08 reviewed
    Model runs 1024-core chip sims 115x faster at under 7% error

    Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling

    Bowen Wang +7

  40. cs.ET 2026-05-08 reviewed
    Plasma simulations need three post-Moore tech tiers

    Post-Moore Technologies for Plasma Simulation: A Community Roadmap

    Ales Podolnik +23

  41. cs.LG 2026-05-08 reviewed
    GNNs for EDA succeed when matched to each task's native algebra

    Graph Computation Meets Circuit Algebra: A Task-Aligned Analysis of Graph Neural Networks for Electronic Design Automation

    Hyunmog Kim

  42. cs.AR 2026-05-08 reviewed
    Bit-hardening methods surpass ECC for reliable DNNs with no memory cost

    Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs

    Jaan Raik +5

  43. cs.AR 2026-05-08 reviewed
    TREA accelerator reduces edge detection latency up to 9x

    TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification

    Mukul Lokhande +4

  44. cs.AR 2026-05-08 reviewed
    Reconfigurable FPU gives up to 8x throughput for low-precision dot products

    TransDot: An Area-efficient Reconfigurable Floating-Point Unit for Trans-Precision Dot-Product Accumulation for FPGA AI Engines

    Ang Li +4

  45. cs.AR 2026-05-07 reviewed
    Open schema and datasets released for ML benchmarks in chip design

    EDA-Schema-V2: A Multimodal Schema, Open Datasets, and Benchmarks for Machine Learning in Digital Physical Design

    Alec Aversa +2

  46. cs.AR 2026-05-07 reviewed
    Agents solve only 37% of practical chip design rule problems

    Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

    Caiwen Ding +4

  47. cs.AR 2026-05-07 reviewed
    CORDIC iteration depth trims 33 percent of inference cycles

    CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning

    Adam Teman +3

  48. cs.AR 2026-05-07 reviewed
    Posit engine cuts ADAS power by 72 percent with near full accuracy

    EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration

    Adam Teman +4

  49. physics.chem-ph 2026-05-07 reviewed
    FPGA YOLOv3-Tiny system detects in 0.211 seconds

    Development of embedded target detection system based on FPGA and YOLOv3-Tiny

    Fanghao Liu +7

  50. cs.CV 2026-05-07 reviewed
    Self-supervised pretraining yields tiny wildfire spotters for satellites

    On-Orbit Real-Time Wildfire Detection Under On-Board Constraints

    Dimitri Scheftelowitsch +8