archive
Every paper Pith has read.
344 papers in cs.AR · page 1
-
Cache reorganization lifts GPU speedups for 28-qubit simulations on laptops
Accelerating State-Vector Quantum Simulation on Integrated GPUs via Cache Locality Optimization: A Cross-Architecture Evaluation
-
Time-domain near-memory MAC reaches 7.62 TOPS/W
Time Domain Near Memory Computing Engine
-
ViTs reach 84% accuracy by replacing layer norm with evolved scalars
Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation
-
End-to-end DVS-memristor system is the missing piece for low-power vision
Memristor Technologies for Dynamic Vision Sensors: A Critical Assessment and Research Roadmap
-
FPGA accelerator skips sparse beams for 2x faster MIMO localization
Efficient Implementation of an Adaptive Transformer Accelerator for Massive MIMO Outdoor Localization
-
7B model surpasses 671B baselines on SVA generation
Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation
-
FPGA lock agents boost OLTP throughput 51x over CPUs
FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration
-
PoisonCap gives CHERI strict use-after-free at zero overhead
PoisonCap: Efficient Hierarchical Temporal Safety for CHERI
-
Block-scale search cuts quantization error 27% in BFP
Search Your Block Floating Point Scales!
-
Joint TLB-cache tweaks boost instruction prefetching 8.7%
Enhancing Instruction Prefetching via Cache and TLB Management
-
FPGA SoC matches silicon SNN accuracy for neuromorphic edge tasks
Heterogeneous SoC Integrating an Open-Source Recurrent SNN Accelerator for Neuromorphic Edge Computing on FPGA
-
Calibration feedback control cuts optimization gaps in local and tight-loop regimes
Runtime Calibration as State-Trajectory Feedback Control in Quantum-Classical Workflows
-
Cumulative updates fix gradient flow in low-power RNNs
Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
-
Dynamic scheduler lifts MoE inference 1.3-1.6x on PIM hardware
Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models
-
TLX gives Triton direct warp-group control for modern GPU hardware
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
-
LLMs automate chip design but create security risks
LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
-
Hybrid chip runs GNN at 2.94M events/sec for physics triggers
Reconfigurable Computing Challenge: Real-Time Graph Neural Networks for Online Event Selection in Big Science
-
Error profiles detect stolen approximate circuit IP despite mimicry
ObfAx: Obfuscation and IP Piracy Detection in Approximate Circuits
-
Piezoelectric sensors turn desk vibrations into six-gesture commands
Towards an End-To-End System for Real-Time Gesture Recognition from Surface Vibrations
-
Semantic clustering cuts hardware assertion sets by 76%
Arcane: An Assertion Reduction Framework through Semantic Clustering and MCTS-Guided Rule Exploring
-
LLM agents size RF amplifiers via resource allocation
RFAmpDesigner: A Self-Evolving Multi-Agent LLM Framework for Automated Radio Frequency Amplifier Design
-
KV-cache movement regularization cuts static-graph LLM latency spikes
KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving
-
Wafer integration of three 2D devices decides next computing decade
Emerging 2D Materials for Beyond von Neumann Computing: A Perspective
-
LLM accuracy depends only on evicted tokens
Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning
-
ReRAM-on-logic chip reaches 14-136 tokens per second on LLMs
31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding
-
Memoized heuristics scale ion-trap qubit mapping
Scaling Qubit Mapping and Routing With Position Graph Abstraction and Memoization
-
Apple MPS shows 21x latency spikes in narrow decoding ranges
Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
-
HyDRA cache bypass meets accelerator deadlines while boosting heterogeneous system throughput
HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators
-
Low-complexity denoiser matches heavy mmWave MIMO methods
Low-Complexity Beamspace Channel Denoiser for mmWave Massive MIMO with Low-Resolution ADCs
-
Reconfigurable multiplier cuts power 44-68% in RISC-V core
A Reconfigurable Multiplier Architecture for Error-Resilient Applications in RISC-V Core
-
DDR5 single sub-channel matches cache lines but loses 40-60% bandwidth
Single 32-bit Sub-Channel DDR5 DIMMs: Architecture, Performance Bounds, and Standardisation
-
Edge processor hits 109 TFLOPS/W on DeepSeek
DSPE: An Energy-Efficient Edge Processor for DeepSeek Inference with MerkleTree-based Incremental Pruning, Multi-Stage Boothing Lookup and Dynamic Adaptive Posit Processing
-
Coprime test vectors localize faulty rows in systolic arrays after one pass
FLARE: One-Shot PE-Level Fault Localization in Systolic Arrays via Algebraic Test Vectors
-
Static checker decides barrier sufficiency for accelerator races
AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs
-
Model runs 1024-core chip sims 115x faster at under 7% error
Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling
-
Plasma simulations need three post-Moore tech tiers
Post-Moore Technologies for Plasma Simulation: A Community Roadmap
-
GNNs for EDA succeed when matched to each task's native algebra
Graph Computation Meets Circuit Algebra: A Task-Aligned Analysis of Graph Neural Networks for Electronic Design Automation
-
Bit-hardening methods surpass ECC for reliable DNNs with no memory cost
Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs
-
TREA accelerator reduces edge detection latency up to 9x
TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification
-
Reconfigurable FPU gives up to 8x throughput for low-precision dot products
TransDot: An Area-efficient Reconfigurable Floating-Point Unit for Trans-Precision Dot-Product Accumulation for FPGA AI Engines
-
Open schema and datasets released for ML benchmarks in chip design
EDA-Schema-V2: A Multimodal Schema, Open Datasets, and Benchmarks for Machine Learning in Digital Physical Design
-
Agents solve only 37% of practical chip design rule problems
Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing
-
Tuning CORDIC iteration depth trims inference cycles by 33%
CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning
-
Posit engine cuts ADAS power by 72% with near-full accuracy
EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration
-
FPGA YOLOv3-Tiny system detects targets in 0.211 seconds
Development of embedded target detection system based on FPGA and YOLOv3-Tiny
-
Self-supervised pretraining yields tiny wildfire spotters for satellites
On-Orbit Real-Time Wildfire Detection Under On-Board Constraints