pith. machine review for the scientific record.

cs.ET

Emerging Technologies

Covers approaches to information processing (computing, communication, sensing) and bio-chemical analysis based on alternatives to silicon CMOS-based technologies, such as nanoscale electronic, photonic, spin-based, superconducting, mechanical, bio-chemical and quantum technologies (this list is not exclusive). Topics of interest include (1) building blocks for emerging technologies, their scalability and adoption in larger systems, including integration with traditional technologies, (2) modeling, design and optimization of novel devices and systems, (3) models of computation, algorithm design and programming for emerging technologies.

cs.ET 2026-05-13 Recognition

Latched memristor cell cuts AI search energy by 33 percent

A Fast and Energy-Efficient Latch-Based Memristive Analog Content-Addressable Memory

Dynamic comparator removes gain and crosstalk limits of earlier designs and supports workload-specific energy-latency tradeoffs.

Figure from the paper
Abstract:
Analog content-addressable memories (aCAMs) based on memristors provide a promising pathway toward energy-efficient large-scale associative computing for Edge AI and embedded intelligence applications. They have been successfully applied to decision-tree inference and extend the capabilities of compute-in-memory (CIM) architectures beyond conventional vector-matrix multiplication. However, conventional designs such as the 6T2M architecture suffer from static search power, limited voltage gain, and pronounced match-line crosstalk, constraining analog precision and scalability. We introduce a strong-arm latched memristor (SALM) aCAM cell that replaces static voltage division with a dynamic current-race comparator, enabling high regenerative gain, intrinsic result latching, and near-zero static search power. Compared to 6T2M, SALM reduces read energy by 33% at identical latency while eliminating the gain and crosstalk limitations that prevent 6T2M from scaling to large arrays. SALM further enables scalable sequential and parallel latch sharing, and a dataset-aware optimization framework exposes an explicit energy-latency tradeoff, achieving up to 50% energy reduction at 3x latency across representative workloads. To enable architectural exploration, we develop a circuit-accurate behavioral model derived from SPICE lookup tables in 22 nm FD-SOI technology, capturing match-line dynamics and crosstalk. Integrated into the X-TIME decision-tree compiler, this framework demonstrates that SALM maintains near-software accuracy for high-dimensional datasets, whereas baseline designs degrade due to limited gain and cumulative crosstalk.
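Circuit specifics aside, the search primitive an aCAM row implements can be sketched functionally: each cell stores an analog acceptance range, and a row's match line asserts only when every input feature falls inside its cell's range — the operation that decision-tree inference maps onto. A minimal sketch with made-up ranges (not device parameters from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
rows, features = 8, 4
# Each aCAM cell stores an analog acceptance range [lo, hi]
# (illustrative values, not device parameters from the paper).
lo = rng.uniform(0.0, 0.5, size=(rows, features))
hi = lo + rng.uniform(0.1, 0.5, size=(rows, features))

def acam_search(x):
    """Boolean match line per stored row: all features must fall in range."""
    return np.all((lo <= x) & (x <= hi), axis=1)

x = rng.uniform(0.0, 1.0, size=features)
match_lines = acam_search(x)
```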
cs.ET 2026-05-11 Recognition

Bayesian optimization cuts CIM chip area by up to 65% for DNN inference

Bayesian Optimization of Crossbar-Based Compute-In-Memory System Design for Efficient DNN Inference

Co-optimizing hardware and algorithm parameters over design spaces as large as 10^27 points maintains accuracy while reducing latency and energy.

Figure from the paper
Abstract:
Leveraging the high density and energy efficiency of Compute-In-Memory (CIM) crossbar-based Deep Neural Network (DNN) accelerators requires optimal Design Space Exploration (DSE), which becomes increasingly challenging as complex models for advanced AI workloads expand the highly non-convex design space. Moreover, heterogeneous layer workloads (e.g., memory- vs. compute-bound) and learning representations make layer-wise NN parameter allocation beneficial for efficiency but severely exacerbate the design space complexity by expanding the number of parameters to be tuned for simultaneous multi-objective optimization. Among existing DSE approaches, multi-objective Bayesian Optimization (BO) is promising, as it explores high-quality design solutions while querying costly CIM simulators selectively. In this work, we propose a multi-objective BO framework that holistically co-optimizes hardware and algorithm parameters of a CIM crossbar-based hardware accelerator for various DNN inference tasks. Depending on NN model depth, our framework handles high-dimensional design spaces (with $26$ and $50$ dimensions) and extremely large search complexities on the order of $O(10^{12})$ and $O(10^{27})$ for VGG8/CIFAR-10 and VGG16/Tiny-ImageNet-200. Our method attains $91.72 \%$ and $57.2 \%$ accuracy, respectively, comparable to baseline designs, while improving chip area ($65.52 \%$ and $50.7 \%$), read latency ($9.52 \%$ and $13.27 \%$), read dynamic energy ($31.23 \%$ and $52.07 \%$) and increasing memory utilization ($13.41 \%$ and $2.67 \%$).
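Setting the BO surrogate aside, the multi-objective bookkeeping behind such a DSE loop reduces to tracking the non-dominated set of evaluated designs. A minimal sketch with hypothetical (area, latency, -accuracy) cost points, all to be minimized:

```python
import numpy as np

def pareto_front(costs):
    """Indices of non-dominated rows (every objective minimized)."""
    keep = []
    for i in range(len(costs)):
        dominated = any(
            np.all(costs[j] <= costs[i]) and np.any(costs[j] < costs[i])
            for j in range(len(costs)) if j != i)
        if not dominated:
            keep.append(i)
    return keep

# hypothetical design points: (chip area, read latency, -accuracy)
designs = np.array([
    [1.0, 1.0, -0.80],
    [0.5, 3.0, -0.91],
    [1.2, 2.5, -0.85],   # dominated by the last point
    [0.9, 1.5, -0.92],
])
front = pareto_front(designs)
```

A BO framework would fit a surrogate per objective and query the simulator only at points expected to extend this front.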
cs.ET 2026-05-11 Recognition

HBR decomposition pins fidelity loss to specific compiler phases

Per-Phase Fidelity Attribution for Quantum Compilers using HBR Decomposition

Routing dominates search circuits while synthesis leads Hamiltonian workloads, and SDK rankings reverse with optimization level.

Figure from the paper
Abstract:
Quantum compilers sit between an algorithm's theoretical promise and what executes on physical hardware. Existing benchmarks report aggregate post-transpilation metrics but cannot attribute where fidelity is lost within the compilation pipeline. We present HBR decomposition, a per-phase fidelity attribution model that quantifies relative fidelity loss across High-level structural decomposition (H), Basis translation (B), and Routing (R). We evaluate three production SDKs (Qiskit, PennyLane, TKET) across eight algorithms on two backend topologies: IBM Heron (heavy-hex) and IonQ Forte (all-to-all). The dominant compiler bottleneck is strongly circuit-class dependent: Routing accounts for up to 60% of relative fidelity loss in search-class circuits, while synthesis dominates Hamiltonian simulation workloads. Early synthesis choices amplify or compress downstream routing overhead depending on circuit connectivity. SDK rankings at diagnostic optimization level (opt=0) reverse at production levels (opt=2) for deep circuits, showing that stagewise diagnostics and production results answer different questions. HBR correctly predicts SDK rank ordering across noisy simulations (8 circuits x 3 SDKs x 2 tiers) and real IBM Fez hardware executions, revealing stage-specific bottlenecks that are not observable through aggregate compiler benchmarks.
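As an illustration of per-phase attribution (the HBR model's exact accounting is in the paper; here we simply assume a multiplicative fidelity model), stagewise fidelities can be converted to additive log-domain losses and normalized into per-phase shares:

```python
import math

# Hypothetical post-phase fidelities for one circuit/SDK/backend combination.
F = {"H": 0.98,   # high-level structural decomposition
     "B": 0.93,   # basis translation
     "R": 0.80}   # routing

total = F["H"] * F["B"] * F["R"]                 # multiplicative model (assumed)
loss = {k: -math.log(f) for k, f in F.items()}   # losses add in the log domain
share = {k: v / sum(loss.values()) for k, v in loss.items()}
```

With these made-up numbers, routing carries the largest share of the loss, mirroring the search-circuit finding.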
cs.ET 2026-05-11 Recognition

Plasma simulations need three post-Moore tech tiers

Post-Moore Technologies for Plasma Simulation: A Community Roadmap

FPGA-class accelerators for near-term offload, non-von Neumann architectures for operator acceleration, and quantum computing for warm dense matter microphysics.

Figure from the paper
Abstract:
Plasma simulations are among the most computationally demanding scientific workloads, combining high-dimensional kinetic evolution, particle-mesh coupling, field solves, and data-intensive communication. As general-purpose processor scaling slows, post-Moore technologies are being explored to address bottlenecks in data movement, memory access, and power consumption. This paper provides a community perspective on the role of these technologies in plasma simulation, assessing three major classes: reconfigurable and data-path accelerators, non-von Neumann architectures, and quantum computing. Each is evaluated, in a co-design approach, against representative plasma workloads spanning particle-in-cell, continuum Vlasov, gyrokinetic, fluid/MHD, hybrid, and warm dense matter methods. We find that no single technology can replace existing HPC platforms. Instead, three tiers of opportunity emerge: FPGA-class and data-path accelerators offer near-term kernel offload and workflow-level data services, non-von Neumann architectures represent medium-term directions for operator-level acceleration, and quantum computing, although the least mature, is potentially the most disruptive for warm dense matter and inertial confinement fusion microphysics. We outline best practices for selective adoption and identify focused demonstrators, benchmarking, and modular software ecosystems as immediate community priorities.
cs.ET 2026-05-06

Transferred parameters raise FALQON Max-Cut ratios on large graphs

Second-Order FALQON Parameter Transfer for the Max-Cut Problem on 3-Regular Graphs

Small-instance tuning permits larger time steps on bigger 3-regular graphs, improving ratios while cutting optimization costs for NISQ use.

Figure from the paper
Abstract:
The Feedback-based Algorithm for Quantum Optimization (FALQON) offers a deterministic alternative to variational quantum algorithms by bypassing classical optimization loops. However, maintaining convergence on large problem instances often requires restricting the time step, necessitating quantum circuit depths that exceed Noisy Intermediate-Scale Quantum (NISQ) hardware capabilities. This paper investigates the parameter transferability of second-order FALQON applied to the Max-Cut problem on 3-regular graphs. Through numerical experiments evaluating quantum circuits up to 16 layers on graphs up to 24 nodes, we demonstrate a highly advantageous scaling behavior: transferring feedback parameters optimized on small instances to larger target graphs yields significantly higher approximation ratios than natively optimizing the parameters directly on the larger graphs. This performance advantage arises because parameters trained on smaller instances can safely adopt aggressively larger time steps. By offloading the expensive parameter discovery phase to small-scale instances, this transfer strategy simultaneously reduces computational overhead and enhances the approximation ratio, thereby bringing FALQON closer to practical viability on near-term quantum architectures.
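FALQON's feedback law can be sketched on the smallest Max-Cut instance (two nodes, one edge) with a plain statevector simulation; the time step and depth below are illustrative, and the recorded betas are exactly the parameters a transfer strategy would replay on larger graphs:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

# 2-node Max-Cut: problem Hamiltonian Hp = Z(x)Z, driver Hd = X(x)I + I(x)X.
Hp = np.diag([1.0, -1.0, -1.0, 1.0]).astype(complex)
Hd = np.kron(X, I2) + np.kron(I2, X)
A_obs = 1j * (Hd @ Hp - Hp @ Hd)           # feedback observable i[Hd, Hp]

def rx(phi):                                # single-qubit exp(-i*phi*X)
    return np.cos(phi) * I2 - 1j * np.sin(phi) * X

dt, layers = 0.1, 30                        # assumed time step and depth
psi = np.full(4, 0.5, dtype=complex)        # |++> initial state
beta, betas, costs = 0.0, [], []
for _ in range(layers):
    psi = np.exp(-1j * dt * np.diag(Hp).real) * psi      # problem evolution
    psi = np.kron(rx(beta * dt), rx(beta * dt)) @ psi    # driver evolution
    costs.append(float(np.real(psi.conj() @ Hp @ psi)))
    betas.append(beta)
    beta = -float(np.real(psi.conj() @ A_obs @ psi))     # feedback law
```

Lower `costs` means a larger cut; no classical optimizer appears anywhere in the loop, which is the appeal of the feedback construction.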
cs.ET 2026-05-06

Symmetries let classical nonlinear systems compute in parallel

Symmetry-induced quantum-inspired parallelism of classical dynamic systems

V-2 spin networks evaluate AND/OR gates and N-bit adders simultaneously without needing linear superposition.

Abstract:
Performing multiple computations within the same system, without spatial or temporal separation of tasks, requires encoding multiple data items into a well-defined physical state. The most widely explored mechanism for such encoding is the superposition of physical states representing computational states. However, superposition requires the system to be linear, which significantly limits the set of achievable operations. We show that system symmetries provide an alternative mechanism for encoding multiple computational states. Notably, this mechanism also applies to nonlinear systems and therefore does not impose inherent limits on computed functions. Using the evaluation of Boolean functions as an example, we show that a relaxed spin network driven by the V-2 model supports this mechanism. We relate the resulting simultaneous computations enabled by symmetry-induced parallelism to properties of the evaluated functions. We demonstrate symmetry-induced parallelism for a logical AND/OR gate and an N-bit adder.
cs.ET 2026-05-06

SIMs boost ISAC energy efficiency up to 230% with fewer antennas

Resource Allocation and AoI-Aware Detection for ISAC with Stacked Intelligent Metasurfaces

Puncturing-based decomposition meets mixed traffic and sensing needs while cutting transmit hardware requirements.

Figure from the paper
Abstract:
Stacked intelligent metasurfaces (SIMs) provide wave-domain degrees of freedom that can empower integrated sensing and communication (ISAC) through flexible beampattern synthesis and interference management, while reducing hardware cost. In this paper, we investigate energy-efficient resource allocation for a downlink SIM-aided multi-user ISAC system that supports the coexistence of enhanced mobile broadband (eMBB) and ultra-reliable and low-latency communication (URLLC) via puncturing, while simultaneously illuminating sensing targets. We formulate an energy efficiency (EE) maximization problem that jointly optimizes resource block (RB) allocation, transmit power control, and SIM phase shifts. The formulated problem is highly challenging due to the large number of variables optimized on different time scales. To overcome this, we leverage the intrinsic two-timescale structure induced by the puncturing approach to decompose the original problem into two tractable subproblems: EE maximization for eMBB users in each time slot and EE maximization for URLLC users and sensing targets in each mini-slot. To address each subproblem, we develop an iterative algorithm that transforms the original non-convex formulation into a sequence of tractable subproblems, yielding convex updates for RB allocation and power control, along with low-complexity updates for SIM phase shifts. Simulation results show that the proposed design achieves up to 230% improvement in EE over a No-SIM baseline. In addition, it requires significantly fewer transmit antennas than conventional BS architectures, while preserving the EE achieved and satisfying the communication and sensing quality of service (QoS) requirements. Moreover, the results reveal fundamental trade-offs between EE and heterogeneous QoS requirements across communication and sensing functionalities.
cs.ET 2026-05-04

Framework quantifies AI memory energy costs

Analytic Framework for Estimating Memory Cost

Analytic estimates link data-center memory use in large models to environmental impact and guide lower-cost designs.

Figure from the paper
Abstract:
As artificial intelligence (AI) models quickly spread and become more advanced, they require an ever-increasing amount of data and compute capability, leading to a significant energy cost. Training and inference of AI models, including large language models (LLMs) and deep neural networks (DNNs), contribute to a large carbon footprint owing to the massive amount of memory they consume in data centers. In this article, we present a generalized framework that quantifies the energy costs these models impose on the environment. This framework provides a foundational quantification of AI's ecological footprint, facilitating the development of sustainable architectural strategies for future models.
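The flavor of such an analytic estimate: multiply a model's memory traffic by an assumed per-byte access energy. All numbers below are illustrative assumptions, not values from the paper:

```python
params = 7e9                     # assumed 7B-parameter model
bytes_per_param = 2              # fp16 weights
tokens = 1000                    # tokens generated in one request
dram_energy_per_byte = 100e-12   # joules/byte, assumed DRAM access cost

# worst case: every generated token streams all weights from DRAM once
bytes_moved = params * bytes_per_param * tokens
energy_joules = bytes_moved * dram_energy_per_byte   # ~1.4 kJ per request
```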
cs.ET 2026-05-01

Tighter quantum integration cuts energy waste

Energy-Aware Quantum-Enhanced Computing Continuum

A three-layer continuum with shared fiber and cryogenic logic targets lower energy per solved problem in hybrid systems.

Figure from the paper
Abstract:
We discuss a Quantum-Enhanced Computing Continuum, a heterogeneous, hybrid architecture that integrates quantum processing units (QPUs) within an Edge-Cloud-HPC fabric. It promotes sustainability by shifting the focus from raw performance to "energy-aware integration." The architecture has three layers: a Physical Layer with shared fiber-optic infrastructure, a Control and Orchestration Layer managed by the user, and an Application Layer with an Adaptive Quantum Classical Fusion (AQCF) framework. Tighter system integration, such as moving from cloud coupling to cryogenic logic, reduces energy waste and thermal footprints. The aim is a Green Performance Advantage: minimizing energy per problem solved in the era of Advanced Computing.
cs.ET 2026-05-01

Unified protocol proposed to standardize synthetic biological intelligence

Synthetic Biological Intelligence: System-Level Abstractions and Adaptive Bio-Digital Interaction

Survey reframes living neural networks plus hardware as a single interaction system and outlines shared steps for encoding, decoding, system engineering, and benchmarking.

Figure from the paper
Abstract:
Concurrent advances across fields such as organoid technology, Microelectrode Arrays (MEAs), neuromorphic computing, and machine learning have given rise to a groundbreaking research paradigm: Synthetic Biological Intelligence (SBI). SBI refers to engineered systems in which living Biological Neural Networks (BNNs) are interfaced with hardware and software to perform task-oriented information processing in a closed loop. This cutting-edge technology, while still in its infancy, has the potential to deliver highly efficient performance across both computing capabilities and energy consumption. The early stage of this field underscores the need for reliable multi-scale and cross-domain interaction interfaces to support applications in robotics, biomedicine, signal processing, and neuroscience research. The lack of commercially available SBI platforms has hitherto slowed development, as producing a testbed is expensive and cumbersome. The introduction of standardized, platform- and cloud-integrated BNNs has been a crucial catalyst for the scientific community, improving the accessibility of SBI and leading the way to further developments. In this survey, we summarize the innovations that contributed to the emergence of SBI and the first testbed interfaces that enabled its embodiment. This work reframes SBI as a bio-digital interaction system and introduces a unified protocol across encoding, decoding, system engineering, and benchmarking.
cs.ET 2026-04-29

Unified framework shrinks gaps between neutral-atom compilers

Practical Insights into Fair Comparison and Evaluation Frame for Neutral-Atom Compilers

Consistent metrics and RSQASM representation make previously reported performance differences substantially smaller or absent.

Figure from the paper
Abstract:
Neutral-atom quantum computing is among the most promising platforms for scalable quantum computation, and compilation toolchains are crucial for leveraging capabilities such as qubit shuttling and parallel gate execution. An important challenge, however, is that existing neutral-atom compilers are often evaluated using metrics computed over different parts of the toolchain and under non-equivalent assumptions. Consequently, fair quantification and comparison of compiler performance remain difficult. Reported metrics may depend on inconsistent transpilation optimization levels, different movement-duration models, different sets of considered fidelity sources, and even minor implementation bugs or undocumented representation choices. To address this problem, we present a unified and reproducible evaluation framework for neutral-atom compilers. Our framework introduces RSQASM (Routed and Scheduled QASM), a QASM-inspired post-compilation representation that captures mapped, routed, and scheduled circuits, including explicit parallel gate execution and shuttling operations. As part of the framework, we provide adapter scripts that translate existing compiler outputs and intermediate artifacts into RSQASM. As a case study, we compare three well-known neutral-atom compilation toolchains: HybridMapper, DasAtom, and Enola, motivated by the large performance differences reported in prior work. Using our framework and representation, we perform a new evaluation and show that several previously claimed performance gaps become substantially smaller and, in some cases, are not reproduced once evaluation inconsistencies are removed.
cs.ET 2026-04-28

IoE unifies people, data, and things for 6G automation

Internet of Everything in the 6G Era: Paradigms, Enablers, Potentials and Future Directions

A review outlines how this integration can boost efficiency in cities, healthcare, and industry while tackling scalability and security.

Figure from the paper
Abstract:
The Internet of Everything (IoE) represents an evolution of the Internet of Things (IoT) by integrating people, data, processes, and things into a unified intelligent ecosystem. IoE aims to enhance automation, decision-making, and service efficiency across multiple application domains such as smart cities, healthcare, industry, and next-generation wireless networks. This paper provides a structured overview of the IoE concept, its core components, architectural foundations, enabling technologies, and major research challenges. Finally, open research directions toward 6G-enabled intelligent IoE systems are discussed, with emphasis on scalability, security, privacy, and energy efficiency.
cs.ET 2026-04-28

Repository blockchain turns fork chains into trees for single-process access

A Tree-Based Repository Blockchain Framework for Shared Governance in Collaborative Fork Ecosystems

Navigation replaces inter-blockchain communication so one process reaches every block across hard forks.

Figure from the paper
Abstract:
Collaborative blockchain ecosystems allow diverse groups to cooperate on tasks while providing properties such as decentralization and transaction security. We provide a model that uses a repository blockchain to manage hard forks within a collaborative system such that a single process (assuming that it has knowledge of the requirements of each fork) can access all of the blocks within the system. The repository blockchain replaces the need for Inter Blockchain Communication (IBC) within the ecosystem by navigating the networks. The resulting construction resembles a tree instead of a chain. A proof-of-concept implementation performs a depth-first search on the new structure.
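The tree-shaped construction and its single-process depth-first traversal can be sketched in a few lines; the names below are illustrative, not the paper's implementation:

```python
# Sketch of a repository blockchain as a tree of hard forks: each block
# keeps one parent, forks create branches, and a single process reaches
# every block by depth-first navigation instead of IBC.
class Block:
    def __init__(self, data, parent=None):
        self.data, self.parent, self.children = data, parent, []
        if parent:
            parent.children.append(self)

def dfs(block, visit):
    visit(block)
    for child in block.children:
        dfs(child, visit)

genesis = Block("genesis")
a = Block("fork-A", genesis)
b = Block("fork-B", genesis)
a2 = Block("A.2", a)

seen = []
dfs(genesis, lambda blk: seen.append(blk.data))
```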
cs.ET 2026-04-28

RF broadcasts remotely program spintronic network weights

Remotely programming the weights of a spintronic neural network by a radiofrequency broadcast signal

Shared signals selectively flip vortex cores in MTJ chains to reconfigure one 22-synapse device for digits or drone tasks.

Figure from the paper
Abstract:
Selectively programming a large number of non-volatile synaptic weights without compromising scalability is a key challenge for in-memory computing. Here, we demonstrate remote programming of synaptic weights in series-connected chains of 11 vortex-based magnetic tunnel junctions using broadcast radiofrequency signals applied through a shared strip line. The programming relies on frequency-selective reversal of the vortex-core polarity and therefore does not require individual access lines or selector devices. By reconfiguring the binary states of these chains, we reshape the weighted sums they perform on frequency-multiplexed RF inputs. Using a 22-synapse network composed of two such chains, we remotely reconfigure the same hardware to perform two distinct tasks: handwritten-digit classification and drone RF-signature identification. The digit-optimized configuration reaches 94.91 +/- 0.26% accuracy on handwritten digits but only 13.17 +/- 0.47% on drone RF signatures, whereas the drone-optimized configuration reaches 97.33 +/- 0.62% on drones but only 47.59 +/- 1.5% on digits. Broadcast RF programming thus provides a compact and scalable route to rapidly reconfigurable spintronic neuromorphic hardware.
cs.ET 2026-04-27

Quantum annealer plays tic-tac-toe from rules alone

Playing Dice with the Universe: Programming Quantum Computers to Play Traditional Games

Encoding only the game rules into its Hamiltonian lets the device sample moves that raise the chance of winning.

Figure from the paper
Abstract:
The challenge of programming classical computers to play traditional, competitive games against human players has helped to advance classical hardware and software. Quantum computers have the potential to play games in a unique way: programmed only with the rules of a game, they should be able to implicitly represent all future paths of a game leading to wins, losses, or draws, and sample from this path set to identify moves that maximize the likelihood of a win. This permits skilled play without hard-coded or machine-learned strategy. As a proof of principle, we present early results obtained after programming the D-Wave quantum annealer with the rules of tic-tac-toe, enabling it to play against a human opponent. We anticipate that, as it has for classical computers, game-playing will serve as an important real-world benchmark for quantum computers.
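The sampling idea can be emulated classically: encode only the rules, enumerate every legal continuation, and pick the move whose continuations contain the largest share of wins, with no hard-coded strategy. A sketch with the annealer's sampling replaced by exact enumeration:

```python
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != "." and b[i] == b[j] == b[k]:
            return b[i]
    return None

@lru_cache(maxsize=None)
def outcomes(board, player):
    """Count (X wins, O wins, draws) over all rule-legal continuations."""
    w = winner(board)
    if w:
        return (1, 0, 0) if w == "X" else (0, 1, 0)
    moves = [i for i, c in enumerate(board) if c == "."]
    if not moves:
        return (0, 0, 1)
    totals = [0, 0, 0]
    for m in moves:
        nb = board[:m] + player + board[m+1:]
        for t, c in enumerate(outcomes(nb, "O" if player == "X" else "X")):
            totals[t] += c
    return tuple(totals)

def best_move(board, player):
    """Pick the move whose continuations contain the largest share of wins."""
    moves = [i for i, c in enumerate(board) if c == "."]
    def score(m):
        nb = board[:m] + player + board[m+1:]
        x, o, d = outcomes(nb, "O" if player == "X" else "X")
        return (x if player == "X" else o) / (x + o + d)
    return max(moves, key=score)
```

From the empty board this recovers the classic count of 255,168 possible games; an annealer would approximate the same path statistics by sampling from the rule-encoding Hamiltonian rather than enumerating.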
cs.ET 2026-04-27

MTJ memory integrates stochastic computing to skip external random generators

Maximizing Memory-Level Parallelism via Integrated Stochastic Logic-in-Memory Architectures

Parallel bit-stream conversion and arithmetic happen inside the memory arrays, cutting data movement and energy use for intensive workloads.

Figure from the paper
Abstract:
Today's high-performance architectures are increasingly constrained by data movement latency and energy overhead, as the slowdown of single-core performance scaling coincides with the rise of highly data-intensive workloads. In-memory architectures have emerged as a complementary solution to conventional von Neumann systems by alleviating memory bandwidth bottlenecks, exploiting massive concurrency, and mitigating excessive data movement between memory and processing units. This study proposes a parallel in-memory stochastic computing (SC) architecture that implements an end-to-end computation pipeline within Magnetic Tunnel Junction (MTJ)-based memory augmented with logic-in-memory (LIM) capabilities. By leveraging the inherent stochasticity and write-read characteristics of MTJ devices, the proposed architecture enables a fully parallel and deterministic conversion of binary operands into probabilistic bit-streams, eliminating the need for energy-intensive external random number generation circuitry. These bit-streams are processed by parallel stochastic arithmetic units integrated directly within the memory arrays to efficiently implement core arithmetic and transcendental functions with minimal hardware complexity and inherent noise tolerance. The resulting stochastic outputs can be either reused as an input of future stochastic processing or converted back to binary form using parallel accumulation mechanisms and stored in the MTJ memory. By tightly integrating data storage, bit-stream generation, and computation within a unified in-memory fabric, the proposed design maximizes memory-level parallelism while substantially minimizing data movement.
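The core stochastic-computing primitive is easy to sketch in software: encode operands as random bit-streams whose density of 1s equals the value, then multiply with a bitwise AND. In the proposed architecture the stream generation would come from MTJ write stochasticity inside the array; here an ordinary PRNG stands in:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000                       # bit-stream length (illustrative)

def to_stream(p):
    """Unipolar SC encoding: bit i is 1 with probability p.
    An MTJ array would generate these bits in-memory; a PRNG stands in."""
    return rng.random(N) < p

a, b = 0.6, 0.5
prod = to_stream(a) & to_stream(b)   # SC multiply = bitwise AND
estimate = prod.mean()               # close to a * b = 0.30
```

The accuracy/latency trade is the stream length N, which is why massive in-memory parallelism matters for this computing style.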
cs.ET 2026-04-27

One fungal network device senses 14 stimuli

Mycoponically Integrated Network Device for Multimodal Sensing with Living Mycelial Networks

Bioelectrical responses follow conserved Hill-type curves across species and recover from damage for months of use.

Figure from the paper
Abstract:
Living mycelial filaments integrate chemical, optical, mechanical, thermal, and biological information via electrophysiological cellular trans-membrane potential. The challenge is to create a mycology interface that sustains metabolism, standardizes electrode geometry, and tolerates mechanical damage. Using mycoponics, we overcome the factors that limited prior demonstrations to single modalities and operational windows of days to weeks. We present MIND, an engineered biophysical interface integrating antimicrobial nutrient delivery (ceramic size exclusion) with non-invasive electrophysiology, in cylindrical (MINDTube) and planar (MINDPixel) form-factors. The platform sustains colonized Pleurotus ostreatus mycelium beyond 11 months and distinguishes 14 stimulus classes from a single unmodified device. Steady-state intensity responses follow Hill-type calibration functions across five phylogenetically diverse fungi grown on the identical interface, making strain selection a tunable design parameter. Multichannel decoding from the standardized electrode geometry recovers stimulus duration, location, and trajectory. Continuous nutrition provided by mycoponics recovered complete electrophysiological function within 72 h after mechanical excision. MIND converts living mycelium networks into universal, self-repairing biosensors.
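A Hill-type calibration curve of the kind fitted here takes one line; the parameters below are illustrative, not fitted values from the paper:

```python
import numpy as np

def hill(I, r_max, K, n):
    """Steady-state response vs. stimulus intensity I (Hill-type curve)."""
    I = np.asarray(I, dtype=float)
    return r_max * I**n / (K**n + I**n)

intensity = np.array([0.1, 1.0, 10.0])
response = hill(intensity, r_max=1.0, K=1.0, n=2.0)   # half-maximal at I = K
```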
cs.ET 2026-04-24

HDC operations realized as coherent wave behaviors

A wave-geometric duality for hyperdimensional computing

Unitary mapping turns bundling into superposition and binding into nonlinear mixing, with FDTD validation in coupled arrays.

Figure from the paper
Abstract:
Hyperdimensional computing (HDC), also referred to as vector symbolic architectures (VSA), represents information with high-dimensional vectors and a compact algebra of primitives. This paper establishes an explicitly unitary embedding from discrete bipolar HDC/VSA vectors to coherent broadband waveforms and develops a common wave-domain realization of the core HDC/VSA primitives within that embedding. Under the resulting RFC/UWE stack, bundling becomes linear superposition, permutation becomes coherent phase evolution, binding is reproduced by nonlinear spectral mixing together with an engineered aliasing step that restores circular-convolution structure, and similarity is recovered as a calibrated differential-power readout. Full-wave FDTD studies validate the physically nontrivial parts of this program, including array-level readout in a mutually coupled setting and the binding pipeline under realistic propagation. In a documented $N=1000$ mutually coupled-array calibration, the predicted interaction effect appears with the expected sign pattern and order of magnitude, yielding a coupled Correlation Contrast Ratio of approximately $8.7 \times 10^{-5}$. The result is a wave-geometric duality for HDC/VSA: existing symbolic operations admit a physically grounded waveform realization, while coherence, isolation, and readout sensitivity remain the central engineering constraints for future hardware.
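The discrete HDC/VSA primitives the paper maps onto waveforms can be stated compactly in the bipolar algebra. Note this sketch uses elementwise-product binding (a common VSA choice) rather than the circular-convolution structure the wave realization restores:

```python
import numpy as np

rng = np.random.default_rng(7)
D = 10_000                       # hypervector dimensionality

def rand_hv():
    return rng.choice([-1, 1], size=D)

def bundle(*vs):
    """Majority-vote superposition of bipolar hypervectors."""
    return np.sign(np.sum(vs, axis=0))

def bind(u, v):
    """Elementwise-product binding (self-inverse for bipolar vectors)."""
    return u * v

def sim(u, v):
    """Cosine-similarity readout."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

a, b, c = rand_hv(), rand_hv(), rand_hv()
s = bundle(a, b, c)              # s stays similar to each of a, b, c
```

In the wave-domain realization, `bundle` becomes linear superposition and `sim` a calibrated differential-power readout.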
cs.ET 2026-04-21

Photonic chip exceeds 1,000 TOPS on matrix math

Homodyne Photonic Tensor Processor exceeds 1,000-TOPS

Time-multiplexed homodyne circuit encodes and computes with light at 120 Gbaud while using foundry packaging.

Abstract:
High-performance computing underpins modern artificial intelligence (AI), enabling foundation models, real-time inference and perception in autonomous systems, and data-intensive scientific simulations. Recent advances in quantization techniques, which utilize low-precision computation without degrading model accuracy, create new opportunities for analog photonic computing characterized by ultra-high clock rates and low energy consumption. Here we propose and demonstrate a coherent homodyne integrated circuit capable of general matrix multiplication (GEMM) with aggregate throughput that exceeds 1,000 TOPS (tera-operations per second), enabled by massive on-chip optical fanout and parallelism. By leveraging time multiplexing, the required modulator count is reduced from O($N^2$) to O($N$), allowing dense integration of record-scale 256 $\times$ 256 homodyne units (each <0.0064 mm$^2$) within a single reticle. We employ 64 wafer-scale fabricated thin-film lithium niobate (TFLN) transmitters (each with over 40-GHz bandwidth and propagation loss of 0.2 dB/cm) to encode data, chip-to-chip coupled to Si/SiN computing circuits (64 channels). Our system achieves up to 7-bit computational accuracy across 8 $\times$ 8 parallel channels at a record computing clock rate of 120 Gbaud, and 6-bit statistical accuracy across 256 $\times$ 100 channels at 20-128 Gbaud, representing a total throughput of 1,000-6,000 TOPS. Massive parallelism amortizes the optoelectronic (OE) conversion, allowing 330-TOPS/W efficiency using foundry-available packaging technology. The system throughput is benchmarked with Qwen2.5 0.5-billion-parameter models that generate accurate tokens. High throughput and energy efficiency establish a near-term pathway toward light-based accelerators for large-scale training and low-latency inference from datacenters to edges, accelerating new models toward artificial general intelligence.
cs.ET 2026-04-21

Equal inductors turn bridged-T network into high-pass filter

Scattering-Matrix-Based Parametric Characterization of a Two-Port Bridged-T Network for Microstrip Filter Applications

Canceling even terms in the S11 numerator produces sharp roll-off at 1 GHz with -30 dB/GHz slope.

Figure from the paper
abstract
The purpose of this study is to characterize a two-port bridged-T network using transmission (T) and scattering (S) matrices. Using mathematical derivations, the scattering parameters S11, S12, S21, and S22 have been derived from the T and S matrices to permit a detailed investigation of the network's performance. As two of the most relevant parameters in the design of microstrip filters, both the magnitude and phase of S11 and S21 have been parametrically calculated after normalizing the frequency. Furthermore, when the inductors L1 and L2 are identical, all even coefficients of the numerator polynomial in the S11 transfer function are eliminated, leaving only the odd coefficients behind. Based on this feature, the bridged-T circuit is designed to operate as a high-pass filter. Accordingly, the magnitude and phase of both S11 and S21 have been simulated for the designed filter with a corner frequency of 1 GHz. Simulations performed in Keysight ADS show that S11 and S21 for the high-pass filter built upon the bridged-T network have sharp roll-off ratios of -30 dB/GHz and -32 dB/GHz respectively.
cs.ET 2026-04-21

Symmetry mapping slashes QAOA qubit count while keeping full accuracy

EQE-QAOA: An Equivalence-Preserving Qubit Efficient Framework for Combinatorial Optimization

Invariant subspace equivalence lets the algorithm run on fewer qubits and still reach the identical optimal solutions for symmetric problems

Figure from the paper
abstract
The limited number of qubits is a major bottleneck in Quantum Approximate Optimization Algorithm (QAOA) for large-scale combinatorial optimization in the Noisy Intermediate-Scale Quantum (NISQ) era. To make progress, existing techniques rely on qubit reduction at the cost of information loss, hence leading to degraded computational performance. As a remedy, we propose the Equivalence-preserving Qubit Efficient QAOA (EQE-QAOA), which significantly reduces the required number of qubits without degrading the performance of QAOA. By exploiting intrinsic symmetries and conserved quantities, we first demonstrate that the QAOA dynamics are strictly confined to an invariant subspace of the Hilbert space. We subsequently prove that the evolution within this subspace is exactly equivalent to that of the full-scale system, achieving the same optimal solution as the original QAOA. Moreover, to reduce the number of qubits, we propose an isometric mapping that re-encodes the subspace into a space relying on fewer qubits. Furthermore, we derive the applicability conditions of EQE-QAOA and show that it is broadly applicable to large-scale combinatorial optimization problems, excluding only unconstrained problems with completely independent variables. Numerical simulations based on Max-Cut instances validate that EQE-QAOA significantly reduces qubit requirements and computational resources, while preserving exact optimization performance.
cs.ET 2026-04-21

This paper surveys the roles of UAVs as relays

UAVs as Dynamic Nodes in Communication Networks

UAVs serve multiple dynamic roles in wireless networks, and a novel UAV-Network-in-a-Box architecture is proposed for emergency temporary…

Figure from the paper
abstract
Driven by the demands of 5G/Beyond-5G and 6G networks, Unmanned Aerial Vehicles (UAVs) have emerged in critical roles for aerial communications. In this survey, we explore the multi-mode roles of UAVs as relays, User Equipment (UE), gNBs, and Reconfigurable Intelligent Surfaces (RIS), along with their deployment scenarios, architectural frameworks, and different communication models incorporating Artificial Intelligence (AI) configurations. We consider the effects of alternative power sources on the communication payload. The survey also addresses security issues in UAV communications. Finally, we propose a novel UAV-Network-in-a-Box (NIB) architecture for disaster recovery and temporary coverage as an alternative to traditional network infrastructure.
cs.ET 2026-04-20

Inertia term lets Ising machines update all spins in parallel

A fully parallel densely connected probabilistic Ising machine with inertia for real-time applications

Modified dynamics improve success rates on dense problems and deliver average 35x speedups for 200-spin instances

Figure from the paper
abstract
Ising machines -- special-purpose hardware for heuristically solving Ising optimization problems -- based on probabilistic bits (p-bits) have been established as a promising alternative to heuristic optimization algorithms run on conventional computers. However, it has -- until now -- been thought that Ising spins that are connected in probabilistic Ising machines cannot be updated in parallel without ruining the machine's solving ability. This has been a major challenge for using probabilistic Ising machines as fast solvers for densely connected problems. Here, we circumvent this by introducing a modified Ising spin dynamics with an added inertia term, and verify in algorithm simulations, FPGA hardware emulation, and FPGA experiments that it enables fully parallel, synchronous updates while improving rather than degrading success probability. We evaluated on various types of abstract (Max-Cut and Sherrington-Kirkpatrick-model) and application-derived (MIMO wireless detection) dense Ising benchmark instances. Performing fully parallel updates results in a speed advantage that grows faster than linearly with the number of spins, giving rise to large time-to-solution improvements for practical problem sizes. For both Max-Cut and the SK-1 model at a problem size of 200, our approach achieved an average speedup of $\approx 35\times$, with the best single-instance speedup reaching $150\times$. As an example of the practical utility of our approach in an application where speed is critical, we further show by co-designing the algorithm dynamics with the hardware implementation -- co-optimizing for solver ability and silicon resource usage -- that probabilistic Ising machines based on our approach satisfy the stringent solution quality and latency/throughput requirements for real-time MIMO detection in modern 5G cellular wireless networks while using a practically reasonable silicon area.
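The parallel-update dynamics the abstract describes can be sketched in a few lines; the inertia enters here as a simple self-bias term that pulls each spin toward its previous state. This is an illustrative formulation on a toy instance, not the paper's exact dynamics, parameters, or hardware mapping:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# small random symmetric coupling matrix (a toy Sherrington-Kirkpatrick instance)
J = rng.normal(size=(n, n)) / np.sqrt(n)
J = (J + J.T) / 2.0
np.fill_diagonal(J, 0.0)

def ising_energy(s):
    return -0.5 * s @ J @ s

beta, alpha = 2.0, 1.0  # inverse temperature and inertia strength (illustrative)
s = rng.choice([-1.0, 1.0], size=n)
e0 = ising_energy(s)
best = e0
for _ in range(500):
    # fully parallel, synchronous p-bit update: all spins sample at once;
    # the alpha*s term adds inertia, biasing each spin toward its last state
    field = J @ s + alpha * s
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
    s = np.where(rng.random(n) < p_up, 1.0, -1.0)
    best = min(best, ising_energy(s))
```

Without some damping, synchronous updates on densely coupled instances tend to oscillate between spin configurations; a self-bias of this kind is one way to suppress that, which matches the intuition behind the reported success-rate gains.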
cs.ET 2026-04-20

QAOA cuts energy use in route optimization by three orders of magnitude

Potential Energy Savings from Quantum Computing-Based Route Optimization

Benchmarks on graphs up to 20 nodes show higher quality solutions than classical methods with potential annual savings of 2.62 EJ of fuel in the U.S.

Figure from the paper
abstract
We investigate the potential of the Quantum Approximate Optimization Algorithm (QAOA) for reducing energy consumption in route planning, a key challenge in logistics due to the NP-hard nature of the Traveling Salesman and Vehicle Routing Problems. By encoding route optimization as a Quadratic Unconstrained Binary Optimization (QUBO) problem and implementing QAOA circuits at depth p = 3-5 alongside classical baselines of Simulated Annealing (SA) and Genetic Algorithms (GA), we perform systematic benchmarks on Euclidean graphs of sizes N = 5, 10, and 20. Our results demonstrate that QAOA attains higher solution quality with approximation ratios of 0.953 (N = 5), 0.921 (N = 10), and 0.903 (N = 20), outperforming SA and GA by 2.7-4.4%. Wall-clock runtimes for QAOA are 2-3x faster than SA across all tested sizes, and energy consumption measurements reveal a three-order-of-magnitude reduction, remaining in the picojoule range versus nanojoules for classical methods. Translating these gains to real-world logistics suggests an 8.2% improvement in routing efficiency could save approximately 2.62 EJ of fuel annually in the U.S., avoiding nearly 1.94 x 10^8 tonnes of CO2 emissions. These findings highlight QAOA's promise as a fast, energy-efficient optimizer for sustainable logistics applications and underscore its potential role in next-generation fleet-management systems.
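For readers unfamiliar with the QUBO encoding step, a minimal sketch for a 3-city TSP is below: one-hot variables x[c][p] ("city c at tour position p"), a tour-length objective, and quadratic one-hot penalties. The distances and penalty weight are illustrative choices, and brute force stands in for the QAOA/SA/GA solvers benchmarked in the paper:

```python
import itertools
import math

# toy 3-city instance; A is the constraint penalty and must dominate
# the largest possible tour cost so infeasible bitstrings lose
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
n = len(coords)
d = [[math.dist(a, b) for b in coords] for a in coords]
A = 10.0

def energy(x):
    # x[c][p] = 1 iff city c is visited at tour position p
    e = 0.0
    # tour length over consecutive positions, wrapping around
    for p in range(n):
        for c1 in range(n):
            for c2 in range(n):
                e += d[c1][c2] * x[c1][p] * x[c2][(p + 1) % n]
    # one-hot penalties: each city exactly once, each position exactly once
    for c in range(n):
        e += A * (sum(x[c]) - 1) ** 2
    for p in range(n):
        e += A * (sum(x[c][p] for c in range(n)) - 1) ** 2
    return e

# exhaustive search over all 2^(n*n) bitstrings (fine for n = 3)
best = min(
    ([list(bits[i * n:(i + 1) * n]) for i in range(n)]
     for bits in itertools.product([0, 1], repeat=n * n)),
    key=energy,
)
```

With this penalty weight, the minimum-energy bitstring is a valid permutation and its energy equals the optimal tour length.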
cs.ET 2026-04-20

Control systems all compute, even purely mechanical ones

When does a control system compute? Digital, mechanical and open-loop systems

ART shows the plant acts as the representational user of the controller in thermostats, governors and open-loop systems alike.

Figure from the paper
abstract
Control systems are ubiquitous in modern technology, comprising an engineered plant to be kept within specific, often fine-tuned, limits, and a separate controller that ensures this is the case. While modern controllers often employ digital computers, other examples are purely mechanical, or even biological. It is an open question whether computation is happening within all controllers by virtue of them being part of a control system. Abstraction/Representation theory (ART) has been developed to tackle just this question of whether a physical system is computing. Here, we demonstrate how to use ART to model control systems, and analyse them for computational properties. We determine that the plant of a control system is (a proxy for) the representational entity necessary in ART for the existence of any computation: the plant is the user of the controller. We consider specific systems: a digital thermostat, an electro-mechanical thermostat, the purely mechanical centrifugal governor, and an open-loop human-controlled heating system. We show that all these systems, and control systems in general, are performing some degree of computation. As an initial use of these results, we apply them to computationalism within cognitive theory: we show the governor is computing, so it cannot play its role of counter-example in the question of whether the brain is computing, too.
cs.ET 2026-04-20

Scholar bots match Senior Lecturer level in academic tasks

The Relic Condition: When Published Scholarship Becomes Material for Its Own Replacement

Distilling reasoning from published papers lets LLMs handle supervision, peer review and panels at expert-rated quality, prompting calls for

abstract
We extracted the scholarly reasoning systems of two internationally prominent humanities and social science scholars from their published corpora alone, converted those systems into structured inference-time constraints for a large language model, and tested whether the resulting scholar-bots could perform core academic functions at expert-assessed quality. The distillation pipeline used an eight-layer extraction method and a nine-module skill architecture grounded in local, closed-corpus analysis. The scholar-bots were then deployed across doctoral supervision, peer review, lecturing and panel-style academic exchange. Expert assessment involved three senior academics producing reports and appointment-level syntheses. Across the preserved expert record, all review and supervision reports judged the outputs benchmark-attaining, appointment-level recommendations placed both bots at or above Senior Lecturer level in the Australian university system, and recovered panel scores placed Scholar A between 7.9 and 8.9/10 and Scholar B between 8.5 and 8.9/10 under multi-turn debate conditions. A research-degree-student survey showed high performance ratings across information reliability, theoretical depth and logical rigor, with pronounced ceiling effects on a 7-point scale, despite all participants already being frontier-model users. We term this the Relic condition: when publication systems make stable reasoning architectures legible, extractable and cheaply deployable, the public record of intellectual labor becomes raw material for its own functional replacement. Because the technical threshold for this transition is already crossed at modest engineering effort, we argue that the window for protective frameworks covering disclosure, consent, compensation and deployment restriction is the present, while deployment remains optional rather than infrastructural.
cs.ET 2026-04-20

Bacterial growth curves classify nonlinear patterns as reservoirs

What Makes a Bacterial Model a Good Reservoir Computer? Predicting Performance from Separability and Similarity

Simulations of multiple species and mutants show high accuracy tied to differences in state-matrix ranks, pointing to living systems for low

Figure from the paper
abstract
Biological systems are promising substrates for computation because they naturally process environmental information through complex internal dynamics. In this study, we investigate whether bacterial metabolic models can act as physical reservoirs and whether their computational performance can be predicted from dynamical properties linked to separability and similarity. We simulated the growth dynamics of five bacterial species, one yeast species, and 29 Escherichia coli single-gene deletion mutants using dynamic flux balance analysis (dFBA), with glucose and xylose concentrations as inputs and growth curves as reservoir states. Computational performance was assessed on random nonlinear classification tasks using a linear readout, while reservoir properties linked to separability and similarity were characterised through kernel and generalisation ranks computed from growth-curve state matrices. Several microbial models achieved high classification accuracy, showing that bacterial metabolic dynamics can support nonlinear computation. Clear differences were observed between species, with some models converging more rapidly and others reaching higher maximum accuracy, revealing a trade-off between convergence speed and peak performance. In contrast, all E. coli mutants were dominated by the wild-type model, suggesting that gene deletions reduce the dynamical richness required for efficient computation. The difference between kernel and generalisation ranks was generally associated with improved accuracy, but deviations across models and sensitivity at low rank values limited its predictive power in practice. Overall, these results show that bacterial metabolic models constitute promising substrates for reservoir computing and provide a first step towards identifying microbial strains with favourable computational properties for future experimental implementations.
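The separability measures the abstract relies on can be illustrated with a toy reservoir: the kernel rank is the rank of the state matrix produced by distinct inputs, while the generalisation rank is the same quantity for near-identical inputs. Random weights and a tanh map stand in for the dFBA growth dynamics, and the rank tolerance is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
n_res, n_in, n_samples = 50, 5, 10
W = rng.normal(size=(n_res, n_in))  # random input weights of a toy reservoir

def states(U):
    # one nonlinear reservoir state vector per input column of U
    return np.tanh(W @ U)  # shape (n_res, n_samples)

# kernel rank: distinct inputs should yield linearly independent states
U_distinct = rng.normal(size=(n_in, n_samples))
kernel_rank = np.linalg.matrix_rank(states(U_distinct), tol=1e-6)

# generalisation rank: near-identical inputs should collapse onto few directions
u0 = rng.normal(size=(n_in, 1))
U_similar = u0 + 1e-9 * rng.normal(size=(n_in, n_samples))
gen_rank = np.linalg.matrix_rank(states(U_similar), tol=1e-6)
```

A large gap between the two ranks indicates a reservoir that separates genuinely different inputs while remaining insensitive to noise, which is the property the paper links to classification accuracy.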
cs.ET 2026-04-17

Molecule mixtures estimate source distance via degradation ratios

Source Distance Estimation in Turbulent Airflow: Exploiting Molecule Degradation Diversity

Different degradation rates produce distinguishable relative abundances at the receiver, enabling low-complexity distance estimation in real

Figure from the paper
abstract
In nature, estimating the location of a molecule source in turbulent airflow is a central, and yet highly challenging problem for mate search and foraging. Recently, it has also received increasing attention in synthetic molecular communication (SMC), e.g., for leakage detection. One important aspect of source localization is to estimate the distance to the molecule source, e.g., to determine whether it is worth to travel to a potential mating partner or food source, or to decide whether a leak is close enough for inspection. In this study, based on realistic simulations, we show that the diversity induced by molecule mixtures can aid source localization. In particular, when different molecule types in a mixture are subject to atmospheric degradation with different degradation rates, the relative abundance of the different species observed at the receiver enables low-complexity estimation of the source distance. Furthermore, this feature can be combined with already established concentration-based and temporal features of observed molecular signals to further increase estimation accuracy. Thereby, we show that molecule degradation diversity of molecule mixtures can help to realize one of the important envisioned SMC applications, namely source localization, even in turbulent airflow, opening new opportunities for the exploitation of SMC to solve real-world problems.
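The core estimator is simple enough to sketch: under an idealized plug-flow model (an assumption for illustration; the paper uses realistic turbulent simulations), two species decaying at rates k1 and k2 reach the receiver with a relative abundance that depends only on travel time, so the observed ratio can be inverted for distance. All parameter values below are illustrative:

```python
import math

# two molecule types released together decay at different rates while
# being carried at mean speed v; their concentration ratio at distance x
# then encodes x
k1, k2 = 0.8, 0.2      # degradation rates [1/s] (illustrative)
v = 2.0                # mean airflow speed [m/s] (illustrative)
c01, c02 = 1.0, 1.0    # released amounts

def observed_ratio(x):
    t = x / v  # travel time to the receiver
    return (c01 * math.exp(-k1 * t)) / (c02 * math.exp(-k2 * t))

def estimate_distance(r):
    # invert r = (c01/c02) * exp(-(k1 - k2) * x / v)
    return -v * math.log(r * c02 / c01) / (k1 - k2)
```

For example, a source 5 m away produces ratio exp(-1.5), and the estimator recovers the 5 m exactly in this noise-free setting; the paper's contribution is showing the feature remains informative under turbulence.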
cs.ET 2026-04-16

CMOS-integrated sMTJs create tunable p-bits

CMOS-integrated superparamagnetic tunnel junction-based p-bit

Resistance fluctuations in the junctions produce controllable probabilistic digital voltages in standard 130 nm technology.

Figure from the paper
abstract
Probabilistic computers offer promising solutions for computationally hard problems in domains such as combinatorial optimization and machine learning. A key building block in these systems is the probabilistic bit (p-bit), which relies on superparamagnetic tunnel junctions (sMTJs) as its source of randomness. A challenging threshold to cross for scaling sMTJ-based p-bit systems is integration of sMTJs with CMOS technology. In this work, we present experimental results of a p-bit unit cell using sMTJs integrated with 130 nm CMOS technology and demonstrate that the sMTJ's resistance fluctuations can generate a corresponding fluctuating digital output voltage which is tunable via the input voltage. These findings establish the feasibility of CMOS-compatible, sMTJ-based probabilistic circuits and mark a key step toward scalable hardware for real-world probabilistic computing applications.
cs.ET 2026-04-16

IGZO FeFET NOR cells reach 0.016 µm² SRAM-like area and sub-5 ns access

DTCO Exploration of NOR-Type IGZO FeFETs for Read-Dominated Memories

DTCO analysis shows the devices fit read-heavy AI memory but need positive-Vt fixes to control sneak-current penalties.

abstract
InGaZnO (IGZO) channel FeFETs have attracted notable interest thanks to their advances in endurance. This work evaluates the viability of NOR-type IGZO FeFETs for read-centric AI inference workloads via design-technology co-optimization (DTCO). We demonstrate the cross-node bitcell footprint scalability of back-end-of-line (BEOL) IGZO FeFETs capable of delivering 10-A SRAM-equivalent area (0.016 µm²) with 7-nm ground rules and reaching sub-5 ns random access latency despite writability challenges. We further identify the sensing margin penalty in NOR FeFET arrays arising from sneak current associated with the negative program-state Vt, which requires positive-Vt engineering in order to eliminate the unwanted negative-voltage read inhibition - for example, by ferroelectric layer thinning. Last but not least, we elucidate the read margin implications for 3D FeNOR in storage-class memories (SCMs), with the 3D stacking density limited by additional sneak current from neighbor channel shunting.
cs.ET 2026-04-15

DNA inference at nanonodes stabilizes alarms for weak anomalies

Embedded DNA Inference in In-Body Nanonetworks: Detection, Delay, and Communication Trade-Offs

Simulations locate a bounded regime where local logic improves detection without raising communication costs over simpler methods.

Figure from the paper
abstract
In-body molecular nanonetworks promise early abnormality detection close to the source of biochemical events, but their communication capabilities are severely constrained by slow diffusion-based signaling and unstable alarm traffic. We study whether simple embedded DNA-based inference at the nanonode can improve alarm transmission to an external gateway. We compare raw reporting (RR), single-marker threshold reporting (TR), and embedded inference reporting (EIR) under a communication-oriented abstraction of DNA strand-displacement-based computation with marker gating, edge-triggered alarming, hysteretic state transitions, temporally correlated marker dynamics, diffusion-based alarm transport, and leaky gateway evidence integration. The simulations identify a bounded EIR success regime in the weak-to-moderate anomaly range: EIR can improve detection relative to RR and TR while remaining competitive in event-driven communication cost, especially relative to RR. The gain does not come from uniformly lower activity, but from more stable local alarm dynamics. EIR does not dominate globally; TR often remains cheaper when abnormalities are present, and EIR incurs additional local delay. These results point to a limited operating regime in which EIR is useful, rather than to a general advantage across settings.
cs.ET 2026-04-15

Hybrid photonic system accelerates matrix multiplies with tunable precision

LightMat-HP: A Photonic-Electronic System for Accelerating General Matrix Multiplication With Configurable Precision

Block floating-point slicing overcomes noise limits and delivers gains over GPUs and FPGAs for smaller matrices.

Figure from the paper
abstract
Matrix multiplication is a fundamental kernel in large-scale artificial intelligence and scientific computing, but its performance on conventional electronic accelerators is increasingly constrained by memory bandwidth and energy efficiency. Photonic computing offers a promising alternative due to its ultra-high bandwidth, massive parallelism, and low power dissipation. However, most existing photonic systems are limited to low-precision computation because of analog optical modulation constraints and noise accumulation, which restricts their applicability in precision-critical workloads. To address this limitation, we propose LightMat-HP, a hybrid photonic-electronic computing system that enables end-to-end acceleration of general matrix multiplication with configurable computational precision. LightMat-HP adopts block floating-point (BFP) arithmetic to reduce computational complexity while enabling flexible precision-performance tradeoffs. To overcome the precision limitations of photonic devices, we propose a slicing-based photonic multiplication scheme that exploits the high accuracy of low bit-width photonic multiplication in combination with digital accumulation to achieve high-precision mantissa multiplication. A tile-based matrix multiplication dataflow is further designed to support matrices of arbitrary sizes. We experimentally validate LightMat-HP on a photonic computing prototype and evaluate its performance through large-scale simulations. The results demonstrate that LightMat-HP outperforms FPGA, GPU, and a state-of-the-art photonic accelerator across throughput, latency, and energy efficiency, particularly for small- and medium-sized matrix multiplications, owing to its highly parallel photonic architecture, efficient data movement, and slice-based BFP arithmetic.
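The slicing idea, evaluating a wide multiplication as several narrow ones that a low-precision analog core can handle accurately and then recombining the partial products digitally with shifts, can be sketched for unsigned 8-bit mantissas. Signs and the shared block exponent of BFP arithmetic are assumed to be handled in the digital domain and are omitted here:

```python
def sliced_mul(a, b, bits=8, slice_bits=4):
    # exact 8-bit x 8-bit product from narrow partial products recombined
    # with digital shifts; only the slice_bits-wide multiplies would be
    # delegated to the low-precision photonic core in this scheme
    assert 0 <= a < 2 ** bits and 0 <= b < 2 ** bits
    mask = (1 << slice_bits) - 1
    a_lo, a_hi = a & mask, a >> slice_bits
    b_lo, b_hi = b & mask, b >> slice_bits
    return (((a_hi * b_hi) << (2 * slice_bits))
            + ((a_hi * b_lo + a_lo * b_hi) << slice_bits)
            + (a_lo * b_lo))
```

Because each partial product needs only 4-bit operands, it stays within the accuracy budget of the analog stage, while the digital accumulation restores full mantissa precision.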
cs.ET 2026-04-14

Optimization lifts HDC accuracy 48 percent on nonlinear hardware

Robust Reasoning and Learning with Brain-Inspired Representations under Hardware-Induced Nonlinearities

Minimizing the gap between ideal and hardware kernels in hypervector encoding keeps classification and graph reasoning reliable on CIM chips

Figure from the paper
abstract
Traditional machine learning depends on high-precision arithmetic and near-ideal hardware assumptions, which is increasingly challenged by variability in aggressively scaled semiconductor devices. Compute-in-memory (CIM) architectures alleviate data-movement bottlenecks and improve energy efficiency yet introduce nonlinear distortions and reliability concerns. We address these issues with a hardware-aware optimization framework based on Hyperdimensional Computing (HDC), systematically compensating for non-ideal similarity computations in CIM. Our approach formulates encoding as an optimization problem, minimizing the Frobenius norm between an ideal kernel and its hardware-constrained counterpart, and employs a joint optimization strategy for end-to-end calibration of hypervector representations. Experimental results demonstrate that our method, when applied to QuantHD, achieves 84% accuracy under severe hardware-induced perturbations, a 48% increase over naive QuantHD under the same conditions. Additionally, our optimization is vital for graph-based HDC reliant on precise variable-binding for interpretable reasoning. Our framework preserves the accuracy of RelHD on the Cora dataset, achieving a 5.4x accuracy improvement over naive RelHD under nonlinear environments. By preserving HDC's robustness and symbolic properties, our solution enables scalable, energy-efficient intelligent systems capable of classification and reasoning on emerging CIM hardware.
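The calibration objective the abstract names, minimizing the Frobenius norm between an ideal kernel and its hardware-distorted counterpart, can be illustrated with the simplest possible case: a single scale factor, which has a closed-form least-squares solution. The toy tanh distortion and the one-parameter family are illustrative; the paper optimizes full hypervector encodings:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(20, 512))  # 20 bipolar hypervectors

K_ideal = X @ X.T                # ideal dot-product similarity kernel
K_hw = np.tanh(0.005 * K_ideal)  # toy saturating "hardware" distortion

# one-parameter calibration: choose s to minimize ||K_ideal - s*K_hw||_F,
# whose least-squares solution is s* = <K_ideal, K_hw> / <K_hw, K_hw>
s = np.sum(K_ideal * K_hw) / np.sum(K_hw * K_hw)
err_raw = np.linalg.norm(K_ideal - K_hw)
err_cal = np.linalg.norm(K_ideal - s * K_hw)
```

Even this one-degree-of-freedom fit strictly reduces the kernel mismatch; the paper's joint optimization generalizes the same objective to the encoding itself.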
cs.ET 2026-04-13

Classical solvers beat D-Wave on wedding seating optimization

Entangled happily ever after: Wedding reception seating mapped to classical and quantum optimizers

A guest arrangement task mapped from protein design methods shows Monte Carlo finding optima the quantum annealer misses.

Figure from the paper
abstract
Although optimization is one of the most promising applications of quantum computers, the development of effective optimization strategies requires real-world test cases. When planning our recent wedding reception, we realized that the problem of optimally seating our guests, given constraints related to guests' relatedness, shared interests, and physical needs, could be mapped to a cost function network (CFN) form solvable with classical or quantum optimization algorithms. We compared the seating optimization performance of classical Monte Carlo CFN solvers in the Masala software suite to that of quantum annealing-based CFN optimization algorithms using one-hot, domain-wall, and approximate binary mappings, which we had developed for protein design problems. Surprisingly, the D-Wave Advantage 2 system, which performs well on similarly-structured CFN problems for protein design, struggled to return optimal seating arrangements that were easily found by classical Monte Carlo methods. We provide our seating optimization benchmark set, and code to convert seating optimization problems to CFN problems, as a plugin library for Masala, permitting this class of real-world problems to be used to benchmark performance of current and future classical CFN solvers, quantum optimization algorithms, and quantum computing hardware.
cs.ET 2026-04-13

Roadside LiDAR audits intersection near-misses with human checks

Roadside LiDAR for Cooperative Safety Auditing at Urban Intersections: Toward Auditable V2X Infrastructure Intelligence

New York City data shows the system cuts track fragmentation and false TTC triggers while revealing lateral conflicts missed by vehicle-only

Figure from the paper
abstract
Urban intersections expose the limitations of single-vehicle perception under occlusion and partial observability. In this study, we present an auditable roadside LiDAR framework for infrastructure-assisted safety analysis at a signalized urban intersection in New York City, developed and evaluated using real-world data. The proposed framework integrates trajectory construction, iterative human-in-the-loop quality assurance (QA), and interpretable near-miss analytics to produce defensible safety evidence from infrastructure sensing. Using a human-labeled heavy vehicle--bicycle interaction as an anchor case, we show that direction-agnostic time-to-collision (TTC) drops below 1s, while longitudinal TTC remains above conservative braking thresholds, revealing a lateral-intrusion-dominated conflict mechanism. Beyond individual cases, continuous-window evaluation and multi-round QA analysis demonstrate that the framework systematically reduces failure modes such as track fragmentation, spurious TTC triggers, unstable geometry, and cross-lane false conflicts. These results position roadside LiDAR as a practical post-hoc auditing mechanism for cooperative perception systems, with broader statistical validation discussed. This work provides a pathway toward scalable, data-driven safety auditing of urban intersections, enabling transportation agencies to identify and mitigate high-risk interactions beyond crash-based analyses.
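As context for the anchor case, a direction-agnostic TTC can be computed from relative kinematics alone, as gap distance divided by closing speed, which is why a lateral intrusion can drive it below 1 s while a purely longitudinal TTC stays high. A minimal sketch (the paper's exact TTC definitions and conflict logic may differ):

```python
import math

def direction_agnostic_ttc(p1, v1, p2, v2):
    # time-to-collision from relative kinematics: distance between two
    # road users divided by the rate at which that distance is shrinking;
    # returns math.inf when the users are not closing on each other
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    dist = math.hypot(rx, ry)
    closing = -(rx * vx + ry * vy) / dist  # positive when the gap is shrinking
    return dist / closing if closing > 0 else math.inf
```

A cyclist 10 m away closing at 5 m/s yields a 2 s TTC regardless of approach direction, whereas a purely longitudinal metric would ignore the lateral component entirely.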
cs.ET 2026-04-13 2 theorems

Agentic AI halves sensor use in wearable health monitors

Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence

Joint gating and learned sparse sampling raise accuracy 1.9 percent while cutting sensor activity 48.8 percent across three datasets

Figure from the paper
abstract
Edge-based multimodal medical monitoring requires models that balance diagnostic accuracy with severe energy constraints. Continuous acquisition of ECG, PPG, EMG, and IMU streams rapidly drains wearable batteries, often limiting operation to under 10 hours, while existing systems overlook the high temporal redundancy present in physiological signals. We introduce Adaptive Multimodal Intelligence (AMI), an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses differentiable Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Sigma-Delta operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model built on unimodal foundation encoders and a cross-modal transformer with temporal context, enabling robust fusion even under gated or missing inputs. These components are trained jointly via a multi-objective loss combining classification accuracy, sparsity regularization, cross-modal alignment, and predictive coding. AMI is hardware-aware, supporting dynamic computation graphs and masked operations, leading to real energy and latency savings. Across MHEALTH, HMC Sleep, and WESAD datasets, it reduces sensor usage by 48.8% while improving state-of-the-art accuracy by 1.9% on average.
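Component (1)'s differentiable gate can be sketched as a logistic-noise (Gumbel-style) relaxation of a Bernoulli on/off decision per sensor. This is a generic formulation with illustrative logits; in the paper the controller conditions the gating on model confidence and task relevance:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_sigmoid(logits, tau=0.5):
    # relaxed Bernoulli gate: add Logistic(0, 1) noise to the logits and
    # squash; as tau -> 0 the soft gate approaches a hard 0/1 decision
    # while staying differentiable in the logits during training
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(logits))
    noise = np.log(u) - np.log(1.0 - u)  # Logistic(0, 1) sample
    return 1.0 / (1.0 + np.exp(-(logits + noise) / tau))

# one soft on/off decision per sensor modality (toy learned relevances)
logits = np.array([2.0, -2.0, 0.0, 1.0])
soft_gates = gumbel_sigmoid(logits)
# at inference a straight-through hard threshold decides which sensors run
hard_gates = (soft_gates > 0.5).astype(float)
```

Sensors whose hard gate is 0 can be powered down for that window, which is where the reported 48.8% reduction in sensor activity comes from.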
cs.ET 2026-04-13

Physical forces added to quantum docking QUBO raise accuracy

A Physically-Informed Subgraph Isomorphism Approach to Molecular Docking Using Quantum Annealers

Coulomb, van der Waals, H-bond and hydrophobic terms correct geometric graph matching on D-Wave annealers

Figure from the paper
abstract
Molecular docking is a crucial step in the development of new drugs as it guides the positioning of a small molecule (ligand) within the pocket of a target protein. In the literature, a feasibility study explored the potential of D-Wave quantum annealers for purely geometric molecular docking, neglecting physicochemical interactions between the protein and the ligand and focusing solely on their simplified geometries. To achieve this, the ligands were represented as graphs incorporating their geometric properties and then mapped onto a grid that discretized the three-dimensional space of the protein pocket. The quality of the ligand pose on the protein pocket was evaluated through the isomorphism between the ligand graph and the spatial grid. This paper builds on the previous study by introducing physicochemical interactions between the protein-ligand pair into the QUBO problem to improve the accuracy of the docking results. This paper presents a novel QUBO formulation that includes Coulomb and van der Waals forces, together with components representing H-bond and hydrophobic interactions. We integrate these physical interactions as corrective terms to the previous purely geometric QUBO formulation, and provide experimental results using the D-Wave quantum annealers to demonstrate their impact on the accuracy of the docking results.
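The physical corrective terms have standard closed forms; a toy per-pair score combining Coulomb and Lennard-Jones (van der Waals) contributions is sketched below. The unit-conversion constant is the commonly used value for charges in elementary units and distances in Angstroms, the LJ parameters are illustrative, and the paper additionally includes H-bond and hydrophobic terms:

```python
import math

def pair_energy(q1, q2, r, eps=0.5, sigma=1.0):
    # Coulomb term: 332.06 converts (e, Angstrom) units to kcal/mol
    coulomb = 332.06 * q1 * q2 / r
    # Lennard-Jones term: repulsive r^-12 core plus attractive r^-6 tail,
    # with well depth eps at separation 2^(1/6) * sigma
    sr6 = (sigma / r) ** 6
    lj = 4.0 * eps * (sr6 * sr6 - sr6)
    return coulomb + lj
```

In the paper's setting, terms of this kind are precomputed for candidate ligand placements and folded into the QUBO as corrective biases on the geometric formulation.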
cs.ET 2026-04-10

Post-routing tool hits 100% timing closure in AQFP circuits

qPRO-AQFP: Post-Routing Optimization of AQFP Circuits with Delay Line Clocking

Reduces path-balancing buffers by 34% at a 4% frequency cost across common benchmarks

Adiabatic Quantum-Flux-Parametron (AQFP) logic is an ultra-low-power superconducting logic family with energy consumption approaching the Shannon limit, making it attractive for quantum computing control and cryogenic computing systems. Traditional AQFP designs face significant physical design challenges due to strict gate-level clocking requirements and limited interconnect lengths, leading to substantial buffer overhead and difficult timing closure. Recently, delay-line clocking of AQFP has been proposed to improve timing margins and reduce latency by enabling more flexible clock scheduling. However, prior work has primarily focused on placement and latency minimization, while relying on fixed timing parameters that do not capture the frequency dependence of AQFP setup and hold constraints. To address this limitation, we propose a frequency-aware post-routing optimization framework that jointly optimizes clock period, latency, and timing slack under user-specified weighting. Experimental results across common benchmarks achieve 100% post-routing timing closure across a range of performance-latency-slack trade-offs. Our approach also automates phase-skipping, reducing path-balancing buffer insertion by 34% on average while only reducing operating frequency by 4%.
cs.ET 2026-04-10

Analytical model gives wrapped-normal response for pulsatile MC channels

Analytical Modeling of Dispersive Closed-loop MC Channels with Pulsatile Flow

Closed-loop molecular communication under heartbeat-driven flow exhibits time-varying statistics that repeat with each pulse.

Molecular communication (MC) is a communication paradigm in which information is conveyed through the controlled release, propagation, and reception of molecules. Many envisioned healthcare applications of MC are expected to operate inside the human body. In this environment, the cardiovascular system (CVS) acts as the physical channel, which forms a closed-loop network where particle transport is mainly governed by the combined effects of diffusion and flow. Despite the fact that physiological flows in many parts of the human body are inherently pulsatile due to the cardiac cycle, most existing models for dispersive closed-loop MC channels assume a constant flow velocity. In this paper, we present a time-variant one-dimensional (1D) channel model for dispersive closed-loop MC systems with pulsatile flow. We derive an analytical expression for the channel impulse response (CIR), which follows a wrapped normal distribution with time-variant mean and variance. The obtained model reveals the cyclostationary nature of the channel and quantifies the influence of pulsation on the temporal concentration profile compared to steady-flow systems. Finally, the model is validated by three-dimensional (3D) particle-based simulations (PBSs), showing excellent agreement and enabling an efficient analytical characterization of the channel.
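The wrapped normal CIR can be evaluated numerically by summing Gaussian images over winding numbers around the loop. The loop length, mean, and variance below are illustrative constants, not the paper's time-variant expressions:

```python
import math

def wrapped_normal_pdf(x, mu, sigma, L, K=50):
    """Wrapped normal density on a closed loop of length L (truncated image sum)."""
    total = 0.0
    for k in range(-K, K + 1):
        z = (x + k * L - mu) / sigma
        total += math.exp(-0.5 * z * z)
    return total / (sigma * math.sqrt(2.0 * math.pi))

# Toy closed-loop channel: loop length 1.0; mean and std are illustrative.
L = 1.0
grid = [i / 1000 for i in range(1000)]
pdf = [wrapped_normal_pdf(x, mu=0.3, sigma=0.1, L=L) for x in grid]
area = sum(pdf) * (L / 1000)   # Riemann sum over one loop, approximately 1
```

Because the density is periodic in the loop coordinate, integrating it over one loop length recovers unit probability mass.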
cs.ET 2026-04-09

Longer drone routes can use less energy than shorter ones

Energy-Efficient Drone Logistics for Last-Mile Delivery: Implications of Payload-Dependent Routing Strategies

Payload changes after each delivery make shortest-distance paths suboptimal for total energy use in many cases.

Drone delivery is rapidly emerging as a cost-effective and energy-efficient alternative for last-mile delivery. Unlike ground vehicles, a drone's energy consumption depends on its payload in addition to travel distance. This creates a unique environmental challenge for multi-stop delivery tours, as the drone's total weight, and therefore its energy consumption rate, dynamically changes after each delivery. This paper investigates a novel green drone routing problem focused on maximizing energy efficiency. Through a series of motivating examples and numerical experiments, we demonstrate that energy-aware routing leads to several counter-intuitive routing strategies that contradict traditional distance-minimizing delivery: a longer route may actually consume less energy than a shorter one; separate single-customer tours can be superior to a multi-stop tour; and a heterogeneous fleet, with drones of varying sizes, can achieve greater efficiency by matching drone capacity to specific delivery demands. In the numerical study, the green routing strategy shows energy savings in 67% of the instances. For these cases, the average energy saving is 2.17%, with a maximum saving of 5.97%, compared to minimum distance routing. These findings highlight the potential for green drone routing strategies to improve the sustainability of last-mile delivery.
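The counter-intuitive findings follow from any energy model in which the per-leg cost scales with the current payload. A toy linear model (not the paper's drone power model; weights and positions are made up) already reproduces the effect that a longer set of routes can use less energy:

```python
import math

def route_energy(route, drops, alpha=1.0, m_empty=1.0):
    """Toy model: each leg costs alpha * (m_empty + current payload) * leg length.
    route: (x, y) waypoints, depot first and last; drops: payload left per stop."""
    payload = sum(drops)
    energy = dist = 0.0
    for (p, q), drop in zip(zip(route, route[1:]), drops + [0.0]):
        d = math.hypot(q[0] - p[0], q[1] - p[1])
        energy += alpha * (m_empty + payload) * d
        dist += d
        payload -= drop
    return energy, dist

depot, heavy, light = (0, 0), (1, 0), (0, 1)   # heavy: 5 units, light: 1 unit
tour_e, tour_d = route_energy([depot, heavy, light, depot], [5.0, 1.0])
split_e = sum(route_energy([depot, c, depot], [w])[0]
              for c, w in [(heavy, 5.0), (light, 1.0)])
split_d = sum(route_energy([depot, c, depot], [w])[1]
              for c, w in [(heavy, 5.0), (light, 1.0)])
```

Here the multi-stop tour is shorter (2 + sqrt(2) vs. 4 distance units) yet costs more energy than two single-customer tours, because the heavy package is carried on an extra leg.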
cs.ET 2026-04-09 2 theorems

FR3 outperforms FR2 at cell edges with equal antenna sizes

FR3 for 6G Networks: A Comparative Study against FR1 and FR2 Across Diverse Environments

Ray-tracing across urban settings shows mmWave path loss outweighs array gains for cell-edge users.

Motivated by increasing wireless capacity demands and 6G advancements, the newly defined Frequency Range 3 (FR3, 7.125-24.25 GHz), also known as the upper mid-band, has emerged as a promising spectrum candidate. It offers a balance between the large bandwidth potential of millimeter-wave bands and the favorable propagation characteristics of sub-6 GHz bands. As a result, the upper mid-band presents a strong opportunity to enhance both coverage and capacity, particularly for 6G systems and Cellular Vehicle-to-Base Station (C-V2B) communications. Harnessing this potential, however, requires addressing key technical challenges through accurate and realistic channel modeling across diverse urban environments, including Suburban, Urban, and HighRise Urban scenarios. To this end, we employ a ray-tracing tool to characterize downlink propagation and enable detailed channel modeling for reliable C-V2B links. We evaluate data rate performance across FR1 (sub-6 GHz), FR3, and FR2 (mmWave) bands using antenna array configurations designed for different urban environments. The results show that, under equal aperture sizes, FR3 achieves higher data rates than FR2 for cell-edge User Equipment (UEs) in both interference-free and full-interference scenarios, indicating that the additional array gain at mmWave is insufficient to fully compensate for the severe path loss experienced. Integrating a one-hand-grip pedestrian UE model into the ray tracer shows that transitioning from vehicular to pedestrian UEs results in negligible differences in coverage probability (about 1%-3%) across all frequencies, with the minimum differences observed in FR3, particularly at 8.2 GHz.
cs.ET 2026-04-09

Constraint-aware QAOA raises feasible VRP solution rates over standard version

Improving Feasibility in Quantum Approximate Optimization Algorithm for Vehicle Routing via Constraint-Aware Initialization and Hybrid XY-X Mixing

Local one-hot initialization and hybrid XY-X mixing increase valid-route share in both ideal and noisy simulations.

The Quantum Approximate Optimization Algorithm (QAOA) is a leading framework for quantum combinatorial optimization. The Vehicle Routing Problem (VRP), a core problem in logistics and transportation, is a natural application target, but it poses a major feasibility challenge for standard QAOA because feasible solutions occupy only a tiny fraction of the search space, and the conventional Pauli-$X$ mixer can disrupt partial solution structures that satisfy key local constraints. To address this issue, we propose a constraint-aware QAOA framework with two complementary components. First, we design a lightweight initialization strategy that encodes a selected subset of simple yet informative local one-hot constraints into the initial state, thereby reducing the initial superposition space and increasing the probability mass on states with important local structure. Second, we introduce a hybrid XY-$X$ mixer that preserves the constraint structure imposed at initialization while retaining exploratory flexibility over the remaining unconstrained degrees of freedom during QAOA evolution. We evaluate the proposed framework against standard QAOA under three progressively more realistic regimes: ideal statevector simulation, finite-shot sampling, and noisy finite-shot sampling. Across all regimes, the proposed method consistently achieves lower average energy and higher feasible-solution ratios than standard QAOA, indicating more effective guidance toward structurally valid, lower-cost VRP solutions. However, the performance gap narrows in the noisy regime. Because this setting adopts a hardware-inspired error model based on near-best-reported laboratory-level qubit gate and readout fidelities, the observed attenuation suggests that the practical advantage of the more structured mixer is likely to grow as quantum hardware improves and error rates decline.
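The feasibility challenge is easy to quantify: with n customers each one-hot encoded over k slots, valid strings occupy only a (k/2^k)^n fraction of the search space. A small enumeration check of that closed form (generic one-hot counting, not the paper's exact VRP encoding):

```python
from itertools import product

def feasible_fraction(n, k):
    """Fraction of all n*k-bit strings where every k-bit customer block is one-hot."""
    count = 0
    for bits in product((0, 1), repeat=n * k):
        blocks = [bits[i * k:(i + 1) * k] for i in range(n)]
        count += all(sum(b) == 1 for b in blocks)
    return count / 2 ** (n * k)

frac = feasible_fraction(2, 3)    # exhaustive count over 64 strings
closed_form = (3 / 2 ** 3) ** 2   # (k / 2^k)^n
```

For example, n = k = 5 already gives (5/32)^5, roughly 9e-5, which is why constraint-aware initialization and XY mixing pay off.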
cs.ET 2026-04-09 2 theorems

Spintronic CiM arrays keep temperature uniform with linear activity scaling

Computing In Spintronic Memory: A Thermal Perspective

Lateral heat spreading prevents hotspots, while temperature grows with the number of active cells, shrinks with array size, and depends on the memory technology.

Computing-in-Memory (CiM) is a promising paradigm to address the memory bottleneck constraining traditional systems. Most power-efficient CiM variants can directly perform Boolean operations in non-volatile memory arrays. Higher microarchitectural activity due to CiM, however, can significantly increase power density (power per area) and result in thermal hotspots. In this paper, we provide a quantitative thermal characterization for CiM. We demonstrate that (i) the temperature remains mostly uniform due to lateral thermal conduction; (ii) the temperature increases linearly with the number of memory cells participating in computation; (iii) the temperature decreases linearly with the memory array size; (iv) the memory technology dictates the power density, hence the thermal characteristics.
cs.ET 2026-04-08 Recognition

Quantized QRC readout cuts memory 81% with <1% accuracy loss

Late Breaking Results: Hardware-Efficient Quantum Reservoir Computing via Quantized Readout

Fixed untrained quantum circuit plus 6-bit classical layer matches FP32 forecasting on Tetouan power data.

Due to rising electricity demand, accurate short-term load forecasting is increasingly important for grid stability and efficient energy management, particularly in resource-constrained edge settings. We present a hardware-efficient Quantum Reservoir Computing (QRC) framework based on a fixed, untrained quantum circuit with Chebyshev feature encoding, brickwork entanglement, and single- and two-qubit Pauli measurements, avoiding quantum backpropagation entirely. Using the Tetouan City Power Consumption dataset, we examine the effect of post-training fixed-point quantization on the classical readout layer, with the reservoir architecture selected through a genetic search over 18 candidate configurations. Under finite-shot evaluation, 8-bit and 6-bit quantization maintain forecasting accuracy within 1% of the FP32 baseline while reducing readout memory by 75% and 81%, respectively. These results suggest that quantized readout can improve the hardware efficiency and deployment practicality of QRC for memory-constrained energy forecasting.
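The readout-quantization step is plain post-training fixed-point rounding of the classical linear layer. A sketch with a random toy weight matrix (the memory savings are just bit-width arithmetic versus FP32, matching the abstract's 75% and 81% figures; the weights are not from the paper):

```python
import numpy as np

def quantize(w, bits):
    """Symmetric post-training fixed-point quantization of readout weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 1))           # toy FP32 readout layer
results = {}
for bits in (8, 6):
    q, scale = quantize(w, bits)
    max_err = float(np.max(np.abs(w - q * scale)))
    results[bits] = (max_err, 1 - bits / 32)   # (reconstruction error, memory saving)
```

Since the maximum rounding error is half a quantization step, the reconstruction error is bounded by scale/2, which is what keeps the forecasting accuracy close to the FP32 baseline.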
cs.ET 2026-04-08 Recognition

Three-tier taxonomy classifies AI agents for tailored governance

Beyond Tools and Persons: Who Are They? Classifying Robots and AI Agents for Proportional Governance

Sorting robots by integration across four dimensions replaces the tool-or-person split with liability rules, care duties, or limited personhood.

The rapid commercialization of humanoid robots and generative AI agents is outpacing legal frameworks built on a binary distinction between ``tools'' and ``persons.'' Current regulations, including the EU AI Act, classify systems by risk level but lack a foundational ontology for determining \emph{what kind of entity} an autonomous system is -- and what governance follows from that determination. We propose a classification framework grounded in Cyber-Physical-Social-Thinking (CPST) space theory, which categorizes autonomous entities by their degree of integration across four interconnected dimensions: computational, embodied, relational, and cognitive. The resulting three-tier taxonomy -- Confined Actors, Socially-Aware Interactors, and CPST-Integrated Agents -- provides principled scaffolding for proportional governance: enhanced product liability for isolated systems, relational duties of care for interactive companions, and qualified legal personhood for deeply integrated agents. We operationalize this taxonomy by identifying standardized assessment metrics drawn from robotics, human--robot interaction research, social computing, and cognitive science, and we propose a composite assessment protocol for regulatory use. We further address temporal dynamics -- how entities transition between categories as they evolve -- and the institutional design necessary for credible classification. We call for international standardization of this taxonomy before the 2027 review of the EU AI Act, and outline three concrete policy steps toward implementation.
cs.ET 2026-04-07 1 theorem

CMOS chip hosts first 3T-1MTJ probabilistic bit

Experimental Demonstration of an On-Chip CMOS-Integrated 3T-1MTJ Probabilistic Bit -- A P-Bit

The integrated device outputs full-swing random signals tunable by voltage, opening paths to efficient probabilistic computing on standard chips.

Ongoing semiconductor scaling challenges and the rise of neuromorphic computing have sparked interest in exploring novel computing schemes to achieve higher power efficiency and computational capabilities. Probabilistic computing is one candidate, offering low power consumption, the ability to solve probability-encoded computational problems, and ease of integration with existing CMOS technology. A basic building block of this scheme is the probabilistic bit (P-Bit), which utilizes a novel device such as a stochastic magnetic tunnel junction (sMTJ) to generate intrinsically tunable randomness. This work presents the first experimental demonstration of a fully CMOS-integrated sMTJ-based P-Bit, capable of generating rail-to-rail stochastic output with a mere collection of 3 transistors + 1 sMTJ. Furthermore, simulations also confirm this P-Bit's functionality in probabilistic logic circuits. The demonstration of such a P-Bit paves the way towards realizing monolithic large-scale probabilistic computing architectures on CMOS chips.
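Behaviorally, a p-bit is a binary source whose mean is tuned by its input. The sketch below uses the standard behavioral model from the p-bit literature, P(out = 1) = (1 + tanh I)/2, not this paper's measured 3T-1MTJ circuit; the input value is illustrative:

```python
import math, random

def p_bit(input_i, rng):
    """Behavioral p-bit: binary output with P(out = 1) = (1 + tanh(I)) / 2."""
    return 1 if rng.uniform(-1.0, 1.0) < math.tanh(input_i) else 0

rng = random.Random(42)
I = 0.5                                # dimensionless input bias, illustrative
samples = [p_bit(I, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
expected = (1 + math.tanh(I)) / 2
```

Sweeping I from negative to positive steers the output from mostly 0 to mostly 1, which is the voltage-tunable randomness the integrated device provides in hardware.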
cs.ET 2026-04-07 2 theorems

FeFET hardware runs Bayesian trees with 40% higher accuracy

Probabilistic Tree Inference Enabled by FDSOI Ferroelectric FETs

Single device type supplies both storage and randomness to cut energy use by four orders of magnitude versus GPUs

Artificial intelligence applications in autonomous driving, medical diagnostics, and financial systems increasingly demand machine learning models that can provide robust uncertainty quantification, interpretability, and noise resilience. Bayesian decision trees (BDTs) are attractive for these tasks because they combine probabilistic reasoning, interpretable decision-making, and robustness to noise. However, existing hardware implementations of BDTs based on CPUs and GPUs are limited by memory bottlenecks and irregular processing patterns, while multi-platform solutions exploiting analog content-addressable memory (ACAM) and Gaussian random number generators (GRNGs) introduce integration complexity and energy overheads. Here we report a monolithic FDSOI-FeFET hardware platform that natively supports both ACAM and GRNG functionalities. The ferroelectric polarization of FeFETs enables compact, energy-efficient multi-bit storage for ACAM, and band-to-band tunneling in the gate-to-drain overlap region and subsequent hole storage in the floating body provides a high-quality entropy source for GRNG. System-level evaluations demonstrate that the proposed architecture provides robust uncertainty estimation, interpretability, and noise tolerance with high energy efficiency. Under both dataset noise and device variations, it achieves over 40% higher classification accuracy on MNIST compared to conventional decision trees. Moreover, it delivers more than two orders of magnitude speedup over CPU and GPU baselines and over four orders of magnitude improvement in energy efficiency, making it a scalable solution for deploying BDTs in resource-constrained and safety-critical environments.
cs.ET 2026-04-07 2 theorems

Sparsified spin logic solves Ising problems to 1600 spins

Quantum-inspired Ising machine using sparsified spin connectivity

E-MVL reaches exact solutions four times larger than standard simulated annealing and runs six times faster on FPGA.

Combinatorial optimization problems become computationally intractable as these NP-hard problems scale. We previously proposed extraction-type majority voting logic (E-MVL), a quantum-inspired algorithm using digital logic circuits. E-MVL mimics the thermal spin dynamics of simulated annealing (SA) through controlled sparsification of spin interactions for efficient ground-state search. This study investigates the performance potential of E-MVL through systematic optimization and comprehensive benchmarking against SA. The target problem is the Sherrington-Kirkpatrick (SK) model with bimodal and Gaussian coupling distributions. Through equilibrium state analysis, we demonstrate that the sparsity control mechanism provides a consistent search of the solution space regardless of the problem's coupling distribution (bimodal, Gaussian) or size. E-MVL not only achieves the best performance among all tested algorithms, finding exact solutions for problems of up to 1600 spins where the best SA baseline is limited to 400 spins, but also provides insights that significantly improve SA's own temperature scheduling. These results establish E-MVL's dual contribution as both an efficient optimizer and a practical methodology for enhancing SA performance. Moreover, the FPGA implementation achieved an approximately 6-fold faster solution speed than SA.
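The SA baseline that E-MVL is benchmarked against can be sketched as single-flip Metropolis with geometric cooling on a small Sherrington-Kirkpatrick instance. Problem size, schedule, and couplings below are toy choices, not the paper's benchmark settings:

```python
import itertools, math, random

def ising_energy(J, s):
    """E(s) = -sum_{i<j} J_ij s_i s_j for spins s_i in {-1, +1}."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))

def simulated_annealing(J, sweeps=500, t0=2.0, t1=0.05, seed=1):
    """Single-flip Metropolis SA with geometric cooling; returns best energy seen."""
    rng = random.Random(seed)
    n = len(J)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    e = ising_energy(J, s)
    best = e
    for k in range(sweeps):
        T = t0 * (t1 / t0) ** (k / max(sweeps - 1, 1))
        for i in range(n):
            s[i] = -s[i]                 # propose flipping spin i
            e_new = ising_energy(J, s)
            if e_new <= e or rng.random() < math.exp((e - e_new) / T):
                e, best = e_new, min(best, e_new)
            else:
                s[i] = -s[i]             # reject the flip
    return best

# 6-spin SK instance with bimodal couplings J_ij in {-1, +1}
rng = random.Random(7)
n = 6
J = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = rng.choice((-1, 1))

ground = min(ising_energy(J, list(s)) for s in itertools.product((-1, 1), repeat=n))
sa_best = simulated_annealing(J)
```

On 6 spins the ground state is available by enumeration, so the SA result can be checked directly; E-MVL replaces these dense thermal dynamics with sparsified spin interactions.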
cs.ET 2026-04-07 2 theorems

SAIL cuts long-tail prediction errors by up to 28.8%

SAIL: Scene-aware Adaptive Iterative Learning for Long-Tail Trajectory Prediction in Autonomous Vehicles

Defining rare cases by error, risk, and complexity, then using adaptive contrastive mechanisms, improves forecasts exactly where current models fail.

Autonomous vehicles (AVs) rely on accurate trajectory prediction for safe navigation in diverse traffic environments, yet existing models struggle with long-tail scenarios-rare but safety-critical events characterized by abrupt maneuvers, high collision risks, and complex interactions. These challenges stem from data imbalance, inadequate definitions of long-tail trajectories, and suboptimal learning strategies that prioritize common behaviors over infrequent ones. To address this, we propose SAIL, a novel framework that systematically tackles the long-tail problem by first defining and modeling trajectories across three key attribute dimensions: prediction error, collision risk, and state complexity. Our approach then synergizes an attribute-guided augmentation and feature extraction process with a highly adaptive contrastive learning strategy. This strategy employs a continuous cosine momentum schedule, similarity-weighted hard-negative mining, and a dynamic pseudo-labeling mechanism based on evolving feature clustering. Furthermore, it incorporates a focusing mechanism to intensify learning on hard-positive samples within each identified class. This comprehensive design enables SAIL to excel at identifying and forecasting diverse and challenging long-tail events. Extensive evaluations on the nuScenes and ETH/UCY datasets demonstrate SAIL's superior performance, achieving up to 28.8% reduction in prediction error on the hardest 1% of long-tail samples compared to state-of-the-art baselines, while maintaining competitive accuracy across all scenarios. This framework advances reliable AV trajectory prediction in real-world, mixed-autonomy settings.
cs.ET 2026-04-07 2 theorems

Cross-coupled STT-MRAM raises DNN IMC accuracy by up to 70%

STRIDe: Cross-Coupled STT-MRAM Enabling Robust In-Memory-Computing for Deep Neural Network Accelerators

Boosting bitcell current ratios to 8000 improves sense margins 3.86x and cuts read errors for reliable edge neural hardware.

As deep neural network (DNN) models are growing exponentially in size, their deployment on resource-constrained edge platforms is becoming increasingly challenging. In-memory-computing (IMC) with non-volatile memories (NVMs) has emerged as a potential solution by virtue of its higher energy efficiency compared to standard DNN hardware platforms. Amongst various NVMs, STT-MRAM is highly promising owing to its high endurance and other benefits. However, their IMC implementation is challenging because of their inherently low distinguishability. This issue is exacerbated due to array non-idealities and process variations, leading to poor IMC robustness and severe inference accuracy degradation. To address this problem, we propose STRIDe - STT-MRAM-based IMC leveraging cross-coupling action to boost the bitcell-level high-to-low current ratio to up to 8000. We propose two flavors of STRIDe designs, both offering robust IMC for inputs and weights in the {-1, 1} (XNOR-IMC) and {0, 1} (AND-IMC) regimes. Our evaluations for STRIDe arrays show up to 3.86x and 1.77x sense margin (SM) improvement for XNOR-IMC and AND-IMC, respectively, and up to 27.6% read disturb margin (RDM) improvement over standard MRAM-IMC designs. The enhanced robustness of STRIDe translates to near-software inference accuracies (considering crossbar non-idealities and process variations) for ResNet18 BNN and 4-bit DNN trained on the CIFAR10 dataset. We observe accuracy improvements of up to 70% (for BNN) and up to 35% (for 4-bit DNN) over standard MRAM designs, albeit with some energy-area-latency penalty.
cs.ET 2026-04-07 Recognition

One architecture runs quantum neural nets on any framework

Eliminating Vendor Lock-In in Quantum Machine Learning via Framework-Agnostic Neural Networks

Unified graph and abstraction layer deliver models to multiple classical and quantum systems with matching accuracy and minimal slowdown.

Quantum machine learning (QML) stands at the intersection of quantum computing and artificial intelligence, offering the potential to solve problems that remain intractable for classical methods. However, the current landscape of QML software frameworks suffers from severe fragmentation: models developed in TensorFlow Quantum cannot execute on PennyLane backends, circuits authored in Qiskit Machine Learning cannot be deployed to Amazon Braket hardware, and researchers who invest in one ecosystem face prohibitive switching costs when migrating to another. This vendor lock-in impedes reproducibility, limits hardware access, and slows the pace of scientific discovery. In this paper, we present a framework-agnostic quantum neural network (QNN) architecture that abstracts away vendor-specific interfaces through a unified computational graph, a hardware abstraction layer (HAL), and a multi-framework export pipeline. The core architecture supports simultaneous integration with TensorFlow, PyTorch, and JAX as classical co-processors, while the HAL provides transparent access to IBM Quantum, Amazon Braket, Azure Quantum, IonQ, and Rigetti backends through a single application programming interface (API). We introduce three pluggable data encoding strategies (amplitude, angle, and instantaneous quantum polynomial encoding) that are compatible with all supported backends. An export module leveraging Open Neural Network Exchange (ONNX) metadata enables lossless circuit translation across Qiskit, Cirq, PennyLane, and Braket representations. We benchmark our framework on the Iris, Wine, and MNIST-4 classification tasks, demonstrating training time parity (within 8% overhead) compared to native framework implementations, while achieving identical classification accuracy.
cs.ET 2026-04-07 Recognition

Photonic ViTs regain near-clean accuracy from real device noise

Light-Bound Transformers: Hardware-Anchored Robustness for Silicon-Photonic Computer Vision Systems

Measured microring noise turned into variance bounds enables training that stabilizes attention without extra optical MACs or in-situ retraining.

Deploying Vision Transformers (ViTs) on near-sensor analog accelerators demands training pipelines that are explicitly aligned with device-level noise and energy constraints. We introduce a compact framework for silicon-photonic execution of ViTs that integrates measured hardware noise, robust attention training, and an energy-aware processing flow. We first characterize bank-level noise in microring-resonator (MR) arrays, including fabrication variation, thermal drift, and amplitude noise, and convert these measurements into closed-form, activation-dependent variance proxies for attention logits and feed-forward activations. Using these proxies, we develop Chance-Constrained Training (CCT), which enforces variance-normalized logit margins to bound attention rank flips, and a noise-aware LayerNorm that stabilizes feature statistics without changing the optical schedule. These components yield a practical ``measure $\rightarrow$ model $\rightarrow$ train $\rightarrow$ run'' pipeline that optimizes accuracy under noise while respecting system energy limits. Hardware-in-the-loop experiments with MR photonic banks show that our approach restores near-clean accuracy under realistic noise budgets, with no in-situ learning or additional optical MACs.
cs.ET 2026-04-06 1 theorem

Automation links manufacturers to on-demand platforms

Building a Dataspace for Manufacturing as a Service in Factory-X

By automating the full interaction from registration to quality reporting, SMEs can process more requests while maintaining high quality even for low lot sizes.

One way for small and medium-sized enterprise (SME) manufacturers to address the challenge of acquiring sufficient orders is to join digital Manufacturing-as-a-Service (MaaS) platforms for on-demand manufacturing. However, joining such platforms brings about new challenges, such as efficient quote handling in the face of potentially low success rates and the need for high production quality at low lot sizes. Automating the complete interaction between manufacturers and MaaS platforms, from registering the manufacturer and its capabilities to handling incoming requests and managing offers, orders, and production quality reporting, helps to overcome these challenges. Thus, the increased number of requests can be handled efficiently, and the production quality can be maintained at a high level even for low lot sizes. This paper presents an architecture and functional building blocks for automating the interaction between manufacturers and MaaS platforms, along with a prototype implementation and evaluation of its effectiveness in addressing the challenges SME manufacturers face.
cs.ET 2026-04-03 2 theorems

Photonic CNN reaches 94% MNIST accuracy in pure optics

Photonic convolutional neural network with pre-trained in-situ training

The design skips all optical-to-electrical conversions and uses 100 to 242 times less energy than GPUs for each image.

Photonic computing is a computing paradigm with great potential to overcome the energy bottlenecks of the electronic von Neumann architecture. Throughput and power consumption are fundamental limitations of complementary metal-oxide-semiconductor (CMOS) chips, while convolutional neural networks (CNNs) are revolutionising machine learning, computer vision, and other image-based applications. In this work, we propose and validate a fully photonic convolutional neural network (PCNN) that performs MNIST image classification entirely in the optical domain, achieving 94 percent test accuracy. Unlike existing architectures that rely on frequent in-between conversions from optical to electrical and back to optical (O/E/O), our system maintains coherent processing utilizing Mach-Zehnder interferometer (MZI) meshes, wavelength-division multiplexed (WDM) pooling, and microring resonator-based nonlinearities. The max pooling unit is fully implemented on silicon photonics and does not require opto-electrical or electrical conversions. To overcome the challenges of training physical phase shifter parameters, we introduce a hybrid training methodology deploying a mathematically exact differentiable digital twin for ex-situ backpropagation, followed by in-situ fine-tuning via the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm. Our evaluation demonstrates significant robustness to thermal crosstalk (only 0.43 percent accuracy degradation at severe coupling) and achieves 100 to 242 times better energy efficiency than state-of-the-art electronic GPUs for single-image inference.
cs.ET 2026-04-03 2 theorems

Quantum circuit learns cell communication as state shift

QuantumXCT: Learning Interaction-Induced State Transformation in Cell-Cell Communication via Quantum Entanglement and Generative Modeling

Unitary transformation in Hilbert space maps non-interacting RNA profiles to interacting ones without ligand-receptor lists.

Inferring cell-cell communication (CCC) from single-cell transcriptomics remains fundamentally limited by reliance on curated ligand-receptor databases, which primarily capture co-expression rather than the system-level effects of signaling on cellular states. Here, we introduce QuantumXCT, a hybrid quantum-classical generative framework that reframes CCC as a problem of learning interaction-induced state transformations between cellular state distributions. By encoding transcriptomic profiles into a high-dimensional Hilbert space, QuantumXCT trains parameterized quantum circuits to learn a unitary transformation that maps a baseline non-interacting cellular state to an interacting state. This approach enables the discovery of communication-driven changes in cellular state distributions without requiring prior biological assumptions. We validate QuantumXCT using both synthetic data with known ground-truth interactions and single-cell RNA-seq data from an ovarian cancer-fibroblast co-culture model. The QuantumXCT model accurately recovered complex regulatory dependencies, including feedback structures, and identified dominant communication hubs such as the PDGFB-PDGFRB-STAT3 axis. Importantly, the learned quantum circuit is interpretable: its entangling topology was translated into biologically meaningful interaction networks, while post hoc contribution analysis quantified the relative influence of individual interactions on the observed state transitions. Notably, by shifting CCC inference from static interaction lookup to learning data-driven state transformations, QuantumXCT provides a generative framework for modeling intercellular communication. This work establishes a new paradigm for de novo discovery of communication programs in complex biological systems and highlights the potential of quantum machine learning in the context of single-cell biology.
cs.ET 2026-03-31 2 theorems

Library builds poly(m)-gate circuits for ten structured quantum amplitude patterns

PyEncode: An Open-Source Library for Structured Quantum State Preparation

PyEncode maps sparse, Fourier, Dicke and other structured vectors to verified Qiskit circuits plus composition tools.

Quantum algorithms require encoding classical vectors as quantum states, a step known as amplitude encoding. General-purpose routines produce circuits with O(2^m) gates for vectors of length N = 2^m. However, vectors arising in scientific and engineering applications often exhibit mathematical structure that admits far more efficient encoding. Theoretical work over the last decade has established efficient circuits for several structured vector classes, but without open-source implementations. We present PyEncode, an open-source Python library that implements this body of theory in a unified framework. It covers ten exact pattern families: sparse, step, square, Walsh, Fourier, geometric, Hamming, staircase, Dicke, and polynomial. A function encode maps each pattern to a verified Qiskit circuit, with no vector materialization and no approximation; for example, encode(SPARSE([(19, 1.0)]), N=64) encodes the vector e_19 of length N = 64. Sparse, step, Walsh, Hamming, and staircase patterns require O(m) gates; square and Fourier patterns require O(m^2); Dicke states |D^m_k⟩ require O(k(m-k)); degree-d polynomials require O(m^{d+1}). A companion predict_gates function estimates transpiled gate counts without synthesis. Three composition primitives are supported: SUM for weighted superpositions, PARTITION for ancilla-free composition of disjoint-support patterns, and TENSOR for separable states over disjoint subregisters. For amplitude vectors outside these exact families, PyEncode also provides a matrix product state (MPS) loader, encode_mps. The library is available at https://github.com/UW-ERSL/PyEncode.
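The sparse case in the abstract — encode(SPARSE([(19, 1.0)]), N=64) producing e_19 with O(m) gates — has a simple intuition that a short numpy sketch can show. This is our illustration of why the gate count is linear in m, not PyEncode's internals or API: preparing a one-hot basis vector only needs an X gate on each qubit where the binary expansion of the index has a 1, whereas a general-purpose loader scales as O(2^m).

```python
import numpy as np

# Sketch (illustrative, not PyEncode's implementation): prepare |19> on
# m = 6 qubits using at most m single-qubit X gates, MSB-first ordering.

KET0 = np.array([1.0, 0.0])          # |0>
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I = np.eye(2)

def prepare_basis_state(index, m):
    """Prepare the one-hot amplitude vector e_index on m qubits."""
    bits = format(index, f"0{m}b")   # e.g. 19 -> '010011'
    state = np.array([1.0])
    for b in bits:                   # one single-qubit gate per qubit: O(m)
        gate = X if b == "1" else I
        state = np.kron(state, gate @ KET0)
    return state

psi = prepare_basis_state(19, 6)
print(int(np.argmax(psi)))  # -> 19, the single nonzero amplitude
```

The same counting argument extends to the other O(m) families in the abstract (step, Walsh, Hamming, staircase), where each qubit again receives a constant number of gates.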
