archive
Every paper Pith has read. Search by title, abstract, or pith.
32 papers in cs.MS · page 1
-
Docker container makes Basilisk GN&C simulations reproducible
Basilisk and Docker for Reproducible GN&C Simulation: A Workflow Reference
-
Compiler partitions multi-QPU circuits to cut port overflow and link congestion
QuPort: Topology-, Port-, and Congestion-Aware Compilation for Modular Multi-QPU Quantum Systems
-
Distributed NUFFT scales to 1024 GPUs on mixed hardware
A Performance-Portable, Massively Parallel Distributed Nonuniform FFT
-
Riemannian L-BFGS handles Euclidean bounds on manifolds
A Riemannian quasi-Newton algorithm for optimization with Euclidean bounds
-
GPU solver speeds up entropic optimal transport calculations
cuRegOT: A GPU-Accelerated Solver for Entropic-Regularized Optimal Transport
-
R package MeTime stores all metabolomics steps in one container
MeTime: An R package for reproducible longitudinal metabolomics data analysis
-
FalconGEMM exceeds GEMM speeds by 7-17% via lower-complexity algorithms
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
-
Automated low-complexity matrix multiplies beat hardware peaks
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
-
Gradients computed for DAE simulations with events
Differentiable Parameter Optimization for DAEs with State-Dependent Events
-
Randompack produces identical random sequences across languages and machines
Randompack: Cross-Platform Reproducible Random Number Generation and Distribution Sampling
-
CombOL delivers unbiased Boltzmann samples via dynamic precision
CombOL: a Library for Practical Enumeration and Boltzmann Sampling of Combinatorial Classes
-
mrdi file format supports distributed algebraic computing
Interprocess Communication of Algebraic Data
-
Hybrid seam gives exact SE(3) Hessians five times faster
Exact Higher-Order Derivatives for SE(3) via Analytical/AD Methods
-
Multi-objective workflow selects quantum strategies reproducibly
QBalance: A Reproducible Multi-Objective Workflow for Quantum Compilation, Noise Suppression, and Error-Mitigation Strategy Selection
-
FitED is a Python desktop application that provides an interactive GUI and numerical…
FitED: A User-Centric, Extensible Software Environment for Robust Peak-Profile and General Functional Data Fitting
-
Fast-vollib accelerates implied volatility via PyTorch, JAX, and CUDA backends
Fast-Vollib: A Fast Implied Volatility Library for Python with PyTorch, JAX, and CUDA Fused-Kernel Backends
-
The paper applies Variational Expectation Maximization to fit nonlinear mixed effects…
Fitting Large Nonlinear Mixed Effects Models Using Variational Expectation Maximization
-
Classical emulator of HHL scales only with qubit count
Extending UNIQuE: Quantum Simulation Speedup for the HHL Algorithm
-
Asymmetric key-value quantization lowers attention KL error at 4 bits
Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant
-
Compile-time templates fuse GPU kernels to beat PyTorch
Fast GPU Linear Algebra via Compile Time Expression Fusion
-
Hybrid JAX-PETSc code beats pure JAX on large micromechanics simulations
JetSCI: A Hybrid JAX-PETSc Framework for Scalable Differentiable Simulation
-
The paper introduces an algorithm that generates random graphs with prescribed expected…
Efficient generation of expected-degree graphs via edge-arrivals
-
New package computes path signature tensors in Julia
SignatureTensors.jl: A Package for Signature Tensors in Julia
-
HyperLogLog skips exact counts for faster GPU SpGEMM
Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
-
Interface abstracts block-encodings for quantum algorithms
Block-encodings as programming abstractions: The Eclipse Qrisp BlockEncoding Interface
-
FAMLIES vertically integrates BLIS and libflame for unified HPC linear algebra
A Proposed Framework for Advanced (Multi)Linear Infrastructure in Engineering and Science (FAMLIES)
-
Benchmark turns k-server conjecture into inequality search for agents
k-server-bench: Automating Potential Discovery for the k-Server Conjecture
-
TNRKit pulls scaling dimensions from tensor fixed points
A Practical Introduction to Tensor Network Renormalization with TNRKit.jl
-
Benchmarks guide MATLAB backend choice for multivariate polynomials
Polylab: A MATLAB Toolbox for Multivariate Polynomial Modeling
-
Residue overrides eliminate most false reports in floating-point debuggers
Accurate Residues for Floating-Point Debugging
-
Hybrid solver reaches exact AVI solutions in finite steps
DR-DAQP: An Hybrid Operator Splitting and Active-Set Solver for Affine Variational Inequalities
-
Branch-free algorithms speed up high-precision arithmetic
Acceleration of multi-component multiple-precision arithmetic with branch-free algorithms and SIMD vectorization
-
FP64 tensor cores speed finite-element kernels 2x
Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores