archive
Every paper Pith has read. Search by title, abstract, or pith.
43 papers in cs.OS · page 1
-
Harness design stabilizes small language models at 95% success
It's Not the Size: Harness Design Determines Operational Stability in Small Language Models
-
KV-cache movement regularization cuts static-graph LLM latency spikes
KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving
-
Virtualization hardware isolates Linux kernel parts with no code changes
Pomegranate: A Lightweight Compartmentalization Architecture using Virtualization Extensions
-
Case study maps SIL rules and memory limits in real car software
Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework
-
Pub/sub smart pointer limits reference updates to at most one per subscriber
ipc_shared_ptr: A Publish/Subscribe-Aware Smart Pointer for Cross-Process Object Lifetime Management
-
GPU-centric store makes SSD KV cache match DRAM speed
Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
-
Three-tier API governs urban sensor data with privacy tiers
CityOS: Privacy Architecture for Urban Sensing
-
CvxCluster uses a two-stage convex optimization approach to allocate resources across…
CvxCluster: Solving Large, Complex, Granular Resource Allocation Problems 100-1000x Faster
-
VUDA delivers 85% higher throughput via CUDA-Vulkan spatial sharing
VUDA: Breaking CUDA-Vulkan Isolation for Spatial Sharing of Compute and Graphics on the Same GPU
-
Workflow scheduling cuts AI agent task time by 1.64x
SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters
-
Agent sandboxes hit 100% recovery correctness with 87% less traffic
Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes
-
Affinity hints give 12% throughput boost on chiplet servers
Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale
-
WebAssembly capsules run updatable code on tiny microcontrollers
treVM: Tiny Rust Embedded Virtual Machines with WASM on Variable Resource-Constrained Hardware
-
Rust matches C on microcontroller firmware size and speed
Embedded Rust or C Firmware? Lessons from an Industrial Microcontroller Use Case with Ariel OS
-
Tenant protocols match fixed-stack speed with isolation
Chamelio: A Fast Shared Cloud Network Stack for Isolated Tenant-Defined Protocols
-
Local cost signal lifts satellite goodput 20% and throughput 31%
Equinox: Decentralized Scheduling for Hardware-Aware Orbital Intelligence
-
GAAP enforces user data permissions for AI agents deterministically
An AI Agent Execution Environment to Safeguard User Data
-
CXL single-copy cache yields 5.6x geo-mean speedup
DPC: A Distributed Page Cache over CXL
-
PREEMPT_RT cuts UAV control latency by 88% on Raspberry Pi 5
Scheduling Analysis of UAV Flight Control Workloads on Raspberry Pi 5 Using PREEMPT_RT Linux
-
Confidential VMs run LLM agents securely on edge devices
AgenTEE: Confidential LLM Agent Execution on Edge Devices
-
Processes and pipes made lightweight for far memory accelerators
Proxics: an efficient programming model for far memory accelerators
-
Persistent GPU kernel yields 15x speedup for tiny tensor operations
GPUOS: A GPU Operating System Primitive for Transparent Operation Fusion
-
Kernel gateway blocks AI tool-call bypasses
Governed MCP: Kernel-Level Tool Governance for AI Agents via Logit-Based Safety Primitives
-
Filesystem lets AI agents self-correct file mistakes
Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy
-
eBPF hooks decide page moves in tiered memory for up to 17% higher throughput
TierBPF: Page Migration Admission Control for Tiered Memory via eBPF
-
MARS cuts agentic latency by 5.94x via co-scheduling
MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems
-
Periodic framework organizes distributed computing
A Periodic Space of Distributed Computing: Vision & Framework
-
Physics-informed DLinear forecasts AI data center power more accurately
A Physics-Aware Framework for Short-Term GPU Power Forecasting of AI Data Centers
-
Hybrid tuning raises tiered memory performance up to 30%
Hybrid Adaptive Tuning for Tiered Memory Systems
-
Kernel reads one logit to classify AI agent actions
ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems
-
Nanvix cuts server counts for serverless deployments by 20-100x
Nanvix: A Multikernel OS Design for High-Density Serverless Deployments
-
ClawVM makes LLM agent state residency deterministic
ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents
-
Decoupling vectors from indexes cuts storage by up to 59%
Decoupling Vector Data and Index Storage for Space Efficiency
-
Adaptive quantization cuts mobile LLM cold starts by 4x
EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices
-
Game orchestrator finds 2.7x more kernel vulnerabilities per budget
VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery
-
Valve saves 2,170 GPUs by colocating online and offline inference
Valve: Production Online-Offline Inference Colocation with Jointly-Bounded Preemption Latency and Rate
-
Hardware middleware cuts device onboarding latency by 65%
A Hardware-Anchored Privacy Middleware for PII Sharing Across Heterogeneous Embedded Consumer Devices
-
CPU-free LLM serving cuts P99 latency up to 8x
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC
-
Client scheduler hits 100% LLM deadlines at 4.2 requests per second
Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale
-
Nexus cuts serverless CPU use 44% by offloading I/O from VMs
Nexus: Transparent I/O Offloading for High-Density Serverless Computing
-
Scheduler cuts quantum queue times 30-75% at high load
Qurator: Scheduling Hybrid Quantum-Classical Workflows Across Heterogeneous Cloud Providers
-
Single GPU trains 120B-parameter models at full precision
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
-
Migratable actors on CXL SSDs dodge thermal cliffs
WIO: Upload-Enabled Computational Storage on CXL SSDs