archive

Every paper Pith has read.

221 papers in cs.DB

cs.DS 2026-05-14 reviewed

Hybrid sketches match best space bounds for dynamic graph connectivity
Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs

David Tench +4
cs.DB 2026-05-14 reviewed

Retrieval augments schema graphs for relational database predictions
From Schema to Signal: Retrieval-Augmented Modeling for Relational Data Analytics

Beng Chin Ooi +5
cs.AR 2026-05-13 reviewed

FPGA lock agents boost OLTP throughput 51x over CPUs
FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration

Gustavo Alonso +1
cs.LO 2026-05-13 reviewed

EL⊥≼ extends DL-Lite with reachability in NL
A Horn extension of DL-Lite with NL data complexity

Bartosz Jan Bednarczyk +2
cs.DL 2026-05-13 reviewed

Graph links 200k research repos to papers and artifacts
SemRepo: A Knowledge Graph for Research Software and Its Scholarly Ecosystem

Abdul Rafay +3
cs.DB 2026-05-13 reviewed

Benchmark shows top multimodal models lag on e-commerce
OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems

Bing Bai +7
cs.CV 2026-05-12 reviewed

3D primitives in code raise VLM spatial scores up to 17%
3D Primitives are a Spatial Language for VLMs

Alejandro Mottini +10
eess.SP 2026-05-12 reviewed

Commercial 5G dataset aids AI handover and beam management
Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

Deepa M.R +3
cs.DB 2026-05-12 reviewed

Chase termination undecidable even for decidable queries
Will My Favorite Chases Terminate if Evaluating Conjunctive Queries Does? One Does Not Simply Decide This

Lucas Larroque +1
cs.DB 2026-05-12 reviewed

Separating instances pick correct NL2SQL candidate
Data-aware candidate selection in NL2SQL translation via small separating instances

Alexander Shulgin +2
cs.IR 2026-05-12 reviewed

BatchBench framework equalizes autoscaling policy tests
BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework

Siri Chandana Sirigiri +1
cs.DB 2026-05-12 reviewed

Knowledge graphs source optimization problems via queries
Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs

Madhulatha Mandarapu (samyama.ai) +1
cs.DB 2026-05-12 reviewed

Graph queries for optimization reveal hidden data flaws
Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs

Madhulatha Mandarapu (samyama.ai) +1
cs.DB 2026-05-12 reviewed

Replicas detect and repair database corruption without stopping work
PROTECT-DB: Protecting Data using Replicated State Machines: Efficient Corruption Detection & Recovery

Anant Utgikar +1
cs.AI 2026-05-12 reviewed

LLMs cannot always be correct
A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination

Vinu Ellampallil Venugopal
cs.LG 2026-05-12 reviewed

Benchmark with 40 epidemic datasets enables fair model comparisons
EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting

Danny D'Agostino +4
cs.LG 2026-05-12 reviewed

Relational signals lift membership inference on tabular diffusion models
FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models

Abtin Mahyar +3
cs.DB 2026-05-11 reviewed

SHACL-DS validates named graphs faster than standard SHACL
Keeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graph

Christophe Debruyne +2
cs.DB 2026-05-11 reviewed

Single GPU kernel fuses IO and query steps for faster analytics
Data Path Fusion in GPU for Analytical Query Processing

Kazuo Goda +1
cs.DB 2026-05-11 reviewed

Text2Cypher must reason across multiple graph databases
Toward Multi-Database Query Reasoning for Text2Cypher

Makbule Gulcin Ozsoy
cs.AI 2026-05-11 reviewed

Autonomous objects resolve over half of scientific data conflicts
Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge

Christoph Lange +3
cs.DB 2026-05-11 reviewed

Cloud GPUs speed graph index construction by 9x at 6x lower cost
ScaleGANN: Accelerate Large-Scale ANN Indexing by Cost-effective Cloud GPUs

Boon Thau Loo +7
cs.IR 2026-05-11 reviewed

Graph of codecs compresses data smaller and faster
OpenZL: Using Graphs to Compress Smaller and Faster

Danielle Rozenblit +12
cs.CL 2026-05-10 reviewed

Home activity benchmark shows AI question-answering gaps
HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

Aoi Ohta +7
cs.DB 2026-05-09 reviewed

Krone decomposes logs into entity-action-status units for modular anomaly detection
Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation

Athanasios Tassiadamis +7
cs.AI 2026-05-09 reviewed

One-to-one matching boosts ontology alignment precision
Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment

Fabio Rovai
cs.DB 2026-05-09 reviewed

Personalized privacy cuts infinite stream estimation error by 53.6%
Personalized w-Event Privacy for Infinite Stream Estimation

Kenli Li +6
cs.AI 2026-05-09 reviewed

Diagnosis consistency links to actual causality for AI explanations
Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations

Leopoldo Bertossi
cs.DB 2026-05-09 reviewed

LLMs fall short on natural language data prep tasks
PrepBench: How Far Are We from Natural-Language-Driven Data Preparation?

Guoliang Li +3
cs.DB 2026-05-09 reviewed

Elastic scheduling meets stream deadlines at lowest cost
Elastic Scheduling of Intermittent Query Processing in a Cluster Environment

Saranya Chandrasekaran +1
cs.DB 2026-05-08 reviewed

Heavy-light partitioning maintains arbitrary joins under updates
Maintaining Queries under Updates Using Heavy-Light Partitioning of the Input Relations

Ahmet Kara +4
cs.DB 2026-05-07 reviewed

SkipDisk hits 63% HNSW latency at 20% memory
Low-Latency Out-of-Core ANN Search in High-Dimensional Space

Bin Wang +3
cs.DB 2026-05-07 reviewed

Query rewrite rules written once deploy across database engines
An Extensible and Verifiable Language for Query Rewrite Rules

Alvin Cheung +5
cs.DB 2026-05-07 reviewed

Every query reduces to Filter
Anatomy of a Query: W5H Dimensions and FAR Patterns for Text-to-SQL Evaluation

Eduardo Valverde +2
cond-mat.mtrl-sci 2026-05-06 reviewed

Diversity selection builds versatile materials datasets
Building informative materials datasets beyond targeted objectives

Adji Bousso Dieng +8
cs.DB 2026-05-06 reviewed

Caching cuts redundant CBO calls in cost-based query rewrite
Efficient Cost-Based Rewrite in a Bottom-Up Optimizer

Chong Chen +6
cs.LG 2026-05-06 reviewed

Only solution concentration ranks consistently across electrospinning ML models
Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features

Ferenc Ender +2
cs.LG 2026-05-06 reviewed

Concentration alone has zero rank variability in electrospinning models
Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features

Ferenc Ender +2
cs.DB 2026-05-06 reviewed

Hierarchical agents clean messy time series without ground truth
AegisTS: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning

Lu Chen +4
cs.LG 2026-05-05 reviewed

Fused soil dataset pretrains representations matching real processes
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems

Kuangdai Leng +3
cs.LG 2026-05-05 reviewed

Fused soil dataset pretrains model to capture real processes
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems

Kuangdai Leng +3
cs.DB 2026-05-05 reviewed

Database repairs match preferred extensions in SETAFs
Inconsistent Databases and Argumentation Frameworks with Collective Attacks

Axel-Cyrille Ngonga Ngomo +3
cs.DB 2026-05-05 reviewed

ConRAD introduces a framework that applies conformal risk control inside neural graph…
ConRAD: Conformal Risk-Aware Neural Databases

Fabian Zeiher +6
cs.DB 2026-05-05 reviewed

Sliced kd-trees speed up multi-dimensional queries in memory
In-memory Multidimensional Indexing Using the skd-tree

Achilleas Michalopoulos +2
cs.AI 2026-05-05 reviewed

AI agents average 45% on workspace tasks with 20k files
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Chunwei Liu +19
cs.AI 2026-05-05 reviewed

AI agents top out at 60% on workspace file dependency tasks
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Chunwei Liu +19
cs.AI 2026-05-05 reviewed

Agents reach 68.7% on workspace tasks with big file sets
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Chunwei Liu +19
cs.AI 2026-05-05 reviewed

Agents hit 43% average on realistic workspace tasks
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Chunwei Liu +19
cs.DB 2026-05-05 reviewed

3B model hits 85% Text-to-SQL accuracy using fine-grained rewards
FINER-SQL: Boosting Small Language Models for Text-to-SQL

Hongzhi Yin +6
cs.SE 2026-05-05 reviewed

AI models recover semantics from legacy database code
Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI

Christian Mancas +1