archive
Every paper Pith has read. Search by title, abstract, or pith.
221 papers in cs.DB · page 1
-
Hybrid sketches match best space bounds for dynamic graph connectivity
Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs
-
Retrieval augments schema graphs for relational database predictions
From Schema to Signal: Retrieval-Augmented Modeling for Relational Data Analytics
-
FPGA lock agents boost OLTP throughput 51x over CPUs
FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration
-
$\mathcal{EL}_\bot^{\preceq}$ extends DL-Lite with reachability in NL
A Horn extension of DL-Lite with NL data complexity
-
Graph links 200k research repos to papers and artifacts
SemRepo: A Knowledge Graph for Research Software and Its Scholarly Ecosystem
-
Benchmark shows top multimodal models lag on e-commerce
OxyEcomBench: Benchmarking Multimodal Foundation Models across E-Commerce Ecosystems
-
3D primitives in code raise VLM spatial scores up to 17 percent
3D Primitives are a Spatial Language for VLMs
-
Commercial 5G dataset aids AI handover and beam management
Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance
-
Chase termination undecidable even for decidable queries
Will My Favorite Chases Terminate if Evaluating Conjunctive Queries Does? One Does Not Simply Decide This
-
Separating instances pick correct NL2SQL candidate
Data-aware candidate selection in NL2SQL translation via small separating instances
-
BatchBench framework equalizes autoscaling policy tests
BatchBench: Toward a Workload-Aware Benchmark for Autoscaling Policies in Big Data Batch Processing -- A Proposed Framework
-
Knowledge graphs source optimization problems via queries
Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs
-
Graph queries for optimization reveal hidden data flaws
Graph-Grounded Optimization: Rao-Family Metaheuristics, Classical OR, and SLM-Driven Formulation over Knowledge Graphs
-
Replicas detect and repair database corruption without stopping work
PROTECT-DB: Protecting Data using Replicated State Machines: Efficient Corruption Detection & Recovery
-
LLMs cannot always be correct
A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination
-
Benchmark with 40 epidemic datasets enables fair model comparisons
EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting
-
Relational signals lift membership inference on tabular diffusion models
FERMI: Exploiting Relations for Membership Inference Against Tabular Diffusion Models
-
SHACL-DS validates named graphs faster than standard SHACL
Keeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graph
-
Single GPU kernel fuses IO and query steps for faster analytics
Data Path Fusion in GPU for Analytical Query Processing
-
Text2Cypher must reason across multiple graph databases
Toward Multi-Database Query Reasoning for Text2Cypher
-
Autonomous objects resolve over half of scientific data conflicts
Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge
-
Cloud GPUs speed graph index construction by 9x at 6x lower cost
ScaleGANN: Accelerate Large-Scale ANN Indexing by Cost-effective Cloud GPUs
-
Graph of codecs compresses data smaller and faster
OpenZL: Using Graphs to Compress Smaller and Faster
-
Home activity benchmark shows AI question-answering gaps
HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities
-
Krone decomposes logs into entity-action-status units for modular anomaly detection
Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation
-
One-to-one matching boosts ontology alignment precision
Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment
-
Personalized privacy cuts infinite stream estimation error by 53.6%
Personalized w-Event Privacy for Infinite Stream Estimation
-
Diagnosis consistency links to actual causality for AI explanations
Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations
-
LLMs fall short on natural language data prep tasks
PrepBench: How Far Are We from Natural-Language-Driven Data Preparation?
-
Elastic scheduling meets stream deadlines at lowest cost
Elastic Scheduling of Intermittent Query Processing in a Cluster Environment
-
Heavy-light partitioning maintains arbitrary joins under updates
Maintaining Queries under Updates Using Heavy-Light Partitioning of the Input Relations
-
SkipDisk hits 63% HNSW latency at 20% memory
Low-Latency Out-of-Core ANN Search in High-Dimensional Space
-
Query rewrite rules written once deploy across database engines
An Extensible and Verifiable Language for Query Rewrite Rules
-
Every query reduces to Filter
Anatomy of a Query: W5H Dimensions and FAR Patterns for Text-to-SQL Evaluation
-
Diversity selection builds versatile materials datasets
Building informative materials datasets beyond targeted objectives
-
Caching cuts redundant CBO calls in cost-based query rewrite
Efficient Cost-Based Rewrite in a Bottom-Up Optimizer
-
Only solution concentration ranks consistently across electrospinning ML models
Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features
-
Concentration alone has zero rank variability in electrospinning models
Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features
-
Hierarchical agents clean messy time series without ground truth
AegisTS: A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning
-
Fused soil dataset pretrains representations matching real processes
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems
-
Fused soil dataset pretrains model to capture real processes
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems
-
Database repairs match preferred extensions in SETAFs
Inconsistent Databases and Argumentation Frameworks with Collective Attacks
-
ConRAD introduces a framework that applies conformal risk control inside neural graph…
ConRAD: Conformal Risk-Aware Neural Databases
-
Sliced kd-trees speed up multi-dimensional queries in memory
In-memory Multidimensional Indexing Using the skd-tree
-
AI agents average 45 percent on workspace tasks with 20k files
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
-
AI agents top out at 60% on workspace file dependency tasks
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
-
Agents reach 68.7% on workspace tasks with big file sets
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
-
Agents hit 43% average on realistic workspace tasks
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
-
3B model hits 85% Text-to-SQL accuracy using fine-grained rewards
FINER-SQL: Boosting Small Language Models for Text-to-SQL
-
AI models recover semantics from legacy database code
Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI