It took longer than I was expecting: Why is dataset search still so hard?

2024 · arXiv 5939.366595

2 Pith papers cite this work.

representative citing papers

A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search

cs.IR · 2026-03-28 · unverdicted · novelty 6.0

The paper defines a bounded reference architecture for LLM-orchestrated hybrid retrieval in dataset search using BM25, dense embeddings, reciprocal rank fusion, and metadata augmentation with pseudo-queries.
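Reciprocal rank fusion, one of the components named in the summary above, can be illustrated with a minimal sketch. This is not the cited paper's implementation: the function name, the dataset identifiers, and the two example rankings are hypothetical, and the smoothing constant `k=60` is the conventional default from the RRF literature rather than a value stated in the summary.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranked list.

    Each id receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1. k=60 is an assumed, conventional default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a lexical (BM25) and a dense retriever.
bm25_ranking = ["ds_a", "ds_b", "ds_c"]
dense_ranking = ["ds_b", "ds_d", "ds_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because the fusion uses only rank positions, it never has to put BM25 scores and embedding similarities on a common scale. A dataset ranked well by both retrievers rises above one ranked first by only a single retriever.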

Less Is More? When Dataset Context Hurts LLM-Generated Dataset Descriptions

cs.DB · 2026-06-01 · unverdicted · novelty 4.0

An ablation study on 252 datasets finds that adding table schemas to LLM prompts consistently degrades narrative quality of generated descriptions compared to titles alone.
