It took longer than I was expecting: Why is dataset search still so hard?
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search
  The paper defines a bounded reference architecture for LLM-orchestrated hybrid retrieval in dataset search using BM25, dense embeddings, reciprocal rank fusion, and metadata augmentation with pseudo-queries.
- Less Is More? When Dataset Context Hurts LLM-Generated Dataset Descriptions
  An ablation study on 252 datasets finds that adding table schemas to LLM prompts consistently degrades narrative quality of generated descriptions compared to titles alone.