arXiv preprint arXiv:2602.01355
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
QO-Bench shows RAG systems retrieve relevant text but often discard typed values required for query operators, with paradigm performance inverting across operators and execution remaining a bottleneck even with gold evidence.
citing papers explorer
- Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents
  Ko-WideSearch is a new Korean breadth-search benchmark spanning 16 categories and three difficulty tiers that evaluates web agents on full set membership plus per-item attributes, showing consistent gaps between set recovery and row completion.
- QO-Bench: Diagnosing Query-Operator-Preserving Retrieval over Typed Event Tuples
  QO-Bench shows RAG systems retrieve relevant text but often discard typed values required for query operators, with paradigm performance inverting across operators and execution remaining a bottleneck even with gold evidence.