Ko-WideSearch is a new Korean breadth-search benchmark spanning 16 categories and three difficulty tiers that evaluates web agents on full set membership plus per-item attributes, showing consistent gaps between set recovery and row completion.
arXiv preprint arXiv:2602.12413 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
K-BrowseComp is a new Korean web-browsing agent benchmark where frontier LLMs score 30-46% and Korean LLMs score 0-10% on the verified subset.
citing papers explorer
-
Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents
Ko-WideSearch is a new Korean breadth-search benchmark spanning 16 categories and three difficulty tiers that evaluates web agents on full set membership plus per-item attributes, showing consistent gaps between set recovery and row completion.
-
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
K-BrowseComp is a new Korean web-browsing agent benchmark where frontier LLMs score 30-46% and Korean LLMs score 0-10% on the verified subset.