GS-QA is a new benchmark of 2,800 QA pairs on 28 templates using OSM and Wikipedia data to evaluate LLMs on spatial predicates, multi-source reasoning, and diverse answer types including distances and counts.
Title resolution pending
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 6representative citing papers
Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.
SPARK improves LLM-based test code fault localization by retrieving similar past faults and selectively annotating suspicious lines in new failing tests.
A RAG-enhanced LLM pipeline with segmentation improves C-to-Rust transpilation correctness and eliminates raw pointer dereferences and unsafe type casts in several Coreutils programs.
Mujica-MyGo decomposes multi-turn RAG interactions via multi-agent workflows and applies minimalist policy gradient optimization to improve performance on QA benchmarks while avoiding long-context problems.
KUBO, a dual-module platform for verified government updates and community discussions, outperforms Facebook in task speed, information recall, and user satisfaction for hyperlocal communication in the Philippines.
citing papers explorer
No citing papers match the current filters.