pith. machine review for the scientific record. sign in

hub

Test Intention Guided LLM-Based Unit Test Generation

48 Pith papers cite this work. Polarity classification is still indexing.

48 Pith papers citing it

hub tools

co-cited works

years

2026 48

clear filters

representative citing papers

Generating Complex Code Analyzers from Natural Language Questions

cs.SE · 2026-05-10 · unverdicted · novelty 7.0

Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.

SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

cs.SE · 2026-05-07 · unverdicted · novelty 7.0 · 4 refs

SmellBench is the first benchmark showing LLM agents resolve 47.7% of architectural code smells while accurately spotting false positives, but aggressive repairs often introduce new smells and degrade overall quality.

Certified Program Synthesis with a Multi-Modal Verifier

cs.SE · 2026-04-17 · unverdicted · novelty 7.0 · 2 refs

LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean to validate specifications via randomized testing and delegate proofs to AI tools, outperforming single-mode baselines on benchmarks while uncovering defects in prior参考.

Evaluating LLM Agents on Automated Software Analysis Tasks

cs.SE · 2026-04-13 · unverdicted · novelty 7.0

A custom LLM agent achieves 94% manually verified success on a new benchmark of 35 software analysis setups, outperforming baselines at 77%, but struggles with stage mixing, error localization, and overestimating its own success.

Generalizing Test Cases for Comprehensive Test Scenario Coverage

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

TestGeneralizer generalizes an initial test into a set of executable tests covering more diverse scenarios, delivering +31.66% mutation-based and +23.08% LLM-assessed scenario coverage gains over ChatTester on 12 open-source Java projects.

citing papers explorer

Showing 3 of 3 citing papers after filters.