CODEMENV: Benchmarking Large Language Models on Code Migration

Cheng, Keyuan, Shen, Xudong, Yang, Yihao, Wang, Tengyue, Cao, Yang, Ali, Muhammad Asif · 2025 · DOI 10.18653/v1/2025.findings-acl.140

2 Pith papers cite this work.

representative citing papers

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies

cs.CL · 2026-06-30 · unverdicted · novelty 6.0 · ref 42

LoFa is a new benchmark and LFR@k metric for measuring LLM resistance to sustained logical fallacy attacks via generated question-argument pairs and debate simulations.

Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence

cs.SE · 2026-05-27 · unverdicted · novelty 6.0

T2J-Bench shows that top coding agents achieve only a 26.7–28.9% pass rate on codebase conversion under a three-stage observational equivalence check, with agents overestimating their success by 66.6–97.8 points.
