CODEMENV: Benchmarking Large Language Models on Code Migration
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
T2J-Bench shows that top coding agents achieve a pass rate of only 26.7-28.9% on codebase conversion under a three-stage observational equivalence check, with agents overestimating their success by 66.6-97.8 points.
citing papers explorer
Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies
LoFa is a new benchmark, paired with the LFR@k metric, for measuring LLM resistance to sustained logical fallacy attacks using generated question-argument pairs and debate simulations.