2026. 2026 State of Software Security: Prioritize, Protect, Prove
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation
  Agent-Diff benchmarks LLM agents on enterprise API tasks using code execution and state-diff contracts to define success, evaluated on nine models across 224 tasks with code released.
- Compile-time Security Analysis and Optimization of Sensitive String Producers
  A language-integrated framework for compile-time analysis of sensitive string producers that minimizes lexical distance between secure and insecure idioms.
- Dream machine -- the next creative economy
  Generative AI is reshaping creative work with new collaboration models and audience quality limits, evidenced by platform statistics and widespread opposition to expanded AI training rights.