Chain-of-thought prompting elicits reasoning in large language models
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
Empirical study identifies patterns in how model classes respond to structured prompts, optimization, and other techniques across two Verilog benchmarks.
citing papers explorer
- TeleResilienceBench: Quantifying Resilience for LLM Reasoning in Telecommunications
  TeleResilienceBench quantifies LLM reasoning resilience in telecom by measuring recovery from mid-trace errors, finding low success rates (max 29.1% CFR) and that model scale does not reliably improve performance.
- VeriInteresting: An Empirical Study of Model Prompt Interactions in Verilog Code Generation
  Empirical study identifies patterns in how model classes respond to structured prompts, optimization, and other techniques across two Verilog benchmarks.