{"total":10,"items":[{"citing_arxiv_id":"2605.09675","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents","primary_cat":"cs.AI","submitted_at":"2026-05-10T17:45:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CodeClinic benchmark demonstrates that LLM-generated Python skill libraries from clinical guidelines enhance consistency and reduce token consumption by up to 40% compared to zero-shot approaches on MIMIC-IV based tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04313","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise","primary_cat":"cs.CL","submitted_at":"2026-05-05T21:26:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NoisyCausal benchmark tests LLMs on causal reasoning with structured noise, and a modular LLM-plus-causal-graph framework outperforms baselines while generalizing to Cladder.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18873","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS","primary_cat":"cs.AI","submitted_at":"2026-04-20T21:53:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new benchmark and deterministic pipeline translate natural language reasoning into executable Narsese for NARS, with execution-based validation and initial LLM adaptation for three-label 
classification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15726","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM Reasoning Is Latent, Not the Chain of Thought","primary_cat":"cs.AI","submitted_at":"2026-04-17T05:59:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[12] Lyu, Q., Havaldar, S., Stein, A., Zhang, L., Rao, D., Wong, E., Apidianaki, M., and Callison-Burch, C. (2023). Faithful chain-of-thought reasoning. arXiv preprint arXiv:2301.13379. [13] Arakelyan, E., Minervini, P., Verga, P., Lewis, P., and Augenstein, I. (2024). FLARE: Faithful logic-aided reasoning and exploration. arXiv preprint arXiv:2410.11900. [14] Pan, L., Albalak, A., Wang, X., and Wang, W. Y. (2023). Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning. arXiv preprint arXiv:2305.12295. [15] Shi, Y., Sun, M., Liu, Z., Yang, M., Fang, Y., Sun, T., and Gu, X. (2026). 
Reasoning in trees: Improving retrieval-augmented generation for multi-hop question answering."},{"citing_arxiv_id":"2604.10341","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline","primary_cat":"cs.AI","submitted_at":"2026-04-11T19:59:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"VeriTrans achieves 94.46% SAT/UNSAT correctness on SatBench via LLM translation gated by round-trip similarity and deterministic neuro-symbolic execution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09712","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models","primary_cat":"cs.CV","submitted_at":"2026-04-08T06:28:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LAST augments MLLMs with a tool-abstraction sandbox and three-stage training to deliver around 20% gains on spatial reasoning tasks, outperforming closed-source models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"begun shifting from resource-intensive task-specific fine-tuning [7] toward more generalizable approaches. Such approaches include progressive model training, integrating 3D-aware encodings [37], and modular systems that enable VLMs to invoke external computer vision tools to perform physical operations [5, 12]. Recent research interests have begun to shift towards spatial reasoning in path planning [28, 40]. Tool-augmented Reasoning. A major research trend enhances LLMs by equipping them with external modules that supply complementary information. 
Typical examples integrate calculators [25, 44], code executors [10, 26] and symbolic solvers [20, 31, 32, 50], leveraging their reliability to handle complex reasoning beyond the native capacity of language models [30]."},{"citing_arxiv_id":"2601.20055","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning","primary_cat":"cs.CL","submitted_at":"2026-01-27T20:59:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VERGE decomposes LLM outputs into atomic claims, autoformalizes them to first-order logic, verifies with SMT solvers and consensus, and refines via minimal correction subsets, yielding 18.7% average uplift on reasoning benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.01423","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM-Assisted Tool for Joint Generation of Formulas and Functions in Rule-Based Verification of Map Transformations","primary_cat":"cs.SE","submitted_at":"2025-11-03T10:19:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLM-assisted pipeline jointly generates logical formulas and executable predicates for rule-based verification of HD map transformations in CommonRoad, evaluated on synthetic bridge and slope scenarios.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.24765","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Semantic-Aware Logical Reasoning via a Semiotic 
Framework","primary_cat":"cs.AI","submitted_at":"2025-09-29T13:31:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LogicAgent uses a semiotic-square-guided approach to enhance logical reasoning in LLMs on the new RepublicQA benchmark and others, reporting average gains of 6.25% and 7.05% respectively.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.08332","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ORFS-agent: Tool-Using Agents for Chip Design Optimization","primary_cat":"cs.AI","submitted_at":"2025-06-10T01:38:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ORFS-agent uses LLM agents to tune parameters in chip design flows, improving geometric-mean wirelength, clock period, and co-optimization objectives by up to 2.7% over OR-AutoTuner with 40% fewer iterations on ASAP7 and SKY130HD benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}