miniCTX: Neural theorem proving with (long-)contexts

2024 · arXiv 2408.03350

5 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

FVSpec: Real-World Property-Based Tests as Lean Challenges

cs.SE · 2026-05-31 · conditional · novelty 7.0

A new benchmark of 9,415 Lean 4 specifications derived from 2,772 scraped Python property-based tests, plus a three-agent LLM transpilation pipeline and proof-generation baselines.

s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

cs.PL · 2026-03-15 · unverdicted · novelty 7.0

s2n-bignum-bench is a new benchmark requiring LLMs to synthesize HOL Light proofs for real-world low-level cryptographic assembly code.

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

cs.LG · 2025-02-07 · unverdicted · novelty 7.0

A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

ImProver: Agent-Based Automated Proof Optimization

cs.AI · 2024-10-07 · unverdicted · novelty 7.0

ImProver is an LLM agent using Chain-of-States, error-correction, and retrieval to rewrite Lean proofs for arbitrary user-defined optimization criteria like shortness and readability.

How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.

citing papers explorer

Showing 2 of 2 citing papers after filters.

ImProver: Agent-Based Automated Proof Optimization

cs.AI · 2024-10-07 · unverdicted · none · ref 12

ImProver is an LLM agent using Chain-of-States, error-correction, and retrieval to rewrite Lean proofs for arbitrary user-defined optimization criteria like shortness and readability.

How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

cs.AI · 2026-05-07 · unverdicted · none · ref 37

Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.
