pith. machine review for the scientific record.

arxiv: 2509.14448 · v2 · submitted 2025-09-17 · 💻 cs.AI

Recognition: unknown

VCBench: Benchmarking LLMs in Venture Capital

classification 💻 cs.AI
keywords vcbench, venture, achieves, benchmarks, capital, founder, index, llms

Benchmarks such as SWE-bench and ARC-AGI demonstrate how shared datasets accelerate progress toward artificial general intelligence (AGI). We introduce VCBench, the first benchmark for predicting founder success in venture capital (VC), a domain where signals are sparse, outcomes are uncertain, and even top investors perform modestly. At inception, the market index achieves a precision of 1.9%. Y Combinator outperforms the index by a factor of 1.7x, while tier-1 firms are 2.9x better. VCBench provides 9,000 anonymized founder profiles, standardized to preserve predictive features while resisting identity leakage, with adversarial tests showing more than 90% reduction in re-identification risk. We evaluate nine state-of-the-art large language models (LLMs). DeepSeek-V3 delivers over six times the baseline precision, GPT-4o achieves the highest F0.5, and most models surpass human benchmarks. Designed as a public and evolving resource available at vcbench.com, VCBench establishes a community-driven standard for reproducible and privacy-preserving evaluation of AGI in early-stage venture forecasting.
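The figures in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the 1.7x and 2.9x multipliers apply directly to the 1.9% index precision (giving roughly 3.2% and 5.5%), and it uses the standard F-beta definition, in which beta = 0.5 weights precision more heavily than recall. The names `f_beta`, `INDEX_PRECISION`, and `MULTIPLIERS` are hypothetical, and the precision/recall pair passed to `f_beta` is made up for demonstration, not a reported result.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0.0:
        # Degenerate case (e.g. no true positives): define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Values quoted in the abstract.
INDEX_PRECISION = 0.019  # market index precision at inception
MULTIPLIERS = {"Y Combinator": 1.7, "tier-1 firms": 2.9}

# Assumption: each multiplier scales the index precision directly.
implied_precision = {
    name: INDEX_PRECISION * factor for name, factor in MULTIPLIERS.items()
}

for name, p in implied_precision.items():
    print(f"{name}: implied precision of about {p:.2%}")

# Hypothetical precision/recall pair, used only to show how F0.5 and F1 differ.
example_f05 = f_beta(precision=0.10, recall=0.05, beta=0.5)
example_f1 = f_beta(precision=0.10, recall=0.05, beta=1.0)
print(f"F0.5 = {example_f05:.4f}, F1 = {example_f1:.4f}")
```

Because precision exceeds recall in the hypothetical pair, F0.5 comes out higher than F1, which is the behavior one would want from a metric where false positives (bad investments) are costlier than missed founders.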

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. YC Bench: a Live Benchmark for Forecasting Startup Outperformance in Y Combinator Batches

    cs.LG 2026-04 accept novelty 6.0

    YC Bench is a new live benchmark that evaluates forecasting models for startup outperformance within YC batches using a short-term Pre-Demo Day Score derived from public traction signals.