WCXB provides 2,008 annotated multi-type web pages and shows extraction systems perform well on articles but diverge on structured pages.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Representative citing papers
ScrapeGraphAI-100k releases 93,695 real telemetry examples pairing web page content with prompts, schemas, and LLM responses to support training and benchmarking of schema-constrained generation.
citing papers explorer
- WCXB: A Multi-Type Web Content Extraction Benchmark
- ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation