Pre-training is a hot topic: Contextualized document embeddings improve topic coherence
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
- GRAB: A Risk Taxonomy-Grounded Benchmark for Unsupervised Topic Discovery in Financial Disclosures
GRAB is a benchmark dataset of 1.61M sentences from 8,247 10-K filings with taxonomy-anchored weak supervision labels for standardized evaluation of unsupervised topic models on financial risk disclosures.
- Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings
Embeddings reliably capture authorial stylistic features in French literary texts, and these signals persist after LLM rewriting while showing model-specific patterns.