Llama 3 model card
2 Pith papers cite this work.
citing papers explorer
- V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction
  V4FinBench is a new million-record benchmark where imbalance-aware finetuned TabPFN matches or beats gradient boosting on long-horizon bankruptcy prediction while Llama-3-8B lags, with evidence of transferable patterns to US data.
- WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
  WildGuard is a new open moderation model and dataset for LLM safety that identifies harmful prompts, risky responses, and refusal rates, achieving SOTA open-source performance and sometimes exceeding GPT-4 while cutting jailbreak success from 79.8% to 2.4%.