o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Encoder models trained on SEC filings struggle with earnings calls due to domain shift, while LLMs enable open-ended KPI extraction with 79.7% human-verified precision on newly introduced benchmarks.
Med-Gemini sets new records on 10 of 14 medical benchmarks including 91.1% on MedQA-USMLE, beats GPT-4V by 44.5% on multimodal tasks, and surpasses humans on medical text summarization.
citing papers explorer
-
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
-
Effective Performance Measurement: Challenges and Opportunities in KPI Extraction from Earnings Calls
Encoder models trained on SEC filings struggle with earnings calls due to domain shift, while LLMs enable open-ended KPI extraction with 79.7% human-verified precision on newly introduced benchmarks.
-
Capabilities of Gemini Models in Medicine
Med-Gemini sets new records on 10 of 14 medical benchmarks including 91.1% on MedQA-USMLE, beats GPT-4V by 44.5% on multimodal tasks, and surpasses humans on medical text summarization.