Title resolution pending

A benchmark for long-form medical question answering · 2024 · arXiv 2411.09834

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

LLM-as-a-Judge in Healthcare: A Scoping Analysis of Applications, Methods, and Human Alignment

cs.CY · 2026-05-24 · unverdicted · novelty 6.0

Scoping review of 134 studies on LLM-as-a-Judge in healthcare finds concentration in clinical decision support and NLP, frequent use of OpenAI models with prompt engineering, and moderate-to-strong human alignment where validated.

Tuning Language Models for Robust Prediction of Diverse User Behaviors

cs.CL · 2025-05-23 · unverdicted · novelty 6.0

BehaviorLM applies progressive fine-tuning in two stages to let LLMs predict both frequent anchor and rare tail user behaviors more robustly on real-world datasets.

citing papers explorer

Showing 2 of 2 citing papers.

LLM-as-a-Judge in Healthcare: A Scoping Analysis of Applications, Methods, and Human Alignment

cs.CY · 2026-05-24 · unverdicted · none · ref 55

Scoping review of 134 studies on LLM-as-a-Judge in healthcare finds concentration in clinical decision support and NLP, frequent use of OpenAI models with prompt engineering, and moderate-to-strong human alignment where validated.

Tuning Language Models for Robust Prediction of Diverse User Behaviors

cs.CL · 2025-05-23 · unverdicted · none · ref 11

BehaviorLM applies progressive fine-tuning in two stages to let LLMs predict both frequent anchor and rare tail user behaviors more robustly on real-world datasets.
