PubMedQA: A Dataset for Biomedical Research Question Answering

Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William Cohen, Xinghua Lu · 2019 · DOI 10.18653/v1/d19-1259

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation

cs.CL · 2026-05-12 · conditional · novelty 7.0 · 2 refs

Checkup2Action is a new multimodal dataset and benchmark for generating safe, prioritized action cards from real-world clinical check-up reports using large language models.

BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets

cs.CL · 2026-04-28 · conditional · novelty 7.0

A graphlet-anchored framework generates 119,856 factually grounded biomedical QA pairs that improve accuracy on PubMedQA and MedQA benchmarks.

PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs

cs.IR · 2026-04-23 · unverdicted · novelty 7.0

PaperMind is a new benchmark that evaluates integrated multimodal reasoning and critique over scientific papers through four complementary task families across seven domains.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Building evidence-based knowledge bases from full-text literature for disease-specific biomedical reasoning

cs.CE · 2026-03-30 · unverdicted · novelty 6.0

EvidenceNet releases disease-specific biomedical knowledge bases with 7,872 and 6,622 evidence records for HCC and CRC, respectively, plus graphs, extracted via an LLM pipeline with reportedly high fidelity.

A Parameter-Efficient Transfer Learning Approach through Multitask Prompt Distillation and Decomposition for Clinical NLP

cs.CL · 2026-04-08 · unverdicted · novelty 5.0

A shared metaprompt distilled from 21 clinical tasks enables adaptation to 10 held-out datasets across five task types with under 0.05% parameters, outperforming LoRA by 1.5-1.7% and single-task prompt tuning by 6.1-6.6%.

citing papers explorer

Showing 7 of 7 citing papers.

Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
cs.CL · 2026-05-12 · conditional · none · ref 10 · 2 links

BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets
cs.CL · 2026-04-28 · conditional · none · ref 15

PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs
cs.IR · 2026-04-23 · unverdicted · none · ref 6

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
cs.CL · 2024-02-05 · unverdicted · none · ref 100

Compared to What? Baselines and Metrics for Counterfactual Prompting
cs.CL · 2026-05-01 · conditional · none · ref 123

Building evidence-based knowledge bases from full-text literature for disease-specific biomedical reasoning
cs.CE · 2026-03-30 · unverdicted · none · ref 51

A Parameter-Efficient Transfer Learning Approach through Multitask Prompt Distillation and Decomposition for Clinical NLP
cs.CL · 2026-04-08 · unverdicted · none · ref 2
