Cosmos: Catching out-of-context misinformation with self-supervised learning
7 papers indexed by Pith cite this work.
Citing papers
- VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking
  VeriTaS is the first dynamic benchmark for multimodal automated fact-checking. It is updated quarterly with real-world claims and uses a standardized scoring scheme, a design intended to resist data leakage.
- ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection
  ReMMD presents ReMMDBench (500 samples, 2,756 images, five languages, five-way veracity labels) and ReMMD-Agent, which achieves 41.80% accuracy and 39.12% macro-F1 on five-way classification with GPT-5.2 at lower cost than prior agents.
- When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection
  EVID-Bench supplies 222 videos covering nine manipulation types in three categories. Frontier multimodal models reach at most 61.43% point-level accuracy when required to use web search to identify the false information.
- SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation
  SynCred-Bench shows that, at a 5% false-positive rate (FPR), 15 MLLMs reach a true-positive rate (TPR) of only 10.5%, open-source detectors stay under 5%, commercial APIs reach 57.6%, and humans reach 63% when identifying AI-generated images with synthetic credibility.
- The Warrant Gap: Claim-Conditioned Re-scoring for Fact-Checking
  The paper introduces claim-conditioned re-scoring (SIFT) and the warranted supports proportion (WSP) metric, reporting accuracy recovery of up to 27.6 points and WSP calibration with an AUC of 0.92 on FEVER, SciFact, and other benchmarks.
- T-IMPACT: A Severity-Aware Benchmark for Contextual Image-Text Manipulation
  T-IMPACT is a new benchmark dataset and pipeline that supplies nearly 99k manipulated image-text pairs together with a human-calibrated, continuous severity signal measuring the change in contextual interpretation.
- Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis
  CRAVE is a new framework that clusters retrieved text and image evidence into narratives and uses an LLM judge to produce fact-checking verdicts with explanations.
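Two of the summaries above report metrics that are easy to misread: ReMMD-Agent's macro-F1 over five veracity classes and SynCred-Bench's TPR at a fixed 5% FPR. The following is a minimal sketch of one common way to compute each. It is an illustration, not the evaluation code of either paper. The function names `macro_f1` and `tpr_at_fpr` are hypothetical. The thresholding rule (flag an item when its score strictly exceeds a threshold picked from the negative-class scores) is an assumption.

```python
import math
from collections import Counter
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # this label was predicted wrongly
            fn[true_label] += 1  # the true label was missed
    labels = sorted(set(y_true) | set(y_pred))
    # Per-class F1 = 2TP / (2TP + FP + FN); every class counts equally in the mean.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(per_class) / len(per_class)


def tpr_at_fpr(pos_scores: Sequence[float], neg_scores: Sequence[float],
               max_fpr: float = 0.05) -> float:
    """True-positive rate at a threshold chosen so that FPR <= max_fpr.

    Assumption (hypothetical rule): an item is flagged as positive, e.g.
    AI-generated, when its score is strictly greater than the threshold.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    neg_desc = sorted(neg_scores, reverse=True)
    # Largest number of negatives that may be flagged without exceeding max_fpr.
    allowed_fp = math.floor(max_fpr * len(neg_desc))
    if allowed_fp >= len(neg_desc):
        threshold = -math.inf  # the budget allows every item to be flagged
    else:
        # Only negatives ranked above this one can exceed it, so FP <= allowed_fp.
        threshold = neg_desc[allowed_fp]
    caught = sum(1 for s in pos_scores if s > threshold)
    return caught / len(pos_scores)
```

Macro-F1 weights each class equally, so weak performance on a rare veracity label lowers the score as much as weak performance on a common one. TPR at a fixed FPR sets the false-alarm budget first and then asks what fraction of synthetic images is still caught. This is why the same detector can look strong on accuracy yet weak on this metric.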