In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25)
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework
  Introduces a benchmark for MLLM-based chart data extraction from unlabeled images and a human-centered training framework that reaches state-of-the-art numerical accuracy with a 7B model.
- What Is Actually Being Annotated? Inter-Prompt Reliability as a Measurement Problem in LLM-Based Social Science Labeling
  LLM annotations for social science tasks vary substantially with prompt wording in interpretive cases but become more stable when majority voting is applied across multiple equivalent prompts.