arxiv: 2505.11454 · v7 · pith:Q5UFIZG5new · submitted 2025-05-16 · 💻 cs.CV · cs.AI

HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

classification 💻 cs.CV cs.AI
keywords humanibench, models, alignment, multimodal, benchmarks, empathy, ethics, fairness

Although recent large multimodal models (LMMs) show impressive progress on vision-language tasks, their alignment with human-centered (HC) principles such as fairness, ethics, inclusivity, empathy, and robustness is often overlooked. Existing LMM benchmarks are largely accuracy-centric. We present HumaniBench, a unified framework for characterizing HC alignment across realistic, socially grounded visual contexts. It contains 32,000 expert-verified image-question pairs from real-world news imagery, each mapped to one or more HC principles through explicit metrics. Comparing 15 state-of-the-art LMMs reveals consistent trade-offs: proprietary systems lead on ethics, reasoning, and empathy, while open-source models show superior visual grounding and resilience. All models show persistent gaps in fairness and multilingual inclusivity. Chain-of-thought prompting and test-time scaling yield 8 to 12% gains on several HC dimensions. HumaniBench enables fine-grained analysis of alignment trade-offs not captured by conventional multimodal benchmarks. https://vectorinstitute.github.io/humanibench/

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    cs.CV 2026-05 unverdicted novelty 7.0

    FineBench is a new dense VQA benchmark for fine-grained human activity understanding in long videos, revealing weaknesses in open VLMs and showing that FineAgent improves them via localization and description modules.

  2. FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    cs.CV 2026-05 unverdicted novelty 7.0

    FineBench is a new dense VQA benchmark for fine-grained human activity in long videos that exposes weaknesses in open VLMs and demonstrates gains from the proposed FineAgent modular framework.

  3. FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    cs.CV 2026-05 unverdicted novelty 6.0

    FineBench is a large-scale human-centric VQA benchmark exposing weaknesses in open VLMs for fine-grained activity understanding, with FineAgent providing a practical enhancement method.

  4. UnBias-Plus: Detect, Explain, and Rewrite Bias

    cs.CL 2026-06 unverdicted novelty 4.0

    UnBias-Plus is an open-source toolkit unifying segment-level multi-class bias classification, biased span localization, neutral text rewriting, and decision reasoning.