HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

Ahmed Y. Radwan; Amandeep Singh; Aravind Narayanan; Ashmal Vayani; Deval Pandya; Mubarak Shah; Mukund S. Chettiar; Shaina Raza; Vahid Reza Khazaie

arxiv: 2505.11454 · v7 · pith:Q5UFIZG5new · submitted 2025-05-16 · 💻 cs.CV · cs.AI

keywords humanibench · models · alignment · multimodal · benchmarks · empathy · ethics · fairness

Although recent large multimodal models (LMMs) show impressive progress on vision-language tasks, their alignment with human-centered (HC) principles such as fairness, ethics, inclusivity, empathy, and robustness is often overlooked. Existing LMM benchmarks are largely accuracy-centric. We present HumaniBench, a unified framework for characterizing HC alignment across realistic, socially grounded visual contexts. It contains 32,000 expert-verified image-question pairs from real-world news imagery, each mapped to one or more HC principles through explicit metrics. Comparing 15 state-of-the-art LMMs reveals consistent trade-offs: proprietary systems lead on ethics, reasoning, and empathy, while open-source models show superior visual grounding and resilience. All models show persistent gaps in fairness and multilingual inclusivity. Chain-of-thought prompting and test-time scaling yield 8 to 12% gains on several HC dimensions. HumaniBench enables fine-grained analysis of alignment trade-offs not captured by conventional multimodal benchmarks. https://vectorinstitute.github.io/humanibench/

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
cs.CV 2026-05 unverdicted novelty 7.0

FineBench is a new dense VQA benchmark for fine-grained human activity understanding in long videos, revealing weaknesses in open VLMs and showing that FineAgent improves them via localization and description modules.
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
cs.CV 2026-05 unverdicted novelty 7.0

FineBench is a new dense VQA benchmark for fine-grained human activity in long videos that exposes weaknesses in open VLMs and demonstrates gains from the proposed FineAgent modular framework.
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
cs.CV 2026-05 unverdicted novelty 6.0

FineBench is a large-scale human-centric VQA benchmark exposing weaknesses in open VLMs for fine-grained activity understanding, with FineAgent providing a practical enhancement method.
UnBias-Plus: Detect, Explain, and Rewrite Bias
cs.CL 2026-06 unverdicted novelty 4.0

UnBias-Plus is an open-source toolkit unifying segment-level multi-class bias classification, biased span localization, neutral text rewriting, and decision reasoning.