pith:JHTW7AMA
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Preference fine-tuning on automatically generated hallucinated responses aligns vision and language modalities in large models while cutting hallucinations.
arxiv:2402.11411 v1 · 2024-02-18 · cs.LG · cs.CL · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JHTW7AMALJK4JSDR3OR2WJXT3F}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
In experiments across broad benchmarks, we show that we can not only reduce hallucinations but also improve model performance across standard benchmarks, outperforming prior approaches.
The two-stage automated generation of dispreferred responses (GPT-4V hallucination injection and image distortion) produces high-quality preference pairs that accurately reflect and correct the model's hallucination behavior when used in DPO.
POVID generates AI-created preference data to fine-tune vision-language models with DPO, reducing hallucinations and improving benchmark scores.
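The claims above rest on DPO applied to preference pairs. As a rough illustration, the following is a minimal sketch of the standard per-example DPO loss, not the POVID authors' implementation. It assumes the four inputs are summed sequence log-probabilities that have already been computed for the preferred response and the hallucinated (dispreferred) response under the policy and a frozen reference model. The function name, argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_logp_preferred: float,
             policy_logp_dispreferred: float,
             ref_logp_preferred: float,
             ref_logp_dispreferred: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    All names and the default beta are hypothetical; the inputs are
    assumed to be sequence log-probabilities computed elsewhere.
    """
    # How much more the policy favors each response than the reference does.
    preferred_gain = policy_logp_preferred - ref_logp_preferred
    dispreferred_gain = policy_logp_dispreferred - ref_logp_dispreferred

    # Scaled margin between the preferred and dispreferred responses.
    z = beta * (preferred_gain - dispreferred_gain)

    # -log(sigmoid(z)) == log(1 + exp(-z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy matches the reference on both responses the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the preferred response relative to the reference.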
Receipt and verification
| First computed | 2026-05-17T23:38:14.301452Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
49e76f81805a55c4c871dba3ab26f3d953edb4d0e81b5a42139b8ca28b71f008
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JHTW7AMALJK4JSDR3OR2WJXT3F \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 49e76f81805a55c4c871dba3ab26f3d953edb4d0e81b5a42139b8ca28b71f008
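The hashing step of the pipeline above can also be written as a small standalone function. This is a sketch that mirrors the one-liner's serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding). The function names are illustrative assumptions, and `record` is assumed to be the already-parsed `canonical_record` object; fetching it is left out.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record canonically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Compare the recomputed digest with a published one, ignoring hex case."""
    return canonical_sha256(record) == expected_hex.strip().lower()
```

Because the serialization sorts keys and strips whitespace, the indentation and key order of the JSON as displayed do not affect the digest.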
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "684c4ab3db146800bb4720a51e3c130124f5c1e1b7bb92be651fce3d29e8a9af",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-02-18T00:56:16Z",
    "title_canon_sha256": "95495881af9bfc34909c83f89fd32e4e9c83ca55cee19e4cb282e8de90b75049"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2402.11411",
    "kind": "arxiv",
    "version": 1
  }
}