Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment

· 2026 · cs.CV · arXiv 2604.08212

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

General-purpose vision-language models demonstrate strong performance in everyday domains but struggle with specialized technical fields requiring precise terminology, structured reasoning, and adherence to engineering standards. This work addresses whether domain-specific instruction tuning can enable comprehensive pavement condition assessment through vision-language models. PaveInstruct, a dataset containing 278,889 image-instruction-response pairs spanning 32 task types, was created by unifying annotations from nine heterogeneous pavement datasets. PaveGPT, a pavement foundation model trained on this dataset, was evaluated against state-of-the-art vision-language models across perception, understanding, and reasoning tasks. Instruction tuning transformed model capabilities, achieving improvements exceeding 20% in spatial grounding, reasoning, and generation tasks while producing ASTM D6433-compliant outputs. These results enable transportation agencies to deploy unified conversational assessment tools that replace multiple specialized systems, simplifying workflows and reducing technical expertise requirements. The approach establishes a pathway for developing instruction-driven AI systems across infrastructure domains including bridge inspection, railway maintenance, and building condition assessment.

representative citing papers

Multi-Task Crack Foundation Model for Engineering-Reliable Crack Representation and Topology Preservation in Civil Infrastructure

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

CrackGeoFM is a multi-task framework that adapts a frozen visual foundation model with FCEM, CFAM, and SMTD modules for crack mask prediction, skeleton reconstruction, and uncertainty estimation, reporting SOTA results across 20 datasets including few-shot settings.

citing papers explorer

Showing 1 of 1 citing paper.

Multi-Task Crack Foundation Model for Engineering-Reliable Crack Representation and Topology Preservation in Civil Infrastructure cs.CV · 2026-06-04 · unverdicted · none · ref 20 · internal anchor
CrackGeoFM is a multi-task framework that adapts a frozen visual foundation model with FCEM, CFAM, and SMTD modules for crack mask prediction, skeleton reconstruction, and uncertainty estimation, reporting SOTA results across 20 datasets including few-shot settings.

Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment

fields

years

verdicts

representative citing papers

citing papers explorer