{"paper":{"title":"Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"A three-role agentic framework with a self-evolving knowledge bank raises VLM accuracy on few-shot multimodal time series classification while generating human-readable feature explanations.","cross_cats":["cs.LG","cs.MA","cs.MM"],"primary_cat":"cs.AI","authors_text":"Boxin Li, Dan Li, Erli Meng, Jian Lou, Jiawei Huang, Lin Li, Qihao Quan, See-kiong Ng, Wenjie Feng, Xiao Zhang","submitted_at":"2026-05-10T07:47:09Z","abstract_excerpt":"In this paper, we propose the first VL$\\underline{\\textbf{M}}$ $\\underline{\\textbf{a}}$gentic $\\underline{\\textbf{r}}$easoning framework for few-$\\underline{\\textbf{s}}$hot multimodal $\\underline{\\textbf{T}}$ime $\\underline{\\textbf{S}}$eries $\\underline{\\textbf{C}}$lassification ($\\textbf{MarsTSC}$), which introduces a self-evolving knowledge bank as a dynamic context iteratively refined via reflective agentic reasoning. The framework comprises three collaborative roles: i) Generator conducts reliable classification via reasoning; ii) Reflector diagnoses the root causes of reasoning errors to "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"MarsTSC delivers substantial and consistent performance gains across 6 VLM backbones, outperforming both classical and foundation model-based time series baselines under few-shot conditions, while producing interpretable rationales that ground each classification decision in human-readable feature evidence.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the Reflector can reliably identify and the Modifier can safely incorporate temporal features overlooked by the Generator without introducing new biases or causing the knowledge bank to collapse or overfit during iterative refinement and test-time updates.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A three-role agentic framework with a self-evolving knowledge bank raises VLM accuracy on few-shot multimodal time series classification while generating human-readable feature explanations.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e3fde10e252467c71949b0f008bda373fb42c80b58d272af082c5eb750c5e680"},"source":{"id":"2605.09395","kind":"arxiv","version":2},"verdict":{"id":"a4b16415-b594-4f81-9846-d62be9b2efb3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T04:36:48.006667Z","strongest_claim":"MarsTSC delivers substantial and consistent performance gains across 6 VLM backbones, outperforming both classical and foundation model-based time series baselines under few-shot conditions, while producing interpretable rationales that ground each classification decision in human-readable feature evidence.","one_line_summary":"MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the Reflector can reliably identify and the Modifier can safely incorporate temporal features overlooked by the Generator without introducing new biases or causing the knowledge bank to collapse or overfit during iterative refinement and test-time updates.","pith_extraction_headline":"A three-role agentic framework with a self-evolving knowledge bank raises VLM accuracy on few-shot multimodal time series classification while generating human-readable feature explanations."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.09395/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-19T19:36:14.105463Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T13:01:18.341397Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T10:18:19.170716Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"97151b9fd097e5b2bb5c77c47b7ab86f84daa4120ce0500c115d2f50563c2ac6"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}