Large multilingual models pivot zero-shot multimodal learning across languages

Large multilingual models pivot zero-shot multimodal learning across languages · 2023 · arXiv 2308.12038

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · baseline 1

citation-polarity summary

background 2 · baseline 1

representative citing papers

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

cs.AI · 2024-07-01 · accept · novelty 7.0

WE-MATH benchmark reveals most LMMs rely on rote memorization for visual math while GPT-4o has shifted toward knowledge generalization.

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

Adversarial images transfer across languages in MLLMs while apparent safety in weaker languages stems from comprehension and visual-grounding failures rather than genuine alignment.

Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning

cs.CV · 2026-06-10 · unverdicted · novelty 5.0

TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context learning.

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

cs.CV · 2024-08-03 · conditional · novelty 5.0

MiniCPM-Llama3-V 2.5 delivers GPT-4V-level multimodal performance on phones through architecture, pretraining, and alignment optimizations.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · novelty 3.0

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

cs.CL · 2026-06-02 · unverdicted · none · ref 17

Adversarial images transfer across languages in MLLMs while apparent safety in weaker languages stems from comprehension and visual-grounding failures rather than genuine alignment.

Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning

cs.CV · 2026-06-10 · unverdicted · none · ref 60

TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context learning.
