PrismAgent: Illuminating Harm in Memes via a Zero-Shot Interpretable Multi-Agent Framework
Pith reviewed 2026-05-09 20:03 UTC · model grok-4.3
The pith
A four-agent LLM workflow detects harmful memes by simulating a criminal investigation in zero-shot mode.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PrismAgent frames harmful meme detection as a criminal case with four agents: the analyst paraphrases the meme under opposing assumptions to surface intent, the investigator gathers supporting evidence and builds contextual interpretations, the prosecutor issues three independent preliminary judgments by pairing the meme with each interpretation, and the judge integrates all evidence for a final verdict. The explicit chain of reasoning renders each intermediate output visible, producing both a harm label and an interpretable trail without any task-specific training.
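The staged hand-off is easier to see as code. Below is a minimal sketch of the workflow as the claim describes it, assuming a generic chat-completion client; the prompts, function names, and the `call_llm` helper are illustrative stand-ins, not the authors' implementation.

```python
# Minimal sketch of the four-stage PrismAgent workflow described above.
# `call_llm` stands in for any chat-completion client; all prompts here
# are illustrative assumptions, not the paper's actual prompts.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def analyst(meme: str) -> dict:
    # Paraphrase the meme under opposing assumptions to probe intent.
    return {
        "benevolent": call_llm(f"Paraphrase this meme assuming benign intent:\n{meme}"),
        "malicious": call_llm(f"Paraphrase this meme assuming harmful intent:\n{meme}"),
    }

def investigator(meme: str, variants: dict, corpus: list[str]) -> list[str]:
    # Build one contextual interpretation for the meme and each paraphrase.
    # Evidence retrieval from the unannotated corpus is elided here.
    texts = [meme, variants["benevolent"], variants["malicious"]]
    return [call_llm(f"Using any supporting evidence, interpret:\n{t}") for t in texts]

def prosecutor(meme: str, interpretations: list[str]) -> list[str]:
    # Three independent preliminary judgments: the meme paired with each
    # interpretation, judged in isolation.
    return [
        call_llm(f"Meme:\n{meme}\nInterpretation:\n{i}\n"
                 "Preliminary verdict (harmful/harmless) with reasons:")
        for i in interpretations
    ]

def judge(meme: str, interpretations: list[str], verdicts: list[str]) -> str:
    # Deliberate over all intermediate outputs to render the final verdict.
    dossier = "\n\n".join(interpretations + verdicts)
    return call_llm(f"Case file for meme:\n{meme}\n{dossier}\n"
                    "Final verdict (harmful/harmless):")

def prism_agent(meme: str, corpus: list[str]) -> dict:
    variants = analyst(meme)
    interpretations = investigator(meme, variants, corpus)
    verdicts = prosecutor(meme, interpretations)
    # Returning every intermediate output is what makes the reasoning
    # chain inspectable alongside the final label.
    return {"variants": variants, "interpretations": interpretations,
            "preliminary": verdicts, "verdict": judge(meme, interpretations, verdicts)}
```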
What carries the argument
The four-agent collaborative workflow that sequences analysis, evidence gathering, preliminary judgments, and final deliberation to produce both a verdict and its supporting chain.
If this is right
- Detection becomes possible on new or emerging memes without collecting or labeling fresh training examples.
- Each decision includes visible intermediate outputs that can be inspected or audited.
- Performance gains over other zero-shot baselines appear across multiple public meme datasets.
- The same staged structure can be applied to related content types that currently require large annotated corpora.
Where Pith is reading between the lines
- The workflow could reduce dependence on centralized moderation teams by surfacing traceable evidence for human review.
- Similar agent divisions might address other subjective labeling tasks such as toxicity in short video clips or image captions.
- If the roles prove stable across model sizes, the method offers a route to lower inference costs compared with end-to-end fine-tuned detectors.
Load-bearing premise
Large language models can reliably assume and execute the distinct specialized roles of analyst, investigator, prosecutor, and judge without fine-tuning or task-specific examples.
What would settle it
A controlled test set of memes whose harm status is independently verified by human experts but where the four agents consistently reach the wrong verdict or produce contradictory intermediate steps.
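As a concrete harness for that test, the sketch below runs the pipeline sketched earlier over an expert-labeled set and counts wrong verdicts and contradictory reasoning trails. The `is_harmful` label parser and the contradiction criterion are simplifying assumptions, not protocols from the paper.

```python
# Falsification harness: flag systematic failures of the four-agent
# pipeline against independently verified human labels.
# Assumes `prism_agent` from the earlier sketch is in scope.

def is_harmful(verdict: str) -> bool:
    # Crude label parser; assumes the model is prompted to lead with the label.
    return verdict.strip().lower().startswith("harmful")

def audit(test_set: list[tuple[str, bool]], corpus: list[str]) -> dict:
    wrong, contradictory = 0, 0
    for meme, expert_label in test_set:
        out = prism_agent(meme, corpus)
        final = is_harmful(out["verdict"])
        if final != expert_label:
            wrong += 1
        # Contradictory trail: the final verdict disagrees with all three
        # of the prosecutor's preliminary judgments.
        if all(is_harmful(v) != final for v in out["preliminary"]):
            contradictory += 1
    n = len(test_set)
    return {"error_rate": wrong / n, "contradiction_rate": contradictory / n}
```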
Original abstract
The rapid spread of memes makes harmful content detection increasingly crucial, as effective identification can curb the circulation of misinformation. However, existing methods rely heavily on high-volume annotated data, which leads to substantial training costs and limited generalization. To address these challenges, we propose PrismAgent, a zero-shot, multi-agent, interpretable framework. PrismAgent conceptualizes this task as a criminal case investigation, employing four specialized agents responsible for the analysis, investigation, prosecution, and judgment stages within a structured collaborative workflow. In the first stage, the analyst agent paraphrases each meme under benevolent and malicious assumptions to probe its underlying intent. The investigator agent then retrieves supporting evidence from an unannotated dataset and constructs contextual interpretations for the meme and its variants. Next, the prosecutor agent performs three independent preliminary judgments by pairing the original meme with each of the three interpretations. Finally, the judge agent deliberates across all evidence to render a final verdict. Moreover, PrismAgent's explicit multi-stage reasoning chain makes the model inherently interpretable, as every intermediate step is explicitly explained rather than only producing a final detection result. Extensive experiments on three public datasets show that PrismAgent significantly outperforms existing zero-shot detection methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PrismAgent, a zero-shot interpretable multi-agent framework for harmful meme detection. It frames the task as a criminal investigation using four agents (analyst, investigator, prosecutor, judge) in a staged workflow: the analyst paraphrases memes under benevolent/malicious assumptions; the investigator retrieves evidence from an unannotated dataset to build contextual interpretations; the prosecutor issues preliminary judgments; and the judge delivers a final verdict. The explicit reasoning chain provides interpretability, and the abstract claims significant outperformance over existing zero-shot methods on three public datasets.
Significance. If the empirical claims hold, PrismAgent could advance zero-shot harmful content detection by offering an interpretable, annotation-free alternative that reduces training costs and improves generalization for meme-based misinformation. The multi-agent design and staged workflow represent a novel application of LLM collaboration to a socially relevant task.
major comments (2)
- [Abstract] The central claim that 'PrismAgent significantly outperforms existing zero-shot detection methods' on three public datasets is asserted without any metrics, baselines, statistical tests, implementation details, or quantitative results, making the data-to-claim link impossible to evaluate and undermining the primary empirical contribution.
- [Method] PrismAgent workflow description: The investigator agent's retrieval of supporting evidence from an unannotated dataset to construct contextual interpretations for the meme and variants risks violating the zero-shot regime if the corpus overlaps with the three evaluation datasets (via shared memes or indirect leakage); no overlap checks, dataset construction details, or retrieval implementation are provided to verify the constraint.
minor comments (1)
- [Abstract] Adding one sentence on the specific datasets used and the magnitude of improvements would help readers assess the scope of the experiments.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and outline revisions to enhance clarity and rigor.
Point-by-point responses
- Referee: [Abstract] The central claim that 'PrismAgent significantly outperforms existing zero-shot detection methods' on three public datasets is asserted without any metrics, baselines, statistical tests, implementation details, or quantitative results, making the data-to-claim link impossible to evaluate and undermining the primary empirical contribution.
Authors: We agree that the abstract would benefit from greater specificity to support the empirical claims. The original abstract prioritized conciseness, but this omitted key quantitative details. In the revised manuscript, we will update the abstract to include specific performance metrics (e.g., accuracy and F1-score improvements over baselines on each dataset) and note the evaluation protocol, while preserving overall length. Revision: yes.
- Referee: [Method] PrismAgent workflow description: The investigator agent's retrieval of supporting evidence from an unannotated dataset to construct contextual interpretations for the meme and variants risks violating the zero-shot regime if the corpus overlaps with the three evaluation datasets (via shared memes or indirect leakage); no overlap checks, dataset construction details, or retrieval implementation are provided to verify the constraint.
Authors: We appreciate this concern about maintaining the zero-shot constraint. The unannotated dataset is a separately collected corpus with no shared memes or indirect overlap with the evaluation sets, as verified through deduplication and embedding-based checks during construction. However, the original submission did not detail these procedures or the retrieval method (e.g., embedding model and similarity metric). We will add a dedicated subsection in the Methods to describe the dataset, overlap verification steps, and retrieval implementation, thereby confirming adherence to zero-shot conditions. Revision: yes.
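For illustration, an embedding-based overlap check of the kind the rebuttal describes might look like the sketch below; the `embed` placeholder, the cosine-similarity criterion, and the 0.95 threshold are assumptions, not details from the paper.

```python
# Sketch of a corpus/eval-set overlap check via text embeddings and
# cosine similarity. `embed` and the threshold are illustrative.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    raise NotImplementedError("plug in any text or multimodal embedding model")

def flag_overlaps(corpus: list[str], eval_set: list[str], thresh: float = 0.95):
    a = embed(corpus)    # shape (n_corpus, d)
    b = embed(eval_set)  # shape (n_eval, d)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sims = a @ b.T       # pairwise cosine similarities
    hits = np.argwhere(sims >= thresh)
    # Each hit is a (corpus_idx, eval_idx) pair to drop from the retrieval
    # corpus before any experiment touches the evaluation sets.
    return [(int(i), int(j), float(sims[i, j])) for i, j in hits]
```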
Circularity Check
No circularity detected; empirical evaluation is independent of internal definitions
full rationale
The paper describes a multi-agent LLM workflow for zero-shot meme harm detection and supports its performance claims through experiments on three public datasets. No equations, parameters, or derivations appear that reduce any result to a quantity defined inside the paper by construction. The workflow stages (analyst, investigator, prosecutor, judge) are presented as a procedural framework rather than a mathematical chain, and no self-citation load-bearing arguments or fitted-input predictions are present. The central claim rests on external empirical comparison, making the analysis self-contained against the listed circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large language models can reliably role-play as analyst, investigator, prosecutor, and judge agents for intent probing and evidence synthesis without task-specific fine-tuning.
invented entities (1)
- PrismAgent four-agent collaborative workflow (no independent evidence)