pith · machine review for the scientific record

arxiv: 2604.20511 · v1 · submitted 2026-04-22 · 💻 cs.LG · cs.AI · cs.CL · cs.CV · cs.CY

Recognition: unknown

CHASM: Unveiling Covert Advertisements on Chinese Social Media

Jingyi Zheng, Tianyi Hu, Wenhan Dong, Xinlei He, Yule Liu, Zhen Sun, Zifan Peng, Zongmin Zhang


Pith reviewed 2026-05-10 00:44 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL · cs.CV · cs.CY
keywords covert advertisements · multimodal large language models · social media moderation · Chinese social media · dataset evaluation · zero-shot detection · fine-tuning

The pith

No current multimodal large language models reliably detect covert advertisements disguised as ordinary social media posts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the CHASM dataset of 4,992 real posts from the Chinese platform Rednote to test MLLMs on spotting covert ads that blend into normal user content such as product experience shares. Under zero-shot and in-context settings the models prove unreliable at the task, while fine-tuning open-source versions brings some gains but leaves gaps in handling comment cues and visual-textual mismatches. A sympathetic reader would care because existing moderation benchmarks ignore this form of deception, allowing misleading promotions to reach consumers unchecked.
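The zero-shot setting described above can be pictured as a simple evaluation harness. This is a minimal sketch, not the paper's actual protocol: the prompt wording is invented, and `classify_post` is a keyword stub standing in for a real MLLM call so the loop runs end to end.

```python
# Minimal sketch of a zero-shot covert-ad evaluation loop.
# `classify_post` stands in for an MLLM API call; the heuristic
# inside it is a stub, not a real model.

def classify_post(text: str) -> str:
    """Hypothetical model call: return 'ad' or 'normal'."""
    prompt = (
        "Is the following social media post a covert advertisement? "
        "Answer exactly 'ad' or 'normal'.\n\n" + text
    )
    # Stub heuristic in place of a real MLLM response:
    return "ad" if "purchase link" in text.lower() else "normal"

def evaluate(posts):
    """Accuracy over (text, gold_label) pairs."""
    correct = sum(classify_post(t) == y for t, y in posts)
    return correct / len(posts)

posts = [
    ("Loved this hike, photos below!", "normal"),
    ("Amazing serum, purchase link in comments!", "ad"),
]
print(evaluate(posts))  # 1.0 on this toy pair
```

Swapping the stub for a genuine multimodal call (post text plus images) is where, per the paper's results, reliability breaks down.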

Core claim

The CHASM dataset demonstrates that none of the evaluated MLLMs achieve sufficient reliability in detecting covert advertisements under zero-shot and in-context learning conditions. Fine-tuning open-source models on the dataset produces noticeable improvements, but persistent difficulties remain in identifying subtle cues in comments and mismatches between visual and textual structure.

What carries the argument

The CHASM dataset of 4,992 manually curated, anonymized instances from real Rednote posts, built around challenging product experience sharing examples that closely resemble covert ads.

If this is right

  • Social media moderation benchmarks that omit covert ads leave an important detection gap.
  • Fine-tuning improves MLLM performance on this task but does not eliminate errors on subtle comment cues or structural mismatches.
  • Platforms need more precise methods to handle disguised promotional content that mimics ordinary posts.
  • Future evaluations should include error analysis of the specific failure modes observed here.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Platforms could adopt similar curated datasets to train or evaluate their own detection systems beyond off-the-shelf models.
  • The same evaluation approach could be applied to other languages and platforms to test whether the unreliability is widespread.
  • Model developers might add training emphasis on ambiguous user-generated content that mixes personal sharing with commercial intent.

Load-bearing premise

The manually curated and annotated posts accurately represent the covert advertisements that appear in everyday use on the platform.

What would settle it

An independent collection of covert advertisements from the same platform on which at least one current MLLM achieves high zero-shot accuracy.

Figures

Figures reproduced from arXiv: 2604.20511 by Jingyi Zheng, Tianyi Hu, Wenhan Dong, Xinlei He, Yule Liu, Zhen Sun, Zifan Peng, Zongmin Zhang.

Figure 1. Typical examples of covert advertisement.
Figure 2. The construction of CHASM follows a three-stage process: (1) data collection and anonymization, (2) committee-driven curation of guidelines and gold questions, (3) difficulty-aware dynamic annotation workflow. These stages ensure that the dataset maintains strict privacy protection, includes challenging product-sharing examples, and achieves high-quality annotations.
Figure 3. Impact of removing different modalities.
Figure 4. Typical examples of covert advertisements.
Figure 5. Screenshot of the annotation system.
Figure 6. Example of the image anonymization. Example 1: "The Mouhua Apartment at <detailed address> is right next to an industrial park. It's extremely noisy both during the day and at night." Example 2: "Although... the files need to be named and organized manually, so I know what they are and whe…"
Figure 7. Feature distributions of normal posts and covert advertisements.
Figure 8. Examples of six different error types.
Original abstract

Current benchmarks for evaluating large language models (LLMs) in social media moderation completely overlook a serious threat: covert advertisements, which disguise themselves as regular posts to deceive and mislead consumers into making purchases, leading to significant ethical and legal concerns. In this paper, we present CHASM, a first-of-its-kind dataset designed to evaluate the capability of Multimodal Large Language Models (MLLMs) in detecting covert advertisements on social media. CHASM is a high-quality, anonymized, manually curated dataset consisting of 4,992 instances, based on real-world scenarios from the Chinese social media platform Rednote. The dataset was collected and annotated under strict privacy protection and quality control protocols. It includes many product experience sharing posts that closely resemble covert advertisements, making the dataset particularly challenging. The results show that under both zero-shot and in-context learning settings, none of the current MLLMs are sufficiently reliable for detecting covert advertisements. Our further experiments revealed that fine-tuning open-source MLLMs on our dataset yielded noticeable performance gains. However, significant challenges persist, such as detecting subtle cues in comments and differences in visual and textual structures. We provide in-depth error analysis and outline future research directions. We hope our study can serve as a call for the research community and platform moderators to develop more precise defenses against this emerging threat.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CHASM, a manually curated dataset of 4,992 instances from the Chinese platform Rednote, to benchmark MLLMs on detecting covert advertisements that disguise themselves as ordinary product-experience posts. It reports that current MLLMs are unreliable under zero-shot and in-context learning, that fine-tuning open-source models yields noticeable gains, and that persistent difficulties remain with subtle textual/visual cues and comment analysis; the work includes error analysis and future directions.

Significance. If the dataset curation is shown to be reliable and representative, the paper would usefully document a concrete limitation of current MLLMs for social-media moderation and supply a challenging, real-world benchmark that highlights the gap between general multimodal capabilities and the detection of covert commercial content. The fine-tuning results would further indicate that domain adaptation is feasible but insufficient to resolve all practical difficulties.

major comments (2)
  1. [Dataset Construction] Dataset Construction (or equivalent section): the manuscript states that CHASM was built 'under strict privacy protection and quality control protocols' and 'includes many product experience sharing posts that closely resemble covert advertisements,' yet supplies no explicit annotation guidelines, decision criteria for labeling a post as covert versus genuine, inter-annotator agreement statistics, or sampling strategy. Because the central claim that 'none of the current MLLMs are sufficiently reliable' rests on the dataset faithfully representing real covert advertisements without selection or labeling artifacts, these omissions are load-bearing.
  2. [Results and Evaluation] Results and Evaluation sections: the abstract asserts that zero-shot and in-context performance is insufficient and that fine-tuning produces 'noticeable performance gains,' but the provided text contains no quantitative metrics (accuracy, precision, recall, F1, or confusion matrices), no error bars, no statistical significance tests, and no comparison against simple baselines or human performance. Without these numbers it is impossible to judge whether the reported failures are practically meaningful or merely dataset-specific.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including at least the headline quantitative results (e.g., best zero-shot F1 or accuracy) so readers can immediately gauge the scale of the claimed unreliability.
  2. [Methods / Experiments] Notation for model names, prompt templates, and fine-tuning hyperparameters should be standardized and placed in a dedicated table or appendix for reproducibility.
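The quantitative reporting the referee asks for is standard. As a reference point, precision, recall, and F1 for the positive ("ad") class can be computed as in this minimal sketch; the gold labels and predictions below are invented for illustration.

```python
def binary_metrics(gold, pred, positive="ad"):
    """Precision, recall, F1 for the positive class over paired labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["ad", "ad", "normal", "normal", "ad"]
pred = ["ad", "normal", "normal", "ad", "ad"]
print(binary_metrics(gold, pred))  # each value ≈ 0.667 on this toy data
```

Confusion matrices and per-class breakdowns follow from the same counts; error bars would come from resampling or multiple runs.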

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving the transparency of our dataset construction and the rigor of our experimental reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Dataset Construction] Dataset Construction (or equivalent section): the manuscript states that CHASM was built 'under strict privacy protection and quality control protocols' and 'includes many product experience sharing posts that closely resemble covert advertisements,' yet supplies no explicit annotation guidelines, decision criteria for labeling a post as covert versus genuine, inter-annotator agreement statistics, or sampling strategy. Because the central claim that 'none of the current MLLMs are sufficiently reliable' rests on the dataset faithfully representing real covert advertisements without selection or labeling artifacts, these omissions are load-bearing.

    Authors: We agree that explicit documentation of the annotation process is necessary to substantiate the dataset's reliability and representativeness. The original manuscript referenced strict protocols but did not elaborate on the specifics. In the revised version, we will add a dedicated subsection detailing the annotation guidelines, the precise decision criteria for distinguishing covert advertisements from genuine product-experience posts, the sampling strategy used to collect instances from Rednote, and inter-annotator agreement statistics (including the number of annotators and computed agreement scores). These additions will directly address the concern and allow readers to evaluate potential labeling artifacts. revision: yes
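Inter-annotator agreement of the kind promised here is typically reported as Cohen's kappa. A minimal self-contained computation follows; the two annotator label lists are invented for illustration, and the sketch assumes two annotators labeling the same items (it does not handle the degenerate case where both use a single label throughout).

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators over the same items."""
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["ad", "ad", "normal", "normal", "ad", "normal"]
ann2 = ["ad", "normal", "normal", "normal", "ad", "normal"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.667
```

Values near 0.8 or above are conventionally read as strong agreement; reporting the score alongside the number of annotators, as the response proposes, is what makes the labels auditable.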

  2. Referee: [Results and Evaluation] Results and Evaluation sections: the abstract asserts that zero-shot and in-context performance is insufficient and that fine-tuning produces 'noticeable performance gains,' but the provided text contains no quantitative metrics (accuracy, precision, recall, F1, or confusion matrices), no error bars, no statistical significance tests, and no comparison against simple baselines or human performance. Without these numbers it is impossible to judge whether the reported failures are practically meaningful or merely dataset-specific.

    Authors: We acknowledge that the quantitative results require more complete and transparent presentation to support the claims. While the manuscript describes the experimental outcomes in the Results section, we will expand it in the revision to include comprehensive tables reporting accuracy, precision, recall, and F1 scores across all evaluated models and settings (zero-shot, in-context learning, and fine-tuning), along with confusion matrices, error bars or variance measures, statistical significance tests comparing conditions, comparisons against simple baselines (such as text-only classifiers or heuristic detectors), and human performance benchmarks. This will enable a clearer assessment of the practical significance of the observed limitations and gains. revision: yes
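For paired per-item predictions, the significance tests this response promises could take the form of an exact McNemar test on correctness flags. This is a sketch of one reasonable choice, not the authors' stated method; the two prediction arrays are invented for illustration.

```python
from math import comb

def mcnemar_exact(zero_shot_correct, finetuned_correct):
    """Two-sided exact (binomial) McNemar p-value on paired correctness flags."""
    # b: items only the zero-shot model got right; c: only the fine-tuned model.
    b = sum(z and not f for z, f in zip(zero_shot_correct, finetuned_correct))
    c = sum(f and not z for z, f in zip(zero_shot_correct, finetuned_correct))
    n, k = b + c, min(b, c)
    if n == 0:
        return 1.0
    # Exact binomial tail with p = 0.5 under the null of equal error rates.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

zs = [True, False, False, True, False, False, False, True, False, False]
ft = [True, True, True, True, True, False, True, True, True, False]
print(mcnemar_exact(zs, ft))  # 0.0625
```

Only the discordant pairs matter, which is why McNemar's test suits before/after comparisons of the same model family on the same test items.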

Circularity Check

0 steps flagged

No circularity: empirical dataset curation and MLLM evaluation independent of self-referential derivations

full rationale

The paper contains no mathematical derivations, parameter fitting, equations, or predictions derived from prior results. It is a purely empirical study involving manual dataset collection from Rednote, annotation under stated protocols, and zero-shot/in-context/fine-tuning evaluations of existing MLLMs. No self-citations serve as load-bearing premises for uniqueness theorems or ansatzes; the central claim rests on direct performance measurements on the new CHASM dataset rather than reducing to fitted inputs or renamed known patterns. The work is self-contained against external benchmarks of model reliability and does not invoke any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the quality of human annotation and the representativeness of the collected posts rather than on fitted parameters or new theoretical entities.

axioms (1)
  • domain assumption Manual curation under privacy protocols produces accurate labels for covert versus non-covert posts without significant annotator bias.
    Dataset construction and all downstream claims depend on this unverified human judgment step.

pith-pipeline@v0.9.0 · 5565 in / 1077 out tokens · 26965 ms · 2026-05-10T00:44:28.346930+00:00 · methodology

discussion (0)

