pith · machine review for the scientific record

arxiv: 2604.20511 · v1 · submitted 2026-04-22 · 💻 cs.LG · cs.AI · cs.CL · cs.CV · cs.CY

Recognition: unknown

CHASM: Unveiling Covert Advertisements on Chinese Social Media

Jingyi Zheng, Tianyi Hu, Wenhan Dong, Xinlei He, Yule Liu, Zhen Sun, Zifan Peng, Zongmin Zhang


Pith reviewed 2026-05-10 00:44 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL · cs.CV · cs.CY
keywords covert advertisements · multimodal large language models · social media moderation · Chinese social media · dataset evaluation · zero-shot detection · fine-tuning

The pith

No current multimodal large language models reliably detect covert advertisements disguised as ordinary social media posts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the CHASM dataset of 4,992 real posts from the Chinese platform Rednote to test MLLMs on spotting covert ads that blend into normal user content such as product experience shares. Under zero-shot and in-context settings the models prove unreliable at the task, while fine-tuning open-source versions brings some gains but leaves gaps in handling comment cues and visual-textual mismatches. A sympathetic reader would care because existing moderation benchmarks ignore this form of deception, allowing misleading promotions to reach consumers unchecked.
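The zero-shot setting described above can be pictured as a simple evaluation harness. This is a minimal sketch, not the paper's actual protocol: the prompt wording is invented, and `classify_post` is a keyword stub standing in for a real MLLM call so the loop runs end to end.

```python
# Minimal sketch of a zero-shot covert-ad evaluation loop.
# `classify_post` stands in for an MLLM API call; the heuristic
# inside it is a stub, not a real model.

def classify_post(text: str) -> str:
    """Hypothetical model call: return 'ad' or 'normal'."""
    prompt = (
        "Is the following social media post a covert advertisement? "
        "Answer exactly 'ad' or 'normal'.\n\n" + text
    )
    # Stub heuristic in place of a real MLLM response:
    return "ad" if "purchase link" in text.lower() else "normal"

def evaluate(posts):
    """Accuracy over (text, gold_label) pairs."""
    correct = sum(classify_post(t) == y for t, y in posts)
    return correct / len(posts)

posts = [
    ("Loved this hike, photos below!", "normal"),
    ("Amazing serum, purchase link in comments!", "ad"),
]
print(evaluate(posts))  # 1.0 on this toy pair
```

Swapping the stub for a genuine multimodal call (post text plus images) is where, per the paper's results, reliability breaks down.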

Core claim

The CHASM dataset demonstrates that none of the evaluated MLLMs achieve sufficient reliability in detecting covert advertisements under zero-shot and in-context learning conditions. Fine-tuning open-source models on the dataset produces noticeable improvements, but persistent difficulties remain in identifying subtle cues in comments and mismatches between visual and textual structure.

What carries the argument

The CHASM dataset of 4,992 manually curated, anonymized instances from real Rednote posts, built around challenging product experience sharing examples that closely resemble covert ads.

If this is right

  • Social media moderation benchmarks that omit covert ads leave an important detection gap.
  • Fine-tuning improves MLLM performance on this task but does not eliminate errors on subtle comment cues or structural mismatches.
  • Platforms need more precise methods to handle disguised promotional content that mimics ordinary posts.
  • Future evaluations should include error analysis of the specific failure modes observed here.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Platforms could adopt similar curated datasets to train or evaluate their own detection systems beyond off-the-shelf models.
  • The same evaluation approach could be applied to other languages and platforms to test whether the unreliability is widespread.
  • Model developers might add training emphasis on ambiguous user-generated content that mixes personal sharing with commercial intent.

Load-bearing premise

The manually curated and annotated posts accurately represent the covert advertisements that appear in everyday use on the platform.

What would settle it

An independent collection of covert advertisements from the same platform on which at least one current MLLM achieves high zero-shot accuracy.

Figures

Figures reproduced from arXiv: 2604.20511 by Jingyi Zheng, Tianyi Hu, Wenhan Dong, Xinlei He, Yule Liu, Zhen Sun, Zifan Peng, Zongmin Zhang.

Figure 1. Typical examples of covert advertisement.
Figure 2. The construction of CHASM follows a three-stage process: (1) data collection and anonymization, (2) committee-driven curation of guidelines and gold questions, (3) difficulty-aware dynamic annotation workflow. These stages ensure that the dataset maintains strict privacy protection, includes challenging product-sharing examples, and achieves high-quality annotations.
Figure 3. Impact of removing different modalities.
Figure 4. Typical examples of covert advertisements.
Figure 5. Screenshot of the annotation system.
Figure 6. Example of the image anonymization. Example 1: "The Mouhua Apartment at <detailed address> is right next to an industrial park. It's extremely noisy both during the day and at night." Example 2: "Although... the files need to be named and organized manually, so I know what they are and whe…"
Figure 7. Feature distributions of normal posts and covert advertisements.
Figure 8. Examples of six different error types.
Original abstract

Current benchmarks for evaluating large language models (LLMs) in social media moderation completely overlook a serious threat: covert advertisements, which disguise themselves as regular posts to deceive and mislead consumers into making purchases, leading to significant ethical and legal concerns. In this paper, we present CHASM, a first-of-its-kind dataset designed to evaluate the capability of Multimodal Large Language Models (MLLMs) in detecting covert advertisements on social media. CHASM is a high-quality, anonymized, manually curated dataset consisting of 4,992 instances, based on real-world scenarios from the Chinese social media platform Rednote. The dataset was collected and annotated under strict privacy protection and quality control protocols. It includes many product experience sharing posts that closely resemble covert advertisements, making the dataset particularly challenging. The results show that under both zero-shot and in-context learning settings, none of the current MLLMs are sufficiently reliable for detecting covert advertisements. Our further experiments revealed that fine-tuning open-source MLLMs on our dataset yielded noticeable performance gains. However, significant challenges persist, such as detecting subtle cues in comments and differences in visual and textual structures. We provide in-depth error analysis and outline future research directions. We hope our study can serve as a call for the research community and platform moderators to develop more precise defenses against this emerging threat.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CHASM, a manually curated dataset of 4,992 instances from the Chinese platform Rednote, to benchmark MLLMs on detecting covert advertisements that disguise themselves as ordinary product-experience posts. It reports that current MLLMs are unreliable under zero-shot and in-context learning, that fine-tuning open-source models yields noticeable gains, and that persistent difficulties remain with subtle textual/visual cues and comment analysis; the work includes error analysis and future directions.

Significance. If the dataset curation is shown to be reliable and representative, the paper would usefully document a concrete limitation of current MLLMs for social-media moderation and supply a challenging, real-world benchmark that highlights the gap between general multimodal capabilities and the detection of covert commercial content. The fine-tuning results would further indicate that domain adaptation is feasible but insufficient to resolve all practical difficulties.

major comments (2)
  1. [Dataset Construction] Dataset Construction (or equivalent section): the manuscript states that CHASM was built 'under strict privacy protection and quality control protocols' and 'includes many product experience sharing posts that closely resemble covert advertisements,' yet supplies no explicit annotation guidelines, decision criteria for labeling a post as covert versus genuine, inter-annotator agreement statistics, or sampling strategy. Because the central claim that 'none of the current MLLMs are sufficiently reliable' rests on the dataset faithfully representing real covert advertisements without selection or labeling artifacts, these omissions are load-bearing.
  2. [Results and Evaluation] Results and Evaluation sections: the abstract asserts that zero-shot and in-context performance is insufficient and that fine-tuning produces 'noticeable performance gains,' but the provided text contains no quantitative metrics (accuracy, precision, recall, F1, or confusion matrices), no error bars, no statistical significance tests, and no comparison against simple baselines or human performance. Without these numbers it is impossible to judge whether the reported failures are practically meaningful or merely dataset-specific.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including at least the headline quantitative results (e.g., best zero-shot F1 or accuracy) so readers can immediately gauge the scale of the claimed unreliability.
  2. [Methods / Experiments] Notation for model names, prompt templates, and fine-tuning hyperparameters should be standardized and placed in a dedicated table or appendix for reproducibility.
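The quantitative reporting the referee asks for is standard. As a reference point, precision, recall, and F1 for the positive ("ad") class can be computed as in this minimal sketch; the gold labels and predictions below are invented for illustration.

```python
def binary_metrics(gold, pred, positive="ad"):
    """Precision, recall, F1 for the positive class over paired labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["ad", "ad", "normal", "normal", "ad"]
pred = ["ad", "normal", "normal", "ad", "ad"]
print(binary_metrics(gold, pred))  # each value ≈ 0.667 on this toy data
```

Confusion matrices and per-class breakdowns follow from the same counts; error bars would come from resampling or multiple runs.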

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving the transparency of our dataset construction and the rigor of our experimental reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Dataset Construction] Dataset Construction (or equivalent section): the manuscript states that CHASM was built 'under strict privacy protection and quality control protocols' and 'includes many product experience sharing posts that closely resemble covert advertisements,' yet supplies no explicit annotation guidelines, decision criteria for labeling a post as covert versus genuine, inter-annotator agreement statistics, or sampling strategy. Because the central claim that 'none of the current MLLMs are sufficiently reliable' rests on the dataset faithfully representing real covert advertisements without selection or labeling artifacts, these omissions are load-bearing.

    Authors: We agree that explicit documentation of the annotation process is necessary to substantiate the dataset's reliability and representativeness. The original manuscript referenced strict protocols but did not elaborate on the specifics. In the revised version, we will add a dedicated subsection detailing the annotation guidelines, the precise decision criteria for distinguishing covert advertisements from genuine product-experience posts, the sampling strategy used to collect instances from Rednote, and inter-annotator agreement statistics (including the number of annotators and computed agreement scores). These additions will directly address the concern and allow readers to evaluate potential labeling artifacts. revision: yes
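Inter-annotator agreement of the kind promised here is typically reported as Cohen's kappa. A minimal self-contained computation follows; the two annotator label lists are invented for illustration, and the sketch assumes two annotators labeling the same items (it does not handle the degenerate case where both use a single label throughout).

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators over the same items."""
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["ad", "ad", "normal", "normal", "ad", "normal"]
ann2 = ["ad", "normal", "normal", "normal", "ad", "normal"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.667
```

Values near 0.8 or above are conventionally read as strong agreement; reporting the score alongside the number of annotators, as the response proposes, is what makes the labels auditable.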

  2. Referee: [Results and Evaluation] Results and Evaluation sections: the abstract asserts that zero-shot and in-context performance is insufficient and that fine-tuning produces 'noticeable performance gains,' but the provided text contains no quantitative metrics (accuracy, precision, recall, F1, or confusion matrices), no error bars, no statistical significance tests, and no comparison against simple baselines or human performance. Without these numbers it is impossible to judge whether the reported failures are practically meaningful or merely dataset-specific.

    Authors: We acknowledge that the quantitative results require more complete and transparent presentation to support the claims. While the manuscript describes the experimental outcomes in the Results section, we will expand it in the revision to include comprehensive tables reporting accuracy, precision, recall, and F1 scores across all evaluated models and settings (zero-shot, in-context learning, and fine-tuning), along with confusion matrices, error bars or variance measures, statistical significance tests comparing conditions, comparisons against simple baselines (such as text-only classifiers or heuristic detectors), and human performance benchmarks. This will enable a clearer assessment of the practical significance of the observed limitations and gains. revision: yes
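For paired per-item predictions, the significance tests this response promises could take the form of an exact McNemar test on correctness flags. This is a sketch of one reasonable choice, not the authors' stated method; the two prediction arrays are invented for illustration.

```python
from math import comb

def mcnemar_exact(zero_shot_correct, finetuned_correct):
    """Two-sided exact (binomial) McNemar p-value on paired correctness flags."""
    # b: items only the zero-shot model got right; c: only the fine-tuned model.
    b = sum(z and not f for z, f in zip(zero_shot_correct, finetuned_correct))
    c = sum(f and not z for z, f in zip(zero_shot_correct, finetuned_correct))
    n, k = b + c, min(b, c)
    if n == 0:
        return 1.0
    # Exact binomial tail with p = 0.5 under the null of equal error rates.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

zs = [True, False, False, True, False, False, False, True, False, False]
ft = [True, True, True, True, True, False, True, True, True, False]
print(mcnemar_exact(zs, ft))  # 0.0625
```

Only the discordant pairs matter, which is why McNemar's test suits before/after comparisons of the same model family on the same test items.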

Circularity Check

0 steps flagged

No circularity: empirical dataset curation and MLLM evaluation independent of self-referential derivations

full rationale

The paper contains no mathematical derivations, parameter fitting, equations, or predictions derived from prior results. It is a purely empirical study involving manual dataset collection from Rednote, annotation under stated protocols, and zero-shot/in-context/fine-tuning evaluations of existing MLLMs. No self-citations serve as load-bearing premises for uniqueness theorems or ansatzes; the central claim rests on direct performance measurements on the new CHASM dataset rather than reducing to fitted inputs or renamed known patterns. The work is self-contained against external benchmarks of model reliability and does not invoke any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the quality of human annotation and the representativeness of the collected posts rather than on fitted parameters or new theoretical entities.

axioms (1)
  • domain assumption Manual curation under privacy protocols produces accurate labels for covert versus non-covert posts without significant annotator bias.
    Dataset construction and all downstream claims depend on this unverified human judgment step.

pith-pipeline@v0.9.0 · 5565 in / 1077 out tokens · 26965 ms · 2026-05-10T00:44:28.346930+00:00 · methodology

discussion (0)

