PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
‘There has to be a lot that we’re missing’: Moderating AI-generated content on Reddit
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Representative citing papers
80% of hateful tweets remain online after five months with no higher removal rate than non-hateful content, while human-AI moderation pipelines can feasibly cut user exposure below regulatory penalty costs.
Larger Mastodon instances develop more extensive, topically diverse rules that are less readable, with consistent focus on problematic content and limited federation effects.
Citing papers explorer
- The Enforcement and Feasibility of Hate Speech Moderation on Twitter
  80% of hateful tweets remain online after five months with no higher removal rate than non-hateful content, while human-AI moderation pipelines can feasibly cut user exposure below regulatory penalty costs.