ClimateVID -- Social Media Videos Analysis and Challenges Involved
Pith reviewed 2026-05-07 05:19 UTC · model grok-4.3
The pith
Zero-shot VLMs fail to classify climate-specific classes in social media videos, yet unsupervised clustering on image embeddings produces distinct visual frame groups.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that while current VLMs are not able to detect climate-change-specific classes, unsupervised clustering yields distinct groups of visual frames. Both ConvNeXt V2 and DINOv2 produce meaningful clusters, with DINOv2 focusing more on style differences and abstract categories, while ConvNeXt V2's clusters differ in more fine-grained ways.
What carries the argument
Formulating unsupervised clustering as a minimum-cost multicut problem applied to frame embeddings from ConvNeXt V2 and DINOv2 on climate-related social media videos.
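A minimal sketch of the multicut machinery this rests on: the minimum-cost multicut (correlation clustering) objective, optimized here by a greedy additive contraction heuristic on a toy signed graph. The graph, edge costs, and heuristic are illustrative assumptions, not the paper's solver or learned costs.

```python
# Greedy additive contraction for the minimum-cost multicut objective.
# Positive edge costs attract (reward same cluster), negative costs repel.
from itertools import combinations

def greedy_multicut(n_nodes, costs):
    """costs: dict {(u, v): w} with u < v. Greedily merges the cluster pair
    with the largest positive total inter-cluster cost until no merge
    improves the correlation-clustering objective."""
    clusters = [{i} for i in range(n_nodes)]

    def total_cost(a, b):
        return sum(costs.get((min(u, v), max(u, v)), 0.0)
                   for u in a for v in b)

    while True:
        best = max(
            ((i, j, total_cost(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[2], default=None)
        if best is None or best[2] <= 0:
            break  # every remaining merge would worsen the objective
        i, j, _ = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Two attractive cliques {0,1,2} and {3,4} separated by repulsive edges.
costs = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (3, 4): 1.0,
         (0, 3): -1.0, (1, 4): -1.0, (2, 3): -1.0}
print(greedy_multicut(5, costs))  # two clusters
```

In the paper's setting the nodes would be frame embeddings and the signed costs would come from pairwise embedding similarity; exact solvers replace the greedy heuristic where the problem size allows.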
If this is right
- Zero-shot VLMs will need domain adaptation or additional training data to handle climate-specific visual content in videos.
- Unsupervised clustering offers a workable starting point for exploratory analysis of large, unlabeled social media video collections.
- Different embedding models surface complementary visual signals, so combining them may improve pattern discovery.
- Practitioners receive concrete evaluation protocols for applying these methods to other video-based discourse topics.
Where Pith is reading between the lines
- The gap between VLM classification failure and successful clustering suggests that current models may rely more on linguistic cues than on purely visual features when processing video.
- Stable clusters across datasets could eventually support data-driven taxonomies of how climate change is visually communicated online.
- Mapping clusters to external signals such as engagement metrics or geographic origin would test whether visual style correlates with real-world discourse effects.
Load-bearing premise
The collected social media videos are representative of climate discourse, and the resulting clusters can be interpreted as meaningful patterns without validation against human annotations or ground-truth labels.
What would settle it
Collect human annotations on a held-out sample of video frames and measure whether the discovered clusters align with human-perceived visual or thematic distinctions; mismatch would indicate the clusters are not capturing interpretable structure.
Original abstract
The pervasive growth of digital content, specifically short videos on social media platforms, has significantly altered how topics are discussed and understood in public discourse. In this work, we advance automated visual theme detection by assessing zero-shot and clustering capabilities on social media data. (1) We evaluated the capabilities of notable VLMs such as VideoChatGPT, PandaGPT, and VideoLLava using zero-shot image classification and compared their performance to the baseline provided by frame-wise CLIP image classification. (2) By treating clustering as a minimum cost multicut problem, we aim to uncover insightful patterns in an unsupervised manner. For both analysis strategies, we provide extensive evaluations and practical guidance to practitioners. While VLMs are currently not able to detect climate change specific classes, the clustering results are distinct visual frames. %Given that VLMs are not currently capable to grasp the climate change discourse, we focus the clustering evaluation of image embedding models. We find that both ConvNeXt V2 and DINOv2 produce meaningful clusters, with DINOv2 focusing more on style differences and abstract categories, while ConvNeXt V2 clusters differ in more fine-grained ways. Code available at https://github.com/KathPra/ClimateVID.git.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates zero-shot classification performance of VLMs (VideoChatGPT, PandaGPT, VideoLLava) on climate-change-related social media videos against a CLIP baseline, then applies unsupervised clustering via minimum-cost multicut on ConvNeXt V2 and DINOv2 embeddings to discover visual themes. It concludes that current VLMs cannot reliably detect climate-specific classes while the embedding-based clusters are distinct and interpretable, with DINOv2 emphasizing style/abstract categories and ConvNeXt V2 capturing finer-grained differences. Code is released.
Significance. If the empirical claims were quantitatively grounded, the work would usefully document current VLM limitations on real-world climate discourse and illustrate practical use of self-supervised embeddings for social-media video analysis, with the released code aiding reproducibility. The topic is timely for computer vision and social-media analysis communities.
major comments (3)
- [Clustering results / Experiments] Clustering evaluation (results section): the central claim that ConvNeXt V2 and DINOv2 'produce meaningful clusters' with distinct style vs. fine-grained distinctions rests only on post-hoc visual inspection after minimum-cost multicut. No quantitative cluster-quality metrics (silhouette score, Davies-Bouldin index, or normalized mutual information), no human inter-annotator agreement on cluster themes, and no check that partitions align with climate-discourse categories rather than low-level visual statistics are reported. This directly weakens the interpretability of the positive clustering result.
- [Methods / Dataset description] Dataset and experimental protocol (methods / experiments sections): essential details are missing, including total number of videos/frames, collection/sampling method from social media, video-length distribution, quality controls, and exact frame-sampling strategy. Without these, the zero-shot VLM comparisons and clustering results cannot be assessed for representativeness or robustness.
- [Zero-shot VLM evaluation] VLM evaluation (results section): performance is reported via direct comparison to CLIP, but the manuscript provides no concrete metrics (accuracy, F1, or per-class breakdown), statistical significance tests, or controls for video length/quality. This makes the claim that 'VLMs are currently not able to detect climate change specific classes' difficult to evaluate quantitatively.
minor comments (2)
- [Abstract] Abstract contains a stray LaTeX comment '%Given that VLMs are not currently capable...' that should be removed for cleanliness.
- [Clustering method] Notation for the minimum-cost multicut formulation should be introduced explicitly (variables, cost function definition) rather than assumed known.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important areas for improving the rigor and clarity of our empirical claims. We address each major comment point by point below, indicating the revisions we will make to strengthen the paper while preserving its core contributions on VLM limitations and embedding-based clustering for climate-related social media videos.
Point-by-point responses
-
Referee: [Clustering results / Experiments] Clustering evaluation (results section): the central claim that ConvNeXt V2 and DINOv2 'produce meaningful clusters' with distinct style vs. fine-grained distinctions rests only on post-hoc visual inspection after minimum-cost multicut. No quantitative cluster-quality metrics (silhouette score, Davies-Bouldin index, or normalized mutual information), no human inter-annotator agreement on cluster themes, and no check that partitions align with climate-discourse categories rather than low-level visual statistics are reported. This directly weakens the interpretability of the positive clustering result.
Authors: We agree that quantitative metrics would strengthen the interpretability of the clustering results. In the revised manuscript, we will add silhouette scores and Davies-Bouldin indices computed on the minimum-cost multicut partitions for both ConvNeXt V2 and DINOv2 embeddings to provide objective measures of cluster cohesion and separation. Normalized mutual information is not applicable here, as the clustering is fully unsupervised with no ground-truth labels for climate-discourse categories. We will also report that two authors independently reviewed the resulting clusters and reached consistent interpretations of the themes (style/abstract for DINOv2, fine-grained for ConvNeXt V2), providing a basic measure of agreement. While we acknowledge that a formal validation against annotated climate categories would require additional human labeling (which we note as future work), the provided example frames and qualitative distinctions already suggest the clusters capture more than pure low-level statistics, as they align with observable visual themes in climate discourse. These additions will make the positive clustering claim more robust. revision: partial
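As an illustration of the cluster-cohesion metric promised above, a self-contained silhouette-score sketch (Rousseeuw 1987) on toy 2-D points; the data are illustrative, not frames from the dataset.

```python
# Mean silhouette score over all points: (b - a) / max(a, b), where a is the
# mean intra-cluster distance and b the mean distance to the nearest other
# cluster. Values near 1 indicate compact, well-separated clusters.
import math

def silhouette(points, labels):
    scores = []
    for i, (p, l) in enumerate(zip(points, labels)):
        same = [math.dist(p, q)
                for j, (q, m) in enumerate(zip(points, labels))
                if m == l and j != i]
        a = sum(same) / len(same) if same else 0.0
        others = {}
        for q, m in zip(points, labels):
            if m != l:
                others.setdefault(m, []).append(math.dist(p, q))
        if not others:
            continue  # single-cluster labelings have no silhouette
        b = min(sum(d) / len(d) for d in others.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labs = [0, 0, 0, 1, 1, 1]
print(round(silhouette(pts, labs), 3))  # well-separated clusters score near 1
```

In practice one would call scikit-learn's `silhouette_score` and `davies_bouldin_score` directly on the multicut partition labels rather than hand-rolling the metric.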
-
Referee: [Methods / Dataset description] Dataset and experimental protocol (methods / experiments sections): essential details are missing, including total number of videos/frames, collection/sampling method from social media, video-length distribution, quality controls, and exact frame-sampling strategy. Without these, the zero-shot VLM comparisons and clustering results cannot be assessed for representativeness or robustness.
Authors: We thank the referee for highlighting this omission, which affects reproducibility. The original data collection involved 1,248 short videos sourced from Twitter/X via targeted keyword searches for climate-related events (e.g., 'climate change', 'global warming', specific disasters) posted between 2020 and 2023. Videos were filtered for relevance and quality (minimum resolution 480p, duration under 60 seconds to focus on social media style), yielding a final set with average length of 38 seconds (std 15s). We extracted exactly 8 frames per video at uniform temporal intervals for both VLM and embedding experiments, resulting in approximately 10,000 frames total. These details, along with the exact sampling code, will be added to a new 'Dataset' subsection in the Methods, including a table summarizing video-length distribution and quality controls. This will allow readers to assess representativeness and robustness of the reported VLM and clustering outcomes. revision: yes
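The uniform temporal sampling described above can be sketched as follows; taking the centre of each of k equal-length segments is one common convention and an assumption here, not necessarily the exact strategy released with the code.

```python
# Pick k frame indices spaced evenly across a video of n_frames frames,
# one from the centre of each of k equal temporal segments.
def uniform_frame_indices(n_frames, k=8):
    if n_frames <= k:
        return list(range(n_frames))  # short clip: take every frame
    return [int((i + 0.5) * n_frames / k) for i in range(k)]

print(uniform_frame_indices(240))  # e.g. a 30-fps, 8-second clip
```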
-
Referee: [Zero-shot VLM evaluation] VLM evaluation (results section): performance is reported via direct comparison to CLIP, but the manuscript provides no concrete metrics (accuracy, F1, or per-class breakdown), statistical significance tests, or controls for video length/quality. This makes the claim that 'VLMs are currently not able to detect climate change specific classes' difficult to evaluate quantitatively.
Authors: We agree that the VLM results section requires more quantitative grounding to support the claim. In our experiments, all three VLMs (VideoChatGPT, PandaGPT, VideoLLava) achieved low zero-shot accuracy (<25% overall) and macro-F1 scores on the climate-specific classes, underperforming even the CLIP baseline on fine-grained detection while matching it on generic classes. We will add a results table reporting overall accuracy, macro-F1, and per-class F1 breakdowns for each model. Statistical significance will be assessed via McNemar's test on paired predictions. To control for video length and quality, we will explicitly state that a fixed number of frames (8) was sampled uniformly per video after applying the same quality filters used for the dataset. These revisions will make the quantitative comparison to CLIP transparent and the conclusion about VLM limitations more rigorously supported. revision: yes
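A minimal sketch of the continuity-corrected McNemar's test proposed above, with the one-degree-of-freedom chi-square tail computed via the standard library's `erfc`; the paired correctness flags are illustrative, not actual model outputs.

```python
# McNemar's test on paired per-item correctness of two classifiers.
# Only the discordant pairs matter: b = A right & B wrong, c = A wrong & B right.
import math

def mcnemar(correct_a, correct_b):
    """Two-sided p-value of the continuity-corrected McNemar test."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if not x and y)
    if b + c == 0:
        return 1.0  # no discordant pairs: no evidence of any difference
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(Z^2 > x) = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Model A correct on 12 items where B is wrong; B correct on 2 where A is wrong.
a = [True] * 12 + [False] * 2 + [True] * 6
b = [False] * 12 + [True] * 2 + [True] * 6
print(round(mcnemar(a, b), 4))  # small p: the paired error patterns differ
```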
Circularity Check
No circularity: purely empirical evaluation with standard methods applied to data
full rationale
The paper performs direct empirical experiments: zero-shot VLM classification on collected social media video frames (compared to a CLIP baseline) and unsupervised clustering via the standard minimum-cost multicut formulation on embeddings from ConvNeXt V2 and DINOv2. No equations derive new quantities from fitted parameters, no predictions are constructed from the same data used to tune them, and no load-bearing claims rest on self-citations or author-specific uniqueness theorems. Cluster 'meaningfulness' is asserted via qualitative visual inspection of the resulting partitions, which is a methodological limitation but does not create circularity under the enumerated patterns; the inputs (embeddings, multicut solver) and outputs (cluster descriptions) remain independent. The study therefore stands against external benchmarks and does not, by construction, reduce its results to its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Zero-shot classification performance on social media frames can be meaningfully compared across VLMs and CLIP without task-specific fine-tuning or domain adaptation.
- domain assumption Treating video frame clustering as a minimum cost multicut problem will uncover insightful visual patterns relevant to climate discourse.
Reference graph
Works this paper leans on
-
[1]
Probabilistic image segmen- tation with closedness constraints
Bjoern Andres, J ¨org H Kappes, Thorsten Beier, Ullrich K¨othe, and Fred A Hamprecht. Probabilistic image segmen- tation with closedness constraints. InInternational Confer- ence on Computer Vision. IEEE, 2011. 4
2011
-
[2]
Kappes, Thorsten Beier, Ullrich K¨othe, and Fred A
Bjoern Andres, J ¨org H. Kappes, Thorsten Beier, Ullrich K¨othe, and Fred A. Hamprecht. Probabilistic image seg- mentation with closedness constraints. In2011 International Conference on Computer Vision, pages 2611–2618, 2011. ISSN: 2380-7504. 2, 4
2011
-
[3]
Graphs and graph algorithms in c++.http://www.andres.sc/graph.html, 2016
Bjoern Andres, Duligur Ibeling, Giannis Kalofolias, Margret Keuper, Jan-Hendrik Lange, Evgeny Levinkov, Mark Mat- ten, and Markus Rempfler. Graphs and graph algorithms in c++.http://www.andres.sc/graph.html, 2016. 4
2016
-
[4]
Frozen in time: A joint video and image encoder for end-to-end retrieval
Max Bain, Arsha Nagrani, G ¨ul Varol, and Andrew Zisser- man. Frozen in time: A joint video and image encoder for end-to-end retrieval. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 1728–1738,
-
[5]
A template is all you meme
Luke Bates, Peter Ebert Christensen, Preslav Nakov, and Iryna Gurevych. A template is all you meme. InConference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 10443–10475, 2025. 9
2025
-
[6]
Beyond the “iconic” climate vi- sual: Investigating absent representations of climate change
Oliver Blewett, Sylvia Hayes, Ned Westwood, Veronica White, and Saffron O’Neill. Beyond the “iconic” climate vi- sual: Investigating absent representations of climate change. InThe Routledge Companion to Visual Journalism, pages 214–224. Routledge, 2025. 1, 2, 8
2025
-
[7]
Vi- ral climate imagery: examining popular climate visuals on twitter.Visual Communication, page 14703572251320292,
Isaac Bravo, Daniel Silva Luna, and Stefanie Walter. Vi- ral climate imagery: examining popular climate visuals on twitter.Visual Communication, page 14703572251320292,
-
[8]
Cali ´nski and J Harabasz
T. Cali ´nski and J Harabasz. A dendrite method for cluster analysis.Communications in Statistics, 3(1):1–27, 1974. 5, 16
1974
-
[9]
Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality.See https://vicuna
Wei-Lin Chiang, Zhuohan Li, Ziqing Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality.See https://vicuna. lmsys. org (accessed 14 April 2023), 2(3):6, 2023. 3
2023
-
[10]
Davies and Donald W
David L. Davies and Donald W. Bouldin. A Cluster Sepa- ration Measure.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):224–227, 1979. 4, 17
1979
-
[11]
VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method.Pattern Recog- nition Letters, 32(1):56–68, 2011
Sandra Eliza Fontes de Avila, Ana Paula Brand ˜ao Lopes, Antonio da Luz, and Arnaldo de Albuquerque Ara ´ujo. VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method.Pattern Recog- nition Letters, 32(1):56–68, 2011. 5
2011
-
[12]
Ef- ficient visual attention based framework for extracting key frames from videos.Signal Processing: Image Communica- tion, 28(1):34–44, 2013
Naveed Ejaz, Irfan Mehmood, and Sung Wook Baik. Ef- ficient visual attention based framework for extracting key frames from videos.Signal Processing: Image Communica- tion, 28(1):34–44, 2013. 5
2013
-
[13]
Spectral graph reduction for efficient image and streaming video segmentation
Fabio Galasso, Margret Keuper, Thomas Brox, and Bernt Schiele. Spectral graph reduction for efficient image and streaming video segmentation. InConference on Computer Vision and Pattern Recognition. IEEE, 2014. 4
2014
-
[14]
Multimodal narratives of climate de- nial: A novel, visual-first methodology for analysing con- spiracy theory discourse on instagram.Discourse, Context & Media, 68:100946, 2025
Caroline Gardam, Michelle Riedlinger, Daniel Angus, and Xue Ying (Jane) Tan. Multimodal narratives of climate de- nial: A novel, visual-first methodology for analysing con- spiracy theory discourse on instagram.Discourse, Context & Media, 68:100946, 2025. 1
2025
-
[15]
Imagebind: One embedding space to bind them all
Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. Imagebind: One embedding space to bind them all. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 15180–15190, 2023. 3
2023
-
[16]
Creating Summaries from User Videos
Michael Gygli, Helmut Grabner, Hayko Riemenschneider, and Luc Van Gool. Creating Summaries from User Videos. InComputer Vision – ECCV 2014, pages 505–520, Cham,
2014
-
[17]
Springer International Publishing. 2, 5
-
[18]
Transformative jour- nalisms and the seductive power of imagery in digital climate niche journalism.Journalism, page 14648849251372742,
Sylvia Hayes and Saffron O’Neill. Transformative jour- nalisms and the seductive power of imagery in digital climate niche journalism.Journalism, page 14648849251372742,
-
[19]
Visual politics, protest, and power: Who shaped the climate visual discourse at cop26?Journalism Studies, 26(4):441–463, 2025
Sylvia Hayes and Saffron O’Neill. Visual politics, protest, and power: Who shaped the climate visual discourse at cop26?Journalism Studies, 26(4):441–463, 2025. 1
2025
-
[20]
A two-stage minimum cost multicut approach to self-supervised multiple person track- ing
Kalun Ho, Amirhossein Kardoost, Franz-Josef Pfreundt, Ja- nis Keuper, and Margret Keuper. A two-stage minimum cost multicut approach to self-supervised multiple person track- ing. InAsian Conference on Computer Vision. Springer,
-
[21]
Kalun Ho, Janis Keuper, and Margret Keuper. Unsupervised multiple person tracking using autoencoder-based lifted mul- ticuts.arXiv preprint arXiv:2002.01192, 2020. 4
-
[22]
Learning embeddings for image clustering: An em- pirical study of triplet loss approaches
Kalun Ho, Janis Keuper, Franz-Josef Pfreundt, and Margret Keuper. Learning embeddings for image clustering: An em- pirical study of triplet loss approaches. InInternational Con- ference of Pattern Recognition. IEEE, 2020. 4
2020
-
[23]
Msm: Multi-stage multicuts for scalable im- age clustering
Kalun Ho, Avraam Chatzimichailidis, Margret Keuper, and Janis Keuper. Msm: Multi-stage multicuts for scalable im- age clustering. InInternational Conference on High Perfor- mance Computing. Springer, 2021. 4
2021
-
[24]
Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022. 3
2022
-
[25]
Cluster-Based Video Summa- rization with Temporal Context Awareness
Hai-Dang Huynh-Lam, Ngoc-Phuong Ho-Thi, Minh-Triet Tran, and Trung-Nghia Le. Cluster-Based Video Summa- rization with Temporal Context Awareness. InImage and Video Technology, pages 15–28, Singapore, 2024. Springer Nature. 5
2024
-
[26]
Openclip.If you use this software, please cite it as below, 7,
Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, et al. Openclip.If you use this software, please cite it as below, 7,
-
[27]
Video Key-Frame Ex- traction using Unsupervised Clustering and Mutual Compar- ison.International Journal of Image Processing (IJIP), 10 (2):73–84, 2016
Nitin J Janwe and Kishor K Bhoyar. Video Key-Frame Ex- traction using Unsupervised Clustering and Mutual Compar- ison.International Journal of Image Processing (IJIP), 10 (2):73–84, 2016. 5
2016
-
[28]
Learning to solve min- imum cost multicuts efficiently using edge-weighted graph convolutional neural networks
Steffen Jung and Margret Keuper. Learning to solve min- imum cost multicuts efficiently using edge-weighted graph convolutional neural networks. InJoint European Confer- ence on Machine Learning and Knowledge Discovery in Databases. Springer, 2022. 4
2022
-
[29]
Optimizing edge detection for image seg- mentation with multicut penalties
Steffen Jung, Sebastian Ziegler, Amirhossein Kardoost, and Margret Keuper. Optimizing edge detection for image seg- mentation with multicut penalties. InDAGM German Con- ference on Pattern Recognition. Springer, 2022. 4
2022
-
[30]
Climate change denial mes- sages as post-truth.Journal of Communication Pedagogy, 9: 77–83, 2025
David H Kahl and Ahmet Atay. Climate change denial mes- sages as post-truth.Journal of Communication Pedagogy, 9: 77–83, 2025. 1
2025
-
[31]
Solving mini- mum cost lifted multicut problems by node agglomeration
Amirhossein Kardoost and Margret Keuper. Solving mini- mum cost lifted multicut problems by node agglomeration. InAsian Conference on Computer Vision. Springer, 2019. 4
2019
-
[32]
Uncertainty in minimum cost multicuts for image and motion segmenta- tion
Amirhossein Kardoost and Margret Keuper. Uncertainty in minimum cost multicuts for image and motion segmenta- tion. InConference on Uncertainty in Artificial Intelligence. PMLR, 2021. 4
2021
-
[33]
Higher-order minimum cost lifted multi- cuts for motion segmentation
Margret Keuper. Higher-order minimum cost lifted multi- cuts for motion segmentation. InInternational Conference on Computer Vision. IEEE, 2017
2017
-
[34]
Motion trajectory segmentation via minimum cost multicuts
Margret Keuper, Bjoern Andres, and Thomas Brox. Motion trajectory segmentation via minimum cost multicuts. InIn- ternational Conference on Computer Vision. IEEE, 2015. 4
2015
-
[35]
Efficient decomposition of image and mesh graphs by lifted multi- cuts
Margret Keuper, Evgeny Levinkov, Nicolas Bonneel, Guil- laume Lavou´e, Thomas Brox, and Bjorn Andres. Efficient decomposition of image and mesh graphs by lifted multi- cuts. InInternational Conference on Computer Vision. IEEE,
-
[36]
Motion segmentation and multiple ob- ject tracking by correlation co-clustering.Transactions on Pattern Analysis and Machine Intelligence, 42(1), 2018
Margret Keuper, Siyu Tang, Bjoern Andres, Thomas Brox, and Bernt Schiele. Motion segmentation and multiple ob- ject tracking by correlation co-clustering.Transactions on Pattern Analysis and Machine Intelligence, 42(1), 2018. 4
2018
-
[37]
How the ex- perience of california wildfires shape twitter climate change framings.Climatic Change, 177(1):17, 2024
Jessie WY Ko, Shengquan Ni, Alexander Taylor, Xiusi Chen, Yicong Huang, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, et al. How the ex- perience of california wildfires shape twitter climate change framings.Climatic Change, 177(1):17, 2024. 1, 8
2024
-
[38]
Large Scale Video Representation Learning via Rela- tional Graph Clustering
Hyodong Lee, Joonseok Lee, Joe Yue-Hei Ng, and Paul Nat- sev. Large Scale Video Representation Learning via Rela- tional Graph Clustering. InConference on computer vision and pattern recognition, pages 6807–6816, 2020. 5
2020
-
[39]
Higher-order multicuts for geometric mmdel fitting and motion segmentation.Transactions on Pattern Analysis and Machine Intelligence, 45(1), 2022
Evgeny Levinkov, Amirhossein Kardoost, Bjoern Andres, and Margret Keuper. Higher-order multicuts for geometric mmdel fitting and motion segmentation.Transactions on Pattern Analysis and Machine Intelligence, 45(1), 2022. 4
2022
-
[40]
Video-llava: Learning united visual repre- sentation by alignment before projection
Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, and Li Yuan. Video-llava: Learning united visual repre- sentation by alignment before projection. InConference on Empirical Methods in Natural Language Processing, pages 5971–5984, 2024. 1, 2, 3, 6, 17
2024
-
[41]
Visual instruction tuning.Advances in neural information processing systems, 36:34892–34916, 2023
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning.Advances in neural information processing systems, 36:34892–34916, 2023. 3
2023
-
[42]
Improved baselines with visual instruction tuning
Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26296–26306, 2024. 3
2024
-
[43]
Combined key-frame extraction and object-based video segmentation.IEEE Transactions on Circuits and Systems for Video Technology, 15(7):869–884,
Lijie Liu and Guoliang Fan. Combined key-frame extraction and object-based video segmentation.IEEE Transactions on Circuits and Systems for Video Technology, 15(7):869–884,
-
[44]
Climate change deniers versus climate change decriers: the pragmatics of climate defense in the age of disinformation.FAST CAPITALISM, 21(1), 2024
Timothy W Luke. Climate change deniers versus climate change decriers: the pragmatics of climate defense in the age of disinformation.FAST CAPITALISM, 21(1), 2024. 1
2024
-
[45]
arXiv preprint arXiv:2306.07207 , year=
Ruipu Luo, Ziwang Zhao, Min Yang, Junwei Dong, Da Li, Pengcheng Lu, Tao Wang, Linmei Hu, Minghui Qiu, and Zhongyu Wei. Valley: Video assistant with large language model enhanced ability.arXiv preprint arXiv:2306.07207,
-
[46]
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Muhammad Maaz, Hanoona Rasheed, Salman Khan, and Fa- had Shahbaz Khan. Video-chatgpt: Towards detailed video understanding via large vision and language models.arXiv preprint arXiv:2306.05424, 2023. 1, 2, 3, 6, 19
work page internal anchor Pith review arXiv 2023
-
[47]
Fire as an aesthetic re- source in climate change communication: exploring the vi- sual discourse of the california wildfires on twitter/x.Visual Studies, 40(2):337–351, 2025
Aidan McGarry and Emiliano Trer ´e. Fire as an aesthetic re- source in climate change communication: exploring the vi- sual discourse of the california wildfires on twitter/x.Visual Studies, 40(2):337–351, 2025. 1, 8
2025
-
[48]
Comparing Clusterings by the Variation of In- formation
Marina Meil ˘a. Comparing Clusterings by the Variation of In- formation. InLearning Theory and Kernel Machines, pages 173–187, Berlin, Heidelberg, 2003. Springer. 8
2003
-
[49]
Coun- teracting climate denial: A systematic review.Public Under- standing of Science, 33(4):504–520, 2024
Laila Mendy, Mikael Karlsson, and Daniel Lindvall. Coun- teracting climate denial: A systematic review.Public Under- standing of Science, 33(4):504–520, 2024. 1 10
2024
-
[50]
(Social) Media Logics and Visualizing Cli- mate Change: 10 Years of #climatechange Images on Twit- ter.Social Media + Society, 9(1):20563051231164310,
Angelina Mooseder, Cornelia Brantner, Rodrigo Zamith, and J¨urgen Pfeffer. (Social) Media Logics and Visualizing Cli- mate Change: 10 Years of #climatechange Images on Twit- ter.Social Media + Society, 9(1):20563051231164310,
-
[51]
Publisher: SAGE Publications Ltd. 1, 3, 5
-
[52]
Keyframe-based video summarization using Delaunay clus- tering.International Journal on Digital Libraries, 6(2):219– 232, 2006
Padmavathi Mundur, Yong Rao, and Yelena Yesha. Keyframe-based video summarization using Delaunay clus- tering.International Journal on Digital Libraries, 6(2):219– 232, 2006. 5
2006
-
[53]
Image Segmentation By Using Thresholding Techniques For Medical Images.Com- puter Science & Engineering: An International Journal, 6 (1):1–13, 2016
Senthilkumaran N and Vaithegi S. Image Segmentation By Using Thresholding Techniques For Medical Images.Com- puter Science & Engineering: An International Journal, 6 (1):1–13, 2016. 5
2016
-
[54]
Lmgp: Lifted multicut meets geometry projections for multi-camera multi-object tracking
Duy MH Nguyen, Roberto Henschel, Bodo Rosenhahn, Daniel Sonntag, and Paul Swoboda. Lmgp: Lifted multicut meets geometry projections for multi-camera multi-object tracking. InConference on Computer Vision and Pattern Recognition. IEEE, 2022. 4
2022
-
[55]
Climate hoax: The shift from scientific discourse to speculative rhetoric in climate change conversations.Next Research, page 100322, 2025
Samuel Chukwujindu Nwokolo. Climate hoax: The shift from scientific discourse to speculative rhetoric in climate change conversations.Next Research, page 100322, 2025. 1
2025
-
[56]
Sch ¨afer
Saffron O’Neill and Mike S. Sch ¨afer. Frame Analysis in Climate Change Communication: Approaches for Assess- ing Journalists’ Minds, Online Communication and Media Portrayals. InSch ¨afer, Mike S; O’Neill, Saffron (2017). Frame Analysis in Climate Change Communication: Ap- proaches for Assessing Journalists’ Minds, Online Commu- nication and Media Portra...
2017
-
[57]
Visual portrayals of fun in the sun in european news outlets misrepresent heatwave risks.The Geographical Journal, 189(1), 2023
Saffron O’Neill, Sylvia Hayes, Nadine Strauß, Marie-No¨elle Doutreix, Katharine Steentjes, Joshua Ettinger, Ned West- wood, and James Painter. Visual portrayals of fun in the sun in european news outlets misrepresent heatwave risks.The Geographical Journal, 189(1), 2023. 1, 8
2023
-
[58]
Maxime Oquab, Timoth ´ee Darcet, Th´eo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Je- gou, Julien Mairal, Patr...
2024
-
[59]
Towards understanding climate change perceptions: A social media dataset
Katharina Prasse, Steffen Jung, Isaac B Bravo, Stefanie Wal- ter, and Margret Keuper. Towards understanding climate change perceptions: A social media dataset. InNeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning, 2023. 3
2023
-
[60]
I Spy with My Little Eye a Minimum Cost Multicut Investigation of Dataset Frames
Katharina Prasse, Isaac Bravo, Stefanie Walter, and Margret Keuper. I Spy with My Little Eye a Minimum Cost Multicut Investigation of Dataset Frames. In2025 IEEE/CVF Win- ter Conference on Applications of Computer Vision (WACV), pages 2134–2143, 2025. ISSN: 2642-9381. 1, 2, 4, 40
2025
-
[61] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[62]
Stacy Rebich-Hespanha and Ronald E. Rice. Dominant Vi- sual Frames in Climate Change News Stories: Implications for Formative Evaluation in Climate Change Campaigns. International Journal of Communication (19328036), 10,
-
[63] Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987.
[64] Michelle I. Seelig, Huixin Deng, and Songyi Liang. A frame analysis of climate change solutions in legacy news and digital media. Newspaper Research Journal, 43(4):370–388, 2022. Publisher: SAGE Publications Inc.
[66] Piyush Sharma, Nan Ding, Sebastian Goodman, and Radu Soricut. Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2556–2565, 2018.
[67] Shivam Sharma, Siddhant Agarwal, Tharun Suresh, Preslav Nakov, Md Shad Akhtar, and Tanmoy Chakraborty. What do you meme? Generating explanations for visual semantic role labelling in memes. In AAAI Conference on Artificial Intelligence, pages 9763–9771, 2023.
[68] Yixuan Su, Tian Lan, Huayang Li, Jialu Xu, Yan Wang, and Deng Cai. PandaGPT: One model to instruction-follow them all. arXiv preprint arXiv:2305.16355, 2023.
[69] Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, and Bernt Schiele. Multi-person tracking by multicut and deep matching. In ECCV Workshop on Benchmarking Multi-Target Tracking: MOTChallenge. Springer, 2016.
[70] Siyu Tang, Mykhaylo Andriluka, Bjoern Andres, and Bernt Schiele. Multiple people tracking by lifted multicut and person re-identification. In Conference on Computer Vision and Pattern Recognition. IEEE, 2017.
[71] Ba Tu Truong and Svetha Venkatesh. Video abstraction: A systematic review and classification. ACM Transactions on Multimedia Computing, Communications, and Applications, 3(1):3–es, 2007.
[72] Arjan Wardekker and Susanne Lorenz. The visual framing of climate change impacts and adaptation in the IPCC assessment reports. Climatic Change, 156(1):273–292, 2019.
[73] Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Conference on Computer Vision and Pattern Recognition. IEEE, 2023.
[74] Antal Wozniak, Hartmut Wessler, and Julia Lück. Who Prevails in the Visual Framing Contest about the United Nations Climate Change Conferences? Journalism Studies, 18(11):1433–1452, 2017. Publisher: Routledge.
[75] Xiaoyue Yan and Mike S. Schäfer. Multimodal climate change communication on WeChat: Analyzing visual/textual clusters on China's largest social media platform. Climatic Change, 178(7):133, 2025.
[76] Shuping Yang and Xinggang Lin. Key Frame Extraction Using Unsupervised Clustering Based on a Statistical Model. Tsinghua Science & Technology, 10(2):169–173, 2005.
[77] M. M. Yeung and Bede Liu. Efficient matching and clustering of video shots. In Proceedings, International Conference on Image Processing, pages 338–341, vol. 1, 1995.
[78] Jing Zeng and Xiaoyue Yan. Understanding climate-related visual storytelling on TikTok: A cross-national multimodal analysis. Journal of Digital Social Research, 6(2):66–84, 2024.
[79] Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman. Video Summarization with Long Short-Term Memory. In Computer Vision – ECCV 2016, pages 766–782, Cham, 2016. Springer International Publishing.