pith.

arxiv: 2606.20595 · v1 · pith:6GFLT6DX · submitted 2026-05-18 · 💻 cs.HC

Hybrid Intelligence in Cartoon Captioning: Evaluating AI as a Creative Writing Partner

Pith reviewed 2026-06-30 18:22 UTC · model grok-4.3

classification 💻 cs.HC
keywords cartoon captioning, AI assistance, hybrid intelligence, GPT-4o, humor generation, creative writing, human-AI collaboration, visual storytelling

The pith

AI generates humorous cartoon captions but sometimes diverges from the original intended meaning, for example by missing irony or cultural references.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests GPT-4o by removing captions from IEEE Computer magazine cartoons and prompting the model to create replacements that match the depicted situation. It finds that the outputs are frequently funny and contextually relevant yet can miss the cartoon's narrative intent. The authors conclude that AI works well as an assistant for exploring ideas and refining captions but should not replace human creative control. This setup matters for understanding the practical limits of current language models on humor tasks that depend on visual context and subtlety. The work supports integrating AI suggestions into a workflow where cartoonists retain final judgment.

Core claim

Prompting GPT-4o on de-captioned cartoon images produces alternatives that are often humorous and relevant but sometimes diverge from the cartoon's intended meaning by missing irony, cultural references, or contextual constraints. The study positions current AI systems as tools for broadening creative exploration rather than autonomous replacements, enabling cartoonists to streamline ideation while keeping control over the final output.

What carries the argument

The evaluation protocol of removing original captions from magazine cartoons and prompting GPT-4o via ChatGPT to generate replacements without added context or style instructions.
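To make the protocol concrete, here is a minimal Python sketch of how a single trial could be recorded. The paper does not quote its prompt, so `GENERIC_PROMPT`, the `CaptionTrial` class, and the `cartoon_id` value are hypothetical. The model call is omitted because the study used the ChatGPT interface. The two sample captions are GPT-4o outputs reported for Figure 8, and the original caption is a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical wording: the paper does not quote its prompt verbatim.
GENERIC_PROMPT = (
    "The caption of this cartoon has been removed. "
    "Write a funny caption that fits the situation shown."
)


@dataclass
class CaptionTrial:
    """One de-captioned cartoon shown to the model with a generic prompt."""

    cartoon_id: str        # hypothetical identifier for the cartoon
    original_caption: str  # held out: never sent to the model
    model: str = "gpt-4o"
    prompt: str = GENERIC_PROMPT  # no tone, style, or narrative hints
    generated: list[str] = field(default_factory=list)

    def leaks_original(self) -> bool:
        # Protocol check: the held-out caption must not appear in the prompt.
        return self.original_caption.casefold() in self.prompt.casefold()

    def to_json(self) -> str:
        # Serialize everything needed to re-judge or replicate the trial.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


trial = CaptionTrial(
    cartoon_id="ieee-computer-2025-03",  # hypothetical identifier
    original_caption="<held-out original caption>",  # placeholder, not the real text
    generated=[
        "Put this on, and you can fight dragons without leaving your couch!",
        "With this, you can conquer worlds... and still be home for dinner!",
    ],
)
assert not trial.leaks_original()
print(trial.to_json())
```

Storing the held-out caption in the same record as the generated captions makes later divergence judgments straightforward. `leaks_original` checks the one property the protocol relies on: the prompt never contains the original text.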

If this is right

  • Cartoonists can use AI outputs to explore diverse humor styles during ideation.
  • AI suggestions can accelerate the process of generating initial caption ideas.
  • Human creators retain the ability to select and refine captions for accuracy.
  • AI can occasionally produce captions that improve on the original humor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar hybrid workflows may apply to other caption or short-text creative tasks where visual context matters.
  • Adding constraints like target humor style during prompting could reduce observed divergences.
  • The results highlight that replication of a single intended meaning is harder for AI than generation of plausible alternatives.

Load-bearing premise

That prompting the model on de-captioned images with no extra context fairly measures its ability to recover the original narrative intent.

What would settle it

An experiment that supplies the model with the original caption's tone or additional narrative details and measures whether divergence from intended meaning drops substantially.
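In such an experiment the same cartoons would appear under both prompt conditions, so the divergence judgments are paired. McNemar's test is one standard way to check whether a drop in divergence exceeds what chance would produce. Below is a minimal sketch of its exact (binomial) form. The function name and the boolean judgment lists are hypothetical, and the paper reports no such data.

```python
from math import comb


def mcnemar_exact(baseline: list[bool], with_context: list[bool]) -> tuple[int, int, float]:
    """Exact (binomial) McNemar test for paired yes/no judgments.

    baseline[i] and with_context[i] record whether the caption for cartoon i
    diverged from the intended meaning under each prompt condition.
    Returns (b, c, two-sided p-value): b counts cartoons that diverged only
    at baseline, c counts cartoons that diverged only with added context.
    """
    if len(baseline) != len(with_context):
        raise ValueError("both conditions must cover the same cartoons")
    b = sum(1 for x, y in zip(baseline, with_context) if x and not y)
    c = sum(1 for x, y in zip(baseline, with_context) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence either way
    k = min(b, c)
    # Double the smaller binomial tail under the null hypothesis p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Illustrative judgments only, not data from the paper.
baseline = [True] * 8 + [False] + [True, False]
with_context = [False] * 8 + [True] + [True, False]
print(mcnemar_exact(baseline, with_context))  # (8, 1, 0.0390625)
```

Only discordant cartoons, those that diverge under one condition but not the other, carry information. Cartoons that diverge in both conditions or in neither do not affect the result.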

Figures

Figures reproduced from arXiv: 2606.20595 by Derya Akleman, Ergun Akleman, Metin Sezgin, Sanem Sariel, Uğur Önal.

Figure 1: This study, inspired by the New Yorker caption contest (Allen, 2025), tests AI-generated cartoon captions.
Figure 2: We provided GPT-4o this cartoon from IEEE …
Figure 4: We provided GPT-4o this cartoon from IEEE …
Figure 5: We provided GPT-4o this cartoon from IEEE …
Figure 8: We presented GPT-4o with this cartoon from IEEE Computer, published in March 2025 (Akleman, 2025d), with the original text in the last speech balloon removed. GPT-4o generated the following three responses: (1) "Put this on, and you can fight dragons without leaving your couch!" (2) "With this, you can conquer worlds... and still be home for dinner!" (3) "It’s like a quest, but with better graphics and …
Original abstract

Crafting cartoon captions requires an understanding of humor, context, and the relationship between image and text. Traditionally, illustrators and writers collaborate to strengthen visual storytelling and comedic timing. With advances in natural language generation, Large Language Models (LLMs) can assist in this process. This study examines AI's role in caption generation by testing GPT-4o via the ChatGPT interface on IEEE Computer magazine cartoons. By removing captions and prompting AI to generate replacements, we assess its ability to produce jokes that match the depicted situation and narrative intent. Our findings show that while AI-generated captions are often humorous and contextually relevant, they sometimes diverge from the cartoon's intended meaning, for example, by missing irony, cultural references, or contextual constraints. However, AI can also produce alternatives that broaden creative exploration and occasionally improve upon the original humor. We argue that current AI systems are best used as an assistant rather than a replacement for human creativity. By integrating AI-generated suggestions, cartoonists can explore diverse humor styles, streamline ideation, and refine final captions while retaining creative control. This study highlights AI's potential as a practical tool for caption ideation within a hybrid human-AI workflow.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that GPT-4o, when prompted on de-captioned IEEE Computer magazine cartoons, generates humorous and contextually relevant captions but sometimes diverges from the original intended meaning (e.g., missing irony, cultural references, or contextual constraints). The authors conclude that AI is best used as an assistant rather than a replacement, supporting hybrid human-AI workflows for caption ideation where humans retain creative control.

Significance. If the evaluation protocol is revised to better isolate model capabilities, the work could provide useful empirical examples for HCI research on hybrid intelligence in creative tasks. It illustrates both AI strengths in generating alternative humor styles and limitations in matching specific narrative intent, offering practical suggestions for streamlining ideation while preserving human oversight.

major comments (2)
  1. [Abstract, paragraph 3] The evaluation protocol removes the original caption and issues a generic prompt to generate a replacement from the image alone, without supplying the specific narrative intent, cultural framing, or stylistic constraints of the human-authored caption. Observed divergences from 'intended meaning' are therefore expected by construction and do not isolate whether the model could match that intent under a better-specified prompt. This choice is load-bearing for the central claim that AI-generated captions 'sometimes diverge from the cartoon's intended meaning' and for the hybrid-workflow recommendation, since the positive findings are generated under the same under-specified regime.
  2. [Abstract] No information is given on the number of cartoons evaluated, selection criteria, blinding procedures for qualitative judgments, or any inter-rater reliability measures. Without these details the statements that AI captions are 'often humorous,' 'sometimes diverge,' or 'occasionally improve' upon the original cannot be assessed for robustness or generalizability.
minor comments (1)
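  1. [Abstract] The exact wording of the prompt issued to GPT-4o via the ChatGPT interface should be quoted verbatim, including any system or custom instructions, together with the model version and access date, to allow replication. The ChatGPT interface does not expose sampling settings such as temperature, so those cannot be reported directly.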
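Once counts are reported, frequency words such as "often", "sometimes", and "occasionally" become checkable. One way to illustrate this is to place a Wilson score interval around each reported proportion, which shows how tightly a small sample constrains it. The function name and the counts below are hypothetical and are not taken from the paper.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: 7 of 10 captions judged humorous.
low, high = wilson_interval(7, 10)
print(f"{low:.2f} to {high:.2f}")
```

With these invented counts, the interval runs from roughly 0.40 to 0.89. An interval that wide is consistent with both "sometimes" and "often", which is why the number of cartoons matters for the referee's point.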

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments on our evaluation protocol and methodological transparency. We address each point below and indicate where revisions will be made.

  1. Referee: [Abstract, paragraph 3] The evaluation protocol removes the original caption and issues a generic prompt to generate a replacement from the image alone, without supplying the specific narrative intent, cultural framing, or stylistic constraints of the human-authored caption. Observed divergences from 'intended meaning' are therefore expected by construction and do not isolate whether the model could match that intent under a better-specified prompt. This choice is load-bearing for the central claim that AI-generated captions 'sometimes diverge from the cartoon's intended meaning' and for the hybrid-workflow recommendation.

    Authors: The generic prompt was chosen deliberately to test the model's ability to infer humor and narrative intent from the image alone, which reflects a realistic deployment scenario for an AI ideation assistant. Divergences from the original caption thus demonstrate limitations in independently capturing elements such as irony or cultural references, reinforcing the value of human oversight in hybrid workflows. We acknowledge that the rationale for this design choice requires clearer articulation and will revise the abstract and methods section to explain the protocol's intent and its implications for the hybrid-intelligence claim. (revision: partial)

  2. Referee: [Abstract] No information is given on the number of cartoons evaluated, selection criteria, blinding procedures for qualitative judgments, or any inter-rater reliability measures. Without these details the statements that AI captions are 'often humorous,' 'sometimes diverge,' or 'occasionally improve' upon the original cannot be assessed for robustness or generalizability.

    Authors: We agree that these details are essential. The revised manuscript will report the exact number of cartoons evaluated, the selection criteria applied to the IEEE Computer archive, and the qualitative judgment process. We will also explicitly note the absence of formal blinding and inter-rater reliability as a methodological limitation of the current study. (revision: yes)
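If the revision adds a second judge, agreement on categorical labels such as "matches intent" versus "diverges" is commonly summarized with Cohen's kappa. The sketch below is a minimal example that assumes exactly two raters labelling the same captions. The function name and label strings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa is undefined, so report 1.0 by convention
    return (observed - expected) / (1 - expected)


# Hypothetical judgments on four generated captions.
judge_1 = ["matches", "matches", "diverges", "diverges"]
judge_2 = ["matches", "diverges", "diverges", "diverges"]
print(cohens_kappa(judge_1, judge_2))  # 0.5
```

Kappa corrects raw agreement for the agreement expected from each rater's label frequencies. In the example, the judges agree on three of four captions (0.75), but chance alone predicts 0.5, so kappa is 0.5.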

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with no derivations or self-referential steps


The paper conducts a direct empirical comparison of GPT-4o caption generation against human-authored originals on de-captioned IEEE cartoons. No equations, fitted parameters, or predictions derived from inputs appear in the provided text or abstract. The self-citations that do appear (e.g., Akleman, 2025d) identify the cartoons used as test material rather than supplying load-bearing results. The central claim rests on qualitative observations from the prompting protocol, not on any reduction of a result to its own inputs by construction. The study is self-contained as an observational assessment and does not invoke uniqueness theorems, ansatzes, or renamed empirical patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, free parameters, or invented entities are present. The work is an empirical case study with no axiomatic structure.

pith-pipeline@v0.9.1-grok · 5757 in / 1145 out tokens · 19698 ms · 2026-06-30T18:22:11.980150+00:00


Reference graph

Works this paper leans on

136 extracted references · 9 canonical work pages · 1 internal anchor

  1. Allen, Emma.
  2. Akleman, Ergun. 2025.
  3. Akleman, Ergun. 2024.
  4. Cumhuriyet. 2024.
  5. Kotbas, Muammer. 2024.
  6. Darroch, Gordon. The Guardian.
  7. Trivedi, Abhinav; Kaur, Er. Kanwaldeep; Choudhary, Chahil; Kunal; Barnwal, Priyashi. 2023.
  8. Bellaiche, Lucas; Shahi, Rohin; Turpin, Martin Harry; Ragnhildstveit, Anya; Sprockett, Shawn; Barr, Nathaniel; Christensen, Alexander; Seli, Paul. Cognitive Research: Principles and Implications.
  9. Sindhura, Siripurapu Phani; Abdul, Ashu.
  10. Medsker, Larry R.
  11. Dellermann, Dominik; Ebel, Philipp; S. Business & Information Systems Engineering. 2019.
  12. Kamar, Ece.
  13. Kamar, Ece; Hacker, Severin; Horvitz, Eric.
  14. Chang, Joseph Chee; Amershi, Saleema; Kamar, Ece.
  15. Bansal, Gagan; Wu, Tongshuang; Zhou, Joyce; Fok, Raymond; Nushi, Besmira; Kamar, Ece; Ribeiro, Marco Tulio; Weld, Daniel.
  16. Akleman, Ergun. 2021.
  17. Greenwade, George D. The Comprehensive TeX Archive Network (CTAN). TUGBoat. 1993.
  18. Caldwell, Craig.
  19. Williams, Bill. An old lady, a cab ride and one redeeming moment. Gaston Gazette, Gastonia, North Carolina. 2000.
  20. Mueller, Annie. 2022.
  21. Evans, Richard.
  22. Kurlander, David; Skelly, Tim; Salesin, David. Comic Chat. Proceedings of ACM SIGGRAPH 1996. 1996.
  23. Liu, Yutu; Akleman, Ergun; Chen, Jianer. 2012.
  24. Lakoff, George.
  25. Lakoff, George; Narayanan, Srini.
  26. Meehan, James R. 1977.
  27. Lebowitz, Michael. Poetics.
  28. Barthes, Roland. Communications.
  29. Logique du r. 1973.
  30. Sch. ICCS 2000.
  31. Gerv. 2006.
  32. Peinado, Federico; Gervas, Pablo; Di. 2004.
  33. Peinado, Federico; Gerv. New Generation Computing. 2006.
  34. Richards, Whitman; Finlayson, Mark Alan; Winston, Patrick Henry.
  35. Tobias, Ronald B.
  36. Forster, Edward Morgan.
  37. Chatman, Seymour Benjamin.
  38. Akleman, Ergun; Franchi, Stefano; Kaleci, Devkan; Mandell, Laura; Yamauchi, Takashi; Akleman, Derya.
  39. Meadows, Mark Stephen.
  40. Wang, Angela; Eason, Anthony Dalton; Akleman, Ergun.
  41. Mateas, Michael; Stern, Andrew. Writing Fa.
  42. Cavazza, Marc; Champagnat, Ronan; Leonardi, Riccardo. 2009.
  43. Iurgel, Ido A; Zagalo, Nelson; Petta, Paolo. 2009.
  44. Spierling, Ulrike; Szilas, Nicolas. 2008.
  45. Korte, Barbara.
  46. Pearl, Judea. 2000.
  47. Pearl, Judea.
  48. Pearl, Judea. 1986.
  49. Pearl, Judea. 1995.
  50. Bessler, David A; Akleman, Derya G. 1998.
  51. Akleman, Derya G; Bessler, David A; Burton, Diana M.
  52. Katz, Steven Douglas.
  53. Liu, Feng; Gleicher, Michael. 2006.
  54. Arijon, Daniel.
  55. He, Li-Wei; Cohen, Michael; Salesin, David.
  56. Scheier, Michael F; Carver, Charles S. 1977.
  57. Ben-Ze’ev, Aaron.
  58. Stevens, Edward B. 1948.
  59. Emergy Oy. 2023.
  60. Di Leo, Ivana; Muis, Krista R; Singh, Cara A; Psaradellis, Cynthia. 2019.
  61. Vogl, Elisabeth; Pekrun, Reinhard; Murayama, Kou; Loderer, Kristina; Schubert, Sandra. 2019.
  62. Vogl, Elisabeth; Pekrun, Reinhard; Murayama, Kou; Loderer, Kristina. 2020.
  63. Hamlyn, David W. 1978.
  64. Balint, Michael. 1952.
  65. Sugiyama, Michelle Scalise.
  66. Zimmerman, Eugenia Noik.
  67. Sturrock, John.
  68. Riemann, Bernhard.
  69. Schr. Annalen der Physik. 1920.
  70. Bujack, Roxana; Teti, Emily; Miller, Jonah; Caffrey, Elektra; Turton, Terece L. 2022.
  71. Weiss-Gal, Idit. 2009.
  72. Herdayanti, Kicki; Satria, Robby.
  73. Musdalifa, Musdalifa; Sili, Surya; Ariani, Setya.
  74. Zemeckis, Robert.
  75. Stanton, Andrew; Unkrich, Lee.
  76. Gerwig, Greta.
  77. McTiernan, John.
  78. Hitchcock, Alfred.
  79. Lasseter, John; Stanton, Andrew.
  80. Lasseter, John.

Showing first 80 references.