arxiv: 2512.24329 · v1 · submitted 2025-12-30 · 💻 cs.CL

Recognition: 2 theorem links

· Lean Theorem

World model inspired sarcasm reasoning with large language model agents

Keito Inoshita , Shinnosuke Mizuno

Authors on Pith no claims yet

Pith reviewed 2026-05-16 18:50 UTC · model grok-4.3

classification 💻 cs.CL

keywords sarcasm detectionworld modelLLM agentsinconsistency scoreintention reasoningnatural language processinginterpretable AI

0 comments

The pith

World model agents detect sarcasm by measuring inconsistency between literal meaning and speaker intention.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reformulates sarcasm detection as a structured reasoning process that breaks an utterance into literal meaning, surrounding context, normative expectations, and speaker intentions, each handled by its own LLM agent. It computes a deterministic inconsistency score from the gap between the literal evaluation and the normative expectation, then feeds that score together with an intention score into a simple logistic regression model to output a sarcasm probability. This design keeps the final decision numerically interpretable while using the agents to capture the cognitive mismatch that defines sarcasm. Experiments show the approach beats both traditional deep learning models and other LLM baselines on standard sarcasm benchmarks, with ablations confirming that the inconsistency and intention components are essential.

Core claim

WM-SAR decomposes sarcasm understanding into literal meaning, context, normative expectation, and intention using specialized LLM-based agents. The discrepancy between literal evaluation and normative expectation is quantified as a deterministic inconsistency score, which together with an intention score is integrated by logistic regression to infer sarcasm probability, yielding superior performance and interpretability on representative sarcasm detection benchmarks.

What carries the argument

The WM-SAR framework of specialized LLM agents that extract literal meaning, normative expectations, and intentions, then combine a deterministic inconsistency score with an intention score through logistic regression.

If this is right

The method supplies explicit numerical signals that explain why a given utterance is classified as sarcastic.
Explicit separation of literal meaning from normative expectation allows the model to handle cases where surface wording conflicts with social norms.
The lightweight logistic regression layer preserves interpretability even when the underlying agents are large language models.
Ablation results indicate that removing either the inconsistency score or the intention component measurably degrades benchmark performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same agent decomposition could be tested on related phenomena such as irony or indirect speech acts.
Running the inconsistency score on live social-media streams might expose how quickly normative expectations shift across communities.
Replacing the logistic regression with a small neural combiner could be checked to see whether performance gains justify the loss of direct numerical transparency.

Load-bearing premise

That LLM agents can reliably and consistently extract literal meaning, normative expectations, and intentions so the derived inconsistency score remains stable and the logistic regression produces a valid sarcasm probability.

What would settle it

A collection of utterances labeled sarcastic by humans where the model's computed inconsistency score shows no systematic difference from non-sarcastic utterances.

read the original abstract

Sarcasm understanding is a challenging problem in natural language processing, as it requires capturing the discrepancy between the surface meaning of an utterance and the speaker's intentions as well as the surrounding social context. Although recent advances in deep learning and Large Language Models (LLMs) have substantially improved performance, most existing approaches still rely on black-box predictions of a single model, making it difficult to structurally explain the cognitive factors underlying sarcasm. Moreover, while sarcasm often emerges as a mismatch between semantic evaluation and normative expectations or intentions, frameworks that explicitly decompose and model these components remain limited. In this work, we reformulate sarcasm understanding as a world model inspired reasoning process and propose World Model inspired SArcasm Reasoning (WM-SAR), which decomposes literal meaning, context, normative expectation, and intention into specialized LLM-based agents. The discrepancy between literal evaluation and normative expectation is explicitly quantified as a deterministic inconsistency score, and together with an intention score, these signals are integrated by a lightweight Logistic Regression model to infer the final sarcasm probability. This design leverages the reasoning capability of LLMs while maintaining an interpretable numerical decision structure. Experiments on representative sarcasm detection benchmarks show that WM-SAR consistently outperforms existing deep learning and LLM-based methods. Ablation studies and case analyses further demonstrate that integrating semantic inconsistency and intention reasoning is essential for effective sarcasm detection, achieving both strong performance and high interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives sarcasm detection a multi-agent decomposition with an explicit inconsistency score fed to logistic regression, but the abstract shows no numbers and the determinism claim looks shaky.

read the letter

The main thing to know is that WM-SAR splits sarcasm into separate LLM agents for literal meaning, normative expectations, context, and intention, then turns the gap between literal and normative into a single inconsistency score that gets combined with an intention score via logistic regression. That decomposition is the clearest new piece compared with the single-model baselines cited in the abstract. The authors treat the whole thing as a lightweight world-model style process, which gives the output a bit more structure than pure black-box LLM classifiers. The ablations apparently show that dropping either the inconsistency or intention part hurts performance, which lines up with how sarcasm actually works in conversation. That part of the design is straightforward and worth looking at if you care about breaking down social reasoning rather than just pushing accuracy numbers. The logistic regression step keeps the final decision interpretable in a simple numerical way. The soft spots are mostly around evidence and reproducibility. The abstract claims consistent outperformance on representative benchmarks and says the ablations prove the components are essential, yet it supplies none of the actual scores, dataset sizes, or error breakdowns. Without those, the central empirical claim stays hard to evaluate. The bigger practical issue is the determinism of the inconsistency score. The paper calls it deterministic, but the agents are LLM-based and the description gives no temperature, seed, or decoding details to enforce that. If the scores shift across runs, the logistic regression outputs and the ablation results become noisy rather than structural. The regression itself is also fit on the same benchmark data used for final evaluation, so some of the reported gains could trace to that fitting step. This is the sort of paper that would interest people working on explainable multi-agent setups for social NLP tasks. The framing is clean enough that a serious referee could usefully press on the missing numbers and implementation controls. I would send it to review rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The paper introduces WM-SAR, a framework inspired by world models for sarcasm reasoning using large language model agents. It decomposes the task into specialized agents for literal meaning, context, normative expectation, and intention. A deterministic inconsistency score is computed from the discrepancy between literal evaluation and normative expectation, combined with an intention score using logistic regression to predict sarcasm probability. The manuscript reports that this approach outperforms existing deep learning and LLM-based methods on sarcasm detection benchmarks, with ablation studies confirming the necessity of the semantic inconsistency and intention reasoning components.

Significance. If the results hold after addressing reproducibility, the work contributes an interpretable, modular approach to sarcasm detection that explicitly models key cognitive elements like inconsistency and intention, potentially improving both performance and explainability in NLP applications involving figurative language and social context. The hybrid design with lightweight logistic regression on LLM agents balances reasoning power with numerical transparency.

major comments (2)

[Abstract] The assertion of a 'deterministic inconsistency score' in the abstract lacks any specification of mechanisms to control for the inherent stochasticity of LLMs, such as setting temperature to 0, employing greedy decoding, or fixing random seeds. This is load-bearing for the central empirical claims because the score is used in the logistic regression and the ablation studies rely on it to demonstrate the importance of the inconsistency component; without such controls, the results may vary across runs and the interpretability is compromised.
[Experiments] Details on how the logistic regression coefficients are obtained are insufficient. If they are fitted using the same benchmark data as the evaluation, this introduces circularity that could inflate performance metrics and weaken the cross-benchmark claims of consistent outperformance.

minor comments (1)

[Abstract] The abstract would benefit from including at least high-level quantitative results, specific benchmark names, or dataset sizes to allow readers to immediately gauge the magnitude of the reported improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on reproducibility and experimental details. We address each point below and will revise the manuscript to incorporate clarifications that strengthen the claims.

read point-by-point responses

Referee: [Abstract] The assertion of a 'deterministic inconsistency score' in the abstract lacks any specification of mechanisms to control for the inherent stochasticity of LLMs, such as setting temperature to 0, employing greedy decoding, or fixing random seeds. This is load-bearing for the central empirical claims because the score is used in the logistic regression and the ablation studies rely on it to demonstrate the importance of the inconsistency component; without such controls, the results may vary across runs and the interpretability is compromised.

Authors: We agree that explicit controls for stochasticity must be stated to support the determinism claim and the ablation results. In our implementation, all LLM agents used temperature=0 with greedy decoding and fixed random seeds to produce deterministic outputs for literal evaluation and normative expectation. We will revise the abstract to note these controls and add a methods subsection detailing the exact decoding parameters, ensuring the inconsistency score remains fully reproducible. revision: yes
Referee: [Experiments] Details on how the logistic regression coefficients are obtained are insufficient. If they are fitted using the same benchmark data as the evaluation, this introduces circularity that could inflate performance metrics and weaken the cross-benchmark claims of consistent outperformance.

Authors: The logistic regression is fitted exclusively on a held-out training split (via cross-validation) that is disjoint from all evaluation benchmark test sets, avoiding any circularity. Coefficients are learned to combine the inconsistency and intention scores on training data only, after which the fixed model is applied to the test benchmarks. We will expand the experiments section with the precise fitting procedure, data splits, and hyperparameters to make this transparent. revision: yes

Circularity Check

1 steps flagged

Logistic regression coefficients fitted to benchmark data reduce final sarcasm probability to data-driven fit

specific steps

fitted input called prediction [Abstract (integration step)]
"the discrepancy between literal evaluation and normative expectation is explicitly quantified as a deterministic inconsistency score, and together with an intention score, these signals are integrated by a lightweight Logistic Regression model to infer the final sarcasm probability"

The inconsistency and intention scores are produced by the LLM agents; the LR then combines them into the final probability. Because the LR coefficients are fitted directly to the benchmark labels used for reported accuracy and ablation results, the 'prediction' of sarcasm is statistically forced by the same data rather than emerging from the world-model structure alone.

full rationale

The paper's central inference step extracts literal/normative scores via LLM agents then feeds them into logistic regression whose parameters are learned from the same sarcasm detection benchmarks used for final evaluation. This matches the fitted-input-called-prediction pattern: the reported performance and ablation gains are not independent predictions but outputs of a supervised combiner trained on the evaluation distribution. No evidence of held-out parameter fitting or external validation of the LR step is provided in the abstract or described method, creating moderate circular dependence even though the agent decomposition itself is not self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; ledger is minimal. The method assumes LLMs can extract the four components reliably and that the inconsistency score is a stable, deterministic quantity independent of prompt variation.

pith-pipeline@v0.9.0 · 5543 in / 1087 out tokens · 45207 ms · 2026-05-16T18:50:04.545573+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

The discrepancy between literal evaluation and normative expectation is explicitly quantified as a deterministic inconsistency score... D(u, C(u)) = M_literal(u) − E_norm(C(u))... SD(u, C(u)) = I[sgn(M_literal(u)) ≠ sgn(E_norm(C(u)))]
IndisputableMonolith/Foundation/ArrowOfTime.lean forward_accumulates echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

reformulate sarcasm understanding as a world model inspired reasoning process... observation→latent state→prediction→prediction error→decision

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 8 internal anchors

[1]

In: Proceedings of the 3rd International Conference on Smart Data Intelligence (ICSMDI), pp

Salini, Y., HariKiran, J.: Sarcasm detection: A systematic review of methods and approaches. In: Proceedings of the 3rd International Conference on Smart Data Intelligence (ICSMDI), pp. 15–22. IEEE, Trichy, India (2023). https://doi.org/ 10.1109/ICSMDI57622.2023.00012

work page doi:10.1109/icsmdi57622.2023.00012 2023
[2]

In: Proceedings of the 10th International Conference on Contemporary Computing (IC3), pp

Jain, T., Agrawal, N., Goyal, G., Aggrawal, N.: Sarcasm detection of tweets: A comparative study. In: Proceedings of the 10th International Conference on Contemporary Computing (IC3), pp. 1–6. IEEE, Noida, India (2017). https:// doi.org/10.1109/IC3.2017.8284317

work page doi:10.1109/ic3.2017.8284317 2017
[3]

AI Open 4, 13–18 (2023) https://doi.org/10.1016/j.aiopen.2023.01.001

Misra, R., Arora, P.: Sarcasm detection using news headlines dataset. AI Open 4, 13–18 (2023) https://doi.org/10.1016/j.aiopen.2023.01.001

work page doi:10.1016/j.aiopen.2023.01.001 2023
[4]

IAES International Journal of Artificial Intelligence 13(4), 4695–4702 (2024) https://doi.org/10.11591/ijai.v13.i4.pp4695-4702

Palaniammal, A., Anandababu, P.: Sarcasm detection on social data: Heuristic search and deep learning. IAES International Journal of Artificial Intelligence 13(4), 4695–4702 (2024) https://doi.org/10.11591/ijai.v13.i4.pp4695-4702

work page doi:10.11591/ijai.v13.i4.pp4695-4702 2024
[5]

NPJ Artificial Intelligence1(1), 20 (2025) https://doi.org/10.1038/s44387-025-00031-9

Wu, Y., Guo, W., Liu, Z., Ji, H., Xu, Z., Zhang, D.: How large language models encode theory of mind: A study on sparse parameter patterns. NPJ Artificial Intelligence1(1), 20 (2025) https://doi.org/10.1038/s44387-025-00031-9

work page doi:10.1038/s44387-025-00031-9 2025
[6]

IEEE Transactions on Artificial Intelligence, 1–15 (2024) https://doi.org/10.1109/TAI.2024.3515935

Boutsikaris, L., Polykalas, S.: A comparative review of deep learning techniques on the classification of irony and sarcasm in text. IEEE Transactions on Artificial Intelligence, 1–15 (2024) https://doi.org/10.1109/TAI.2024.3515935

work page doi:10.1109/tai.2024.3515935 2024
[7]

In: Proceedings of the 32nd International Conference on Neural Information Processing (ICONIP)

Liu, Z., Zhou, Z., Hu, M.: Caf-i: A collaborative multi-agent framework for enhanced irony detection with large language models. In: Proceedings of the 32nd International Conference on Neural Information Processing (ICONIP). IEEE, Okinawa, Japan (2026). https://doi.org/10.48550/arXiv.2506.08430

work page doi:10.48550/arxiv.2506.08430 2026
[8]

World Models

Ha, D., Schmidhuber, J.: World Models. arXiv preprint arXiv:1803.10122 (2018). https://doi.org/10.48550/arXiv.1803.10122

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1803.10122 2018
[9]

In: Proceedings of the 14th Conference on Compu- tational Natural Language Learning (CoNLL), pp

Davidov, D., Tsur, O., Rappoport, A.: Semi-supervised recognition of sarcasm in twitter and amazon. In: Proceedings of the 14th Conference on Compu- tational Natural Language Learning (CoNLL), pp. 107–116. Association for Computational Linguistics, Uppsala, Sweden (2010) 26

work page 2010
[10]

Language Resources and Evaluation47(1), 239–268 (2013) https:// doi.org/10.1007/s10579-012-9196-x

Reyes, A., Rosso, P., Veale, T.: A multidimensional approach for detecting irony in twitter. Language Resources and Evaluation47(1), 239–268 (2013) https:// doi.org/10.1007/s10579-012-9196-x

work page doi:10.1007/s10579-012-9196-x 2013
[11]

PLOS ONE16(6), 0252918 (2021) https://doi.org/10.1371/journal.pone.0252918

Eke, C., Norman, A., Shuib, L.: Multi-feature fusion framework for sarcasm iden- tification on twitter data: A machine learning based approach. PLOS ONE16(6), 0252918 (2021) https://doi.org/10.1371/journal.pone.0252918

work page doi:10.1371/journal.pone.0252918 2021
[12]

In: Proceedings of the International Conference on Text, Speech, and Dialogue

Bharti, S.K., Sathya Babu, K., Jena, S.K.: Harnessing online news for sarcasm detection in hindi tweets. In: Proceedings of the International Conference on Text, Speech, and Dialogue. Lecture Notes in Computer Science, vol. 10415, pp. 679–686. Springer, Prague, Czech Republic (2017). https://doi.org/10.1007/ 978-3-319-69900-4 86

work page 2017
[13]

International Journal on Semantic Web and Information Systems13(4), 89–108 (2017) https://doi.org/ 10.4018/IJSWIS.2017100105

Bharti, S.K., Pradhan, R., Babu, K.S., Jena, S.K.: Sarcastic sentiment detection based on types of sarcasm occurring in twitter data. International Journal on Semantic Web and Information Systems13(4), 89–108 (2017) https://doi.org/ 10.4018/IJSWIS.2017100105

work page doi:10.4018/ijswis.2017100105 2017
[14]

In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Bhattacharyya, P., Joshi, A.: Computational sarcasm. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Copenhagen, Denmark (2017)

work page 2017
[15]

URL https://doi.org/10.3115/v1/d14-1162

Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word rep- resentation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. Association for Compu- tational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/D14-1162

work page doi:10.3115/v1/d14-1162 2014
[16]

In: Proceedings of the 26th International Conference on Computational Linguistics (COLING), pp

Poria, S., Cambria, E., Hazarika, D., Vij, P.: A deeper look into sarcastic tweets using deep convolutional neural networks. In: Proceedings of the 26th International Conference on Computational Linguistics (COLING), pp. 1601–

work page
[17]

A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks

Association for Computational Linguistics, Osaka, Japan (2016). https: //doi.org/10.48550/arXiv.1610.08815

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1610.08815 2016
[18]

In: Proceedings of the 26th International Conference on Computational Lin- guistics (COLING), pp

Zhang, M., Zhang, Y., Fu, G.: Tweet sarcasm detection using deep neural network. In: Proceedings of the 26th International Conference on Computational Lin- guistics (COLING), pp. 2449–2460. Association for Computational Linguistics, Osaka, Japan (2016)

work page 2016
[19]

In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), pp

Liang, B., Lou, C., Li, X., Yang, M., Gui, L., He, Y.,et al.: Multi-modal sarcasm detection via cross-modal graph convolutional network. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1767–1777. Association for Computational Linguistics, Dublin, Ireland (2022). https://doi.org/10.18653/v1/2022.acl-long.124

work page doi:10.18653/v1/2022.acl-long.124 2022
[20]

In: Proceedings of the 27 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Com- munications Technology (IAICT), pp

Ueno, T., Inoshita, K.: Dual-branch feature extraction via discrepancy-aware fusion with evidential deep learning for sarcasm detection. In: Proceedings of the 27 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Com- munications Technology (IAICT), pp. 345–352. IEEE, Bali, Indonesia (2025). https://doi.org/10.1109/IAICT65714.202...

work page doi:10.1109/iaict65714.2025.11101504 2025
[21]

In: Proceedings of the International Conference on Neural Information Processing

Inoshita, K., Ueno, T., Zhou, X.: Multi-scale convolutional fusion with con- trastive feature alignment for imbalanced data classification. In: Proceedings of the International Conference on Neural Information Processing. Lecture Notes in Computer Science, pp. 3–18. Springer, Kanazawa, Japan (2026). https://doi.org/ 10.1007/978-3-031-97141-9 1

work page doi:10.1007/978-3-031-97141-9 2026
[22]

IEEE Transactions on Affective Computing16(4), 2560–2578 (2025) https://doi.org/10.1109/TAFFC

Zhang, Y., Zou, C., Lian, Z., Tiwari, P., Qin, J.: Sarcasmbench: Towards eval- uating large language models on sarcasm understanding. IEEE Transactions on Affective Computing16(4), 2560–2578 (2025) https://doi.org/10.1109/TAFFC. 2025.3604806

work page doi:10.1109/taffc 2025
[23]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F.,et al.: Chain- of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS), pp. 24824–24837. Curran Associates Inc., New Orleans, USA (2022). https://doi.org/10.48550/arXiv.2201.11903

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2201.11903 2022
[24]

Automatic Chain of Thought Prompting in Large Language Models

Zhang, Z., Zhang, A., Li, M., Smola, A.: Automatic chain of thought prompting in large language models. In: Proceedings of the 11th International Conference on Learning Representations (ICLR), Kigali, Rwanda (2023). https://doi.org/10. 48550/arXiv.2210.03493

work page internal anchor Pith review Pith/arXiv arXiv 2023
[25]

25651–25659 (2025)

Yao, B., Zhang, Y., Li, Q., Qin, J.: Is sarcasm detection a step-by-step reasoning process in large language models? In: Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI), pp. 25651–25659 (2025). https://doi.org/10. 1609/aaai.v39i24.34756

work page 2025
[26]

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Du, Y., Li, S., Torralba, A., Tenenbaum, J.B., Mordatch, I.: Improving factuality and reasoning in language models through multiagent debate. In: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, pp. 11733–11763 (2024). https://doi.org/10.48550/arXiv.2305.14325

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.14325 2024
[27]

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

Li, G., Hammoud, H.A.A.K., Itani, H., Khizbullin, D., Ghanem, B.: Camel: Com- municative agents for “mind” exploration of large language model society. In: Proceedings of the 37th International Conference on Neural Information Process- ing Systems (NeurIPS), pp. 51991–52008. Curran Associates Inc., New Orleans, USA (2023). https://doi.org/10.48550/arXiv.2...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.17760 2023
[28]

In: Proceedings of the ICLR 2024 Workshop on LLM Agents, Vienna, Austria (2024)

Wu, Y., Jia, F., Zhang, S., Li, H., Zhu, E., Wang, Y.,et al.: Mathchat: Con- verse to tackle challenging math problems with llm agents. In: Proceedings of the ICLR 2024 Workshop on LLM Agents, Vienna, Austria (2024). https: //doi.org/10.48550/arXiv.2306.01337 28

work page doi:10.48550/arxiv.2306.01337 2024
[29]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E.,et al.: Autogen: Enabling next-gen llm applications via multi-agent conversation. In: Proceedings of the Conference on Language Modeling (COLM). Association for Computa- tional Linguistics, Pennsylvania, USA (2024). https://doi.org/10.48550/arXiv. 2308.08155

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2024
[30]

Social Development (2023) https://doi.org/10.1111/sode.12666

Misgav, K., Chomsky, A., Daniel, E.: Children’s understanding of values as men- tal concepts: Longitudinal changes and association with theory of mind. Social Development (2023) https://doi.org/10.1111/sode.12666

work page doi:10.1111/sode.12666 2023
[31]

Really? Well. Apparently Bootstrapping Improves the Performance of Sarcasm and Nastiness Classifiers for Online Dialogue

Lukin, S., Walker, M.: Really? well. apparently bootstrapping improves the per- formance of sarcasm and nastiness classifiers for online dialogue. In: Proceedings of the Workshop on Language Analysis in Social Media, pp. 30–40. Association for Computational Linguistics, Atlanta, Georgia (2013). https://doi.org/10.48550/ arXiv.1708.08572

work page internal anchor Pith review Pith/arXiv arXiv 2013
[32]

In: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp

Oraby, S., Harrison, V., Reed, L., Hernandez, E., Riloff, E., Walker, M.: Creating and characterizing a diverse corpus of sarcasm in dialogue. In: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 31–41. Association for Computational Linguistics, Los Angeles, USA (2016). https://doi.org/10.18653/...

work page doi:10.18653/v1/w16-3604 2016
[33]

In: Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval), pp

Van Hee, C., Lefever, E., Hoste, V.: Semeval-2018 task 3: Irony detection in english tweets. In: Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval), pp. 39–50. Association for Computational Linguistics, New Orleans, USA (2018). https://doi.org/10.18653/v1/S18-1005

work page doi:10.18653/v1/s18-1005 2018
[34]

In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), pp

Tay, Y., Luu, A.T., Hui, S.C., Su, J.: Reasoning with sarcasm by reading in- between. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1010–1020. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-1093

work page doi:10.18653/v1/p18-1093 2018
[35]

In: Frontiers in Artificial Intelligence and Applications, pp

Hongliang, P., Zheng, L., Peng, F., Wang, W.: Modeling the incongruity between sentence snippets for sarcasm detection. In: Frontiers in Artificial Intelligence and Applications, pp. 337–344. IOS Press, Santiago Chile (2020). https://doi.org/10. 3233/FAIA200337

work page 2020
[36]

In: Findings of the Asso- ciation for Computational Linguistics: NAACL 2022, pp

Liu, Y., Wang, Y., Sun, A., Meng, X., Li, J., Guo, J.: A dual-channel framework for sarcasm recognition by detecting sentiment conflict. In: Findings of the Asso- ciation for Computational Linguistics: NAACL 2022, pp. 1797–1808. Association for Computational Linguistics, Seattle, USA (2022). https://doi.org/10.18653/ v1/2022.findings-naacl.126

work page 2022
[37]

BERT: pre-training of deep bidirectional transformers for language understanding

Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pp. 4171–4186. Association for Computational Linguistics, 29 Minneapolis, Minnesota (2019). https:...

work page doi:10.18653/v1/n19-1423 2019