pith. machine review for the scientific record.

arxiv: 2604.17174 · v1 · submitted 2026-04-19 · 💻 cs.CL

Recognition: unknown

Modeling Multi-Dimensional Cognitive States in Large Language Models under Cognitive Crowding

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 06:53 UTC · model grok-4.3

classification 💻 cs.CL
keywords: cognitive states · hyperbolic geometry · large language models · cognitive crowding · multi-dimensional modeling · CognitiveBench · Gromov hyperbolicity · alignment tuning

The pith

HyCoLLM embeds cognitive states in hyperbolic space to resolve representation overlap, letting an 8B model outperform GPT-4o on joint emotion, stance, thinking style, and intention tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds CognitiveBench, the first dataset with unified labels across four psychological dimensions, and shows that LLMs handle isolated tasks well but suffer sharp drops when modeling all dimensions together. Analysis via Gromov delta-hyperbolicity reveals strong hierarchical structure in the data. The authors trace the drop to cognitive crowding, in which hierarchies need exponentially growing representational volume while standard LLM embeddings expand only polynomially, producing overlaps. HyCoLLM moves the modeling into hyperbolic space and applies Hyperbolic Guided Alignment Tuning to adapt LLM vectors, yielding large gains that allow a modest 8B model to surpass GPT-4o and other baselines.

Core claim

CognitiveBench demonstrates that joint multi-dimensional cognitive modeling exposes a fundamental geometric mismatch: hierarchical states require exponential capacity, yet Euclidean LLM spaces grow polynomially and therefore overlap. HyCoLLM corrects this by performing cognitive-state modeling directly in hyperbolic space and aligning the LLM via Hyperbolic Guided Alignment Tuning, which preserves hierarchy without crowding and restores joint-task accuracy.
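The exponential-versus-polynomial contrast at the heart of this claim can be made concrete with a small sketch (illustrative only, not from the paper): the number of nodes in a hierarchy grows exponentially with depth, while the volume of a Euclidean ball grows only polynomially with radius for a fixed dimension.

```python
import math

def tree_nodes(branching: int, depth: int) -> int:
    # Nodes within `depth` hops of the root in a full b-ary tree:
    # grows exponentially, roughly branching**depth.
    return sum(branching**k for k in range(depth + 1))

def euclidean_ball_volume(dim: int, radius: float) -> float:
    # Volume of a Euclidean ball: for fixed dimension it grows only
    # polynomially in the radius, as radius**dim.
    return math.pi**(dim / 2) / math.gamma(dim / 2 + 1) * radius**dim

# A depth-10 binary hierarchy already holds 2047 nodes, while doubling the
# radius of a 3-D Euclidean ball only multiplies its volume by 2**3 = 8.
print(tree_nodes(2, 10))                                               # 2047
print(euclidean_ball_volume(3, 2.0) / euclidean_ball_volume(3, 1.0))   # 8.0
```

Fitting an exponentially branching label hierarchy into a space whose capacity grows polynomially forces points together, which is the overlap the paper calls crowding.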

What carries the argument

Hyperbolic space embedding of cognitive states combined with Hyperbolic Guided Alignment Tuning, which realigns LLM representations to match the hierarchical geometry revealed by Gromov delta-hyperbolicity analysis.
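Gromov δ-hyperbolicity, the quantity this analysis relies on, can be computed by brute force on a small finite metric space via the four-point condition. This is a generic textbook implementation, not the authors' code; low δ relative to the diameter indicates tree-like (hierarchical) structure.

```python
from itertools import product

def gromov_product(d, x, y, w):
    # Gromov product (x|y)_w = 0.5 * (d(x,w) + d(y,w) - d(x,y)).
    return 0.5 * (d[x][w] + d[y][w] - d[x][y])

def delta_hyperbolicity(d):
    # Brute-force four-point condition over all quadruples (O(n^4));
    # practical only for small metric spaces, illustrative only.
    n = len(d)
    delta = 0.0
    for x, y, z, w in product(range(n), repeat=4):
        gap = (min(gromov_product(d, x, z, w), gromov_product(d, y, z, w))
               - gromov_product(d, x, y, w))
        delta = max(delta, gap)
    return delta

# Tree metrics are 0-hyperbolic; cycles are not.
path = [[abs(i - j) for j in range(5)] for i in range(5)]
print(delta_hyperbolicity(path))   # 0.0
```

Running the same function on a 4-cycle's shortest-path metric yields a strictly positive δ, which is the sense in which low δ certifies hierarchy.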

If this is right

  • Joint multi-dimensional cognitive understanding becomes practical for LLMs without the observed accuracy collapse.
  • Smaller-parameter models can exceed much larger baselines once the geometric mismatch is removed.
  • Hyperbolic embeddings become a viable replacement for Euclidean ones on any psychological or hierarchical annotation task.
  • Alignment tuning that respects hyperbolic geometry improves simultaneous performance across emotion, stance, intention, and thinking style.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same crowding mechanism may limit LLM performance on other hierarchically organized tasks such as multi-hop reasoning or nested planning.
  • CognitiveBench-style benchmarks could be extended to measure crowding in real-time dialogue systems that must track emotion and intention simultaneously.
  • If hyperbolic alignment generalizes, it offers a parameter-efficient route to richer internal state representations without increasing model size.

Load-bearing premise

The performance collapse on joint tasks is caused by the mismatch between exponential space demands of hierarchical cognitive states and the polynomial growth of Euclidean LLM embeddings.
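The hyperbolic side of this premise is that distance from the origin in the Poincaré ball grows like 2·artanh(r), so the space "opens up" exponentially near the boundary, leaving room for exponentially many well-separated children. A minimal sketch of the standard Poincaré ball distance (the paper's exact parametrization may differ):

```python
import math

def poincare_distance(u, v):
    # Geodesic distance in the Poincaré ball model of hyperbolic space:
    # d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))).
    su = sum(a * a for a in u)
    sv = sum(b * b for b in v)
    duv = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1 + 2 * duv / ((1 - su) * (1 - sv)))

# A small Euclidean step toward the boundary covers a large hyperbolic
# distance: capacity concentrates exponentially near the rim.
print(poincare_distance([0.0, 0.0], [0.9, 0.0]))    # ~2.944
print(poincare_distance([0.0, 0.0], [0.99, 0.0]))   # ~5.293
```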

What would settle it

A controlled experiment that holds all other factors fixed and shows that joint accuracy on CognitiveBench does not recover when the same 8B model is fine-tuned in Euclidean space with otherwise identical alignment tuning.


Figures

Figures reproduced from arXiv: 2604.17174 by Hao Chen, Jinhao Cui, Lingzhi Wang, Lin Zhong, Qing Liao, Siyu Zhu, Xinyang Zhao, Zizhen Yuan.

Figure 1: Illustration of Alleviating Cognitive Crowding.
Figure 2: Gromov δ-hyperbolicity analysis across four datasets. Consistently low relative delta values of approximately 1% indicate a strong intrinsic hierarchical structure.
Figure 3: The overall framework of HyCoLLM.
Figure 4: Results of pairwise human evaluations on 100 …
Figure 5: Visualization of the Semantic-Cognitive Alignment using UMAP. Circles (◦) represent the cognitive anchors derived from the HCN, while stars (⋆) denote the LLM's semantic features.
Figure 6: Distribution of labels across Emotion, Thinking, Stance, and Intent dimensions for the four datasets (CUT, …
Figure 7: Computational Cost Analysis across different …
Figure 9: Visualization of training dynamics. Top: the raw and smoothed loss curves for SFT (generation), SCT (alignment), and Total Loss. Bottom: the relative ratio (%) of the SFT and SCT loss components over time. The curves demonstrate stable convergence and persistent geometric guidance.
Figure 8: Hyperparameter sensitivity analysis averaged …
Figure 10: The instruction interface provided to human evaluators for pairwise comparison.
Figure 11: The prompt template used for closed-source LLMs (e.g., GPT-4o, Claude-4.5). …
read the original abstract

Modeling human cognitive states is essential for advanced artificial intelligence. Existing Large Language Models (LLMs) mainly address isolated tasks such as emotion analysis or stance detection, and fail to capture interactions among cognitive dimensions defined in psychology, including emotion, thinking style, stance, and intention. To bridge this gap, we construct CognitiveBench, the first benchmark with unified annotations across the above four dimensions. Experiments on CognitiveBench show that although LLMs perform well on single dimension tasks, their performance drops sharply in joint multi-dimensional modeling. Using Gromov $\delta$-hyperbolicity analysis, we find that CognitiveBench exhibits a strong hierarchical structure. We attribute the performance bottleneck to ``Cognitive Crowding'', where hierarchical cognitive states require exponential representational space, while the Euclidean space of LLMs grows only polynomially, causing representation overlap and degraded performance. To address this mismatch, we propose HyCoLLM, which models cognitive states in hyperbolic space and aligns LLM representations via Hyperbolic Guided Alignment Tuning. Results show that HyCoLLM substantially improves multi-dimensional cognitive understanding, allowing an 8B parameter model to outperform strong baselines, including GPT-4o.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper constructs CognitiveBench, the first benchmark with unified annotations across four cognitive dimensions (emotion, thinking style, stance, intention). It reports that LLMs perform well on single-dimension tasks but exhibit sharp degradation on joint multi-dimensional modeling. Gromov δ-hyperbolicity analysis reveals strong hierarchical structure in the data, which the authors attribute to 'Cognitive Crowding' arising from the mismatch between exponential representational requirements of hierarchies and the polynomial volume growth of Euclidean LLM spaces, leading to representation overlap. To address this, they introduce HyCoLLM, which embeds cognitive states in hyperbolic space and applies Hyperbolic Guided Alignment Tuning; experiments indicate substantial gains, with an 8B-parameter model outperforming strong baselines including GPT-4o.

Significance. If the causal mechanism is substantiated, the work offers a principled approach to multi-dimensional cognitive modeling by leveraging hyperbolic geometry for hierarchical structures, potentially benefiting dialogue systems, psychological AI, and multi-task NLP. The unified CognitiveBench provides a reusable resource for standardized evaluation across dimensions. Credit is due for the benchmark construction with joint annotations and the empirical demonstration of gains via hyperbolic alignment, which could inspire further geometry-aware methods if the crowding hypothesis is isolated from confounds.

major comments (3)
  1. [Abstract] Abstract and hyperbolicity analysis: The central claim attributes joint-task degradation to Cognitive Crowding from exponential-vs-polynomial space mismatch, supported only by δ-hyperbolicity on CognitiveBench plus observed performance drops. No direct measurements (e.g., embedding overlap, pairwise distance distortion, or effective capacity under joint vs. single-dimension conditions) are provided to establish the posited representational overlap, leaving alternative explanations such as task interference or label correlations unaddressed and the hyperbolic fix's specificity unisolated.
  2. [Experiments] Experiments section: The headline result that an 8B HyCoLLM outperforms GPT-4o and other strong baselines lacks reported details on exact baselines, statistical significance tests, error bars, data splits, and ablation controls that isolate the hyperbolic component. This undermines interpretability of the performance gains and the claim that the method resolves the identified bottleneck.
  3. [Method] Method and analysis: The definition and operationalization of Cognitive Crowding (as distinct from general multi-task interference) is introduced without a quantitative metric beyond hyperbolicity; the Hyperbolic Guided Alignment Tuning procedure requires explicit equations for the alignment loss and any projection steps to allow reproduction and verification that it specifically mitigates the claimed volume mismatch.
minor comments (2)
  1. [Abstract] The abstract states 'strong hierarchical structure' but does not report the specific δ-hyperbolicity values or comparisons against non-hierarchical baselines.
  2. Ensure all tables and figures include error bars, exact metric definitions, and legends; clarify notation for any hyperbolic operations (e.g., Möbius addition) in the method description.
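The Möbius addition the referee asks to have clarified is the standard vector "addition" of the Poincaré ball model. A minimal sketch of the textbook formula (not necessarily the exact variant HyCoLLM uses):

```python
def mobius_add(x, y):
    # Mobius addition in the Poincare ball (curvature -1):
    # x (+) y = ((1 + 2<x,y> + |y|^2) x + (1 - |x|^2) y)
    #           / (1 + 2<x,y> + |x|^2 |y|^2)
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    denom = 1 + 2 * xy + xx * yy
    return [((1 + 2 * xy + yy) * a + (1 - xx) * b) / denom
            for a, b in zip(x, y)]

# Sanity checks: the origin is the identity, and -x is the inverse of x.
print(mobius_add([0.0, 0.0], [0.3, 0.4]))    # [0.3, 0.4]
print(mobius_add([0.3, 0.4], [-0.3, -0.4]))  # ~[0.0, 0.0]
```

Spelling this operation (and the corresponding exponential/logarithmic maps) out explicitly in the method section would address the reproducibility concern.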

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has identified important areas for strengthening the manuscript. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [Abstract] Abstract and hyperbolicity analysis: The central claim attributes joint-task degradation to Cognitive Crowding from exponential-vs-polynomial space mismatch, supported only by δ-hyperbolicity on CognitiveBench plus observed performance drops. No direct measurements (e.g., embedding overlap, pairwise distance distortion, or effective capacity under joint vs. single-dimension conditions) are provided to establish the posited representational overlap, leaving alternative explanations such as task interference or label correlations unaddressed and the hyperbolic fix's specificity unisolated.

    Authors: We agree that direct measurements of representational overlap would provide stronger causal evidence for the Cognitive Crowding mechanism. The current manuscript relies on the combination of sharp performance degradation in joint tasks and high Gromov δ-hyperbolicity as supporting indicators of hierarchical structure leading to volume mismatch. In the revision we will add explicit analyses of embedding overlap, pairwise distance distortion, and effective capacity comparisons between joint and single-dimension conditions. We will also expand the discussion to address alternative explanations such as task interference and label correlations, and include further controls to assess the specificity of the hyperbolic intervention. revision: partial

  2. Referee: [Experiments] Experiments section: The headline result that an 8B HyCoLLM outperforms GPT-4o and other strong baselines lacks reported details on exact baselines, statistical significance tests, error bars, data splits, and ablation controls that isolate the hyperbolic component. This undermines interpretability of the performance gains and the claim that the method resolves the identified bottleneck.

    Authors: We accept that the experimental reporting requires greater detail and transparency to support the claims. In the revised manuscript we will specify all baseline models and prompting configurations exactly, report statistical significance tests with p-values, include error bars computed over multiple random seeds, document the train/validation/test splits, and add ablation studies that isolate the hyperbolic embedding and Hyperbolic Guided Alignment Tuning components. revision: yes

  3. Referee: [Method] Method and analysis: The definition and operationalization of Cognitive Crowding (as distinct from general multi-task interference) is introduced without a quantitative metric beyond hyperbolicity; the Hyperbolic Guided Alignment Tuning procedure requires explicit equations for the alignment loss and any projection steps to allow reproduction and verification that it specifically mitigates the claimed volume mismatch.

    Authors: We agree that a quantitative metric for Cognitive Crowding beyond hyperbolicity and full mathematical specifications are necessary for reproducibility. In the revision we will introduce an explicit quantitative measure of representational overlap (e.g., based on embedding similarity or distortion) to operationalize Cognitive Crowding as distinct from generic multi-task effects. We will also provide the complete equations for the alignment loss and all projection operations in Hyperbolic Guided Alignment Tuning. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained with independent empirical components

full rationale

The paper introduces CognitiveBench, measures its Gromov δ-hyperbolicity to establish hierarchy, reports observed performance drops on joint vs. single-dimension tasks, offers an interpretive attribution to a space-mismatch effect labeled Cognitive Crowding, and then presents HyCoLLM with empirical gains. No step reduces to another by construction: the hyperbolicity result is a direct computation on the benchmark data, the performance numbers are experimental outcomes, and the hyperbolic modeling proposal is a distinct architectural choice whose success is evaluated separately. No self-citations, fitted parameters renamed as predictions, or definitional loops appear in the provided chain. The attribution is an explanatory hypothesis rather than a tautological reduction, leaving the overall derivation independent.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that cognitive states are hierarchically structured (measured by hyperbolicity) and that Euclidean embeddings cannot efficiently represent them without overlap. No explicit free parameters or additional invented entities beyond the explanatory term 'Cognitive Crowding' are stated in the abstract.

axioms (1)
  • domain assumption Cognitive states across emotion, thinking style, stance, and intention form a hierarchical structure that can be quantified by Gromov δ-hyperbolicity
    Invoked to analyze CognitiveBench and to attribute the performance drop to space mismatch.
invented entities (1)
  • Cognitive Crowding no independent evidence
    purpose: Explains why joint multi-dimensional modeling degrades LLM performance
    Postulated as the mechanism linking hierarchical structure to representation overlap in Euclidean space.

pith-pipeline@v0.9.0 · 5516 in / 1450 out tokens · 51641 ms · 2026-05-10T06:53:56.868207+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

78 extracted references · 14 canonical work pages · 5 internal anchors

  1. [1]

    I understand your perspective

    “I understand your perspective”: LLM Persuasion through the Lens of Communicative Action Theory , author =. Findings of the Association for Computational Linguistics: ACL 2025 , pages =

  2. [2]

    Machine Learning , volume =

    Evaluating large language models for user stance detection on X (Twitter) , author =. Machine Learning , volume =. 2024 , publisher =

  3. [3]

    Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 , pages =

    P-stance: A large dataset for stance detection in political domain , author =. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 , pages =

  4. [4]

    Journal of Scientific & Industrial Research , volume =

    Stance and Sentiment Analysis of Health-related Tweets with Data Augmentation , author =. Journal of Scientific & Industrial Research , volume =

  5. [5]

    Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining , pages =

    SentiStance: quantifying the intertwined changes of sentiment and stance in response to an event in online forums , author =. Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining , pages =

  6. [6]

    Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

    What is the real intention behind this question? dataset collection and intention classification , author =. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

  7. [7]

    Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers , pages =

    A dataset for multi-target stance detection , author =. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers , pages =

  8. [8]

    2015 IEEE international conference on data mining workshop (ICDMW) , pages =

    An ensemble sentiment classification system of twitter data for airline services analysis , author =. 2015 IEEE international conference on data mining workshop (ICDMW) , pages =

  9. [9]

    Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016) , pages =

    Semeval-2016 task 6: Detecting stance in tweets , author =. Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016) , pages =

  10. [10]

    Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages =

    Improving multi-task stance detection with multi-task interaction network , author =. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages =

  11. [11]

    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

    Gunstance: Stance detection for gun control and gun regulation , author =. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

  12. [12]

    2024 , eprint =

    EcoVerse: An annotated Twitter dataset for eco-relevance classification, environmental impact analysis, and stance detection , author =. 2024 , eprint =

  13. [13]

    Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics , pages =

    Stance detection in COVID-19 tweets , author =. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics , pages =

  14. [14]

    Electronics , volume =

    Integrating emotional features for stance detection aimed at social network security: A multi-task learning approach , author =. Electronics , volume =

  15. [15]

    Proceedings of the ACM Web Conference 2023 , pages =

    A multi-task model for emotion and offensive aided stance detection of climate change tweets , author =. Proceedings of the ACM Web Conference 2023 , pages =

  16. [16]

    Scientific reports , volume =

    The global landscape of cognition: hierarchical aggregation as an organizational principle of human cortical networks and functions , author =. Scientific reports , volume =. 2015 , publisher =

  17. [17]

    , author =

    Emotion knowledge: further exploration of a prototype approach. , author =. Journal of personality and social psychology , volume =. 1987 , publisher =

  18. [18]

    Large language models fail on trivial alterations to theory-of-mind tasks, 2023

    Large language models fail on trivial alterations to theory-of-mind tasks , author =. arXiv preprint arXiv:2302.08399 , year =

  19. [19]

    Proceedings of the AAAI conference on artificial intelligence , volume =

    Atomic: An atlas of machine commonsense for if-then reasoning , author =. Proceedings of the AAAI conference on artificial intelligence , volume =

  20. [20]

    , author =

    Empirical validation of affect, behavior, and cognition as distinct components of attitude. , author =. Journal of personality and social psychology , volume =. 1984 , publisher =

  21. [21]

    Thomson Wadsworth , year =

    The psychology of attitudes , author =. Thomson Wadsworth , year =

  22. [22]

    Theories of emotion , pages =

    A general psychoevolutionary theory of emotion , author =. Theories of emotion , pages =. 1980 , publisher =

  23. [23]

    Proceedings of the 33rd ACM International Conference on Information and Knowledge Management , pages =

    Can llms reason like humans? assessing theory of mind reasoning in llms for open-ended questions , author =. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management , pages =

  24. [24]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume =

    Tomato: Verbalizing the mental states of role-playing llms for benchmarking theory of mind , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =

  25. [25]

    Findings of the Association for Computational Linguistics: ACL 2025 , pages=

    Act2P: LLM-Driven Online Dialogue Act Classification for Power Analysis , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

  26. [26]

    arXiv preprint arXiv:2507.01543 , year=

    Is External Information Useful for Stance Detection with LLMs? , author=. arXiv preprint arXiv:2507.01543 , year=

  27. [27]

    Australasian Joint Conference on Artificial Intelligence , pages=

    Beyond Factualism: A Study of LLM Calibration Through the Lens of Conversational Emotion Recognition , author=. Australasian Joint Conference on Artificial Intelligence , pages=. 2024 , organization=

  28. [28]

    Tsinghua Science and Technology , volume=

    LLM4DEU: fine tuning large language model for medical diagnosis in outpatient and emergency department visits of neurosurgery , author=. Tsinghua Science and Technology , volume=. 2025 , publisher=

  29. [29]

    Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=

    Comapoi: A collaborative multi-agent framework for next poi prediction bridging the gap between trajectory and language , author=. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=

  30. [30]

    Nature reviews neuroscience , volume=

    On the relationship between emotion and cognition , author=. Nature reviews neuroscience , volume=. 2008 , publisher=

  31. [31]

    Proceedings of the 2018 international conference on technical debt , pages=

    Cognitive complexity: An overview and evaluation , author=. Proceedings of the 2018 international conference on technical debt , pages=

  32. [32]

    theory of mind

    What is “theory of mind”? Concepts, cognitive processes and individual differences , author=. Quarterly journal of experimental psychology , volume=. 2012 , publisher=

  33. [33]

    , author =

    Social judgment: Assimilation and contrast effects in communication and attitude change. , author =. 1961 , publisher =

  34. [34]

    Attitude Organization and Change , year =

    Cognitive, affective, and behavioral components of attitudes , author =. Attitude Organization and Change , year =

  35. [35]

    1975 , publisher =

    How to do things with words , author =. 1975 , publisher =

  36. [36]

    Language in society , volume =

    A classification of illocutionary acts1 , author =. Language in society , volume =. 1976 , publisher =

  37. [37]

    British journal of social psychology , volume =

    The significance of the social identity concept for social psychology with reference to individualism, interactionism and social influence , author =. British journal of social psychology , volume =. 1986 , publisher =

  38. [38]

    1980 , publisher =

    Emotion, theory, research, and experience , author =. 1980 , publisher =

  39. [39]

    2011 , publisher =

    Thinking, fast and slow , author =. 2011 , publisher =

  40. [40]

    GPT-4 Technical Report

    Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=

  41. [41]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities , author=. arXiv preprint arXiv:2507.06261 , year=

  42. [42]

    Qwen2.5-Coder Technical Report

    Qwen2. 5-coder technical report , author=. arXiv preprint arXiv:2409.12186 , year=

  43. [43]

    LLaMA: Open and Efficient Foundation Language Models

    Llama: Open and efficient foundation language models , author=. arXiv preprint arXiv:2302.13971 , year=

  44. [44]

    Qwen3 Technical Report

    Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

  45. [45]

    Jiang, Y

    From clip to dino: Visual encoders shout in multi-modal large language models , author=. arXiv preprint arXiv:2310.08825 , year=

  46. [46]

    GitHub repository , howpublished=

    Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec , title=. GitHub repository , howpublished=. 2020 , publisher=

  47. [47]

    Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

    Efficient Memory Management for Large Language Model Serving with PagedAttention , author=. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

  48. [48]

    arXiv preprint arXiv:2402.06044 , year =

    OpenToM: A comprehensive benchmark for evaluating theory-of-mind reasoning capabilities of large language models , author =. arXiv preprint arXiv:2402.06044 , year =

  49. [49]

    Proceedings of the National Academy of Sciences , volume =

    Evaluating large language models in theory of mind tasks , author =. Proceedings of the National Academy of Sciences , volume =. 2024 , publisher =

  50. [50]

    Nature Human Behaviour , volume =

    Testing theory of mind in large language models and humans , author =. Nature Human Behaviour , volume =. 2024 , publisher =

  51. [51]

    Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

    Clever hans or neural theory of mind? stress testing social reasoning in large language models , author =. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

  52. [52]

    Proceedings of the 33rd ACM International Conference on Information and Knowledge Management , pages =

    Two heads are better than one: zero-shot cognitive reasoning via multi-LLM knowledge fusion , author =. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management , pages =

  53. [53]

    arXiv preprint arXiv:2402.07092 , year =

    Generalizing conversational dense retrieval via llm-cognition data augmentation , author =. arXiv preprint arXiv:2402.07092 , year =

  54. [54]

    arXiv preprint arXiv:2405.16964 , year =

    Exploring the llm journey from cognition to expression with linear representations , author =. arXiv preprint arXiv:2405.16964 , year =

  55. [55]

    International Journal of Human-Computer Studies , volume =

    Integrating augmented reality and LLM for enhanced cognitive support in critical audio communications , author =. International Journal of Human-Computer Studies , volume =. 2025 , publisher =

  56. [56]

    Findings of the association for computational linguistics: EMNLP 2024 , pages =

    Cognitive bias in decision-making with LLMs , author =. Findings of the association for computational linguistics: EMNLP 2024 , pages =

  57. [57]

    arXiv preprint arXiv:2403.16008 , year =

    CBT-LLM: A Chinese large language model for cognitive behavioral therapy-based mental health question answering , author =. arXiv preprint arXiv:2403.16008 , year =

  58. [58]

    ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =

    Llm supervised pre-training for multimodal emotion recognition in conversations , author =. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages =. 2025 , organization =

  59. [59]

    Proceedings of the 32nd ACM International Conference on Multimedia , pages =

    Towards emotion-enriched text-to-motion generation via LLM-guided limb-level emotion manipulating , author =. Proceedings of the 32nd ACM International Conference on Multimedia , pages =

  60. [60]

    Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems , pages =

    Don't Get Too Excited-Eliciting Emotions in LLMs , author =. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems , pages =

  61. [61] Context over Categories: Implementing the Theory of Constructed Emotion with LLM-Guided User Analysis. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems.

  62. [62] LLM-driven knowledge injection advances zero-shot and cross-target stance detection. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers).

  63. [63] Can Large Language Models Address Open-Target Stance Detection? Findings of the Association for Computational Linguistics: ACL 2025.

  64. [64] TAT: Improving Stance Detection on Social Media through Thought Alignment with LLMs. Companion Proceedings of the ACM on Web Conference 2025.

  65. [65] PAG-LLM: Paraphrase and aggregate with large language models for minimizing intent classification errors. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval.

  66. [66] A user-centric multi-intent benchmark for evaluating large language models. arXiv preprint arXiv:2404.13940.

  67. [67] Nickel, Maximillian and Kiela, Douwe. Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems.

  68. [68] Hyperbolic neural networks. Advances in Neural Information Processing Systems.

  69. [69] Poincaré GloVe: Hyperbolic word embeddings. arXiv preprint arXiv:1810.06546.

  70. [70] Hyperbolic deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

  71. [71] Fully hyperbolic neural networks. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

  72. [72] PoinTr: Diverse point cloud completion with geometry-aware transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision.

  73. [73] Context and geometry aware voxel transformer for semantic scene completion. Advances in Neural Information Processing Systems.

  74. [74] Geometry-aware scene text detection with instance transformation network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  75. [75] Mathieu, Emile and Le Lan, Charline and Maddison, Chris J and Tomioka, Ryota and Teh, Yee Whye. Continuous hierarchical representations with Poincaré variational auto-encoders. Advances in Neural Information Processing Systems.

  76. [76] Balazevic, Ivana and Allen, Carl and Hospedales, Timothy. Multi-relational Poincaré graph embeddings. Advances in Neural Information Processing Systems.

  77. [77] Hyperbolic contrastive learning for visual representations beyond objects. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

  78. [78] Hyperbolic image-text representations. Proceedings of the 40th International Conference on Machine Learning (ICML).