pith.

arxiv: 2605.17228 · v1 · pith:EMJYWM5Tnew · submitted 2026-05-17 · 💻 cs.CL

Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making

Pith reviewed 2026-05-20 14:44 UTC · model grok-4.3

classification 💻 cs.CL
keywords: stigmatizing language · large language models · clinical decision support · AI bias · health disparities · clinical NLP · LLM robustness

The pith

Stigmatizing language in clinical notes causes large language models to recommend less aggressive patient management

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models absorb the stigmatizing language that clinicians sometimes write in patient notes, and whether that language changes the medical decisions the models recommend. The researchers built patient vignettes for four conditions and added sentences that doubt, blame, or malign the patient, at different intensities. All nine frontier models tested showed the same pattern: even a single such sentence shifted recommendations toward less aggressive care, and more stigmatizing language produced larger shifts. Common fixes, such as asking the model to reason step by step or to correct its own bias, did not remove the effect. A sympathetic reader would care because hospitals are starting to use these models to support real clinical decisions, so this bias could quietly widen disparities in how patients are treated.

Core claim

All nine evaluated frontier large language models exhibit substantial bias when processing clinical vignettes that contain stigmatizing language. Clinical decision-making is significantly skewed toward less aggressive patient management, with a clear dose-response relationship in which a single stigmatizing sentence is sufficient to alter model outputs. Standard prompt-based mitigation strategies, including Chain-of-Thought reasoning and model self-debiasing, show limited efficacy because models struggle to identify stigmatizing language explicitly while remaining implicitly influenced by it.
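A dose-response claim of this kind can be checked with a rank correlation between the number of injected SL sentences and the model's treatment score. The sketch below is not the paper's analysis: it assumes hypothetical per-dose mean treatment scores on the 0 to 100 scale shown in Figure 6, codes the neutral baseline N as dose 0, and computes Spearman's rho in pure Python (the helper names `average_ranks` and `spearman_rho` are illustrative). A rho near -1 would indicate scores falling monotonically as the dose rises.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: dose 0 stands for the neutral baseline (N).
doses = [0, 1, 4, 7, 14, 21]
mean_treatment_score = [82.0, 78.5, 74.0, 70.5, 66.0, 61.5]  # illustrative only
rho = spearman_rho(doses, mean_treatment_score)
print(f"Spearman rho = {rho:.2f}")
```

Spearman's rho only tests monotonicity; it says nothing about how large each step is, so it complements rather than replaces per-dose effect sizes.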

What carries the argument

Stigmatizing language phenotypes (doubt, blame, and maligning) injected at varying intensities into otherwise matched clinical vignettes for four medical conditions. These phenotypes serve as the controlled variable used to isolate effects on model-generated clinical management decisions.
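The manipulation can be pictured as a function that takes a neutral vignette, an SL phenotype, and a dose, and returns the stigmatized variant. This is a minimal sketch under stated assumptions: the phrase pools, the function name `inject_sl`, and the random-position insertion rule are hypothetical illustrations, not the authors' materials or procedure (the paper's actual example prompts appear in Figures 8 to 11).

```python
import random
from typing import Sequence

# Hypothetical phrase pools, invented for illustration; NOT the paper's sentences.
SL_POOLS: dict[str, list[str]] = {
    "doubt": [
        "Patient claims the pain is 10/10.",
        "She insists the home medication is not working.",
    ],
    "blame": [
        "Patient has not followed the recommended diet.",
        "He missed his last two follow-up appointments.",
    ],
    "maligning": [
        "Patient was demanding with nursing staff.",
        "She was belligerent at triage.",
    ],
}


def inject_sl(neutral: Sequence[str], phenotype: str, dose: int, seed: int = 0) -> list[str]:
    """Return a copy of `neutral` with `dose` stigmatizing sentences inserted.

    `phenotype` is "doubt", "blame", "maligning", or "all" (mixed pool). The
    clinical sentences keep their order; SL sentences go to seeded random
    positions so each stigmatized variant is reproducible.
    """
    rng = random.Random(seed)
    if phenotype == "all":
        pool = [s for sentences in SL_POOLS.values() for s in sentences]
    else:
        pool = list(SL_POOLS[phenotype])
    rng.shuffle(pool)
    # Reuse pool sentences cyclically when the dose exceeds the pool size.
    chosen = [pool[i % len(pool)] for i in range(dose)]
    vignette = list(neutral)
    for sentence in chosen:
        vignette.insert(rng.randint(0, len(vignette)), sentence)
    return vignette


# Hypothetical neutral vignette; dose 0 returns it unchanged.
neutral = [
    "Patient with sickle cell disease presents with a pain crisis.",
    "Vital signs are within normal limits.",
    "Management plan to be determined.",
]
print(" ".join(inject_sl(neutral, "blame", dose=1, seed=42)))
```

Keeping the clinical sentences fixed and varying only the inserted SL is what lets any output shift be attributed to the added language, subject to the length and structure confounds raised in the referee report.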

If this is right

  • Models used for clinical decision support will systematically recommend less aggressive care when patient notes contain stigmatizing language.
  • A single stigmatizing sentence is enough to change model outputs, showing high sensitivity to linguistic framing.
  • Chain-of-Thought and self-debiasing prompts fail to eliminate the implicit influence of stigmatizing language.
  • The bias appears consistently across all nine tested frontier models, pointing to a general limitation in current LLM clinical applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hospitals deploying LLMs may need to scan and rewrite incoming notes to remove stigmatizing phrasing before feeding them to models.
  • The dose-response pattern implies that the prevalence of stigmatizing language in training corpora could determine how strongly future models inherit this bias.
  • Similar effects could appear when LLMs process any human text that carries subtle negative framing, such as legal or employment records.
  • Routine linguistic audits of training data and input text may be required to prevent automated reinforcement of existing health disparities.
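The first bullet suggests screening notes before they reach a model. A crude way to prototype such a screen is regular-expression matching against a phrase lexicon, sketched below. The lexicon entries, the `Flag` record, and the `flag_sl` function are hypothetical; neither the paper nor this summary proposes this specific method, and keyword matching will miss paraphrases and flag benign uses (for example, an insurance "claim").

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny lexicon; a real screen would need a
# clinically validated phrase list and handling of context such as quotations.
SL_LEXICON: dict[str, list[str]] = {
    "doubt": [r"\bclaims?\b", r"\binsists?\b", r"\ballegedly\b"],
    "blame": [r"\bnon-?compliant\b", r"\brefuses?\b"],
    "maligning": [r"\bdemanding\b", r"\bbelligerent\b", r"\bdrug[- ]seeking\b"],
}


@dataclass
class Flag:
    phenotype: str
    match: str
    start: int


def flag_sl(note: str) -> list[Flag]:
    """Return every lexicon hit in the note, ordered by character position."""
    flags = []
    for phenotype, patterns in SL_LEXICON.items():
        for pattern in patterns:
            for m in re.finditer(pattern, note, flags=re.IGNORECASE):
                flags.append(Flag(phenotype, m.group(0), m.start()))
    return sorted(flags, key=lambda f: f.start)


note = "Patient claims 10/10 pain and is noncompliant with hydroxyurea."
for flag in flag_sl(note):
    print(flag)
```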

Load-bearing premise

The artificially constructed vignettes with added stigmatizing language produce the same model behavior as stigmatizing language that occurs naturally in real human-written clinical documentation.

What would settle it

Run the same models on pairs of real hospital clinical notes that differ only in the presence or absence of stigmatizing language and check whether the models still recommend less aggressive management for the stigmatized versions.
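If each note version in such a pair yields one binary decision (for example, aggressive versus conservative management), McNemar's test is a standard paired analysis because it uses only the discordant pairs. The sketch below assumes that binary coding, implements the exact binomial form with `math.comb`, and uses invented counts; the function name `mcnemar_exact` is illustrative.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs where the neutral note got aggressive care but the stigmatized one did not.
    c: pairs with the reverse pattern.
    Under H0 each discordant pair is equally likely to go either way,
    so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented example: 30 pairs flipped toward less aggressive care, 8 the other way.
p = mcnemar_exact(b=30, c=8)
print(f"p = {p:.5f}")
```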

Figures

Figures reproduced from arXiv: 2605.17228 by Amy Oh, Anne R. Links, Didi Zhou, Faith Kamau, Jen-tse Huang, Mark Dredze, Mary Catherine Beach, Somnath Saha.

Figure 1. The presence of stigmatizing language within clinical notes can bias LLMs to favor less aggressive management. Furthermore, exposure to SL resulted in a consistent decline in simulated clinician attitudes across all models and clinical scenarios. Mitigation strategies showed limited efficacy; while CoT provided partial relief, self-debiasing underperformed, suggesting models struggle to explicitly identify …
Figure 2. Impact of varying intensities of SL on LLM clinical decision-making and simulated attitudes across four disease scenarios (SCD, Obesity, Cirrhosis, and Fibromyalgia) evaluated on nine frontier models. Across both panels, markers denote the dose of SL injected into the clinical vignette: Neutral baseline (light purple circles), 7 SL sentences (purple squares), 14 SL sentences (dark purple downward triangles…
Figure 3. Comparative effect sizes of patient demographics versus SL on model outputs. The lollipop charts illustrate the magnitude of influence each variable exerts on the LLMs’ responses. … consistent. This indicates that LLMs inadvertently inherit and propagate implicit biases embedded within human-generated clinical narratives, defaulting to less comprehensive care when a patient’s presentation is linguistically …
Figure 4. Impact of SL type and amount. The grouped bar chart illustrates the effect of varying doses (1, 4, and 7 sentences) of SL across Doubt (red), Blame (blue), Maligning (yellow), and a Mixed set of all three (All, grey). … weighed those of all demographic permutations across both objective treatment decisions (Cramér's V; Figure 3a) and simulated clinician attitudes (η²; Figure 3b). This finding signifies a p…
Figure 5. Impact of SL across clinical scenarios and models. The heatmap displays the delta between the stigmatized and neutral baseline. Darker purple cells indicate more severe disparities. Values in parentheses represent the marginal average. … that the primary vector for AI-driven healthcare disparities is shifting from overt demographic prejudice toward the covert forms of implicit bias, like the mechanism of SL.…
Figure 6. Panels: SCD (DeepSeek), Obesity (Qwen), Cirrhosis (Gemini), Fibromyalgia (LLaMA). X-axis: N, S(1), S(4), S(7), S(14), S(21). Y-axes: Treatment Score (0 to 100) and PASS Score (15 to 40). …
Figure 7. The prompt used for our debiasing (mitigation strategy 2). Notes are appended at the end, replacing “note”.
Figure 8. One pair (neutral and stigmatized) of example prompts used for testing models on the SCD scenario. Demographic information and other relevant descriptions (e.g., pronouns) are varied across one trial. Red highlights doubt language, Blue represents blame language, and Yellow denotes maligning language.
Figure 9. One pair (neutral and stigmatized) of example prompts used for testing models on the obesity scenario. Demographic information and other relevant descriptions (e.g., pronouns) are varied across one trial. Red highlights doubt language, Blue represents blame language, and Yellow denotes maligning language.
Figure 10. One pair (neutral and stigmatized) of example prompts used for testing models on the cirrhosis scenario. Demographic information and other relevant descriptions (e.g., pronouns) are varied across one trial. Red highlights doubt language, Blue represents blame language, and Yellow denotes maligning language.
Figure 11. One pair (neutral and stigmatized) of example prompts used for testing models on the fibromyalgia scenario. Demographic information and other relevant descriptions (e.g., pronouns) are varied across one trial. Red highlights doubt language, Blue represents blame language, and Yellow denotes maligning language.
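The Figure 3 and Figure 4 text reports effect sizes as Cramér's V for treatment decisions and η² for simulated clinician attitudes. The standard definitions are sketched below for readers who want to compute them on their own model outputs; the function names and the example data are hypothetical, and this is not the authors' code. Both functions assume no empty rows, columns, or groups and non-constant scores.

```python
from typing import Sequence


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramér's V for an r x c contingency table: sqrt(chi2 / (n * (min(r, c) - 1)))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return (chi2 / (n * k)) ** 0.5


def eta_squared(groups: Sequence[Sequence[float]]) -> float:
    """One-way eta squared: between-group sum of squares over total sum of squares."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical table: rows = neutral / stigmatized vignette,
# columns = aggressive / conservative decision counts.
table = [[70, 30], [45, 55]]
print(round(cramers_v(table), 3))
# Hypothetical attitude scores grouped by vignette condition.
print(round(eta_squared([[34, 36, 35], [28, 30, 27]]), 3))
```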
Original abstract

Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as clinical decision support and medical documentation. However, the robustness of these models against subtle linguistic variations, specifically stigmatizing language (SL) commonly found in human-authored clinical notes, remains critically under-explored. In this work, we investigate whether frontier LLMs inherit and propagate this human bias when processing clinical text. We systematically evaluate nine frontier LLMs across four stigmatized medical conditions, utilizing clinical vignettes injected with varying intensities and phenotypes of SL (doubt, blame, and maligning). Our results demonstrate that all evaluated models exhibit substantial bias, with clinical decision-making significantly skewed towards less aggressive patient management. Notably, we observe a high sensitivity to linguistic framing, where a single SL sentence is sufficient to alter model outputs, revealing a clear dose-response relationship. Furthermore, we evaluate standard prompt-based mitigation strategies, including Chain-of-Thought (CoT) reasoning and model self-debiasing. These approaches show limited efficacy; models struggle to explicitly identify SL while remaining implicitly influenced by it. Our findings expose a critical vulnerability in current LLMs regarding fairness and robustness in clinical NLP, underscoring the need for rigorous algorithmic guardrails to prevent the automation of health disparities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper evaluates nine frontier LLMs on clinical vignettes for four stigmatized conditions, injecting varying intensities and phenotypes of stigmatizing language (doubt, blame, maligning). It reports that all models exhibit bias toward less aggressive patient management, that a single SL sentence suffices to shift outputs, and that a dose-response pattern appears; standard mitigations (CoT, self-debiasing) show limited success in removing the implicit influence.

Significance. If the attribution to SL holds after proper controls, the work supplies concrete empirical evidence that current LLMs can propagate subtle linguistic biases from human clinical notes into high-stakes decisions. The multi-model, multi-condition design and the finding that even minimal SL alters outputs would be useful for informing robustness requirements in clinical NLP deployments.

major comments (2)
  1. [Methods / Vignette Design] Vignette construction (Methods): the manuscript compares SL-injected vignettes against the original versions but does not describe matched neutral-sentence controls of equivalent length, sentence count, or lexical diversity. Without these controls, shifts in model outputs cannot be isolated to the stigmatizing content rather than incidental prompt-length or structural changes, which directly undermines the central claim that SL itself skews decision-making.
  2. [Results] Results reporting: the abstract and main findings state consistent directional effects and a dose-response relationship, yet no details are supplied on the precise operationalization of 'less aggressive management,' the statistical tests used, effect sizes, or vignette validation procedures. These omissions make it impossible to judge the reliability or magnitude of the reported bias.
minor comments (2)
  1. [Methods] The description of the four medical conditions and the exact SL phenotypes would benefit from an explicit table listing the injected sentences for each intensity level.
  2. [Results / Figures] Figure captions for the dose-response plots should include the number of model runs or seeds and whether error bars represent standard error or confidence intervals.
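Major comment 1 asks for neutral-sentence controls matched on length, sentence count, and lexical diversity, and the simulated rebuttal proposes type-token ratio as one diversity measure. A quick audit of whether a control vignette matches its SL counterpart on these surface features is sketched below; the lowercase word regex, the naive sentence splitter, the helper names, and the 10% tolerance are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

WORD = re.compile(r"[a-z0-9']+")


@dataclass
class SurfaceStats:
    sentences: int
    words: int
    type_token_ratio: float


def surface_stats(text: str) -> SurfaceStats:
    """Naive surface features: sentence count, word count, type-token ratio."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = WORD.findall(text.lower())
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    return SurfaceStats(len(sentences), len(tokens), ttr)


def _close(x: float, y: float, tol: float) -> bool:
    """Relative closeness: |x - y| within tol of the larger magnitude."""
    return abs(x - y) <= tol * max(abs(x), abs(y), 1e-9)


def is_matched(sl_text: str, control_text: str, tol: float = 0.10) -> bool:
    """True if sentence counts are equal and word count and TTR agree within tol."""
    a, b = surface_stats(sl_text), surface_stats(control_text)
    return (
        a.sentences == b.sentences
        and _close(a.words, b.words, tol)
        and _close(a.type_token_ratio, b.type_token_ratio, tol)
    )


print(is_matched("Patient claims pain. Vitals stable.", "Patient reports pain. Vitals stable."))
```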

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which have prompted us to strengthen the methodological controls and reporting transparency in our work. We address each major comment in detail below and outline the revisions we will make.

  1. Referee: [Methods / Vignette Design] Vignette construction (Methods): the manuscript compares SL-injected vignettes against the original versions but does not describe matched neutral-sentence controls of equivalent length, sentence count, or lexical diversity. Without these controls, shifts in model outputs cannot be isolated to the stigmatizing content rather than incidental prompt-length or structural changes, which directly undermines the central claim that SL itself skews decision-making.

    Authors: We appreciate this methodological concern. Our original vignettes function as the primary baseline, with SL phrases inserted into otherwise fixed clinical content and structure across conditions. However, we acknowledge that this design does not fully rule out confounds from added sentence length or lexical shifts. To address the point directly, the revised manuscript will incorporate a new set of matched neutral-sentence controls constructed to preserve equivalent length, sentence count, and lexical diversity (measured via type-token ratio and word frequency norms). We will detail the construction procedure in Methods, present comparative results, and discuss how these controls confirm that the observed shifts are attributable to SL content rather than surface features. revision: yes

  2. Referee: [Results] Results reporting: the abstract and main findings state consistent directional effects and a dose-response relationship, yet no details are supplied on the precise operationalization of 'less aggressive management,' the statistical tests used, effect sizes, or vignette validation procedures. These omissions make it impossible to judge the reliability or magnitude of the reported bias.

    Authors: We thank the referee for identifying these reporting gaps. In the revision we will expand the Results and Methods sections with the following: (1) explicit operationalization of 'less aggressive management' via concrete decision metrics (e.g., binary choice of aggressive vs. conservative treatment options and a 1-5 intensity rating scale); (2) full description of statistical procedures, including the specific tests (paired t-tests or McNemar's tests with Bonferroni correction), significance thresholds, and power considerations; (3) effect sizes (Cohen's d for continuous outcomes and odds ratios for categorical decisions); and (4) vignette validation details, including review by two board-certified clinicians for clinical plausibility and face validity. These additions will enable readers to evaluate both the magnitude and robustness of the reported effects. revision: yes
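The rebuttal commits to paired tests with Bonferroni correction and to Cohen's d for continuous outcomes. For paired designs, one common variant is d_z, the mean of the within-pair differences divided by their standard deviation; the sketch below computes d_z and Bonferroni-adjusted p-values. Choosing d_z over other paired versions of d is an assumption, the scores are hypothetical, and the raw p-values would come from whichever test is actually run.

```python
from statistics import mean, stdev
from typing import Sequence


def cohens_dz(neutral: Sequence[float], stigmatized: Sequence[float]) -> float:
    """Paired effect size d_z = mean(diff) / sd(diff), with diff = stigmatized - neutral.

    Raises ZeroDivisionError if every difference is identical (sd = 0).
    """
    if len(neutral) != len(stigmatized) or len(neutral) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [s - n for n, s in zip(neutral, stigmatized)]
    return mean(diffs) / stdev(diffs)


def bonferroni(p_values: Sequence[float]) -> list[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-vignette treatment scores for one model and scenario.
neutral_scores = [80, 76, 84, 79]
stigmatized_scores = [70, 72, 75, 74]
print(round(cohens_dz(neutral_scores, stigmatized_scores), 2))
print(bonferroni([0.01, 0.04, 0.5]))
```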

Circularity Check

0 steps flagged

No significant circularity: purely empirical evaluation


The paper conducts a controlled empirical comparison of LLM outputs on clinical vignettes with and without injected stigmatizing language across multiple models and conditions. No mathematical derivations, equations, fitted parameters, or first-principles predictions are presented that could reduce to inputs by construction. Results derive directly from model inference runs and statistical analysis of output differences, with no self-citation chains or ansatzes serving as load-bearing premises for the central claims. The evaluation is self-contained: its claims are tested directly against observed model behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central findings rest on the assumption that the chosen stigmatizing language categories and vignette modifications isolate the intended bias effect without introducing unrelated textual artifacts.

axioms (1)
  • domain assumption: Stigmatizing language in clinical notes can be reliably categorized into doubt, blame, and maligning phenotypes with controllable intensity.
    Invoked to construct the experimental vignettes and interpret the dose-response results.

pith-pipeline@v0.9.0 · 5778 in / 1172 out tokens · 53324 ms · 2026-05-20T14:44:52.279956+00:00 · methodology


Reference graph

Works this paper leans on

166 extracted references · 166 canonical work pages · 40 internal anchors
