When Rating Scales Fall Short: LLM-Assisted Discovery of ADHD Signals in Turkish Teacher Narratives

Ahmet Ozaslan; Baris Karacan; Elvan Iseri; Irem Aktar Songur

arxiv: 2606.02509 · v1 · pith:FPLMBZTVnew · submitted 2026-06-01 · 💻 cs.CL

Pith reviewed 2026-06-28 14:13 UTC · model grok-4.3

classification 💻 cs.CL

keywords: ADHD · teacher narratives · rating scales · LLM theme discovery · complementary signals · natural language processing · behavioral patterns · Turkish clinical data

The pith

Structured rating scales miss ADHD patterns that teachers' open-ended narratives capture, and the two sources flag largely different cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Teachers supply both numerical rating-scale scores and free-text narratives when children are evaluated for ADHD. The study finds that the numerical scores often fail to separate diagnosed ADHD cases from non-ADHD cases, whereas models trained on the narratives succeed on many of those same cases. The sets of cases missed by each approach show little overlap, which indicates that the structured scores and the narrative text carry distinct information. An LLM pipeline then surfaces recurring themes in the narratives around attention, behavior, and family context that appear to be the signals the rating scales overlook.

Core claim

The paper argues that teacher narratives encode signals complementary to the Conners' Teacher Rating Scale-Revised Short Form (CTRS-R:S). In a subset of cases, structured scores do not clearly distinguish ADHD from non-ADHD students, while narrative-based models capture distinct behavioral patterns; these cases overlap minimally with those the narrative model misses. LLM-assisted theme discovery then identifies attention-related, behavioral, and family-related patterns as the likely content of the overlooked signals.
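
One way to make the "minimal overlap" observation concrete is to compare the sets of cases each model misclassifies. The sketch below is illustrative only: the labels and predictions are toy values, and `missed_cases` and `error_overlap` are hypothetical helper names, not the authors' code. It reports the shared misses and their Jaccard index, where a value near 0 would indicate that the two sources fail on different cases.

```python
def missed_cases(y_true, y_pred):
    """Return the set of case indices where the prediction disagrees with the label."""
    return {i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p}


def error_overlap(y_true, pred_a, pred_b):
    """Compare the error sets of two models evaluated on the same cases."""
    missed_a = missed_cases(y_true, pred_a)
    missed_b = missed_cases(y_true, pred_b)
    both = missed_a & missed_b
    union = missed_a | missed_b
    # Jaccard index of the two error sets; defined as 0.0 when neither model errs.
    jaccard = len(both) / len(union) if union else 0.0
    return {"missed_a": missed_a, "missed_b": missed_b, "both": both, "jaccard": jaccard}


# Toy data (assumption): 1 = ADHD, 0 = non-ADHD.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scale_pred = [0, 0, 1, 0, 0, 1, 1, 0]      # rating-scale model misses cases 0, 1, 5
narrative_pred = [1, 1, 1, 0, 1, 1, 0, 0]  # narrative model misses cases 4, 5, 6

result = error_overlap(y_true, scale_pred, narrative_pred)
print("shared misses:", result["both"], "Jaccard:", result["jaccard"])
# shared misses: {5} Jaccard: 0.2
```

A low Jaccard value on real data would support the complementarity reading, but it says nothing about why the misses differ; that is the part the theme discovery step is meant to address.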

What carries the argument

LLM-assisted theme discovery pipeline applied to the subset of narratives where structured scores fail but narrative models succeed.

If this is right

Combining narrative text with rating scales can improve separation of ADHD from non-ADHD cases where either source alone is insufficient.
The minimal overlap in missed cases implies that narrative data supplies information absent from the structured instrument.
Attention, behavioral, and family-related patterns extracted from narratives constitute candidate signals for refining ADHD screening tools.
Natural language processing of teacher reports can be used to surface clinically relevant details that standardized scales do not quantify.
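
A minimal sketch of the first implication, assuming each source already yields a probability of ADHD for a case. The weighted average (late fusion), the 0.5 decision threshold, and the toy probabilities are assumptions made for illustration; the paper as summarized here does not say how, or whether, the two sources were combined.

```python
def late_fusion(p_scale, p_narrative, weight=0.5):
    """Weighted average of two probabilities; `weight` is the share given to the scale model."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_scale + (1.0 - weight) * p_narrative


THRESHOLD = 0.5  # hypothetical decision threshold, not a clinical cutoff

# Toy case (assumption): the scale model is unsure, the narrative model is confident.
p_scale, p_narrative = 0.4, 0.9
fused = late_fusion(p_scale, p_narrative)

print("scale alone flags ADHD:", p_scale >= THRESHOLD)
print("fused score flags ADHD:", fused >= THRESHOLD)
```

Averaging is the simplest possible combination rule; a stacked classifier trained on held-out predictions would be a natural alternative, at the cost of needing more labeled cases.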

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same narrative-analysis approach could be applied to parent reports or clinician notes to test whether complementary signals appear across other information sources.
If the discovered themes prove stable across languages, the method might support cross-cultural ADHD assessment tools that reduce reliance on translated rating scales alone.
Clinics could pilot a workflow that routes ambiguous rating-scale results to an automated narrative review before final diagnosis decisions.
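
The routing idea in the last point could look roughly like this. Everything here is hypothetical: the `Case` fields, the `route` function, and especially the ambiguity band (55 to 65), which is a placeholder and not a clinical cutoff.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    scale_score: float  # summary score from the rating scale (units assumed)
    narrative: str      # free-text teacher narrative, possibly empty


def route(case, low=55.0, high=65.0):
    """Send clear-cut scores straight through; send ambiguous ones to narrative review.

    `low` and `high` bound a hypothetical ambiguity band, not a validated threshold.
    """
    if case.scale_score < low:
        return "scale_negative"
    if case.scale_score > high:
        return "scale_positive"
    # Ambiguous score: use the narrative if one exists, otherwise escalate to a clinician.
    return "narrative_review" if case.narrative.strip() else "clinician_review"


cases = [
    Case("c1", 50.0, "Completes work on time."),
    Case("c2", 60.0, "Often leaves seat during lessons."),
    Case("c3", 60.0, "   "),
    Case("c4", 70.0, ""),
]
for c in cases:
    print(c.case_id, "->", route(c))
```

In this sketch the automated narrative review only ever adds a step for ambiguous cases; it never overrides a clinician, which matches the "before final diagnosis decisions" framing above.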

Load-bearing premise

The LLM-assisted theme discovery pipeline produces clinically meaningful and unbiased themes from the narratives that genuinely reflect overlooked ADHD signals rather than artifacts of the model or data processing.

What would settle it

If a fresh sample of teacher narratives is processed through the same LLM pipeline and the resulting themes do not align with clinician-verified behavioral differences between the ADHD and non-ADHD groups in that sample, the claim that narratives supply overlooked signals would not hold.

Figures

Figures reproduced from arXiv: 2606.02509 by Ahmet Ozaslan, Baris Karacan, Elvan Iseri, Irem Aktar Songur.

**Figure 2.** Prompt used for LLM-assisted theme extrac…

Original abstract

Attention Deficit Hyperactivity Disorder (ADHD) is one of the most common neurodevelopmental disorders in childhood, and its diagnosis relies on assessments combining clinician judgment with standardized rating scales and reports from parents and teachers. While structured instruments such as the Conners' Teacher Rating Scale-Revised Short Form (CTRS-R:S) quantify ADHD-related behaviors, teachers also provide open-ended narratives that may contain complementary signals not captured by structured assessments. However, it remains unclear to what extent teacher narratives encode signals overlooked by rating scales. In this study, we analyze de-identified Turkish teacher evaluation forms collected during clinical ADHD assessments, including both CTRS-R:S scores and open-ended teacher narratives. We compare predictive signals from structured scores and narrative text and identify cases where structured assessments fail to clearly distinguish ADHD from non-ADHD students while narrative-based models capture distinct behavioral patterns. Notably, these cases show minimal overlap with those missed by the narrative model, suggesting that structured and narrative information encode complementary signals. To interpret these differences, we apply a large language model (LLM)-assisted theme discovery pipeline that reveals distinct attention, behavioral, and family-related patterns, highlighting the potential of natural language processing (NLP) to uncover clinically relevant signals from teacher narratives and to complement traditional ADHD screening tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The abstract flags potential complementarity between rating scales and teacher narratives for ADHD via LLM themes, but supplies zero methods or validation so the claim cannot be assessed.

The punchline is that this work claims structured scales like CTRS-R:S miss some ADHD cases that open-ended Turkish teacher narratives catch, with minimal overlap between the two sets of misses, and then uses an LLM pipeline to surface attention, behavioral, and family themes that supposedly explain the gap. That is the entire contribution visible here.

What is actually new is the application to de-identified Turkish clinical forms and the specific observation of low overlap between the two data sources. The abstract does a clear job stating the clinical motivation and the basic comparison idea.

The soft spots are substantial and central. No sample size, no model details, no performance metrics, no statistical tests, and no description of the LLM prompting, temperature, or any human validation of the extracted themes appear in the abstract. The stress-test concern lands directly: if the themes are LLM artifacts or translation effects rather than genuine overlooked signals, the complementarity story does not follow. Without those pieces the predictive comparison alone is not enough to support the interpretation.

This is for readers already working on NLP for mental health screening who want to see the idea tried in a new language and setting. A serious reader gets almost no usable evidence from the current text. I would not bring it to a reading group and would not cite it. It does not yet deserve peer review; the methods section needs to exist and be checked before any referee time is spent.

Referee Report

1 major / 0 minor

Summary. The paper analyzes de-identified Turkish teacher evaluation forms collected during clinical ADHD assessments, comparing predictive signals from structured CTRS-R:S scores against open-ended teacher narratives. It reports that, in a subset of cases, structured assessments fail to clearly distinguish ADHD from non-ADHD students while narrative-based models capture distinct patterns, and that these cases overlap minimally with those missed by the narrative model, indicating complementary signals. An LLM-assisted theme discovery pipeline is then applied to interpret the differences, surfacing distinct attention, behavioral, and family-related patterns.

Significance. If the empirical comparison and theme interpretations hold after validation, the work would demonstrate that teacher narratives encode clinically relevant ADHD signals overlooked by standard rating scales, with potential to improve screening via hybrid NLP approaches. The Turkish clinical data adds value for underrepresented languages and contexts in mental health NLP. The complementarity finding, if robustly quantified, could motivate further research on integrating text and structured data in neurodevelopmental disorder assessment.

major comments (1)

[Methods and Results (LLM theme discovery pipeline)] The LLM-assisted theme discovery pipeline (described in the methods and results sections on interpretation of differences) provides no details on prompting strategy, model version, temperature, few-shot exemplars, deduplication, or any validation against clinical ground truth, inter-rater checks, or clinician review. This is load-bearing for the central claim because the interpretation that the surfaced 'distinct attention, behavioral, and family-related patterns' explain why narratives catch cases missed by CTRS-R:S rests on the themes being genuine signals rather than artifacts of LLM priors, translation, or narrative length.
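
As an illustration of the kind of inter-rater check requested, the sketch below computes Cohen's kappa between theme labels assigned by an LLM and by a clinician on the same narratives. The labels are invented for the example and `cohens_kappa` is a hypothetical helper; the text available here does not report that the authors ran such a check.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies, summed.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented theme labels for six narratives (assumption, not data from the paper).
llm_labels = ["attention", "behavior", "family", "attention", "behavior", "attention"]
clinician_labels = ["attention", "behavior", "family", "behavior", "behavior", "attention"]

kappa = cohens_kappa(llm_labels, clinician_labels)
print("Cohen's kappa:", round(kappa, 3))
# Cohen's kappa: 0.739
```

Kappa corrects raw agreement for chance, which matters when one theme dominates the labels; it still requires that both raters choose from the same fixed theme inventory.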

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thorough review and for highlighting the need for greater transparency in our LLM-assisted theme discovery pipeline. We agree this is central to the interpretability of our complementarity findings and will substantially expand the Methods and Results sections in revision.

Referee: [Methods and Results (LLM theme discovery pipeline)] The LLM-assisted theme discovery pipeline (described in the methods and results sections on interpretation of differences) provides no details on prompting strategy, model version, temperature, few-shot exemplars, deduplication, or any validation against clinical ground truth, inter-rater checks, or clinician review. This is load-bearing for the central claim because the interpretation that the surfaced 'distinct attention, behavioral, and family-related patterns' explain why narratives catch cases missed by CTRS-R:S rests on the themes being genuine signals rather than artifacts of LLM priors, translation, or narrative length.

Authors: We agree that the manuscript as submitted provides insufficient methodological detail on the LLM pipeline. In the revised version we will add a dedicated subsection that specifies: (1) the exact model and version used, (2) temperature and other generation parameters, (3) the full prompting strategy including any few-shot exemplars or chain-of-thought instructions, (4) post-processing steps such as deduplication and theme merging criteria, and (5) any human validation performed (clinician review, inter-rater agreement on a subset of themes). We will also report whether themes were cross-checked against the original Turkish text to mitigate translation artifacts. Where formal validation against clinical ground truth was not conducted, we will state this limitation explicitly and describe the qualitative safeguards we applied instead.

Revision: yes
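
The five items promised in the response map naturally onto a small, serializable record that could be published alongside the results. The sketch below is one possible shape, not the authors' format: the class name, every field name, and all values are placeholders chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ThemePipelineRecord:
    """Hypothetical reporting record for an LLM theme discovery run."""
    model_name: str                      # (1) exact model
    model_version: str                   # (1) exact version or checkpoint
    temperature: float                   # (2) generation parameters
    prompt_template: str                 # (3) full prompt text
    few_shot_examples: tuple = ()        # (3) exemplars, if any
    dedup_rule: str = ""                 # (4) deduplication / theme merging criteria
    human_validation: str = "none"       # (5) clinician review, inter-rater agreement
    checked_against_source_text: bool = False  # themes verified on the original Turkish


# Placeholder values only; none of these are taken from the paper.
record = ThemePipelineRecord(
    model_name="placeholder-model",
    model_version="placeholder-version",
    temperature=0.0,
    prompt_template="<full prompt text goes here>",
    dedup_rule="<merging criterion goes here>",
)

print(json.dumps(asdict(record), indent=2, ensure_ascii=False))
```

Freezing the dataclass keeps a run's settings from being edited after the fact, and the JSON form makes it easy to attach the record to supplementary material.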

Circularity Check

0 steps flagged

No circularity: empirical comparison of data sources with no self-referential derivations

The paper conducts an empirical analysis comparing predictive performance of CTRS-R:S scores versus teacher narratives on de-identified Turkish data, then applies an LLM pipeline for post-hoc theme interpretation. No equations, parameters fitted to subsets and renamed as predictions, or self-citations are present in the provided text. The central claim of complementary signals rests on the observed minimal overlap in missed cases, which is a direct data observation rather than a reduction to inputs by definition or citation chain. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract alone supplies no information on free parameters, axioms, or invented entities; the analysis appears to rest on standard assumptions about LLM reliability and clinical data validity that are not detailed here.

pith-pipeline@v0.9.1-grok · 5769 in / 1132 out tokens · 25045 ms · 2026-06-28T14:13:18.616299+00:00 · methodology
