Optimized questionnaire item selection for tracking the progression of motor symptoms in Parkinson's disease
Pith reviewed 2026-05-10 15:49 UTC · model grok-4.3
The pith
Coordinate descent and adaptive selection of MDS-UPDRS items cut the expected standard deviation of disease severity estimates by 26 and 34 percent, respectively, for five-item subsets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Three item selection methods were compared for minimizing uncertainty in disease severity estimates from the MDS-UPDRS: ranking items by expected Fisher information, coordinate descent to directly minimize estimate standard deviation, and adaptive selection based on true latent scores. For five-item subsets the reductions in expected standard deviation relative to random selection were 14, 26, and 34 percent respectively. The adaptive method represents a best-case performance limit. Gains from sophisticated methods are largest for small subsets and diminish as more items are included. Reduced sets measure the same latent construct because item parameters are unchanged from the full test.
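None of the three methods is shown in code. The simplest, ranking items by expected Fisher information, can be sketched under a hypothetical two-parameter logistic (2PL) model (the MDS-UPDRS items are polytomous, so the paper's actual IRT specification will differ), with the expectation taken over a sample of latent severity scores:

```python
import numpy as np

def expected_info(a, b, thetas):
    """Expected 2PL Fisher information per item, averaged over a sample
    of latent severity scores.  a: discriminations, b: difficulties,
    thetas: draws from the assumed severity distribution."""
    # p[i, j] = P(positive response to item i at severity thetas[j])
    p = 1.0 / (1.0 + np.exp(-(np.outer(a, thetas) - (a * b)[:, None])))
    return np.mean(a[:, None] ** 2 * p * (1.0 - p), axis=1)

def rank_items(a, b, thetas, k):
    """Keep the k items with the highest expected information."""
    return np.argsort(expected_info(a, b, thetas))[::-1][:k]
```

Ranking scores each item in isolation and ignores redundancy between items, which is consistent with the paper finding it weaker than a search that evaluates whole subsets.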
What carries the argument
Coordinate descent local search that directly minimizes the standard error of the latent trait estimate in the IRT model.
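A minimal sketch of such a search, again assuming a hypothetical 2PL model in which item information at severity theta is a^2 p(1-p) and the standard error of the severity estimate is approximated by the inverse square root of the summed information (the paper's actual model and variance estimator are not shown):

```python
import numpy as np

def item_info(a, b, theta):
    """2PL item information a^2 * p * (1 - p) at severity theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def expected_se(subset, a, b, thetas):
    """Expected standard error over a sample of severities, via the
    inverse-information approximation SE ~ 1 / sqrt(total information)."""
    info = sum(item_info(a[i], b[i], thetas) for i in subset)
    return float(np.mean(1.0 / np.sqrt(info)))

def coordinate_descent(a, b, thetas, k, seed=0):
    """Swap-based local search: start from a random k-item subset and
    keep exchanging one chosen item for one unchosen item whenever the
    swap lowers the expected standard error."""
    rng = np.random.default_rng(seed)
    n = len(a)
    subset = set(rng.choice(n, size=k, replace=False).tolist())
    improved = True
    while improved:
        improved = False
        current = expected_se(subset, a, b, thetas)
        for i in sorted(subset):
            for j in range(n):
                if j in subset:
                    continue
                trial = (subset - {i}) | {j}
                se = expected_se(trial, a, b, thetas)
                if se < current:
                    subset, current = trial, se
                    improved = True
                    break
    return sorted(subset)
```

The search terminates at a subset where no single swap improves the objective, i.e. a local optimum; restarts from several seeds would guard against poor local optima.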
If this is right
- For five-item subsets, coordinate descent reduces expected standard deviation by 26 percent compared with random selection.
- Adaptive selection achieves a 34 percent reduction, but only as an upper bound attained with oracle knowledge of the true latent scores.
- Advantages of the optimization methods shrink as the number of retained items increases.
- All reduced sets continue to measure the identical latent trait because they reuse the original item parameters.
Where Pith is reading between the lines
- In real use, where only estimated scores are available, the adaptive method's reported advantage is likely to shrink.
- The same selection procedures could be applied to other IRT-based clinical scales to create shorter yet precise versions.
- Routine clinical monitoring might become feasible at higher frequency if shorter forms maintain acceptable precision.
Load-bearing premise
The adaptive selection gains assume the optimal items can be chosen using the patient's true unknown disease severity score rather than scores estimated from the responses.
What would settle it
A simulation that repeatedly selects the five items using only the estimated severity score obtained from those same items and measures whether the reduction in expected standard deviation still reaches 34 percent.
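A sketch of one replication of that simulation, under assumed 2PL item parameters and with a simple grid posterior standing in for the paper's (unspecified) severity estimator; repeating `adaptive_run` many times and comparing the spread of `theta_hat` against a random-item baseline would show how much of the 34 percent survives:

```python
import numpy as np

def p2pl(a, b, theta):
    """2PL response probability at severity theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def adaptive_run(a, b, true_theta, k, rng, grid):
    """Administer k items adaptively: after each simulated response,
    update a grid posterior over severity and give the unused item with
    the highest Fisher information at the current posterior mean."""
    post = np.ones_like(grid) / len(grid)       # flat prior on the grid
    used = []
    theta_hat = float(np.sum(grid * post))
    for _ in range(k):
        p = p2pl(a, b, theta_hat)
        info = a ** 2 * p * (1.0 - p)
        info[used] = -np.inf                    # never repeat an item
        i = int(np.argmax(info))
        used.append(i)
        y = rng.random() < p2pl(a[i], b[i], true_theta)   # simulate response
        like = p2pl(a[i], b[i], grid) if y else 1.0 - p2pl(a[i], b[i], grid)
        post *= like
        post /= post.sum()
        theta_hat = float(np.sum(grid * post))
    return used, theta_hat
```

Because item choice here depends only on `theta_hat` estimated from the responses collected so far, this is the practical regime in which the oracle 34 percent figure would be expected to shrink.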
Figures
Original abstract
Long questionnaires increase the response burden for patients and healthcare workers. In the treatment of Parkinson's disease, the MDS-UPDRS questionnaire to track disease progression may be underutilized due to time requirements. While reduced item sets have been studied using Fisher information from Item Response Theory (IRT) models, optimal selection methods remain unclear. We compared three methods for selecting an optimal subset of items, with the aim of minimizing the uncertainty in the estimates of the disease severity: Ranking by the Fisher information, coordinate descent local search to directly minimize estimate uncertainty, and adaptive selection. Whereas item ranking based on the expected Fisher information outperformed random choice of items, we saw further gains with the coordinate descent algorithm that directly minimizes the uncertainty of the disease severity estimate. An adaptive algorithm gave an additional slight gain compared to the coordinate descent method. However, the performance of the adaptive method is a best-case limit as we assume that we find the optimal set for the true latent trait scores. For a 5-item subset, the ranked Fisher information method reduced the expected standard deviation by 14 percent compared to random item selection. The corresponding reductions for coordinate descent and adaptive selection were 26 percent and 34 percent respectively. More sophisticated selection methods substantially improved estimate accuracy for small item sets, with diminishing returns for larger subsets. Because item parameters are retained from the full test, reduced item sets measure the same latent construct as the original test. The choice of method entails a trade-off between methodological complexity and precision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript compares three methods for selecting optimal subsets of items from the MDS-UPDRS questionnaire to minimize uncertainty in IRT-based estimates of Parkinson's disease severity: ranking by expected Fisher information, coordinate descent optimization, and adaptive selection. It reports concrete performance gains for a 5-item subset (14%, 26%, and 34% reductions in expected standard deviation versus random selection) and notes that the adaptive method represents a best-case limit assuming knowledge of true latent trait scores.
Significance. If the results hold under practical conditions, the work provides a useful framework for shortening clinical questionnaires while preserving precision on the same latent construct, which could increase routine use of the MDS-UPDRS in Parkinson's monitoring. The direct optimization of estimate uncertainty and the transparent labeling of the adaptive method's oracle limitation are methodological strengths that advance optimal test design in applied psychometrics.
Major comments (2)
- Abstract: The 34% reduction for adaptive selection is obtained by selecting the optimal set using the true (unknown) latent trait scores rather than estimates derived from responses. Although labeled a 'best-case limit,' including this figure in the headline comparison with the 26% coordinate-descent result overstates the attainable incremental gain; a practical evaluation using estimated trait scores is needed to support the claim of 'additional slight gain' from the adaptive algorithm.
- Methods section: The concrete percentage improvements (14%, 26%, 34%) and the soundness of the simulation-based comparisons cannot be fully assessed without visible details on the data source, IRT model specification, number of Monte Carlo replications, variance estimation procedure, and any error analysis or confidence intervals around the reported reductions.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important issues of clarity and practical relevance, which we address point by point below.
Point-by-point responses
Referee: Abstract: The 34% reduction for adaptive selection is obtained by selecting the optimal set using the true (unknown) latent trait scores rather than estimates derived from responses. Although labeled a 'best-case limit,' including this figure in the headline comparison with the 26% coordinate-descent result overstates the attainable incremental gain; a practical evaluation using estimated trait scores is needed to support the claim of 'additional slight gain' from the adaptive algorithm.
Authors: We agree that the adaptive result is an oracle bound and that its direct juxtaposition with the coordinate-descent figure in the abstract risks overstating the incremental practical gain. In the revised version we will rephrase the abstract to foreground that the 34% figure is a theoretical upper limit, remove the phrase 'additional slight gain' from the headline comparison, and add a short paragraph (or supplementary simulation) that implements adaptive selection using trait estimates obtained from an initial non-adaptive item set. This will provide the requested realistic evaluation while preserving the theoretical comparison. revision: yes
Referee: Methods section: The concrete percentage improvements (14%, 26%, 34%) and the soundness of the simulation-based comparisons cannot be fully assessed without visible details on the data source, IRT model specification, number of Monte Carlo replications, variance estimation procedure, and any error analysis or confidence intervals around the reported reductions.
Authors: We acknowledge that the current Methods section does not make these simulation parameters sufficiently explicit for independent assessment. In the revision we will expand the relevant subsection to state: the source of the item-parameter estimates (the specific PD cohort or public dataset used for IRT calibration), the precise IRT model and estimation method, the number of Monte Carlo replications performed, the exact procedure used to obtain the standard deviation of the latent-trait estimates (analytical inverse-information approximation), and Monte-Carlo-based standard errors or 95% confidence intervals for the reported percentage reductions. These additions will render the numerical results fully reproducible and allow readers to judge the precision of the comparisons. revision: yes
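For the promised Monte-Carlo uncertainty, a plain bootstrap over per-replication standard deviations would suffice; a hypothetical sketch (not the authors' procedure):

```python
import numpy as np

def reduction_ci(sd_method, sd_random, n_boot=2000, seed=0):
    """Bootstrap 95% CI for the percentage reduction in expected
    standard deviation of a selection method versus random selection,
    given one SD value per simulation replication."""
    rng = np.random.default_rng(seed)
    sd_method = np.asarray(sd_method, dtype=float)
    sd_random = np.asarray(sd_random, dtype=float)
    stats = np.empty(n_boot)
    for t in range(n_boot):
        m = rng.choice(sd_method, size=sd_method.size, replace=True)
        r = rng.choice(sd_random, size=sd_random.size, replace=True)
        stats[t] = 100.0 * (1.0 - m.mean() / r.mean())
    return np.percentile(stats, [2.5, 97.5])
```

Reporting such intervals alongside the 14, 26, and 34 percent point estimates would let readers judge whether the methods are distinguishable at the chosen number of replications.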
Circularity Check
No circularity; standard IRT optimization on external questionnaire data
Full rationale
The paper fits an IRT model to the full MDS-UPDRS questionnaire, retains the item parameters, and applies standard Fisher information ranking, coordinate descent minimization of estimate variance, and adaptive selection to choose subsets. All reported reductions (14%, 26%, 34%) are direct empirical comparisons of expected standard deviation against random selection on the same fitted model; the adaptive result is explicitly qualified as an oracle best-case limit rather than a practical prediction. No step equates a fitted quantity to a derived output by construction, no self-citation chain bears the central claim, and no known result is renamed as a new derivation. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The MDS-UPDRS items follow a standard Item Response Theory model relating responses to a latent disease severity trait.