arxiv: 2605.03193 · v1 · submitted 2026-05-04 · 📊 stat.AP

Evaluating the probative value of forensic gait analysis evidence using empirical data

Ruoyun Hui, Amy L Wilson, Colin Aitken, Ivan Birch, Nadia Asgeirsdottir, Graham Jackson

Pith reviewed 2026-05-08 02:07 UTC · model grok-4.3

keywords forensic gait analysis · likelihood ratio · principal component analysis · within-individual variability · probative value · video evidence · forensic statistics · dimension reduction

The pith

A likelihood ratio model for gait features produces misleading results in under 10 percent of comparisons when within-person variability is correctly specified.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a statistical approach to measure how much forensic gait analysis from video footage should affect legal decisions. It draws on population-level data plus repeated observations of the same walkers to quantify both differences between people and day-to-day changes within one person. Gait traits are recoded as simple binary variables and then reduced with principal component analysis to account for strong correlations among them. The resulting likelihood ratio model yields misleading indications in fewer than 10 percent of test comparisons when the first four components are retained, but the error rate rises sharply if the model uses an incorrect value for within-person variability. The authors conclude that the model can support expert judgment yet cannot replace it when differences in walking speed or camera setup might explain observed mismatches.

Core claim

By recoding observed gait features as dichotomous variables and applying principal component analysis for dimension reduction, the authors build a likelihood ratio model that produces misleading likelihood ratios in less than 10 percent of comparisons when the first four principal components are used, on the condition that within-individual variability is correctly specified; correlations among features are high enough that they cannot be treated as independent contributors to the weight of evidence, and human expertise remains essential for judging whether differences in conditions account for any mismatch between reference and questioned footage.

What carries the argument

The likelihood ratio model obtained by converting gait features to binary variables and reducing dimensionality with principal component analysis.
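The recode-then-reduce step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `binarize` rule (any level other than a hypothetical reference level becomes 1), the synthetic data, and the use of plain covariance-based PCA via SVD are all placeholders chosen for the example. The paper's figures mention polychoric correlations, so its actual PCA may be built differently.

```python
import numpy as np

def binarize(features, reference_level=0):
    """Recode categorical features as 0/1 (hypothetical rule: 1 if not the reference level)."""
    return (np.asarray(features) != reference_level).astype(float)

def fit_pca(X, n_components=4):
    """Plain PCA via SVD on column-centred data.

    Returns the column means, the loadings (one column per component),
    and the share of variance explained by each retained component.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)
    ratio = var / var.sum()
    return mean, vt[:n_components].T, ratio[:n_components]

def project(X, mean, loadings):
    """Scores of each observation on the retained components."""
    return (X - mean) @ loadings

# Synthetic stand-in for a population database: 200 people, 12 features coded 0, 1 or 2.
rng = np.random.default_rng(0)
raw = rng.integers(0, 3, size=(200, 12))

X = binarize(raw)
mean, loadings, ratio = fit_pca(X, n_components=4)
scores = project(X, mean, loadings)
print(scores.shape, ratio.round(3))
```

The scores on the retained components are what a downstream likelihood ratio model would consume; because the components are uncorrelated by construction, correlated features no longer count several times toward the weight of evidence.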

Load-bearing premise

Within-individual variability in gait features is correctly specified inside the likelihood ratio model.
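To see where within-individual variability enters, consider a simplified version of a two-level model. The form below is an assumption made for illustration (a univariate normal model for a single principal component score, with one observation from each piece of footage); the paper's own model may differ, for example in how the between-individual distribution is estimated. All symbols are introduced here and are not taken from the paper.

```latex
% Assumed univariate normal two-level model for one PC score (illustrative only).
% x: score from reference footage, y: score from questioned footage,
% \mu: population mean, \tau^2: between-individual variance,
% \sigma^2: within-individual variance.
% H_p: same person in both pieces of footage; H_d: different people.
\[
  s = \tau^2 + \sigma^2, \qquad
  \rho = \frac{\tau^2}{\tau^2 + \sigma^2}, \qquad
  u = x - \mu, \qquad v = y - \mu
\]
\[
  \mathrm{LR}
  = \frac{f(x, y \mid H_p)}{f(x \mid H_d)\, f(y \mid H_d)}
  = \frac{1}{\sqrt{1 - \rho^2}}
    \exp\!\left(
      -\frac{u^2 - 2\rho u v + v^2}{2 s (1 - \rho^2)}
      + \frac{u^2 + v^2}{2 s}
    \right)
\]
```

Under these assumptions, the within-individual variance appears in both s and ρ. Understating it pushes ρ toward 1, so even a modest difference between the two scores is penalised heavily and a same-source pair can receive a likelihood ratio well below 1.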

What would settle it

A fresh collection of gait comparisons in which within-individual variability matches the model's specification yet the rate of misleading likelihood ratios exceeds 10 percent would falsify the reported performance.
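The check described above amounts to counting misleading likelihood ratios in a set of comparisons with known ground truth. A minimal sketch follows, assuming the usual convention that a same-source comparison is misleading when its log LR is below 0 and a different-source comparison is misleading when its log LR is above 0. The function name and the input values are made up for illustration, and whether the paper's 10 percent figure is an overall rate or a per-type rate is not stated in the abstract.

```python
def misleading_rates(log_lrs_same, log_lrs_diff):
    """Count misleading log LRs separately for same- and different-source pairs."""
    n_same, n_diff = len(log_lrs_same), len(log_lrs_diff)
    mis_same = sum(1 for v in log_lrs_same if v < 0)   # supports H_d although H_p is true
    mis_diff = sum(1 for v in log_lrs_diff if v > 0)   # supports H_p although H_d is true
    return {
        "n_same": n_same,
        "n_diff": n_diff,
        "misleading_same": mis_same,
        "misleading_diff": mis_diff,
        "rate_same": mis_same / n_same,
        "rate_diff": mis_diff / n_diff,
        "rate_overall": (mis_same + mis_diff) / (n_same + n_diff),
    }

# Made-up log LRs, not values from the paper.
same = [2.1, 0.7, -0.3, 1.5, 3.0]
diff = [-4.2, -1.1, 0.4, -2.5, -0.9, -3.3, -0.2, 1.2, -5.0, -1.8]

result = misleading_rates(same, diff)
print(result)
```

Reporting the two rates alongside the pair counts matters because different-source pairs usually far outnumber same-source pairs, so an overall rate can hide a high same-source error rate.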

Figures

Figures reproduced from arXiv: 2605.03193 by Amy L Wilson, Colin Aitken, Graham Jackson, Ivan Birch, Nadia Asgeirsdottir, Ruoyun Hui.

**Figure 1.** Distribution of features of gait in the population database by sex

**Figure 2.** Polychoric correlation between features of gait in the population database by sex. The …

**Figure 3.** Distribution of features of gait in footage from Dataset A, after removing instances where …

**Figure 4.** The amount of variation explained by each principal component in the population dataset

**Figure 5.** Histogram showing the distribution of the scores on the first four PC using the population …

**Figure 6.** Histogram of log (base e) likelihood ratios obtained using a two-level model from scores on …

**Figure 2.** As discussed in Section 3.2.2, some of the LRs for the different source comparisons are very …

**Figure 7.** Empirical cross entropy plots of the likelihood ratios produced by the two-level model

**Figure 8.** Empirical cross entropy plots of the likelihood ratios produced by the two-level model from …

**Figure 9.** Histogram of log (base e) likelihood ratios obtained using a two-level model from scores …

Original abstract

Forensic gait analysis can aid the investigation of crimes through comparing features of gait captured in video footage. Modelling the probative value of gait evidence requires an understanding of the variation of features of gait between individuals in the population and within the same individuals. We address this question using a previously described population dataset and newly collected datasets with repeated observations of the same individuals on separate occasions. In addition to exploring the level of variability, correlation between features of gait, and the effect of demographic factors, we developed a likelihood ratio model through recoding features of gait as dichotomous variables and dimension reduction using PCA. High correlations between some features were observed, confirming that they should not contribute independently to the weight of evidence. The likelihood ratio model produced misleading likelihood ratios in less than 10% of the comparisons using the first four principal components. However, the risk increases when within-individual variability is mis-specified. Therefore, while the current model provides assistance to the judgement of gait experts, human expertise is indispensable to decide whether or not the difference in walking and/or recording conditions between the reference and questioned footage could have caused any observed differences in the features of gait. We discuss future directions in understanding the sources of the variability, improving statistical modelling and note the need to consider carefully how to select the relevant population for model fitting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper builds an LR model for gait evidence from new repeat data and reports under 10% misleading rates with four PCs, but that figure depends on their estimate of within-individual variability.

The main point is that the authors combine an existing population gait dataset with new repeated observations on the same individuals, recode the features as binary, reduce dimension with PCA, and fit a likelihood ratio model. With the first four components they get misleading LRs in less than 10% of comparisons, while noting that the rate rises if within-individual variability is mis-specified. They also record high correlations between some features and conclude that human judgment on recording conditions remains necessary.

Referee Report

1 major / 2 minor

Summary. The manuscript evaluates the probative value of forensic gait analysis evidence by combining a previously described population dataset with newly collected repeated-observation data from the same individuals. It quantifies between- and within-individual variability in gait features, examines correlations and demographic effects, and constructs a likelihood ratio model by recoding features as dichotomous variables followed by PCA dimension reduction. The central empirical result is that this LR model produces misleading likelihood ratios in less than 10% of comparisons when the first four principal components are retained. The authors note that performance degrades if within-individual variability is mis-specified and conclude that the model can assist but not replace human expert judgment on condition differences between reference and questioned footage.

Significance. If the reported performance holds, the work supplies one of the few empirical, data-driven quantifications of misleading rates for gait evidence, directly addressing a recognized gap in forensic statistics. Strengths include the use of repeated-observation datasets to estimate within-individual variance components and the explicit counting of misleading LRs rather than relying solely on theoretical derivations. These elements provide a concrete, falsifiable benchmark that could inform future validation studies. The explicit caveat about within-individual variability specification and the call for human oversight are appropriately cautious.

major comments (1)

[Abstract and LR model section] Abstract and the section describing the likelihood ratio model: The claim that misleading LRs occur in less than 10% of comparisons (first four principal components) is obtained by fitting the model to the new repeated-observation datasets and then evaluating on held-out comparisons. The abstract states that the misleading rate rises when within-individual variability is mis-specified, yet no sensitivity analysis, bootstrap perturbation of the variance components, or alternative distributional assumptions are reported. Because the within-individual variance is estimated solely from these datasets and directly enters the LR numerator and denominator, any under-sampling of repeats or unmodeled condition effects would propagate into the very metric used to support the <10% figure. This is load-bearing for the central claim.

minor comments (2)

[Methods] The methods description would benefit from explicit statements of data exclusion rules, the precise PCA implementation (including any scaling or centering steps), and the exact train/test split protocol used to compute the misleading-rate percentage.
[Results] Table or figure reporting the misleading-rate results should include the total number of comparisons performed and the breakdown by same-source versus different-source pairs to allow readers to assess the base rate.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review, which highlights both the strengths of our empirical approach and an important area for strengthening the robustness claims. We address the major comment point by point below.

Referee: [Abstract and LR model section] Abstract and the section describing the likelihood ratio model: The claim that misleading LRs occur in less than 10% of comparisons (first four principal components) is obtained by fitting the model to the new repeated-observation datasets and then evaluating on held-out comparisons. The abstract states that the misleading rate rises when within-individual variability is mis-specified, yet no sensitivity analysis, bootstrap perturbation of the variance components, or alternative distributional assumptions are reported. Because the within-individual variance is estimated solely from these datasets and directly enters the LR numerator and denominator, any under-sampling of repeats or unmodeled condition effects would propagate into the very metric used to support the <10% figure. This is load-bearing for the central claim.

Authors: We agree that the robustness of the reported misleading LR rate to the specification of within-individual variability is central to the manuscript's main claim and that a formal sensitivity analysis was not included in the original submission. The current <10% figure derives from direct empirical evaluation on held-out comparisons drawn from the repeated-observation datasets, which already embed the observed within-individual variability. Nevertheless, to address the referee's concern directly, we will add an explicit sensitivity analysis in the revised manuscript. This will include scaling the estimated within-individual variance components by factors of 0.5, 0.75, 1.25 and 1.5, recomputing the misleading LR proportions for the first four principal components, and reporting the results in a new table or figure. We will also add a short discussion of possible unmodeled condition effects and data limitations. These changes will be reflected in the abstract, the likelihood ratio model section, and the discussion.

revision: yes
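The kind of sensitivity analysis proposed in the rebuttal could look roughly like the sketch below. It is a toy under stated assumptions, not the authors' procedure: it uses a univariate normal two-level model for a single score instead of four principal components, simulated data with made-up variances (`TAU2`, `SIGMA2`), and hypothetical function names. The factor 1.0 is added to the rebuttal's list as a baseline.

```python
import math
import random

def log_lr(x, y, mu, tau2, sigma2):
    """Natural-log LR for one reference score x and one questioned score y
    under an assumed univariate normal two-level model."""
    s = tau2 + sigma2              # marginal variance of a single score
    rho = tau2 / s                 # correlation of x and y under same source
    u, v = x - mu, y - mu
    q_same = (u * u - 2 * rho * u * v + v * v) / (2 * s * (1 - rho * rho))
    q_diff = (u * u + v * v) / (2 * s)
    return -0.5 * math.log(1 - rho * rho) - q_same + q_diff

def simulate_pairs(n, mu, tau2, sigma2, rng):
    """Draw n same-source and n different-source pairs of scores."""
    sd_b, sd_w = math.sqrt(tau2), math.sqrt(sigma2)
    same, diff = [], []
    for _ in range(n):
        t = rng.gauss(mu, sd_b)                       # one person's mean
        same.append((rng.gauss(t, sd_w), rng.gauss(t, sd_w)))
        t1, t2 = rng.gauss(mu, sd_b), rng.gauss(mu, sd_b)  # two people
        diff.append((rng.gauss(t1, sd_w), rng.gauss(t2, sd_w)))
    return same, diff

def sensitivity(same, diff, mu, tau2, sigma2_hat, factors):
    """Misleading rates when the model uses factor * sigma2_hat as within-variance."""
    rows = []
    for f in factors:
        s2 = f * sigma2_hat
        mis_same = sum(log_lr(x, y, mu, tau2, s2) < 0 for x, y in same) / len(same)
        mis_diff = sum(log_lr(x, y, mu, tau2, s2) > 0 for x, y in diff) / len(diff)
        rows.append((f, mis_same, mis_diff))
    return rows

rng = random.Random(0)
MU, TAU2, SIGMA2 = 0.0, 1.0, 0.25   # toy values, not estimates from the paper
same, diff = simulate_pairs(2000, MU, TAU2, SIGMA2, rng)
rows = sensitivity(same, diff, MU, TAU2, SIGMA2, [0.5, 0.75, 1.0, 1.25, 1.5])
for f, ms, md in rows:
    print(f"factor={f:.2f}  same-source misleading={ms:.3f}  different-source misleading={md:.3f}")
```

In this toy setting, understating the within-individual variance raises the share of same-source pairs with misleading LRs, which is the direction of failure the referee is concerned about. The absolute rates it prints say nothing about the paper's data.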

Circularity Check

0 steps flagged

No significant circularity; empirical LR performance evaluated on separate repeated-observation data

The paper fits a likelihood ratio model by recoding gait features as binary variables, applying PCA dimension reduction, and estimating between-individual variation from a prior population dataset plus within-individual variation from newly collected repeated observations. The central performance claim (<10% misleading LRs with first four PCs) is an empirical count obtained by applying the fitted model to comparisons drawn from those datasets. No equation reduces this count to a fitted parameter by construction, and the paper explicitly flags sensitivity to within-individual variability mis-specification rather than claiming the result is forced. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are present in the derivation chain. This is a standard empirical modeling workflow with acknowledged modeling assumptions.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The model rests on empirical data from referenced population and repeated-measures datasets plus standard statistical assumptions about feature independence after PCA and population representativeness.

free parameters (2)

Number of principal components retained: four components selected to achieve the reported low misleading rate; the choice is data-driven.
Dichotomization thresholds for gait features: the cutoffs used to recode gait features as binary variables are not specified in the abstract.

axioms (2)

[domain assumption] The collected gait datasets adequately represent the relevant population for forensic comparisons. Model fitting and testing rely on these datasets representing typical between- and within-individual variation.
[domain assumption] Principal components capture the relevant variation after accounting for correlations between gait features. PCA is applied to handle observed high correlations between features.
