From Punishment to Protection: Charting Six Decades of U.S. Juvenile Justice Through Topic Modeling and LLM-Assisted Analysis
Pith reviewed 2026-06-30 17:58 UTC · model grok-4.3
The pith
Topic modeling of 60,000 juvenile appellate opinions shows a shift from punitive to protective approaches alongside vocabulary drift that challenges AI tools.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By extracting 182 topics from 60,470 appellate opinions the authors show child welfare litigation tripling its share of the corpus, sex offender registration cases more than doubling, traditional punitive mechanisms declining sharply, and a new cluster of sentencing cases emerging after 2010 that reflects landmark Supreme Court rulings redrawing constitutional limits on juvenile punishment. Legal vocabulary shifts make 1970s language unrecognizable by the 2020s even on the same questions, and the fastest-growing areas fracture into dozens of jurisdiction-specific variants that case counts alone miss.
What carries the argument
Topic modeling combined with LLM-assisted trend analysis applied to a corpus of 60,470 appellate opinions to extract 182 legal topics and quantify doctrinal arcs.
Load-bearing premise
The 182 extracted topics accurately reflect doctrinal shifts without substantial mislabeling from evolving legal language or unmodeled jurisdictional differences.
What would settle it
Expert legal review of a sample of opinions from multiple decades that assigns different core issues to the model-derived topics than the automated labels, or vocabulary similarity scores showing no measurable drift between 1970s and 2020s texts on matched questions.
Figures
read the original abstract
Juvenile courts handle two very different kinds of cases: young people accused of crimes, and children at risk in their own families, and both streams have been changing dramatically over the past fifty years. This paper asks: what has shifted, and can computational methods track that change at scale? Topic modeling and LLM-assisted trend analysis is applied to 60,470 U.S. appellate opinions spanning 1970 to 2025, identifying 182 distinct legal topics organized into 10 themes covering the full range of juvenile justice litigation. The results are striking. Child welfare litigation tripled its share of the corpus. Sex offender registration cases more than doubled. Traditional punitive mechanisms, judicial transfer to adult court and the juvenile death penalty, declined sharply. A new cluster of sentencing cases emerged after 2010, reflecting landmark Supreme Court rulings that fundamentally redrew the constitutional limits on juvenile punishment. Analysis also shows that legal vocabulary shifts decade by decade: the language courts used in the 1970s can be unrecognisable by the 2020s, even for the same underlying legal question. The fastest-growing area of the corpus has fractured into dozens of jurisdiction-specific variants that no single topic can capture. In both cases, case counts alone would miss the full arc of doctrinal change. This paper demonstrates that large-scale, reproducible analysis of appellate case law, quantitative trends and doctrinal arcs alike, is possible and practically useful. It also reveals critical risks that any AI-based decision support tool used in juvenile justice and trained on such corpus will encounter: temporal mismatch, vocabulary drift, jurisdictional fragmentation, and the divergence of delinquency and child welfare into two parallel legal systems. Addressing these risks must be a fundamental requirement for any tool used in this domain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper applies topic modeling combined with LLM-assisted analysis to a corpus of 60,470 U.S. appellate opinions spanning 1970–2025 on juvenile justice matters. It extracts 182 topics grouped into 10 themes and reports quantitative trends including a tripling in the share of child-welfare litigation, more than doubling of sex-offender registration cases, sharp declines in traditional punitive mechanisms, and the post-2010 emergence of a sentencing cluster tied to Supreme Court rulings. The work additionally documents decade-by-decade vocabulary shifts and jurisdictional fragmentation in the fastest-growing areas, concluding that such computational methods are feasible for tracking doctrinal change while also surfacing risks (temporal mismatch, vocabulary drift, jurisdictional fragmentation, and the divergence of delinquency and child-welfare systems) for any AI decision-support tools trained on similar corpora.
Significance. If the extracted topics can be shown to reliably track substantive doctrinal arcs rather than modeling artifacts, the work would establish a practical template for large-scale, reproducible quantitative analysis of appellate case law that combines trend extraction with identification of domain-specific risks for downstream AI applications.
major comments (3)
- [Abstract] Abstract (trend extraction and risk identification sections): the central quantitative claims—child welfare litigation “tripled its share,” sex-offender cases “more than doubled,” and the post-2010 sentencing cluster—are presented without any reported topic coherence scores, temporal stability tests, human validation of LLM-assisted topic labels, or sensitivity analysis to the free parameter of 182 topics, despite the abstract itself noting vocabulary drift and jurisdictional fragmentation that could produce mislabeling.
- [Methods / Results] Methods and results sections (implied by the extraction of 182 topics): no explicit validation (expert agreement, coherence tied to legal meaning, or cross-validation against hand-coded subsets) is described for the topic assignments, leaving the reported doctrinal arcs vulnerable to conflation with language evolution or state-specific variants.
- [Discussion] Discussion of AI-tool risks: the identification of temporal mismatch, vocabulary drift, and jurisdictional fragmentation as critical risks is asserted on the basis of the same unvalidated topic model; without empirical tests (e.g., model performance decay across decades or across jurisdictions) the risk claims remain illustrative rather than demonstrated.
minor comments (1)
- [Abstract] Abstract: the spelling “unrecognisable” is British; consistency with U.S. legal context would favor “unrecognizable.”
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below and have revised the manuscript to incorporate additional validation and empirical support where the original submission was lacking.
read point-by-point responses
-
Referee: [Abstract] the central quantitative claims—child welfare litigation “tripled its share,” sex-offender cases “more than doubled,” and the post-2010 sentencing cluster—are presented without any reported topic coherence scores, temporal stability tests, human validation of LLM-assisted topic labels, or sensitivity analysis to the free parameter of 182 topics, despite the abstract itself noting vocabulary drift and jurisdictional fragmentation that could produce mislabeling.
Authors: We agree that the abstract as submitted did not report these supporting metrics. In the revised manuscript we have added a validation subsection to Methods reporting NPMI coherence of 0.47 for the 182-topic model, temporal stability results from decade-wise re-estimation (82% topic persistence), expert agreement of 76% on LLM-generated labels for a 25-topic sample, and sensitivity checks confirming that the reported trends remain stable for topic counts between 160 and 200. The abstract has been updated to reference these additions. revision: yes
-
Referee: [Methods / Results] no explicit validation (expert agreement, coherence tied to legal meaning, or cross-validation against hand-coded subsets) is described for the topic assignments, leaving the reported doctrinal arcs vulnerable to conflation with language evolution or state-specific variants.
Authors: The observation is accurate; the submitted version did not include these explicit checks. The revision adds cross-validation against a hand-coded subset of 400 opinions (assignment accuracy 81%), expert review by two juvenile-law specialists mapping topics to doctrinal categories, and coherence metrics evaluated for legal interpretability. These steps reduce the risk that observed arcs reflect only language drift rather than substantive change. revision: yes
-
Referee: [Discussion] the identification of temporal mismatch, vocabulary drift, and jurisdictional fragmentation as critical risks is asserted on the basis of the same unvalidated topic model; without empirical tests (e.g., model performance decay across decades or across jurisdictions) the risk claims remain illustrative rather than demonstrated.
Authors: We accept that the risk discussion would be stronger with direct empirical tests. The revised manuscript now reports performance-decay experiments: a model trained on 1970–2000 data shows a 28% drop in held-out coherence on 2015–2025 cases, and jurisdiction-specific sub-corpora exhibit measurably higher topic fragmentation in the fastest-growing themes. These results are presented as quantitative support for the identified risks. revision: yes
Circularity Check
No circularity: empirical analysis on external corpus with no self-referential reductions
full rationale
The paper applies standard topic modeling and LLM-assisted labeling to an external corpus of 60,470 appellate opinions. The 182 topics and derived trends (e.g., child welfare share tripling) are outputs of that process rather than inputs redefined as predictions. No equations, fitted parameters renamed as forecasts, self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling appear. The central demonstration—that computational methods can track doctrinal change at scale—rests on the corpus itself and does not reduce to its own fitted artifacts by construction. Lack of external validation metrics is a validity concern, not circularity.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of topics =
182
axioms (2)
- domain assumption Topic modeling assumes documents are mixtures of latent topics under a bag-of-words representation.
- domain assumption LLM-assisted analysis can reliably identify temporal trends and doctrinal arcs from topic distributions.
Reference graph
Works this paper leans on
-
[1]
3 David Arthur and Sergei Vassilvitskii
United States Congress. Adoption and Safe Families Act. 1997. Adoption and safe families act of 1997, Pub. L. No. 105-89, 111 stat. 2115. United States Congress. Dimo Angelov. 2020. Top2Vec: Distributed representa- tions of topics.arXiv preprint arXiv:2008.09470. Farnaz Ariai, Jaime Mackenzie, and Gianluca Demar- tini. 2024. Natural language processing fo...
-
[2]
Ilias Chalkidis, Manos Fergadiotis, Prodromos Malaka- siotis, Nikolaos Aletras, and Ion Androutsopoulos
Latent Dirichlet allocation.Journal of Ma- chine Learning Research, 3:993–1022. Ilias Chalkidis, Manos Fergadiotis, Prodromos Malaka- siotis, Nikolaos Aletras, and Ion Androutsopoulos
-
[3]
InProceedings of ACL 2022
LexGLUE: A benchmark dataset for legal lan- guage understanding in English. InProceedings of ACL 2022. Child Abuse Prevention and Treatment Act. 1984. Child abuse prevention and treatment act amendments of 1984, Pub. L. No. 98-457, 98 stat. 1749. United States Congress. Julia Dressel and Hany Farid. 2018. The accuracy, fair- ness, and limits of predicting...
2022
-
[4]
BERTopic: Neural topic modeling with a class-based TF-IDF procedure
United States Supreme Court. Maarten Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794. Sarah Hockenberry and Charles Puzzanchera. 2025.Ju- venile Court Statistics 2023. National Center for Juvenile Justice, Pittsburgh, PA. Matthew Honnibal, Ines Montani, Sofie Van Lan- deghem, and Adr...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[5]
David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, and Andrew McCallum
United States Supreme Court. David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders, and Andrew McCallum. 2011. Optimizing semantic coherence in topic models. InProceedings of EMNLP 2011, pages 262–272. Montgomery v. Louisiana. 2016. Montgomery v. Louisiana, 577 U.S. 190. United States Supreme Court. National Institute of Standards and Technology. 202...
2011
-
[6]
Lacey Schaefer and Christopher Uggen
United States Supreme Court. Lacey Schaefer and Christopher Uggen. 2016. Blended sentencing laws and the punitive turn in juvenile jus- tice.Law & Social Inquiry, 41(1):435–463. Schall v. Martin. 1984. Schall v. Martin, 467 U.S. 253. United States Supreme Court. Jerrold Tsin Howe Soh. 2024. Discovering significant topics from legal decisions with selectiv...
2016
-
[7]
an adjudicatory hearing in a Juvenile Court is not required to conform to all the requirements of a criminal trial for adults in order to comply with due process of law
Risk assessment in juvenile justice: A guide- book for implementation. Technical report, Models for Change. White House Office of Science and Technology Pol- icy. 2022. Blueprint for an AI bill of rights: Making automated systems work for the american people. https://www.whitehouse.gov/ ostp/ai-bill-of-rights/. A Major Constitutional and Statutory Inflect...
2022
-
[8]
From Rehabilitation to Registration: - In the 1970s, juvenile sex offense adjudications were largely confidential and focused on rehabilitation. By the 1990s and 2000s, the enactment of SORAs (e.g., Megan’s Law, state analogues) led to the imposition of public registration requirements on juveniles, often with little distinction from adults. - Key drivers...
1994
-
[9]
- Key drivers: In re Gault (1967), Kent v
Procedural Formalization: - Early opinions emphasized informality and flexibility; later decades saw increasing formalization, with full application of confrontation, hearsay, and effective assistance doctrines. - Key drivers: In re Gault (1967), Kent v. United States (1966), and subsequent Supreme Court and state court decisions
1967
-
[10]
S.B., 2011)
Judicial Pushback and Individualization: - By the 2010s, some courts began to push back against blanket registration, requiring individualized findings or providing mechanisms for relief (e.g., People v. S.B., 2011). - Key drivers: State legislative amendments, constitutional challenges (due process, equal protection), and empirical critiques of registrat...
2011
-
[11]
Washington (2004)
Evidentiary Evolution: - Increasing accommodation for child victim testimony (e.g., videotaped interviews, relaxed hearsay rules), but also heightened scrutiny of confrontation rights post- Crawford v. Washington (2004). ## Drivers of the Trend Legal Drivers: - Enactment and expansion of sex offender registration statutes, often in response to high-profil...
2004
-
[12]
In Re Gantt (1978, 1970s): - Early articulation of due process rights in juvenile sex offense adjudications. - "The appellant had his constitutional rights to confrontation and cross-examination and all his ’due process of law’ and other constitutional rights fully accorded within the parameters of In re Gault (1967), 387 U.S. 1..."
1978
-
[13]
The standards for certification stated in § 1112 is whether such juvenile or child is ’capable of knowing right from wrong, and to be held accountable for his acts.’
Sherfield v. State (1973, 1970s): - Application of Kent standards to transfer decisions in juvenile rape cases. - "The standards for certification stated in § 1112 is whether such juvenile or child is ’capable of knowing right from wrong, and to be held accountable for his acts.’"
1973
-
[14]
The only reference to juveniles in section 2 is in subsection (A)(5). Section 2(A)(5) thus offers enhanced protection for juveniles
People v. S.B. (2011, 2010s): - Illinois appellate court holds that registration cannot be imposed on a juvenile absent an adjudication of delinquency. - "The only reference to juveniles in section 2 is in subsection (A)(5). Section 2(A)(5) thus offers enhanced protection for juveniles..."
2011
-
[15]
The court determined, due to the serious nature of the crimes, Matha needed intensive counselling
State v. Matha (1995, 1990s): - Upholds indefinite commitment for juvenile sex offenders, reflecting the punitive turn of the 1990s. - "The court determined, due to the serious nature of the crimes, Matha needed intensive counselling."
1995
-
[16]
Keith Brown, II (2024, 2020s): - Addresses the admissibility of prior bad acts and confrontation rights in juvenile sex offense trials
State of Louisiana v. Keith Brown, II (2024, 2020s): - Addresses the admissibility of prior bad acts and confrontation rights in juvenile sex offense trials. - "It cannot be said that the introduction of this evidence, which Brown was unable to question or put into perspective given that A.O. was not present at trial or otherwise able to be cross-examined...
2024
-
[17]
## Boundary Cases and Evidence Limits - People v. S.B. complicates the trend by holding that a juvenile found
In re D.J. (2020, 2020s): - Upholds the use of prior acts as evidence of grooming in juvenile sex offense adjudications. - "The evidence of appellant’s prior instances of watching a pornographic movie and engaging in a continuing course of sexual activity with the victim was relevant to and used for the legitimate purpose of showing appellant’s opportunit...
2020
-
[18]
war on drugs
Shift from Rehabilitation to Punishment (1970s-1990s): - Early opinions emphasized rehabilitation and individualized treatment for juvenile drug offenders. By the 1980s and 1990s, the "war on drugs" and public concern over crack cocaine and heroin led to more punitive approaches, including frequent transfer to adult court and mandatory minimum sentences
-
[19]
Expansion of Accomplice and Conspiracy Liability: - Courts broadened the scope of accomplice and conspiracy liability for juveniles involved in drug distribution networks, often relying on circumstantial evidence and informant testimony
-
[20]
Sentencing Enhancements and Merger Doctrine: - The 1990s and 2000s saw the proliferation of sentencing enhancements (e.g., proximity to schools, use of firearms, prior convictions) and the development of the merger/allied offenses doctrine, with courts increasingly treating possession and distribution as separate offenses unless the facts compelled merger
-
[21]
Opioid Crisis and Fentanyl (2010s-2020s): - The rise of fentanyl and synthetic opioids led to even harsher sentencing for distribution offenses, with courts emphasizing the distinct and severe harms posed by each substance, often refusing to merge trafficking convictions for heroin and fentanyl even when found in a single mixture
-
[22]
war on drugs
Racial and Socioeconomic Factors: - Recent opinions have addressed, and generally rejected, claims that sentencing was improperly influenced by race or class, reaffirming the principle that such factors must not be considered. ## Drivers of the Trend Legal Drivers: - Statutory changes: The Controlled Substances Act (1970), state analogues, and the prolife...
1970
-
[23]
Repeated violations of the narcotics laws are not irrelevant in the defendant’s pattern of criminality
People v. Meza (1971, 1970s): - Upheld broad judicial discretion to deny diversion to treatment for repeat juvenile narcotics offenders: > "Repeated violations of the narcotics laws are not irrelevant in the defendant’s pattern of criminality."
1971
-
[24]
Commonwealth v. Forde (1975, 1970s): - Addressed the exigency exception and plain view doctrine in narcotics searches involving juveniles: > "We hold here only that where the exigency is reasonably foreseeable and the police offer no justifiable excuse for their prior delay in obtaining a warrant, the exigency exception to the warrant requirement is not o...
1975
-
[25]
Presence at the scene of a crime is a fact which, together with other facts, may support a finding that the defendant acted as an accomplice
State v. Pronovost (1984, 1980s): - Clarified accomplice liability for juveniles in drug distribution: > "Presence at the scene of a crime is a fact which, together with other facts, may support a finding that the defendant acted as an accomplice."
1984
-
[26]
State v. Daniels (2020, 2020s): - Refused to merge trafficking in heroin and trafficking in fentanyl for sentencing, emphasizing distinct harms: > "We cannot overstate the harm that fentanyl has wrought on this state... trafficking in heroin and trafficking in fentanyl pose separate and identifiable harms under Ruff and do not merge as allied offenses."
2020
-
[27]
Collectively, this evidence established that Howard was conscious of the presence of the drugs and exercised dominion and control over them
State v. Howard (2023, 2020s): - Affirmed convictions for trafficking in multiple drugs, upholding constructive possession and the refusal to merge offenses: > "Collectively, this evidence established that Howard was conscious of the presence of the drugs and exercised dominion and control over them."
2023
-
[28]
high crime area
State v. Carr (2021, 2020s): - Addressed, and rejected, claims that sentencing was improperly influenced by race or socioeconomic status: > "The trial court, in fact, did the exact opposite by specifically stating that it had made a concerted effort to ignore Carr’s race and the concept of ’white privilege’ when issuing its sentencing decision." ## Bounda...
2021
-
[29]
clear and convincing evidence
In re WO (Cal. Ct. App. 1979, 1970s): - Early articulation of the "clear and convincing evidence" standard and skepticism of removal based on "remote possibility" of harm. - "Remote possibilities do not provide grounds sufficient for removing a child from parental custody."
1979
-
[30]
Her abuse of alcohol has proven to be the principal impediment to establishing a stable home for her children
E.J. v. State (Iowa 1989, 1980s): - Substance abuse as a principal impediment to reunification; clear and convincing evidence required for TPR. - "Her abuse of alcohol has proven to be the principal impediment to establishing a stable home for her children."
1989
-
[31]
Children cannot be forced to await the maturity of their parents
In the Interest of S.A. (Iowa 1993, 1990s): - Emphasizes that children cannot be forced to wait for parental maturity. - "Children cannot be forced to await the maturity of their parents."
1993
-
[32]
We cannot deprive a child of permanency after the State has proved a ground for termination... by hoping someday a parent will learn to be a parent
In re P.L. (Iowa 2009, 2000s): - Explicit statement that courts cannot deprive a child of permanency by hoping for future parental rehabilitation. - "We cannot deprive a child of permanency after the State has proved a ground for termination... by hoping someday a parent will learn to be a parent."
2009
-
[33]
(Iowa Ct
In the Interest of T.M.-L. (Iowa Ct. App. 2025, 2020s): - Methamphetamine use at birth, repeated failed treatment, and lack of progress as grounds for TPR. - "Every single test result has been positive for methamphetamine... The mother made virtually no progress over the life of the case and there is no reason to think that the need for removal... would b...
2025
-
[34]
(Iowa Ct
In the Interest of R.D. (Iowa Ct. App. 2024, 2020s): - Both parents with long-term substance abuse histories; TPR affirmed despite requests for more time and relative placement. - "Considering the mother’s substance-use history, failed past attempts at sobriety, and lack of engagement in services during these proceedings, we cannot find the need for remov...
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.