arxiv: 2606.19469 · v1 · pith:JID7QYQZnew · submitted 2026-06-17 · 💻 cs.AI · cs.SE

Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023

Pith reviewed 2026-06-26 21:03 UTC · model grok-4.3

classification 💻 cs.AI cs.SE
keywords: curriculum alignment · CS2013 · CS2023 · knowledge units · cognitive depth · computer science education · program evaluation

The pith

One accredited BSc computer science program covers roughly half of the knowledge units in each of the CS2013 and CS2023 guidelines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a repeatable human-in-the-loop method to map a degree program's courses against external curriculum guidelines. It applies the method to one accredited BSc program against the 2013 and 2023 computer science standards. Coverage stays near 50 percent for both, with competency statements present for most covered units but recommended cognitive depth achieved for fewer units under the newer guideline. The approach isolates unchanging program shortfalls from shifts in the guidelines themselves.

Core claim

The program covers 49.7% of CS2023 and 50.9% of CS2013 knowledge units, near-constant across a decade. Extending the same retrieve-then-confirm design to competency articulation and cognitive depth shows that the program articulates the competency for ~88% of covered units under each guideline, yet delivers it at the recommended depth for 76% of present units under CS2023 against 95% under CS2013, a gap reflecting the newer guideline's raised expectations, not the program. The longitudinal comparison separates persistent structural gaps from differences that reflect the standard's evolution.

What carries the argument

Human-in-the-loop pipeline that represents the program and guidelines as structured corpora, generates candidate matches by semantic retrieval, and confirms them through human judgment under an explicit coverage definition.
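The retrieve-then-confirm flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words cosine similarity is a stand-in for the semantic retrievers the paper benchmarks, and the function names (`retrieve_candidates`, `confirm`), course codes, and unit identifiers are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_candidates(courses, units, top_k=2):
    """For each knowledge unit, propose the top_k most similar courses.

    The output is a list of candidates only; nothing counts as covered yet.
    """
    course_vecs = {cid: Counter(text.lower().split()) for cid, text in courses.items()}
    candidates = {}
    for uid, text in units.items():
        unit_vec = Counter(text.lower().split())
        # Sort by descending similarity, breaking ties by course id for determinism.
        ranked = sorted(course_vecs, key=lambda cid: (-cosine(course_vecs[cid], unit_vec), cid))
        candidates[uid] = ranked[:top_k]
    return candidates


def confirm(candidates, human_judgment):
    """Keep only the candidate pairs that a human rater confirms."""
    return {
        uid: [cid for cid in course_ids if human_judgment(cid, uid)]
        for uid, course_ids in candidates.items()
    }


# Hypothetical toy corpora (not taken from the paper).
courses = {
    "CS101": "introduction to programming with loops and functions",
    "CS340": "operating systems processes scheduling memory",
}
units = {
    "KU-programming": "fundamental programming concepts loops functions",
    "KU-scheduling": "operating systems scheduling",
}
candidates = retrieve_candidates(courses, units, top_k=1)
# The lambda stands in for a rater applying the explicit coverage definition.
confirmed = confirm(candidates, lambda cid, uid: (cid, uid) == ("CS101", "KU-programming"))
```

The design point the sketch tries to capture is the separation of concerns: retrieval only narrows the search space, and coverage is decided solely by the human confirmation step.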

If this is right

  • Coverage levels have remained nearly constant over ten years.
  • Competency articulation reaches about 88 percent of covered units under both guidelines.
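  • Recommended cognitive depth is met for 95 percent of present units under CS2013 but only 76 percent under CS2023, a gap the paper attributes to the newer guideline's raised expectations rather than to a change in the program.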
  • Certain areas such as parallel and distributed computing stay uncovered against both guidelines and ABET criteria.
  • The pipeline can be reused for other programs and later guideline versions.
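The three headline rates use nested denominators, which a small sketch may make easier to see. The code below assumes that "present units" means covered units whose competency is articulated; the summary does not define the term, so this reading is an assumption. The record type and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitRecord:
    unit_id: str
    covered: bool              # lens 1: topical coverage confirmed by a rater
    competency: bool = False   # lens 2: competency articulated (only meaningful if covered)
    at_depth: bool = False     # lens 3: recommended cognitive depth met


def three_lens_rates(records):
    """Return coverage, competency, and depth rates with nested denominators."""
    covered = [r for r in records if r.covered]
    present = [r for r in covered if r.competency]   # assumed meaning of "present units"
    at_depth = [r for r in present if r.at_depth]

    def ratio(numerator, denominator):
        return len(numerator) / len(denominator) if denominator else 0.0

    return {
        "coverage": ratio(covered, records),     # share of all knowledge units
        "competency": ratio(present, covered),   # share of covered units
        "depth": ratio(at_depth, present),       # share of present units
    }


# Hypothetical toy records, not data from the paper.
records = [
    UnitRecord("A", covered=True, competency=True, at_depth=True),
    UnitRecord("B", covered=True, competency=True, at_depth=False),
    UnitRecord("C", covered=True),
    UnitRecord("D", covered=False),
]
rates = three_lens_rates(records)
```

Because each rate is conditional on the previous lens, a depth rate of 76 percent does not mean 76 percent of all knowledge units; it is a share of a smaller set.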

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Other programs could run the same mapping process on their own curricula to track alignment as standards change.
  • Aggregating results across multiple programs might reveal which gaps are common rather than institution-specific.
  • The finding that a small sentence model outperformed a long-context model on this task suggests retriever choice needs case-by-case testing.

Load-bearing premise

Human raters applying the explicit coverage definition produce reliable and reproducible mappings between courses and knowledge units.

What would settle it

A third independent rater producing substantially different coverage percentages for the same program and guidelines would show the reported alignments are not reproducible.

Figures

Figures reproduced from arXiv: 2606.19469 by Khaled Shuaib, Mamoun Awad, Mary John, Nazar Zaki, Saja Aldabet, Sherzod Turaev.

Figure 1. Overview of the three-lens curriculum-alignment framework. The program and the two guideline ...
Figure 2. Retriever benchmark: mean average precision and recall at ten against the pooled, human-confirmed ...
Figure 3. Program coverage of CS2023 knowledge units by knowledge area; bar shade encodes the CS2023 ...
Figure 4. Program emphasis minus CS2023 recommended-hour share, by knowledge area. Blue indicates ...
Figure 5. Coverage of the same program against CS2013 and CS2023 by aligned knowledge area.
Original abstract

Undergraduate computer science is governed by international curricular guidelines revised about once a decade, yet programs lack a reliable, reproducible way to measure how completely they cover the current guidelines and how that coverage shifts when the guidelines are restructured. We address this with a human-in-the-loop pipeline that measures a program's coverage of an external body of knowledge, applied longitudinally to one accredited BSc in Computer Science against Computer Science Curricula 2013 (CS2013) and 2023 (CS2023). The pipeline represents the program and each guideline as structured corpora, generates candidate course-to-knowledge-unit matches by semantic retrieval, and confirms them through human judgment under an explicit coverage definition. Of seven benchmarked retrievers, a reciprocal-rank-fusion ensemble was strongest, and a reputed long-context model underperformed a small sentence model, so retriever choice must be measured. Both maps were validated by an independent second rater (Cohen's kappa 0.64 for CS2023, 0.69 for CS2013). The program covers 49.7% of CS2023 and 50.9% of CS2013 knowledge units, near-constant across a decade. Extending the same retrieve-then-confirm design to competency articulation and cognitive depth shows that the program articulates the competency for ~88% of covered units under each guideline, yet delivers it at the recommended depth for 76% of present units under CS2023 against 95% under CS2013, a gap reflecting the newer guideline's raised expectations, not the program. The longitudinal comparison separates persistent structural gaps (parallel and distributed computing, foundations of programming languages, systems fundamentals), uncovered against both guidelines and ABET, from differences that reflect the standard's evolution. The instrument is reusable and available from the authors on request.
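For readers unfamiliar with reciprocal rank fusion, the ensemble idea mentioned in the abstract can be sketched as below. Each item's fused score is the sum of 1 / (k + rank) over the input rankings. The constant k = 60 is the value used in the original reciprocal rank fusion paper by Cormack et al.; the value used in this paper is not stated in the summary, so treat it as an assumption. The function name and the toy rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one by summing 1 / (k + rank) per item.

    rankings: iterable of lists, each ordered from best to worst.
    Items missing from a list simply contribute nothing for that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings of candidate courses from three retrievers.
fused = reciprocal_rank_fusion([
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["B", "C", "A"],
])
```

The fusion uses only rank positions, not raw similarity scores, so retrievers with incomparable score scales can be combined without calibration.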

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents a human-in-the-loop retrieve-then-confirm pipeline that maps a single accredited CS program's courses to the knowledge units of CS2013 and CS2023, reporting 49.7% coverage of CS2023 units and 50.9% of CS2013 units, ~88% competency articulation for covered units under both, and recommended cognitive depth met for 76% of present units under CS2023 versus 95% under CS2013; the longitudinal comparison attributes the depth gap to raised expectations in the newer guideline while identifying persistent uncovered areas such as parallel computing.

Significance. If the mappings prove stable, the work supplies a reusable, benchmarked instrument for longitudinal curriculum alignment that separates program structure from guideline evolution and is offered to other programs; the retriever benchmarking (showing a small sentence model outperforming a long-context model) and explicit coverage definition are concrete contributions to empirical curriculum research.

major comments (1)
  1. [validation and inter-rater agreement paragraph] The central percentages (49.7%, 50.9%, 76%, 95%) are simple ratios of binary human confirmations performed after retrieval; the reported Cohen's kappa values of 0.64 (CS2023) and 0.69 (CS2013) indicate only moderate agreement between two raters, yet the manuscript provides no sensitivity analysis, per-unit disagreement counts, third-rater adjudication, or raw match data to demonstrate that modest shifts in a few dozen judgments would not move the headline figures by several points.
minor comments (2)
  1. [results] No error bars or confidence intervals accompany the reported percentages despite the reliance on human judgment.
  2. [methods] Full exclusion criteria for non-matches and the complete list of candidate matches before human confirmation are not supplied.
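One simple way to attach an interval to a reported percentage, as the first minor comment requests, is a Wilson score interval. The sketch below treats each knowledge-unit judgment as an independent Bernoulli trial, which is itself an assumption and does not capture rater disagreement. The counts in the example are hypothetical, since the summary does not report the number of knowledge units.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 50 of 100 units judged covered.
low, high = wilson_interval(50, 100)
```

With counts of this size the interval spans roughly ten points either side of the estimate, which illustrates why the referee asks for intervals alongside the headline percentages.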

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on validation. We address the concern about inter-rater agreement and robustness below.

  1. Referee: The central percentages (49.7%, 50.9%, 76%, 95%) are simple ratios of binary human confirmations performed after retrieval; the reported Cohen's kappa values of 0.64 (CS2023) and 0.69 (CS2013) indicate only moderate agreement between two raters, yet the manuscript provides no sensitivity analysis, per-unit disagreement counts, third-rater adjudication, or raw match data to demonstrate that modest shifts in a few dozen judgments would not move the headline figures by several points.

    Authors: We agree that the moderate kappa values indicate a need for additional evidence of robustness. In the revision we will add a sensitivity analysis that identifies all units with rater disagreement, simulates alternative resolutions of those disagreements, and reports the resulting range for each headline percentage. We will also include a table of per-unit disagreement counts. The raw (anonymized) match data will be released as supplementary material. A third rater was not employed in the original study owing to resource constraints; the sensitivity analysis will serve as the primary demonstration that modest changes in a small number of judgments do not materially alter the reported figures. revision: yes
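The sensitivity analysis promised in the rebuttal could take a form like the following sketch, which bounds a coverage percentage by resolving every rater disagreement first against and then in favour of coverage. This is one possible reading of "simulates alternative resolutions", not the authors' stated procedure; the function name and the toy ratings are hypothetical.

```python
def coverage_range(rater_a, rater_b):
    """Bound the coverage rate under all possible resolutions of disagreements.

    rater_a, rater_b: dicts mapping unit id -> bool (covered or not).
    Returns (lowest rate, highest rate, sorted list of disputed unit ids).
    """
    if rater_a.keys() != rater_b.keys():
        raise ValueError("both raters must judge the same units")
    n = len(rater_a)
    if n == 0:
        raise ValueError("no units to evaluate")
    agreed_covered = sum(1 for u in rater_a if rater_a[u] and rater_b[u])
    disputed = sorted(u for u in rater_a if rater_a[u] != rater_b[u])
    lowest = agreed_covered / n                      # every dispute resolved as not covered
    highest = (agreed_covered + len(disputed)) / n   # every dispute resolved as covered
    return lowest, highest, disputed


# Hypothetical ratings for four units.
a = {"U1": True, "U2": True, "U3": False, "U4": False}
b = {"U1": True, "U2": False, "U3": False, "U4": True}
lowest, highest, disputed = coverage_range(a, b)
```

The returned list of disputed units doubles as the per-unit disagreement table, and the width of the range shows directly how far the headline figure could move.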

Circularity Check

0 steps flagged

No significant circularity; empirical counts from retrieval-plus-human mapping

The paper's central results are direct ratios obtained by applying an explicit coverage definition to human-confirmed matches between a program's course corpus and the guideline knowledge units. No equations, fitted parameters, or self-citations are invoked to generate the headline percentages (49.7%, 50.9%, 88%, 76%, 95%); the pipeline is a measurement procedure whose outputs are the counted confirmations. Cohen's kappa values quantify rater agreement on the final binary judgments but do not enter the derivation of the fractions themselves. The work therefore contains no self-definitional, fitted-input, or self-citation-load-bearing steps.
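As a reminder of what the kappa statistic measures, the sketch below computes Cohen's kappa for two raters making binary judgments: observed agreement corrected for the agreement expected by chance from each rater's marginal rates. The function name and the toy ratings are hypothetical, and the example values are unrelated to the 0.64 and 0.69 reported in the paper.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of binary judgments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters give the same label.
    p_observed = sum(bool(x) == bool(y) for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal rate of positive judgments.
    p_a = sum(bool(x) for x in rater_a) / n
    p_b = sum(bool(y) for y in rater_b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical judgments on four candidate matches.
kappa = cohens_kappa([True, True, False, False], [True, False, False, False])
```

Here the raters agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.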

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central results rest on an explicit but unformalized coverage definition and on the assumption that the chosen guidelines constitute the authoritative body of knowledge. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Human judgments performed under the explicit coverage definition accurately reflect true alignment between courses and knowledge units.
    The pipeline's output percentages are produced by these judgments; the reported kappa values are the only quantitative check supplied.


