pith. machine review for the scientific record.

arxiv: 2605.05958 · v2 · submitted 2026-05-07 · 💻 cs.AI

Recognition: no theorem link

Temporal Smoothness Doubly Robust Learning for Debiased Knowledge Tracing

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 00:53 UTC · model grok-4.3

classification 💻 cs.AI
keywords knowledge tracing · doubly robust estimation · selection bias · temporal smoothness · debiasing · imputation model · propensity model · educational data

The pith

A doubly robust estimator with temporal smoothness regularization debiases knowledge tracing by jointly optimizing the predictor and imputation model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Knowledge tracing relies on selectively observed logs that introduce selection bias into standard training. This paper develops a doubly robust estimator for KT by combining propensity modeling of exercise recommendations with imputation of prediction errors, guaranteeing unbiased mastery estimates if either component is accurate. In sequential KT, the estimator's variance leads to accumulating deviations that destabilize training over time. A derived generalization bound identifies temporal smoothness as key to controlling this variance. The proposed TSDR method jointly optimizes the KT predictor and imputation model with a smoothness regularizer to lower variance without compromising the unbiasedness guarantee.

Core claim

The paper establishes a doubly robust formulation for knowledge tracing that integrates a propensity model with an error imputation model to guarantee unbiased estimation of student mastery if either model is correctly specified. It derives a generalization bound showing that estimator variance drives accumulating stochastic deviations across sequential interactions. The TSDR framework then augments this estimator with a temporal smoothness regularizer on the imputation model and jointly optimizes it with the KT predictor, reducing variance and improving stability while preserving the doubly robust unbiasedness property.
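The construction just described can be sketched in a few lines. The function below is an editorial illustration of a DR objective with a smoothness penalty, not the authors' code; every name in it (`e_hat`, `e_imputed`, `propensity`, `lam`) is our own.

```python
import numpy as np

def tsdr_loss(observed, e_hat, e_imputed, propensity, lam=0.1):
    """Illustrative per-sequence TSDR-style objective (not the authors' code).

    observed   : (T,) 0/1 mask, 1 where the outcome was actually logged
    e_hat      : (T,) the KT predictor's error at each step
    e_imputed  : (T,) the imputation model's estimate of that error
    propensity : (T,) estimated probability that each step was observed
    lam        : weight on the temporal smoothness regularizer
    """
    # Doubly robust term: use the imputed error everywhere, then correct it
    # on observed steps with an inverse-propensity weight. The estimate is
    # unbiased if either the propensities or the imputations are correct.
    dr = e_imputed + observed * (e_hat - e_imputed) / propensity
    # Temporal smoothness: penalize jumps between consecutive imputations,
    # the variance-control mechanism the paper's bound identifies.
    smooth = np.sum(np.diff(e_imputed) ** 2)
    return dr.mean() + lam * smooth
```

One sanity check on such an implementation: when every step is observed and the propensities are all 1, the DR term collapses to the ordinary empirical risk.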

What carries the argument

The TSDR framework, which augments a doubly robust KT estimator with a temporal smoothness regularizer to jointly train the predictor and imputation model while controlling variance accumulation over student sequences.

If this is right

  • KT models trained with TSDR produce unbiased mastery estimates despite non-random exercise selection in educational logs.
  • Reduced estimator variance improves training stability and limits error accumulation across long student interaction sequences.
  • The framework can be plugged into multiple existing KT backbones to boost their performance on real biased datasets.
  • The derived generalization bound supplies explicit guidance on how variance affects sequential prediction quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same doubly robust plus smoothness pattern could address selection bias in other sequential recommendation or prediction tasks outside education.
  • Excessive smoothness strength risks flattening genuine abrupt changes in student knowledge, so adaptive regularization may be needed in practice.
  • Online variants of TSDR could support real-time debiasing as new student logs arrive in live tutoring systems.

Load-bearing premise

That at least one of the propensity model or the error imputation model is correctly specified, and that temporal smoothness serves as a valid control on estimator variance in sequential KT without introducing new bias.

What would settle it

On synthetic sequential KT data with known ground-truth mastery probabilities, selection propensities, and error distributions, check whether TSDR recovers unbiased estimates when exactly one of the two models is correctly specified, and whether ablating the smoothness regularizer measurably increases variance and long-horizon error accumulation.
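A minimal Monte Carlo version of that check can be sketched as follows. Everything here is synthetic and invented for illustration (the error curve, the propensities, the deliberately wrong constant imputation model); it tests the generic DR property, not the paper's actual experiment.

```python
import numpy as np

# Synthetic check: with correct propensities and a deliberately wrong
# imputation model, the DR estimate of the mean error should stay close
# to the ground truth, while the naive observed-only average should not.
rng = np.random.default_rng(0)
T, trials = 200, 2000
true_err = np.linspace(0.1, 0.9, T)          # ground-truth per-step error
prop = np.clip(1.0 - true_err, 0.05, 0.95)   # observation biased toward easy steps
bad_imputed = np.full(T, 0.5)                # misspecified imputation model

dr_vals, naive_vals = [], []
for _ in range(trials):
    obs = rng.random(T) < prop               # which steps get logged
    dr = bad_imputed + obs * (true_err - bad_imputed) / prop
    dr_vals.append(dr.mean())
    naive_vals.append(true_err[obs].mean())

dr_bias = abs(np.mean(dr_vals) - true_err.mean())
naive_bias = abs(np.mean(naive_vals) - true_err.mean())
print(dr_bias, naive_bias)  # DR bias near zero; naive bias is not
```

Repeating the same experiment with correct imputations and wrong propensities, and with the smoothness term ablated, would cover the remaining arms of the proposed test.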

Figures

Figures reproduced from arXiv: 2605.05958 by Peilin Zhan, Ruichu Cai, Shuyi Pan, Wei Chen, Weilin Chen.

Figure 1: The causal mechanism of selection bias in KT: student …
Figure 2: A Model-Agnostic DR Framework Overview.
Figure 4: The AUC and ACC of TSDR on varying number of …
read the original abstract

Knowledge Tracing (KT) is fundamental to intelligent education systems, yet relies on educational logs that are selectively observed. The non-random nature of exercise recommendations and student choices inevitably induces severe selection bias. Most existing KT methods neglect this issue, training on observed logs using standard empirical risk, which yields biased mastery estimates and accumulates errors in subsequent recommendations. To address this, we introduce a doubly robust (DR) formulation for KT that integrates a propensity model with an error imputation model, theoretically guaranteeing unbiasedness if either model is accurate. Beyond unbiasedness, in the sequential setting of KT, we identify that the estimator's performance is compromised by variance-dependent stochastic deviations that accumulate over time, thereby causing training instability and limiting performance. To mitigate this, we derive a generalization bound that explicitly characterizes the impact of estimator variance and identifies temporal smoothness as a key factor in controlling it. Building on these theoretical insights, we propose the Temporal Smoothness Doubly Robust (TSDR) framework. TSDR jointly optimizes the KT predictor and the imputation model with a smoothness regularizer, effectively reducing variance while preserving the unbiasedness guarantee of DR. Experiments on multiple real-world benchmarks demonstrate that TSDR consistently enhances various state-of-the-art KT backbones, underscoring the vital role of principled bias correction in KT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Temporal Smoothness Doubly Robust (TSDR) framework for debiased Knowledge Tracing (KT). It combines a propensity model (for exercise recommendation probability) with an error imputation model to form a doubly robust estimator that is unbiased if either model is correctly specified. In the sequential KT setting, it derives a generalization bound linking estimator variance to performance degradation and identifies temporal smoothness as a controlling factor. TSDR jointly optimizes the KT predictor and imputation model by adding a temporal smoothness regularizer to the objective, with the claim that this reduces variance while preserving the DR unbiasedness guarantee. Experiments apply TSDR to multiple KT backbones on real-world benchmarks and report consistent gains.

Significance. If the central theoretical claim holds—that the smoothness regularizer can be added without violating the DR unbiasedness property when at least one nuisance model is correct—this would be a meaningful contribution to debiased sequential prediction. The application of DR estimation to KT addresses a real selection-bias issue in educational logs, and the variance-control insight via smoothness could generalize to other online recommendation or tutoring systems. The empirical improvements on standard KT datasets provide supporting evidence, though the strength depends on verification of the unbiasedness preservation.

major comments (2)
  1. [Abstract / TSDR objective derivation] Abstract and theoretical derivation of TSDR: The claim that adding the temporal smoothness regularizer preserves the DR unbiasedness guarantee requires an explicit argument showing that the regularizer term has zero expectation (or is orthogonal to the influence function) under correct specification of either the propensity or imputation model. In the sequential KT setting, where observations are temporally dependent, it is not immediate that a penalty on consecutive predictions or imputations satisfies this without additional assumptions on the true function; the generalization bound should be re-derived or extended to include the regularized objective.
  2. [Generalization bound section] Generalization bound and variance analysis: The bound is said to characterize the impact of estimator variance and identify temporal smoothness as a key control. However, the interaction between the smoothness penalty and the DR estimating equation must be shown not to inflate the variance term or introduce higher-order bias terms that accumulate over time steps; without this, the bound does not directly support the joint-optimization claim.
minor comments (2)
  1. [Method] The description of the joint optimization procedure would benefit from an explicit combined loss equation that separates the DR term, the smoothness regularizer, and any hyper-parameters.
  2. [Experiments] Experiments should report separate ablations for the DR component alone versus DR plus smoothness to isolate the contribution of the regularizer.
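One plausible shape for the combined objective requested in minor comment 1, in notation of our own invention (the authors' exact form may differ):

```latex
\mathcal{L}(\theta,\phi)
  = \frac{1}{T}\sum_{t=1}^{T}
    \left[ \hat e_\phi(t) + \frac{o_t\,\bigl(e_\theta(t) - \hat e_\phi(t)\bigr)}{\hat p_t} \right]
  + \lambda \sum_{t=2}^{T} \bigl( \hat e_\phi(t) - \hat e_\phi(t-1) \bigr)^2
```

where $e_\theta$ is the KT predictor's error, $\hat e_\phi$ the imputed error, $\hat p_t$ the estimated propensity, $o_t$ the observation indicator, and $\lambda$ the smoothness weight; the first sum is the DR risk and the second the temporal smoothness regularizer.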

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below, indicating where revisions will strengthen the theoretical foundations of TSDR while preserving the core contributions.

read point-by-point responses
  1. Referee: [Abstract / TSDR objective derivation] Abstract and theoretical derivation of TSDR: The claim that adding the temporal smoothness regularizer preserves the DR unbiasedness guarantee requires an explicit argument showing that the regularizer term has zero expectation (or is orthogonal to the influence function) under correct specification of either the propensity or imputation model. In the sequential KT setting, where observations are temporally dependent, it is not immediate that a penalty on consecutive predictions or imputations satisfies this without additional assumptions on the true function; the generalization bound should be re-derived or extended to include the regularized objective.

    Authors: We agree that an explicit argument is required to rigorously establish preservation of the DR unbiasedness property. While the original derivation shows that the main DR estimating equation remains unbiased when at least one nuisance model is correct, the smoothness regularizer was introduced as a variance-control term without a separate lemma confirming its expectation is zero under correct specification. In the revised manuscript we will add a new proposition proving that, under the sequential dependence structure of KT, the regularizer term (applied to the DR estimator) has zero expectation whenever either the propensity or imputation model is correctly specified. We will also re-derive the generalization bound for the regularized objective, showing that the added term does not alter the unbiasedness while tightening the variance component. revision: yes

  2. Referee: [Generalization bound section] Generalization bound and variance analysis: The bound is said to characterize the impact of estimator variance and identify temporal smoothness as a key control. However, the interaction between the smoothness penalty and the DR estimating equation must be shown not to inflate the variance term or introduce higher-order bias terms that accumulate over time steps; without this, the bound does not directly support the joint-optimization claim.

    Authors: We acknowledge that the current bound does not explicitly analyze the interaction between the smoothness penalty and the DR estimating equation. In the revision we will extend the variance analysis to include this interaction. Specifically, we will bound the higher-order terms arising from the regularizer in the sequential setting and demonstrate that they do not inflate the leading variance term or accumulate bias over time steps, under the same temporal dependence assumptions used in the original bound. This extended analysis will directly support the claim that joint optimization reduces variance without compromising the DR guarantee. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained

full rationale

The paper invokes the standard doubly robust (DR) unbiasedness property (unbiased if either propensity or imputation model is correct) as a known result, then separately derives a generalization bound on estimator variance in the sequential KT setting and identifies temporal smoothness as a variance-control factor from that bound. The TSDR framework adds a smoothness regularizer motivated by the bound while asserting preservation of the DR guarantee. No self-definitional loops, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems imported from the same authors, or ansatzes smuggled via citation are present. The central claims reduce to standard DR theory plus an independent theoretical bound rather than to the paper's own fitted quantities or prior self-references by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; standard DR assumptions and a smoothness regularizer are implied but not detailed.

pith-pipeline@v0.9.0 · 5533 in / 963 out tokens · 29911 ms · 2026-05-11T00:53:28.292799+00:00 · methodology


Reference graph

Works this paper leans on

64 extracted references · 1 canonical work page
