pith. machine review for the scientific record.

arxiv: 2605.04713 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: unknown

Not Every Subject Should Stay: Machine Unlearning for Noisy Engagement Recognition

Alexander Vedernikov

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 17:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords machine unlearning · noisy labels · engagement recognition · subject-level unlearning · DAiSEE · EngageNet · TCCT-Net · approximate unlearning

The pith

Subject-level machine unlearning removes the influence of noisy subjects from trained engagement recognition models and recovers 89-92 percent of the performance gain from full retraining at roughly one quarter of the cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Engagement recognition datasets are subject-indexed and contain noisy subjective labels, so removing an entire problematic subject after training is a practical sanitization task. The paper tests whether approximate machine unlearning can achieve this without retraining from scratch. Using a model-dependent proxy to rank harmful subjects, a lightweight unlearning step is applied to a baseline TCCT-Net model on DAiSEE and EngageNet. In K=3 forget-set settings the resulting model recovers 89.3 percent of the oracle gain on EngageNet and 92.5 percent on DAiSEE. The benefit is largest at intermediate forget-set sizes and holds at one-quarter the retraining cost, showing that subject-level unlearning can serve as a low-cost post-hoc correction when subject selection is reliable.

Core claim

Starting from a baseline trained on all subjects, candidate harmful subjects are ranked by a model-dependent proxy, a lightweight approximate unlearning update is applied, and the result is compared to an oracle retrained from scratch on the retained subjects only; in representative K=3 forget-set settings the unlearned model recovers 89.3 percent of the oracle gain on EngageNet and 92.5 percent on DAiSEE at roughly one quarter of retraining cost.
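
The core claim does not restate how "recovers X percent of the oracle gain" is computed; the standard reading, assumed for the figures quoted here, is the ratio of improvements over the shared baseline:

    \text{recovery} = \frac{\text{score}_{\text{unlearned}} - \text{score}_{\text{baseline}}}{\text{score}_{\text{oracle}} - \text{score}_{\text{baseline}}}

Under this reading, 89.3 percent on EngageNet means the lightweight update closes 89.3 percent of the gap that full retraining on the retained subjects opens over the all-subjects baseline.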

What carries the argument

Subject-level approximate machine unlearning that ranks harmful subjects via a model-dependent proxy and then applies a lightweight update, evaluated by comparing baseline, unlearned, and oracle models on the TCCT-Net architecture.
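
A minimal Python sketch of that three-model protocol follows; train, evaluate, proxy_score, and unlearn_update are hypothetical caller-supplied stand-ins for the paper's TCCT-Net training loop, engagement metric, harmfulness proxy, and approximate unlearning step, not the authors' implementation.

    def run_protocol(subjects, data_by_subject, train, evaluate, proxy_score,
                     unlearn_update, K=3):
        """Baseline / unlearned / oracle comparison described in the pith.

        train, evaluate, proxy_score, and unlearn_update are caller-supplied
        stand-ins (hypothetical here) for TCCT-Net training, the engagement
        metric, the harmfulness proxy, and the approximate unlearning step.
        """
        def pool(subset):
            return [sample for s in subset for sample in data_by_subject[s]]

        baseline = train(pool(subjects))          # trained on every subject

        # Rank subjects by the model-dependent proxy; the top-K form the forget set.
        ranked = sorted(subjects,
                        key=lambda s: proxy_score(baseline, data_by_subject[s]),
                        reverse=True)
        forget, retain = ranked[:K], ranked[K:]

        # Lightweight post-hoc unlearning versus an oracle retrained from scratch
        # on the retained subjects only.
        unlearned = unlearn_update(baseline, pool(forget), pool(retain))
        oracle = train(pool(retain))

        b, u, o = evaluate(baseline), evaluate(unlearned), evaluate(oracle)
        return (u - b) / (o - b)                  # fraction of oracle gain recovered

At K=3 the paper reports this fraction as 0.893 on EngageNet and 0.925 on DAiSEE.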

If this is right

  • Effectiveness peaks at an intermediate forget-set size across the tested small-audit regimes.
  • Approximate subject-level unlearning functions as a practical low-cost correction for noisy engagement datasets.
  • Gains depend on subject-selection quality and the chosen removal regime.
  • The method operates post-hoc on already-trained models without requiring full retraining.
  • Comparable recovery rates appear on both the EngageNet and DAiSEE datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same proxy-plus-unlearning pipeline could be tested on other noisy-label domains such as emotion recognition or clinical image annotation.
  • Embedding the approach in a deployed system would allow ongoing dataset sanitization without periodic full retraining.
  • Adaptive selection of which subjects to forget could be driven by live performance monitoring rather than a static proxy.
  • Lower retraining cost would make repeated refinement cycles feasible for teams with limited compute.

Load-bearing premise

The model-dependent proxy must correctly identify harmful subjects whose removal improves performance, and the approximate unlearning update must remove their influence without unintended degradation on the retained subjects.
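
The material above never pins down the proxy, so the following is only an illustration of what a model-dependent ranking could look like, using per-subject validation loss (one of the simpler alternatives the referee names below) as an assumed stand-in; the helper names are hypothetical, not the authors' definition.

    import torch
    import torch.nn.functional as F

    def rank_subjects_by_loss(model, loaders_by_subject, device="cpu"):
        """Hypothetical harmfulness proxy: mean validation loss per subject
        under the trained baseline. Subjects the model fits worst are ranked
        first as forget-set candidates. Not the paper's proxy definition."""
        model.eval()
        scores = {}
        with torch.no_grad():
            for subject, loader in loaders_by_subject.items():
                losses = []
                for clips, labels in loader:
                    logits = model(clips.to(device))
                    losses.append(F.cross_entropy(logits, labels.to(device)).item())
                scores[subject] = sum(losses) / max(len(losses), 1)
        return sorted(scores, key=scores.get, reverse=True)

Whether such a ranking isolates genuinely harmful subjects, rather than merely hard ones, is exactly the premise the headline claim leans on.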

What would settle it

An experiment in which the unlearned model shows no gain over the baseline or falls substantially short of the oracle on held-out retained-subject data under the same K=3 removal protocol.

read the original abstract

Engagement recognition datasets are typically subject-indexed and often contain noisy, subjective supervision, making post-hoc dataset revision a practical problem. Existing noisy-label and data-cleaning methods largely operate at the sample level before or during training, but do not directly address a different question: once a model has already been trained, can the influence of an entire problematic subject be removed without full retraining? We study this setting through subject-level machine unlearning as a post-hoc sanitization mechanism for engagement recognition. Starting from a baseline trained on all subjects, we rank candidate harmful subjects using a model-dependent proxy, apply a lightweight approximate unlearning update, and compare the result against an oracle model retrained from scratch on the retained subjects only. We instantiate this protocol on DAiSEE and EngageNet using Tensor-Convolution and Convolution-Transformer Network (TCCT-Net) as a fixed platform and evaluate three matched model states under the same removal scenario: baseline, unlearned, and oracle. In representative K=3 forget-set settings, the unlearned model recovers 89.3% and 92.5% of the oracle gain on EngageNet and DAiSEE, respectively, at roughly one quarter of retraining cost. Across the tested small-audit regimes, effectiveness is strongest at an intermediate forget-set size, indicating that approximate subject-level unlearning is a useful low-cost correction mechanism, but one whose benefit depends on subject selection quality and removal regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that subject-level machine unlearning can serve as a practical post-hoc sanitization tool for engagement recognition models trained on noisy, subject-indexed datasets. Starting from a baseline model, subjects are ranked by a model-dependent proxy for harmfulness; an approximate unlearning update is then applied to remove the top-K subjects, and the result is compared to an oracle model retrained from scratch on the retained subjects only. On DAiSEE and EngageNet using TCCT-Net, the unlearned model recovers 89.3% and 92.5% of the oracle performance gain for representative K=3 forget sets at roughly one-quarter the retraining cost, with strongest benefits at intermediate forget-set sizes.

Significance. If the results hold, the work offers a low-cost alternative to full retraining for correcting subject-level noise in engagement recognition, a setting where datasets are often subjective and revision is needed after initial training. The concrete recovery percentages, direct oracle comparison, and cost ratios provide clear, falsifiable evidence of practical utility in small-audit regimes. This could influence deployment practices in computer vision applications where retraining budgets are limited.

major comments (2)
  1. [Abstract and Experimental Evaluation section] The protocol reports 89.3% and 92.5% oracle-gain recovery for K=3 but contains no control that applies the same approximate unlearning update to randomly selected subjects instead of proxy-ranked ones. Without this baseline, the headline recovery numbers cannot be attributed to the unlearning step rather than to the proxy simply surfacing subjects whose removal improves performance regardless of method (see the sketch after these comments).
  2. [Method section (proxy definition)] The model-dependent proxy used to rank harmful subjects is described only at a high level, with no explicit formula, pseudocode, or ablation against simpler alternatives such as per-subject validation loss or gradient norm. This detail is load-bearing for the claim that the overall pipeline isolates the benefit of approximate unlearning.
minor comments (2)
  1. [Abstract] The acronym TCCT-Net is introduced without a one-sentence description of its architecture or a citation, which would aid readers unfamiliar with the fixed platform.
  2. [Results presentation] The recovery percentages are given as point estimates, with no error bars, standard deviations, or mention of the number of random seeds or statistical tests used to establish the 89.3% and 92.5% figures.
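
The control the first major comment asks for would draw forget sets of size K at random and push them through the identical unlearning step; if proxy-ranked recovery does not clearly beat that distribution, the headline numbers reflect removal per se rather than the proxy. A minimal sketch, reusing the hypothetical unlearn_update and evaluate stand-ins from the protocol sketch above:

    import random

    def random_forget_control(baseline, subjects, data_by_subject,
                              unlearn_update, evaluate, K=3, trials=10, seed=0):
        """Apply the same unlearning update to randomly drawn forget sets of
        size K and collect the resulting scores for comparison against the
        proxy-ranked run. All helper callables are hypothetical stand-ins."""
        rng = random.Random(seed)

        def pool(subset):
            return [sample for s in subset for sample in data_by_subject[s]]

        scores = []
        for _ in range(trials):
            forget = rng.sample(list(subjects), K)
            retain = [s for s in subjects if s not in forget]
            scores.append(evaluate(unlearn_update(baseline, pool(forget), pool(retain))))
        return scores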

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. We address each point below and will incorporate the suggested clarifications and controls to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and Experimental Evaluation section] The protocol reports 89.3% and 92.5% oracle-gain recovery for K=3 but contains no control that applies the same approximate unlearning update to randomly selected subjects instead of proxy-ranked ones. Without this baseline, the headline recovery numbers cannot be attributed to the unlearning step rather than to the proxy simply surfacing subjects whose removal improves performance regardless of method.

    Authors: We agree that a random-selection control is a valuable addition to isolate the role of the proxy. The existing design shows that approximate unlearning, when applied to proxy-identified subjects, recovers most of the performance gain obtained by exact oracle retraining on the retained set. This directly quantifies the fidelity of the unlearning step for those subjects. To further demonstrate that the proxy is effective at surfacing subjects whose removal is beneficial (rather than any removal), we will add experiments applying the identical unlearning procedure to randomly chosen forget sets of the same size and report the corresponding recovery percentages in the revised Experimental Evaluation section. revision: yes

  2. Referee: [Method section (proxy definition)] The model-dependent proxy used to rank harmful subjects is described only at a high level, with no explicit formula, pseudocode, or ablation against simpler alternatives such as per-subject validation loss or gradient norm. This detail is load-bearing for the claim that the overall pipeline isolates the benefit of approximate unlearning.

    Authors: We acknowledge that the proxy is currently described at a high level. The proxy is a model-dependent quantity computed from the trained baseline to estimate each subject's contribution to performance degradation. In the revision we will supply the explicit formula, pseudocode for ranking and the subsequent unlearning update, and an ablation that compares the proxy against per-subject validation loss and gradient-norm alternatives. These additions will be placed in the Method section and will be accompanied by the corresponding experimental results. revision: yes
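
The promised pseudocode is not in the material above, so the exact update remains unspecified. One common family of approximate unlearning updates alternates gradient ascent on the forget data with a short repair fine-tune on retained data; the PyTorch-style sketch below shows that generic recipe only (an assumption, not the authors' procedure; all hyperparameters are placeholders).

    import copy
    import torch
    import torch.nn.functional as F

    def approximate_unlearn(baseline, forget_loader, retain_loader,
                            ascent_steps=50, repair_steps=200, lr=1e-4, device="cpu"):
        """Generic ascent-plus-repair unlearning sketch (assumed, not the paper's
        update): push the model away from the forget subjects, then briefly
        fine-tune on retained data to repair collateral damage."""
        model = copy.deepcopy(baseline).to(device)
        model.train()
        opt = torch.optim.SGD(model.parameters(), lr=lr)

        def run(loader, steps, sign):
            it = iter(loader)
            for _ in range(steps):
                try:
                    clips, labels = next(it)
                except StopIteration:
                    it = iter(loader)
                    clips, labels = next(it)
                opt.zero_grad()
                loss = sign * F.cross_entropy(model(clips.to(device)), labels.to(device))
                loss.backward()
                opt.step()

        run(forget_loader, ascent_steps, sign=-1.0)   # gradient ascent on the forget set
        run(retain_loader, repair_steps, sign=+1.0)   # repair fine-tune on retained data
        return model

Whatever the authors' actual update is, the ablation the response promises is what would separate its contribution from the proxy's.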

Circularity Check

0 steps flagged

No significant circularity; empirical recovery percentages are measured against an independent oracle retrain.

full rationale

The paper's protocol trains a baseline, ranks subjects with a model-dependent proxy, applies approximate unlearning, and reports recovery of oracle gain (89.3% and 92.5%) where the oracle is a full retrain from scratch on the retained subjects. This comparison is external to the unlearning step and proxy choice. No equations, fitted parameters, or self-citations are presented that reduce the reported percentages to the inputs by construction. The central result is a standard empirical benchmark rather than a self-referential derivation. Minor self-citation (if present) is not load-bearing for the headline claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard machine-learning assumptions about influence functions and the validity of approximate unlearning updates; no free parameters, invented entities, or ad-hoc axioms are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Approximate unlearning updates can remove subject influence without full retraining
    Invoked as the core mechanism enabling the low-cost correction.

pith-pipeline@v0.9.0 · 5554 in / 1127 out tokens · 37781 ms · 2026-05-08T17:30:49.975741+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

35 extracted references · 20 canonical work pages · 3 internal anchors

  1. Gupta, A., D'Cunha, A., Awasthi, K., Balasubramanian, V.N.: DAiSEE: Towards user engagement recognition in the wild. arXiv preprint arXiv:1609.01885 (2016)
  2. Singh, M., Hoque, X., Zeng, D., Wang, Y., Ikeda, K., Dhall, A.: Do I have your attention: A large scale engagement prediction dataset and baselines. In: Proceedings of the 2023 International Conference on Multimodal Interaction (2023). https://doi.org/10.1145/3577190.3614164
  3. Kumar, P., Vedernikov, A., Chen, Y., Zheng, W., Li, X.: Computational Analysis of Stress, Depression and Engagement in Mental Health: A Survey (2025). https://arxiv.org/abs/2403.08824
  4. Wu, C.-H., Liu, S.-Y., Huang, X., Wang, X., Zhang, R., Minciullo, L., Yiu, W.K., Kwan, K., Cheng, K.-T.: CMOSE: Comprehensive multi-modality online student engagement dataset with high-quality labels. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 4636–4645 (2024)
  5. Vedernikov, A.: PriorNet: Prior-Guided Engagement Estimation from Face Video (2026). https://arxiv.org/abs/2605.03615
  6. Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I.W., Sugiyama, M.: Co-teaching: Robust training of deep neural networks with extremely noisy labels. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
  7. Northcutt, C.G., Jiang, L., Chuang, I.L.: Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research 70, 1373–1411 (2021)
  8. Song, H., Kim, M., Park, D., Shin, Y., Lee, J.-G.: Learning from noisy labels with deep neural networks: A survey. IEEE Transactions on Neural Networks and Learning Systems 34(11), 8135–8153 (2023). https://doi.org/10.1109/TNNLS.2022.3152527
  9. Koh, P.W., Liang, P.: Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1885–1894. PMLR (2017)
  10. Swayamdipta, S., Schwartz, R., Lourie, N., Wang, Y., Hajishirzi, H., Smith, N.A., Choi, Y.: Dataset cartography: Mapping and diagnosing datasets with training dynamics. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9275–9293. Association for Computational Linguistics, Online (2020)
  11. Cao, Y., Yang, J.: Towards making systems forget with machine unlearning. In: 2015 IEEE Symposium on Security and Privacy, pp. 463–480 (2015). https://doi.org/10.1109/SP.2015.35
  12. Guo, C., Goldstein, T., Hannun, A., Maaten, L.: Certified data removal from machine learning models. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 3832–3842. PMLR, Virtual (2020)
  13. Bourtoule, L., Chandrasekaran, V., Choquette-Choo, C.A., Jia, H., Travers, A., Zhang, B., Lie, D., Papernot, N.: Machine unlearning. In: 2021 IEEE Symposium on Security and Privacy (SP), pp. 141–159 (2021). https://doi.org/10.1109/SP40001.2021.00019
  14. Wang, W., Tian, Z., Yu, S.: Machine unlearning: A comprehensive survey. arXiv preprint arXiv:2405.07406 (2024)
  15. Golatkar, A., Achille, A., Soatto, S.: Eternal sunshine of the spotless net: Selective forgetting in deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9304–9312 (2020)
  16. Vedernikov, A., Kumar, P., Chen, H., Seppänen, T., Li, X.: TCCT-Net: Two-stream network architecture for fast and efficient engagement estimation via behavioral feature signals. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 4723–4732 (2024). https://doi.org/10.1109/CVPRW63382.2024.00475
  17. Su, R., He, L., Luo, M.: Leveraging part-and-sensitive attention network and transformer for learner engagement detection. Alexandria Engineering Journal 107, 198–204 (2024). https://doi.org/10.1016/j.aej.2024.06.074
  18. Mandia, S., Mitharwal, R., Singh, K.: Automatic student engagement measurement using machine learning techniques: A literature study of data and methods. Multimedia Tools and Applications 83, 49641–49672 (2024). https://doi.org/10.1007/s11042-023-17534-9
  19. Alarefah, W., Kammoun Jarraya, S., Abuzinadah, N.: Transformer-based student engagement recognition using few-shot learning. Computers 14(3), 109 (2025). https://doi.org/10.3390/computers14030109
  20. Gandhi, S., Fadia, A., Agrawal, R., Agrawal, S., Kumar, P.: MuOE: A multi-task ordinality aware approach towards engagement detection. In: Pattern Recognition and Machine Intelligence: 10th International Conference, PReMI 2023, Kolkata, India, December 7–10, 2023, Proceedings. Lecture Notes in Computer Science, vol. 14301, pp. 70–79 (2023)
  21. Dresvyanskiy, D., Karpov, A., Minker, W.: A cross-multi-modal fusion approach for enhanced engagement recognition. In: Speech and Computer: 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25–28, 2024, Proceedings, Part II. Lecture Notes in Computer Science, vol. 15300, pp. 3–17 (2024). https://doi.org/10.1007/978-3-031-78014-1_1
  22. Vedernikov, A., Sun, Z., Kykyri, V.-L., Pohjola, M., Nokia, M., Li, X.: Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 389–399. IEEE Computer Society, Los Alamitos, CA, USA (2024)
  23. Ren, M., Zeng, W., Yang, B., Urtasun, R.: Learning to reweight examples for robust deep learning. In: Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 4334–4343. PMLR, Stockholm, Sweden (2018). https://proceedings.mlr.press/v80/ren18a.html
  24. Vedernikov, A., Kumar, P., Chen, H., Seppänen, T., Li, X.: Vision Large Language Models Are Good Noise Handlers in Engagement Analysis (2025). https://arxiv.org/abs/2511.14749
  25. Ghorbani, A., Zou, J.: Data Shapley: Equitable valuation of data for machine learning. In: Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 2242–2251. PMLR, Long Beach, California, USA (2019). https://proceedings.mlr.press/v97/ghorbani19c.html
  26. Toneva, M., Sordoni, A., Combes, R., Trischler, A., Bengio, Y., Gordon, G.J.: An empirical study of example forgetting during deep neural network learning. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=BJlxm30cKm
  27. Ginart, A.A., Guan, M.Y., Valiant, G., Zou, J.: Making AI forget you: Data deletion in machine learning. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
  28. Neel, S., Roth, A., Sharifi-Malvajerdi, S.: Descent-to-delete: Gradient-based methods for machine unlearning. In: Proceedings of the 32nd International Conference on Algorithmic Learning Theory. Proceedings of Machine Learning Research, vol. 132, pp. 931–962. PMLR, Virtual Conference, Worldwide (2021). https://proceedings.mlr.press/v132/neel21a.html
  29. Graves, L., Nagisetty, V., Ganesh, V.: Amnesiac machine learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11516–11524 (2021). https://doi.org/10.1609/aaai.v35i13.17371
  30. Seo, J., Lee, S.-H., Lee, T.-Y., Moon, S., Park, G.-M.: Generative unlearning for any identity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9151–9161 (2024). https://doi.org/10.1109/CVPR52733.2024.00874
  31. Liu, Y., Xiao, Z., Wang, X., Tang, W., Shi, J., Ye, K., Xu, C.-Z.: Learn to forget: User-level memorization elimination in federated learning. arXiv preprint arXiv:2003.10933 (2020)
  32. Halimi, A., Kadhe, S., Rawat, A.S., Baracaldo, N., Oprea, A., Lee, J., Tremblay, C., Nandwani, Y.: Federated unlearning: How to efficiently erase a client in FL? arXiv preprint arXiv:2207.05521 (2022)
  33. Thudi, A., Jia, H., Shumailov, I., Papernot, N.: On the necessity of auditable algorithmic definitions for machine unlearning. In: 31st USENIX Security Symposium (USENIX Security 22), pp. 4007–4022. USENIX Association, Boston, MA, USA (2022)
  34. Pleiss, G., Zhang, T., Elenberg, E.R., Weinberger, K.Q.: Identifying mislabeled data using the area under the margin ranking. In: Advances in Neural Information Processing Systems, vol. 33 (2020). https://proceedings.neurips.cc/paper/2020/hash/c6102b3727b2a7d8b1bb6981147081ef-Abstract.html
  35. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)