SafeScreen: A Safety-First Screening Framework for Personalized Video Retrieval for Vulnerable Users
Pith reviewed 2026-05-15 11:20 UTC · model grok-4.3
The pith
SafeScreen screens videos against each user's individual safety rules before any content is shown.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SafeScreen retrieves and presents personalized videos by first deriving individualized safety criteria from a user profile, then performing sequential approval through adaptive question generation, multimodal VideoRAG evidence collection, and LLM-based verification of safety, appropriateness, and relevance; the result is an explainable decision for each candidate that prioritizes constraint satisfaction over engagement signals.
What carries the argument
The sequential approval pipeline that extracts profile-driven safety criteria and verifies them via adaptive question generation plus multimodal video analysis before any exposure occurs.
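The sequential approval idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `Decision`, `screen_sequentially`, the three callables, and the toy transcripts are all hypothetical names, and plain substring checks stand in for the LLM question generation, VideoRAG evidence collection, and LLM verification steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Decision:
    video_id: str
    approved: bool
    reasons: List[str] = field(default_factory=list)  # explanation trail


def screen_sequentially(
    candidates: Iterable[str],
    criteria: List[str],
    make_question: Callable[[str], str],
    collect_evidence: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    max_approved: int = 1,
) -> List[Decision]:
    """Approve or reject candidates one at a time; nothing is ranked."""
    decisions: List[Decision] = []
    approved_count = 0
    for video_id in candidates:
        reasons: List[str] = []
        approved = True
        for criterion in criteria:
            question = make_question(criterion)
            evidence = collect_evidence(video_id, question)
            passed = verify(criterion, evidence)
            reasons.append(f"{criterion}: {'pass' if passed else 'fail'}")
            if not passed:
                approved = False
                break  # one violated constraint is enough to reject
        decisions.append(Decision(video_id, approved, reasons))
        if approved:
            approved_count += 1
        if approved_count >= max_approved:
            break  # stop once enough videos have been cleared
    return decisions


# Toy stand-ins (assumptions) so the sketch runs without any model backend.
TOY_TRANSCRIPTS = {
    "v1": "a calm walk through a 1960s seaside town",
    "v2": "loud sirens and a hospital emergency scene",
}


def toy_question(criterion: str) -> str:
    return f"Does the video contain {criterion}?"


def toy_evidence(video_id: str, question: str) -> str:
    return TOY_TRANSCRIPTS[video_id]


def toy_verify(criterion: str, evidence: str) -> bool:
    # A criterion here is a term the profile says to avoid.
    return criterion not in evidence


decisions = screen_sequentially(
    ["v2", "v1"], ["sirens", "hospital"], toy_question, toy_evidence, toy_verify
)
```

Each rejected candidate keeps the check that triggered the rejection, which mirrors the explainable per-video decision described above.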
If this is right
- Candidate videos are approved or rejected one at a time rather than ranked by popularity or relevance.
- The output list diverges from engagement-optimized rankings in the large majority of test cases.
- Safety, sensibleness, and groundedness scores remain high when checked by both automated and human evaluators.
- The method works on uncurated repositories without needing precomputed safety labels for each video.
- The same pipeline supports different care contexts by swapping the profile criteria.
Where Pith is reading between the lines
- The framework could be applied to other domains such as educational video selection for young learners if new safety criteria are defined.
- Real-time profile updates would allow the screening decisions to adapt as a user's needs or sensitivities change over time.
- Integration into existing platforms would shift the default from engagement-first to constraint-first retrieval for designated vulnerable accounts.
Load-bearing premise
LLM-based decisions guided by adaptive questions and multimodal analysis will catch harmful content and avoid approving unsafe videos for the specific user profile.
What would settle it
A controlled test in which domain experts review a set of videos containing subtle risks and check whether the system approves any of those videos or rejects clearly safe ones that meet the stated profile criteria.
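Such a test reduces to two error rates: how often the system approves a video the experts marked unsafe, and how often it rejects a video they marked safe. The sketch below shows one way to compute them; the function name, the dictionary layout, and the four labelled videos are hypothetical and carry no results from the paper.

```python
from typing import Dict


def screening_error_rates(
    expert_safe: Dict[str, bool], system_approved: Dict[str, bool]
) -> Dict[str, float]:
    """Compare system decisions with expert safety labels for one profile."""
    unsafe_ids = [v for v, is_safe in expert_safe.items() if not is_safe]
    safe_ids = [v for v, is_safe in expert_safe.items() if is_safe]
    # False approval: experts say unsafe, system approved anyway.
    false_approvals = sum(1 for v in unsafe_ids if system_approved[v])
    # False rejection: experts say safe, system rejected.
    false_rejections = sum(1 for v in safe_ids if not system_approved[v])
    return {
        "false_approval_rate": false_approvals / len(unsafe_ids) if unsafe_ids else 0.0,
        "false_rejection_rate": false_rejections / len(safe_ids) if safe_ids else 0.0,
    }


# Hypothetical labels for four videos.
expert_safe = {"a": True, "b": True, "c": False, "d": False}
system_approved = {"a": True, "b": False, "c": True, "d": False}
rates = screening_error_rates(expert_safe, system_approved)
```

For a vulnerable user the false approval rate is the one that matters most, since it counts harmful content that would actually be shown.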
Abstract
Open-domain video platforms offer rich, personalized content that could support health, caregiving, and educational applications, but their engagement-optimized recommendation algorithms can expose vulnerable users to inappropriate or harmful material. These risks are especially acute in child-directed and care settings (e.g., dementia care), where content must satisfy individualized safety constraints before being shown. We introduce SafeScreen, a safety-first video screening framework that retrieves and presents personalized video while enforcing individualized safety constraints. Rather than ranking videos by relevance or popularity, SafeScreen treats safety as a prerequisite and performs sequential approval or rejection of candidate videos through an automated pipeline. SafeScreen integrates three key components: (i) profile-driven extraction of individualized safety criteria, (ii) evidence-grounded assessments via adaptive question generation and multimodal VideoRAG analysis, and (iii) LLM-based decision-making that verifies safety, appropriateness, and relevance before content exposure. This design enables explainable, real-time screening of uncurated video repositories without relying on precomputed safety labels. We evaluate SafeScreen in a dementia-care reminiscence case study using 30 synthetic patient profiles and 90 test queries. Results demonstrate that SafeScreen prioritizes safety over engagement, diverging from YouTube's engagement-optimized rankings in 80-93% of cases, while maintaining high levels of safety coverage, sensibleness, and groundedness, as validated by both LLM-based evaluation and domain experts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SafeScreen, a safety-first screening framework for personalized video retrieval aimed at vulnerable users (e.g., dementia care). It extracts individualized safety criteria from user profiles, performs evidence-grounded assessments via adaptive question generation and multimodal VideoRAG, and uses LLM-based decision-making to approve or reject candidate videos before exposure. Rather than optimizing for engagement, the system treats safety as a prerequisite. Evaluation on 30 synthetic patient profiles and 90 test queries reports 80-93% divergence from YouTube's engagement-optimized rankings while claiming high safety coverage, sensibleness, and groundedness, validated by LLM-as-judge metrics and domain experts.
Significance. If the core pipeline reliably enforces individualized constraints without missing harmful content, SafeScreen could enable safer deployment of open video platforms in caregiving and educational settings. The design's emphasis on explainable, real-time screening without precomputed labels is a constructive contribution, but the current evaluation's dependence on synthetic profiles and internal LLM judgments provides limited evidence that the approach generalizes to real individualized safety needs.
major comments (2)
- [Evaluation] Evaluation section: safety coverage, sensibleness, and groundedness are defined and scored by the same LLM pipeline used in the screening system itself, creating circularity that does not independently measure missed harmful content or false approvals on the 90 test queries.
- [Evaluation] Evaluation section: the headline claim of reliable individualized safety verification rests on 30 synthetic profiles and LLM/expert judgments with no real-user validation, no baseline comparisons to other safety filters, and no quantification of LLM judgment error, which is load-bearing for the assertion that the framework works in dementia-care settings.
minor comments (2)
- [Abstract] Abstract and Evaluation: the 80-93% divergence range should be reported with per-profile or per-query breakdowns and confidence intervals rather than as a single aggregate.
- [Evaluation] The manuscript should clarify the exact prompting strategy and model versions used for both the screening pipeline and the LLM-as-judge evaluation to allow reproducibility.
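The breakdown and confidence interval requested in the first minor comment could look like the sketch below. It assumes each query yields a single boolean divergence flag and uses a Wilson score interval, which is one reasonable choice among several; the profile names and flags are hypothetical and are not data from the paper.

```python
import math
from typing import Dict, List, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical flags: True means the screened result differed from the
# engagement-optimized ranking for that query.
divergence_flags: Dict[str, List[bool]] = {
    "profile_01": [True, True, False],
    "profile_02": [True, True, True],
    "profile_03": [False, True, True],
}

# Per-profile divergence rates.
per_profile = {p: sum(flags) / len(flags) for p, flags in divergence_flags.items()}

# Pooled rate with its interval.
all_flags = [f for flags in divergence_flags.values() for f in flags]
low, high = wilson_interval(sum(all_flags), len(all_flags))
```

Pooling queries this way treats them as independent, which is itself an assumption: queries from the same profile are likely correlated, so a per-profile bootstrap may be the more cautious option.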
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the evaluation of SafeScreen. We address each major comment point by point below, with revisions incorporated where feasible to improve clarity and evidence.
- Referee: [Evaluation] Evaluation section: safety coverage, sensibleness, and groundedness are defined and scored by the same LLM pipeline used in the screening system itself, creating circularity that does not independently measure missed harmful content or false approvals on the 90 test queries.
  Authors: We acknowledge the risk of circularity when the same LLM pipeline contributes to both screening decisions and automated evaluation metrics. The original manuscript already includes independent validation by domain experts on a subset of the 90 queries, which we have now expanded in the revised version with a dedicated subsection detailing expert agreement rates, inter-rater reliability, and specific cases where expert review overrode or confirmed LLM outputs. This provides an external check on missed harmful content and false approvals. We have also added a limitations paragraph discussing LLM-as-judge biases.
  revision: partial
- Referee: [Evaluation] Evaluation section: the headline claim of reliable individualized safety verification rests on 30 synthetic profiles and LLM/expert judgments with no real-user validation, no baseline comparisons to other safety filters, and no quantification of LLM judgment error, which is load-bearing for the assertion that the framework works in dementia-care settings.
  Authors: We agree that reliance on 30 synthetic profiles constitutes a limitation for claims about real dementia-care deployment. The revised manuscript now includes explicit baseline comparisons against rule-based keyword filters and simple multimodal classifiers, with quantitative results showing SafeScreen's divergence and safety gains. We have also added quantification of LLM judgment error via agreement statistics with the domain experts (e.g., Cohen's kappa and disagreement cases). Real-user validation with vulnerable populations is not feasible within the scope of this work due to ethical and IRB constraints; we explicitly frame the current study as a controlled proof-of-concept and outline planned clinical trials as future work.
  revision: partial
- Real-user validation with actual vulnerable users (e.g., dementia patients) due to ethical and regulatory requirements
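The rebuttal names Cohen's kappa as the agreement statistic between LLM and expert judgments. A minimal sketch of that computation follows; the function name and the four example labels are hypothetical, and returning 1.0 when both raters use a single identical label throughout is a convention chosen here, not something the source specifies.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Expected agreement if each rater labelled at random with their own
    # marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on four videos.
expert = ["approve", "approve", "reject", "reject"]
llm = ["approve", "reject", "reject", "reject"]
kappa = cohens_kappa(expert, llm)
```

Here the raters agree on three of four items (0.75 observed) against 0.5 expected by chance, so kappa is 0.5. Raw agreement alone would overstate how well the LLM tracks the experts.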
Circularity Check
LLM-based evaluation of safety decisions shares the same model class as the screening pipeline, risking circular overestimation of reliability
specific steps
- other: [Abstract (Results paragraph)]
"Results demonstrate that SafeScreen prioritizes safety over engagement, diverging from YouTube's engagement-optimized rankings in 80-93% of cases, while maintaining high levels of safety coverage, sensibleness, and groundedness, as validated by both LLM-based evaluation and domain experts."
Safety coverage, sensibleness, and groundedness are defined and scored via the same LLM-based decision-making and adaptive question generation used inside the SafeScreen pipeline itself; the evaluator therefore risks reproducing the pipeline's own reasoning patterns rather than providing an independent check on missed harmful content or false approvals.
full rationale
The paper's central results (80-93% divergence from YouTube plus high safety coverage/sensibleness/groundedness) rest on LLM-as-judge validation of the outputs produced by an LLM-driven pipeline (profile extraction, adaptive question generation, VideoRAG analysis, and decision-making). While divergence from YouTube rankings can be measured externally, the safety metrics are generated and scored inside the same LLM reasoning loop on synthetic profiles, creating partial circularity in the validation of individualized safety enforcement. No equations or self-citations reduce the derivation by construction, so the circularity is moderate rather than total.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can generate accurate, grounded safety and appropriateness judgments from video content and user profiles
Reference graph
Works this paper leans on
-
[1]
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, et al. 2022. Flamingo: A visual language model for few-shot learning. In Advances in Neural Information Processing Systems, Vol. 35. 23716–23736
-
[3]
Jan Batzner, Volker Stocker, Bingjun Tang, Anusha Natarajan, Qinhao Chen, Stefan Schmid, and Gjergji Kasneci. 2025. Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency. In Proceedings of the Eighth AAAI/ACM Conference on AI, Ethics, and Society. AAAI, 343–354
-
[4]
Gary S. Collins, Karel G. M. Moons, Paula Dhiman, Richard D. Riley, Andrew L. Beam, Ben Van Calster, et al. 2024. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 385 (2024), e078378. doi:10.1136/bmj-2024-078378
-
[5]
Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In Proceedings of the Tenth ACM Conference on Recommender Systems. 191–198. doi:10.1145/2959100.2959190
-
[6]
Norah L Crossnohere, Mohamed Elsaid, Jonathan Paskett, Seuli Bose-Brill, and John F P Bridges. 2022. Guidelines for artificial intelligence in medicine: literature review and content analysis of frameworks. Journal of Medical Internet Research 24, 8 (2022), e36823. doi:10.2196/36823
-
[7]
James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, and Dasarathi Sampath. 2010. The YouTube video recommendation system. In Proceedings of the Fourth ACM Conference on Recommender Systems. 293–296. doi:10.1145/1864708.1864770
-
[8]
Anne A H de Hond, Artuur M Leeuwenberg, Lotty Hooft, Ilse M J Kant, Steven W J Nijman, Hendrikus J A van Os, Jiska J Aardoom, Thomas P A Debray, Ewoud Schuit, Maarten van Smeden, Johannes B Reitsma, Ewout W Steyerberg, Niels H Chavannes, and Karel G M Moons. 2022. Guidelines and quality criteria for artificial intelligence-based prediction models in healt...
-
[9]
Kathleen K. Fitzpatrick, Alison Darcy, and Molly Vierhile. 2017. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health 4, 2 (2017), e19. doi:10.2196/mental.7785
-
[10]
Google LLC. 2015. YouTube Kids. https://www.youtubekids.com/. Accessed: 2025-01
-
[11]
Robert Gorwa, Reuben Binns, and Christian Katzenbach. 2020. Algorithmic content moderation: Technical and political challenges in the automation of platform governance. Big Data & Society 7, 1 (2020), 2053951719897945. doi:10.1177/2053951719897945
-
[12]
Kilem L. Gwet. 2014. Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters (4 ed.). Advanced Analytics, LLC, Gaithersburg, MD
-
[13]
Becky Inkster, Shubham Sarda, and Vinod Subramanian. 2018. An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being: Real-world data evaluation mixed-methods study. JMIR mHealth and uHealth 6, 11 (2018), e12106. doi:10.2196/mhealth.9785
-
[14]
Rishabh Kaushal, Jacob van de Kerkhof, Catalina Goanta, Gerasimos Spanakis, and Adriana Iamnitchi. 2024. Automated Transparency: A Legal and Empirical Analysis of the Digital Services Act Transparency Database. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1121–1132. doi:10.1145/3630106.3658960
-
[15]
Jean-Baptiste Lamy, Abdelmalek Mouazer, Romain Léguillon, Romain Lelong, Stéfan J Darmoni, Karima Sedki, Sophie Dubois, and Hector Falcoff. 2024. Adaptive questionnaires for facilitating patient data entry in clinical decision support systems: methods and application to STOPP/START v2. BMC Medical Informatics and Decision Making 24, 1 (2024), 326. doi:10....
-
[16]
J. Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33, 1 (1977), 159–174
-
[17]
Amanda Lazar, Caroline Edasis, and Anne Marie Piper. 2017. A critical lens on dementia and design in HCI. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 2175–2188. doi:10.1145/3025453.3025638
-
[18]
Hao Li, Shuai Wu, Haoran Zheng, Xiaobo Jiang, Bo Jiang, and Chao Zhao. 2024. LLMs-as-judges: A comprehensive survey on LLM-based evaluation methods. arXiv preprint arXiv:2412.05579 (2024)
-
[19]
Yu-Ju Liao, Yu-Ling Jao, Marie Boltz, Olusegun T. Adekeye, Daniel Berish, Feng Yuan, and Xiaopeng Zhao. 2023. Use of a humanoid robot in supporting dementia care: A qualitative analysis. SAGE Open Nursing 9 (2023), 23779608231179528. doi:10.1177/23779608231179528
-
[20]
Sonia Livingstone and Ellen J. Helsper. 2008. Parental mediation of children’s internet use. Journal of Broadcasting & Electronic Media 52, 4 (2008), 581–599. doi:10.1080/08838150802437396
-
[21]
Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald. 2020. On faithfulness and factuality in abstractive summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 1906–1919
-
[22]
Junsoo Park, Seungyeon Jwa, Meiying Ren, Daeyoung Kim, and Sanghyuk Choi
- [23]
-
[24]
Xubin Ren, Lingrui Xu, Long Xia, Shuaiqiang Wang, Dawei Yin, and Chao Huang. 2025. VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos. arXiv:2502.01549 [cs.IR] https://arxiv.org/abs/2502.01549
-
[26]
Anna Riedmann, Philipp Schaper, and Birgit Lugrin. 2025. Reinforcement learning in education: A systematic literature review. International Journal of Artificial Intelligence in Education 35 (2025), 1–65. doi:10.1007/s40593-025-00494-6
-
[27]
Landon Ring, Liyan Shi, Kayla Totzke, and Timothy Bickmore. 2015. Social support agents for older adults: Longitudinal affective computing in the home. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction. 551–557. doi:10.1109/ACII.2015.7344662
-
[28]
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. 2019. VideoBERT: A joint model for video and language representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7463–. doi:10.1109/ICCV.2019.00757
-
[31]
Zhulin Tao, Xiaohao Liu, Yewei Xia, Xiang Wang, Lifang Yang, Xianglin Huang, and Tat-Seng Chua. 2023. Self-supervised learning for multimedia recommendation. IEEE Transactions on Multimedia 25 (2023), 5107–5116. doi:10.1109/TMM.2022.3177882
-
[32]
Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vince...
-
[33]
Jun Wang and Ying Zhao. 2022. Affective video content analysis and recommendation: A survey. IEEE Access 10 (2022), 126430–126447. doi:10.1109/ACCESS.2022.3195050
-
[34]
Qifan Wang, Yinwei Wei, Jianhua Yin, Jianwei Wu, Xuemeng Song, and Liqiang Nie. 2023. DualGNN: Dual graph neural network for multimedia recommendation. IEEE Transactions on Multimedia 25 (2023), 1074–1084. doi:10.1109/TMM.2021.3138298
-
[35]
Feng Yuan, Rui Zhang, Dania Bilal, and Xiaopeng Zhao. 2021. Learning-based strategy design for robot-assisted reminiscence therapy based on a developed model for people with dementia. In Proceedings of the International Conference on Social Robotics. 432–442. doi:10.1007/978-3-030-85717-1_42
-
[36]
Hamed Zamani, Susan Dumais, Nick Craswell, Paul Bennett, and Gord Lueck. 2020. Generating clarifying questions for information retrieval. In Proceedings of The Web Conference 2020. ACM, 418–428. doi:10.1145/3366423.3380126
-
[38]
Yongfeng Zhang and Xu Chen. 2020. Explainable recommendation: A survey and new perspectives. Foundations and Trends in Information Retrieval 14, 1 (2020), 1–101. doi:10.1561/1500000071
-
[39]
Wenzheng Zhao. 2026. An Edge–Host–Cloud Architecture for Robot-Agnostic, Caregiver-in-the-Loop Personalized Cognitive Exercise: Multi-Site Deployment in Dementia Care. IEEE Transactions on Robotics (T-RO) (2026)
-
[40]
based on the specific clinical scenario
Wenzheng Zhao, Kruthika Gangaraju, and Fengpei Yuan. 2025. Multimodal Perception-Driven Decision-Making for Human-Robot Interaction: A Survey. Frontiers in Robotics and AI 12 (2025), 1604472.
A Implementation and Execution Protocol
SafeScreen operates across multiple environments: GPT-4 API for profile extraction, risk detection, and question generation; N...
-
[41]
avoids accuracy thresholds, acknowledging metrics vary by content type and harm severity; for vulnerable populations, false negatives (showing harmful content) carry greater risk than false positives (over-cautious rejection).
B.2 Hybrid AI-Human Evaluation Approach
Following validation methodologies for LLM-as-a-judge frameworks [17, 21], we employ hyb...