pith. machine review for the scientific record.

arxiv: 2605.10404 · v1 · submitted 2026-05-11 · 💻 cs.CV

Recognition: 2 theorem links


Position: Life-Logging Video Streams Make the Privacy-Utility Trade-off Inevitable

Liang Yue, Sijie Cheng, Tianyuan Zou, Yang Liu, Ya-Qin Zhang

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:00 UTC · model grok-4.3

classification 💻 cs.CV
keywords life-logging video · privacy-utility trade-off · always-on AI · smart glasses · pipeline-aware privacy · data exploitation pipeline · visual sensing · proactive agents

The pith

Life-logging video streams create an unavoidable privacy-utility trade-off for next-generation AI systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Always-on cameras in devices like smart glasses are turning continuous video recording into a standard feature that will support AI systems that perceive and respond to the world in real time. These streams reveal detailed personal information such as habits, emotions, and social connections that single images do not capture. Existing privacy tools either address only specific attacks or reduce the data's usefulness significantly, and they overlook how information moves through the full chain from capture to model training and inference. The paper concludes that this privacy-utility trade-off is therefore a core, unresolved problem for always-on AI and requires entirely new approaches that protect privacy across the entire data pipeline while preserving long-term utility. It also highlights the absence of standard ways to measure leaks and compare solutions as a barrier to progress.

Core claim

Life-logging video streams from pervasive always-on hardware form the backbone of next-generation AI systems that continuously perceive and react to the physical world. These streams expose sensitive information including behavioral patterns, emotional states, and social interactions beyond what isolated images reveal. Existing privacy protections are either attack-specific or incur substantial utility loss, and fail to consider the entire data exploitation pipeline. The authors therefore posit that the privacy-utility trade-off in life-logging video streams is a foundational challenge for next-generation AI systems that demands further investigation, and they call for novel pipeline-aware privacy-preserving designs that jointly optimize utility and privacy for long-horizon life-logging visual data.

What carries the argument

The full data exploitation pipeline, from capture through processing, storage, and use in AI models, which current privacy methods do not address as a whole.

If this is right

  • Next-generation always-on AI systems will face reduced public trust and slower adoption unless the trade-off is resolved.
  • Privacy designs must jointly optimize utility and privacy across the entire long-horizon data pipeline rather than at isolated stages.
  • Formal metrics for quantifying privacy leakage in video streams are needed to guide development.
  • Standardized benchmarks for life-logging visual data will be required to compare new pipeline-aware methods.
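The leakage-metric point above can be grounded in a toy calculation (Pith's illustration, not a construct from the paper): model each frame as a noisy observation of one hidden binary attribute and measure the attacker's advantage, 2 × accuracy − 1, as the stream lengthens. The specific parameters (a 55% per-frame signal, a majority-vote attacker) are invented for illustration.

```python
import random

def attacker_advantage(n_frames, flip_prob=0.45, trials=2000, seed=0):
    """Estimate an attacker's advantage (2 * accuracy - 1) at inferring a
    hidden binary attribute from n_frames noisy per-frame observations.

    Each frame reports the attribute correctly with probability
    1 - flip_prob; the attacker takes a majority vote over the stream.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        secret = rng.randint(0, 1)
        # Count frames whose (noisy) observation reads as 1.
        ones = sum(
            secret if rng.random() > flip_prob else 1 - secret
            for _ in range(n_frames)
        )
        if ones * 2 == n_frames:            # break ties at random
            guess = rng.randint(0, 1)
        else:
            guess = 1 if ones * 2 > n_frames else 0
        correct += guess == secret
    return 2 * correct / trials - 1

# A single frame leaks little; a long stream leaks nearly everything.
for n in (1, 10, 100, 1000):
    print(f"{n:>4} frames: advantage = {attacker_advantage(n):.2f}")
```

Even a weak per-frame signal compounds: an attribute that is near-undetectable in one frame is recovered almost surely from a thousand, which is why stream-level rather than frame-level leakage metrics would be needed.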

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Device makers may default to on-device processing only, limiting cloud-based world models and proactive agents.
  • Regulatory standards could emerge that restrict continuous visual sensing in consumer products until better protections exist.
  • New research may focus on semantic compression techniques that discard identifying details while retaining task-relevant information across time.

Load-bearing premise

Existing privacy protections cannot be extended or combined to handle continuous life-logging video without either leaving major attack vectors open or causing large drops in data utility.

What would settle it

A concrete pipeline-aware privacy method applied to real life-logging video data that maintains high downstream AI task performance while resisting a broad set of known and future attacks on the full pipeline.
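The shape of such a settling experiment can be sketched in miniature (an invented toy, not the paper's benchmark): a per-frame utility consumer and a stream-aggregating attacker read the same redacted signal, so redaction strong enough to blunt the longitudinal attack also destroys per-frame utility. All names and rates below are assumptions for illustration.

```python
import random

def frontier(redaction, n_frames=200, trials=400, seed=1):
    """Toy privacy-utility frontier at one redaction level.

    The task consumer needs each frame's task bit immediately (per-frame
    accuracy); the attacker majority-votes a secret bit over the whole
    stream. Redaction replaces both bits with coin flips at rate
    `redaction`, since secret and task content share the same pixels.
    Returns (per-frame task accuracy, stream-level attack accuracy).
    """
    rng = random.Random(seed)
    frame_hits = frames_total = attack_hits = 0
    for _ in range(trials):
        secret = rng.randint(0, 1)
        votes = 0
        for _ in range(n_frames):
            task = rng.randint(0, 1)
            t = task if rng.random() > 0.3 else 1 - task      # sensor noise
            s = secret if rng.random() > 0.3 else 1 - secret
            if rng.random() < redaction:                      # scrub the frame
                t, s = rng.randint(0, 1), rng.randint(0, 1)
            frame_hits += t == task
            frames_total += 1
            votes += s
        attack_hits += (votes * 2 > n_frames) == (secret == 1)
    return frame_hits / frames_total, attack_hits / trials

for r in (0.0, 0.5, 0.9, 1.0):
    u, a = frontier(r)
    print(f"redaction={r:.1f}  per-frame utility={u:.2f}  stream attack={a:.2f}")
```

In this toy, even 50% redaction leaves the stream-level attack near-perfect while utility has already degraded; only total redaction (and total utility loss) pushes the attacker to chance. A real benchmark would substitute actual video tasks and attacks, but the frontier it must report has this form.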

read the original abstract

With the growing prevalence of always-on hardware such as smart glasses, body cameras, and home security systems, life-logging visual sensing is becoming inevitable, forming the backbone of persistent, always-on AI systems. Meanwhile, recent advances in proactive agents and world models signal a fundamental shift from episodic, prompt-driven tools to next-generation AI systems that continuously perceive and react to the physical world. Although life-logging video streams can substantially improve utility of these promising systems, they also introduce significant privacy risks by revealing sensitive information, such as behavioral patterns, emotional states, and social interactions, beyond what isolated images expose. If unresolved, these risks may undermine public trust and hinder the sustainable development of always-on AI technologies. Existing privacy protections are either attack-specific or incur substantial utility loss, and fail to consider the entire data exploitation pipeline. We therefore posit that the privacy-utility trade-off in life-logging video streams is a foundational challenge for next-generation AI systems that demands further investigation. We call for novel pipeline-aware privacy-preserving designs that jointly optimize utility and privacy for long-horizon life-logging visual data. In parallel, formal privacy leakage metrics and standardized benchmarks remain important open directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript is a position paper arguing that always-on life-logging video streams from devices such as smart glasses, body cameras, and home security systems create substantial privacy risks (revealing behavioral patterns, emotional states, and social interactions) for next-generation AI systems based on continuous perception and world models. It asserts that existing privacy protections are attack-specific or cause substantial utility loss and ignore the full data exploitation pipeline, leading to the claim that the privacy-utility trade-off is inevitable and foundational. The paper calls for pipeline-aware privacy-preserving designs that jointly optimize utility and privacy for long-horizon visual data, plus formal leakage metrics and standardized benchmarks.

Significance. If the position holds, it identifies a timely barrier to sustainable always-on AI and could usefully steer the community toward holistic, pipeline-aware privacy methods rather than piecemeal defenses. The manuscript correctly notes the shift from episodic to persistent visual sensing and the distinctive risks of video streams over isolated images. It also usefully flags the need for standardized benchmarks as a concrete open direction.

major comments (1)
  1. [Abstract] The assertion that 'Existing privacy protections are either attack-specific or incur substantial utility loss, and fail to consider the entire data exploitation pipeline' is load-bearing for the inevitability claim, yet it is presented as a general premise without citations, concrete examples of overlooked pipeline stages, or discussion of why attack-specific methods cannot be composed into pipeline-aware solutions.
minor comments (1)
  1. The title's use of 'Inevitable' is a strong framing; the body should explicitly define what 'inevitable' means (e.g., without new research directions) to prevent misinterpretation as an absolute rather than a current-state observation.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review, positive assessment of the position paper's timeliness, and recommendation for minor revision. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] The assertion that 'Existing privacy protections are either attack-specific or incur substantial utility loss, and fail to consider the entire data exploitation pipeline' is load-bearing for the inevitability claim, yet it is presented as a general premise without citations, concrete examples of overlooked pipeline stages, or discussion of why attack-specific methods cannot be composed into pipeline-aware solutions.

    Authors: We agree that this premise is central to the inevitability claim and would benefit from explicit grounding. In the revised version we will (1) add a short clause in the abstract referencing the pipeline limitation, (2) insert a new paragraph early in the introduction that supplies concrete examples of attack-specific techniques (e.g., frame-level adversarial perturbations against attribute inference, differential privacy applied only at capture, or model-level defenses against membership inference), and (3) explain why such methods do not compose into pipeline-aware solutions: each targets an isolated stage and therefore leaves downstream cumulative leakage (long-horizon behavioral pattern extraction across continuous streams and world-model training) unaddressed. Relevant citations to the visual-privacy literature will be included. These additions strengthen the position without changing its core argument. revision: yes
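The cumulative-leakage argument in point (3) can be illustrated with standard differential-privacy composition arithmetic (our worked example; neither the referee nor the rebuttal presents this calculation). Under basic sequential composition a per-release budget ε adds up linearly, and even the advanced composition bound of Dwork and Roth grows without bound in the number of releases:

```python
import math

def basic_composition(eps_per_release, n_releases):
    """Basic sequential composition: the total budget grows linearly."""
    return eps_per_release * n_releases

def advanced_composition(eps, n, delta_prime):
    """Advanced composition bound (Dwork & Roth): sublinear in n,
    but still unbounded as the stream grows."""
    return (math.sqrt(2 * n * math.log(1 / delta_prime)) * eps
            + n * eps * (math.exp(eps) - 1))

# A per-frame budget that looks strong dissolves over one day at 1 fps.
eps = 0.01
releases = 24 * 60 * 60            # 86,400 frame releases per day
print(basic_composition(eps, releases))           # ~864: no guarantee left
print(advanced_composition(eps, releases, 1e-6))  # ~24: better, still far above ~1
```

This is the sense in which a capture-stage defense that is sound per frame does not compose into a pipeline-level guarantee for continuous life-logging streams.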

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a position paper whose central claim is a call to treat the privacy-utility trade-off as foundational for next-generation AI and to pursue pipeline-aware designs. It advances no formal theorem, derivation, equations, fitted parameters, or quantitative predictions. The supporting premise about existing defenses is presented as motivation rather than a demonstrated result via self-referential construction or citation chain. No load-bearing step reduces to its own inputs by definition or self-citation, so the paper is self-contained as a non-technical advocacy piece.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a position statement and introduces no free parameters, axioms, or invented entities in a technical sense.

pith-pipeline@v0.9.0 · 5521 in / 954 out tokens · 72334 ms · 2026-05-12T04:00:23.470685+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

121 extracted references · 121 canonical work pages · 8 internal anchors

  1. [1]

    OpenClaw — Personal AI Assistant,

    OpenClaw, “OpenClaw — Personal AI Assistant,”https://openclaw.ai/, 2026. Official website. Accessed: 2026-04-22

  2. [2]

    Proactive Conversational AI: A Comprehensive Survey of Advancements and Opportunities,

    Deng, Y., Liao, L., Lei, W., Yang, G. H., Lam, W., and Chua, T.-S., “Proactive Conversational AI: A Comprehensive Survey of Advancements and Opportunities,”ACM Transactions on Information Systems, Vol. 43, No. 3, 2025, pp. 1–45

  3. [3]

    Towards Human-centered Proactive Conversational Agents,

    Deng, Y., Liao, L., Zheng, Z., Yang, G. H., and Chua, T.-S., “Towards Human-centered Proactive Conversational Agents,” Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 807–818

  4. [4]

    Ask-before-Plan: Proactive Language Agents for Real-World Planning,

    Zhang, X., Deng, Y., Ren, Z., Ng, S. K., and Chua, T.-S., “Ask-before-Plan: Proactive Language Agents for Real-World Planning,”Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 10836–10863

  5. [5]

    DINOv3

    Siméoni, O., Vo, H. V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al., “DINOv3,”arXiv preprint arXiv:2508.10104, 2025

  6. [6]

    SAM 3: Segment Anything with Concepts

    Carion, N., Gustafson, L., Hu, Y.-T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K. V., Khedr, H., Huang, A., et al., “SAM 3: Segment Anything with Concepts,”arXiv preprint arXiv:2511.16719, 2025

  7. [7]

    Perception Encoder: The best visual embeddings are not at the output of the network

    Bolya, D., Huang, P.-Y., Sun, P., Cho, J. H., Madotto, A., Wei, C., Ma, T., Zhi, J., Rajasegaran, J., Rasheed, H., et al., “Perception Encoder: The best visual embeddings are not at the output of the network,” arXiv preprint arXiv:2504.13181, 2025

  8. [8]

    Self-supervised learning from images with a joint-embedding predictive architecture,

    Assran, M., Duval, Q., Misra, I., Bojanowski, P., Vincent, P., Rabbat, M., LeCun, Y., and Ballas, N., “Self-supervised learning from images with a joint-embedding predictive architecture,”Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 15619–15629

  9. [9]

    Genie 3: A New Frontier for World Models,

    Google DeepMind, “Genie 3: A New Frontier for World Models,” https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/, Aug. 2025. Google DeepMind Blog. Accessed: 2026-04-22

  10. [10]

    Ray-Ban Meta AI Glasses,

    Meta, “Ray-Ban Meta AI Glasses,”https://www.meta.com/ai-glasses/ray-ban-meta/, 2026. Accessed: 2026-04-21

  11. [11]

    RayNeo AR Smart Glasses | Official Website,

    RayNeo, “RayNeo AR Smart Glasses | Official Website,”https://www.rayneo.com/, 2026. Accessed: 2026-04-21

  12. [12]

    ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems in the Wild

    Yang, B., Xu, L., Zeng, L., Guo, Y., Jiang, S., Lu, W., Liu, K., Xiang, H., Jiang, X., Xing, G., et al., “ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems,”arXiv preprint arXiv:2512.06721, 2025

  13. [13]

    A New Look at How Android XR Will Bring Gemini to Glasses and Headsets,

    Google, “A New Look at How Android XR Will Bring Gemini to Glasses and Headsets,” https://blog.google/products-and-platforms/platforms/android/android-xr-gemini-glasses-headsets/, May 2025. Google Official Blog. Accessed: 2026-04-21

  14. [14]

    HeyCyan – Smart Glasses Companion App,

    HeyCyan, “HeyCyan – Smart Glasses Companion App,” https://heycyan.net/, 2026. Official website. Accessed: 2026-04-29

  15. [15]

    VisionClaw: Always-On AI Agents through Smart Glasses

    Liu, X., Lee, D., Gonzalez, E. J., Gonzalez-Franco, M., and Suzuki, R., “VisionClaw: Always-On AI Agents through Smart Glasses,”arXiv preprint arXiv:2604.03486, 2026

  16. [16]

    Multi-step or Direct: A Proactive Home-Assistant System Based on Commonsense Reasoning,

    Yamasaki, K., Tanaka, S., Yuguchi, A., Kawano, S., and Yoshino, K., “Multi-step or Direct: A Proactive Home-Assistant System Based on Commonsense Reasoning,”Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2025, pp. 561–572

  17. [17]

    From Image to Video: An Empirical Study of Diffusion Representations,

    Vélez, P., Polanía, L. F., Yang, Y., Zhang, C., Kabra, R., Arnab, A., and Sajjadi, M. S., “From Image to Video: An Empirical Study of Diffusion Representations,”Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 16948–16958

  18. [18]

    MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents,

    Zhou, Z., Qu, A., Wu, Z., Kim, S., Prakash, A., Rus, D., Zhao, J., Low, B. K. H., and Liang, P. P., “MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents,”First Workshop on Multi-Turn Interactions in Large Language Models, 2026

  19. [19]

    TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant,

    Hong, R., Lang, J., Zhong, T., Wang, Y., and Zhou, F., “TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant,” Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, 2026, pp. 452–463

  20. [20]

    Memex(rl): Scaling long-horizon llm agents via indexed experience memory,

    Wang, Z., Chen, H., Wang, J., and Wei, W., “Memex (RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory,”arXiv preprint arXiv:2603.04257, 2026

  21. [21]

    Vivint Smart Home Security Systems,

    Vivint, “Vivint Smart Home Security Systems,”https://www.vivint.com/, 2026. Official website. Accessed: 2026-04-29

  22. [22]

    SimpliSafe Home Security Systems,

    SimpliSafe, “SimpliSafe Home Security Systems,”https://simplisafe.com/value, 2026. Official promotional webpage. Accessed: 2026-04-29

  23. [23]

    AI Security Guard,

    Spot AI, “AI Security Guard,”https://www.spot.ai/ai-security-guard, 2026. Official product page. Accessed: 2026-04-29

  24. [24]

    Explosion Protected StreamCam,

    EarthCam, “Explosion Protected StreamCam,” https://www.earthcam.net/products/explosionprotectedstreamcam.php, 2026. Official product page. Accessed: 2026-04-29

  25. [25]

    Explosion Protected StreamCam Robotic,

    EarthCam, “Explosion Protected StreamCam Robotic,” https://www.earthcam.net/products/explosionprotectedstreamcamrobotic.php, 2026. Official product page. Accessed: 2026-04-29

  26. [26]

    4 Channel Dash Cam Collection,

    IIWEY, “4 Channel Dash Cam Collection,”https://iiwey.com/collections/4-channel-dash-cam, 2026. Official product collection page. Accessed: 2026-04-29

  27. [27]

    Neideso Official Website,

    Neideso, “Neideso Official Website,”https://www.neideso.cn/, 2026. Official website. Accessed: 2026-04-29

  28. [28]

    DJI Nano,

    DJI, “DJI Nano,”https://www.dji.com/nano, 2026. Official product page. Accessed: 2026-04-28

  29. [29]

    Looki L1,

    Looki, “Looki L1,”https://www.looki.ai/products/looki-l1, 2026. Official product page. Accessed: 2026-04-28

  30. [30]

    Body Cameras: The Complete Guide for Law Enforcement Professionals,

    Axon, “Body Cameras: The Complete Guide for Law Enforcement Professionals,”https://www.axon.com/resources/ body-cameras-complete-guide, 2026. Axon Resources. Accessed: 2026-04-28

  31. [31]

    AI Flow at the Network Edge,

    Shao, J., and Li, X., “AI Flow at the Network Edge,”IEEE Network, 2025

  32. [32]

    Deep Face Recognition: A Survey,

    Wang, M., and Deng, W., “Deep Face Recognition: A Survey,”Neurocomputing, Vol. 429, 2021, pp. 215–244

  33. [33]

    PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition,

    Dhar, P., Gleason, J., Roy, A., Castillo, C. D., and Chellappa, R., “PASS: Protected Attribute Suppression System for Mitigating Bias in Face Recognition,”Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15087–15096

  34. [34]

    Jain, L. C., Halici, U., Hayashi, I., Lee, S., and Tsutsui, S., Intelligent Biometric Techniques in Fingerprint and Face Recognition, Routledge, 2022

  35. [35]

    The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos,

    Zhang, S., Li, Z., Wen, C., Ma, Y., Li, S., Zhang, G., Zhang, Z., Meng, Y., Zhao, H., Yi, X., et al., “The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos,”arXiv preprint arXiv:2511.02367, 2025

  36. [36]

    Collaborative Spatiotemporal Feature Learning for Video Action Recognition,

    Li, C., Zhong, Q., Xie, D., and Pu, S., “Collaborative Spatiotemporal Feature Learning for Video Action Recognition,” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 7872–7881

  37. [37]

    Actor Conditioned Attention Maps for Video Action Detection,

    Ulutan, O., Rallapalli, S., Srivatsa, M., Torres, C., and Manjunath, B., “Actor Conditioned Attention Maps for Video Action Detection,”Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 527–536

  38. [38]

    Frame-by-Frame: Tracking Emotions in Videos with AI,

    Legara, J. S., “Frame-by-Frame: Tracking Emotions in Videos with AI,” Medium, 2023. URL https://medium.com/@johnsolomonlegara/frame-by-frame-tracking-emotions-in-videos-with-ai-ee31a1a05ab6, accessed: 2026-04-03

  39. [39]

    Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation),

    European Parliament and Council of the European Union, “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 (General Data Protection Regulation),”https://eur-lex.europa.eu/eli/reg/2016/679/oj,

  40. [40]

    Official Journal of the European Union, L119, pp. 1–88

  41. [41]

    California Consumer Privacy Act (CCPA),

    California State Legislature, “California Consumer Privacy Act (CCPA),” https://cppa.ca.gov/regulations/pdf/ccpa_statute.pdf, 2018. As amended, effective January 1, 2020

  42. [42]

    Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule,

    U.S. Department of Health & Human Services, “Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule,” https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html, 1996. Accessed 2026

  43. [43]

    Live Face De-Identification in Video,

    Gafni, O., Wolf, L., and Taigman, Y., “Live Face De-Identification in Video,”Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9378–9387

  44. [44]

    Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach,

    Huang, W., Ni, Y., Dehaghani, A. R., Jeong, S. E., Chen, H., Liu, Y., Wen, F., and Imani, M., “Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach,”Proceedings of the Winter Conference on Applications of Computer Vision, 2025, pp. 5239–5249

  45. [45]

    Audio-Visual Autoencoding for Privacy-Preserving Video Streaming,

    Xu, H., Cai, Z., Takabi, D., and Li, W., “Audio-Visual Autoencoding for Privacy-Preserving Video Streaming,”IEEE Internet of Things Journal, Vol. 9, No. 3, 2021, pp. 1749–1761

  46. [46]

    Preserving Privacy and Video Quality Through Remote Physiological Signal Removal,

    Bhutani, S., Elgendi, M., and Menon, C., “Preserving Privacy and Video Quality Through Remote Physiological Signal Removal,”Communications Engineering, Vol. 4, No. 1, 2025, p. 66

  47. [47]

    Privacy-Protected Sleep Staging Using Blurred Videos,

    Wang, Q., Xia, M., Zhu, Y., Cheng, H., and Wang, W., “Privacy-Protected Sleep Staging Using Blurred Videos,” IEEE Journal of Biomedical and Health Informatics, Vol. 29, No. 12, 2025, pp. 8839–8846

  48. [48]

    Protecting Visual Secrets Using Adversarial Nets,

    Machanavajjhala, A., Landon Cox, N. P., et al., “Protecting Visual Secrets Using Adversarial Nets,”Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 25–28

  49. [49]

    I Know That Person: Generative Full Body and Face De-Identification of People in Images,

    Brkic, K., Sikiric, I., Hrkac, T., and Kalafatic, Z., “I Know That Person: Generative Full Body and Face De-Identification of People in Images,”2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), IEEE, 2017, pp. 1319–1328

  50. [50]

    Privacy-Preserving Video Analytics Through GAN-Based Face De-Identification,

    More, R., Maity, A., Kambli, G., and Ambadekar, S., “Privacy-Preserving Video Analytics Through GAN-Based Face De-Identification,”2024 Second International Conference on Networks, Multimedia and Information Technology (NMITCON), IEEE, 2024, pp. 1–6

  51. [51]

    CartoonGAN: Generative Adversarial Networks for Photo Cartoonization,

    Chen, Y., Lai, Y.-K., and Liu, Y.-J., “CartoonGAN: Generative Adversarial Networks for Photo Cartoonization,” Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9465–9474

  52. [52]

    A Steganography Algorithm Based on CycleGAN for Covert Communication in the Internet of Things,

    Meng, R., Cui, Q., Zhou, Z., Fu, Z., and Sun, X., “A Steganography Algorithm Based on CycleGAN for Covert Communication in the Internet of Things,” IEEE Access, Vol. 7, 2019, pp. 90574–90584

  53. [53]

    Differential Privacy,

    Dwork, C., “Differential Privacy,”International colloquium on automata, languages, and programming, Springer, 2006, pp. 1–12

  54. [54]

    Communication-efficient learning of deep networks from decentralized data,

    McMahan, H. B., Moore, E., Ramage, D., and y Arcas, B. A., “Federated Learning of Deep Networks Using Model Averaging,” arXiv preprint arXiv:1602.05629, Vol. 2, 2016

  55. [55]

    More effort is needed to protect pedestrian privacy in the era of AI,

    Zhang, X., and Zhao, Z., “More effort is needed to protect pedestrian privacy in the era of AI,”The Thirty-Ninth Annual Conference on Neural Information Processing Systems Position Paper Track, 2025

  56. [56]

    Side-Channel Information Leakage of Encrypted Video Stream in Video Surveillance Systems,

    Li, H., He, Y., Sun, L., Cheng, X., and Yu, J., “Side-Channel Information Leakage of Encrypted Video Stream in Video Surveillance Systems,” IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, IEEE, 2016, pp. 1–9

  57. [57]

    Deep Residual Learning for Image Recognition,

    He, K., Zhang, X., Ren, S., and Sun, J., “Deep Residual Learning for Image Recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

  58. [58]

    Learning Transferable Visual Models from Natural Language Supervision,

    Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al., “Learning Transferable Visual Models from Natural Language Supervision,”International conference on machine learning, PmLR, 2021, pp. 8748–8763

  59. [59]

    CLID-ReID: Exploiting Vision-Language Model for Image re-identification without Concrete Text Labels,

    Li, S., Sun, L., and Li, Q., “CLID-ReID: Exploiting Vision-Language Model for Image re-identification without Concrete Text Labels,”Proceedings of the AAAI conference on artificial intelligence, Vol. 37, 2023, pp. 1405–1413

  60. [60]

    Person Re-identification: Past, Present and Future,

    Zheng, L., Yang, Y., and Hauptmann, A. G., “Person Re-identification: Past, Present and Future,”arXiv preprint arXiv:1610.02984, 2016

  61. [61]

    A System Identification Approach for Video-based Face Recognition

    Aggarwal, G., Roy-Chowdhury, A. K., and Chellappa, R., “A System Identification Approach for Video-based Face Recognition.”ICPR (4), 2004, pp. 175–178

  62. [62]

    Video Person Re-ID: Fantastic Techniques and Where to Find Them (Student Abstract),

    Pathak, P., Eshratifar, A. E., and Gormish, M., “Video Person Re-ID: Fantastic Techniques and Where to Find Them (Student Abstract),”Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 13893–13894

  63. [63]

    Handbook of Fingerprint Recognition,

    Maltoni, D., Maio, D., Jain, A. K., and Prabhakar, S.,Handbook of Fingerprint Recognition, Springer, 2009

  64. [64]

    A Review on Iris Recognition,

    Kaur, N., and Juneja, M., “A Review on Iris Recognition,”2014 Recent Advances in Engineering and Computational Sciences (RAECS), 2014, pp. 1–5

  65. [65]

    Biometric Recognition: Challenges and Opportunities,

    Millett, L. I., and Pato, J. N., “Biometric Recognition: Challenges and Opportunities,” 2010

  66. [66]

    Gait Recognition in the Wild: A Benchmark,

    Zhu, Z., Guo, X., Yang, T., Huang, J., Deng, J., Huang, G., Du, D., Lu, J., and Zhou, J., “Gait Recognition in the Wild: A Benchmark,”Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 14789–14799

  67. [67]

    Are You Really Looking at Me? A Feature-Extraction Framework for Estimating Interpersonal Eye Gaze from Conventional Video,

    Tran, M., Sen, T., Haut, K., Ali, M. R., and Hoque, E., “Are You Really Looking at Me? A Feature-Extraction Framework for Estimating Interpersonal Eye Gaze from Conventional Video,”IEEE Transactions on Affective Computing, Vol. 13, No. 2, 2020, pp. 912–925

  68. [68]

    AI vs. Humans: Comparing Road User Intention Recognition Performance,

    Vellenga, K., Steinhauer, H. J., Falkman, G., Andersson, J., and Sjögren, A., “AI vs. Humans: Comparing Road User Intention Recognition Performance,”Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 118, 2026, p. 103491

  69. [69]

    A Hybrid Algorithm for Human Interaction Recognition from Drone Videos: Experimental Analysis to Enhance Disaster Response and Rescue,

    Wang, X., Pirasteh, S., Varshosaz, M., and Fang, Z., “A Hybrid Algorithm for Human Interaction Recognition from Drone Videos: Experimental Analysis to Enhance Disaster Response and Rescue,”Geomatics, Natural Hazards and Risk, Vol. 17, No. 1, 2026, p. 2621550

  70. [70]

    Spatiotemporal Video Encoders and Zero-shot Segmentation for 3D Action Recognition and Behavior Analysis of Broiler Chickens Associated with Different Welfare Indicators and Body Weight,

    Asali, E., Li, G., Saeidifar, M., Liu, T., Oso, O. M., Mandiga, A., Bodempudi, V. U. C., and Kota, S. A. R., “Spatiotemporal Video Encoders and Zero-shot Segmentation for 3D Action Recognition and Behavior Analysis of Broiler Chickens Associated with Different Welfare Indicators and Body Weight,” Computers and Electronics in Agriculture, Vol. 241, 2026, p. 111305

  71. [71]

    Multi-Modal Multi-Action Video Recognition,

    Shi, Z., Liang, J., Li, Q., Zheng, H., Gu, Z., Dong, J., and Zheng, B., “Multi-Modal Multi-Action Video Recognition,” Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 13678–13687

  72. [72]

    Analyze Emotions in Your Videos,

    MorphCast, “Analyze Emotions in Your Videos,” n.d. URL https://www.morphcast.com/experiments/demo-analyze-emotions-in-your-videos/, accessed: 2026-04-03

  73. [73]

    Video Emotion Recognition: Analyze Video Emotions and Personality with Multimodal Emotion Analysis,

    Imentiv AI, “Video Emotion Recognition: Analyze Video Emotions and Personality with Multimodal Emotion Analysis,” n.d. URL https://imentiv.ai/product-use-cases/video-emotion-recognition/, accessed: 2026-04-03

  74. [74]

    Multimodal Video Emotion Recognition with Reliable Reasoning Priors,

    Wang, Z., Zhu, Y., Dong, G., Yi, H., Chen, F., Wang, X., and Xie, J., “Multimodal Video Emotion Recognition with Reliable Reasoning Priors,”arXiv preprint arXiv:2508.03722, 2025

  75. [75]

    Does Clip Know My Face?

    Hintersdorf, D., Struppek, L., Brack, M., Friedrich, F., Schramowski, P., and Kersting, K., “Does Clip Know My Face?” Journal of Artificial Intelligence Research, Vol. 80, 2024, pp. 1033–1062

  76. [76]

    Membership Inference Attacks Against Large Vision-Language Models,

    Li, Z., Wu, Y., Chen, Y., Tonin, F., Abad Rocamora, E., and Cevher, V., “Membership Inference Attacks Against Large Vision-Language Models,”Advances in Neural Information Processing Systems, Vol. 37, 2024, pp. 98645–98674

  77. [77]

    Membership Inference Attacks Against Vision-Language Models,

    Hu, Y., Li, Z., Liu, Z., Zhang, Y., Qin, Z., Ren, K., and Chen, C., “Membership Inference Attacks Against Vision-Language Models,”34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 1589–1608

  78. [78]

    GradViT: Gradient Inversion of Vision Transformers,

    Hatamizadeh, A., Yin, H., Roth, H. R., Li, W., Kautz, J., Xu, D., and Molchanov, P., “GradViT: Gradient Inversion of Vision Transformers,” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10021–10030

  79. [79]

    GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization,

    Fang, H., Chen, B., Wang, X., Wang, Z., and Xia, S.-T., “GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization,” Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4967–4976

  80. [80]

    Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack,

    Xue, J., Sun, Z., Ye, H., Luo, L., Chang, X., and Dai, G., “Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack,” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40, 2026, pp. 35967–35975

Showing first 80 references.