[Emerging Ideas] Artificial Tripartite Intelligence: A Bio-Inspired, Sensor-First Architecture for Physical AI
Pith reviewed 2026-05-10 12:50 UTC · model grok-4.3
The pith
A bio-inspired tripartite architecture for physical AI keeps adaptive sensing on-device and invokes higher inference only when needed, raising accuracy while cutting remote calls.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that structuring physical AI into a Brainstem layer (L1) for reflexive safety and signal-integrity control, a Cerebellum layer (L2) for continuous sensor calibration, and a Cerebral Inference Subsystem (spanning L3 and L4) for skill selection, coordination, and deep reasoning allows sensor control, adaptive sensing, edge-cloud execution, and foundation-model reasoning to co-evolve inside one closed-loop architecture, with time-critical operations kept local and higher inference invoked only when needed.
What carries the argument
Artificial Tripartite Intelligence (ATI), a three-layer bio-inspired division in which L1 provides reflexive sensing control, L2 performs ongoing calibration, and L3/L4 handle inference tasks to enable closed-loop adaptation.
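The layered contract described above can be sketched as a single routing loop. Everything in the sketch below (the function names, the luma-based integrity check, the 0.8 confidence gate) is an illustrative assumption, not a detail taken from the paper.

```python
# Illustrative sketch of the ATI closed-loop contract. All names,
# thresholds, and gate values are assumptions, not from the paper.

def l1_reflex_ok(frame):
    """Brainstem (L1): reflexive signal-integrity check, always on-device."""
    return frame["mean_luma"] > 0.05  # reject near-black frames reflexively

def l2_calibrate(frame, exposure):
    """Cerebellum (L2): continuous calibration nudging exposure toward a luma target."""
    target = 0.5
    return exposure * (1 + 0.2 * (target - frame["mean_luma"]))

def process(frame, exposure, l3_model, l4_remote, conf_gate=0.8):
    """Route one frame through the L1 -> L2 -> L3 -> (optional) L4 loop."""
    if not l1_reflex_ok(frame):
        return None, exposure          # reflexive drop, no inference at all
    exposure = l2_calibrate(frame, exposure)
    label, conf = l3_model(frame)      # local skill selection and execution
    if conf < conf_gate:               # escalate only uncertain frames
        label, conf = l4_remote(frame) # remote deep reasoning (L4)
    return label, exposure
```

A confident local prediction resolves on-device; only when local confidence falls below the gate does the loop pay for a remote L4 call, which is the mechanism behind the reported reduction in invocations.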
If this is right
- Time-critical sensing and reflexive control remain on-device, lowering latency and energy use.
- Higher-level foundation model reasoning is invoked selectively rather than continuously.
- Sensor control and inference components can co-evolve within the same modular architecture.
- The design meets combined constraints on latency, energy, privacy, and reliability more effectively than model scaling alone.
- Performance gains appear in dynamic environments such as varying lighting and motion.
Where Pith is reading between the lines
- The same layered split could be tested in other sensor-rich domains such as autonomous navigation or wearable health monitoring to check whether the three-part structure generalizes.
- Future prototypes might vary the boundaries between L1, L2, and L3/L4 to determine which division of labor yields the largest efficiency gains.
- The results suggest that hardware-software co-design starting from controllable sensors may be more important for physical AI than further increases in model size.
- Keeping more processing local could reduce data transmission and thereby improve privacy, an implication the prototype does not directly measure.
Load-bearing premise
That the tripartite bio-inspired split of sensing, calibration, and inference responsibilities will deliver benefits for physical AI systems beyond the single tested mobile camera prototype.
What would settle it
Running the same routed evaluation on a different physical AI embodiment, such as a robot arm or drone, and finding that end-to-end accuracy stays near the 53.8 percent baseline with no reduction in remote L4 invocations would falsify the claim of general value.
Original abstract
As AI moves from data centers to robots and wearables, scaling ever-larger models becomes insufficient. Physical AI operates under tight latency, energy, privacy, and reliability constraints, and its performance depends not only on model capacity but also on how signals are acquired through controllable sensors in dynamic environments. We present Artificial Tripartite Intelligence (ATI), a bio-inspired, sensor-first architectural contract for physical AI. ATI is tripartite at the systems level: a Brainstem (L1) provides reflexive safety and signal-integrity control, a Cerebellum (L2) performs continuous sensor calibration, and a Cerebral Inference Subsystem spanning L3/L4 supports routine skill selection and execution, coordination, and deep reasoning. This modular organization allows sensor control, adaptive sensing, edge-cloud execution, and foundation model reasoning to co-evolve within one closed-loop architecture, while keeping time-critical sensing and control on device and invoking higher-level inference only when needed. We instantiate ATI in a mobile camera prototype under dynamic lighting and motion. In our routed evaluation (L3-L4 split inference), compared to the default auto-exposure setting, ATI (L1/L2 adaptive sensing) improves end-to-end accuracy from 53.8% to 88% while reducing remote L4 invocations by 43.3%. These results show the value of co-designing sensing and inference for embodied AI.
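As a quick sanity check of the abstract's headline figures, the accuracy lift and invocation reduction can be recomputed. The frame count below is an assumed illustration only; the paper's actual trial counts are not reported in this review.

```python
# Sanity-check the abstract's headline numbers. The workload size is an
# assumed illustration; the paper's real trial counts are not given here.
base_acc, ati_acc = 53.8, 88.0
lift_pp = round(ati_acc - base_acc, 1)    # percentage-point accuracy lift
print(lift_pp)                            # 34.2

frames = 1000                             # assumed number of evaluated frames
reduction = 0.433                         # reported 43.3% fewer remote L4 calls
remote_before = frames                    # assumed worst case: every frame escalates
remote_after = round(remote_before * (1 - reduction))
print(remote_after)                       # 567
```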
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Artificial Tripartite Intelligence (ATI), a bio-inspired, sensor-first architectural contract for physical AI. It divides the system into a Brainstem (L1) for reflexive safety and signal-integrity control, a Cerebellum (L2) for continuous sensor calibration, and a Cerebral Inference Subsystem (L3/L4) for skill selection, execution, coordination, and deep reasoning. The architecture is instantiated in a mobile camera prototype under dynamic lighting and motion, where a routed L3-L4 evaluation reports that ATI adaptive sensing raises end-to-end accuracy from 53.8% to 88% while cutting remote L4 invocations by 43.3% relative to default auto-exposure.
Significance. If the empirical claims can be substantiated with proper controls and baselines, the work would offer a concrete modular framework for co-designing controllable sensors with edge-cloud inference in embodied systems. The emphasis on keeping time-critical sensing on-device while invoking higher-level reasoning only when needed addresses real constraints in latency, energy, and privacy for physical AI. The tripartite division provides a reusable contract that could guide future sensor-inference co-evolution, though its generality beyond the single prototype remains to be demonstrated.
Major comments (2)
- [Abstract] Abstract: The central quantitative claim states that ATI (L1/L2 adaptive sensing) improves accuracy from 53.8% to 88% and reduces L4 invocations by 43.3% versus default auto-exposure in a routed L3-L4 evaluation. No task definition, dataset, trial count, variance, statistical tests, or implementation details of the L1 reflexive or L2 calibration modules are supplied. Without these, the result cannot be assessed for support of the architectural claim.
- [Prototype evaluation] Prototype evaluation: The reported gains are compared solely against default auto-exposure. To show that the tripartite structure (rather than generic adaptive sensing) is load-bearing, the evaluation must include at least one non-ATI adaptive baseline (e.g., histogram equalization, PID exposure control, or a learned policy). Absent such controls, the 34.2 pp accuracy lift and 43.3% invocation reduction cannot be attributed specifically to the L1/L2/L3-L4 division.
Minor comments (2)
- [Introduction] The bio-inspired terminology (Brainstem, Cerebellum) is used consistently but would benefit from a brief table or paragraph explicitly mapping each level's functions to the corresponding biological roles to clarify the analogy's scope.
- [Abstract] Clarify the precise meaning of 'end-to-end accuracy' and the mobile-camera task (e.g., object classification, detection, or tracking under motion) so readers can judge the practical relevance of the 88% figure.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on clarity and experimental rigor. We address each major point below and outline targeted revisions.
Point-by-point responses
- Referee: The abstract's central quantitative claim lacks task definition, dataset, trial count, variance, statistical tests, or implementation details of the L1 reflexive or L2 calibration modules, preventing assessment of support for the architectural claim.
  Authors: The abstract is kept concise per the Emerging Ideas format. The full manuscript defines the task as image classification under dynamic lighting and motion, describes the custom dataset collected with the mobile prototype, reports trial counts and variance in Section 4, and details L1 (threshold-based signal integrity) and L2 (continuous calibration loop) implementations in Section 3. We will revise the abstract to briefly reference the evaluation protocol, dataset size, and key L1/L2 mechanisms, and will add explicit statistical test results if not already highlighted. Revision: yes.
- Referee: Gains are compared only against default auto-exposure; to show the tripartite structure is load-bearing rather than generic adaptive sensing, at least one non-ATI adaptive baseline (e.g., histogram equalization or PID control) is required.
  Authors: We agree that sole comparison to the camera's default auto-exposure limits isolation of the L1/L2/L3-L4 contribution. The default represents the production baseline, but to demonstrate the specific value of the tripartite contract we will add a non-ATI adaptive baseline (PID-based exposure control) and report comparative accuracy and invocation metrics in the revised evaluation section. Revision: yes.
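The PID-based exposure baseline discussed in this exchange could be as minimal as the following sketch. The gains, luma setpoint, and exposure clamp are illustrative assumptions, not values from the paper or the rebuttal.

```python
# Minimal PID exposure controller, a plausible non-ATI adaptive baseline.
# Gains, setpoint, and clamp limits are illustrative assumptions.

class PIDExposure:
    def __init__(self, kp=0.8, ki=0.1, kd=0.05, target_luma=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = target_luma
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, mean_luma, exposure):
        """Return the next exposure given the current frame's mean luma."""
        err = self.target - mean_luma
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        exposure += self.kp * err + self.ki * self.integral + self.kd * deriv
        return min(max(exposure, 0.01), 10.0)   # clamp to assumed sensor limits
```

On a dark frame (mean luma 0.2 against a 0.5 setpoint) the controller raises exposure toward the target; the clamp keeps outputs within the assumed sensor range.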
Circularity Check
No circularity: architectural proposal plus empirical prototype measurement
Full rationale
The paper advances a tripartite bio-inspired architecture (L1 brainstem reflexive control, L2 cerebellum calibration, L3/L4 cerebral inference) and reports a single prototype experiment on a mobile camera under dynamic lighting. The headline result (53.8% to 88% accuracy, 43.3% fewer L4 calls) is presented as a direct empirical comparison against default auto-exposure, not as a mathematical prediction or derivation. No equations, fitted parameters, ansatzes, or derivation chains appear in the manuscript. Consequently there are no opportunities for self-definition, fitted-input-as-prediction, or self-citation load-bearing reductions. The work is self-contained as an architectural contract plus measured outcome; the tripartite division is offered as a design choice whose value is tested experimentally rather than deduced from prior self-referential premises.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Bio-inspired brain structures can be directly mapped to effective layered architectures for physical AI.
Invented entities (3)
- Brainstem (L1): no independent evidence
- Cerebellum (L2): no independent evidence
- Cerebral Inference Subsystem (L3/L4): no independent evidence