Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI
Pith reviewed 2026-05-18 09:53 UTC · model grok-4.3
The pith
This survey argues that aligning perception, reasoning, modeling and interaction with physical laws lets AI move beyond pattern matching to genuine physical understanding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Intelligent systems that ground learning in both physical principles and embodied reasoning processes can transcend pattern recognition toward genuine understanding of physical laws, enabling next-generation world models capable of explaining physical phenomena and predicting future states.
What carries the argument
A unified bridging framework that connects structured symbolic reasoning, embodied systems and generative models through physics-grounded methods to produce applied physical understanding.
If this is right
- AI gains improved real-world comprehension by grounding outputs in physical laws.
- World models become able to explain observed phenomena and forecast future physical states.
- Systems advance toward greater safety, generalization and interpretability in embodied tasks.
- Perception, reasoning, modeling and interaction become mutually reinforcing rather than separate tracks.
Where Pith is reading between the lines
- Robotics applications could gain more reliable prediction of object interactions and dynamics.
- Training data requirements might decrease if physical constraints replace some statistical learning.
- The survey's distinctions between theoretical and applied understanding could guide evaluation benchmarks in cognitive robotics.
Load-bearing premise
Recent advances in physics-grounded methods across symbolic reasoning, embodied systems and generative models can be brought together in one framework that yields genuine physical understanding rather than just stronger pattern matching.
What would settle it
Build a unified physical AI system and test it on novel physical scenarios. If its generalization and explanatory power are no better than those of current pattern-based models, the central claim is falsified.
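The prediction-error half of this test can be sketched as a toy experiment. Every choice below is a hypothetical assumption made for illustration, not the survey's protocol:

- a free-fall trajectory plays the "novel scenario";
- a two-parameter fit of y = v0*t - 0.5*g*t^2 plays the physics-grounded model;
- 1-nearest-neighbour regression plays the pattern-based model.

Both models see only t in [0, 1] and are scored on the unseen range t in [2, 3].

```python
# Toy version of the falsification test. All names and numbers here are
# hypothetical illustrations, not taken from the survey.

G_TRUE = 9.81   # ground-truth gravity (m/s^2) used to generate the data
V0_TRUE = 10.0  # ground-truth initial upward velocity (m/s)


def height(t):
    """Ground-truth height of a projectile launched upward from y = 0."""
    return V0_TRUE * t - 0.5 * G_TRUE * t * t


def fit_physics_model(ts, ys):
    """Least-squares fit of (v0, g) in y = v0*t - 0.5*g*t^2.

    With features a = t and b = -0.5*t^2 the model is linear in (v0, g),
    so the 2x2 normal equations are solved directly by Cramer's rule.
    """
    a = ts
    b = [-0.5 * t * t for t in ts]
    saa = sum(x * x for x in a)
    sab = sum(x * z for x, z in zip(a, b))
    sbb = sum(z * z for z in b)
    say = sum(x * y for x, y in zip(a, ys))
    sby = sum(z * y for z, y in zip(b, ys))
    det = saa * sbb - sab * sab
    v0 = (say * sbb - sab * sby) / det
    g = (saa * sby - sab * say) / det
    return v0, g


def nearest_neighbour_model(ts, ys):
    """1-nearest-neighbour regressor: a pure pattern-matching baseline."""
    pairs = list(zip(ts, ys))

    def predict(t):
        return min(pairs, key=lambda p: abs(p[0] - t))[1]

    return predict


def mean_abs_error(predict, ts):
    """Mean absolute error of `predict` against the ground truth on `ts`."""
    return sum(abs(predict(t) - height(t)) for t in ts) / len(ts)


train_ts = [i / 20 for i in range(21)]       # seen regime: t in [0, 1]
test_ts = [2.0 + i / 20 for i in range(21)]  # novel regime: t in [2, 3]
train_ys = [height(t) for t in train_ts]

v0_hat, g_hat = fit_physics_model(train_ts, train_ys)
physics_err = mean_abs_error(lambda t: v0_hat * t - 0.5 * g_hat * t * t, test_ts)
pattern_err = mean_abs_error(nearest_neighbour_model(train_ts, train_ys), test_ts)

# The claim would be falsified if the physics-grounded model generalised
# no better than the pattern-based baseline on the novel regime.
claim_falsified = physics_err >= pattern_err
print(f"g_hat={g_hat:.3f}  physics_err={physics_err:.3g}  "
      f"pattern_err={pattern_err:.3g}  falsified={claim_falsified}")
```

This captures only prediction error. A real evaluation would also have to score explanatory power. A fair baseline would also be a strong learned model rather than nearest-neighbour lookup.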
Abstract
The rapid advancement of embodied intelligence and world models has intensified efforts to integrate physical laws into AI systems, yet physical perception and symbolic physics reasoning have developed along separate trajectories without a unified bridging framework. This work provides a comprehensive overview of physical AI, establishing clear distinctions between theoretical physics reasoning and applied physical understanding while systematically examining how physics-grounded methods enhance AI's real-world comprehension across structured symbolic reasoning, embodied systems, and generative models. Through rigorous analysis of recent advances, we advocate for intelligent systems that ground learning in both physical principles and embodied reasoning processes, transcending pattern recognition toward genuine understanding of physical laws. Our synthesis envisions next-generation world models capable of explaining physical phenomena and predicting future states, advancing safe, generalizable, and interpretable AI systems. We maintain a continuously updated resource at https://github.com/AI4Phys/Awesome-AI-for-Physics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on Physical AI that reviews the integration of physical laws into AI systems. It distinguishes between theoretical physics reasoning and applied physical understanding, examines physics-grounded methods in symbolic reasoning, embodied systems, and generative models, and advocates for grounding learning in physical principles and embodied reasoning to achieve genuine understanding beyond pattern recognition, envisioning advanced world models for explaining and predicting physical phenomena.
Significance. If the advocated synthesis holds, this survey could significantly influence the field by promoting more interpretable and generalizable AI systems that incorporate physical understanding. This could lead to safer and more robust embodied intelligence and world models. The continuously updated resource on GitHub adds value for the community.
major comments (2)
- [Abstract] The positioning of the work as providing a 'unified bridging framework' for transcending pattern recognition is load-bearing for the central claim. Yet the review of separate trajectories across symbolic reasoning, embodied systems, and generative models does not include a concrete integration mechanism. Nor does it give a formal definition of 'genuine understanding' as distinct from improved statistical correlation.
- [Synthesis/Advocacy sections] The distinction between theoretical and applied physical understanding is presented without external benchmarks or comparative evaluations on physical prediction tasks. This undermines the assertion that the reviewed methods transcend pattern matching.
minor comments (2)
- Keep the GitHub resource updated with the latest references to maintain currency in this rapidly evolving field.
- Ensure figure captions and tables clearly indicate the scope of reviewed methods to improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our survey manuscript. We address each major comment point by point below, clarifying the scope of a survey paper while indicating specific revisions that will strengthen the presentation without altering its core contribution as a synthesis of the literature.
Point-by-point responses
- Referee: [Abstract] The positioning of the work as providing a 'unified bridging framework' for transcending pattern recognition is load-bearing for the central claim, yet the review of separate trajectories across symbolic reasoning, embodied systems, and generative models does not include a concrete integration mechanism or formal definition of 'genuine understanding' versus improved statistical correlation.
  Authors: As a survey, the manuscript does not introduce a new technical integration mechanism; the 'unified bridging framework' is intended as the organizational taxonomy and cross-domain analysis that connects the reviewed trajectories through shared physical principles. We will revise the abstract to describe this more precisely as a conceptual synthesis and organizational structure. We will also add a short subsection early in the introduction that offers a working definition of 'genuine understanding' in physical AI, drawing on distinctions from the literature such as causal intervention, counterfactual reasoning, and systematic generalization on physical tasks, to differentiate it from improved statistical correlation.
  Revision: partial.
- Referee: [Synthesis/Advocacy sections] The distinction between theoretical and applied physical understanding is presented without external benchmarks or comparative evaluations on physical prediction tasks, which undermines the assertion that reviewed methods achieve transcendence beyond pattern matching.
  Authors: The referee is correct that the current draft presents the distinction conceptually without new empirical comparisons. Because this is a survey, we do not conduct fresh experiments. In revision we will expand the synthesis sections to reference existing physical prediction benchmarks and datasets from the literature (e.g., those appearing in physics-informed neural network evaluations and embodied reasoning challenges), summarize performance trends reported in the cited works, and explicitly note the limitations of a review format in providing direct head-to-head evaluations. This will make the scope and evidential basis clearer while preserving the survey's role.
  Revision: yes.
Circularity Check
Survey with no derivations or self-referential predictions; claims rest on external citations
Full rationale
This is a literature survey without equations, fitted parameters, or original derivations. The central advocacy for grounding AI in physical principles plus embodied reasoning and for next-generation world models is presented as a synthesis of reviewed advances across symbolic reasoning, embodied systems, and generative models. No load-bearing step reduces by construction to a self-definition, a fitted input renamed as prediction, or a self-citation chain; distinctions between theoretical and applied understanding are offered as organizational framing rather than a derived result. The paper therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Physics-grounded methods can enhance AI's real-world comprehension beyond pattern recognition.
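Physics-informed training is one of the physics-grounded method families the survey reviews, and it offers one concrete reading of this domain assumption. The sketch below is a minimal, hypothetical illustration. It scores candidate trajectories with a data-fit term plus a penalty on the residual of the free-fall equation y'' = -g at collocation points.

It departs from a real PINN in two ways:

- it uses a finite-difference second derivative instead of automatic differentiation;
- it compares two fixed curves rather than training a network.

The trajectories, g = 9.81 and the weight `lam` are all assumptions made for the example.

```python
# Hypothetical PINN-style loss: data-fit term plus a weighted penalty on the
# residual of the free-fall ODE y''(t) + g = 0 at collocation points.
# A finite-difference second derivative replaces the automatic
# differentiation a real PINN would use.


def second_derivative(f, t, h=1e-3):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


def physics_informed_loss(f, data, collocation_ts, g=9.81, lam=1.0):
    """Mean squared data error + lam * mean squared ODE residual."""
    data_term = sum((f(t) - y) ** 2 for t, y in data) / len(data)
    residual_term = sum(
        (second_derivative(f, t) + g) ** 2 for t in collocation_ts
    ) / len(collocation_ts)
    return data_term + lam * residual_term


def true_height(t):
    """Trajectory that obeys y'' = -9.81 (v0 = 10 m/s assumed)."""
    return 10.0 * t - 0.5 * 9.81 * t * t


def cubic_impostor(t):
    """Passes through the same observations but violates the ODE."""
    return true_height(t) + 5.0 * t * (t - 0.5) * (t - 1.0)


# Sparse observations: both candidates fit these exactly.
data = [(t, true_height(t)) for t in (0.0, 0.5, 1.0)]
collocation_ts = [0.25, 0.75, 1.5, 2.0]

loss_true = physics_informed_loss(true_height, data, collocation_ts)
loss_impostor = physics_informed_loss(cubic_impostor, data, collocation_ts)
print(f"true={loss_true:.3g}  impostor={loss_impostor:.3g}")
```

Because the impostor matches every observation, a data-only loss cannot tell the two curves apart. Only the residual term separates them, which is the sense in which a physical constraint can substitute for additional data.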
Lean theorems connected to this paper
- Files: IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, IndisputableMonolith/Cost/FunctionalEquation.lean
  Theorems: reality_from_one_distinction, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "Our survey uniquely focuses on the evolutionary trajectory that unites these four capabilities into a coherent paradigm... hybrid approaches that integrate physics-grounded architectures, physics-informed training, and symbolic reasoning into unified frameworks."
- File: IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean
  Theorem: J_uniquely_calibrated_via_higher_derivative
  Tag: echoes (this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency)
  Passage: "physics-informed neural networks (PINNs)... neuro-symbolic integration... differentiable physics engines"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.