AI2-THOR: An Interactive 3D Environment for Visual AI
Pith reviewed 2026-05-12 05:20 UTC · model grok-4.3
The pith
AI2-THOR supplies near photo-realistic 3D indoor scenes where AI agents navigate and interact with objects to complete tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain.
What carries the argument
The AI2-THOR framework itself: a set of interactive 3D indoor scenes that supports agent navigation and object manipulation.
Load-bearing premise
That the simulated interactions and near photo-realistic visuals are representative enough of real-world conditions to support transferable learning, and that the research community will widely adopt and extend the framework.
What would settle it
Train a policy in AI2-THOR on a navigation or object-interaction task and test the same policy on equivalent tasks in a physical room; substantially lower success rates would indicate limited transfer.
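A minimal sketch of how that comparison might be scored, assuming hypothetical policy and environment stand-ins (none of these names come from AI2-THOR):

```python
# Minimal sketch, assuming a hypothetical policy/environment interface:
# score one frozen policy in simulation and on the physical robot, then
# compare success rates to gauge transfer.

def success_rate(policy, env, episodes=50, max_steps=200):
    """Fraction of episodes in which the policy completes the task."""
    successes = 0
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):
            obs, done, success = env.step(policy.act(obs))
            if done:
                successes += int(success)
                break
    return successes / episodes

# A substantially lower real-world number would indicate limited transfer:
# gap = success_rate(policy, sim_env) - success_rate(policy, real_env)
```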
read the original abstract
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AI2-THOR (The House Of inteRactions), a publicly available framework for visual AI research consisting of near photo-realistic 3D indoor scenes. Agents can navigate these scenes and interact with objects to perform tasks. The framework is presented as a platform to support research across domains including deep reinforcement learning, imitation learning, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and models of cognition.
Significance. If the described environment functions as outlined, the work offers a useful, open-source simulation platform that advances embodied visual AI by moving beyond static datasets toward interactive 3D settings. The explicit public release supports reproducibility and community extensions, which are concrete strengths for a systems-style contribution in this area.
minor comments (3)
- The abstract and title use an unconventional capitalization in the acronym expansion ('inteRactions'); a consistent typographic treatment would improve readability.
- The manuscript would benefit from a dedicated section or appendix providing a minimal usage example (e.g., agent action API calls or scene loading code) to lower the barrier for new users; a sketch of such an example appears after this list.
- Related-work discussion could more explicitly contrast the interaction fidelity and scene variety against contemporaneous simulators such as AI2-THOR's predecessors or other 3D environments referenced in the field.
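On the second point, a minimal usage sketch of the kind the referee requests, based on AI2-THOR's published Python interface; the scene name and actions are illustrative choices, and defaults vary by version:

```python
# Minimal usage sketch based on AI2-THOR's documented Python API;
# the scene and actions below are illustrative choices.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")  # load a kitchen scene
event = controller.step(action="MoveAhead")  # navigation action
event = controller.step(action="RotateRight")

# Each step returns an event carrying the rendered frame and scene metadata.
print(event.frame.shape)               # RGB observation array
print(len(event.metadata["objects"]))  # object records in the scene
controller.stop()
```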
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The referee's summary correctly identifies AI2-THOR as a publicly available framework providing near photo-realistic 3D indoor environments that support navigation and object interaction for a range of visual AI tasks.
Circularity Check
No significant circularity; the paper introduces a new simulation framework without derivations or predictions.
full rationale
The paper's central contribution is the direct presentation of the AI2-THOR framework consisting of near photo-realistic 3D indoor scenes supporting navigation and object interactions. No equations, fitted parameters, predictions, or derivation chains appear in the abstract or described content. The work is self-contained as a platform introduction rather than a solved transfer problem or model derivation, so there are no derivation steps that could reduce to their own inputs by construction.
Forward citations
Cited by 36 Pith papers
-
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
-
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI
HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.
-
Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning
Ego2World turns real egocentric cooking videos into hidden symbolic world graphs for evaluating belief-state planning and memory in embodied agents.
-
Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents
VeGAS improves MLLM-based embodied agents by sampling action ensembles and using a verifier trained on LLM-synthesized failure cases, yielding up to 36% relative gains on hard multi-object long-horizon tasks in Habita...
-
3D-Belief: Embodied Belief Inference via Generative 3D World Modeling
3D-Belief maintains and updates explicit 3D beliefs about partially observed environments to enable multi-hypothesis imagination and improved performance on embodied tasks.
-
EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
EnactToM benchmark reveals frontier AI models achieve 0% on functional Theory of Mind task completion in embodied multi-agent settings despite 45% average on literal belief probes.
-
Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents
VIGIL decouples world-state completion (W) from benchmark success (B) requiring correct terminal reports, showing up to 19.7 pp gaps in B for models with similar W across 20 systems on 1000 episodes.
-
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
MemCompiler introduces state-conditioned memory compilation that dynamically selects and compiles relevant memory into text and latent guidance, yielding up to 129% gains over no-memory baselines and 60% lower latency...
-
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue
ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.
-
3D Generation for Embodied AI and Robotic Simulation: A Survey
3D generation for embodied AI is shifting from visual realism toward interaction readiness, organized into data generation, simulation environments, and sim-to-real bridging roles.
-
KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning
KinDER is a new open-source benchmark that demonstrates substantial gaps in current robot learning and planning methods for handling physical constraints.
-
SpaMEM: Benchmarking Dynamic Spatial Reasoning via Perception-Memory Integration in Embodied Environments
SpaMEM benchmark shows multimodal LLMs succeed at spatial tasks with text histories but sharply fail at long-horizon belief maintenance from raw visual streams alone.
-
Exploring Spatial Intelligence from a Generative Perspective
Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.
-
ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints
ADAPT augments planners with affordance reasoning to raise task success in environments with unspecified and time-varying object affordances, and a LoRA-finetuned VLM backend beats GPT-4o on the new DynAfford benchmark.
-
EgoFun3D: Modeling Interactive Objects from Egocentric Videos using Function Templates
EgoFun3D creates a new task, 271-video dataset, and pipeline using function templates to model interactive 3D objects from egocentric videos for simulation.
-
Voyager: An Open-Ended Embodied Agent with Large Language Models
Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more uniq...
-
RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
A co-evolutionary VLM-VGM loop on 500 unlabeled images raises planner success by 30 points and simulator success by 48 percent while beating fully supervised baselines.
-
How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study
VLMs show consistent deficits in identifying sensitive items in cluttered scenes, adapting to social contexts, and resolving conflicts between commands and privacy constraints in a new physical simulator benchmark.
-
Assistance Without Interruption: A Benchmark and LLM-based Framework for Non-Intrusive Human-Robot Assistance
The work creates NIABench and an LLM-plus-scoring-model framework that enables robots to deliver proactive assistance during human multi-step activities while avoiding interruptions and reducing human effort.
-
SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models
SafetyALFRED shows multimodal LLMs recognize kitchen hazards accurately in QA tests but achieve low success rates when required to mitigate those hazards through embodied planning.
-
ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation
ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.
-
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
-
Scalable Trajectory Generation for Whole-Body Mobile Manipulation
AutoMoMa unifies AKR kinematic modeling with parallel trajectory optimization to produce 500k+ valid coordinated trajectories across 330 scenes and multiple robot embodiments, 80x faster than prior CPU methods.
-
HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation
HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.
-
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
RoboCasa supplies a large-scale kitchen simulator, generative assets, 100 tasks, and automated data pipelines that produce a clear scaling trend in imitation learning for generalist robots.
-
On Evaluation of Embodied Navigation Agents
Consensus recommendations for standardized evaluation measures, problem statements, and benchmarking scenarios in embodied navigation research.
-
Cross-Modal Navigation with Multi-Agent Reinforcement Learning
CRONA is a MARL framework that uses modality-specialized agents with auxiliary beliefs and a centralized multi-modal critic to achieve better performance and efficiency than single-agent baselines on visual-acoustic n...
-
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
ReCAPA adds predictive correction and multi-level semantic alignment to VLA models, plus two new metrics for tracking error spread and recovery, yielding competitive benchmark results over LLM baselines.
-
Environmental Understanding Vision-Language Model for Embodied Agent
EUEA fine-tunes VLMs on object perception, task planning, action understanding and goal recognition, with recovery and GRPO, to raise ALFRED success rates by 11.89% over behavior cloning.
-
EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development
EmbodiedClaw automates embodied AI development workflows through conversation, reducing manual effort and improving consistency and reproducibility.
-
RoboAgent: Chaining Basic Capabilities for Embodied Task Planning
RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.
-
Pre-Execution Safety Gate & Task Safety Contracts for LLM-Controlled Robot Systems
SafeGate adds a deterministic pre-execution gate and runtime contracts with Z3 SMT solving to block unsafe LLM commands for robots while passing safe ones.
-
From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments
An empirical literature analysis reveals a bifurcation in RL environments into Semantic Prior (LLM-dominated) and Domain-Specific Generalization ecosystems with distinct cognitive fingerprints.
-
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
The paper presents robosuite v1.5, a MuJoCo-based modular simulation framework with benchmark environments for reproducible robot learning research.
-
Leveraging VR Robot Games to Facilitate Data Collection for Embodied Intelligence Tasks
A VR gamified data collection system in Unity for humanoid robots demonstrates broad state-action coverage in pick-and-place tasks, with higher difficulty increasing motion intensity and workspace exploration.
-
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.
Reference graph
Works this paper leans on
-
[1]
Robothor: An open simulation-to-real embodied ai platform
Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, and Ali Farhadi. Robothor: An open simulation-to-real embodied ai platform. In CVPR, 2020. 2, 3, 6, 7
work page 2020
-
[2]
Procthor: Large-scale embodied ai using procedural generation
Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, and Roozbeh Mottaghi. Procthor: Large-scale embodied ai using procedural generation. arXiv, 2022. 2, 3, 6, 7, 8, 12
work page 2022
-
[3]
Learning object relation graph and tentative policy for visual navigation
Heming Du, Xin Yu, and Liang Zheng. Learning object relation graph and tentative policy for visual navigation. In ECCV, 2020. 6
work page 2020
-
[4]
What do navigation agents learn about their environment?
Kshitij Dwivedi, Gemma Roig, Aniruddha Kembhavi, and Roozbeh Mottaghi. What do navigation agents learn about their environment? In CVPR, 2022. 7, 8
work page 2022
-
[5]
Manipulathor: A framework for visual object manipulation
Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Manipulathor: A framework for visual object manipulation. In CVPR, 2021. 3, 8
work page 2021
-
[6]
Segan: Segmenting and generating the invisible
Kiana Ehsani, Roozbeh Mottaghi, and Ali Farhadi. Segan: Segmenting and generating the invisible. In CVPR, 2018. 8
work page 2018
-
[7]
Threedworld: A platform for interactive multi-modal physical simulation
Chuang Gan, Jeremy Schwartz, Seth Alter, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, et al. Threedworld: A platform for interactive multi-modal physical simulation. In Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2020. 2, 8
work page 2020
-
[8]
Look, listen, and act: Towards audio-visual embodied navigation
Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, and Joshua B Tenenbaum. Look, listen, and act: Towards audio-visual embodied navigation. In ICRA, 2020. 6, 7
work page 2020
-
[9]
Dialfred: Dialogue-enabled agents for embodied instruction following
Xiaofeng Gao, Qiaozi Gao, Ran Gong, Kaixiang Lin, Govind Thattai, and Gaurav S Sukhatme. Dialfred: Dialogue-enabled agents for embodied instruction following. IEEE Robotics and Automation Letters, 2022. 6
work page 2022
-
[10]
Iqa: Visual question answering in interactive environments
Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, and Ali Farhadi. Iqa: Visual question answering in interactive environments. In CVPR, 2018. 6
work page 2018
-
[11]
A cordial sync: Going beyond marginal policies for multi-agent embodied tasks
Unnat Jain, Luca Weihs, Eric Kolve, Ali Farhadi, Svetlana Lazebnik, Aniruddha Kembhavi, and Alexander G. Schwing. A cordial sync: Going beyond marginal policies for multi-agent embodied tasks. In ECCV, 2020. 6
work page 2020
-
[12]
Two body problem: Collaborative visual task completion
Unnat Jain, Luca Weihs, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander G. Schwing, and Aniruddha Kembhavi. Two body problem: Collaborative visual task completion. In CVPR, 2019. 6, 7
work page 2019
-
[13]
Learning adaptive language interfaces through decomposition
Siddharth Karamcheti, Dorsa Sadigh, and Percy Liang. Learning adaptive language interfaces through decomposition. arXiv, 2020. 6
work page 2020
-
[14]
The design of stretch: A compact, lightweight mobile manipulator for indoor human environments
Charles C Kemp, Aaron Edsinger, Henry M Clever, and Blaine Matulevich. The design of stretch: A compact, lightweight mobile manipulator for indoor human environments. In ICRA, 2022. 3
work page 2022
-
[15]
Simple but effective: Clip embeddings for embodied ai
Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, and Aniruddha Kembhavi. Simple but effective: Clip embeddings for embodied ai. In CVPR, 2022. 6
work page 2022
-
[16]
Contrasting contrastive self-supervised representation learning pipelines
Klemen Kotar, Gabriel Ilharco, Ludwig Schmidt, Kiana Ehsani, and Roozbeh Mottaghi. Contrasting contrastive self-supervised representation learning pipelines. In ICCV, 2021. 8
work page 2021
-
[17]
Interactron: Embodied adaptive object detection
Klemen Kotar and Roozbeh Mottaghi. Interactron: Embodied adaptive object detection. In CVPR, 2022. 7, 8
work page 2022
-
[18]
iGibson 2.0: Object-centric simulation for robot learning of everyday household tasks
Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Elliott Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, and Silvio Savarese. iGibson 2.0: Object-centric simulation for robot learning of everyday household tasks. In CoRL, 2021. 2, 8
work page 2021
-
[19]
Ifr-explore: Learning inter-object functional relationships in 3d indoor scenes
Qi Li, Kaichun Mo, Yanchao Yang, Hang Zhao, and Leonidas Guibas. Ifr-explore: Learning inter-object functional relationships in 3d indoor scenes. In ICLR, 2022. 7
work page 2022
-
[20]
Multi-agent embodied visual semantic navigation with scene prior knowledge
Xinzhu Liu, Di Guo, Huaping Liu, and Fuchun Sun. Multi-agent embodied visual semantic navigation with scene prior knowledge. IEEE Robotics and Automation Letters, 2022. 6
work page 2022
-
[21]
Learning about objects by learning to interact with them
Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, and Roozbeh Mottaghi. Learning about objects by learning to interact with them. In NeurIPS, 2020. 8
work page 2020
-
[22]
Mgrl: Graph neural network based inference in a markov network with reinforcement learning for visual navigation
Yi Lu, Yaran Chen, Dongbin Zhao, and Dong Li. Mgrl: Graph neural network based inference in a markov network with reinforcement learning for visual navigation. Neurocomputing, 2021. 6
work page 2021
-
[23]
Film: Following instructions in language with modular methods
So Yeon Min, Devendra Singh Chaplot, Pradeep Ravikumar, Yonatan Bisk, and Ruslan Salakhutdinov. Film: Following instructions in language with modular methods. In ICLR, 2022. 6
work page 2022
-
[24]
Pyrobot: An open-source robotics framework for research and benchmarking
Adithyavairavan Murali, Tao Chen, Kalyan Vasudev Alwala, Dhiraj Gandhi, Lerrel Pinto, Saurabh Gupta, and Abhinav Gupta. Pyrobot: An open-source robotics framework for research and benchmarking. arXiv, 2019. 3
work page 2019
-
[25]
Learning affordance landscapes for interaction exploration in 3d environments
Tushar Nagarajan and Kristen Grauman. Learning affordance landscapes for interaction exploration in 3d environments. In NeurIPS, 2020. 7
work page 2020
-
[26]
Shaping embodied agent behavior with activity-context priors from egocentric video
Tushar Nagarajan and Kristen Grauman. Shaping embodied agent behavior with activity-context priors from egocentric video. In NeurIPS, 2021. 7
work page 2021
-
[27]
Teach: Task-driven embodied agents that chat
Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, and Dilek Hakkani-Tur. Teach: Task-driven embodied agents that chat. In AAAI, 2022.
-
[28]
Episodic transformer for vision-and-language navigation
Alexander Pashevich, Cordelia Schmid, and Chen Sun. Episodic transformer for vision-and-language navigation. In ICCV, 2021. 6, 9
work page 2021
-
[29]
Habitat-matterport 3d dataset (HM3d): 1000 large-scale 3d environments for embodied AI
Santhosh Kumar Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alexander Clegg, John M Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X Chang, Manolis Savva, Yili Zhao, and Dhruv Batra. Habitat-matterport 3d dataset (HM3d): 1000 large-scale 3d environments for embodied AI. In Neural Information Processing Systems Dataset...
work page 2021
-
[30]
Habitat: A platform for embodied ai research
Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, and Dhruv Batra. Habitat: A platform for embodied ai research. In ICCV, 2019.
-
[31]
Alfred: A benchmark for interpreting grounded instructions for everyday tasks
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, and Dieter Fox. Alfred: A benchmark for interpreting grounded instructions for everyday tasks. In CVPR, 2020. 6, 7
work page 2020
-
[32]
Habitat 2.0: Training home assistants to rearrange their habitat
Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel X. Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, and Dhruv Batra. Habitat 2.0: Traini...
work page 2021
-
[33]
Multi-agent embodied question answering in interactive environments
Sinan Tan, Weilai Xiang, Huaping Liu, Di Guo, and Fuchun Sun. Multi-agent embodied question answering in interactive environments. In ECCV, 2020. 6
work page 2020
-
[34]
Visual room rearrangement
Luca Weihs, Matt Deitke, Aniruddha Kembhavi, and Roozbeh Mottaghi. Visual room rearrangement. In CVPR, 2021. 7, 8
work page 2021
-
[35]
Learning generalizable visual representations via interactive gameplay
Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, and Ali Farhadi. Learning generalizable visual representations via interactive gameplay. In ICLR, 2021. 6, 8
work page 2021
-
[36]
Allenact: A framework for embodied AI research
Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, and Aniruddha Kembhavi. Allenact: A framework for embodied AI research. arXiv, 2020. 11
work page 2020
-
[37]
Learning to learn how to learn: Self-adaptive visual navigation using meta-learning
Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, and Roozbeh Mottaghi. Learning to learn how to learn: Self-adaptive visual navigation using meta-learning. In CVPR, 2019. 6
work page 2019
-
[38]
Communicative learning with natural gestures for embodied navigation agents with human-in-the-scene
Qi Wu, Cheng-Ju Wu, Yixin Zhu, and Jungseock Joo. Communicative learning with natural gestures for embodied navigation agents with human-in-the-scene. In IROS, 2021. 6, 7
work page 2021
-
[39]
Sapien: A simulated part-based interactive environment
Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, and Hao Su. Sapien: A simulated part-based interactive environment. In CVPR, 2020. 8
work page 2020
-
[40]
Visual semantic navigation using scene priors
Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, and Roozbeh Mottaghi. Visual semantic navigation using scene priors. In ICLR, 2019. 6
work page 2019
-
[41]
Piglet: Language grounding through neuro-symbolic interaction in a 3d world
Rowan Zellers, Ari Holtzman, Matthew E. Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, and Yejin Choi. Piglet: Language grounding through neuro-symbolic interaction in a 3d world. In ACL, 2021. 6
work page 2021
-
[42]
Visual reaction: Learning to play catch with your drone
Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, and Ali Farhadi. Visual reaction: Learning to play catch with your drone. In CVPR, 2020. 3
work page 2020
-
[43]
Luminous: Indoor scene generation for embodied ai challenges
Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, and Gaurav S Sukhatme. Luminous: Indoor scene generation for embodied ai challenges. arXiv, 2021. 8
work page 2021
-
[44]
Towards optimal correlational object search
Kaiyu Zheng, Rohan Chitnis, Yoonchang Sung, George Konidaris, and Stefanie Tellex. Towards optimal correlational object search. In ICRA, 2022. 6
work page 2022
-
[45]
Target-driven visual navigation in indoor scenes using deep reinforcement learning
Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In ICRA, 2017. 6, 7, 11
work page 2017
-
[46]
Different simulators support different agents, each with their own action spaces and capabilities, with little standardization across simulators. AI2-THOR supports many different types of agents, including the ManipulaTHOR, Abstract, and LoCoBot agents. The ManipulaTHOR agent is often slower to simulate than a navigation-only LoCoBot agent as it is more...
-
[48]
When training agents via reinforcement learning, there are a large number of factors that bottleneck training speed, so the value of raw simulator speed is substantially reduced. These factors include: (a) the model forward pass when computing agent rollouts, (b) the model backward pass when computing gradients for RL losses, and (c) environment resets - for many ...
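A hedged sketch of that accounting, with hypothetical `env`, `policy`, and `update` stand-ins (none of these names come from AI2-THOR's API), showing how wall-clock time can be attributed to each factor:

```python
# Minimal sketch, assuming hypothetical env/policy/update interfaces:
# attribute wall-clock time in an RL loop to (a) forward passes,
# (b) backward passes, (c) environment resets, and raw simulator steps.
import time
from collections import defaultdict

def profile_training(env, policy, update, iterations=100):
    timers = defaultdict(float)

    def timed(key, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        timers[key] += time.perf_counter() - start
        return result

    obs = timed("reset", env.reset)                      # (c) environment resets
    for _ in range(iterations):
        action = timed("forward", policy.act, obs)       # (a) rollout forward pass
        obs, done = timed("sim_step", env.step, action)  # raw simulator speed
        timed("backward", update, obs, action)           # (b) gradients for RL losses
        if done:
            obs = timed("reset", env.reset)
    return dict(timers)  # per-factor totals, e.g. {"forward": ..., "sim_step": ...}
```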