pith. machine review for the scientific record.

arxiv: 2604.08033 · v1 · submitted 2026-04-09 · 💻 cs.AI · cs.MA · cs.NI

Recognition: 2 theorem links · Lean theorem

IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:45 UTC · model grok-4.3

classification 💻 cs.AI · cs.MA · cs.NI
keywords semantic-spatial sensor scheduling · IoT · large language models · neuro-symbolic AI · spatial trajectory graph · intent-driven systems · sensor networks

The pith

IoT-Brain turns LLM sensor planning into verifiable graph optimization to close semantic-to-physical gaps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that large language models cannot reliably decide which sensors to activate and when based on natural-language goals because of gaps in representing physical space, reasoning about constraints, and optimizing actions. It formalizes the problem as Semantic-Spatial Sensor Scheduling and introduces the Spatial Trajectory Graph as a neuro-symbolic layer that forces verification of every plan before execution. This converts open-ended LLM outputs into a structured graph problem that can be checked and solved efficiently. The resulting IoT-Brain system raises task success rates, cuts computation and network use, and approaches the reliability limit of exhaustive search in both simulated benchmarks and real campus deployments.

Core claim

Direct LLM planning for sensor scheduling is unreliable due to representation, reasoning, and optimization gaps; the Spatial Trajectory Graph bridges these by converting planning into a verifiable graph optimization problem under a verify-before-commit discipline, enabling IoT-Brain to approach reliability upper bounds.

What carries the argument

The Spatial Trajectory Graph (STG): a neuro-symbolic structure that maps semantic intent to spatial trajectories and turns open planning into verifiable graph optimization.
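The STG itself is not specified on this page, but the verify-before-commit discipline it enforces can be sketched in a few lines. Everything below, including the zone names, the coverage table, and the plan format, is invented for illustration and is not the paper's actual data model.

```python
# Hypothetical sketch of verify-before-commit: every hop of a candidate
# plan is checked against explicit spatial facts before any sensor is
# activated. Zones, coverage, and the plan format are illustrative only.

ADJACENT = {                      # physical adjacency between zones
    "gate": {"plaza"},
    "plaza": {"gate", "lab"},
    "lab": {"plaza"},
}
COVERAGE = {                      # cameras able to observe each zone
    "gate": {"cam_01"},
    "plaza": {"cam_02", "cam_03"},
    "lab": {"cam_04"},
}

def verify_plan(plan):
    """plan: list of (zone, camera) steps. Reject before executing anything."""
    for i, (zone, cam) in enumerate(plan):
        if cam not in COVERAGE.get(zone, set()):
            return False, f"step {i}: {cam} does not cover {zone}"
        if i > 0:
            prev = plan[i - 1][0]
            if zone != prev and zone not in ADJACENT.get(prev, set()):
                return False, f"step {i}: {prev} -> {zone} not adjacent"
    return True, "ok"

ok, reason = verify_plan([("gate", "cam_01"), ("plaza", "cam_02"), ("lab", "cam_04")])
# only a plan that passes every check would be committed to real sensors
```

The point of the discipline is that an LLM's free-form plan is treated as a hypothesis: it either survives deterministic checks against the grounded graph or is pruned before touching hardware.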

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same verify-before-commit graph layer could be applied to other language-to-action settings such as robot task planning or drone mission generation.
  • If the graph construction step proves sensitive to environment details, future work would need automated ways to learn or adapt the graph from data rather than manual design.
  • The reported bandwidth and token reductions suggest that grounding LLMs this way could make large-scale intent-driven sensor networks practical at city scale.

Load-bearing premise

The Spatial Trajectory Graph accurately captures all relevant semantic-to-physical mapping gaps and real-world constraints without introducing new representation errors or requiring extensive manual tuning for different environments.

What would settle it

Deploying IoT-Brain in a new physical environment whose constraints are absent from the graph and measuring whether task success falls below the strongest baseline search method.

Figures

Figures reproduced from arXiv: 2604.08033 by Jinke Song, Junda Lin, Junyang Wang, Lan Zhang, Mu Yuan, Zhaomeng Zhou.

Figure 1: The challenge of the S3 problem. Solving the S3 problem with off-the-shelf LLMs is far from straightforward. Our preliminary study (§2.2) tasks an LLM with end-to-end scheduling in a real-world topological environment, revealing three fundamental challenges: (1) Symbol-to-Semantic Chasm. LLMs’ native shortcoming in comprehending raw, machine-oriented symbolic topologies prevents them from building an e… view at source ↗
Figure 2: The representation gap and resulting planning failures. view at source ↗
Figure 3: The system overview and workflow of IoT-Brain. hypotheses into concrete checks by invoking our custom-crafted deterministic Verifying Toolkit, a Python library designed to query the explicit geometry and sensor coverage within 𝑊. This rigorous loop prunes topologically infeasible branches until a consistent grounded graph emerges. Verified facts are cached as system-wide topological consensus in Spatial… view at source ↗
Figure 4: Dataflow of the Semantic Structuring phase. view at source ↗
Figure 5: Workflow of the Hypothesis-Verification Loop. view at source ↗
Figure 6: Workflow of the Perception Aligner. the contained spatial path 𝜏𝑉 are consistent with the physical world, thereby providing a solid and reliable foundation for the subsequent scheduling optimization. 4.4 From Plan to Optimized Action With a fully grounded spatial blueprint 𝐺★ in place, the final phase of IoT-Brain translates it into dynamic action and perception in the physical world. The workflow compris… view at source ↗
Figure 7: Conceptual workflow comparison of agentic paradigms. We contrast the brittle Hierarchical approach, … view at source ↗
Figure 8: The overall performance on TopoSense-Bench. We evaluate Hierarchical (Hi.), Reactive (Re.), Backtracking (Ba.), and IoT-Brain (Io.) across five task categories: T1.F (Focal Scene), T1.P (Panoramic), T2 (Intra-Building), T3.O (Open-Space), and T3.H (Hybrid). Metrics include task success rate, blueprint correctness, token usage, iteration rounds, and end-to-end latency. The legend displays the average perfor… view at source ↗
Figure 9: (a) & (d): Ablation study of IoT-Brain’s core components on single and multi scenario tasks. (A.: Anchor, R.: Reasoner, V.: Verifier, M.: Memory, S: Single-Scenario, M: Multi-Scenario). (b) & (c): Sensitivity to different LLMs. (GL.: GLM-4, Q.: Qwen-Max, D.: DeepSeek-V3, Ge.: Gemini-2.5-Flash). (e) & (f): Scalability with query complexity. impacting single-run TSR, nearly doubles latency in complex settin… view at source ↗
Figure 10: The real-world testbed environment. 6.1 Testbed Configuration Physical Environment. Our real-world testbed is a large-scale university campus instrumented with 2,510 Hikvision IP cameras distributed across 11 major areas. This diverse environment presents significant heterogeneity, where coverage ranges from sparse outdoor road networks to dense indoor deployments, with the Lab Building alone hosting 268… view at source ↗
Figure 11: System performance analysis. sensor set increases the volume of frames for VLM inference. By contrast, the initial Semantic Structuring stage is remarkably efficient. Overall, this analysis reveals that the primary latency drivers are not inefficiencies in our planning engine, but rather the intrinsic complexity of the tasks, which demands substantial verification and perception effort. Sensitivity to VL… view at source ↗
read the original abstract

Intelligent systems powered by large-scale sensor networks are shifting from predefined monitoring to intent-driven operation, revealing a critical Semantic-to-Physical Mapping Gap. While large language models (LLMs) excel at semantic understanding, existing perception-centric pipelines operate retrospectively, overlooking the fundamental decision of what to sense and when. We formalize this proactive decision as Semantic-Spatial Sensor Scheduling (S3) and demonstrate that direct LLM planning is unreliable due to inherent gaps in representation, reasoning, and optimization. To bridge these gaps, we introduce the Spatial Trajectory Graph (STG), a neuro-symbolic paradigm governed by a verify-before-commit discipline that transforms open-ended planning into a verifiable graph optimization problem. Based on STG, we implement IoT-Brain, a concrete system embodiment, and construct TopoSense-Bench, a campus-scale benchmark with 5,250 natural-language queries across 2,510 cameras. Evaluations show that IoT-Brain boosts task success rate by 37.6% over the strongest search-intensive methods while running nearly 2 times faster and using 6.6 times fewer prompt tokens. In real-world deployment, it approaches the reliability upper bound while reducing 4.1 times network bandwidth, providing a foundational framework for LLMs to interact with the physical world with unprecedented reliability and efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript formalizes the Semantic-Spatial Sensor Scheduling (S3) problem to address gaps between LLM semantic reasoning and physical sensor selection in IoT networks. It introduces the Spatial Trajectory Graph (STG) as a neuro-symbolic structure enforcing a verify-before-commit discipline that converts open-ended LLM planning into verifiable graph optimization. The IoT-Brain system is implemented and evaluated on the TopoSense-Bench benchmark (5,250 natural-language queries across 2,510 cameras), reporting a 37.6% task success rate improvement over search-intensive baselines, nearly 2× faster runtime, 6.6× fewer prompt tokens, and in real-world deployment a 4.1× network bandwidth reduction while approaching reliability bounds.

Significance. If the central claims hold after addressing evaluation gaps, the work supplies a concrete neuro-symbolic framework for grounding LLMs in proactive physical-world sensor control. The STG approach and accompanying TopoSense-Bench benchmark constitute reusable contributions that could influence research on embodied AI, intent-driven IoT, and reliable LLM planning for sensor networks.

major comments (2)
  1. [§6] §6 (Experiments): The reported performance gains (37.6% success-rate lift, 2× speed, 6.6× token reduction, 4.1× bandwidth savings) are presented without error bars, statistical tests, or explicit descriptions of baseline implementations, query filtering rules, and hyperparameter choices for the STG. This information is load-bearing for assessing whether the gains are robust or sensitive to post-hoc benchmark tuning.
  2. [§4] §4 (Spatial Trajectory Graph) and §6: The STG construction and verify-before-commit checks are evaluated only on a single campus-scale environment; no cross-environment ablations, automatic induction from raw sensor metadata, or tests under distribution shift are reported. This directly affects the claim that the neuro-symbolic paradigm reliably bridges the semantic-to-physical gap in general settings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential impact of the neuro-symbolic STG framework and TopoSense-Bench. We address the two major comments point-by-point below, committing to revisions that enhance reproducibility and clarify scope without overstating current results.

read point-by-point responses
  1. Referee: [§6] §6 (Experiments): The reported performance gains (37.6% success-rate lift, 2× speed, 6.6× token reduction, 4.1× bandwidth savings) are presented without error bars, statistical tests, or explicit descriptions of baseline implementations, query filtering rules, and hyperparameter choices for the STG. This information is load-bearing for assessing whether the gains are robust or sensitive to post-hoc benchmark tuning.

    Authors: We agree that the absence of error bars, statistical tests, and full baseline details limits assessment of robustness. In the revised manuscript we will add: (i) standard deviations and error bars from at least five independent runs with different random seeds; (ii) statistical significance results (paired t-tests or Wilcoxon tests as appropriate); (iii) explicit pseudocode and hyperparameter tables for all baselines, including search depth, beam size, and prompt templates; (iv) the precise query filtering and sampling rules applied to generate the 5,250 queries; and (v) the complete STG hyperparameter set (trajectory length, verification thresholds, graph pruning criteria). These additions will make clear that reported gains are not artifacts of post-hoc tuning. revision: yes

  2. Referee: [§4] §4 (Spatial Trajectory Graph) and §6: The STG construction and verify-before-commit checks are evaluated only on a single campus-scale environment; no cross-environment ablations, automatic induction from raw sensor metadata, or tests under distribution shift are reported. This directly affects the claim that the neuro-symbolic paradigm reliably bridges the semantic-to-physical gap in general settings.

    Authors: The current evaluation deliberately uses a single but large and heterogeneous campus deployment (2,510 cameras, 5,250 queries) to stress-test the verify-before-commit discipline under realistic spatial-semantic complexity. We acknowledge that cross-environment ablations and explicit distribution-shift experiments are absent. In revision we will: expand §4 with a precise algorithmic description of automatic STG induction directly from raw sensor metadata (coordinates, FOVs, semantic tags); add a dedicated limitations paragraph in §6 and the conclusion that explicitly states the single-environment scope and outlines how the same construction pipeline could be applied to new deployments; and include a brief qualitative discussion of expected behavior under moderate distribution shift. We do not claim universal generalizability from the present results and will tone down any such implication. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper formalizes the Semantic-to-Physical Mapping Gap and Semantic-Spatial Sensor Scheduling (S3) problem, then introduces the Spatial Trajectory Graph (STG) as a novel neuro-symbolic construct using verify-before-commit to convert open-ended LLM planning into graph optimization. IoT-Brain is presented as the system embodiment of this approach, with a new campus-scale benchmark (TopoSense-Bench) constructed for evaluation. Performance claims (e.g., 37.6% success-rate improvement, speed/token/bandwidth gains) are reported as empirical outcomes on this benchmark rather than quantities defined by construction from fitted parameters or prior self-citations. No load-bearing steps reduce to self-definition, renamed known results, or self-citation chains; the central claims rest on independent benchmark results and the introduced STG paradigm. This is the most common honest outcome for papers whose core contribution is a new representation plus empirical validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the STG successfully bridging representation, reasoning, and optimization gaps in LLMs for physical scheduling, with empirical support from the benchmark but no formal verification or external independent evidence for the graph's completeness.

axioms (1)
  • domain assumption Graph-based optimization can reliably verify and commit to sensor activation plans derived from natural language
    Invoked in the verify-before-commit discipline and transformation of planning into graph optimization.
invented entities (1)
  • Spatial Trajectory Graph (STG) no independent evidence
    purpose: Transforms open-ended LLM planning into a verifiable graph optimization problem to bridge semantic-spatial gaps
    New structure introduced to address inherent LLM limitations in representation, reasoning, and optimization for sensor scheduling.
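The lone axiom, that graph-based optimization can carry the activation decision once intent is grounded, can be made concrete with a toy shortest-path formulation. The graph, the zone names, and the activation costs below are invented for illustration; the paper's actual STG optimization is not specified on this page.

```python
import heapq

# Toy illustration of the ledger's axiom: once intent is grounded in a
# graph, "which sensors to activate" becomes a checkable optimization.
# Edge weights (activation cost per hop) are invented for illustration.
GRAPH = {
    "entrance": [("atrium", 2.0), ("parking", 5.0)],
    "atrium":   [("lab", 1.0)],
    "parking":  [("lab", 1.0)],
    "lab":      [],
}

def cheapest_activation_path(start, goal):
    """Dijkstra over the grounded graph: minimal-cost sensor chain."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in GRAPH[node]:
            heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

# cheapest_activation_path("entrance", "lab") → (3.0, ["entrance", "atrium", "lab"])
```

Unlike free-form LLM output, a result of this shape is auditable: the optimality of the chosen chain can be checked independently of the model that proposed the goal.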

pith-pipeline@v0.9.0 · 5549 in / 1358 out tokens · 85907 ms · 2026-05-10T17:45:17.057552+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

93 extracted references · 26 canonical work pages · 12 internal anchors

  1. [1]

    Nominatim API Manual (latest)

    2025. Nominatim API Manual (latest). https://nominatim.org/release- docs/latest/api/Overview/

  2. [2]

    OpenStreetMap Taginfo

    2025. OpenStreetMap Taginfo. https://taginfo.openstreetmap.org/

  3. [3]

    Anil Kumar, and Lelitha Devi Vanajakshi

    Avinash Achar, Dhivya Bharathi, B. Anil Kumar, and Lelitha Devi Vanajakshi. 2020. Bus Arrival Time Prediction: A Spatial Kalman Filter Approach.IEEE Transactions on Intelligent Transportation Sys- tems21 (2020), 1298–1307. https://api.semanticscholar.org/CorpusID: 182478323

  4. [4]

    Lawrence Zitnick, Devi Parikh, and Dhruv Batra

    Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. 2015. VQA: Visual Question Answering.International Journal of Computer Vision123 (2015), 4 – 31. https://api.semanticscholar.org/CorpusID:3180429

  5. [5]

    Ibrahim, and Paul Steenson

    Harith Al-Safi, Harith S. Ibrahim, and Paul Steenson. 2025. Vega: LLM- Driven Intelligent Chatbot Platform for Internet of Things Control and Development.Sensors (Basel, Switzerland)25 (2025). https://api. semanticscholar.org/CorpusID:279531727

  6. [6]

    Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, and Madhava Krishna

    Raghav Arora, Shivam Singh, Karthik Swaminathan, Ahana Datta, Snehasis Banerjee, B. Bhowmick, Krishna Murthy Jatavallabhula, Mohan Sridharan, and Madhava Krishna. 2024. Anticipate & Act: Integrating LLMs and Classical Planning for Efficient Task Execu- tion in Household Environments†.2024 IEEE International Confer- ence on Robotics and Automation (ICRA)(2...

  7. [7]

    Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. 2023. Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities.ArXiv abs/2308.12966 (2023). https://api.semanticscholar.org/CorpusID: 263875678

  8. [8]

    Stefano Basagni, Federico Ceccarelli, Chiara Petrioli, Nithila Raman, and Abhimanyu Venkatraman Sheshashayee. 2019. Wake-up Ra- dio Ranges: A Performance Study.2019 IEEE Wireless Communi- cations and Networking Conference (WCNC)(2019), 1–6. https: //api.semanticscholar.org/CorpusID:202548980

  9. [9]

    Littman, and Blase Ur

    Will Brackenbury, Abhimanyu Deora, Jillian Ritchey, Jason Vallee, Weijia He, Guan Wang, Michael L. Littman, and Blase Ur. 2019. How Users Interpret Bugs in Trigger-Action Programming.Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (2019). https://api.semanticscholar.org/CorpusID:140242523

  10. [10]

    Jiarui Cai, Mingze Xu, Wei Li, Yuanjun Xiong, Wei Xia, Zhuowen Tu, and Stefan 0 Soatto. 2022. MeMOT: Multi-Object Tracking with Memory.2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2022), 8080–8090. https://api.semanticscholar. org/CorpusID:247839756

  11. [11]

    Ai Chen, Santosh Kumar, and Ten-Hwang Lai. 2007. Designing local- ized algorithms for barrier coverage. InACM/IEEE International Confer- ence on Mobile Computing and Networking. https://api.semanticscholar. org/CorpusID:2864152

  12. [12]

    Junzhi Chen, Juhao Liang, and Benyou Wang. 2024. Smurfs: Multi- Agent System using Context-Efficient DFSDT for Tool Planning. In North American Chapter of the Association for Computational Linguistics. https://api.semanticscholar.org/CorpusID:269635774

  13. [13]

    Liyi Chen, Panrong Tong, Zhongming Jin, Ying Sun, Jieping Ye, and Huixia Xiong. 2024. Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs.ArXivabs/2410.23875 (2024). https://api.semanticscholar.org/CorpusID:273707190

  14. [14]

    Yanbei Chen, Xiatian Zhu, and Shaogang Gong. 2017. Person Re- identification by Deep Learning Multi-scale Representations.2017 IEEE International Conference on Computer Vision Workshops (ICCVW) (2017), 2590–2600. https://api.semanticscholar.org/CorpusID:4729614

  15. [15]

    Ziyang Chen, Zhangli Zhou, Lin Li, and Zheng Kan. 2024. Active Inference for Reactive Temporal Logic Motion Planning.2024 IEEE International Conference on Robotics and Automation (ICRA)(2024), 2520–2526. https://api.semanticscholar.org/CorpusID:271798591

  16. [16]

    Ye Cheng, Minghui Xu, Yue Zhang, Kun Li, Ruoxi Wang, and Lian Yang. 2024. AutoIoT: Automated IoT Platform Using Large Language Models.IEEE Internet of Things Journal12 (2024), 13644–13656. https: //api.semanticscholar.org/CorpusID:274131336

  17. [17]

    DeepSeek-AI. 2025. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948 https://arxiv. org/abs/2501.12948

  18. [18]

    Donald, Craig H

    Fiona M. Donald, Craig H. M. Donald, and Andrew Thatcher. 2015. Work exposure and vigilance decrements in closed circuit televi- sion surveillance.Applied ergonomics47 (2015), 220–8. https: //api.semanticscholar.org/CorpusID:25574518

  19. [19]

    Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, and Yong Li

    J. Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, and Yong Li. 2024. CityGPT: Empowering Urban Spatial Cognition of Large Language Models.ArXivabs/2406.13948 (2024). https://api.semanticscholar.org/ CorpusID:270619725

  20. [20]

    Feng, Shengyuan Wang, Tianhui Liu, Yanxin Xi, and Yong Li

    J. Feng, Shengyuan Wang, Tianhui Liu, Yanxin Xi, and Yong Li

  21. [21]

    https://api.semanticscholar.org/CorpusID: 280010693

    UrbanLLaVA: A Multi-modal Large Language Model for Ur- ban Intelligence with Spatial Reasoning and Understanding.ArXiv abs/2506.23219 (2025). https://api.semanticscholar.org/CorpusID: 280010693

  22. [22]

    Yi Gao, Kaijie Xiao, Fu Li, Weifeng Xu, Jiaming Huang, and Wei Dong

  23. [23]

    https://api.semanticscholar

    ChatIoT: Zero-code Generation of Trigger-action Based IoT Programs.Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies8 (2024), 1 – 29. https://api.semanticscholar. org/CorpusID:272563565

  24. [24]

    Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, and Yongfeng Zhang. 2023. OpenAGI: When LLM Meets Domain Experts.ArXivabs/2304.04370 (2023). https://api. semanticscholar.org/CorpusID:258049306

  25. [25]

    2025.Gemini 2.5 Flash: Model Card

    Google DeepMind. 2025.Gemini 2.5 Flash: Model Card. Technical Re- port. Google. https://storage.googleapis.com/model-cards/documents/ gemini-2.5-flash.pdf

  26. [26]

    Google DeepMind. 2025. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Gen- eration Agentic Capabilities.arXivabs/2507.06261 (2025). https: //arxiv.org/abs/2507.06261

  27. [27]

    Mordechai (Muki) Haklay and Patrick Weber. 2008. OpenStreetMap: User-Generated Street Maps.IEEE Pervasive Computing7 (2008), 12–18. https://api.semanticscholar.org/CorpusID:16588111

  28. [28]

    Hodgetts, François Vachon, Cindy Chamberland, and Sébastien Tremblay

    Helen M. Hodgetts, François Vachon, Cindy Chamberland, and Sébastien Tremblay. 2017. See No Evil: Cognitive Challenges of Se- curity Surveillance and Monitoring.Journal of applied research in memory and cognition6 (2017), 230–243. https://api.semanticscholar. org/CorpusID:261257329

  29. [29]

    Bin Hu, Xinggang Wang, and Wenyu Liu. 2024. PersonViT: Large-scale Self-supervised Vision Transformer for Person Re-Identification.Mach. Vis. Appl.36 (2024), 32. https://api.semanticscholar.org/CorpusID: 271854919

  30. [30]

    Justin Huang and Maya Cakmak. 2015. Supporting mental model accuracy in trigger-action programming.Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (2015). https://api.semanticscholar.org/CorpusID:207225561

  31. [31]

    HUST Vision and Learning Group. 2025. PersonViT. https://github. com/hustvl/PersonViT. GitHub repository

  32. [32]

    Riheng Jia, Jinhao Wu, Xiong Wang, Jianfeng Lu, Feilong Lin, Zhon- glong Zheng, and Minglu Li. 2023. Energy Cost Minimization in Wireless Rechargeable Sensor Networks.IEEE/ACM Transactions on Networking31 (2023), 2345–2360. https://api.semanticscholar.org/ IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling MobiCom ’26, October 26–30, 2026, A...

  33. [33]

    Jiayu Jiang, Changxing Ding, Wentao Tan, Junhong Wang, Jin Tao, and Xiangmin Xu. 2025. Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification.2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2025), 9220–9230. https://api.semanticscholar.org/CorpusID:276961622

  34. [34]

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim

  35. [35]

    A Survey on Large Language Models for Code Generation

    A Survey on Large Language Models for Code Generation.ArXiv abs/2406.00515 (2024). https://api.semanticscholar.org/CorpusID: 270214176

  36. [36]

    Evan King, Haoxiang Yu, Sangsu Lee, and Christine Julien. 2024. Sasha: Creative Goal-Oriented Reasoning in Smart Homes with Large Lan- guage Models.Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 1, Article 12 (2024), 38 pages. doi:10.1145/3643505

  37. [37]

    Tomavs Krajnik, Jaime Pulido Fentanes, Marc Hanheide, and Tom Duckett. 2016. Persistent localization and life-long mapping in chang- ing environments using the Frequency Map Enhancement.2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)(2016), 4558–4563. https://api.semanticscholar.org/CorpusID: 4969989

  38. [38]

    Santosh Kumar, Ten-Hwang Lai, and Anish Arora. 2005. Barrier cov- erage with wireless sensors.Wireless Networks13 (2005), 817–834. https://api.semanticscholar.org/CorpusID:565989

  39. [39]

    Carolin Lawrence and Stefan Riezler. 2016. NLmaps: A Natural Lan- guage Interface to Query OpenStreetMap. InCOLING 2016 System Demonstrations

  40. [40]

    Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kuttler, Mike Lewis, Wen tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval- Augmented Generation for Knowledge-Intensive NLP Tasks.ArXiv abs/2005.11401 (2020). https://api.semanticscholar.org/CorpusID: 218869575

  41. [41]

    Yunhao Li, Xiaoqiong Liu, Luke Liu, Heng Fan, and Libo Zhang

  42. [42]

    https://api.semanticscholar.org/CorpusID: 270391880

    LaMOT: Language-Guided Multi-Object Tracking.ArXiv abs/2406.08324 (2024). https://api.semanticscholar.org/CorpusID: 270391880

  43. [43]

    Jie Lin, Wei Yu, Nan Zhang, Xinyu Yang, Hanlin Zhang, and Wei Zhao. 2017. A Survey on Internet of Things: Architecture, Enabling Technologies, Security and Privacy, and Applications.IEEE Internet of Things Journal4 (2017), 1125–1142. https://api.semanticscholar.org/ CorpusID:31245252

  44. [44]

    Kaiwei Liu, Bufang Yang, Lilin Xu, Yunqi Guo, Guoliang Xing, Xian Shuai, Xiaozhe Ren, Xin Jiang, and Zhenyu Yan. 2025. TaskSense: A Translation-like Approach for Tasking Heterogeneous Sensor Systems with LLMs.Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems(2025). https://api.semanticscholar.org/ CorpusID:278326090

  45. [45]

    Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, and Wenhai Wang. 2023. ControlLLM: Augment Language Models with Tools by Searching on Graphs.ArXivabs/2310.17796 (2023). https://api.semanticscholar.org/ CorpusID:264555643

  46. [46]

    Fu, Qinghua Hu, and Bing Wu

    Huan Ma, Changqing Zhang, Yatao Bian, Lemao Liu, Zhirui Zhang, Peilin Zhao, Shu Zhang, H. Fu, Qinghua Hu, and Bing Wu. 2023. Fairness-guided Few-shot Prompting for Large Language Mod- els.ArXivabs/2303.13217 (2023). https://api.semanticscholar.org/ CorpusID:257687840

  47. [47]

    Tinashe Magara and Yousheng Zhou. 2024. Internet of Things (IoT) of Smart Homes: Privacy and Security.J. Electr. Comput. Eng.2024 (2024), 1–17. https://api.semanticscholar.org/CorpusID:269065642

  48. [48]

    Alexandre Marois, Daniel Lafond, Alexandre Williot, François Va- chon, and Sébastien Tremblay. 2020. Real-Time Gaze-Aware Cognitive Support System for Security Surveillance.Proceedings of the Human Factors and Ergonomics Society Annual Meeting64 (2020), 1145 – 1149. https://api.semanticscholar.org/CorpusID:231876226

  49. [49]

    Francesca Meneghello, Matteo Calore, Daniel Zucchetto, Michele Polese, and Andrea Zanella. 2019. IoT: Internet of Threats? A Survey of Practical Security Vulnerabilities in Real IoT Devices.IEEE Internet of Things Journal6 (2019), 8182–8201. https://api.semanticscholar. org/CorpusID:201889124

  50. [50]

    Hamid Menouar, Ismail Guvenc, Kemal Akkaya, Arif Selcuk Ulu- agac, Abdullah Kadri, and Adem Tuncer. 2017. UAV-Enabled In- telligent Transportation Systems for the Smart City: Applications and Challenges.IEEE Communications Magazine55 (2017), 22–28. https://api.semanticscholar.org/CorpusID:38330180

  51. [51]

    Meta AI. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXivabs/2307.09288 (2023). https://arxiv.org/abs/2307.09288

  52. [52]

    Morgan, Emily Collins, Tasos Spiliotopoulos, David J

    Phillip L. Morgan, Emily Collins, Tasos Spiliotopoulos, David J. Greeno, and Dylan M. Jones. 2022. Reducing risk to security and privacy in the selection of trigger-action rules: Implicit vs. explicit priming for domestic smart devices.Int. J. Hum. Comput. Stud.168 (2022), 102902. https://api.semanticscholar.org/CorpusID:251341078

  53. [53]

    Naser Hossein Motlagh. 2021. How Low Can You Go?Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5 (2021), 1 – 22. https://api.semanticscholar.org/CorpusID:248245897

  54. [54]

    OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL] doi:10.48550/arXiv.2303.08774

  55. [55]

    2025.o3 and o4-mini System Card

    OpenAI. 2025.o3 and o4-mini System Card. System Card. OpenAI. https://cdn.openai.com/pdf/2221c875-02dc-4789-800b- e7758f3722c1/o3-and-o4-mini-system-card.pdf

  56. [56]

    Serge Pelletier, Joel Suss, François Vachon, and Sébastien Trem- blay. 2015. Atypical Visual Display for Monitoring Multiple CCTV Feeds.Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems(2015). https: //api.semanticscholar.org/CorpusID:304177

  57. [57]

    Kai Peng, Hualong Huang, Muhammad Bilal, and Xiaolong Xu. 2023. Distributed Incentives for Intelligent Offloading and Resource Allo- cation in Digital Twin Driven Smart Industry.IEEE Transactions on Industrial Informatics19 (2023), 3133–3143. https://api.semanticscholar. org/CorpusID:249911526

  58. [58]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Yujia Qin, Shi Liang, Yining Ye, Kunlun Zhu, Lan Yan, Ya-Ting Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Runchu Tian, Ruobing Xie, Jie Zhou, Marc H. Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. 2023. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs.ArXivabs/2307.16789 (2023). https://api.semanticscholar...

  59. [59]

    Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Jirong Wen. 2024. Tool Learning with Large Language Models: A Survey. ArXiv abs/2405.17935 (2024). https://api.semanticscholar.org/CorpusID:270067624

  60. [60]

    Brian Reily, Terran Mott, and Hao Zhang. 2020. Adaptation to Team Composition Changes for Heterogeneous Multi-Robot Sensor Coverage. 2021 IEEE International Conference on Robotics and Automation (ICRA) (2020), 9051–9057. https://api.semanticscholar.org/CorpusID:229297612

  61. [61]

    Zhiwei Ren, Junbo Li, Minjia Zhang, Di Wang, Xiaoran Fan, and Longfei Shangguan. 2025. Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications. Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems (2025). https://api.semanticscholar.org/CorpusID:278326126

  62. [62]

    Stefano De Sabbata, Stefano Mizzaro, and Kevin Roitero. 2025. Geospatial Mechanistic Interpretability of Large Language Models. ArXiv abs/2505.03368 (2025). https://api.semanticscholar.org/CorpusID:278339325

    MobiCom ’26, October 26–30, 2026, Austin, TX, USA. Zhou et al.

  63. [63]

    Christoph Schöller, Vincent Aravantinos, Florian Samuel Lay, and Alois Knoll. 2019. The Simpler the Better: Constant Velocity for Pedestrian Motion Prediction. ArXiv abs/1903.07933 (2019). https://api.semanticscholar.org/CorpusID:83458829

  64. [64]

    Leming Shen, Qian Yang, Xinyu Huang, Zijing Ma, and Yuanqing Zheng. 2025. GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development. Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems (2025). https://api.semanticscholar.org/CorpusID:276742179

  65. [65]

    Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yue Ting Zhuang. 2023. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. ArXiv abs/2303.17580 (2023). https://api.semanticscholar.org/CorpusID:257833781

  66. [66]

    Kyujin Shim, Sungjoon Yoon, Kangwook Ko, and Changick Kim. 2021. Multi-Target Multi-Camera Vehicle Tracking for City-Scale Traffic Management. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2021), 4188–4195. https://api.semanticscholar.org/CorpusID:235632675

  67. [67]

  68. [68]

    Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: language agents with verbal reinforcement learning. In Neural Information Processing Systems. https://api.semanticscholar.org/CorpusID:258833055

  69. [69]

    Wentao Tan, Changxing Ding, Jiayu Jiang, Fei Wang, Yibing Zhan, and Dapeng Tao. 2024. Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024), 17127–17137. https://api.semanticscholar.org/CorpusID:269626531

  70. [70]

    Zheng Tang, Milind R. Naphade, Ming-Yu Liu, Xiaodong Yang, Stan Birchfield, Shuo Wang, Ratnesh Kumar, D. Anastasiu, and Jenq-Neng Hwang. 2019. CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), 8789–8798. https://api.semanti...

  71. [71]

    Ultralytics. 2025. Ultralytics YOLOv8. https://github.com/ultralytics/ultralytics. GitHub repository.

  72. [72]

    Blase Ur, Melwyn Pak Yong Ho, Stephen Brawner, Jiyun Lee, Sarah Mennicken, Noah Picard, Diane Schulze, and Michael L. Littman. 2016. Trigger-Action Programming in the Wild: An Analysis of 200,000 IFTTT Recipes. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (2016). https://api.semanticscholar.org/CorpusID:10883440

  73. [73]

    Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, and Jianbing Shen. 2021. Structured Scene Memory for Vision-Language Navigation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), 8451–8460. https://api.semanticscholar.org/CorpusID:232135021

  74. [74]

    Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, and He Kong. 2024. SLAM-Based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization. IEEE Transactions on Robotics 40 (2024), 4024–4044. https://api.semanticscholar.org/CorpusID:270123565

  75. [75]

    Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, and Jianbing Shen. 2023. Referring Multi-Object Tracking. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), 14633–14642. https://api.semanticscholar.org/CorpusID:257365320

  76. [76]

    Duo Wu, Jinghe Wang, Yuan Meng, Yanning Zhang, Le Sun, and Zhi Wang. 2024. CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning. ArXiv abs/2411.16313 (2024). https://api.semanticscholar.org/CorpusID:274234379

  77. [77]

    Minghu Wu, Yeqiang Qian, Chunxiang Wang, and Ming Yang. 2021. A Multi-Camera Vehicle Tracking System based on City-Scale Vehicle Re-ID and Spatial-Temporal Information. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2021), 4072–4081. https://api.semanticscholar.org/CorpusID:235702604

  78. [78]

    Sixu Wu, Haipeng Dai, Linfeng Liu, Lijie Xu, Fu Xiao, and Jia Xu. 2024. Cooperative Scheduling for Directional Wireless Charging With Spatial Occupation. IEEE Transactions on Mobile Computing 23 (2024), 286–301. https://api.semanticscholar.org/CorpusID:253343259

  79. [79]

  80. [80]

    Huatao Xu, Liying Han, Qirui Yang, Mo Li, and Mani B. Srivastava

Showing first 80 references.