pith. machine review for the scientific record.

arxiv: 2604.05529 · v2 · submitted 2026-04-07 · 💻 cs.AI

Recognition: no theorem link

ActivityEditor: Learning to Synthesize Physically Valid Human Mobility

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:38 UTC · model grok-4.3

classification 💻 cs.AI
keywords human mobility modeling · trajectory generation · zero-shot transfer · LLM agents · reinforcement learning · physical constraints · urban simulation

The pith

A dual-LLM-agent system learns to revise activity chains so that generated human trajectories obey physical constraints in any new city.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that human mobility trajectories can be synthesized without any local historical data by splitting the task between two LLM agents. One agent draws on demographic information to produce high-level intentions and coarse activity sequences that respect socio-semantic patterns. The second agent then repeatedly edits those sequences using reinforcement learning whose rewards are defined directly from measurable physical rules such as realistic travel times, speeds, and spatial feasibility. A sympathetic reader would care because many cities and applications lack the trajectory records that current data-driven models require, yet still need statistically faithful and physically plausible mobility simulations for planning and prediction.

Core claim

ActivityEditor decomposes trajectory synthesis into an intention-based agent that produces demographic-driven coarse activity chains and an editor agent that iteratively revises them through reinforcement learning with multiple rewards grounded in real-world physical constraints, thereby achieving zero-shot cross-regional generation while preserving high statistical fidelity and physical validity.

What carries the argument

The editor agent, which acquires the capacity to enforce human mobility regularities by training on reinforcement-learning rewards derived from physical constraints, and which then iteratively revises the coarse activity chains.

Load-bearing premise

Rewards based on physical constraints are enough to make the editor agent internalize mobility regularities without introducing artifacts or breaking socio-semantic coherence.
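To make this premise concrete, here is a minimal sketch of what a physically grounded reward might look like. Every term, threshold, and weight below is an assumption for illustration; the paper's actual reward formulation is not reproduced in this review.

```python
def physical_reward(trajectory, max_speed_kmh=130.0, min_dwell_min=5.0):
    """Illustrative reward: penalize physically infeasible segments.

    `trajectory` is a list of visits, each a dict with keys
    'lat', 'lon', 'arrive_min', 'depart_min' (minutes since midnight).
    The representation and penalty weights are assumptions, not the
    paper's formulation.
    """
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(a, b):
        lat1, lon1, lat2, lon2 = map(radians, (a['lat'], a['lon'], b['lat'], b['lon']))
        h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(h))

    reward = 0.0
    for prev, nxt in zip(trajectory, trajectory[1:]):
        travel_min = nxt['arrive_min'] - prev['depart_min']
        if travel_min <= 0:                      # temporal overlap: hard violation
            reward -= 1.0
            continue
        speed = haversine_km(prev, nxt) / (travel_min / 60.0)
        if speed > max_speed_kmh:                # implied travel speed is infeasible
            reward -= 1.0
        dwell = nxt['depart_min'] - nxt['arrive_min']
        if dwell < min_dwell_min:                # implausibly short activity
            reward -= 0.5
    return reward
```

The open question the premise raises is visible even in this sketch: such terms score feasibility, but nothing in them encodes region-specific duration distributions or trip-chaining dependencies.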

What would settle it

In a city never seen during training, generate trajectories with ActivityEditor and with a simple baseline that ignores physical rewards, then test whether ActivityEditor's trajectories show significantly better physical-validity scores and closer distributional match to real mobility patterns. If they do not, the core claim fails.
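The distributional-match half of such a test can be sketched with a standard divergence measure. The choice of Jensen-Shannon divergence and of histograms over activity categories is an assumption here, not the paper's stated protocol.

```python
from collections import Counter
from math import log2

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions
    given as {category: probability} dicts (base 2, bounded in [0, 1];
    0 means identical distributions)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a, b):
        return sum(a[k] * log2(a[k] / b[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hist(samples):
    """Empirical distribution over categories (e.g. binned activity
    durations or activity types)."""
    n = len(samples)
    return {k: v / n for k, v in Counter(samples).items()}
```

Usage would compare `hist(real_durations)` against `hist(generated_durations)` per city, with the baseline's divergence as the yardstick.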

Figures

Figures reproduced from arXiv: 2604.05529 by Anqi Liang, Chenjie Yang, Chenyu Wu, Junbo Zhang, Wei Qi, Yutian Jiang.

Figure 1. Overview of the ActivityEditor framework. Extracted figure text describes input normalization (the agent receives a structured user profile Di comprising K attributes such as age_range, employment_status, relationship, and primary_activity) and skeleton generation (on receiving Di, the agent reasons in the thought space to infer the user's latent intention VI).
Figure 2. Ablation studies on the proposed ActivityEditor.
Figure 3. Activity number distribution by state.
Figure 4. Activity type distribution by state.
Figure 5. Activity duration by state.
Figure 6. Chain length distribution by state.

Note: the text extracted alongside Figures 3-6 is not figure content; it reproduces the Editor Agent prompt (check the initial schedule against five constraint types: Physical, Logical, Common Sense, Temporal, and Coherence; apply edit operations from the action space: ADD / DELETE / SHIFT / REPLACE / SPLIT; output the final constraint-satisfying schedule) along with the appendix's SFT and GRPO hyperparameter settings.
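The hard constraints named in the Editor Agent prompt visible in the figure extracts (24-hour coverage without overlaps, day starting at 00:00 and ending at 24:00, starting and ending at home, merged consecutive identical activities) are mechanical enough to sketch as a checker. The schedule representation below is an assumption for illustration, not the paper's data format.

```python
def check_hard_constraints(schedule):
    """Check the hard Physical and Logical constraints from the Editor
    Agent prompt. `schedule` is a list of (activity, start_min, end_min)
    tuples covering one day; this representation is assumed.
    Returns a list of violation strings (empty means all hard
    constraints pass)."""
    violations = []
    if not schedule:
        return ["empty schedule"]
    # Physical (hard): full 24 h coverage, no gaps or overlaps,
    # starting at 00:00 and ending at 24:00
    if schedule[0][1] != 0:
        violations.append("does not start at 00:00")
    if schedule[-1][2] != 24 * 60:
        violations.append("does not end at 24:00")
    for (a, _, end_a), (b, start_b, _) in zip(schedule, schedule[1:]):
        if start_b < end_a:
            violations.append(f"overlap between {a} and {b}")
        elif start_b > end_a:
            violations.append(f"gap between {a} and {b}")
        if a == b:
            violations.append(f"consecutive identical '{a}' activities not merged")
    # Logical (hard): the day starts and ends at home
    if schedule[0][0] != "home" or schedule[-1][0] != "home":
        violations.append("does not start and end at home")
    return violations
```

The soft constraints (common sense, temporal realism, coherence) are exactly the ones a checker like this cannot express, which is where the RL reward has to do the work.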
original abstract

Human mobility modeling is indispensable for diverse urban applications. However, existing data-driven methods often suffer from data scarcity, limiting their applicability in regions where historical trajectories are unavailable or restricted. To bridge this gap, we propose \textbf{ActivityEditor}, a novel dual-LLM-agent framework designed for zero-shot cross-regional trajectory generation. Our framework decomposes the complex synthesis task into two collaborative stages. Specifically, an intention-based agent, which leverages demographic-driven priors to generate structured human intentions and coarse activity chains to ensure high-level socio-semantic coherence. These outputs are then refined by editor agent to obtain mobility trajectories through iteratively revisions that enforces human mobility law. This capability is acquired through reinforcement learning with multiple rewards grounded in real-world physical constraints, allowing the agent to internalize mobility regularities and ensure high-fidelity trajectory generation. Extensive experiments demonstrate that \textbf{ActivityEditor} achieves superior zero-shot performance when transferred across diverse urban contexts. It maintains high statistical fidelity and physical validity, providing a robust and highly generalizable solution for mobility simulation in data-scarce scenarios. Our code is available at: https://anonymous.4open.science/r/ActivityEditor-066B.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper proposes ActivityEditor, a dual-LLM-agent framework for zero-shot cross-regional human mobility trajectory synthesis. An intention agent uses demographic priors to produce socio-semantically coherent activity chains; an editor agent then iteratively revises these trajectories via reinforcement learning whose rewards are grounded in real-world physical constraints (speed, distance, etc.) so that the policy internalizes mobility regularities. The authors claim that the resulting trajectories exhibit superior zero-shot transfer performance across diverse urban contexts while preserving high statistical fidelity and physical validity, with code released at an anonymous repository.

Significance. If the empirical claims are substantiated, the work would offer a practical route to mobility simulation in data-scarce regions by combining LLM priors with physically grounded RL, reducing reliance on region-specific trajectory datasets. The public code release is a clear strength that supports reproducibility and follow-up work.

major comments (3)
  1. Abstract and §4 (Experiments): the abstract asserts 'superior zero-shot performance' and 'high statistical fidelity and physical validity' yet reports no quantitative metrics, no baseline comparisons, and no tables or figures summarizing results. Without these, the central claim that ActivityEditor outperforms existing methods cannot be evaluated.
  2. §3.2 (Editor Agent) and reward design: the framework states that RL rewards grounded only in physical constraints allow the editor to 'internalize human mobility law,' but no derivation is given showing how terms such as speed and distance suffice to reproduce region-varying regularities (activity-duration distributions, trip-chaining dependencies, demographic sequencing). The skeptic concern that physical constraints alone may introduce artifacts or fail to preserve socio-semantic coherence from the intention agent therefore remains unaddressed.
  3. §4 (Evaluation protocol): the zero-shot transfer claim requires explicit description of how statistical fidelity is measured (e.g., which distributions are compared), how physical validity is quantified, and which baselines (data-driven or LLM-only) are used. Absence of these details makes it impossible to assess whether the reported superiority is robust or merely an artifact of the chosen metrics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to improve clarity, completeness, and substantiation of our claims.

point-by-point responses
  1. Referee: [—] Abstract and §4 (Experiments): the abstract asserts 'superior zero-shot performance' and 'high statistical fidelity and physical validity' yet reports no quantitative metrics, no baseline comparisons, and no tables or figures summarizing results. Without these, the central claim that ActivityEditor outperforms existing methods cannot be evaluated.

    Authors: We agree that the abstract would benefit from including key quantitative results to immediately support the claims. The experiments in §4 include comparisons against baselines with tables and figures, but these are not summarized in the abstract. We will revise the abstract to report the main performance metrics (e.g., improvements in statistical fidelity and physical validity scores) and add a concise results overview at the start of §4. revision: partial

  2. Referee: [—] §3.2 (Editor Agent) and reward design: the framework states that RL rewards grounded only in physical constraints allow the editor to 'internalize human mobility law,' but no derivation is given showing how terms such as speed and distance suffice to reproduce region-varying regularities (activity-duration distributions, trip-chaining dependencies, demographic sequencing). The skeptic concern that physical constraints alone may introduce artifacts or fail to preserve socio-semantic coherence from the intention agent therefore remains unaddressed.

    Authors: We appreciate this concern regarding the sufficiency of physical rewards. The design relies on RL allowing the editor to discover mobility regularities through constraint satisfaction, while the intention agent provides the socio-semantic starting point. To address potential artifacts or coherence loss, we will add an explanation of the reward terms in §3.2, including why physical constraints help capture higher-order patterns, and include empirical checks showing preservation of activity types and demographics from the intention agent. revision: partial

  3. Referee: [—] §4 (Evaluation protocol): the zero-shot transfer claim requires explicit description of how statistical fidelity is measured (e.g., which distributions are compared), how physical validity is quantified, and which baselines (data-driven or LLM-only) are used. Absence of these details makes it impossible to assess whether the reported superiority is robust or merely an artifact of the chosen metrics.

    Authors: We agree that the evaluation protocol needs to be stated more explicitly. We will expand §4 with a dedicated subsection detailing the statistical fidelity metrics (e.g., distribution comparisons for activity durations, trip lengths, and sequencing), physical validity quantification (constraint violation rates and feasibility checks), the full list of baselines (including data-driven and LLM-only variants), and the precise zero-shot cross-regional protocol. This will make the superiority claims easier to evaluate. revision: yes

Circularity Check

0 steps flagged

No significant circularity: RL training on external physical constraints is independent of target outputs

full rationale

The paper's core derivation decomposes synthesis into an intention agent (demographic priors for chains) followed by an editor agent trained via RL whose rewards are defined from real-world physical constraints (speed, distance, etc.). This training process is presented as enabling the agent to internalize regularities, with zero-shot transfer evaluated on held-out urban contexts. No equations, fitted parameters, or self-citations are shown that would make the learned policy equivalent to the evaluation statistics by construction. The mechanism relies on external constraint definitions rather than re-expressing target data patterns, so the derivation chain is not circular.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

Ledger populated from abstract only; full paper would likely reveal additional hyperparameters and reward formulations.

free parameters (1)
  • RL reward coefficients
    Multiple rewards for physical constraints are combined; their relative weights are not specified in the abstract and are presumed tuned.
axioms (2)
  • domain assumption Large language models can reliably translate demographic priors into socio-semantically coherent human intentions and activity chains.
    Invoked for the intention-based agent stage.
  • domain assumption Iterative LLM revisions guided by RL can enforce real-world mobility laws without external simulation engines.
    Central to the editor agent's operation.
invented entities (1)
  • ActivityEditor dual-LLM-agent framework no independent evidence
    purpose: Decompose and solve zero-shot trajectory synthesis
    New architecture introduced by the paper.

pith-pipeline@v0.9.0 · 5514 in / 1392 out tokens · 51714 ms · 2026-05-10T18:38:52.331503+00:00 · methodology

discussion (0)

