Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation

Jinze Wang; Lu Zhang; Tiehua Zhang; Xingjun Ma; Yangchen Zeng; Yongchao Liu; Yuze Liu; Zhu Sun

arXiv: 2605.15203 · v1 · pith:BKDHWMF2new · submitted 2026-04-03 · 💻 cs.IR · cs.AI · cs.MA

Jinze Wang, Yangchen Zeng, Tiehua Zhang, Lu Zhang, Yuze Liu, Yongchao Liu, Xingjun Ma, Zhu Sun

Pith reviewed 2026-05-19 17:31 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.MA

keywords POI recommendation · multimodal representations · LLM agent · affordance reasoning · context-conditioned ranking · chain-of-thought reasoning · cold-start · context shift

The pith

No pre-computed encoder can satisfy context-sensitive POI ranking under bilinear scoring, so Agent4POI generates dynamic affordance representations at recommendation time instead.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that static multimodal embeddings for points of interest lock in representations before context is known, which prevents correct ranking when the same location affords different uses depending on time, companions, or goals. It proves formally that no fixed encoder can produce the required context-sensitive scores under standard bilinear models. Agent4POI therefore runs a four-phase LLM agent at query time to build fresh, uncertainty-aware affordance representations from images, reviews, and metadata. The system records large gains on standard benchmarks plus markedly smaller drops when context shifts, and it continues to work in cold-start settings where ID-based methods collapse.
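
The impossibility claim can be sketched in a few lines, assuming (as the referee report on this page describes the proof) that both the user and the item embedding are fixed before the context is known. The symbols $\mathbf{u}_u$, $\mathbf{e}_p$, $W$ and the contexts $c_1, c_2$ are notation introduced here for illustration and are not taken from the paper.

```latex
% Bilinear score with embeddings fixed before the context c is observed
s(u, p \mid c) \;=\; \mathbf{u}_u^{\top} W \, \mathbf{e}_p \qquad \text{for all } c

% A context-sensitive target ranking of POIs p and q for the same user u
s(u, p \mid c_1) > s(u, q \mid c_1), \qquad s(u, q \mid c_2) > s(u, p \mid c_2)

% Substituting the first line into both inequalities yields a contradiction
\mathbf{u}_u^{\top} W \mathbf{e}_p > \mathbf{u}_u^{\top} W \mathbf{e}_q
\quad \text{and} \quad
\mathbf{u}_u^{\top} W \mathbf{e}_q > \mathbf{u}_u^{\top} W \mathbf{e}_p
```

Under these assumptions the score never depends on $c$, so no choice of fixed embeddings can rank $p$ above $q$ in one context and below it in another. The argument says nothing about models in which either side of the product is allowed to depend on $c$.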

Core claim

Agent4POI inverts the usual computation: given a situational context, a four-phase LLM agent first produces dynamic affordance queries, then runs a five-step cross-modal chain-of-thought over image, review, and metadata evidence, assembles an uncertainty-adjusted affordance vector grounded in Gibsonian theory, and finally aligns it with user preferences through semantic caching for low-latency ranking.
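
The summary does not say how the Phase 4 semantic cache works. A minimal sketch of one common design, assuming the cache is keyed by POI and reuses a stored affordance vector whenever a new context embedding is close enough (by cosine similarity) to one seen before. `SemanticCache`, `expensive_agent`, the threshold, and the toy embeddings are all hypothetical names and values, not the paper's.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Hypothetical cache: reuse a POI's affordance vector for similar contexts."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        # poi_id -> list of (context embedding, affordance vector) pairs
        self._store: Dict[str, List[Tuple[Vector, Vector]]] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, poi_id: str, ctx_emb: Vector) -> Optional[Vector]:
        """Return the entry with the most similar stored context, if above threshold."""
        best: Optional[Vector] = None
        best_sim = self.threshold
        for stored_ctx, affordance in self._store.get(poi_id, []):
            sim = cosine(ctx_emb, stored_ctx)
            if sim >= best_sim:
                best, best_sim = affordance, sim
        return best

    def get_or_compute(
        self, poi_id: str, ctx_emb: Vector, compute: Callable[[str, Vector], Vector]
    ) -> Vector:
        cached = self.lookup(poi_id, ctx_emb)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        value = compute(poi_id, ctx_emb)  # the slow path (agent reasoning)
        self._store.setdefault(poi_id, []).append((ctx_emb, value))
        return value


def expensive_agent(poi_id: str, ctx_emb: Vector) -> Vector:
    """Stand-in for Phases 1-3; a real system would call the LLM agent here."""
    return [v * 2.0 for v in ctx_emb]


cache = SemanticCache(threshold=0.95)
a = cache.get_or_compute("cafe_1", [1.0, 0.0], expensive_agent)    # miss
b = cache.get_or_compute("cafe_1", [0.99, 0.05], expensive_agent)  # near-identical context: hit
c = cache.get_or_compute("cafe_1", [0.0, 1.0], expensive_agent)    # different context: miss
```

The threshold trades latency against fidelity: a lower value serves more requests from the cache but risks reusing an affordance vector built for a meaningfully different situation.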

What carries the argument

Four-phase LLM agent that executes five-step cross-modal chain-of-thought reasoning to produce uncertainty-aware affordance representations from multimodal POI evidence.

If this is right

Agent4POI records a 23.2 percent relative gain over the strongest baseline across three POI benchmarks.
Performance degrades by only 7.5 percent under context-shift conditions while the strongest baselines degrade 16-17 percent.
In cold-start scenarios the method outperforms the best content-based baseline by up to 2.4 times.
ID-based methods fail to generalize when new POIs appear, whereas the context-conditioned representations continue to function.
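
The percentages above are relative quantities. A short sketch of how such figures are usually computed follows; the summary does not name the metric, and the metric values below are placeholders chosen for illustration, not numbers from the paper.

```python
def relative_gain(model: float, baseline: float) -> float:
    """Relative improvement of model over baseline, as a percentage."""
    return 100.0 * (model - baseline) / baseline


def relative_degradation(standard: float, shifted: float) -> float:
    """Relative drop from the standard setting to the shifted setting, as a percentage."""
    return 100.0 * (standard - shifted) / standard


# Placeholder metric values (hypothetical, NOT taken from the paper)
baseline_score = 0.20
model_score = 0.25
model_score_shifted = 0.23

gain = relative_gain(model_score, baseline_score)
drop = relative_degradation(model_score, model_score_shifted)
print(f"relative gain: {gain:.1f}%")
print(f"degradation under shift: {drop:.1f}%")
```

Note that a relative gain depends on the baseline's absolute level, which is why the referee report asks for the baseline and metric to be named.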

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same inference-time reasoning pattern could be tested on other recommendation domains where context changes rapidly, such as news or session-based product suggestions.
Uncertainty-aware affordance vectors may offer a route to more robust multimodal systems even when full LLM agents are replaced by lighter modules.
Explicit grounding in Gibsonian affordance theory opens the possibility of transferring the framework to embodied agents that must decide what an object affords in a given physical setting.

Load-bearing premise

The four-phase LLM agent can reliably generate accurate uncertainty-aware affordance representations through five-step cross-modal chain-of-thought reasoning over image, review, and metadata evidence.

What would settle it

An experiment in which the agent-generated representations produce no reduction in performance drop under controlled context shifts (or yield rankings no better than the strongest static baseline) would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.15203 by Jinze Wang, Lu Zhang, Tiehua Zhang, Xingjun Ma, Yangchen Zeng, Yongchao Liu, Yuze Liu, Zhu Sun.

**Figure 1.** Agent4POI four-phase inference pipeline. Unlike prior methods that encode each POI into a fixed vector before any …

Original abstract

We introduce Agent4POI, the first POI recommendation framework that generates context-conditioned multimodal representations at recommendation time, rather than relying on static POI embeddings pre-computed independently of context. Existing multimodal systems encode each POI once as a static embedding, a design that precludes reasoning about why the same cafe affords solo work on Monday but group celebration on Friday evening. We formally prove that no pre-computed encoder can satisfy context-sensitive ranking under standard bilinear scoring, motivating inference-time item-side representation. Agent4POI inverts this computation: given a situational context, a four-phase LLM agent generates dynamic, context-specific affordance queries (Phase 1) and executes a five-step cross-modal chain-of-thought over image, review, and metadata evidence (Phase 2). The resulting uncertainty-aware affordance representation is grounded in Gibsonian affordance theory. These cross-modal verdicts form a structured, uncertainty-adjusted affordance representation (Phase 3), which is aligned with user preferences via a semantic caching system for low-latency ranking (Phase 4). On three POI benchmarks and three evaluation configurations (standard, cold-start, context-shift), Agent4POI achieves a 23.2% relative gain over the strongest baseline and degrades by only 7.5% under context-shift versus 16-17% for the strongest baselines. In cold-start scenarios, Agent4POI outperforms the best content-based baseline by up to 2.4x, whereas ID-based methods fail to generalize.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Agent4POI builds an LLM-agent pipeline for on-the-fly context-conditioned multimodal POI representations and claims a proof against static encoders, but the proof may miss common context-augmented user models.

The punchline: the paper builds an agentic system that generates context-specific multimodal POI representations on the fly using LLM reasoning, claims a formal proof that static encoders cannot handle this under bilinear scoring, and reports solid gains, especially under shifting contexts.

What is new is the four-phase agent. It first creates affordance queries from context, then runs a five-step cross-modal CoT over images, reviews, and metadata to build uncertainty-aware representations based on Gibsonian ideas. Phase 3 structures them and Phase 4 uses semantic caching for efficient ranking. This inverts the usual precompute approach. The results on three benchmarks show a 23.2% relative improvement and only a 7.5% drop under context-shift compared to 16-17% for baselines, plus strong cold-start performance.

The paper does well in spelling out why the same POI can afford different things depending on the situation, and in testing robustness to context changes. That part feels grounded in a real limitation of current systems.

The main soft spot is the theoretical claim. The proof targets pre-computed encoders, but it may not cover the common case where context augments the user representation while items stay static. If that's true, then many existing context-aware recommenders already sidestep the issue without needing fully dynamic item representations at inference. The weakest link in practice is whether the LLM agent reliably produces accurate affordance verdicts every time. Even with caching, this could add latency or inconsistency that the paper does not fully explore.

This work is for researchers in POI recommendation and multimodal recsys who want to explore agentic inference-time methods. A reader focused on practical context sensitivity would get useful ideas from the pipeline and the shift experiments. It has enough formal and empirical content to merit a serious referee, though the proof scope and LLM reliability will be key points to probe.

Recommendation: Yes, send it for peer review.

Referee Report

1 major / 2 minor

Summary. The paper introduces Agent4POI, the first POI recommendation framework that generates context-conditioned multimodal representations at recommendation time using a four-phase LLM agent grounded in Gibsonian affordance theory. It formally proves that no pre-computed encoder can satisfy context-sensitive ranking under standard bilinear scoring, motivating inference-time dynamic item representations via affordance queries, five-step cross-modal chain-of-thought reasoning, uncertainty-adjusted representations, and semantic caching. On three POI benchmarks across standard, cold-start, and context-shift settings, it reports a 23.2% relative gain over the strongest baseline, only 7.5% degradation under context-shift (vs. 16-17% for baselines), and up to 2.4x improvement over content-based baselines in cold-start.

Significance. If the theoretical result is corrected and the LLM agent reliably produces accurate uncertainty-aware affordance representations, the work could meaningfully advance multimodal recommendation by demonstrating the value of inference-time context-conditioned item representations over static pre-computed embeddings, particularly in context-sensitive domains like POI. The explicit grounding in affordance theory and the structured four-phase agent design provide a clear conceptual contribution.

major comments (1)

[Theoretical section] Formal proof: The claim that no pre-computed encoder can satisfy context-sensitive ranking under standard bilinear scoring assumes context-independent embeddings on both user and item sides. This overlooks the standard construction where the user representation is context-augmented (u = f(user, context)) while the item embedding i remains static and pre-computed; bilinear scoring u · i then supports context sensitivity without requiring dynamic item representations at inference time. This distinction is load-bearing for the motivation of the agentic approach and requires explicit addressing or revision of the proof.
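
The referee's distinction can be checked numerically. The toy sketch below uses two-dimensional embeddings and a plain dot product (a bilinear score with W set to the identity). All values, the context names, and the gating function used for f(user, context) are assumptions made for illustration; none come from the paper.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank(user_vec: Vector, items: Dict[str, Vector]) -> List[str]:
    """Rank item ids by descending dot-product score."""
    return sorted(items, key=lambda pid: dot(user_vec, items[pid]), reverse=True)


# Static, pre-computed item embeddings (toy values, hypothetical).
# Dimension 0 ~ "quiet workspace", dimension 1 ~ "group-friendly".
items = {"cafe": [0.9, 0.2], "bar": [0.1, 0.9]}

static_user = [0.6, 0.5]


def context_augmented_user(user: Vector, context: Vector) -> Vector:
    """One simple choice of f(user, context): element-wise gating."""
    return [u * c for u, c in zip(user, context)]


contexts = {
    "monday_solo_work": [1.0, 0.2],
    "friday_group_night": [0.2, 1.0],
}

# Case 1: user and items both fixed, so the context cannot change the ranking.
static_rankings = {name: rank(static_user, items) for name in contexts}

# Case 2: items still fixed, but the user vector depends on the context.
augmented_rankings = {
    name: rank(context_augmented_user(static_user, ctx), items)
    for name, ctx in contexts.items()
}

print(static_rankings)
print(augmented_rankings)
```

In the first case both contexts give the same ordering; in the second the ordering flips between contexts even though the item embeddings never change. This illustrates only the referee's objection about the scope of the proof. It does not speak to whether static item embeddings capture the item-side affordance evidence the authors care about.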

minor comments (2)

Abstract and results: The reported 23.2% relative gain should explicitly name the strongest baseline, the primary metric (e.g., NDCG@K or Recall@K), and the exact evaluation configuration to allow direct verification.
Phase 2 description: The five-step cross-modal chain-of-thought reasoning over image, review, and metadata would benefit from a concrete example or pseudocode showing how uncertainty is quantified and propagated into the final affordance representation.
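
To make the second minor comment concrete, here is one hypothetical way uncertainty could be quantified and propagated; it is not the paper's method, whose details are not available on this page. The sketch assumes each modality returns a support score and a confidence, both in [0, 1], combines them as a confidence-weighted mean, and shrinks the result toward a neutral prior when overall confidence is low. `Verdict`, `adjusted_score`, `affordance_vector`, and the example queries are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Verdict:
    modality: str      # "image", "review", or "metadata"
    support: float     # how strongly the evidence supports the affordance, in [0, 1]
    confidence: float  # how much the verdict is trusted, in [0, 1]


def adjusted_score(verdicts: List[Verdict], prior: float = 0.5) -> float:
    """Confidence-weighted mean of support, shrunk toward a neutral prior.

    When total confidence is zero the function returns the prior unchanged.
    """
    total_conf = sum(v.confidence for v in verdicts)
    if total_conf == 0.0:
        return prior
    weighted = sum(v.support * v.confidence for v in verdicts) / total_conf
    trust = total_conf / len(verdicts)  # mean confidence, in [0, 1]
    return trust * weighted + (1.0 - trust) * prior


def affordance_vector(evidence: Dict[str, List[Verdict]]) -> Dict[str, float]:
    """Map each affordance query to its uncertainty-adjusted score."""
    return {query: adjusted_score(vs) for query, vs in evidence.items()}


# Hypothetical evidence for two affordance queries about one POI
evidence = {
    "quiet enough for solo work": [
        Verdict("image", 0.9, 0.8),
        Verdict("review", 0.7, 0.6),
        Verdict("metadata", 0.5, 0.1),
    ],
    "seats a group of eight": [
        Verdict("image", 0.2, 0.0),
        Verdict("review", 0.4, 0.0),
        Verdict("metadata", 0.3, 0.0),
    ],
}
vector = affordance_vector(evidence)
print(vector)
```

The second query shows the intended behaviour under missing evidence: with no trusted verdicts the score stays at the neutral prior instead of inheriting the low support values.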

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which helps clarify the theoretical motivation in our work. We address the single major comment below and will revise the manuscript accordingly.

Referee: The claim that no pre-computed encoder can satisfy context-sensitive ranking under standard bilinear scoring assumes context-independent embeddings on both user and item sides. This overlooks the standard construction where the user representation is context-augmented (u = f(user, context)) while the item embedding i remains static and pre-computed; bilinear scoring u · i then supports context sensitivity without requiring dynamic item representations at inference time. This distinction is load-bearing for the motivation of the agentic approach and requires explicit addressing or revision of the proof.

Authors: We appreciate the referee pointing out this key distinction in the assumptions underlying our formal proof. The proof as presented in the theoretical section does assume pre-computed, context-independent embeddings for both users and items under standard bilinear scoring. We agree that context-augmented user representations paired with static item embeddings can enable context sensitivity in the scoring function. However, for multimodal POI recommendation, the core challenge lies in capturing context-dependent affordances on the item side (e.g., how the same POI's visual and textual features afford different activities under varying situational contexts), which static pre-computed item embeddings cannot fully address even with user-side conditioning. This is particularly relevant for uncertainty-aware reasoning and cross-modal evidence integration. We will revise the theoretical section to explicitly articulate the proof's assumptions, acknowledge the user-context construction, and strengthen the motivation by discussing why inference-time dynamic item representations provide complementary benefits in affordance grounding and robustness to context shifts, without claiming impossibility for all pre-computed approaches.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; formal proof and empirical claims remain independent of inputs

The paper presents a formal proof that no pre-computed encoder satisfies context-sensitive ranking under standard bilinear scoring as an independent mathematical step motivating inference-time representations. This does not reduce to the empirical results or self-citations by construction. Performance gains (23.2% relative improvement, 7.5% degradation under context-shift) are reported against external baselines on three benchmarks without renaming fitted parameters as predictions. The affordance construction is grounded in Gibsonian theory and cross-modal CoT without load-bearing self-citations or ansatz smuggling. The derivation chain is self-contained against external benchmarks, with no equations shown to equal their inputs tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so specific free parameters, axioms, and invented entities cannot be exhaustively identified; the approach relies on LLM reasoning capabilities and Gibsonian affordance theory as background assumptions without further detail.

axioms (1)

domain assumption: Gibsonian affordance theory can be operationalized for multimodal POI evidence via LLM chain-of-thought.
Representations are explicitly grounded in this theory per the abstract.

