SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding

2025 · cs.CL · arXiv 2507.20185

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Session history is a common way of recording user interaction behaviors throughout a browsing activity with multiple products. For example, if a user clicks a product webpage and then leaves, it might be because certain features don't satisfy the user, which serves as an important indicator of on-the-spot user preferences. However, all prior work fails to capture and model customer intention effectively because it exploits the available information insufficiently and uses only apparent information such as descriptions and titles. There is also a lack of data and a corresponding benchmark for explicitly modeling intention in E-commerce product purchase sessions. To address these issues, we introduce the concept of an intention tree and propose a dataset curation pipeline. Together, we construct a sibling multimodal benchmark, SessionIntentBench, that evaluates L(V)LMs' capability to understand inter-session intention shift with four subtasks. With 1,952,177 intention entries, 1,132,145 session intention trajectories, and 13,003,664 available tasks mined using 10,905 sessions, we provide a scalable way to exploit the existing session data for customer intention understanding. We conduct human annotations to collect ground-truth labels for a subset of the collected data to form an evaluation gold set. Extensive experiments on the annotated data further confirm that current L(V)LMs fail to capture and utilize the intention across the complex session setting. Further analysis shows that injecting intention enhances LLMs' performance.

representative citing papers

Shopping Reasoning Bench: An Expert-Authored Benchmark for Multi-Turn Conversational Shopping Assistants

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

Shopping Reasoning Bench is a new expert-created benchmark of 525 missions and 10863 rubrics showing GPT/Claude/Gemini models achieve only 57-77% pass rates, with notable drops on multi-turn and optional criteria.
