Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset

2026 · cs.CV · arXiv 2604.16114

1 Pith paper cites this work. Polarity classification is still in progress.

abstract

Tone style transfer for photo retouching aims to adapt the stylistic tone of the reference image to a given content image. However, the lack of high-quality large-scale triplet datasets with stylized ground truth forces existing methods to rely on self-supervised or proxy objectives, which limits model capability. To mitigate this gap, we design a data construction pipeline to build TST100K, a large-scale dataset of 100,000 content-reference-stylized triplets. At the core of this pipeline, we train a tone style scorer to ensure strict stylistic consistency for each triplet. In addition, existing methods typically extract content and reference features independently and then fuse them in a decoder, which may cause semantic loss and lead to inappropriate color transfer and degraded visual aesthetics. Instead, we propose ICTone, a diffusion-based framework that performs tone transfer in an in-context manner by jointly conditioning on both images, leveraging the semantic priors of generative models for semantic-aware transfer. Reward feedback learning using the tone style scorer is further incorporated to improve stylistic fidelity and visual quality. Experiments demonstrate the effectiveness of TST100K, and ICTone achieves state-of-the-art performance on both quantitative metrics and human evaluations.

representative citing papers

There and Back Again: A Flexible-Frame Transformer for Multi-Exposure Fusion

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

FreeMEF is the first flexible-frame transformer for multi-exposure fusion, using a recurrent state space module and a global-feature-guided block to handle a variable number of input exposures.

citing papers explorer

Showing 1 of 1 citing paper.

There and Back Again: A Flexible-Frame Transformer for Multi-Exposure Fusion

cs.CV · 2026-06-26 · unverdicted · none · ref 10 · internal anchor
