CogBlender: Towards Continuous Cognitive Intervention in Text-to-Image Generation

Jiaying Lei; Nan Cao; Shengqi Dang; Yi He; Ziqing Qian

arxiv: 2603.09286 · v2 · pith:5YM44J33new · submitted 2026-03-10 · 💻 cs.CV

CogBlender: Towards Continuous Cognitive Intervention in Text-to-Image Generation

Shengqi Dang , Yi He , Jiaying Lei , Ziqing Qian , Nan Cao This is my paper

Pith reviewed 2026-05-21 11:11 UTC · model grok-4.3

classification 💻 cs.CV

keywords text-to-image generationcognitive interventionflow matchingvelocity fieldsprompt rewritingvalencememorabilitycontinuous control

0 comments

The pith

CogBlender achieves continuous control of cognitive properties in text-to-image outputs by blending velocity fields from discrete cognition-aware prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CogBlender to allow users to specify not only the content but also the psychological impact of generated images, such as how memorable or emotionally positive they are. It does this through a two-stage process that first creates prompt variants for extreme cognitive states and then uses linear interpolation in the model's velocity field space to mix them according to desired scores. This approach addresses the limitation that existing models excel at semantic control but lack fine-grained influence over viewer responses like valence or memorability. If successful, it would enable more precise alignment between generated images and the user's intended psychological effect.

Core claim

CogBlender constructs discrete cognition-aware rewritten prompts that represent distinct extreme cognitive states and translates these into continuous control signals by interpolating within the velocity-field domain of flow-matching models. Dynamically blending the velocity fields predicted from these prompts according to target cognitive scores smoothly steers the generative trajectory to realize the desired cognitive properties in the final image.

What carries the argument

Velocity field blending in flow-matching models, which combines predictions from multiple rewritten prompts to achieve continuous interpolation between cognitive extremes.

If this is right

Cognitive properties including valence, arousal, dominance, and memorability can be adjusted continuously and independently in the output image.
The method preserves semantic content from the original prompt while modifying only the cognitive attributes.
Experiments show effective intervention across the four tested cognitive dimensions without requiring model retraining.
Multi-dimensional control is possible by specifying a vector of target cognitive scores.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This blending technique could be adapted to other diffusion or flow-based models for similar property control.
Future work might test whether the same interpolation applies to controlling additional properties like perceived realism or aesthetic appeal.
Applications could include generating images for psychological studies or personalized content with specific emotional impacts.

Load-bearing premise

That rewriting prompts to capture extreme cognitive states and then linearly blending their velocity fields will result in predictable and continuous shifts in the actual cognitive properties of the generated images.

What would settle it

Generating a series of images with gradually increasing target memorability scores and then having human raters score the actual memorability to check if it increases monotonically and smoothly as predicted.

Figures

Figures reproduced from arXiv: 2603.09286 by Jiaying Lei, Nan Cao, Shengqi Dang, Yi He, Ziqing Qian.

**Figure 1.** Figure 1: CogBlender is a framework designed to intervene in the continuous cognitive properties during the text-to-image generation process. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Overview of CogBlender framework. cognitive predictors to navigate latent spaces along specific "memorability directions." Existing research has primarily focused on isolated attributes, leaving the compounded effects of multi-dimensional cognitive properties unexplored. To bridge this gap, our work, CogBlender, proposes a unified framework that enables continuous and multidimensional cognitive intervent… view at source ↗

**Figure 3.** Figure 3: Visualization of cognitive space S, semantic manifold M𝑝 , cognitive anchors {a 𝑘 } and the polarized prompt sets { P𝑘 }, utilizing a twodimensional Valence-Arousal space as an example. 3.1 Cognitive Space and Semantic Manifold To enable precise and continuous control over cognitive properties in image generation, we must first explicitly model the relationship between cognitive attributes and visual sem… view at source ↗

**Figure 4.** Figure 4: Quantitative evaluation of memorability intervention. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative comparison of emotion-conditioned image generation. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative comparison of memorability-aware image generation. [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Reference-based image editing for cognitive intervention. [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: Ablation study on the Valence-Arousal conditional image generation task. We evaluate the contribution of each component. The average time for per [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗

**Figure 9.** Figure 9: Extended results across diverse domains. (a) Generalization to various artistic styles (e.g., oil painting) and different scenes. (b) Dual-attribute control, [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗

**Figure 10.** Figure 10: Continuous cognitive interpolation. Smooth transition from high valence and arousal (V=0.7, A=0.7) to low valence and arousal (V=0.1, A=0.1). The [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗

read the original abstract

Beyond conveying semantic information, images also possess cognitive properties that elicit specific psychological responses from viewers, such as memory encoding or emotional reactions. Although modern text-to-image (T2I) models generate semantically coherent content effectively, they struggle to control cognitive properties (e.g., valence, memorability) and often fail to align with the user's psychological intent. To bridge the gap, we introduce CogBlender, an algorithm that enables continuous and multi-dimensional intervention on cognitive properties through a novel two-stage approach. First, we construct discrete cognition-aware rewritten prompts-variants of the input prompt that represent distinct extreme cognitive states. Second, we translate these discrete prompts into continuous control signals by interpolating within the velocity-field domain of flow-matching models. By dynamically blending the velocity fields predicted from these prompts according to the target cognitive scores, CogBlender smoothly steers the generative trajectory to realize the desired cognitive properties in the final image. Extensive experiments across four cognitive properties (i.e., valence, arousal, dominance, and memorability) demonstrate that CogBlender achieves effective cognitive intervention.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CogBlender, a two-stage method for continuous multi-dimensional control of cognitive properties (valence, arousal, dominance, memorability) in text-to-image generation using flow-matching models. Stage one rewrites the input prompt into discrete variants representing extreme cognitive states; stage two dynamically blends the predicted velocity fields according to target cognitive scores to steer the generative trajectory.

Significance. If the central claim holds with supporting quantitative evidence, the work would offer a practical algorithmic construction for psychological control in T2I outputs that goes beyond semantic prompting. The velocity-field interpolation approach in flow-matching models is a technically interesting direction that could generalize to other controllable generation tasks.

major comments (3)

[§3.2] §3.2 (velocity-field blending): the claim that linear interpolation v = Σ w_i v_i with weights w_i derived from target scores produces continuous, monotonic, and predictable shifts in cognitive properties lacks supporting analysis or verification. The manuscript must demonstrate that the resulting trajectory actually traverses the desired cognitive manifold rather than an arbitrary path in latent space.
[§4] §4 (experiments): the abstract asserts effective intervention across four properties, yet the provided description supplies no quantitative results, error bars, baselines, or ablation studies on prompt rewriting versus velocity blending. Without these, the effectiveness claim cannot be evaluated against the paper's own data.
[§3.1] §3.1 (prompt rewriting): the assumption that discrete cognition-aware rewritten prompts accurately isolate distinct extreme cognitive states is load-bearing but untested for entanglement between semantic content and cognitive labels; a concrete test (e.g., human ratings or metric correlation) is required to support the discrete-to-continuous translation.

minor comments (2)

[§3.2] Notation for the blending weights w_i and the precise definition of the target cognitive scores should be formalized with an equation in §3.2 for reproducibility.
Figure captions and axis labels in the experimental results should explicitly state the cognitive property being measured and the baseline methods compared.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas where the manuscript can be strengthened. We address each major comment below and describe the revisions we will make.

read point-by-point responses

Referee: [§3.2] §3.2 (velocity-field blending): the claim that linear interpolation v = Σ w_i v_i with weights w_i derived from target scores produces continuous, monotonic, and predictable shifts in cognitive properties lacks supporting analysis or verification. The manuscript must demonstrate that the resulting trajectory actually traverses the desired cognitive manifold rather than an arbitrary path in latent space.

Authors: We agree that additional verification is required to substantiate the properties of the interpolated trajectories. In the revised manuscript we will add quantitative analysis in §3.2 and a new figure showing measured cognitive property values (via both automated metrics and human ratings) at multiple points along sample trajectories for varying target scores. This will demonstrate continuity and monotonicity with respect to the target cognitive manifold. revision: yes
Referee: [§4] §4 (experiments): the abstract asserts effective intervention across four properties, yet the provided description supplies no quantitative results, error bars, baselines, or ablation studies on prompt rewriting versus velocity blending. Without these, the effectiveness claim cannot be evaluated against the paper's own data.

Authors: We acknowledge that the experimental presentation in the current version does not include sufficient quantitative detail. The full manuscript contains results from experiments on the four properties, but we will substantially revise §4 to include: (i) tables reporting mean cognitive scores with standard deviations for each property, (ii) comparisons against baselines including direct prompting and classifier-based guidance, and (iii) ablations that isolate the contributions of the prompt-rewriting stage versus the velocity-blending stage. These additions will be accompanied by error bars and statistical significance tests. revision: yes
Referee: [§3.1] §3.1 (prompt rewriting): the assumption that discrete cognition-aware rewritten prompts accurately isolate distinct extreme cognitive states is load-bearing but untested for entanglement between semantic content and cognitive labels; a concrete test (e.g., human ratings or metric correlation) is required to support the discrete-to-continuous translation.

Authors: We recognize that empirical validation of the prompt-rewriting stage is necessary to rule out semantic-cognitive entanglement. We will add a dedicated validation subsection (or appendix) that reports: human ratings of cognitive properties on images generated from the extreme rewritten prompts, and correlation coefficients between the intended cognitive labels and both human ratings and available automated cognitive metrics, while controlling for semantic similarity via CLIP embeddings. This will directly support the discrete-to-continuous mapping. revision: yes

Circularity Check

0 steps flagged

No significant circularity; CogBlender is an explicit algorithmic construction

full rationale

The paper defines CogBlender directly as a two-stage procedure: (1) rewriting the input prompt into discrete variants that represent extreme cognitive states, and (2) linearly blending the velocity fields predicted by a flow-matching model according to target cognitive scores. No equations, derivations, or predictions are shown that reduce the claimed continuous control back to a fitted parameter, self-referential definition, or load-bearing self-citation. Effectiveness is asserted via experiments on valence, arousal, dominance, and memorability rather than by mathematical equivalence to the inputs. The derivation chain is therefore self-contained and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate free parameters, background axioms, or new postulated entities. Assessment is limited by absence of the full manuscript.

pith-pipeline@v0.9.0 · 5723 in / 1162 out tokens · 49156 ms · 2026-05-21T11:11:25.960480+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

v(x_t, t, p, s) = 1/2 (sum_{k=1}^{2^n} w_k(s) · v̂(x_t, t, P_k) + v_θ(x_t, t, p)) with w_k(s) = ∏_i (s_i a^k_i + (1-s_i)(1-a^k_i))
IndisputableMonolith/Foundation/RealityFromDistinction reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Cognitive Space S = [0,1]^n with anchors as vertices; Semantic Manifold M_p via polarized prompts

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 3 internal anchors

[1]

Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, and Aude Oliva

Measuring emotion: the self-assessment manikin and the semantic differential.Journal of behavior therapy and experimental psychiatry25, 1 (1994), 49–59. Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, and Aude Oliva

work page 1994
[2]

Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, and Philip Torr

Visual cognition.Vision research51, 13 (2011), 1538–1551. Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, and Philip Torr

work page 2011
[3]

arXiv:2511.22805 Mihai Gabriel Constantin, Liviu-Daniel Ştefan, Bogdan Ionescu, Ngoc QK Duong, Claire-Héléne Demarty, and Mats Sjöberg

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Per- ception of Images. arXiv:2511.22805 Mihai Gabriel Constantin, Liviu-Daniel Ştefan, Bogdan Ionescu, Ngoc QK Duong, Claire-Héléne Demarty, and Mats Sjöberg

work page arXiv
[4]

Shengqi Dang, Yi He, Long Ling, Ziqing Qian, Nanxuan Zhao, and Nan Cao

Visual interestingness prediction: A benchmark framework and literature review.International Journal of Computer Vision129, 5 (2021), 1526–1550. Shengqi Dang, Yi He, Long Ling, Ziqing Qian, Nanxuan Zhao, and Nan Cao

work page 2021
[5]

Proceedings of the National Academy of Sciences120, 28 (2023), e2302389120

Memory for artwork is predictable. Proceedings of the National Academy of Sciences120, 28 (2023), e2302389120. Charles D Gilbert and Wu Li

work page 2023
[6]

Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola

Top-down influences on visual processing.Nature reviews neuroscience14, 5 (2013), 350–363. Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola

work page 2013
[7]

Jonathan Ho and Tim Salimans

Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems30 (2017). Jonathan Ho and Tim Salimans

work page 2017
[8]

InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications

Classifier-Free Diffusion Guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. Jingyang Jia, Kai Shu, Gang Yang, Long Xing, Xun Chen, and Aiping Liu

work page 2021
[9]

arXiv:2511.19982 Aditya Khosla, Akhil S Raju, Antonio Torralba, and Aude Oliva

EmoFeedback2: Reinforcement of Continuous Emotional Image Generation via LVLM-based Reward and Textual Feedback. arXiv:2511.19982 Aditya Khosla, Akhil S Raju, Antonio Torralba, and Aude Oliva

work page arXiv
[10]

Ronak Kosti, Jose M

The roles of sensory perceptions and mental imagery in consumer decision-making.Journal of Retailing and Consumer Services61 (2021), 102517. Ronak Kosti, Jose M. Alvarez, Adria Recasens, and Agata Lapedriza

work page 2021
[11]

Black Forest Labs

Introducing the open affective standardized image set (OASIS).Behavior research methods49, 2 (2017), 457–470. Black Forest Labs

work page 2017
[12]

InProceedings of IEEE Conference on Computer Vision and Pattern Recognition

AVA: A large-scale database for aesthetic visual analysis. InProceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2408–2415. Ulric Neisser. 2014.Cognitive psychology: Classic edition. Psychology press. Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach

work page 2014
[13]

Remigiusz Szczepanowski, Jakub Traczyk, Michał Wierzchoń, and Axel Cleeremans

The effectiveness of advertising im- ages in promoting experiential offerings: An emotional response approach.Journal of Business Research122 (2021), 344–352. Remigiusz Szczepanowski, Jakub Traczyk, Michał Wierzchoń, and Axel Cleeremans

work page 2021
[14]

Consciousness and cognition22, 1 (2013), 212–220

The perception of visual emotion: comparing different measures of awareness. Consciousness and cognition22, 1 (2013), 212–220. Jianyi Wang, Kelvin CK Chan, and Chen Change Loy

work page 2013
[15]

Qwen3 Technical Report

VisRecall: Quantifying information visualisation recallability via question answering.IEEE Transactions on Visualization and Computer Graphics28, 12 (2022), 4995–5005. An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025a. Qwen3 Technical Report. arXiv:2505.09388 Jingyuan Yang, ...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[16]

EmoCtrl: Controllable Emotional Image Content Generation

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 6358–6368. Jingyuan Yang, Weibin Luo, and Hui Huang. 2025b. EmoCtrl: Controllable Emotional Image Content Generation. arXiv:2512.22437 Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang

work page internal anchor Pith review Pith/arXiv arXiv
[17]

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. arXiv:2308.06721 Lvmin Zhang, Anyi Rao, and Maneesh Agrawala

work page internal anchor Pith review Pith/arXiv arXiv
[18]

Pattern Recognition154 (2024), 110584

Emotion- aware hierarchical interaction network for multimodal image aesthetics assessment. Pattern Recognition154 (2024), 110584. ACM Trans. Graph., Vol. 1, No. 1, Article . Publication date: March

work page 2024
[19]

Smooth transition from high valence and arousal (V=0.7, A=0.7) to low valence and arousal (V=0.1, A=0.1)

Continuous cognitive interpolation. Smooth transition from high valence and arousal (V=0.7, A=0.7) to low valence and arousal (V=0.1, A=0.1). The results showcase a consistent shift in visual atmosphere and emotional impact along a continuous trajectory while maintaining subject stability. ACM Trans. Graph., Vol. 1, No. 1, Article . Publication date: March 2026

work page 2026

[1] [1]

Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, and Aude Oliva

Measuring emotion: the self-assessment manikin and the semantic differential.Journal of behavior therapy and experimental psychiatry25, 1 (1994), 49–59. Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, and Aude Oliva

work page 1994

[2] [2]

Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, and Philip Torr

Visual cognition.Vision research51, 13 (2011), 1538–1551. Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, and Philip Torr

work page 2011

[3] [3]

arXiv:2511.22805 Mihai Gabriel Constantin, Liviu-Daniel Ştefan, Bogdan Ionescu, Ngoc QK Duong, Claire-Héléne Demarty, and Mats Sjöberg

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Per- ception of Images. arXiv:2511.22805 Mihai Gabriel Constantin, Liviu-Daniel Ştefan, Bogdan Ionescu, Ngoc QK Duong, Claire-Héléne Demarty, and Mats Sjöberg

work page arXiv

[4] [4]

Shengqi Dang, Yi He, Long Ling, Ziqing Qian, Nanxuan Zhao, and Nan Cao

Visual interestingness prediction: A benchmark framework and literature review.International Journal of Computer Vision129, 5 (2021), 1526–1550. Shengqi Dang, Yi He, Long Ling, Ziqing Qian, Nanxuan Zhao, and Nan Cao

work page 2021

[5] [5]

Proceedings of the National Academy of Sciences120, 28 (2023), e2302389120

Memory for artwork is predictable. Proceedings of the National Academy of Sciences120, 28 (2023), e2302389120. Charles D Gilbert and Wu Li

work page 2023

[6] [6]

Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola

Top-down influences on visual processing.Nature reviews neuroscience14, 5 (2013), 350–363. Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola

work page 2013

[7] [7]

Jonathan Ho and Tim Salimans

Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems30 (2017). Jonathan Ho and Tim Salimans

work page 2017

[8] [8]

InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications

Classifier-Free Diffusion Guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. Jingyang Jia, Kai Shu, Gang Yang, Long Xing, Xun Chen, and Aiping Liu

work page 2021

[9] [9]

arXiv:2511.19982 Aditya Khosla, Akhil S Raju, Antonio Torralba, and Aude Oliva

EmoFeedback2: Reinforcement of Continuous Emotional Image Generation via LVLM-based Reward and Textual Feedback. arXiv:2511.19982 Aditya Khosla, Akhil S Raju, Antonio Torralba, and Aude Oliva

work page arXiv

[10] [10]

Ronak Kosti, Jose M

The roles of sensory perceptions and mental imagery in consumer decision-making.Journal of Retailing and Consumer Services61 (2021), 102517. Ronak Kosti, Jose M. Alvarez, Adria Recasens, and Agata Lapedriza

work page 2021

[11] [11]

Black Forest Labs

Introducing the open affective standardized image set (OASIS).Behavior research methods49, 2 (2017), 457–470. Black Forest Labs

work page 2017

[12] [12]

InProceedings of IEEE Conference on Computer Vision and Pattern Recognition

AVA: A large-scale database for aesthetic visual analysis. InProceedings of IEEE Conference on Computer Vision and Pattern Recognition. 2408–2415. Ulric Neisser. 2014.Cognitive psychology: Classic edition. Psychology press. Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach

work page 2014

[13] [13]

Remigiusz Szczepanowski, Jakub Traczyk, Michał Wierzchoń, and Axel Cleeremans

The effectiveness of advertising im- ages in promoting experiential offerings: An emotional response approach.Journal of Business Research122 (2021), 344–352. Remigiusz Szczepanowski, Jakub Traczyk, Michał Wierzchoń, and Axel Cleeremans

work page 2021

[14] [14]

Consciousness and cognition22, 1 (2013), 212–220

The perception of visual emotion: comparing different measures of awareness. Consciousness and cognition22, 1 (2013), 212–220. Jianyi Wang, Kelvin CK Chan, and Chen Change Loy

work page 2013

[15] [15]

Qwen3 Technical Report

VisRecall: Quantifying information visualisation recallability via question answering.IEEE Transactions on Visualization and Computer Graphics28, 12 (2022), 4995–5005. An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025a. Qwen3 Technical Report. arXiv:2505.09388 Jingyuan Yang, ...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[16] [16]

EmoCtrl: Controllable Emotional Image Content Generation

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 6358–6368. Jingyuan Yang, Weibin Luo, and Hui Huang. 2025b. EmoCtrl: Controllable Emotional Image Content Generation. arXiv:2512.22437 Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. arXiv:2308.06721 Lvmin Zhang, Anyi Rao, and Maneesh Agrawala

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

Pattern Recognition154 (2024), 110584

Emotion- aware hierarchical interaction network for multimodal image aesthetics assessment. Pattern Recognition154 (2024), 110584. ACM Trans. Graph., Vol. 1, No. 1, Article . Publication date: March

work page 2024

[19] [19]

Smooth transition from high valence and arousal (V=0.7, A=0.7) to low valence and arousal (V=0.1, A=0.1)

Continuous cognitive interpolation. Smooth transition from high valence and arousal (V=0.7, A=0.7) to low valence and arousal (V=0.1, A=0.1). The results showcase a consistent shift in visual atmosphere and emotional impact along a continuous trajectory while maintaining subject stability. ACM Trans. Graph., Vol. 1, No. 1, Article . Publication date: March 2026

work page 2026