pith. machine review for the scientific record.

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

57 Pith papers cite this work. Polarity classification is still indexing.

abstract

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.

citation-role summary

background 2

representative citing papers

MolSight: Molecular Property Prediction with Images

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

Vision encoders on single 2D molecular images with a chemistry-informed curriculum achieve top or near-top results on 10 property prediction tasks at 80x lower FLOPs than multi-modal competitors.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Probing Visual Planning in Image Editing Models

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

Image editing models fail zero-shot visual planning on abstract mazes and queen puzzles but generalize after finetuning, yet still cannot match human zero-shot efficiency.

Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

V2V-Zero adapts frozen VLMs for visual conditioning via hidden states from specification pages, scoring 0.85 on GenEval and 32.7 on a new seven-task benchmark while revealing capability hierarchies in attribute binding and structural control.

CASCADE: Context-Aware Relaxation for Speculative Image Decoding

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

CASCADE formalizes semantic interchangeability and convergence in target model representations to enable context-aware acceptance relaxation in tree-based speculative decoding, delivering up to 3.6x speedup on text-to-image models without quality loss.
