pith. sign in

hub Baseline reference

LongCat-Image Technical Report

Baseline reference. 83% of citing Pith papers use this work as a benchmark or comparison.

27 Pith papers citing it
Baseline 83% of classified citations
abstract

We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges in multilingual text rendering, photorealism, deployment efficiency, and developer accessibility prevalent in current leading models. 1) We achieve this through rigorous data curation strategies across the pre-training, mid-training, and SFT stages, complemented by the coordinated use of curated reward models during the RL phase. This strategy establishes the model as a new state-of-the-art (SOTA), delivering superior text-rendering capabilities and remarkable photorealism, and significantly enhancing aesthetic quality. 2) Notably, it sets a new industry standard for Chinese character rendering. By supporting even complex and rare characters, it outperforms both major open-source and commercial solutions in coverage, while also achieving superior accuracy. 3) The model achieves remarkable efficiency through its compact design. With a core diffusion model of only 6B parameters, it is significantly smaller than the nearly 20B or larger Mixture-of-Experts (MoE) architectures common in the field. This ensures minimal VRAM usage and rapid inference, significantly reducing deployment costs. Beyond generation, LongCat-Image also excels in image editing, achieving SOTA results on standard benchmarks with superior editing consistency compared to other open-source works. 4) To fully empower the community, we have established the most comprehensive open-source ecosystem to date. We are releasing not only multiple model versions for text-to-image and image editing, including checkpoints after mid-training and post-training stages, but also the entire toolchain of training procedure. We believe that the openness of LongCat-Image will provide robust support for developers and researchers, pushing the frontiers of visual content creation.

hub tools

citation-role summary

baseline 5 background 1

citation-polarity summary

years

2026 27

representative citing papers

Gen-Searcher: Reinforcing Agentic Search for Image Generation

cs.CV · 2026-03-30 · unverdicted · novelty 7.0 · 2 refs

Gen-Searcher is the first trained search-augmented image generation agent using SFT followed by GRPO reinforcement learning with dual text-image rewards, delivering 15-16 point gains on knowledge-intensive benchmarks.

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

SpatialEdit provides a benchmark, large synthetic dataset, and baseline model for precise object and camera spatial manipulations in images, with the model beating priors on spatial editing.

PaintBench: Deterministic Evaluation of Precise Visual Editing

cs.GR · 2026-05-29 · unverdicted · novelty 5.0

PaintBench provides a scalable deterministic benchmark for precise visual editing operations, revealing that even the best of 11 models achieves only 17.1% mIoU and that scores correlate strongly with applied data visualization editing performance.

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

Lens is a 3.8B-parameter text-to-image model that reaches competitive or superior performance to >6B-parameter systems using 19.3% of the training compute of Z-Image through a densely captioned 800M dataset, multi-resolution batching, semantic VAE, strong language encoder, RL fine-tuning, and 4-step

FineEdit: Fine-Grained Image Edit with Bounding Box Guidance

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

FineEdit adds multi-level bounding box injection to diffusion image editing, releases a 1.2M-pair dataset with box annotations, and shows better instruction following and background consistency than prior open models on new and existing benchmarks.

citing papers explorer

Showing 27 of 27 citing papers.