hub Canonical reference

Unified multimodal understanding and generation models: Advances, challenges, and opportunities. arXiv preprint arXiv:2505.02567

Canonical reference. 100% of citing Pith papers cite this work as background.

19 Pith papers citing it
Background 100% of classified citations

hub tools

citation-role summary

background 8

citation-polarity summary

years

2026: 17 · 2025: 2

roles

background 7

polarities

background 7

representative citing papers

Lance: Unified Multimodal Modeling by Multi-Task Synergy

cs.CV · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.

Mind the Gap No More: Achieving Zero-Gap Multimodal Integration via One Tokenizer

q-bio.GN · 2026-01-21 · unverdicted · novelty 6.0

One Tokenizer achieves zero-gap multimodal integration by mapping all inputs to a unified token vocabulary, allowing native LLMs to perform deep cross-modal reasoning without modular encoders or fusion layers, and outperforming encoder-based baselines on DNA-text tasks.

Mull-Tokens: Modality-Agnostic Latent Thinking

cs.CV · 2025-12-11 · unverdicted · novelty 6.0

Mull-Tokens are modality-agnostic latent tokens that enable free-form multimodal thinking and deliver up to 16% gains on spatial reasoning benchmarks.

PaintBench: Deterministic Evaluation of Precise Visual Editing

cs.GR · 2026-05-29 · unverdicted · novelty 5.0

PaintBench provides a scalable deterministic benchmark for precise visual editing operations, revealing that even the best of 11 models achieves only 17.1% mIoU and that scores correlate strongly with applied data visualization editing performance.

Adaptive Forensic Feature Refinement via Intrinsic Importance Perception

cs.CV · 2026-04-18 · unverdicted · novelty 4.0

I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harming generalization.
