pith. sign in

hub

Llavanext: Improved reasoning, ocr, and world knowledge

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

hub tools

citation-role summary

background 3 baseline 1

citation-polarity summary

fields

cs.CV 9 cs.AI 1

years

2026 9 2025 1

verdicts

UNVERDICTED 10

representative citing papers

Lance: Unified Multimodal Modeling by Multi-Task Synergy

cs.CV · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.

X2SAM: Any Segmentation in Images and Videos

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

X2SAM unifies any-segmentation across images and videos in one MLLM by adding a Mask Memory module for temporal consistency and joint training on mixed datasets.

Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM

cs.CV · 2025-05-23 · unverdicted · novelty 6.0

Slot-MLLM introduces a slot-attention-based object-centric visual tokenizer with Q-Former encoder, diffusion decoder, and residual vector quantization for improved local visual comprehension and generation in multimodal LLMs.

Swift Sampling: Selecting Temporal Surprises via Taylor Series

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Swift Sampling is a training-free frame selection method that uses Taylor expansions on video latent trajectories to pick temporally surprising frames, outperforming uniform sampling on long-video QA tasks.

citing papers explorer

Showing 10 of 10 citing papers.