pith. machine review for the scientific record.

archive

Every paper Pith has read.

5081 papers in cs.CV · page 1

  1. cs.CV 2026-05-14 reviewed
    Memory bank preserves characters across 48-shot gaps in video

    EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

    Meng Wei +3

  2. cs.CV 2026-05-14 reviewed
    One token unifies agentic and latent visual reasoning

    ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

    Pheng-Ann Heng +3

  3. cs.CV 2026-05-14 reviewed
    RefDecoder uses conditional video decoding to enhance visual generation

    RefDecoder: Enhancing Visual Generation with Conditional Video Decoding

    Bohan Fang +4

  4. cs.CV 2026-05-14 reviewed
    New index catches 3D geometry errors in video generators

    Quantitative Video World Model Evaluation for Geometric-Consistency

    Jiaxin Wu +4

  5. cs.CV 2026-05-14 reviewed
    Frozen video models follow camera paths via simple warp interface

    Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

    Tong He +1

  6. cs.CV 2026-05-14 reviewed
    Reward-driven planner and orchestrator improve multi-step image edits

    From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing

    Anirudh Sundara Rajan +2

  7. cs.CV 2026-05-14 reviewed
    Geometry-first method cuts satellite-to-street 3D error by 23 percent

    Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

    Bin Tan +8

  8. cs.CV 2026-05-14 reviewed
    MicroscopyMatching aims at ready-to-use microscopy image analysis in diverse conditions

    MicroscopyMatching: Towards a Ready-to-use Framework for Microscopy Image Analysis in Diverse Conditions

    Haoxuan Qu +5

  9. cs.GR 2026-05-14 reviewed
    Meschers process impossible objects without cuts or bends

    Meschers: Geometry Processing of Impossible Objects

    Ana Dodik +6

  10. cs.CV 2026-05-14 reviewed
    Head ranking doubles KV cache compression in image generators

    HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling

    Axel Berg +4

  11. cs.CV 2026-05-14 reviewed
    The paper presents the Closed-Loop Visual Reasoning (CLVR) framework that integrates…

    Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

    Hanbo Cheng +4

  12. cs.CV 2026-05-14 reviewed
    Shared channel basis across frequencies boosts spectral mixers

    CHASM: Cross-frequency Harmonized Axis-Separable Mixing for Spectral Token Operators

    Hongli Chen +5

  13. cs.CV 2026-05-14 reviewed
    Model reads cell types and protein levels from label-free images

    Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning

    Ardhendu Behera +1

  14. cs.CV 2026-05-14 reviewed
    Vision features align LLM text with clinical data for stroke prognosis

    Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke

    Guanjie Wang +7

  15. cs.CV 2026-05-14 reviewed
    Adaptive mode switching raises fidelity on complex image prompts

    Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners

    Bingjie Gao +11

  16. cs.CV 2026-05-14 reviewed
    Dual-branch model copies text styles across languages in scenes

    StyleTextGen: Style-Conditioned Multilingual Scene Text Generation

    Fangmin Zhao +5

  17. cs.CV 2026-05-14 reviewed
    Model generates sign language replies from signing context alone

    Towards Continuous Sign Language Conversation from Isolated Signs

    Chanyoung Kim +6

  18. cs.CV 2026-05-14 reviewed
    VLMs fail to locate hidden functional objects from task instructions

    SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization

    Gueter Josmy Faure +4

  19. cs.CV 2026-05-14 reviewed
    Generative model turns SDR video into HDR by predicting bracketed exposures

    Generating HDR Video from SDR Video

    Daisuke Iso +8

  20. cs.CV 2026-05-14 reviewed
    Driving model gains planning edge by forecasting 3D futures

    EponaV2: Driving World Model with Comprehensive Future Reasoning

    Jian Yang +10

  21. cs.CV 2026-05-14 reviewed
    Randomly initialized nets match active learning without candidate models

    Are Candidate Models Really Needed for Active Learning?

    Harshini Mridula Mohan +4

  22. cs.CV 2026-05-14 reviewed
    Multiscale VLM features raise video edit quality

    MiVE: Multiscale Vision-language features for reference-guided video Editing

    Chengjing Wu +6

  23. cs.CV 2026-05-14 reviewed
    Anatomy topology across patients boosts medical scan pre-training

    Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging

    Chen Jiang +10

  24. cs.CV 2026-05-14 reviewed
    New dataset tracks urban land and vegetation shifts with 5221 Sentinel-2 pairs

    TERRA-CD: Multi-Temporal Framework for Multi-class and Semantic Change Detection

    Omkar Oak +3

  25. cs.CV 2026-05-14 reviewed
    Vision framework with physical priors lifts water level accuracy

    Vision-Based Water Level and Flow Estimation

    ZhiXin Sun

  26. cs.CV 2026-05-14 reviewed
    RefineCAM improves high-resolution CAMs for CNN explanations

    How to Evaluate and Refine your CAM

    Alessandra Stramiglio +3

  27. cs.CV 2026-05-14 reviewed
    Multi-label benchmark shows MLLMs still miss full emotion mixes

    MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models

    Mo Fan +5

  28. cs.LG 2026-05-14 reviewed
    Learned potential reweights bridges to improve generative fidelity

    Action-Inspired Generative Models

    Debnath Pal +1

  29. cs.CV 2026-05-14 reviewed
    Unified diffusion generates aligned VIS-IR-Label triplets from few pairs

    UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation

    Chen Ding +6

  30. cs.CV 2026-05-14 reviewed
    The paper introduces SIRA, an internal contrastive decoding method that reduces…

    Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

    Junzhe Chen +5

  31. cs.CV 2026-05-14 reviewed
    ViMU benchmark tests video AI on hidden meanings

    ViMU: Benchmarking Video Metaphorical Understanding

    Qi Li +1

  32. cs.CV 2026-05-14 reviewed
    Hybrid Mamba-attention model extends rainfall forecasts to three hours

    MambaRain: Multi-Scale Mamba-Attention Framework for 0-3 Hour Precipitation Nowcasting

    Boyu Liu +12

  33. cs.CV 2026-05-14 reviewed
    Gaussians replace grids to lift panoramic images into 3D detections

    Towards Accurate Single Panoramic 3D Detection: A Semantic Gaussian Centric Approach

    Kanglin Ning +5

  34. cs.CV 2026-05-14 reviewed
    Two-stage model fuses radar and satellite for sharper rain forecasts

    VMU-Diff: A Coarse-to-fine Multi-source Data Fusion Framework for Precipitation Nowcasting

    Boyu Liu +8

  35. cs.CV 2026-05-14 reviewed
    TOPOS locks single-image 3D heads to fixed studio topology

    TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation

    Bojun Xiong +8

  36. cs.CV 2026-05-14 reviewed
    Higher-order stain stats raise federated pathology accuracy 3.9%

    FedStain: Modeling Higher-Order Stain Statistics for Federated Domain Generalization in Computational Pathology

    Fengyi Zhang +2

  37. cs.CV 2026-05-14 reviewed
    Aggregated vectors make different financial docs look identical

    A Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrieval

    Ho Hung Lim +1

  38. cs.CV 2026-05-14 reviewed
    Dispersive loss on batch features sharpens medical boundaries

    Med-DisSeg: Dispersion-Driven Representation Learning for Fine-Grained Medical Image Segmentation

    Guowei Zou +3

  39. cs.CV 2026-05-14 reviewed
    Framework turns fMRI signals into videos via semantic stages

    Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction

    Biao Gong +8

  40. cs.CV 2026-05-14 reviewed
    Latent alignment of images to masks improves medical segmentation

    SpectraFlow: Unifying Structural Pretraining and Frequency Adaptation for Medical Image Segmentation

    Guowei Zou +3

  41. cs.CV 2026-05-14 reviewed
  42. cs.CV 2026-05-14 reviewed
    2D convolutions extract temporal gait patterns via strip pooling

    Local Spatiotemporal Convolutional Network for Robust Gait Recognition

    Cunrong Li +2

  43. cs.CV 2026-05-14 reviewed
    RC metrics align object removal scores with human perception

    PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

    Daiguo Zhou +8

  44. cs.CV 2026-05-14 reviewed
    Mask drift triggers repetition in diffusion vision-language models

    Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models

    Chanyong Yoon +2

  45. cs.CV 2026-05-14 reviewed
    The paper proposes using sparse images from different camera views captured at different…

    From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper

    Changjie Chen +6

  46. cs.CV 2026-05-14 reviewed
    ArcGate activation adapts shape to raise remote sensing accuracy

    ArcGate: Adaptive Arctangent Gated Activation

    Alejandro C. Frery +4

  47. cs.CV 2026-05-14 reviewed
    Head-wise sparsity speeds video diffusion 1.93x

    HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention

    Fei Chao +5

  48. cs.CV 2026-05-14 reviewed
    Training-free method stretches video generation to full minutes

    Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity

    Chi Zhang +3

  49. cs.CV 2026-05-14 reviewed
    GAN upsampling plus expert fusion cuts artifact bias in image detectors

    Reduce the Artifacts Bias for More Generalizable AI-Generated Image Detection

    Gao Li +5

  50. cs.CV 2026-05-14 reviewed
    GeoVista plans globally then inspects branches for satellite images

    GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding

    Bo Yang +12