Title resolution pending

Learning Transferable Visual Models From Natural Language Supervision · 2021

16 Pith papers cite this work. Polarity classification is still being indexed.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs

cs.CV · 2026-05-08 · unverdicted · novelty 8.0

Sparse autoencoders inserted into VLMs and trained only for reconstruction can reliably detect adversarial attacks on images, including attacks from unseen domains and of unseen types.

NeuralBench: A Unifying Framework to Benchmark NeuroAI Models

cs.LG · 2026-05-08 · conditional · novelty 7.0

NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.

Probing Visual Planning in Image Editing Models

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

Image editing models fail zero-shot visual planning on abstract mazes and queen puzzles but generalize after finetuning, yet still cannot match human zero-shot efficiency.

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

PGT generates synthetic tasks via geometric overlays on images to supply dense visual supervision, improving spatial and relational understanding in MLLMs by up to 20% on targeted benchmarks.

Prefix-Adaptive Block Diffusion for Efficient Document Recognition

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.

A Few-Step Generative Model on Cumulative Flow Maps

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

cs.CV · 2024-03-05 · conditional · novelty 6.0

Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models

cs.CL · 2024-02-18 · unverdicted · novelty 6.0

ALLaVA creates 1.3M GPT4V-synthesized samples enabling 4B VLMs to achieve competitive results on 17 benchmarks and match 7B/13B models on some tasks.

NeuralSet: A High-Performing Python Package for Neuro-AI

q-bio.NC · 2026-05-04 · unverdicted · novelty 5.0 · 2 refs

NeuralSet is a scalable Python framework that unifies diverse neural recordings and stimuli with deep learning embeddings via metadata decoupling and lazy data extraction.

Mesh Based Simulations with Spatial and Temporal awareness

cs.LG · 2026-05-02 · unverdicted · novelty 5.0

A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.

From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

VLMs recover reliable population-level trends in climate change visual discourse on social media even when per-image accuracy is only moderate.

Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

cs.LG · 2026-05-19

Unified Pix Token And Word Token Generative Language Model

cs.CV · 2026-05-13

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · 2 refs

Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs

cs.LG · 2026-05-07

Spherical Flows for Sampling Categorical Data

stat.ML · 2026-05-07 · 2 refs

citing papers explorer

Showing 1 of 1 citing paper after filters.

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models cs.CL · 2024-02-18 · unverdicted · none · ref 68
