Perceiver IO: A general architecture for structured inputs & outputs. Preprint arXiv:2107.14795
18 Pith papers cite this work. Polarity classification is still being indexed.
Citing papers
-
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
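The entropy-triggered mechanism follows the general byte-level patching idea: a small model scores each next byte, and a new patch (here, a scratchpad update) fires wherever the next-byte entropy spikes. A minimal sketch of that trigger, assuming a `prob_seq` of per-position next-byte distributions; the threshold value and function names are illustrative, not the paper's:

```python
import numpy as np

def next_byte_entropy(probs):
    """Shannon entropy (in bits) of a next-byte distribution."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log2(p)).sum())

def entropy_patch_boundaries(prob_seq, threshold=2.0):
    """Start a new patch wherever the small model's next-byte
    entropy exceeds the threshold (i.e. the text is 'surprising')."""
    boundaries = [0]
    for i, probs in enumerate(prob_seq[1:], start=1):
        if next_byte_entropy(probs) > threshold:
            boundaries.append(i)
    return boundaries

# Toy example: two confident positions, one maximally uncertain one.
confident = np.zeros(256); confident[65] = 1.0
uniform = np.full(256, 1 / 256)        # entropy = 8 bits
seq = [confident, confident, uniform, confident]
print(entropy_patch_boundaries(seq))   # → [0, 2]
```

Under this scheme, compute is spent where prediction is hard rather than at fixed byte intervals, which is what decouples compute from patch size.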
-
MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation
MeshFIM enables local low-poly mesh editing by autoregressively filling target regions conditioned on context, using boundary markers, positional embeddings, and a gated geometry encoder to enforce attachment, topology, and region limits.
-
A foundation model of vision, audition, and language for in-silico neuroscience
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
-
A Generalist Agent
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
-
High-Resolution Image Synthesis with Latent Diffusion Models
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational costs compared to pixel-space diffusion models.
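The two ingredients named in the summary, diffusion over autoencoder latents and cross-attention conditioning, can be sketched together. This is a minimal illustration, not the paper's U-Net: projections are omitted (so latent and conditioning dimensions must match), the noise schedule is an assumed cosine schedule, and the "denoiser" is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents, cond):
    """Latent tokens (queries) attend to conditioning tokens, e.g. text
    embeddings, via scaled dot-product attention (projections omitted)."""
    scores = latents @ cond.T / np.sqrt(latents.shape[-1])
    return softmax(scores) @ cond

def forward_noise(z0, t, T=1000):
    """DDPM-style forward process applied to the *latent* z0, not pixels:
    z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * eps."""
    a_bar = np.cos(0.5 * np.pi * t / T) ** 2   # illustrative cosine schedule
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(a_bar) * z0 + np.sqrt(1 - a_bar) * eps, eps

# Training-step sketch: the denoiser sees the noised latent plus
# cross-attended conditioning and is regressed onto the noise.
z0 = rng.standard_normal((16, 4))    # latent from a pretrained autoencoder
cond = rng.standard_normal((5, 4))   # e.g. text-encoder tokens
z_t, eps = forward_noise(z0, t=500)
denoiser_input = z_t + cross_attend(z_t, cond)
loss = np.mean((denoiser_input - eps) ** 2)  # stand-in for the U-Net's MSE loss
```

The cost saving comes from the shapes: the diffusion loop runs over a small latent grid (here 16 tokens) instead of full-resolution pixels, with the autoencoder handling the mapping to image space once at the end.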
-
TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation
TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.
-
A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01 s, achieving 97.8% of the DP-optimal utility, and handles larger problems where DP fails.
-
Hypergraph and Latent ODE Learning for Multimodal Root Cause Localization in Microservices
HyperODE RCA integrates hypergraph learning with latent ODEs and cross-modal attention to improve root cause localization in microservice architectures on the Tianchi AIOps benchmark.
-
MLG-Stereo: ViT Based Stereo Matching with Multi-Stage Local-Global Enhancement
MLG-Stereo adds multi-granularity feature extraction, local-global cost volumes, and guided recurrent refinement to ViT stereo matching, yielding competitive results on Middlebury and KITTI-2015 and strong results on KITTI-2012.
-
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens
OmniMouse demonstrates data-driven scaling in multi-task brain models on a 150B-token neural dataset, achieving SOTA across prediction, decoding, and forecasting while model size gains saturate.
-
UHR-BAT: Budget-Aware Token Compression Vision-Language model for Ultra-High-Resolution Remote Sensing
UHR-BAT is a budget-aware framework that uses text-guided multi-scale importance estimation plus region-wise preserve and merge strategies to compress visual tokens in ultra-high-resolution remote sensing vision-language models.
-
Towards foundation-style models for energy-frontier heterogeneous neutrino detectors via self-supervised pre-training
Self-supervised pre-training on multimodal neutrino detector simulations produces reusable representations that improve downstream classification, regression, and data efficiency over training from scratch.
-
Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis
PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.
-
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.
-
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.
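The latent attention layer can be sketched as a pooling step: the decoder's token states query a small trainable latent array, and the attended outputs are averaged into one embedding. A minimal sketch with random (frozen) weights and projections omitted; dimensions and names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention_pool(hidden, latent_dict):
    """Pool (seq_len, d) token states into one d-dim embedding: the token
    states act as queries over a small latent dictionary (keys/values),
    and the attended outputs are mean-pooled."""
    scores = hidden @ latent_dict.T / np.sqrt(hidden.shape[-1])  # (seq, L)
    attended = softmax(scores) @ latent_dict                     # (seq, d)
    return attended.mean(axis=0)                                 # (d,)

rng = np.random.default_rng(0)
hidden = rng.standard_normal((10, 8))      # last-layer decoder states
latent_dict = rng.standard_normal((4, 8))  # trainable latent array (frozen here)
emb = latent_attention_pool(hidden, latent_dict)
```

Compared with plain mean pooling, the latent dictionary gives the model a learned basis to route token information through before averaging, which is the design choice the summary credits.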
-
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon is an early-fusion token model that handles mixed image-text sequences for understanding and generation, achieving competitive or superior performance to larger models like Llama-2, Mixtral, and Gemini-Pro on captioning, VQA, text, and image tasks.
-
PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling
PRiMeFlow applies flow matching in gene expression space with a U-Net velocity field and pretraining-finetuning to model perturbation-induced heterogeneity, showing strong benchmark performance on PerturBench and the ARC Virtual Cell Challenge.
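The flow-matching objective named in the summary regresses a velocity field onto straight-line paths between samples. A minimal sketch in a generic expression space, assuming linear interpolation paths; the oracle field and variable names are illustrative, not PRiMeFlow's U-Net:

```python
import numpy as np

def flow_matching_loss(x0, x1, velocity_field, rng):
    """Conditional flow matching with linear paths:
    x_t = (1 - t) * x0 + t * x1, target velocity u = x1 - x0.
    The model v(x_t, t) is regressed onto u with an MSE loss."""
    t = rng.uniform(size=(x0.shape[0], 1))   # one random time per sample
    x_t = (1 - t) * x0 + t * x1
    u = x1 - x0
    v = velocity_field(x_t, t)
    return float(np.mean((v - u) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 3))   # e.g. control expression profiles
x1 = rng.standard_normal((4, 3))   # e.g. perturbed expression profiles

# An oracle field that returns the exact target velocity drives the loss to 0.
oracle = lambda x_t, t: x1 - x0
print(flow_matching_loss(x0, x1, oracle, rng))  # → 0.0
```

At inference, integrating the learned field from a control profile traces a sample-specific path to a perturbed profile, which is how the approach models response heterogeneity rather than a single mean shift.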
-
ENSEMBITS: an alphabet of protein conformational ensembles