super hub Mixed citations

GLU Variants Improve Transformer

Noam Shazeer · 2020 · cs.LG · arXiv 2002.05202

Mixed citation behavior. Most common role is background (47%).

300 Pith papers citing it

Background 47% of classified citations

open full Pith review browse 300 citing papers more from Noam Shazeer arXiv PDF

abstract

Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using different nonlinear (or even linear) functions in place of sigmoid. We test these variants in the feed-forward sublayers of the Transformer (arXiv:1706.03762) sequence-to-sequence model, and find that some of them yield quality improvements over the typically-used ReLU or GELU activations.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 30 method 24 dataset 2 extension 1

citation-polarity summary

background 27 use method 23 unclear 4 use dataset 2 extend 1

claims ledger

abstract Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using different nonlinear (or even linear) functions in place of sigmoid. We test these variants in the feed-forward sublayers of the Transformer (arXiv:1706.03762) sequence-to-sequence model, and find that some of them yield quality improvements over the typically-used ReLU or GELU activations.

authors

Noam Shazeer

co-cited works

representative citing papers

DataComp-VLM: Improved Open Datasets for Vision-Language Models

cs.CV · 2026-06-26 · conditional · novelty 8.0 · 2 refs

DataComp-VLM benchmark shows instruction-heavy data mixing outperforms filtering for VLM training, with DCVLM-Baseline achieving 63.6% on 33 tasks for 8B models (+5.4pp over FineVision).

Attention as Frustrated Synchronization

cs.LG · 2026-06-17 · unverdicted · novelty 8.0

FSN achieves lower validation loss (1.5953) than a RoPE-SwiGLU transformer (1.611) on character-level tasks at 1M parameters by implementing next-token prediction as synchronization frustrated by data transitions.

Tight Sample Complexity of Transformers

cs.LG · 2026-06-08 · unverdicted · novelty 8.0

Depth-L transformers with W parameters have VC dimension Theta(L W log(T W)), yielding matching O(L W log((T+T')W)) upper and Omega(L W log((T+T')W/L)) lower bounds on sample complexity for chain-of-thought learning.

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

cs.LG · 2026-04-14 · unverdicted · novelty 8.0

CLAD is the first deep learning framework for log anomaly detection that operates directly on compressed byte streams using a dilated convolutional encoder, hybrid Transformer-mLSTM, and two-stage training, achieving 0.9909 average F1-score across five datasets.

Test-Time Training with KV Binding Is Secretly Linear Attention

cs.LG · 2026-02-24 · conditional · novelty 8.0

Test-time training with KV binding reduces to learned linear attention.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

Joint inference of weak lensing convergence map and cosmology with diffusion models

astro-ph.CO · 2026-06-30 · unverdicted · novelty 7.0

A transformer-based diffusion model learns the joint distribution of convergence maps and cosmology from log-normal weak lensing simulations and generates calibrated posterior samples matching MCMC results.

DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection

eess.AS · 2026-06-28 · unverdicted · novelty 7.0

DTM-Codec achieves better reconstruction quality and intelligibility than fixed-frame-rate neural speech codecs at matched total bitrate via dynamic token masking and Path Length Equalization for variable frame rates.

Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

PRA approximates sequential rollout training in parallel for pixel-space AR models via intermediate states and a pixel decoder, achieving FID 2.58 (135M params) and 1.94 (511M params) on ImageNet-1K 256x256, new SOTA among pixel-space AR models.

Scalable Peptide Design via Memory-Efficient Equivariant Transformer

cs.LG · 2026-06-23 · unverdicted · novelty 7.0 · 2 refs

MEET is a new equivariant transformer backbone that achieves linear memory scaling for full-atom peptide generation and improves quality over prior methods.

Tapered Language Models

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

Tapered Language Models monotonically decrease MLP width across depth with a cosine schedule, yielding better perplexity and downstream performance than uniform-width baselines across multiple architectures and scales at no extra cost.

MADField: Multi-fidelity Amortized Density Field for Adsorption in Nanoporous Materials

physics.comp-ph · 2026-06-19 · unverdicted · novelty 7.0 · 2 refs

MADField is a multi-fidelity amortized model for predicting density fields to improve accuracy and speed of adsorption calculations in nanoporous materials for high-throughput screening.

Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers

cs.CV · 2026-06-17 · unverdicted · novelty 7.0

A 1.3B-parameter rectified flow transformer is the first generative foundation model for chest radiograph synthesis at billion-parameter scale, producing images indistinguishable from real ones to experts.

FoundCause: Causal Discovery with Latent Confounders from Observational Data

cs.LG · 2026-06-16 · unverdicted · novelty 7.0

FoundCause is a transformer-based amortized model for causal graph discovery that explicitly models latent confounders via learnable tokens and reports better performance than prior methods on 15 real-world datasets.

AttentionCap: Transformer Based Capacitance Matrix Learning Toward Full-Chip Extraction

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

AttentionCap, a customized Transformer, predicts capacitance matrices across multiple process nodes with 0.67% self-capacitance and 3.99% coupling error on unseen designs, outperforming CNN baselines in accuracy and speed.

Stateful Visual Encoders for Vision-Language Models

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

Stateful visual encoders condition each visual representation on prior features, yielding consistent gains on multi-image tasks under supervised finetuning across model sizes and domains.

Diffusing in the Right Space: A Systematic Study of Latent Diffusability

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

A large-scale empirical study across tokenizers and diffusion backbones identifies Velocity Irreducible Variance (VIV) as one of the most stable predictors of latent diffusion generation quality.

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

SubFit enables better LLM compression by fitting residual bypasses to non-contiguously selected submodules, outperforming layer-granularity baselines in accuracy-perplexity trade-offs at 12.5-37.5% sparsity.

mRNAutilus: Multi-Objective-Guided Discrete Generation of mRNA with Optimized Therapeutic Properties

q-bio.BM · 2026-05-29 · unverdicted · novelty 7.0

mRNAutilus generates full-length therapeutic mRNAs via diffusion models and multi-objective guidance, achieving over 400-fold expression gains for luciferase and outperforming baselines for Spike and other targets in zero-shot tests.

Chess-World-Model: A 10M-Game Benchmark for Exact State Tracking from Chess Move Sequences

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Introduces Chess-World-Model benchmark from 10M chess games showing recurrent models (SLiCE, Mamba-3, Gated DeltaNet) outperform Transformers on exact state tracking, with random-play split remaining hard at larger scales.

An In-Vitro Study on Cross-Lingual Generalization in Language Models

cs.CL · 2026-05-26 · unverdicted · novelty 7.0

An in-vitro study with synthetic languages finds cross-lingual transfer depends more on tokenization preserving reusable substructure than on lexical similarity or balance, with transfer emerging in stages.

Forgotten Words: Benchmarking NeoBERT for Dementia Detection in Low-Resource Conversational Filipino and English Speech

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

Bilingual fine-tuning on a new parallel Filipino-English dementia dataset yields Macro-F1 scores of 0.969-0.973 and eliminates cross-lingual degradation for all tested transformers.

MuCRASP: Multimodal Chain-of-thought Reasoning aware Structured Pruning

cs.AI · 2026-05-25 · unverdicted · novelty 7.0

MuCRASP prunes VLMs in a CoT-aware manner, outperforming baselines by preserving reasoning quality at 30-50% compression rates on models like Qwen2.5-VL-7B.

citing papers explorer

Showing 50 of 60 citing papers after filters.

DataComp-VLM: Improved Open Datasets for Vision-Language Models cs.CV · 2026-06-26 · conditional · none · ref 258 · 2 links · internal anchor
DataComp-VLM benchmark shows instruction-heavy data mixing outperforms filtering for VLM training, with DCVLM-Baseline achieving 63.6% on 33 tasks for 8B models (+5.4pp over FineVision).
Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation cs.CV · 2026-06-26 · unverdicted · none · ref 10 · internal anchor
PRA approximates sequential rollout training in parallel for pixel-space AR models via intermediate states and a pixel decoder, achieving FID 2.58 (135M params) and 1.94 (511M params) on ImageNet-1K 256x256, new SOTA among pixel-space AR models.
Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers cs.CV · 2026-06-17 · unverdicted · none · ref 8 · internal anchor
A 1.3B-parameter rectified flow transformer is the first generative foundation model for chest radiograph synthesis at billion-parameter scale, producing images indistinguishable from real ones to experts.
Stateful Visual Encoders for Vision-Language Models cs.CV · 2026-06-03 · unverdicted · none · ref 63 · internal anchor
Stateful visual encoders condition each visual representation on prior features, yielding consistent gains on multi-image tasks under supervised finetuning across model sizes and domains.
Diffusing in the Right Space: A Systematic Study of Latent Diffusability cs.CV · 2026-06-02 · unverdicted · none · ref 56 · internal anchor
A large-scale empirical study across tokenizers and diffusion backbones identifies Velocity Irreducible Variance (VIV) as one of the most stable predictors of latent diffusion generation quality.
Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration cs.CV · 2026-05-20 · unverdicted · none · ref 67 · internal anchor
DiSI disentangles stochastic interpolants into separate generation and regression paths, allowing controllable transitions between regression and generative image restoration with a unified few-step sampler.
StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video cs.CV · 2026-05-18 · unverdicted · none · ref 40 · internal anchor
StableHand introduces a quality-aware flow matching framework conditioned on predicted four-channel per-frame hand observation quality to estimate dual-hand world-space motion from egocentric video, achieving SOTA results with 20-25% W-MPJPE reduction on HOT3D and ARCTIC benchmarks.
VMU-Diff: A Coarse-to-fine Multi-source Data Fusion Framework for Precipitation Nowcasting cs.CV · 2026-05-14 · unverdicted · none · ref 72 · internal anchor
VMU-Diff improves precipitation nowcasting via coarse multi-source Vision Mamba fusion followed by residual conditional diffusion refinement.
Degradation-Aware Adaptive Context Gating for Unified Image Restoration cs.CV · 2026-05-02 · unverdicted · none · ref 47 · internal anchor
DACG-IR adds a lightweight degradation-aware module that generates prompts to adaptively gate attention temperature, output features, and spatial-channel fusion in an encoder-decoder network for unified image restoration.
Beyond Heuristics: Learnable Density Control for 3D Gaussian Splatting cs.CV · 2026-05-01 · unverdicted · none · ref 20 · 2 links · internal anchor
LeGS turns density control in 3D Gaussian Splatting into a learnable RL policy whose reward is derived from a closed-form sensitivity analysis that measures each Gaussian's marginal contribution to reconstruction quality.
WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images cs.CV · 2026-04-23 · unverdicted · none · ref 24 · internal anchor
WildSplatter jointly learns 3D Gaussians and appearance embeddings from unconstrained photo collections to enable fast feed-forward reconstruction and flexible lighting control in 3D Gaussian Splatting.
Envisioning the Future, One Step at a Time cs.CV · 2026-04-10 · unverdicted · none · ref 93 · internal anchor
An autoregressive diffusion model on sparse point trajectories predicts multi-modal future scene dynamics from single images with orders-of-magnitude faster sampling than dense video simulators while matching accuracy.
Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training cs.CV · 2026-04-08 · unverdicted · none · ref 44 · internal anchor
Mem3R achieves better long-sequence 3D reconstruction by decoupling tracking and mapping with a hybrid memory of TTT-updated MLP and explicit tokens, reducing model size and trajectory errors.
ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training cs.CV · 2026-03-04 · unverdicted · none · ref 56 · internal anchor
ZipMap achieves linear-time bidirectional 3D reconstruction by zipping image collections into a compact stateful representation via test-time training layers.
Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation cs.CV · 2026-03-03 · unverdicted · none · ref 32 · internal anchor
GDLA delivers state-of-the-art accuracy on CT, MRI, ultrasound and dermoscopy segmentation benchmarks while keeping linear O(N) complexity in a PVT encoder-decoder.
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation cs.CV · 2024-06-10 · conditional · none · ref 28 · internal anchor
Scaled vanilla autoregressive models based on Llama achieve 2.18 FID on ImageNet 256x256 image generation, beating popular diffusion models without visual inductive biases.
Multimodal Continuous Reasoning via Asymmetric Mutual Variational Learning cs.CV · 2026-07-01 · unverdicted · none · ref 42 · internal anchor
AMVL applies bidirectional KL calibration to align answer-agnostic prior with answer-conditioned posterior in variational multimodal reasoning, reducing leakage and yielding +10.83 average gain on BLINK benchmark.
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance cs.CV · 2026-06-17 · unverdicted · none · ref 37 · internal anchor
Moebius introduces a compressed diffusion inpainting model using Local-λ Mix Interaction blocks and latent-space multi-granularity distillation to reach 10B-level quality with 0.22B parameters.
Cross-scale Aligned Supervision for Training GANs cs.CV · 2026-05-26 · unverdicted · none · ref 32 · internal anchor
CAT achieves FID-50K of 1.56 on ImageNet-256 with one-step inference after 60 epochs by aligning intermediate GAN outputs to the final sample.
SP-MoMamba: Superpixel-driven Mixture of State Space Experts for Efficient Image Super-Resolution cs.CV · 2026-05-25 · unverdicted · none · ref 46 · internal anchor
SP-MoMamba uses superpixels to drive content-aware state space modeling and multi-scale mixture-of-experts for efficient single-image super-resolution.
RiT: Vanilla Diffusion Transformers Suffice in Representation Space cs.CV · 2026-05-21 · conditional · none · ref 26 · internal anchor
A vanilla Diffusion Transformer trained via x-prediction on frozen DINOv2 features reaches FID 1.14 on ImageNet 256x256 with fewer parameters and faster sampling than prior DiT variants.
Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models cs.CV · 2026-05-20 · unverdicted · none · ref 5 · internal anchor
Polynomial replacements for activations in MLPs, convolutions, and attention within MetaFormer yield PolyNeXt models that match or exceed standard performance on ImageNet, ADE20K, and robustness benchmarks while beating prior polynomial networks.
HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer cs.CV · 2026-05-11 · unverdicted · none · ref 44 · internal anchor
A pixel-space Diffusion Transformer with Unified Transformer architecture unifies image generation, editing, and personalization in an end-to-end model that maps all inputs to a shared token space and scales from 8B to over 200B parameters.
What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion cs.CV · 2026-05-08 · unverdicted · none · ref 71 · internal anchor
Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.
Velox: Learning Representations of 4D Geometry and Appearance cs.CV · 2026-05-06 · unverdicted · none · ref 76 · internal anchor
Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.
End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer cs.CV · 2026-05-01 · unverdicted · none · ref 31 · internal anchor
An end-to-end autoregressive model with a jointly trained 1D semantic tokenizer achieves state-of-the-art FID 1.48 on ImageNet 256x256 generation without guidance.
PointTransformerX: Portable and Efficient 3D Point Cloud Processing without Sparse Algorithms cs.CV · 2026-04-27 · unverdicted · none · ref 39 · internal anchor
PointTransformerX is a fully PyTorch-native 3D point cloud transformer backbone that reaches 98.7% of PointTransformer V3 accuracy on ScanNet using 79.2% fewer parameters, 1.6x faster inference, and only 253 MB memory while running natively on NVIDIA, AMD, and CPU hardware.
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation cs.CV · 2026-04-23 · unverdicted · none · ref 32 · internal anchor
Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.
HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet cs.CV · 2026-04-16 · unverdicted · none · ref 56 · internal anchor
HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.
Nucleus-Image: Sparse MoE for Image Generation cs.CV · 2026-04-14 · unverdicted · none · ref 19 · internal anchor
A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.
Learning Long-term Motion Embeddings for Efficient Kinematics Generation cs.CV · 2026-04-13 · unverdicted · none · ref 33 · internal anchor
A 64x temporally compressed motion embedding learned from trackers enables efficient conditional flow-matching generation of long-term motions that outperform video models and task-specific methods.
Fast Spatial Memory with Elastic Test-Time Training cs.CV · 2026-04-08 · unverdicted · none · ref 43 · internal anchor
Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.
Orthogonal Quadratic Complements for Vision Transformer Feed-Forward Networks cs.CV · 2026-04-08 · unverdicted · none · ref 5 · internal anchor
Orthogonal Quadratic Complements project low-rank quadratic auxiliary branches onto the orthogonal complement of the main hidden state in vision transformer FFNs, improving accuracy by 1.3 points on CIFAR-100 and 1.4 points on TinyImageNet over baselines.
ViT$^3$: Unlocking Test-Time Training in Vision cs.CV · 2025-12-01 · unverdicted · none · ref 50 · internal anchor
ViT³ is a Test-Time Training vision model that achieves linear complexity, matches or exceeds other linear models like Mamba on classification, generation, detection and segmentation, and narrows the gap to standard vision Transformers.
Back to Basics: Let Denoising Generative Models Denoise cs.CV · 2025-11-17 · unverdicted · none · ref 54 · internal anchor
Directly predicting clean data with large-patch pixel Transformers enables strong generative performance in diffusion models where noise prediction fails at high dimensions.
Emu3.5: Native Multimodal Models are World Learners cs.CV · 2025-10-30 · unverdicted · none · ref 78 · internal anchor
Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.
Scaling Sequence-to-Sequence Generative Neural Rendering cs.CV · 2025-10-05 · unverdicted · none · ref 13 · internal anchor
Kaleido is a masked autoregressive generative model that unifies 3D view synthesis and video modeling by pre-training a single transformer on video data, achieving SOTA zero-shot and many-view performance on view synthesis benchmarks.
Pixtral 12B cs.CV · 2024-10-09 · unverdicted · none · ref 19 · internal anchor
Pixtral-12B is a 12B multimodal LLM with a custom vision encoder that ingests images at native resolution and aspect ratio, achieving leading benchmark results among open models while preserving text capabilities.
Emu3: Next-Token Prediction is All You Need cs.CV · 2024-09-27 · unverdicted · none · ref 76 · internal anchor
Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.
CogVLM: Visual Expert for Pretrained Language Models cs.CV · 2023-11-06 · conditional · none · ref 19 · internal anchor
CogVLM adds a trainable visual expert inside frozen language model layers for deep vision-language fusion and reports state-of-the-art results on ten cross-modal benchmarks while preserving NLP performance.
Vision Transformers Need Registers cs.CV · 2023-09-28 · unverdicted · none · ref 297 · internal anchor
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
ERA: Entropy-Guided Visual Token Pruning with Rectified Attention for Efficient MLLMs cs.CV · 2026-06-30 · unverdicted · none · ref 40 · internal anchor
ERA proposes entropy-guided token pruning with bias-aware recycling and logit rectification to compress visual inputs in MLLMs while mitigating attention collapse.
TuringViT: Making SOTA Vision Transformers Accessible to All cs.CV · 2026-06-23 · unverdicted · none · ref 31 · 2 links · internal anchor
TuringViT claims a new ViT design with linear attention and curated data that matches SOTA performance using 10% of typical pretraining data while supporting dynamic resolutions and improving VLM integration.
SSD: Spatially Speculative Decoding Accelerates Autoregressive Image Generation cs.CV · 2026-06-18 · unverdicted · none · ref 34 · internal anchor
SSD predicts multiple spatially adjacent tokens at once in autoregressive image models, claiming up to 13.3x inference speedup on DPG-Bench and GenEval with maintained fidelity.
Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos cs.CV · 2026-06-17 · unverdicted · none · ref 35 · internal anchor
MMPM uses PIM for gaze/head/hand interactions and MTP (CVAE with query decoder) to model separate crossing/non-crossing trajectory distributions, outperforming baselines on PIE and JAAD with a new validation protocol.
HiSem: Hierarchical Semantic Disentangling for Remote Sensing Image Change Captioning cs.CV · 2026-05-14 · unverdicted · none · ref 70 · internal anchor
HiSem adds bidirectional differential attention and a two-level hierarchical routing module with MoE to handle semantic granularity differences in remote sensing change captioning, reporting +7.52% BLEU-4 on WHU-CDC.
Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay cs.CV · 2026-05-02 · unverdicted · none · ref 33 · internal anchor
Colinearity-Decay regularizer trains ViTs that maintain or improve full-precision accuracy while delivering higher accuracy after low-bit quantization on ImageNet and COCO tasks.
Uni-Encoder Meets Multi-Encoders: Representation Before Fusion for Brain Tumor Segmentation with Missing Modalities cs.CV · 2026-04-24 · unverdicted · none · ref 38 · internal anchor
UniME combines a pretrained unified ViT encoder with modality-specific CNN encoders to improve brain tumor segmentation performance when some MRI modalities are missing.
Sapiens2 cs.CV · 2026-04-23 · unverdicted · none · ref 24 · internal anchor
Sapiens2 improves pretraining, data scale, and architecture over its predecessor to set new state-of-the-art results on human pose estimation, body-part segmentation, normal estimation, and new tasks like pointmap and albedo estimation.
Improved Mean Flows: On the Challenges of Fastforward Generative Models cs.CV · 2025-12-01 · unverdicted · none · ref 41 · internal anchor
Improved MeanFlow (iMF) reaches 1.72 FID on ImageNet 256x256 with one function evaluation by reformulating the training objective as a regression on instantaneous velocity and treating guidance as flexible conditioning variables.

GLU Variants Improve Transformer

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer