Deep Residual Learning for Image Recognition
64 Pith papers cite this work.
abstract
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
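To make the residual reformulation concrete, here is a minimal sketch of a basic two-convolution residual block in PyTorch. This is an illustration under assumed settings (channel count, kernel size, ReLU after the addition), not the paper's reference implementation; the deeper ImageNet models use bottleneck variants and projection shortcuts when dimensions change.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Layers learn the residual F(x); the block outputs F(x) + x."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))  # first 3x3 conv
        out = self.bn2(self.conv2(out))           # second 3x3 conv produces F(x)
        return self.relu(out + x)                 # identity shortcut adds x back
```

Because the shortcut is the identity, the block only has to learn the perturbation F(x) = H(x) - x, which is easier to optimize than learning the full mapping H(x) from scratch when the desired mapping is close to identity.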
citing papers explorer
- WaveNet: A Generative Model for Raw Audio
  WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.
- Density estimation using Real NVP
  Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
- Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo
  PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.
- Gradient Clipping Beyond Vector Norms: A Spectral Approach for Matrix-Valued Parameters
  Spectral clipping of leading singular values in gradient matrices stabilizes SGD for non-convex problems with heavy-tailed noise and achieves the optimal convergence rate O(K^{(2-2α)/(3α-2)}).
- Optimal Representations for Generalized Contrastive Learning with Imbalanced Datasets
  In generalized contrastive learning with imbalanced classes, optimal representations collapse to class means whose angular geometry is determined by class proportions via convex optimization, and extreme imbalance causes all minority classes to collapse to one vector.
- Concept-Based Abductive and Contrastive Explanations for Behaviors of Vision Models
  Concept-based abductive and contrastive explanations find minimal high-level concepts that causally determine vision model outcomes on individual images or groups sharing a specified behavior.
- Replica Theory of Spherical Boltzmann Machine Ensembles
  Replica calculations fully solve spherical Boltzmann machine ensembles and identify regimes where ensemble learning outperforms standard training, particularly for nearly finite-dimensional data.
- Grokking of Diffusion Models: Case Study on Modular Addition
  Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
- Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
  Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
- Momentum Further Constrains Sharpness at the Edge of Stochastic Stability
  Momentum SGD exhibits two distinct EoSS regimes for batch sharpness, stabilizing at 2(1-β)/η for small batches and 2(1+β)/η for large batches, aligning with linear stability thresholds.
- Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection
  Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and a new dataset, CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
  HSTU-based generative recommenders with 1.5 trillion parameters scale as a power law with compute up to GPT-3 scale, outperform baselines by up to 65.8% NDCG, run 5-15x faster than FlashAttention2 on long sequences, and improve online A/B metrics by 12.4%.
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
  MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications (a depthwise separable block is sketched after this list).
- MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
  MS MARCO is a new large-scale machine reading comprehension dataset built from real Bing search queries, human-generated answers, and web passages, supporting three tasks including answer synthesis and passage ranking.
- Wide Residual Networks
  Wide residual networks achieve higher accuracy and faster training than very deep thin residual networks by increasing width and decreasing depth, setting new state-of-the-art results on CIFAR, SVHN, and ImageNet.
- Training Deep Nets with Sublinear Memory Cost
  An algorithm trains n-layer networks with O(sqrt(n)) memory via selective recomputation of activations, at the cost of one extra forward pass (see the checkpointing sketch after this list).
- MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces
  Chakra introduces a portable, interoperable graph-based execution trace format for distributed ML workloads along with supporting tools to standardize performance benchmarking and software-hardware co-design.
- StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
  StereoPolicy fuses stereo image pairs via a Stereo Transformer on pretrained 2D encoders to boost robotic manipulation policies, showing gains over monocular, RGB-D, point cloud, and multi-view methods in simulations and real-robot tests.
- Event Fields: Learning Latent Event Structure for Waveform Foundation Models
  Event-centric waveform foundation models are learned via self-supervised consistency on latent event structures and interactions, yielding improved performance and label efficiency over sequence-based baselines on physiological tasks.
- It Just Takes Two: Scaling Amortized Inference to Large Sets
  A mean-pool deep set trained on sets of size at most two produces an encoder that generalizes to arbitrary sizes, decoupling representation learning from posterior modeling and making training cost independent of deployment set size N.
- ICDAR 2026 Competition on Writer Identification and Pen Classification from Hand-Drawn Circles
  A new dataset of hand-drawn circles from 66 writers and 8 pens yields competition results of 64.8% top-1 accuracy for open-set writer identification and 92.7% for pen classification.
- Detecting Adversarial Data via Provable Adversarial Noise Amplification
  A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.
- ShapeY: A Principled Framework for Measuring Shape Recognition Capacity via Nearest-Neighbor Matching
  ShapeY is a benchmark dataset and nearest-neighbor protocol that measures shape-based recognition in vision models, revealing that even state-of-the-art networks fail to generalize consistently across 3D viewpoints and non-shape appearance changes.
- Fine-Tuning Regimes Define Distinct Continual Learning Problems
  The relative rankings of continual learning methods are not preserved across different fine-tuning regimes defined by trainable parameter depth.
- Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions
  GEM is a new family of C^{2N}-smooth rational activation functions with variants that achieve performance on par with or exceeding GELU on ResNet, GPT-2, and BERT benchmarks.
- Materialistic RIR: Material Conditioned Realistic RIR Generation
  A two-module neural model disentangles spatial layout from material properties to generate controllable and more realistic room impulse responses, reporting gains of up to 16% on acoustic metrics and 70% on material metrics plus better human ratings.
- DVAR: Adversarial Multi-Agent Debate for Video Authenticity Detection
  DVAR turns video authenticity detection into an iterative debate between a generative hypothesis agent and a natural mechanism agent, resolved via minimum description length and a knowledge base for better generalization than supervised detectors.
- Symphony: Taming Step Misalignments in the Network for Ring-based Collective Operations
  Symphony detects step misalignments in ring collectives via lightweight in-network tracking and mitigates them by throttling outpacing flows with congestion signals, yielding up to 54% better communication times in Astra-Sim simulations and a Tofino2 prototype.
- Deepfake Detection Generalization with Diffusion Noise
  ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
- The illusory simplicity of the feedforward pass: evidence for the dynamical nature of stimulus encoding along the primate ventral stream
  The primate ventral stream encodes visual stimuli through evolving neural dynamics that carry category information beyond any fixed spatial pattern during the initial feedforward pass.
- Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
  Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
- ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism
  ResBM achieves 128x activation compression in pipeline-parallel transformer training by adding a residual bottleneck module that preserves a low-rank identity path, with no major loss in convergence or added overhead.
- Zero-shot World Models Are Developmentally Efficient Learners
  A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
- Enhancing event reconstruction for $\gamma$-ray particle detector arrays using transformers
  Transformer models applied to simulated water-Cherenkov array data improve gamma-hadron separation and reconstruction of direction, core position, and energy compared to established techniques.
- EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World
  EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.
- Multispectral representation of Distributed Acoustic Sensing data: a framework for physically interpretable feature extraction and visualization
  A multispectral decomposition of DAS data into band-limited energy images enables clearer visualization, unsupervised clustering, and 97.3% accurate CNN detection of whale vocalizations.
- AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
  AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-reaction problems.
- Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
  Ensemble-based method of moments on softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks like selective classification over evidential deep learning.
- LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection
  LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.
- Physics-Informed Transformer for Real-Time High-Fidelity Topology Optimization
  A transformer model with self-attention and auxiliary physics losses learns a direct non-iterative mapping from loads and fields to manufacturable optimized topologies.
- PhDLspec: physical-prior embedded deep learning method for spectroscopic determination of stellar labels in high-dimensional parameter space
  PhDLspec combines differential spectra from physical stellar models with a transformer to derive approximately 30 stellar parameters from low-resolution spectra hundreds of times faster than traditional calculations.
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
  MetaGPT embeds human standard operating procedures (SOPs) into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
- Language Models (Mostly) Know What They Know
  Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
- A General Language Assistant as a Laboratory for Alignment
  Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
- VideoGPT: Video Generation using VQ-VAE and Transformers
  VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.
- Rethinking Atrous Convolution for Semantic Image Segmentation
  DeepLabv3 improves semantic segmentation by capturing multi-scale context with cascaded or parallel atrous convolutions and adding global context to ASPP, achieving better results on PASCAL VOC 2012 without DenseCRF post-processing.
- SGDR: Stochastic Gradient Descent with Warm Restarts
  SGDR uses periodic warm restarts of the learning rate in SGD to reach new state-of-the-art error rates of 3.14% on CIFAR-10 and 16.21% on CIFAR-100 (the schedule is sketched after this list).
- WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records
  WISTERIA learns robust clinical representations from noisy EHR labels by enforcing consistency across multiple weak supervision views plus ontology regularization.
- Medical Model Synthesis Architectures: A Case Study
  The MedMSA framework retrieves knowledge via language models, then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.
- mHC-SSM: Manifold-Constrained Hyper-Connections for State Space Language Models with Stream-Specialized Adapters
  Manifold-constrained multi-stream mixing plus per-stream adapters improves SSM language model validation loss from 6.3507 to 6.1353 and perplexity from 572.91 to 461.88 on WikiText-2.
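For the MobileNets entry above, here is a minimal sketch of a depthwise separable block in PyTorch, assuming the standard conv-BN-ReLU layout; the channel sizes and helper name are illustrative, not the paper's exact configuration.

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """Depthwise 3x3 conv (one filter per input channel, groups=in_ch)
    followed by a pointwise 1x1 conv that mixes channels; roughly 8-9x
    cheaper than a dense 3x3 convolution when out_ch is large."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```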
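For the Training Deep Nets with Sublinear Memory Cost entry, a sketch of the sqrt(n) recomputation idea using PyTorch's built-in gradient checkpointing; the layer stack, depth, and tensor sizes are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

n_layers = 64  # hypothetical depth
net = nn.Sequential(
    *[nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(n_layers)]
)
x = torch.randn(32, 256, requires_grad=True)

# Split the stack into ~sqrt(n) segments: only segment-boundary activations
# are stored during the forward pass, and each segment's interior
# activations are recomputed during backprop. One extra forward pass
# buys O(sqrt(n)) activation memory.
segments = int(n_layers ** 0.5)
out = checkpoint_sequential(net, segments, x, use_reentrant=False)
out.sum().backward()
```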
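And for the SGDR entry, the warm-restart cosine schedule is available off the shelf in PyTorch; the model, optimizer settings, and cycle lengths below are illustrative assumptions, not the paper's experimental setup.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(10, 2)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Cosine-anneal the learning rate from 0.1 down to eta_min, then restart.
# T_0 sets the first cycle length in epochs; T_mult=2 doubles each cycle.
sched = CosineAnnealingWarmRestarts(opt, T_0=10, T_mult=2, eta_min=1e-4)

for epoch in range(70):
    # ... one epoch of training ...
    sched.step()
```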