STREAM decouples text and music conditioning in a diffusion transformer, using AdaLN for structure and BEAM for beats; it also introduces the Motorica++ dataset and editability metrics and claims SOTA music alignment with preserved semantics.
U-Net: Convolutional networks for biomedical image segmentation
Mixed citation behavior. Most common role is background (50%).
representative citing papers
A new quality-guided approach for semi-supervised medical image segmentation that trains a predictor on synthetic errors to enhance pseudolabel handling.
GPROF-IR is a CNN-based retrieval that uses temporal context in geostationary IR observations to produce precipitation estimates with lower error than prior IR methods and climatological consistency with PMW retrievals for integration into IMERG V08.
AttentionBender applies 2D transforms to cross-attention maps in video diffusion transformers, producing distributed distortions and glitch aesthetics that reveal entangled attention mechanisms while serving as both an XAI probe and creative tool.
A LoRA-adapted conditional diffusion surrogate for electromagnetic calorimeter showers matches key observables within 2% RMSE and reproduces directional trends in design-utility gradients.
OOD-SEG reframes multi-class segmentation from sparse positive-only annotations as pixel-wise positive-unlabelled learning solved by integrating out-of-distribution detection techniques, with a proposed cross-validation evaluation on surgical imaging datasets.
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.
DeepMine-Mamba adds an Anti-Dilution Gate to Mamba-based models to counteract feature dilution in document binarization and reports competitive FM and Fps scores on DIBCO benchmarks under leave-one-year-out evaluation.
High magnetic fields directly enhance the amplitude and correlation length of stripe order in a cuprate superconductor far above the vortex melting transition, indicating a coupling mechanism independent of superconductivity suppression.
CrackGeoFM is a multi-task framework that adapts a frozen visual foundation model with FCEM, CFAM, and SMTD modules for crack mask prediction, skeleton reconstruction, and uncertainty estimation, reporting SOTA results across 20 datasets including few-shot settings.
PaNO neural operator improves port-power readout fidelity in photonic design surrogates over global-field baselines on a 3x3 MMI benchmark.
ARC-STAR reduces velocity rollout error by at least 36x over raw Poseidon across all tested regime cells via auditable global and local correction stages on five flow benchmarks.
A Jacobian sensitivity curve computed at initialization identifies the narrowest U-Net configuration that avoids performance collapse, matching nnU-Net accuracy with 400-1600x fewer parameters on six medical datasets.
Commutativity regularization mitigates transient error amplification in autoregressive neural simulators by penalizing non-normality and non-commutativity of Jacobians, yielding stable long-horizon rollouts.
Spectral analysis of activations and gradients provides new diagnostics that link batch size to representation geometry, early covariance tails to token efficiency, and spectral shifts to learning dynamics in decoder-only LLMs, backed by a mechanistic model.
A recurrent Vision Transformer hypernetwork injects context into Flux Neural Operators to infer and solve unseen conservation laws while preserving robustness and long-time stability.
SIAM achieves state-of-the-art whole-head MRI segmentation of 16 structures including extra-cerebral tissues by training on synthetic data from just six manual templates, matching or exceeding prior methods on 301 scans across eight heterogeneous datasets.
Cross-domain transfer of remote-sensing HSI foundation models improves proximal sensing semantic segmentation over in-domain training and narrows the gap to cross-modality methods on the HS3-Bench benchmark.
VSLP infers dense segmentations from global label proportions via a pre-trained transformer for initial confidence maps followed by variational optimization using Wasserstein fidelity and a learned regularizer, outperforming prior weakly supervised methods on histopathology datasets.
The ICPR 2026 LRLPR competition on real low-quality license plate images drew 99 valid submissions, with the winning team reaching 82.13% recognition rate and four teams exceeding 80%.
A tornado outbreak with simultaneous tornadic supercells occurred in the Philippines within an easterly severe weather regime, documented as the first known instance there.
FlowForge predicts flow fields via staged local updates with a shared lightweight predictor, matching or exceeding baselines in accuracy while improving robustness to noise and reducing latency.
PSIRNet produces diagnostic-quality free-breathing PSIR LGE cardiac MRI from a single interleaved IR/PD acquisition over two heartbeats using a physics-guided deep learning network trained on over 800,000 slices.
CATMIL augments nnU-Net with component-adaptive Tversky and MIL-based lesion supervision to raise Dice scores, small-lesion recall, and error control on the MSLesSeg dataset.
citing papers explorer
- Quality-Guided Semi-Supervised Learning for Medical Image Segmentation
  A new quality-guided approach for semi-supervised medical image segmentation that trains a predictor on synthetic errors to enhance pseudolabel handling.
- OOD-SEG: Exploiting out-of-distribution detection techniques for learning image segmentation from sparse multi-class positive-only annotations
  OOD-SEG reframes multi-class segmentation from sparse positive-only annotations as pixel-wise positive-unlabelled learning solved by integrating out-of-distribution detection techniques, with a proposed cross-validation evaluation on surgical imaging datasets.
- DeepMine-Mamba: Mitigating Information Dilution in Mamba-Based State Space Models for Document Image Binarization
  DeepMine-Mamba adds an Anti-Dilution Gate to Mamba-based models to counteract feature dilution in document binarization and reports competitive FM and Fps scores on DIBCO benchmarks under leave-one-year-out evaluation.
- Multi-Task Crack Foundation Model for Engineering-Reliable Crack Representation and Topology Preservation in Civil Infrastructure
  CrackGeoFM is a multi-task framework that adapts a frozen visual foundation model with FCEM, CFAM, and SMTD modules for crack mask prediction, skeleton reconstruction, and uncertainty estimation, reporting SOTA results across 20 datasets including few-shot settings.
- SIAM: Head and Brain MRI Segmentation from Few High-Quality Templates via Synthetic Training
  SIAM achieves state-of-the-art whole-head MRI segmentation of 16 structures including extra-cerebral tissues by training on synthetic data from just six manual templates, matching or exceeding prior methods on 301 scans across eight heterogeneous datasets.
- Cross-Domain Transfer of Hyperspectral Foundation Models
  Cross-domain transfer of remote-sensing HSI foundation models improves proximal sensing semantic segmentation over in-domain training and narrows the gap to cross-modality methods on the HS3-Bench benchmark.
- Component-Adaptive and Lesion-Level Supervision for Improved Small Structure Segmentation in Brain MRI
  CATMIL augments nnU-Net with component-adaptive Tversky and MIL-based lesion supervision to raise Dice scores, small-lesion recall, and error control on the MSLesSeg dataset.
- RABC-Net: Reliability-Aware Annotation-Free Skin Lesion Segmentation for Low-Resource Dermoscopy
  RABC-Net achieves 86.58% DICE and 79.47% JAC on skin lesion segmentation across ISIC-2017, ISIC-2018, and PH2 using only pseudo-labels and no manual masks for training or adaptation.
- RSEdit: Text-Guided Image Editing for Remote Sensing
  RSEdit adapts off-the-shelf text-to-image models into a collection of editing systems that follow text instructions while keeping geospatial structure intact in remote sensing images.
- GEAR-Seg: A Grounded Explainable Agent for Reasoning Segmentation and Data Engine
  GEAR-Seg decouples segmentation, semantic description, and LLM reasoning into an explicit chain for interpretable zero-shot reasoning segmentation while generating the GEAR-131K dataset.
- Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow
  Flow-matching TTA with histogram matching to synthetic reference trajectories and time-independent flow achieves SOTA segmentation of AMD biomarkers in OCT.
- SegTME-UNI2: A Foundation Model-Based Framework for Generalisable Multiclass Cell Segmentation and LLM-Driven Tumour Microenvironment Characterisation in Histopathology
  SegTME-UNI2 pairs a UNI2-based dual-head segmentation model trained via progressive pseudo-labeling with an LLM to produce multiclass cell maps and narrative TME descriptions from H&E images.
- CDPM-Align: Multi-Scale Guidance-Aligned Diffusion Pretraining for Robust Few-Shot Anatomical Landmark Detection
  CDPM-Align applies multi-scale guidance-aligned conditional diffusion pretraining on three small heterogeneous datasets to improve accuracy and uncertainty in few-shot (10-25 image) anatomical landmark detection.
- Weighted Knowledge Distillation for Semi-Supervised Segmentation of Maxillary Sinus in Panoramic X-ray Images
  A semi-supervised framework using weighted knowledge distillation and SinusCycle-GAN refinement achieves 96.35% Dice score for maxillary sinus segmentation in panoramic X-rays from 2,511 patients.
- Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction
  Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
- HTC-SGA Former: A Hybrid Transformer-CNN Network with Self-Guided Attention and a New Boundary-Weighted Adaptive Loss for Coronary DSA Vessel Segmentation
  HTC-SGA Former is a hybrid CNN-Transformer model with MS-GLWA, SGFA, and BWACL that outperforms 14 prior methods on private coronary DSA datasets using only 0.81M parameters.
- Do We Really Need Diffusion? A Fast U-Net for Paired Medical Image Translation
  Lightweight U-Net outperforms DDPM on T2w-to-MRI-SFF translation (r=0.975 vs 0.962, MAE=0.014 vs 0.019) with 208x faster inference on 230k paired images from NAKO.
- Patient-Level Diagnosis of Acute Myeloid Leukemia via Deep Learning Analysis of Bone Marrow Smear
  YOLO segmentation plus EfficientNet classification aggregates cell predictions to patient-level CBLC ratios, reporting weighted F1 scores of 0.87-0.91 on three external center cohorts from 89 patients.
- Few-Shot Left Atrial Wall Segmentation in 3D LGE MRI via Meta-Learning
  MAML with auxiliary cavity tasks and boundary loss improves 5-shot LA wall segmentation over standard fine-tuning (DSC 0.54 vs 0.48) and nears fully supervised performance at 20 shots.