U-Net: Convolutional Networks for Biomedical Image Segmentation
45 Pith papers cite this work. Polarity classification is still indexing.
abstract
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
fields
cs.CV 20, cs.LG 8, cs.RO 5, eess.IV 3, cs.CE 2, astro-ph.CO 1, astro-ph.EP 1, astro-ph.GA 1, cond-mat.quant-gas 1, cs.CL 1
roles
background 2
polarities
background 2
citing papers explorer
-
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
-
LatentHDR: Decoupling Exposure from Diffusion via Conditional Latent-to-Latent Mapping for Text/Image-to-Panoramic HDR
LatentHDR generates structurally consistent panoramic HDR images by producing one scene latent with a diffusion backbone then deterministically mapping it to multiple exposure latents via a lightweight conditional head.
-
EchoXFlow: A Beamspace Echocardiography Dataset for Cardiac Motion, Flow, and Function
EchoXFlow is a new dataset of 37,125 beamspace echocardiography recordings with separable modalities, Doppler data, ECG, and clinical annotations that enables acquisition-aware learning not possible with standard scan-converted videos.
-
Generative diffusion models for spatiotemporal influenza forecasting
Influpaint uses generative diffusion models on image-encoded influenza data to produce realistic and diverse epidemic trajectories that match leading ensemble methods in accuracy.
-
VitaminP: cross-modal learning enables whole-cell segmentation from routine histology
VitaminP uses paired H&E-mIF data to train a model that transfers molecular boundary information, enabling accurate whole-cell segmentation directly from routine H&E histology across 34 cancer types.
-
Physics-informed, Generative Adversarial Design of Funicular Shells
A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
-
Machine Learning Phase Field Reconstruction in a Bose-Einstein Condensate
A U-Net-based ML pipeline reconstructs the complete phase field and quantized vortex charges in 2D Bose-Einstein condensates from density snapshots alone, using synthetic training data from projected Gross-Pitaevskii simulations.
-
Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings
Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.
-
Diffusion Processes on Implicit Manifolds
Implicit Manifold-valued Diffusions (IMDs) are data-driven SDEs built from proximity graphs that converge in law to smooth manifold diffusions as sample count increases.
-
A General Bézier Tree Encoding Counterfactual Framework for Retinal-Vessel-Mediated Disease Analysis
BTECF encodes retinal vessels as Bézier trees to enable targeted, parameter-level counterfactual interventions on vessel geometry for causal analysis of vascular diseases.
-
EDGER: EDge-Guided with HEatmap Refinement for Generalizable Image Forgery Localization
EDGER is a dual-branch system that combines frequency edge cues with CLIP-based synthetic patch detection for accurate, resolution-independent image forgery localization.
-
Geometry-aware Prototype Learning for Cross-domain Few-shot Medical Image Segmentation
GeoProto enriches appearance prototypes with geometric offsets from an ordinal shape branch to improve cross-domain few-shot medical image segmentation.
-
Don't Fix the Basis -- Learn It: Spectral Representation with Adaptive Basis Learning for PDEs
ABLE learns a spatially adaptive Parseval frame from data via an ancillary density to replace fixed bases in spectral neural operators for PDEs.
-
StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
StereoPolicy fuses stereo image pairs via a Stereo Transformer on pretrained 2D encoders to boost robotic manipulation policies, showing gains over monocular, RGB-D, point cloud, and multi-view methods in simulations and real-robot tests.
-
Diffusion model for SU(N) gauge theories
Implicit score matching trains diffusion models that successfully sample SU(3) Wilson gauge configurations on lattices, with a Hamiltonian-dynamics corrector needed for strong coupling.
-
Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping
Mixing real UAV imagery with 2101 AI-generated image-mask pairs improves semantic segmentation F1 scores for fine-grained forest species by over 15 percentage points overall and up to 30 points for rare classes.
-
A CNN-Transformer Denoiser for low-S/N Galaxy Spectra: Stellar Population Recovery in Synthetic Tests
A hybrid CNN-Transformer denoiser trained on synthetic spectra substantially reduces noise and improves stellar population recovery for low-S/N galaxy observations in controlled tests.
-
Approaching human parity in the quality of automated organoid image segmentation
A composite SAM-based method segments organoid images with accuracy matching or approaching inter-observer variability among human annotators.
-
When Less Is More: Simplicity Beats Complexity for Physics-Constrained InSAR Phase Unwrapping
A vanilla U-Net with 7.76M parameters achieves R²=0.834 and RMSE=1.01 cm on a global InSAR benchmark, beating larger attention models by 34% in R² and 51% in RMSE while running 2.5× faster.
-
MG-NECOLA: A Field-Level Emulator for f(R) Gravity and Massive Neutrino Cosmologies
A field-level CNN emulator converts MG-PICOLA runs into near N-body accuracy for f(R) gravity and neutrino cosmologies, achieving sub-percent errors on power spectra and bispectra while generalizing beyond its training set.
-
From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation
Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.
-
Self-supervised Pretraining of Cell Segmentation Models
DINOCell achieves a SEG score of 0.784 on LIVECell by self-supervised domain adaptation of DINOv2, improving 10.42% over SAM-based models and showing strong zero-shot transfer.
-
GIF: A Conditional Multimodal Generative Framework for IR Drop Imaging in Chip Layouts
GIF fuses geometrical image features and logical graph topology in a conditional diffusion model to generate high-quality IR drop images for chip layouts, outperforming prior ML methods on CircuitNet-N28 with SSIM 0.78, Pearson 0.95, PSNR 21.77, and NMAE 0.026.
-
ELT: Elastic Looped Transformers for Visual Generation
Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.
-
MRI-to-CT synthesis using drifting models
Drifting models outperform diffusion, CNN, VAE, and GAN baselines in MRI-to-CT synthesis on two pelvis datasets with higher SSIM/PSNR, lower RMSE, and millisecond one-step inference.
-
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
A low-cost whole-body teleoperation system enables effective imitation learning for complex bimanual mobile manipulation by co-training on mobile and static demonstration datasets.
-
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
-
TRAS: An Interactive Software for Tracing Tree Ring Cross Sections
TRAS integrates CS-TRD, DeepCS-TRD, and INBD detection methods with an interactive GUI, achieving 81% F-score on 18 Pinus taeda images while reducing manual correction to ~20% of boundaries and matching CooRecorder ring-width measurements at r > 0.99.
-
Scalable Active Metamaterials for Shape-Morphing
A hierarchical SAM framework decouples macroscale mesh optimization from microscale inverse design to enable fast scalable creation of aperiodic shape-morphing metamaterials.
-
Full-chip CMP modelling based on Fully Convolutional Network leveraging White Light Interferometry
A fully convolutional network trained separately on WLI and AFM data predicts full-chip post-CMP nanotopography at nanometer accuracy.
-
Flow matching for Sentinel-2 super-resolution: implementation, application, and implications
Flow matching achieves single-step pixel accuracy and 20-step perceptual quality for Sentinel-2 super-resolution, outperforming diffusion and Real-ESRGAN while enabling large-scale 2.5 m land-cover products.
-
End-to-end Automated Deep Neural Network Optimization for PPG-based Blood Pressure Estimation on Wearables
An end-to-end hardware-aware optimization pipeline produces DNNs for PPG-based blood pressure estimation with up to 7.99% lower error and 83x fewer parameters that fit on ultra-low-power SoCs like GAP8.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
-
Deep Learning-Based Segmentation of Peritoneal Cancer Index Regions from CT Imaging
nnU-Net segments rPCI regions on 62 CT scans with mean Dice 0.82, nearing inter-observer agreement of 0.88 and beating Swin UNETR at 0.76.
-
KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment
KAYRA packages a cascade of EfficientNet-B5 + U-Net, Mask R-CNN, and ResNet-18 models into a microservice architecture that supports both cloud and on-premise deployment and reaches 98.91% segmentation accuracy in a pilot test on 459 chromosomes.
-
A Deep U-Net Framework for Flood Hazard Mapping Using Hydraulic Simulations of the Wupper Catchment
A U-Net surrogate model trained on hydraulic simulations predicts maximum water levels for flood hazard mapping in the Wupper catchment with results comparable to the original simulations.
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
-
Learning to count small and clustered objects with application to bacterial colonies
ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.
-
AI Approach for MRI-only Full-Spine Vertebral Segmentation and 3D Reconstruction in Paediatric Scoliosis
An AI pipeline using GAN-generated MRI-like images and U-Net segmentation produces automated 3D thoracolumbar spine reconstructions from MRI with 88% Dice score and reduces processing time from 1 hour to under 1 minute while preserving scoliosis deformity features.
-
DigiForest: Digital Analytics and Robotics for Sustainable Forestry
DigiForest integrates heterogeneous autonomous robots for data collection, automated tree trait extraction, a decision support system for growth forecasting, and autonomous harvesters for selective logging, with real-world tests in European forests.
-
AMO-ENE: Attention-based Multi-Omics Fusion Model for Outcome Prediction in Extra Nodal Extension and HPV-associated Oropharyngeal Cancer
An attention-based fusion model combining semi-supervised CT segmentation, radiomics, and clinical features predicts metastatic recurrence, overall survival, and disease-free survival in HPV+ oropharyngeal cancer with AUCs of 88.2%, 79.2%, and 78.1% on an internal cohort of 397 patients.
-
Uncertainty Estimation for Deep Reconstruction in Actuatic Disaster Scenarios with Autonomous Vehicles
Evidential Deep Learning outperforms other methods in accuracy, calibration, and speed for uncertainty-aware scalar field reconstruction in aquatic environments using autonomous vehicles.
-
SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs
SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.
-
Machine Learning as a Transformative Tool for (Exo-)Planetary Science
The paper reviews ML applications for sequence modeling, pattern recognition, and generative Bayesian analysis to tackle heterogeneous data challenges in (exo)planetary science.