TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation
3 Pith papers cite this work.
abstract
Pixel-wise image segmentation is a demanding task in computer vision. Classical U-Net architectures composed of encoders and decoders are very popular for segmentation of medical images, satellite images, etc. Typically, a neural network initialized with weights from a network pre-trained on a large dataset like ImageNet shows better performance than one trained from scratch on a small dataset. In some practical applications, particularly in medicine and traffic safety, the accuracy of the models is of utmost importance. In this paper, we demonstrate how the U-Net type architecture can be improved by the use of a pre-trained encoder. Our code and corresponding pre-trained weights are publicly available at https://github.com/ternaus/TernausNet. We compare three weight initialization schemes: LeCun uniform, the encoder with weights from VGG11, and the full network trained on the Carvana dataset. This network architecture was a part of the winning solution (1st out of 735) in the Kaggle: Carvana Image Masking Challenge.
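The abstract names LeCun uniform as the from-scratch baseline among the three initialization schemes. As a rough illustration of what that scheme does, here is a minimal pure-Python sketch, assuming the usual definition (weights drawn uniformly from [-limit, limit] with limit = sqrt(3 / fan_in)). The function name `lecun_uniform` and the layer sizes are hypothetical and are not taken from the TernausNet code.

```python
import math
import random


def lecun_uniform(fan_in, fan_out, seed=None):
    """Return a fan_out x fan_in weight matrix drawn from U(-limit, limit).

    Assumes the common definition limit = sqrt(3 / fan_in), which gives each
    weight a variance of 1 / fan_in.
    """
    rng = random.Random(seed)
    limit = math.sqrt(3.0 / fan_in)
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]


# Hypothetical layer: 27 inputs (e.g. a 3x3 kernel over 3 channels), 4 outputs.
weights = lecun_uniform(fan_in=27, fan_out=4, seed=0)
limit = math.sqrt(3.0 / 27)
```

A pre-trained encoder replaces this random draw for the encoder layers with weights already learned on ImageNet, which is the comparison the abstract describes.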
citing papers explorer
- InterCMDM: Block-Causal Diffusion for Autoregressive Human Interaction Generation
  InterCMDM proposes a block-causal latent diffusion framework with dual-stream causal transformers and multi-task attention masks for autoregressive text-conditioned two-person interaction generation and reports SOTA results on InterHuman and Inter-X.
- Deep-Learning for Tidemark Segmentation in Human Osteochondral Tissues Imaged with Micro-computed Tomography
  U-Net with combined binary cross-entropy and soft Jaccard loss segments the tidemark in PTA-stained micro-CT images, achieving IoU scores of 0.59–0.86 at increasing padded distances on cross-validation of 35 samples.
- Generating large labeled data sets for laparoscopic image processing tasks using unpaired image-to-image translation
  Generates large labeled realistic laparoscopic image datasets from simulations using extended unpaired translation and demonstrates use for liver segmentation achieving Dice scores up to 0.89 without any real labeled data.
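The summaries above report segmentation quality as IoU (Jaccard) and Dice scores, and one mentions a soft Jaccard loss term. The following is a minimal pure-Python sketch of these quantities on flattened masks, assuming the standard definitions. The function names, the toy masks, and the `eps` smoothing term are illustrative assumptions, not code from any of the cited papers.

```python
def iou(pred, target):
    """Intersection over union for two flat binary masks (0/1 values)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def dice(pred, target):
    """Dice coefficient for two flat binary masks (0/1 values)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0


def soft_jaccard(probs, target, eps=1e-7):
    """Differentiable Jaccard surrogate on predicted probabilities in [0, 1]."""
    inter = sum(p * t for p, t in zip(probs, target))
    union = sum(probs) + sum(target) - inter
    return (inter + eps) / (union + eps)


# Toy masks, flattened to 1-D for simplicity.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]

j = iou(pred, target)
d = dice(pred, target)
```

For binary masks the two scores are tied by Dice = 2J / (1 + J), so a Dice score is always at least as high as the IoU on the same prediction. That is worth keeping in mind when comparing numbers across the papers listed here.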