Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Canonical reference. 71% of citing Pith papers cite this work as background.
abstract
In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
citing papers explorer
- Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
  ciwGAN and fiwGAN models trained on isolated words spontaneously generate concatenated multi-word outputs and display early compositionality precursors.
- Toy Models of Superposition
  Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.
- Generative Language Modeling for Automated Theorem Proving
  GPT-f, a transformer-based prover for Metamath, generated new short proofs that were accepted into the main library, the first such contribution from a deep-learning system.
- AGAN: Towards Automated Design of Generative Adversarial Networks
  AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.
- Density estimation using Real NVP
  Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
- Discriminative Span as a Predictor of Synthetic Data Utility via Classifier Reconstruction
  A relative projection error metric in foundation-model embedding space predicts the downstream utility of synthetic positive samples for binary classifiers.
- Active Learning for Conditional Generative Compressed Sensing
  Prompts can be split into separate roles for sampling design and recovery modeling in generative compressed sensing, with stable recovery bounds for matched prompts and an explicit penalty for mismatch, validated on Stable Diffusion.
- Physics-informed, Generative Adversarial Design of Funicular Shells
  A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
- SurFITR: A Dataset for Surveillance Image Forgery Detection and Localisation
  SurFITR is a new collection of 137k+ surveillance-style forged images that causes existing detectors to degrade while enabling substantial gains when used for training in both in-domain and cross-domain settings.
- Toward Generative Quantum Utility via Correlation-Complexity Map
  A pre-training diagnostic map based on spectral correlation resemblance to IQP circuits and excess structural complexity identifies suitable datasets like turbulence data for quantum generative models, yielding competitive low-resource performance.
- ASTRA: Let Arbitrary Subjects Transform in Video Editing
  ASTRA is a plug-and-play training-free method for precise multi-subject video editing that uses prompt-guided multimodal alignment and prior-based mask retargeting to avoid attention dilution and boundary issues.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
  Progressive growing stabilizes GAN training to produce high-resolution images of unprecedented quality and achieves a record unsupervised inception score of 8.80 on CIFAR10.
- Mixed Precision Training
  Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
- Vision Foundation Models as Generalist Tokenizers for Image Generation
  VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.
- Neural Fields for NV-Center Inverse Sensing
  NeTMY neural fields with annealed encoding, multiscale optimization, and spectrum-fidelity losses achieve superior localization and distributional accuracy in NV-center inverse sensing by using a tensor power-summed dipolar operator that exposes and mitigates center-collapse failures.
- Enabling Federated Inference via Unsupervised Consensus Embedding
  CE-FI maps heterogeneous model representations to a shared embedding space via unsupervised training on unlabeled data, enabling privacy-preserving federated inference that outperforms solo models on image classification benchmarks.
- A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities
  A new framework evaluates utility of synthetic mobility trajectories while a membership inference attack reveals privacy vulnerabilities in generative models thought to be safe.
- Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
  Embedding Arithmetic performs vector operations in the embedding space of T2I models to mitigate bias at inference time, outperforming baselines on diversity while preserving coherence via a new Concept Coherence Score.
- FatigueFusion: Latent Space Fusion for Fatigue-Driven Motion Synthesis
  FatigueFusion fuses fatigue features in latent space using algorithmic, data-driven, and PINN modules to synthesize novel fatigued motions from non-fatigued joint sequences in an end-to-end pipeline.
- gen2seg: Generative Models Enable Generalizable Instance Segmentation
  Finetuning generative models on limited instance segmentation data produces zero-shot generalization to unseen object categories and styles, matching or exceeding supervised baselines like SAM on ambiguous boundaries.
- "Noisier" Noise Contrastive Estimation is (Almost) Maximum Likelihood
  Scaling noise magnitude in NCE aligns gradients with MLE, enabling a practical approximation that improves performance on CIFAR-10 and ImageNet image modeling with fewer training steps.
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
  MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
- VideoGPT: Video Generation using VQ-VAE and Transformers
  VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.
- Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning
  DASCN uses a unified primal-dual GAN architecture to generate semantics-consistent visual features for generalized zero-shot learning, claiming state-of-the-art gains.
- Dual Adversarial Learning with Attention Mechanism for Fine-grained Medical Image Synthesis
  Dual-discriminator GAN with adversarial attention improves fine-grained medical image synthesis, especially in hard-to-generate tumor regions, and outperforms prior methods on brain tumor and CT-to-MRI tasks.
- RED: A ReRAM-based Deconvolution Accelerator
  RED introduces pixel-wise mapping and zero-skipping dataflow for ReRAM deconvolution acceleration, reporting 1.15x-3.69x speedup and 8%-88.36% energy reduction versus prior ReRAM accelerators.
- A Halo Merger Tree Generation and Evaluation Framework
  A GAN framework is trained on EAGLE simulation merger trees to generate new realistic trees for semi-analytic galaxy models at modest computational cost.
- Demystifying MMD GANs
  MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate adaptation.
- Are Candidate Models Really Needed for Active Learning?
  Active learning with randomly initialized models achieves comparable results to traditional candidate-model methods, with low-confidence sampling proving most effective.
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.
- ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance
  ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.
- Improving Diversity in Black-box Few-shot Knowledge Distillation
  An adaptive high-confidence image selection scheme during GAN training expands diversity in the distillation set for black-box few-shot KD and yields SOTA student accuracy on seven image datasets.
- A Geometric Algebra-informed NeRF Framework for Generalizable Wireless Channel Prediction
  GAI-NeRF combines geometric algebra attention and an adaptive ray tracing module inside a NeRF model to deliver more accurate and generalizable wireless channel predictions across varied indoor environments.
- Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN
  T-BiGAN integrates window-attention Transformers in a BiGAN to achieve ROC-AUC 0.95 and average precision 0.996 for unsupervised spatiotemporal anomaly detection in PMU data.
- Quantum generative modeling for financial time series with temporal correlations
  QGANs with quantum generators and classical discriminators generate financial time series matching target distributions and desired temporal correlations, with quality varying by circuit depth, bond dimension, and simulation method.
- CCNETS: A Modular Causal Learning Framework for Pattern Recognition in Imbalanced Datasets
  CCNETS is a new modular causal framework using three cooperative modules and a Zoint mechanism to align synthetic data generation with classifier needs on imbalanced pattern recognition tasks.
- Synthetic Augmentation and Feature-based Filtering for Improved Cervical Histopathology Image Classification
  cGAN data augmentation with feature-based filtering improves ResNet18 CIN grading accuracy from 66.3% to 71.7% on segmented epithelium patches.
- Affine Disentangled GAN for Interpretable and Robust AV Perception
  ADIS-GAN disentangles affine transformations in a GAN to achieve over 98% classification accuracy on MNIST within 30 degrees rotation and over 90% under FGSM and PGD attacks while generating rotation and scaling factors.
- Generative Counterfactual Introspection for Explainable Deep Learning
  A generative-model-driven introspection method produces counterfactual image edits to explain deep neural network predictions on MNIST and CelebA.
- Disentangled Makeup Transfer with Generative Adversarial Network
  DMT uses identity and makeup encoders in a GAN to enable controllable makeup transfer from references and sampling of new styles from a prior distribution.
- Enhancing the accuracy of under-resolved numerical simulations of atmospheric flows with super resolution
  A multi-scale CNN super-resolution model outperforms baseline CNN, attention CNN, and diffusion-based approaches in reconstructing fine-scale features from under-resolved atmospheric flow simulations on standard benchmarks.
- Improving conditional generative adversarial networks for inverse design of plasmonic structures
  Adding label projection and a novel embedding network to cGANs cuts mean absolute error by up to an order of magnitude and makes training converge over three times faster for plasmonic inverse design.
- Diving Deeper into Underwater Image Enhancement: A Survey
  A comprehensive survey of deep learning-based underwater image enhancement with systematic experimental comparison of algorithms on multiple datasets.
- Mean Spectral Normalization of Deep Neural Networks for Embedded Automation
  Proposes MSN reparameterization to address mean-drift in SN, claiming ~16% faster inference than BN with fewer parameters on CNNs and GANs.
- MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation
  MIDI-Sandwich is a hierarchical VAE-GAN architecture that generates structured 136-beat melodies by modeling local bars and global relationships on the Nottingham dataset.
- GAN-Knowledge Distillation for one-stage Object Detection
  A GAN-based adversarial training method distills knowledge from teacher to student networks by treating their feature maps as real and fake samples to boost one-stage object detector performance.
- Synthetic data in cryptocurrencies using generative models
  CGANs with LSTM generator can produce synthetic crypto price series that reproduce temporal patterns and preserve market trends and dynamics.
- Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation
  Balanced synthetic image augmentation via GANs and diffusion models raises average AUC from 0.9206 to 0.9362 for FedAvg and 0.9429 to 0.9574 for FedProx in federated breast ultrasound classification.
- A Geometric Algebra-Informed 3DGS Framework for Wireless Channel Prediction
- One-Step Generative Modeling via Wasserstein Gradient Flows