Dual Contrastive Loss and Attention for GANs

Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry S. Davis, Mario Fritz
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021

Abstract. Generative Adversarial Networks (GANs) produce impressive results on unconditional image generation when powered with large-scale image datasets. Yet generated images are still easy to spot, especially on datasets with high variance (e.g., bedroom, church). We propose various improvements to push the boundaries of GANs: dual contrastive loss, self-attention in the generator, and reference-attention in the discriminator. Specifically, we propose a novel dual contrastive loss and show that, with this loss, the discriminator learns more generalized and distinguishable representations to incentivize generation. In addition, we revisit attention and extensively experiment with different attention blocks in the generator. We find attention to still be an important module for successful image generation, even though it was not used in the recent state-of-the-art models. Lastly, we study different attention architectures in the discriminator and propose a reference attention mechanism. On the other hand, we find the discriminator to behave differently based on the number of available images, and the reference-attention-based discriminator to improve only on limited-scale datasets. Our generation significantly outperforms the baselines U-Net GAN[66] and StyleGAN2[41] in terms of quality, long-range dependencies, and spatial consistency, and we obtain even more significant improvements on compositional synthetic scenes (up to 47.5% in FID).

Introduction. Behind the seemingly saturated performance of the state-of-the-art StyleGAN2[41], there still persist open issues of GANs that make generated images surprisingly obvious to spot[91, 77, 20]. Many GAN-based image generators rely on convolutional layers to encode features, and we investigate variants of the attention mechanism in the GAN architecture to mitigate the local and stationary issues of convolutions. StyleGAN2 also shows that generation results can be improved by larger networks with an increased number of convolution filters. Similarly, discriminators have evolved from MLPs to deep convolutional networks[64]; however, their design has not been studied as aggressively. The feature representations of discriminators are often not generalized enough to incentivize the adversarially evolving generator, and are prone to forgetting previous tasks[10] or previous data modes[69, 46]. Our improvements for GANs therefore include a novel dual contrastive loss and variants of the attention mechanisms. Contributions are summarized as follows: we propose a novel dual contrastive loss in adversarial training that generalizes the representation to more effectively distinguish between real and fake, and further incentivizes the image generation quality.

Related work. GAN techniques have been popularized into extensive computer vision applications, including but not limited to image translation[33, 103, 104, 51, 31, 78, 61, 19, 60], post-processing[44, 68, 42, 43, 74, 59, 98], image manipulation[12, 13, 67, 1], texture synthesis[90, 50, 56], image inpainting[32, 49, 88, 89], and text-to-image generation[65, 95, 96, 71].

Contrastive learning targets a transformation of inputs into an embedding where associated samples are pulled together and pushed apart from the other samples in the dataset. It has recently been re-popularized by various unsupervised learning works[25, 58, 73, 7, 8] and generation works[60, 38, 102]. Among these works, contrastive learning is used as an auxiliary task. In this work, different from the previous ones, we do not use contrastive learning as an auxiliary task but directly couple it into the main adversarial training through a novel loss function formulation. Our approach is orthogonal to the works [38, 102, 34] in which the contrastive losses serve only as an incremental auxiliary to the conventional adversarial loss and require expensive class annotations or augmentation for generation.
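To make this contrastive objective concrete, the following is a minimal, generic InfoNCE-style sketch in PyTorch. It is background illustration only, not the paper's dual formulation (described next); the function name and the temperature value are our assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE: pull `anchor` toward `positive`, push from `negatives`.

    anchor:    (d,) embedding
    positive:  (d,) embedding associated with the anchor
    negatives: (n, d) embeddings of the other samples in the batch
    """
    # Cosine similarities act as logits after temperature scaling.
    pos = F.cosine_similarity(anchor, positive, dim=0).unsqueeze(0)    # (1,)
    neg = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1)   # (n,)
    logits = torch.cat([pos, neg]) / temperature                       # (1 + n,)
    # The positive sits at index 0; InfoNCE is cross-entropy against it.
    target = torch.zeros(1, dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits.unsqueeze(0), target)
```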
Dual contrastive loss. We put another lens on the representation power of the discriminator by incentivizing generation via contrastive learning. In Case I, the discriminator learns to associate a single real image against a batch of generated images. Dually, in Case II, the discriminator learns to disassociate a single generated image against a batch of real images. The generator adversarially learns to minimize such dual contrasts.
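A minimal sketch of this one-vs-batch structure in PyTorch follows, assuming `d_real` and `d_fake` are the discriminator's raw logits on a batch of real and generated images. The exact formulation is given by Eq. 9 and Eq. 10 in the paper, so treat this as an illustration of the dual contrasts rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss_d(d_real, d_fake):
    """Discriminator side of a dual contrastive objective (illustrative sketch).

    d_real: (n,) logits on real images
    d_fake: (m,) logits on generated images
    """
    n, m = d_real.size(0), d_fake.size(0)
    # Case I: each real logit should win a softmax against the whole fake batch.
    logits_real = torch.cat([d_real.unsqueeze(1),
                             d_fake.unsqueeze(0).expand(n, m)], dim=1)   # (n, 1+m)
    target_r = torch.zeros(n, dtype=torch.long, device=d_real.device)
    case1 = F.cross_entropy(logits_real, target_r)
    # Case II: each fake logit is disassociated from the whole real batch,
    # implemented here by contrasting the negated logits.
    logits_fake = torch.cat([(-d_fake).unsqueeze(1),
                             (-d_real).unsqueeze(0).expand(m, n)], dim=1)  # (m, 1+n)
    target_f = torch.zeros(m, dtype=torch.long, device=d_fake.device)
    case2 = F.cross_entropy(logits_fake, target_f)
    return case1 + case2
```

The generator can adversarially minimize the same contrasts, e.g., by calling this function with the roles of the real and fake logits swapped; that role reversal is our simplification of the adversarial direction.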
Investigations on self-attention modules. We ask which attention module improves generation the most, and in return for how many additional parameters. To answer these questions, we extensively study the role of attention in the current state-of-the-art generator, and during this study we improve the results significantly. We reason that the image pre-processing of facial landmark alignment compensates for the lack of attention schemes on face datasets, which has led previous works to overlook attention on other datasets as well. We experiment with previously proposed self-attention modules: Dynamic Filter Networks (DFN)[35], Visual Transformers (VT)[81], Self-Attention GANs (SAGAN)[94], as well as the state-of-the-art patch-based spatially-adaptive self-attention module, SAN[99]. VT[81] compresses the input tensor to a set of 1D feature vectors, interprets them as semantic tokens, and leverages a language transformer[75] for tensor propagation. SAN[99] generalizes the self-attention block[79] (as used in SAGAN[94]) by replacing the point-wise softmax attention with a patch-wise fully-connected transformation. We show the diagram of self-attention in Figure 4, with a specific instantiation from SAN[99] due to its generalized and state-of-the-art design.

For conceptual and technical completeness, we formulate our SAN-based self-attention below. We flatten each $s \times s$ key patch and concatenate it along the channel dimension with $q \in \mathbb{R}^{1 \times 1 \times c}$, the query vector at position $(i,j)$, to obtain $p \in \mathbb{R}^{1 \times 1 \times (s^2c + c)}$. In order to cooperate between the key and query, we feed $p$ through two fully-connected layers, each followed by a bias and a leaky ReLU, and obtain a vector $\tilde{w} \in \mathbb{R}^{1 \times 1 \times s^2c}$:

$$\tilde{w} = \mathrm{LReLU}\big(\mathrm{LReLU}(p M_{w_1} + b_{w_1})\, M_{w_2} + b_{w_2}\big),$$

where $M_{w_1} \in \mathbb{R}^{(s^2c + c) \times s^2c}$, $M_{w_2} \in \mathbb{R}^{s^2c \times s^2c}$, and $b_{w_1}, b_{w_2} \in \mathbb{R}^{1 \times 1 \times s^2c}$ are the learnable parameters of the fully-connected layers and biases.
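The patch-wise kernel computation above can be sketched in PyTorch as follows. The tensor layouts, the unfold-based patch extraction, and the final aggregation over value patches are our assumptions for illustration; the SAN[99] instantiation used in the paper has additional detail (e.g., separate key/query/value convolutions and a residual shortcut).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchwiseKernelAttention(nn.Module):
    """Sketch of a SAN-style patch-wise, spatially adaptive attention.

    For each position, an s x s patch of key features is flattened,
    concatenated with the query vector, and mapped by two FC layers
    to a per-position kernel that re-weights the value patch.
    """
    def __init__(self, c, s=3):
        super().__init__()
        self.c, self.s = c, s
        self.fc1 = nn.Linear(s * s * c + c, s * s * c)
        self.fc2 = nn.Linear(s * s * c, s * s * c)

    def forward(self, key, query, value):
        # key / query / value: (B, c, H, W) feature tensors.
        B, c, H, W = key.shape
        s = self.s
        # Extract an s x s key patch around every position: (B, HW, c*s*s).
        k_patches = F.unfold(key, s, padding=s // 2).permute(0, 2, 1)
        q = query.flatten(2).permute(0, 2, 1)                        # (B, HW, c)
        p = torch.cat([k_patches, q], dim=-1)                        # (B, HW, s*s*c + c)
        # Two FC layers with leaky ReLU produce the adaptive kernel w~.
        w = F.leaky_relu(self.fc1(p), 0.2)
        w = F.leaky_relu(self.fc2(w), 0.2)                           # (B, HW, s*s*c)
        # Aggregate the value patches with the learned kernel (assumed design).
        v_patches = F.unfold(value, s, padding=s // 2).permute(0, 2, 1)
        out = (w * v_patches).reshape(B, H * W, c, s * s).sum(dim=3) # (B, HW, c)
        return out.permute(0, 2, 1).reshape(B, c, H, W)
```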
Attention in the discriminator. First, we apply the best attention mechanism validated in the generator to the discriminator; however, we do not see a benefit from such a design, as shown in Table 5. We therefore design a novel reference attention mechanism in the discriminator, where we allow two irrelevant images as the inputs at the same time: one input is sampled from real data as a reference, and the other, primary input is switched between a real sample and a generated sample. We show the diagram of reference attention in Fig. 4. To align feature embeddings, we apply the Siamese architecture[5, 14] to share layer parameters. Also, because the value and the residual shortcut contribute more directly to the discriminator output, we feed them with the primary image, and feed the key and query with the reference image to formulate the spatially adaptive kernel.
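A minimal sketch of this wiring is given below, reusing the `PatchwiseKernelAttention` class from the earlier sketch. The shared (Siamese) feature extractor, the 1x1 projections, and the placement of the block inside the discriminator are assumptions for illustration.

```python
import torch.nn as nn

class ReferenceAttentionHead(nn.Module):
    """Sketch: key/query come from a real reference image, while the value
    and the residual shortcut come from the primary (real-or-fake) image."""
    def __init__(self, feat, c, s=3):
        super().__init__()
        self.feat = feat  # shared Siamese feature extractor: image -> (B, c, H, W)
        self.to_k = nn.Conv2d(c, c, 1)
        self.to_q = nn.Conv2d(c, c, 1)
        self.to_v = nn.Conv2d(c, c, 1)
        self.attn = PatchwiseKernelAttention(c, s)  # from the sketch above

    def forward(self, primary, reference):
        f_p = self.feat(primary)    # primary drives the value and the shortcut
        f_r = self.feat(reference)  # reference drives the adaptive kernel
        out = self.attn(self.to_k(f_r), self.to_q(f_r), self.to_v(f_p))
        return f_p + out            # residual shortcut from the primary path
```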
Experiments. We use the 30k subset of each dataset at 128x128 resolution. We do not experiment with the 1024x1024 resolution of FFHQ, as it takes 9 days to train the StyleGAN2 base model. We run extensive experiments on all of the aforementioned datasets.

Evaluation. We report FID: the smaller, the more desirable. It is worth noting that the rankings of PPL are negatively correlated with all the other metrics, which disqualifies it as an effective evaluation metric in our experiments. For example, U-Net GAN has the best PPL in most cases, but this contradicts its worst FID and its worst visual quality in our qualitative comparison figure.

Comparison of loss functions. We replace the loss used in StyleGAN2[41], the default non-saturating loss, with other popular GAN losses while keeping all the other parameters the same. For the dual contrastive loss, we first warm up training with the default non-saturating loss for about 20 epochs, and then switch to training with our loss. As shown in Table 1, the dual contrastive loss is the only loss that significantly improves upon the default loss of StyleGAN2 consistently on all five datasets. From Table 8, we further validate the contributions of the two components of our loss, Eq. 9 and Eq. 10.

Comparison of attention modules. In Table 3 we extensively compare a variety of self-attention modules by replacing the default convolution in the 32x32-resolution layer of the StyleGAN2[41] config E backbone with one of them. We also compare in Table 4 the time and space complexity of these self-attention modules. The improvements from SAGAN[94] and SAN[99] do not come at the cost of complexity, but rather benefit from their more representative attention designs.

Optimal resolution for self-attention. It is empirically acknowledged that the optimal resolution at which to replace convolution with self-attention in the generator is specific to the dataset and image resolution[94]. We find that there is a specific optimal resolution for each dataset, and that the FID monotonically deteriorates when self-attention is introduced one resolution level up or down. We reason that each dataset has its own spatial scale and complexity. For the state-of-the-art attention module SAN[99] in Table 3 of the main paper, we find that it achieves the optimal performance at the 32x32 generator resolution consistently over all the limited-scale 128x128 datasets, and we therefore report these FIDs. For the large-scale datasets with varying resolutions in Table 6 of the main paper, we analyze their optimal resolutions in Table 7. We stop investigating higher resolutions because training there easily diverges.

Main results. We find: (1) comparing across the first, second, and third rows, the self-attention generator, the dual contrastive loss, and their synergy significantly and consistently improve on all the limited-scale datasets, more than they improve on the large-scale datasets: from 18.1% to 23.3% on CelebA[54] and Animal Face[52], from 17.5% to 43.2% on LSUN Bedroom[87], and from 25.2% to 26.4% on LSUN Church[87]. This indicates that the limited-scale setting is more challenging and leaves more room for our improvements.

Reference attention and dataset size. We find that reference attention in the discriminator consistently improves performance when the dataset size varies between 1k and 30k images and, on the contrary, slightly deteriorates performance when the dataset size increases further. We reason that the arbitrary pair-up between reference and primary images results in a beneficial effect similar in spirit to data augmentation, and consequently generalizes the discriminator representation and mitigates its overfitting. In other words, the arbitrary pair-up of the reference and primary image inputs prevents overfitting when the data size is small, but causes underfitting as the data size increases. On larger datasets, on the other hand, there is no study showing that discriminators overfit, but we hypothesize that adversarial training can still benefit from novel loss functions that encourage the distinguishability power of the discriminator representations for their real vs. fake classification task.

The distinguishability of contrastive representation. Motivated by the consistent improvement from our dual contrastive loss, we delve deeper to investigate whether, and by how much, our contrastive representation is more distinguishable than the original discriminator representation. We measure the separability of real and fake features; a larger value indicates more distinguishable features between real and fake. This observation differs from that of pairwise contrastive learning in the unsupervised learning scenario[25, 73, 7, 8] or GAN applications with reconstructive regularization[60].
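The paper's exact distinguishability measure is not reproduced here; purely as an illustration, one simple way to score how separable two feature sets are is a Fisher-style ratio of between-class to within-class scatter. The function name and the choice of statistic are our assumptions.

```python
import torch

def distinguishability_score(feat_real, feat_fake):
    """Fisher-style separation between real and fake feature sets (illustrative).

    feat_real: (n, d) features of real images
    feat_fake: (m, d) features of generated images
    Returns a scalar; larger means more distinguishable features.
    """
    mu_r, mu_f = feat_real.mean(0), feat_fake.mean(0)
    between = (mu_r - mu_f).pow(2).sum()                       # between-class scatter
    within = feat_real.var(0).sum() + feat_fake.var(0).sum()   # within-class scatter
    return between / (within + 1e-8)
```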