[Original] [Weekly CV Paper Recommendations] Getting started with GAN-based semantic image editing: which papers should you read?

有三AI, published 2022-09-28 in Beijing

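Generative adversarial networks (GANs) are a foundational technique. GAN-based semantic image synthesis can modify the semantic content of an image and thereby edit it, which makes it an important and active research direction. In this issue we recommend several directions worth reading for newcomers to GAN-based image editing.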

Author & editor | 言有三

1 Basic conditional control: IcGAN

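IcGAN is one of the earliest works on GAN-based image editing. It inverts the structure of a conditional GAN: encoders learn the mapping from an image back to its latent representation and attribute vector, so an image can be edited by changing the attribute vector and generating again.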
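The encode, edit, regenerate loop described above can be sketched as follows. This is only an illustration: `generate`, `encode_z`, `encode_y`, and `edit_attribute` are hypothetical toy functions standing in for the trained conditional generator and encoders, and the two-element attribute vector is an assumption.

```python
from typing import List

Vector = List[float]

# Toy stand-ins (assumptions): a real IcGAN uses trained neural networks here.
def generate(z: Vector, y: Vector) -> Vector:
    """Toy conditional generator G(z, y) producing a 4-value 'image'."""
    return [2.0 * z[0], y[0], 2.0 * z[1], y[1]]

def encode_z(x: Vector) -> Vector:
    """Toy encoder recovering the latent vector z from an image."""
    return [x[0] / 2.0, x[2] / 2.0]

def encode_y(x: Vector) -> Vector:
    """Toy encoder recovering the attribute vector y from an image."""
    return [x[1], x[3]]

def edit_attribute(x: Vector, attr_index: int, new_value: float) -> Vector:
    """Encode the image, overwrite one attribute, and regenerate."""
    z = encode_z(x)
    y = list(encode_y(x))
    y[attr_index] = new_value
    return generate(z, y)

# An image whose second attribute (say, a hypothetical "glasses" flag) is off.
image = generate([0.3, -0.7], [1.0, 0.0])
edited = edit_attribute(image, attr_index=1, new_value=1.0)
```

Because the latent vector is kept fixed, only the part of the output that depends on the edited attribute changes.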

Citations: 600+

Recommendation rating: ?????

[1] Perarnau G, Van De Weijer J, Raducanu B, et al. Invertible conditional gans for image editing[J]. arXiv preprint arXiv:1611.06355, 2016.

2 Multi-domain conditional control: the StarGAN series

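StarGAN v1 and StarGAN v2 are classic multi-domain image-to-image translation frameworks. By conditioning on a domain-label attribute vector, a single model can switch freely between any pair of domains and thereby edit the semantic content of an image.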
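As a rough illustration of domain-label conditioning, the sketch below builds a generator input in the style of StarGAN v1, where the target label is replicated over every pixel and concatenated to the image as extra channels. The helper names `one_hot` and `concat_domain_label`, the nested-list image layout, and the choice of four domains are assumptions made for this example; StarGAN v2 injects domain-specific style codes instead.

```python
from typing import List

Channel = List[List[float]]  # one H x W plane

def one_hot(domain: int, num_domains: int) -> List[float]:
    """Encode the target domain index as a one-hot label vector."""
    return [1.0 if i == domain else 0.0 for i in range(num_domains)]

def concat_domain_label(image: List[Channel], label: List[float]) -> List[Channel]:
    """Replicate each label entry into an H x W plane and append it as a channel."""
    height, width = len(image[0]), len(image[0][0])
    label_planes = [[[v] * width for _ in range(height)] for v in label]
    return image + label_planes

# Toy 3-channel, 2 x 2 image; the generator would receive 3 + 4 = 7 channels.
image = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(3)]
generator_input = concat_domain_label(image, one_hot(2, num_domains=4))
```

Switching the target domain only means passing a different label; the same generator weights serve every domain pair.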

Citations: 3000+

Recommendation rating: ?????

[2] Choi Y, Choi M, Kim M, et al. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8789-8797.

[3] Choi Y, Uh Y, Yoo J, et al. Stargan v2: Diverse image synthesis for multiple domains[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 8188-8197.

3 Latent vector learning: StyleGAN

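StyleGAN's well-designed architecture learns a very well-behaved latent space. By projecting an image back into that space, a wide range of attributes can be edited with high-quality results. Much of this research targets face images, with Image2StyleGAN as a representative example.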
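The projection-then-edit idea can be sketched on a toy problem. Everything here is an assumption for illustration: the "generator" is a fixed linear map `A` rather than StyleGAN, the loss is plain squared error (Image2StyleGAN also uses a perceptual term), and editing along a fixed linear direction is one common approach rather than the method of any particular paper listed here.

```python
from typing import List

Vector = List[float]

# Toy linear "generator" weights (assumption): maps a 2-D latent to a 3-value image.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def generate(w: Vector) -> Vector:
    """Toy generator G(w) = A w."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]

def invert(x: Vector, steps: int = 200, lr: float = 0.1) -> Vector:
    """Project image x into latent space by gradient descent on ||G(w) - x||^2."""
    w = [0.0] * len(A[0])
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(w), x)]
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def edit(w: Vector, direction: Vector, alpha: float) -> Vector:
    """Move the latent code along a (hypothetical) attribute direction."""
    return [wj + alpha * dj for wj, dj in zip(w, direction)]

target = generate([0.5, -1.0])   # the image we pretend to have been given
w_hat = invert(target)           # recovered latent code
edited_image = generate(edit(w_hat, direction=[1.0, 0.0], alpha=2.0))
```

The quality of the edit depends on how faithfully the inversion step reconstructs the input, which is why projection methods receive so much attention.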

Citations: 10000+

Recommendation rating: ?????

[4] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4401-4410.

[5] Abdal R, Qin Y, Wonka P. Image2stylegan: How to embed images into the stylegan latent space?[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 4432-4441.

[6] Abdal R, Qin Y, Wonka P. Image2stylegan++: How to edit the embedded images?[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 8296-8305.

4 Semantic supervision: MaskGAN

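The frameworks above either learn in the GAN's latent space or condition on high-level semantic attributes, so they can only edit high-level semantics. Fine-grained editing requires control at the level of semantic regions. MaskGAN is a typical framework that edits images through semantic masks; SPADE is a similar classic framework for interactive editing.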
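To give a feel for mask-driven control, the sketch below applies a heavily simplified version of the spatially-adaptive normalization idea behind SPADE: features are normalized, then scaled and shifted per pixel according to that pixel's semantic label. The function `spade_modulate`, the per-class lookup tables `gamma` and `beta`, and the label meanings are assumptions for this example; in the actual method the scale and shift come from convolutions over the segmentation map.

```python
import math
from typing import Dict, List

def spade_modulate(features: List[float], mask: List[int],
                   gamma: Dict[int, float], beta: Dict[int, float],
                   eps: float = 1e-5) -> List[float]:
    """Normalize the features, then modulate each pixel by its semantic label."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    normalized = [(f - mean) / math.sqrt(var + eps) for f in features]
    return [gamma[c] * n + beta[c] for n, c in zip(normalized, mask)]

features = [0.2, 0.4, 0.6, 0.8]   # toy single-channel feature map, flattened
gamma = {0: 1.0, 1: 2.0}          # hypothetical per-class scale
beta = {0: 0.0, 1: 0.5}           # hypothetical per-class shift

mask = [0, 0, 1, 1]               # hypothetical labels: 0 = skin, 1 = hair
before = spade_modulate(features, mask, gamma, beta)

edited_mask = [0, 1, 1, 1]        # the user repaints pixel 1 as hair
after = spade_modulate(features, edited_mask, gamma, beta)
```

Only the repainted pixel changes in this toy version, which mirrors why mask editing gives local, fine-grained control.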

Citations: 3000+

Recommendation rating: ?????

[7] Lee C H, Liu Z, Wu L, et al. Maskgan: Towards diverse and interactive facial image manipulation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 5549-5558.

[8] Park T, Liu M Y, Wang T C, et al. Semantic image synthesis with spatially-adaptive normalization[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 2337-2346.