IB-GAN

Disentangled Representation Learning

  • Disentangled representations: a change along a single direction of the latent vector corresponds to a change in a single factor of variation of the data, while remaining invariant to the other factors.

  • GOAL: Learning an Encoder that can predict the disentangled representation, and a Decoder (or Generator) that can synthesize an image from it.

  • HARD: Achieving this goal without ground-truth generative factors or any supervision is hard.

Information Bottleneck (IB) Principle

  • GOAL: Obtaining the optimal representation encoder q_φ(z|x) that balances the trade-off between the two mutual information terms, maximizing one while minimizing the other (the objective is written out after this list).

  • I(·,·) denotes the mutual information (MI) between the input variable X and the target variable Y.

  • The learned representation Z acts as a minimal sufficient statistic of X for predicting Y.
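As a reference, the standard IB objective can be written in this notation as follows (a textbook formulation of the principle, not copied from the paper):

$$
\max_{q_\phi(z|x)} \; I(Z, Y) \;-\; \beta\, I(X, Z), \qquad \beta \ge 0
$$

Maximizing I(Z, Y) keeps Z predictive of the target, while the −β I(X, Z) term compresses away the information about X that is irrelevant to Y, which is exactly the trade-off described above.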

Information Bottleneck GAN

IB-GAN introduces an upper bound of MI and a β term into InfoGAN's objective, inspired by the IB principle and β-VAE, for disentangled representation learning.

I^L(·,·) and I^U(·,·) denote the lower and upper bounds of MI, respectively (λ ≥ β). IB-GAN not only maximizes the information shared between the generator G and the representation z, but also allows control over the maximum amount of information they share via β, analogously to β-VAE and IB theory.
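Following this description, the non-adversarial part of the IB-GAN objective takes roughly the form below (a sketch consistent with the text above, with the usual GAN loss on G omitted, not a verbatim quote of the paper):

$$
\max_{G,\, q_\phi,\, e_\psi} \;\; \lambda\, I^{L}(z, G(z)) \;-\; \beta\, I^{U}(z, G(z)), \qquad \lambda \ge \beta \ge 0
$$

With β = 0 the constraint vanishes and the objective reduces to an InfoGAN-style one, and with λ = β = 0 only the GAN loss remains, matching the limiting cases mentioned in the Conclusion.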

Inference

Variational Lower-Bound

  • The lower bound of MI is formulated by introducing a variational reconstructor q_φ(z|x). Intuitively, the maximization of MI is achieved by reconstructing the input code z from a generated sample x ~ G(z) = p_θ(x|z), similar to the approach of InfoGAN (the bound is written out below).
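In this notation, the InfoGAN-style (Barber-Agakov) lower bound reads as follows; the entropy H(z) of the fixed input distribution is constant with respect to the model parameters:

$$
I^{L}(z, G(z)) \;=\; H(z) + \mathbb{E}_{z \sim p(z),\; x \sim p_\theta(x|z)} \big[ \log q_\phi(z \mid x) \big] \;\le\; I(z, G(z))
$$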

Variational Upper-Bound

  • A naïve variational upper bound of the generative MI introduces an approximating prior d(x). However, an improper choice of d(x) may severely degrade the quality of the samples synthesized by the generator p_θ(x|z).

  • We develop another formulation of the variational upper bound of the MI term based on the Markov property: if the generative process follows Z → R → X, then I(Z, X) ≤ I(Z, R). Hence, we introduce an additional stochastic model e_ψ(r|z); in other words, we let the generator take the form G(r(z)) (see the bound below).
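Concretely, I(Z, R) itself can be upper-bounded with any approximate marginal m(r) over the representation; m(r) is notation introduced here for illustration, with a unit Gaussian being a common choice:

$$
I(Z, X) \;\le\; I(Z, R) \;\le\; \mathbb{E}_{z \sim p(z)} \big[ D_{\mathrm{KL}}\big( e_\psi(r \mid z) \,\|\, m(r) \big) \big] \;=\; I^{U}(z, G(z))
$$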

IB-GAN Architecture (tractable approximation)

  • IB-GAN introduces the stochastic encoder e_ψ(r|z) in front of the generator to constrain the MI between the generator and the input noise z.

  • The resulting structure of IB-GAN is partly analogous to that of β-VAE, but it does not suffer from β-VAE's shortcoming of producing blurry images caused by the MSE loss and a large β ≥ 1.

  • IB-GAN is also an extension of InfoGAN, supplementing the information-constraining term that InfoGAN lacks, and it shows better performance in disentanglement learning (a minimal sketch of the pipeline follows below).
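The following is a minimal PyTorch-style sketch of this generator path, for illustration only: the layer sizes, dimensions, and the unit-Gaussian choice for the prior on r are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StochasticEncoder(nn.Module):
    """e_psi(r|z): maps the input noise z to a stochastic representation r."""
    def __init__(self, z_dim=64, r_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, r_dim)
        self.logvar = nn.Linear(128, r_dim)

    def forward(self, z):
        h = self.body(z)
        mu, logvar = self.mu(h), self.logvar(h)
        r = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return r, mu, logvar

def kl_to_unit_gaussian(mu, logvar):
    """Per-sample KL(e_psi(r|z) || N(0, I)): the variational upper bound on I(z, r)."""
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1)

# Usage: z -> r -> x. Any image generator G(r) can follow; it is omitted here.
z = torch.randn(8, 64)                                # input noise
encoder = StochasticEncoder()
r, mu, logvar = encoder(z)                            # stochastic representation r ~ e_psi(r|z)
i_upper = kl_to_unit_gaussian(mu, logvar).mean()      # weighted by beta in the full objective
# fake_images = G(r)                                  # reconstructing z from G(r) with q_phi(z|x) gives the lower bound
```

The KL term plays the same role as the KL regularizer in β-VAE, but it constrains the generator's input representation rather than an image reconstruction, which is why the sharpness of GAN samples is preserved.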

Experiment

  • Comparison between methods with the disentanglement metrics in [1, 5]. Our model's scores are obtained from 32 random seeds, with a peak score of (0.826, 0.74). The baseline scores, except for InfoGAN, are taken from [6].

Conclusion

  • IB-GAN is a novel unsupervised GAN-based model for learning disentangled representations. IB-GAN's motivation of combining the GAN objective with the Information Bottleneck (IB) theory is straightforward, yet it provides elegant limiting cases that recover both the standard GAN and InfoGAN.

  • IB-GAN not only achieves disentanglement results comparable to existing state-of-the-art VAE-based models, but also produces higher-quality samples than the standard GAN and InfoGAN.

  • The approach of constructing a variational upper bound on the generative MI by introducing an intermediate stochastic representation is a general methodology. It may advance the design of other generative models based on the generative MI in the future.
