IB-GAN
Disentangled Representation Learning
Disentangled representations: a change along a single direction of the latent vector corresponds to a change in a single factor of variation of the data, while remaining invariant to the others.
GOAL: Learning an encoder that can infer the disentangled representation, and a decoder (or generator) that can synthesize an image from it.
CHALLENGE: Achieving this goal without ground-truth generative factors or any other supervision is hard.
Information Bottleneck (IB) Principle
GOAL: Obtaining the optimal representation encoder q_φ(z|x) that balances the trade-off between maximizing I(Z, Y) and minimizing I(X, Z).
I(·, ·) denotes mutual information (MI), e.g. I(X, Y) is the MI between input variable X and target variable Y.
The learned representation Z acts as a minimal sufficient statistic of X for predicting Y.
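Concretely, the trade-off above is usually written as an IB Lagrangian over the encoder q_φ(z|x); this is one common form (sign and placement of β vary across papers):

```latex
\max_{q_\phi(z \mid x)} \; I(Z, Y) \;-\; \beta\, I(X, Z), \qquad \beta \ge 0
```

Maximizing I(Z, Y) keeps Z sufficient for predicting Y, while penalizing I(X, Z) keeps Z minimal, which yields the minimal sufficient statistic described above.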
Information Bottleneck GAN
IB-GAN introduces an upper bound on MI and a β term into InfoGAN's objective, inspired by the IB principle and β-VAE, for disentangled representation learning.
I^L(·, ·) and I^U(·, ·) denote the lower and upper bound of MI, respectively (λ ≥ β). IB-GAN not only maximizes the information shared between the generator G and the representation z but also controls the maximum amount of information shared between them via β, analogously to β-VAE and IB theory.
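Schematically, the resulting objective augments the GAN loss with the two MI bounds (this is a paraphrase of the description above, not the paper's exact equation):

```latex
\max_{G,\,q,\,e}\;\; \mathcal{L}_{\mathrm{GAN}} \;+\; \lambda\, I^{L}\!\big(z, G(z)\big) \;-\; \beta\, I^{U}\!\big(z, G(z)\big)
```

Setting λ = β = 0 recovers the standard GAN, and dropping only the upper-bound term recovers an InfoGAN-style objective.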
Inference
Variational Lower-Bound
The lower bound of MI is formulated by introducing the variational reconstructor q_φ(z|x). Intuitively, the maximization of MI is achieved by reconstructing the input code z from a generated sample x = G(z), similar to the approach of InfoGAN.
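A minimal numpy sketch of why maximizing this lower bound amounts to reconstructing the code: with a Gaussian reconstructor q(z|x) of fixed variance, the bound term E[log q(z|G(z))] is a negative scaled squared error, so an accurate reconstruction of z scores higher than an uninformative one. All names, dimensions, and the fixed variance here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_loglik(z, z_mean, sigma=1.0):
    """Log-density of z under N(z_mean, sigma^2 I) -- a Gaussian reconstructor q(z|x)."""
    d = z.shape[-1]
    sq = np.sum((z - z_mean) ** 2, axis=-1)
    return -0.5 * (sq / sigma**2 + d * np.log(2 * np.pi * sigma**2))

# Toy comparison: an accurate reconstruction of the code z yields a much
# higher average log-likelihood (i.e., a tighter MI lower bound) than an
# uninformative one.
z = rng.standard_normal((1000, 8))
z_hat_good = z + 0.01 * rng.standard_normal(z.shape)  # near-perfect recovery of z
z_hat_bad = rng.standard_normal(z.shape)              # ignores z entirely

bound_good = gaussian_loglik(z, z_hat_good).mean()
bound_bad = gaussian_loglik(z, z_hat_bad).mean()
```

In training, the reconstructor's mean is produced by a network applied to G(z), and this term is maximized jointly with the GAN loss.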
Variational Upper-Bound
A naïve variational upper bound of the generative MI introduces an approximating prior d(x) over samples. However, an improper choice of d(x) may severely degrade the quality of samples synthesized by the generator p_θ(x|z).
We developed another formulation of the variational upper bound of the MI term based on the Markov property: if the generative process follows z → r → x, then I(z, x) ≤ I(z, r). Hence, we use an additional stochastic model e_ψ(r|z); in other words, the generator becomes G(e(z)).
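When e_ψ(r|z) is a diagonal Gaussian and the prior over r is standard normal, the per-sample KL term appearing in such an upper bound has the familiar closed form from VAEs. A small numpy sketch (the encoder maps and dimensions are hypothetical):

```python
import numpy as np

def kl_diag_gauss_to_std_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), computed per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))
mu = 0.5 * z[:, :4]                 # hypothetical encoder mean, linear in z
log_var = np.full((256, 4), -1.0)   # hypothetical fixed encoder log-variance

# Averaging the KL over z gives the tractable surrogate that the beta term
# penalizes, bounding how much information r (and hence x) carries about z.
i_upper = kl_diag_gauss_to_std_normal(mu, log_var).mean()
```

Shrinking this term toward zero pushes e_ψ(r|z) toward the prior, squeezing the information channel between z and the generated sample.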
IB-GAN Architecture (tractable approximation)
IB-GAN introduces the stochastic encoder e_ψ(r|z) before the generator to constrain the MI between the generator output and the noise z.
IB-GAN is partly analogous to β-VAE but does not suffer from β-VAE's shortcoming of generating blurry images caused by the MSE loss and a large β ≥ 1.
IB-GAN is an extension of InfoGAN, supplementing the information-constraining term that InfoGAN lacks, and shows better performance in disentanglement learning.
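Putting the pieces together, the generative path is the Markov chain z → r → x: a reparameterized sample from the stochastic encoder feeds the generator. A toy numpy sketch with linear maps standing in for the real networks (all dimensions and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; the actual model uses deep networks.
Z_DIM, R_DIM, X_DIM = 16, 8, 64
W_mu = 0.1 * rng.standard_normal((Z_DIM, R_DIM))
W_lv = 0.1 * rng.standard_normal((Z_DIM, R_DIM))
W_g = 0.1 * rng.standard_normal((R_DIM, X_DIM))

def encode(z):
    """Stochastic encoder e(r|z): reparameterized Gaussian sample r = mu + sigma * eps."""
    mu, log_var = z @ W_mu, z @ W_lv
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def generate(r):
    """Generator G(r): maps the bottlenecked representation to a sample in [-1, 1]."""
    return np.tanh(r @ W_g)

z = rng.standard_normal((4, Z_DIM))
x = generate(encode(z))  # full generative pass: z -> r -> x
```

Because x depends on z only through r, constraining the encoder's KL term (the β penalty) directly limits how much information the generated sample can carry about z.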
Experiment
Comparison between methods using the disentanglement metrics of [1, 5]. Our model's scores are obtained over 32 random seeds, with a peak score of (0.826, 0.74). The baseline scores, except InfoGAN's, are taken from [6].
Conclusion
IB-GAN is a novel unsupervised GAN-based model for learning disentangled representation. The IB-GAN's motivation for combining the GAN objective with the Information Bottleneck (IB) theory is straightforward. Still, it provides elegant limiting cases that recover both the standard GAN and the InfoGAN.
The IB-GAN not only achieves disentanglement results comparable to existing state-of-the-art VAE-based models but also produces better-quality samples than the standard GAN and InfoGAN.
The approach of constructing the variational upper bound of generative MI by introducing an intermediate stochastic representation is a universal methodology. It may advance the design of other generative models based on the generative MI in the future.