IB-GAN
Disentangled representation: a change along a single direction of the latent vector corresponds to a change in a single factor of variation of the data, while remaining invariant to the other factors.
GOAL: Learn an Encoder that infers the disentangled representation, and a Decoder (or Generator) that synthesizes an image from it.
HARD: Achieving this goal without ground-truth generative factors or any supervision is difficult.
GOAL: Obtain the optimal representation encoder q_ϕ(z|x) that balances the trade-off between the two mutual information terms: maximizing I(Z, Y) while minimizing I(Z, X).
I(⋅,⋅) denotes the mutual information (MI); here X is the input variable and Y is the target variable.
The learned representation Z acts as a minimal sufficient statistic of X for predicting Y.
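As a reference point, the IB principle can be written as the standard Lagrangian below (the textbook formulation, restated in the notation used here):

```latex
% Information Bottleneck Lagrangian (standard form):
% keep Z predictive of the target Y while compressing away information about the input X.
\max_{q_{\phi}(z \mid x)} \; I(Z, Y) \;-\; \beta \, I(Z, X), \qquad \beta \ge 0
```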
IB-GAN introduces an upper bound of the MI and a β term into InfoGAN's objective, inspired by the IB principle and β-VAE, for disentangled representation learning.
I^L(⋅,⋅) and I^U(⋅,⋅) denote the lower and upper bound of the MI, respectively (λ ≥ β). IB-GAN not only maximizes the information shared between the generator G and the representation z but also controls the maximum amount of information they share via β, analogously to β-VAE and the IB theory.
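Read literally, the description above corresponds to an objective of roughly the following shape; this is a schematic sketch of the trade-off, not the paper's exact equation (the precise GAN loss and sign conventions follow the paper):

```latex
% Schematic IB-GAN objective implied by the text above:
% adversarial loss, plus lambda times the MI lower bound (maximized),
% minus beta times the MI upper bound (penalized), with lambda >= beta.
\min_{G,\, q_{\phi},\, e_{\psi}} \max_{D} \;
  \mathcal{L}_{\mathrm{GAN}}(G, D)
  \;-\; \lambda \, I^{L}\!\big(z, G(z)\big)
  \;+\; \beta \, I^{U}\!\big(z, G(z)\big),
  \qquad \lambda \ge \beta
```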
The lower bound of the MI is formulated by introducing a variational reconstructor q_ϕ(z|x). Intuitively, the MI is maximized by reconstructing the input code z from a generated sample G(z), where the generator defines p_θ(x|z), similar to the approach of InfoGAN.
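Concretely, this is the standard Barber–Agakov-style variational lower bound also used by InfoGAN, written in the notation above:

```latex
% Variational lower bound on the generative MI via the reconstructor q_phi(z|x):
I\big(z, G(z)\big)
  \;\ge\; H(z) \;+\; \mathbb{E}_{z \sim p(z),\; x \sim p_{\theta}(x \mid z)}
          \big[ \log q_{\phi}(z \mid x) \big]
  \;\equiv\; I^{L}\big(z, G(z)\big)
```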
A naïve variational upper bound of the generative MI introduces an approximate marginal d(x). However, an improper choice of d(x) may severely degrade the quality of the samples synthesized by the generator p_θ(x|z).
We develop another formulation of the variational upper bound on the MI term based on the Markov property (data-processing inequality): if the generative process follows Z → R → X, then I(Z,X) ≤ I(Z,R). Hence, we introduce an additional stochastic model e_ψ(r|z) and generate samples as G(r(z)).
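Combined with the usual variational bound on I(Z,R), this yields an upper bound that involves only the stochastic encoder and a tractable prior; m(r) below is an illustrative name for that prior (e.g., a standard Gaussian):

```latex
% Data-processing inequality followed by the standard variational upper bound:
I(Z, X) \;\le\; I(Z, R)
  \;\le\; \mathbb{E}_{z \sim p(z)}
          \Big[ \mathrm{KL}\big( e_{\psi}(r \mid z) \,\big\|\, m(r) \big) \Big]
  \;\equiv\; I^{U}(Z, R)
```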
IB-GAN thus places the stochastic encoder e_ψ(r|z) before the generator to constrain the MI between the generator and the input noise z.
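A minimal PyTorch-style sketch of this forward pass (z → e_ψ(r|z) → G(r) → x → q_ϕ(z|x)) is given below. All module definitions, dimensions, and the Gaussian parameterizations are illustrative assumptions for exposition, not the paper's actual architecture or hyperparameters:

```python
import torch
import torch.nn as nn

class StochasticEncoder(nn.Module):
    """e_psi(r|z): a Gaussian over the intermediate code r, conditioned on the noise z."""
    def __init__(self, z_dim=64, r_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, r_dim)
        self.logvar = nn.Linear(128, r_dim)

    def forward(self, z):
        h = self.body(z)
        mu, logvar = self.mu(h), self.logvar(h)
        r = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        # Closed-form KL(e_psi(r|z) || N(0, I)); its average over z estimates the MI upper bound I^U.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1)
        return r, kl

# Toy generator G(r) and reconstructor q_phi(z|x) so the sketch runs end to end;
# in IB-GAN proper these are convolutional networks over images.
z_dim, r_dim, x_dim = 64, 16, 784
G = nn.Sequential(nn.Linear(r_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
Q = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
E = StochasticEncoder(z_dim, r_dim)

z = torch.randn(8, z_dim)            # input noise
r, kl = E(z)                         # r sampled from e_psi(r|z), plus the per-sample KL term
x = G(r)                             # generated sample G(r(z))
z_hat = Q(x)                         # mean of a Gaussian q_phi(z|x)
recon = ((z_hat - z) ** 2).mean()    # surrogate for -I^L (Gaussian q_phi, up to constants)
upper = kl.mean()                    # surrogate for I^U
# A generator-side regularizer would combine these roughly as lambda * recon + beta * upper,
# on top of the usual adversarial loss, with lambda >= beta as noted above.
```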
This constraint is partly analogous to that of β-VAE, but IB-GAN does not suffer from β-VAE's shortcoming of generating blurry images due to the pixel-wise MSE loss and a large β ≥ 1.
IB-GAN is also an extension of InfoGAN, supplementing the information-constraining term that InfoGAN lacks, and it shows better performance in disentanglement learning.
Comparison between methods on the disentanglement metrics of [1, 5]. Our model's scores are obtained over 32 random seeds, with a peak score of (0.826, 0.74). The baseline scores, except for InfoGAN, are taken from [6].
IB-GAN is a novel unsupervised GAN-based model for learning disentangled representations. Its motivation, combining the GAN objective with the Information Bottleneck (IB) theory, is straightforward, yet it yields elegant limiting cases that recover both the standard GAN and InfoGAN.
IB-GAN not only achieves disentanglement results comparable to existing state-of-the-art VAE-based models but also produces higher-quality samples than the standard GAN and InfoGAN.
Constructing a variational upper bound of the generative MI by introducing an intermediate stochastic representation is a general methodology, and it may inform the design of other generative models built on the generative MI in the future.