# IB-GAN

## Disentangled Representation Learning

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FiCpXriR4Ud765TvG0bmN%2Fimage.png?alt=media&#x26;token=fe626142-0931-4745-aeaf-2862f370fd66" alt=""><figcaption></figcaption></figure>

* Disentangled representations: a change along a single direction of the latent vector corresponds to a change in a single factor of variation of the data, while remaining invariant to the others.
* GOAL: Learn an encoder that predicts the disentangled representation, and a decoder (or generator) that synthesizes an image from it.
* CHALLENGE: Achieving this goal without ground-truth generative factors or supervision is hard.

## Information Bottleneck (IB) Principle

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FTY1Vd2GrnQsDPjbeQQY9%2Fimage.png?alt=media&#x26;token=0c227344-7df1-477f-8847-736431a3e9bd" alt="" width="375"><figcaption></figcaption></figure>

* GOAL: Obtain the optimal representation encoder q\_ϕ(z|x) that balances the trade-off between the two mutual information terms: maximizing I(Z, Y) (predictiveness) while minimizing I(X, Z) (compression).
* I(⋅,⋅) denotes mutual information (MI); here X is the input variable and Y the target variable.
* The learned representation Z acts as a minimal sufficient statistic of X for predicting Y.
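This trade-off is commonly expressed as the IB Lagrangian over the encoding distribution, which the bullets above describe in words:

```latex
\min_{q_\phi(z \mid x)} \; I(X, Z) \;-\; \beta\, I(Z, Y)
```

Minimizing I(X, Z) compresses the input, while the −β I(Z, Y) term rewards keeping the information needed to predict Y; β controls the balance.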

## Information Bottleneck GAN

IB-GAN adds an upper bound on MI and a 𝛽 term to InfoGAN’s objective, inspired by the IB principle and 𝛽-VAE, for disentangled representation learning.

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FFCXL91TINESagJWw5f14%2Fimage.png?alt=media&#x26;token=07a0f136-9410-47f5-b84c-b8a0d353d329" alt=""><figcaption></figcaption></figure>

𝑰^𝑳 (⋅,⋅) and 𝑰^𝑼 (⋅,⋅) denote the lower and upper bounds of MI, respectively (𝜆 ≥ 𝛽). IB-GAN not only maximizes the information shared between the generator 𝐺 and the representation 𝑧, but also controls the maximum amount of information they share via 𝛽, analogously to 𝛽-VAE and IB theory.
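Schematically, the information terms added on top of the standard adversarial GAN loss take the form below (a sketch reconstructed from the description above, not the paper's exact notation):

```latex
\max_{G,\, q_\phi,\, e_\psi}\;
\lambda\, I^{L}\!\left(z, G(z)\right)
\;-\; \beta\, I^{U}\!\left(z, G(z)\right),
\qquad \lambda \ge \beta \ge 0
```

Setting 𝛽 = 0 and dropping the lower-bound term recovers the standard GAN, while keeping only the lower bound recovers InfoGAN.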

## Inference

### Variational Lower-Bound

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2F4T52W0GL9TKyiBF74ngQ%2Fimage.png?alt=media&#x26;token=0387a243-0d1e-4934-aa14-0bd1cf759921" alt="" width="563"><figcaption></figcaption></figure>

* The lower bound of MI is formulated by introducing the variational reconstructor 𝒒\_𝝓 (𝒛|𝒙). Intuitively, MI is maximized by reconstructing the input code 𝑧 from a generated sample 𝑥 ∼ 𝑝\_𝜃 (𝑥|𝑧) = 𝐺(𝑧), similar to the approach of InfoGAN.
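The reconstruction term of this lower bound can be sketched as follows; the linear "networks" and unit-variance Gaussian reconstructor are simplifying assumptions of this toy example, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_log_prob(z, mean, log_var):
    # log N(z; mean, exp(log_var)), summed over latent dimensions
    return -0.5 * np.sum(
        log_var + (z - mean) ** 2 / np.exp(log_var) + np.log(2 * np.pi), axis=-1
    )

# toy linear stand-ins for the generator G and reconstructor q_phi
W_gen = rng.normal(size=(4, 8))      # G: z (4-d) -> x (8-d)
W_rec = rng.normal(size=(8, 4))      # q_phi: x -> predicted mean of z

z = rng.normal(size=(64, 4))         # sampled latent codes
x = z @ W_gen                        # generated samples G(z)
z_mean = x @ W_rec                   # mean of the reconstructor q_phi(z|x)
log_var = np.zeros_like(z_mean)      # fixed unit variance for simplicity

# maximizing this reconstruction term tightens the variational
# lower bound on I(z, G(z)) (the entropy H(z) is a constant)
mi_lower_term = gaussian_log_prob(z, z_mean, log_var).mean()
```

In training, the generator and reconstructor parameters would both be updated to increase this term.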

### Variational Upper-Bound

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2Fx1DDof8Dazl9YdZZvb0L%2Fimage.png?alt=media&#x26;token=bb34caa5-05c9-4893-a538-c8b5a2350efe" alt="" width="563"><figcaption></figcaption></figure>

* A naïve variational upper bound on the generative MI introduces an approximating prior 𝒅(𝒙). However, an improper choice of 𝒅(𝒙) may severely degrade the quality of the samples synthesized by the generator 𝒑\_𝜽 (𝒙|𝒛).

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FpUZxOdHyxYf65RybDUjW%2Fimage.png?alt=media&#x26;token=c98d5d85-5ad0-4072-b616-f6b88b05a933" alt="" width="563"><figcaption></figcaption></figure>

* IB-GAN instead derives an alternative variational upper bound on the MI term from the Markov property: if the generative process follows 𝑍→𝑅→𝑋, then 𝐼(𝑍,𝑋) ≤ 𝐼(𝑍,𝑅). Hence, an additional stochastic model 𝑒\_𝜓 (𝑟|𝑧) is introduced, so that generation becomes 𝑥 = 𝐺(𝑟(𝑧)).
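With a Gaussian encoder e\_ψ(r|z) and a standard-normal prior over r, the upper-bound term has a closed form, just as in a VAE. A minimal sketch, again with toy linear maps standing in for networks (an assumption of this example):

```python
import numpy as np

def kl_to_standard_normal(mean, log_var):
    # KL( N(mean, diag(exp(log_var))) || N(0, I) ), per sample
    return 0.5 * np.sum(np.exp(log_var) + mean ** 2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 4))                       # input noise codes
W_mu = 0.1 * rng.normal(size=(4, 4))
W_lv = 0.1 * rng.normal(size=(4, 4))
r_mean, r_log_var = z @ W_mu, z @ W_lv             # stochastic encoder e_psi(r|z)

# averaging over z gives a variational upper bound on I(z, R) >= I(z, X);
# the beta term in the objective pushes this quantity down
mi_upper_term = kl_to_standard_normal(r_mean, r_log_var).mean()
```

Because each per-sample KL is nonnegative, the bound never goes below zero, and driving it to zero would make r carry no information about z.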

## IB-GAN Architecture (tractable approximation)

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2F40loyNWleOo6VVnASHWu%2Fimage.png?alt=media&#x26;token=fc56570b-6592-4fca-ba1c-28de27716e3e" alt="" width="563"><figcaption></figcaption></figure>

![](https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FvEPxKxjpajdk0cume8ZI%2Fimage.png?alt=media\&token=e0dc26de-3837-4684-be22-b85104f66679)

* IB-GAN introduces the stochastic encoder 𝑒\_𝜓 (𝑟|𝑧) in front of the generator to constrain the MI between the generator and the noise 𝑧.
* IB-GAN is partly analogous to 𝛽-VAE, but does not suffer from 𝛽-VAE's shortcoming of generating blurry images caused by the MSE loss and a large 𝛽 ≥ 1.
* IB-GAN is an extension of InfoGAN, supplementing the information-constraining term that InfoGAN lacks, and shows better performance in disentanglement learning.
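Putting the pieces together, one training-side forward pass and the combined information loss can be sketched as below. The linear maps, unit-variance reconstructor, and the specific λ, β values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
dim_z, dim_r, dim_x, batch = 4, 4, 8, 64

# toy linear stand-ins for the networks (real IB-GAN uses deep nets)
W_mu = 0.1 * rng.normal(size=(dim_z, dim_r))
W_lv = 0.1 * rng.normal(size=(dim_z, dim_r))
W_gen = rng.normal(size=(dim_r, dim_x))    # generator G: r -> x
W_rec = rng.normal(size=(dim_x, dim_z))    # reconstructor q_phi: x -> mean of z

z = rng.normal(size=(batch, dim_z))

# stochastic encoder e_psi(r|z): reparameterised Gaussian sample
r_mean, r_log_var = z @ W_mu, z @ W_lv
r = r_mean + np.exp(0.5 * r_log_var) * rng.normal(size=r_mean.shape)

x = r @ W_gen            # generated sample G(r(z))
z_hat = x @ W_rec        # reconstructed code, mean of q_phi(z|x)

# lower-bound term: log-likelihood of z under a unit-variance Gaussian at z_hat
recon_term = -0.5 * np.mean(np.sum((z - z_hat) ** 2, axis=-1))
# upper-bound term: KL(e_psi(r|z) || N(0, I)), averaged over the batch
kl_term = 0.5 * np.mean(np.sum(np.exp(r_log_var) + r_mean ** 2 - 1.0 - r_log_var, axis=-1))

lam, beta = 1.0, 0.2     # chosen so that lambda >= beta, as the objective requires
info_loss = -lam * recon_term + beta * kl_term   # added to the usual GAN loss
```

Gradient steps on `info_loss`, together with the adversarial loss, would update the encoder, generator, and reconstructor jointly.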

## Experiment

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FUCqKFtWksXuEy9imHXEH%2Fimage.png?alt=media&#x26;token=25f68e79-4f65-40d9-bac5-8f1ad0a647d5" alt="" width="563"><figcaption><p>Example of generated images from IB-GAN in the latent traversals experiment [1]. (a) IB-GAN captures many attributes on the CelebA [2] and (b) 3D Chairs dataset [3].</p></figcaption></figure>

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FdejdY8DCBiy1RIuD0Wd3%2Fimage.png?alt=media&#x26;token=80906eff-aaca-488c-8b96-46ead14addaa" alt="" width="563"><figcaption></figcaption></figure>

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2FIEuQ1YCFpBwZkDn4jziX%2Fimage.png?alt=media&#x26;token=b81be24f-8a94-47ef-ba8c-98e83f5d88e5" alt="" width="563"><figcaption></figcaption></figure>

<figure><img src="https://704680750-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M01K_pXDa9D-2q3E1yB%2Fuploads%2F7sZP59h7BdBb0q9K9G3T%2Fimage.png?alt=media&#x26;token=b236cc57-d07f-4d70-a890-ed1afb029ec6" alt="" width="375"><figcaption></figcaption></figure>

* Comparison between methods on the disentanglement metrics of \[1,5]. Our model’s scores are obtained from 32 random seeds, with a peak score of (0.826, 0.74). The baseline scores, except InfoGAN's, are taken from \[6].

## Conclusion

* IB-GAN is a novel unsupervised GAN-based model for learning disentangled representations. Its motivation of combining the GAN objective with the Information Bottleneck (IB) theory is straightforward, yet it yields elegant limiting cases that recover both the standard GAN and InfoGAN.
* IB-GAN not only achieves disentanglement results comparable to existing state-of-the-art VAE-based models, but also produces better-quality samples than the standard GAN and InfoGAN.
* The approach of constructing a variational upper bound on the generative MI by introducing an intermediate stochastic representation is a general methodology. It may inform the design of other generative models based on the generative MI in the future.
