KL Divergence Online Demo

The intersection of two current events has borne fruit:

To try out Shiny, I created an interactive visualization for Kullback-Leibler divergence (or KL divergence). Right now, it only supports two univariate Gaussians, which should be sufficient to build some intuition.

If you like it, let me know! If it turns out to be popular, I might add more features, or create similar visualizations for other concepts!

What is KL Divergence? What am I seeing?

Consider an unknown probability distribution p(x) that we’re trying to approximate with a probability distribution q(x). Then

    \[\text{KL}(p||q) = - \int p(x) \ln \frac{q(x)}{p(x)} dx\]

can informally be interpreted as the amount of information lost by approximating p with q. As you might imagine, this has several applications in Machine Learning. A recurring pattern is to fit parameters to a model by minimizing an approximation of \text{KL}(p||q) (i.e., making q “as similar” to p as possible). This blog post elaborates in a fun and informative way. If you have never heard of KL divergence before, Bishop provides a more formal (but still easy to understand) introduction in Section 1.6 of PRML.
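
For the two univariate Gaussians shown in the demo, the integral above can actually be evaluated in closed form. Writing p(x) = \mathcal{N}(\mu_1, \sigma_1^2) and q(x) = \mathcal{N}(\mu_2, \sigma_2^2), it works out to

    \[\text{KL}(p||q) = \ln \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}\]

which you might find handy when working through the exercises below.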

Suggested exercises with the interactive plot

Using the visualization tool, find out (or verify) the answers to the following questions (a short numerical sketch for double-checking them follows the list):

  • Is \text{KL}(p||q) = \text{KL}(q||p)? Always? Never?
  • When is \text{KL}(p||q) = 0?
  • Let r(x) = \mathcal{N}(0, 1) and s(x) = \mathcal{N}(0, 2). Which is larger: \text{KL}(r||s) or \text{KL}(s||r)? Why?
  • Is \text{KL}(p||q) ever negative? When, or why not?
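
If you’d rather double-check your answers with a bit of code, here is a rough numerical sketch. It is plain Python with NumPy and SciPy (not the code behind the Shiny app); it evaluates the integral from the definition directly, and uses the r and s from the third exercise as an example, assuming \mathcal{N}(0, 2) means a variance of 2:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def kl_divergence(p, q):
        """Numerically evaluate KL(p||q) = -integral of p(x) ln(q(x)/p(x)) dx
        (equivalently, the integral of p(x) ln(p(x)/q(x))) for two frozen
        scipy.stats distributions over the real line."""
        integrand = lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x))
        value, _ = quad(integrand, -np.inf, np.inf)
        return value

    # r and s from the third exercise. scipy's `scale` is a standard
    # deviation, and I'm assuming N(0, 2) refers to a variance of 2.
    r = norm(loc=0.0, scale=1.0)
    s = norm(loc=0.0, scale=np.sqrt(2.0))

    print("KL(r||s) =", kl_divergence(r, s))
    print("KL(s||r) =", kl_divergence(s, r))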
