Small Simplicity

Understanding Intelligence from Computational Perspective

Feb 22, 2020

KL Divergence

Resources

  • Lec on VAE, by Ali Ghodsi: This lecture motivates the KL divergence as a measure of the difference in the average information content of two random variables, whose distributions are \(p\) and \(q\) in the article. (A small numerical sketch of this reading appears as the first code example after this list.)
  • Wiki: It clears up the different terms that are (mis)used to refer to the KL divergence.
  • Information Theory for Intelligent People, by Simon Dedeo
    • It gives a great example of the "answering 20 questions" problem as a way to think about basic concepts in information theory, including entropy, KL divergence, and mutual information.
    • \(H(X)\) is equal to the average depth of the (optimal) question tree, i.e., the expected number of yes/no questions it takes to get to the choice \(x\) (see the second code example after this list).
    • "(Using \(H(X)\),) (f)or any probability distribution, we can now talk about "how uncertain we are about the outcome", "how much information is in the process", or "how much entropy the process has", and even measure it, in bits" (p.3)