Tuesday, June 9, 2026

From Kullback-Leibler divergence to Jensen–Shannon metric


Kullback-Leibler divergence

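Kullback-Leibler divergence is defined for two random variables X and Y by

D_{KL}(X || Y) = \int f_X(x) \log \frac{f_X(x)}{f_Y(x)} \, dx

where f_X and f_Y are the density functions of X and Y. For discrete random variables, the integral becomes a sum over the probability mass functions.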

K-L divergence is non-negative, and it’s zero if and only if X and Y have the same distribution. But it’s not a metric. For one thing, it’s not symmetric.
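To make the asymmetry concrete, here is a minimal sketch for two Bernoulli random variables, using the discrete form of the definition (a sum over the two outcomes rather than an integral). The helper name `kl_bernoulli` is only illustrative, and the code assumes both parameters lie strictly between 0 and 1.

```python
from math import log

def kl_bernoulli(p, q):
    """K-L divergence D_KL(X || Y) for X ~ Bernoulli(p), Y ~ Bernoulli(q).

    Assumes 0 < p < 1 and 0 < q < 1 so that every logarithm is defined.
    """
    return p*log(p/q) + (1 - p)*log((1 - p)/(1 - q))

forward = kl_bernoulli(0.1, 0.2)   # D_KL(X || Y)
backward = kl_bernoulli(0.2, 0.1)  # D_KL(Y || X)
print(forward, backward)           # the two directions differ
print(kl_bernoulli(0.1, 0.1))      # identical distributions give 0.0
```

The two directions come out to roughly 0.037 and 0.044, so swapping the arguments changes the answer.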

Jeffreys divergence

We can fix the symmetry problem by defining

J(X, Y) = D_{KL}(X || Y)  + D_{KL}(Y || X)

The J above stands for Jeffreys, after Harold Jeffreys. J is called either the symmetrized K-L divergence or Jeffreys’ divergence. It’s still a divergence, not a distance.

A distance (metric) d has to have 4 properties:

  1. d(x, x) = 0
  2. d(x, y) > 0 if x ≠ y
  3. d(x, y) = d(y, x)
  4. d(x, z) ≤ d(x, y) + d(y, z)

K-L divergence satisfies the first two properties. Jeffreys’ divergence satisfies the first three, but not the last one, the triangle inequality.

To show that J doesn’t satisfy the triangle inequality, let X, Y, and Z be Bernoulli random variables with p equal to 0.1, 0.2, and 0.3 respectively. Then the following Python code shows that the divergence from X to Y, plus the divergence from Y to Z, is less than the divergence from X to Z. This would be like saying you could get from LA to NYC faster by having a layover in Denver rather than taking a direct flight.

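from math import log

# K-L divergence between Bernoulli(p) and Bernoulli(q)
def kl(p, q):
    return p*log(p/q) + (1-p)*log((1-p)/(1-q))

# Jeffreys divergence: the symmetrized K-L divergence
def j(p, q):
    return kl(p, q) + kl(q, p)

a = j(0.1, 0.2)  # X to Y
b = j(0.2, 0.3)  # Y to Z
c = j(0.1, 0.3)  # X to Z
print(a + b, c)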

This prints 0.135 and 0.270.

Jensen-Shannon distance

Jensen-Shannon distance turns K-L divergence into a metric as follows. First, define the random variable M to be the average of X and Y, meaning that the distribution of M is the equal-weight mixture of the distributions of X and Y. Then average the K-L divergence from M to each of X and Y. This defines the Jensen-Shannon divergence. It’s still not a metric, but its square root is, which defines the Jensen-Shannon distance.

\begin{align*}
M &= (X + Y)/2 \\
\text{JSD}(X, Y) &= \tfrac{1}{2} D_{KL}(X || M) + \tfrac{1}{2} D_{KL}(Y || M) \\
d_{JS}(X, Y) &= \sqrt{\text{JSD}(X, Y)}
\end{align*}
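The definitions above carry over to any pair of discrete distributions on the same finite set. The sketch below is one way to write that down, representing each distribution as a list of probabilities. The names `kl_discrete` and `js_distance` are illustrative choices, and the code assumes the two lists have equal length and each sums to 1.

```python
from math import log, sqrt

def kl_discrete(p, q):
    """D_KL(P || Q) for probability vectors p and q of equal length.

    Terms with p[i] == 0 are skipped (the 0 log 0 = 0 convention).
    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi*log(pi/qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the Jensen-Shannon divergence."""
    m = [0.5*(pi + qi) for pi, qi in zip(p, q)]  # mixture distribution M
    jsd = 0.5*kl_discrete(p, m) + 0.5*kl_discrete(q, m)
    return sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error

# Bernoulli(0.1) and Bernoulli(0.3), written as [P(0), P(1)]
print(js_distance([0.9, 0.1], [0.7, 0.3]))

# Disjoint supports: the divergence is log 2, still finite
print(js_distance([1.0, 0.0], [0.0, 1.0])**2, log(2))
```

Because M puts mass wherever X or Y does, every logarithm stays finite, even for distributions with disjoint support where the K-L divergence itself would be infinite.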

The following code gives an example of Jensen-Shannon distance satisfying the triangle inequality.

def d(p, q):
    m = 0.5*(p + q)
    jsd = 0.5*kl(p, m) + 0.5*kl(q, m) 
    return jsd**0.5

a = d(0.1, 0.2)
b = d(0.2, 0.3)
c = d(0.1, 0.3)
print(a + b, c)

This prints 0.1817 and 0.1801. Now a layover makes the journey longer.
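A single example doesn’t show much on its own, so here is a rough numerical check over a grid of Bernoulli parameters. It counts the triples for which a candidate distance breaks the triangle inequality. The function `count_violations`, the grid spacing, and the tolerance are all arbitrary choices made for this sketch, and a grid search is a sanity check, not a proof.

```python
from itertools import product
from math import log, sqrt

def kl(p, q):
    # K-L divergence between Bernoulli(p) and Bernoulli(q)
    return p*log(p/q) + (1 - p)*log((1 - p)/(1 - q))

def jeffreys(p, q):
    # symmetrized K-L divergence
    return kl(p, q) + kl(q, p)

def js_dist(p, q):
    # Jensen-Shannon distance between Bernoulli(p) and Bernoulli(q)
    m = 0.5*(p + q)
    return sqrt(max(0.5*kl(p, m) + 0.5*kl(q, m), 0.0))

def count_violations(dist, params, tol=1e-12):
    """Count triples (x, y, z) with dist(x, z) > dist(x, y) + dist(y, z)."""
    return sum(
        1
        for x, y, z in product(params, repeat=3)
        if dist(x, z) > dist(x, y) + dist(y, z) + tol
    )

grid = [i/20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
print("Jeffreys violations:", count_violations(jeffreys, grid))
print("Jensen-Shannon violations:", count_violations(js_dist, grid))
```

On this grid Jeffreys’ divergence violates the inequality for some triples, including the (0.1, 0.2, 0.3) example above, while the Jensen-Shannon distance produces no violations.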
