Kullback-Leibler divergence
Kullback-Leibler divergence is outlined for 2 random variables X and Y by
Okay-L divergence is non-negative, and it’s zero if and provided that X and Y have the identical distribution. However it’s not a metric, for causes defined right here. For one factor, it’s not symmetric.
Jeffreys divergence
We will repair the symmetry downside by defining
The J above stands for Jeffreys, for Harold Jeffreys. J is named both the symmetrized Okay-L divergence or Jeffreys’ divergence. It’s nonetheless a divergence, not a distance.
A distance (metric) d has to have 4 properties:
- d(x, x) = 0
- d(x, y) > 0 if x ≠ y
- d(x, y) = d(y, x)
- d(x, z) ≤ d(x, y) + d(y, z)
Okay-L divergence satisfies the primary two properties. Jeffreys’ divergence satisfies the primary three, however not the final one, the triangle inequality.
To point out that J doesn’t fulfill the triangle inequality, let X, Y, and Z be Bernoulli random variables with p equal to 0.1, 0.2, and 0.3 respectively. Then the next Python code reveals that the divergence from X to Y, plus the divergence from Y to Z, is lower than the divergence from X to Z. This might be like saying you could possibly get from LA to NYC quicker by having a layover in Denver reasonably than taking a direct flight.
from math import log kl = lambda p, q: p*log(p/q) + (1-p)*log((1-p)/(1-q)) j = lambda p, q: kl(p, q) + kl(q, p) a = j(0.1, 0.2) b = j(0.2, 0.3) c = j(0.1, 0.3) print(a + b, c)
This prints 0.135 and 0.270.
Jensen-Shannon distance
Jensen-Shannon distance turns Okay-L divergence right into a metric as follows. First, outline the random variable M to be the typical of X and Y. Then common the Okay-L divergence from M to every of X and Y. This defines the Jensen-Shannon divergence. It’s nonetheless not a metric, nevertheless it’s sq. root is, which defines the Jensen-Shannon distance.
The next code provides an instance of Jensen-Shannon distance satisfying the triangle inequality.
def d(p, q):
m = 0.5*(p + q)
jsd = 0.5*kl(p, m) + 0.5*kl(q, m)
return jsd**0.5
a = d(0.1, 0.2)
b = d(0.2, 0.3)
c = d(0.1, 0.3)
print(a + b, c)
This prints 0.1817 and 0.1801. Now a layover makes the journey longer.
