Wednesday, October 29, 2025

Why Econometrics is Confusing Part II: The Independence Zoo


In econometrics it’s absolutely essential to keep track of which things are dependent and which are independent. To make this as confusing as possible for students, a typical introductory econometrics course moves back and forth between different notions of dependence, stopping occasionally to mention that they’re not equivalent but never fully explaining why, on the grounds that “you’ve surely already learned this in your introductory probability and statistics course.” I remember finding this extremely frustrating as a student, but only recently managed to translate this frustration into meaningful changes in my own teaching. Building on some of my recent teaching materials, this post is a field guide to the menagerie, or at least petting zoo, of “dependence” notions that appear regularly in econometrics. We’ll examine each property on its own along with the relationships between them, using simple examples to build your intuition. Since a picture is worth a thousand words, here’s one that summarizes the entire post:

Figure 1: Different notions of dependence in econometrics and their relationships. A double arrow indicates that one property implies another.

Prerequisites

While written at an introductory level, this post assumes basic familiarity with calculations involving discrete and continuous random variables.
In particular, I assume that:

  • You know the definitions of expected value, variance, covariance, and correlation.
  • You’re comfortable working with joint, marginal, and conditional distributions of a pair of discrete random variables.
  • You understand the uniform distribution and how to compute its moments (mean, variance, etc.).
  • You’ve encountered the notion of conditional expectation and the law of iterated expectations.

If you’re a bit rusty on this material, lectures 7-11 from these slides should be helpful. For bivariate, discrete distributions I also suggest watching this video from 1:07:00 to the end and this other video from 0:00:00 up to the one hour mark.

Two Examples

Example #1 – Discrete RVs ((X,Y))

My first example involves two discrete random variables (X) and (Y) with joint probability mass function (p_{XY}(x,y)) given by

           (Y = 0)   (Y = 1)
(X = -1)   (1/3)     (0)
(X = 0)    (0)       (1/3)
(X = 1)    (1/3)     (0)

Even without doing any math, we see that knowing (X) conveys information about (Y), and vice-versa. For example, if (X = -1) then we know that (Y) must equal zero. Similarly, if (Y=1) then (X) must equal zero. Spend a bit of time thinking about this joint distribution before reading further. We’ll have plenty of time for math below, but it’s always worth seeing where our intuition takes us before calculating everything.

To streamline our discussion below, it will be helpful to work out a few basic results about (X) and (Y). A quick calculation with (p_{XY}) shows that
\[
\mathbb{E}(XY) \equiv \sum_{\text{all } x} \sum_{\text{all } y} x y \cdot p_{XY}(x,y) = 0.
\]

Calculating the marginal pmf of (X), we see that
\[
p_X(-1) = p_X(0) = p_X(1) = 1/3 \implies \mathbb{E}(X) \equiv \sum_{\text{all } x} x \cdot p_X(x) = 0.
\]

Similarly, calculating the marginal pmf of (Y), we obtain
\[
p_Y(0) = 2/3, \quad p_Y(1) = 1/3 \implies \mathbb{E}(Y) \equiv \sum_{\text{all } y} y \cdot p_Y(y) = 1/3.
\]

We’ll use these results as ingredients below as we explain and relate three key notions of dependence: correlation, conditional mean independence, and statistical independence.
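
If you’d like to double-check these ingredients on a computer, here’s a minimal Python sketch (any language with basic array support would do) that encodes the joint pmf table above and recomputes (mathbb{E}(X)), (mathbb{E}(Y)), and (mathbb{E}(XY)):

```python
import numpy as np

# Joint pmf of (X, Y) from Example #1: rows index x in {-1, 0, 1}, columns index y in {0, 1}
x_vals = np.array([-1, 0, 1])
y_vals = np.array([0, 1])
p_xy = np.array([[1/3, 0.0],
                 [0.0, 1/3],
                 [1/3, 0.0]])

p_x = p_xy.sum(axis=1)   # marginal pmf of X: [1/3, 1/3, 1/3]
p_y = p_xy.sum(axis=0)   # marginal pmf of Y: [2/3, 1/3]

E_X = x_vals @ p_x                               # = 0
E_Y = y_vals @ p_y                               # = 1/3
E_XY = np.sum(np.outer(x_vals, y_vals) * p_xy)   # = 0

print(E_X, E_Y, E_XY)   # 0.0 0.333... 0.0
```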

Example #2 – Continuous RVs ((W,Z))

My second example concerns two continuous random variables (W) and (Z), where (W sim text{Uniform}(-1, 1)) and (Z = W^2).
In this example, (W) and (Z) are very strongly related: if I tell you that the realization of (W) is (w), then you know for sure that the realization of (Z) must be (w^2). Again, keep this intuition in mind as we work through the math below.

In the remainder of the post, we’ll find it helpful to refer to a few properties of (W) and (Z), namely
\[
\begin{aligned}
\mathbb{E}[W] &\equiv \int_{-\infty}^\infty w \cdot f_W(w)\, dw = \int_{-1}^1 w \cdot \frac{1}{2}\, dw = \left. \frac{w^2}{4} \right|_{-1}^1 = 0 \\
\mathbb{E}[Z] &\equiv \mathbb{E}[W^2] = \int_{-\infty}^{\infty} w^2 \cdot f_W(w)\, dw = \int_{-1}^1 w^2 \cdot \frac{1}{2}\, dw = \left. \frac{w^3}{6} \right|_{-1}^1 = \frac{1}{3} \\
\mathbb{E}[WZ] &= \mathbb{E}[W^3] \equiv \int_{-\infty}^\infty w^3 \cdot f_W(w)\, dw = \int_{-1}^1 w^3 \cdot \frac{1}{2}\, dw = \left. \frac{w^4}{8} \right|_{-1}^1 = 0.
\end{aligned}
\]

Since (W) is uniform on the interval ([-1,1]), its pdf is simply (1/2) on this interval, and zero otherwise.
All else equal, I prefer easy integration problems!
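
If you’d rather not take my word for these integrals, a short Monte Carlo check works too. The sketch below treats a large-sample average as a stand-in for the corresponding expectation, so the printed values are only approximate (the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=1234)
w = rng.uniform(-1, 1, size=1_000_000)   # draws of W ~ Uniform(-1, 1)
z = w**2                                 # corresponding draws of Z = W^2

print(w.mean())         # approximately 0   (E[W])
print(z.mean())         # approximately 1/3 (E[Z] = E[W^2])
print((w * z).mean())   # approximately 0   (E[WZ] = E[W^3])
```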

Uncorrelatedness

Recall that the correlation between two random variables (X) and (Y) is defined as
\[
\text{Corr}(X,Y) \equiv \frac{\text{Cov}(X,Y)}{\text{SD}(X)\,\text{SD}(Y)} = \frac{\mathbb{E}[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{\mathbb{E}[(X - \mu_X)^2]\,\mathbb{E}[(Y - \mu_Y)^2]}}
\]

where (mu_X equiv mathbb{E}(X)) and (mu_Y equiv mathbb{E}(Y)). We say that (X) and (Y) are uncorrelated if (text{Corr}(X,Y) = 0). Unless (X) and (Y) are both constants, their variances must be positive. This means that the denominator of our expression for (text{Corr}(X,Y)) is likewise positive.
It follows that zero correlation is the same thing as zero covariance. Correlation is simply covariance rescaled so that the units of (X) and (Y) cancel out and the result always lies between (-1) and (1).

Correlation and covariance are both measures of linear dependence. If (X) is, on average, above its mean when (Y) is above its mean, then (text{Corr}(X,Y)) and (text{Cov}(X,Y)) are both positive. If (X) is, on average, below its mean when (Y) is above its mean, then (text{Corr}(X,Y)) and (text{Cov}(X,Y)) are both negative. If there is, on average, no linear relationship between (X) and (Y), then both the correlation and covariance between them are zero. Using the “shortcut formula” for covariance, namely
\[
\text{Cov}(X,Y) \equiv \mathbb{E}[(X - \mu_X)(Y - \mu_Y)] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y],
\]

it follows that uncorrelatedness is equivalent to
\[
\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y].
\]

Rendering this in English rather than math,

Two random variables (X) and (Y) are uncorrelated if and only if the expectation of their product equals the product of their expectations.
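
Here’s a quick numerical confirmation for Example #1, computing the covariance and correlation directly from the joint pmf table via the shortcut formula. (The same one-line calculation, (mathbb{E}(WZ) - mathbb{E}(W)mathbb{E}(Z) = 0 - 0 times 1/3 = 0), shows that (W) and (Z) are likewise uncorrelated.)

```python
import numpy as np

# Example #1: Cov(X, Y) = E[XY] - E[X]E[Y], computed from the joint pmf table
x_vals, y_vals = np.array([-1, 0, 1]), np.array([0, 1])
p_xy = np.array([[1/3, 0.0], [0.0, 1/3], [1/3, 0.0]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

E_X, E_Y = x_vals @ p_x, y_vals @ p_y
E_XY = np.sum(np.outer(x_vals, y_vals) * p_xy)

cov = E_XY - E_X * E_Y
sd_x = np.sqrt((x_vals - E_X) ** 2 @ p_x)
sd_y = np.sqrt((y_vals - E_Y) ** 2 @ p_y)

print(cov, cov / (sd_x * sd_y))   # 0.0 0.0 -> X and Y are uncorrelated
```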

Conditional Mean Independence

We say that (Y) is mean independent of (X) if (mathbb{E}(Y|X) = mathbb{E}(Y)). In words,

(Y) is mean independent of (X) if the conditional mean of (Y) given (X) equals the unconditional mean of (Y).

Just to make things confusing, this property is sometimes called “conditional mean independence” and sometimes called simply “mean independence.” The terms are completely interchangeable. Reversing the roles of (X) and (Y), we say that (X) is mean independent of (Y) if the conditional mean of (X) given (Y) is the same as the unconditional mean of (X).
Spoiler alert: it’s possible for (X) to be mean independent of (Y) while (Y) is not mean independent of (X). We’ll discuss this further below.

To better understand the idea of mean independence, let’s quickly review the difference between an unconditional mean and a conditional mean. The unconditional mean (mathbb{E}(Y)), also known as the “expected value” or “expectation” of (Y), is a constant number. If (Y) is discrete, this is simply the probability-weighted average of all possible realizations of (Y), namely
\[
\mathbb{E}(Y) = \sum_{\text{all } y} y \cdot p_Y(y).
\]

If (Y) is continuous, it’s the same idea but with an integral replacing the sum and a probability density (f_Y(y)) multiplied by (dy) replacing the probability mass function (p_Y(y)). Either way, we’re simply multiplying numbers together and adding up the result.
Despite the similarity in notation, the conditional expectation (mathbb{E}(Y|X)) is a function of (X) that tells us how the mean of (Y) varies with (X). Since (X) is a random variable, so is (mathbb{E}(Y|X)). If (Y) is conditionally mean independent of (X) then (mathbb{E}(Y|X)) equals (mathbb{E}(Y)). In words, the mean of (Y) does not vary with (X). Regardless of the value that (X) takes on, the mean of (Y) is the same: (mathbb{E}(Y)).

There’s another way to think about this property in terms of prediction. With a bit of calculus, we can show that (mathbb{E}(Y)) solves the following optimization problem:
\[
\min_{\text{all constants } c} \mathbb{E}[(Y - c)^2].
\]

In other words, (mathbb{E}(Y)) is the constant number that is as close as possible to (Y) on average, where “close” is measured by squared Euclidean distance. In this sense, we can think of (mathbb{E}(Y)) as our “best guess” of the value that (Y) will take. Again using a bit of calculus, it turns out that (mathbb{E}(Y|X)) solves the following optimization problem:
\[
\min_{\text{all functions } g} \mathbb{E}[\{Y - g(X)\}^2].
\]

(See this video for a proof.) Thus, (mathbb{E}(Y|X)) is the function of (X) that is as close as possible to (Y) on average, where “close” is again measured using squared Euclidean distance. In other words, (mathbb{E}(Y|X)) is our “best guess” of (Y) after observing (X). We have seen that (mathbb{E}(Y)) and (mathbb{E}(Y|X)) are the solutions to two related but distinct optimization problems; the former is a constant number that doesn’t depend on the realization of (X) while the latter is a function of (X). Mean independence is the special case in which the solutions to the two optimization problems coincide: (mathbb{E}(Y|X) = mathbb{E}(Y)).
Therefore,

(Y) is mean independent of (X) if our best guess of (Y) taking (X) into account is the same as our best guess of (Y) ignoring (X), where “best” means “minimizes average squared distance to (Y).”
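
To make the “best guess” interpretation concrete, here’s a small simulation sketch. It treats a sample average as a stand-in for the expectation and searches over a grid of constant guesses (c): the value that minimizes the average squared error is, up to simulation noise, the mean. (The particular distribution, seed, and grid are arbitrary choices for illustration.)

```python
import numpy as np

rng = np.random.default_rng(seed=42)
y = rng.exponential(scale=2.0, size=100_000)   # any distribution would do here; E(Y) = 2

# Average squared error of each constant guess c on a grid
c_grid = np.linspace(0.0, 5.0, 501)
avg_sq_error = np.array([np.mean((y - c) ** 2) for c in c_grid])

print(c_grid[avg_sq_error.argmin()])   # approximately 2.0 ...
print(y.mean())                        # ... the sample analogue of E(Y)
```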

Example #1: (X) is mean independent of (Y).

Using the table of joint probabilities for Example #1 above, we found that (mathbb{E}(X) = 0). To determine whether (X) is mean independent of (Y), we need to calculate (mathbb{E}(X|Y=y)), which we can accomplish as follows:
\[
\begin{aligned}
\mathbb{E}(X|Y=0) &= \sum_{\text{all } x} x \cdot \mathbb{P}(X=x|Y=0) = \sum_{\text{all } x} x \cdot \frac{\mathbb{P}(X=x, Y=0)}{\mathbb{P}(Y=0)} \\
\mathbb{E}(X|Y=1) &= \sum_{\text{all } x} x \cdot \mathbb{P}(X=x|Y=1) = \sum_{\text{all } x} x \cdot \frac{\mathbb{P}(X=x, Y=1)}{\mathbb{P}(Y=1)}.
\end{aligned}
\]

Substituting the joint and marginal probabilities from the table above, we find that
\[
\mathbb{E}(X|Y=0) = 0, \quad
\mathbb{E}(X|Y=1) = 0.
\]

Thus (mathbb{E}(X|Y=y)) simply equals zero, regardless of the realization (y) of (Y). Since (mathbb{E}(X) = 0), we’ve shown that (X) is conditionally mean independent of (Y).

Example #1: (Y) is NOT mean independent of (X).

To determine whether (Y) is mean independent of (X) we need to calculate (mathbb{E}(Y|X)).
But this is easy. From the table we see that (Y) is known with certainty once we observe (X): if (X = -1) then (Y = 0), if (X = 0) then (Y = 1), and if (X = 1) then (Y = 0). Thus, without doing any math at all, we find that
\[
\mathbb{E}(Y|X=-1) = 0, \quad
\mathbb{E}(Y|X=0) = 1, \quad
\mathbb{E}(Y|X=1) = 0.
\]

(If you don’t believe me, work through the math yourself!) This clearly depends on (X), so (Y) is not mean independent of (X).
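
The same joint pmf table lets us compute both conditional mean functions in a couple of lines. Here’s a minimal Python sketch doing exactly the calculations above: (mathbb{E}(X|Y=y)) is constant at zero, while (mathbb{E}(Y|X=x)) varies with (x).

```python
import numpy as np

# Example #1: conditional means computed directly from the joint pmf table
x_vals, y_vals = np.array([-1, 0, 1]), np.array([0, 1])
p_xy = np.array([[1/3, 0.0], [0.0, 1/3], [1/3, 0.0]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# E(X | Y = y) = sum over x of x * p_XY(x, y) / p_Y(y): constant at zero
print(x_vals @ p_xy / p_y)   # [0. 0.]     -> X is mean independent of Y

# E(Y | X = x) = sum over y of y * p_XY(x, y) / p_X(x): varies with x
print(p_xy @ y_vals / p_x)   # [0. 1. 0.]  -> Y is NOT mean independent of X
```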

Example #2: (Z) is NOT mean independent of (W).

Above we calculated that (mathbb{E}(Z) = mathbb{E}(W^2) = 1/3). But the conditional expectation is
\[
\mathbb{E}(Z|W) = \mathbb{E}(W^2|W) = W^2
\]

using the “taking out what is known” property: conditional on (W), we know (W^2) and can hence treat it as if it were a constant in an unconditional expectation, pulling it in front of the (mathbb{E}) operator. We see that (mathbb{E}(Z|W)) does not equal (1/3): its value depends on (W). Therefore (Z) is not mean independent of (W).
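
A quick simulation sketch makes this visible: if we bin a large sample of draws by the value of (W), the average of (Z) within each bin tracks (w^2) rather than sitting at (1/3). (The bin width, seed, and sample size below are arbitrary choices.)

```python
import numpy as np

rng = np.random.default_rng(seed=7)
w = rng.uniform(-1, 1, size=1_000_000)   # W ~ Uniform(-1, 1)
z = w**2                                 # Z = W^2

# Average of Z among draws with W close to a few chosen values w0
for w0 in [-0.9, -0.5, 0.0, 0.5, 0.9]:
    in_bin = np.abs(w - w0) < 0.01
    print(w0, z[in_bin].mean())   # approximately w0**2, not 1/3
```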

Example #2: (W) is mean independent of (Z).

This one is trickier. To keep this post at an elementary level, my explanation won’t be completely rigorous. For more details see here. We need to calculate (mathbb{E}(W|Z)). Since (Z equiv W^2) this is the same thing as (mathbb{E}(W|W^2)). Let’s start with an example. Suppose we observe (Z = 1). This means that (W^2 = 1), so (W) either equals (1) or (-1). How likely is each of these possible realizations of (W) given that (W^2 = 1)? Because the density of (W) is symmetric about zero, (f_W(-1) = f_W(1)). So given that (W^2 = 1), it’s just as likely that (W = 1) as it is that (W = -1). Therefore,
\[
\mathbb{E}(W|W^2 = 1) = 0.5 \times 1 + 0.5 \times (-1) = 0.
\]

Generalizing this idea, if we observe (Z = z) then (W = sqrt{z}) or (-sqrt{z}). But since (f_W(cdot)) is symmetric about zero, these possibilities are equally likely. Therefore,
\[
\mathbb{E}(W|Z=z) = 0.5 \times \sqrt{z} - 0.5 \times \sqrt{z} = 0.
\]

Above we calculated that (mathbb{E}(W) = 0). Therefore, (W) is mean independent of (Z).
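
And here’s the mirror-image simulation check: conditioning on (Z) instead, the average of (W) within each bin hovers around zero, matching (mathbb{E}(W|Z) = 0) up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
w = rng.uniform(-1, 1, size=1_000_000)   # W ~ Uniform(-1, 1)
z = w**2                                 # Z = W^2

# Average of W among draws with Z close to a few chosen values z0
for z0 in [0.1, 0.3, 0.5, 0.7, 0.9]:
    in_bin = np.abs(z - z0) < 0.01
    print(z0, w[in_bin].mean())   # approximately 0 for every z0
```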

Statistical Independence

When you see the word “independent” without any qualification, this means “statistically independent.” In line with this usage, I will often write “independent” rather than “statistically independent.” Whichever terminology you prefer, there are three equivalent ways of defining this idea:

(X) and (Y) are statistically independent if and only if:

  1. their joint distribution equals the product of their marginals, or
  2. the conditional distribution of (Y|X) equals the unconditional distribution of (Y), or
  3. the conditional distribution of (X|Y) equals the unconditional distribution of (X).

The link between these three alternatives is the definition of conditional probability. Suppose that (X) and (Y) are discrete random variables with joint pmf (p_{XY}), marginal pmfs (p_X) and (p_Y), and conditional pmfs (p_{X|Y}) and (p_{Y|X}). Version 1 requires that (p_{XY}(x,y) = p_X(x) p_Y(y)) for all realizations ((x,y)). But by the definition of conditional probability,
\[
p_{X|Y}(x|y) \equiv \frac{p_{XY}(x,y)}{p_Y(y)}, \quad
p_{Y|X}(y|x) \equiv \frac{p_{XY}(x,y)}{p_X(x)}.
\]

If (p_{XY} = p_X p_Y), these expressions simplify to
\[
p_{X|Y}(x|y) = \frac{p_{X}(x)\, p_Y(y)}{p_Y(y)} = p_X(x), \quad
p_{Y|X}(y|x) = \frac{p_{X}(x)\, p_Y(y)}{p_X(x)} = p_Y(y)
\]

so 1 implies 2 and 3. Similarly, if (p_{X|Y} = p_X) then by the definition of conditional probability
\[
p_{X|Y}(x|y) \equiv \frac{p_{XY}(x,y)}{p_Y(y)} = p_X(x).
\]

Rearranging, this shows that (p_{XY} = p_X p_Y), so 3 implies 1. An almost identical argument shows that 2 implies 1, completing our proof that these three seemingly different definitions of statistical independence are equivalent.
If (X) and (Y) are continuous, the idea is the same but with densities replacing probability mass functions, e.g. (f_{XY}(x,y) = f_X(x) f_Y(y)) and so on.

In most examples, it’s easier to show independence (or the lack thereof) using 2 or 3 rather than 1. These latter two definitions are also more intuitively appealing. To say that the conditional distribution of (X|Y) is the same as the unconditional distribution of (X) is the same thing as saying that

(Y) provides absolutely no information about (X) at all.

If learning (Y) tells us anything at all about (X), then (X) and (Y) are not independent. Similarly, if (X) tells us anything at all about (Y), then (X) and (Y) are not independent.

Example #1: (X) and (Y) are NOT independent.

If I tell you that (X = -1), then you know for sure that (Y = 0). Before I told you this, you didn’t know that (Y) would equal zero: it’s a random variable with support set ({0,1}). Since learning (X) has the potential to tell us something about (Y), (X) and (Y) are not independent. That was easy! For extra credit, (p_{XY}(-1,0) = 1/3) but (p_X(-1)p_Y(0) = 1/3 times 2/3 = 2/9). Since these are not equal, (p_{XY} neq p_X p_Y), so the joint does not equal the product of the marginals. We didn’t need to check this, but it’s reassuring to see that everything works out as it should.
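
For the extra credit, here’s a tiny sketch that compares every entry of the joint pmf to the product of the corresponding marginals; the mismatch in the ((x,y) = (-1, 0)) cell is exactly the (1/3) versus (2/9) discrepancy noted above.

```python
import numpy as np

x_vals, y_vals = np.array([-1, 0, 1]), np.array([0, 1])
p_xy = np.array([[1/3, 0.0], [0.0, 1/3], [1/3, 0.0]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

product_of_marginals = np.outer(p_x, p_y)        # the joint pmf we would see under independence
print(np.allclose(p_xy, product_of_marginals))   # False -> X and Y are not independent
print(p_xy[0, 0], product_of_marginals[0, 0])    # 0.333... 0.222...  (1/3 versus 2/9)
```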

Example #2: (W) and (Z) are NOT independent.

Again, this one is easy: learning that (W = w) tells us that (Z = w^2). We didn’t know this beforehand, so (W) and (Z) can’t be independent.

Relating the Three Properties

Now that we’ve described uncorrelatedness, mean independence, and statistical independence, we’re ready to see how these properties relate to one another. Let’s start by reviewing what we learned from the examples given above. In Example #1:

  • (X) and (Y) are uncorrelated
  • (X) is mean independent of (Y)
  • (Y) is not mean independent of (X)
  • (X) and (Y) are not independent.

In Example #2, we found that

  • (W) and (Z) are uncorrelated
  • (W) is mean independent of (Z).
  • (Z) is not mean independent of (W).
  • (W) and (Z) are not independent.

These are worth remembering, because they’re relatively simple and provide a source of counterexamples to help you avoid making tempting but incorrect statements about correlation, mean independence, and statistical independence. For example:

  1. Uncorrelatedness does NOT IMPLY statistical independence: (X) and (Y) are not independent, but they are uncorrelated. (Ditto for (W) and (Z).)
  2. Mean independence does NOT IMPLY statistical independence: (W) is mean independent of (Z) but these random variables are not independent.
  3. Mean independence is NOT SYMMETRIC: (X) is mean independent of (Y), but (Y) is not mean independent of (X).

Now that we have a handle on what’s not true, let’s see what can be said about correlation, mean independence, and statistical independence.

Statistical Independence Implies Conditional Mean Independence

Statistical independence is the “strongest” of the three properties: it implies both mean independence and uncorrelatedness. We’ll show this in two steps. In the first step, we’ll show that statistical independence implies mean independence. In the second step we’ll show that mean independence implies uncorrelatedness. Then we’ll bring this overly-long blog post to a close! Suppose that (X) and (Y) are discrete random variables. (For the continuous case, replace sums with integrals.) If (X) is statistically independent of (Y), then (p_{Y|X} = p_Y) and (p_{X|Y} = p_X). Therefore,
\[
\begin{aligned}
\mathbb{E}(Y|X=x) &\equiv \sum_{\text{all } y} y \cdot p_{Y|X}(y|x) = \sum_{\text{all } y} y \cdot p_Y(y) \equiv \mathbb{E}(Y) \\
\mathbb{E}(X|Y=y) &\equiv \sum_{\text{all } x} x \cdot p_{X|Y}(x|y) = \sum_{\text{all } x} x \cdot p_X(x) \equiv \mathbb{E}(X)
\end{aligned}
\]

so (Y) is mean independent of (X) and (X) is mean independent of (Y). That completes the first step. The second step follows from the law of iterated expectations: if (Y) is mean independent of (X), then (mathbb{E}(XY) = mathbb{E}[X cdot mathbb{E}(Y|X)] = mathbb{E}[X cdot mathbb{E}(Y)] = mathbb{E}(X)mathbb{E}(Y)), which is precisely the condition for (X) and (Y) to be uncorrelated.
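
As a final numerical illustration, the sketch below builds a joint pmf that factors as (p_X p_Y) by construction (the particular marginals are arbitrary) and confirms that each conditional mean equals the corresponding unconditional mean, and that the covariance is zero.

```python
import numpy as np

# A joint pmf that factors as p_X * p_Y by construction (marginals chosen arbitrarily)
x_vals, y_vals = np.array([-1, 0, 1]), np.array([0, 1])
p_x, p_y = np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.4])
p_xy = np.outer(p_x, p_y)

print(p_xy @ y_vals / p_x, y_vals @ p_y)   # E(Y|X=x) for each x, then E(Y): all approximately 0.4
print(x_vals @ p_xy / p_y, x_vals @ p_x)   # E(X|Y=y) for each y, then E(X): all approximately 0.1

E_XY = np.sum(np.outer(x_vals, y_vals) * p_xy)
print(E_XY - (x_vals @ p_x) * (y_vals @ p_y))   # 0 up to floating point -> uncorrelated as well
```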

Summary

In this post we have shown that:

  • Statistical Independence (implies) Mean Independence (implies) Uncorrelatedness.
  • Uncorrelatedness does not imply mean independence or statistical independence.
  • Mean independence does not imply statistical independence.
  • Statistical independence and correlation are symmetric; mean independence is not.

Reading the figure from the very beginning of this post from top to bottom: statistical independence is the strongest notion, followed by mean independence, followed by uncorrelatedness.
