Law of total expectation

The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), Adam's law,[3] the tower rule,[4] and the smoothing theorem,[5] among other names, states that if $X$ is a random variable whose expected value $\operatorname{E}(X)$ is defined, and $Y$ is any random variable on the same probability space, then

$$\operatorname{E}(X) = \operatorname{E}(\operatorname{E}(X \mid Y)),$$

i.e., the expected value of the conditional expected value of $X$ given $Y$ is the same as the expected value of $X$.

Note: The conditional expected value $\operatorname{E}(X \mid Y)$, with $Y$ a random variable, is not a simple number; it is a random variable whose value depends on the value of $Y$. That is, the conditional expected value of $X$ given the event $Y = y$ is a number, and it is a function of $y$. If we write $g(y)$ for the value of $\operatorname{E}(X \mid Y = y)$, then the random variable $\operatorname{E}(X \mid Y)$ is $g(Y)$.
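To make this concrete, the following minimal sketch builds $g(y) = \operatorname{E}(X \mid Y = y)$ from a small discrete joint distribution and checks that averaging $g(Y)$ over the distribution of $Y$ recovers $\operatorname{E}(X)$. The joint pmf values here are arbitrary illustrative assumptions, not part of the statement above:

    # An illustrative joint pmf p(x, y) on {0, 1} x {0, 1} (values arbitrary).
    p = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

    def p_y(y):
        """Marginal pmf of Y: sum the joint pmf over x."""
        return sum(p[(xv, y)] for xv in (0, 1))

    def g(y):
        """g(y) = E(X | Y = y): mean of X under the conditional pmf."""
        return sum(xv * p[(xv, y)] for xv in (0, 1)) / p_y(y)

    # E(E(X | Y)) = E(g(Y)) versus the direct E(X); both equal 0.6
    # (up to floating-point rounding).
    e_iterated = sum(g(yv) * p_y(yv) for yv in (0, 1))
    e_direct = sum(xv * p[(xv, yv)] for (xv, yv) in p)
    print(e_iterated, e_direct)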

One special case states that if $\{A_i\}$ is a finite or countable partition of the sample space, then

$$\operatorname{E}(X) = \sum_i \operatorname{E}(X \mid A_i) \operatorname{P}(A_i).$$

Example

Suppose that only two factories supply light bulbs to the market. Factory $X$'s bulbs work for an average of 5000 hours, whereas factory $Y$'s bulbs work for an average of 4000 hours. It is known that factory $X$ supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work?

Applying the law of total expectation, we have:

$$\begin{aligned}
\operatorname{E}(L) &= \operatorname{E}(L \mid X) \operatorname{P}(X) + \operatorname{E}(L \mid Y) \operatorname{P}(Y) \\
&= 5000(0.6) + 4000(0.4) \\
&= 4600
\end{aligned}$$

where

  • $\operatorname{E}(L)$ is the expected life of the bulb;
  • $\operatorname{P}(X) = \tfrac{6}{10}$ is the probability that the purchased bulb was manufactured by factory $X$;
  • $\operatorname{P}(Y) = \tfrac{4}{10}$ is the probability that the purchased bulb was manufactured by factory $Y$;
  • $\operatorname{E}(L \mid X) = 5000$ is the expected lifetime of a bulb manufactured by $X$;
  • $\operatorname{E}(L \mid Y) = 4000$ is the expected lifetime of a bulb manufactured by $Y$.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
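This can be checked by simulation. The sketch below is a minimal Python/NumPy example; the exponential shape of the lifetime distribution is an assumption made purely for illustration, since the law of total expectation only uses the two conditional means:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    # Which factory made each bulb: True for X (probability 0.6).
    from_x = rng.random(n) < 0.6

    # Lifetimes: exponential with the stated conditional means of
    # 5000 and 4000 hours (the distribution shape is an assumption;
    # only the means enter the law of total expectation).
    life = np.where(from_x,
                    rng.exponential(5000, n),
                    rng.exponential(4000, n))

    print(life.mean())  # approximately 4600 = 5000(0.6) + 4000(0.4)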

Informal proof

When a joint probability density function is well defined and the expectations are integrable, we write for the general case

$$\begin{aligned}
\operatorname{E}(X) &= \int x \Pr[X = x] \, dx \\
\operatorname{E}(X \mid Y = y) &= \int x \Pr[X = x \mid Y = y] \, dx \\
\operatorname{E}(\operatorname{E}(X \mid Y)) &= \int \left( \int x \Pr[X = x \mid Y = y] \, dx \right) \Pr[Y = y] \, dy \\
&= \int \int x \Pr[X = x, Y = y] \, dx \, dy \\
&= \int x \left( \int \Pr[X = x, Y = y] \, dy \right) dx \\
&= \int x \Pr[X = x] \, dx \\
&= \operatorname{E}(X).
\end{aligned}$$
A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable Y be the function of the sample space that assigns a cell's label to each point in that cell.
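As a concrete check of the calculation above, the following sketch carries out the same integrals symbolically with SymPy, for an arbitrarily chosen illustrative joint density $f(x, y) = x + y$ on the unit square:

    import sympy as sp

    x, y = sp.symbols('x y', nonnegative=True)

    # An illustrative joint density on [0, 1]^2; it integrates to 1.
    f = x + y

    # Marginal density of Y, then E(X | Y = y) as a function of y.
    f_marg_y = sp.integrate(f, (x, 0, 1))                   # = y + 1/2
    e_x_given_y = sp.integrate(x * f / f_marg_y, (x, 0, 1))

    # E(E(X | Y)): integrate the conditional expectation against f_Y.
    iterated = sp.integrate(e_x_given_y * f_marg_y, (y, 0, 1))

    # Direct computation of E(X) from the joint density.
    direct = sp.integrate(x * f, (x, 0, 1), (y, 0, 1))

    print(sp.simplify(iterated), sp.simplify(direct))  # both 7/12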

Proof in the general case

Let $(\Omega, \mathcal{F}, \operatorname{P})$ be a probability space on which two sub-σ-algebras $\mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F}$ are defined. For a random variable $X$ on such a space, the smoothing law states that if $\operatorname{E}[X]$ is defined, i.e. $\min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty$, then

$$\operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = \operatorname{E}[X \mid \mathcal{G}_1] \quad \text{(a.s.)}.$$

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

  • $\operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1]$ is $\mathcal{G}_1$-measurable;
  • $\int_{G_1} \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}$ for all $G_1 \in \mathcal{G}_1$.

The first of these properties holds by definition of the conditional expectation. To prove the second one,

$$\begin{aligned}
\min\left( \int_{G_1} X_+ \, d\operatorname{P}, \int_{G_1} X_- \, d\operatorname{P} \right) &\leq \min\left( \int_\Omega X_+ \, d\operatorname{P}, \int_\Omega X_- \, d\operatorname{P} \right) \\
&= \min(\operatorname{E}[X_+], \operatorname{E}[X_-]) < \infty,
\end{aligned}$$

so the integral $\int_{G_1} X \, d\operatorname{P}$ is defined (i.e., it is not of the indeterminate form $\infty - \infty$).

The second property thus holds, since $G_1 \in \mathcal{G}_1 \subseteq \mathcal{G}_2$ implies

$$\int_{G_1} \operatorname{E}[\operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} \operatorname{E}[X \mid \mathcal{G}_2] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}.$$

Corollary. In the special case when $\mathcal{G}_1 = \{\emptyset, \Omega\}$ and $\mathcal{G}_2 = \sigma(Y)$, the smoothing law reduces to

$$\operatorname{E}[\operatorname{E}[X \mid Y]] = \operatorname{E}[X].$$
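The nested form of the smoothing law can be observed exactly with empirical (sample) averages, since sample means over the cells of a finer partition nest inside the cells of a coarser one. The sketch below uses an illustrative data-generating process (all names and distributions are assumptions) with $\mathcal{G}_1 = \sigma(Y)$ and $\mathcal{G}_2 = \sigma(Y, Z)$, and compares $\operatorname{E}[\operatorname{E}[X \mid Y, Z] \mid Y]$ with $\operatorname{E}[X \mid Y]$:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    # Illustrative discrete conditioning variables and a response X.
    y = rng.integers(0, 2, n)   # Y generates the coarser G1 = sigma(Y)
    z = rng.integers(0, 3, n)   # (Y, Z) generates the finer G2
    x = y + z + rng.normal(0, 1, n)

    # E[X | Y, Z] as a random variable: the mean of X in each (Y, Z) cell.
    e_x_g2 = np.empty(n)
    for yv in range(2):
        for zv in range(3):
            cell = (y == yv) & (z == zv)
            e_x_g2[cell] = x[cell].mean()

    # Averaging E[X | Y, Z] within each Y cell reproduces E[X | Y] exactly
    # (up to floating-point rounding), and the overall mean is E[X].
    for yv in range(2):
        cell = (y == yv)
        print(e_x_g2[cell].mean(), x[cell].mean())
    print(e_x_g2.mean(), x.mean())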

Alternative proof for $\operatorname{E}[\operatorname{E}[X \mid Y]] = \operatorname{E}[X]$

This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, $\operatorname{E}[X \mid Y] := \operatorname{E}[X \mid \sigma(Y)]$ is a $\sigma(Y)$-measurable random variable that satisfies

$$\int_A \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_A X \, d\operatorname{P},$$

for every measurable set $A \in \sigma(Y)$. Taking $A = \Omega$ proves the claim.

References

  1. ^ Weiss, Neil A. (2005). A Course in Probability. Boston: Addison–Wesley. pp. 380–383. ISBN 0-321-18954-X.
  2. ^ "Law of Iterated Expectation | Brilliant Math & Science Wiki". brilliant.org. Retrieved 2018-03-28.
  3. ^ "Adam's and Eve's Laws". Retrieved 2022-04-19.
  4. ^ Rhee, Chang-han (Sep 20, 2011). "Probability and Statistics" (PDF).
  5. ^ Wolpert, Robert (November 18, 2010). "Conditional Expectation" (PDF).
  • Billingsley, Patrick (1995). Probability and Measure. New York: John Wiley & Sons. ISBN 0-471-00710-2. (Theorem 34.4)
  • Sims, Christopher. "Notes on Random Variables, Expectations, Probability Densities, and Martingales", especially equations (16) through (18).