Normal-Wishart distribution

Normal-Wishart
Notation $(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu)$
Parameters $\boldsymbol{\mu}_0 \in \mathbb{R}^D$ location (vector of reals)
$\lambda > 0$ (real)
$\mathbf{W} \in \mathbb{R}^{D \times D}$ scale matrix (pos. def.)
$\nu > D - 1$ (real)
Support $\boldsymbol{\mu} \in \mathbb{R}^D$; $\boldsymbol{\Lambda} \in \mathbb{R}^{D \times D}$ precision matrix (pos. def.)
PDF $f(\boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu) = \mathcal{N}(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})\ \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$

In probability theory and statistics, the normal-Wishart distribution (or Gaussian-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and precision matrix (the inverse of the covariance matrix).[1]

Definition

Suppose

$$\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, \lambda, \boldsymbol{\Lambda} \sim \mathcal{N}(\boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})$$

has a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance matrix $(\lambda \boldsymbol{\Lambda})^{-1}$, where

$$\boldsymbol{\Lambda} \mid \mathbf{W}, \nu \sim \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$$

has a Wishart distribution. Then $(\boldsymbol{\mu}, \boldsymbol{\Lambda})$ has a normal-Wishart distribution, denoted as

$$(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu).$$

Characterization

Probability density function

$$f(\boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu) = \mathcal{N}(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})\ \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$$
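
Written out in full, the product of the two factors can be sketched as follows. This expansion is not in the cited sources; it assumes the standard densities of the multivariate normal and Wishart distributions, with $\Gamma_D$ denoting the multivariate gamma function:

$$f(\boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu) = \frac{\lambda^{D/2}\, |\boldsymbol{\Lambda}|^{(\nu - D)/2}}{(2\pi)^{D/2}\, 2^{\nu D/2}\, |\mathbf{W}|^{\nu/2}\, \Gamma_D(\nu/2)} \exp\left\{ -\frac{\lambda}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_0)^T \boldsymbol{\Lambda} (\boldsymbol{\mu} - \boldsymbol{\mu}_0) - \frac{1}{2} \operatorname{tr}(\mathbf{W}^{-1} \boldsymbol{\Lambda}) \right\}$$

The exponent $(\nu - D)/2$ on $|\boldsymbol{\Lambda}|$ combines the factor $|\boldsymbol{\Lambda}|^{1/2}$ from the normal density with $|\boldsymbol{\Lambda}|^{(\nu - D - 1)/2}$ from the Wishart density.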

Properties

Scaling

Marginal distributions

By construction, the marginal distribution over $\boldsymbol{\Lambda}$ is a Wishart distribution, and the conditional distribution over $\boldsymbol{\mu}$ given $\boldsymbol{\Lambda}$ is a multivariate normal distribution. The marginal distribution over $\boldsymbol{\mu}$ is a multivariate t-distribution.
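
For concreteness, the marginal over $\boldsymbol{\mu}$ can be written explicitly. The parameterization below is a sketch of the usual result obtained by integrating $\boldsymbol{\Lambda}$ out of the joint density; it is not stated in the cited sources, and the notation $t_k(\cdot, \cdot)$ for a multivariate t-distribution with $k$ degrees of freedom, a location vector, and a scale matrix is an assumption made here:

$$\boldsymbol{\mu} \sim t_{\nu - D + 1}\left( \boldsymbol{\mu}_0,\ \frac{\mathbf{W}^{-1}}{\lambda (\nu - D + 1)} \right)$$

For $D = 1$ this reduces to the Student's t marginal of the normal-gamma distribution, with $\nu$ degrees of freedom and squared scale $1 / (\lambda \nu W)$.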

Posterior distribution of the parameters

After making $n$ observations $\boldsymbol{x}_1, \dots, \boldsymbol{x}_n$, the posterior distribution of the parameters is

$$(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_n, \lambda_n, \mathbf{W}_n, \nu_n),$$

where

$$\lambda_n = \lambda + n,$$
$$\boldsymbol{\mu}_n = \frac{\lambda \boldsymbol{\mu}_0 + n \bar{\boldsymbol{x}}}{\lambda + n},$$
$$\nu_n = \nu + n,$$
$$\mathbf{W}_n^{-1} = \mathbf{W}^{-1} + \sum_{i=1}^{n} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})(\boldsymbol{x}_i - \bar{\boldsymbol{x}})^T + \frac{n \lambda}{n + \lambda} (\bar{\boldsymbol{x}} - \boldsymbol{\mu}_0)(\bar{\boldsymbol{x}} - \boldsymbol{\mu}_0)^T.$$ [2]
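
The update equations above translate directly into a few lines of array code. The following is a minimal sketch using NumPy; the function name `normal_wishart_posterior` and its argument layout (observations as rows of an `(n, D)` array) are illustrative assumptions, not part of any library.

```python
import numpy as np

def normal_wishart_posterior(X, mu0, lam, W, nu):
    """Return (mu_n, lam_n, W_n, nu_n) given observations X of shape (n, D).

    Hypothetical helper: follows the posterior update equations term by term.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mu0 = np.asarray(mu0, dtype=float)
    W = np.asarray(W, dtype=float)

    n = X.shape[0]
    xbar = X.mean(axis=0)                 # sample mean
    centered = X - xbar
    scatter = centered.T @ centered       # sum_i (x_i - xbar)(x_i - xbar)^T
    diff = xbar - mu0

    lam_n = lam + n
    nu_n = nu + n
    mu_n = (lam * mu0 + n * xbar) / lam_n
    W_n_inv = (np.linalg.inv(W) + scatter
               + (n * lam / (n + lam)) * np.outer(diff, diff))
    W_n = np.linalg.inv(W_n_inv)
    return mu_n, lam_n, W_n, nu_n

# Usage: two observations in D = 2 with a weak prior centred at the origin.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
mu_n, lam_n, W_n, nu_n = normal_wishart_posterior(
    X, mu0=np.zeros(2), lam=1.0, W=np.eye(2), nu=3.0)
```

The explicit matrix inversions keep the sketch close to the formulas; for large $D$ or ill-conditioned matrices, working with Cholesky factors would likely be preferable.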

Generating normal-Wishart random variates

Generation of random variates is straightforward:

  1. Sample $\boldsymbol{\Lambda}$ from a Wishart distribution with parameters $\mathbf{W}$ and $\nu$
  2. Sample $\boldsymbol{\mu}$ from a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance matrix $(\lambda \boldsymbol{\Lambda})^{-1}$
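
The two steps above can be sketched in Python with NumPy and SciPy. The function name `sample_normal_wishart` and the `rng` argument are illustrative assumptions; `scipy.stats.wishart` is parameterized by degrees of freedom `df` and scale matrix `scale`, which are taken here to correspond to $\nu$ and $\mathbf{W}$.

```python
import numpy as np
from scipy.stats import wishart

def sample_normal_wishart(mu0, lam, W, nu, rng=None):
    """Draw one (mu, Lambda) pair from NW(mu0, lam, W, nu).

    Hypothetical helper: rng may be None, an integer seed, or a
    numpy.random.Generator.
    """
    rng = np.random.default_rng(rng)
    mu0 = np.asarray(mu0, dtype=float)

    # Step 1: precision matrix Lambda ~ Wishart(W, nu).
    Lambda = np.atleast_2d(wishart.rvs(df=nu, scale=W, random_state=rng))

    # Step 2: mean mu ~ Normal(mu0, (lam * Lambda)^{-1}).
    cov = np.linalg.inv(lam * Lambda)
    mu = rng.multivariate_normal(mu0, cov)
    return mu, Lambda

# Usage: one draw in D = 2 with a fixed seed for reproducibility.
mu, Lambda = sample_normal_wishart(np.zeros(2), lam=2.0, W=np.eye(2), nu=4.0, rng=0)
```

Drawing $\boldsymbol{\Lambda}$ first matters because the covariance of $\boldsymbol{\mu}$ depends on it; the sketch inverts $\lambda \boldsymbol{\Lambda}$ explicitly for readability.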

Related distributions

Notes

  1. ^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media. Page 690.
  2. ^ Cross Validated, https://stats.stackexchange.com/q/324925

References

  • Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.