Normal-Wishart distribution

Normal-Wishart
Notation	$({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )$
Parameters	${\boldsymbol {\mu }}_{0}\in \mathbb {R} ^{D}\,$ location (vector of real) $\lambda >0\,$ (real) $\mathbf {W} \in \mathbb {R} ^{D\times D}$ scale matrix (pos. def.) $\nu >D-1\,$ (real)
Support	${\boldsymbol {\mu }}\in \mathbb {R} ^{D};{\boldsymbol {\Lambda }}\in \mathbb {R} ^{D\times D}$ precision matrix (pos. def.)
PDF	$f({\boldsymbol {\mu }},{\boldsymbol {\Lambda }}|{\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )$

In probability theory and statistics, the normal-Wishart distribution (or Gaussian-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and precision matrix (the inverse of the covariance matrix).^[1]

Definition

Suppose

{\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Lambda }}\sim {\mathcal {N}}({\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})

has a multivariate normal distribution with mean ${\boldsymbol {\mu }}_{0}$ and covariance matrix $(\lambda {\boldsymbol {\Lambda }})^{-1}$ , where

{\boldsymbol {\Lambda }}|\mathbf {W} ,\nu \sim {\mathcal {W}}(\mathbf {W} ,\nu )

has a Wishart distribution. Then $({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})$ has a normal-Wishart distribution, denoted as

({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu ).

Characterization

Probability density function

f({\boldsymbol {\mu }},{\boldsymbol {\Lambda }}|{\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )
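Expanding the two factors gives the log-density in closed form. The sketch below is a minimal NumPy version, assuming the convention in which ${\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )$ has mean $\nu \mathbf {W}$ (so $\mathbf {W}$ is the scale matrix). The function name `normal_wishart_logpdf` is hypothetical and not part of any library.

```python
import math

import numpy as np


def normal_wishart_logpdf(mu, Lam, mu0, lam, W, nu):
    """Hypothetical helper: log N(mu | mu0, (lam*Lam)^-1) + log W(Lam | W, nu).

    Assumes the Wishart convention E[Lam] = nu * W.
    """
    mu, mu0 = np.asarray(mu, float), np.asarray(mu0, float)
    Lam, W = np.asarray(Lam, float), np.asarray(W, float)
    D = mu.shape[0]
    _, logdet_Lam = np.linalg.slogdet(Lam)
    _, logdet_W = np.linalg.slogdet(W)

    # Gaussian factor with precision lam * Lam, using log|lam*Lam| = D*log(lam) + log|Lam|.
    diff = mu - mu0
    log_normal = (
        -0.5 * D * math.log(2 * math.pi)
        + 0.5 * (D * math.log(lam) + logdet_Lam)
        - 0.5 * lam * (diff @ Lam @ diff)
    )

    # Log of the multivariate gamma function Gamma_D(nu / 2).
    log_mv_gamma = 0.25 * D * (D - 1) * math.log(math.pi) + sum(
        math.lgamma(0.5 * nu + 0.5 * (1 - j)) for j in range(1, D + 1)
    )
    # Wishart factor: |Lam|^((nu-D-1)/2) exp(-tr(W^-1 Lam)/2) / (2^(nu D/2) |W|^(nu/2) Gamma_D(nu/2)).
    log_wishart = (
        0.5 * (nu - D - 1) * logdet_Lam
        - 0.5 * np.trace(np.linalg.solve(W, Lam))
        - 0.5 * nu * D * math.log(2)
        - 0.5 * nu * logdet_W
        - log_mv_gamma
    )
    return float(log_normal + log_wishart)
```

In one dimension this reduces to the normal-gamma log-density with shape $\nu /2$ and rate $1/(2W)$, which makes a convenient sanity check.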

Properties

Marginal distributions

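By construction, the marginal distribution over ${\boldsymbol {\Lambda }}$ is the Wishart distribution ${\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )$, and the conditional distribution over ${\boldsymbol {\mu }}$ given ${\boldsymbol {\Lambda }}$ is the multivariate normal distribution ${\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})$. Integrating out ${\boldsymbol {\Lambda }}$, the marginal distribution over ${\boldsymbol {\mu }}$ is a multivariate t-distribution with $\nu -D+1$ degrees of freedom, location ${\boldsymbol {\mu }}_{0}$ and scale matrix $(\lambda (\nu -D+1)\mathbf {W} )^{-1}$.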
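A minimal sketch of the marginal log-density of ${\boldsymbol {\mu }}$, assuming the Wishart convention with mean $\nu \mathbf {W}$, under which the t-distribution has $\nu -D+1$ degrees of freedom, location ${\boldsymbol {\mu }}_{0}$ and inverse scale matrix $\lambda (\nu -D+1)\mathbf {W}$. The function name `marginal_mu_logpdf` is hypothetical.

```python
import math

import numpy as np


def marginal_mu_logpdf(mu, mu0, lam, W, nu):
    """Hypothetical helper: log-density of mu with Lam integrated out of NW(mu0, lam, W, nu).

    Multivariate t with df = nu - D + 1, location mu0, scale matrix (lam * df * W)^-1.
    """
    mu, mu0, W = (np.asarray(a, float) for a in (mu, mu0, W))
    D = mu0.shape[0]
    df = nu - D + 1
    prec = lam * df * W                 # inverse of the t scale matrix
    _, logdet_prec = np.linalg.slogdet(prec)
    diff = mu - mu0
    delta = diff @ prec @ diff          # squared Mahalanobis distance from mu0
    return float(
        math.lgamma(0.5 * (df + D))
        - math.lgamma(0.5 * df)
        - 0.5 * D * math.log(df * math.pi)
        + 0.5 * logdet_prec             # equals -0.5 * log|scale matrix|
        - 0.5 * (df + D) * math.log1p(delta / df)
    )
```

For $D=1$ this is a Student's t-distribution with $\nu$ degrees of freedom and squared scale $1/(\nu \lambda W)$.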

Posterior distribution of the parameters

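After observing $n$ independent samples ${\boldsymbol {x}}_{1},\dots ,{\boldsymbol {x}}_{n}$ from a multivariate normal distribution with mean ${\boldsymbol {\mu }}$ and precision matrix ${\boldsymbol {\Lambda }}$, the posterior distribution of the parameters is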

({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{n},\lambda _{n},\mathbf {W} _{n},\nu _{n}),

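where ${\boldsymbol {\bar {x}}}={\frac {1}{n}}\sum _{i=1}^{n}{\boldsymbol {x}}_{i}$ is the sample mean and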

\lambda _{n}=\lambda +n,

{\boldsymbol {\mu }}_{n}={\frac {\lambda {\boldsymbol {\mu }}_{0}+n{\boldsymbol {\bar {x}}}}{\lambda +n}},

\nu _{n}=\nu +n,

\mathbf {W} _{n}^{-1}=\mathbf {W} ^{-1}+\sum _{i=1}^{n}({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})^{T}+{\frac {n\lambda }{n+\lambda }}({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu }}_{0})({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu }}_{0})^{T}.

^[2]
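The update formulas translate directly into a few lines of NumPy. This is a minimal sketch, assuming the observations are the rows of an array `X` of shape `(n, D)`; the function name `normal_wishart_posterior` is hypothetical.

```python
import numpy as np


def normal_wishart_posterior(X, mu0, lam, W, nu):
    """Hypothetical helper: return (mu_n, lam_n, W_n, nu_n) after observing the rows of X."""
    X = np.atleast_2d(np.asarray(X, float))
    mu0, W = np.asarray(mu0, float), np.asarray(W, float)
    n = X.shape[0]
    xbar = X.mean(axis=0)                       # sample mean
    centered = X - xbar
    S = centered.T @ centered                   # scatter matrix about the sample mean
    d = xbar - mu0
    lam_n = lam + n
    nu_n = nu + n
    mu_n = (lam * mu0 + n * xbar) / lam_n
    W_n_inv = np.linalg.inv(W) + S + (n * lam / lam_n) * np.outer(d, d)
    return mu_n, lam_n, np.linalg.inv(W_n_inv), nu_n
```

Because the prior is conjugate, feeding the data one observation at a time (using each posterior as the next prior) gives the same result as a single batch update.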

Generating normal-Wishart random variates

Random variates can be generated by sampling in the order of the definition:

Sample ${\boldsymbol {\Lambda }}$ from a Wishart distribution with parameters $\mathbf {W}$ and $\nu$
Sample ${\boldsymbol {\mu }}$ from a multivariate normal distribution with mean ${\boldsymbol {\mu }}_{0}$ and covariance matrix $(\lambda {\boldsymbol {\Lambda }})^{-1}$
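The two steps can be sketched with NumPy's `Generator`. For the Wishart step this sketch uses the Bartlett decomposition (one of several valid methods, chosen here to avoid depending on SciPy), again assuming the convention with mean $\nu \mathbf {W}$. The function names are hypothetical.

```python
import numpy as np


def sample_wishart(W, nu, rng):
    """Hypothetical helper: draw Lam ~ Wishart(W, nu), E[Lam] = nu * W, via the Bartlett decomposition."""
    D = W.shape[0]
    L = np.linalg.cholesky(W)
    A = np.zeros((D, D))
    for i in range(D):
        A[i, i] = np.sqrt(rng.chisquare(nu - i))  # chi-squared with nu - i dof (0-based i)
        A[i, :i] = rng.standard_normal(i)         # standard normals below the diagonal
    LA = L @ A
    return LA @ LA.T


def sample_normal_wishart(mu0, lam, W, nu, rng):
    """Hypothetical helper: draw (mu, Lam) ~ NW(mu0, lam, W, nu) by ancestral sampling."""
    mu0, W = np.asarray(mu0, float), np.asarray(W, float)
    Lam = sample_wishart(W, nu, rng)                 # step 1
    C = np.linalg.cholesky(lam * Lam)                # lam * Lam = C C^T
    z = rng.standard_normal(mu0.shape[0])
    mu = mu0 + np.linalg.solve(C.T, z)               # step 2: covariance (lam * Lam)^-1
    return mu, Lam
```

Solving against the Cholesky factor of $\lambda {\boldsymbol {\Lambda }}$ avoids forming the covariance matrix $(\lambda {\boldsymbol {\Lambda }})^{-1}$ explicitly.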

Related distributions

The normal-inverse Wishart distribution is the same family of distributions, parameterized by the covariance matrix ${\boldsymbol {\Lambda }}^{-1}$ rather than the precision matrix ${\boldsymbol {\Lambda }}$.
The normal-gamma distribution is the one-dimensional equivalent.
The multivariate normal distribution and Wishart distribution are the component distributions out of which this distribution is made.
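The first two correspondences can be written as explicit parameter maps. This assumes the Wishart convention with mean $\nu \mathbf {W}$, the usual inverse-Wishart parameterization by a scale matrix, and the normal-gamma distribution with shape $\alpha$ and rate $\beta$:

```latex
\begin{aligned}
&({\boldsymbol\mu},{\boldsymbol\Lambda}) \sim \mathrm{NW}({\boldsymbol\mu}_0,\lambda,\mathbf{W},\nu)
  &&\iff ({\boldsymbol\mu},{\boldsymbol\Lambda}^{-1}) \sim \mathrm{NIW}({\boldsymbol\mu}_0,\lambda,\mathbf{W}^{-1},\nu),\\
&(\mu,\tau) \sim \mathrm{NW}(\mu_0,\lambda,W,\nu),\ D=1
  &&\iff (\mu,\tau) \sim \mathrm{NG}\!\left(\mu_0,\lambda,\alpha=\tfrac{\nu}{2},\beta=\tfrac{1}{2W}\right).
\end{aligned}
```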

Notes

^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media. Page 690.
^ Cross Validated, https://stats.stackexchange.com/q/324925

References

Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
