
Approximate Gradient of Incomplete-data Log Likelihood



Let's first state the result, and then show how to derive it:
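$$\nabla_{\phi} \log p(X \mid \phi) \;=\; \mathbb{E}_{p(Z \mid X, \phi)}\!\left[\nabla_{\phi} \log p(X, Z \mid \phi)\right] \;\approx\; \frac{1}{|S|} \sum_{Z^{s} \in S} \nabla_{\phi} \log p\!\left(X, Z^{s} \mid \phi\right)$$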

where $Z$ is the latent variable and $S = \{Z^{1}, Z^{2}, \ldots\}$ is the collection of samples drawn from $p(Z \mid X, \phi)$. Given a joint model
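(a form consistent with the rest of the post, under assumed notation: $x_{dn}$ and $z_{dn}$ are the $n$-th token of document $d$ and its topic assignment, $\phi_k$ is the word distribution of topic $k$, and the per-document topic proportions have been integrated out)

$$p(X, Z \mid \phi) \;=\; p(Z)\, \prod_{d} \prod_{n} \phi_{z_{dn},\, x_{dn}}$$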

one may recognize it as the collapsed form of Latent Dirichlet Allocation (LDA); the hyperparameters $\alpha$ and $\beta$ are omitted here for simplicity.

Now, we want to calculate the gradient of its marginalized posterior $p(\phi \mid X)$, as follows:
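$$\nabla_{\phi} \log p(\phi \mid X) \;=\; \nabla_{\phi} \log p(X \mid \phi) \;+\; \nabla_{\phi} \log p(\phi)$$

(the evidence term $\log p(X)$ does not depend on $\phi$ and drops out).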

The second term is just the gradient of the prior distribution of $\phi$; the first term is the gradient of the incomplete-data log likelihood, which we cannot calculate directly. Since the complete-data likelihood is $p(x_i, z_i \mid \phi)$, we calculate it as follows:
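$$
\begin{aligned}
\nabla_{\phi} \log p(X \mid \phi)
&= \sum_{i} \nabla_{\phi} \log p(x_i \mid \phi) \\
&= \sum_{i} \nabla_{\phi} \log \sum_{z_i} p(x_i, z_i \mid \phi) \\
&= \sum_{i} \frac{1}{p(x_i \mid \phi)}\, \nabla_{\phi} \sum_{z_i} p(x_i, z_i \mid \phi)
\end{aligned}
$$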

where, in line 2, the log cannot be moved inside the summation over $z_i$; note that lines 2 and 3 are equivalent, since $p(x_i \mid \phi) = \sum_{z_i} p(x_i, z_i \mid \phi)$ and $\nabla_{\phi} \log f = \nabla_{\phi} f / f$.

Here, we derive the approximation as follows:
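$$
\begin{aligned}
\nabla_{\phi} \log p(X \mid \phi)
&= \sum_{i} \frac{1}{p(x_i \mid \phi)} \sum_{z_i} p(x_i, z_i \mid \phi)\, \nabla_{\phi} \log p(x_i, z_i \mid \phi) \\
&= \sum_{i} \sum_{z_i} p(z_i \mid x_i, \phi)\, \nabla_{\phi} \log p(x_i, z_i \mid \phi) \\
&= \sum_{i} \mathbb{E}_{p(z_i \mid x_i, \phi)}\!\left[\nabla_{\phi} \log p(x_i, z_i \mid \phi)\right]
\end{aligned}
$$

where we used $\nabla_{\phi} p(x_i, z_i \mid \phi) = p(x_i, z_i \mid \phi)\, \nabla_{\phi} \log p(x_i, z_i \mid \phi)$ and $p(z_i \mid x_i, \phi) = p(x_i, z_i \mid \phi) / p(x_i \mid \phi)$.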

We can use an MCMC method to approximate this expectation: specifically, we construct a Markov chain that samples $z_i$ from $p(z_i \mid x_i, \phi)$, and average the gradient of the complete-data log likelihood over the samples $z_i^{s}$.

We denote the collection of samples as $\{z_i^{s}\}_{s=1}^{|S|}$; the approximation is:
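$$\nabla_{\phi} \log p(X \mid \phi) \;\approx\; \sum_{i} \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_{\phi} \log p\!\left(x_i, z_i^{s} \mid \phi\right), \qquad z_i^{s} \sim p(z_i \mid x_i, \phi)$$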

In this example, the Gibbs sampler for $p(z_i \mid x_i, \phi)$ is:
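A plausible form of this conditional, assuming the standard collapsed-LDA update with the document-topic proportions integrated out (and with the Dirichlet hyperparameter $\alpha$ written explicitly; $d$ denotes the document containing token $i$):

$$p\!\left(z_i = k \mid z_{\neg i}, x_i, \phi\right) \;\propto\; \left(n_{dk}^{\neg i} + \alpha_k\right)\, \phi_{k,\, x_i}$$

where $n_{dk}^{\neg i}$ is the number of tokens in document $d$, other than token $i$, currently assigned to topic $k$.

To make the estimator concrete, here is a minimal, self-contained Python sketch (the toy model, names, and sample sizes are assumptions made for illustration, not taken from the post): a categorical mixture with a uniform prior over $z_i$, whose posterior $p(z_i \mid x_i, \phi)$ can be sampled exactly, stands in for the Gibbs sampler above, and the sample average of $\nabla_\phi \log p(x_i, z_i^{s} \mid \phi)$ is compared against the exact gradient of $\log p(x_i \mid \phi)$.

```python
# Toy check of the Monte Carlo gradient estimate (NOT the LDA model above):
#   p(z_i = k) = 1/K,   p(x_i | z_i = k, phi) = phi[k, x_i]
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 5, 4                          # topics, vocabulary size, observations

phi = rng.dirichlet(np.ones(V), size=K)    # K x V row-stochastic parameter
x = rng.integers(0, V, size=N)             # observed symbols

def complete_data_grad(x_i, z_i, phi):
    """Gradient of log p(x_i, z_i | phi) w.r.t. the entries of phi."""
    g = np.zeros_like(phi)
    g[z_i, x_i] = 1.0 / phi[z_i, x_i]
    return g

def exact_incomplete_grad(x_i, phi):
    """Gradient of log p(x_i | phi), where p(x_i | phi) = (1/K) * sum_k phi[k, x_i]."""
    g = np.zeros_like(phi)
    g[:, x_i] = (1.0 / K) / ((1.0 / K) * phi[:, x_i].sum())
    return g

def mc_incomplete_grad(x_i, phi, num_samples=20000):
    """Approximate the same gradient by averaging complete-data gradients
    over samples z_i^s ~ p(z_i | x_i, phi) (exact sampling in this toy model)."""
    post = phi[:, x_i] / phi[:, x_i].sum()
    z_samples = rng.choice(K, size=num_samples, p=post)
    g = np.zeros_like(phi)
    for z_s in z_samples:
        g += complete_data_grad(x_i, z_s, phi)
    return g / num_samples

exact = sum(exact_incomplete_grad(x_i, phi) for x_i in x)
approx = sum(mc_incomplete_grad(x_i, phi) for x_i in x)
print("max abs error:", np.abs(exact - approx).max())
```

As the number of samples grows, the Monte Carlo average approaches the exact incomplete-data gradient, which is exactly the identity derived above.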

Now, we consider a more complex model with two sets of variables whose gradients we want; un-collapsed LDA is a good example:
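(assuming the standard un-collapsed LDA notation, with per-document topic proportions $\theta_d$ and topic-word distributions $\phi_k$, and the Dirichlet priors again omitted)

$$p(X, Z \mid \theta, \phi) \;=\; \prod_{d} \prod_{n} \theta_{d,\, z_{dn}}\; \phi_{z_{dn},\, x_{dn}}$$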

The log likelihood:
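$$\log p(X \mid \theta, \phi) \;=\; \sum_{d} \sum_{n} \log \sum_{k} \theta_{d,k}\, \phi_{k,\, x_{dn}}$$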

It is easy to see that we only need to be concerned with how to calculate the gradient of each per-token term:
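$$\nabla_{\theta, \phi} \log p(x_{dn} \mid \theta, \phi) \;=\; \nabla_{\theta, \phi} \log \sum_{k} \theta_{d,k}\, \phi_{k,\, x_{dn}}$$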

The approximation:
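$$\nabla_{\theta, \phi} \log p(X \mid \theta, \phi) \;\approx\; \sum_{d} \sum_{n} \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_{\theta, \phi} \log p\!\left(x_{dn}, z_{dn}^{s} \mid \theta, \phi\right), \qquad z_{dn}^{s} \sim p\!\left(z_{dn} \mid x_{dn}, \theta, \phi\right)$$

For un-collapsed LDA the per-token posterior is simply $p(z_{dn} = k \mid x_{dn}, \theta, \phi) \propto \theta_{d,k}\, \phi_{k,\, x_{dn}}$, so these samples can be drawn exactly.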

This draft may contain some mistakes; if you find any, I would appreciate a correction.
