Let’s first state the result, and then show how to derive it.
where $Z$ is the latent variable and $S = \{Z^{1}, Z^{2}, \dots\}$ is the collection of samples. Given a joint model
one may recognize it as the collapsed form of Latent Dirichlet Allocation (LDA); the hyperparameters $\alpha, \beta$ are omitted here for simplicity.
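In other words (a sketch of my reading of the result, with each $Z^{s} \in S$ drawn from the posterior $p(Z \mid X, \phi)$):

$$
\nabla_\phi \log p(\phi \mid X) \;\approx\; \nabla_\phi \log p(\phi) \;+\; \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_\phi \log p(X, Z^{s} \mid \phi).
$$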
Now we want to calculate the gradient of its marginalized (log) posterior $\log p(\phi \mid X)$, as follows:
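(A sketch, by Bayes' rule and assuming the data points are conditionally independent given $\phi$; the term $\log p(X)$ drops out since it does not depend on $\phi$:)

$$
\nabla_\phi \log p(\phi \mid X) \;=\; \sum_{i} \nabla_\phi \log p(x_i \mid \phi) \;+\; \nabla_\phi \log p(\phi).
$$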
The second term is just the gradient of the log prior of $\phi$; the first term is the incomplete-data log likelihood, which we cannot calculate directly. Since the complete-data likelihood is $p(x_i, z_i \mid \phi)$, we work it out as follows:
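(A sketch of the intended derivation, lines numbered top to bottom:)

$$
\begin{aligned}
& \nabla_\phi \log p(x_i \mid \phi) \\
& \quad = \nabla_\phi \log \sum_{z_i} p(x_i, z_i \mid \phi) \\
& \quad = \frac{\sum_{z_i} \nabla_\phi\, p(x_i, z_i \mid \phi)}{\sum_{z_i} p(x_i, z_i \mid \phi)}
\end{aligned}
$$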
In line 2, the log cannot be moved inside the summation; note that lines 2 and 3 are equivalent, since $\nabla_\phi \log f = \nabla_\phi f / f$.
Here, we rewrite this as an expectation that we can approximate:
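(A sketch, using $\nabla_\phi\, p = p\, \nabla_\phi \log p$ and $p(x_i, z_i \mid \phi)/p(x_i \mid \phi) = p(z_i \mid x_i, \phi)$:)

$$
\nabla_\phi \log p(x_i \mid \phi) \;=\; \sum_{z_i} p(z_i \mid x_i, \phi)\, \nabla_\phi \log p(x_i, z_i \mid \phi) \;=\; \mathbb{E}_{p(z_i \mid x_i, \phi)}\!\big[\nabla_\phi \log p(x_i, z_i \mid \phi)\big].
$$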
We can use an MCMC method to approximate this expectation: specifically, we construct a Markov chain that samples $z_i$ from $p(z_i \mid x_i, \phi)$, and average the gradient evaluated at each sample $z_i^{s}$ to estimate the expectation.
We denote the collection of samples by $\{z_i^{s}\}_{s=1}^{|S|}$; the approximation is:
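(Presumably of the form:)

$$
\nabla_\phi \log p(x_i \mid \phi) \;\approx\; \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_\phi \log p(x_i, z_i^{s} \mid \phi), \qquad z_i^{s} \sim p(z_i \mid x_i, \phi).
$$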
In this example, the Gibbs sampler for $p(z_i \mid x_i, \phi)$ is:
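To make the whole procedure concrete, here is a minimal NumPy sketch on a toy mixture of categoricals rather than collapsed LDA itself; the softmax parameterization of $\phi$, the uniform prior over $z_i$, and all names below are my own assumptions. Since $p(z_i \mid x_i, \phi)$ is tractable in this toy model, each Gibbs step reduces to an exact draw from that conditional:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, N, S = 3, 10, 50, 100      # topics, vocabulary size, tokens, samples per token
phi = rng.normal(size=(K, V))    # topic-word logits: p(x | z=k, phi) = softmax(phi[k])[x]
x = rng.integers(V, size=N)      # observed tokens
pi = np.full(K, 1.0 / K)         # fixed uniform prior p(z_i = k), so it contributes no gradient

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def sample_posterior_z(x_i, phi):
    # Draw z_i ~ p(z_i | x_i, phi) ∝ p(z_i) p(x_i | z_i, phi); exact here, so the chain mixes in one step.
    word_probs = softmax(phi, axis=1)        # (K, V)
    post = pi * word_probs[:, x_i]
    post /= post.sum()
    return rng.choice(K, p=post)

def grad_log_joint(x_i, z_i, phi):
    # grad_phi of log p(x_i, z_i | phi); only row z_i of phi receives a non-zero gradient.
    g = np.zeros_like(phi)
    g[z_i] = -softmax(phi[z_i])
    g[z_i, x_i] += 1.0
    return g

# Monte Carlo estimate of sum_i grad_phi log p(x_i | phi), i.e. the incomplete-data term.
grad_estimate = np.zeros_like(phi)
for x_i in x:
    for _ in range(S):
        z_s = sample_posterior_z(x_i, phi)
        grad_estimate += grad_log_joint(x_i, z_s, phi) / S
```

Adding $\nabla_\phi \log p(\phi)$ for whatever prior is placed on $\phi$ would then give the full posterior gradient estimate.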
Now, we consider a more complex model with two sets of variables whose gradients we want; un-collapsed LDA is a good example:
The log likelihood:
It’s easy to see that we only need to concern ourselves with how to calculate the gradient of
The approximation:
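(Presumably the same estimator as before, now taken jointly over both parameter sets; writing the second set as $\theta$ is my notation, and $Z^{s} \sim p(Z \mid X, \theta, \phi)$:)

$$
\nabla_{\theta,\phi} \log p(X \mid \theta, \phi) \;\approx\; \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_{\theta,\phi} \log p(X, Z^{s} \mid \theta, \phi).
$$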
This draft may contain some mistakes; if you find any, I would appreciate the correction.