1. On Smoothing and Inference for Topic Models

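I strongly recommend reading this paper; what it covers should clear up questions that many people have.

The paper points out the relationship between PLSA and LDA: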

Exact inference (i.e. computing the posterior over the hidden variables) for this model is intractable [Blei et al., 2003], and so a variety of approximate algorithms have been developed. If we ignore $\alpha$ and $\eta$ and treat $\theta_{kj}$ and $\phi_{wk}$ as parameters, we obtain the PLSA model, and maximum likelihood (ML) estimation over $\theta_{kj}$ and $\phi_{wk}$ directly corresponds to PLSA's EM algorithm.

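In other words, if we either (1) assume no priors at all, or (2) set $\alpha=\beta=1$ (where $\beta$ plays the role of $\eta$ in the quote), LDA reduces to the PLSA model. A Dirichlet prior with all parameters equal to 1 is uniform, so under it the MAP estimate coincides with the ML estimate. And without priors there is no Bayesian inference or MAP to speak of: estimating the parameters $\theta, \phi$ is then the same as PLSA's EM algorithm.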
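To make the PLSA connection concrete, here is a minimal NumPy sketch of one MAP-EM step. The function name `map_em_step`, the array layouts, and the use of `eta` for the topic-word hyperparameter are my own assumptions for illustration, not code from the paper. The M-step adds `alpha - 1` and `eta - 1` to the expected counts, so with both set to 1 the offsets vanish and the step is a plain PLSA EM step.

```python
import numpy as np

def map_em_step(counts, theta, phi, alpha=1.0, eta=1.0):
    """One MAP-EM step (illustrative sketch, assumes alpha >= 1 and eta >= 1).

    counts: (W, D) word-document count matrix
    theta:  (K, D) topic proportions per document, columns sum to 1
    phi:    (W, K) word distribution per topic, columns sum to 1
    """
    # E-step: responsibility of topic k for word w in document d,
    # gamma[w, d, k] proportional to phi[w, k] * theta[k, d].
    gamma = phi[:, None, :] * theta.T[None, :, :]
    gamma /= gamma.sum(axis=2, keepdims=True)

    # Expected counts under the responsibilities.
    weighted = counts[:, :, None] * gamma      # shape (W, D, K)
    n_wk = weighted.sum(axis=1)                # shape (W, K)
    n_kd = weighted.sum(axis=0).T              # shape (K, D)

    # M-step: Dirichlet priors enter as an offset of (hyperparameter - 1).
    # With alpha = eta = 1 the offsets are zero, which is the PLSA update.
    phi_new = n_wk + (eta - 1.0)
    phi_new /= phi_new.sum(axis=0, keepdims=True)
    theta_new = n_kd + (alpha - 1.0)
    theta_new /= theta_new.sum(axis=0, keepdims=True)
    return theta_new, phi_new
```

Calling `map_em_step(counts, theta, phi)` with the default hyperparameters repeatedly is ordinary PLSA training; passing values above 1 turns the same loop into MAP estimation with smoothing.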

It also compares several inference methods for LDA:

Variational Bayesian (VB)
Collapsed VB
MAP (EM algorithm)
CVB0
Collapsed Gibbs Sampling (CGS)

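The paper mainly emphasizes the following points:

All of these methods depend heavily on hyperparameter tuning. The hyperparameters can be set by hand to fixed values, estimated with Minka's fixed-point iteration, or chosen by grid search.
With suitable hyperparameters, the update procedures of these methods are similar, and the paper gives the correspondence between their hyperparameters. For example, the MAP prior has to be larger by 1 than that of the other methods (otherwise the estimated probabilities can become negative).
After the hyperparameters are tuned carefully, the differences in quality between these methods are not that large.
The best performers are still CVB0 and Collapsed Gibbs Sampling. For the latter, averaging over several samples works better.
CVB0 is deterministic, so it converges somewhat faster than CGS.
Surprisingly, MAP comes with a guarantee for parallelization: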

The updates in MAP estimation can be performed in parallel without affecting the fixed point since MAP is an EM algorithm [Neal and Hinton, 1998].
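As a rough illustration of the CVB0 method listed above, here is a token-level sketch in NumPy. The function name `cvb0`, its arguments, and the default hyperparameter values are assumptions of mine, and the update follows the form I understand from the paper: each token keeps a soft topic assignment, and its update uses expected counts with that token's own contribution removed. Note that the hyperparameters are added to the counts directly here, with no "minus 1" offset as in MAP.

```python
import numpy as np

def cvb0(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iters=50, seed=0):
    """CVB0 sketch. docs is a list of lists of word ids in [0, vocab_size)."""
    rng = np.random.default_rng(seed)
    # gamma[j][i] is the soft topic assignment of token i in document j.
    gamma = [rng.dirichlet(np.ones(n_topics), size=len(doc)) for doc in docs]

    # Expected counts: word-topic, document-topic, and per-topic totals.
    n_wk = np.zeros((vocab_size, n_topics))
    n_kj = np.zeros((len(docs), n_topics))
    for j, doc in enumerate(docs):
        for i, w in enumerate(doc):
            n_wk[w] += gamma[j][i]
            n_kj[j] += gamma[j][i]
    n_k = n_wk.sum(axis=0)

    for _ in range(n_iters):
        for j, doc in enumerate(docs):
            for i, w in enumerate(doc):
                old = gamma[j][i].copy()
                # Remove this token's contribution from the counts.
                n_wk[w] -= old
                n_kj[j] -= old
                n_k -= old
                # Deterministic update; no random sampling, unlike CGS.
                new = (n_wk[w] + eta) / (n_k + vocab_size * eta) * (n_kj[j] + alpha)
                new /= new.sum()
                # Add the updated contribution back.
                n_wk[w] += new
                n_kj[j] += new
                n_k += new
                gamma[j][i] = new

    # Point estimates from the smoothed expected counts.
    phi = (n_wk + eta) / (n_k + vocab_size * eta)
    theta = (n_kj + alpha) / (n_kj.sum(axis=1, keepdims=True) + n_topics * alpha)
    return theta, phi
```

Replacing the deterministic update with a draw from the normalized `new` vector (and keeping hard counts) would give collapsed Gibbs sampling, which is why the two methods end up with such similar update rules.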

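2. Rethinking LDA: Why Priors Matter

Today I read a paper titled Rethinking LDA: Why Priors Matter, which mainly discusses how symmetric and asymmetric priors in LDA affect the results.

Its conclusion is that the AS type of prior works better.

Here, a symmetric prior is a $T$-dimensional Dirichlet prior whose components all take the same value.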
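A small NumPy sketch of the difference between the two kinds of prior. The dimension `T = 4` and the concrete parameter values are arbitrary choices of mine for illustration. A symmetric Dirichlet has a uniform mean, while an asymmetric one favors some components a priori.

```python
import numpy as np

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution with parameter vector alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()

T = 4                                         # dimension (hypothetical value)
symmetric = np.full(T, 0.5)                   # every component is the same
asymmetric = np.array([4.0, 1.0, 0.5, 0.5])   # components differ

rng = np.random.default_rng(0)
sym_draws = rng.dirichlet(symmetric, size=5)    # each row is one distribution
asym_draws = rng.dirichlet(asymmetric, size=5)

print(dirichlet_mean(symmetric))    # uniform: every component equals 1/T
print(dirichlet_mean(asymmetric))   # first component dominates
```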

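AS means that the two parameters get different types of prior (A for asymmetric, S for symmetric):

the prior $\alpha$ on the document-specific topic distribution $\theta$ is an asymmetric prior; see the paper for its exact form
the prior $\beta$ on the topic-word distribution $\phi$ is a symmetric prior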

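The prior on $\phi$

In the earliest version, Blei did not put a prior on this parameter; it was Griffiths who added one in 2004.

What comes out of Blei's variational inference should already be an asymmetric topic-word distribution, so I do not understand why a symmetric prior was added afterwards and then later changed into an asymmetric prior.

Switching to an asymmetric prior does add a layer of nodes to the graphical model. Of course, if that layer is marginalized out, the result is a different kind of prior, no longer the original Dirichlet assumption.

In the end, the authors report that it works well in their experiments.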
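The extra layer can be sketched generically as follows. This is my own illustrative construction, not necessarily the exact form used in the paper: a shared base measure `m` is first drawn from a symmetric Dirichlet, and each distribution is then drawn from a Dirichlet centered on `m`. All names and default values below are hypothetical.

```python
import numpy as np

def sample_hierarchical_prior(n_draws, dim, concentration=5.0,
                              top_concentration=2.0, seed=0):
    """Draw distributions from a two-level Dirichlet prior (illustrative).

    Top layer:    m ~ Dirichlet(top_concentration, ..., top_concentration)
    Bottom layer: each row of the result ~ Dirichlet(concentration * m)
    """
    rng = np.random.default_rng(seed)
    # The added node: a base measure shared by all bottom-level draws.
    m = rng.dirichlet(np.full(dim, top_concentration))
    # Given m, the bottom-level prior is an asymmetric Dirichlet.
    draws = rng.dirichlet(concentration * m, size=n_draws)
    return m, draws

m, draws = sample_hierarchical_prior(n_draws=3, dim=4)
```

Conditioned on `m`, each row has an ordinary asymmetric Dirichlet prior. Integrating `m` out leaves a mixture of Dirichlets over the rows, which in general is not itself a Dirichlet distribution.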

简记·思行-SiNZeRo

Sunshine and Imagination

Priors and Smoothing of LDA