简记·思行-SiNZeRo

Sunshine and Imagination

Approximate Gradient of Incomplete-data Log Likelihood



Let’s first state the result, and then show how to derive it. In its standard Monte Carlo form (a sketch; the notation is defined just below):
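$$\nabla_\phi \log p(X \mid \phi) \;\approx\; \frac{1}{|S|} \sum_{Z^{s} \in S} \nabla_\phi \log p(X, Z^{s} \mid \phi), \qquad Z^{s} \sim p(Z \mid X, \phi)$$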

where $Z$ is the latent variable and $S = \{Z^{1}, Z^{2}, \dots\}$ is the collection of samples. Given a joint model (sketched here under the usual LDA notation: tokens $x_{ij}$ and topic assignments $z_{ij}$ in document $i$, with $p(z_i)$ the Dirichlet-multinomial factor left after integrating out the topic proportions)
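$$p(X, Z \mid \phi) \;=\; \prod_i p(x_i, z_i \mid \phi) \;=\; \prod_i p(z_i) \prod_{j} \phi_{z_{ij},\, x_{ij}}$$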

one may recognize it as the collapsed form of Latent Dirichlet Allocation (LDA); the hyperparameters $\alpha, \beta$ are omitted for simplicity.

Now we want to calculate the gradient of the log of its marginalized posterior $p(\phi|X)$. Since $p(\phi|X) \propto p(X|\phi)\, p(\phi)$ and the evidence is constant in $\phi$, the gradient splits into two terms:
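$$\nabla_\phi \log p(\phi \mid X) \;=\; \nabla_\phi \log p(X \mid \phi) \;+\; \nabla_\phi \log p(\phi)$$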

The second term is just the gradient of the log prior of $\phi$; the first term is the incomplete-data log likelihood, which we cannot calculate directly. Since the complete-data likelihood is $p(x_i, z_i|\phi)$, we expand the first term as follows:
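$$
\begin{aligned}
\nabla_\phi \log p(X \mid \phi) &= \sum_i \nabla_\phi \log p(x_i \mid \phi) \\
&= \sum_i \nabla_\phi \log \sum_{z_i} p(x_i, z_i \mid \phi) \\
&= \sum_i \frac{\sum_{z_i} \nabla_\phi\, p(x_i, z_i \mid \phi)}{\sum_{z_i} p(x_i, z_i \mid \phi)}
\end{aligned}
$$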

Note that in line 2 the log cannot be pushed through the summation, and that lines 2 and 3 are equivalent because $\nabla \log f = \nabla f / f$.

Here we rewrite the per-datum gradient as a posterior expectation, using the identity $\nabla_\phi\, p = p\, \nabla_\phi \log p$ and $p(z_i \mid x_i, \phi) = p(x_i, z_i \mid \phi)/p(x_i \mid \phi)$:
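$$
\begin{aligned}
\nabla_\phi \log p(x_i \mid \phi) &= \frac{\nabla_\phi\, p(x_i \mid \phi)}{p(x_i \mid \phi)} = \sum_{z_i} \frac{p(x_i, z_i \mid \phi)}{p(x_i \mid \phi)}\, \nabla_\phi \log p(x_i, z_i \mid \phi) \\
&= \mathbb{E}_{p(z_i \mid x_i, \phi)}\big[ \nabla_\phi \log p(x_i, z_i \mid \phi) \big]
\end{aligned}
$$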

We can use an MCMC method to approximate this expectation. Specifically, we can construct a Markov chain to sample $z_i$ from $p(z_i \mid x_i, \phi)$, and average the gradients computed from each sample $z_i^{s}$ to estimate the expectation.

We denote the collection of samples as $\{z_i^{s}\}_{s=1}^{|S|}$; the approximation is:
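$$\nabla_\phi \log p(x_i \mid \phi) \;\approx\; \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_\phi \log p(x_i, z_i^{s} \mid \phi), \qquad z_i^{s} \sim p(z_i \mid x_i, \phi)$$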

In this example, a per-token Gibbs sampler for $p(z_i \mid x_i, \phi)$ takes roughly the following form (a sketch, reintroducing the Dirichlet hyperparameter $\alpha$ that the collapsed counts depend on; $n_{ik}^{\neg j}$ is the number of tokens in document $i$ assigned to topic $k$, excluding position $j$):
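$$p(z_{ij} = k \mid z_i^{\neg j}, x_i, \phi) \;\propto\; \big( n_{ik}^{\neg j} + \alpha \big)\, \phi_{k,\, x_{ij}}$$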

Now we consider a more complex model with two sets of variables whose gradients we want; un-collapsed LDA, where the topic proportions $\theta_i$ are kept explicit, is a good example:
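$$p(x_i, z_i \mid \theta_i, \phi) \;=\; \prod_j \theta_{i,\, z_{ij}}\, \phi_{z_{ij},\, x_{ij}}$$

(a sketch; the priors on $\theta$ and $\phi$ are again omitted)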

The log likelihood:
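$$\log p(X \mid \theta, \phi) \;=\; \sum_i \sum_j \log \sum_k \theta_{ik}\, \phi_{k,\, x_{ij}}$$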

It’s easy to see that we only need to be concerned with how to calculate the gradient of the complete-data log likelihood:
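$$\log p(x_i, z_i \mid \theta_i, \phi) \;=\; \sum_j \big( \log \theta_{i,\, z_{ij}} + \log \phi_{z_{ij},\, x_{ij}} \big)$$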

The approximation, with samples $z_i^{s} \sim p(z_i \mid x_i, \theta_i, \phi)$, applies to each set of variables in turn:
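$$\nabla_{\theta_i} \log p(x_i \mid \theta_i, \phi) \;\approx\; \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_{\theta_i} \log p(x_i, z_i^{s} \mid \theta_i, \phi), \qquad \nabla_{\phi} \log p(x_i \mid \theta_i, \phi) \;\approx\; \frac{1}{|S|} \sum_{s=1}^{|S|} \nabla_{\phi} \log p(x_i, z_i^{s} \mid \theta_i, \phi)$$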

This draft may contain mistakes; if you find any, I would appreciate a correction.

How Does Spotify Use Big Data?



I read an answer on Quora about how Spotify uses big data.

The following answer was posted by Erik Bernhardsson, Tech Lead at Spotify:

We use it for a ton of things. Spotify is just in the process of upgrading to a 700 node Hadoop cluster and we probably run 2000+ jobs every day. We use it for a lot of things, including toplists, recommendations, ad forecasting, business analytics, and lots of other things. Most of the Hadoop jobs are in Python or Hive, but we also run some stuff in pure Java, Pig and Scala (scalding).

We open sourced our workflow manager Luigi a while back. Here’s a presentation by me: Luigi Presentation at OSCON 2013

Apart from Hadoop, we use Cassandra extensively. We’re also running a test cluster with Storm and Kafka and we might start using it in production later this year.

Probabilistic latent semantic analysis is one method that works pretty well in the implicit context. We use it and related methods for our recommender system at Spotify.

PLSA

http://www.slideshare.net/erikbern/collaborative-filtering-at-spotify-16182818

Alternative definition for $p(u,i)$
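From the linked slides, this is presumably the log-linear form implied by the likelihood below (a sketch; $a_u$ and $b_i$ are the user and item vectors):

$$p(u, i) \;=\; \frac{\exp(a_u^{T} b_i)}{Z}, \qquad Z \;=\; \sum_{u',\, i'} \exp(a_{u'}^{T} b_{i'})$$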

Thus, the log likelihood is:

$$ L \;=\; \Big( \sum_{u,i} a_u^{T} b_i \Big) - T \log Z $$

where the sum runs over the observed (user, item) pairs, presumably with $T$ their count. The calculation of $Z$ involves a summation over all possible user-item combinations, rather than a sweep over only the observed ratings. That leaves the question of how to estimate $Z$ efficiently. The author didn’t elaborate on this; maybe a sampling algorithm could be incorporated to estimate this term.

With $Z$ estimated, the gradient for updating $a_u$ is, presumably (a sketch obtained by differentiating $L$ above, using $\nabla_{a_u} \log Z = \sum_i p(u,i)\, b_i$):
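$$\nabla_{a_u} L \;=\; \sum_{i:\, (u,i)\ \mathrm{observed}} b_i \;-\; T \sum_{i} p(u, i)\, b_i$$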

Some Notes on ICML13 Papers



Modeling Musical Influence with Topic Models

2.2. The Dataset

Conducting our study became possible with the publication of the Million Songs Dataset (Bertin-Mahieux et al., 2011) (MSD) in 2011. MSD is the first truly large scale, diverse and epoch-spanning dataset of songs ever made publicly available. MSD includes detailed audio features for ~1,000,000 songs along with rich (albeit sometimes inconsistent and missing) metadata including genre tags, artist location and artist familiarity. The audio features are described in Section 4.

Sequential Monte Carlo (SMC) for Bayesian decision trees

An Adaptive Learning Rate for Stochastic Variational Inference

Mini-Batch Primal and Dual Methods for SVMs

In this paper, we consider using mini-batches with Pegasos (SGD on the primal objective) and with Stochastic Dual Coordinate Ascent (SDCA). We show that for both methods, the quantity that controls the speedup obtained using mini-batching/parallelization is the spectral norm of the data.

Fast Probabilistic Optimization from Noisy Gradients

Clustering and Learning Behaviors using a Sparse Latent Space

General Functional Matrix Factorization Using Gradient Boosting

Multiple-Source Cross Validation

Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.

Local Low-Rank Matrix Approximation

Matrix approximation is a common tool in recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is of low-rank. We propose a new matrix approximation model where we assume instead that the matrix is locally of low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We analyze the accuracy of the proposed local lowrank modeling. Our experiments show improvements in prediction accuracy over classical approaches for recommendation tasks.

ELLA: An Efficient Lifelong Learning Algorithm

The problem of learning multiple consecutive tasks, known as lifelong learning, is of great importance to the creation of intelligent, general-purpose, and flexible machines. In this paper, we develop a method for online multi-task learning in the lifelong learning setting. The proposed Efficient Lifelong Learning Algorithm (ELLA) maintains a sparsely shared basis for all task models, transfers knowledge from the basis to learn each new task, and refines the basis over time to maximize performance across all tasks. We show that ELLA has strong connections to both online dictionary learning for sparse coding and state-of-the-art batch multi-task learning methods, and provide robust theoretical performance guarantees. We show empirically that ELLA yields nearly identical performance to batch multi-task learning while learning tasks sequentially in three orders of magnitude (over 1,000x) less time.

Ubuntu Source Maintained by CUHK



sudo gedit /etc/apt/sources.list

Replace the original content with the following source list:

deb http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise main restricted universe multiverse
deb http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-security main restricted universe multiverse
deb http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-updates main restricted universe multiverse
deb http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-backports main restricted universe multiverse
deb http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-proposed main restricted universe multiverse
deb-src http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise main restricted universe multiverse
deb-src http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-security main restricted universe multiverse
deb-src http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-updates main restricted universe multiverse
deb-src http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-backports main restricted universe multiverse
deb-src http://ftp.cuhk.edu.hk/pub/Linux/ubuntu precise-proposed main restricted universe multiverse
sudo apt-get update
sudo apt-get upgrade

LDA and Its Variants



I found a post that summarizes LDA and its variants:

Topic Model的分类总结(LDA变种) (in Chinese: “A Categorized Summary of Topic Models (LDA Variants)”)

A more comprehensive bibliography is linked below; the page is maintained by David Mimno, a famous researcher in this direction.

Topic Modeling Bibliography

People have different focuses in this field; the first post was likely written by people interested in IR. In contrast, I am more interested in modeling communities and state-of-the-art inference techniques. Thus, I’m going to summarize a paper list according to my own interests. :)

Priors and Smoothing of LDA



1. On Smoothing and Inference for Topic Models

I strongly recommend reading this one; it should clear up many questions people have.

It discusses the relationship between PLSA and LDA:

Exact inference (i.e. computing the posterior over the hidden variables) for this model is intractable [Blei et al.,2003], and so a variety of approximate algorithms have been developed. If we ignore α and η and treat θkj and φwk as parameters, we obtain the PLSA model, and maximum likelihood (ML) estimation over θkj and φwk directly corresponds to PLSA’s EM algorithm.

In other words, if we (1) assume no prior, or (2) set $\alpha=\beta=1$, then LDA is just the PLSA model. Moreover, without a prior there is no longer any notion of Bayesian inference or MAP estimation, and estimating the parameters $\theta, \phi$ becomes the same as PLSA’s EM algorithm.

The paper also compares methods for LDA inference:

  1. Variational Bayesian (VB)
  2. Collapsed VB (CVB)
  3. MAP (EM algorithm)
  4. CVB0
  5. Collapsed Gibbs Sampling (CGS)

The paper’s main points are:

  1. All of these methods depend heavily on hyperparameter tuning; the hyperparameters can be set to fixed values by hand, or tuned with Minka’s fixed-point iteration or grid search.
  2. With suitably chosen hyperparameters, the update equations of these methods are quite similar, and the paper gives the correspondence between the hyperparameters. For example, MAP’s prior needs an extra +1 compared to the others (otherwise the probabilities can go negative); see the sketch after this list.
  3. After careful hyperparameter tuning, the differences in quality among these methods are not that large.
  4. The best are still CVB0 and Collapsed Gibbs Sampling; the latter does even better when several samples are drawn and averaged.
  5. CVB0 is deterministic, so it converges somewhat faster than CGS.
  6. Surprisingly, MAP even comes with a parallelization guarantee:

The updates in MAP estimation can be performed in parallel without affecting the fixed point since MAP is an EM algorithm [Neal and Hinton, 1998]
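As a concrete illustration of the hyperparameter correspondence in point 2 (a sketch from my reading of the paper; $N_{wk}$, $N_{kj}$, $N_k$ denote word-topic, topic-document, and topic counts, and $W$ the vocabulary size), compare the CGS conditional with the MAP responsibility:

$$p(z_{ij}=k \mid \mathbf{z}^{\neg ij}, \mathbf{x}) \;\propto\; \frac{\big(N_{x_{ij}k}^{\neg ij} + \eta\big)\big(N_{kj}^{\neg ij} + \alpha\big)}{N_k^{\neg ij} + W\eta}, \qquad \gamma_{ijk} \;\propto\; \frac{\big(N_{x_{ij}k} + \eta - 1\big)\big(N_{kj} + \alpha - 1\big)}{N_k + W\eta - W}$$

so running MAP with hyperparameters $\alpha + 1, \eta + 1$ roughly matches the others run with $\alpha, \eta$.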

Latex Test



Googol means $10^{100}$

A Cross Product Formula
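Presumably the standard MathJax sample formula (a reconstruction, since this post is a rendering test):

$$\mathbf{V}_1 \times \mathbf{V}_2 = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial X}{\partial u} & \frac{\partial Y}{\partial u} & 0 \\ \frac{\partial X}{\partial v} & \frac{\partial Y}{\partial v} & 0 \end{vmatrix}$$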

The probability of getting $k$ heads when flipping $n$ coins, each landing heads with probability $p$, is
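$$P(E) \;=\; {n \choose k} p^{k} (1-p)^{n-k}$$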

Octopress Configuration



Tutorials are everywhere, so I’ll just offer a few suggestions. This stuff seems to change quickly, so for reference I set mine up on 2013/7/20; the versions of everything at that time are as follows.

  1. Make sure to install Ruby 1.9.3.
  2. For the Ruby DevKit, I installed DevKit-tdm-32-4.5.2-20111229-1559-sfx.exe.
  3. After that, you can pretty much just follow the tutorial.
  4. It’s best to keep the local repository in Dropbox, to save the hassle of checking it in and out when moving between machines.
  5. When modifying the theme, remember to edit the templates inside .themes. (I’m actually not sure about this, but at least this way they won’t be overwritten when switching themes.)
  6. To avoid problems with Chinese text and other configuration hassles, use kramdown instead of rdiscount:
  • Install kramdown: gem install kramdown
  • In _config.yml, replace markdown: rdiscount with markdown: kramdown

Reference

Tutorial: Build an Octopress Blog on GitHub Step by Step (教程:一步步在github上建立octopress博客)

DBLP_plugin



- dblp_plugin.js (a snippet that filters the publication list on a DBLP author page by selected venues)
// Load jQuery, then add venue checkboxes to a DBLP author page and
// filter the publication table down to the checked venues.
var script = document.createElement("script");
script.src = "http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js";
// Run the plugin once jQuery has loaded.
script.onload = function() {
    var venue_list = {}; // set of currently checked venue names
    // Add a checkbox after the hit count of each venue in the facet box.
    $("#autocomplete_F_boxes_2_body .box_line").each(function() {
        var t = $(this).find(".hits_number");
        $("<input type='checkbox' class='venue_check'>").insertAfter(t);
    });
    // Pre-check a default set of venues.
    $('.venue_check').each(function() {
        var venue_name = $(this).parent().find('.completion_link').text();
        var default_venue_list = ["KDD", "WWW", "ICDM", "SDM", "VLDB", "CIKM"];
        if ($.inArray(venue_name, default_venue_list) >= 0) {
            venue_list[venue_name] = 1;
            $(this).attr('checked', true);
        }
    });
    // On toggle, update the venue set and re-filter the publication table.
    $('.venue_check').click(function() {
        var venue_name = $(this).parent().find('.completion_link').text();
        if ($(this).attr("checked")) {
            venue_list[venue_name] = 1;
        } else {
            delete venue_list[venue_name];
        }
        $("#autocomplete_H_boxes_1_body table tr").each(function() {
            if ($(this).find('th').size() > 0) { return; } // keep header rows
            var match_flag = false;
            // Show the row if its text mentions any checked venue.
            for (var x in venue_list) {
                if (!venue_list[x]) { continue; }
                if ($(this).text().match(x)) {
                    match_flag = true;
                }
            }
            if (!match_flag) { $(this).hide(); } else { $(this).show(); }
        });
    });
};
// Append the script after the handler is set, so the load event is not missed.
document.getElementsByTagName("head")[0].appendChild(script);