I read an answer on Quora about how Spotify uses big data.
The following comments were posted by Erik Bernhardsson, Tech Lead at Spotify:
We use it for a ton of things. Spotify is just in the process of upgrading to a 700 node Hadoop cluster and we probably run 2000+ jobs every day. We use it for a lot of things, including toplists, recommendations, ad forecasting, business analytics, and lots of other things. Most of the Hadoop jobs are in Python or Hive, but we also run some stuff in pure Java, Pig and Scala (scalding).
We open sourced our workflow manager Luigi a while back. Here’s a presentation by me: Luigi Presentation at OSCON 2013
Apart from Hadoop, we use Cassandra extensively. We’re also running a test cluster with Storm and Kafka and we might start using it in production later this year.
Probabilistic latent semantic analysis is one method that works pretty well in the implicit context. We use it and related methods for our recommender system at Spotify.
PLSA
http://www.slideshare.net/erikbern/collaborative-filtering-at-spotify-16182818
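Alternative definition for $p(u,i)$, as implied by the log likelihood below:
$$ p(u,i) = \frac{\exp(a_u^T b_i)}{Z} $$
where $Z$ normalizes over all user-item pairs.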
Thus, the log likelihood is:
$$ L = \left(\sum_{u,i} a_u^T b_i\right) - T \log Z $$
where the sum runs over the observed $(u,i)$ pairs and $T$ is their count. Computing $Z$ requires summing over every possible user-item combination, not only the observed ones. That leaves the question of how to estimate $Z$ efficiently. The author didn’t mention how this is done. Perhaps a sampling algorithm could be used to estimate this term.
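One way the sampling idea could look is a plain Monte Carlo estimate: draw user-item pairs uniformly at random, average $\exp(a_u^T b_i)$ over the draws, and scale by the total number of pairs. This is only a sketch of one possibility and not the method used at Spotify. It assumes $Z = \sum \exp(a_u^T b_i)$ over all pairs, and the function names `exact_z` and `sampled_z` and the toy vectors are made up for illustration.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_z(user_vecs, item_vecs):
    """Exact Z: sum of exp(a_u . b_i) over ALL user-item pairs (assumed definition)."""
    return sum(math.exp(dot(a, b)) for a in user_vecs for b in item_vecs)


def sampled_z(user_vecs, item_vecs, n_samples, rng):
    """Monte Carlo estimate of Z from uniformly sampled (user, item) pairs."""
    total = 0.0
    for _ in range(n_samples):
        a = rng.choice(user_vecs)  # uniform random user
        b = rng.choice(item_vecs)  # uniform random item
        total += math.exp(dot(a, b))
    # Mean of the sampled terms times the number of possible pairs.
    return len(user_vecs) * len(item_vecs) * total / n_samples


# Toy data: 50 users and 80 items with small random 4-dimensional vectors.
rng = random.Random(0)
users = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(50)]
items = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(80)]

z_exact = exact_z(users, items)
z_hat = sampled_z(users, items, 5000, rng)
print(z_exact, z_hat)
```

Uniform sampling is the simplest choice, but its variance grows when a few pairs have very large scores, so a real system would likely need something smarter, such as importance sampling.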
With the estimated Z, the gradient for updating $a_u$ is
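The expression can be derived from the log likelihood above. As a sketch, assume $p(u,i) = \exp(a_u^T b_i)/Z$ with $Z$ summing $\exp(a_u^T b_i)$ over all user-item pairs, which is the form consistent with $L$. Write $I_u$ for the observed items of user $u$, counted with multiplicity (this notation is introduced here for illustration). Differentiating $L$ with respect to $a_u$ then gives:

$$ \frac{\partial L}{\partial a_u} = \sum_{i \in I_u} b_i - \frac{T}{Z} \sum_{i} \exp(a_u^T b_i)\, b_i $$

The first sum only touches observed items, while the second runs over every item and depends on $Z$, which is why an efficient estimate of $Z$ matters for the update.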