Modeling Musical Influence with Topic Models
2.2. The Dataset
Conducting our study became possible with the publication of the Million Song Dataset (MSD) (Bertin-Mahieux et al., 2011). MSD is the first truly large-scale, diverse, and epoch-spanning dataset of songs ever made publicly available. It includes detailed audio features for roughly 1,000,000 songs, along with rich (albeit sometimes inconsistent or missing) metadata, including genre tags, artist location, and artist familiarity. The audio features are described in Section 4.
Sequential Monte Carlo (SMC) for Bayesian decision trees
An Adaptive Learning Rate for Stochastic Variational Inference
Mini-Batch Primal and Dual Methods for SVMs
In this paper, we consider using mini-batches with Pegasos (SGD on the primal objective) and with Stochastic Dual Coordinate Ascent (SDCA). We show that for both methods, the quantity that controls the speedup obtained using mini-batching/parallelization is the spectral norm of the data.
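As a rough illustration of the primal method mentioned above, the following is a minimal sketch of mini-batch Pegasos in plain Python. The step size 1/(λt), the function name, the default parameters, and the toy data are assumptions chosen for illustration; the sketch omits the optional projection step and does not reproduce the paper's analysis or its SDCA variant.

```python
import random


def minibatch_pegasos(X, y, lam=0.1, batch_size=4, n_iters=200, seed=0):
    """Mini-batch SGD on the primal SVM objective (L2 penalty + hinge loss).

    X is a list of feature lists, y a list of labels in {-1, +1}.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, n_iters + 1):
        eta = 1.0 / (lam * t)  # assumed step-size schedule
        batch = rng.sample(range(n), min(batch_size, n))
        # Batch examples that violate the margin under the current w.
        violators = [
            i for i in batch
            if y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) < 1.0
        ]
        # Shrink w (gradient of the regularizer) ...
        w = [(1.0 - eta * lam) * wj for wj in w]
        # ... then add the hinge-loss subgradient averaged over the batch.
        for i in violators:
            for j in range(d):
                w[j] += (eta / len(batch)) * y[i] * X[i][j]
    return w


# Toy, linearly separable data (hypothetical).
X_toy = [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0],
         [-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0]]
y_toy = [1, 1, 1, -1, -1, -1]
w_toy = minibatch_pegasos(X_toy, y_toy)
```

Each iteration touches only `batch_size` examples, and the per-example margin checks within a batch are independent of one another, which is what makes the update a candidate for parallelization.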
Fast Probabilistic Optimization from Noisy Gradients
Clustering and Learning Behaviors using a Sparse Latent Space
General Functional Matrix Factorization Using Gradient Boosting
Multiple-Source Cross Validation
Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard i.i.d. setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.
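One way to take the source structure into account is to hold out all points from one source per fold. The sketch below assumes that arrangement; the function name and the example source labels are hypothetical, and the variance estimators proposed in the abstract are not implemented here.

```python
from collections import defaultdict


def source_wise_folds(sources):
    """Yield (held_out_source, train_idx, test_idx), one fold per source.

    sources[i] is the source label of data point i. Every point from the
    held-out source goes to the test set; all other points go to training.
    """
    by_source = defaultdict(list)
    for idx, s in enumerate(sources):
        by_source[s].append(idx)
    for s, test_idx in by_source.items():
        train_idx = [i for i in range(len(sources)) if sources[i] != s]
        yield s, train_idx, test_idx


# Hypothetical example: six data points drawn from three sources.
example_sources = ["a", "a", "b", "c", "c", "c"]
example_folds = list(source_wise_folds(example_sources))
```

In contrast to random assignment, no source contributes points to both the training and the test set of the same fold.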
Local Low-Rank Matrix Approximation
Matrix approximation is a common tool in recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is low-rank. We propose a new matrix approximation model where we assume instead that the matrix is locally low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We analyze the accuracy of the proposed local low-rank modeling. Our experiments show improvements in prediction accuracy over classical approaches for recommendation tasks.
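The "weighted sum of low-rank matrices" can be sketched as follows. This is only an illustration under stated assumptions: each local model is taken to be a factor pair (U, V), the weights come from a hypothetical user-supplied function, and the weights are normalized to sum to one at each entry. How the local models and weights are actually learned is not covered.

```python
def local_low_rank_predict(u, i, local_models, weight):
    """Predict entry (u, i) as a normalized weighted sum of low-rank models.

    local_models: list of (U, V) pairs; U[u] and V[i] are length-r lists,
                  so model t predicts dot(U[u], V[i]).
    weight:       hypothetical function weight(t, u, i) -> nonnegative float.
    """
    numerator = 0.0
    total_weight = 0.0
    for t, (U, V) in enumerate(local_models):
        w = weight(t, u, i)
        numerator += w * sum(a * b for a, b in zip(U[u], V[i]))
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("no local model has positive weight at this entry")
    return numerator / total_weight


# Two hypothetical rank-1 local models over a 2 x 2 matrix.
model_a = ([[1.0], [2.0]], [[3.0], [4.0]])
model_b = ([[0.0], [1.0]], [[1.0], [1.0]])
uniform = lambda t, u, i: 1.0
prediction = local_low_rank_predict(1, 0, [model_a, model_b], uniform)
```

With uniform weights the prediction is the plain average of the local models; a weight function concentrated near a particular region of the matrix lets each low-rank model dominate only there.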
ELLA: An Efficient Lifelong Learning Algorithm
The problem of learning multiple consecutive tasks, known as lifelong learning, is of great importance to the creation of intelligent, general-purpose, and flexible machines. In this paper, we develop a method for online multi-task learning in the lifelong learning setting. The proposed Efficient Lifelong Learning Algorithm (ELLA) maintains a sparsely shared basis for all task models, transfers knowledge from the basis to learn each new task, and refines the basis over time to maximize performance across all tasks. We show that ELLA has strong connections to both online dictionary learning for sparse coding and state-of-the-art batch multi-task learning methods, and provide robust theoretical performance guarantees. We show empirically that ELLA yields nearly identical performance to batch multi-task learning while learning tasks sequentially in three orders of magnitude (over 1,000x) less time.