After attending ICML 2012 in Edinburgh a month ago, I suppose a summary of my favorite papers and presentations is long overdue.

Overall, Edinburgh provided a beautiful setting for the conference, and Scotland's famously dreary weather even cooperated with a few days of sunshine.

## Evaluating Bayesian and L1 Approaches for Sparse Unsupervised Learning [link]

Shakir Mohamed, Katherine Heller, and Zoubin Ghahramani

Encouraging sparsity is a common goal in many analysis tasks. Here, the authors provide what is (to my limited knowledge) the first rigorous comparison of two approaches: optimization with an L1 penalty, and Bayesian inference with an L0-style objective via a spike-and-slab prior. Surprisingly, they suggest that the spike-and-slab approach offers comparable or even superior performance, without additional computational expense. This makes some sense: an L1 penalty forces shrinkage of *all* entries in a vector, while a spike-and-slab prior can force some entries to be exactly zero while leaving others unshrunk at more reasonable values. I'd always assumed spike-and-slab inference would be expensive, but perhaps it is worth a new look.
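That shrinkage difference is easy to see with two thresholding operators. This is only a toy illustration of the intuition above (the operators stand in for the L1 penalty and the L0-style selection; they are not the paper's actual inference procedures):

```python
import numpy as np

def soft_threshold(w, lam):
    # L1-style proximal step: every surviving entry is shrunk toward zero by lam.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def hard_threshold(w, lam):
    # L0-style selection: small entries are zeroed exactly,
    # survivors keep their full magnitude (no shrinkage).
    return np.where(np.abs(w) > lam, w, 0.0)

w = np.array([5.0, 0.3, -2.0, 0.1])
print(soft_threshold(w, 1.0))  # [ 4.  0. -1.  0.]
print(hard_threshold(w, 1.0))  # [ 5.  0. -2.  0.]
```

Note how the soft threshold biases the large coefficients (5 becomes 4), while the hard threshold leaves them untouched.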

## Revisiting k-means: New Algorithms via Bayesian Nonparametrics [link]

K-means allows efficient hard partitioning, but requires good prior knowledge of exactly how many clusters are needed. This new approach appeals to Bayesian nonparametrics to allow *automatic* determination of the number of clusters relevant for a given dataset. While this approach has promise, from my discussions with the author it appears that the new algorithm trades selection of the number of clusters K for selection of a hyperparameter lambda, which roughly determines how readily the algorithm creates a new cluster to explain an outlier. I seem to remember that some datasets may be sensitive to small changes in lambda, but I can’t find a good example in the paper itself. Anyway, the idea is valuable and the authors present some neat extensions to clustering in multiple related datasets.
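As I understand it, the algorithm behaves like k-means with one extra rule: a point farther than a threshold from every existing center opens a new cluster. Here is a minimal sketch of that rule — my own simplified rendering, not the paper's exact algorithm; the function name and the use of lambda as a squared-distance threshold are my assumptions:

```python
import numpy as np

def dp_means_sketch(X, lam, n_iter=20):
    """Hard clustering where lam sets the cost of opening a new cluster."""
    centers = [X[0].copy()]               # start with a single cluster
    assignments = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(X):
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            j = int(np.argmin(d2))
            if d2[j] > lam:               # farther than lam from every center:
                centers.append(x.copy())  # open a new cluster at this point
                j = len(centers) - 1
            assignments[i] = j
        for j in range(len(centers)):     # k-means-style center update
            pts = X[assignments == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return np.array(centers), assignments
```

The sensitivity issue is visible here: lam directly controls how far an outlier must sit before it earns its own cluster, so nearby values of lam can yield different cluster counts.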

## A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling [link]

I haven’t read this paper carefully, but I think this is a neat medically-driven application of Bayesian nonparametric analysis. The rough idea is that EEG seizure recordings contain lots of related structure: each patient has idiosyncrasies that should be accounted for, distinct sensor channels will have distinct behaviors, and different seizure types will produce different records. The HDP framework the authors provide allows joint modeling of all these factors simultaneously. It is nice to see principled Bayesian structure-modeling ideas applied to real data outside the usual text, image, and video domains.

## Spectral Approaches to Learning Latent Variable Models [link]

OK, so this was a tutorial and not a paper. Nevertheless, I think it’s worth checking out. So much of latent variable model inference involves coordinate-ascent-style algorithms like EM or Gibbs sampling, which fix a subset of variables and update the others. These approaches are vulnerable to severe local optima. Spectral methods appear to offer global guarantees (no local optima), and can also be lightning fast. I’m very interested in looking into these approaches in more detail, but that requires dusting off my linear algebra skills.
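A toy illustration of the spectral flavor (my own example, not taken from the tutorial): for a mixture with k hidden states and two conditionally independent discrete observations, the pair co-occurrence matrix factors as O diag(pi) O^T and so has rank at most k. Its singular values therefore reveal the number of latent states from observed data alone, with no iterative updates:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d, n = 3, 10, 100_000          # hidden states, symbols, samples

# Conditional distributions P(x | z): overlapping but linearly independent.
O = np.zeros((d, k))
O[0:4, 0] = 0.25
O[3:7, 1] = 0.25
O[6:10, 2] = 0.25
pi = np.array([0.5, 0.3, 0.2])    # mixing weights

# Draw pairs (x1, x2) conditionally independent given the hidden state z.
z = rng.choice(k, size=n, p=pi)
x1 = np.empty(n, dtype=int)
x2 = np.empty(n, dtype=int)
for s in range(k):
    m = z == s
    x1[m] = rng.choice(d, size=m.sum(), p=O[:, s])
    x2[m] = rng.choice(d, size=m.sum(), p=O[:, s])

# Empirical co-occurrence matrix: P_hat[a, b] estimates P(x1 = a, x2 = b).
P_hat = np.zeros((d, d))
np.add.at(P_hat, (x1, x2), 1.0 / n)

# Only the top k singular values are large: the matrix is nearly rank k.
s = np.linalg.svd(P_hat, compute_uv=False)
print(s[:5])
```

The full spectral-learning story goes further and recovers the parameters themselves from such moment matrices, but even this rank check shows why linear algebra does the heavy lifting.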

## The Nonparametric Metadata Dependent Relational Model [link]

I’ll end with a shameless plug for my own paper (with co-authors Dae Il Kim and Erik Sudderth). If you’re interested in models for relational data that have a Bayesian nonparametric flavor, you should definitely check this out. Here’s a teaser photo of our ICML poster with yours truly.
