Jul 12

CVPR 2012 Highlights

In early June, I attended CVPR, a premier computer vision conference.  Luckily for me, the conference was right here in Providence, just 20 minutes from Brown, so I didn’t have far to go.

One definite highlight was the lobster dinner, served to all 1000+ attendees.  After the jump, I’ll summarize some of my favorite research highlights.


Fisher Kernels and other ways to improve bag-of-words classification

Image categorization using Fisher Kernels of non-iid models [web]

The usual “bag-of-words” (BoW) approach to classification has many weaknesses. This paper shows how a few modeling tricks can avoid problems like the awful “iid” assumption built into standard BoW approaches, which treats the local descriptors of an image as independent draws. Further, the authors show that even though these models are generative rather than discriminative, the Fisher kernel provides a way to get noticeably better recognition rates.  Definitely worth a read for its simplicity and easy-to-implement recommendations.
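For readers who haven't seen a Fisher kernel before, here is a minimal sketch of the idea. To be clear about the assumptions: this is the standard iid Gaussian mixture version with diagonal covariances, not the non-iid models from the paper, and it only uses the gradient with respect to the mixture means. The function names and the toy mixture parameters are made up for illustration. Each image's set of local descriptors is mapped to the gradient of its average log-likelihood, and the kernel between two images is just the dot product of those gradient vectors.

```python
import numpy as np

def fisher_vector_means(X, weights, means, variances):
    """Fisher vector of descriptors X (N, D) under a diagonal GMM.

    Only the gradient w.r.t. the K component means is used, scaled with
    the usual diagonal approximation of the Fisher information.
    weights: (K,), means: (K, D), variances: (K, D).
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]            # (N, K, D)
    # Log density of every descriptor under every component: (N, K)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    )
    log_post = log_prob + np.log(weights)[None, :]
    # Soft assignments (responsibilities) via a numerically stable softmax
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)           # (N, K)
    # Responsibility-weighted, whitened deviations from each mean: (K, D)
    grad = np.sum(gamma[:, :, None] * diff / np.sqrt(variances)[None], axis=0)
    grad /= N * np.sqrt(weights)[:, None]
    return grad.ravel()                                 # length K * D

def fisher_kernel(X_a, X_b, weights, means, variances):
    """Kernel between two descriptor sets: dot product of Fisher vectors."""
    fa = fisher_vector_means(X_a, weights, means, variances)
    fb = fisher_vector_means(X_b, weights, means, variances)
    return float(fa @ fb)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 3-component mixture over 2-D descriptors (hypothetical values)
    weights = np.array([0.5, 0.3, 0.2])
    means = rng.normal(size=(3, 2))
    variances = np.ones((3, 2))
    image_a = rng.normal(size=(100, 2))   # stand-ins for local descriptors
    image_b = rng.normal(size=(80, 2))
    print(fisher_kernel(image_a, image_b, weights, means, variances))
```

In practice the mixture would be fit to descriptors from training images first, and the resulting vectors would be fed to a linear classifier. The point of the sketch is only that a generative model can supply the feature map for a discriminative one.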

Human-in-the-loop Recognition Systems

So much of vision work these days wants the computer to “do it all”.  It’s nice to see what can happen when man and machine work together.

Discovering Local Attributes for Fine-Grained Recognition [PDF]

Some nice work by Kun Duan, Devi Parikh, David Crandall, and Kristen Grauman. The basic idea is to keep humans in the loop to help identify which features are most discriminative for fine-grained recognition tasks (think differentiating between 20+ types of birds).  The machine proposes candidate features based on raw discriminativeness, and the human helps ensure the features are interpretable (e.g. white belly, red feathers, etc.).

Stream-based Joint Exploration and Exploitation [PDF]

This one was a dark horse for me. Despite it being an oral presentation, I didn’t think it was really up my alley.  However, it turns out active learning is quite interesting, especially when you use Bayesian nonparametrics under the hood (as the authors do).  I need to read this more closely to understand the details, but I think the overall direction is worthwhile.

Fine-Grained Categorization Datasets

One last theme worth mentioning is a recent surge of interest in developing datasets for object/action recognition that really force the algorithm to capture some subtleties. I like it when datasets push the envelope.

Fine Grained Cooking Activities [dataset], [PDF]

65 cooking activities, like “dice”, “peel”, and “wash”. As a video understanding enthusiast, I think this data can really push the envelope.  Spatio-temporal “bag-of-words” probably can’t cut it here, so pose estimation, hand tracking, and other high-level representations will be crucial.

700 Species of North American Birds [project]

Okay, only a teaser was promised at CVPR, so this isn’t really work presented at the conference.  But Serge Belongie’s group and Pietro Perona’s group have a nice dataset coming out that has something like 700 different kinds of birds. It will definitely influence how object recognition approaches evolve in the next few years.

Reactions from other researchers

Tomasz, a postdoc at Antonio Torralba’s lab at MIT, has a nice set of blog posts on CVPR.
