Join my research team at Tufts CS!
In Fall 2018, I am actively recruiting:
 new Ph.D. students to join in Spring or Fall 2019
 current undergrad and masters students
Keep reading below for the skills I'm looking for, some possible high-level project ideas, etc.
Possible projects/topics
Possible project ideas, based on my past work, with links to sample research papers:
Statistical Machine Learning

Semi-supervised training of latent variable models [My AISTATS 2018 paper]: How do we learn from datasets when labeled measurements are rare but unlabeled measurements are easy to obtain? The labels we wish to predict are often expensive, time-consuming, or dangerous to collect (e.g., does a patient respond to this medication?), but we may have many thousands of unlabeled measurements that might be useful (e.g., patient records). How can we make sure this works even when our model is wrong?

Bayesian nonparametric models [My NIPS 2015 paper], [Bayesian Nonparametrics in Python (BNPy) package]: Many clustering algorithms require the number of clusters to be chosen before any model fitting. Can we apply Bayesian nonparametric models to 'let the data speak' and learn a posterior distribution over possible clusterings that adapts to the complexity of the data at hand? How do we avoid the poor local optima that fitting algorithms can get stuck in?

Latent Variable Models with Neural Networks for Recognition: I'm excited about deep generative models and variational autoencoders as ways to do fast inference for flexible model families.

Models that are Robust to Missing Data: How do we make good statistical predictions when some measurements are missing, or when different measurements are recorded at very different frequencies (some every minute, others irregularly, every dozen hours or so)?
Machine Learning Applications in Healthcare

Intensive Care Interventions [My AMIA CRI 2017 paper], [MIMIC dataset]: The condition of patients in hospital intensive care units changes all the time. Can we predict when they will need mechanical ventilation based on sensor readings? Could we do so far enough in advance to helpfully change clinical workflow?

Depression: Patients who suffer from depression often turn to antidepressant drugs, but psychiatrists currently have trouble knowing which of the many available drugs will work for a given patient. Can we use probabilistic ML to predict which drug will work for a patient based on their history?

Fertility Medicine: Couples who struggle to get pregnant often turn to in vitro fertilization (IVF), which requires several medication-based interventions. Can we learn from previous records to predict which drugs will give patients the best chance of success? Can our models give insight into the science behind why their predictions might work?
Current students at Tufts
I'm eager to work with strong undergraduate, Master's, and Ph.D. students to make exciting machine learning research happen. I don't generally pursue research projects with students who are brand new to probabilistic machine learning or data analysis, but if you have even a little prior experience, I would love for you to join the team. See below for a quick recap of the skills I hope students have before joining a project.
If you think you have the right background and are willing to commit at least two semesters (or one semester plus a summer) to research, please send me an email with:
 the project ideas that seem most exciting to you
 a quick list of courses you've completed at Tufts or other schools relevant to machine learning / data analysis
 a pointer to a previous course project report, GitHub repository, etc. that demonstrates your ML or coding skills
 a statement confirming you're willing to give several hours a week for multiple semesters (or a semester plus a summer)
I know this is not a casual commitment, but I will work hard with you to deliver a meaningful experience that results in some kind of open-source software release and/or a publication at a workshop or conference. My CV marks the students I've mentored in the past with ^u (undergraduate), ^m (master's), or ^d (doctoral), so you can see that I have a pretty decent track record.
Prospective students
Prospective Ph.D. students: Apply to Tufts First
You should apply to the Tufts CS Ph.D. program. If you mention me by name in your research statement, I'll be able to consider your application carefully.
 CS Ph.D. Program Overview
 Application
 Deadlines: apply for Fall admission (deadline mid-December) OR Spring admission (deadline mid-September)
Prospective M.S. or Post-Bacc. students: Apply to Tufts First
Generally, I plan to bring M.S. students into the lab after they have been accepted into a program and have successfully completed some machine learning coursework. You should apply to one of the Tufts CS master's programs, including:
 M.S. in Data Science
 M.S. in Computer Science
 Post-Bacc. Certificates in Data Science or Computer Science
Prerequisite Skills
Preferred candidates will have a strong mathematical background in machine learning and/or a strong background in Python data science development.
Probabilistic Machine Learning Skills
My group's work in statistical machine learning requires some prior understanding of concepts like Bayesian data analysis, supervised machine learning, optimization algorithms (e.g., gradient descent), unsupervised machine learning (e.g., k-means clustering or PCA), or deep learning. Good evidence of your probabilistic ML capabilities would include:

Evidence that you can complete a self-directed research project involving fitting a probabilistic model to data and interpreting its results. This might be a self-study project or a project from previous coursework. The key skill here is being able to translate inference algorithm pseudocode from a research paper into a concrete Python implementation without much guidance.

Successful completion of AI/ML coursework at Tufts, including COMP 135 (Intro to ML), COMP 136 (Statistical Pattern Recognition), or other 150-level courses. If you did a final project for one of those courses, sending me your report/slides would be a great litmus test.
Python Data Science Development Skills
A key goal of my group is to develop open-source Python software so that non-experts can apply our novel machine learning algorithms to their datasets in meaningful ways. For examples of previous open-source work, see our Bayesian Nonparametrics in Python (BNPy) package or our recent Prediction-constrained topic models package.
I am keen to make these packages more usable and would love some help extending their features.
Good evidence of software capabilities would include:

An open-source repository on GitHub showcasing some small-but-interesting project built with common Python data science stack packages (numpy, scipy, pandas, scikit-learn, PyTorch, TensorFlow, etc.)

If you made some closed-source contributions (perhaps during an internship), please describe them in a brief paragraph. Try to give examples of concrete algorithms you've implemented or the scale of datasets you've worked with.