Machine Learning for Clinicians: Advances for Multi-Modal Health Data

A Tutorial at MLHC 2018

Thursday, August 16, 2018, 13:30-16:30
Li Ka Shing Learning and Knowledge Center, Room LK120, Stanford University

Goals

This tutorial is intended for clinicians and other healthcare professionals who wish to become familiar with recent advances in machine learning and how these might be applied to medical problems. Within machine learning, we will focus mostly on supervised methods for making predictions, starting with basic predictors for static, vector-valued data and building up to multi-modal, time-varying signals. One motivating example will be the combination of demographics, vital sign data, diagnosis or procedural code data, text data, and image data available in an in-patient hospital setting.

The tutorial has 3 parts. The first part will provide a shared foundation in the common methods used for training basic single-data-source predictors (e.g. predicting length of stay from patient vital signs), as well as best practices for evaluating these predictors. The second part will cover methods for learning rich representations that can handle structured data such as time series, text, and images. The final part will cover recent methods that address 6 key methodological challenges arising from healthcare applications: missing data, combining labeled and unlabeled data (semi-supervised learning), learning from multiple data sources (multimodal learning), explainable/interpretable models, models that try to account for causality, and sequential decision-making (reinforcement learning).

Throughout each part, we hope to provide a practical tour of major methodological approaches, give examples of cutting-edge applications to the healthcare domain, and cut through the hype to identify key limitations and remaining open problems.

We hope to give you tools to navigate successful collaborations with machine learning researchers, as well as the ability to critically evaluate claims about machine learning methods in the literature. We hope that after completing this tutorial, you could be a more competent reviewer for the computational side of a submitted MLHC paper.

Target Audience

This tutorial is targeted at clinicians and other healthcare professionals who may have had limited exposure to machine learning methods in the past. It is probably right for you if you have some basic understanding of methods such as linear regression, logistic regression, or decision trees, but wish to know more about how the latest methodological advances in machine learning might be applied to healthcare problems.

This tutorial is not necessarily designed for professional data scientists or machine learning researchers, though you are welcome to attend.

Content

01:30-01:45 Overview (15 min) [slides PDF]

  • Tutorial Goals
  • Success Stories: ML for Sepsis Prediction
  • Context: ML as a piece of a much larger puzzle

01:45-02:20 Part 1: Making Predictions (35 min) [slides PDF]

  • Evaluation of Predictions
    • Train/Valid/Test
    • Performance Metrics
      • TPR, FPR, AUC, etc.
    • Calibration
    • Decision Theory and Utility
  • Basic Methods for Regression and Classification
    • Linear models
    • Decision trees and Random Forests
    • Simple Neural Nets
  • Predictions with Uncertainty
    • Gaussian Processes
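
As a rough illustration of the performance metrics listed under "Evaluation of Predictions", the sketch below computes TPR, FPR, and AUC on a small held-out test set. It is not taken from the tutorial slides: the labels, scores, threshold, and function names are all made up for illustration, and a real analysis would use far more data and an established library.

```python
# Illustrative sketch only: toy labels and scores, not real patient data.
# All names here (tpr_fpr, auc, test_labels, test_scores) are hypothetical.

def tpr_fpr(labels, scores, threshold):
    """Return (TPR, FPR) when predicting positive if score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out test set: 1 = event occurred, 0 = it did not.
test_labels = [1, 0, 1, 1, 0, 0, 1, 0]
test_scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]

tpr, fpr = tpr_fpr(test_labels, test_scores, threshold=0.5)
test_auc = auc(test_labels, test_scores)
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}  AUC={test_auc:.4f}")
```

TPR and FPR depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is one reason the outline lists calibration and utility separately from these metrics.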

02:20-02:30 <<< Questions + Break >>>

02:30-03:15 Part 2: Learning Representations (45 min) [slides PDF]

  • Representations using Bag-of-Words
    • Topic Models
  • Learned Representations for Images
    • Convolutional Networks
  • Learned Representations for Time Series
    • RNNs
  • Learned Representations for Text
    • RNNs
    • Embeddings
  • Tricks of the Trade
    • Dropout, Data Augmentation, etc.
  • Models that Generate Data
    • Autoencoders (AEs) & Denoising AEs
    • Deep Generative Models
    • Variational Autoencoders (VAEs)
    • GANs

03:15-03:30 <<< Questions + Break >>>

03:30-04:30 Part 3: Addressing Health Data Challenges (60 min) [slides PDF]

We consider 6 key challenges for ML that arise from health applications. For each challenge, we will summarize the following:

  • Missing data
    • Imputation strategies that work
    • Modeling strategies that work
  • Incomplete/partial labels (Semi-supervised learning)
    • Two-stage strategies
    • End-to-end strategies
  • Multi-modal data
    • Learning joint/shared representations
    • Learning coordinated representations
  • Interpretability
    • Methods designed to be inspected (SLIM)
    • Methods to explain a fixed, pre-trained predictor
    • Methods to optimize predictors to be more interpretable
  • Causality
    • Potential Outcomes framework for Counterfactuals
    • Examples on health data
  • Sequential Decision Making / Reinforcement Learning
    • Recent Successes from "deep" RL
    • Limitations for Health Applications
    • Evaluation Methods on Observational data
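
To make the "Missing data" item above more concrete, here is a minimal sketch of one common baseline: mean imputation combined with missingness indicator features. This is an assumption about what a simple imputation strategy might look like, not necessarily one the tutorial presents. The two features (heart rate, temperature), the values, and the function names are hypothetical.

```python
# Illustrative sketch only: toy vital-sign rows, with None marking a missing value.
# Function names and data are hypothetical, not from the tutorial.

def column_means(rows):
    """Mean of each column, computed over observed (non-None) values only."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def impute_with_indicator(rows, train_means):
    """Fill missing values with training means and append 0/1 missingness flags."""
    out = []
    for row in rows:
        filled = [train_means[j] if v is None else v for j, v in enumerate(row)]
        flags = [1.0 if v is None else 0.0 for v in row]
        out.append(filled + flags)
    return out


# Columns: [heart rate, temperature]
train = [[80.0, 36.5], [None, 37.5], [100.0, None]]
test = [[None, 38.0]]

means = column_means(train)            # statistics come from training data only
train_x = impute_with_indicator(train, means)
test_x = impute_with_indicator(test, means)
print(means)
print(test_x)
```

The imputation statistics are estimated on the training rows and reused for the test rows, so no test-set information leaks into the model. The indicator columns let a downstream predictor use the fact that a measurement was missing, which can itself be informative in clinical data.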

Resources

Bibliography

Related Tutorials

Related Conferences and Workshops