Chapter 1: Interpretable ML Introduction

This chapter introduces the basic concepts of Interpretable Machine Learning. We focus on supervised learning, explain the different types of explanations, and review the topics of correlation and interaction.

§1.01: Introduction, Motivation, and History



§1.02: Interpretation Goals


Interpreting models can serve a variety of important goals, including:


§1.03: Dimensions of Interpretability



§1.04: Correlation and Dependencies

Pearson's Correlation Coefficient:

\[\rho (X_1, X_2) = \frac{\sum_{i=1}^{n} (x_1^{(i)} - \bar{x_1})(x_2^{(i)} - \bar{x_2})}{\sqrt{\sum_{i=1}^n (x_1^{(i)} - \bar{x_1})^2} \sqrt{\sum_{i=1}^n (x_2^{(i)} - \bar{x_2})^2}} \in [-1, 1]\]

Each summand $(x_1^{(i)} - \bar{x_1})(x_2^{(i)} - \bar{x_2})$ in the numerator can be read as the signed area of the rectangle spanned by the deviations of observation $i$ from the two means. If positive areas dominate, the correlation coefficient is positive; if negative areas dominate, it is negative. If the positive and negative areas balance, then $\rho = 0$, i.e., the features are (linearly) uncorrelated.
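A minimal sketch of this sign behaviour (the data-generating choices, variable names, and the use of NumPy's corrcoef are illustrative assumptions, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)

# Positive linear relationship: products of deviations are mostly positive.
x2_pos = 2 * x1 + rng.normal(scale=0.5, size=1000)

# Negative linear relationship: products of deviations are mostly negative.
x2_neg = -2 * x1 + rng.normal(scale=0.5, size=1000)

# Quadratic relationship: positive and negative areas roughly cancel,
# so rho is close to 0 even though x1 and x2 are clearly dependent.
x2_quad = x1 ** 2 + rng.normal(scale=0.5, size=1000)

for name, x2 in [("positive", x2_pos), ("negative", x2_neg), ("quadratic", x2_quad)]:
    rho = np.corrcoef(x1, x2)[0, 1]
    print(f"{name:9s}: rho = {rho:+.3f}")
```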

Coefficient of determination $R^2$: for a regression model with predictions $\hat{y}^{(i)}$,

\[R^2 = 1 - \frac{\sum_{i=1}^{n} (y^{(i)} - \hat{y}^{(i)})^2}{\sum_{i=1}^{n} (y^{(i)} - \bar{y})^2},\]

i.e., the fraction of the target's variance that is explained by the model. For a simple linear regression of $x_2$ on $x_1$, $R^2 = \rho(x_1, x_2)^2$.
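A short sketch of the relation $R^2 = \rho^2$ for simple linear regression (the simulated data and the use of np.polyfit are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = 1.5 * x1 + rng.normal(scale=1.0, size=500)

# Simple linear regression of x2 on x1 (least-squares slope and intercept).
slope, intercept = np.polyfit(x1, x2, deg=1)
x2_hat = slope * x1 + intercept

# R^2 = 1 - SSE / SST
sse = np.sum((x2 - x2_hat) ** 2)
sst = np.sum((x2 - x2.mean()) ** 2)
r2 = 1 - sse / sst

rho = np.corrcoef(x1, x2)[0, 1]
print(f"R^2 = {r2:.3f}, rho^2 = {rho ** 2:.3f}")  # the two agree for simple OLS
```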

Dependence: two random variables $X_1$ and $X_2$ are independent if their joint distribution factorizes, $P(X_1, X_2) = P(X_1)\,P(X_2)$; any deviation from this is a dependence. Since $\rho$ only captures linear dependence, $\rho = 0$ does not imply independence.

Mutual Information:

\[I(X_1; X_2) = \sum_{x_1, x_2} p(x_1, x_2) \log \frac{p(x_1, x_2)}{p(x_1)\,p(x_2)} \geq 0\]

measures general (also nonlinear) dependence; $I(X_1; X_2) = 0$ if and only if $X_1$ and $X_2$ are independent.
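A minimal sketch contrasting correlation with an estimate of mutual information (the quadratic relationship and scikit-learn's mutual_info_regression estimator are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(2)
x1 = rng.normal(size=2000)
x2 = x1 ** 2 + rng.normal(scale=0.1, size=2000)  # dependent, but not linearly

rho = np.corrcoef(x1, x2)[0, 1]
mi = mutual_info_regression(x1.reshape(-1, 1), x2, random_state=0)[0]

print(f"rho = {rho:+.3f}")  # close to 0: no linear dependence
print(f"MI  = {mi:.3f}")    # clearly > 0: x2 depends on x1
```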


§1.05: Interaction


When there is an interaction, the effect of one variable on the outcome changes depending on the level of another variable. For example, the effect of $x_1$ on $f(x)$ might be stronger at higher values of $x_2$ and weaker at lower values of $x_2$. When there is no interaction, the effect of one variable on the outcome is consistent across all levels of the other variable: if we examine the effect of $x_1$ on $f(x)$ at different values of $x_2$, the slope of the effect curve of $x_1$ is the same regardless of the value of $x_2$.
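As a small worked example (the linear model with an interaction term below is an assumption for illustration, not taken from the lecture), consider

\[f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2, \qquad \frac{\partial f(x)}{\partial x_1} = \beta_1 + \beta_3 x_2 .\]

If $\beta_3 = 0$, the slope of $x_1$ is the constant $\beta_1$ for every value of $x_2$ (no interaction); if $\beta_3 \neq 0$, the slope of $x_1$ changes with $x_2$, i.e., $x_1$ and $x_2$ interact.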