This chapter introduces the basic concepts of Interpretable Machine Learning. We focus on supervised learning, explain the different types of explanations, and review the topics of correlation and interaction.
ML has huge potential to aid decision-making processes due to its predictive performance. However, many ML methods are black boxes that are too complex for humans to understand. When deploying such models, especially in business-critical or sensitive use cases, the lack of explanations hurts trust and creates barriers.
We mostly think of using these methods to create the best predictor, i.e. to “learn to predict”. The other important paradigm is to understand the underlying relationships themselves, i.e. to “learn to understand”, which is often the goal in medical applications, for example.
This is a fairly strong simplification of the performance-interpretability trade-off: even a linear model can be difficult to interpret in a high-dimensional space, whereas a neural network could be made very shallow.
We should always start with the simpler, less complex and more explainable model and only choose a more complex one if it is really needed. “Need” does not always mean the need for better predictive performance.
The GDPR, the EU AI Act, and other regulations around the world make explainable and interpretable ML even more important.
Some models, like shallow decision trees and simple linear regressions, are inherently interpretable and do not require any specific post-estimation methods. However, they too can become difficult to interpret if the trees are deep or if the regression has too many interaction terms/engineered features.
For black-box models, we can develop techniques that are specific to one model (class), e.g. summing up the Gini impurity decreases across a tree's splits.
We can also develop techniques/heuristics that are model-agnostic and universally applicable. This allows us to fit predictors from many different model classes and then apply the same model-agnostic technique.
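To make the distinction concrete, here is a minimal sketch in scikit-learn (the dataset, model, and settings are illustrative assumptions, not prescribed by the text): impurity-based Gini importance is specific to tree ensembles, while permutation importance can wrap any fitted predictor.

```python
# A minimal sketch contrasting a model-specific and a model-agnostic technique
# in scikit-learn; the dataset, model, and settings are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Model-specific: impurity-based (Gini) importance, only defined for tree ensembles.
gini_importance = rf.feature_importances_

# Model-agnostic: permutation importance wraps any fitted predictor.
perm = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
print(gini_importance[:5])
print(perm.importances_mean[:5])
```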
Global interpretation methods explain the model behaviour for the entire input space by considering all available observations (e.g. Permutation Feature Importance, Partial Dependence plots, Accumulated Local Effects).
Local interpretation methods explain the model behaviour for single instances (e.g. Individual Conditional Expectation curves, LIME, Shapley Values, SHAP).
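As a small illustration of the global/local distinction, the following sketch (scikit-learn; the dataset, model, and feature index are arbitrary choices for demonstration) plots a partial dependence curve (global) and ICE curves (local) for the same feature.

```python
# A minimal sketch of a global (PDP) vs. a local (ICE) explanation with scikit-learn;
# the dataset, model, and feature index below are illustrative choices.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Global: the partial dependence curve averages predictions over all observations.
PartialDependenceDisplay.from_estimator(model, X, features=[2], kind="average")

# Local: ICE draws one curve per individual instance for the same feature.
PartialDependenceDisplay.from_estimator(model, X, features=[2], kind="individual")
plt.show()
```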
TO-DO: Fixed Model vs Refits
Geometric Interpretation: Consider just the numerator, which is simply a sum of products of the deviations $x_1 - \bar{x_1}$ and $x_2 - \bar{x_2}$. The product of two such deviations is the signed area of the rectangle they span. These areas (the numerator) determine the sign, and the denominator simply rescales it.
So if positive areas dominate, the correlation coefficient will be positive; if negative areas dominate, it will be negative. If the positive and negative areas cancel out, then $\rho = 0$, which implies uncorrelated features.
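A small numeric sketch of this reading (toy data, chosen only for illustration): the numerator is the sum of signed rectangle areas, and dividing by the denominator only rescales it.

```python
# A small numeric sketch of the rectangle-area reading (toy data, for illustration).
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Each term is the signed area of the rectangle spanned by the two deviations.
areas = (x1 - x1.mean()) * (x2 - x2.mean())
numerator = areas.sum()

# The denominator only rescales, so the sign of the summed areas is the sign of rho.
denominator = np.sqrt(((x1 - x1.mean()) ** 2).sum() * ((x2 - x2.mean()) ** 2).sum())
print(numerator / denominator, np.corrcoef(x1, x2)[0, 1])  # identical values
```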
Analytical Interpretation: Simply put, if the sign of the deviation from the mean is the same for both variables, i.e. both $x_1$ and $x_2$ are smaller than their means OR both are larger than their respective means, this contributes to a positive correlation. If the signs differ, it contributes to a negative correlation.
An important thing to remember is that $\rho$ is a measure of linear correlation. For non-linear relationships, the correlation coefficient can be meaningless (more on that in a bit).
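A quick sketch of this caveat (synthetic data, for illustration): a perfectly deterministic but quadratic relationship yields a Pearson correlation of roughly zero.

```python
# A sketch of the caveat: a deterministic but quadratic relationship has
# Pearson correlation close to zero (synthetic data, for illustration).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=10_000)
y = x ** 2                       # fully determined by x, but not linearly

print(np.corrcoef(x, y)[0, 1])   # approximately 0
```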
Fitting a linear model and analysing the slope alone is problematic, as the same underlying data can yield very different slopes. For example, if we rescale a variable from m to cm, the regression slope for that variable changes by a factor of 100; if we negate a variable, its coefficient flips sign.
$R^2 = 1 - \frac{SSE_{LM}}{SSE_{const}} \in [0,1]$, where $SSE_{const}$ is the SSE of a constant (intercept-only) model. $R^2$ is another measure of linear dependency that is invariant to scaling and simple transformations of the underlying data (depending on the transformation, of course; multiplying by another feature is no longer a simple transformation).
If $\frac{SSE_{LM}}{SSE_{const}}$ is 1, the fitted model is no better than a constant model, implying no linear relationship.
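A short sketch of this, assuming a simple univariate fit on synthetic data: we compute $R^2$ from the two SSE terms and check that rescaling a variable changes the slope but leaves $R^2$ unchanged.

```python
# A sketch of R^2 = 1 - SSE_LM / SSE_const for a simple univariate fit,
# and of its invariance to rescaling the regressor (synthetic data, for illustration).
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + rng.normal(scale=0.5, size=500)

def r_squared(x, y):
    slope, intercept = np.polyfit(x, y, deg=1)      # fitted linear model
    sse_lm = np.sum((y - (slope * x + intercept)) ** 2)
    sse_const = np.sum((y - y.mean()) ** 2)         # constant (intercept-only) model
    return 1 - sse_lm / sse_const

# Rescaling x1 (e.g. m -> cm) changes the slope but leaves R^2 unchanged.
print(r_squared(x1, x2), r_squared(100 * x1, x2))
```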
Definition: $X_j, X_k$ are independent $\leftrightarrow$ the joint distribution is the product of the marginals, i.e.
\[P(X_j, X_k) = P(X_j) \cdot P(X_k)\]
An equivalent definition (that can be derived using the above definition) is:
\[P(X_j \mid X_k) = P(X_j)\]
and vice versa, implying that knowledge of the other variable has no effect on the conditional probability.
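As a tiny worked check of this definition (the joint probability table below is made up purely for illustration), one can verify that the joint equals the outer product of the marginals:

```python
# A toy check of the independence definition: the joint table factorises into
# the outer product of its marginals (the probabilities are made up for illustration).
import numpy as np

joint = np.array([[0.12, 0.28],    # rows index X_j, columns index X_k
                  [0.18, 0.42]])

p_j = joint.sum(axis=1)            # marginal distribution of X_j
p_k = joint.sum(axis=0)            # marginal distribution of X_k

print(np.allclose(joint, np.outer(p_j, p_k)))  # True -> X_j and X_k are independent
```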
Mutual information describes the amount of information shared by two random variables by measuring how “different” the joint distribution is from the product of the marginals. It is 0 if and only if the variables are independent.
\[MI(X_1, X_2) = E_{p(X_1, X_2)}\left[\log \frac{p(X_1, X_2)}{p(X_1)\, p(X_2)}\right]\]
Whereas Pearson correlation is limited to continuous features, mutual information can also be calculated for discrete variables.
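A brief sketch (the scikit-learn estimators and synthetic data are illustrative choices): mutual information picks up the quadratic dependence that Pearson correlation misses, and it is also defined for discrete variables.

```python
# A sketch using scikit-learn estimators (synthetic data, illustrative settings):
# MI detects the quadratic dependence that Pearson correlation misses, and it is
# also defined for discrete variables.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=5_000)
y = x ** 2                                     # deterministic but non-linear

# Continuous case: k-NN based MI estimate is clearly positive, Pearson rho is ~0.
print(mutual_info_regression(x.reshape(-1, 1), y)[0], np.corrcoef(x, y)[0, 1])

# Discrete case: MI between two categorical variables from their label vectors.
a = rng.integers(0, 2, size=5_000)
b = np.where(rng.random(5_000) < 0.1, 1 - a, a)  # noisy copy of a
print(mutual_info_score(a, b))                   # > 0 (in nats)
```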
Whereas feature dependencies concern only the data distribution, feature interactions may occur in the structure of the model or of the data-generating process. Feature dependencies may lead to feature interactions in a model.
Interactions: A feature’s effect on the prediction depends on other features, e.g. $\hat{f}(x) = x_1 x_2 \Rightarrow$ the effect of $x_1$ on $\hat{f}$ depends on $x_2$ and vice versa. A function $f(x)$ contains an interaction between $x_j$ and $x_k$ if a difference in $f(x)$-values due to changes in $x_j$ also depends on $x_k$, i.e.
\[E\left[\frac{\partial^2 f(x)}{\partial x_j \partial x_k}\right]^2 > 0\]
The mixed partial derivative measures how the rate of change of $f$ w.r.t. $x_j$ changes when $x_k$ varies. If this derivative is non-zero, it indicates an interaction effect, and the squared expectation ensures that we capture the overall magnitude of this effect across the feature space.
If $x_j$ and $x_k$ do not interact, $f(x)$ is a sum of two functions, one not depending on $x_j$ and the other not depending on $x_k$, i.e.
\[f(x) = f_{-j}(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots) + f_{-k}(x_1, \ldots, x_{k-1}, x_{k+1}, \ldots)\]
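To connect the mixed-partial criterion to something computable, here is a sketch that approximates $\frac{\partial^2 f}{\partial x_j \partial x_k}$ by finite differences for a function with an interaction and an additive one (the functions, sample, and step size are arbitrary illustrative choices):

```python
# A sketch that approximates the mixed partial derivative by central finite
# differences for a function with an interaction (x1 * x2) and an additive one
# (x1 + x2); the functions, sample, and step size are arbitrary illustrative choices.
import numpy as np

def mixed_partial(f, x1, x2, h=1e-4):
    # central finite-difference approximation of d^2 f / (dx1 dx2)
    return (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
            - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4 * h ** 2)

f_inter = lambda x1, x2: x1 * x2   # interaction: mixed partial is 1 everywhere
f_add = lambda x1, x2: x1 + x2     # additive: mixed partial is 0 everywhere

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=1_000), rng.normal(size=1_000)

print(np.mean(mixed_partial(f_inter, x1, x2)) ** 2)  # > 0 -> interaction
print(np.mean(mixed_partial(f_add, x1, x2)) ** 2)    # ~ 0 -> no interaction
```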
Figures: effect curve of a function with feature interactions (left) vs. effect curve of a function with no feature interactions (right).