Chapter 2: Interpretable Models

Some machine learning models are already inherently interpretable, e.g. linear models (LMs), generalized linear models (GLMs), generalized additive models (GAMs), and rule-based models. This chapter briefly summarizes these models and clarifies how to interpret them.

§2.01: Inherently Interpretable Models - Motivation



§2.02: Linear Regression Model (LM)


Assumptions of the Linear Model

  1. Linear relationship between the features and the target.
  2. $\epsilon$ and $y \mid x$ are normally distributed with homoscedastic (constant) variance, i.e. $\epsilon \sim N(0, \sigma^2) \Rightarrow y \mid x \sim N(x^T \theta, \sigma^2)$. If the homoscedasticity assumption is violated, inference-based quantities such as $p$-values or $t$-statistics are no longer valid/reliable.
  3. The features $x_j$ are independent of the error term $\epsilon$. Hence, plotting a single feature against the residuals should show a point cloud with no trend (see the diagnostic sketch below this list).
  4. No or little multicollinearity, i.e. there are no strong correlations among the features (also checked in the sketch below).
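
A minimal sketch of how assumptions 3 and 4 might be checked in base R, assuming a small simulated data set (the variable names `x1`, `x2`, `y` are hypothetical):

```r
# Hypothetical data: two correlated features and a linear target with noise
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n, sd = 0.5)   # induces some multicollinearity
y  <- 3 + 2 * x1 - 1 * x2 + rnorm(n)
dat <- data.frame(x1, x2, y)

mod <- lm(y ~ x1 + x2, data = dat)

# Assumption 3: residuals vs. a single feature should look like an unstructured cloud
plot(dat$x1, resid(mod), xlab = "x1", ylab = "Residuals")
abline(h = 0, lty = 2)

# Assumption 4: inspect pairwise feature correlations
cor(dat[, c("x1", "x2")])
```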

Interpretation of Weights (Feature Effects)

Inference


§2.03: LM - Interactions and LASSO


Regularization via LASSO


§2.04: Generalized Linear Models


Logistic Regression


§2.05: Rule-based Models


Decision Trees

CART (Classification and Regression Trees)

CART is a non-parametric decision tree learning technique that can produce classification or regression trees.
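
A minimal sketch of fitting and pruning a CART tree in R with the rpart package (which implements CART), using the built-in iris data:

```r
library(rpart)

# Classification tree predicting the species from all four measurements
fit <- rpart(Species ~ ., data = iris, method = "class")

print(fit)     # learned split rules
printcp(fit)   # cost-complexity table with cross-validated error

# Prune back to the complexity parameter with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```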

ctree (Conditional Inference Tree)

Conditional Inference Trees (ctree) are decision trees that use a statistical framework based on conditional inference procedures for recursive partitioning. This approach aims to avoid the variable selection bias (favoring variables with more potential split points) present in algorithms like CART.
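
A minimal sketch with partykit::ctree on the iris data; a split is only made if a permutation test rejects independence at the chosen significance level, so a separate pruning step is usually unnecessary:

```r
library(partykit)

# Conditional inference tree: variable selection and splitting via permutation tests
ct <- ctree(Species ~ ., data = iris,
            control = ctree_control(alpha = 0.05))

print(ct)   # split variables together with the test p-values
plot(ct)    # terminal nodes show the class distribution
```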

mob (Model-Based Recursive Partitioning)

Model-Based Recursive Partitioning (mob) is an extension of recursive partitioning that fits parametric models (such as lm, glm, etc.) in the terminal nodes of the tree. The partitioning is based on finding subgroups in the data that exhibit statistically significant differences in the parameters of these node models.
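
A minimal sketch of model-based recursive partitioning with partykit::lmtree (the mob interface for linear node models), using hypothetical simulated data in which the effect of `x` on `y` changes with the partitioning variable `z`:

```r
library(partykit)

# Simulated data: the slope of x differs between the groups defined by z
set.seed(1)
n <- 500
x <- runif(n)
z <- factor(sample(c("A", "B"), n, replace = TRUE))
y <- ifelse(z == "A", 1 + 2 * x, 1 - 2 * x) + rnorm(n, sd = 0.3)
dat <- data.frame(y, x, z)

# Left of '|': the node model (y ~ x); right of '|': candidate partitioning variables
mb <- lmtree(y ~ x | z, data = dat)

print(mb)   # one linear model per terminal node
coef(mb)    # node-specific intercepts and slopes
```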

Key Differences (LLM Generated)
| Feature | CART | ctree (Conditional Inference Trees) | mob (Model-Based Recursive Partitioning) |
| --- | --- | --- | --- |
| Primary Goal | Prediction (classification or regression) with simple node models. | Unbiased variable selection and partitioning based on statistical significance. | Identifying subgroups with structurally different parametric models. |
| Splitting Logic | Greedy search for purity/SSE improvement. | Statistical tests of independence between predictors and response. | Statistical tests for parameter instability across partitioning variables. |
| Variable Bias | Can be biased towards variables with more potential split points. | Aims to be unbiased in variable selection. | Focuses on variables that cause parameter changes in the node models. |
| Stopping Rule | Grow full tree, then prune using cost-complexity and cross-validation. | Stops when no statistically significant splits are found (e.g. p-value threshold). | Stops when no significant parameter instability is detected. |
| Pruning | Essential (cost-complexity pruning). | Often not needed due to the statistical stopping criterion. | Pruning can be applied, or statistical stopping criteria used. |
| Node Models | Constant value (majority class for classification, mean for regression). | Constant value (majority class for classification, mean for regression). | Parametric models (e.g. linear models, GLMs, survival models). |
| Statistical Basis | Heuristic (impurity reduction). | Formal statistical inference (permutation tests). | Formal statistical inference (parameter instability / M-fluctuation tests). |
| Output Insight | Decision rules leading to a prediction. | Decision rules with statistical backing for splits. | Tree structure showing subgroups where different model parameters apply. |

Other Rule-based Models


§2.06: Generalized Additive Models and Boosting

Generalized Additive Models (GAM)

Model Based Boosting (TO-DO)