Chapter 7: Counterfactual Explanations

This chapter deals with further local analyses. First, counterfactual explanations are examined, which search for data points in the neighborhood of an observation that lead to a different prediction.

§7.01: Counterfactual Explanations (CE)


Extending the Optimization Problem

We also extend this optimization problem by considering objectives beyond proximity/distance, such as sparsity and plausibility, in order to improve explanation quality.

  1. Sparsity favours counterfactual data points that require fewer feature changes, i.e. the most “proximal” counterfactual could change all features slightly; sparsity instead prefers “more distant” points that change fewer features.
    • This can be integrated into \(o_{proximity}\) by using the \(L_0\) or \(L_1\) norm.
    • Alternatively, include a separate sparsity objective (e.g. via the \(L_0\) norm):
    \[o_{sparse} = \sum_{j=1}^{p} 1(x_j' \neq x_j)\]
  2. Plausibility: The proposed data points should be realistic and not implausible (e.g. becoming unemployed while income increases). Estimating the joint distribution is hard, especially if the feature space is mixed. A common proxy is to require that the proposed point lies on the data manifold, i.e. to prefer points that are not necessarily “most proximal” to \(\mathbf{x}\) but “most proximal to the training data / data manifold”.
    • This can be done by measuring the Gower distance of the proposed point to the nearest data point in the training dataset; a sketch of both objectives follows this list.
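As a rough illustration, the sketch below implements both extra objectives for purely numeric features. The function names, the NumPy representation, and the simplified (numeric-only) Gower distance are assumptions for this sketch, not part of the original formulation.

```python
# Hedged sketch of the sparsity and plausibility objectives, assuming
# numeric features stored in NumPy arrays (names o_sparse / o_plausible
# follow the notation of these notes).
import numpy as np

def o_sparse(x: np.ndarray, x_cf: np.ndarray) -> int:
    """Sparsity objective: L0 'norm', i.e. the number of changed features."""
    return int(np.sum(x != x_cf))

def gower_distance(a: np.ndarray, b: np.ndarray, ranges: np.ndarray) -> float:
    """Simplified Gower distance for numeric features only:
    mean of per-feature absolute differences scaled by the feature ranges."""
    return float(np.mean(np.abs(a - b) / ranges))

def o_plausible(x_cf: np.ndarray, X_train: np.ndarray) -> float:
    """Plausibility objective: Gower distance of the candidate counterfactual
    to its nearest neighbour in the training data (data-manifold proxy)."""
    ranges = X_train.max(axis=0) - X_train.min(axis=0)
    ranges[ranges == 0] = 1.0  # guard against constant features
    return min(gower_distance(x_cf, x_i, ranges) for x_i in X_train)
```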
Note: CEs explain model predictions, NOT real-world causal relationships; acting on a counterfactual does not guarantee the desired outcome for real-world users.

§7.02: Methods & Discussion of CEs


Many methods exist to generate counterfactuals; they mainly differ in the objectives they consider and in how they solve the resulting optimization problem.

First Optimization-Based CE Method

\[\operatorname{argmin}_{\mathbf{x'}} \max_\lambda \lambda (\hat{f}(\mathbf{x'}) - y')^2 + \sum_{j=1}^p \frac{\mid x_j' - x_j \mid}{MAD_j}\]

Where:

  • \(\mathbf{x}\) is the observation of interest and \(\mathbf{x'}\) the candidate counterfactual,
  • \(y'\) is the desired target prediction,
  • \(\lambda\) balances closeness of the prediction \(\hat{f}(\mathbf{x'})\) to \(y'\) against proximity to \(\mathbf{x}\),
  • \(MAD_j\) is the median absolute deviation of feature \(j\), used as a robust scaling factor.
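The following is a minimal sketch of this objective, assuming numeric features, a hypothetical callable `f_hat` that returns a scalar prediction for a single point, and a naive random search in place of the gradient-based optimization and the iterative increase of \(\lambda\) used by the original approach.

```python
# Hedged sketch of the optimization-based CE objective above.
import numpy as np

def wachter_style_loss(x_cf, x, y_target, f_hat, lam, mad):
    """lambda * (f_hat(x') - y')^2 + sum_j |x'_j - x_j| / MAD_j"""
    pred_term = lam * (f_hat(x_cf) - y_target) ** 2
    dist_term = np.sum(np.abs(x_cf - x) / mad)
    return pred_term + dist_term

def find_counterfactual(x, y_target, f_hat, X_train, lam=1.0,
                        n_steps=1000, step_size=0.05, tol=0.05, seed=None):
    """Naive random-search optimizer (illustration only): in practice,
    lambda is increased until |f_hat(x') - y'| <= tol is satisfied."""
    rng = np.random.default_rng(seed)
    # Median absolute deviation per feature, with a guard against zero MAD.
    mad = np.median(np.abs(X_train - np.median(X_train, axis=0)), axis=0)
    mad[mad == 0] = 1.0
    x_cf = x.copy()
    best = wachter_style_loss(x_cf, x, y_target, f_hat, lam, mad)
    for _ in range(n_steps):
        candidate = x_cf + rng.normal(scale=step_size, size=x.shape)
        loss = wachter_style_loss(candidate, x, y_target, f_hat, lam, mad)
        if loss < best:
            x_cf, best = candidate, loss
        if abs(f_hat(x_cf) - y_target) <= tol:
            break  # prediction is close enough to the desired target
    return x_cf
```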

Limitations and Pitfalls