This chapter treats the multiclass case of classification. With more than two classes, some techniques from the binary scenario no longer apply, and the loss functions must be adapted.
Multiclass Brier Score is defined on a vector of predicted class probabilities $(\pi_1(x), \dots, \pi_g(x))$:
\[L(y, \pi(x)) = \sum_{k=1}^g (1_{\{y=k\}} - \pi_k(x))^2\]
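As a quick illustration, here is a minimal NumPy sketch of this score for a single observation; the function name `multiclass_brier` and the 0-indexed class encoding are my own choices for the example, not from the text:

```python
import numpy as np

def multiclass_brier(y, pi):
    # y  : true class index in {0, ..., g-1}
    # pi : predicted class probabilities, shape (g,)
    onehot = np.zeros_like(pi)
    onehot[y] = 1.0
    return np.sum((onehot - pi) ** 2)

# Three classes, true class is index 1:
print(multiclass_brier(1, np.array([0.2, 0.7, 0.1])))  # 0.2^2 + 0.3^2 + 0.1^2 = 0.14
```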
Multiclass Log Loss / Cross Entropy Loss is a generalisation of the binary case:
\[L(y, \pi(x)) = - \sum_{k=1}^g 1_{\{y=k\}} \log(\pi_k(x))\]
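A matching sketch for the log loss: since the indicator zeroes out every term except the true class, the sum collapses to a single term. The clipping of `pi` away from zero is an added numerical safeguard, not part of the definition:

```python
import numpy as np

def multiclass_log_loss(y, pi, eps=1e-15):
    # The indicator selects only the true class, so the sum
    # reduces to -log(pi_y). Clipping avoids log(0).
    pi = np.clip(pi, eps, 1.0)
    return -np.log(pi[y])

print(multiclass_log_loss(1, np.array([0.2, 0.7, 0.1])))  # -log(0.7) ≈ 0.357
```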
Softmax regression is a straightforward generalisation of logistic regression to the multiclass case. Instead of a single linear discriminant function, we have $g$ linear discriminant functions, each indicating confidence in class $k$:
\[f_k(x) = \theta_k^T x\]
The $g$ score functions are transformed into probabilities by the softmax function:
\[\pi_k(x) = s(f(x))_k = \frac{\exp(\theta_k^T x)}{\sum_{j=1}^g \exp(\theta_j^T x)}\]
Note that the softmax function is a smooth approximation of the argmax function and is invariant to constant offsets, i.e.:
\[s(f(x) + c) = s(f(x))\]
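This offset invariance is exactly what the standard numerically stable implementation exploits: subtracting $\max_k f_k(x)$ before exponentiating prevents overflow without changing the output. A minimal sketch (the function name is my own):

```python
import numpy as np

def softmax(f):
    # Subtracting max(f) uses the offset invariance s(f + c) = s(f)
    # to keep exp() from overflowing; the result is unchanged.
    z = np.exp(f - np.max(f))
    return z / np.sum(z)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))                                       # ≈ [0.659 0.242 0.099]
print(np.allclose(softmax(scores), softmax(scores + 1000)))  # True: offset invariance
```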
One-vs-Rest: For a $g$-class problem, create $g$ subproblems where in each, one class is encoded as positive and all others as negative. Then output the class with the highest score, i.e.:
\[\hat{y} = \underset{k \in \{1, \dots, g\}}{\operatorname{argmax}} \hat{f}_k(x)\]
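Below is a sketch of this scheme using scikit-learn's `LogisticRegression` as the binary base learner; the choice of base learner, the helper names, and the toy data are assumptions for illustration, since the reduction itself works with any scoring classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_one_vs_rest(X, y, g):
    # One binary model per class: class k is positive, all others negative.
    return [LogisticRegression().fit(X, (y == k).astype(int)) for k in range(g)]

def predict_one_vs_rest(models, X):
    # Score every class with its own model, then take the argmax.
    scores = np.column_stack([m.decision_function(X) for m in models])
    return np.argmax(scores, axis=1)

# Toy usage on random data with g = 3 classes:
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = rng.integers(0, 3, size=60)
models = fit_one_vs_rest(X, y, g=3)
print(predict_one_vs_rest(models, X[:5]))
```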