Chapter 17: Nonlinear Support Vector Machines

Many classification problems warrant nonlinear decision boundaries. This chapter introduces nonlinear support vector machines as a crucial extension of the linear variant.

§17.01: Feature Generation for Nonlinear Separation



§17.02: The Kernel Trick


  • The key observation here is that our optimization problem does not depend on the transformed features themselves, but only on the pairwise inner products of the inputs, i.e. we never need to know the feature transformation explicitly, only the values of these inner products.
  • The entire idea behind the kernel trick is to find a “shortcut” that computes these inner products directly, instead of first transforming the inputs and then computing the inner product in feature space (see the sketch after this list).
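As a minimal illustration of this “shortcut”, consider 2-D inputs and the degree-2 polynomial kernel $k(x, y) = (x^\top y)^2$, which corresponds to the explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below (the kernel choice and function names are illustrative, not taken from the notes above) shows that evaluating the kernel directly gives the same value as transforming first and then taking the inner product:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Both routes give the same number, but the kernel never builds phi(x):
print(np.dot(phi(x), phi(y)))   # transform first, then take the inner product
print(poly2_kernel(x, y))       # "shortcut": evaluate the kernel directly
```

For high-degree polynomials or the Gaussian RBF kernel, the explicit feature space is huge or even infinite-dimensional, so this shortcut is what makes the method computationally feasible.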

§17.03: The Polynomial Kernel



§17.04: Reproducing Kernel Hilbert Space and Representer Theorem



§17.05: The Gaussian RBF Kernel


  • What makes the Gaussian kernel so powerful is that we can approximate any continuous function arbitrarily well by a sum of Gaussians.
  • The “radial” Gaussian (RBF) kernel is defined as: $$ k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right) $$ It is often parametrized via $\gamma = \frac{1}{2\sigma^2}$, i.e. $k(x, y) = \exp(-\gamma \|x - y\|^2)$.
  • If we choose a small $\sigma$, each Gaussian bump becomes narrow, and we need a very large number of such functions to approximate the original function.
  • With enough such narrow bumps, we can more or less approximate any continuous function.
  • A small $\sigma$ (large $\gamma$) therefore yields a very rough/wiggly decision boundary and in the limit essentially models each individual data point.
  • A large $\sigma$ (small $\gamma$) leads to a very smooth decision boundary, which in the limit is essentially a linear function (see the sketch after this list).
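The following sketch, assuming NumPy and scikit-learn are available, first computes the RBF kernel from the definition above and then fits two SVMs with a small and a large $\gamma$ on a toy two-moons data set to make the smooth-vs-wiggly behavior visible; the data set and the specific $\gamma$ values are illustrative choices, not prescribed by the notes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

def gaussian_rbf(x, z, sigma=1.0):
    """Gaussian RBF kernel k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# scikit-learn's gamma corresponds to 1 / (2 * sigma^2) in the formula above.
# Small gamma  <->  large sigma: smooth, almost linear decision boundary.
smooth = SVC(kernel="rbf", gamma=0.1).fit(X, y)
# Large gamma  <->  small sigma: wiggly boundary that nearly memorizes points.
wiggly = SVC(kernel="rbf", gamma=100.0).fit(X, y)

for name, model in [("gamma=0.1", smooth), ("gamma=100", wiggly)]:
    print(name,
          "| train accuracy:", round(model.score(X, y), 3),
          "| support vectors:", model.n_support_.sum())
```

The large-$\gamma$ model typically reaches (near-)perfect training accuracy while using many more support vectors, which is exactly the “model each data point” regime described above.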

§17.06: SVM Model Selection