01 Chapter 1: Interpretable ML Introduction This chapter introduces the basic concepts of Interpretable Machine Learning. We focus on supervised learning, explain the different types of explanations, and review the topics of correlation and interaction. 5 sections
02 Chapter 2: Interpretable Models Some machine learning models are inherently interpretable, e.g. simple LMs, GLMs, GAMs, and rule-based models. These models are briefly summarized and their interpretation is clarified. 6 sections
03 Chapter 3: Feature Effects Feature effects indicate how the prediction changes as feature values change. This chapter explains the feature-effect methods ICE curves, PDPs, and ALE plots. 5 sections
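The mechanics behind PDPs and ICE curves can be sketched in a few lines: force the feature of interest to each grid value for every observation, predict, and average. The sketch below is purely illustrative; the helper name `partial_dependence`, the toy model `f`, and the data are assumptions, not part of any particular library.

```python
import numpy as np

def partial_dependence(f, X, feature, grid):
    """PDP estimate: average prediction over the data while the chosen
    feature is forced to each grid value in turn. The per-row predictions
    before averaging would be the ICE curves."""
    pdp = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v        # overwrite the feature everywhere
        pdp.append(f(X_mod).mean())  # average over the data distribution
    return np.array(pdp)

# Toy model: prediction depends linearly on feature 0 only,
# so the PDP for feature 0 should be the line 2 * v.
f = lambda X: 2.0 * X[:, 0]
X = np.random.default_rng(0).normal(size=(100, 3))
grid = np.array([-1.0, 0.0, 1.0])
pdp = partial_dependence(f, X, feature=0, grid=grid)
```

For this linear toy model the PDP recovers the line exactly; for a real model the averaging marginalizes over the other features, which is where the extrapolation issues addressed by ALE plots come from.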
04 Chapter 4: Functional Decomposition This chapter focuses on understanding how ML models make predictions by breaking down their behavior into simpler, interpretable components. This is achieved through the concept of Functional Decomposition, with specific methods like Classical Functional ANOVA (fANOVA) and Friedman's H-Statistic. 4 sections
05 Chapter 5: Shapley Shapley values originate from classical game theory and aim to fairly divide a payout among players. This section gives a brief explanation of Shapley values in game theory, followed by an adaptation to IML resulting in the method SHAP. 3 sections
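The game-theoretic definition can be made concrete with a tiny cooperative game. The following sketch computes exact Shapley values by enumerating all coalitions; the function names and the glove-game example are illustrative, and exact enumeration is only feasible for small player sets (SHAP approximates this for features).

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: each player's weighted average marginal
    contribution v(S + p) - v(S) over all coalitions S not containing p."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v(S | {p}) - v(S))
        phi[p] = total
    return phi

# Toy "glove game": player 1 owns a left glove, players 2 and 3 right
# gloves; only a matched pair earns a payout of 1.
def v(S):
    return 1.0 if 1 in S and (2 in S or 3 in S) else 0.0

phi = shapley_values([1, 2, 3], v)
```

The result illustrates the efficiency axiom: the values sum to the payout of the full coalition, with the scarce left glove receiving the largest share.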
06 Chapter 6: Local Interpretable Model-agnostic Explanations (LIME) A common approach to interpreting an ML model locally is implemented by LIME. The basic idea is to fit a surrogate model while focusing on data points near the observation of interest. The resulting model should be inherently interpretable. 4 sections
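The local-surrogate idea can be sketched as a proximity-weighted linear regression around the point of interest. This is a simplified sketch of the principle, not the full LIME algorithm (which, e.g., works on an interpretable binary representation); the sampling scheme, kernel width, and names are assumptions.

```python
import numpy as np

def local_linear_surrogate(f, x, n_samples=500, width=0.5, seed=0):
    """Fit a linear surrogate to a black-box model f around x:
    sample perturbations, weight them by proximity to x, and solve
    the weighted least-squares problem for local coefficients."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(size=(n_samples, x.size))  # perturbed samples
    y = f(Z)                                      # black-box predictions
    d2 = ((Z - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / width**2)                    # proximity kernel weights
    A = np.hstack([np.ones((n_samples, 1)), Z])   # intercept + features
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef  # [intercept, slope_1, slope_2, ...]

# Sanity check: for a black box that is exactly linear, the local
# surrogate should recover the true coefficients.
f = lambda Z: 3.0 * Z[:, 0] - 1.0 * Z[:, 1]
coef = local_linear_surrogate(f, np.array([0.5, -0.2]))
```

For a nonlinear black box the recovered slopes instead approximate the local gradient around `x`, which is exactly the kind of explanation LIME aims for.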
07 Chapter 7: Counterfactual Explanations This chapter deals with further local analyses. First, counterfactuals are examined, which search for data points in the neighborhood of an observation that lead to a different prediction. 2 sections
08 Chapter 8: Feature Importance Methods in this category aim to rank the features according to their influence on the predictive performance of an ML model. Depending on the interpretation goal, some of these methods are more suitable than others. 2 sections
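A standard method in this category is permutation feature importance: shuffle one feature column, remeasure the loss, and rank features by how much performance drops. The helper names and toy setup below are illustrative.

```python
import numpy as np

def permutation_importance(f, X, y, loss, n_repeats=5, seed=0):
    """Mean increase in loss when one feature column is shuffled;
    shuffling breaks the feature-target link, so important features
    show a large increase."""
    rng = np.random.default_rng(seed)
    base = loss(y, f(X))
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # permute feature j in place
            imp[j] += loss(y, f(Xp)) - base
        imp[j] /= n_repeats
    return imp

mse = lambda y, yhat: ((y - yhat) ** 2).mean()
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 4.0 * X[:, 0]            # only feature 0 matters
f = lambda X: 4.0 * X[:, 0]  # a "model" that reproduces it exactly
imp = permutation_importance(f, X, y, mse)
```

Because the toy model ignores features 1 and 2, their importance is exactly zero, while shuffling feature 0 destroys the predictions.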
01 Chapter 11: Advanced Risk Minimization This chapter revisits the theory of risk minimization, providing more in-depth analysis on established losses and the connection between empirical risk minimization and maximum likelihood estimation. 13 sections
02 Chapter 12: Multiclass Classification This chapter treats the multiclass case of classification. Tasks with more than two classes preclude the application of some techniques studied in the binary scenario and require an adaptation of loss functions. 3 sections
03 Chapter 13: Information Theory This chapter covers basic information-theoretic concepts and discusses their relation to machine learning. 8 sections
04 Chapter 15: Regularization Regularization is a vital tool in machine learning to prevent overfitting and foster generalization. This chapter introduces the concept of regularization and discusses common regularization techniques in more depth. 12 sections
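The shrinkage effect of one common technique, L2 (ridge) regularization, can be made concrete with its closed-form solution theta = (X^T X + lambda I)^{-1} X^T y. The toy data and names below are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: the penalty lam * ||theta||^2
    adds lam to the diagonal of X^T X and shrinks coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.0]) + 0.1 * rng.normal(size=100)

theta_small = ridge_fit(X, y, lam=0.01)    # near-OLS solution
theta_large = ridge_fit(X, y, lam=1000.0)  # heavily shrunk solution
```

Increasing lambda monotonically shrinks the coefficient norm toward zero, trading a little bias for lower variance, which is the mechanism behind its regularizing effect.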
05 Chapter 16: Linear Support Vector Machine This chapter introduces the linear support vector machine (SVM), a linear classifier that finds decision boundaries by maximizing margins to the closest data points, possibly allowing for violations to a certain extent. 5 sections
06 Chapter 17: Nonlinear Support Vector Machines Many classification problems warrant nonlinear decision boundaries. This chapter introduces nonlinear support vector machines as a crucial extension to the linear variant. 6 sections
07 Chapter 18: Boosting This chapter introduces boosting as a sequential ensemble method that creates powerful committees from different kinds of base learners. 12 sections
01 Chapter 1: Introduction & Multi-Armed Bandits This chapter introduces the fundamental concepts of Reinforcement Learning, including its key characteristics of trial-and-error search and delayed rewards. It also introduces multi-armed bandits, the exploration-exploitation tradeoff, and various methods for action-value estimation. 9 sections
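One of the action-value methods covered here, epsilon-greedy with incremental sample-average estimates, can be sketched on a Gaussian bandit. The arm means, step count, and epsilon below are illustrative choices.

```python
import random

def run_bandit(true_means, steps=20000, eps=0.1, seed=0):
    """Epsilon-greedy on a Gaussian multi-armed bandit with
    incremental sample-average action-value estimates Q."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # action-value estimates
    N = [0] * k    # pull counts
    for _ in range(steps):
        if rng.random() < eps:                     # explore: random arm
            a = rng.randrange(k)
        else:                                      # exploit: greedy arm
            a = max(range(k), key=lambda i: Q[i])
        r = rng.gauss(true_means[a], 1.0)          # noisy reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                  # incremental mean update
    return Q, N

Q, N = run_bandit([0.2, 0.5, 1.0])
```

After enough steps the estimate for the best arm converges to its true mean and that arm dominates the pull counts, illustrating how a small epsilon keeps exploration alive while mostly exploiting.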
02 Chapter 2: Finite Markov Decision Processes This chapter explores the fundamental concepts of Markov Decision Processes (MDPs), covering the agent-environment interface, goals and rewards, returns and episodes, and policies and value functions. 6 sections
03 Chapter 4: Temporal-Difference Learning This chapter covers Temporal-Difference (TD) learning methods that combine ideas from Monte Carlo and dynamic programming, enabling agents to learn directly from raw experience without a model of the environment. 4 sections
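The core TD(0) update, V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)], can be sketched on the classic random-walk chain (a standard evaluation example; the state layout and hyperparameters below are illustrative).

```python
import random

def td0_random_walk(n_states=5, episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) policy evaluation on a random-walk chain: start in the
    middle, step left or right uniformly; reward 1 for exiting right,
    0 for exiting left. True values are (s + 1) / (n_states + 1)."""
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:                   # exited left: reward 0, terminal
                target, done = 0.0, True
            elif s2 >= n_states:         # exited right: reward 1, terminal
                target, done = 1.0, True
            else:                        # bootstrap from the next state
                target, done = gamma * V[s2], False
            V[s] += alpha * (target - V[s])  # the TD(0) update
            if done:
                break
            s = s2
    return V

V = td0_random_walk()
```

Unlike Monte Carlo, each state is updated immediately from the bootstrapped target rather than waiting for the episode's final return, which is exactly the dynamic-programming idea TD borrows.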
04 Chapter 5: n-step Bootstrapping This section explains n-step bootstrapping techniques, which generalize TD learning by updating value estimates using returns accumulated over multiple steps, balancing bias and variance in learning. 4 sections
05 Chapter 6: Function Approximation, Deep Q-Networks, Expected SARSA This chapter discusses function approximation methods for scaling RL to large state spaces, including Deep Q-Networks (DQN) for learning value functions with neural networks and the Expected SARSA algorithm for stable policy evaluation. 5 sections
06 Chapter 7: Policy Gradient Algorithms, REINFORCE, Actor-Critic Algorithms, DPG, Hierarchical RL This section introduces policy gradient methods for directly optimizing policies, detailing the REINFORCE algorithm, Actor-Critic frameworks, Deterministic Policy Gradient (DPG), and approaches to hierarchical reinforcement learning for complex task decomposition.
01 Large Language Models Transformers, Attention, Positional Encoding, BERT, BART, GPT, Pre-Training & Finetuning, Decoding Strategies, Tokenization, Data, Fast Attention Mechanisms, LoRA, Fast Inference Mechanisms. PDF notes