In supervised learning, we are given a training set $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ consisting of input-output pairs, where $N$ is the number of samples. Each input $\mathbf{x}_i$ is a vector of $D$ attributes, or features, and the inputs are stored together in an $N \times D$ matrix $\mathbf{X}$. In classification problems, the output is $y_i \in \{1, \dots, C\}$, where $C$ denotes the number of possible output values (classes). If $C = 2$, the problem is called binary classification, and the output is often written as $y_i \in \{0, 1\}$. If $C > 2$, it is called multiclass classification. In contrast to classification, the response variable in regression problems is a continuous numeric value, $y_i \in \mathbb{R}$. For example, consider the following table, which contains data for predicting whether a student fails or passes. The table has four samples (rows) and three attributes (columns), so the input matrix has size $4 \times 3$. The output is the label 0 (fail) or 1 (pass), so $y_i \in \{0, 1\}$.

| Learning Time | Coursework | CGPA | Fail/Pass |
|---------------|------------|------|-----------|
| 19h           | 33%        | 3.13 | 0         |
| 39h           | 28%        | 2.82 | 1         |
| 31h           | 41%        | 3.49 | 1         |
| 21h           | 24%        | 2.94 | 0         |
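
To make the shapes concrete, the table can be written out as an input matrix and a label vector. This is only a minimal sketch in plain Python: the variable names `X`, `y`, `N`, `D`, and `classes` are illustrative choices, not notation fixed by the text, and the units (hours, percent) are dropped so that every entry is a plain number.

```python
# Student fail/pass data copied from the table above.
# Columns: learning time (hours), coursework (%), CGPA.
X = [
    [19.0, 33.0, 3.13],
    [39.0, 28.0, 2.82],
    [31.0, 41.0, 3.49],
    [21.0, 24.0, 2.94],
]
y = [0, 1, 1, 0]  # one label per row: 0 = fail, 1 = pass

N = len(X)                # number of samples (rows)
D = len(X[0])             # number of attributes (columns)
classes = sorted(set(y))  # distinct output values seen in the data

print(f"input matrix size: {N} x {D}")
print(f"classes: {classes}")
```

Because only two distinct labels occur, this is a binary classification problem. If the last column held a continuous quantity instead, such as a final exam score, the same inputs would define a regression problem.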

The aim of supervised learning is to learn a mapping from inputs $\mathbf{x}$ to outputs $y$. In other words, we want to estimate a function approximation $\hat{f}$ given the dataset. This estimate is often referred to as a predictive model. A predictive model is built by observing the samples in the training set and learning to predict $y$ from $\mathbf{x}$. The goal is to use the model to make a prediction $\hat{y} = \hat{f}(\mathbf{x}^*)$ for a new or unobserved sample $\mathbf{x}^*$, where the hat denotes an estimate. Predicting the outputs of the training set is easy, since the model has already observed those samples. The main challenge in machine learning is to build a predictive model that also performs well on new or unobserved samples. This ability is called generalization.
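
The difference between predicting on the training set and predicting on a new sample can be sketched with a very simple model. The example below uses a 1-nearest-neighbour classifier, which is an illustrative choice made here and not a method named in the text. The new sample `x_new` is made up for the example, and the features are left unscaled for brevity, so learning time and coursework dominate the distance.

```python
import math

# Training set: the student fail/pass data from the table.
X_train = [
    [19.0, 33.0, 3.13],
    [39.0, 28.0, 2.82],
    [31.0, 41.0, 3.49],
    [21.0, 24.0, 2.94],
]
y_train = [0, 1, 1, 0]  # 0 = fail, 1 = pass


def predict(x_new):
    """Return the label of the training sample closest to x_new."""
    distances = [math.dist(x_new, x) for x in X_train]
    nearest = distances.index(min(distances))
    return y_train[nearest]


# Predicting the training samples is trivial: each sample is its own
# nearest neighbour, so the model simply recalls the stored label.
train_predictions = [predict(x) for x in X_train]
train_accuracy = sum(
    p == t for p, t in zip(train_predictions, y_train)
) / len(y_train)

# A hypothetical unobserved student (values invented for illustration).
x_new = [35.0, 30.0, 3.00]
y_hat = predict(x_new)

print(f"training accuracy: {train_accuracy}")
print(f"prediction for new sample: {y_hat}")
```

The perfect training accuracy here says nothing about generalization, because the model has memorized the samples it is being tested on. Whether the prediction for `x_new` is any good can only be judged against labelled samples that were held out from training.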