Supervised Learning

In supervised learning, we are given a training set consisting of input-output pairs, D=\{(\mathbf{x}_i,y_i)\}^N_{i=1}, where N is the number of samples. Each input \mathbf{x}_i is a set of d attributes or features, so the inputs are stored in an N\times d matrix. In classification problems, the output is categorical, y\in\{1,...,C\}, where C denotes the number of classes. If C=2, the task is called binary classification, and the labels are often encoded as y\in\{0,1\}. If C>2, it is called multiclass classification. In contrast to classification, the response variable in regression problems is a continuous numeric value, y\in\mathbb{R}. For example, consider the following table containing data for student fail/pass prediction. The table has four samples (rows) and three attributes (columns), so the input matrix has size 4\times 3. The output corresponds to the labels 0 (fail) and 1 (pass), i.e. y\in\{0,1\}.

Learning Time | Coursework | CGPA | Fail/Pass
19h           | 33%        | 3.13 | 0
39h           | 28%        | 2.82 | 1
31h           | 41%        | 3.49 | 1
21h           | 24%        | 2.94 | 0
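As a concrete illustration, the table above can be represented as an N\times d input matrix and an output vector. Here is a minimal sketch using NumPy (the variable names X and y are conventional, not taken from the text):

```python
import numpy as np

# Input matrix X: 4 samples (rows) x 3 features (columns), taken from
# the table above: learning time in hours, coursework in %, CGPA.
X = np.array([
    [19, 33, 3.13],
    [39, 28, 2.82],
    [31, 41, 3.49],
    [21, 24, 2.94],
])

# Output vector y: one label per sample, 0 = fail, 1 = pass.
y = np.array([0, 1, 1, 0])

print(X.shape)  # (4, 3), i.e. N x d with N=4 samples and d=3 features
```

Note that the units (hours, %) are dropped when the data is stored numerically; only the values enter the matrix.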

The aim of supervised learning is to learn a mapping from inputs \mathbf{x} to outputs y. In other words, we want to estimate a function approximation y=f(\mathbf{x}) given the dataset. This estimate is often referred to as a predictive model. A predictive model is built by observing the samples in the training set and learning to predict y from \mathbf{x}. The goal is then to use the model that has been built to make predictions \hat{y}=f(\mathbf{x}) on a new or unobserved sample \mathbf{x}, where \hat{y} denotes an estimate of the true output y. This is the main challenge in machine learning: building a predictive model that performs well on new or unobserved samples. Predicting the outputs of the training set is easy, since the model has already observed those samples. The ability to predict well on unseen samples is what we call generalization.
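To make the idea of learning a mapping f and predicting on an unseen sample concrete, here is a minimal sketch using a 1-nearest-neighbour rule on the toy table above. This particular model and the new sample (35h, 30%, 3.0 CGPA) are illustrative assumptions, not part of the original text:

```python
import math

# Training set D from the table: (learning time, coursework %, CGPA) -> fail/pass.
X_train = [(19, 33, 3.13), (39, 28, 2.82), (31, 41, 3.49), (21, 24, 2.94)]
y_train = [0, 1, 1, 0]

def predict(x_new):
    """A 1-nearest-neighbour predictor: assign the new sample the label
    of the closest training sample (Euclidean distance)."""
    distances = [math.dist(x_new, x) for x in X_train]
    return y_train[distances.index(min(distances))]

# Predict y_hat for a new, unobserved student (a hypothetical example).
y_hat = predict((35, 30, 3.0))
print(y_hat)  # 1 (closest training sample is the 39h / 28% / 2.82 student)
```

This toy model trivially predicts the training samples perfectly (each is its own nearest neighbour), which is exactly why training-set accuracy says little about generalization. In practice the features would also be scaled before computing distances, since hours and CGPA live on very different ranges.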
