{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Ensemble Learning\n", "\n", "In this lab, we will be implementing Ensemble Learning using Scikit Learn (sklearn).\n", "\n", "\n", "## Part 1: Input Data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Import the standard modules to be used in this lab\n", "import pandas as pd\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>age</th><th>sex</th><th>cp</th><th>trestbps</th><th>chol</th><th>fbs</th><th>restecg</th><th>thalach</th><th>exang</th><th>oldpeak</th><th>slope</th><th>ca</th><th>thal</th><th>target</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>63</td><td>1</td><td>3</td><td>145</td><td>233</td><td>1</td><td>0</td><td>150</td><td>0</td><td>2.3</td><td>0</td><td>0</td><td>1</td><td>1</td></tr>\n",
"<tr><th>1</th><td>37</td><td>1</td><td>2</td><td>130</td><td>250</td><td>0</td><td>1</td><td>187</td><td>0</td><td>3.5</td><td>0</td><td>0</td><td>2</td><td>1</td></tr>\n",
"<tr><th>2</th><td>41</td><td>0</td><td>1</td><td>130</td><td>204</td><td>0</td><td>0</td><td>172</td><td>0</td><td>1.4</td><td>2</td><td>0</td><td>2</td><td>1</td></tr>\n",
"<tr><th>3</th><td>56</td><td>1</td><td>1</td><td>120</td><td>236</td><td>0</td><td>1</td><td>178</td><td>0</td><td>0.8</td><td>2</td><td>0</td><td>2</td><td>1</td></tr>\n",
"<tr><th>4</th><td>57</td><td>0</td><td>0</td><td>120</td><td>354</td><td>0</td><td>1</td><td>163</td><td>1</td><td>0.6</td><td>2</td><td>0</td><td>2</td><td>1</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n", "0 63 1 3 145 233 1 0 150 0 2.3 0 \n", "1 37 1 2 130 250 0 1 187 0 3.5 0 \n", "2 41 0 1 130 204 0 0 172 0 1.4 2 \n", "3 56 1 1 120 236 0 1 178 0 0.8 2 \n", "4 57 0 0 120 354 0 1 163 1 0.6 2 \n", "\n", " ca thal target \n", "0 0 1 1 \n", "1 0 2 1 \n", "2 0 2 1 \n", "3 0 2 1 \n", "4 0 2 1 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_pd = pd.read_csv('heart.csv')\n", "data_pd.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that all the ranges of the features are not the same. This may cause a problem where a small change in a feature might not affect the other. So we normalize the ranges of the features to a uniform range which in this case 0 - 1." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import MinMaxScaler\n", "scaler = MinMaxScaler(feature_range=(0,1))\n", "normData = scaler.fit_transform(data_pd.iloc[:,0:-1])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>age</th><th>sex</th><th>cp</th><th>trestbps</th><th>chol</th><th>fbs</th><th>restecg</th><th>thalach</th><th>exang</th><th>oldpeak</th><th>slope</th><th>ca</th><th>thal</th><th>target</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.708333</td><td>1.0</td><td>1.000000</td><td>0.481132</td><td>0.244292</td><td>1.0</td><td>0.0</td><td>0.603053</td><td>0.0</td><td>0.370968</td><td>0.0</td><td>0.0</td><td>0.333333</td><td>1</td></tr>\n",
"<tr><th>1</th><td>0.166667</td><td>1.0</td><td>0.666667</td><td>0.339623</td><td>0.283105</td><td>0.0</td><td>0.5</td><td>0.885496</td><td>0.0</td><td>0.564516</td><td>0.0</td><td>0.0</td><td>0.666667</td><td>1</td></tr>\n",
"<tr><th>2</th><td>0.250000</td><td>0.0</td><td>0.333333</td><td>0.339623</td><td>0.178082</td><td>0.0</td><td>0.0</td><td>0.770992</td><td>0.0</td><td>0.225806</td><td>1.0</td><td>0.0</td><td>0.666667</td><td>1</td></tr>\n",
"<tr><th>3</th><td>0.562500</td><td>1.0</td><td>0.333333</td><td>0.245283</td><td>0.251142</td><td>0.0</td><td>0.5</td><td>0.816794</td><td>0.0</td><td>0.129032</td><td>1.0</td><td>0.0</td><td>0.666667</td><td>1</td></tr>\n",
"<tr><th>4</th><td>0.583333</td><td>0.0</td><td>0.000000</td><td>0.245283</td><td>0.520548</td><td>0.0</td><td>0.5</td><td>0.702290</td><td>1.0</td><td>0.096774</td><td>1.0</td><td>0.0</td><td>0.666667</td><td>1</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " age sex cp trestbps chol fbs restecg thalach exang \\\n", "0 0.708333 1.0 1.000000 0.481132 0.244292 1.0 0.0 0.603053 0.0 \n", "1 0.166667 1.0 0.666667 0.339623 0.283105 0.0 0.5 0.885496 0.0 \n", "2 0.250000 0.0 0.333333 0.339623 0.178082 0.0 0.0 0.770992 0.0 \n", "3 0.562500 1.0 0.333333 0.245283 0.251142 0.0 0.5 0.816794 0.0 \n", "4 0.583333 0.0 0.000000 0.245283 0.520548 0.0 0.5 0.702290 1.0 \n", "\n", " oldpeak slope ca thal target \n", "0 0.370968 0.0 0.0 0.333333 1 \n", "1 0.564516 0.0 0.0 0.666667 1 \n", "2 0.225806 1.0 0.0 0.666667 1 \n", "3 0.129032 1.0 0.0 0.666667 1 \n", "4 0.096774 1.0 0.0 0.666667 1 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "normData_pd = pd.DataFrame(normData)\n", "normData_pd.columns = data_pd.columns[:-1]\n", "normData_pd['target'] = data_pd['target']\n", "normData_pd.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's split the dataset into training and testing sets i.e. 2/3 as training set and 1/3 as testing set." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(normData_pd.iloc[:,0:-1], \n", " normData_pd.iloc[:,-1], test_size=0.3, random_state=80)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 7: Exercise\n", "The lab exercise uses auto mpg datasets.

The data contains the technical specifications of cars.\n", "\n", "Attribute Information:\n", "\n", "1. mpg (miles per gallon): continuous\n", "2. cylinders: multi-valued discrete\n", "3. displacement: continuous\n", "4. horsepower: continuous\n", "5. weight: continuous\n", "6. acceleration: continuous\n", "7. model year: multi-valued discrete\n", "8. origin: multi-valued discrete\n", "9. car name: string (unique for each instance)\n", "\n", "We would like to predict `origin`. Build ensemble models to perform the prediction. Use `random_state=5` when splitting the dataset into training and test data." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>mpg</th><th>cylinders</th><th>displacement</th><th>horsepower</th><th>weight</th><th>acceleration</th><th>model_year</th><th>origin</th><th>car_name</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>18.0</td><td>8</td><td>307.0</td><td>130.0</td><td>3504.0</td><td>12.0</td><td>70</td><td>1</td><td>chevrolet chevelle malibu</td></tr>\n",
"<tr><th>1</th><td>15.0</td><td>8</td><td>350.0</td><td>165.0</td><td>3693.0</td><td>11.5</td><td>70</td><td>1</td><td>buick skylark 320</td></tr>\n",
"<tr><th>2</th><td>18.0</td><td>8</td><td>318.0</td><td>150.0</td><td>3436.0</td><td>11.0</td><td>70</td><td>1</td><td>plymouth satellite</td></tr>\n",
"<tr><th>3</th><td>16.0</td><td>8</td><td>304.0</td><td>150.0</td><td>3433.0</td><td>12.0</td><td>70</td><td>1</td><td>amc rebel sst</td></tr>\n",
"<tr><th>4</th><td>17.0</td><td>8</td><td>302.0</td><td>140.0</td><td>3449.0</td><td>10.5</td><td>70</td><td>1</td><td>ford torino</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration \\\n", "0 18.0 8 307.0 130.0 3504.0 12.0 \n", "1 15.0 8 350.0 165.0 3693.0 11.5 \n", "2 18.0 8 318.0 150.0 3436.0 11.0 \n", "3 16.0 8 304.0 150.0 3433.0 12.0 \n", "4 17.0 8 302.0 140.0 3449.0 10.5 \n", "\n", " model_year origin car_name \n", "0 70 1 chevrolet chevelle malibu \n", "1 70 1 buick skylark 320 \n", "2 70 1 plymouth satellite \n", "3 70 1 amc rebel sst \n", "4 70 1 ford torino " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "auto_mpg = pd.read_csv(\"auto_mpg.csv\", delim_whitespace=True)\n", "auto_mpg.head()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>mpg</th><th>cylinders</th><th>displacement</th><th>horsepower</th><th>weight</th><th>acceleration</th><th>model_year</th><th>origin</th><th>car_name</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0.239362</td><td>1.0</td><td>0.617571</td><td>0.456522</td><td>0.536150</td><td>0.238095</td><td>0.0</td><td>1</td><td>0.161184</td></tr>\n",
"<tr><th>1</th><td>0.159574</td><td>1.0</td><td>0.728682</td><td>0.646739</td><td>0.589736</td><td>0.208333</td><td>0.0</td><td>1</td><td>0.118421</td></tr>\n",
"<tr><th>2</th><td>0.239362</td><td>1.0</td><td>0.645995</td><td>0.565217</td><td>0.516870</td><td>0.178571</td><td>0.0</td><td>1</td><td>0.759868</td></tr>\n",
"<tr><th>3</th><td>0.186170</td><td>1.0</td><td>0.609819</td><td>0.565217</td><td>0.516019</td><td>0.238095</td><td>0.0</td><td>1</td><td>0.046053</td></tr>\n",
"<tr><th>4</th><td>0.212766</td><td>1.0</td><td>0.604651</td><td>0.510870</td><td>0.520556</td><td>0.148810</td><td>0.0</td><td>1</td><td>0.529605</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration \\\n", "0 0.239362 1.0 0.617571 0.456522 0.536150 0.238095 \n", "1 0.159574 1.0 0.728682 0.646739 0.589736 0.208333 \n", "2 0.239362 1.0 0.645995 0.565217 0.516870 0.178571 \n", "3 0.186170 1.0 0.609819 0.565217 0.516019 0.238095 \n", "4 0.212766 1.0 0.604651 0.510870 0.520556 0.148810 \n", "\n", " model_year origin car_name \n", "0 0.0 1 0.161184 \n", "1 0.0 1 0.118421 \n", "2 0.0 1 0.759868 \n", "3 0.0 1 0.046053 \n", "4 0.0 1 0.529605 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.preprocessing import LabelEncoder\n", "from sklearn.preprocessing import MinMaxScaler\n", "\n", "enc = LabelEncoder()\n", "enc.fit(auto_mpg['car_name'])\n", "labels = enc.transform(auto_mpg['car_name'])\n", "auto_mpg['car_name'] = labels\n", "\n", "column_names = auto_mpg.columns\n", "origin = auto_mpg['origin']\n", "scaler = MinMaxScaler()\n", "auto_mpg = scaler.fit_transform(auto_mpg)\n", "auto_mpg = pd.DataFrame(auto_mpg, columns=column_names)\n", "auto_mpg['origin'] = origin\n", "auto_mpg.head()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 4 5]\n", "[ 98.54179491 113.69086729 145.79160346 61.01898487 112.74350469\n", " 14.30828699 7.73876738]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "E:\\Programs\\Miniconda3\\envs\\myenv\\lib\\site-packages\\ipykernel_launcher.py:12: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " if sys.path[0] == '':\n", "E:\\Programs\\Miniconda3\\envs\\myenv\\lib\\site-packages\\ipykernel_launcher.py:13: SettingWithCopyWarning: \n", "A value is trying to be 
set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " del sys.path[0]\n" ] } ], "source": [ "from sklearn.feature_selection import SelectKBest\n", "from sklearn.feature_selection import f_classif\n", "from sklearn.model_selection import train_test_split\n", "kBest = SelectKBest(f_classif, k=6)\n", "kBest.fit(auto_mpg.iloc[:,0:7], auto_mpg['origin']) # run the score function on the data\n", "idx = kBest.get_support(True)\n", "print(idx)\n", "scores = kBest.scores_\n", "print(scores)\n", "\n", "auto_mpg_kbest = auto_mpg.iloc[:,[0,1,2,4]]\n", "auto_mpg_kbest['origin'] = auto_mpg['origin']\n", "auto_mpg_kbest['origin'] = auto_mpg_kbest['origin'].astype(int)\n", "auto_mpg_kbest.head()\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(auto_mpg_kbest.iloc[:,0:-1], \n", " auto_mpg_kbest.iloc[:,-1], test_size=0.3, random_state=5, shuffle=False)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 3, 2], dtype=int64)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "auto_mpg['origin'].unique()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.svm import SVC\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.naive_bayes import GaussianNB\n", "\n", "from sklearn.metrics import confusion_matrix, classification_report, accuracy_score\n", "from matplotlib import pyplot as plt" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[52 9 2]\n", " [ 7 7 6]\n", " [ 2 13 22]]\n", " precision recall f1-score 
support\n", "\n", " 1 0.85 0.83 0.84 63\n", " 2 0.24 0.35 0.29 20\n", " 3 0.73 0.59 0.66 37\n", "\n", " accuracy 0.68 120\n", " macro avg 0.61 0.59 0.59 120\n", "weighted avg 0.71 0.68 0.69 120\n", "\n", "0.675\n" ] } ], "source": [ "dt = DecisionTreeClassifier(criterion='gini', max_depth=6, random_state=0)\n", "dt.fit(X_train, y_train)\n", "y_pred_dt = dt.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_dt))\n", "print(classification_report(y_test, y_pred_dt))\n", "print(accuracy_score(y_test, y_pred_dt))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[58 3 2]\n", " [14 3 3]\n", " [16 4 17]]\n", " precision recall f1-score support\n", "\n", " 1 0.66 0.92 0.77 63\n", " 2 0.30 0.15 0.20 20\n", " 3 0.77 0.46 0.58 37\n", "\n", " accuracy 0.65 120\n", " macro avg 0.58 0.51 0.51 120\n", "weighted avg 0.63 0.65 0.61 120\n", "\n", "0.65\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "E:\\Programs\\Miniconda3\\envs\\myenv\\lib\\site-packages\\sklearn\\svm\\_base.py:231: ConvergenceWarning: Solver terminated early (max_iter=500). Consider pre-processing your data with StandardScaler or MinMaxScaler.\n", " % self.max_iter, ConvergenceWarning)\n" ] } ], "source": [ "sv = SVC(gamma='scale', degree=7, probability=True, kernel='poly', max_iter=500)\n", "sv.fit(X_train, y_train)\n", "y_pred_sv = sv.predict(X_test)\n", "print(confusion_matrix(y_test, y_pred_sv))\n", "print(classification_report(y_test, y_pred_sv))\n", "print(accuracy_score(y_test, y_pred_sv))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(array([10], dtype=int64),)\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scores = []\n", "for k in range(2,100):\n", " kn = KNeighborsClassifier(n_neighbors=k)\n", " kn.fit(X_train, y_train)\n", " scores.append(kn.score(X_test, y_test))\n", "\n", "plt.plot(scores)\n", "print(np.where(scores == np.max(scores)))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[57 0 6]\n", " [ 9 1 10]\n", " [14 2 21]]\n", " precision recall f1-score support\n", "\n", " 1 0.71 0.90 0.80 63\n", " 2 0.33 0.05 0.09 20\n", " 3 0.57 0.57 0.57 37\n", "\n", " accuracy 0.66 120\n", " macro avg 0.54 0.51 0.48 120\n", "weighted avg 0.60 0.66 0.61 120\n", "\n", "0.6583333333333333\n" ] } ], "source": [ "kn = KNeighborsClassifier(n_neighbors=12)\n", "kn.fit(X_train, y_train)\n", "y_pred_kn = kn.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_kn))\n", "print(classification_report(y_test, y_pred_kn))\n", "print(accuracy_score(y_test, y_pred_kn))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[49 4 10]\n", " [ 6 0 14]\n", " [ 6 5 26]]\n", " precision recall f1-score support\n", "\n", " 1 0.80 0.78 0.79 63\n", " 2 0.00 0.00 0.00 20\n", " 3 0.52 0.70 0.60 37\n", "\n", " accuracy 0.62 120\n", " macro avg 0.44 0.49 0.46 120\n", "weighted avg 0.58 0.62 0.60 120\n", "\n", "0.625\n" ] } ], "source": [ "lg = LogisticRegression()\n", "lg.fit(X_train, y_train)\n", "y_pred_lg = lg.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_lg))\n", "print(classification_report(y_test, y_pred_lg))\n", "print(accuracy_score(y_test, y_pred_lg))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[25 25 13]\n", " [ 2 4 14]\n", " [ 3 5 29]]\n", " precision recall f1-score support\n", "\n", " 1 0.83 0.40 
0.54 63\n", " 2 0.12 0.20 0.15 20\n", " 3 0.52 0.78 0.62 37\n", "\n", " accuracy 0.48 120\n", " macro avg 0.49 0.46 0.44 120\n", "weighted avg 0.62 0.48 0.50 120\n", "\n", "0.48333333333333334\n" ] } ], "source": [ "nb = GaussianNB()\n", "nb.fit(X_train, y_train)\n", "y_pred_nb = nb.predict(X_test)\n", "print(confusion_matrix(y_test, y_pred_nb))\n", "print(classification_report(y_test, y_pred_nb))\n", "print(accuracy_score(y_test, y_pred_nb))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[47 7 9]\n", " [ 6 3 11]\n", " [ 4 5 28]]\n", " precision recall f1-score support\n", "\n", " 1 0.82 0.75 0.78 63\n", " 2 0.20 0.15 0.17 20\n", " 3 0.58 0.76 0.66 37\n", "\n", " accuracy 0.65 120\n", " macro avg 0.54 0.55 0.54 120\n", "weighted avg 0.65 0.65 0.64 120\n", "\n", "0.65\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "E:\\Programs\\Miniconda3\\envs\\myenv\\lib\\site-packages\\sklearn\\svm\\_base.py:231: ConvergenceWarning: Solver terminated early (max_iter=500). 
Consider pre-processing your data with StandardScaler or MinMaxScaler.\n", " % self.max_iter, ConvergenceWarning)\n" ] } ], "source": [ "from sklearn.ensemble import StackingClassifier\n", "\n", "clf1 = DecisionTreeClassifier(criterion='gini', max_depth=6, random_state=0)\n", "clf2 = SVC(gamma='scale', degree=7, probability=True, kernel='poly', max_iter=500)\n", "clf3 = KNeighborsClassifier(n_neighbors=12)\n", "clf4 = LogisticRegression()\n", "clf5 = GaussianNB()\n", "\n", "# Base learners: decision tree, k-NN and logistic regression; the SVC is the meta-learner\n", "clf_stack = StackingClassifier(estimators=[('dt', clf1), ('knn', clf3), ('lr', clf4)], final_estimator=clf2, n_jobs=-1)\n", "clf_stack.fit(X_train, y_train)\n", "y_pred_stack = clf_stack.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_stack))\n", "print(classification_report(y_test, y_pred_stack))\n", "print(accuracy_score(y_test, y_pred_stack))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[60 2 1]\n", " [11 3 6]\n", " [14 3 20]]\n", " precision recall f1-score support\n", "\n", " 1 0.71 0.95 0.81 63\n", " 2 0.38 0.15 0.21 20\n", " 3 0.74 0.54 0.62 37\n", "\n", " accuracy 0.69 120\n", " macro avg 0.61 0.55 0.55 120\n", "weighted avg 0.66 0.69 0.65 120\n", "\n", "0.6916666666666667\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "E:\\Programs\\Miniconda3\\envs\\myenv\\lib\\site-packages\\sklearn\\svm\\_base.py:231: ConvergenceWarning: Solver terminated early (max_iter=500). 
Consider pre-processing your data with StandardScaler or MinMaxScaler.\n", " % self.max_iter, ConvergenceWarning)\n" ] } ], "source": [ "from sklearn.ensemble import VotingClassifier\n", "\n", "clf1 = DecisionTreeClassifier(criterion='gini', max_depth=6, random_state=0)\n", "clf2 = SVC(gamma='scale', degree=7, probability=True, kernel='poly', max_iter=500)\n", "clf3 = KNeighborsClassifier(n_neighbors=12)\n", "clf4 = LogisticRegression()\n", "clf5 = GaussianNB()\n", "\n", "# Majority (hard) vote over the decision tree, SVC, naive Bayes and k-NN models\n", "clf_vote = VotingClassifier(estimators=[('dt', clf1), ('svc', clf2), ('nb', clf5), ('knn', clf3)], voting='hard')\n", "clf_vote.fit(X_train, y_train)\n", "y_pred_vote = clf_vote.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_vote))\n", "print(classification_report(y_test, y_pred_vote))\n", "print(accuracy_score(y_test, y_pred_vote))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[49 7 7]\n", " [ 8 4 8]\n", " [ 8 3 26]]\n", " precision recall f1-score support\n", "\n", " 1 0.75 0.78 0.77 63\n", " 2 0.29 0.20 0.24 20\n", " 3 0.63 0.70 0.67 37\n", "\n", " accuracy 0.66 120\n", " macro avg 0.56 0.56 0.56 120\n", "weighted avg 0.64 0.66 0.65 120\n", "\n", "0.6583333333333333\n" ] } ], "source": [ "from sklearn.ensemble import BaggingClassifier\n", "\n", "base_clf = DecisionTreeClassifier(criterion='gini', max_depth=6, random_state=0)\n", "#base_clf = SVC(gamma='scale', degree=7, probability=True, kernel='poly', max_iter=500)\n", "#base_clf = KNeighborsClassifier(n_neighbors=12)\n", "#base_clf = LogisticRegression()\n", "#base_clf = GaussianNB()\n", "\n", "# Pass the base learner positionally: its keyword is base_estimator in older\n", "# scikit-learn releases and estimator from version 1.2 onward\n", "clf_bag = BaggingClassifier(base_clf)\n", "clf_bag.fit(X_train, y_train)\n", "y_pred_bag = clf_bag.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_bag))\n", "print(classification_report(y_test, y_pred_bag))\n", "print(accuracy_score(y_test, y_pred_bag))" ] }, { "cell_type": "code", "execution_count": 
32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[51 7 5]\n", " [ 8 5 7]\n", " [ 8 3 26]]\n", " precision recall f1-score support\n", "\n", " 1 0.76 0.81 0.78 63\n", " 2 0.33 0.25 0.29 20\n", " 3 0.68 0.70 0.69 37\n", "\n", " accuracy 0.68 120\n", " macro avg 0.59 0.59 0.59 120\n", "weighted avg 0.67 0.68 0.67 120\n", "\n", "0.6833333333333333\n" ] } ], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "\n", "num_classifiers = 500\n", "rf = RandomForestClassifier(n_estimators=num_classifiers,\n", " criterion='gini', max_depth=6, min_samples_split=5)\n", "rf.fit(X_train, y_train)\n", "y_pred_rf = rf.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_rf))\n", "print(classification_report(y_test, y_pred_rf))\n", "print(accuracy_score(y_test, y_pred_rf))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[49 5 9]\n", " [ 6 5 9]\n", " [ 4 3 30]]\n", " precision recall f1-score support\n", "\n", " 1 0.83 0.78 0.80 63\n", " 2 0.38 0.25 0.30 20\n", " 3 0.62 0.81 0.71 37\n", "\n", " accuracy 0.70 120\n", " macro avg 0.61 0.61 0.60 120\n", "weighted avg 0.69 0.70 0.69 120\n", "\n", "0.7\n" ] } ], "source": [ "from sklearn.ensemble import AdaBoostClassifier\n", "\n", "num_classifiers = 500\n", "learning_rate = 0.07\n", "base_clf = DecisionTreeClassifier(criterion='gini', max_depth=1, random_state=0)\n", "#base_clf = SVC(gamma='scale', degree=7, probability=True, kernel='poly', max_iter=500)\n", "#base_clf = LogisticRegression()\n", "#base_clf = GaussianNB()\n", "ada_clf = AdaBoostClassifier(base_estimator=base_clf,\n", " n_estimators=num_classifiers,\n", " learning_rate=learning_rate)\n", "ada_clf.fit(X_train, y_train)\n", "y_pred_ada = ada_clf.predict(X_test)\n", "\n", "print(confusion_matrix(y_test, y_pred_ada))\n", "print(classification_report(y_test, y_pred_ada))\n", "print(accuracy_score(y_test, y_pred_ada))" ] } 
], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }