{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Selection\n", "\n", "In this lab, we will implement Univariate Feature Selection using Scikit-learn and Sequential Feature Selection using Mlxtend.\n", "\n", "## Part 1: Feature Selection using Univariate Feature Selection\n", "Univariate feature selection selects the best features based on univariate statistical tests. Scikit-learn provides several feature selection routines (see the [documentation](http://scikit-learn.org/stable/modules/feature_selection.html)). In this lab, we will look at how to use SelectKBest to select the best features. SelectKBest works by removing all but the *k* highest scoring features. It takes as input a scoring function that returns a univariate score for each feature. The following scoring functions are provided:\n", "\n", "- For classification: chi2, f_classif and mutual_info_classif\n", "- For regression: f_regression, mutual_info_regression\n", "\n", "Each function scores the features as follows:\n", "\n", "- f_classif: ANOVA F-value between label/feature for classification tasks.\n", "- mutual_info_classif: Mutual information for a discrete target.\n", "- chi2: Chi-squared statistics of non-negative features for classification tasks.\n", "- f_regression: F-value between label/feature for regression tasks.\n", "- mutual_info_regression: Mutual information for a continuous target." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import the standard modules to be used in this lab\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr style=\"text-align: right;\"><th></th><th>age</th><th>sex</th><th>cp</th><th>trestbps</th><th>chol</th><th>fbs</th><th>restecg</th><th>thalach</th><th>exang</th><th>oldpeak</th><th>slope</th><th>ca</th><th>thal</th><th>target</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>63</td><td>1</td><td>3</td><td>145</td><td>233</td><td>1</td><td>0</td><td>150</td><td>0</td><td>2.3</td><td>0</td><td>0</td><td>1</td><td>1</td></tr>\n",
"<tr><th>1</th><td>37</td><td>1</td><td>2</td><td>130</td><td>250</td><td>0</td><td>1</td><td>187</td><td>0</td><td>3.5</td><td>0</td><td>0</td><td>2</td><td>1</td></tr>\n",
"<tr><th>2</th><td>41</td><td>0</td><td>1</td><td>130</td><td>204</td><td>0</td><td>0</td><td>172</td><td>0</td><td>1.4</td><td>2</td><td>0</td><td>2</td><td>1</td></tr>\n",
"<tr><th>3</th><td>56</td><td>1</td><td>1</td><td>120</td><td>236</td><td>0</td><td>1</td><td>178</td><td>0</td><td>0.8</td><td>2</td><td>0</td><td>2</td><td>1</td></tr>\n",
"<tr><th>4</th><td>57</td><td>0</td><td>0</td><td>120</td><td>354</td><td>0</td><td>1</td><td>163</td><td>1</td><td>0.6</td><td>2</td><td>0</td><td>2</td><td>1</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n", "0 63 1 3 145 233 1 0 150 0 2.3 0 \n", "1 37 1 2 130 250 0 1 187 0 3.5 0 \n", "2 41 0 1 130 204 0 0 172 0 1.4 2 \n", "3 56 1 1 120 236 0 1 178 0 0.8 2 \n", "4 57 0 0 120 354 0 1 163 1 0.6 2 \n", "\n", " ca thal target \n", "0 0 1 1 \n", "1 0 2 1 \n", "2 0 2 1 \n", "3 0 2 1 \n", "4 0 2 1 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_pd = pd.read_csv('heart.csv')\n", "data_pd.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Feature Selection using Sequential Feature Selection\n", "An implementation of the Sequential Feature Selector is available in the Mlxtend (machine learning extensions) library. We will be using a high-dimensional dataset to demonstrate how Sequential Feature Selection (SFS) works.

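The greedy idea behind sequential forward selection can be sketched without any library. The block below is a minimal illustration, not Mlxtend's implementation: the helper names (`centroid_accuracy`, `forward_select`), the nearest-centroid scorer and the toy data are all assumptions made for this example. The score is plain accuracy on the training data, with no cross-validation.

```python
def centroid_accuracy(X, y, features):
    '''Training accuracy of a nearest-centroid classifier restricted to `features`.'''
    # Mean of each selected feature, computed separately per class label.
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda label: sum(
                (x[f] - c) ** 2 for f, c in zip(features, centroids[label])
            ),
        )

    correct = sum(predict(x) == t for x, t in zip(X, y))
    return correct / len(y)


def forward_select(X, y, k, score=centroid_accuracy):
    '''Greedily grow a feature subset until it holds k features.'''
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < k and remaining:
        # Try adding each remaining feature and keep the one with the best score.
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): column 0 separates the classes, column 1 is noise,
# column 2 is only partly informative on its own.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.2],
    [0.0, 3.0, 2.2],
    [0.3, 4.0, 0.8],
    [0.9, 4.5, 3.1],
    [1.0, 1.5, 2.9],
    [0.8, 3.5, 2.0],
    [1.1, 2.0, 3.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(forward_select(X, y, k=2))
```

On this toy data the sketch prints `[0, 2]`: column 0 is added first because it classifies every training row correctly on its own, and column 2 is added next because pairing it with column 0 scores higher than pairing the noise column with column 0. A library selector follows the same loop but swaps in a real estimator and, optionally, cross-validated scores.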
\n", "First we import SequentialFeatureSelector from the mlxtend.feature_selection package. We will use a Decision Tree as the predictive model for the feature selection. Here, we set k_features=5 to select five features, and forward=True indicates Sequential Forward Selection. By choosing cv=0, we do not perform any cross-validation, which means the performance (accuracy) is computed entirely on the training set and is therefore likely to be optimistic." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Exercise\n", "The lab exercise uses the Auto MPG dataset.

The data contains the technical specifications of cars.\n", "\n", "Attribute Information:\n", "1. mpg (miles per gallon): continuous\n", "2. cylinders: multi-valued discrete\n", "3. displacement: continuous\n", "4. horsepower: continuous\n", "5. weight: continuous\n", "6. acceleration: continuous\n", "7. model year: multi-valued discrete\n", "8. origin: multi-valued discrete\n", "9. car name: string (unique for each instance)\n", "\n", "We would like to predict miles per gallon (mpg) using a decision tree regressor.\n", "1. Perform feature reduction on the dataset to reduce its dimensionality.\n", "2. Perform feature selection on the dataset to select the most relevant attributes for predicting mpg." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr style=\"text-align: right;\"><th></th><th>mpg</th><th>cylinders</th><th>displacement</th><th>horsepower</th><th>weight</th><th>acceleration</th><th>model_year</th><th>origin</th><th>car_name</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>18.0</td><td>8</td><td>307.0</td><td>130.0</td><td>3504.0</td><td>12.0</td><td>70</td><td>1</td><td>chevrolet chevelle malibu</td></tr>\n",
"<tr><th>1</th><td>15.0</td><td>8</td><td>350.0</td><td>165.0</td><td>3693.0</td><td>11.5</td><td>70</td><td>1</td><td>buick skylark 320</td></tr>\n",
"<tr><th>2</th><td>18.0</td><td>8</td><td>318.0</td><td>150.0</td><td>3436.0</td><td>11.0</td><td>70</td><td>1</td><td>plymouth satellite</td></tr>\n",
"<tr><th>3</th><td>16.0</td><td>8</td><td>304.0</td><td>150.0</td><td>3433.0</td><td>12.0</td><td>70</td><td>1</td><td>amc rebel sst</td></tr>\n",
"<tr><th>4</th><td>17.0</td><td>8</td><td>302.0</td><td>140.0</td><td>3449.0</td><td>10.5</td><td>70</td><td>1</td><td>ford torino</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration \\\n", "0 18.0 8 307.0 130.0 3504.0 12.0 \n", "1 15.0 8 350.0 165.0 3693.0 11.5 \n", "2 18.0 8 318.0 150.0 3436.0 11.0 \n", "3 16.0 8 304.0 150.0 3433.0 12.0 \n", "4 17.0 8 302.0 140.0 3449.0 10.5 \n", "\n", " model_year origin car_name \n", "0 70 1 chevrolet chevelle malibu \n", "1 70 1 buick skylark 320 \n", "2 70 1 plymouth satellite \n", "3 70 1 amc rebel sst \n", "4 70 1 ford torino " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "auto_mpg = pd.read_csv(\"auto_mpg.csv\", delim_whitespace=True)\n", "auto_mpg.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "SelectKBest(k=5, score_func=<function f_regression>)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.feature_selection import SelectKBest\n", "from sklearn.feature_selection import f_regression\n", "# Keep the 5 features with the highest F-scores against the target\n", "kBest = SelectKBest(f_regression, k=5)\n", "# Features are every column except the first (mpg, the target) and the last (car_name)\n", "kBest.fit(auto_mpg.iloc[:,1:-1], auto_mpg['mpg']) # run the score function on the data" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3 5]\n", "[597.07704785 724.99430337 604.99596063 888.85068265 84.95770025\n", " 199.98200802 184.19963937]\n" ] } ], "source": [ "# Positions of the selected features within the feature columns passed to fit()\n", "idx = kBest.get_support(indices=True)\n", "print(idx)\n", "# F-score of every candidate feature, in column order\n", "scores = kBest.scores_\n", "print(scores)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# idx is relative to the feature columns, which start at column 1 of auto_mpg, hence idx + 1.\n", "# .copy() gives an independent DataFrame, so adding a column does not raise SettingWithCopyWarning.\n", "auto_mpg_kbest = auto_mpg.iloc[:, idx + 1].copy()\n", "auto_mpg_kbest['mpg'] = auto_mpg['mpg']" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "from sklearn.model_selection import train_test_split\n", "# Hold out the last 30% of the rows for testing (shuffle=False keeps the original row order)\n", "X_train, X_test, y_train, y_test = train_test_split(auto_mpg_kbest.iloc[:,0:-1], auto_mpg_kbest.iloc[:,-1], test_size=0.3, random_state=5, shuffle=False)\n", "tree = DecisionTreeRegressor()\n", "# Effective alphas and total leaf impurities along the cost-complexity pruning path\n", "path = tree.cost_complexity_pruning_path(X_train, y_train)\n", "ccp_alphas, impurities = path.ccp_alphas, path.impurities" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'Total Impurity vs effective alpha for training set')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYAAAAEWCAYAAABv+EDhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3deZwcdZ3/8dc7kwkMEByQACEQAuiiIEhwQA5XOXRBREAEBEXxYEFdRXRF4Qdy7Kqw5LfqrqyLWRVRIKIcWUQBOX94cAUChFMR5BiuAIYzAgmf3x/f70Cn091Tc3T3TNf7+XjMY6qrquv76erq+lR961vfUkRgZmblM6HdAZiZWXs4AZiZlZQTgJlZSTkBmJmVlBOAmVlJOQGYmZWUE0ALSVpRUkhat92xDEbSuyXd0u44hkLSLElPSvpLfr2fpH5Jz0l68yiW0/J1M5RtZ7S3M0k7SPpzXo+7jsYyR0rSpyT9crTnLRuV/T4ASc9VvFwJeBFYml8fGhFnNnjvrsApEfGGgmWtCCwG1ouIh2pMvzYv74yi8beKpEeBfSLid+2OpRZJbwTmA9Mj4qk8rh/4ZERcMoLlNvzOWmUocYx2zJJ+D/wkIr4/0mXl5f0MuC0ivj4ayxtvJL2J9PkntjuWtgfQbhGxysBwPnI8OCIua19E7SdpYkQsaXccQ7Q+8GjFzr8bWAe4va1RdYb1GeZ6HM62NE63v/EpIvyX/4C/AO+uGtcD/BfwCPAQMAvoBl5POsp6BXgu/70e2B64DngaeBj4NjAxL2tFIIB165R/LXBgHt4VuAc4BngC6Ad2A/YE/gw8CfxzxXtPAuYA5wLPAjcAm9YrF/gZcExVWV8DHgP+Z2Bcnv6L/DlfyJ/zMOBy4B+r4v8jsGuNz3UVKbFWjrs7f54u4BRgYV5ntwAb11k/qwM/AR4FHgSOI1Vj7l71XZyW/wfwPHB7fv96wP/m9Xkv8OmKZU/My7sXeCavv7WB6yuW8xywV9W6OR44oyrO7wMnN4q5zucrvO3k7++7wJX5+74cmFY17z/mbeWvwLcrynlT/k6eyuv9dGBynZgeqvzu87jpwK/z+/8IHFS1HZ4FnJ3jOrBqeYcBL5POtJ8DfpHHPwp8mZRoXsjjjgXuy8u5DXhfxXI+DVxW8PMOZd6JwH+Sfl9/zvEuabDP+Bpp3/AMcCfw93l8V552L2l7OxPozdMezzEM7Ddmtm2f166Cx+IftRPAycBvgTWAtUg7hqPztFd3BBXzbw1slTeAjUg71k9XbXxFE8DLwFfzRvn5/CP5KbAyMBP4G6/96E8CXgL2ICWoY0g72a5a5bJ8AlgC/AswiZT0lvlsuex3VLz+GPD/Kl6/Pc/TVeNzHQJcXvH6baQdz0RSQrsGWJW0M98UWLPO+rmItNNbCZhKqvI5qNZ3Uf2Z83pYkNfnJODvgAeAd+XpX8vLe0OOYybQW2fdVSaAvyPtoHry627SzmOLwWKu8fkKbzv5+1sEbJunncryO7nz8nrdIM+7Q57+JmCnvB7WJm13JzX4XVR/99eRktMKQB8pEWxfsR2+SEruEwbWS9XyXt32qsq4gXTWNrAuP5TX2QTgo3k9r5Gn1dqp1/u8Q5n3cNJByFTSAd3V1EkAwFtJO/i1AAEbAhvkaUeS9hvr5DJ/DJxWsf7rJpWW7vPaHcBY+qN2AugHdqp4vSdwVx5eLgHUWOaRwJyqja9oAniafLQITMnvfWvF/LeTj7jzD++qimkTSTuirWqVy/IJ4Hmgu2L6YAlgZdJRz/T8+hTgW3U+1+qkI/Sp+fW/A9/Lw7vlz7E1+ZpUnWWsXyPGTwAX1Ym3eof5LuBPVcs8AfjvPHw/sEuNchsmgPx6HrBfHn4/cEeRmAtsj3W3nfz9/bhqHUfeTgbm7auYfgFweJ1y9geuaRDHq9898EbSgUdPxfRvA6dWbIe/GeRz1UsAHx7kfXcNfEfU3qnX/LxDnPc
PLHtGszv1E8CmpKP/HclnahXT7iMnxfx6A9JZlBhDCcCtgBqQJNIR0v0Vo+8HpjV4zyaSLpL0mKRnSKexawwzhIUR8UoeXpz/P1YxfTGwSsXrBwcGItWhPkw6Aini0Yh4uWhgEfE86SjqI7m+/UOks5Na8z4FXArsJ2lCnnfg4vpFwA9J1SaPSfqepFVqLGZ90o93oaRFkhYB/0E6+ipifWDGwHvz+78ErJ2/52mkU/7hOAs4IA9/mNc+25BiHsa2U/l9P0WqTqj8vh+tGH6BvK1IWkfSL3ILqWeAHwxSTqV1SNvl4opx1b+JBxmeZd6XW+/cWrHu3jBInDU/7xDnXacqjrqfJSJuJyXpbwCPSzpT0lp5e1oP+HVF7PNJZzKvbxBTyzkBNBApdT9K+iEPmE46K4B0JFHtf4CbgI0iYlVStYqaGWeF9QYGJHWRNuaHSVVDL5OqIQasXfXeWp9lsOmnAweSjogfi4j5Dd4/h7STfBepuukPkNZxRHwrImYCm5NOq79Q4/0PknZwq0VEb/5bNSK2HCTuyvffVfHe3oiYHBEfyN9zP6napdpg6wVSffcukqaRzgDmDDPmoW47ld/36qSd2CMF4p1FOjN5Sy7n4EHKqfQwMEVST8W4yt8EDG9bWma8pL8jVZ0dAqweEb2kKrFm/5YeASqbz65Xb0aAiDg9IrYjVf+sCHy9YnvaqWp7WzEinqDYNtUSTgCDmwMcJ+n1ktYEjgYGmmk+BqxZdcQ6GXg6Ip6TtCnpYlOrbCdp93xE/hVSFdBN+SxiAelovUvS+0l1x0PxGGkjr3QVaafzDdKFzkb+l3TKfDSpWiMAJG0jqU/SRNJO6SVea4b7qoi4j1RFdrKkyZImSHqjpHcUjP93ubzDczv5iZI2lzSwM/4B8E1JGyqZKak3Il4kVcVVf/bK2PpJ9eI/BhZExL3DjHmo286ekt4uaQXg68CVEfF4gXUxmZSYnpE0nXQmVNQ9wK3A1yWtkNffQbx21lNErW2p2iqki88LgQmSPk06A2i2nwNflLS2pNeTLkzXlM/Y3pXX/+L8N7DtngqcJGm9PO+a+XcH6SJwV173beUEMLhjgTtI9dQ3A78nXRiGdLHoAuD+fKq3OvBF4OB8f8F/kY4OW+Vc4JOklg0fBD4YEQMb5OdIVS9/BT4AXDjEZX8D+Eb+nJ+DV8+QfkrasZ/V6M0R8QJpXe1cNW8vace5iHRB7X5SK4xaDsjz30W68Hg2BauAcvXWbsB2uYyFwH/z2qn/ScCvgCtI1zZOJV3khLQN/CJ/9j3qFHEW8G6WXw9DiXmo284ZOe4ngDeTdsRFHAu8g5TYzidtN4Xk73w/YBPS2fHZwBER8duiywBmA1vl9fmzOuXcRPoO5pGOyjfIw812Cuns9A7SRekLSRe1a+khXc96Ise4CmndQtpHXAZcIenZvMwtASLir3n6jXkdbNGcjzK40t8I1ikknURqIXFwi8s9hHQB9N2tLLfsyn4zVatI+gCphdTG7Y6lGXwGYMMmaWXgM6QjOrNxL1fV/UOuKp1Oak59frvjahYnABuWXBXyOKlO+Jw2h2M2WiaQqtWeJlUB3US6vtKRXAVkZlZSPgMwMyupcdEZ3BprrBEzZsxodxhmZuPKjTfe+ERETKk3fVwkgBkzZjBvXitagJmZdQ5J9zea7iogM7OScgIwMyspJwAzs5JyAjAzK6mmJQBJP5L0uKTbakz7stJDq4fbTbKZmY1QM1sB/ZjUsdIyvUTm3vHeQ3oak5mZ1TB3fj+zLrmbhxctZp3eHo7YZWP2mln3USTD0rQzgIi4mtT7YbVvk7oq9i3IZmY1zJ3fz1HnLaB/0WIC6F+0mKPOW8Dc+f2DvncoWnoNIPcf0x8RtxSY9xBJ8yTNW7hwYQuiMzMbG2ZdcjeLX172sRiLX17KrEvuHtVyWpYAJK1EehjIsYPNCxARsyOiLyL6pkypeyObmVnHeXjR4iGNH65WngFsRHqowy2
S/kJ67NpNkqofTWhmVmrr9PYMafxwtSwBRMSCiFgzImZExAzgIWDLiHh0kLeamZXKEbtsTE931zLjerq7OGKX0X0uTTObgc4BrgE2lvSQpE81qywzs06y18xpnLj3ZkzqSrvoab09nLj3ZqPeCqhpzUAj4oBBps9oVtlmZuPdXjOnMef61Fr+7EO3bUoZvhPYzKyknADMzErKCcDMrKScAMzMSsoJwMyspJwAzMxKygnAzKyknADMzErKCcDMrKScAMzMSsoJwMyspJwAzMxKygnAzKyknADMzErKCcDMrKScAMzMSsoJwMyspJwAzMxKygnAzKyknADMzEqqaQlA0o8kPS7ptopxsyTdJelWSedL6m1W+WZm1lgzzwB+DOxaNe5S4C0RsTnwR+CoJpZvZmYNNC0BRMTVwFNV434TEUvyy2uBdZtVvpmZNdbOawCfBC6qN1HSIZLmSZq3cOHCFoZlZlYObUkAko4GlgBn1psnImZHRF9E9E2ZMqV1wZmZlcTEVhco6SBgd2DniIhWl29mZklLE4CkXYGvAu+KiBdaWbaZmS2rmc1A5wDXABtLekjSp4BTgMnApZJulnRqs8o3M7PGmnYGEBEH1Bj9w2aVZ2ZmQ+M7gc3MSsoJwMyspJwAzMxKygnAzKyknADMzErKCcDMrKScAMzMSsoJwMyspJwAzMxKatAEIGkjSSvk4R0kHeYneZmZjX9FzgDOBZZKegOpK4cNgLOaGpWZmTVdkQTwSn6K1weA70TEF4GpzQ3LzMyarUgCeFnSAcBBwIV5XHfzQjIzs1YokgA+AWwLfCMi7pO0AXBGc8MyM7NmG7Q76Ii4Q9JXgen59X3ASc0OzMzMmqtIK6D3AzcDF+fXW0i6oNmBmZlZcxWpAjoe2BpYBBARN5NaApmZ2ThWJAEsiYinq8b5Ye5mZuNckUdC3ibpw0CXpDcChwF/aG5YZmbFzZ3fz6xL7ubhRYtZp7eHI3bZmL1mTmt3WGNekTOAzwObAi+SbgB7Gjh8sDdJ+pGkxyXdVjFudUmXSvpT/r/acAM3M4O08z/qvAX0L1pMAP2LFnPUeQuYO7+/3aGNeYpoXJsjaWZEzB/ygqV3As8BP4mIt+RxJwNPRcRJko4EVouIrw62rL6+vpg3b95QQzCzEtj+pCvoX7R4ufGTuiYwc/r47rXmjkeeYZOpq3L2odsO6/2SboyIvnrTi5wBfEvSXZL+VdKmRQuOiKuBp6pG7wmcnodPB/Yqujwzs1oerrHzB3hp6SstjmT0bTJ1VfbconlVWUXuA9hR0trAfsBsSasCZ0fE14dR3loR8Uhe7iOS1qw3o6RDgEMApk+fPoyizKwM1untqXkGMK23Z9hHzmVRqDvoiHg0Iv4T+DTpnoBjmxpVKnN2RPRFRN+UKVOaXZyZjVNH7LIxPd1dy4zr6e7iiF02blNE40eRG8HeLOn4fDH3FFILoHWHWd5jkqbm5U4FHh/mcszMANhr5jRO3HszJnWl3dm03h5O3HsztwIqoEgz0NOAOcA/RMTDIyzvAlKncifl//87wuWZmbHXzGnMuf4BAFf7DEGRawDbDGfBkuYAOwBrSHoIOI604/+5pE8BDwD7DmfZZmY2coMmgHzz14nAJsCKA+MjYsNG74uIA+pM2nkoAZqZWXMUuQh8GvDfwBJgR+AnwE+bGZSZmTVfkQTQExGXk24auz8ijgd2am5YZmbWbEUuAv9N0gTgT5I+B/QDddvvm5nZ+FDkDOBwYCVSJ3BvAw4kteAxM7NxrEgroBsAJEVEfKL5IZmZWSsUuRFsW0l3AHfm12+V9L2mR2ZmZk1VpAroO8AuwJMAEXEL8M5mBmVmZs1XtC+gB6tGLW1CLGZm1kJFWgE9KGk7ICRNIl0MvrO5YZmZWbMVOQP4NPBPwDTgIWCL/NrMzMaxIq2AngA+0oJYzMysheomAEnfBeo+LzIiDmtKRGZm1hKNzgD8EF4zsw5WNwFExOn1ppmZ2fh
XqBmomZl1HicAM7OSqpsAJP1b/u+ndpmZdaBGZwC7SeoGjmpVMGZm1jqNWgFdDDwBrCzpGUCkZqECIiJWbUF8ZmbWJHXPACLiiIh4HfCriFg1IiZX/h9JoZK+KOl2SbdJmiNpxcHfZWZmo2nQi8ARsaektSTtnv+mjKRASdNI/Qn1RcRbgC5g/5Es08zMhq7I8wD2Ba4H9gX2A66XtM8Iy50I9EiaSHra2MMjXJ6ZmQ1Rkd5AjwG2iojHAfIZwGXAOcMpMCL6Jf1f4AFgMfCbiPhN9XySDgEOAZg+ffpwijIzswaK3AcwYWDnnz1Z8H01SVoN2BPYAFiHdJH5wOr5ImJ2RPRFRN+UKSOqdTIzsxqKnAFcLOkSYE5+/SHg1yMo893AfRGxEEDSecB2wBkjWKaZmQ1Rke6gj5C0N/AOUhPQ2RFx/gjKfADYRtJKpCqgnXHHc2ZmLVfkDICIOA84bzQKjIjrJJ0D3AQsAeYDs0dj2WZmVlyhBDDaIuI44Lh2lG1mZok7gzMzK6ki9wHsLsmJwsyswxTZse8P/EnSyZLe3OyAzMysNYp0BXEgMBP4M3CapGskHSJpctOjMzOzpilUtRMRzwDnAj8DpgIfAG6S9PkmxmZmZk1U5BrAHpLOB64AuoGtI+K9wFuBLzc5PjMza5IizUD3Ab4dEVdXjoyIFyR9sjlhmZlZsxWpAnqkeuc/8LjIiLi8KVGZmVnTFUkA76kx7r2jHYiNrrnz+9n+pCvY4Mhfsf1JVzB3fn+7QzKzMaZuFZCkzwCfBTaSdGvFpMnA75sdmA3f3Pn9HHXeAha/vBSA/kWLOeq8BQDsNXNaO0MzszGk0TWAs4CLgBOBIyvGPxsRTzU1KhuRWZfc/erOf8Dil5fylXNuZc71D7QpKrPmuuORZ9hkqh9VPhSNEkBExF8k/VP1BEmrOwmMXQ8vWlxz/EtLX2lxJGats8nUVdlzC5/hDsVgZwC7AzcCQeoKekAAGzYxLhuBdXp76K+RBKb19nD2odu2ISIzG4vqXgSOiN0lCXhXRGwYERtU/HnnP4YdscvG9HR3LTOup7uLI3bZuE0RmdlY1LAVUEQEMJKHv1gb7DVzGifuvRmTutLXO623hxP33swXgM1sGUVuBLtW0lYRcUPTo7FRs9fMaa9e8HW1j5nVUiQB7AgcKul+4HnStYCIiM2bGpmZmTVVkQTgm77MzDpQkQQQTY/CzMxarkgC+BWvNQNdEdgAuBvYtIlxmZlZkw2aACJis8rXkrYEDh1JoZJ6gR8AbyEll09GxDUjWWYnmTu/n1mX3M3DixazTm8PR+yysVvwmNmoK3IGsIyIuEnSViMs9z+AiyNiH0mTgJVGuLyO4X58zKxVBk0Akr5U8XICsCWwcLgFSloVeCfwcYCIeAl4abjL6zSj2Y+P+0Yxs0aKdAc9ueJvBdI1gT1HUOaGpARymqT5kn4gaeXqmfJzh+dJmrdw4bDzzbgzmv34uG8UM2ukyDWAE+DVI/eIiGdHocwtgc9HxHWS/oPU2+jXqsqdDcwG6OvrK01LJPfjY2atUuSZwH2SFgC3Agsk3SLpbSMo8yHgoYi4Lr8+h5QQDPfjY2atU6QK6EfAZyNiRkTMAP4JOG24BUbEo8CDkgb2aDsDdwx3eZ3G/fiYWasUaQX0bET8duBFRPxO0kirgT4PnJlbAN0LfGKEy+so7sfHzFqhSAK4XtL3gTmkNvsfAq7K9wMQETcNtdCIuBnoG+r7xhO35Tezsa5IAtgi/z+uavx2pISw06hG1AHclt/MxoMirYB2bEUgnWQ02vK7Db+ZNVuRG8F6gY8BMyrnj4jDmhfW+DYabfndht/Mmq1IFdCvgWuBBYCfKl6A2/Kb2XhQJAGsGBFfGnw2G3DELhsvcw0A3JbfzMaeIgngp5L+EbgQeHFgZEQ81bSoxpFGrX2+cs6tvLT0Faa5FZCZjUFFEsBLwCzgaF57OEyQ+vQptcFa+7gtv5mNZUUSwJe
AN0TEE80OZrwZrLWPW/KY2VhWpCuI24EXmh3IeDRYax+35DGzsazIGcBS4GZJV7LsNYDSNwN1ax8zG8+KJIC5+a+06l3odWsfMxvPitwJfHorAhmrinTr4NY+ZjYe1U0Akn4eEfvlZwEs90CWiNi8qZGNEUW6dVihewIzp/e62sfMxpVGZwBfyP93b0UgY1WRbh18sdfMxqO6CSAiHsn/729dOGNP70rd/PWFl5cbv9pK3T7iN7NxrUgz0FKLOk8jrjfezGy8cAIYxNOLlz/6bzTezGy8cAIYRO9K3UMab2Y2XjRqBVSz9Q8gIMrSCshVQGbWqRq1Ampq6x9JXcA8oD8ixmxLI1cBmVmnatQKqNmtf74A3AmMud7SKu/8nSCxtMbh/jq9PW2IzMxs9Ax6DUDSNpJukPScpJckLZX0zEgKlbQu8D7gByNZTjMM3Pnbv2gxATV3/u7uwcw6QZGLwKcABwB/AnqAg4HvjrDc7wBfocEjJiUdImmepHkLFy4cYXHF1brzt9K03h5O3Hszd/dgZuNekc7giIh7JHVFxFLgNEl/GG6BknYHHo+IGyXt0KDM2cBsgL6+vpZdcq135y/ANz+wGR9++/RWhWJm1lRFEsALkiaRuoQ+GXgEWHkEZW4P7CFpN2BFYFVJZ0TEgSNY5qhp1MWzd/5m1kmKVAF9NM/3OeB5YD1g7+EWGBFHRcS6ETED2B+4Yqzs/OfO7+f5F5csN951/mbWiYokgL0i4m8R8UxEnBARX6IDO4gbuPi7qKp552ordbvO38w6UpEEcFCNcR8fjcIj4qqxcg9AvYu/K02a6J2/mXWkRncCHwB8GNhA0gUVk1YFnmx2YK1Wq94fGl8UNjMbzxpdBP4D6YLvGsC/V4x/Fri1mUG12tz5/al/ixrTfMOXmXWqwe4Evh/YVtJawFZ50p0RsfyV0nHshF/eXrfTI1/8NbNOVeRO4H2B64F9gf2A6yTt0+zAWmXu/P6aD3yBdEbg+n8z61RF7gM4BtgqIh4HkDQFuAw4p5mBtcoJv7y97rRprv4xsw5WpBXQhIGdf/ZkwfeNeY2O/sHVP2bW2YqcAVws6RJgTn79IeCi5oXUOo2O/nt7ul39Y2YdbdAEEBFHSNobeAfpuujsiDi/6ZE12TFzFzQ8+j9+j01bGI2ZWesNmgAk/VtEfBU4r8a4cWnu/H7OvPaButN99G9mZVCkLv89Nca9d7QDaaVZl9xds9nnAB/9m1kZNLoT+DPAZ4ENJVXe+DUZ+H2zA2umenf9go/+zaw8GlUBnUW62HsicGTF+Gcj4qmmRtVEc+f3N5zuo38zK4tGdwI/DTxNehpYx2jU8gd845eZlUdHtOcfikYtf3zjl5mVSekSQCO+8cvMyqR0CWBSl2qO757g6h8zK5dSJYC58/t5aWntBqArr9Dd4mjMzNqrVAng6PMX1J329OL61wbMzDpRaRLA3Pn9PP/S8o98HOAHv5hZ2bQ8AUhaT9KVku6UdLukL7Si3MGaf/oCsJmVTZHeQEfbEuCfI+ImSZOBGyVdGhF3NLPQRs0/e7on+AKwmZVOy88AIuKRiLgpDz8L3Ak0de872N2/J+69eTOLNzMbk9p6DUDSDGAmcF0zy/Hdv2Zmy2tbApC0CnAucHhEPFNj+iGS5kmat3DhwhGV1aj6p7fHzT/NrJzakgAkdZN2/mdGxHm15omI2RHRFxF9U6ZMaVos7vzNzMqqHa2ABPwQuDMivtXq8qu5+sfMyqodZwDbAx8FdpJ0c/7brQ1xmJmVWsubgUbE70jPFm6JY+bWv/vXzKzMOv5O4EbP/jUzK7OOTwCNnv3b093xH9/MrK5S7wF9A5iZlVlHJ4DB7gB2CyAzK7OOTgBfPffWdodgZjZmdXQCeHHJK+0OwcxszOroBNCIHwBvZmVX2gTg/v/NrOxKmwB8AdjMyq60CcDMrOw6NgG4Cwgzs8Y6NgGc4S4gzMwa6tgEYGZmjZUyAXSpZZ2Rmpm
NWaVMAAe8fb12h2Bm1nalTABf32uzdodgZtZ2pUwAZmbmBGBmVlpOAGZmJdWRCWDGkb9qdwhmZmNeWxKApF0l3S3pHklHtiMGM7Oya3kCkNQF/BfwXmAT4ABJm7Q6DjOzsmvHGcDWwD0RcW9EvAT8DNizDXGYmZVaOxLANODBitcP5XHLkHSIpHmS5i1cuHDUCl9r8qRRW5aZ2XjWjgRQqx+GWG5ExOyI6IuIvilTpoxa4dcd/Z5RW5aZ2XjWjgTwEFDZF8O6wMNtiMPMrNTakQBuAN4oaQNJk4D9gQtGs4C/nPS+IY03Myujia0uMCKWSPoccAnQBfwoIm4f7XK8szcza6zlCQAgIn4N/LodZZuZWdKRdwKbmdngnADMzErKCcDMrKScAMzMSkoRy92DNeZIWgjcP8y3rwE8MYrhjCbHNnxjOT7HNjyObXgaxbZ+RNS9k3ZcJICRkDQvIvraHUctjm34xnJ8jm14HNvwjCQ2VwGZmZWUE4CZWUmVIQHMbncADTi24RvL8Tm24XFswzPs2Dr+GoCZmdVWhjMAMzOrwQnAzKykOiYBDPageUkrSDo7T79O0owxFNvHJS2UdHP+O7iFsf1I0uOSbqszXZL+M8d+q6Qtx1BsO0h6umK9HdvC2NaTdKWkOyXdLukLNeZpy7orGFtb1p2kFSVdL+mWHNsJNeZpy2+1YGxt+63m8rskzZd0YY1pQ19vETHu/0jdSv8Z2BCYBNwCbFI1z2eBU/Pw/sDZYyi2jwOntGndvRPYEritzvTdgItIT3LbBrhuDMW2A3Bhm9bbVGDLPDwZ+GON77Ut665gbG1Zd3ldrJKHu4HrgG2q5mnXb7VIbG37rebyvwScVeu7G85665QzgCIPmt8TOD0PnwPsLKnW4ynbEVvbRMTVwFMNZtkT+Ekk1wK9kqaOkdjaJiIeiYib8vCzwJ0s/2zrtqy7grG1RV4Xz+WX3fmvuiVKW36rBWNrG0nrAu8DflBnliGvt05JAEUeNP/qPBGxBHgaeP0YiQ3gg7ma4BxJ69WY3i5F42+XbfMp+0WSNm1HAPlUeybpiLFS29ddg9igTesuV2PcDDwOXBoRdddbi3+rRWKD9v1WvwN8BRe3/r4AAAU9SURBVHilzvQhr7dOSQBFHjRf6GH0TVCk3F8CMyJic+AyXsviY0G71lsRN5H6Onkr8F1gbqsDkLQKcC5weEQ8Uz25xltatu4Gia1t6y4ilkbEFqTngW8t6S1Vs7RtvRWIrS2/VUm7A49HxI2NZqsxruF665QEUORB86/OI2ki8DpaU70waGwR8WREvJhf/g/wthbEVVSRddsWEfHMwCl7pKfMdUtao1XlS+om7WDPjIjzaszStnU3WGztXne53EXAVcCuVZPa9VsdNLY2/la3B/aQ9BdSNfJOks6ommfI661TEkCRB81fAByUh/cBroh8taTdsVXVC+9BqrMdKy4APpZbtGwDPB0Rj7Q7KABJaw/UcUramrQ9P9misgX8ELgzIr5VZ7a2rLsisbVr3UmaIqk3D/cA7wbuqpqtLb/VIrG167caEUdFxLoRMYO0D7kiIg6smm3I660tzwQebVHnQfOS/gWYFxEXkH4QP5V0Dykr7j+GYjtM0h7Akhzbx1sRG4CkOaQWIWtIegg4jnTxi4g4lfTs5t2Ae4AXgE+Modj2AT4jaQmwGNi/RUkd0hHZR4EFuc4Y4P8A0yvia9e6KxJbu9bdVOB0SV2kpPPziLhwLPxWC8bWtt9qLSNdb+4KwsyspDqlCsjMzIbICcDMrKScAMzMSsoJwMyspJwAzMxKygnAOoKkfZV6v7wyv56Tb9f/4hCX0yvpsxWv15F0zmjHW1Xmc6Mxj9lQuRmodQRJFwP/FhFXSlqb1PPm+sNYzgxST4vVXQA0jaTnImKVkc5jNlQ+A7BxRdKBSn223yzp+7nzrmOBdwCnSpoF/AZYM8/z95I2knSxpBsl/VbSm/Ky1pJ0fu4Q7RZJ2wE
nARvl986SNEP5eQRKfaxvWhHLVZLeJmllpWcX3KDUV/tyvb1KWkXS5ZJukrSgzjw7SLo6x3SHpFMlTaiY/o0c57WS1srj3p/jmi/psoHxZoUMpS9q//mvnX/Am0mdcXXn198DPpaHrwL68vAMKp4hAFwOvDEPv510izzA2aSO0iDdpf26Gu999TXwReCEPDwV+GMe/iZwYB7uJfW/v3JV7BOBVfPwGqS7gwfOwJ/L/3cA/kZ6dkQXcCmwT54WwPvz8MnAMXl4tYrlHAz8e7u/J/+Nn7+O6ArCSmNnUudbN+RubHpI3fbWpdQj5nbAL/Ra1+gr5P87AR+D1Ask8LSk1Ros7ueknfJxwH7AL/L4fyB11PXl/HpFUrcLlf3ECPimpHeSuvOdBqwFPFpVxvURcW+OfQ7pzOYc4CVg4ClQNwLvycPrAmfnPmomAfc1iN9sGU4ANp4IOD0ijhrCeyYAiyJ18TsiEdEv6UlJmwMfAg6tiOuDEXF3g7d/BJgCvC0iXlbq1XHFWsXUef1yRAwML+W13+53gW9FxAWSdgCOH8JHspLzNQAbTy4H9pG0JoCk1SU1vNAbqR/8+yTtm98jSW+tWN5n8vguSasCz5Ieo1jPz0gP5XhdRCzI4y4BPl/Ru+bMGu97Hak/95cl7QjUi3trpZ5jJ5CSzO8afb683P48fFCjGc2qOQHYuBERdwDHAL+RdCupOqbIIxY/AnxK0i3A7bz2SM4vADtKWkCqVtk0Ip4Efi/ptnxBudo5pF4Wf14x7l9JvZTemi8Y/2uN950J9Emal+Op7gJ5wDWkC9G3kapzzh/ksx1Pqt76LfDEIPOaLcPNQM3GiFyF8+WI2L3dsVg5+AzAzKykfAZgZlZSPgMwMyspJwAzs5JyAjAzKyknADOzknICMDMrqf8Pnzd10LUUOpwAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from matplotlib import pyplot as plt\n", "fig, ax = plt.subplots()\n", "ax.plot(ccp_alphas[:-1], impurities[:-1], marker='o', drawstyle=\"steps-post\")\n", "ax.set_xlabel(\"effective alpha\")\n", "ax.set_ylabel(\"total impurity of leaves\")\n", "ax.set_title(\"Total Impurity vs effective alpha for training set\")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of nodes in the last tree is: 1 with ccp_alpha: 25.178707081037665\n" ] } ], "source": [ "clfs = []\n", "for ccp_alpha in ccp_alphas:\n", " clf = DecisionTreeRegressor(random_state=0, ccp_alpha=ccp_alpha)\n", " clf.fit(X_train, y_train)\n", " clfs.append(clf)\n", "print(\"Number of nodes in the last tree is: {} with ccp_alpha: {}\".format(\n", " clfs[-1].tree_.node_count, ccp_alphas[-1]))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "182" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ccp_alphas)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAnUAAAGDCAYAAABN1ObNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de5xdZX3o/883wwQGCAxCsGQSJG050Sg0qSmXptZqsYGeCqnHlwjipbWi9efp8VCD5GipWi2psWpPtRbtsV5QbpFGWrFBBVq1BAhOIAaNBNEkE5SAjNxGEibf3x977bCzs/dc9+zZs+fzfr32a/Z61rOe9ay99uU7z2WtyEwkSZI0tc2Y7ApIkiRp/AzqJEmS2oBBnSRJUhswqJMkSWoDBnWSJEltwKBOkiSpDRjUSWqKiDghIjIiDmpk3hHu+w8jYntEPB4RixtR5nhFxP+JiH9qdN7JFBGfiYj3T8J+jy/ObUez9y21EoM6aQQi4paIeCQiDp7sumhMPgS8LTMPz8ze8RZWvB/+ZDxlZOZfZ+aIyhhN3lYVEW+IiG81qKwfRcQZ5eXM3Fac28FGlN8I1XWUmsGgThpGRJwAvAhI4Owm77shLVXiOcDmsWw4ltYfz5ukyWBQJw3vdcB64DPA6ytXRERXRPxtRPw4In4eEd+KiK5i3W9FxH9FRH/R9feGIn2/Vp7qFoyi2/H/i4h7gXuLtL8ryng0Iu6MiBdV5O8ouufui4jHivXzIuLjEfG3VfX914h4e/UBRsQ/RsSHqtK+HBEXFc/fGRF9RflbIuJ3a71QEfHfI6K3qOf2iHhPvRe1eB0ui4jbi9fuyxHxrKpsr4mIbRHxUES8q2LbUyLi1uK1fSAiPhYRM2vs4+CIeBzoAO6KiPuK9OcV+++PiM0RcXbFNp+JiE9ExA0R8QTwkqoyP0ApyP9Y0eX3sSJ9tOftPRFxRfG83N38+jrHO5q8XRHx2aJl+XsRcXFE7BjiPAxXx2si4nPFud8cEUsq1i+OiO8U664GDqmzj+cB/wicXrxm/RXn50PFcfy0eB+WPz/HRMS/FefoZxHxzYiYERGfB44H/rUo6+Ko6q4vzu1fRcS3i7rdGBHHVNTndVH6zD4cEX8RQ7SqRcTvR8Q9RTl9EfGOinV/EBEbizr+V0ScXKTXquMhEXFFsc/+iLgjIp5d77xIY5KZPnz4GOIBbAXeCrwQ2AM8u2Ldx4FbgB5KgcNvAgdT+kJ/DDgP6ASOBhYV29wC/ElFGW8AvlWxnMDXgGcBXUXaBUUZBwF/DvwEOKRYtwLYBCwAAvi1Iu8pwE5gRpHvGODJyvpX7PO3ge1AFMtHAQPAnKLc7cCcYt0JwK/Uea1+BziJ0j+MJwM/BZZXbJfAQRWvQx/wAuAw4EvAFVV5PwV0Fcf0FPC8Yv0LgdOK1+ME4HvA24c4hwn8avG8szin/weYCby0OFcLivWfAX4OLC2O45Aa5e13Dsd43t4ziuMdTd5VwH8U53AucDewY4jXZrg6/gL4fUrv78uA9cW6mcCPgf9dvKavpPT5eH+d/byBivd5kfZR4PriNZsF/CtwWbHuMkqBYGfxeBHPvD9/BJxRUU75Nal8b90H/LfiNboFWFWsWwg8DvxWcQwfKup9Rp16PwC8qOJz8evF818HHgROLV6b1xf1OrhOHd9cHN+hRf4XAkdM9vebj/Z62FInDSEifotS1901mXknpR+K84t1M4A/Bv5XZvZl5mBm/ldmPgW8Bvh6Zl6ZmXsy8+HM3DiKXV+WmT/LzAGAzLyiKOPpzPxbSoHjgiLvnwDvzswtWXJXkfd2SsFJuVXt1cAtmfnTGvv7JqUfxXIrzSuBWzNzJzBY7G9hRHRm5o8y875alc7MWzJzU2buzcy7gSuBFw9xnJ/PzO9m5hPAXwCviv27O9+bmQOZeRdwF6UAhsy8MzPXF6/Hj4DLh9lPpdOAwyn9yO/OzJuAf6MUgJd
9OTO/XRzHL0ZYLozuvNVS83hHmfdVwF9n5iOZuQP4v0NVeAR1/FZm3pCl8Wqfr9jPaZSCrY8W7/E1wB1D7atSRATwJuB/F6/ZY8BfU3qfQinQOg54TlH+NzNzNDcr/+fM/EFxLq4BFhXprwT+NTO/lZm7gUspvffr2UPpvX9E8Zp+p0h/E3B5Zt5WfPY/Sym4Pm2Ico6m9M/FYPEefnQUxyMNy6BOGtrrgRsz86Fi+Ys80wV7DKXuploBzrw66SO1vXIhIv686Er7edF1dWSx/+H29VlKLTEUfz9fK1PxY3kVzwQ25wNfKNZtBd5OqdXmwYi4KiLm1ConIk6NiJsjYldE/Bx4S0U9hzvOH1MKEirz/6Ti+ZOUgjEi4r8VXXM/iYhHKQUDQ+2n0hxge2burdp3T516jcZozlstNY93lHnnVNVjyGMZQR2r93NI0c05B+irCrR+PNS+qsym1Gp1Z9Ed2Q/8e5EOsJpSi+qNEfHDiLhkFGXXqnfN1ycznwQeHqKc/0GppfLHEfEfEXF6kf4c4M/LdS/qP68ov5bPA+uAqyJiZ0R8MCI6R3lM0pAM6qQ6irE9rwJeXAQPP6HU1fRrEfFrwEOUuqZ+pcbm2+ukAzxB6ces7Jdq5Nn3Q1mMcXpnUZejMrObUgtcjGBfVwDnFPV9HrC2Tj4otaq9MiKeQ6lL6Uv7KpP5xcwst1om8Dd1yvgipe60eZl5JKXus6iTF0o/gmXHU2rNeKhO3kqfAL4PnJiZR1DqSh1qP5V2AvOKltbKffdVLA/XIlRv/WjO20R5gFK3a9m8ehnHWccHgJ6ixa3s+CHyV79mD1Hq4n9+ZnYXjyMz83CAzHwsM/88M38ZeDlwUTwzlnM0LXa16r3v9Sk+50fXrXTmHZl5DnAspc/PNcWq7cAHKurenZmHZuaVtepYtDa+NzMXUhqm8QeUxutKDWNQJ9W3nFLX40JKXTeLKAVG3wReV7T0fBr4cETMidKEhdOjdNmTLwBnRMSrIuKgiDg6IsrdPxuBV0TEoRHxq8Abh6nHLOBpYBdwUERcChxRsf6fgL+KiBOj5OSIOBqg6H67g1IrwZfK3YK1ZOlSH7uK8tZlZnkw+4KIeGlxXL+g9ENc79IRs4CfZeYvIuIUiq7qIVwQEQsj4lDgfcCaHNllKWYBjwKPR8RzgT8dwTZlt1EKrC+OiM6I+B1KQcNVoyjjp8Avj6COQ523iXINsDIijoqIHuBtE1THW4tt/6x4j7+C0jjOen4KzI1iQkvx+fkU8JGIOBYgInoiYlnx/A8i4leLoPFRSu+5wYqyhnv961kDvDwifrOoy3upE8RGxMyIeE1EHJmZeyrqQVH3txSt0xERh0VpotCsWnWMiJdExEnF8IJHKf0D0zKXYFF7MKiT6ns9pXE52zLzJ+UH8DFKszIPAt5BaZLCHcDPKLVgzcjMbZS6bP68SN/IM2ORPgLspvSl/1mKbs4hrAO+CvyAUvfWL9i/S+3DlH7Ib6T0Y/H/KA0OL/sspckLNbteq1wJnEGpxa3sYEqD7x+i1KV1LKWWsVreCrwvIh6jNFbpmjr5yj5PaWLCTyh1Zf/ZCOoIpdf9fEoTHD4FXD3C7SjGUZ0NnEXpmP6BUpD+/ZGWAfwdpVbNRyKi3pi14c7bRHkfsAO4H/g6pSDmqUbXsXgdX0FpAsQjwLnAdUNschOly8r8JCLKrbHvpNTFur7oRv86z4znO7FYfpxSAPkPmXlLse4y4N1Ft+e+2agjrPdm4H9SCuIfoPQeepD6r9FrgR8V9XsLxXCGzNxAaVzdxygd/1ZKr0VZdR1/idK5eJTSxJ7/oNSSLjVMeSaRpDYVEb9N6cfjhKpxZJMqIm6hNKOz5e+UMJVFxJ8Cr87MkU4kmVYi4nCgn1JX/v2TXR9pPGypk9pYMRD7fwH/1EoBnSZORBwXEUujdE23BZRai/9lsuvVSiLi5cXwh8MoXdJkE6VLkEhTmkGd1KaidMHXfkqXhfj
oJFdHzTOT0iVeHqPU5fllSl3MesY5lCbM7KTUzfvqUV4uRWpJdr9KkiS1AVvqJEmS2oBBnSRJUhs4aLIrMBmOOeaYPOGEEya7GpIkScO68847H8rM2cPlm5ZB3QknnMCGDRsmuxqSJEnDiogR3YLP7ldJkqQ2YFAnSZLUBgzqJEmS2sC0HFMnSZKmrj179rBjxw5+8YtfTHZVGuqQQw5h7ty5dHZ2jml7gzpJkjSl7Nixg1mzZnHCCScQEZNdnYbITB5++GF27NjB/Pnzx1SG3a+SJGlK+cUvfsHRRx/dNgEdQERw9NFHj6v10aBOkiRNOe0U0JWN95gM6iRJkkahv7+ff/iHfxjTth/96Ed58sknG1yjEoM6SZLU1tb29rF01U3Mv+QrLF11E2t7+8ZVXqsGdU6UUMtb29vH6nVb2Nk/wJzuLl7y3Nl86c4dDOzZC8CMgPNPPZ73Lz9pkmsqSWo1a3v7WHndJgb2DALQ1z/Ayus2AbB8cc+Yyrzkkku47777WLRoES972cs49thjueaaa3jqqaf4wz/8Q9773vfyxBNP8KpXvYodO3YwODjIX/zFX/DTn/6UnTt38pKXvIRjjjmGm2++uWHHCQZ1anG1PoxXrN+2X569yb40AztJml7e+6+buWfno3XX927rZ/fg3v3SBvYMcvGau7ny9m01t1k45wj+8uXPr1vmqlWr+O53v8vGjRu58cYbWbNmDbfffjuZydlnn81//ud/smvXLubMmcNXvvIVAH7+859z5JFH8uEPf5ibb76ZY445ZgxHOzS7X9XSVq/bsi+gG86Vt22f4NpIkqaa6oBuuPTRuvHGG7nxxhtZvHgxv/7rv873v/997r33Xk466SS+/vWv8853vpNvfvObHHnkkQ3Z31BsqVNL29k/MOK8g5kTWBNJUisaqkUNYOmqm+ir8VvS093F1W8+fdz7z0xWrlzJm9/85gPW3Xnnndxwww2sXLmS3/u93+PSSy8d9/6GYkudWtqc7q4R5+1ow+ntkqTxWbFsAV2dHfuldXV2sGLZgjGXOWvWLB577DEAli1bxqc//Wkef/xxAPr6+njwwQfZuXMnhx56KBdccAHveMc7+M53vnPAto1mS51a2oplC/YbUzeU806d14QaSZKmkvJkiMoJdyuWLRjzJAmAo48+mqVLl/KCF7yAs846i/PPP5/TTy+1+h1++OFcccUVbN26lRUrVjBjxgw6Ozv5xCc+AcCFF17IWWedxXHHHdfwiRKR07DLasmSJblhw4bJroZGaG1vHxevuZvdg3vpcfarJE173/ve93je85432dWYELWOLSLuzMwlw21rS51a3vLFPftmKJXHP7x/+Umce/mt+6VJkjSdOaZOkiSpDTQ1qIuIMyNiS0RsjYhLaqy/KCLuiYi7I+IbEfGcqvVHRERfRHysIu2WosyNxePYZhyLJElSK2laUBcRHcDHgbOAhcB5EbGwKlsvsCQzTwbWAB+sWv9XwH/UKP41mbmoeDzY4KpLkqQW045zAsZ7TM0cU3cKsDUzfwgQEVcB5wD3lDNkZuU0kPXABeWFiHgh8Gzg34FhBwuqva3t7dt3lfClq27iJc+dzb/d9QD9A3sAOOrQTv7y5c9nw49/xpW3bd/vGnY9DZj5JEmaPIcccggPP/wwRx99NNEml7PKTB5++GEOOeSQMZfRzKCuB6i85P8O4NQh8r8R+CpARMwA/hZ4LfC7NfL+c0QMAl8C3p/tGL5rn/Ktw8pXA69167BHntzD26/eWHP7Rtz3T5I0eebOncuOHTvYtWvXZFeloQ455BDmzp075u2bGdTVCqVrBl8RcQGl1rgXF0lvBW7IzO01IvLXZGZfRMyiFNS9FvhcjTIvBC4EOP7448d0AGoNo7l1WD0DewZ5z/WbDeokaQrq7Oxk/vz5k12NltPMiRI7gMqrw84FdlZniogzgHcBZ2fmU0Xy6cDbIuJHwIeA10XEKoDM7Cv+PgZ8kVI37wE
y85OZuSQzl8yePbsxR6RJMZpbhw2l3FUrSVI7aGZL3R3AiRExH+gDXg2cX5khIhYDlwNnVk54yMzXVOR5A6XJFJdExEFAd2Y+FBGdwB8AX5/wI9GkmtPdVfM+fqM1s8Mr+kiS2kfTftUy82ngbcA64HvANZm5OSLeFxFnF9lWA4cD1xaXJ7l+mGIPBtZFxN3ARkrB4qcm5gjUKmrdx2+0ZgTMO2rk95WVJKnVNfWOEpl5A3BDVdqlFc/PGEEZnwE+Uzx/AnhhQyupllfrPn6jnf16yEEzOGbWwZNSf0mSJoK3CdOUtHxxzwGTHGrd+3X54p6a6eVbjEmS1C4cVCRJktQGDOokSZLagEGdJElSGzCokyRJagMGdZIkSW3AoE6SJKkNGNRJkiS1AYM6SZKkNmBQJ0mS1AYM6iRJktqAQZ0kSVIbMKiTJElqAwZ1kiRJbeCgya6AJt7a3j5Wr9vCzv4B5nR3sWLZApYv7pnsakmSpAYyqGtza3v7WHndJgb2DALQ1z/Ayus2ARjYSZLURux+bXOr123ZF9CVDewZZPW6LZNUI0mSNBEM6trczv6BUaVLkqSpyaCuzc3p7hpVuiRJmpoM6trcimUL6Ors2C+tq7ODFcsWTFKNJEnSRDCoa3PLF/dw2StOYmZH6VTP7JjBZa84yUkSkiS1GYO6aWD54h4WH98NwOLjuw3oJElqQwZ1kiRJbcCgTpIkqQ0Y1EmSJLUBgzpJkqQ2YFAnSZLUBgzqJEmS2oBBnSRJUhswqJMkSWoDTQ3qIuLMiNgSEVsj4pIa6y+KiHsi4u6I+EZEPKdq/RER0RcRH6tIe2FEbCrK/L8REc04FkmSpFbStKAuIjqAjwNnAQuB8yJiYVW2XmBJZp4MrAE+WLX+r4D/qEr7BHAhcGLxOLPBVZckSWp5zWypOwXYmpk/zMzdwFXAOZUZMvPmzHyyWFwPzC2vi4gXAs8GbqxIOw44IjNvzcwEPgcsn9jDkCRJaj3NDOp6gO0VyzuKtHreCHwVICJmAH8LrKhR5o6RlBkRF0bEhojYsGvXrlFWXZIkqbUd1MR91RrrljUzRlwALAFeXCS9FbghM7dXDZkbcZmZ+UngkwBLliypmUetZW1vHyuvu5uBPXv3pb177Sbev/ykSayVJEmtqZlB3Q5gXsXyXGBndaaIOAN4F/DizHyqSD4deFFEvBU4HJgZEY8Df0dFF229MtU4a3v7WL1uCzv7B5jT3cUJR3ex/oePMJhJRwTnnTqvIUHX2t4+Lrp6I3ur0q9Yvw3AwE6SpCrNDOruAE6MiPlAH/Bq4PzKDBGxGLgcODMzHyynZ+ZrKvK8gdJkikuK5cci4jTgNuB1wN9P8HFMW6WWs00M7BkEoK9/gL7+gX3rBzMbFnStXrflgICu7Mrbto+r/LW9ffRu62f34F6WrrqJFcsWsHzxM732r/nUrXz7vp/tWz7x2MN4cvfefYFsdf7qQLd6vSRJzdC0MXWZ+TTwNmAd8D3gmszcHBHvi4izi2yrKbXEXRsRGyPi+hEU/afAPwFbgfsoxuGp8Vav27IvoBvKF27bNu597awIFqsN5th7z8uB6e7BUsjY1z/Ayus2sba3DzgwoAO498En6OsfIGvkL5dXb70kSc3SzJY6MvMG4IaqtEsrnp8xgjI+A3ymYnkD8IKGVVJ1DRVoVRou5qps2Tqyq5PdTw/yZDFu7qhDO/nLlz+fOd1d+7UCVuoYx6UIawWmA3sGuXjN3Vx5+zZuu/9ndbasnb/c4le9fvW6LbbWSZKaqqlBnVrTu9du4srbtu83Lq7cjVrpqEM7eeTJPePaV3UXbv/A/uU98uQeVqy5i3N/Yx5fXL+tZhfseafOq5E6MvUC0+rAbDjl/PW2G2kALElSo3ibsGnu3Ws3ccX6bfu6NCvHxVUbTUC3dNVNNbsgR9KFu2cwueaOHcw/5rD90mcEXHDa8eMaTzenu6tmek93F1e/+fQRl1PO31OnvHr7kSR
pothSN81dedv24TMNoXMG7M04YJxbeWwZsF835EhbsHYP7uWYWQdzzKyDOWdRD+efevy46lm2YtmC/VoKAbo6O1ixbAEAS3/lWQeMqatWmb9WeTOCfeslSWoWg7ppbjyTDgCe3gv3r/p9lq666YAxcJVjz8o6O2aMqKtztC1nI1UOMOvNVv3Cm04f1ezXyvLKx783S8uV60fKmbSSpLEyqJvmOuLAVrbRKHczjnSs2ryjurj/4SfYO8QuOztiQlu6li/uGTJQ+sKbRhdMlsuqvtxLrZbKodS6ZMxoy5AkTV8GddNcvUkRI1HZDVlvtmqtFreRzH6dakHMcLNqR6LeTNrKMhrZFS1Jai8GddNcedJBObAbavbrR89dVLdrcLixapWGaymbihoxq7Ze3nL6PQ88CmBQJ0mqyaBOvH/5Sdz708cB9rWq1ZthWi8YG26sWrsbTUtlPbXGJVaWce7lt467npKk9mVQp4Zpxxa4kRpNS+VEliFJmr4M6qQGaERL5XRv7ZQkjY9BXYO97MO3cO+DT+xbPvHYw/jaRb8zeRVS0zSipXI6t3ZKksbHO0o0UHVAB6Wbwb/sw7dMToUkSdK0YUtdA1UHdMOlT5Za93qVJElTm0HdNHP/rse57f5n7pZQvtfrsYfPZP7swyexZpIkaTzsfp1mHnx896jSJUnS1GBQ10DPnjWz7rq1vX1NrMnYnLPIAfqSJE1VBnUNdFBHR911f37NXS0d2HVEeKcCSZKmMIO6Bqp3qygojV1bed2mSQ/sjj28dmuikyUkSZraDOoaaE5315DrB/YM8p7rNzepNrXNn304F5z2TItcRwQXnHZ83duCSZKkqcHZrw20YtkC3n71xiHz9A/saVJt6qt1r1dJkjS12VLXQMsX9zAjhs4zs8OXXJIkNZ4RRoMNNdlgRsC8o4buopUkSRoLu18brDw2rXzHhrKe7i4OOWgGx8w6eFLqVb7g8G33/4wTLvkKAL9yzGGTUhdJktR4BnUT4P3LT6o58eDcy2+dhNqwL4irdt9DT7C2t88byEuS1Absfp3mVq/bMtlVkCRJDWBL3TQ31LX11Dpe86lb93Whn3DJV+gIGMz983REcN6p83j/8pNY29vH6nVb2Nk/wJzuLl7y3Nnc/P1d7Owf4MiuTiKg/8k9zOnuYsWyBbbWSlIbMKhrkrW9ffRu62f34F6WrrqpZX5Ih7u2nibfaz51K9++72f7pVUHdKW05Ir127h/1+N8Z9vPGdgzCEBf/wBXrN+2L1/lZXX6+gdYed0mgJZ4P0qSxs6grgnW9vax8rpN7B7cC7TWD+mKZQsmdf8aXnVA1+j8A3sGuXjN3Vx5+7aa689Z1OMt5CRpCnBMXROsXrdlX6tJ2cCewaaNZ/vRqv9eM/2j5y6a9KBSraH8D0e1ex54lC9vbN17FkuSnmFLXRPUG7fWzPFsp85/FuAdJFRbT3dXzffGZM3YliSNXlNb6iLizIjYEhFbI+KSGusvioh7IuLuiPhGRDynSH9ORNwZERsjYnNEvKVim1uKMjcWj2ObeUwjUW/cmuPZNBJLf+VZo87f1dkx4vxdnR12w0tSG2haUBcRHcDHgbOAhcB5EbGwKlsvsCQzTwbWAB8s0h8AfjMzFwGnApdExJyK7V6TmYuKx4MTeiBjsGLZggN+ZP0h1Uh94U2nHxDYddS4HV1HBBecdjxfeNPpXPaKk+jp7iIotcJdcNrx+5a7uzo56tDOfesue8VJdsNLUhtoZvfrKcDWzPwhQERcBZwD3FPOkJk3V+RfD1xQpO+uSD+YKTYWsPyDefGau9k9uJee4hIT77l+M2+/eiMARx3ayV++/Pn+uKqmL7xpdN3myxf3+F6SpGmmmUFdD7C9YnkHpVa3et4IfLW8EBHzgK8AvwqsyMydFXn/OSIGgS8B78/MAy74EBEXAhcCHH9882fyLV/cs2924XmnHM+Ka+9iz95nqvnIk3tYseaufXklaaJVX8+wVS6
1JGlsmhnU1egwosbVtiAiLgCWAC/elzFzO3By0e26NiLWZOZPKXW99kXELEpB3WuBzx2wo8xPAp8EWLJkSc39NsvqdVv2C+jK9gwmq9dtOeBL9d1rN+27l2zlBWZrGU1eSdNX+VJLldczbJVLLUkam2YGdTuAeRXLc4Gd1Zki4gzgXcCLM/Op6vWZuTMiNgMvAtZkZl+R/lhEfJFSN+8BQV0rGWrWa/W6d6/dtN+FY8sXmIXSPWYrg7hqlXklqdJQl1oyqJOmpmYGdXcAJ0bEfKAPeDVwfmWGiFgMXA6cWTnhISLmAg9n5kBEHAUsBT4cEQcB3Zn5UER0An8AfL05hzN2c7q76KsT2CWl20D1FF0hV962vWa+K9Zv4wc/fYzb739k2P1dsX4bsw45iIXHHTGeaktqI61wqSVJjdW0CQeZ+TTwNmAd8D3gmszcHBHvi4izi2yrgcOBa4vLk1xfpD8PuC0i7gL+A/hQZm6iNGliXUTcDWykFCx+qlnHNFYrli2gc0at3uhnlLtCarXAlY0koCtbeNwRnLPI/74llXipJan9NPXiw5l5A3BDVdqlFc/PqLPd14CTa6Q/AbywwdWccOWujfLM13qqu0bGqiPCiw5L2s+KZQv2G1MHXmpJmuqm1KVB2sl4x6wce/jMEec979R5w2eSNK0sX9zDZa94ZhKV1yyUpj6DuhZXvnBsR5S6a8sXmJ0/+3BmHTz8XQNOPPYwZ79KqqkygPv2JS81oJOmOO/9OolmHdzBY0/V72Itd4UsX9xzQGB27uW38tTT9cfbeTkTSZKmF4O6SbRwzpHcs/PnNQO7niEuBHrqB77GTx/bfUB6WQD3Xfb7jayqJElqcQZ1k2zhnCNHNYlhuIAOnL0mSdJ05Ji6KWa4gM7Za5IkTU+21E2Cl334Fu598In9lr920e+Mu9yhumwlSVJ7s6WuyTZue2S/gA7g3gef4GUfvmXcZTt7TZKk6cugrsmeGqw9Y7U60Kvn2bNqX5+uXrokSZoeDOqmmNve9bIDArhnz5rJbe962STVSJIktQLH1E1BBnCSJKmaQV2TrO3to3dbf931Jx57WBNrI0mNs7a3j9XrtrCzf4A53V285LmzuTGOOmIAABtbSURBVPn7u9jZP8CRXZ1EQP+Te5jThMlc1XWp3N9Q6yZaq9ZL7cWgrgnW9vax8rpN7B7cW3P9icce1pDZr5LUbOXvt4E9pYuo9/UPcMX6bfvW9w/s2fe8r3+AlddtAsZ//+uR1qW8P6DuuokOoFq1Xmo/kVn/VlPtasmSJblhw4am7W/pqpvo6x84IH1mxwx+8IGzmlYPaTTW9vZx8Zq72T24l44IBjNHdNmcoVodqi/nMxH/0NjqMTonXPIVAH606r+Paft6329Dmdkxg8XHd49pf0Pp3dZf85/nmR2l4eP11k1EXZpRr3MW9XD+qcc3ppJqaRFxZ2YuGS6fLXVNsLPOF169ljtpslW3Lg8W//wN14owVIvEx2++t+7lfBoV2A21fwO7iVHv+20oE/XdV6/cofbXjO/hiajXPQ88CmBQp/0Y1DXBnO6umv/J9ng7L7Wo1eu27AuMqg3sGeTiNXdz5e3bDlhXq0WinL/ej9S9Dz7BuZffOv5KD7H/91y/2aBugtT7fhtKT3fXqG6POFL1Wg3L37X11k1EXSa6Xo36zKi9eEmTJlixbAFdnR37pXk7L7Wy4VpfRtvy0KxW6Xr7qRzXpcaq9f02lIn87hvqu3Yyv4dbtV5qP7bUNUG5hcBxPpoqhmt9qdeKMFSLxFDlNaqlZKjxq5oY5e+xcmtszyTOfh3Jd+1kfA+3ar3UfgzqmmT54h4/pJoyVixbsN/YtEpDtSLU2q6cv9aYOmjs5Xxq7X9GwLyjHOowkZYv7tnXHT/RXZkjqUu979rJ/B5u1Xqpvfjvq6QDLF/cw2WvOGnfmJ+OCKDU4nbZK04a8sepvF1U5f/aRb9zQADX6NmvtfY//+jDOGbWwQ3
bhyS1KlvqJNU01taDobZrxvUYq/fvgHJJ04UtdZIkSW3AoE6SJKkN2P0qSYXKu1EMNWvTu1ZIakUGdZLEgXejqHfPUvBenZJak0GdJDH0XTTgmTtjwIEXOa51lw3vyymp2RxTJ0mM7B6muwf3juiuGfc88Chf3tjXsLpJ0kjYUidJjOwepiO9V6eXUZE0GQzqJLWttb199G7rZ/fgXuZf8hWySD/q0E7+8uXPB9hvYkRnR7BnMGuWVXknjXp3zZCkydTUoC4izgT+DugA/ikzV1Wtvwj4E+BpYBfwx5n544h4DnBdsV0n8PeZ+Y/FNi8EPgN0ATcA/ysza38rS5o2yhMfyt2ilV8Kjzy5h4uu2UjHjGeCuP6BPXTOCA6aETy9N+ke5p6lzn6V1GqaFtRFRAfwceBlwA7gjoi4PjPvqcjWCyzJzCcj4k+BDwLnAg8Av5mZT0XE4cB3i213Ap8ALgTWUwrqzgS+2qzjktSahpv4sDdhb1Wr3J69SQCnzn/WkPcw9V6dklpRMydKnAJszcwfZuZu4CrgnMoMmXlzZj5ZLK4H5hbpuzPzqSL9YIp6R8RxwBGZeWvROvc5YPnEH4qkVjeSiQ+1JKWZq5I01TQzqOsBtlcs7yjS6nkjFS1uETEvIu4uyvibopWupyhnpGVKmibmFJMaRqunu8tLkUiakpoZ1EWNtJpj3yLiAmAJsHpfxsztmXky8KvA6yPi2aMs88KI2BARG3bt2jXqykuaWlYsW0BXZ0fd9TMCOjv2/wpxwoOkqayZQd0OYF7F8lxgZ3WmiDgDeBdwdkWX6z5FC91m4EVFmXOHK7PY7pOZuSQzl8yePXvMByFpali+uIfLXnHSvsuQVIZvRx3ayYdftYjVr/w1erq7CEotdJe94iTHykmaspo5+/UO4MSImA/0Aa8Gzq/MEBGLgcuBMzPzwYr0ucDDmTkQEUcBS4EPZ+YDEfFYRJwG3Aa8Dvj75hyOpFY3kgkNBnGS2kXTgrrMfDoi3gaso3Rpkk9n5uaIeB+wITOvp9TdejhwbUQAbMvMs4HnAX8bEUnpH+4PZWb5Rox/yjOXNPkqznyVJEnTUFOvU5eZN1C67Ehl2qUVz8+os93XgJPrrNsAvKCB1ZSkMau84PHSVTfVvYbd2t4+3nP9ZvoH9gDPXBDZlkNJY+UdJSSpQaoveNzXP8DK60qdCpXB2trePlZcexd79j4zr+uRJ/ewYs1dB+SVpJEyqJOkBql1weOBPYNcvOZurrx927603m39+wV0ZXsGk/dcv9mgTtKYNHP2qyS1tXoXPC633NVbrlTujpWk0bKlTpIaZE53F301Arue7q79bju2dNVNNfMBzOzwf21JY+O3hyQ1SK0LHte6oPGKZQvonHHgtdMDmHfU2O6EIUm21ElSg5THwq1et4Wd/QPM6e6qOfu1vFw9+/VZh87kmFkHN7fSktqGQZ0kNdBILnhcL9+5l986UdWSNA3Y/SpJktQGDOokSZLagEGdJElSGzCokyRJagMGdZIkSW3AoE6SJKkNGNRJkiS1gWGDuoh4WUR8KiIWFcsXTny1JEmSNBojufjwW4E/At4dEc8CFk1slSRJkjRaI+l+3ZWZ/Zn5DuD3gN+Y4DpJkiRplEYS1H2l/CQzLwE+N3HVkSRJ0lgMG9Rl5perlv9+4qojSZKksRjR7NeIeG1E7IqIHRHxuiLttIh4f0TcObFVlCRJ0nBGekmTS4HfpzRJ4pcj4mvAtcBM4O0TVDdJkiSN0EhmvwI8npl3AETEe4GfAv8tM/snrGaSJEkasZEGdb9UXJ9uS/HYYUAnSZLUOkYa1P0lcDLwGuAkYFZEfB3oBXoz84sTVD9JkiSNwIiCusz8ZOVyRMylFOSdBJwFGNRJkiRNopG21O0nM3cAO4AbGlsdSZIkjcVIZ79KkiSphRnUSZIktQGDOkmSpDZgUCdJktQGmhrURcSZEbElIrZGxCU11l8UEfdExN0R8Y2IeE6Rvig
ibo2IzcW6cyu2+UxE3B8RG4vHomYekyRJUitoWlAXER3AxyldAmUhcF5ELKzK1gssycyTgTXAB4v0J4HXZebzgTOBj0ZEd8V2KzJzUfHYOKEHIkmS1IKa2VJ3CrA1M3+YmbuBq4BzKjNk5s2Z+WSxuB6YW6T/IDPvLZ7vBB4EZjet5pIkSS2umUFdD7C9YnlHkVbPG4GvVidGxCnATOC+iuQPFN2yH4mIg2sVFhEXRsSGiNiwa9eu0ddekiSphTUzqIsaaVkzY8QFwBJgdVX6ccDngT/KzL1F8krgucBvAM8C3lmrzMz8ZGYuycwls2fbyCdJktpLM4O6HcC8iuW5wM7qTBFxBvAu4OzMfKoi/QjgK8C7M3N9OT0zH8iSp4B/ptTNK0mSNK00M6i7AzgxIuZHxEzg1cD1lRkiYjFwOaWA7sGK9JnAvwCfy8xrq7Y5rvgbwHLguxN6FJIkSS1oTPd+HYvMfDoi3gasAzqAT2fm5oh4H7AhM6+n1N16OHBtKUZjW2aeDbwK+G3g6Ih4Q1HkG4qZrl+IiNmUunc3Am9p1jFJkiS1iqYFdQCZeQNwQ1XapRXPz6iz3RXAFXXWvbSRdZQkSZqKvKOEJElSGzCokyRJagMGdZIkSW3AoE6SJKkNGNRJkiS1gabOfpUktY61vX37ni9ddRMvee5sbv7+Lvr6B+iIYDCTnu4uVixbwPLFPfu2Wb1uCzv7B5hTtU7S5DKok6RpaG1vHyuv27Rvua9/gCvWb9u3PJi5L70y38rrNjGwZ7DmOkmTy6BOkqah1eu27AvOhjOwZ5CL19wNwO7BvTXXHdw5g4XHHdHweupAa3v76N3Wz+7BvSxddZOtpdrHMXWS1ALKP9S33f8zlq66ab+u0Ymws39gVPl3D+49IKCrXLfwuCM4Z5GBxUQrt7CWz0W5tXSi3y+aGgzqJGmSTcYP9ZzurlHl7+nuoqfONj3dXVz95tM5/9TjG1E1DaFWC+vAnkHec/3mSaqRWondr5I0yer9UF+85m6uvH1bna3G55CDZjAjYG8On7ers4MVyxYA+4+pq16niVevhbV/YE+Ta6JWZFAnSZOs3g91ve7ORjhm1sEAbH9kgN2De+np7hrR7FfA2a+TaE53F3013i8zO+x4k0GdJE26ej/U5W7NVrJ8cY9B3CRasWzBAa2lMwLmHTW67nS1J0N7SZpkK5YtoKuzY780uzVVy/LFPVz2ipPo6e4iKAX+848+bF/Lq6Y3W+okaZKVW77s1tRIVLeWnnv5rZNYG7USgzpJagF2a0oaL7tfJUmS2oBBnSRJUhswqJMkSWoDBnWSJEltwKBOkiSpDRjUSZIktQGDOkmSpDZgUCdJ0hS1treP3m393Hb/z1i66ibW9vZNdpU0iQzqJEmagtb29rHyuk3sHtwLQF//ACuv22RgN415RwlJkqag1eu2MLBncL+0gT2DXLzmbq68fdu4yz9nUQ/nn3r8uMtR89hSJ0nSFLSzf6BmernlbjzueeBRvrzRFr+pxpY6SZKmoDndXfTVCOx6uru4+s2nj6vscy+/dVzba3LYUidJ0hS0YtkCujo79kvr6uxgxbIFk1QjTTZb6iRJmoKWL+4BSmPrdvYPMKe7ixXLFuxL1/TT1KAuIs4E/g7oAP4pM1dVrb8I+BPgaWAX8MeZ+eOIWAR8AjgCGAQ+kJlXF9vMB64CngV8B3htZu5u0iFJkjRpli/uMYjTPk3rfo2IDuDjwFnAQuC8iFhYla0XWJKZJwNrgA8W6U8Cr8vM5wNnAh+NiO5i3d8AH8nME4FHgDdO7JFIkiS1nmaOqTsF2JqZPyxa0q4CzqnMkJk3Z+aTxeJ6YG6R/oPMvLd4vhN4EJgdEQG8lFIACPBZYPmEH4kkSVKLaWZQ1wNsr1jeUaTV80bgq9WJEXEKMBO4Dzga6M/Mp4crMyIujIgNEbFh165dY6i+JElS62pmUBc10rJmxogLgCXA6qr044DPA3+UmXtHU2Z
mfjIzl2TmktmzZ4+q4pIkSa2umRMldgDzKpbnAjurM0XEGcC7gBdn5lMV6UcAXwHenZnri+SHgO6IOKhoratZpiRJUrtrZkvdHcCJETE/ImYCrwaur8wQEYuBy4GzM/PBivSZwL8An8vMa8vpmZnAzcAri6TXA1+e0KOQJElqQU0L6oqWtLcB64DvAddk5uaIeF9EnF1kWw0cDlwbERsjohz0vQr4beANRfrG4jInAO8ELoqIrZTG2P2/Zh2TJElSq2jqdeoy8wbghqq0Syuen1FnuyuAK+qs+yGlmbWSJEnTlrcJkyRJagMGdZIkSW3AoE6SJKkNGNRJkiS1AYM6SZKkNmBQJ0mS1AYM6iRJktqAQZ0kSVIbMKiTJElqAwZ1kiRJbcCgTpIkqQ0Y1EmSJLUBgzpJkqQ2YFAnSZLUBgzqJEmS2oBBnSRJUhswqJMkSWoDBnWSJEltwKBOkiSpDRjUSZIktQGDOkmSpDZgUCdJktQGDOokSZLagEGdJElSGzCokyRJagMGdZIkSW3AoE6SJKkNGNRJkiS1AYM6SZKkNtDUoC4izoyILRGxNSIuqbH+ooi4JyLujohvRMRzKtb9e0T0R8S/VW3zmYi4PyI2Fo9FzTgWSZKkVtK0oC4iOoCPA2cBC4HzImJhVbZeYElmngysAT5YsW418No6xa/IzEXFY2ODqy5JktTymtlSdwqwNTN/mJm7gauAcyozZObNmflksbgemFux7hvAY82qrCRJ0lTSzKCuB9hesbyjSKvnjcBXR1j2B4ou249ExMFjraAkSdJU1cygLmqkZc2MERcASyh1uQ5nJfBc4DeAZwHvrFPmhRGxISI27Nq1a2Q1liRJmiKaGdTtAOZVLM8FdlZniogzgHcBZ2fmU8MVmpkPZMlTwD9T6uatle+TmbkkM5fMnj17TAcgSZLUqpoZ1N0BnBgR8yNiJvBq4PrKDBGxGLicUkD34EgKjYjjir8BLAe+29BaS5IkTQEHNWtHmfl0RLwNWAd0AJ/OzM0R8T5gQ2ZeT6m79XDg2lKMxrbMPBsgIr5JqZv18IjYAbwxM9cBX4iI2ZS6dzcCb2nWMUmSJLWKpgV1AJl5A3BDVdqlFc/PGGLbF9VJf2nDKihJkjRFeUcJSZKkNmBQJ0mS1AYM6iRJktqAQZ0kSVIbMKiTJElqAwZ1kiRJbcCgTpIkqQ0Y1EmSJLUBgzpJkqQ2YFAnSZLUBgzqJEmS2oBBnSRJUhswqJMkSWoDBnWSJEltwKBOkiSpDRjUSZIktQGDOkmSpDZgUCdJktQGDOokSZLagEGdJElSGzCokyRJagMGdZIkSW3AoE6SJKkNGNRJkiS1AYM6SZKkNmBQJ0mS1AYM6iRJktqAQZ0kSVIbMKiTJElqAwZ1kiRJbaCpQV1EnBkRWyJia0RcUmP9RRFxT0TcHRHfiIjnVKz794joj4h/q9pmfkTcFhH3RsTVETGzGcciSZLUSpoW1EVEB/Bx4CxgIXBeRCysytYLLMnMk4E1wAcr1q0GXluj6L8BPpKZJwKPAG9sdN0lSZJaXTNb6k4BtmbmDzNzN3AVcE5lhsy8OTOfLBbXA3Mr1n0DeKwyf0QE8FJKASDAZ4HlE1N9SZKk1tXMoK4H2F6xvKNIq+eNwFeHKfNooD8znx6uzIi4MCI2RMSGXbt2jbDKkiRJU0Mzg7qokZY1M0ZcACyh1OXakDIz85OZuSQzl8yePXuYYiVJkqaWg5q4rx3AvIrlucDO6kwRcQbwLuDFmfnUMGU+BHRHxEFFa13NMiVJktpdM1vq7gBOLGarzgReDVxfmSEiFgOXA2dn5oPDFZiZCdwMvLJIej3w5YbWWpIkaQpoWlBXtKS9DVgHfA+4JjM3R8T7IuLsIttq4HDg2ojYGBH7gr6I+CZwLfC7EbEjIpYVq94JXBQRWymNsft/TTokSZKkltHM7lcy8wbghqq0SyuenzHEti+qk/5DSjNrJUmSpi3vKCFJktQGDOo
kSZLagEGdJElSGzCokyRJagMGdZIkSW3AoE6SJKkNGNRJkiS1AYM6SZKkNtDUiw9LkqTWtra3j95t/ewe3MvSVTexYtkCli/uGVd5q9dtYWf/AHO6u8ZdXito1WMyqJMkSUApWFl53SZ2D+4FoK9/gJXXbQIYU9BSLm9gz2BDymsFrXxMkZmTWoHJsGTJktywYcNkV0OSpJaydNVN9PUPHJA+s2MGi4/vHnV55Ra/RpXXCuodU093F9++5KUTss+IuDMzlwyXzzF1kiQJgJ01AjqgZhAzEvW2G2t5raBe3eu9ds1k96skSQJgTndXzZa6nu4urn7z6aMur17L31jLawX1jmlOd9ck1GZ/ttRJkiQAVixbQFdnx35pXZ0drFi2oCXKawWtfEy21EmSJOCZgf6NmtnZ6PJaQSsfkxMlJEmSWpgTJSRJkqYRgzpJkqQ2YFAnSZLUBgzqJEmS2oBBnSRJUhswqJMkSWoDBnWSJEltwKBOkiSpDRjUSZIktQGDOkmSpDYwLW8TFhG7gB9P8G6OAR6a4H1o7Dw/rc9z1No8P63N89P6RnOOnpOZs4fLNC2DumaIiA0juU+bJofnp/V5jlqb56e1eX5a30ScI7tfJUmS2oBBnSRJUhswqJs4n5zsCmhInp/W5zlqbZ6f1ub5aX0NP0eOqZMkSWoDttRJkiS1AYO6EYiIMyNiS0RsjYhLaqw/OCKuLtbfFhEnVKxbWaRviYhlIy1TozNB5+hHEbEpIjZGxIbmHEl7Guv5iYijI+LmiHg8Ij5Wtc0Li/OzNSL+b0REc46m/UzQ+bmlKHNj8Ti2OUfTnsZxjl4WEXcWn5U7I+KlFdv4GWqQCTo/o/8MZaaPIR5AB3Af8MvATOAuYGFVnrcC/1g8fzVwdfF8YZH/YGB+UU7HSMr0MbnnqFj3I+CYyT6+qf4Y5/k5DPgt4C3Ax6q2uR04HQjgq8BZk32sU/ExgefnFmDJZB9fOzzGeY4WA3OK5y8A+iq28TPU2udn1J8hW+qGdwqwNTN/mJm7gauAc6rynAN8tni+Bvjd4j+ec4CrMvOpzLwf2FqUN5IyNXITcY7UOGM+P5n5RGZ+C/hFZeaIOA44IjNvzdK33+eA5RN6FO2r4edHDTeec9SbmTuL9M3AIUWrkZ+hxmn4+RlrRQzqhtcDbK9Y3lGk1cyTmU8DPweOHmLbkZSpkZuIcwSQwI1Fk/iFE1Dv6WI852eoMncMU6ZGZiLOT9k/F91Gf2HX3rg06hz9D6A3M5/Cz1AjTcT5KRvVZ+ig0dZ8Gqr1IlZPGa6Xp156rWDaachjNxHnCGBpZu4sxjF8LSK+n5n/OY56TlfjOT/jKVMjMxHnB+A1mdkXEbOALwGvpdQapNEb9zmKiOcDfwP83ijK1MhMxPmBMXyGbKkb3g5gXsXyXGBnvTwRcRBwJPCzIbYdSZkauYk4R5SbxDPzQeBfsFt2rMZzfoYqc+4wZWpkJuL8kJl9xd/HgC/i52c8xnWOImIupe+w12XmfRX5/Qw1xkScnzF9hgzqhncHcGJEzI+ImZQGOF5fled64PXF81cCNxVjFK4HXl2MX5gPnEhpYOpIytTINfwcRcRhxX9HRMRhlP57+m4TjqUdjef81JSZDwCPRcRpRZfE64AvN77q00LDz09EHBQRxxTPO4E/wM/PeIz5HEVEN/AVYGVmfruc2c9QQzX8/Iz5MzTZs0amwgP4feAHlGa3vKtIex9wdvH8EOBaSoPsbwd+uWLbdxXbbaFiZlGtMn20zjmiNIvpruKx2XM0qefnR5T+o32c0n+7C4v0JcWX3H3Axygupu5j8s8PpVmxdwJ3F5+fv6OYVe6juecIeDfwBLCx4nFssc7PUIuen7F+hryjhCRJUhuw+1WSJKkNGNRJkiS1AYM6SZKkNmBQJ0mS1AYM6iRJktqAQZ0kjVBE/Kh87ajx5JGkiWBQJ0mS1AYM6iSphohYGxF3RsTmiLiwat0JEfH9iPhsRNw
dEWsi4tCKLP8zIr4TEZsi4rnFNqdExH9FRG/xd0FTD0hS2zOok6Ta/jgzX0jpqvt/FhFHV61fAHwyM08GHgXeWrHuocz8deATwDuKtO8Dv52Zi4FLgb+e0NpLmnYM6iSptj+LiLuA9ZRuxH1i1frt+cy9Gq8Afqti3XXF3zuBE4rnRwLXRsR3gY8Az5+ISkuavgzqJKlKRPwOcAZwemb+GtBL6d6NlarvsVi5/FTxdxA4qHj+V8DNmfkC4OU1ypOkcTGok6QDHQk8kplPFmPiTquR5/iIOL14fh7wrRGU2Vc8f0NDailJFQzqJOlA/w4cFBF3U2phW18jz/eA1xd5nkVp/NxQPghcFhHfBjoaWVlJAojM6h4ESdJQIuIE4N+KrlRJagm21EmSJLUBW+okSZLagC11kiRJbcCgTpIkqQ0Y1EmSJLUBgzpJkqQ2YFAnSZLUBgzqJEmS2sD/DxSKn8D98vMHAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# R^2 of every pruned tree on the training and test splits\n", "train_scores = [clf.score(X_train, y_train) for clf in clfs]\n", "test_scores = [clf.score(X_test, y_test) for clf in clfs]\n", "idx = 140  # plot only the first 140 alphas\n", "fig, ax = plt.subplots(figsize=(10,6))\n", "ax.set_xlabel(\"alpha\")\n", "ax.set_ylabel(\"$R^2$\")\n", "ax.set_title(\"$R^2$ vs alpha on the test set\")\n", "#ax.plot(ccp_alphas[:30], train_scores[:30], marker='o', label=\"train\",\n", "# drawstyle=\"steps-post\")\n", "ax.plot(ccp_alphas[:idx], test_scores[:idx], marker='o', label=\"test\",\n", " drawstyle=\"steps-post\")\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(array([0], dtype=int64),)\n", "0.9959869170126309\n" ] }, { "data": { "text/plain": [ "0.004328537170263941" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Position(s) of the highest training score\n", "print(np.where(train_scores == np.max(train_scores)))\n", "# Training score and alpha at two nearby positions, for inspection\n", "print(train_scores[78])\n", "ccp_alphas[80]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "37.877120949074076" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import mean_squared_error\n", "# shuffle=False keeps the original row order, so random_state has no effect on this split\n", "X_train, X_test, y_train, y_test = train_test_split(auto_mpg_kbest.iloc[:,0:-1], auto_mpg_kbest.iloc[:,-1], test_size=0.3, random_state=5, shuffle=False)\n", "# Refit a tree pruned with the chosen alpha and report the test-set MSE\n", "idx = 79\n", "tree = DecisionTreeRegressor(ccp_alpha=ccp_alphas[idx])\n", "tree.fit(X_train, y_train)\n", "y_pred = tree.predict(X_test)\n", "mean_squared_error(y_test, y_pred)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": {
"name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }