{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to Time Series Analysis\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, we will explore pandas functionality for time series analysis.\n", "\n", "## Part 1: Creating Timestamps\n", "In pandas, a single point in time is represented as a timestamp. Before we explore how to work with time series in pandas, let us look at how to create timestamps." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import the standard modules to be used in this lab\n", "import pandas as pd\n", "import numpy as np\n", "%matplotlib inline" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Working with Time Series Data\n", "### Part 2.1: Setting the Index\n", "To work with time series data in pandas, we use a DatetimeIndex as the index for our DataFrame (or Series). Let’s see how to do this with our dataset. First, we use the `read_csv()` function to read the data into a DataFrame, and then display its shape.\n", "\n", "Note: DatetimeIndex is an array of datetime64 data." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4383, 5)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "file = 'energy.csv'\n", "energy = pd.read_csv(file)\n", "# Forward-fill missing values (same result as fillna(method='pad'), which newer pandas deprecates)\n", "energy = energy.ffill()\n", "energy.shape" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The DataFrame has 4383 rows, covering the period from January 1, 2006 through December 31, 2017. To see what the data looks like, let’s use the `head()` and `tail()` methods to display the first and last few rows. `NaN` stands for Not a Number and marks values that are missing or unavailable. The leading `NaN` values remain after the forward fill because there is no earlier observation to copy from."
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Date</th><th>Consumption</th><th>Wind</th><th>Solar</th><th>Wind+Solar</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2006-01-01</td><td>1069.184</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>2006-01-02</td><td>1380.521</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>2</th><td>2006-01-03</td><td>1442.533</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>3</th><td>2006-01-04</td><td>1457.217</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>4</th><td>2006-01-05</td><td>1477.131</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
], "text/plain": [
"         Date  Consumption  Wind  Solar  Wind+Solar\n",
"0  2006-01-01     1069.184   NaN    NaN         NaN\n",
"1  2006-01-02     1380.521   NaN    NaN         NaN\n",
"2  2006-01-03     1442.533   NaN    NaN         NaN\n",
"3  2006-01-04     1457.217   NaN    NaN         NaN\n",
"4  2006-01-05     1477.131   NaN    NaN         NaN"
] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "energy.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Date</th><th>Consumption</th><th>Wind</th><th>Solar</th><th>Wind+Solar</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>4378</th><td>2017-12-27</td><td>1263.94091</td><td>394.507</td><td>16.530</td><td>411.037</td></tr>\n",
"    <tr><th>4379</th><td>2017-12-28</td><td>1299.86398</td><td>506.424</td><td>14.162</td><td>520.586</td></tr>\n",
"    <tr><th>4380</th><td>2017-12-29</td><td>1295.08753</td><td>584.277</td><td>29.854</td><td>614.131</td></tr>\n",
"    <tr><th>4381</th><td>2017-12-30</td><td>1215.44897</td><td>721.247</td><td>7.467</td><td>728.714</td></tr>\n",
"    <tr><th>4382</th><td>2017-12-31</td><td>1107.11488</td><td>721.176</td><td>19.980</td><td>741.156</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
], "text/plain": [
"            Date  Consumption     Wind   Solar  Wind+Solar\n",
"4378  2017-12-27   1263.94091  394.507  16.530     411.037\n",
"4379  2017-12-28   1299.86398  506.424  14.162     520.586\n",
"4380  2017-12-29   1295.08753  584.277  29.854     614.131\n",
"4381  2017-12-30   1215.44897  721.247   7.467     728.714\n",
"4382  2017-12-31   1107.11488  721.176  19.980     741.156"
] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "energy.tail()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next, let’s check the data type of each column." ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [
"Date            object\n",
"Consumption    float64\n",
"Wind           float64\n",
"Solar          float64\n",
"Wind+Solar     float64\n",
"dtype: object"
] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "energy.dtypes" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The `Date` column has dtype `object`, not the `datetime64[ns]` type that a DatetimeIndex requires. We use `pd.to_datetime()` to convert it to `datetime64[ns]`, and then set it as the DataFrame's index." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"## Part 7: Exercise\n",
"The lab exercise uses an air pollution dataset for a city.\n",
"\n",
"The dataset contains the following columns:\n",
"1. No: Row number\n",
"2. year: Year of data in this row\n",
"3. month: Month of data in this row\n",
"4. day: Day of data in this row\n",
"5. hour: Hour of data in this row\n",
"6. PM2.5: Concentration of particulate matter up to 2.5 micrometers in diameter (ug/m^3)\n",
"7. PM10: Concentration of particulate matter up to 10 micrometers in diameter (ug/m^3)\n",
"8. SO2: Sulfur dioxide concentration (ug/m^3)\n",
"9. NO2: Nitrogen dioxide concentration (ug/m^3)\n",
"10. CO: Carbon monoxide concentration (ug/m^3)\n",
"11. O3: Ozone concentration (ug/m^3)\n",
"12. TEMP: Temperature (degrees Celsius)\n",
"13. PRES: Pressure (hPa)\n",
"14. DEWP: Dew point temperature (degrees Celsius)\n",
"15. RAIN: Precipitation (mm)\n",
"16. wd: Wind direction\n",
"17. WSPM: Wind speed (m/s)\n",
"18. station: Name of the air-quality monitoring site\n",
"\n",
"Analyse the data in terms of seasonality and trend." ] } ],
"metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }