{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Correction of homework 2: read in data\n",
    "\n",
    "The aim of this second homework is to validate the part concerning the python syntax.\n",
    "\n",
    "## Instructions\n",
    "\n",
    "You have to:\n",
    "\n",
    "* Verify that you have mastered the notions discussed in the notebook concerning the reading of data.\n",
    "* Download the file climat_perpignan.csv\n",
    "* Read in the file, the maximum temperature and sunshine.\n",
    "\n",
    "Two parts are requested :\n",
    "\n",
    "* Read the file by yourself by browsing it line by line\n",
    "* Use the read_csv() function of the pandas module to read the table.\n",
    "\n",
    "## Part 1: by hand"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Janvier        12.400  141.200\n",
      "Février        13.200  160.800\n",
      "Mars           16.000  209.600\n",
      "Avril          18.200  218.000\n",
      "Mai            21.800  235.800\n",
      "Juin           26.200  268.900\n",
      "Juillet        29.200  298.200\n",
      "Août           28.900  267.400\n",
      "Septembre      25.400  222.200\n",
      "Octobre        21.000  167.600\n",
      "Novembre       15.900  149.200\n",
      "Décembre       13.100  126.100\n"
     ]
    }
   ],
   "source": [
    "# open the file in read mode \"r\"\n",
    "with open(\"climat_perpignan.csv\", \"r\") as f:\n",
    "    \n",
    "    # initialization of lists to record the values\n",
    "    max_temperature = list()\n",
    "    sunshine = list()\n",
    "    \n",
    "    # read the two first lines\n",
    "    f.readline()\n",
    "    f.readline()\n",
    "    \n",
    "    # loop over the lines of the file\n",
    "    for line in f:\n",
    "        # cut the line according to the semi-column ;\n",
    "        values = line.split(\";\")\n",
    "        \n",
    "        # save the value on column 2 and 4\n",
    "        # replace , by . using the replace method\n",
    "        # convert into float\n",
    "        tmax = float(values[2].replace(\",\", \".\"))\n",
    "        sun = float(values[4].replace(\",\", \".\"))\n",
    "        \n",
    "        # store the values in lists\n",
    "        max_temperature.append(tmax)\n",
    "        sunshine.append(sun)\n",
    "        \n",
    "        # print the values\n",
    "        month = values[0]\n",
    "        print(f\"{month:12s} {tmax:8.3f} {sun:8.3f}\")\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can print the contain of each list\n",
    "\n",
    "##### maximal temperature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[12.4, 13.2, 16.0, 18.2, 21.8, 26.2, 29.2, 28.9, 25.4, 21.0, 15.9, 13.1]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "max_temperature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Sunshine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[141.2,\n",
       " 160.8,\n",
       " 209.6,\n",
       " 218.0,\n",
       " 235.8,\n",
       " 268.9,\n",
       " 298.2,\n",
       " 267.4,\n",
       " 222.2,\n",
       " 167.6,\n",
       " 149.2,\n",
       " 126.1]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sunshine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2: read in the file with pandas\n",
    "\n",
    "First, you have to import the pandas module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now use pandas' [`read_csv()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) function to read the file. Here are the elements we have to control to read the :\n",
    "\n",
    "* We give the name of the file\n",
    "* The separator is a semicolon => `sep`.\n",
    "* Columns 0, 2 and 4 are used => `usecols`.\n",
    "* Skip the first line => `skiprows`.\n",
    "* Decimal numbers are written with a comma => `decimal`.\n",
    "* The first column is used as index => `index_col`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\n",
    "    \"climat_perpignan.csv\", \n",
    "    sep=\";\", \n",
    "    usecols=(0, 2, 4), \n",
    "    skiprows=1, \n",
    "    decimal=\",\", \n",
    "    index_col=0\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Température maximale</th>\n",
       "      <th>Durée d'ensoleillement (h)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Janvier</th>\n",
       "      <td>12.4</td>\n",
       "      <td>141.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Février</th>\n",
       "      <td>13.2</td>\n",
       "      <td>160.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mars</th>\n",
       "      <td>16.0</td>\n",
       "      <td>209.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Avril</th>\n",
       "      <td>18.2</td>\n",
       "      <td>218.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mai</th>\n",
       "      <td>21.8</td>\n",
       "      <td>235.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Juin</th>\n",
       "      <td>26.2</td>\n",
       "      <td>268.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Juillet</th>\n",
       "      <td>29.2</td>\n",
       "      <td>298.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Août</th>\n",
       "      <td>28.9</td>\n",
       "      <td>267.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Septembre</th>\n",
       "      <td>25.4</td>\n",
       "      <td>222.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Octobre</th>\n",
       "      <td>21.0</td>\n",
       "      <td>167.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Novembre</th>\n",
       "      <td>15.9</td>\n",
       "      <td>149.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Décembre</th>\n",
       "      <td>13.1</td>\n",
       "      <td>126.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Température maximale  Durée d'ensoleillement (h)\n",
       "Janvier                    12.4                       141.2\n",
       "Février                    13.2                       160.8\n",
       "Mars                       16.0                       209.6\n",
       "Avril                      18.2                       218.0\n",
       "Mai                        21.8                       235.8\n",
       "Juin                       26.2                       268.9\n",
       "Juillet                    29.2                       298.2\n",
       "Août                       28.9                       267.4\n",
       "Septembre                  25.4                       222.2\n",
       "Octobre                    21.0                       167.6\n",
       "Novembre                   15.9                       149.2\n",
       "Décembre                   13.1                       126.1"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}