/notebooks/python_en/homework2_correction.ipynb
- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Correction of homework 2: read in data\n",
- "\n",
- "The aim of this second homework is to validate the part covering Python syntax.\n",
- "\n",
- "## Instructions\n",
- "\n",
- "You have to:\n",
- "\n",
- "* Check that you have mastered the notions covered in the notebook on reading data.\n",
- "* Download the file `climat_perpignan.csv`.\n",
- "* Read the maximum temperature and the sunshine duration from the file.\n",
- "\n",
- "Two parts are requested:\n",
- "\n",
- "* Read the file yourself, going through it line by line.\n",
- "* Use the `read_csv()` function of the pandas module to read the table.\n",
- "\n",
- "## Part 1: by hand"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Janvier        12.400  141.200\n",
- "Février        13.200  160.800\n",
- "Mars           16.000  209.600\n",
- "Avril          18.200  218.000\n",
- "Mai            21.800  235.800\n",
- "Juin           26.200  268.900\n",
- "Juillet        29.200  298.200\n",
- "Août           28.900  267.400\n",
- "Septembre      25.400  222.200\n",
- "Octobre        21.000  167.600\n",
- "Novembre       15.900  149.200\n",
- "Décembre       13.100  126.100\n"
- ]
- }
- ],
- "source": [
- "# open the file in read mode \"r\"\n",
- "with open(\"climat_perpignan.csv\", \"r\") as f:\n",
- "\n",
- "    # initialize the lists that will store the values\n",
- "    max_temperature = list()\n",
- "    sunshine = list()\n",
- "\n",
- "    # read and discard the first two lines (they hold no data)\n",
- "    f.readline()\n",
- "    f.readline()\n",
- "\n",
- "    # loop over the remaining lines of the file\n",
- "    for line in f:\n",
- "        # split the line on the semicolon ;\n",
- "        values = line.split(\";\")\n",
- "\n",
- "        # take the values in columns 2 and 4,\n",
- "        # replace , by . using the replace method\n",
- "        # and convert them to float\n",
- "        tmax = float(values[2].replace(\",\", \".\"))\n",
- "        sun = float(values[4].replace(\",\", \".\"))\n",
- "\n",
- "        # store the values in the lists\n",
- "        max_temperature.append(tmax)\n",
- "        sunshine.append(sun)\n",
- "\n",
- "        # print the values\n",
- "        month = values[0]\n",
- "        print(f\"{month:12s} {tmax:8.3f} {sun:8.3f}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You can print the contents of each list.\n",
- "\n",
- "##### Maximum temperature"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "[12.4, 13.2, 16.0, 18.2, 21.8, 26.2, 29.2, 28.9, 25.4, 21.0, 15.9, 13.1]"
- ]
- },
- "execution_count": 2,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "max_temperature"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "##### Sunshine"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "[141.2,\n",
- " 160.8,\n",
- " 209.6,\n",
- " 218.0,\n",
- " 235.8,\n",
- " 268.9,\n",
- " 298.2,\n",
- " 267.4,\n",
- " 222.2,\n",
- " 167.6,\n",
- " 149.2,\n",
- " 126.1]"
- ]
- },
- "execution_count": 3,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "sunshine"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Part 2: read in the file with pandas\n",
- "\n",
- "First, you have to import the pandas module:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [],
- "source": [
- "import pandas as pd"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We now use pandas' [`read_csv()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) function to read the file. Here are the points we have to specify:\n",
- "\n",
- "* We give the name of the file\n",
- "* The separator is a semicolon => `sep`.\n",
- "* Columns 0, 2 and 4 are used => `usecols`.\n",
- "* Skip the first line => `skiprows`.\n",
- "* Decimal numbers are written with a comma => `decimal`.\n",
- "* The first column is used as index => `index_col`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [],
- "source": [
- "df = pd.read_csv(\n",
- "    \"climat_perpignan.csv\",\n",
- "    sep=\";\",\n",
- "    usecols=(0, 2, 4),\n",
- "    skiprows=1,\n",
- "    decimal=\",\",\n",
- "    index_col=0\n",
- ")"
- ]
- },
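- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As a minimal sketch, the next cell replays the same `read_csv()` options on a small in-memory sample, so the effect of each option can be checked without the real file, and then converts the two columns back to plain lists as in part 1. The layout of the sample is an assumption: the title line, the names `Mois`, `colonne_1` and `colonne_3`, and the `0,0` values are hypothetical placeholders. Only the two column names that are read and the January and February values come from the outputs of this notebook."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import io\n",
- "\n",
- "import pandas as pd\n",
- "\n",
- "# Hypothetical sample mimicking the assumed layout of the file:\n",
- "# one title line, one line of column names, then one line per month.\n",
- "sample = io.StringIO(\n",
- "    \"hypothetical title line\\n\"\n",
- "    \"Mois;colonne_1;Température maximale;colonne_3;Durée d'ensoleillement (h)\\n\"\n",
- "    \"Janvier;0,0;12,4;0,0;141,2\\n\"\n",
- "    \"Février;0,0;13,2;0,0;160,8\\n\"\n",
- ")\n",
- "\n",
- "# same options as for the real file, applied to the in-memory sample\n",
- "sample_df = pd.read_csv(\n",
- "    sample,\n",
- "    sep=\";\",\n",
- "    usecols=[0, 2, 4],\n",
- "    skiprows=1,\n",
- "    decimal=\",\",\n",
- "    index_col=0,\n",
- ")\n",
- "\n",
- "# convert each column back to a plain Python list, as in part 1\n",
- "sample_tmax = sample_df[\"Température maximale\"].tolist()\n",
- "sample_sun = sample_df[\"Durée d'ensoleillement (h)\"].tolist()\n",
- "print(sample_tmax)\n",
- "print(sample_sun)"
- ]
- },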
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "<div>\n",
- "<style scoped>\n",
- " .dataframe tbody tr th:only-of-type {\n",
- " vertical-align: middle;\n",
- " }\n",
- "\n",
- " .dataframe tbody tr th {\n",
- " vertical-align: top;\n",
- " }\n",
- "\n",
- " .dataframe thead th {\n",
- " text-align: right;\n",
- " }\n",
- "</style>\n",
- "<table border=\"1\" class=\"dataframe\">\n",
- " <thead>\n",
- " <tr style=\"text-align: right;\">\n",
- " <th></th>\n",
- " <th>Température maximale</th>\n",
- " <th>Durée d'ensoleillement (h)</th>\n",
- " </tr>\n",
- " </thead>\n",
- " <tbody>\n",
- " <tr>\n",
- " <th>Janvier</th>\n",
- " <td>12.4</td>\n",
- " <td>141.2</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Février</th>\n",
- " <td>13.2</td>\n",
- " <td>160.8</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Mars</th>\n",
- " <td>16.0</td>\n",
- " <td>209.6</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Avril</th>\n",
- " <td>18.2</td>\n",
- " <td>218.0</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Mai</th>\n",
- " <td>21.8</td>\n",
- " <td>235.8</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Juin</th>\n",
- " <td>26.2</td>\n",
- " <td>268.9</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Juillet</th>\n",
- " <td>29.2</td>\n",
- " <td>298.2</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Août</th>\n",
- " <td>28.9</td>\n",
- " <td>267.4</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Septembre</th>\n",
- " <td>25.4</td>\n",
- " <td>222.2</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Octobre</th>\n",
- " <td>21.0</td>\n",
- " <td>167.6</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Novembre</th>\n",
- " <td>15.9</td>\n",
- " <td>149.2</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <th>Décembre</th>\n",
- " <td>13.1</td>\n",
- " <td>126.1</td>\n",
- " </tr>\n",
- " </tbody>\n",
- "</table>\n",
- "</div>"
- ],
- "text/plain": [
- "           Température maximale  Durée d'ensoleillement (h)\n",
- "Janvier                    12.4                       141.2\n",
- "Février                    13.2                       160.8\n",
- "Mars                       16.0                       209.6\n",
- "Avril                      18.2                       218.0\n",
- "Mai                        21.8                       235.8\n",
- "Juin                       26.2                       268.9\n",
- "Juillet                    29.2                       298.2\n",
- "Août                       28.9                       267.4\n",
- "Septembre                  25.4                       222.2\n",
- "Octobre                    21.0                       167.6\n",
- "Novembre                   15.9                       149.2\n",
- "Décembre                   13.1                       126.1"
- ]
- },
- "execution_count": 6,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "df"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
- }