{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# The Pandas Library\n", "\n", "**Pandas** is the go-to library for data analysis in `Python`. Its name comes from “panel data” (an econometrics term). It is inspired by the functionality of `R`, but with the power of a general-purpose language.\n", "\n", "**Pandas** includes all the functionality needed for the data analysis process: loading, filtering, processing, summarizing, grouping, storing and visualizing data. It also integrates with the other numerical computing libraries such as `NumPy`, `Matplotlib`, `scikit-learn`, … and with deployment environments: `HPC`, `Cloud`, etc.\n", "\n", "In short, **it is like a spreadsheet (Excel, for example) but with much more power!**\n", "\n", "[Main features](https://github.com/pandas-dev/pandas#main-features)\n" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "All the work we will do is based on the basic data structure: the `DataFrame`.\n", "\n", "A `DataFrame` is a two-dimensional object that holds data. 
It can also be seen as a **spreadsheet**, as a table in an entity-relationship model, or as a collection in a non-relational database.\n", "\n", "[Documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: pandas in /Users/isaac/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages (1.5.3)\n", "Requirement already satisfied: python-dateutil>=2.8.1 in /Users/isaac/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages (from pandas) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /Users/isaac/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages (from pandas) (2023.3)\n", "Requirement already satisfied: numpy>=1.21.0 in /Users/isaac/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages (from pandas) (1.23.5)\n", "Requirement already satisfied: six>=1.5 in /Users/isaac/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)\n", "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "!uv pip install pandas" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Importing the library\n", "```Python\n", "import pandas as pd\n", "import pandas\n", "from pandas import *\n", "```\n", "\n", "By convention it is imported as follows, so that every function in the library is 
called with the `pd.` prefix:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Loading data\n", "We will learn Pandas through a series of projects and examples. In this first phase, we will load data from a _CSV_ file. Recall that in these files the attributes/values of an observation are separated by commas and the observations are separated by line breaks.\n", "\n", "The data we will work with can be downloaded from the following link: [WHO dataset](http://www.exploredata.net/Downloads/WHO-Data-Set)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We start by looking at how to load a DataFrame from a file in CSV format." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "df = pd.read_csv(\"data/WHO.csv\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "2 Algeria 3 3 6.0 \n", "3 Andorra 4 2 NaN \n", "4 Angola 5 3 146.0 \n", ".. ... ... ... ... \n", "197 Vietnam 198 6 25.0 \n", "198 West Bank and Gaza 199 1 NaN \n", "199 Yemen 200 1 83.0 \n", "200 Zambia 201 3 161.0 \n", "201 Zimbabwe 202 3 101.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "2 69.9 \n", "3 NaN \n", "4 67.4 \n", ".. ... \n", "197 90.3 \n", "198 NaN \n", "199 54.1 \n", "200 68.0 \n", "201 89.5 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "2 5940.0 \n", "3 NaN \n", "4 3890.0 \n", ".. ... \n", "197 2310.0 \n", "198 NaN \n", "199 2090.0 \n", "200 1140.0 \n", "201 NaN \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "2 94.0 \n", "3 83.0 \n", "4 49.0 \n", ".. ... \n", "197 91.0 \n", "198 NaN \n", "199 65.0 \n", "200 94.0 \n", "201 88.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "2 96.0 \n", "3 83.0 \n", "4 51.0 \n", ".. ... \n", "197 96.0 \n", "198 NaN \n", "199 85.0 \n", "200 90.0 \n", "201 87.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "2 33351.0 1.5 ... \n", "3 74.0 1.0 ... \n", "4 16557.0 2.8 ... \n", ".. ... ... ... \n", "197 86206.0 1.4 ... \n", "198 NaN NaN ... \n", "199 21732.0 3.0 ... \n", "200 11696.0 1.9 ... \n", "201 13228.0 0.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "2 137535.56 6.970000e+10 351.36 \n", "3 NaN NaN NaN \n", "4 8991.46 1.490000e+10 27.13 \n", ".. ... ... ... 
\n", "197 101826.23 4.480000e+10 47.11 \n", "198 655.86 3.780000e+09 NaN \n", "199 20148.34 1.150000e+10 114.52 \n", "200 2366.94 4.090000e+09 10.41 \n", "201 11457.33 5.620000e+09 3.39 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "2 4.700000e+09 40.00 \n", "3 NaN NaN \n", "4 9.140000e+09 164.10 \n", ".. ... ... \n", "197 -1.940000e+09 20.20 \n", "198 NaN 28.00 \n", "199 8.310000e+08 82.40 \n", "200 -4.470000e+08 175.30 \n", "201 -1.710000e+08 106.50 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "2 31.2 40.00 \n", "3 NaN NaN \n", "4 242.5 164.10 \n", ".. ... ... \n", "197 23.4 20.20 \n", "198 25.8 28.00 \n", "199 87.9 82.40 \n", "200 163.8 175.30 \n", "201 67.0 106.50 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "2 20800000.0 2.61 63.3 \n", "3 NaN NaN NaN \n", "4 8578749.0 4.14 53.3 \n", ".. ... ... ... \n", "197 21900000.0 2.90 26.4 \n", "198 2596216.0 3.33 71.6 \n", "199 5759120.5 4.37 27.3 \n", "200 4017411.0 1.95 35.0 \n", "201 4709965.0 1.90 35.9 \n", "\n", "[202 rows x 358 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "The internal structure of the DataFrame is shown below. Note that it closely resembles a two-dimensional table:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "2 Algeria 3 3 6.0 \n", "3 Andorra 4 2 NaN \n", "4 Angola 5 3 146.0 \n", ".. ... ... ... ... \n", "197 Vietnam 198 6 25.0 \n", "198 West Bank and Gaza 199 1 NaN \n", "199 Yemen 200 1 83.0 \n", "200 Zambia 201 3 161.0 \n", "201 Zimbabwe 202 3 101.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "2 69.9 \n", "3 NaN \n", "4 67.4 \n", ".. ... \n", "197 90.3 \n", "198 NaN \n", "199 54.1 \n", "200 68.0 \n", "201 89.5 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "2 5940.0 \n", "3 NaN \n", "4 3890.0 \n", ".. ... \n", "197 2310.0 \n", "198 NaN \n", "199 2090.0 \n", "200 1140.0 \n", "201 NaN \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "2 94.0 \n", "3 83.0 \n", "4 49.0 \n", ".. ... \n", "197 91.0 \n", "198 NaN \n", "199 65.0 \n", "200 94.0 \n", "201 88.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "2 96.0 \n", "3 83.0 \n", "4 51.0 \n", ".. ... \n", "197 96.0 \n", "198 NaN \n", "199 85.0 \n", "200 90.0 \n", "201 87.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "2 33351.0 1.5 ... \n", "3 74.0 1.0 ... \n", "4 16557.0 2.8 ... \n", ".. ... ... ... \n", "197 86206.0 1.4 ... \n", "198 NaN NaN ... \n", "199 21732.0 3.0 ... \n", "200 11696.0 1.9 ... \n", "201 13228.0 0.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "2 137535.56 6.970000e+10 351.36 \n", "3 NaN NaN NaN \n", "4 8991.46 1.490000e+10 27.13 \n", ".. ... ... ... 
\n", "197 101826.23 4.480000e+10 47.11 \n", "198 655.86 3.780000e+09 NaN \n", "199 20148.34 1.150000e+10 114.52 \n", "200 2366.94 4.090000e+09 10.41 \n", "201 11457.33 5.620000e+09 3.39 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "2 4.700000e+09 40.00 \n", "3 NaN NaN \n", "4 9.140000e+09 164.10 \n", ".. ... ... \n", "197 -1.940000e+09 20.20 \n", "198 NaN 28.00 \n", "199 8.310000e+08 82.40 \n", "200 -4.470000e+08 175.30 \n", "201 -1.710000e+08 106.50 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "2 31.2 40.00 \n", "3 NaN NaN \n", "4 242.5 164.10 \n", ".. ... ... \n", "197 23.4 20.20 \n", "198 25.8 28.00 \n", "199 87.9 82.40 \n", "200 163.8 175.30 \n", "201 67.0 106.50 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "2 20800000.0 2.61 63.3 \n", "3 NaN NaN NaN \n", "4 8578749.0 4.14 53.3 \n", ".. ... ... ... 
\n", "197 21900000.0 2.90 26.4 \n", "198 2596216.0 3.33 71.6 \n", "199 5759120.5 4.37 27.3 \n", "200 4017411.0 1.95 35.0 \n", "201 4709965.0 1.90 35.9 \n", "\n", "[202 rows x 358 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(202, 358)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Country', 'CountryID', 'Continent', 'Adolescent fertility rate (%)',\n", " 'Adult literacy rate (%)',\n", " 'Gross national income per capita (PPP international $)',\n", " 'Net primary school enrolment ratio female (%)',\n", " 'Net primary school enrolment ratio male (%)',\n", " 'Population (in thousands) total', 'Population annual growth rate (%)',\n", " ...\n", " 'Total_CO2_emissions', 'Total_income', 'Total_reserves',\n", " 'Trade_balance_goods_and_services', 'Under_five_mortality_from_CME',\n", " 'Under_five_mortality_from_IHME', 'Under_five_mortality_rate',\n", " 'Urban_population', 'Urban_population_growth',\n", " 'Urban_population_pct_of_total'],\n", " dtype='object', length=358)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=202, step=1)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### DataFrame attributes\n", "\n", "A DataFrame has several attributes that give access to its data or metadata. 
The following examples show how to query its dimensions or list the names of its columns:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "(202, 358)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape # View the dimensions (rows, columns)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "Index(['Country', 'CountryID', 'Continent', 'Adolescent fertility rate (%)',\n", " 'Adult literacy rate (%)'],\n", " dtype='object')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns[:5]" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "We can apply to the column listing the indexing and slicing operations on lists that we saw in the course introduction. 
Here are a few indexing examples:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "'Urban_population_growth'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns[-2]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "Index(['Country', 'CountryID', 'Continent', 'Adolescent fertility rate (%)',\n", " 'Adult literacy rate (%)'],\n", " dtype='object')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns[0:5]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Under_five_mortality_from_IHME', 'Under_five_mortality_rate',\n", " 'Urban_population', 'Urban_population_growth'],\n", " dtype='object')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns[-5:-1]" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**How would you look up the name of column 10? 
And the names of columns 200 through 225?**" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "358" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df.columns)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "Index(['Urban_population', 'Urban_population_growth',\n", " 'Urban_population_pct_of_total'],\n", " dtype='object')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns[200:226]\n", "\n", "df.columns[len(df.columns)-3:len(df.columns)]" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Descriptive functions of a DataFrame\n", "\n", "`Pandas` offers a collection of functions for a general inspection of the data table:\n", "\n", "- **describe**: shows basic descriptive statistics for all numeric columns.\n", "- **info**: summarizes the columns and their data types.\n", "- **head** and **tail**: show the first/last $n$ rows. The value of $n$ is a parameter of these methods.\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CountryID Continent Adolescent fertility rate (%) \\\n", "count 202.000000 202.000000 177.000000 \n", "mean 101.500000 3.579208 59.457627 \n", "std 58.456537 1.808263 49.105286 \n", "min 1.000000 1.000000 0.000000 \n", "25% 51.250000 2.000000 19.000000 \n", "50% 101.500000 3.000000 46.000000 \n", "75% 151.750000 5.000000 91.000000 \n", "max 202.000000 7.000000 199.000000 \n", "\n", " Adult literacy rate (%) \\\n", "count 131.000000 \n", "mean 78.871756 \n", "std 20.415760 \n", "min 23.600000 \n", "25% 68.400000 \n", "50% 86.500000 \n", "75% 95.300000 \n", "max 99.800000 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "count 178.000000 \n", "mean 11250.112360 \n", "std 12586.753417 \n", "min 260.000000 \n", "25% 2112.500000 \n", "50% 6175.000000 \n", "75% 14502.500000 \n", "max 60870.000000 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "count 179.000000 \n", "mean 84.033520 \n", "std 17.788047 \n", "min 6.000000 \n", "25% 79.000000 \n", "50% 90.000000 \n", "75% 96.000000 \n", "max 100.000000 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "count 179.000000 \n", "mean 85.698324 \n", "std 15.451212 \n", "min 11.000000 \n", "25% 79.500000 \n", "50% 90.000000 \n", "75% 96.000000 \n", "max 100.000000 \n", "\n", " Population (in thousands) total Population annual growth rate (%) \\\n", "count 1.930000e+02 193.000000 \n", "mean 3.409805e+04 1.297927 \n", "std 1.304957e+05 1.163864 \n", "min 2.000000e+00 -2.500000 \n", "25% 1.340000e+03 0.500000 \n", "50% 6.762000e+03 1.300000 \n", "75% 2.173200e+04 2.100000 \n", "max 1.328474e+06 4.300000 \n", "\n", " Population in urban areas (%) ... Total_CO2_emissions Total_income \\\n", "count 193.000000 ... 1.860000e+02 1.780000e+02 \n", "mean 54.911917 ... 1.483596e+05 2.015567e+11 \n", "std 23.554182 ... 6.133091e+05 9.400689e+11 \n", "min 10.000000 ... 2.565000e+01 5.190000e+07 \n", "25% 36.000000 ... 
1.672615e+03 3.317500e+09 \n", "50% 57.000000 ... 1.021157e+04 1.145000e+10 \n", "75% 73.000000 ... 6.549217e+04 8.680000e+10 \n", "max 100.000000 ... 5.776432e+06 1.100000e+13 \n", "\n", " Total_reserves Trade_balance_goods_and_services \\\n", "count 128.000000 1.710000e+02 \n", "mean 57.253516 3.424012e+08 \n", "std 138.669298 5.943043e+10 \n", "min 0.990000 -7.140000e+11 \n", "25% 16.292500 -1.210000e+09 \n", "50% 28.515000 -2.240000e+08 \n", "75% 55.310000 1.024000e+09 \n", "max 1334.860000 1.390000e+11 \n", "\n", " Under_five_mortality_from_CME Under_five_mortality_from_IHME \\\n", "count 181.000000 170.000000 \n", "mean 56.677624 54.356471 \n", "std 60.060929 61.160556 \n", "min 2.900000 3.000000 \n", "25% 12.400000 8.475000 \n", "50% 29.980000 27.600000 \n", "75% 88.700000 82.900000 \n", "max 267.000000 253.700000 \n", "\n", " Under_five_mortality_rate Urban_population Urban_population_growth \\\n", "count 181.000000 1.880000e+02 188.000000 \n", "mean 56.677624 1.665763e+07 2.165851 \n", "std 60.060929 5.094867e+07 1.596628 \n", "min 2.900000 1.545600e+04 -1.160000 \n", "25% 12.400000 9.171623e+05 1.105000 \n", "50% 29.980000 3.427661e+06 1.945000 \n", "75% 88.700000 9.837113e+06 3.252500 \n", "max 267.000000 5.270000e+08 7.850000 \n", "\n", " Urban_population_pct_of_total \n", "count 188.000000 \n", "mean 55.195213 \n", "std 23.742122 \n", "min 10.000000 \n", "25% 35.650000 \n", "50% 57.300000 \n", "75% 72.750000 \n", "max 100.000000 \n", "\n", "[8 rows x 357 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 202 entries, 0 to 201\n", "Columns: 358 entries, Country to Urban_population_pct_of_total\n", "dtypes: float64(355), int64(2), object(1)\n", "memory usage: 565.1+ KB\n" ] } ], "source": [ "df.info()" ] 
}, { "cell_type": "code", "execution_count": 17, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "\n", "[2 rows x 358 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(2)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "200 Zambia 201 3 161.0 \n", "201 Zimbabwe 202 3 101.0 \n", "\n", " Adult literacy rate (%) \\\n", "200 68.0 \n", "201 89.5 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "200 1140.0 \n", "201 NaN \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "200 94.0 \n", "201 88.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "200 90.0 \n", "201 87.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "200 11696.0 1.9 ... \n", "201 13228.0 0.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "200 2366.94 4.090000e+09 10.41 \n", "201 11457.33 5.620000e+09 3.39 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "200 -447000000.0 175.3 \n", "201 -171000000.0 106.5 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "200 163.8 175.3 \n", "201 67.0 106.5 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "200 4017411.0 1.95 35.0 \n", "201 4709965.0 1.90 35.9 \n", "\n", "[2 rows x 358 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading data (part two)\n", "\n", "Unfortunately, the structure and encoding of CSV files vary with the tool or operating system that produced them. We may therefore find column separators other than the usual comma (',') or text encodings other than UTF-8 (for example, the legacy Windows code pages often called ANSI).\n", "\n", "This is why the `read_csv` function is so versatile. You can consult its [documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).\n", "\n", "Let's see what happens with data published by a public administration. 
The file is available at 'data/presupuesto_gastos_2023.csv'. The data can be downloaded from the following [link](https://datos.gob.es/es/catalogo?q=bilbao&g-recaptcha-response=&administration_level=L&theme_id=economia&sort=score+desc%2C+metadata_created+desc)." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "df_gastos = pd.read_csv(\"data/presupuesto_gastos_2023.csv\", encoding=\"cp1250\", sep=\";\")  # which errors appear without encoding or sep?" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " _id SOZIETATEA_EU/SOCIEDAD_EU SOZIETATEA_CAS/SOCIEDAD_CAS \\\n", "0 1 UDAL UDAL \n", "1 2 UDAL UDAL \n", "2 3 UDAL UDAL \n", "3 4 UDAL UDAL \n", "4 5 UDAL UDAL \n", "... ... ... ... \n", "2088 2089 UDAL UDAL \n", "2089 2090 UDAL UDAL \n", "2090 2091 UDAL UDAL \n", "2091 2092 UDAL UDAL \n", "2092 2093 UDAL UDAL \n", "\n", " EKITALDIA/EJERCICIO SAILA/DEPARTAMENTO \\\n", "0 2023 110 \n", "1 2023 110 \n", "2 2023 110 \n", "3 2023 110 \n", "4 2023 110 \n", "... ... ... \n", "2088 2023 A10 \n", "2089 2023 A10 \n", "2090 2023 A10 \n", "2091 2023 A10 \n", "2092 2023 A10 \n", "\n", " SAILAREN DESKRIBAPENA_EU/DESCRIPCION DEPARTAMENTO_EU \\\n", "0 KULTURA ETA GOBERNANTZA \n", "1 KULTURA ETA GOBERNANTZA \n", "2 KULTURA ETA GOBERNANTZA \n", "3 KULTURA ETA GOBERNANTZA \n", "4 KULTURA ETA GOBERNANTZA \n", "... ... \n", "2088 ALKATETZAKO KABINETEA \n", "2089 ALKATETZAKO KABINETEA \n", "2090 ALKATETZAKO KABINETEA \n", "2091 ALKATETZAKO KABINETEA \n", "2092 ALKATETZAKO KABINETEA \n", "\n", " SAILAREN DESKRIBAPENA_EU/DESCRIPCION DEPARTAMENTO_CAS \\\n", "0 CULTURA Y GOBERNANZA \n", "1 CULTURA Y GOBERNANZA \n", "2 CULTURA Y GOBERNANZA \n", "3 CULTURA Y GOBERNANZA \n", "4 CULTURA Y GOBERNANZA \n", "... ... \n", "2088 GABINETE DE ALCALDÍA \n", "2089 GABINETE DE ALCALDÍA \n", "2090 GABINETE DE ALCALDÍA \n", "2091 GABINETE DE ALCALDÍA \n", "2092 GABINETE DE ALCALDÍA \n", "\n", " ZENTRU KUDEATZAILEA/CENTRO GESTOR \\\n", "0 1100 \n", "1 1100 \n", "2 1100 \n", "3 1100 \n", "4 1100 \n", "... ... \n", "2088 A101 \n", "2089 A101 \n", "2090 A101 \n", "2091 A101 \n", "2092 A101 \n", "\n", " ZENTRO KUDEATZAILEAREN DESKR._EUS/DESCR. CENTRO GESTOR_EUS \\\n", "0 KULTURA ETA GOBERNANTZA \n", "1 KULTURA ETA GOBERNANTZA \n", "2 KULTURA ETA GOBERNANTZA \n", "3 KULTURA ETA GOBERNANTZA \n", "4 KULTURA ETA GOBERNANTZA \n", "... ... 
\n", "2088 PROTOKOLOKO ETA HARREMAN PUBLIKOEN BULEGOA \n", "2089 PROTOKOLOKO ETA HARREMAN PUBLIKOEN BULEGOA \n", "2090 PROTOKOLOKO ETA HARREMAN PUBLIKOEN BULEGOA \n", "2091 PROTOKOLOKO ETA HARREMAN PUBLIKOEN BULEGOA \n", "2092 PROTOKOLOKO ETA HARREMAN PUBLIKOEN BULEGOA \n", "\n", " ZENTRO KUDEATZAILEAREN DESKR._CAS/DESCR. CENTRO GESTOR_CAS ... \\\n", "0 CULTURA Y GOBERNANZA ... \n", "1 CULTURA Y GOBERNANZA ... \n", "2 CULTURA Y GOBERNANZA ... \n", "3 CULTURA Y GOBERNANZA ... \n", "4 CULTURA Y GOBERNANZA ... \n", "... ... ... \n", "2088 OFICINA PROTOCOLO, RR.PP. E NSTITUCIONALES ... \n", "2089 OFICINA PROTOCOLO, RR.PP. E NSTITUCIONALES ... \n", "2090 OFICINA PROTOCOLO, RR.PP. E NSTITUCIONALES ... \n", "2091 OFICINA PROTOCOLO, RR.PP. E NSTITUCIONALES ... \n", "2092 OFICINA PROTOCOLO, RR.PP. E NSTITUCIONALES ... \n", "\n", " ARTIKULUAREN DESKRIBAPENA_CAS/DESCRIPCION ARTICULO_CAS \\\n", "0 RETRIBUCIONES DE ALTOS CARGOS \n", "1 RETRIBUCIONES DEL PERSONAL EVENTUAL DE GABINETES \n", "2 RETRIBUCIONES DEL PERSONAL FUNCIONARIO \n", "3 RETRIBUCIONES DEL PERSONAL FUNCIONARIO \n", "4 RETRIBUCIONES DEL PERSONAL FUNCIONARIO \n", "... ... \n", "2088 MATERIAL, SUMINISTROS Y OTROS \n", "2089 MATERIAL, SUMINISTROS Y OTROS \n", "2090 MATERIAL, SUMINISTROS Y OTROS \n", "2091 MATERIAL, SUMINISTROS Y OTROS \n", "2092 INDEMNIZACIONES POR RAZON DEL SERVICIO \n", "\n", " KONTZEPTUA/CONCEPTO \\\n", "0 100 \n", "1 110 \n", "2 120 \n", "3 121 \n", "4 121 \n", "... ... \n", "2088 227 \n", "2089 227 \n", "2090 227 \n", "2091 227 \n", "2092 230 \n", "\n", " KONTZEPTUAREN DESKRIBAPENA_EUS/DESCRIPCION CONCEPTO_EUS \\\n", "0 GOI KARGUEN OINARRIZKO SOLDATAK ETA BESTELAKO ... \n", "1 KABINETEETAKO ALDI BAT. PERTSONALAREN OINARRIZ... \n", "2 FUNTZIONARIOEN OINARRIZKO SOLDATAK \n", "3 FUNTZIONARIOEN ORDAINSARI OSAGARRIAK \n", "4 FUNTZIONARIOEN ORDAINSARI OSAGARRIAK \n", "... ... 
\n", "2088 KANPOKO ENPRESEK EGINDAKO LANAK \n", "2089 KANPOKO ENPRESEK EGINDAKO LANAK \n", "2090 KANPOKO ENPRESEK EGINDAKO LANAK \n", "2091 KANPOKO ENPRESEK EGINDAKO LANAK \n", "2092 BIDAI SARIAK, PERTSONALAREN GARRAIOA ETA LOKOM... \n", "\n", " KONTZEPTUAREN DESKRIBAPENA_CAS/DESCRIPCION CONCEPTO_CAS \\\n", "0 RETRIBUCIONES BASICAS Y OTRAS REMUNERACIONES D... \n", "1 RETRIBUCIONES BASICAS Y OTRAS REMUNERACIONES D... \n", "2 RETRIBUCIONES BASICAS DEL PERSONAL FUNCIONARIO \n", "3 RETRIBUCIONES COMPLEMENTARIAS DEL PERSONAL FUN... \n", "4 RETRIBUCIONES COMPLEMENTARIAS DEL PERSONAL FUN... \n", "... ... \n", "2088 TRABAJOS REALIZADOS POR OTRAS EMPRESAS EXTERNAS \n", "2089 TRABAJOS REALIZADOS POR OTRAS EMPRESAS EXTERNAS \n", "2090 TRABAJOS REALIZADOS POR OTRAS EMPRESAS EXTERNAS \n", "2091 TRABAJOS REALIZADOS POR OTRAS EMPRESAS EXTERNAS \n", "2092 DIETAS, LOCOMOCION Y TRASLADO DEL PERSONAL \n", "\n", " AZPIKONTZEPTUA/SUBCONCEPTO \\\n", "0 10001 \n", "1 11001 \n", "2 12001 \n", "3 12101 \n", "4 12102 \n", "... ... \n", "2088 22717 \n", "2089 22720 \n", "2090 22723 \n", "2091 22799 \n", "2092 23001 \n", "\n", " AZPIKONTZEPTUAREN DESKRIBAPENA_EUS/DESCRIPCION SUBCONCEPTO_EUS \\\n", "0 GOI KARGUEN OINARRIZKO SOLDATAK ETA BESTELAKO ... \n", "1 KABINETEETAKO ALDI BAT. PERTSONALAREN OINARRIZ... \n", "2 FUNTZIONARIOEN OINARRIZKO SOLDATAK \n", "3 LANTOKI OSAGARRIA \n", "4 OSAGARRI BEREZIA \n", "... ... \n", "2088 BIDAI AGENTZIETAKO ZERBITZUAK \n", "2089 ITZULPEN ETA INTERPRETARITZA ZERBITZUAK \n", "2090 ZERBITZUAK:ARGIAK ETA SOINUAK \n", "2091 KANPOKO ENPRESEK EGINDAKO BESTELAKO LANAK \n", "2092 BIDAI SARIAK, PERTSONALAREN GARRAIOA ETA LOKOM... \n", "\n", " AZPIKONTZEPTUAREN DESKRIBAPENA_CAS/DESCRIPCION SUBCONCEPTO_CAS \\\n", "0 RETRIBUCIONES BASICAS Y OTRAS REMUNERACIONES D... \n", "1 RETRIBUCIONES BASICAS Y OTRAS REMUNERACIONES D... \n", "2 RETRIBUCIONES BASICAS DEL PERSONAL FUNCIONARIO \n", "3 COMPLEMENTO DE DESTINO \n", "4 COMPLEMENTO ESPECIFICO \n", "... ... 
\n", "2088 SERVICIOS DE AGENCIAS DE VIAJES \n", "2089 SERVICIOS DE TRADUCCION E INTERPRETES \n", "2090 SERVICIOS DE ILUMINACIÓN Y SONIDO \n", "2091 OTROS TRABAJOS REALIZADOS POR EMPRESAS EXTERNAS \n", "2092 DIETAS, LOCOMOCION Y TRASLADO DEL PERSONAL \n", "\n", " PROIEKTUA/PROYECTO PROIEKTUAREN DESKRIBAPENA/DESCRIPCION PROYECTO \\\n", "0 9999/99999 GENERIKOA/ GENÉRICO \n", "1 9999/99999 GENERIKOA/ GENÉRICO \n", "2 9999/99999 GENERIKOA/ GENÉRICO \n", "3 9999/99999 GENERIKOA/ GENÉRICO \n", "4 9999/99999 GENERIKOA/ GENÉRICO \n", "... ... ... \n", "2088 2006/00184 Harreman Publikoetako Kabineteari lotutako gas... \n", "2089 2006/00182 Protokoloko eta ordezkaritzako arreta/Harreman... \n", "2090 2006/00182 Protokoloko eta ordezkaritzako arreta/Harreman... \n", "2091 2006/00182 Protokoloko eta ordezkaritzako arreta/Harreman... \n", "2092 2006/00184 Harreman Publikoetako Kabineteari lotutako gas... \n", "\n", " HASIERAKO KREDITUA/CREDITO INICIAL 2023 \n", "0 220.619,00 \n", "1 589.261,00 \n", "2 383.369,00 \n", "3 131.722,00 \n", "4 396.609,00 \n", "... ... \n", "2088 500,00 \n", "2089 2.000,00 \n", "2090 10.000,00 \n", "2091 48.400,00 \n", "2092 500,00 \n", "\n", "[2093 rows x 37 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_gastos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Character encoding\n", "\n", "Python represents text strings as Unicode (https://home.unicode.org/). 
Operating systems and other programs may use different encodings.\n", "\n", "```python\n", "var = \"camión\"\n", "var = \"lul·lià\"\n", "var = \"Ζεύς\"\n", "var = \"ประเทศไทย\"\n", "var = \"日本語で\"\n", "```\n", "\n", "\n", "Encodings:\n", "- [list of standard encodings](https://docs.python.org/3.11/library/codecs.html#standard-encodings)\n", "- [UTF-8](https://es.wikipedia.org/wiki/UTF-8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "ename": "UnicodeDecodeError", "evalue": "'utf-8' codec can't decode byte 0xc1 in position 1697: invalid start byte", "output_type": "error", "traceback": [ "---------------------------------------------------------------------------", "UnicodeDecodeError Traceback (most recent call last)", "/Users/isaac/Projects/TxADM_notebooks/notebooks/Part2/00_Pandas/01_Introduccion.ipynb Cell 32 line <cell line: 3>()\n 1 # cp1250 | windows-1250 | Central and Eastern Europe\n----> 3 df_gastos = pd.read_csv(\"data/presupuesto_gastos_2023.csv\",sep=\";\")\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)\n 209 else:\n 210 kwargs[new_arg_name] = new_arg_value\n--> 211 return func(*args, **kwargs)\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)\n 325 if len(args) > num_allow_args:\n 326 warnings.warn(\n 327 msg.format(arguments=_format_argument_list(allow_args)),\n 328 FutureWarning,\n 329 stacklevel=find_stack_level(),\n 330 )\n--> 331 return func(*args, **kwargs)\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)\n 935 kwds_defaults = _refine_defaults_read(\n 936 dialect,\n 937 delimiter,\n (...)\n 946 defaults={\"delimiter\": \",\"},\n 947 )\n 948 kwds.update(kwds_defaults)\n--> 950 return _read(filepath_or_buffer, kwds)\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/io/parsers/readers.py:605, in _read(filepath_or_buffer, kwds)\n 602 _validate_names(kwds.get(\"names\", None))\n 604 # Create the parser.\n--> 605 parser = TextFileReader(filepath_or_buffer, **kwds)\n 607 if chunksize or iterator:\n 608 return parser\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1442, in TextFileReader.__init__(self, f, engine, **kwds)\n 1439 self.options[\"has_index_names\"] = kwds[\"has_index_names\"]\n 1441 self.handles: IOHandles | None = None\n-> 1442 self._engine = self._make_engine(f, self.engine)\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1753, in TextFileReader._make_engine(self, f, engine)\n 1750 raise ValueError(msg)\n 1752 try:\n-> 1753 return mapping[engine](f, **self.options)\n 1754 except Exception:\n 1755 if self.handles is not None:\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py:79, in CParserWrapper.__init__(self, src, **kwds)\n 76 kwds.pop(key, None)\n 78 kwds[\"dtype\"] = ensure_dtype_objs(kwds.get(\"dtype\", None))\n---> 79 self._reader = parsers.TextReader(src, **kwds)\n 81 self.unnamed_cols = self._reader.unnamed_cols\n 83 # error: Cannot determine type of 'names'\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:547, in pandas._libs.parsers.TextReader.__cinit__()\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:636, in pandas._libs.parsers.TextReader._get_header()\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:852, in pandas._libs.parsers.TextReader._tokenize_rows()\n", "File ~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:1965, in pandas._libs.parsers.raise_parser_error()\n", "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 1697: invalid start byte" ] } ], "source": [ "# cp1250 | windows-1250 | Central and Eastern Europe\n", "df_gastos = pd.read_csv(\"data/presupuesto_gastos_2023.csv\", sep=\";\")  # this still fails; let's see why" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_gastos" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "df_gastos = pd.read_csv(\"data/presupuesto_gastos_2023.csv\",delimiter=\"\\t\",encoding=\"cp1250\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other ways to load data\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# load the file directly from its web URL\n", "df_who = pd.read_csv(\"http://www.exploredata.net/ftp/WHO.csv\")  # returns a DataFrame" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "\n", "[2 rows x 358 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.head(2)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0 Nombre \\\n", "0 1 Andalucía \n", "1 2 Cataluña \n", "2 3 Comunidad de Madrid \n", "3 4 Comunidad Valenciana \n", "4 5 Galicia \n", "5 6 Castilla y León \n", "6 7 País Vasco \n", "7 8 Canarias \n", "8 9 Castilla-La Mancha \n", "9 10 Región de Murcia \n", "10 11 Aragón \n", "11 12 Islas Baleares \n", "12 13 Extremadura \n", "13 14 Principado de Asturias \n", "14 15 Comunidad Foral de Navarra \n", "15 16 Cantabria \n", "16 17 La Rioja \n", "17 18 Melilla \n", "18 19 Ceuta \n", "19 TOTAL España \n", "\n", " Capital (de iure o en su defecto de facto) Población (2022) \\\n", "0 Sevilla NaN \n", "1 Barcelona NaN \n", "2 Madrid NaN \n", "3 Valencia NaN \n", "4 Santiago de Compostela NaN \n", "5 Valladolid[nota 1]​ NaN \n", "6 Vitoria[nota 1]​ NaN \n", "7 Las Palmas de Gran Canaria y Santa Cruz de Ten... NaN \n", "8 Toledo[nota 1]​ NaN \n", "9 Murcia NaN \n", "10 Zaragoza NaN \n", "11 Palma NaN \n", "12 Mérida NaN \n", "13 Oviedo NaN \n", "14 Pamplona NaN \n", "15 Santander NaN \n", "16 Logroño NaN \n", "17 Melilla NaN \n", "18 Ceuta NaN \n", "19 Madrid NaN \n", "\n", " Porcentaje población Densidad (hab./km²) Superficie (km²) \\\n", "0 17,87 % 9705 NaN \n", "1 16,38 % 23884 NaN \n", "2 14,24 % 84115 NaN \n", "3 10,67 % 21698 NaN \n", "4 5,68 % 9119 NaN \n", "5 5,02 % 2534 NaN \n", "6 4,67 % 30213 NaN \n", "7 4,58 % 30139 [nota 3]​ \n", "8 4,32 % 2579 NaN \n", "9 3,20 % 13374 NaN \n", "10 2,79 % 2790 NaN \n", "11 2,47 % 24427 NaN \n", "12 2,23 % 2541 NaN \n", "13 2,14 % 9553 NaN \n", "14 1,39 % 6330 NaN \n", "15 1,23 % 10974 NaN \n", "16 0,67 % 6267 NaN \n", "17 0,18 % 700158 NaN \n", "18 0,18 % 417510 NaN \n", "19 100 % 9367 NaN \n", "\n", " Porcentaje superficie Mapa PIB per cápita en € (2021) \n", "0 17,31 % NaN NaN \n", "1 6,35 % NaN NaN \n", "2 1,59 % NaN NaN \n", "3 4,60 % NaN NaN \n", "4 5,84 % NaN NaN \n", "5 18,62 % NaN NaN \n", "6 1,43 % NaN NaN \n", "7 1,47 % NaN NaN \n", "8 15,70 % NaN NaN \n", "9 2,24 % NaN NaN \n", "10 9,43 % 
NaN NaN \n", "11 0,99 % NaN NaN \n", "12 8,23 % NaN NaN \n", "13 2,13 % NaN NaN \n", "14 2,05 % NaN NaN \n", "15 1,05 % NaN NaN \n", "16 1,00 % NaN NaN \n", "17 <0,01 % NaN NaN \n", "18 <0,01 % NaN NaN \n", "19 100 % — NaN " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# extract the tables found in a web page\n", "url = \"https://es.wikipedia.org/wiki/Anexo:Comunidades_y_ciudades_aut%C3%B3nomas_de_Espa%C3%B1a\"\n", "\n", "comunidades_esp = pd.read_html(url)\n", "comunidades_esp[0] # Careful: read_html returns a list of DataFrames\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0 Nombre Capital (de iure o en su defecto de facto) \\\n", "0 1 Andalucía Sevilla \n", "1 2 Cataluña Barcelona \n", "2 3 Comunidad de Madrid Madrid \n", "3 4 Comunidad Valenciana Valencia \n", "4 5 Galicia Santiago de Compostela \n", "\n", " Población (2022) Porcentaje población Densidad (hab./km²) \\\n", "0 NaN 17,87 % 9705 \n", "1 NaN 16,38 % 23884 \n", "2 NaN 14,24 % 84115 \n", "3 NaN 10,67 % 21698 \n", "4 NaN 5,68 % 9119 \n", "\n", " Superficie (km²) Porcentaje superficie Mapa PIB per cápita en € (2021) \n", "0 NaN 17,31 % NaN NaN \n", "1 NaN 6,35 % NaN NaN \n", "2 NaN 1,59 % NaN NaN \n", "3 NaN 4,60 % NaN NaN \n", "4 NaN 5,84 % NaN NaN " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(type(comunidades_esp[0]))\n", "df_comunidades_esp = comunidades_esp[0]\n", "df_comunidades_esp.head() " ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " date_and_time_of_initial date_and_time_of_ranger borough \\\n", "0 2021-06-23T16:45:00.000 2021-06-24T08:00:00.000 Brooklyn \n", "1 2021-06-24T10:00:00.000 2021-06-24T11:00:00.000 Bronx \n", "2 2021-06-23T14:30:00.000 2021-06-23T14:30:00.000 Bronx \n", "3 2021-06-23T13:00:00.000 2021-06-23T13:10:00.000 Staten Island \n", "4 2021-06-23T09:20:00.000 2021-06-23T09:20:00.000 Queens \n", "\n", " property \\\n", "0 Sternberg Park \n", "1 Haffen Park \n", "2 Pelham Bay Park \n", "3 Willowbrook Park \n", "4 Judge Moses Weinstein Playground \n", "\n", " location species_description \\\n", "0 Inside locked athletic field under construction Chukar \n", "1 Haffen Pool Sparrow \n", "2 Pelham Bay South White-tailed Deer \n", "3 The carousel Raccoon \n", "4 Garbage can Virginia Opossum \n", "\n", " call_source species_status animal_condition duration_of_response ... \\\n", "0 Other Exotic Healthy 6.00 ... \n", "1 Central Native Healthy 1.75 ... \n", "2 Employee Native N/A 1.00 ... \n", "3 Employee Native N/A 2.00 ... \n", "4 Central Native Healthy 2.25 ... 
\n", "\n", " _311sr_number final_ranger_action of_animals pep_response animal_monitored \\\n", "0 311-06712416 ACC 6 False False \n", "1 311-06714879 Rehabilitator 4 False False \n", "2 NaN Unfounded 0 False False \n", "3 NaN Unfounded 0 False False \n", "4 311-06699415 ACC 1 False False \n", "\n", " police_response esu_response acc_intake_number hours_spent_monitoring \\\n", "0 False False 163537 NaN \n", "1 False False NaN NaN \n", "2 False False NaN NaN \n", "3 False False NaN NaN \n", "4 False False 119833 NaN \n", "\n", " rehabilitator \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# JSON content\n", "# Source: https://data.cityofnewyork.us/Environment/Urban-Park-Ranger-Animal-Condition-Response/fuhs-xmg2\n", "url = 'https://data.cityofnewyork.us/resource/s3vf-x992.json'\n", "df = pd.read_json(url)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Activities\n", "\n", "In this activity we will practise loading data in different formats. In the real world, data does not always have the structure and format we would like.\n", "\n", "The goal is to compare the loaded data with the original data:
\n", "- What are the dimensions of the original data and of the loaded data?\n", "- What are the columns?\n", "- Does the structure of the data reflect the idea of a column as an attribute (feature) and a row as a sample?\n", "- Do they match the information in the file?" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# 1. Download and load the following file:\n", "# https://data.cityofnewyork.us/Housing-Development/Speculation-Watch-List/adax-9mit" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# 2. Download and load the following file:\n", "# https://ibestat.es/estadistica/demografia/moviment-natural-de-la-poblacio/naixements-2/?lang=ca\n", "\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# 3. Now download this one and open it while it is still compressed, that is, without decompressing it.\n", "# https://ec.europa.eu/eurostat/databrowser/view/tin00171/default/table?lang=en\n", "# Pandas can open .gz-compressed files directly as if they were plain data files; in this case they are CSV files, so there is no need to decompress them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Managing *dataframes*\n", "\n", "During this course we will learn to modify dataframes: we will add and remove columns, and also modify existing ones. After doing that work, we need to save the new data to a file."
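, "\n", "\n", "A minimal sketch of that save-and-reload round trip, assuming a small made-up dataframe and a hypothetical file name `tmp_example.csv` (neither comes from the course data):\n", "\n", "```Python\n", "import pandas as pd\n", "\n", "# Made-up data, for illustration only\n", "df = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})\n", "\n", "# index=False avoids writing the row index as an extra column\n", "df.to_csv('tmp_example.csv', index=False, encoding='utf-8')\n", "\n", "# Reading the file back should give the same columns and shape\n", "df_back = pd.read_csv('tmp_example.csv', encoding='utf-8')\n", "print(df_back.shape)\n", "```\n", "\n", "Passing `index=False` is a design choice: without it, the row index is written as an unnamed first column, which shows up as `Unnamed: 0` when the file is reloaded."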
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_gastos.to_csv('data/tmp_file.csv',encoding='utf-8') # save a dataframe to a file, specifying the encoding" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Structure of the dataframe\n", "\n", "Now that we know how to load dataframes from files, we will see how to access the information stored in the data types that Pandas uses.\n", "\n", "A dataframe has columns and rows. Rows are samples, and columns represent features of a sample. Each column is of type `Series`.\n", "\n", "**The goal of this unit is to acquire tools for understanding and selecting the data represented in a dataframe and a series with Pandas.**\n", "\n", "We will start by selecting columns and obtaining statistical summaries of them. Later, we will move on to selecting rows in the dataframe. Finally, we will perform combined selections, creating our own dataframes from the selected subsets.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df_who = pd.read_csv(\"http://www.exploredata.net/ftp/WHO.csv\") # dataframe" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 202 entries, 0 to 201\n", "Columns: 358 entries, Country to Urban_population_pct_of_total\n", "dtypes: float64(355), int64(2), object(1)\n", "memory usage: 565.1+ KB\n" ] } ], "source": [ "df_who.info()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CountryID Continent Adolescent fertility rate (%) \\\n", "count 202.000000 202.000000 177.000000 \n", "mean 101.500000 3.579208 59.457627 \n", "std 58.456537 1.808263 49.105286 \n", "min 1.000000 1.000000 0.000000 \n", "25% 51.250000 2.000000 19.000000 \n", "50% 101.500000 3.000000 46.000000 \n", "75% 151.750000 5.000000 91.000000 \n", "max 202.000000 7.000000 199.000000 \n", "\n", " Adult literacy rate (%) \\\n", "count 131.000000 \n", "mean 78.871756 \n", "std 20.415760 \n", "min 23.600000 \n", "25% 68.400000 \n", "50% 86.500000 \n", "75% 95.300000 \n", "max 99.800000 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "count 178.000000 \n", "mean 11250.112360 \n", "std 12586.753417 \n", "min 260.000000 \n", "25% 2112.500000 \n", "50% 6175.000000 \n", "75% 14502.500000 \n", "max 60870.000000 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "count 179.000000 \n", "mean 84.033520 \n", "std 17.788047 \n", "min 6.000000 \n", "25% 79.000000 \n", "50% 90.000000 \n", "75% 96.000000 \n", "max 100.000000 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "count 179.000000 \n", "mean 85.698324 \n", "std 15.451212 \n", "min 11.000000 \n", "25% 79.500000 \n", "50% 90.000000 \n", "75% 96.000000 \n", "max 100.000000 \n", "\n", " Population (in thousands) total Population annual growth rate (%) \\\n", "count 1.930000e+02 193.000000 \n", "mean 3.409805e+04 1.297927 \n", "std 1.304957e+05 1.163864 \n", "min 2.000000e+00 -2.500000 \n", "25% 1.340000e+03 0.500000 \n", "50% 6.762000e+03 1.300000 \n", "75% 2.173200e+04 2.100000 \n", "max 1.328474e+06 4.300000 \n", "\n", " Population in urban areas (%) ... Total_CO2_emissions Total_income \\\n", "count 193.000000 ... 1.860000e+02 1.780000e+02 \n", "mean 54.911917 ... 1.483596e+05 2.015567e+11 \n", "std 23.554182 ... 6.133091e+05 9.400689e+11 \n", "min 10.000000 ... 2.565000e+01 5.190000e+07 \n", "25% 36.000000 ... 
1.672615e+03 3.317500e+09 \n", "50% 57.000000 ... 1.021157e+04 1.145000e+10 \n", "75% 73.000000 ... 6.549217e+04 8.680000e+10 \n", "max 100.000000 ... 5.776432e+06 1.100000e+13 \n", "\n", " Total_reserves Trade_balance_goods_and_services \\\n", "count 128.000000 1.710000e+02 \n", "mean 57.253516 3.424012e+08 \n", "std 138.669298 5.943043e+10 \n", "min 0.990000 -7.140000e+11 \n", "25% 16.292500 -1.210000e+09 \n", "50% 28.515000 -2.240000e+08 \n", "75% 55.310000 1.024000e+09 \n", "max 1334.860000 1.390000e+11 \n", "\n", " Under_five_mortality_from_CME Under_five_mortality_from_IHME \\\n", "count 181.000000 170.000000 \n", "mean 56.677624 54.356471 \n", "std 60.060929 61.160556 \n", "min 2.900000 3.000000 \n", "25% 12.400000 8.475000 \n", "50% 29.980000 27.600000 \n", "75% 88.700000 82.900000 \n", "max 267.000000 253.700000 \n", "\n", " Under_five_mortality_rate Urban_population Urban_population_growth \\\n", "count 181.000000 1.880000e+02 188.000000 \n", "mean 56.677624 1.665763e+07 2.165851 \n", "std 60.060929 5.094867e+07 1.596628 \n", "min 2.900000 1.545600e+04 -1.160000 \n", "25% 12.400000 9.171623e+05 1.105000 \n", "50% 29.980000 3.427661e+06 1.945000 \n", "75% 88.700000 9.837113e+06 3.252500 \n", "max 267.000000 5.270000e+08 7.850000 \n", "\n", " Urban_population_pct_of_total \n", "count 188.000000 \n", "mean 55.195213 \n", "std 23.742122 \n", "min 10.000000 \n", "25% 35.650000 \n", "50% 57.300000 \n", "75% 72.750000 \n", "max 100.000000 \n", "\n", "[8 rows x 357 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Columns\n", "\n", "As mentioned in the introduction, we will start with the columns." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "2 Algeria 3 3 6.0 \n", "3 Andorra 4 2 NaN \n", "4 Angola 5 3 146.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "2 69.9 \n", "3 NaN \n", "4 67.4 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "2 5940.0 \n", "3 NaN \n", "4 3890.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "2 94.0 \n", "3 83.0 \n", "4 49.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "2 96.0 \n", "3 83.0 \n", "4 51.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "2 33351.0 1.5 ... \n", "3 74.0 1.0 ... \n", "4 16557.0 2.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "2 137535.56 6.970000e+10 351.36 \n", "3 NaN NaN NaN \n", "4 8991.46 1.490000e+10 27.13 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "2 4.700000e+09 40.00 \n", "3 NaN NaN \n", "4 9.140000e+09 164.10 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "2 31.2 40.00 \n", "3 NaN NaN \n", "4 242.5 164.10 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "2 20800000.0 2.61 63.3 \n", "3 NaN NaN NaN \n", "4 8578749.0 4.14 53.3 \n", "\n", "[5 rows x 358 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(202, 358)" ] }, "execution_count": 8, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "df_who.shape" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['Country', 'CountryID', 'Continent', 'Adolescent fertility rate (%)',\n", " 'Adult literacy rate (%)',\n", " 'Gross national income per capita (PPP international $)',\n", " 'Net primary school enrolment ratio female (%)',\n", " 'Net primary school enrolment ratio male (%)',\n", " 'Population (in thousands) total', 'Population annual growth rate (%)',\n", " ...\n", " 'Total_CO2_emissions', 'Total_income', 'Total_reserves',\n", " 'Trade_balance_goods_and_services', 'Under_five_mortality_from_CME',\n", " 'Under_five_mortality_from_IHME', 'Under_five_mortality_rate',\n", " 'Urban_population', 'Urban_population_growth',\n", " 'Urban_population_pct_of_total'],\n", " dtype='object', length=358)\n" ] } ], "source": [ "# Columns, i.e. the features of each sample\n", "print(df_who.columns)\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ix:0\tlabel:Country\n", "ix:1\tlabel:CountryID\n", "ix:2\tlabel:Continent\n", "ix:3\tlabel:Adolescent fertility rate (%)\n", "ix:4\tlabel:Adult literacy rate (%)\n", "ix:5\tlabel:Gross national income per capita (PPP international $)\n", "ix:6\tlabel:Net primary school enrolment ratio female (%)\n", "ix:7\tlabel:Net primary school enrolment ratio male (%)\n", "ix:8\tlabel:Population (in thousands) total\n", "ix:9\tlabel:Population annual growth rate (%)\n", "ix:10\tlabel:Population in urban areas (%)\n", "ix:11\tlabel:Population living below the poverty line (% living on < US$1 per day)\n", "ix:12\tlabel:Population median age (years)\n", "ix:13\tlabel:Population proportion over 60 (%)\n", "ix:14\tlabel:Population proportion under 15 (%)\n", "ix:15\tlabel:Registration coverage of births (%)\n", "ix:16\tlabel:Total 
fertility rate (per woman)\n", "ix:17\tlabel:Antenatal care coverage - at least four visits (%)\n", "ix:18\tlabel:Antiretroviral therapy coverage among HIV-infected pregt women for PMTCT (%)\n", "ix:19\tlabel:Antiretroviral therapy coverage among people with advanced HIV infections (%)\n", "ix:20\tlabel:Births attended by skilled health personnel (%)\n", "ix:21\tlabel:Births by caesarean section (%)\n", "ix:22\tlabel:Children aged 6-59 months who received vitamin A supplementation (%)\n", "ix:23\tlabel:Children aged <5 years sleeping under insecticide-treated nets (%)\n", "ix:24\tlabel:Children aged <5 years who received any antimalarial treatment for fever (%)\n", "ix:25\tlabel:Children aged <5 years with ARI symptoms taken to facility (%)\n", "ix:26\tlabel:Children aged <5 years with diarrhoea receiving ORT (%)\n", "ix:27\tlabel:Contraceptive prevalence (%)\n", "ix:28\tlabel:Neonates protected at birth against neonatal tetanus (PAB) (%)\n", "ix:29\tlabel:One-year-olds immunized with MCV\n", "ix:30\tlabel:One-year-olds immunized with three doses of diphtheria tetanus toxoid and pertussis (DTP3) (%)\n", "ix:31\tlabel:One-year-olds immunized with three doses of Hepatitis B (HepB3) (%)\n", "ix:32\tlabel:One-year-olds immunized with three doses of Hib (Hib3) vaccine (%)\n", "ix:33\tlabel:Tuberculosis detection rate under DOTS (%)\n", "ix:34\tlabel:Tuberculosis treatment success under DOTS (%)\n", "ix:35\tlabel:Women who have had mammography (%)\n", "ix:36\tlabel:Women who have had PAP smear (%)\n", "ix:37\tlabel:Community and traditional health workers density (per 10 000 population)\n", "ix:38\tlabel:Dentistry personnel density (per 10 000 population)\n", "ix:39\tlabel:Environment and public health workers density (per 10 000 population)\n", "ix:40\tlabel:External resources for health as percentage of total expenditure on health\n", "ix:41\tlabel:General government expenditure on health as percentage of total expenditure on health\n", "ix:42\tlabel:General government 
expenditure on health as percentage of total government expenditure\n", "ix:43\tlabel:Hospital beds (per 10 000 population)\n", "ix:44\tlabel:Laboratory health workers density (per 10 000 population)\n", "ix:45\tlabel:Number of community and traditional health workers\n", "ix:46\tlabel:Number of dentistry personnel\n", "ix:47\tlabel:Number of environment and public health workers\n", "ix:48\tlabel:Number of laboratory health workers\n", "ix:49\tlabel:Number of nursing and midwifery personnel\n", "ix:50\tlabel:Number of other health service providers\n", "ix:51\tlabel:Number of pharmaceutical personnel\n", "ix:52\tlabel:Number of physicians\n", "ix:53\tlabel:Nursing and midwifery personnel density (per 10 000 population)\n", "ix:54\tlabel:Other health service providers density (per 10 000 population)\n", "ix:55\tlabel:Out-of-pocket expenditure as percentage of private expenditure on health\n", "ix:56\tlabel:Per capita government expenditure on health (PPP int. $)\n", "ix:57\tlabel:Per capita government expenditure on health at average exchange rate (US$)\n", "ix:58\tlabel:Per capita total expenditure on health (PPP int. 
$)\n", "ix:59\tlabel:Per capita total expenditure on health at average exchange rate (US$)\n", "ix:60\tlabel:Pharmaceutical personnel density (per 10 000 population)\n", "ix:61\tlabel:Physicians density (per 10 000 population)\n", "ix:62\tlabel:Private expenditure on health as percentage of total expenditure on health\n", "ix:63\tlabel:Private prepaid plans as percentage of private expenditure on health\n", "ix:64\tlabel:Ratio of health management and support workers to health service providers\n", "ix:65\tlabel:Ratio of nurses and midwives to physicians\n", "ix:66\tlabel:Social security expenditure on health as percentage of general government expenditure on health\n", "ix:67\tlabel:Total expenditure on health as percentage of gross domestic product\n", "ix:68\tlabel:Births attended by skilled health personnel (%) highest educational level of mother\n", "ix:69\tlabel:Births attended by skilled health personnel (%) highest wealth quintile\n", "ix:70\tlabel:Births attended by skilled health personnel (%) lowest educational level of mother\n", "ix:71\tlabel:Births attended by skilled health personnel (%) lowest wealth quintile\n", "ix:72\tlabel:Births attended by skilled health personnel (%) rural\n", "ix:73\tlabel:Births attended by skilled health personnel (%) urban\n", "ix:74\tlabel:Births attended by skilled health personnel difference highest lowest educational level of mother\n", "ix:75\tlabel:Births attended by skilled health personnel difference highest-lowest wealth quintile\n", "ix:76\tlabel:Births attended by skilled health personnel difference urban-rural\n", "ix:77\tlabel:Births attended by skilled health personnel ratio highest-lowest educational level of mother\n", "ix:78\tlabel:Births attended by skilled health personnel ratio highest-lowest wealth quintile\n", "ix:79\tlabel:Births attended by skilled health personnel ratio urban-rural\n", "ix:80\tlabel:Measles immunization coverage among one-year-olds (%) highest educational level of mother\n", 
"ix:81\tlabel:Measles immunization coverage among one-year-olds (%) highest wealth quintile\n", "ix:82\tlabel:Measles immunization coverage among one-year-olds (%) lowest educational level of mother\n", "ix:83\tlabel:Measles immunization coverage among one-year-olds (%) lowest wealth quintile\n", "ix:84\tlabel:Measles immunization coverage among one-year-olds (%) rural\n", "ix:85\tlabel:Measles immunization coverage among one-year-olds (%) urban\n", "ix:86\tlabel:Measles immunization coverage among one-year-olds difference highest-lowest educational level of mother\n", "ix:87\tlabel:Measles immunization coverage among one-year-olds difference highest-lowest wealth quintile\n", "ix:88\tlabel:Measles immunization coverage among one-year-olds difference urban-rural\n", "ix:89\tlabel:Measles immunization coverage among one-year-olds ratio highest-lowest educational level of mother\n", "ix:90\tlabel:Measles immunization coverage among one-year-olds ratio highest-lowest wealth quintile\n", "ix:91\tlabel:Measles immunization coverage among one-year-olds ratio urban-rural\n", "ix:92\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) difference lowest-highest educational level of mother\n", "ix:93\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) difference lowest-highest wealth quintile\n", "ix:94\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) difference rural-urban\n", "ix:95\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) highest educational level of mother\n", "ix:96\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) highest wealth quintile\n", "ix:97\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) lowest educational level of mother\n", "ix:98\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live 
births) lowest wealth quintile\n", "ix:99\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) ratio lowest-highest educational level of mother\n", "ix:100\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) ratio lowest-highest wealth quintile\n", "ix:101\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) ratio rural-urban\n", "ix:102\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) rural\n", "ix:103\tlabel:Under-5 mortality rate (Probability of dying aged < 5 years per 1 000 live births) urban\n", "ix:104\tlabel:Adult mortality rate (probability of dying between 15 to 60 years per 1000 population) both sexes\n", "ix:105\tlabel:Adult mortality rate (probability of dying between 15 to 60 years per 1000 population) female\n", "ix:106\tlabel:Adult mortality rate (probability of dying between 15 to 60 years per 1000 population) male\n", "ix:107\tlabel:Age-standardized mortality rate for cancer (per 100 000 population)\n", "ix:108\tlabel:Age-standardized mortality rate for cardiovascular diseases (per 100 000 population)\n", "ix:109\tlabel:Age-standardized mortality rate for injuries (per 100 000 population)\n", "ix:110\tlabel:Age-standardized mortality rate for non-communicable diseases (per 100 000 population)\n", "ix:111\tlabel:Deaths among children under five years of age due to diarrhoeal diseases (%)\n", "ix:112\tlabel:Deaths among children under five years of age due to HIV/AIDS (%)\n", "ix:113\tlabel:Deaths among children under five years of age due to injuries (%)\n", "ix:114\tlabel:Deaths among children under five years of age due to malaria (%)\n", "ix:115\tlabel:Deaths among children under five years of age due to measles (%)\n", "ix:116\tlabel:Deaths among children under five years of age due to neonatal causes (%)\n", "ix:117\tlabel:Deaths among children under five years of age due to other causes 
(%)\n", "ix:118\tlabel:Deaths among children under five years of age due to pneumonia (%)\n", "ix:119\tlabel:Deaths due to HIV/AIDS (per 100 000 population per year)\n", "ix:120\tlabel:Deaths due to tuberculosis among HIV-negative people (per 100 000 population)\n", "ix:121\tlabel:Deaths due to tuberculosis among HIV-positive people (per 100 000 population)\n", "ix:122\tlabel:Healthy life expectancy (HALE) at birth (years) both sexes\n", "ix:123\tlabel:Healthy life expectancy (HALE) at birth (years) female\n", "ix:124\tlabel:Healthy life expectancy (HALE) at birth (years) male\n", "ix:125\tlabel:Incidence of tuberculosis (per 100 000 population per year)\n", "ix:126\tlabel:Infant mortality rate (per 1 000 live births) both sexes\n", "ix:127\tlabel:Infant mortality rate (per 1 000 live births) female\n", "ix:128\tlabel:Infant mortality rate (per 1 000 live births) male\n", "ix:129\tlabel:Life expectancy at birth (years) both sexes\n", "ix:130\tlabel:Life expectancy at birth (years) female \n", "ix:131\tlabel:Life expectancy at birth (years) male\n", "ix:132\tlabel:Maternal mortality ratio (per 100 000 live births)\n", "ix:133\tlabel:Neonatal mortality rate (per 1 000 live births)\n", "ix:134\tlabel:Number of confirmed poliomyelitis cases\n", "ix:135\tlabel:Prevalence of HIV among adults aged >=15 years (per 100 000 population)\n", "ix:136\tlabel:Prevalence of tuberculosis (per 100 000 population)\n", "ix:137\tlabel:Under-5 mortality rate (probability of dying by age 5 per 1000 live births) both sexes\n", "ix:138\tlabel:Under-5 mortality rate (probability of dying by age 5 per 1000 live births) female\n", "ix:139\tlabel:Under-5 mortality rate (probability of dying by age 5 per 1000 live births) male\n", "ix:140\tlabel:Years of life lost to communicable diseases (%)\n", "ix:141\tlabel:Years of life lost to injuries (%)\n", "ix:142\tlabel:Years of life lost to non-communicable diseases (%)\n", "ix:143\tlabel:Children under five years of age overweight for age (%)\n", 
"ix:144\tlabel:Children under five years of age stunted for age (%)\n", "ix:145\tlabel:Children under five years of age underweight for age (%)\n", "ix:146\tlabel:Newborns with low birth weight (%)\n", "ix:147\tlabel:Per capita recorded alcohol consumption (litres of pure alcohol) among adults (>=15 years)\n", "ix:148\tlabel:Population using solid fuels (%) rural\n", "ix:149\tlabel:Population using solid fuels (%) urban\n", "ix:150\tlabel:Population with sustainable access to improved drinking water sources (%) rural\n", "ix:151\tlabel:Population with sustainable access to improved drinking water sources (%) total\n", "ix:152\tlabel:Population with sustainable access to improved drinking water sources (%) urban\n", "ix:153\tlabel:Population with sustainable access to improved sanitation (%) rural\n", "ix:154\tlabel:Population with sustainable access to improved sanitation (%) total\n", "ix:155\tlabel:Population with sustainable access to improved sanitation (%) urban\n", "ix:156\tlabel:Prevalence of adults (>=15 years) who are obese (%) female\n", "ix:157\tlabel:Prevalence of adults (>=15 years) who are obese (%) male\n", "ix:158\tlabel:Prevalence of condom use by young people (15-24 years) at higher risk sex (%) female\n", "ix:159\tlabel:Prevalence of condom use by young people (15-24 years) at higher risk sex (%) male\n", "ix:160\tlabel:Prevalence of current tobacco use among adolescents (13-15 years) (%) both sexes\n", "ix:161\tlabel:Prevalence of current tobacco use among adolescents (13-15 years) (%) female\n", "ix:162\tlabel:Prevalence of current tobacco use among adolescents (13-15 years) (%) male\n", "ix:163\tlabel:Prevalence of current tobacco use among adults (>=15 years) (%) both sexes\n", "ix:164\tlabel:Prevalence of current tobacco use among adults (>=15 years) (%) female\n", "ix:165\tlabel:Prevalence of current tobacco use among adults (>=15 years) (%) male\n", "ix:166\tlabel:Adolescent_fertility_rate\n", "ix:167\tlabel:Agricultural_land\n", 
"ix:168\tlabel:Agriculture_contribution_to_economy\n", "ix:169\tlabel:Aid_given\n", "ix:170\tlabel:Aid_received\n", "ix:171\tlabel:Aid_received_total\n", "ix:172\tlabel:All_forms_of_TB_new_cases_per_100_000_estimated\n", "ix:173\tlabel:All_forms_of_TB_new_cases_per_100_000_reported\n", "ix:174\tlabel:Annual_freshwater_withdrawals_total\n", "ix:175\tlabel:Arms_exports\n", "ix:176\tlabel:Arms_imports\n", "ix:177\tlabel:Bad_teeth_per_child\n", "ix:178\tlabel:Births_attended_by_skilled_health_staff\n", "ix:179\tlabel:Breast_cancer_deaths_per_100_000_women\n", "ix:180\tlabel:Breast_cancer_new_cases_per_100_000_women\n", "ix:181\tlabel:Breast_cancer_number_of_female_deaths\n", "ix:182\tlabel:Breast_cancer_number_of_new_female_cases\n", "ix:183\tlabel:Broadband_subscribers\n", "ix:184\tlabel:Broadband_subscribers_per_100_people\n", "ix:185\tlabel:CO2_emissions\n", "ix:186\tlabel:CO2_intensity_of_economic_output\n", "ix:187\tlabel:Capital_formation\n", "ix:188\tlabel:Cell_phones_per_100_people\n", "ix:189\tlabel:Cell_phones_total\n", "ix:190\tlabel:Central_bank_discount_rate\n", "ix:191\tlabel:Cervical_cancer_deaths_per_100_000_women\n", "ix:192\tlabel:Cervical_cancer_new_cases_per_100_000_women\n", "ix:193\tlabel:Cervical_cancer_number_of_female_deaths\n", "ix:194\tlabel:Cervical_cancer_number_of_new_female_cases\n", "ix:195\tlabel:Children_and_elderly\n", "ix:196\tlabel:Children_out_of_school_primary\n", "ix:197\tlabel:Children_out_of_school_primary_female\n", "ix:198\tlabel:Children_out_of_school_primary_male\n", "ix:199\tlabel:Children_per_woman\n", "ix:200\tlabel:Coal_consumption\n", "ix:201\tlabel:Coal_consumption_per_person\n", "ix:202\tlabel:Coal_production\n", "ix:203\tlabel:Coal_production_per_person\n", "ix:204\tlabel:Colon_and_Rectum_cancer_deaths_per_100_000_men\n", "ix:205\tlabel:Colon_and_Rectum_cancer_deaths_per_100_000_women\n", "ix:206\tlabel:Colon_and_Rectum_cancer_new_cases_per_100_000_men\n", 
"ix:207\tlabel:Colon_and_Rectum_cancer_new_cases_per_100_000_women\n", "ix:208\tlabel:Colon_and_Rectum_cancer_number_of_female_deaths\n", "ix:209\tlabel:Colon_and_Rectum_cancer_number_of_male_deaths\n", "ix:210\tlabel:Colon_and_Rectum_cancer_number_of_new_female_cases\n", "ix:211\tlabel:Colon_and_Rectum_cancer_number_of_new_male_cases\n", "ix:212\tlabel:Consumer_price_index\n", "ix:213\tlabel:Contraceptive_use\n", "ix:214\tlabel:Deaths_from_TB_per_100_000_estimated\n", "ix:215\tlabel:Debt_servicing_costs\n", "ix:216\tlabel:Democracy_score\n", "ix:217\tlabel:Electric_power_consumption\n", "ix:218\tlabel:Electricity_generation\n", "ix:219\tlabel:Electricity_generation_per_person\n", "ix:220\tlabel:Energy_use\n", "ix:221\tlabel:Expenditure_per_student_primary\n", "ix:222\tlabel:Expenditure_per_student_secondary\n", "ix:223\tlabel:Expenditure_per_student_tertiary\n", "ix:224\tlabel:Exports_of_goods_and_services\n", "ix:225\tlabel:Exports_unit_value\n", "ix:226\tlabel:External_debt_total_DOD_current_USdollars\n", "ix:227\tlabel:External_debt_total_pct_of_GNI\n", "ix:228\tlabel:Female_labour_force\n", "ix:229\tlabel:Fixed_line_and_mobile_phone_subscribers\n", "ix:230\tlabel:Foreign_direct_investment_net_inflows\n", "ix:231\tlabel:Foreign_direct_investment_net_outflows\n", "ix:232\tlabel:Forest_area\n", "ix:233\tlabel:Gross_capital_formation\n", "ix:234\tlabel:HIV_infected\n", "ix:235\tlabel:Health_expenditure_per_person\n", "ix:236\tlabel:Health_expenditure_private\n", "ix:237\tlabel:Health_expenditure_public_pct_of_GDP\n", "ix:238\tlabel:Health_expenditure_public_pct_of_government_expenditure\n", "ix:239\tlabel:Health_expenditure_public_pct_of_total_health_expenditure\n", "ix:240\tlabel:Health_expenditure_total\n", "ix:241\tlabel:High_technology_exports\n", "ix:242\tlabel:Hydroelectricity_consumption\n", "ix:243\tlabel:Hydroelectricity_consumption_per_person\n", "ix:244\tlabel:Imports_of_goods_and_services\n", "ix:245\tlabel:Imports_unit_value\n", 
"ix:246\tlabel:Improved_sanitation_facilities_urban\n", "ix:247\tlabel:Improved_water_source\n", "ix:248\tlabel:Income_growth\n", "ix:249\tlabel:Income_per_person\n", "ix:250\tlabel:Income_share_held_by_lowest_20pct\n", "ix:251\tlabel:Industry_contribution_to_economy\n", "ix:252\tlabel:Inequality_index\n", "ix:253\tlabel:Infant_mortality_rate\n", "ix:254\tlabel:Infectious_TB_new_cases_per_100_000_estimated\n", "ix:255\tlabel:Infectious_TB_new_cases_per_100_000_reported\n", "ix:256\tlabel:Infectious_TB_treatment_completeness\n", "ix:257\tlabel:Inflation_GDP_deflator\n", "ix:258\tlabel:Internet_users\n", "ix:259\tlabel:Life_expectancy_at_birth\n", "ix:260\tlabel:Literacy_rate_adult_female\n", "ix:261\tlabel:Literacy_rate_adult_male\n", "ix:262\tlabel:Literacy_rate_adult_total\n", "ix:263\tlabel:Literacy_rate_youth_female\n", "ix:264\tlabel:Literacy_rate_youth_male\n", "ix:265\tlabel:Literacy_rate_youth_total\n", "ix:266\tlabel:Liver_cancer_deaths_per_100_000_men\n", "ix:267\tlabel:Liver_cancer_deaths_per_100_000_women\n", "ix:268\tlabel:Liver_cancer_new_cases_per_100_000_men\n", "ix:269\tlabel:Liver_cancer_new_cases_per_100_000_women\n", "ix:270\tlabel:Liver_cancer_number_of_female_deaths\n", "ix:271\tlabel:Liver_cancer_number_of_male_deaths\n", "ix:272\tlabel:Liver_cancer_number_of_new_female_cases\n", "ix:273\tlabel:Liver_cancer_number_of_new_male_cases\n", "ix:274\tlabel:Lung_cancer_deaths_per_100_000_men\n", "ix:275\tlabel:Lung_cancer_deaths_per_100_000_women\n", "ix:276\tlabel:Lung_cancer_new_cases_per_100_000_men\n", "ix:277\tlabel:Lung_cancer_new_cases_per_100_000_women\n", "ix:278\tlabel:Lung_cancer_number_of_female_deaths\n", "ix:279\tlabel:Lung_cancer_number_of_male_deaths\n", "ix:280\tlabel:Lung_cancer_number_of_new_female_cases\n", "ix:281\tlabel:Lung_cancer_number_of_new_male_cases\n", "ix:282\tlabel:Malaria_prevention_insecticide_treated_bed_nets_usage\n", "ix:283\tlabel:Malaria_treatment\n", "ix:284\tlabel:Malnutrition_weight_for_age\n", 
"ix:285\tlabel:Market_value_of_listed_companies\n", "ix:286\tlabel:Maternal_mortality\n", "ix:287\tlabel:Math_achievement_4th_grade\n", "ix:288\tlabel:Math_achievement_8th_grade\n", "ix:289\tlabel:Measles_immunization\n", "ix:290\tlabel:Medical_Doctors\n", "ix:291\tlabel:Merchandise_trade\n", "ix:292\tlabel:Military_expenditure\n", "ix:293\tlabel:Natural_gas_consumption\n", "ix:294\tlabel:Natural_gas_consumption_per_person\n", "ix:295\tlabel:Natural_gas_production\n", "ix:296\tlabel:Natural_gas_production_per_person\n", "ix:297\tlabel:Natural_gas_proved_reserves\n", "ix:298\tlabel:Natural_gas_proven_reserves_per_person\n", "ix:299\tlabel:Net_barter_terms_of_trade\n", "ix:300\tlabel:Nuclear_consumption\n", "ix:301\tlabel:Nuclear_consumption_per_person\n", "ix:302\tlabel:Number_of_deaths_from_TB_estimated\n", "ix:303\tlabel:Number_of_existing_TB_cases_estimated\n", "ix:304\tlabel:Oil_consumption\n", "ix:305\tlabel:Oil_consumption_per_person\n", "ix:306\tlabel:Oil_production\n", "ix:307\tlabel:Oil_production_per_person\n", "ix:308\tlabel:Oil_proved_reserves\n", "ix:309\tlabel:Oil_proven_reserves_per_person\n", "ix:310\tlabel:Old_version_of_Income_per_person\n", "ix:311\tlabel:Patent_applications\n", "ix:312\tlabel:Patents_granted\n", "ix:313\tlabel:Patents_in_force\n", "ix:314\tlabel:People_living_with_HIV\n", "ix:315\tlabel:Personal_computers_per_100_people\n", "ix:316\tlabel:Personal_computers_total\n", "ix:317\tlabel:Population_growth\n", "ix:318\tlabel:Population_in_urban_agglomerations_more_than_1_million\n", "ix:319\tlabel:Population_total\n", "ix:320\tlabel:Poverty_headcount_ratio_at_national_poverty_line\n", "ix:321\tlabel:Present_value_of_debt\n", "ix:322\tlabel:Primary_completion_rate_total\n", "ix:323\tlabel:Primary_energy_consumption\n", "ix:324\tlabel:Primary_energy_consumption_per_person\n", "ix:325\tlabel:Primary_school_completion_pct_of_boys\n", "ix:326\tlabel:Primary_school_completion_pct_of_girls\n", 
"ix:327\tlabel:Prostate_cancer_deaths_per_100_000_men\n", "ix:328\tlabel:Prostate_cancer_new_cases_per_100_000_men\n", "ix:329\tlabel:Prostate_cancer_number_of_male_deaths\n", "ix:330\tlabel:Prostate_cancer_number_of_new_male_cases\n", "ix:331\tlabel:Pump_price_for_gasoline\n", "ix:332\tlabel:Ratio_of_girls_to_boys_in_primary_and_secondary_education\n", "ix:333\tlabel:Ratio_of_young_literate_females_to_males\n", "ix:334\tlabel:Roads_paved\n", "ix:335\tlabel:SO2_emissions_per_person\n", "ix:336\tlabel:Services_contribution_to_economy\n", "ix:337\tlabel:Stomach_cancer_deaths_per_100_000_men\n", "ix:338\tlabel:Stomach_cancer_deaths_per_100_000_women\n", "ix:339\tlabel:Stomach_cancer_new_cases_per_100_000_men\n", "ix:340\tlabel:Stomach_cancer_new_cases_per_100_000_women\n", "ix:341\tlabel:Stomach_cancer_number_of_female_deaths\n", "ix:342\tlabel:Stomach_cancer_number_of_male_deaths\n", "ix:343\tlabel:Stomach_cancer_number_of_new_female_cases\n", "ix:344\tlabel:Stomach_cancer_number_of_new_male_cases\n", "ix:345\tlabel:Sugar_per_person\n", "ix:346\tlabel:Surface_area\n", "ix:347\tlabel:Tax_revenue\n", "ix:348\tlabel:Total_CO2_emissions\n", "ix:349\tlabel:Total_income\n", "ix:350\tlabel:Total_reserves\n", "ix:351\tlabel:Trade_balance_goods_and_services\n", "ix:352\tlabel:Under_five_mortality_from_CME\n", "ix:353\tlabel:Under_five_mortality_from_IHME\n", "ix:354\tlabel:Under_five_mortality_rate\n", "ix:355\tlabel:Urban_population\n", "ix:356\tlabel:Urban_population_growth\n", "ix:357\tlabel:Urban_population_pct_of_total\n" ] } ], "source": [ "for ix, col in enumerate(df_who.columns): #label\n", " print(\"ix:%i\\tlabel:%s\"%(ix,col))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Si inspeccionamos y comparamos los tipos del dataframe y de las columnas..." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df_who)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "pandas.core.indexes.base.Index" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df_who.columns)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "\n", "\n", "We can use a column's name to retrieve its data, just as we would look up a key in a `python` dictionary. We will see two different ways of doing it:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "0 Afghanistan\n", "1 Albania\n", "2 Algeria\n", "3 Andorra\n", "4 Angola\n", " ... \n", "197 Vietnam\n", "198 West Bank and Gaza\n", "199 Yemen\n", "200 Zambia\n", "201 Zimbabwe\n", "Name: Country, Length: 202, dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "paises = df_who[\"Country\"]\n", "paises" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "**What data type is a column?**" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df_who[\"Country\"])" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "`Series` are the other basic Pandas structure. Both rows and columns are represented as `Series`. A `Series` can be seen as a kind of list that holds a single data type, supports vectorized operations, and can be indexed much like a dictionary." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Once a column is selected, we can access its elements as if it were a list (this works directly here because the DataFrame uses the default integer index):" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Afghanistan\n", "------------------------------\n", "0 Afghanistan\n", "1 Albania\n", "2 Algeria\n", "3 Andorra\n", "4 Angola\n", "Name: Country, dtype: object\n", "------------------------------\n", "['Afghanistan' 'Albania' 'Algeria' 'Andorra' 'Angola'\n", " 'Antigua and Barbuda' 'Argentina' 'Armenia' 'Australia' 'Austria'\n", " 'Azerbaijan' 'Bahamas' 'Bahrain' 'Bangladesh' 'Barbados' 'Belarus'\n", " 'Belgium' 'Belize' 'Benin' 'Bermuda' 'Bhutan' 'Bolivia'\n", " 'Bosnia and Herzegovina' 'Botswana' 'Brazil' 'Brunei Darussalam'\n", " 'Bulgaria' 'Burkina Faso' 'Burundi' 'Cambodia' 'Cameroon' 'Canada'\n", " 'Cape Verde' 'Central African Republic' 'Chad' 'Chile' 'China' 'Colombia'\n", " 'Comoros' 'Congo, Dem. Rep.' 'Congo, Rep.' 'Cook Islands' 'Costa Rica'\n", " \"Cote d'Ivoire\" 'Croatia' 'Cuba' 'Cyprus' 'Czech Republic' 'Denmark'\n", " 'Djibouti' 'Dominica' 'Dominican Republic' 'Ecuador' 'Egypt'\n", " 'El Salvador' 'Equatorial Guinea' 'Eritrea' 'Estonia' 'Ethiopia' 'Fiji'\n", " 'Finland' 'France' 'French Polynesia' 'Gabon' 'Gambia' 'Georgia'\n", " 'Germany' 'Ghana' 'Greece' 'Grenada' 'Guatemala' 'Guinea' 'Guinea-Bissau'\n", " 'Guyana' 'Haiti' 'Honduras' 'Hong Kong, China' 'Hungary' 'Iceland'\n", " 'India' 'Indonesia' 'Iran (Islamic Republic of)' 'Iraq' 'Ireland'\n", " 'Israel' 'Italy' 'Jamaica' 'Japan' 'Jordan' 'Kazakhstan' 'Kenya'\n", " 'Kiribati' 'Korea, Dem. Rep.' 'Korea, Rep.' 
'Kuwait' 'Kyrgyzstan'\n", " \"Lao People's Democratic Republic\" 'Latvia' 'Lebanon' 'Lesotho' 'Liberia'\n", " 'Libyan Arab Jamahiriya' 'Lithuania' 'Luxembourg' 'Macao, China'\n", " 'Macedonia' 'Madagascar' 'Malawi' 'Malaysia' 'Maldives' 'Mali' 'Malta'\n", " 'Marshall Islands' 'Mauritania' 'Mauritius' 'Mexico'\n", " 'Micronesia (Federated States of)' 'Moldova' 'Monaco' 'Mongolia'\n", " 'Montenegro' 'Morocco' 'Mozambique' 'Myanmar' 'Namibia' 'Nauru' 'Nepal'\n", " 'Netherlands' 'Netherlands Antilles' 'New Caledonia' 'New Zealand'\n", " 'Nicaragua' 'Niger' 'Nigeria' 'Niue' 'Norway' 'Oman' 'Pakistan' 'Palau'\n", " 'Panama' 'Papua New Guinea' 'Paraguay' 'Peru' 'Philippines' 'Poland'\n", " 'Portugal' 'Puerto Rico' 'Qatar' 'Romania' 'Russia' 'Rwanda'\n", " 'Saint Kitts and Nevis' 'Saint Lucia' 'Saint Vincent and the Grenadines'\n", " 'Samoa' 'San Marino' 'Sao Tome and Principe' 'Saudi Arabia' 'Senegal'\n", " 'Serbia' 'Seychelles' 'Sierra Leone' 'Singapore' 'Slovakia' 'Slovenia'\n", " 'Solomon Islands' 'Somalia' 'South Africa' 'Spain' 'Sri Lanka' 'Sudan'\n", " 'Suriname' 'Swaziland' 'Sweden' 'Switzerland' 'Syria' 'Taiwan'\n", " 'Tajikistan' 'Tanzania' 'Thailand' 'Timor-Leste' 'Togo' 'Tonga'\n", " 'Trinidad and Tobago' 'Tunisia' 'Turkey' 'Turkmenistan' 'Tuvalu' 'Uganda'\n", " 'Ukraine' 'United Arab Emirates' 'United Kingdom'\n", " 'United States of America' 'Uruguay' 'Uzbekistan' 'Vanuatu' 'Venezuela'\n", " 'Vietnam' 'West Bank and Gaza' 'Yemen' 'Zambia' 'Zimbabwe']\n" ] } ], "source": [ "print(df_who[\"Country\"][0])\n", "print(\"-\"*30)\n", "print(df_who[\"Country\"][:5]) # slicing\n", "print(\"-\"*30)\n", "print(df_who[\"Country\"].values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a simpler way to select a single column, attribute access (`df_who.Country`), but it only works when the column name is a valid Python identifier, and this dataset has very long names containing spaces and symbols, such as 'Children aged <5 years who received any antimalarial treatment for fever (%)'.\n", "\n", "**Note**: When creating data files, it is important to choose suitable column names." ] },
{ "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Afghanistan\n", "1 Albania\n", "2 Algeria\n", "3 Andorra\n", "4 Angola\n", " ... \n", "197 Vietnam\n", "198 West Bank and Gaza\n", "199 Yemen\n", "200 Zambia\n", "201 Zimbabwe\n", "Name: Country, Length: 202, dtype: object" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.Country" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Population annual growth rate (%)\n", "------------------------------\n", "0 4.0\n", "1 0.6\n", "2 1.5\n", "3 1.0\n", "4 2.8\n", " ... \n", "197 1.4\n", "198 NaN\n", "199 3.0\n", "200 1.9\n", "201 0.8\n", "Name: Population annual growth rate (%), Length: 202, dtype: float64\n" ] } ], "source": [ "print(df_who.columns[9])\n", "print(\"-\"*30)\n", "print(df_who[df_who.columns[9]])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "2 Algeria 3 3 6.0 \n", "3 Andorra 4 2 NaN \n", "4 Angola 5 3 146.0 \n", ".. ... ... ... ... \n", "197 Vietnam 198 6 25.0 \n", "198 West Bank and Gaza 199 1 NaN \n", "199 Yemen 200 1 83.0 \n", "200 Zambia 201 3 161.0 \n", "201 Zimbabwe 202 3 101.0 \n", "\n", " Adult literacy rate (%) \n", "0 28.0 \n", "1 98.7 \n", "2 69.9 \n", "3 NaN \n", "4 67.4 \n", ".. ... \n", "197 90.3 \n", "198 NaN \n", "199 54.1 \n", "200 68.0 \n", "201 89.5 \n", "\n", "[202 rows x 5 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Multiple columns (the first five, selected by position)\n", "df_who[df_who.columns[0:5]]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " CountryID Continent\n", "0 1 1\n", "1 2 2\n", "2 3 3\n", "3 4 2\n", "4 5 3\n", ".. ... ...\n", "197 198 6\n", "198 199 1\n", "199 200 1\n", "200 201 3\n", "201 202 3\n", "\n", "[202 rows x 2 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dos columnas específicas\n", "df_who[[\"CountryID\",\"Continent\"]]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(202, 2)\n", " Adolescent fertility rate (%) Adult literacy rate (%)\n", "0 151.0 28.0\n", "1 27.0 98.7\n", "2 6.0 69.9\n", "3 NaN NaN\n", "4 146.0 67.4\n" ] } ], "source": [ "# Existen dos métodos loc e iloc para acceder a sub-regiones de los datos\n", "# funció: .iloc(filas, columnas)\n", "df2 = df_who.iloc[:,3:5]\n", "print(df2.shape)\n", "print(df2.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Columnas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cuando seleccionamos una columna de un DataFrame, obtenemos una Serie. Las Series tienen ciertas características, como la capacidad de aplicar métodos estadísticos (si son Series numéricas)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 151.0\n", "1 27.0\n", "2 6.0\n", "3 NaN\n", "4 146.0\n", " ... 
\n", "197 25.0\n", "198 NaN\n", "199 83.0\n", "200 161.0\n", "201 101.0\n", "Name: Adolescent fertility rate (%), Length: 202, dtype: float64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fertilitat = df_who[df_who.columns[3]]\n", "\n", "fertilitat" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Min 0.0\n", "Max 199.0\n", "Count 177\n" ] } ], "source": [ "print(\"Min \", fertilitat.min()) # in pandas, explicit iteration is rarely needed: the method works on the whole Series\n", "print(\"Max \", fertilitat.max())\n", "print(\"Count \", fertilitat.count()) # count() ignores NaN values" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "Table of descriptive functions
\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These descriptive statistics help us extract very specific information from the table. For example, suppose we want to know:\n", "\n", "**Which country has the highest CO2 emissions?**" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5776431.5" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co2 = df_who[\"Total_CO2_emissions\"]\n", "co2.max()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'United States of America'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[co2==co2.max()][\"Country\"].values[0]" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 692.50\n", "1 3499.12\n", "2 137535.56\n", "3 NaN\n", "4 8991.46\n", " ... \n", "197 101826.23\n", "198 655.86\n", "199 20148.34\n", "200 2366.94\n", "201 11457.33\n", "Name: Total_CO2_emissions, Length: 202, dtype: float64" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co2 = df_who[\"Total_CO2_emissions\"]\n", "co2" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5776431.5" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co2.max()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 False\n", "4 False\n", " ... 
\n", "197 False\n", "198 False\n", "199 False\n", "200 False\n", "201 False\n", "Name: Total_CO2_emissions, Length: 202, dtype: bool" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "co2==co2.max()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Country CountryID Continent \\\n", "192 United States of America 193 4 \n", "\n", " Adolescent fertility rate (%) Adult literacy rate (%) \\\n", "192 43.0 NaN \n", "\n", " Gross national income per capita (PPP international $) \\\n", "192 44070.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "192 93.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "192 91.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "192 302841.0 1.0 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "192 5776431.5 1.100000e+13 NaN \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "192 -7.140000e+11 8.0 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "192 7.1 8.0 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "192 240000000.0 1.39 80.8 \n", "\n", "[1 rows x 358 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[co2==co2.max()]" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "192 United States of America\n", "Name: Country, dtype: object" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[co2==co2.max()][\"Country\"]" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['United States of America'], dtype=object)" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[co2==co2.max()][\"Country\"].values" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'United States of America'" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[co2==co2.max()][\"Country\"].values[0]" ] }, { "cell_type": 
"code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "112 NaN\n", "24 71.0\n", "15 22.0\n", "Name: Adolescent fertility rate (%), dtype: float64\n" ] }, { "data": { "text/plain": [ "53 48.0\n", "149 28.0\n", "13 135.0\n", "Name: Adolescent fertility rate (%), dtype: float64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# También existen métodos para seleccionar de manera aleatoria muestras dentro de una serie\n", "# Tip: Métodos montecarlo\n", "fertilidad = df_who[df_who.columns[3]]\n", "some = fertilidad.sample(n=3)\n", "print(some)\n", "fertilidad.sample(n=3,random_state=2) #on random_state és la llavor/seed del aleatori" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Filas\n", "\n", "Cada fila tiene un índice. El índice puede ser numérico, alfabético o temporal." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=202, step=1)" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.index" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,\n", " 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,\n", " 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,\n", " 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,\n", " 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,\n", " 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,\n", " 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,\n", " 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,\n", " 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,\n", " 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,\n", " 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,\n", " 143, 144, 145, 
146, 147, 148, 149, 150, 151, 152, 153, 154, 155,\n", " 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,\n", " 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,\n", " 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,\n", " 195, 196, 197, 198, 199, 200, 201])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.index.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just as we selected columns, we can also select rows. To look up a specific row we use the `loc` attribute of the DataFrame." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv(\"http://www.exploredata.net/ftp/WHO.csv\") #dataframe" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Country Afghanistan\n", "CountryID 1\n", "Continent 1\n", "Adolescent fertility rate (%) 151.0\n", "Adult literacy rate (%) 28.0\n", " ... 
\n", "Under_five_mortality_from_IHME 231.9\n", "Under_five_mortality_rate 257.0\n", "Urban_population 5740436.0\n", "Urban_population_growth 5.44\n", "Urban_population_pct_of_total 22.9\n", "Name: 0, Length: 358, dtype: object\n" ] } ], "source": [ "\n", "print(df.loc[0])" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "If what we need are the raw values, we use the `values` attribute:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "array(['Afghanistan', 1, 1, 151.0, 28.0, nan, nan, nan, 26088.0, 4.0,\n", " 23.0, nan, 16.0, 4.0, 47.0, 6.0, 7.2, nan, nan, nan, 14.0, nan,\n", " nan, nan, nan, nan, nan, 10.3, 73.0, 70.0, 83.0, 83.0, nan, 66.0,\n", " 90.0, nan, nan, nan, nan, nan, 20.1, 27.5, 4.4, 4.0, nan, nan,\n", " 900.0, nan, nan, 14930.0, nan, 900.0, 5970.0, 5.0, nan, 97.2, 8.0,\n", " 6.0, 29.0, 23.0, nan, 2.0, 72.5, 0.0, nan, 2.5, 0.0, 5.4, nan, nan,\n", " nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n", " nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n", " nan, nan, nan, nan, nan, nan, nan, nan, 473.0, 443.0, 500.0, 153.0,\n", " 706.0, 134.0, 1269.0, 18.9, 0.3, 1.1, 1.0, 5.9, 26.0, 22.1, 24.8,\n", " 10.0, 32.0, 0.0, 36.0, 36.0, 35.0, 161.0, 165.0, 154.0, 176.0,\n", " 42.0, 43.0, 42.0, 1800.0, 60.0, 16.0, 100.0, 231.0, 257.0, 254.0,\n", " 260.0, 76.0, 6.0, 18.0, 4.6, 59.3, 32.9, nan, 0.01, nan, nan, 17.0,\n", " 22.0, 37.0, 25.0, 30.0, 45.0, nan, nan, nan, nan, 9.8, 3.2, 13.1,\n", " nan, nan, nan, nan, 58.35, 36.1, nan, 108.83, 2750000000.0, 168.0,\n", " 87.0, 42.29, 0.0, 28000000.0, 2.9, 14.3, 11.7, 26.8, 874.0, 2021.0,\n", " 220.0, 0.000878, 0.02, 0.04, 62300000000.0, 4.0, 600000.0, nan,\n", " 3.6, 6.9, 254.0, 511.0, 96.8, nan, nan, 1792633.0, 7.07, nan, nan,\n", " nan, nan, 3.3, 2.8, 5.2, 4.5, 193.0, 236.0, 316.0, 377.0, nan,\n", " 10.3, 33.0, nan, -7.0, nan, nan, 
nan, nan, nan, nan, nan, 12.43,\n", " nan, nan, nan, nan, 5.19, 0.0, nan, 8670.0, 25.05, nan, 20.0, 4.16,\n", " 1.04, 3.3, 20.0, 5.2, nan, nan, nan, 55.66, nan, 49.0, 39.0, nan,\n", " 874.0, nan, 24.48, nan, 165.0, 76.0, 40.0, 90.0, 11.9, 1.0, 43.4,\n", " 12.59, 43.14, 28.0, 18.39, 50.81, 34.26, 3.5, 2.3, 3.7, 2.5, 147.0,\n", " 218.0, 155.0, 233.0, 11.3, 2.7, 12.2, 2.9, 173.0, 675.0, 190.0,\n", " 732.0, nan, nan, nan, nan, 1900.0, nan, nan, 64.0, 0.19, 39.42,\n", " 9.93, nan, nan, nan, nan, nan, nan, nan, nan, nan, 8242.0, 66826.0,\n", " nan, nan, nan, nan, nan, nan, 717.04, nan, nan, nan, nan, nan, nan,\n", " nan, nan, 29900000.0, nan, nan, 37.73, nan, nan, nan, nan, 2.8,\n", " 4.5, 151.0, 249.0, 0.68, 55.57, 36.2, 23.66, 3.14, 39.42, 15.8,\n", " 8.3, 18.5, 9.7, 499.0, 936.0, 592.0, 1108.0, nan, 652090.0, nan,\n", " 692.5, nan, nan, nan, 257.0, 231.9, 257.0, 5740436.0, 5.44, 22.9],\n", " dtype=object)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[0].values" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Afghanistan\n", "Afghanistan\n", "151.0\n" ] } ], "source": [ "print(df_who.loc[0].Country) \n", "print(df_who.loc[0].iloc[0]) # positional access within the row\n", "print(df_who.loc[0].iloc[3])" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "#### *Slicing*\n", "\n", "Using the `loc` attribute of the DataFrame we can select and filter rows (and columns) by label. For rows, the label is the index, and if it is numeric we can use the usual `Python` _slicing_ syntax.\n", "\n", "```\n", ".loc[row_labels, column_labels]\n", "```\n", "\n", "As a reminder, _slicing_ works as follows:\n", "```python\n", "sublista = lista[start:stop:step]\n", "```\n", "\n", "Where:\n", "* **start**: Position in the original list where the sublist starts. 
Defaults to 0.\n", "* **stop**: Position in the original list up to which elements are selected. The selection ends at position stop - 1.\n", "* **step**: Increment between consecutive selected indices. Defaults to 1." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "If we understand the concept for a list..." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2]\n", "[4, 5, 6, 7, 9, 0]\n", "[1, 2, 3]\n" ] } ], "source": [ "array = [1, 2, 3, 4, 5, 6, 7, 9, 0]\n", "print(array[0:2]) # first two elements\n", "print(array[3:]) # from position 3 to the end\n", "print(array[:3]) # first three elements" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "... we can apply the same operations to the rows of a _DataFrame_. One difference: `loc` slices by label, so the stop label is included in the result." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "4 Angola 5 3 146.0 \n", "6 Argentina 7 5 62.0 \n", "8 Australia 9 6 16.0 \n", "10 Azerbaijan 11 2 31.0 \n", "\n", " Adult literacy rate (%) \\\n", "4 67.4 \n", "6 97.2 \n", "8 NaN \n", "10 98.8 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "4 3890.0 \n", "6 11670.0 \n", "8 33940.0 \n", "10 5430.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "4 49.0 \n", "6 98.0 \n", "8 97.0 \n", "10 83.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "4 51.0 \n", "6 99.0 \n", "8 96.0 \n", "10 86.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "4 16557.0 2.8 ... \n", "6 39134.0 1.0 ... \n", "8 20530.0 1.1 ... \n", "10 8406.0 0.6 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "4 8991.46 1.490000e+10 27.13 \n", "6 152711.86 3.140000e+11 21.11 \n", "8 368858.53 4.680000e+11 NaN \n", "10 36629.01 9.930000e+09 64.89 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "4 9.140000e+09 164.1 \n", "6 1.190000e+10 18.1 \n", "8 -1.280000e+10 5.9 \n", "10 1.330000e+09 50.0 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "4 242.5 164.1 \n", "6 16.7 18.1 \n", "8 5.1 5.9 \n", "10 63.7 50.0 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "4 8578749.0 4.14 53.3 \n", "6 34900000.0 1.17 90.1 \n", "8 18000000.0 1.54 88.2 \n", "10 4321803.0 1.26 51.5 \n", "\n", "[4 rows x 358 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[4:10:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "También se puede realizar una selección particular mediante una lista:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "3 Andorra 4 2 NaN \n", "10 Azerbaijan 11 2 31.0 \n", "29 Cambodia 30 7 52.0 \n", "34 Chad 35 3 193.0 \n", "\n", " Adult literacy rate (%) \\\n", "3 NaN \n", "10 98.8 \n", "29 73.6 \n", "34 25.7 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "3 NaN \n", "10 5430.0 \n", "29 1550.0 \n", "34 1170.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "3 83.0 \n", "10 83.0 \n", "29 89.0 \n", "34 49.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "3 83.0 \n", "10 86.0 \n", "29 91.0 \n", "34 71.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "3 74.0 1.0 ... \n", "10 8406.0 0.6 ... \n", "29 14197.0 1.7 ... \n", "34 10468.0 3.1 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "3 NaN NaN NaN \n", "10 36629.01 9.930000e+09 64.89 \n", "29 538.61 5.680000e+09 32.94 \n", "34 139.23 2.790000e+09 14.17 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "3 NaN NaN \n", "10 1.330000e+09 50.0 \n", "29 -5.470000e+08 97.3 \n", "34 -2.210000e+08 208.2 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "3 NaN NaN \n", "10 63.7 50.0 \n", "29 94.3 97.3 \n", "34 180.1 208.2 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "3 NaN NaN NaN \n", "10 4321803.0 1.26 51.5 \n", "29 2749235.0 4.58 19.7 \n", "34 2566839.0 4.88 25.3 \n", "\n", "[4 rows x 358 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[[3,10,29,34]]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "### Selección de filas y columnas\n", "\n", "Si seguimos con la misma lógica, usando el atributo `loc` de los _dataFrames_." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv(\"http://www.exploredata.net/ftp/WHO.csv\") #dataframe" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "\n", "[2 rows x 358 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[0:1]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "Las columnas se deben seleccionar con una **lista** que debe contener el nombre de las columnas deseadas." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Continent\n", "0 1\n", "1 2" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[0:1,[\"Continent\"]]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Continent Total_CO2_emissions\n", "0 1 692.50\n", "1 2 3499.12\n", "2 3 137535.56\n", "3 2 NaN" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[0:3,[\"Continent\",\"Total_CO2_emissions\"]]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Country Continent Total_CO2_emissions\n", "0 Afghanistan 1 692.50\n", "3 Andorra 2 NaN\n", "20 Bhutan 7 414.03\n", "100 Liberia 3 472.66" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[[0,3,20,100],[\"Country\",\"Continent\",\"Total_CO2_emissions\"]]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "Alternativamente, con el atributo _iloc_ podemos seleccionar las columnas con su índice numérico: su posicion en la lista de columnas." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "Country Afghanistan\n", "CountryID 1\n", "Continent 1\n", "Adolescent fertility rate (%) 151.0\n", "Adult literacy rate (%) 28.0\n", " ... \n", "Under_five_mortality_from_IHME 231.9\n", "Under_five_mortality_rate 257.0\n", "Urban_population 5740436.0\n", "Urban_population_growth 5.44\n", "Urban_population_pct_of_total 22.9\n", "Name: 0, Length: 358, dtype: object" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Adolescent fertility rate (%) Adult literacy rate (%) \\\n", "0 151.0 28.0 \n", "1 27.0 98.7 \n", "2 6.0 69.9 \n", "3 NaN NaN \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "2 5940.0 \n", "3 NaN \n", "\n", " Net primary school enrolment ratio female (%) \n", "0 NaN \n", "1 93.0 \n", "2 94.0 \n", "3 83.0 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0:4, 3:7] # ídem a una matriz" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "'Afghanistan'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0][0]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "Country Afghanistan\n", "CountryID 1\n", "Continent 1\n", "Adolescent fertility rate (%) 151.0\n", "Name: 0, dtype: object" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0][0:4]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "array(['Afghanistan', 1, 1, 151.0], dtype=object)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0][0:4].values" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "'Afghanistan'" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0][0:4].values[0]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "### Selección condicional\n", "\n", "Además de la 
selección con base a índices, lo interesante es realizar selecciones mediante condiciones lógicas que permiten filtrar las filas del dataset. Por ejemplo:\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country Adult literacy rate (%)\n", "85 Italy 98.4" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who = pd.read_csv(\"http://www.exploredata.net/ftp/WHO.csv\") #dataframe\n", "\n", "alfabetitzacio = df_who[df_who['Adult literacy rate (%)'] > 70][[\"Country\",\"Adult literacy rate (%)\"]]\n", "\n", "\n", "alfabetitzacio[alfabetitzacio[\"Country\"] == \"Italy\"]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 True\n", "2 False\n", "3 False\n", "4 False\n", " ... \n", "197 True\n", "198 False\n", "199 False\n", "200 False\n", "201 True\n", "Name: Adult literacy rate (%), Length: 202, dtype: bool" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who['Adult literacy rate (%)'] > 70" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "seleccio = df_who['Adult literacy rate (%)'] > 70" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country Adult literacy rate (%)\n", "1 Albania 98.7\n", "6 Argentina 97.2\n", "7 Armenia 99.4\n", "10 Azerbaijan 98.8\n", "12 Bahrain 86.5\n", ".. ... ...\n", "193 Uruguay 96.8\n", "195 Vanuatu 75.5\n", "196 Venezuela 93.0\n", "197 Vietnam 90.3\n", "201 Zimbabwe 89.5\n", "\n", "[93 rows x 2 columns]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# From the rows where seleccio is True, keep only the two columns of interest\n", "df_who[seleccio][[\"Country\",\"Adult literacy rate (%)\"]]" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": [ "(array([92]),)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Multiple criteria combined with &:\n", "# CO2 emissions and adolescent fertility rate (column index 3)\n", "import numpy as np\n", "ix = np.where((df_who[\"Total_CO2_emissions\"] > 10) & (df_who[df_who.columns[3]] <= 0.6))\n", "ix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistics on a DataFrame" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 -1 0.748804 0\n", "1 -6 0.498507 2\n", "2 5 0.224797 0\n", "3 -10 0.198063 4\n", "4 7 0.760531 3" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a DataFrame with random values (fixed seed for reproducibility)\n", "import numpy as np\n", "\n", "np.random.seed(10)\n", "\n", "df = pd.DataFrame({\"one\":np.random.randint(-10,10,5),\n", " \"two\":np.random.random(5),\n", " \"three\":np.random.randint(0,5,5)})\n", "df" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4\n", "one -1.000000 -6.000000 5.000000 -10.000000 7.000000\n", "two 0.748804 0.498507 0.224797 0.198063 0.760531\n", "three 0.000000 2.000000 0.000000 4.000000 3.000000" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.T" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "one -5.000000\n", "two 2.430701\n", "three 9.000000\n", "dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sum()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 -0.251196\n", "1 -3.501493\n", "2 5.224797\n", "3 -5.801937\n", "4 10.760531\n", "dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sum(axis=1)  # axis=1: sum across the columns, one result per row" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 -1 0.748804 0\n", "1 -7 1.247311 2\n", "2 -2 1.472108 2\n", "3 -12 1.670170 6\n", "4 -5 2.430701 9" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cumsum()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 -1.0 -0.251196 -0.251196\n", "1 -6.0 -5.501493 -3.501493\n", "2 5.0 5.224797 5.224797\n", "3 -10.0 -9.801937 -5.801937\n", "4 7.0 7.760531 10.760531" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cumsum(axis=1)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 1 0.748804 0\n", "1 7 1.247311 2\n", "2 12 1.472108 2\n", "3 22 1.670170 6\n", "4 29 2.430701 9" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.apply(np.abs,axis=1).cumsum()  # apply: take absolute values first, then accumulate down each column" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 1.0 1.748804 1.748804\n", "1 6.0 6.498507 8.498507\n", "2 5.0 5.224797 5.224797\n", "3 10.0 10.198063 14.198063\n", "4 7.0 7.760531 10.760531" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.apply(np.abs).cumsum(axis=1)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 -1 0.748804 0\n", "1 6 0.373284 0\n", "2 30 0.083913 0\n", "3 -300 0.016620 0\n", "4 -2100 0.012640 0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cumprod()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 -1.0 -0.748804 -0.000000\n", "1 -6.0 -2.991042 -5.982084\n", "2 5.0 1.123983 0.000000\n", "3 -10.0 -1.980629 -7.922515\n", "4 7.0 5.323715 15.971145" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cumprod(axis=1)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 1. 1. 1. 1.]\n" ] } ], "source": [ "ones = np.ones(5) # numpy arrays\n", "print(ones)\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 -2.0 -0.251196 -1.0\n", "1 -7.0 -0.501493 1.0\n", "2 4.0 -0.775203 -1.0\n", "3 -11.0 -0.801937 3.0\n", "4 6.0 -0.239469 2.0" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sub(ones,axis=0)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "Unable to coerce to Series, length must be 3: given 5", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m/Users/isaac/Projects/TxADM_notebooks/notebooks/Part2/00_Pandas/01_Introduccion.ipynb Cell 131\u001b[0m line \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m df\u001b[39m.\u001b[39;49msub(ones)\n", "File \u001b[0;32m~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/core/ops/__init__.py:436\u001b[0m, in \u001b[0;36mflex_arith_method_FRAME..f\u001b[0;34m(self, other, axis, level, fill_value)\u001b[0m\n\u001b[1;32m 433\u001b[0m axis \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_get_axis_number(axis) \u001b[39mif\u001b[39;00m axis \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39melse\u001b[39;00m \u001b[39m1\u001b[39m\n\u001b[1;32m 435\u001b[0m other \u001b[39m=\u001b[39m maybe_prepare_scalar_for_op(other, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mshape)\n\u001b[0;32m--> 436\u001b[0m \u001b[39mself\u001b[39m, other \u001b[39m=\u001b[39m align_method_FRAME(\u001b[39mself\u001b[39;49m, other, axis, flex\u001b[39m=\u001b[39;49m\u001b[39mTrue\u001b[39;49;00m, level\u001b[39m=\u001b[39;49mlevel)\n\u001b[1;32m 438\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39misinstance\u001b[39m(other, ABCDataFrame):\n\u001b[1;32m 439\u001b[0m \u001b[39m# Another DataFrame\u001b[39;00m\n\u001b[1;32m 440\u001b[0m new_data 
\u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_combine_frame(other, na_op, fill_value)\n", "File \u001b[0;32m~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/core/ops/__init__.py:248\u001b[0m, in \u001b[0;36malign_method_FRAME\u001b[0;34m(left, right, axis, flex, level)\u001b[0m\n\u001b[1;32m 245\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39misinstance\u001b[39m(right, np\u001b[39m.\u001b[39mndarray):\n\u001b[1;32m 247\u001b[0m \u001b[39mif\u001b[39;00m right\u001b[39m.\u001b[39mndim \u001b[39m==\u001b[39m \u001b[39m1\u001b[39m:\n\u001b[0;32m--> 248\u001b[0m right \u001b[39m=\u001b[39m to_series(right)\n\u001b[1;32m 250\u001b[0m \u001b[39melif\u001b[39;00m right\u001b[39m.\u001b[39mndim \u001b[39m==\u001b[39m \u001b[39m2\u001b[39m:\n\u001b[1;32m 251\u001b[0m \u001b[39mif\u001b[39;00m right\u001b[39m.\u001b[39mshape \u001b[39m==\u001b[39m left\u001b[39m.\u001b[39mshape:\n", "File \u001b[0;32m~/.pyenv/versions/3.9.7/lib/python3.9/site-packages/pandas/core/ops/__init__.py:239\u001b[0m, in \u001b[0;36malign_method_FRAME..to_series\u001b[0;34m(right)\u001b[0m\n\u001b[1;32m 237\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 238\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(left\u001b[39m.\u001b[39mcolumns) \u001b[39m!=\u001b[39m \u001b[39mlen\u001b[39m(right):\n\u001b[0;32m--> 239\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[1;32m 240\u001b[0m msg\u001b[39m.\u001b[39mformat(req_len\u001b[39m=\u001b[39m\u001b[39mlen\u001b[39m(left\u001b[39m.\u001b[39mcolumns), given_len\u001b[39m=\u001b[39m\u001b[39mlen\u001b[39m(right))\n\u001b[1;32m 241\u001b[0m )\n\u001b[1;32m 242\u001b[0m right \u001b[39m=\u001b[39m left\u001b[39m.\u001b[39m_constructor_sliced(right, index\u001b[39m=\u001b[39mleft\u001b[39m.\u001b[39mcolumns)\n\u001b[1;32m 243\u001b[0m \u001b[39mreturn\u001b[39;00m right\n", "\u001b[0;31mValueError\u001b[0m: Unable to coerce to Series, length must be 3: given 5" ] } ], "source": [ 
"df.sub(ones) #Alerta!" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one two three\n", "0 0.000000 0.966021 -1.006231\n", "1 -0.696733 0.045482 0.111803\n", "2 0.836080 -0.961166 -1.006231\n", "3 -1.254119 -1.059487 1.229837\n", "4 1.114773 1.009150 0.670820" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# z-score normalization (mean 0 and standard deviation 1 per column)\n", "ts_stand = (df - df.mean()) / df.std()\n", "ts_stand" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "one 0.000000e+00\n", "two 2.664535e-16\n", "three 6.661338e-17\n", "dtype: float64\n", "----\n", "one 1.0\n", "two 1.0\n", "three 1.0\n", "dtype: float64\n" ] } ], "source": [ "print(ts_stand.mean())\n", "print(\"----\")\n", "print(ts_stand.std())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-numeric Series" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Afghanistan\n", "1 Albania\n", "2 Algeria\n", "3 Andorra\n", "4 Angola\n", " ... \n", "197 Vietnam\n", "198 West Bank and Gaza\n", "199 Yemen\n", "200 Zambia\n", "201 Zimbabwe\n", "Name: Country, Length: 202, dtype: object" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[\"Country\"] " ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 afghanistan\n", "1 albania\n", "2 algeria\n", "3 andorra\n", "4 angola\n", " ... 
\n", "197 vietnam\n", "198 west bank and gaza\n", "199 yemen\n", "200 zambia\n", "201 zimbabwe\n", "Name: Country, Length: 202, dtype: object" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Python string methods reference: https://www.w3schools.com/python/python_ref_string.asp\n", "df_who.Country.str.casefold()\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0Afghanistan\n", "1 00000Albania\n", "2 00000Algeria\n", "3 00000Andorra\n", "4 000000Angola\n", " ... \n", "197 00000Vietnam\n", "198 West Bank and Gaza\n", "199 0000000Yemen\n", "200 000000Zambia\n", "201 0000Zimbabwe\n", "Name: Country, Length: 202, dtype: object" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.Country.str.zfill(12)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Modifying the DataFrame\n", "\n", "Besides selecting data, we will sometimes need to add new information to our tables.\n", "\n", "Let's create a small dataset to practice with:\n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name type AvgBill\n", "0 Foreign Cinema Restaurant 289.0\n", "1 Liho Liho Restaurant 224.0\n", "2 500 Club bar 80.5\n", "3 The Square bar 25.3" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),\n", " ('Liho Liho', 'Restaurant', 224.0),\n", " ('500 Club', 'bar', 80.5),\n", " ('The Square', 'bar', 25.30)],\n", " columns=('name', 'type', 'AvgBill')\n", " )\n", "df2" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "### Adding columns\n", "\n", "There are several ways to add columns to a _DataFrame_:\n", "\n", "- By assigning to the name of the new column, just as we would add a new key to a dictionary.\n", "- `insert`: a method that takes 3 parameters: the position at which to add the column (`loc`), its name (`column`) and the list of values (`value`).\n", "- `assign`: similar in purpose, but it returns a new _DataFrame_ instead of modifying the original and can add several columns at once.\n", "- `concat`: rarely used to add columns; if we do want to use it for that, we must pass `axis=1`.\n", "\n", "Let's look at some examples:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name type AvgBill Day\n", "0 Foreign Cinema Restaurant 289.0 Monday\n", "1 Liho Liho Restaurant 224.0 Monday\n", "2 500 Club bar 80.5 Monday\n", "3 The Square bar 25.3 Monday" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2['Day'] = \"Monday\" # Como un diccionario\n", "df2" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "ename": "ValueError", "evalue": "Length of values (3) does not match length of index (4)", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[39], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mdf2\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mDay\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m \u001b[38;5;241m=\u001b[39m [\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mMonday\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mTuesday\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mWednesday\u001b[39m\u001b[38;5;124m'\u001b[39m]\n\u001b[1;32m 2\u001b[0m df2\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/frame.py:3980\u001b[0m, in \u001b[0;36mDataFrame.__setitem__\u001b[0;34m(self, key, value)\u001b[0m\n\u001b[1;32m 3977\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_setitem_array([key], value)\n\u001b[1;32m 3978\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 3979\u001b[0m \u001b[38;5;66;03m# set column\u001b[39;00m\n\u001b[0;32m-> 3980\u001b[0m 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_set_item\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/frame.py:4174\u001b[0m, in \u001b[0;36mDataFrame._set_item\u001b[0;34m(self, key, value)\u001b[0m\n\u001b[1;32m 4164\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_set_item\u001b[39m(\u001b[38;5;28mself\u001b[39m, key, value) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 4165\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 4166\u001b[0m \u001b[38;5;124;03m Add series to DataFrame in specified column.\u001b[39;00m\n\u001b[1;32m 4167\u001b[0m \n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 4172\u001b[0m \u001b[38;5;124;03m ensure homogeneity.\u001b[39;00m\n\u001b[1;32m 4173\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m-> 4174\u001b[0m value \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_sanitize_column\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4176\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m (\n\u001b[1;32m 4177\u001b[0m key \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumns\n\u001b[1;32m 4178\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m value\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m 4179\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m is_extension_array_dtype(value)\n\u001b[1;32m 4180\u001b[0m ):\n\u001b[1;32m 4181\u001b[0m \u001b[38;5;66;03m# broadcast across multiple columns if necessary\u001b[39;00m\n\u001b[1;32m 4182\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m 
\u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumns\u001b[38;5;241m.\u001b[39mis_unique \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcolumns, MultiIndex):\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/frame.py:4915\u001b[0m, in \u001b[0;36mDataFrame._sanitize_column\u001b[0;34m(self, value)\u001b[0m\n\u001b[1;32m 4912\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m _reindex_for_setitem(Series(value), \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mindex)\n\u001b[1;32m 4914\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m is_list_like(value):\n\u001b[0;32m-> 4915\u001b[0m \u001b[43mcom\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrequire_length_match\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mindex\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4916\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m sanitize_array(value, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mindex, copy\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m, allow_2d\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/common.py:571\u001b[0m, in \u001b[0;36mrequire_length_match\u001b[0;34m(data, index)\u001b[0m\n\u001b[1;32m 567\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 568\u001b[0m \u001b[38;5;124;03mCheck the length of data matches the length of the index.\u001b[39;00m\n\u001b[1;32m 569\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 570\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(data) \u001b[38;5;241m!=\u001b[39m 
\u001b[38;5;28mlen\u001b[39m(index):\n\u001b[0;32m--> 571\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 572\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mLength of values \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 573\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m(\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(data)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m) \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 574\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdoes not match length of index \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 575\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m(\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(index)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m)\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 576\u001b[0m )\n", "\u001b[0;31mValueError\u001b[0m: Length of values (3) does not match length of index (4)" ] } ], "source": [ "df2['Day'] = ['Monday', 'Tuesday', 'Wednesday']\n", "df2" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name Stars type AvgBill Day\n", "0 Foreign Cinema 2 Restaurant 289.0 Monday\n", "1 Liho Liho 2 Restaurant 224.0 Monday\n", "2 500 Club 3 bar 80.5 Monday\n", "3 The Square 4 bar 25.3 Monday" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use the insert method: it modifies df2 in place, placing the column at position 1\n", "df2.insert(loc=1, column=\"Stars\", value=[2,2,3,4])\n", "df2" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name Stars type AvgBill Day AvgBillIVA\n", "0 Foreign Cinema 2 Restaurant 289.0 Monday 349.690\n", "1 Liho Liho 2 Restaurant 224.0 Monday 271.040\n", "2 500 Club 3 bar 80.5 Monday 97.405\n", "3 The Square 4 bar 25.3 Monday 30.613" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[\"AvgBillIVA\"] = df2[\"AvgBill\"]*1.21\n", "df2" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "df3 = df2.assign(AvgHalfBill=df2.AvgBill / 2, Michelin_Star=3)\n", "df3\n", "\n", "df3[\"HOLA\"] = df3.name.str.capitalize()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name Stars type AvgBill Day AvgHalfBill \\\n", "0 Foreign Cinema 2 Restaurant 289.0 Monday 144.50 \n", "1 Liho Liho 2 Restaurant 224.0 Tuesday 112.00 \n", "2 500 Club 3 bar 80.5 Wednesday 40.25 \n", "3 The Square 4 bar 25.3 Thursday 12.65 \n", "\n", " Michelin_Star HOLA \n", "0 3 Foreign cinema \n", "1 3 Liho liho \n", "2 3 500 club \n", "3 3 The square " ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding rows\n", "\n", "A common way to add a new row to a DataFrame in pandas is to build a DataFrame containing that row and then join it to the original with `pd.concat()`. **Example**:\n", "\n", "```python\n", "import pandas as pd\n", "\n", "# Original DataFrame\n", "df = pd.DataFrame({\n", " \"nombre\": [\"Ana\", \"Luis\"],\n", " \"edad\": [28, 34]\n", "})\n", "\n", "# New row as a DataFrame\n", "nueva_fila = pd.DataFrame({\n", " \"nombre\": [\"Carlos\"],\n", " \"edad\": [30]\n", "})\n", "\n", "# Concatenate and renumber the index\n", "df = pd.concat([df, nueva_fila], ignore_index=True)\n", "```\n", "\n", "`pandas.concat()` joins Series or DataFrame objects. To add a row:\n", "- Pass a list containing the original DataFrame and the new DataFrame.\n", "- `ignore_index=True` renumbers the index of the result starting from 0.\n", "- `concat` never modifies the original DataFrames; it returns a new one, so assign the result to keep it." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "\n", "\n", "### Deleting rows and columns\n", "\n", "\n", "The `drop` method returns a new _DataFrame_ without the row(s) or column(s) we select. \n", "To delete columns, pass the list of column names to the `columns` parameter, as follows:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameStarstypeAvgBillDayAvgHalfBillMichelin_StarHOLA
0Foreign Cinema2Restaurant289.0Monday144.503Foreign cinema
1Liho Liho2Restaurant224.0Monday112.003Liho liho
2500 Club3bar80.5Monday40.253500 club
3The Square4bar25.3Monday12.653The square
\n", "
" ], "text/plain": [ " name Stars type AvgBill Day AvgHalfBill \\\n", "0 Foreign Cinema 2 Restaurant 289.0 Monday 144.50 \n", "1 Liho Liho 2 Restaurant 224.0 Monday 112.00 \n", "2 500 Club 3 bar 80.5 Monday 40.25 \n", "3 The Square 4 bar 25.3 Monday 12.65 \n", "\n", " Michelin_Star HOLA \n", "0 3 Foreign cinema \n", "1 3 Liho liho \n", "2 3 500 club \n", "3 3 The square " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "ename": "KeyError", "evalue": "\"['Stars'] not found in axis\"", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[49], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mdf3\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdrop\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcolumns\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mStars\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43minplace\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# Eliminamos la última columna que hemos creado\u001b[39;00m\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/util/_decorators.py:331\u001b[0m, in \u001b[0;36mdeprecate_nonkeyword_arguments..decorate..wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 325\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(args) \u001b[38;5;241m>\u001b[39m num_allow_args:\n\u001b[1;32m 326\u001b[0m warnings\u001b[38;5;241m.\u001b[39mwarn(\n\u001b[1;32m 327\u001b[0m 
msg\u001b[38;5;241m.\u001b[39mformat(arguments\u001b[38;5;241m=\u001b[39m_format_argument_list(allow_args)),\n\u001b[1;32m 328\u001b[0m \u001b[38;5;167;01mFutureWarning\u001b[39;00m,\n\u001b[1;32m 329\u001b[0m stacklevel\u001b[38;5;241m=\u001b[39mfind_stack_level(),\n\u001b[1;32m 330\u001b[0m )\n\u001b[0;32m--> 331\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/frame.py:5399\u001b[0m, in \u001b[0;36mDataFrame.drop\u001b[0;34m(self, labels, axis, index, columns, level, inplace, errors)\u001b[0m\n\u001b[1;32m 5251\u001b[0m \u001b[38;5;129m@deprecate_nonkeyword_arguments\u001b[39m(version\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m, allowed_args\u001b[38;5;241m=\u001b[39m[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mself\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mlabels\u001b[39m\u001b[38;5;124m\"\u001b[39m])\n\u001b[1;32m 5252\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mdrop\u001b[39m( \u001b[38;5;66;03m# type: ignore[override]\u001b[39;00m\n\u001b[1;32m 5253\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 5260\u001b[0m errors: IgnoreRaise \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mraise\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 5261\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DataFrame \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 5262\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 5263\u001b[0m \u001b[38;5;124;03m Drop specified labels from rows or 
columns.\u001b[39;00m\n\u001b[1;32m 5264\u001b[0m \n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 5397\u001b[0m \u001b[38;5;124;03m weight 1.0 0.8\u001b[39;00m\n\u001b[1;32m 5398\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m-> 5399\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdrop\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 5400\u001b[0m \u001b[43m \u001b[49m\u001b[43mlabels\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mlabels\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5401\u001b[0m \u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43maxis\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5402\u001b[0m \u001b[43m \u001b[49m\u001b[43mindex\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mindex\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5403\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolumns\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcolumns\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5404\u001b[0m \u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mlevel\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5405\u001b[0m \u001b[43m \u001b[49m\u001b[43minplace\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43minplace\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5406\u001b[0m \u001b[43m \u001b[49m\u001b[43merrors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 5407\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/util/_decorators.py:331\u001b[0m, in \u001b[0;36mdeprecate_nonkeyword_arguments..decorate..wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 325\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(args) \u001b[38;5;241m>\u001b[39m num_allow_args:\n\u001b[1;32m 
326\u001b[0m warnings\u001b[38;5;241m.\u001b[39mwarn(\n\u001b[1;32m 327\u001b[0m msg\u001b[38;5;241m.\u001b[39mformat(arguments\u001b[38;5;241m=\u001b[39m_format_argument_list(allow_args)),\n\u001b[1;32m 328\u001b[0m \u001b[38;5;167;01mFutureWarning\u001b[39;00m,\n\u001b[1;32m 329\u001b[0m stacklevel\u001b[38;5;241m=\u001b[39mfind_stack_level(),\n\u001b[1;32m 330\u001b[0m )\n\u001b[0;32m--> 331\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/generic.py:4505\u001b[0m, in \u001b[0;36mNDFrame.drop\u001b[0;34m(self, labels, axis, index, columns, level, inplace, errors)\u001b[0m\n\u001b[1;32m 4503\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m axis, labels \u001b[38;5;129;01min\u001b[39;00m axes\u001b[38;5;241m.\u001b[39mitems():\n\u001b[1;32m 4504\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m labels \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m-> 4505\u001b[0m obj \u001b[38;5;241m=\u001b[39m \u001b[43mobj\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_drop_axis\u001b[49m\u001b[43m(\u001b[49m\u001b[43mlabels\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlevel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mlevel\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43merrors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4507\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m inplace:\n\u001b[1;32m 4508\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_update_inplace(obj)\n", "File 
\u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/generic.py:4546\u001b[0m, in \u001b[0;36mNDFrame._drop_axis\u001b[0;34m(self, labels, axis, level, errors, only_slice)\u001b[0m\n\u001b[1;32m 4544\u001b[0m new_axis \u001b[38;5;241m=\u001b[39m axis\u001b[38;5;241m.\u001b[39mdrop(labels, level\u001b[38;5;241m=\u001b[39mlevel, errors\u001b[38;5;241m=\u001b[39merrors)\n\u001b[1;32m 4545\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m-> 4546\u001b[0m new_axis \u001b[38;5;241m=\u001b[39m \u001b[43maxis\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdrop\u001b[49m\u001b[43m(\u001b[49m\u001b[43mlabels\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43merrors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4547\u001b[0m indexer \u001b[38;5;241m=\u001b[39m axis\u001b[38;5;241m.\u001b[39mget_indexer(new_axis)\n\u001b[1;32m 4549\u001b[0m \u001b[38;5;66;03m# Case for non-unique axis\u001b[39;00m\n\u001b[1;32m 4550\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/indexes/base.py:6934\u001b[0m, in \u001b[0;36mIndex.drop\u001b[0;34m(self, labels, errors)\u001b[0m\n\u001b[1;32m 6932\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m mask\u001b[38;5;241m.\u001b[39many():\n\u001b[1;32m 6933\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m errors \u001b[38;5;241m!=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mignore\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[0;32m-> 6934\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlist\u001b[39m(labels[mask])\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m not found in axis\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 6935\u001b[0m indexer \u001b[38;5;241m=\u001b[39m 
indexer[\u001b[38;5;241m~\u001b[39mmask]\n\u001b[1;32m 6936\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdelete(indexer)\n", "\u001b[0;31mKeyError\u001b[0m: \"['Stars'] not found in axis\"" ] } ], "source": [ "df3.drop(columns=[\"Stars\"],inplace=True) # Eliminamos la última columna que hemos creado\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametypeAvgBillDayAvgHalfBillMichelin_StarHOLA
0Foreign CinemaRestaurant289.0Monday144.503Foreign cinema
1Liho LihoRestaurant224.0Monday112.003Liho liho
2500 Clubbar80.5Monday40.253500 club
3The Squarebar25.3Monday12.653The square
\n", "
" ], "text/plain": [ " name type AvgBill Day AvgHalfBill Michelin_Star \\\n", "0 Foreign Cinema Restaurant 289.0 Monday 144.50 3 \n", "1 Liho Liho Restaurant 224.0 Monday 112.00 3 \n", "2 500 Club bar 80.5 Monday 40.25 3 \n", "3 The Square bar 25.3 Monday 12.65 3 \n", "\n", " HOLA \n", "0 Foreign cinema \n", "1 Liho liho \n", "2 500 club \n", "3 The square " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "Para poder eliminar filas, usamos la misma función, esta vez sin el parámetro que hemos usado anteriormente, simplemente indicamos los índices a eliminar:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameStarstypeAvgBillDayAvgHalfBill
0Foreign Cinema2Restaurant289.0Monday144.50
2500 Club3bar80.5Wednesday40.25
\n", "
" ], "text/plain": [ " name Stars type AvgBill Day AvgHalfBill\n", "0 Foreign Cinema 2 Restaurant 289.0 Monday 144.50\n", "2 500 Club 3 bar 80.5 Wednesday 40.25" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_less_rows = df_no_michelin.drop([1,3])\n", "df_less_rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Eliminación de filas según un criterio\n", "\n", "En ocasiones, al trabajar con datos es necesario depurar un `DataFrame` eliminando aquellas filas que no cumplen ciertas condiciones o que contienen valores no deseados. La eliminación de filas según un criterio permite filtrar la información de forma eficiente, manteniendo únicamente los registros relevantes para el análisis. En pandas, este proceso suele realizarse mediante operaciones lógicas y el filtrado booleano. Estas herramientas facilitan un control preciso sobre qué datos se conservan y cuáles se descartan." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameStarstypeAvgBillDayAvgHalfBillMichelin_Star
2500 Club3bar80.5Wednesday40.253
3The Square4bar25.3Thursday12.653
\n", "
" ], "text/plain": [ " name Stars type AvgBill Day AvgHalfBill Michelin_Star\n", "2 500 Club 3 bar 80.5 Wednesday 40.25 3\n", "3 The Square 4 bar 25.3 Thursday 12.65 3" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs1 = df3[df3.Stars>=3] # per una selecció?\n", "dfs1" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Int64Index([2, 3], dtype='int64')" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3[df3.Stars>=3].index" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameStarstypeAvgBillDayAvgHalfBillMichelin_Star
0Foreign Cinema2Restaurant289.0Monday144.53
1Liho Liho2Restaurant224.0Tuesday112.03
\n", "
" ], "text/plain": [ " name Stars type AvgBill Day AvgHalfBill \\\n", "0 Foreign Cinema 2 Restaurant 289.0 Monday 144.5 \n", "1 Liho Liho 2 Restaurant 224.0 Tuesday 112.0 \n", "\n", " Michelin_Star \n", "0 3 \n", "1 3 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.drop(df3[df3.Stars>=3].index)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameStarstypeAvgBillDayAvgHalfBill
0Foreign Cinema2Restaurant289.0Monday144.50
2500 Club3bar80.5Wednesday40.25
\n", "
" ], "text/plain": [ " name Stars type AvgBill Day AvgHalfBill\n", "0 Foreign Cinema 2 Restaurant 289.0 Monday 144.50\n", "2 500 Club 3 bar 80.5 Wednesday 40.25" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.drop(df3[df3.Stars>=3].index, inplace=True) #Alerta con la integración de los cambios\n", "df3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modificar el valor de un fila según un criterio\n", "\n", "En algunos casos es necesario actualizar los valores de determinadas filas que cumplan una condición específica. En pandas, esto se logra fácilmente usando filtrado booleano junto con asignaciones directas. Este método permite cambiar solo las filas que coincidan con un criterio, manteniendo el resto del *DataFrame* intacto. **Ejemplo**:\n", "\n", "```python\n", "import pandas as pd\n", "\n", "# DataFrame original\n", "df = pd.DataFrame({\n", " \"nombre\": [\"Ana\", \"Luis\", \"Carlos\"],\n", " \"edad\": [28, 34, 30]\n", "})\n", "\n", "# Modificar la edad de la fila donde el nombre es \"Luis\"\n", "df.loc[df[\"nombre\"] == \"Luis\", \"edad\"] = 35\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Concatenación y unión de dataframes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A veces, los datos vienen en diferentes archivos y necesitan ser combinados en un único archivo, este proceso implica la concatenación de dataframes. Otras veces, los datos son complementarios, es decir, hay nuevas columnas en un dataframe, y esto se conoce como realizar operaciones de unión (joins)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Concatenación" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametypeAvgBill
0Foreign CinemaRestaurant289.0
1Liho LihoRestaurant224.0
2500 Clubbar80.5
3The Squarebar25.3
\n", "
" ], "text/plain": [ " name type AvgBill\n", "0 Foreign Cinema Restaurant 289.0\n", "1 Liho Liho Restaurant 224.0\n", "2 500 Club bar 80.5\n", "3 The Square bar 25.3" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),\n", " ('Liho Liho', 'Restaurant', 224.0),\n", " ('500 Club', 'bar', 80.5),\n", " ('The Square', 'bar', 25.30)],\n", " columns=('name', 'type', 'AvgBill')\n", " )\n", "df1" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
name2typeAvgBill
0BielsQuiosquet389.0
1BarkusBar24.0
2Blue Wallbar80.5
3Bounty HuntersSocial Club125.3
\n", "
" ], "text/plain": [ " name2 type AvgBill\n", "0 Biels Quiosquet 389.0\n", "1 Barkus Bar 24.0\n", "2 Blue Wall bar 80.5\n", "3 Bounty Hunters Social Club 125.3" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame([('Biels', 'Quiosquet', 389.0),\n", " ('Barkus', 'Bar', 24.0),\n", " ('Blue Wall', 'bar', 80.5),\n", " ('Bounty Hunters', 'Social Club', 125.30)],\n", " columns=('name2', 'type', 'AvgBill')\n", " )\n", "df2" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametypeAvgBillname2
0Foreign CinemaRestaurant289.0NaN
1Liho LihoRestaurant224.0NaN
2500 Clubbar80.5NaN
3The Squarebar25.3NaN
0NaNQuiosquet389.0Biels
1NaNBar24.0Barkus
2NaNbar80.5Blue Wall
3NaNSocial Club125.3Bounty Hunters
\n", "
" ], "text/plain": [ " name type AvgBill name2\n", "0 Foreign Cinema Restaurant 289.0 NaN\n", "1 Liho Liho Restaurant 224.0 NaN\n", "2 500 Club bar 80.5 NaN\n", "3 The Square bar 25.3 NaN\n", "0 NaN Quiosquet 389.0 Biels\n", "1 NaN Bar 24.0 Barkus\n", "2 NaN bar 80.5 Blue Wall\n", "3 NaN Social Club 125.3 Bounty Hunters" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfAll = pd.concat([df1,df2])\n", "dfAll" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexnametypeAvgBill
00Foreign CinemaRestaurant289.0
11Liho LihoRestaurant224.0
22500 Clubbar80.5
33The Squarebar25.3
40BielsQuiosquet389.0
51BarkusBar24.0
62Blue Wallbar80.5
73Bounty HuntersSocial Club125.3
\n", "
" ], "text/plain": [ " index name type AvgBill\n", "0 0 Foreign Cinema Restaurant 289.0\n", "1 1 Liho Liho Restaurant 224.0\n", "2 2 500 Club bar 80.5\n", "3 3 The Square bar 25.3\n", "4 0 Biels Quiosquet 389.0\n", "5 1 Barkus Bar 24.0\n", "6 2 Blue Wall bar 80.5\n", "7 3 Bounty Hunters Social Club 125.3" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfAll = dfAll.reset_index()\n", "dfAll" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dfAll.drop(columns=[\"index\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# https://pandas.pydata.org/docs/reference/api/pandas.concat.html\n", "dfAll = pd.concat([df1,df2],ignore_index=True)\n", "dfAll" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unión\n", "\n", "Más información: https://pandas.pydata.org/docs/user_guide/merging.html" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameIDCountry
0Jhon1Italy
1Pep2Germany
2William3Finland
3Snake4Italy
\n", "
" ], "text/plain": [ " name ID Country\n", "0 Jhon 1 Italy\n", "1 Pep 2 Germany\n", "2 William 3 Finland\n", "3 Snake 4 Italy" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame([('Jhon', 1, \"Italy\"),\n", " ('Pep', 2, \"Germany\"),\n", " ('William', 3, \"Finland\"),\n", " ('Snake', 4, \"Italy\")],\n", " columns=('name', 'ID', 'Country')\n", " )\n", "df1" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DNIWeightSalary
01145.03000.1
12189.22030.2
23129.03000.0
34198.14020.2
\n", "
" ], "text/plain": [ " DNI Weight Salary\n", "0 1 145.0 3000.1\n", "1 2 189.2 2030.2\n", "2 3 129.0 3000.0\n", "3 4 198.1 4020.2" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame([(1, 145.0, 3000.1),\n", " ( 2, 189.2, 2030.2),\n", " ( 3, 129.0, 3000.0),\n", " ( 4, 198.1, 4020.2)],\n", " columns=('DNI', 'Weight', 'Salary')\n", " )\n", "df2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "ename": "MergeError", "evalue": "No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mMergeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[56], line 3\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;66;03m# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html\u001b[39;00m\n\u001b[0;32m----> 3\u001b[0m \u001b[43mdf1\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmerge\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdf2\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/frame.py:10093\u001b[0m, in \u001b[0;36mDataFrame.merge\u001b[0;34m(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)\u001b[0m\n\u001b[1;32m 10074\u001b[0m \u001b[38;5;129m@Substitution\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 10075\u001b[0m \u001b[38;5;129m@Appender\u001b[39m(_merge_doc, indents\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m2\u001b[39m)\n\u001b[1;32m 10076\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mmerge\u001b[39m(\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 10089\u001b[0m validate: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m 
\u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 10090\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DataFrame:\n\u001b[1;32m 10091\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mpandas\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcore\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mreshape\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mmerge\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m merge\n\u001b[0;32m> 10093\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mmerge\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 10094\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10095\u001b[0m \u001b[43m \u001b[49m\u001b[43mright\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10096\u001b[0m \u001b[43m \u001b[49m\u001b[43mhow\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mhow\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10097\u001b[0m \u001b[43m \u001b[49m\u001b[43mon\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mon\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10098\u001b[0m \u001b[43m \u001b[49m\u001b[43mleft_on\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mleft_on\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10099\u001b[0m \u001b[43m \u001b[49m\u001b[43mright_on\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mright_on\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10100\u001b[0m \u001b[43m \u001b[49m\u001b[43mleft_index\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mleft_index\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10101\u001b[0m \u001b[43m \u001b[49m\u001b[43mright_index\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mright_index\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10102\u001b[0m \u001b[43m \u001b[49m\u001b[43msort\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msort\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 
10103\u001b[0m \u001b[43m \u001b[49m\u001b[43msuffixes\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msuffixes\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10104\u001b[0m \u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcopy\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10105\u001b[0m \u001b[43m \u001b[49m\u001b[43mindicator\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mindicator\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10106\u001b[0m \u001b[43m \u001b[49m\u001b[43mvalidate\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvalidate\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 10107\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/reshape/merge.py:110\u001b[0m, in \u001b[0;36mmerge\u001b[0;34m(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[38;5;129m@Substitution\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124mleft : DataFrame or named Series\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 94\u001b[0m \u001b[38;5;129m@Appender\u001b[39m(_merge_doc, indents\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0\u001b[39m)\n\u001b[1;32m 95\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mmerge\u001b[39m(\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 108\u001b[0m validate: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 109\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DataFrame:\n\u001b[0;32m--> 110\u001b[0m op \u001b[38;5;241m=\u001b[39m \u001b[43m_MergeOperation\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 111\u001b[0m \u001b[43m \u001b[49m\u001b[43mleft\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 112\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mright\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 113\u001b[0m \u001b[43m \u001b[49m\u001b[43mhow\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mhow\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 114\u001b[0m \u001b[43m \u001b[49m\u001b[43mon\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mon\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 115\u001b[0m \u001b[43m \u001b[49m\u001b[43mleft_on\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mleft_on\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 116\u001b[0m \u001b[43m \u001b[49m\u001b[43mright_on\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mright_on\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 117\u001b[0m \u001b[43m \u001b[49m\u001b[43mleft_index\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mleft_index\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 118\u001b[0m \u001b[43m \u001b[49m\u001b[43mright_index\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mright_index\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 119\u001b[0m \u001b[43m \u001b[49m\u001b[43msort\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msort\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 120\u001b[0m \u001b[43m \u001b[49m\u001b[43msuffixes\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msuffixes\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 121\u001b[0m \u001b[43m \u001b[49m\u001b[43mindicator\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mindicator\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 122\u001b[0m \u001b[43m \u001b[49m\u001b[43mvalidate\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvalidate\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 123\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 124\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m op\u001b[38;5;241m.\u001b[39mget_result(copy\u001b[38;5;241m=\u001b[39mcopy)\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/reshape/merge.py:685\u001b[0m, in 
\u001b[0;36m_MergeOperation.__init__\u001b[0;34m(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, indicator, validate)\u001b[0m\n\u001b[1;32m 681\u001b[0m \u001b[38;5;66;03m# stacklevel chosen to be correct when this is reached via pd.merge\u001b[39;00m\n\u001b[1;32m 682\u001b[0m \u001b[38;5;66;03m# (and not DataFrame.join)\u001b[39;00m\n\u001b[1;32m 683\u001b[0m warnings\u001b[38;5;241m.\u001b[39mwarn(msg, \u001b[38;5;167;01mFutureWarning\u001b[39;00m, stacklevel\u001b[38;5;241m=\u001b[39mfind_stack_level())\n\u001b[0;32m--> 685\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mleft_on, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mright_on \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_validate_left_right_on\u001b[49m\u001b[43m(\u001b[49m\u001b[43mleft_on\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mright_on\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 687\u001b[0m cross_col \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 688\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhow \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcross\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n", "File \u001b[0;32m~/.pyenv/versions/3.11.0rc2/envs/my3110/lib/python3.11/site-packages/pandas/core/reshape/merge.py:1434\u001b[0m, in \u001b[0;36m_MergeOperation._validate_left_right_on\u001b[0;34m(self, left_on, right_on)\u001b[0m\n\u001b[1;32m 1432\u001b[0m common_cols \u001b[38;5;241m=\u001b[39m left_cols\u001b[38;5;241m.\u001b[39mintersection(right_cols)\n\u001b[1;32m 1433\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(common_cols) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[0;32m-> 1434\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m MergeError(\n\u001b[1;32m 1435\u001b[0m 
\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNo common columns to perform merge on. \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1436\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMerge options: left_on=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mleft_on\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1437\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mright_on=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mright_on\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1438\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mleft_index=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mleft_index\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1439\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mright_index=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mright_index\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1440\u001b[0m )\n\u001b[1;32m 1441\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m (\n\u001b[1;32m 1442\u001b[0m \u001b[38;5;129;01mnot\u001b[39;00m left_cols\u001b[38;5;241m.\u001b[39mjoin(common_cols, how\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minner\u001b[39m\u001b[38;5;124m\"\u001b[39m)\u001b[38;5;241m.\u001b[39mis_unique\n\u001b[1;32m 1443\u001b[0m \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m right_cols\u001b[38;5;241m.\u001b[39mjoin(common_cols, how\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minner\u001b[39m\u001b[38;5;124m\"\u001b[39m)\u001b[38;5;241m.\u001b[39mis_unique\n\u001b[1;32m 1444\u001b[0m ):\n\u001b[1;32m 1445\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m 
MergeError(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mData columns not unique: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mrepr\u001b[39m(common_cols)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", "\u001b[0;31mMergeError\u001b[0m: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False" ] } ], "source": [ "# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html\n", "\n", "df1.merge(df2)  # WARNING: this fails, df1 and df2 share no column names, so the join keys must be passed explicitly" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name ID Country DNI Weight Salary\n", "0 Jhon 1 Italy 1 145.0 3000.1\n", "1 Pep 2 Germany 2 189.2 2030.2\n", "2 William 3 Finland 3 129.0 3000.0\n", "3 Snake 4 Italy 4 198.1 4020.2" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.merge(df2, left_on='ID', right_on='DNI')" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " DNI Weight Salary\n", "0 1 145.0 3000.1\n", "1 3 129.0 3000.0\n", "2 3 159.0 4000.0\n", "3 3 109.0 5000.0\n", "4 4 198.1 4020.2\n", "5 5 200.0 5000.2" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame([(1, 145.0, 3000.1), #No 2\n", " ( 3, 129.0, 3000.0), # Multiples 3\n", " ( 3, 159.0, 4000.0),\n", " ( 3, 109.0, 5000.0),\n", " ( 4, 198.1, 4020.2),\n", " ( 5, 200.0, 5000.2)], #a new one \n", " columns=('DNI', 'Weight', 'Salary')\n", " )\n", "df2" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name ID Country DNI Weight Salary\n", "0 Jhon 1 Italy 1 145.0 3000.1\n", "1 William 3 Finland 3 129.0 3000.0\n", "2 William 3 Finland 3 159.0 4000.0\n", "3 William 3 Finland 3 109.0 5000.0\n", "4 Snake 4 Italy 4 198.1 4020.2" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.merge(df2, left_on='ID', right_on='DNI')" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name ID Country DNI Weight Salary\n", "0 Jhon 1 Italy 1.0 145.0 3000.1\n", "1 Pep 2 Germany NaN NaN NaN\n", "2 William 3 Finland 3.0 129.0 3000.0\n", "3 William 3 Finland 3.0 159.0 4000.0\n", "4 William 3 Finland 3.0 109.0 5000.0\n", "5 Snake 4 Italy 4.0 198.1 4020.2" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.merge(df2, left_on='ID', right_on='DNI',how=\"left\")" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name ID Country DNI Weight Salary\n", "0 Jhon 1.0 Italy 1 145.0 3000.1\n", "1 William 3.0 Finland 3 129.0 3000.0\n", "2 William 3.0 Finland 3 159.0 4000.0\n", "3 William 3.0 Finland 3 109.0 5000.0\n", "4 Snake 4.0 Italy 4 198.1 4020.2\n", "5 NaN NaN NaN 5 200.0 5000.2" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.merge(df2, left_on='ID', right_on='DNI',how=\"right\")" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name ID Country DNI Weight Salary\n", "0 Jhon 1 Italy 1 145.0 3000.1\n", "1 William 3 Finland 3 129.0 3000.0\n", "2 William 3 Finland 3 159.0 4000.0\n", "3 William 3 Finland 3 109.0 5000.0\n", "4 Snake 4 Italy 4 198.1 4020.2" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.merge(df2, left_on='ID', right_on='DNI',how=\"inner\")" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name ID Country DNI Weight Salary\n", "0 Jhon 1 Italy 1 145.0 3000.1\n", "1 Jhon 1 Italy 3 129.0 3000.0\n", "2 Jhon 1 Italy 3 159.0 4000.0\n", "3 Jhon 1 Italy 3 109.0 5000.0\n", "4 Jhon 1 Italy 4 198.1 4020.2\n", "5 Jhon 1 Italy 5 200.0 5000.2\n", "6 Pep 2 Germany 1 145.0 3000.1\n", "7 Pep 2 Germany 3 129.0 3000.0\n", "8 Pep 2 Germany 3 159.0 4000.0\n", "9 Pep 2 Germany 3 109.0 5000.0\n", "10 Pep 2 Germany 4 198.1 4020.2\n", "11 Pep 2 Germany 5 200.0 5000.2\n", "12 William 3 Finland 1 145.0 3000.1\n", "13 William 3 Finland 3 129.0 3000.0\n", "14 William 3 Finland 3 159.0 4000.0\n", "15 William 3 Finland 3 109.0 5000.0\n", "16 William 3 Finland 4 198.1 4020.2\n", "17 William 3 Finland 5 200.0 5000.2\n", "18 Snake 4 Italy 1 145.0 3000.1\n", "19 Snake 4 Italy 3 129.0 3000.0\n", "20 Snake 4 Italy 3 159.0 4000.0\n", "21 Snake 4 Italy 3 109.0 5000.0\n", "22 Snake 4 Italy 4 198.1 4020.2\n", "23 Snake 4 Italy 5 200.0 5000.2" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.merge(df2,how=\"cross\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Ejercicios\n", "\n", "\n", "Usando el fichero who.csv, se pide:
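
Most of the questions in this section come down to three patterns: aggregating a column, filtering rows with a boolean mask, and comparing scalar values taken from two rows. The sketch below illustrates them on a small made-up frame. The `toy` DataFrame and all of its values are hypothetical and only borrow a few column names from the WHO file, so treat it as a hint rather than a solution.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the real WHO file.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Continent': [1, 2, 2, 3],
    'Urban_population': [100.0, 250.0, None, 40.0],
})

# Column aggregations skip missing values (NaN) by default.
mean_pop = toy['Urban_population'].mean()
std_pop = toy['Urban_population'].std()  # sample standard deviation (ddof=1)

# Boolean mask: keep only the rows where the condition holds.
# Comparisons against NaN are False, so country C is dropped here.
small = toy[toy['Urban_population'] < 150]

# Row holding the maximum of a column, located through its index label.
largest = toy.loc[toy['Urban_population'].idxmax()]

# Extract one scalar per country and compare them to get a plain bool.
continent_b = toy.loc[toy['Country'] == 'B', 'Continent'].iloc[0]
continent_c = toy.loc[toy['Country'] == 'C', 'Continent'].iloc[0]
same_continent = bool(continent_b == continent_c)

# The n rows with the largest values of a column, largest first.
top2 = toy.nlargest(2, 'Urban_population')
```

Selecting with `.loc[mask, column].iloc[0]` returns a scalar directly, which avoids comparing two one-row Series whose index labels differ.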
" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "2 Algeria 3 3 6.0 \n", "3 Andorra 4 2 NaN \n", "4 Angola 5 3 146.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "2 69.9 \n", "3 NaN \n", "4 67.4 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "2 5940.0 \n", "3 NaN \n", "4 3890.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "2 94.0 \n", "3 83.0 \n", "4 49.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "2 96.0 \n", "3 83.0 \n", "4 51.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "2 33351.0 1.5 ... \n", "3 74.0 1.0 ... \n", "4 16557.0 2.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "2 137535.56 6.970000e+10 351.36 \n", "3 NaN NaN NaN \n", "4 8991.46 1.490000e+10 27.13 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "2 4.700000e+09 40.00 \n", "3 NaN NaN \n", "4 9.140000e+09 164.10 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "2 31.2 40.00 \n", "3 NaN NaN \n", "4 242.5 164.10 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "2 20800000.0 2.61 63.3 \n", "3 NaN NaN NaN \n", "4 8578749.0 4.14 53.3 \n", "\n", "[5 rows x 358 columns]" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read file csv\n", "import pandas as pd\n", "df_who = pd.read_csv(\"data/WHO.csv\")\n", "df_who.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, 
"pycharm": { "name": "#%% md\n" } }, "source": [ "\n", "**1) ¿Cuál és la media de la población urbana (\"Urban_population\") de todos los países? ¿Su desviación típica (std)?**" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16657626.767446807" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[\"Urban_population\"].mean()" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "50948665.823935635" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[\"Urban_population\"].std()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**2) Consulta la fila del país: “Spain”**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_who[df_who[\"Country\"]==\"Spain\"]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**3a) ¿Qué país tiene una mayor población urbana?**" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "36 China 37 7 3.0 \n", "\n", " Adult literacy rate (%) \\\n", "36 90.9 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "36 4660.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "36 96.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "36 100.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "36 1328474.0 0.6 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "36 5547757.5 1.890000e+12 295.23 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "36 1.250000e+11 27.3 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "36 27.8 27.3 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "36 527000000.0 2.95 40.4 \n", "\n", "[1 rows x 358 columns]" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[df_who[\"Urban_population\"] == df_who[\"Urban_population\"].max()]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**3b) ¿Qué paises tienen una población urbana menor a 50000 ?**" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent \\\n", "5 Antigua and Barbuda 6 4 \n", "69 Grenada 70 5 \n", "91 Kiribati 92 6 \n", "151 Saint Kitts and Nevis 152 5 \n", "152 Saint Lucia 153 5 \n", "154 Samoa 155 6 \n", "160 Seychelles 161 3 \n", "182 Tonga 183 6 \n", "\n", " Adolescent fertility rate (%) Adult literacy rate (%) \\\n", "5 NaN NaN \n", "69 53.0 NaN \n", "91 NaN NaN \n", "151 NaN NaN \n", "152 51.0 NaN \n", "154 45.0 98.6 \n", "160 NaN 91.8 \n", "182 17.0 98.9 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "5 15130.0 \n", "69 8770.0 \n", "91 6230.0 \n", "151 12440.0 \n", "152 8500.0 \n", "154 5090.0 \n", "160 14360.0 \n", "182 5470.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "5 NaN \n", "69 83.0 \n", "91 98.0 \n", "151 78.0 \n", "152 97.0 \n", "154 91.0 \n", "160 100.0 \n", "182 94.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "5 NaN \n", "69 84.0 \n", "91 96.0 \n", "151 64.0 \n", "152 99.0 \n", "154 90.0 \n", "160 99.0 \n", "182 97.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "5 84.0 1.3 ... \n", "69 106.0 0.3 ... \n", "91 94.0 1.7 ... \n", "151 50.0 1.3 ... \n", "152 163.0 1.1 ... \n", "154 185.0 0.8 ... \n", "160 86.0 0.7 ... \n", "182 100.0 0.5 ... 
\n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "5 421.36 830000000.0 NaN \n", "69 234.50 414000000.0 23.19 \n", "91 25.65 51900000.0 NaN \n", "151 135.57 387000000.0 24.28 \n", "152 370.06 764000000.0 27.39 \n", "154 150.22 286000000.0 12.46 \n", "160 578.91 563000000.0 8.34 \n", "182 117.25 167000000.0 57.30 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "5 -102000000.0 12.60 \n", "69 -220000000.0 22.00 \n", "91 -20800000.0 66.00 \n", "151 -84300000.0 21.00 \n", "152 -119000000.0 17.70 \n", "154 -116000000.0 29.98 \n", "160 -187000000.0 13.54 \n", "182 -94600000.0 24.48 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "5 NaN 12.60 \n", "69 NaN 22.00 \n", "91 NaN 66.00 \n", "151 NaN 21.00 \n", "152 14.0 17.70 \n", "154 NaN 29.98 \n", "160 NaN 13.54 \n", "182 NaN 24.48 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "5 32468.25 2.25 39.1 \n", "69 32589.00 0.45 30.6 \n", "91 46926.00 3.08 47.4 \n", "151 15456.00 1.77 32.2 \n", "152 45482.32 1.15 27.6 \n", "154 41181.28 1.18 22.4 \n", "160 43854.10 1.20 52.9 \n", "182 23846.64 1.04 24.0 \n", "\n", "[8 rows x 358 columns]" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who[df_who[\"Urban_population\"] < 50000]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**4) Is Spain on the same continent as `United States of America`?**\n", "\n", "Use a condition to obtain a Boolean result (*True* or *False*)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "168 Spain 169 2 10.0 \n", "\n", " Adult literacy rate (%) \\\n", "168 97.2 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "168 28200.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "168 99.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "168 100.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "168 43887.0 1.1 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "168 343701.53 6.780000e+11 NaN \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "168 -5.770000e+10 4.9 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "168 4.2 4.9 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "168 33300000.0 1.75 76.7 \n", "\n", "[1 rows x 358 columns]" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_spain = df_who[df_who[\"Country\"]==\"Spain\"]\n", "sample_spain" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Country CountryID Continent \\\n", "192 United States of America 193 4 \n", "\n", " Adolescent fertility rate (%) Adult literacy rate (%) \\\n", "192 43.0 NaN \n", "\n", " Gross national income per capita (PPP international $) \\\n", "192 44070.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "192 93.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "192 91.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "192 302841.0 1.0 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "192 5776431.5 1.100000e+13 NaN \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "192 -7.140000e+11 8.0 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "192 7.1 8.0 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "192 240000000.0 1.39 80.8 \n", "\n", "[1 rows x 358 columns]" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_eeuu = df_who[df_who[\"Country\"]==\"United States of America\"]\n", "sample_eeuu" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_spain[\"Continent\"].values[0] == sample_eeuu[\"Continent\"].values[0]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**5) ¿Cuáles son los cinco paises más contaminantes (\"Total_CO2_emissions\")?**\n", "\n", "Esta es mi pista para una solución elegante: http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.sort_values.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } 
}, "source": [ "**6) By looking at some samples from the file, can you work out the mapping between each continent identifier and its name?**\n", "\n", "For example, we know that Spain is in Europe and that its continent code is 2. \n", "\n", "The continent codes are: 1, 2, 3, 4, 5, 6, 7\n", "\n", "**Note:** Two codes are associated with Asia.\n", "\n", "Query the DataFrame as needed to build a dictionary with the following structure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "codigoContinentes = {1:\"Asia\",2:\"Europa\"} # There are at least 7 codes!\n", "print(codigoContinentes[2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "codigoContinentes = {1:\"Asia\",2:\"Europa\",3:\"\", 4:\"\"} # TODO... " ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**7) Once you have identified the continent names, can you replace the column of continent identifiers with their respective names?**\n", "\n", "Here is my hint for an elegant solution: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**8) Can you create a new DataFrame containing only the countries in Europe?**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "df2 = df # Note: plain assignment only binds a second name to the same DataFrame; it does not copy or filter it \n", "type(df2)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "**9) 
Which are the most polluting countries in Europe?**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**10) Compute the amount of aid received by each municipality relative to its total number of inhabitants.**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataframe de Municipios:\n", " Nombre Código Postal Población\n", "0 Municipio1 60494 39232\n", "1 Municipio2 65125 15315\n", "2 Municipio3 15306 34075\n", "3 Municipio4 43936 10127\n", "4 Municipio5 77013 19470\n", "5 Municipio6 73691 10158\n", "6 Municipio7 63075 7214\n", "7 Municipio8 49755 41525\n", "8 Municipio9 72468 17417\n", "9 Municipio10 56930 35902\n", "\n", "Dataframe de Ayudas:\n", " Nombre Ayuda Económica (en euros) Número de Beneficiarios\n", "0 Municipio1 3407 88\n", "1 Municipio2 6081 91\n", "2 Municipio3 2618 36\n", "3 Municipio4 2208 80\n", "4 Municipio5 6409 71\n", "5 Municipio6 8735 66\n", "6 Municipio7 2649 76\n", "7 Municipio8 6796 43\n", "8 Municipio9 8113 17\n", "9 Municipio10 6180 80\n" ] } ], "source": [ "import pandas as pd\n", "import random\n", "\n", "random.seed(0)\n", "\n", "nombres = [f'Municipio{i}' for i in range(1, 11)] \n", "\n", "data_municipios = {\n", " 'Nombre': nombres,\n", " 'Código Postal': [random.randint(10000, 99999) for _ in range(10)],\n", " 'Población': [random.randint(1000, 50000) for _ in range(10)] # Add a randomly generated attribute, in this case \"Población\" (population)\n", "}\n", "\n", "df_municipios = pd.DataFrame(data_municipios)\n", "\n", "\n", "data_ayudas = {\n", " 'Nombre': [f'Municipio{i}' for i in range(1, 11)],\n", " 'Ayuda Económica (en euros)': [random.randint(1000, 10000) for _ in range(10)],\n", " 'Número de Beneficiarios': [random.randint(10, 100) for _ in range(10)]\n", "}\n", "\n", "df_ayudas = 
pd.DataFrame(data_ayudas)\n", "\n", "print(\"Dataframe de Municipios:\")\n", "print(df_municipios)\n", "\n", "print(\"\\nDataframe de Ayudas:\")\n", "print(df_ayudas)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the file climaMallorca.csv, answer the following:
\n", "**11) What is the maximum temperature when the wind is below 10? How many samples are there?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**12) What is the maximum temperature when the wind is above 10 and below 20? How many samples are there?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
\n", "Isaac Lera and Gabriel Moya
\n", "Universitat de les Illes Balears
\n", "isaac.lera@uib.edu, gabriel.moya@uib.edu" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "my3110", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 1 }