{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Agrupaciones de datos\n",
    "\n",
    "Las agrupaciones son operaciones necesarias para analizar datos, ya que permiten extraer información en función de datos categóricos de nuestro dataframe.\n",
    "\n",
    "Cargaremos los datos llamados _experiment.csv_ que podemos encontrar en el siguiente [enlace](https://github.com/wisaaco/TTAD/tree/main/data/)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "slide"
    },
    "ExecuteTime": {
     "end_time": "2025-11-14T08:12:59.745855Z",
     "start_time": "2025-11-14T08:12:59.240653Z"
    }
   },
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "df= pd.read_csv(\"../../data/Pandas/experiment.csv\")\n",
    "print(df)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Nombre Apellidos  Altura Sexo  Nacimiento   Cof Categoria\n",
      "0    Will     Smith    1.43    M  10/10/1920  0.19   laboral\n",
      "1     Jon      Snow    1.98    M   10/1/1970  0.98   laboral\n",
      "2    Laia   Ramirez    1.87    F  09/10/1987  0.76      cap6\n",
      "3    Luzy      Raim    1.67    F  23/07/1979  0.56      cap6\n",
      "4    Fein      Mang    1.78    M  12/03/1937  0.27      cap6\n",
      "5  Victor     Colom    1.78    M  22/09/1957  0.97      cap8\n"
     ]
    }
   ],
   "execution_count": 1
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "En el siguiente ejemplo agrupamos los datos según el sexo de la persona mediante el método `groupby` que devuelve un `DataFrame` agrupado:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    },
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:00.879969Z",
     "start_time": "2025-11-14T08:13:00.876274Z"
    }
   },
   "source": [
    "bySex = df.groupby('Sexo')\n",
    "bySex"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10af7cfe0>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 2
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "El atributo `groups` nos muestra los grupos hemos creado:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:01.773017Z",
     "start_time": "2025-11-14T08:13:01.769855Z"
    }
   },
   "source": [
    "# Podemos saber los grupos realizados y que índices del dataframe tienen.\n",
    "bySex.groups # nos proporciona un diccionario\n"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'F': [2, 3], 'M': [0, 1, 4, 5]}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 3
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    },
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:02.213029Z",
     "start_time": "2025-11-14T08:13:02.194939Z"
    }
   },
   "source": [
    "bySex.describe()"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "     Altura                                                     Cof          \\\n",
       "      count    mean       std   min     25%   50%   75%   max count    mean   \n",
       "Sexo                                                                          \n",
       "F       2.0  1.7700  0.141421  1.67  1.7200  1.77  1.82  1.87   2.0  0.6600   \n",
       "M       4.0  1.7425  0.228674  1.43  1.6925  1.78  1.83  1.98   4.0  0.6025   \n",
       "\n",
       "                                                \n",
       "           std   min   25%   50%     75%   max  \n",
       "Sexo                                            \n",
       "F     0.141421  0.56  0.61  0.66  0.7100  0.76  \n",
       "M     0.431383  0.19  0.25  0.62  0.9725  0.98  "
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">Altura</th>\n",
       "      <th colspan=\"8\" halign=\"left\">Cof</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sexo</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>2.0</td>\n",
       "      <td>1.7700</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>1.67</td>\n",
       "      <td>1.7200</td>\n",
       "      <td>1.77</td>\n",
       "      <td>1.82</td>\n",
       "      <td>1.87</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.6600</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>0.56</td>\n",
       "      <td>0.61</td>\n",
       "      <td>0.66</td>\n",
       "      <td>0.7100</td>\n",
       "      <td>0.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>4.0</td>\n",
       "      <td>1.7425</td>\n",
       "      <td>0.228674</td>\n",
       "      <td>1.43</td>\n",
       "      <td>1.6925</td>\n",
       "      <td>1.78</td>\n",
       "      <td>1.83</td>\n",
       "      <td>1.98</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.6025</td>\n",
       "      <td>0.431383</td>\n",
       "      <td>0.19</td>\n",
       "      <td>0.25</td>\n",
       "      <td>0.62</td>\n",
       "      <td>0.9725</td>\n",
       "      <td>0.98</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 4
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Esto nos permite realizar operaciones de filtrado con base a los grupos que hemos creado:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:03.058260Z",
     "start_time": "2025-11-14T08:13:03.054623Z"
    }
   },
   "source": [
    "dfM = df.loc[bySex.groups['M'].values] #Recorda que \"loc\" accedeix per index de fila\n",
    "dfM"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   Nombre Apellidos  Altura Sexo  Nacimiento   Cof Categoria\n",
       "0    Will     Smith    1.43    M  10/10/1920  0.19   laboral\n",
       "1     Jon      Snow    1.98    M   10/1/1970  0.98   laboral\n",
       "4    Fein      Mang    1.78    M  12/03/1937  0.27      cap6\n",
       "5  Victor     Colom    1.78    M  22/09/1957  0.97      cap8"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Apellidos</th>\n",
       "      <th>Altura</th>\n",
       "      <th>Sexo</th>\n",
       "      <th>Nacimiento</th>\n",
       "      <th>Cof</th>\n",
       "      <th>Categoria</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Will</td>\n",
       "      <td>Smith</td>\n",
       "      <td>1.43</td>\n",
       "      <td>M</td>\n",
       "      <td>10/10/1920</td>\n",
       "      <td>0.19</td>\n",
       "      <td>laboral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jon</td>\n",
       "      <td>Snow</td>\n",
       "      <td>1.98</td>\n",
       "      <td>M</td>\n",
       "      <td>10/1/1970</td>\n",
       "      <td>0.98</td>\n",
       "      <td>laboral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Fein</td>\n",
       "      <td>Mang</td>\n",
       "      <td>1.78</td>\n",
       "      <td>M</td>\n",
       "      <td>12/03/1937</td>\n",
       "      <td>0.27</td>\n",
       "      <td>cap6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Victor</td>\n",
       "      <td>Colom</td>\n",
       "      <td>1.78</td>\n",
       "      <td>M</td>\n",
       "      <td>22/09/1957</td>\n",
       "      <td>0.97</td>\n",
       "      <td>cap8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 5
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:13.214632Z",
     "start_time": "2025-11-14T08:13:13.208823Z"
    }
   },
   "source": [
    "df[df.Sexo==\"M\"] # es lo  mismo!"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "   Nombre Apellidos  Altura Sexo  Nacimiento   Cof Categoria\n",
       "0    Will     Smith    1.43    M  10/10/1920  0.19   laboral\n",
       "1     Jon      Snow    1.98    M   10/1/1970  0.98   laboral\n",
       "4    Fein      Mang    1.78    M  12/03/1937  0.27      cap6\n",
       "5  Victor     Colom    1.78    M  22/09/1957  0.97      cap8"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Apellidos</th>\n",
       "      <th>Altura</th>\n",
       "      <th>Sexo</th>\n",
       "      <th>Nacimiento</th>\n",
       "      <th>Cof</th>\n",
       "      <th>Categoria</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Will</td>\n",
       "      <td>Smith</td>\n",
       "      <td>1.43</td>\n",
       "      <td>M</td>\n",
       "      <td>10/10/1920</td>\n",
       "      <td>0.19</td>\n",
       "      <td>laboral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jon</td>\n",
       "      <td>Snow</td>\n",
       "      <td>1.98</td>\n",
       "      <td>M</td>\n",
       "      <td>10/1/1970</td>\n",
       "      <td>0.98</td>\n",
       "      <td>laboral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Fein</td>\n",
       "      <td>Mang</td>\n",
       "      <td>1.78</td>\n",
       "      <td>M</td>\n",
       "      <td>12/03/1937</td>\n",
       "      <td>0.27</td>\n",
       "      <td>cap6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Victor</td>\n",
       "      <td>Colom</td>\n",
       "      <td>1.78</td>\n",
       "      <td>M</td>\n",
       "      <td>22/09/1957</td>\n",
       "      <td>0.97</td>\n",
       "      <td>cap8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 6
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Funciones de agregación en grupos.\n",
    "\n",
    "El método 'aggregate' nos permite crear variables de agregación en la tabla obtenida con 'groupby'. Indicaremos la información que deseamos obtener de cada columna utilizando un diccionario. Especificamos la función que aplicaremos a los datos de cada grupo en cada columna para obtener un único valor.\n",
    "\n",
    "- Ref: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.aggregate.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Para ejemplificar esta sección, agruparemos el _dataframe_ por `Categoria` laboral. En este caso para la columna `Altura` consultamos la suma de las alturas del grupo y `Cof` la media.\n",
    "\n",
    "La función `aggregate` nos permite crear variables de agregación sobre la tabla obtenida con `groupby`. Indicaremos la información que queremos obtener de cada columna con un diccionario. Especificamos la función que vamos a aplicar a los datos de cada grupo en cada columna para obtener un único valor."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    },
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:14.301171Z",
     "start_time": "2025-11-14T08:13:14.293704Z"
    }
   },
   "source": [
    "dfg = df.groupby([\"Categoria\"]).aggregate({\n",
    "    \"Altura\":\"sum\",\n",
    "    \"Cof\":\"mean\"})\n",
    "\n",
    "dfg"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "           Altura    Cof\n",
       "Categoria               \n",
       "cap6         5.32  0.530\n",
       "cap8         1.78  0.970\n",
       "laboral      3.41  0.585"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Altura</th>\n",
       "      <th>Cof</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Categoria</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>cap6</th>\n",
       "      <td>5.32</td>\n",
       "      <td>0.530</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cap8</th>\n",
       "      <td>1.78</td>\n",
       "      <td>0.970</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>laboral</th>\n",
       "      <td>3.41</td>\n",
       "      <td>0.585</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 7
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:15.051012Z",
     "start_time": "2025-11-14T08:13:15.048162Z"
    }
   },
   "source": [
    "dfg.index"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['cap6', 'cap8', 'laboral'], dtype='object', name='Categoria')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 8
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "**Podemos aplicar un gran número de funciones de agregación:**\n",
    "\n",
    "- [Funciones estadísticas](https://docs.scipy.org/doc/numpy/reference/routines.statistics.html): mean, std, ...\n",
    "\n",
    "- [Funciones matemáticas](https://docs.scipy.org/doc/numpy/reference/routines.math.html): sum, prod, ...\n",
    "\n",
    "- Otras funciones: max, min, ...\n",
    "\n",
    "- [Documentación](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html)\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:15.855878Z",
     "start_time": "2025-11-14T08:13:15.853676Z"
    }
   },
   "source": [
    "type(dfg) #Alerta! Una agregación genera un dataframe y por lo tanto podemos seguir aplicando lo que ya sabemos"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 9
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:16.595836Z",
     "start_time": "2025-11-14T08:13:16.590930Z"
    }
   },
   "source": [
    "dfg[dfg.Altura>3]"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "           Altura    Cof\n",
       "Categoria               \n",
       "cap6         5.32  0.530\n",
       "laboral      3.41  0.585"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Altura</th>\n",
       "      <th>Cof</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Categoria</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>cap6</th>\n",
       "      <td>5.32</td>\n",
       "      <td>0.530</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>laboral</th>\n",
       "      <td>3.41</td>\n",
       "      <td>0.585</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 10
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Agrupaciones de múltiples columnas\n",
    "\n",
    "También se pueden realizar agrupaciones de múltiples columnas. Se crean todas las combinaciones de las diversas columnas que existen en el DataFrame. Veamos un ejemplo:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    },
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:17.745696Z",
     "start_time": "2025-11-14T08:13:17.741696Z"
    }
   },
   "source": [
    "gr = df.groupby(['Sexo',\"Categoria\"]).mean(numeric_only=True)\n",
    "gr\n",
    "\n"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "                Altura    Cof\n",
       "Sexo Categoria               \n",
       "F    cap6        1.770  0.660\n",
       "M    cap6        1.780  0.270\n",
       "     cap8        1.780  0.970\n",
       "     laboral     1.705  0.585"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Altura</th>\n",
       "      <th>Cof</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sexo</th>\n",
       "      <th>Categoria</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <th>cap6</th>\n",
       "      <td>1.770</td>\n",
       "      <td>0.660</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">M</th>\n",
       "      <th>cap6</th>\n",
       "      <td>1.780</td>\n",
       "      <td>0.270</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cap8</th>\n",
       "      <td>1.780</td>\n",
       "      <td>0.970</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>laboral</th>\n",
       "      <td>1.705</td>\n",
       "      <td>0.585</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 11
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Si queremos realizar un conteo de los elementos, debemos seleccionar"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "scrolled": true,
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:18.900938Z",
     "start_time": "2025-11-14T08:13:18.896173Z"
    }
   },
   "source": [
    "gr = df.groupby(['Sexo',\"Categoria\"])[\"Sexo\"].count()\n",
    "print(gr)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sexo  Categoria\n",
      "F     cap6         2\n",
      "M     cap6         1\n",
      "      cap8         1\n",
      "      laboral      2\n",
      "Name: Sexo, dtype: int64\n"
     ]
    }
   ],
   "execution_count": 12
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiindice \n",
    "\n",
    "A veces, un índice no es suficiente para expresar la meta-información que identifica una o varias columnas. Por ejemplo, una coordenada está compuesta por la latitud y la longitud.\n",
    "\n",
    "Un 'multiíndice' es una jerarquía de índices.\n",
    "\n",
    "Agrupar según diferentes criterios resulta en un multiíndice."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:20.002080Z",
     "start_time": "2025-11-14T08:13:19.997867Z"
    }
   },
   "source": [
    "gr = df.groupby(['Sexo',\"Categoria\"]).count()\n",
    "gr.index"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([('F',    'cap6'),\n",
       "            ('M',    'cap6'),\n",
       "            ('M',    'cap8'),\n",
       "            ('M', 'laboral')],\n",
       "           names=['Sexo', 'Categoria'])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 13
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:21.764160Z",
     "start_time": "2025-11-14T08:13:21.760510Z"
    }
   },
   "source": [
    "gr.loc[(\"M\",:)] # Como accedemos en rango en un multiindice? "
   ],
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "invalid syntax (263982132.py, line 1)",
     "output_type": "error",
     "traceback": [
      "  \u001B[36mCell\u001B[39m\u001B[36m \u001B[39m\u001B[32mIn[14]\u001B[39m\u001B[32m, line 1\u001B[39m\n\u001B[31m    \u001B[39m\u001B[31mgr.loc[(\"M\",:)] # Como accedemos en rango en un multiindice?\u001B[39m\n                ^\n\u001B[31mSyntaxError\u001B[39m\u001B[31m:\u001B[39m invalid syntax\n"
     ]
    }
   ],
   "execution_count": 14
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:22.938106Z",
     "start_time": "2025-11-14T08:13:22.932545Z"
    }
   },
   "source": [
    "gr.loc[\"F\"] # primer index"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "           Nombre  Apellidos  Altura  Nacimiento  Cof\n",
       "Categoria                                            \n",
       "cap6            2          2       2           2    2"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Apellidos</th>\n",
       "      <th>Altura</th>\n",
       "      <th>Nacimiento</th>\n",
       "      <th>Cof</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Categoria</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>cap6</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 15
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:24.064872Z",
     "start_time": "2025-11-14T08:13:23.741714Z"
    }
   },
   "source": [
    "gr.loc[\"cap6\"] # dependent index"
   ],
   "outputs": [
    {
     "ename": "KeyError",
     "evalue": "'cap6'",
     "output_type": "error",
     "traceback": [
      "\u001B[31m---------------------------------------------------------------------------\u001B[39m",
      "\u001B[31mKeyError\u001B[39m                                  Traceback (most recent call last)",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py:3812\u001B[39m, in \u001B[36mIndex.get_loc\u001B[39m\u001B[34m(self, key)\u001B[39m\n\u001B[32m   3811\u001B[39m \u001B[38;5;28;01mtry\u001B[39;00m:\n\u001B[32m-> \u001B[39m\u001B[32m3812\u001B[39m     \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[43m.\u001B[49m\u001B[43m_engine\u001B[49m\u001B[43m.\u001B[49m\u001B[43mget_loc\u001B[49m\u001B[43m(\u001B[49m\u001B[43mcasted_key\u001B[49m\u001B[43m)\u001B[49m\n\u001B[32m   3813\u001B[39m \u001B[38;5;28;01mexcept\u001B[39;00m \u001B[38;5;167;01mKeyError\u001B[39;00m \u001B[38;5;28;01mas\u001B[39;00m err:\n",
      "\u001B[36mFile \u001B[39m\u001B[32mpandas/_libs/index.pyx:167\u001B[39m, in \u001B[36mpandas._libs.index.IndexEngine.get_loc\u001B[39m\u001B[34m()\u001B[39m\n",
      "\u001B[36mFile \u001B[39m\u001B[32mpandas/_libs/index.pyx:196\u001B[39m, in \u001B[36mpandas._libs.index.IndexEngine.get_loc\u001B[39m\u001B[34m()\u001B[39m\n",
      "\u001B[36mFile \u001B[39m\u001B[32mpandas/_libs/hashtable_class_helper.pxi:7088\u001B[39m, in \u001B[36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001B[39m\u001B[34m()\u001B[39m\n",
      "\u001B[36mFile \u001B[39m\u001B[32mpandas/_libs/hashtable_class_helper.pxi:7096\u001B[39m, in \u001B[36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001B[39m\u001B[34m()\u001B[39m\n",
      "\u001B[31mKeyError\u001B[39m: 'cap6'",
      "\nThe above exception was the direct cause of the following exception:\n",
      "\u001B[31mKeyError\u001B[39m                                  Traceback (most recent call last)",
      "\u001B[36mCell\u001B[39m\u001B[36m \u001B[39m\u001B[32mIn[16]\u001B[39m\u001B[32m, line 1\u001B[39m\n\u001B[32m----> \u001B[39m\u001B[32m1\u001B[39m \u001B[43mgr\u001B[49m\u001B[43m.\u001B[49m\u001B[43mloc\u001B[49m\u001B[43m[\u001B[49m\u001B[33;43m\"\u001B[39;49m\u001B[33;43mcap6\u001B[39;49m\u001B[33;43m\"\u001B[39;49m\u001B[43m]\u001B[49m \u001B[38;5;66;03m# dependent index\u001B[39;00m\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexing.py:1192\u001B[39m, in \u001B[36m_LocationIndexer.__getitem__\u001B[39m\u001B[34m(self, key)\u001B[39m\n\u001B[32m   1190\u001B[39m maybe_callable = com.apply_if_callable(key, \u001B[38;5;28mself\u001B[39m.obj)\n\u001B[32m   1191\u001B[39m maybe_callable = \u001B[38;5;28mself\u001B[39m._check_deprecated_callable_usage(key, maybe_callable)\n\u001B[32m-> \u001B[39m\u001B[32m1192\u001B[39m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[43m.\u001B[49m\u001B[43m_getitem_axis\u001B[49m\u001B[43m(\u001B[49m\u001B[43mmaybe_callable\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43maxis\u001B[49m\u001B[43m=\u001B[49m\u001B[43maxis\u001B[49m\u001B[43m)\u001B[49m\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexing.py:1432\u001B[39m, in \u001B[36m_LocIndexer._getitem_axis\u001B[39m\u001B[34m(self, key, axis)\u001B[39m\n\u001B[32m   1430\u001B[39m \u001B[38;5;66;03m# fall thru to straight lookup\u001B[39;00m\n\u001B[32m   1431\u001B[39m \u001B[38;5;28mself\u001B[39m._validate_key(key, axis)\n\u001B[32m-> \u001B[39m\u001B[32m1432\u001B[39m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[43m.\u001B[49m\u001B[43m_get_label\u001B[49m\u001B[43m(\u001B[49m\u001B[43mkey\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43maxis\u001B[49m\u001B[43m=\u001B[49m\u001B[43maxis\u001B[49m\u001B[43m)\u001B[49m\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexing.py:1382\u001B[39m, in \u001B[36m_LocIndexer._get_label\u001B[39m\u001B[34m(self, label, axis)\u001B[39m\n\u001B[32m   1380\u001B[39m \u001B[38;5;28;01mdef\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[34m_get_label\u001B[39m(\u001B[38;5;28mself\u001B[39m, label, axis: AxisInt):\n\u001B[32m   1381\u001B[39m     \u001B[38;5;66;03m# GH#5567 this will fail if the label is not present in the axis.\u001B[39;00m\n\u001B[32m-> \u001B[39m\u001B[32m1382\u001B[39m     \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[43m.\u001B[49m\u001B[43mobj\u001B[49m\u001B[43m.\u001B[49m\u001B[43mxs\u001B[49m\u001B[43m(\u001B[49m\u001B[43mlabel\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43maxis\u001B[49m\u001B[43m=\u001B[49m\u001B[43maxis\u001B[49m\u001B[43m)\u001B[49m\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/generic.py:4315\u001B[39m, in \u001B[36mNDFrame.xs\u001B[39m\u001B[34m(self, key, axis, level, drop_level)\u001B[39m\n\u001B[32m   4312\u001B[39m     index = \u001B[38;5;28mself\u001B[39m.index\n\u001B[32m   4314\u001B[39m \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;28misinstance\u001B[39m(index, MultiIndex):\n\u001B[32m-> \u001B[39m\u001B[32m4315\u001B[39m     loc, new_index = \u001B[43mindex\u001B[49m\u001B[43m.\u001B[49m\u001B[43m_get_loc_level\u001B[49m\u001B[43m(\u001B[49m\u001B[43mkey\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mlevel\u001B[49m\u001B[43m=\u001B[49m\u001B[32;43m0\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[32m   4316\u001B[39m     \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;129;01mnot\u001B[39;00m drop_level:\n\u001B[32m   4317\u001B[39m         \u001B[38;5;28;01mif\u001B[39;00m lib.is_integer(loc):\n\u001B[32m   4318\u001B[39m             \u001B[38;5;66;03m# Slice index must be an integer or None\u001B[39;00m\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexes/multi.py:3309\u001B[39m, in \u001B[36mMultiIndex._get_loc_level\u001B[39m\u001B[34m(self, key, level)\u001B[39m\n\u001B[32m   3307\u001B[39m         \u001B[38;5;28;01mreturn\u001B[39;00m indexer, maybe_mi_droplevels(indexer, ilevels)\n\u001B[32m   3308\u001B[39m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[32m-> \u001B[39m\u001B[32m3309\u001B[39m     indexer = \u001B[38;5;28;43mself\u001B[39;49m\u001B[43m.\u001B[49m\u001B[43m_get_level_indexer\u001B[49m\u001B[43m(\u001B[49m\u001B[43mkey\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mlevel\u001B[49m\u001B[43m=\u001B[49m\u001B[43mlevel\u001B[49m\u001B[43m)\u001B[49m\n\u001B[32m   3310\u001B[39m     \u001B[38;5;28;01mif\u001B[39;00m (\n\u001B[32m   3311\u001B[39m         \u001B[38;5;28misinstance\u001B[39m(key, \u001B[38;5;28mstr\u001B[39m)\n\u001B[32m   3312\u001B[39m         \u001B[38;5;129;01mand\u001B[39;00m \u001B[38;5;28mself\u001B[39m.levels[level]._supports_partial_string_indexing\n\u001B[32m   3313\u001B[39m     ):\n\u001B[32m   3314\u001B[39m         \u001B[38;5;66;03m# check to see if we did an exact lookup vs sliced\u001B[39;00m\n\u001B[32m   3315\u001B[39m         check = \u001B[38;5;28mself\u001B[39m.levels[level].get_loc(key)\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexes/multi.py:3410\u001B[39m, in \u001B[36mMultiIndex._get_level_indexer\u001B[39m\u001B[34m(self, key, level, indexer)\u001B[39m\n\u001B[32m   3407\u001B[39m         \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28mslice\u001B[39m(i, j, step)\n\u001B[32m   3409\u001B[39m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[32m-> \u001B[39m\u001B[32m3410\u001B[39m     idx = \u001B[38;5;28;43mself\u001B[39;49m\u001B[43m.\u001B[49m\u001B[43m_get_loc_single_level_index\u001B[49m\u001B[43m(\u001B[49m\u001B[43mlevel_index\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mkey\u001B[49m\u001B[43m)\u001B[49m\n\u001B[32m   3412\u001B[39m     \u001B[38;5;28;01mif\u001B[39;00m level > \u001B[32m0\u001B[39m \u001B[38;5;129;01mor\u001B[39;00m \u001B[38;5;28mself\u001B[39m._lexsort_depth == \u001B[32m0\u001B[39m:\n\u001B[32m   3413\u001B[39m         \u001B[38;5;66;03m# Desired level is not sorted\u001B[39;00m\n\u001B[32m   3414\u001B[39m         \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;28misinstance\u001B[39m(idx, \u001B[38;5;28mslice\u001B[39m):\n\u001B[32m   3415\u001B[39m             \u001B[38;5;66;03m# test_get_loc_partial_timestamp_multiindex\u001B[39;00m\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexes/multi.py:2999\u001B[39m, in \u001B[36mMultiIndex._get_loc_single_level_index\u001B[39m\u001B[34m(self, level_index, key)\u001B[39m\n\u001B[32m   2997\u001B[39m     \u001B[38;5;28;01mreturn\u001B[39;00m -\u001B[32m1\u001B[39m\n\u001B[32m   2998\u001B[39m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[32m-> \u001B[39m\u001B[32m2999\u001B[39m     \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mlevel_index\u001B[49m\u001B[43m.\u001B[49m\u001B[43mget_loc\u001B[49m\u001B[43m(\u001B[49m\u001B[43mkey\u001B[49m\u001B[43m)\u001B[49m\n",
      "\u001B[36mFile \u001B[39m\u001B[32m~/PycharmProjects/TTAD/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py:3819\u001B[39m, in \u001B[36mIndex.get_loc\u001B[39m\u001B[34m(self, key)\u001B[39m\n\u001B[32m   3814\u001B[39m     \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;28misinstance\u001B[39m(casted_key, \u001B[38;5;28mslice\u001B[39m) \u001B[38;5;129;01mor\u001B[39;00m (\n\u001B[32m   3815\u001B[39m         \u001B[38;5;28misinstance\u001B[39m(casted_key, abc.Iterable)\n\u001B[32m   3816\u001B[39m         \u001B[38;5;129;01mand\u001B[39;00m \u001B[38;5;28many\u001B[39m(\u001B[38;5;28misinstance\u001B[39m(x, \u001B[38;5;28mslice\u001B[39m) \u001B[38;5;28;01mfor\u001B[39;00m x \u001B[38;5;129;01min\u001B[39;00m casted_key)\n\u001B[32m   3817\u001B[39m     ):\n\u001B[32m   3818\u001B[39m         \u001B[38;5;28;01mraise\u001B[39;00m InvalidIndexError(key)\n\u001B[32m-> \u001B[39m\u001B[32m3819\u001B[39m     \u001B[38;5;28;01mraise\u001B[39;00m \u001B[38;5;167;01mKeyError\u001B[39;00m(key) \u001B[38;5;28;01mfrom\u001B[39;00m\u001B[38;5;250m \u001B[39m\u001B[34;01merr\u001B[39;00m\n\u001B[32m   3820\u001B[39m \u001B[38;5;28;01mexcept\u001B[39;00m \u001B[38;5;167;01mTypeError\u001B[39;00m:\n\u001B[32m   3821\u001B[39m     \u001B[38;5;66;03m# If we have a listlike key, _check_indexing_error will raise\u001B[39;00m\n\u001B[32m   3822\u001B[39m     \u001B[38;5;66;03m#  InvalidIndexError. Otherwise we fall through and re-raise\u001B[39;00m\n\u001B[32m   3823\u001B[39m     \u001B[38;5;66;03m#  the TypeError.\u001B[39;00m\n\u001B[32m   3824\u001B[39m     \u001B[38;5;28mself\u001B[39m._check_indexing_error(key)\n",
      "\u001B[31mKeyError\u001B[39m: 'cap6'"
     ]
    }
   ],
   "execution_count": 16
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:24.646580Z",
     "start_time": "2025-11-14T08:13:24.640335Z"
    }
   },
   "source": [
    "# Necesitamos invocar un IndexSlice.\n",
    "gr.loc[pd.IndexSlice[:, 'cap6'],:] \n",
    "# https://pandas.pydata.org/docs/reference/api/pandas.IndexSlice.html"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "                Nombre  Apellidos  Altura  Nacimiento  Cof\n",
       "Sexo Categoria                                            \n",
       "F    cap6            2          2       2           2    2\n",
       "M    cap6            1          1       1           1    1"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Apellidos</th>\n",
       "      <th>Altura</th>\n",
       "      <th>Nacimiento</th>\n",
       "      <th>Cof</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sexo</th>\n",
       "      <th>Categoria</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <th>cap6</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <th>cap6</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 17
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "En estas situaciones donde queremos acceder a esos datos, la forma más sencilla es simplemente eliminar el indice:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:26.046167Z",
     "start_time": "2025-11-14T08:13:26.038778Z"
    }
   },
   "source": [
    "gr = df.groupby(['Sexo',\"Categoria\"]).count().reset_index()\n",
    "gr"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "  Sexo Categoria  Nombre  Apellidos  Altura  Nacimiento  Cof\n",
       "0    F      cap6       2          2       2           2    2\n",
       "1    M      cap6       1          1       1           1    1\n",
       "2    M      cap8       1          1       1           1    1\n",
       "3    M   laboral       2          2       2           2    2"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sexo</th>\n",
       "      <th>Categoria</th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Apellidos</th>\n",
       "      <th>Altura</th>\n",
       "      <th>Nacimiento</th>\n",
       "      <th>Cof</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>F</td>\n",
       "      <td>cap6</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>M</td>\n",
       "      <td>cap6</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>M</td>\n",
       "      <td>cap8</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>M</td>\n",
       "      <td>laboral</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 18
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:13:26.596827Z",
     "start_time": "2025-11-14T08:13:26.593947Z"
    }
   },
   "source": [
    "gr.index = gr[\"Categoria\"]\n",
    "print(gr)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "          Sexo Categoria  Nombre  Apellidos  Altura  Nacimiento  Cof\n",
      "Categoria                                                           \n",
      "cap6         F      cap6       2          2       2           2    2\n",
      "cap6         M      cap6       1          1       1           1    1\n",
      "cap8         M      cap8       1          1       1           1    1\n",
      "laboral      M   laboral       2          2       2           2    2\n"
     ]
    }
   ],
   "execution_count": 19
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Agregaciones avanzadas\n",
    "\n",
    "Sobre una agregación, podemos realizar operaciones más allá de las aritméticas. <br/>\n",
    "\n",
    "Por ejemplo, si deseamos crear un histograma de la distribución de tipos de certificados por ciudad y código postal, necesitamos crear una lista para cada grupo."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:14:00.888234Z",
     "start_time": "2025-11-14T08:14:00.883072Z"
    }
   },
   "source": [
    "df = pd.read_csv(\"../../data/Pandas/data_groups.csv\")\n",
    "print(df.head())\n",
    "print(df.shape)\n"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Unnamed: 0        Dni                           Nom    CP      Ciutat Sexe  \\\n",
      "0           0  H61414629     María Dolores Arjona Jove  7800     Eivissa    M   \n",
      "1           1  S3138381C                  Núria Quirós  7511     Ruberts    F   \n",
      "2           2  J8698188C  Miguel José María Gil Vargas  7340       Alaro    M   \n",
      "3           3  A48821615       Jordi Chaves Bustamante  7609  Bellavista    F   \n",
      "4           4  U0247281I    Jana Rosa Collado Menéndez  7006       Palma    M   \n",
      "\n",
      "  Tipus certificat cat  Punts  \n",
      "0                    B     73  \n",
      "1                    A     40  \n",
      "2                    A     45  \n",
      "3                    B     40  \n",
      "4                    B     86  \n",
      "(1000, 8)\n"
     ]
    }
   ],
   "execution_count": 23
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:14:01.940552Z",
     "start_time": "2025-11-14T08:14:01.933641Z"
    }
   },
   "source": [
    "dfc = df.groupby([\"Ciutat\",\"CP\"])[\"Tipus certificat cat\"].apply(list)\n",
    "dfc"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ciutat      CP  \n",
       "Alaro       7340    [A, C, B, A, A, B, A, D, A, B, A, B, B, A, B, ...\n",
       "Ariany      7529    [B, A, B, A, A, B, A, B, A, B, A, A, C, A, B, ...\n",
       "Bellavista  7609    [B, C, A, A, A, C, A, A, B, A, A, A, B, A, C, ...\n",
       "Binissalem  7350    [A, B, B, C, C, A, A, A, C, A, C, A, B, B, D, ...\n",
       "Eivissa     7800    [B, A, A, B, B, B, A, A, A, A, B, C, A, A, B, ...\n",
       "La Savina   7870    [A, A, A, B, A, A, B, B, A, A, B, A, B, B, A, ...\n",
       "Mao         7701    [B, C, B, C, B, C, A, B, B, B, A, C, C, A, C, ...\n",
       "            7703    [B, A, B, C, A, A, B, A, C, B, A, B, C, A, B, ...\n",
       "Palma       7006    [B, A, B, B, A, A, A, B, B, A, A, B, B, A, B, ...\n",
       "            7009    [A, D, A, B, C, A, A, C, B, A, A, A, C, A, A, ...\n",
       "            7013    [A, A, B, C, C, C, C, A, A, A, B, A, C, C, A, ...\n",
       "Pedruscada  7590    [A, B, A, A, A, A, B, B, C, B, B, B, C, A, A, ...\n",
       "Ruberts     7511    [A, A, A, B, C, A, B, A, A, B, B, A, C, B, B, ...\n",
       "Name: Tipus certificat cat, dtype: object"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 24
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:14:02.850469Z",
     "start_time": "2025-11-14T08:14:02.848281Z"
    }
   },
   "source": [
    "tipusAlaro = dfc.loc[pd.IndexSlice[\"Alaro\",7340]]\n",
    "\n",
    "values, counts = np.unique(tipusAlaro, return_counts=True)\n",
    "print(values) # Tipus \n",
    "print(counts) # quantitat\n",
    "print(\"-\"*40)\n",
    "distribucioAlaro = dict(zip(values,counts)) ## Que fa el ZIP?!\n",
    "print(distribucioAlaro)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['A' 'B' 'C' 'D']\n",
      "[57 42 16  5]\n",
      "----------------------------------------\n",
      "{np.str_('A'): np.int64(57), np.str_('B'): np.int64(42), np.str_('C'): np.int64(16), np.str_('D'): np.int64(5)}\n"
     ]
    }
   ],
   "execution_count": 25
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "En pandas, además de utilizar las funciones de agregación predefinidas como `sum()`, `mean()`, `min()` o `max()`, también podemos emplear funciones propias (definidas por el usuario) para realizar agregaciones más flexibles y adaptadas a nuestras necesidades.\n",
    "\n",
    "Estas funciones personalizadas pueden ser cualquier función de Python que reciba una serie de datos como entrada y devuelva un único valor como resultado. Esto permite aplicar cálculos específicos, métricas avanzadas o transformaciones que no están disponibles de forma nativa en pandas."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:14:04.444517Z",
     "start_time": "2025-11-14T08:14:04.438161Z"
    }
   },
   "source": [
    "df2 = df.groupby([\"Ciutat\",\"CP\"]).agg({\"Tipus certificat cat\": np.size}) # lambda !\n",
    "\n",
    "print(df2)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                 Tipus certificat cat\n",
      "Ciutat     CP                        \n",
      "Alaro      7340                   120\n",
      "Ariany     7529                   113\n",
      "Bellavista 7609                    98\n",
      "Binissalem 7350                    93\n",
      "Eivissa    7800                    91\n",
      "La Savina  7870                    90\n",
      "Mao        7701                    63\n",
      "           7703                    45\n",
      "Palma      7006                    36\n",
      "           7009                    23\n",
      "           7013                    37\n",
      "Pedruscada 7590                   106\n",
      "Ruberts    7511                    85\n"
     ]
    }
   ],
   "execution_count": 26
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "#### Funciones Lambda\n",
    "\n",
    "En **Python**, las **funciones lambda** permiten definir funciones pequeñas y anónimas de forma compacta. Se utilizan especialmente cuando necesitamos una función sencilla para una operación puntual y no queremos crear una función completa con `def`.\n",
    "\n",
    "Una función lambda se escribe en una sola línea y puede recibir uno o varios argumentos, pero solo puede contener una expresión, cuyo resultado se devuelve automáticamente. Gracias a su simplicidad, las lambdas son muy útiles en operaciones como filtrado, transformación o ordenación de datos, así como al trabajar con funciones de orden superior como `map()`, `filter()` o `sorted()`.\n",
    "\n",
    "Su uso hace el código más expresivo y conciso en situaciones donde definir una función tradicional sería innecesariamente largo. Sin embargo, se recomienda utilizarlas únicamente para tareas simples, manteniendo la legibilidad del código como prioridad."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:15:37.394421Z",
     "start_time": "2025-11-14T08:15:37.392008Z"
    }
   },
   "source": [
    "a = np.array([10,10,4,5,7,8,12,4507,30])\n",
    "b = list(map(lambda i:i**2+2*i,a))\n",
    "\n",
    "def f1(i):\n",
    "    return i**2+2\n",
    "\n",
    "b = list(map(f1,a))\n",
    "\n",
    "print(b)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[np.int64(102), np.int64(102), np.int64(18), np.int64(27), np.int64(51), np.int64(66), np.int64(146), np.int64(20313051), np.int64(902)]\n"
     ]
    }
   ],
   "execution_count": 29
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "El siguiente código muestra cómo definir una función personalizada en Python para utilizarla como parte de una agregación en pandas. En este caso, la función `miBarem50p` recibe una serie y cuenta cuántos valores son mayores que 50. Posteriormente, esta función se aplica dentro de un `groupby` mediante una función `lambda`, lo que permite integrar la lógica propia dentro del proceso de agregación junto con otras funciones como `np.size`. Esto hace posible obtener métricas específicas adaptadas a nuestras necesidades durante el análisis de los datos."
  },
  {
   "cell_type": "code",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-11-14T08:14:05.975198Z",
     "start_time": "2025-11-14T08:14:05.966396Z"
    }
   },
   "source": [
    "def miBarem50p(serie):\n",
    "  up50list = []\n",
    "  for value in serie.values:\n",
    "    if value>50:\n",
    "      up50list.append(value)\n",
    "  return len(up50list)\n",
    "\n",
    "\n",
    "df3 = df.groupby([\"Ciutat\",\"CP\"]).agg(\n",
    "        {\"Punts\": [lambda x: miBarem50p(x), np.size]})\n",
    "\n",
    "print(df3)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                     Punts     \n",
      "                <lambda_0> size\n",
      "Ciutat     CP                  \n",
      "Alaro      7340         61  120\n",
      "Ariany     7529         56  113\n",
      "Bellavista 7609         43   98\n",
      "Binissalem 7350         50   93\n",
      "Eivissa    7800         44   91\n",
      "La Savina  7870         41   90\n",
      "Mao        7701         34   63\n",
      "           7703         18   45\n",
      "Palma      7006         19   36\n",
      "           7009         10   23\n",
      "           7013         19   37\n",
      "Pedruscada 7590         65  106\n",
      "Ruberts    7511         36   85\n"
     ]
    }
   ],
   "execution_count": 28
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Ejercicios"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**1) Usando el fichero WHO.csv, ¿Cuál es el volumen total de CO2 emitido por cada continente?**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**2) ¿Cuál es el número de paises por continente?**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**3) Del conjunto \"who.csv\" selecciona 30 paises al azar y sobre ellos calcula la media de \"Net primary school enrolment ratio female (%)\" agrupados por: Continente**\n",
    "\n",
    "Nota: la selección de 30 paises aleatoria ha de ser reproducible"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**3b) Repite la anterior actividad pero ahora con todos los paises. ¿Sale la misma media?**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**4) Calcula la cantidad de ayuda recibida por cada municipio en función del númeto total de habitantes. (v2)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataframe de Municipios:\n",
      "        Nombre  Código Postal  Población\n",
      "0   Municipio1          60494      39232\n",
      "1   Municipio2          65125      15315\n",
      "2   Municipio3          15306      34075\n",
      "3   Municipio4          43936      10127\n",
      "4   Municipio5          77013      19470\n",
      "5   Municipio6          73691      10158\n",
      "6   Municipio7          63075       7214\n",
      "7   Municipio8          49755      41525\n",
      "8   Municipio9          72468      17417\n",
      "9  Municipio10          56930      35902\n",
      "\n",
      "Dataframe de Ayudas:\n",
      "         Nombre  Ayuda Económica (en euros)  Número de Beneficiarios\n",
      "0   Municipio10                        9989                       72\n",
      "1    Municipio3                        1230                       23\n",
      "2    Municipio5                        2528                       48\n",
      "3    Municipio2                        7534                       80\n",
      "4    Municipio2                        1018                       47\n",
      "5    Municipio6                        9086                      100\n",
      "6    Municipio8                        6458                       25\n",
      "7    Municipio9                        4996                       80\n",
      "8    Municipio2                        6328                       52\n",
      "9    Municipio6                        2031                       79\n",
      "10   Municipio7                        4130                       36\n",
      "11   Municipio6                        4632                       87\n",
      "12  Municipio10                        4909                       80\n",
      "13   Municipio4                        3334                       85\n",
      "14   Municipio9                        9896                       46\n",
      "15   Municipio8                        8339                       66\n",
      "16   Municipio8                        2494                       21\n",
      "17   Municipio9                        2318                       86\n",
      "18   Municipio5                        6243                       59\n",
      "19   Municipio1                        9322                       50\n"
     ]
    }
   ],
   "source": [
    "# V2. Con necesidad de agrupar\n",
    "import pandas as pd\n",
    "import random\n",
    "\n",
    "random.seed(0)\n",
    "\n",
    "nombres = [f'Municipio{i}' for i in range(1, 11)] \n",
    "\n",
    "data_municipios = {\n",
    "    'Nombre': nombres,\n",
    "    'Código Postal': [random.randint(10000, 99999) for _ in range(10)],\n",
    "    'Población': [random.randint(1000, 50000) for _ in range(10)]  # Añadimos un atributo aleatorio, en este caso \"Población\"\n",
    "}\n",
    "\n",
    "df_municipios = pd.DataFrame(data_municipios)\n",
    "\n",
    "\n",
    "data_ayudas = {\n",
    "    'Nombre': [random.choice(nombres) for _ in range(20)],\n",
    "    'Ayuda Económica (en euros)': [random.randint(1000, 10000) for _ in range(20)],\n",
    "    'Número de Beneficiarios': [random.randint(10, 100) for _ in range(20)]\n",
    "}\n",
    "\n",
    "df_ayudas = pd.DataFrame(data_ayudas)\n",
    "\n",
    "print(\"Dataframe de Municipios:\")\n",
    "print(df_municipios)\n",
    "\n",
    "print(\"\\nDataframe de Ayudas:\")\n",
    "print(df_ayudas)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**5)  Agrupa los datos por el nombre de de la escuela.**\n",
    "- ¿Qué escuela tiene más infantes?\n",
    "- ¿Qué escuela tiene los infantes más altos?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = {\n",
    "    'school': ['s001', 's002', 's003', 's001', 's002', 's004'],\n",
    "    'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],\n",
    "    'name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],\n",
    "    'date_Of_Birth': ['15/05/2002', '17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],\n",
    "    'age': [12, 12, 13, 13, 14, 12],\n",
    "    'height': [173, 192, 186, 167, 151, 159],\n",
    "    'weight': [35, 32, 33, 30, 31, 32],\n",
    "    'address': ['street1', 'street2', 'street3', 'street1', 'street2', 'street4']\n",
    "}\n",
    "\n",
    "df = pd.DataFrame(data, index=['S1', 'S2', 'S3', 'S4','S5','S6'])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#TODO dejar esto pendiente para isaac futuro\n",
    "#hola\n",
    "#HOLA\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**6) Dado el siguiente listado de ventas:**\n",
    "- ¿Qué comprador ha gastado más?\n",
    "- ¿Qué vendedor ha hecho más ventas?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       Producto  Precio Comprador Vendedor\n",
      "0    Producto 1      47      Juan    Laura\n",
      "1    Producto 2      22     Maria    Laura\n",
      "2    Producto 3      82     Pedro    Elena\n",
      "3    Producto 4      19     Maria   Miguel\n",
      "4    Producto 5      85      Juan   Carlos\n",
      "5    Producto 6      15       Ana   Miguel\n",
      "6    Producto 7      89      Juan    Laura\n",
      "7    Producto 8      74     Maria    Laura\n",
      "8    Producto 9      26      Juan    Laura\n",
      "9   Producto 10      11     Pedro    Elena\n",
      "10  Producto 11      86     Maria    Elena\n",
      "11  Producto 12      81     Maria    Laura\n",
      "12  Producto 13      16      Juan   Miguel\n",
      "13  Producto 14      35       Ana    Laura\n",
      "14  Producto 15      60       Ana    Laura\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Crear datos aleatorios\n",
    "np.random.seed(1)\n",
    "n = 15\n",
    "productos = ['Producto ' + str(i) for i in range(1, n+1)]\n",
    "precios = np.random.randint(10, 100, n)\n",
    "compradores = np.random.choice(['Juan', 'Pedro', 'Maria', 'Ana'], n)\n",
    "vendedores = np.random.choice(['Carlos', 'Laura', 'Miguel', 'Elena'], n)\n",
    "\n",
    "# Crear DataFrame\n",
    "df_ventas = pd.DataFrame({\n",
    "    'Producto': productos,\n",
    "    'Precio': precios,\n",
    "    'Comprador': compradores,\n",
    "    'Vendedor': vendedores\n",
    "})\n",
    "\n",
    "print(df_ventas)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#TODO\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**7) Fichero \"data/data_groups_cursos.csv\"; contiene cursos realizados por personas. Cada muestra corresponde a un curso.**\n",
    "Se pide un listado del número de cursos realizado por cada persona y una valoración de sus puntos según el tipo de curso realizado:\n",
    "- 'A': 3 puntos\n",
    "- 'B': 2 puntos\n",
    "- 'C': 1 punto\n",
    "- 'D': 0.5 punto\n",
    "- \"E\": 0.5 punto\n",
    "- \"F\": 1 punto"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "6\n"
     ]
    }
   ],
   "source": [
    "a = np.array([1,2,3,4,5,6,10])\n",
    "print(a.max())\n",
    "print(a.argmax())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Categorización de datos\n",
    "\n",
    "TODO\n",
    "Valores ya categorizados?\n",
    "\n",
    "https://pandas.pydata.org/docs/user_guide/categorical.html\n",
    "\n",
    "\n",
    "\n",
    "### Función `cut`\n",
    "La función `cut` de Pandas es una herramienta útil para segmentar y discretizar datos en intervalos o categorías. Esta función es especialmente útil cuando desea convertir una variable numérica continua en una variable categórica al dividirla en intervalos o categorías específicas.\n",
    "\n",
    "[Documentación](https://pandas.pydata.org/docs/reference/api/pandas.cut.html)\n",
    "\n",
    "Crearemos un nuevo DataFrame para realizar pruebas:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>candidat</th>\n",
       "      <th>nota</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   candidat  nota\n",
       "0         1     5\n",
       "1         2     0\n",
       "2         3     3\n",
       "3         4     3\n",
       "4         5     7\n",
       "5         6     9\n",
       "6         7     3\n",
       "7         8     5\n",
       "8         9     2\n",
       "9        10     4"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "np.random.seed(0)\n",
    "df_random = pd.DataFrame({\"candidat\":np.arange(1,11), \"nota\":np.random.randint(0,11,size=10)})\n",
    "df_random"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A continuación, creamos una columna nueva que contiene la nota pero en escala categórica:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>candidat</th>\n",
       "      <th>nota</th>\n",
       "      <th>notaCategorica</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>mig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>dolent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>dolent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>dolent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>bona</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>bona</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>dolent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>5</td>\n",
       "      <td>mig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "      <td>dolent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>4</td>\n",
       "      <td>mig</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   candidat  nota notaCategorica\n",
       "0         1     5            mig\n",
       "1         2     0         dolent\n",
       "2         3     3         dolent\n",
       "3         4     3         dolent\n",
       "4         5     7           bona\n",
       "5         6     9           bona\n",
       "6         7     3         dolent\n",
       "7         8     5            mig\n",
       "8         9     2         dolent\n",
       "9        10     4            mig"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_random[\"notaCategorica\"] = pd.cut(df_random.nota, 3, labels=[\"dolent\", \"mig\", \"bona\"])\n",
    "df_random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>candidat</th>\n",
       "      <th>nota</th>\n",
       "      <th>notaCategorica</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>bona</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>bona</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   candidat  nota notaCategorica\n",
       "4         5     7           bona\n",
       "5         6     9           bona"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_random[df_random.notaCategorica==df_random.notaCategorica.max()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>candidat</th>\n",
       "      <th>nota</th>\n",
       "      <th>notaCategorica</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>Be</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>Suspens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>Suspens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>Suspens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>Be</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>Excel·lent</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>Suspens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>5</td>\n",
       "      <td>Be</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "      <td>Suspens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>4</td>\n",
       "      <td>Suspens</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   candidat  nota notaCategorica\n",
       "0         1     5             Be\n",
       "1         2     0        Suspens\n",
       "2         3     3        Suspens\n",
       "3         4     3        Suspens\n",
       "4         5     7             Be\n",
       "5         6     9     Excel·lent\n",
       "6         7     3        Suspens\n",
       "7         8     5             Be\n",
       "8         9     2        Suspens\n",
       "9        10     4        Suspens"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# mejora\n",
    "bins = [-1, 4, 7, 8.5, 10]\n",
    "labels = ['Suspens', 'Be', 'Notable', 'Excel·lent']\n",
    "df_random[\"notaCategorica\"] = pd.cut(df_random.nota, bins, labels=labels) # include_lowest=True\n",
    "df_random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.   0.25 0.5  0.75 1.  ]\n",
      "----------\n",
      "0       (3.5, 5.0]\n",
      "1    (-0.001, 3.0]\n",
      "2    (-0.001, 3.0]\n",
      "3    (-0.001, 3.0]\n",
      "4       (5.0, 9.0]\n",
      "5       (5.0, 9.0]\n",
      "6    (-0.001, 3.0]\n",
      "7       (3.5, 5.0]\n",
      "8    (-0.001, 3.0]\n",
      "9       (3.5, 5.0]\n",
      "Name: nota, dtype: category\n",
      "Categories (4, interval[float64, right]): [(-0.001, 3.0] < (3.0, 3.5] < (3.5, 5.0] < (5.0, 9.0]]\n",
      "----------\n",
      "(-0.001, 3.0]    5\n",
      "(3.5, 5.0]       3\n",
      "(5.0, 9.0]       2\n",
      "(3.0, 3.5]       0\n",
      "Name: nota, dtype: int64\n",
      "0    5\n",
      "1    0\n",
      "2    3\n",
      "3    3\n",
      "4    7\n",
      "5    9\n",
      "6    3\n",
      "7    5\n",
      "8    2\n",
      "9    4\n",
      "Name: nota, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# Podemos definir el intervalo que deseemos (como un Histograma)\n",
    "import numpy as np\n",
    "marks = np.arange(0,1.1,0.25)\n",
    "print(marks)\n",
    "print\n",
    "print(\"-\"*10)\n",
    "factors = pd.qcut(df_random.nota, marks)\n",
    "print(factors)\n",
    "print(\"-\"*10)\n",
    "print(pd.value_counts(factors))\n",
    "print(df_random.nota)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "### Actividad"
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(4557, 28)\n",
      "count    4555.000000\n",
      "mean        0.013723\n",
      "std         0.214786\n",
      "min         0.000000\n",
      "25%         0.000000\n",
      "50%         0.000000\n",
      "75%         0.000000\n",
      "max         7.010000\n",
      "Name: snowfall, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# Del fichero:\n",
    "\n",
    "import pandas as pd\n",
    "df = pd.read_csv(\"data/rdu-weather-history.csv\",sep=\";\")  \n",
    "# print(df.head())\n",
    "# print(df.columns)\n",
    "print(df.shape)\n",
    "print(df.snowfall.describe())\n",
    "\n",
    "# values, repeticiones = np.unique(df.snowfall,return_counts=True)\n",
    "# print(values)\n",
    "# print(repeticiones)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Queremos discretizar la variable `snowfall` para tener 4 categorias de nevada.\n",
    "\n",
    "En primer lugar, usa la función `qcut` para hacer esta categorización. Haz una prueba y analiza por qué está función quizás no sea la mejor opción.<br/>\n",
    "Ahora, crea 4 categorías utilizando la función `cut`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "#TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) <br/>\n",
    "Isaac Lera and Gabriel Moya <br/>\n",
    "Universitat de les Illes Balears <br/>\n",
    "isaac.lera@uib.edu, gabriel.moya@uib.edu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "my3110",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}