Úvod do GeoPandas

Nahrání dat

[1]:
filename = 'demo_r_pjangrp3_linear.csv.gz'

import pandas as pd
nuts_data = pd.read_csv(filename, compression='gzip')
nuts_data
[1]:
DATAFLOW LAST UPDATE freq sex unit age geo TIME_PERIOD OBS_VALUE OBS_FLAG
0 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A F NR TOTAL AL 2014 1430827 NaN
1 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A F NR TOTAL AL 2015 1424597 NaN
2 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A F NR TOTAL AL 2016 1417141 NaN
3 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A F NR TOTAL AL 2017 1423050 NaN
4 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A F NR TOTAL AL 2018 1431715 NaN
... ... ... ... ... ... ... ... ... ... ...
1125890 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A T NR Y_LT5 UKN16 2015 7764 NaN
1125891 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A T NR Y_LT5 UKN16 2016 7720 NaN
1125892 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A T NR Y_LT5 UKN16 2017 7712 NaN
1125893 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A T NR Y_LT5 UKN16 2018 7659 NaN
1125894 ESTAT:DEMO_R_PJANGRP3(1.0) 28/09/23 23:00:00 A T NR Y_LT5 UKN16 2019 7654 NaN

1125895 rows × 10 columns

V tabulce chybí geografická část popisu NUTS jednotech. Pro další práci zvolíme data ve formátu Esri Shapefile.

[2]:
import geopandas as gpd
nuts = gpd.read_file("NUTS_RG_20M_2021_3035.shp")
nuts
[2]:
NUTS_ID LEVL_CODE CNTR_CODE NAME_LATN NUTS_NAME MOUNT_TYPE URBN_TYPE COAST_TYPE FID geometry
0 FR 0 FR France France 0.0 0 0 FR MULTIPOLYGON (((9954236.116 -3059379.316, 9961...
1 HR 0 HR Hrvatska Hrvatska 0.0 0 0 HR MULTIPOLYGON (((4827385.889 2618351.326, 48483...
2 HU 0 HU Magyarország Magyarország 0.0 0 0 HU POLYGON ((5214660.069 2880853.832, 5216710.220...
3 AL 0 AL Shqipëria Shqipëria 0.0 0 0 AL POLYGON ((5129579.170 2204098.752, 5148385.473...
4 AT 0 AT Österreich Österreich 0.0 0 0 AT POLYGON ((4742889.368 2876362.725, 4783217.798...
... ... ... ... ... ... ... ... ... ... ...
2005 TRC21 3 TR Şanlıurfa Şanlıurfa 4.0 2 3 TRC21 POLYGON ((6904684.585 2120354.802, 6938677.828...
2006 TRC22 3 TR Diyarbakır Diyarbakır 4.0 2 3 TRC22 POLYGON ((6989716.599 2273670.524, 6982786.486...
2007 NO0B2 3 NO Svalbard Svalbard 3.0 3 1 NO0B2 MULTIPOLYGON (((4754167.335 6382461.409, 47465...
2008 NO0B 2 NO Jan Mayen and Svalbard Jan Mayen and Svalbard NaN 0 0 NO0B MULTIPOLYGON (((4754167.335 6382461.409, 47465...
2009 NO0B1 3 NO Jan Mayen Jan Mayen 3.0 3 1 NO0B1 POLYGON ((3642785.362 5404637.884, 3624291.510...

2010 rows × 10 columns

Vybere prvky na NUTS urovni 0.

[3]:
nuts0 = nuts[nuts["LEVL_CODE"] == 0]
nuts0.head()
[3]:
NUTS_ID LEVL_CODE CNTR_CODE NAME_LATN NUTS_NAME MOUNT_TYPE URBN_TYPE COAST_TYPE FID geometry
0 FR 0 FR France France 0.0 0 0 FR MULTIPOLYGON (((9954236.116 -3059379.316, 9961...
1 HR 0 HR Hrvatska Hrvatska 0.0 0 0 HR MULTIPOLYGON (((4827385.889 2618351.326, 48483...
2 HU 0 HU Magyarország Magyarország 0.0 0 0 HU POLYGON ((5214660.069 2880853.832, 5216710.220...
3 AL 0 AL Shqipëria Shqipëria 0.0 0 0 AL POLYGON ((5129579.170 2204098.752, 5148385.473...
4 AT 0 AT Österreich Österreich 0.0 0 0 AT POLYGON ((4742889.368 2876362.725, 4783217.798...

Vybrané prvky zobrazíme v interaktivní mapě.

[4]:
nuts0.explore()
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[4]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Přidání informací o obyvatelstvu

[5]:
def population_by_nuts(df, year, level=0, sex='T', nuts=None):
    data = df[(df["TIME_PERIOD"] == year) & (df["sex"] == sex) & (df["age"] == 'TOTAL') & (df["geo"].apply(lambda x: len(x) == level+2))]
    if nuts is not None:
        data = data[data["geo"].str.match(nuts)]


    return data[["geo", "OBS_VALUE"]]

population_by_nuts(nuts_data, 2022).head()
[5]:
geo OBS_VALUE
752124 AL 2793592
752286 AT 8978929
752718 BE 11617623
753305 BG 6838937
753638 CH 8738791
[6]:
popu = population_by_nuts(nuts_data, 2022)
nuts0_popu = pd.merge(nuts0, popu, how='left', left_on='NUTS_ID', right_on='geo')
nuts0_popu.head()
[6]:
NUTS_ID LEVL_CODE CNTR_CODE NAME_LATN NUTS_NAME MOUNT_TYPE URBN_TYPE COAST_TYPE FID geometry geo OBS_VALUE
0 FR 0 FR France France 0.0 0 0 FR MULTIPOLYGON (((9954236.116 -3059379.316, 9961... FR 67871925.0
1 HR 0 HR Hrvatska Hrvatska 0.0 0 0 HR MULTIPOLYGON (((4827385.889 2618351.326, 48483... HR 3862305.0
2 HU 0 HU Magyarország Magyarország 0.0 0 0 HU POLYGON ((5214660.069 2880853.832, 5216710.220... HU 9689010.0
3 AL 0 AL Shqipëria Shqipëria 0.0 0 0 AL POLYGON ((5129579.170 2204098.752, 5148385.473... AL 2793592.0
4 AT 0 AT Österreich Österreich 0.0 0 0 AT POLYGON ((4742889.368 2876362.725, 4783217.798... AT 8978929.0

Zobrazíme prvky znázorňující množství obyvatel.

[7]:
nuts0_popu.explore(column='OBS_VALUE', legend=False, cmap='OrRd')
[7]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Definujeme novou funkci, která výpočet automatizuje.

[8]:
def nuts_population(df_geo, df_data, year, level):
    nuts_level = df_geo[df_geo["LEVL_CODE"] == level]
    popu = population_by_nuts(df_data, year, level)
    return pd.merge(nuts_level, popu, how='left', left_on='NUTS_ID', right_on='geo')

nuts_popu = nuts_population(nuts, nuts_data, 2022, 1)
nuts_popu
nuts_popu.explore(column='OBS_VALUE', legend=False, cmap='OrRd')
[8]:
Make this Notebook Trusted to load map: File -> Trust Notebook
[9]:
nuts_population(nuts, nuts_data, 2014, 2).explore(column='OBS_VALUE', legend=False, cmap='OrRd')
[9]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Hustota obyvatelstva

[10]:
nuts0_popu.area.head()
[10]:
0    6.387388e+11
1    5.658665e+10
2    9.296786e+10
3    2.866265e+10
4    8.385113e+10
dtype: float64

Vypočteme hustotu obyvatelstva (počet osob/km2).

[11]:
(nuts0_popu["OBS_VALUE"] / (nuts0_popu.area / 1e6)).head(10)
[11]:
0    106.259284
1     68.254707
2    104.218925
3     97.464550
4    107.081788
5    377.748627
6     61.396334
7    212.851778
8     97.901453
9    133.296882
dtype: float64
[12]:
nuts0_popu["density"] = nuts0_popu["OBS_VALUE"] / (nuts0_popu.area / 1e6)
nuts0_popu.head()
[12]:
NUTS_ID LEVL_CODE CNTR_CODE NAME_LATN NUTS_NAME MOUNT_TYPE URBN_TYPE COAST_TYPE FID geometry geo OBS_VALUE density
0 FR 0 FR France France 0.0 0 0 FR MULTIPOLYGON (((9954236.116 -3059379.316, 9961... FR 67871925.0 106.259284
1 HR 0 HR Hrvatska Hrvatska 0.0 0 0 HR MULTIPOLYGON (((4827385.889 2618351.326, 48483... HR 3862305.0 68.254707
2 HU 0 HU Magyarország Magyarország 0.0 0 0 HU POLYGON ((5214660.069 2880853.832, 5216710.220... HU 9689010.0 104.218925
3 AL 0 AL Shqipëria Shqipëria 0.0 0 0 AL POLYGON ((5129579.170 2204098.752, 5148385.473... AL 2793592.0 97.464550
4 AT 0 AT Österreich Österreich 0.0 0 0 AT POLYGON ((4742889.368 2876362.725, 4783217.798... AT 8978929.0 107.081788
[13]:
nuts0_popu.dropna(subset=["OBS_VALUE"]).explore(column='density', legend=True, cmap='OrRd')
[13]:
Make this Notebook Trusted to load map: File -> Trust Notebook
[14]:
mean = nuts0_popu["density"].mean()
nuts0_popu.explore(column='density', legend=True, cmap='OrRd', vmax=mean)
[14]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Časová řada

[15]:
import pandas as pd
pd.set_option('mode.chained_assignment', None)

nuts_dens = nuts[nuts["LEVL_CODE"] == 0]

years = nuts_data["TIME_PERIOD"].unique()
for year in years:
    print(f"{year}...")
    nuts_popu_year = nuts_population(nuts, nuts_data, year, 0)
    nuts_dens[f"density_{year}"] = nuts_popu_year["OBS_VALUE"] / (nuts_popu_year.area / 1e6)
2014...
2015...
2016...
2017...
2018...
2019...
2020...
2021...
2022...
[16]:
def show_density(df, year, base="density"):
    column = f"{base}_{year}"
    return df.explore(column, legend=True, cmap='OrRd')

show_density(nuts_dens, 2022)
[16]:
Make this Notebook Trusted to load map: File -> Trust Notebook
[17]:
for year in years[1:]:
    print(f"{year}...")
    prev_year = year - 1
    nuts_dens[f"density_trend_{year}"] = nuts_dens[f"density_{year}"] - nuts_dens[f"density_{prev_year}"]

nuts_dens.head()
2015...
2016...
2017...
2018...
2019...
2020...
2021...
2022...
[17]:
NUTS_ID LEVL_CODE CNTR_CODE NAME_LATN NUTS_NAME MOUNT_TYPE URBN_TYPE COAST_TYPE FID geometry ... density_2021 density_2022 density_trend_2015 density_trend_2016 density_trend_2017 density_trend_2018 density_trend_2019 density_trend_2020 density_trend_2021 density_trend_2022
0 FR 0 FR France France 0.0 0 0 FR MULTIPOLYGON (((9954236.116 -3059379.316, 9961... ... 105.922303 106.259284 0.457422 0.282178 0.268380 0.338805 0.237048 0.223221 0.526766 0.336981
1 HR 0 HR Hrvatska Hrvatska 0.0 0 0 HR MULTIPOLYGON (((4827385.889 2618351.326, 48483... ... 71.330521 68.254707 -0.379825 -0.612282 -0.644251 -0.860981 -0.516853 -0.319528 -0.385427 -3.075814
2 HU 0 HU Magyarország Magyarország 0.0 0 0 HU POLYGON ((5214660.069 2880853.832, 5216710.220... ... 104.668134 104.218925 -0.234425 -0.269835 -0.354144 -0.206415 -0.060397 -0.034743 -0.416854 -0.449209
3 AL 0 AL Shqipëria Shqipëria 0.0 0 0 AL POLYGON ((5129579.170 2204098.752, 5148385.473... ... 98.725738 97.464550 -0.230195 -0.356003 0.034854 -0.218647 -0.275515 -0.574685 -0.565684 -1.261188
4 AT 0 AT Österreich Österreich 0.0 0 0 AT POLYGON ((4742889.368 2876362.725, 4783217.798... ... 106.530037 107.081788 0.919964 1.377978 0.863363 0.589163 0.435391 0.504334 0.376858 0.551752

5 rows × 27 columns

[18]:
show_density(nuts_dens, 2022, "density_trend")
[18]:
Make this Notebook Trusted to load map: File -> Trust Notebook
[19]:
def show_density_diff(df, start, end):
    column = f"density_trend_{end}_{start}"
    df[column] = df[f"density_{end}"] - df[f"density_{start}"]
    return df.explore(column, legend=True, cmap='OrRd')

show_density_diff(nuts_dens, 2014, 2022)
[19]:
Make this Notebook Trusted to load map: File -> Trust Notebook