Chapter 2 The wbstats API (World Bank)

## Global options
library(knitr)
library(dplyr)
library(ggplot2)
opts_chunk$set(echo=TRUE,
               cache=TRUE,
               prompt=FALSE,
               tidy=FALSE,
               comment=NA,
               message=FALSE,
               warning=FALSE)

2.1 Objectives

Suppose we want to download the population, GDP and CO2 emissions of the world's countries from 2000 to 2015. Rather than fetching files from a website, we will use an API provided by the World Bank, which makes it easy to download the data and, above all, to update them regularly. To do so, we install wbstats, the R package that gives access to the World Bank API.

https://cran.r-project.org/web/packages/wbstats/vignettes/Using_the_wbstats_package.html

When the package is loaded, an object named wb_cachelist becomes available. It provides the catalogue of all available data as a list of metadata tables.

library("wbstats")
cat<-wb_cachelist
str(cat,max.level = 1)
List of 8
 $ countries    : tibble [304 × 18] (S3: tbl_df/tbl/data.frame)
 $ indicators   : tibble [16,649 × 8] (S3: tbl_df/tbl/data.frame)
 $ sources      : tibble [63 × 9] (S3: tbl_df/tbl/data.frame)
 $ topics       : tibble [21 × 3] (S3: tbl_df/tbl/data.frame)
 $ regions      : tibble [48 × 4] (S3: tbl_df/tbl/data.frame)
 $ income_levels: tibble [7 × 3] (S3: tbl_df/tbl/data.frame)
 $ lending_types: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
 $ languages    : tibble [23 × 3] (S3: tbl_df/tbl/data.frame)

2.2 The “countries” table

It provides basic information about the countries, their codes, and so on.

str(cat$countries)
tibble [304 × 18] (S3: tbl_df/tbl/data.frame)
 $ iso3c             : chr [1:304] "ABW" "AFG" "AFR" "AGO" ...
 $ iso2c             : chr [1:304] "AW" "AF" "A9" "AO" ...
 $ country           : chr [1:304] "Aruba" "Afghanistan" "Africa" "Angola" ...
 $ capital_city      : chr [1:304] "Oranjestad" "Kabul" NA "Luanda" ...
 $ longitude         : num [1:304] -70 69.2 NA 13.2 19.8 ...
 $ latitude          : num [1:304] 12.52 34.52 NA -8.81 41.33 ...
 $ region_iso3c      : chr [1:304] "LCN" "SAS" NA "SSF" ...
 $ region_iso2c      : chr [1:304] "ZJ" "8S" NA "ZG" ...
 $ region            : chr [1:304] "Latin America & Caribbean" "South Asia" "Aggregates" "Sub-Saharan Africa" ...
 $ admin_region_iso3c: chr [1:304] NA "SAS" NA "SSA" ...
 $ admin_region_iso2c: chr [1:304] NA "8S" NA "ZF" ...
 $ admin_region      : chr [1:304] NA "South Asia" NA "Sub-Saharan Africa (excluding high income)" ...
 $ income_level_iso3c: chr [1:304] "HIC" "LIC" NA "LMC" ...
 $ income_level_iso2c: chr [1:304] "XD" "XM" NA "XN" ...
 $ income_level      : chr [1:304] "High income" "Low income" "Aggregates" "Lower middle income" ...
 $ lending_type_iso3c: chr [1:304] "LNX" "IDX" NA "IBD" ...
 $ lending_type_iso2c: chr [1:304] "XX" "XI" NA "XF" ...
 $ lending_type      : chr [1:304] "Not classified" "IDA" "Aggregates" "IBRD" ...

The table has 304 rows and mixes countries (France), parts of countries (Réunion) and aggregates of countries (Europe). When extracting data, one must therefore think carefully about which units to use. For example, to keep only the countries:

## Base R version
# pays<-cat$countries[cat$countries$income_level!="Aggregates",c("iso3c", "country","capital_city","longitude","latitude", "region","income_level")]


## dplyr version

pays <- cat$countries %>% 
          filter(income_level !="Aggregates") %>%
          select(iso3c,country, capital_city, latitude, longitude, region, income_level)


kable(head(pays))
| iso3c | country | capital_city | latitude | longitude | region | income_level |
|-------|---------|--------------|----------|-----------|--------|--------------|
| ABW | Aruba | Oranjestad | 12.51670 | -70.0167 | Latin America & Caribbean | High income |
| AFG | Afghanistan | Kabul | 34.52280 | 69.1761 | South Asia | Low income |
| AGO | Angola | Luanda | -8.81155 | 13.2420 | Sub-Saharan Africa | Lower middle income |
| ALB | Albania | Tirane | 41.33170 | 19.8172 | Europe & Central Asia | Upper middle income |
| AND | Andorra | Andorra la Vella | 42.50750 | 1.5218 | Europe & Central Asia | High income |
| ARE | United Arab Emirates | Abu Dhabi | 24.47640 | 54.3705 | Middle East & North Africa | High income |

2.3 The indicators table

It contains nearly 17,000 variables, so it is hard to explore unless you already know what you are looking for.

indic<-cat$indicators
dim(indic)
[1] 16649     8
kable(head(indic))
indicator_id indicator unit indicator_desc source_org topics source_id source
1.0.HCount.1.90usd Poverty Headcount ($1.90 a day) NA The poverty headcount index measures the proportion of the population with daily per capita income (in 2011 PPP) below the poverty line. LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). 11 , Poverty 37 LAC Equity Lab
1.0.HCount.2.5usd Poverty Headcount ($2.50 a day) NA The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). 11 , Poverty 37 LAC Equity Lab
1.0.HCount.Mid10to50 Middle Class ($10-50 a day) Headcount NA The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). 11 , Poverty 37 LAC Equity Lab
1.0.HCount.Ofcl Official Moderate Poverty Rate-National NA The poverty headcount index measures the proportion of the population with daily per capita income below the official poverty line developed by each country. LAC Equity Lab tabulations of data from National Statistical Offices. 11 , Poverty 37 LAC Equity Lab
1.0.HCount.Poor4uds Poverty Headcount ($4 a day) NA The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). 11 , Poverty 37 LAC Equity Lab
1.0.HCount.Vul4to10 Vulnerable ($4-10 a day) Headcount NA The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). 11 , Poverty 37 LAC Equity Lab

2.3.1 Finding an indicator's code

Suppose we are looking for recent data on CO2 emissions. We use the keyword CO2 to search the catalogue for matching variables with the wb_search function, which returned 45 results at the time of writing.

vars <- wb_search(pattern = "CO2",fields="indicator")
kable(vars)
indicator_id indicator indicator_desc
EN.ATM.CO2E.CP.KT CO2 emissions from cement production (thousand metric tons) Carbon dioxide emissions from cement production refer mainly to emissions during cement production. Cement production is a multi-step process and CO2 is actually released from klinker production during the cement production process. The U.S. Department of Energy’s carbon Dioxide Information Analysis Center (CDIAC) calculates annual anthropogenic emissions from data on fossil fuel consumption (from the United Nations Statistics Division’s World Energy Data Set) and world cement manufacturing (from the U.S. Bureau of Mine’s Cement Manufacturing Data Set). Carbon dioxide emissions, often calculated and reported as elemental carbon, were converted to actual carbon dioxide mass by multiplying them by 3.664 (the ratio of the mass of carbon to that of carbon dioxide). Although estimates of global carbon dioxide emissions are probably accurate within 10 percent (as calculated from global average file chemistry and use), country estimates may have larger error bounds. Trends estimated from a consistent time series tend to be more accurate than individual values. Each year the CDIAC recalculates the entire time series since 1949, incorporating recent findings and corrections. Estimates exclude fuels supplied to ships and aircraft in international transport because of the difficulty of apportioning he fuels among benefitting countries. The ratio of carbon dioxide per unit of energy shows carbon intensity, which is the amount of carbon dioxide emitted as a result of using one unit of energy in the process of production.
EN.ATM.CO2E.EG.ZS CO2 intensity (kg per kg of oil equivalent energy use) Carbon dioxide emissions from solid fuel consumption refer mainly to emissions from use of coal as an energy source.
EN.ATM.CO2E.FF.KT CO2 emissions from fossil-fuels, total (thousand metric tons) Fossil fuel is any hydrocarbon deposit that can be burned for heat or power, such as petroleum, coal, and natural gas. This is the sum total of all fossil fuel emissions (solid fuel consumption, liquid fuel consumption, gas fuel consumption, cement production and gas flaring). The U.S. Department of Energy’s carbon Dioxide Information Analysis Center (CDIAC) calculates annual anthropogenic emissions from data on fossil fuel consumption (from the United Nations Statistics Division’s World Energy Data Set) and world cement manufacturing (from the U.S. Bureau of Mine’s Cement Manufacturing Data Set). Carbon dioxide emissions, often calculated and reported as elemental carbon, were converted to actual carbon dioxide mass by multiplying them by 3.664 (the ratio of the mass of carbon to that of carbon dioxide). Although estimates of global carbon dioxide emissions are probably accurate within 10 percent (as calculated from global average file chemistry and use), country estimates may have larger error bounds. Trends estimated from a consistent time series tend to be more accurate than individual values. Each year the CDIAC recalculates the entire time series since 1949, incorporating recent findings and corrections. Estimates exclude fuels supplied to ships and aircraft in international transport because of the difficulty of apportioning he fuels among benefitting countries. The ratio of carbon dioxide per unit of energy shows carbon intensity, which is the amount of carbon dioxide emitted as a result of using one unit of energy in the process of production.
EN.ATM.CO2E.FF.ZS CO2 emissions from fossil-fuels (% of total) Fossil fuel is any hydrocarbon deposit that can be burned for heat or power, such as petroleum, coal, and natural gas. This is the sum total of all fossil fuel emissions (solid fuel consumption, liquid fuel consumption, gas fuel consumption, cement production and gas flaring).
EN.ATM.CO2E.GF.KT CO2 emissions from gaseous fuel consumption (kt) Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of natural gas as an energy source.
EN.ATM.CO2E.GF.ZS CO2 emissions from gaseous fuel consumption (% of total) Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of natural gas as an energy source.
EN.ATM.CO2E.GL.KT CO2 emissions from gas flaring (thousand metric tons) Carbon dioxide emissions from gas flaring fuel consumption refer mainly to emissions from gas flaring activities.
EN.ATM.CO2E.KD.GD CO2 emissions (kg per 2010 US$ of GDP) Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.KT CO2 emissions (kt) Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.LF.KT CO2 emissions from liquid fuel consumption (kt) Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of petroleum-derived fuels as an energy source.
EN.ATM.CO2E.LF.ZS CO2 emissions from liquid fuel consumption (% of total) Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of petroleum-derived fuels as an energy source.
EN.ATM.CO2E.PC CO2 emissions (metric tons per capita) Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.PP.GD CO2 emissions (kg per PPP $ of GDP) Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.PP.GD.KD CO2 emissions (kg per 2017 PPP $ of GDP) Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.SF.KT CO2 emissions from solid fuel consumption (kt) Carbon dioxide emissions from solid fuel consumption refer mainly to emissions from use of coal as an energy source.
EN.ATM.CO2E.SF.ZS CO2 emissions from solid fuel consumption (% of total) Carbon dioxide emissions from solid fuel consumption refer mainly to emissions from use of coal as an energy source.
EN.ATM.GHGO.KT.CE Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent) Other greenhouse gas emissions are by-product emissions of hydrofluorocarbons, perfluorocarbons, and sulfur hexafluoride.
EN.ATM.GHGT.KT.CE Total greenhouse gas emissions (kt of CO2 equivalent) Total greenhouse gas emissions in kt of CO2 equivalent are composed of CO2 totals excluding short-cycle biomass burning (such as agricultural waste burning and savanna burning) but including other biomass burning (such as forest fires, post-burn decay, peat fires and decay of drained peatlands), all anthropogenic CH4 sources, N2O sources and F-gases (HFCs, PFCs and SF6).
EN.ATM.HFCG.KT.CE HFC gas emissions (thousand metric tons of CO2 equivalent) Hydrofluorocarbons, used as a replacement for chlorofluorocarbons, are used mainly in refrigeration and semiconductor manufacturing.
EN.ATM.METH.AG.KT.CE Agricultural methane emissions (thousand metric tons of CO2 equivalent) Agricultural methane emissions are emissions from animals, animal waste, rice production, agricultural waste burning (nonenergy, on-site), and savanna burning.
EN.ATM.METH.EG.KT.CE Methane emissions in energy sector (thousand metric tons of CO2 equivalent) Methane emissions from energy processes are emissions from the production, handling, transmission, and combustion of fossil fuels and biofuels.
EN.ATM.METH.KT.CE Methane emissions (kt of CO2 equivalent) Methane emissions are those stemming from human activities such as agriculture and from industrial methane production.
EN.ATM.NOXE.AG.KT.CE Agricultural nitrous oxide emissions (thousand metric tons of CO2 equivalent) Agricultural nitrous oxide emissions are emissions produced through fertilizer use (synthetic and animal manure), animal waste management, agricultural waste burning (nonenergy, on-site), and savanna burning.
EN.ATM.NOXE.EG.KT.CE Nitrous oxide emissions in energy sector (thousand metric tons of CO2 equivalent) Nitrous oxide emissions from energy processes are emissions produced by the combustion of fossil fuels and biofuels.
EN.ATM.NOXE.KT.CE Nitrous oxide emissions (thousand metric tons of CO2 equivalent) Nitrous oxide emissions are emissions from agricultural biomass burning, industrial activities, and livestock management.
EN.ATM.PFCG.KT.CE PFC gas emissions (thousand metric tons of CO2 equivalent) Perfluorocarbons, used as a replacement for chlorofluorocarbons in manufacturing semiconductors, are a byproduct of aluminum smelting and uranium enrichment.
EN.ATM.SF6G.KT.CE SF6 gas emissions (thousand metric tons of CO2 equivalent) Sulfur hexafluoride is used largely to insulate high-voltage electric power equipment.
EN.CLC.GHGR.MT.CE GHG net emissions/removals by LUCF (Mt of CO2 equivalent) GHG net emissions/removals by LUCF refers to changes in atmospheric levels of all greenhouse gases attributable to forest and land-use change activities, including but not limited to (1) emissions and removals of CO2 from decreases or increases in biomass stocks due to forest management, logging, fuelwood collection, etc.; (2) conversion of existing forests and natural grasslands to other land uses; (3) removal of CO2 from the abandonment of formerly managed lands (e.g. croplands and pastures); and (4) emissions and removals of CO2 in soil associated with land-use change and management. For Annex-I countries under the UNFCCC, these data are drawn from the annual GHG inventories submitted to the UNFCCC by each country; for non-Annex-I countries, data are drawn from the most recently submitted National Communication where available. Because of differences in reporting years and methodologies, these data are not generally considered comparable across countries. Data are in million metric tons.
EN.CO2.BLDG.ZS CO2 emissions from residential buildings and commercial and public services (% of total fuel combustion) CO2 emissions from residential buildings and commercial and public services contains all emissions from fuel combustion in households. This corresponds to IPCC Source/Sink Category 1 A 4 b. Commercial and public services includes emissions from all activities of ISIC Divisions 41, 50-52, 55, 63-67, 70-75, 80, 85, 90-93 and 99.
EN.CO2.ETOT.ZS CO2 emissions from electricity and heat production, total (% of total fuel combustion) CO2 emissions from electricity and heat production is the sum of three IEA categories of CO2 emissions: (1) Main Activity Producer Electricity and Heat which contains the sum of emissions from main activity producer electricity generation, combined heat and power generation and heat plants. Main activity producers (formerly known as public utilities) are defined as those undertakings whose primary activity is to supply the public. They may be publicly or privately owned. This corresponds to IPCC Source/Sink Category 1 A 1 a. For the CO2 emissions from fuel combustion (summary) file, emissions from own on-site use of fuel in power plants (EPOWERPLT) are also included. (2) Unallocated Autoproducers which contains the emissions from the generation of electricity and/or heat by autoproducers. Autoproducers are defined as undertakings that generate electricity and/or heat, wholly or partly for their own use as an activity which supports their primary activity. They may be privately or publicly owned. In the 1996 IPCC Guidelines, these emissions would normally be distributed between industry, transport and “other” sectors. (3) Other Energy Industries contains emissions from fuel combusted in petroleum refineries, for the manufacture of solid fuels, coal mining, oil and gas extraction and other energy-producing industries. This corresponds to the IPCC Source/Sink Categories 1 A 1 b and 1 A 1 c. According to the 1996 IPCC Guidelines, emissions from coke inputs to blast furnaces can either be counted here or in the Industrial Processes source/sink category. Within detailed sectoral calculations, certain non-energy processes can be distinguished. In the reduction of iron in a blast furnace through the combustion of coke, the primary purpose of the coke oxidation is to produce pig iron and the emissions can be considered as an industrial process. 
Care must be taken not to double count these emissions in both Energy and Industrial Processes. In the IEA estimations, these emissions have been included in this category.
EN.CO2.MANF.ZS CO2 emissions from manufacturing industries and construction (% of total fuel combustion) CO2 emissions from manufacturing industries and construction contains the emissions from combustion of fuels in industry. The IPCC Source/Sink Category 1 A 2 includes these emissions. However, in the 1996 IPCC Guidelines, the IPCC category also includes emissions from industry autoproducers that generate electricity and/or heat. The IEA data are not collected in a way that allows the energy consumption to be split by specific end-use and therefore, autoproducers are shown as a separate item (Unallocated Autoproducers). Manufacturing industries and construction also includes emissions from coke inputs into blast furnaces, which may be reported either in the transformation sector, the industry sector or the separate IPCC Source/Sink Category 2, Industrial Processes.
EN.CO2.OTHX.ZS CO2 emissions from other sectors, excluding residential buildings and commercial and public services (% of total fuel combustion) CO2 emissions from other sectors, less residential buildings and commercial and public services, contains the emissions from commercial/institutional activities, residential, agriculture/forestry, fishing and other emissions not specified elsewhere that are included in the IPCC Source/Sink Categories 1 A 4 and 1 A 5. In the 1996 IPCC Guidelines, the category also includes emissions from autoproducers in the commercial/residential/agricultural sectors that generate electricity and/or heat. The IEA data are not collected in a way that allows the energy consumption to be split by specific end-use and therefore, autoproducers are shown as a separate item (Unallocated Autoproducers).
EN.CO2.TRAN.ZS CO2 emissions from transport (% of total fuel combustion) CO2 emissions from transport contains emissions from the combustion of fuel for all transport activity, regardless of the sector, except for international marine bunkers and international aviation. This includes domestic aviation, domestic navigation, road, rail and pipeline transport, and corresponds to IPCC Source/Sink Category 1 A 3. In addition, the IEA data are not collected in a way that allows the autoproducer consumption to be split by specific end-use and therefore, autoproducers are shown as a separate item (Unallocated Autoproducers).
IN.ENV.CO2.CONC CO2 Emission (in thousand metric tons of Carbon) NA

This finally gives us the code of the variable we were looking for:

  • EN.ATM.CO2E.KT: CO2 emissions in kilotonnes

The two other variables we need have the following codes:

  • NY.GDP.MKTP.CD: GDP in current US dollars
  • SP.POP.TOTL: total population

2.3.2 Extracting the metadata

Once we think we know the codes of our variables, we can extract their metadata to check that they are indeed what we are looking for, what the exact source is, what the unit of measurement is, and so on.

# Base R version
meta<-cat$indicators[cat$indicators$indicator_id %in% c("SP.POP.TOTL","NY.GDP.MKTP.CD","EN.ATM.CO2E.KT"),]

# dplyr version
meta<-cat$indicators %>%
        filter(indicator_id %in% c("SP.POP.TOTL","NY.GDP.MKTP.CD","EN.ATM.CO2E.KT"))

kable(meta)
indicator_id indicator unit indicator_desc source_org topics source_id source
EN.ATM.CO2E.KT CO2 emissions (kt) NA Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring. Carbon Dioxide Information Analysis Center, Environmental Sciences Division, Oak Ridge National Laboratory, Tennessee, United States. 19 , 6 , Climate Change, Environment 2 World Development Indicators
NY.GDP.MKTP.CD GDP (current US$) NA GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Dollar figures for GDP are converted from domestic currencies using single year official exchange rates. For a few countries where the official exchange rate does not reflect the rate effectively applied to actual foreign exchange transactions, an alternative conversion factor is used. World Bank national accounts data, and OECD National Accounts data files. 3 , Economy & Growth 2 World Development Indicators
SP.POP.TOTL Population, total NA Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. (1) United Nations Population Division. World Population Prospects: 2019 Revision. (2) Census reports and other statistical publications from national statistical offices, (3) Eurostat: Demographic Statistics, (4) United Nations Statistical Division. Population and Vital Statistics Reprot (various years), (5) U.S. Census Bureau: International Database, and (6) Secretariat of the Pacific Community: Statistics and Demography Programme. 19 , 8 , Climate Change, Health 2 World Development Indicators

2.4 Extracting the data

This is done with the wb_data function, which has many options.

2.4.1 The indicator = parameter

This parameter selects the indicators to collect, which assumes that their codes are known. For example, suppose we want to extract population and GDP in order to compute GDP per capita afterwards.

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"))
dim(df)
[1] 13237     6
kable(head(df,6))
| iso2c | iso3c | country | date | NY.GDP.MKTP.CD | SP.POP.TOTL |
|-------|-------|---------|------|----------------|-------------|
| AW | ABW | Aruba | 1960 | NA | 54208 |
| AW | ABW | Aruba | 1961 | NA | 55434 |
| AW | ABW | Aruba | 1962 | NA | 56234 |
| AW | ABW | Aruba | 1963 | NA | 56699 |
| AW | ABW | Aruba | 1964 | NA | 57029 |
| AW | ABW | Aruba | 1965 | NA | 57357 |

  • comment: We get a very large table (more than 13,000 rows) containing the values for every available date since 1960 and for every country, even though many values are missing.

2.4.2 Choosing a time period

2.4.2.1 The start_date = and end_date = parameters

These two parameters select a time range. We can, for example, decide to collect only the data for the years 2014, 2015 and 2016.

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                start_date = 2014,
                end_date = 2016)
dim(df)
[1] 651   6
kable(head(df,6))
| iso2c | iso3c | country | date | NY.GDP.MKTP.CD | SP.POP.TOTL |
|-------|-------|---------|------|----------------|-------------|
| AW | ABW | Aruba | 2014 | 2790849162 | 103776 |
| AW | ABW | Aruba | 2015 | 2962905028 | 104339 |
| AW | ABW | Aruba | 2016 | 2983636872 | 104865 |
| AF | AFG | Afghanistan | 2014 | 20497126770 | 33370804 |
| AF | AFG | Afghanistan | 2015 | 19134211764 | 34413603 |
| AF | AFG | Afghanistan | 2016 | 18116562465 | 35383028 |

  • comment: The table now has only 651 rows, corresponding to the three dates for each of the world's countries.

2.4.2.2 The mrv (most recent value) parameter

When only the most recent data are needed, the start_date = and end_date = parameters can be replaced by the mrv = parameter, followed by a number giving how many years to retrieve counting back from the most recent date. With mrv = 1, only the latest year available for at least one of the variables is retrieved.

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                mrv = 1)
dim(df)
[1] 217   6
kable(head(df,6))
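| iso2c | iso3c | country | date | NY.GDP.MKTP.CD | SP.POP.TOTL |
|-------|-------|---------|------|----------------|-------------|
| AW | ABW | Aruba | 2020 | NA | 106766 |
| AF | AFG | Afghanistan | 2020 | 20116137326 | 38928341 |
| AO | AGO | Angola | 2020 | 58375976293 | 32866268 |
| AL | ALB | Albania | 2020 | 14887629268 | 2837743 |
| AD | AND | Andorra | 2020 | NA | 77265 |
| AE | ARE | United Arab Emirates | 2020 | 358868765175 | 9890400 |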

The drawback of this method is that it can produce many missing values if one of the requested variables has not been updated. For example, the CO2 variable is not available after 2018 in the extract below, so the table mixes different dates.

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL","EN.ATM.CO2E.KT" ),
                mrv =1)
dim(df)
[1] 434   7
kable(head(df,6))
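| iso2c | iso3c | country | date | EN.ATM.CO2E.KT | NY.GDP.MKTP.CD | SP.POP.TOTL |
|-------|-------|---------|------|----------------|----------------|-------------|
| AW | ABW | Aruba | 2018 | NA | NA | NA |
| AW | ABW | Aruba | 2020 | NA | NA | 106766 |
| AF | AFG | Afghanistan | 2018 | 7440 | NA | NA |
| AF | AFG | Afghanistan | 2020 | NA | 20116137326 | 38928341 |
| AO | AGO | Angola | 2018 | 27340 | NA | NA |
| AO | AGO | Angola | 2020 | NA | 58375976293 | 32866268 |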

It is therefore preferable to request a longer period, for example mrv = 5, and then do the sorting oneself.
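A minimal sketch of that sorting step, assuming a table shaped like the wide output of wb_data(..., mrv = 5). The data frame toy, its country codes and its values are made up for illustration, and the code uses base R only. For each country it keeps the most recent year in which all three indicators are present.

```r
# Hypothetical extract shaped like wb_data(..., mrv = 5); values are made up
toy <- data.frame(
  iso3c = rep(c("AAA", "BBB"), each = 3),
  date  = rep(2018:2020, times = 2),
  EN.ATM.CO2E.KT = c(10, NA, NA, 20, 21, NA),
  NY.GDP.MKTP.CD = c(100, 110, NA, 200, 210, 220),
  SP.POP.TOTL    = c(1, 2, 3, 4, 5, 6)
)

# Keep only the rows where the three indicators are all available
vars <- c("EN.ATM.CO2E.KT", "NY.GDP.MKTP.CD", "SP.POP.TOTL")
full <- toy[complete.cases(toy[, vars]), ]

# Sort by country, most recent year first, then keep the first row per country
full   <- full[order(full$iso3c, -full$date), ]
latest <- full[!duplicated(full$iso3c), ]
latest
```

The reference year may then differ from one country to the next, so it is worth keeping the date column in the final table.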

2.4.3 Choosing the geographical units

The country = parameter selects the spatial units to collect, either as a list of codes or through special values. By default, it returns all countries, but the request can be limited to a few of them using their English name (risky) or their ISO3 code (safer).

2.4.3.1 Selecting countries

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                start_date = 2018,
                end_date = 2018,
                country = c("USA","CHN"))
df$GDP.per.capita <- round(df$NY.GDP.MKTP.CD / df$SP.POP.TOTL,0)
kable(head(df,6))
| iso2c | iso3c | country | date | NY.GDP.MKTP.CD | SP.POP.TOTL | GDP.per.capita |
|-------|-------|---------|------|----------------|-------------|----------------|
| CN | CHN | China | 2018 | 1.389482e+13 | 1402760000 | 9905 |
| US | USA | United States | 2018 | 2.061186e+13 | 326838199 | 63064 |

  • comment: It is therefore easy to work on a small number of countries that one wishes to compare.

2.4.3.2 Special values

A number of special values can be used in place of the list of countries:

  • “countries_only” (Default)
  • “regions_only”
  • “admin_regions_only”
  • “income_levels_only”
  • “aggregates_only”
  • “all”
df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                start_date = 2018,
                end_date = 2018,
                country = "regions_only")
df$GDP.per.capita <- round(df$NY.GDP.MKTP.CD / df$SP.POP.TOTL,0)
kable(df)
| iso2c | iso3c | country | date | NY.GDP.MKTP.CD | SP.POP.TOTL | GDP.per.capita |
|-------|-------|---------|------|----------------|-------------|----------------|
| Z4 | EAS | East Asia & Pacific | 2018 | 2.641632e+13 | 2338223462 | 11298 |
| Z7 | ECS | Europe & Central Asia | 2018 | 2.321731e+13 | 918031055 | 25290 |
| ZJ | LCN | Latin America & Caribbean | 2018 | 5.703879e+12 | 640483586 | 8906 |
| ZQ | MEA | Middle East & North Africa | 2018 | 3.356567e+12 | 448974232 | 7476 |
| XU | NAC | North America | 2018 | 2.234094e+13 | 363967296 | 61382 |
| 8S | SAS | South Asia | 2018 | 3.436594e+12 | 1814455018 | 1894 |
| ZG | SSF | Sub-Saharan Africa | 2018 | 1.753415e+12 | 1078319512 | 1626 |

  • comment: We have extracted the data by major world region for the year 2018.

2.4.4 The output format of the table

There are two ways to extract a table containing several variables or several dates, depending on whether a wide or a long table is wanted. The output is controlled by the return_wide parameter, which is TRUE by default but can be set to FALSE.

2.4.4.1 return_wide = TRUE

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                return_wide = TRUE,
                start_date = 2016,
                end_date = 2018,
                country = c("USA","CHN"))
df
# A tibble: 6 × 6
  iso2c iso3c country        date NY.GDP.MKTP.CD SP.POP.TOTL
  <chr> <chr> <chr>         <dbl>          <dbl>       <dbl>
1 CN    CHN   China          2016        1.12e13  1387790000
2 CN    CHN   China          2017        1.23e13  1396215000
3 CN    CHN   China          2018        1.39e13  1402760000
4 US    USA   United States  2016        1.87e13   323071755
5 US    USA   United States  2017        1.95e13   325122128
6 US    USA   United States  2018        2.06e13   326838199

2.4.4.2 return_wide = FALSE

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                return_wide = FALSE,
                start_date = 2016,
                end_date = 2018,
                country = c("USA","CHN"))
df[,1:7]
# A tibble: 12 × 7
   indicator_id   indicator         iso2c iso3c country        date   value
   <chr>          <chr>             <chr> <chr> <chr>         <dbl>   <dbl>
 1 NY.GDP.MKTP.CD GDP (current US$) CN    CHN   China          2018 1.39e13
 2 NY.GDP.MKTP.CD GDP (current US$) CN    CHN   China          2017 1.23e13
 3 NY.GDP.MKTP.CD GDP (current US$) CN    CHN   China          2016 1.12e13
 4 NY.GDP.MKTP.CD GDP (current US$) US    USA   United States  2018 2.06e13
 5 NY.GDP.MKTP.CD GDP (current US$) US    USA   United States  2017 1.95e13
 6 NY.GDP.MKTP.CD GDP (current US$) US    USA   United States  2016 1.87e13
 7 SP.POP.TOTL    Population, total CN    CHN   China          2018 1.40e 9
 8 SP.POP.TOTL    Population, total CN    CHN   China          2017 1.40e 9
 9 SP.POP.TOTL    Population, total CN    CHN   China          2016 1.39e 9
10 SP.POP.TOTL    Population, total US    USA   United States  2018 3.27e 8
11 SP.POP.TOTL    Population, total US    USA   United States  2017 3.25e 8
12 SP.POP.TOTL    Population, total US    USA   United States  2016 3.23e 8
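
The two layouts carry the same information, so a long table can be turned back into a wide one after the fact. The sketch below illustrates this with base R's reshape() on a small hand-built data frame that mimics a few columns of the long output above (values copied from the 2016 and 2017 rows). The object names long and wide are illustrative, and the real wb_data() result is a tibble with more columns, so this is only a toy example of the idea.

```r
# Toy long-format table mimicking part of wb_data(..., return_wide = FALSE).
# Values are copied from the outputs shown above (2016 and 2017 only).
long <- data.frame(
  indicator_id = rep(c("NY.GDP.MKTP.CD", "SP.POP.TOTL"), each = 4),
  iso3c        = rep(c("CHN", "CHN", "USA", "USA"), times = 2),
  date         = rep(c(2016, 2017), times = 4),
  value        = c(1.12e13, 1.23e13, 1.87e13, 1.95e13,
                   1387790000, 1396215000, 323071755, 325122128),
  stringsAsFactors = FALSE
)

# One row per country and date, one column per indicator
wide <- reshape(long,
                idvar     = c("iso3c", "date"),
                timevar   = "indicator_id",
                direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

The wide layout is convenient for computing ratios between indicators, while the long layout is often easier to feed to ggplot2 when indicators are mapped to facets or colours.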

2.5 Exercises

2.5.1 Exercise 1

Extract the metadata for the variable SP.URB.TOTL.

indicator_id   : SP.URB.TOTL
indicator      : Urban population
unit           : NA
indicator_desc : Urban population refers to people living in urban areas as defined by national statistical offices. It is calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects. Aggregation of urban and rural population may not add up to total population because of different country coverages.
source_org     : World Bank staff estimates based on the United Nations Population Division’s World Urbanization Prospects: 2018 Revision.
topics         : 19, 16, Climate Change, Urban Development
source_id      : 2
source         : World Development Indicators
meta<-cat$indicators[cat$indicators$indicator_id %in% c("SP.URB.TOTL"),]
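
When the exact indicator code is not known in advance, the same kind of row filter can be applied to the indicator label instead of the code. The sketch below shows the idea on a small hand-built data frame (indicators, a hypothetical stand-in for cat$indicators containing only three codes and labels that appear in this chapter); with the real metadata one would filter cat$indicators in the same way.

```r
# Toy stand-in for cat$indicators: three indicators used in this chapter
indicators <- data.frame(
  indicator_id = c("SP.URB.TOTL", "SP.POP.TOTL", "NY.GDP.MKTP.CD"),
  indicator    = c("Urban population", "Population, total", "GDP (current US$)"),
  stringsAsFactors = FALSE
)

# Keep the rows whose label contains the keyword, ignoring case
hits <- indicators[grepl("population", indicators$indicator, ignore.case = TRUE), ]
hits$indicator_id
```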

2.5.2 Exercise 2

Create a table of the population of the countries of the world in 2000, sort it in decreasing order, and display the 10 most populous countries with their name, their code and their population in millions.

Code Pays Population
CHN China 1262.6
IND India 1056.6
USA United States 282.2
IDN Indonesia 211.5
BRA Brazil 174.8
RUS Russian Federation 146.6
PAK Pakistan 142.3
BGD Bangladesh 127.7
JPN Japan 126.8
NGA Nigeria 122.3
# Load the data through the API
tab <- wb_data(indicator = c("SP.POP.TOTL"),
               start_date = 2000,
               end_date = 2000)

### Sort, select, transform and rename in base R
# tab <- tab[order(tab$SP.POP.TOTL, decreasing = TRUE), ]
# tab <- tab[, c("iso3c", "country", "SP.POP.TOTL")]
# tab$SP.POP.TOTL <- tab$SP.POP.TOTL / 1000000
# names(tab) <- c("Code", "Pays", "Population")

### Sort, select, transform and rename with dplyr
tab <- tab %>%
  arrange(desc(SP.POP.TOTL)) %>%
  select(iso3c, country, SP.POP.TOTL) %>%
  mutate(SP.POP.TOTL = SP.POP.TOTL / 1000000) %>%
  rename(Code = iso3c, Pays = country, Population = SP.POP.TOTL)

# Display the result
kable(head(tab, 10), digits = 1)

2.5.3 Exercise 3

We want to compare the evolution of CO2 emissions (EN.ATM.CO2E.KT) for China (CHN), India (IND), Russia (RUS), Japan (JPN) and the United States of America (USA) from 1995 to 2015.

2.5.3.1 CO2 in absolute terms (thousands of tonnes)

Draw a chart of CO2 emissions in thousands of tonnes, using a logarithmic scale on the y axis to make growth rates easier to compare.

# Load the data through the API
tab <- wb_data(indicator = c("EN.ATM.CO2E.KT"),
               country = c("CHN", "IND", "RUS", "USA", "JPN"),
               start_date = 1995,
               end_date = 2015)

# Plot with ggplot2, one line per country, log scale on the y axis
p <- ggplot(tab) + aes(x = date, y = EN.ATM.CO2E.KT, color = country) +
  geom_line() +
  scale_y_log10("thousands of tonnes") +
  ggtitle(label = "Main CO2-emitting countries (1995-2015)",
          subtitle = "Source: World Bank - wbstats API")

p

2.5.3.2 CO2 in relative terms (tonnes per capita)

Same exercise, but also download the population (SP.POP.TOTL) in order to compute the variable CO2.per.capita, the number of tonnes of CO2 per inhabitant. This time, use an arithmetic (linear) scale on the vertical axis.

# Load the data through the API
tab <- wb_data(indicator = c("EN.ATM.CO2E.KT", "SP.POP.TOTL"),
               country = c("CHN", "IND", "RUS", "USA", "JPN"),
               start_date = 1995,
               end_date = 2015)

# Emissions are in thousands of tonnes: multiply by 1000 to get tonnes,
# then divide by the population
tab <- tab %>% mutate(CO2.per.capita = 1000 * EN.ATM.CO2E.KT / SP.POP.TOTL)

# Plot with ggplot2
p <- ggplot(tab) + aes(x = date, y = CO2.per.capita, color = country) +
  geom_line() +
  scale_y_continuous("tonnes per capita") +
  ggtitle(label = "Main CO2-emitting countries (1995-2015)",
          subtitle = "Source: World Bank - wbstats API")

p
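
The factor of 1000 in the per-capita calculation comes from the unit of EN.ATM.CO2E.KT, which is thousands of tonnes. As a quick sanity check, the conversion can be isolated in a small helper and tried on round numbers. The function name co2_per_capita and the input figures below are made up for illustration and are not World Bank data.

```r
# Convert emissions in thousands of tonnes (kt) to tonnes per inhabitant.
# Hypothetical helper, not part of wbstats.
co2_per_capita <- function(co2_kt, pop) {
  1000 * co2_kt / pop
}

# Made-up round numbers: 5 million kt emitted by 1 billion inhabitants
co2_per_capita(co2_kt = 5e6, pop = 1e9)
```

With these inputs the result is 5 tonnes per inhabitant, since 5 million kt is 5 billion tonnes shared among 1 billion people.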

2.5.4 Exercise 4

We want to compare the largest countries in the world by combining two criteria:

  • DEVDUR = sustainable development: measured by CO2 emissions per capita
  • DEVECO = economic development: measured by GDP per capita

2.5.4.1 Analysis for one year (2010) and one population threshold (10 million)

We write a program for a specific year (2010), keeping only the countries above a minimum population (10 million inhabitants).

# Load the data through the API
tab <- wb_data(indicator = c("EN.ATM.CO2E.KT", "SP.POP.TOTL", "NY.GDP.MKTP.CD"),
               start_date = 2010,
               end_date = 2010)

# Compute the two criteria and keep countries above 10 million inhabitants
tab <- tab %>%
  mutate(DEVDUR = 1000 * EN.ATM.CO2E.KT / SP.POP.TOTL,
         DEVECO = NY.GDP.MKTP.CD / SP.POP.TOTL,
         POP = SP.POP.TOTL / 1000000) %>%
  rename(Code = iso3c,
         Pays = country) %>%
  select(Code, Pays, POP, DEVDUR, DEVECO) %>%
  filter(POP > 10)

# Plot with ggplot2
p <- ggplot(tab) + aes(x = DEVECO, y = DEVDUR) +
  geom_point(aes(size = POP), col = "red") +
  geom_text(aes(label = Code), size = 2, nudge_y = 1) +
  scale_x_log10("GDP per capita (logarithmic scale)") +
  scale_y_continuous("CO2 per capita") +
  ggtitle(label = "Development in the world in 2010",
          subtitle = "Source: World Bank - wbstats API")

p

2.5.4.2 Creating a function f(year, population)

We reuse the same program as a function mongraphique() that returns the chart for two chosen parameters: the year and the minimum population threshold (in millions). We then test the function for 1996 and 2016 with a threshold of 50 million inhabitants.

mongraphique <- function(year = 2010, minpop = 10)
{
  # Load the data through the API
  tab <- wb_data(indicator = c("EN.ATM.CO2E.KT", "SP.POP.TOTL", "NY.GDP.MKTP.CD"),
                 start_date = year,
                 end_date = year)

  # Compute the two criteria and keep countries above the threshold (millions)
  tab <- tab %>%
    mutate(DEVDUR = 1000 * EN.ATM.CO2E.KT / SP.POP.TOTL,
           DEVECO = NY.GDP.MKTP.CD / SP.POP.TOTL,
           POP = SP.POP.TOTL / 1000000) %>%
    rename(Code = iso3c,
           Pays = country) %>%
    select(Code, Pays, POP, DEVDUR, DEVECO) %>%
    filter(POP > minpop)

  # Plot with ggplot2
  p <- ggplot(tab) + aes(x = DEVECO, y = DEVDUR) +
    geom_point(aes(size = POP), col = "red") +
    geom_text(aes(label = Code), size = 2, nudge_y = 1) +
    scale_x_log10("GDP per capita (logarithmic scale)") +
    scale_y_continuous("CO2 per capita") +
    ggtitle(label = paste("Development in the world in", year),
            subtitle = "Source: World Bank - wbstats API")

  # Return the chart
  p
}

mongraphique(1996,50)
mongraphique(2016,50)
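
Because mongraphique() downloads the data and draws the chart in a single step, it cannot be checked without a network connection. One possible variation, sketched below in base R, is to move the data preparation into a separate helper that can be tried on a small table. The name prepare_dev_table and the toy values are hypothetical. Unlike the function above, this sketch also drops countries where CO2 or GDP is missing, which is an assumption about the desired behaviour (ggplot2 would otherwise skip those points with a warning).

```r
# Hypothetical helper: compute the indicators of Exercise 4 from a wide table
# with the same column names as the wb_data() output used above.
prepare_dev_table <- function(tab, minpop = 10) {
  out <- data.frame(
    Code   = tab$iso3c,
    Pays   = tab$country,
    POP    = tab$SP.POP.TOTL / 1000000,                 # millions of inhabitants
    DEVDUR = 1000 * tab$EN.ATM.CO2E.KT / tab$SP.POP.TOTL, # tonnes of CO2 per capita
    DEVECO = tab$NY.GDP.MKTP.CD / tab$SP.POP.TOTL,        # GDP per capita
    stringsAsFactors = FALSE
  )
  # Keep countries above the threshold with both indicators available
  keep <- !is.na(out$POP) & out$POP > minpop &
    complete.cases(out[, c("DEVDUR", "DEVECO")])
  out <- out[keep, ]
  rownames(out) <- NULL
  out
}

# Made-up table (not World Bank data): B is too small, C has no CO2 value
toy <- data.frame(
  iso3c          = c("AAA", "BBB", "CCC"),
  country        = c("Country A", "Country B", "Country C"),
  SP.POP.TOTL    = c(60e6, 5e6, 80e6),
  EN.ATM.CO2E.KT = c(300000, 10000, NA),
  NY.GDP.MKTP.CD = c(1.2e12, 5e10, 2e12),
  stringsAsFactors = FALSE
)

res <- prepare_dev_table(toy, minpop = 50)
res
```

With such a helper, the plotting function would only need to call wb_data(), pass the result to prepare_dev_table(), and build the ggplot object from the returned table.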