Chapter 2 The wbstats API (World Bank)

## Global options
library(knitr)
library(dplyr)
library(ggplot2)
opts_chunk$set(echo=TRUE,
               cache=TRUE,
               prompt=FALSE,
               tidy=FALSE,
               comment=NA,
               message=FALSE,
               warning=FALSE)

2.1 Objectives

Suppose we want to download the population, GDP and CO2 emissions of the world's countries from 2000 to 2015. Rather than fetching files from a website, we will use an API provided by the World Bank, which makes the data easy to download and, above all, easy to update regularly. To do so, we install wbstats, the R package that gives access to the World Bank API.

https://cran.r-project.org/web/packages/wbstats/vignettes/Using_the_wbstats_package.html

Loading the package makes available an object, wb_cachelist, which describes all the available data as a list of metadata tables.

library("wbstats")
cat<-wb_cachelist
str(cat,max.level = 1)

List of 8
 $ countries    : tibble [304 × 18] (S3: tbl_df/tbl/data.frame)
 $ indicators   : tibble [16,649 × 8] (S3: tbl_df/tbl/data.frame)
 $ sources      : tibble [63 × 9] (S3: tbl_df/tbl/data.frame)
 $ topics       : tibble [21 × 3] (S3: tbl_df/tbl/data.frame)
 $ regions      : tibble [48 × 4] (S3: tbl_df/tbl/data.frame)
 $ income_levels: tibble [7 × 3] (S3: tbl_df/tbl/data.frame)
 $ lending_types: tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
 $ languages    : tibble [23 × 3] (S3: tbl_df/tbl/data.frame)

2.2 The “countries” table

It provides basic information about each country: its codes, capital city, region, income level, etc.

str(cat$countries)

tibble [304 × 18] (S3: tbl_df/tbl/data.frame)
 $ iso3c             : chr [1:304] "ABW" "AFG" "AFR" "AGO" ...
 $ iso2c             : chr [1:304] "AW" "AF" "A9" "AO" ...
 $ country           : chr [1:304] "Aruba" "Afghanistan" "Africa" "Angola" ...
 $ capital_city      : chr [1:304] "Oranjestad" "Kabul" NA "Luanda" ...
 $ longitude         : num [1:304] -70 69.2 NA 13.2 19.8 ...
 $ latitude          : num [1:304] 12.52 34.52 NA -8.81 41.33 ...
 $ region_iso3c      : chr [1:304] "LCN" "SAS" NA "SSF" ...
 $ region_iso2c      : chr [1:304] "ZJ" "8S" NA "ZG" ...
 $ region            : chr [1:304] "Latin America & Caribbean" "South Asia" "Aggregates" "Sub-Saharan Africa" ...
 $ admin_region_iso3c: chr [1:304] NA "SAS" NA "SSA" ...
 $ admin_region_iso2c: chr [1:304] NA "8S" NA "ZF" ...
 $ admin_region      : chr [1:304] NA "South Asia" NA "Sub-Saharan Africa (excluding high income)" ...
 $ income_level_iso3c: chr [1:304] "HIC" "LIC" NA "LMC" ...
 $ income_level_iso2c: chr [1:304] "XD" "XM" NA "XN" ...
 $ income_level      : chr [1:304] "High income" "Low income" "Aggregates" "Lower middle income" ...
 $ lending_type_iso3c: chr [1:304] "LNX" "IDX" NA "IBD" ...
 $ lending_type_iso2c: chr [1:304] "XX" "XI" NA "XF" ...
 $ lending_type      : chr [1:304] "Not classified" "IDA" "Aggregates" "IBRD" ...

The table has 304 observations and mixes countries (France), parts of countries (Réunion) and aggregates of countries (Europe). When extracting data, you therefore need to think carefully about what you actually want to use. For example, to keep only the countries:

## Base R version
# pays<-cat$countries[cat$countries$income_level!="Aggregates",c("iso3c", "country","capital_city","longitude","latitude", "region","income_level")]


## dplyr version

pays <- cat$countries %>% 
          filter(income_level !="Aggregates") %>%
          select(iso3c,country, capital_city, latitude, longitude, region, income_level)


kable(head(pays))

iso3c	country	capital_city	latitude	longitude	region	income_level
ABW	Aruba	Oranjestad	12.51670	-70.0167	Latin America & Caribbean	High income
AFG	Afghanistan	Kabul	34.52280	69.1761	South Asia	Low income
AGO	Angola	Luanda	-8.81155	13.2420	Sub-Saharan Africa	Lower middle income
ALB	Albania	Tirane	41.33170	19.8172	Europe & Central Asia	Upper middle income
AND	Andorra	Andorra la Vella	42.50750	1.5218	Europe & Central Asia	High income
ARE	United Arab Emirates	Abu Dhabi	24.47640	54.3705	Middle East & North Africa	High income

2.3 The “indicators” table

It contains close to 17,000 variables (16,649 in the version shown here), so it is hard to explore if you do not already know what you are looking for.

indic<-cat$indicators
dim(indic)

[1] 16649     8

kable(head(indic))

indicator_id	indicator	unit	indicator_desc	source_org	topics	source_id	source
1.0.HCount.1.90usd	Poverty Headcount ($1.90 a day)	NA	The poverty headcount index measures the proportion of the population with daily per capita income (in 2011 PPP) below the poverty line.	LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).	11 , Poverty	37	LAC Equity Lab
1.0.HCount.2.5usd	Poverty Headcount ($2.50 a day)	NA	The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line.	LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).	11 , Poverty	37	LAC Equity Lab
1.0.HCount.Mid10to50	Middle Class ($10-50 a day) Headcount	NA	The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line.	LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).	11 , Poverty	37	LAC Equity Lab
1.0.HCount.Ofcl	Official Moderate Poverty Rate-National	NA	The poverty headcount index measures the proportion of the population with daily per capita income below the official poverty line developed by each country.	LAC Equity Lab tabulations of data from National Statistical Offices.	11 , Poverty	37	LAC Equity Lab
1.0.HCount.Poor4uds	Poverty Headcount ($4 a day)	NA	The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line.	LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).	11 , Poverty	37	LAC Equity Lab
1.0.HCount.Vul4to10	Vulnerable ($4-10 a day) Headcount	NA	The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line.	LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).	11 , Poverty	37	LAC Equity Lab

2.3.1 Finding an indicator's code

Suppose we are looking for recent data on CO2 emissions. We use the keyword CO2 to search the catalogue for matching variables with the wb_search function, which returns a few dozen matches (the exact number depends on the version of the catalogue):

vars <- wb_search(pattern = "CO2",fields="indicator")
kable(vars)

indicator_id	indicator	indicator_desc
EN.ATM.CO2E.CP.KT	CO2 emissions from cement production (thousand metric tons)	Carbon dioxide emissions from cement production refer mainly to emissions during cement production. Cement production is a multi-step process and CO2 is actually released from klinker production during the cement production process. The U.S. Department of Energy’s carbon Dioxide Information Analysis Center (CDIAC) calculates annual anthropogenic emissions from data on fossil fuel consumption (from the United Nations Statistics Division’s World Energy Data Set) and world cement manufacturing (from the U.S. Bureau of Mine’s Cement Manufacturing Data Set). Carbon dioxide emissions, often calculated and reported as elemental carbon, were converted to actual carbon dioxide mass by multiplying them by 3.664 (the ratio of the mass of carbon to that of carbon dioxide). Although estimates of global carbon dioxide emissions are probably accurate within 10 percent (as calculated from global average file chemistry and use), country estimates may have larger error bounds. Trends estimated from a consistent time series tend to be more accurate than individual values. Each year the CDIAC recalculates the entire time series since 1949, incorporating recent findings and corrections. Estimates exclude fuels supplied to ships and aircraft in international transport because of the difficulty of apportioning he fuels among benefitting countries. The ratio of carbon dioxide per unit of energy shows carbon intensity, which is the amount of carbon dioxide emitted as a result of using one unit of energy in the process of production.
EN.ATM.CO2E.EG.ZS	CO2 intensity (kg per kg of oil equivalent energy use)	Carbon dioxide emissions from solid fuel consumption refer mainly to emissions from use of coal as an energy source.
EN.ATM.CO2E.FF.KT	CO2 emissions from fossil-fuels, total (thousand metric tons)	Fossil fuel is any hydrocarbon deposit that can be burned for heat or power, such as petroleum, coal, and natural gas. This is the sum total of all fossil fuel emissions (solid fuel consumption, liquid fuel consumption, gas fuel consumption, cement production and gas flaring). The U.S. Department of Energy’s carbon Dioxide Information Analysis Center (CDIAC) calculates annual anthropogenic emissions from data on fossil fuel consumption (from the United Nations Statistics Division’s World Energy Data Set) and world cement manufacturing (from the U.S. Bureau of Mine’s Cement Manufacturing Data Set). Carbon dioxide emissions, often calculated and reported as elemental carbon, were converted to actual carbon dioxide mass by multiplying them by 3.664 (the ratio of the mass of carbon to that of carbon dioxide). Although estimates of global carbon dioxide emissions are probably accurate within 10 percent (as calculated from global average file chemistry and use), country estimates may have larger error bounds. Trends estimated from a consistent time series tend to be more accurate than individual values. Each year the CDIAC recalculates the entire time series since 1949, incorporating recent findings and corrections. Estimates exclude fuels supplied to ships and aircraft in international transport because of the difficulty of apportioning he fuels among benefitting countries. The ratio of carbon dioxide per unit of energy shows carbon intensity, which is the amount of carbon dioxide emitted as a result of using one unit of energy in the process of production.
EN.ATM.CO2E.FF.ZS	CO2 emissions from fossil-fuels (% of total)	Fossil fuel is any hydrocarbon deposit that can be burned for heat or power, such as petroleum, coal, and natural gas. This is the sum total of all fossil fuel emissions (solid fuel consumption, liquid fuel consumption, gas fuel consumption, cement production and gas flaring).
EN.ATM.CO2E.GF.KT	CO2 emissions from gaseous fuel consumption (kt)	Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of natural gas as an energy source.
EN.ATM.CO2E.GF.ZS	CO2 emissions from gaseous fuel consumption (% of total)	Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of natural gas as an energy source.
EN.ATM.CO2E.GL.KT	CO2 emissions from gas flaring (thousand metric tons)	Carbon dioxide emissions from gas flaring fuel consumption refer mainly to emissions from gas flaring activities.
EN.ATM.CO2E.KD.GD	CO2 emissions (kg per 2010 US$ of GDP)	Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.KT	CO2 emissions (kt)	Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.LF.KT	CO2 emissions from liquid fuel consumption (kt)	Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of petroleum-derived fuels as an energy source.
EN.ATM.CO2E.LF.ZS	CO2 emissions from liquid fuel consumption (% of total)	Carbon dioxide emissions from liquid fuel consumption refer mainly to emissions from use of petroleum-derived fuels as an energy source.
EN.ATM.CO2E.PC	CO2 emissions (metric tons per capita)	Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.PP.GD	CO2 emissions (kg per PPP $ of GDP)	Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.PP.GD.KD	CO2 emissions (kg per 2017 PPP $ of GDP)	Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.
EN.ATM.CO2E.SF.KT	CO2 emissions from solid fuel consumption (kt)	Carbon dioxide emissions from solid fuel consumption refer mainly to emissions from use of coal as an energy source.
EN.ATM.CO2E.SF.ZS	CO2 emissions from solid fuel consumption (% of total)	Carbon dioxide emissions from solid fuel consumption refer mainly to emissions from use of coal as an energy source.
EN.ATM.GHGO.KT.CE	Other greenhouse gas emissions, HFC, PFC and SF6 (thousand metric tons of CO2 equivalent)	Other greenhouse gas emissions are by-product emissions of hydrofluorocarbons, perfluorocarbons, and sulfur hexafluoride.
EN.ATM.GHGT.KT.CE	Total greenhouse gas emissions (kt of CO2 equivalent)	Total greenhouse gas emissions in kt of CO2 equivalent are composed of CO2 totals excluding short-cycle biomass burning (such as agricultural waste burning and savanna burning) but including other biomass burning (such as forest fires, post-burn decay, peat fires and decay of drained peatlands), all anthropogenic CH4 sources, N2O sources and F-gases (HFCs, PFCs and SF6).
EN.ATM.HFCG.KT.CE	HFC gas emissions (thousand metric tons of CO2 equivalent)	Hydrofluorocarbons, used as a replacement for chlorofluorocarbons, are used mainly in refrigeration and semiconductor manufacturing.
EN.ATM.METH.AG.KT.CE	Agricultural methane emissions (thousand metric tons of CO2 equivalent)	Agricultural methane emissions are emissions from animals, animal waste, rice production, agricultural waste burning (nonenergy, on-site), and savanna burning.
EN.ATM.METH.EG.KT.CE	Methane emissions in energy sector (thousand metric tons of CO2 equivalent)	Methane emissions from energy processes are emissions from the production, handling, transmission, and combustion of fossil fuels and biofuels.
EN.ATM.METH.KT.CE	Methane emissions (kt of CO2 equivalent)	Methane emissions are those stemming from human activities such as agriculture and from industrial methane production.
EN.ATM.NOXE.AG.KT.CE	Agricultural nitrous oxide emissions (thousand metric tons of CO2 equivalent)	Agricultural nitrous oxide emissions are emissions produced through fertilizer use (synthetic and animal manure), animal waste management, agricultural waste burning (nonenergy, on-site), and savanna burning.
EN.ATM.NOXE.EG.KT.CE	Nitrous oxide emissions in energy sector (thousand metric tons of CO2 equivalent)	Nitrous oxide emissions from energy processes are emissions produced by the combustion of fossil fuels and biofuels.
EN.ATM.NOXE.KT.CE	Nitrous oxide emissions (thousand metric tons of CO2 equivalent)	Nitrous oxide emissions are emissions from agricultural biomass burning, industrial activities, and livestock management.
EN.ATM.PFCG.KT.CE	PFC gas emissions (thousand metric tons of CO2 equivalent)	Perfluorocarbons, used as a replacement for chlorofluorocarbons in manufacturing semiconductors, are a byproduct of aluminum smelting and uranium enrichment.
EN.ATM.SF6G.KT.CE	SF6 gas emissions (thousand metric tons of CO2 equivalent)	Sulfur hexafluoride is used largely to insulate high-voltage electric power equipment.
EN.CLC.GHGR.MT.CE	GHG net emissions/removals by LUCF (Mt of CO2 equivalent)	GHG net emissions/removals by LUCF refers to changes in atmospheric levels of all greenhouse gases attributable to forest and land-use change activities, including but not limited to (1) emissions and removals of CO2 from decreases or increases in biomass stocks due to forest management, logging, fuelwood collection, etc.; (2) conversion of existing forests and natural grasslands to other land uses; (3) removal of CO2 from the abandonment of formerly managed lands (e.g. croplands and pastures); and (4) emissions and removals of CO2 in soil associated with land-use change and management. For Annex-I countries under the UNFCCC, these data are drawn from the annual GHG inventories submitted to the UNFCCC by each country; for non-Annex-I countries, data are drawn from the most recently submitted National Communication where available. Because of differences in reporting years and methodologies, these data are not generally considered comparable across countries. Data are in million metric tons.
EN.CO2.BLDG.ZS	CO2 emissions from residential buildings and commercial and public services (% of total fuel combustion)	CO2 emissions from residential buildings and commercial and public services contains all emissions from fuel combustion in households. This corresponds to IPCC Source/Sink Category 1 A 4 b. Commercial and public services includes emissions from all activities of ISIC Divisions 41, 50-52, 55, 63-67, 70-75, 80, 85, 90-93 and 99.
EN.CO2.ETOT.ZS	CO2 emissions from electricity and heat production, total (% of total fuel combustion)	CO2 emissions from electricity and heat production is the sum of three IEA categories of CO2 emissions: (1) Main Activity Producer Electricity and Heat which contains the sum of emissions from main activity producer electricity generation, combined heat and power generation and heat plants. Main activity producers (formerly known as public utilities) are defined as those undertakings whose primary activity is to supply the public. They may be publicly or privately owned. This corresponds to IPCC Source/Sink Category 1 A 1 a. For the CO2 emissions from fuel combustion (summary) file, emissions from own on-site use of fuel in power plants (EPOWERPLT) are also included. (2) Unallocated Autoproducers which contains the emissions from the generation of electricity and/or heat by autoproducers. Autoproducers are defined as undertakings that generate electricity and/or heat, wholly or partly for their own use as an activity which supports their primary activity. They may be privately or publicly owned. In the 1996 IPCC Guidelines, these emissions would normally be distributed between industry, transport and “other” sectors. (3) Other Energy Industries contains emissions from fuel combusted in petroleum refineries, for the manufacture of solid fuels, coal mining, oil and gas extraction and other energy-producing industries. This corresponds to the IPCC Source/Sink Categories 1 A 1 b and 1 A 1 c. According to the 1996 IPCC Guidelines, emissions from coke inputs to blast furnaces can either be counted here or in the Industrial Processes source/sink category. Within detailed sectoral calculations, certain non-energy processes can be distinguished. In the reduction of iron in a blast furnace through the combustion of coke, the primary purpose of the coke oxidation is to produce pig iron and the emissions can be considered as an industrial process. Care must be taken not to double count these emissions in both Energy and Industrial Processes. In the IEA estimations, these emissions have been included in this category.
EN.CO2.MANF.ZS	CO2 emissions from manufacturing industries and construction (% of total fuel combustion)	CO2 emissions from manufacturing industries and construction contains the emissions from combustion of fuels in industry. The IPCC Source/Sink Category 1 A 2 includes these emissions. However, in the 1996 IPCC Guidelines, the IPCC category also includes emissions from industry autoproducers that generate electricity and/or heat. The IEA data are not collected in a way that allows the energy consumption to be split by specific end-use and therefore, autoproducers are shown as a separate item (Unallocated Autoproducers). Manufacturing industries and construction also includes emissions from coke inputs into blast furnaces, which may be reported either in the transformation sector, the industry sector or the separate IPCC Source/Sink Category 2, Industrial Processes.
EN.CO2.OTHX.ZS	CO2 emissions from other sectors, excluding residential buildings and commercial and public services (% of total fuel combustion)	CO2 emissions from other sectors, less residential buildings and commercial and public services, contains the emissions from commercial/institutional activities, residential, agriculture/forestry, fishing and other emissions not specified elsewhere that are included in the IPCC Source/Sink Categories 1 A 4 and 1 A 5. In the 1996 IPCC Guidelines, the category also includes emissions from autoproducers in the commercial/residential/agricultural sectors that generate electricity and/or heat. The IEA data are not collected in a way that allows the energy consumption to be split by specific end-use and therefore, autoproducers are shown as a separate item (Unallocated Autoproducers).
EN.CO2.TRAN.ZS	CO2 emissions from transport (% of total fuel combustion)	CO2 emissions from transport contains emissions from the combustion of fuel for all transport activity, regardless of the sector, except for international marine bunkers and international aviation. This includes domestic aviation, domestic navigation, road, rail and pipeline transport, and corresponds to IPCC Source/Sink Category 1 A 3. In addition, the IEA data are not collected in a way that allows the autoproducer consumption to be split by specific end-use and therefore, autoproducers are shown as a separate item (Unallocated Autoproducers).
IN.ENV.CO2.CONC	CO2 Emission (in thousand metric tons of Carbon)	NA
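Because a loose keyword such as CO2 returns a long list, a stricter regular expression can narrow it down. The sketch below illustrates the idea offline, in base R, with `grepl()` on five rows copied from the table above; the data frame `indic_demo` is a hypothetical toy stand-in for `cat$indicators`. Whether `wb_search()` accepts the same regular expression in its `pattern` argument is an assumption worth checking in its documentation.

```r
# Toy stand-in for cat$indicators: five rows copied from the search results above
indic_demo <- data.frame(
  indicator_id = c("EN.ATM.CO2E.KD.GD", "EN.ATM.CO2E.KT", "EN.ATM.CO2E.LF.KT",
                   "EN.ATM.CO2E.PC", "EN.ATM.CO2E.SF.KT"),
  indicator = c("CO2 emissions (kg per 2010 US$ of GDP)",
                "CO2 emissions (kt)",
                "CO2 emissions from liquid fuel consumption (kt)",
                "CO2 emissions (metric tons per capita)",
                "CO2 emissions from solid fuel consumption (kt)"),
  stringsAsFactors = FALSE
)

# Loose pattern: every row mentions CO2
loose <- indic_demo[grepl("CO2", indic_demo$indicator, ignore.case = TRUE), ]
nrow(loose)

# Strict pattern, anchored at both ends: only total emissions in kt match
strict <- indic_demo[grepl("^CO2 emissions \\(kt\\)$", indic_demo$indicator), ]
strict$indicator_id
```

Anchoring the pattern with `^` and `$` is what removes the sector-specific series whose labels merely contain the same words.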

We finally find the code of the variable we want:

EN.ATM.CO2E.KT: CO2 emissions in kilotons

The two other variables we need have the following codes:

NY.GDP.MKTP.CD: GDP in current US dollars
SP.POP.TOTL: total population

2.3.2 Extracting the metadata

Once we think we know the codes of our variables, we can extract their metadata to check that they really are what we are looking for, what their exact source is, what the unit of measurement is, and so on.

# Base R version
meta<-cat$indicators[cat$indicators$indicator_id %in% c("SP.POP.TOTL","NY.GDP.MKTP.CD","EN.ATM.CO2E.KT"),]

# dplyr version
meta<-cat$indicators %>%
        filter(indicator_id %in% c("SP.POP.TOTL","NY.GDP.MKTP.CD","EN.ATM.CO2E.KT"))

kable(meta)

indicator_id	indicator	unit	indicator_desc	source_org	topics	source_id	source
EN.ATM.CO2E.KT	CO2 emissions (kt)	NA	Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.	Carbon Dioxide Information Analysis Center, Environmental Sciences Division, Oak Ridge National Laboratory, Tennessee, United States.	19 , 6 , Climate Change, Environment	2	World Development Indicators
NY.GDP.MKTP.CD	GDP (current US$)	NA	GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Dollar figures for GDP are converted from domestic currencies using single year official exchange rates. For a few countries where the official exchange rate does not reflect the rate effectively applied to actual foreign exchange transactions, an alternative conversion factor is used.	World Bank national accounts data, and OECD National Accounts data files.	3 , Economy & Growth	2	World Development Indicators
SP.POP.TOTL	Population, total	NA	Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates.	(1) United Nations Population Division. World Population Prospects: 2019 Revision. (2) Census reports and other statistical publications from national statistical offices, (3) Eurostat: Demographic Statistics, (4) United Nations Statistical Division. Population and Vital Statistics Reprot (various years), (5) U.S. Census Bureau: International Database, and (6) Secretariat of the Pacific Community: Statistics and Demography Programme.	19 , 8 , Climate Change, Health	2	World Development Indicators
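Besides checking definitions, the metadata table can serve as a lookup to replace indicator codes with readable labels, for instance before printing a table. The sketch below assumes two hypothetical toy data frames, `meta_demo` and `df_demo`, whose values are copied from outputs shown in this chapter; with real data you would use `meta` and the result of `wb_data()`.

```r
# Toy metadata: indicator codes and labels copied from the table above
meta_demo <- data.frame(
  indicator_id = c("EN.ATM.CO2E.KT", "NY.GDP.MKTP.CD", "SP.POP.TOTL"),
  indicator = c("CO2 emissions (kt)", "GDP (current US$)", "Population, total"),
  stringsAsFactors = FALSE
)

# Toy extraction shaped like the output of wb_data() (Aruba, 2014-2016)
df_demo <- data.frame(
  iso3c = "ABW", country = "Aruba", date = 2014:2016,
  NY.GDP.MKTP.CD = c(2790849162, 2962905028, 2983636872),
  SP.POP.TOTL = c(103776, 104339, 104865),
  stringsAsFactors = FALSE
)

# Position of each column name in the metadata (NA for iso3c, country, date)
idx <- match(names(df_demo), meta_demo$indicator_id)
# Replace only the columns that are indicator codes
names(df_demo)[!is.na(idx)] <- meta_demo$indicator[idx[!is.na(idx)]]
names(df_demo)
```

Labels containing spaces and parentheses are convenient for display but awkward in later computations, so it is safer to keep the codes for calculations and relabel only at the end.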

2.4 Extracting the data

This is done with the wb_data function, which has many options.

2.4.1 The `indicator =` parameter

This parameter selects the indicators to collect, which assumes that you know their codes. For example, suppose we want to extract population and GDP in order to compute GDP per capita afterwards:

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"))
dim(df)

[1] 13237     6

kable(head(df,6))

iso2c	iso3c	country	date	NY.GDP.MKTP.CD	SP.POP.TOTL
AW	ABW	Aruba	1960	NA	54208
AW	ABW	Aruba	1961	NA	55434
AW	ABW	Aruba	1962	NA	56234
AW	ABW	Aruba	1963	NA	56699
AW	ABW	Aruba	1964	NA	57029
AW	ABW	Aruba	1965	NA	57357

Comment: We get a very large table (more than 13,000 rows) with values for every available year since 1960 and for every country, even though many of the values are missing.
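Since many values are missing, it is worth measuring how many before going further, and keeping only complete rows before computing GDP per capita. A minimal base-R sketch, on a hypothetical toy data frame (`aruba_demo`) built from Aruba rows printed in this section:

```r
# Toy data: Aruba rows copied from the outputs in this section
aruba_demo <- data.frame(
  iso3c = "ABW",
  date = c(1960, 1961, 1962, 2014, 2015, 2016),
  NY.GDP.MKTP.CD = c(NA, NA, NA, 2790849162, 2962905028, 2983636872),
  SP.POP.TOTL = c(54208, 55434, 56234, 103776, 104339, 104865),
  stringsAsFactors = FALSE
)
ind_cols <- c("NY.GDP.MKTP.CD", "SP.POP.TOTL")

# Share of missing values per indicator (0.5 for GDP, 0 for population here)
colMeans(is.na(aruba_demo[, ind_cols]))

# Keep only the rows where both indicators are available, then compute GDP per capita
complete_demo <- aruba_demo[complete.cases(aruba_demo[, ind_cols]), ]
complete_demo$GDP.per.capita <- round(complete_demo$NY.GDP.MKTP.CD /
                                        complete_demo$SP.POP.TOTL, 0)
complete_demo
```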

2.4.2 Choosing a time period

2.4.2.1 The `start_date =` and `end_date =` parameters

These two parameters define a time range. For example, we can collect only the data for 2014, 2015 and 2016:

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                start_date = 2014,
                end_date = 2016)
dim(df)

[1] 651   6

kable(head(df,6))

iso2c	iso3c	country	date	NY.GDP.MKTP.CD	SP.POP.TOTL
AW	ABW	Aruba	2014	2790849162	103776
AW	ABW	Aruba	2015	2962905028	104339
AW	ABW	Aruba	2016	2983636872	104865
AF	AFG	Afghanistan	2014	20497126770	33370804
AF	AFG	Afghanistan	2015	19134211764	34413603
AF	AFG	Afghanistan	2016	18116562465	35383028

Comment: The table now has only 651 rows, i.e. the three years for each country (651 = 3 × 217).

2.4.2.2 The `mrv` parameter (most recent value)

When you only want the most recent data, you can replace the start_date = and end_date = parameters with mrv =, followed by the number of years you want, counting back from the most recent date. With mrv = 1 you only get the latest year available for each indicator.

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                mrv = 1)
dim(df)

[1] 217   6

kable(head(df,6))

iso2c	iso3c	country	date	NY.GDP.MKTP.CD	SP.POP.TOTL
AW	ABW	Aruba	2020	NA	106766
AF	AFG	Afghanistan	2020	20116137326	38928341
AO	AGO	Angola	2020	58375976293	32866268
AL	ALB	Albania	2020	14887629268	2837743
AD	AND	Andorra	2020	NA	77265
AE	ARE	United Arab Emirates	2020	358868765175	9890400
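`mrv` is applied by the API when the data are requested. If you have already downloaded a range of years with start_date and end_date, a comparable filter can be applied locally. The helper `keep_most_recent()` below is hypothetical (it is not part of wbstats) and, unlike `mrv`, keeps the same dates for every column; the toy data reuse population figures printed earlier.

```r
# Hypothetical helper, not part of wbstats: keep the rows of the n most recent dates
keep_most_recent <- function(df, n = 1) {
  recent <- head(sort(unique(df$date), decreasing = TRUE), n)
  df[df$date %in% recent, ]
}

# Toy data: population of Aruba and Afghanistan, 2014-2016, copied from above
pop_demo <- data.frame(
  iso3c = rep(c("ABW", "AFG"), each = 3),
  date = rep(2014:2016, times = 2),
  SP.POP.TOTL = c(103776, 104339, 104865, 33370804, 34413603, 35383028),
  stringsAsFactors = FALSE
)

last1 <- keep_most_recent(pop_demo, n = 1)  # 2016 only
last2 <- keep_most_recent(pop_demo, n = 2)  # 2015 and 2016
last1
```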

The drawback of this approach is that it can produce many missing values when one of the requested variables has not been updated. For example, in this extraction the CO2 variable only goes up to 2018 while the other two go up to 2020, so the table mixes different dates (two rows per country).

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL","EN.ATM.CO2E.KT" ),
                mrv =1)
dim(df)

[1] 434   7

kable(head(df,6))

iso2c	iso3c	country	date	EN.ATM.CO2E.KT	NY.GDP.MKTP.CD	SP.POP.TOTL
AW	ABW	Aruba	2018	NA	NA	NA
AW	ABW	Aruba	2020	NA	NA	106766
AF	AFG	Afghanistan	2018	7440	NA	NA
AF	AFG	Afghanistan	2020	NA	20116137326	38928341
AO	AGO	Angola	2018	27340	NA	NA
AO	AGO	Angola	2020	NA	58375976293	32866268

It is therefore better to request a longer window (for example mrv = 5) and then select the relevant values yourself.
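One way to do that selection is to keep, for each country and each indicator, the value of the most recent year that is not missing, together with that year. The helper `latest_non_missing()` below is a hypothetical base-R sketch (not a wbstats function); it is tried on the six rows printed above for mrv = 1, and would be applied in the same way to an mrv = 5 extraction.

```r
# Hypothetical helper, not part of wbstats: for each country, keep the most
# recent non-missing value of each indicator and record its year
latest_non_missing <- function(df, indicators) {
  out <- unique(df[, c("iso3c", "country")])
  for (ind in indicators) {
    ok <- df[!is.na(df[[ind]]), c("iso3c", "date", ind)]
    ok <- ok[order(ok$iso3c, -ok$date), ]   # most recent year first
    ok <- ok[!duplicated(ok$iso3c), ]       # one row per country
    names(ok)[names(ok) == "date"] <- paste0(ind, ".date")
    out <- merge(out, ok, by = "iso3c", all.x = TRUE)  # NA if never available
  }
  out
}

# Toy data: the six rows returned by mrv = 1 above
mixed_demo <- data.frame(
  iso3c = rep(c("ABW", "AFG", "AGO"), each = 2),
  country = rep(c("Aruba", "Afghanistan", "Angola"), each = 2),
  date = rep(c(2018, 2020), times = 3),
  EN.ATM.CO2E.KT = c(NA, NA, 7440, NA, 27340, NA),
  NY.GDP.MKTP.CD = c(NA, NA, NA, 20116137326, NA, 58375976293),
  SP.POP.TOTL = c(NA, 106766, NA, 38928341, NA, 32866268),
  stringsAsFactors = FALSE
)

latest <- latest_non_missing(mixed_demo,
                             c("EN.ATM.CO2E.KT", "NY.GDP.MKTP.CD", "SP.POP.TOTL"))
latest
```

The result has one row per country, with a `.date` column next to each indicator so that the differing reference years remain visible.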

2.4.3 Choosing the geographic units

The country = parameter selects the spatial units to collect, either as a vector of codes or through special values. By default it returns all countries, but you can restrict it to a few of them, either by their English name (risky…) or by their ISO3 code (safer).
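Matching by name is risky because the World Bank uses its own English spellings (for example "Russian Federation" rather than "Russia"). The sketch below defines a hypothetical helper, `names_to_iso3()`, which is not a wbstats function: it looks names up in a small table copied from World Bank outputs in this chapter and warns about any name without an exact match, so that only verified ISO3 codes are then passed to `country =`.

```r
# Toy lookup table: a few iso3c / country pairs as spelled by the World Bank
lookup_demo <- data.frame(
  iso3c = c("ABW", "AFG", "ALB", "RUS"),
  country = c("Aruba", "Afghanistan", "Albania", "Russian Federation"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of wbstats: exact-match English names to ISO3 codes
names_to_iso3 <- function(country_names, lookup) {
  codes <- lookup$iso3c[match(country_names, lookup$country)]
  if (anyNA(codes)) {
    warning("No exact match for: ",
            paste(country_names[is.na(codes)], collapse = ", "))
  }
  codes
}

codes <- names_to_iso3(c("Albania", "Russia"), lookup_demo)
codes  # "ALB" NA, with a warning about "Russia"
```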

2.4.3.1 Selecting countries

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                start_date = 2018,
                end_date = 2018,
                country = c("USA","CHN"))
df$GDP.per.capita <- round(df$NY.GDP.MKTP.CD / df$SP.POP.TOTL,0)
kable(head(df,6))

iso2c	iso3c	country	date	NY.GDP.MKTP.CD	SP.POP.TOTL	GDP.per.capita
CN	CHN	China	2018	1.389482e+13	1402760000	9905
US	USA	United States	2018	2.061186e+13	326838199	63064

Comment: This makes it easy to work on a small set of countries that you want to compare.

2.4.3.2 Special values

Several special values can be used instead of a list of countries:

“countries_only” (Default)
“regions_only”
“admin_regions_only”
“income_levels_only”
“aggregates_only”
“all”

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                start_date = 2018,
                end_date = 2018,
                country = "regions_only")
df$GDP.per.capita <- round(df$NY.GDP.MKTP.CD / df$SP.POP.TOTL,0)
kable(df)

iso2c	iso3c	country	date	NY.GDP.MKTP.CD	SP.POP.TOTL	GDP.per.capita
Z4	EAS	East Asia & Pacific	2018	2.641632e+13	2338223462	11298
Z7	ECS	Europe & Central Asia	2018	2.321731e+13	918031055	25290
ZJ	LCN	Latin America & Caribbean	2018	5.703879e+12	640483586	8906
ZQ	MEA	Middle East & North Africa	2018	3.356567e+12	448974232	7476
XU	NAC	North America	2018	2.234094e+13	363967296	61382
8S	SAS	South Asia	2018	3.436594e+12	1814455018	1894
ZG	SSF	Sub-Saharan Africa	2018	1.753415e+12	1078319512	1626

Comment: We have extracted the data for the major world regions for the year 2018.
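The GDP per capita of a group of regions cannot be obtained by averaging the regional ratios: each region has to be weighted by its population, which amounts to dividing total GDP by total population. The base-R sketch below uses the 2018 figures from the table above; the totals only concern these seven regions and are not meant to reproduce the World Bank's own World aggregate.

```r
# 2018 GDP (current US$) and population of the seven regions, copied from above
regions_demo <- data.frame(
  iso3c = c("EAS", "ECS", "LCN", "MEA", "NAC", "SAS", "SSF"),
  gdp = c(2.641632e13, 2.321731e13, 5.703879e12, 3.356567e12,
          2.234094e13, 3.436594e12, 1.753415e12),
  pop = c(2338223462, 918031055, 640483586, 448974232,
          363967296, 1814455018, 1078319512)
)
regions_demo$gdp_pc <- regions_demo$gdp / regions_demo$pop

# Misleading: unweighted mean of the regional ratios (about 16,800 US$)
naive <- mean(regions_demo$gdp_pc)
# Correct: ratio of totals, equal to the population-weighted mean (about 11,300 US$)
pooled <- sum(regions_demo$gdp) / sum(regions_demo$pop)
weighted <- weighted.mean(regions_demo$gdp_pc, regions_demo$pop)
round(c(naive = naive, pooled = pooled, weighted = weighted))
```

The gap comes from North America, which has a very high ratio but a small share of the population, and pulls the unweighted mean upward.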

2.4.4 Output table layout

There are two ways to lay out a table containing several variables or several dates, depending on whether you want a wide or a long table. The layout is controlled by the return_wide parameter, which is TRUE by default and can be set to FALSE.

2.4.4.1 `return_wide` = TRUE (default)

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                return_wide = TRUE,
                start_date = 2016,
                end_date = 2018,
                country = c("USA","CHN"))
df

# A tibble: 6 × 6
  iso2c iso3c country        date NY.GDP.MKTP.CD SP.POP.TOTL
  <chr> <chr> <chr>         <dbl>          <dbl>       <dbl>
1 CN    CHN   China          2016        1.12e13  1387790000
2 CN    CHN   China          2017        1.23e13  1396215000
3 CN    CHN   China          2018        1.39e13  1402760000
4 US    USA   United States  2016        1.87e13   323071755
5 US    USA   United States  2017        1.95e13   325122128
6 US    USA   United States  2018        2.06e13   326838199

2.4.4.2 `return_wide` = FALSE

df   <- wb_data(indicator  = c("NY.GDP.MKTP.CD","SP.POP.TOTL"),
                return_wide = FALSE,
                start_date = 2016,
                end_date = 2018,
                country = c("USA","CHN"))
df[,1:7]

# A tibble: 12 × 7
   indicator_id   indicator         iso2c iso3c country        date   value
   <chr>          <chr>             <chr> <chr> <chr>         <dbl>   <dbl>
 1 NY.GDP.MKTP.CD GDP (current US$) CN    CHN   China          2018 1.39e13
 2 NY.GDP.MKTP.CD GDP (current US$) CN    CHN   China          2017 1.23e13
 3 NY.GDP.MKTP.CD GDP (current US$) CN    CHN   China          2016 1.12e13
 4 NY.GDP.MKTP.CD GDP (current US$) US    USA   United States  2018 2.06e13
 5 NY.GDP.MKTP.CD GDP (current US$) US    USA   United States  2017 1.95e13
 6 NY.GDP.MKTP.CD GDP (current US$) US    USA   United States  2016 1.87e13
 7 SP.POP.TOTL    Population, total CN    CHN   China          2018 1.40e 9
 8 SP.POP.TOTL    Population, total CN    CHN   China          2017 1.40e 9
 9 SP.POP.TOTL    Population, total CN    CHN   China          2016 1.39e 9
10 SP.POP.TOTL    Population, total US    USA   United States  2018 3.27e 8
11 SP.POP.TOTL    Population, total US    USA   United States  2017 3.25e 8
12 SP.POP.TOTL    Population, total US    USA   United States  2016 3.23e 8
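Both layouts hold the same information, so a long table can be turned back into a wide one locally without calling the API again. Below is a minimal base R sketch. The data frame `long` is a hypothetical stand-in that copies the rounded values printed above and keeps only four of the columns. `reshape()` is the base R function from the stats package. With tidyr, `pivot_wider()` would do the same job.

```r
# Hypothetical stand-in for the long table above (rounded values, 4 columns only)
long <- data.frame(
  indicator_id = rep(c("NY.GDP.MKTP.CD", "SP.POP.TOTL"), each = 6),
  iso3c = rep(rep(c("CHN", "USA"), each = 3), times = 2),
  date = rep(c(2018, 2017, 2016), times = 4),
  value = c(1.39e13, 1.23e13, 1.12e13, 2.06e13, 1.95e13, 1.87e13,
            1.40e9, 1.40e9, 1.39e9, 3.27e8, 3.25e8, 3.23e8),
  stringsAsFactors = FALSE
)

# One row per (iso3c, date) pair, one column per indicator
wide <- reshape(long,
                idvar = c("iso3c", "date"),
                timevar = "indicator_id",
                direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))

# Sort by country, then date, like the wide output of wb_data()
wide <- wide[order(wide$iso3c, wide$date), ]
rownames(wide) <- NULL
print(wide)
```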

2.5 Exercises

2.5.1 Exercise 1

Extract the metadata for the variable SP.URB.TOTL.

indicator_id	indicator	unit	indicator_desc	source_org	topics	source_id	source
SP.URB.TOTL	Urban population	NA	Urban population refers to people living in urban areas as defined by national statistical offices. It is calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects. Aggregation of urban and rural population may not add up to total population because of different country coverages.	World Bank staff estimates based on the United Nations Population Division’s World Urbanization Prospects: 2018 Revision.	19 , 16 , Climate Change , Urban Development	2	World Development Indicators

meta<-cat$indicators[cat$indicators$indicator_id %in% c("SP.URB.TOTL"),]
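If you do not know an indicator's code in advance, you can search the same metadata table by keyword. The base R sketch below uses a small hypothetical stand-in for `cat$indicators`. It keeps only the `indicator_id` and `indicator` columns, and the labels are taken from the outputs in this chapter. wbstats also provides a `wb_search()` function for this kind of lookup; check its documentation for the exact arguments.

```r
# Hypothetical stand-in for cat$indicators, reduced to two of its columns
indicators <- data.frame(
  indicator_id = c("SP.POP.TOTL", "SP.URB.TOTL", "NY.GDP.MKTP.CD"),
  indicator = c("Population, total", "Urban population", "GDP (current US$)"),
  stringsAsFactors = FALSE
)

# Exact lookup by code, as in the exercise (%in% also accepts several codes)
meta <- indicators[indicators$indicator_id %in% c("SP.URB.TOTL"), ]

# Case-insensitive keyword search in the indicator labels
hits <- indicators[grepl("population", indicators$indicator, ignore.case = TRUE), ]
print(meta)
print(hits$indicator_id)
```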

2.5.2 Exercise 2

Create a table of the population of the world's countries in 2000, sort it in decreasing order and display the 10 most populous countries with their name, their code and their population in millions.

Code	Pays	Population
CHN	China	1262.6
IND	India	1056.6
USA	United States	282.2
IDN	Indonesia	211.5
BRA	Brazil	174.8
RUS	Russian Federation	146.6
PAK	Pakistan	142.3
BGD	Bangladesh	127.7
JPN	Japan	126.8
NGA	Nigeria	122.3

# Load the data through the API
# (by default wb_data() returns countries only, without regional aggregates)
tab <- wb_data(indicator = c("SP.POP.TOTL"),
               start_date = 2000,
               end_date = 2000)

### Sorting, selection, transformation and recoding in base R
 # tab <- tab[order(tab$SP.POP.TOTL, decreasing = TRUE), ]
 # tab <- tab[, c("iso3c", "country", "SP.POP.TOTL")]
 # tab$SP.POP.TOTL <- tab$SP.POP.TOTL / 1000000
 # names(tab) <- c("Code", "Pays", "Population")

### Sorting, selection, transformation and recoding with dplyr
tab <- tab %>%
  arrange(desc(SP.POP.TOTL)) %>%
  select(iso3c, country, SP.POP.TOTL) %>%
  mutate(SP.POP.TOTL = SP.POP.TOTL / 1000000) %>%
  rename(Code = iso3c, Pays = country, Population = SP.POP.TOTL)

# Display the result
kable(head(tab, 10), digits = 1)
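The commented base R lines can be checked offline on a small data frame. In the sketch below, `pop2000` is a hypothetical stand-in for the result of `wb_data()`. It holds five of the populations from the table above, rounded and converted back to raw counts, in scrambled order.

```r
# Hypothetical stand-in for the wb_data() result (scrambled order)
pop2000 <- data.frame(
  iso3c = c("USA", "BRA", "CHN", "IDN", "IND"),
  country = c("United States", "Brazil", "China", "Indonesia", "India"),
  SP.POP.TOTL = c(282.2, 174.8, 1262.6, 211.5, 1056.6) * 1e6,
  stringsAsFactors = FALSE
)

# Sort by decreasing population
tab <- pop2000[order(pop2000$SP.POP.TOTL, decreasing = TRUE), ]
# Keep code, name and population
tab <- tab[, c("iso3c", "country", "SP.POP.TOTL")]
# Convert to millions
tab$SP.POP.TOTL <- tab$SP.POP.TOTL / 1e6
# Rename the columns as in the dplyr version
names(tab) <- c("Code", "Pays", "Population")
rownames(tab) <- NULL
print(head(tab, 3))
```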

2.5.3 Exercise 3

We want to compare how CO2 emissions (EN.ATM.CO2E.KT) evolved in China (CHN), India (IND), Russia (RUS), Japan (JPN) and the United States of America (USA) from 1995 to 2015.

2.5.3.1 CO2 in absolute terms (thousands of tonnes)

Draw a chart of CO2 emissions in thousands of tonnes, with a logarithmic scale on the y axis so that growth rates are easier to compare.

# Load the data through the API
tab <- wb_data(indicator = c("EN.ATM.CO2E.KT"),
               country = c("CHN", "IND", "RUS", "USA", "JPN"),
               start_date = 1995,
               end_date = 2015)

# Plot with a logarithmic y axis
p <- ggplot(tab) + aes(x = date, y = EN.ATM.CO2E.KT, color = country) +
  geom_line() +
  scale_y_log10("thousands of tonnes") +
  ggtitle(label = "Main CO2-emitting countries (1995-2015)",
          subtitle = "Source: World Bank - wbstats API")

p
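The logarithmic scale helps because a constant growth rate becomes a constant slope. If emissions $E_t$ grow at an annual rate $r$, taking logarithms turns the exponential curve into a straight line. Two countries growing at the same rate therefore draw parallel lines, whatever their levels, and the average annual rate over the period can be read from the two endpoints:

$$
E_t = E_0\,(1 + r)^t \quad\Longrightarrow\quad \log E_t = \log E_0 + t\,\log(1 + r)
$$

$$
\bar{r} = \left(\frac{E_{2015}}{E_{1995}}\right)^{1/20} - 1
$$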

2.5.3.2 CO2 in relative terms (tonnes per inhabitant)

Same exercise, but this time also download the population (SP.POP.TOTL) in order to compute the variable CO2.per.capita, the number of tonnes of CO2 per inhabitant. Use an arithmetic (linear) scale on the vertical axis.

# Load the data through the API
tab <- wb_data(indicator = c("EN.ATM.CO2E.KT", "SP.POP.TOTL"),
               country = c("CHN", "IND", "RUS", "USA", "JPN"),
               start_date = 1995,
               end_date = 2015)

# Tonnes of CO2 per inhabitant
tab <- tab %>% mutate(CO2.per.capita = 1000 * EN.ATM.CO2E.KT / SP.POP.TOTL)

# Visualisation with ggplot2
p <- ggplot(tab) + aes(x = date, y = CO2.per.capita, color = country) +
  geom_line() +
  scale_y_continuous("tonnes per inhabitant") +
  ggtitle(label = "Main CO2-emitting countries (1995-2015)",
          subtitle = "Source: World Bank - wbstats API")

p
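The factor 1000 converts kilotonnes into tonnes. EN.ATM.CO2E.KT is expressed in thousands of tonnes, so tonnes per inhabitant = 1000 × kt / population. The small helper below makes that conversion explicit and easy to check. Its name, `co2_per_capita`, is hypothetical.

```r
# Hypothetical helper: CO2 in kilotonnes -> tonnes per inhabitant
co2_per_capita <- function(co2_kt, population) {
  1000 * co2_kt / population
}

# 5,000 kt for 1 million inhabitants -> 5 tonnes per inhabitant
co2_per_capita(5000, 1e6)

# Vectorised, like the mutate() call above
co2_per_capita(c(5000, 2000), c(1e6, 4e6))
```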

2.5.4 Exercise 4

We want to compare the world's largest countries by combining two criteria:

DEVDUR = sustainable development: measured by CO2 emissions per inhabitant
DEVECO = economic development: measured by GDP per inhabitant

2.5.4.1 Analysis for one year (2010) and one population threshold (10 million)

We first write the program for a given year (2010), keeping only countries above a minimum population (10 million inhabitants).

# Load the data through the API
tab <- wb_data(indicator = c("EN.ATM.CO2E.KT", "SP.POP.TOTL", "NY.GDP.MKTP.CD"),
               start_date = 2010,
               end_date = 2010)

# Compute the indicators, rename, select and keep countries above 10 million
tab <- tab %>%
  mutate(DEVDUR = 1000 * EN.ATM.CO2E.KT / SP.POP.TOTL,  # tonnes of CO2 per inhabitant
         DEVECO = NY.GDP.MKTP.CD / SP.POP.TOTL,         # GDP per inhabitant (current US$)
         POP = SP.POP.TOTL / 1000000) %>%               # population in millions
  rename(Code = iso3c, Pays = country) %>%
  select(Code, Pays, POP, DEVDUR, DEVECO) %>%
  filter(POP > 10)

# Visualisation with ggplot2
p <- ggplot(tab) + aes(x = DEVECO, y = DEVDUR) +
  geom_point(aes(size = POP), col = "red") +
  geom_text(aes(label = Code), size = 2, nudge_y = 1) +
  scale_x_log10("GDP per inhabitant (log scale)") +
  scale_y_continuous("CO2 per inhabitant (tonnes)") +
  ggtitle(label = "Development in the world in 2010",
          subtitle = "Source: World Bank - wbstats API")

p
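For readers who prefer base R, the data-preparation part of the pipeline can be written as a plain function. The name `prepare_dev` is hypothetical. The sketch runs offline on a stand-in table built from the wide output of section 2.4.4.1, with GDP rounded as printed there. DEVDUR is left out because this chapter prints no CO2 values; it would be computed the same way as DEVECO.

```r
# Hypothetical stand-in for the wide wb_data() result (values from section 2.4.4.1)
wb <- data.frame(
  iso3c = rep(c("CHN", "USA"), each = 3),
  country = rep(c("China", "United States"), each = 3),
  date = rep(2016:2018, times = 2),
  NY.GDP.MKTP.CD = c(1.12e13, 1.23e13, 1.39e13, 1.87e13, 1.95e13, 2.06e13),
  SP.POP.TOTL = c(1387790000, 1396215000, 1402760000,
                  323071755, 325122128, 326838199),
  stringsAsFactors = FALSE
)

# Same steps as the dplyr pipeline: compute, rename, select, filter
prepare_dev <- function(tab, minpop = 10) {
  out <- data.frame(
    Code = tab$iso3c,
    Pays = tab$country,
    POP = tab$SP.POP.TOTL / 1e6,                    # population in millions
    DEVECO = tab$NY.GDP.MKTP.CD / tab$SP.POP.TOTL,  # GDP per inhabitant (US$)
    stringsAsFactors = FALSE
  )
  # Keep rows above the threshold; NA populations are dropped, as with filter()
  out[!is.na(out$POP) & out$POP > minpop, ]
}

res <- prepare_dev(wb, minpop = 500)
print(res)
```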

2.5.4.2 Creating a function f(year, population)

We turn the same program into a function, mongraphique(), that returns the chart for two parameters: the year and the minimum population threshold. We then test the function for 1996 and 2016 with a threshold of 50 million inhabitants.

mongraphique <- function(year = 2010, minpop = 10) {

  # Load the data through the API
  tab <- wb_data(indicator = c("EN.ATM.CO2E.KT", "SP.POP.TOTL", "NY.GDP.MKTP.CD"),
                 start_date = year,
                 end_date = year)

  # Compute the indicators and keep countries above minpop million inhabitants
  tab <- tab %>%
    mutate(DEVDUR = 1000 * EN.ATM.CO2E.KT / SP.POP.TOTL,
           DEVECO = NY.GDP.MKTP.CD / SP.POP.TOTL,
           POP = SP.POP.TOTL / 1000000) %>%
    rename(Code = iso3c, Pays = country) %>%
    select(Code, Pays, POP, DEVDUR, DEVECO) %>%
    filter(POP > minpop)

  # Visualisation with ggplot2
  p <- ggplot(tab) + aes(x = DEVECO, y = DEVDUR) +
    geom_point(aes(size = POP), col = "red") +
    geom_text(aes(label = Code), size = 2, nudge_y = 1) +
    scale_x_log10("GDP per inhabitant (log scale)") +
    scale_y_continuous("CO2 per inhabitant (tonnes)") +
    ggtitle(label = paste("Development in the world in", year),
            subtitle = "Source: World Bank - wbstats API")

  p
}

mongraphique(1996, 50)
mongraphique(2016, 50)
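Because the year and the threshold are arguments, a parameterised function can be applied to a whole vector of years, for instance with `lapply()`. The sketch below keeps only a data-selection step so that it runs offline. `fetch_year()` and `countries_above()` are hypothetical names: the first stands in for the `wb_data()` call and reads from the population figures of section 2.4.4.1 instead of the API.

```r
# Hypothetical offline stand-in for wb_data(start_date = year, end_date = year)
wb <- data.frame(
  iso3c = rep(c("CHN", "USA"), each = 3),
  date = rep(2016:2018, times = 2),
  SP.POP.TOTL = c(1387790000, 1396215000, 1402760000,
                  323071755, 325122128, 326838199),
  stringsAsFactors = FALSE
)
fetch_year <- function(year) wb[wb$date == year, ]

# Parameterised step with the same (year, minpop) interface as mongraphique()
countries_above <- function(year = 2010, minpop = 10) {
  tab <- fetch_year(year)
  tab$iso3c[tab$SP.POP.TOTL / 1e6 > minpop]
}

# Apply the function to several years and name the results by year
years <- c(2016, 2017, 2018)
res <- lapply(years, countries_above, minpop = 500)
names(res) <- years
str(res)
```

With the real function, `lapply(c(1996, 2016), mongraphique, minpop = 50)` would return a list of ggplot objects, one per year. Each element still triggers its own API call.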