This notebook presents the functions used by the TELEMAC interface presented in the paper published in the journal Frontiers and available at https://analytics.huma-num.fr/Claude.Grasland/telemac/
We have collected the titles of news items declared as international (i.e. articles where at least one foreign country has been detected by the tagging procedure). In each country of analysis, we selected newspapers with a national audience for which data were available over a period of seven years (mid 2013 to mid 2020).
Code | Name | URL |
---|---|---|
DEU_tagspi | Tagespiegel | https://www.tagesspiegel.de/ |
DEU_frankf | FAZ | http://www.faz.net/ |
DEU_suddeu | Süd. Zeitung | http://www.sueddeutsche.de/ |
DEU_diewel | Die Welt | https://www.welt.de/ |
ESP_abcxxx | ABC | http://www.abc.es/ |
ESP_percat | Periodico de Cat. | http://www.elperiodico.com/es/ |
ESP_vangua | La Vanguardia | https://www.lavanguardia.com/ |
ESP_mundo | El Mundo | https://www.elmundo.es/ |
FRA_figaro | Le Figaro | http://www.lefigaro.fr/ |
FRA_lacroi | La Croix | http://www.la-croix.com/ |
FRA_libera | Libération | http://liberation.fr/ |
FRA_lmonde | Le Monde | http://www.lemonde.fr/ |
GBR_guardi | The Guardian | http://www.theguardian.com/uk |
GBR_indept | The Independent | http://www.independent.co.uk/ |
GBR_mirror | The Mirror | http://www.mirror.co.uk/ |
GBR_dailyt | The Telegraph | https://www.telegraph.co.uk/ |
ITA_mattin | Il Mattino | http://www.ilmattino.it |
ITA_messag | Il Messaggero | http://www.ilmessaggero.it/ |
ITA_repubb | La Repubblica | http://www.repubblica.it/ |
ITA_stampa | La Stampa | http://www.lastampa.it/ |
Each item of the RSS feeds has been divided into a maximum of four sentences (textual units):

* The title is considered as the first sentence,
* A part of the description with a maximum size of three sentences. Longer descriptions have been rejected from the data set.

The following table presents the number of textual units (sentences) available in each newspaper. The textual content of two newspapers (The Telegraph, El Mundo) is limited to the title only.
Source | Title | 1st sentence | 2nd sentence | 3rd sentence | Sum |
---|---|---|---|---|---|
de_DEU_diewel | 35589 | 43738 | 16145 | 4720 | 100192 |
de_DEU_frankf | 23396 | 36811 | 11745 | 2871 | 74823 |
de_DEU_suddeu | 21149 | 34484 | 9509 | 2191 | 67333 |
de_DEU_tagspi | 10265 | 26286 | 7495 | 2526 | 46572 |
en_GBR_dailyt | 70642 | 0 | 0 | 0 | 70642 |
en_GBR_guardi | 78063 | 37368 | 22035 | 10550 | 148016 |
en_GBR_indept | 65193 | 43461 | 2717 | 633 | 112004 |
en_GBR_mirror | 77968 | 34418 | 1952 | 120 | 114458 |
es_ESP_abcxxx | 36965 | 93767 | 60517 | 44802 | 236051 |
es_ESP_mundo | 31902 | 54 | 1 | 0 | 31957 |
es_ESP_percat | 36889 | 42928 | 6232 | 2717 | 88766 |
es_ESP_vangua | 9382 | 23655 | 6723 | 1211 | 40971 |
fr_FRA_figaro | 58621 | 67692 | 18660 | 9551 | 154524 |
fr_FRA_lacroi | 58624 | 65196 | 25604 | 8067 | 157491 |
fr_FRA_libera | 13150 | 9850 | 1522 | 249 | 24771 |
fr_FRA_lmonde | 30319 | 21882 | 2285 | 179 | 54665 |
it_ITA_mattin | 15264 | 22596 | 9750 | 2252 | 49862 |
it_ITA_messag | 26068 | 33012 | 13286 | 3054 | 75420 |
it_ITA_repubb | 22362 | 24993 | 5918 | 892 | 54165 |
it_ITA_stampa | 14059 | 21914 | 14859 | 5365 | 56197 |
Sum | 735870 | 684105 | 236955 | 101950 | 1758880 |
Over the full period of observation, the total number of textual units published weekly in each country of the sample ranges between 500 and 2000.
The hypercube is the result of an aggregation of foreign news according to several dimensions:
who: this dimension is related to the variable describing the media outlet which published the RSS feed. Each source is identified by a code `ll_sss_xxxxxx` where `ll` is the language, `sss` is the ISO3 code of the country and `xxxxxx` the name of the media. For instance, an RSS feed produced by the French newspaper Le Figaro is identified by the code `who = fr_FRA_figaro`. Starting from there, it is possible to aggregate the data by group of languages (e.g. computing the indicators for all the French-speaking newspapers) or by country (computing the indicators for all the media outlets located in France).
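As an illustration, the three components of a source code can be recovered by splitting on the underscore (`parse_who` is a hypothetical helper used here for illustration, not part of the TELEMAC code base):

```r
# Hypothetical helper: split a source code ll_sss_xxxxxx into its
# language, country and media components.
parse_who <- function(who) {
  parts <- strsplit(who, "_")[[1]]
  list(language = parts[1], country = parts[2], media = parts[3])
}

parse_who("fr_FRA_figaro")
# language "fr", country "FRA", media "figaro"
```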
when: this dimension describes the period when an item of the RSS feeds has been published, according to the Paris time zone. Note that this variable was transformed from the original data: Mediacloud collects the data according to the Boston time zone. In the interface, the data are aggregated over different periods: weeks, months, quarters or years. In each case, the variable `when` is identified by the date of the first day of the period. For instance, when working on monthly aggregated data, the first period of observation for the year 2015 will be `when = 2015-01-01`. These aggregated data are pre-computed in the database used by the TELEMAC interface: it is not possible to access the daily data published by the newspapers and available in the Mediacloud database.
where1 and where2: this dual dimension is associated with the cross-list of foreign countries detected in the news by the country dictionary. For example, the news item “Conflict between Russia and Turkey about Syria” leads to a list of three places (RUS, TUR, SYR) associated with a cross-list of nine pairs of places (RUS-RUS, RUS-TUR, RUS-SYR, TUR-RUS, TUR-TUR, TUR-SYR, SYR-RUS, SYR-TUR, SYR-SYR), where each pair receives a weight of 1/9. It is important to keep in mind that the country where a media outlet is located (mentioned in the `who` dimension) is excluded from the list, in line with our focus on foreign news. We also excluded from the data set the news items where no country is mentioned.
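The weighting of the cross-list can be sketched in a few lines of base R (`cross_pairs` is a hypothetical name used for illustration):

```r
# Hypothetical sketch: build the ordered pairs of detected countries,
# each receiving an equal share 1/n^2 of the news item.
cross_pairs <- function(places) {
  pairs <- expand.grid(where1 = places, where2 = places,
                       stringsAsFactors = FALSE)
  pairs$weight <- 1 / nrow(pairs)
  pairs
}

p <- cross_pairs(c("RUS", "TUR", "SYR"))
nrow(p)        # 9 ordered pairs, each weighted 1/9
sum(p$weight)  # the weights sum to 1
```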
what: this dimension refers to the topic of interest for the exploration. The topic is identified through a boolean (presence of the topic: `what = _yes_`, absence of the topic: `what = _no_`). This solution is applied 1) for the border topic: the variable identifies the articles where the keyword “border” is detected in the language of the source (using a multilingual dictionary for the tagging procedure); 2) for the topic of international migration, which is composed of several groups of words related to three different ways to speak about immigrant people, “migrants”, “refugees” and “asylum seekers” (`what = MIG`, `what = REF`, `what = ASY`), referring to news that use the lexical terms of migrants, refugees or asylum seekers (again, this procedure uses a multilingual dictionary taking into account the variations of vocabulary specific to each national context). When two subtopics are detected in the same text, the weight of the news item is shared between the different topics. For example, a news item entitled “Migrants and refugees from Syria arrived in Hungary” will be broken into 8 cells of the hypercube, each of them with weight 1/8, corresponding to the combinations (MIG-SYR-SYR, MIG-SYR-HUN, MIG-HUN-SYR, MIG-HUN-HUN, REF-SYR-SYR, REF-SYR-HUN, REF-HUN-SYR, REF-HUN-HUN), because the topic dimension `what` has to be combined with the previous dimensions `where1` and `where2`.
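The combination of subtopics with the where1 and where2 dimensions can be sketched the same way (`news_cells` is a hypothetical helper, not the actual pipeline code):

```r
# Hypothetical sketch: one news item mentioning several subtopics and
# countries is split into topic x place x place cells of equal weight.
news_cells <- function(topics, places) {
  cells <- expand.grid(what = topics, where1 = places, where2 = places,
                       stringsAsFactors = FALSE)
  cells$weight <- 1 / nrow(cells)
  cells
}

cells <- news_cells(c("MIG", "REF"), c("SYR", "HUN"))
nrow(cells)        # 8 cells, each weighted 1/8
sum(cells$weight)  # 1
```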
To build the hypercube, it is possible to work on different sizes of text units: `order = 1` uses the title (first sentence) only, while `order = 2, 3, 4, ...` uses the title with the selected number of sentences of the description available. This parameter is important because some results, especially regarding the spatial dimension of the analysis (where), are more noticeable on longer texts.
The aim is to create a world map of states which includes an extended list of UN members and all pieces of land belonging to a state, whatever their legal status. The states list gathers:
The base map of the 201 spatial units is extracted from Natural Earth with a high level of resolution for the location of the centroids of all spatial units (`world_ctr`). A simplified version is also produced with the geometry of the 177 larger units only, for quicker visualization in the graphic interface (`world`). Both files are stored in the simple features format of the R package `sf`, which is currently the most practical storage. The default projection is the Winkel Tripel, centered on the Greenwich meridian.
The recognition of states is based on a limited set of keywords or regular expressions concerning the name of the country, the adjective and/or demonym of its inhabitants, and the capital city. The regular expressions are used cautiously in order to exclude false positives.
ISO3 | regexp | language |
---|---|---|
SYR | syriens | de |
SYR | syrien | de |
SYR | damaskus | de |
SYR | syrisch* | de |
SYR | syria | en |
SYR | damascus | en |
SYR | syrian* | en |
SYR | siria | es |
SYR | damasco | es |
SYR | sirio | es |
SYR | syrien* | fr |
SYR | damas | fr |
SYR | syrie | fr |
SYR | siria | it |
SYR | damasco | it |
SYR | sirian* | it |
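A minimal sketch of how such entries can be matched against a title, assuming the trailing `*` acts as a truncation wildcard (the helper names and the conversion rule are illustrative, not the actual tagging code):

```r
# Convert a dictionary entry into a regular expression: a trailing "*"
# is read as a truncation wildcard matching any word ending.
dict_to_regex <- function(entry) {
  paste0("\\b", sub("\\*$", "\\\\w*", entry))
}

# Return TRUE if any entry of the (single-country) dictionary matches.
tag_country <- function(text, entries) {
  any(sapply(entries, function(e)
    grepl(dict_to_regex(e), text, ignore.case = TRUE)))
}

tag_country("Syrian refugees flee to Turkey",
            c("syria", "damascus", "syrian*"))   # TRUE
```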
The dictionary of the border topic is deliberately limited to one or two words or regular expressions in each language, related to the political divisions of the world. We exclude words that are more ambiguous and not necessarily related to geopolitical divisions, like frontier or limit in English.
Code | regexp | language |
---|---|---|
BOR | confine | it |
BOR | frontier* | it |
BOR | border* | en |
BOR | frontière* | fr |
BOR | frontali* | fr |
BOR | frontera* | es |
BOR | fronteriz* | es |
BOR | confin* | es |
BOR | grenz* | de |
The dictionary of international human mobility is more complex, as it is based on two different concepts which correspond in English to migrants and refugees and which can be observed in the other languages under investigation. The first concept (designated as MIG) is more general but is also often associated with more negative visions of human mobility across borders. The second concept (designated as REF) is generally related to legal issues but can also be associated with a “politically correct” way of description, conveying a more positive, or less negative, vision of human mobility. These dual concepts ensure a better coverage of the topic in different languages but can also be used to discover semantic shifts inside the same language. For a more detailed discussion, see Leconte R., Toureille E. & Grasland C., 2020.
Code | regexp | language |
---|---|---|
MIG | einwander* | de |
MIG | migrant* | de |
REF | asyl* | de |
REF | flüchtling* | de |
REF | geflüchtete* | de |
MIG | immigrant* | en |
MIG | migrant* | en |
REF | asylum | en |
REF | refugee* | en |
MIG | inmigrant* | es |
MIG | migrant* | es |
REF | asilo | es |
REF | refugiad* | es |
MIG | immigrant* | fr |
MIG | immigre* | fr |
MIG | migrant* | fr |
REF | asile | fr |
REF | réfugié* | fr |
MIG | immigrant* | it |
MIG | immigrat* | it |
MIG | migrant* | it |
REF | asilo | it |
REF | rifugiat* | it |
Considering the potential size of the hypercubes, we have chosen an efficient storage format with the R package `data.table` (https://rdatatable.gitlab.io/data.table/), which is recognized as more efficient for large computations than the classical `data.frame` or `tibble` formats.
In order to reduce the storage size, we do not store the empty cells of the cube, but we have to keep in mind that these empty cells must be taken into account when the cube is further aggregated for the production of maps or timelines.
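The effect of the missing empty cells can be illustrated with a small base-R sketch (the actual pipeline relies on data.table, and the values below are made up):

```r
# A sparse cube stores only the non-empty cells; before computing shares
# over all cells, the complete grid has to be rebuilt with zero counts.
sparse <- data.frame(where1 = c("SYR", "HUN", "SYR"),
                     what   = c("REF", "MIG", "_no_"),
                     news   = c(11.3, 16.3, 51.5))
full <- expand.grid(where1 = c("SYR", "HUN"),
                    what   = c("_no_", "MIG", "REF"),
                    stringsAsFactors = FALSE)
full <- merge(full, sparse, all.x = TRUE)
full$news[is.na(full$news)] <- 0   # empty cells count as zero news
nrow(full)      # 6 cells, including the 3 that were not stored
sum(full$news)  # total unchanged
```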
As an example, we present below an extraction of the hypercube of titles (order) of news published during the month of September 2015 (when = “2015-09-01”) by the French newspaper La Croix (who) about Syria and Hungary (where1 and where2) for the topic of human mobility (what). There are in principle 1 x 1 x 1 x 2 x 2 x 4 = 16 possible cells, as the dimensions order, who and when are fixed. But only 8 of these 16 possibilities are realized.
order | who | when | where1 | where2 | what | news |
---|---|---|---|---|---|---|
1 | fr_FRA_lacroi | 2015-09-01 | SYR | SYR | no | 51.4714719 |
1 | fr_FRA_lacroi | 2015-09-01 | HUN | HUN | MIG | 16.3055556 |
1 | fr_FRA_lacroi | 2015-09-01 | SYR | SYR | REF | 11.3333333 |
1 | fr_FRA_lacroi | 2015-09-01 | HUN | HUN | no | 3.5000000 |
1 | fr_FRA_lacroi | 2015-09-01 | SYR | SYR | MIG | 3.2500000 |
1 | fr_FRA_lacroi | 2015-09-01 | HUN | HUN | REF | 2.3111111 |
1 | fr_FRA_lacroi | 2015-09-01 | SYR | HUN | REF | 0.1111111 |
1 | fr_FRA_lacroi | 2015-09-01 | HUN | SYR | REF | 0.1111111 |
In most cases, the news items mention one single country. In the case of Syria, 51.4 news items do not mention the topic, 11.3 mention “refugees” and 3.25 “migrants”. In the case of Hungary, 3.5 news items do not mention the topic, 16.3 mention “migrants” and 2.3 “refugees”. The two countries are associated only once, in a news item about refugees where a third country was mentioned, which explains the weight of 0.111 = 1/9.
The five dimensions of analysis can be examined through different aggregations of the dimensions of the hypercube, leading to different tables allowing different modes of visualization. Each function is named according to the dimensions it is built for. For the `what` dimension, the different subtopics are never analyzed simultaneously. The default option is the aggregation of all the subtopics.
The researcher interested in a topic (WHAT) can adopt two main strategies to explore it, through the computation and visualization of two different indicators:
The salience indicator, defined as the ratio \(p_{obs}/p_{est}\) between the observed and predicted number of news items related to the topic, is ideal for an inductive approach. It is similar to the specialization indexes used in econometric models and very useful at the initial stage to identify under- or over-representations of the topic of interest during the time units of observation.
The p-value of a chi-square test applied to the hypothesis of independence is more adapted to a deductive or hypothetico-deductive approach, to test hypotheses such as the suspicion of under- or over-representation revealed by a first observation of the salience index. Using a one-sided test (alternative hypothesis \(H_1 : p_{obs} < p_{est}\)) we obtain a p-value which can be interpreted as a normalized index of deviation defined on \([0,1]\) (with the tool it is possible to interpret values greater than 0.99 as significant positive exceptions and values lower than 0.01 as significant negative exceptions).
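The behaviour of this one-sided test can be checked directly with `prop.test()` (the counts below are arbitrary, with a reference proportion of 0.0209):

```r
# With alternative = "less", prop.test() evaluates H1: p < null.value,
# so over-representation of a topic pushes the p-value toward 1 and
# under-representation pushes it toward 0.
over  <- prop.test(x = 300, n = 10000, p = 0.0209, alternative = "less")
under <- prop.test(x = 100, n = 10000, p = 0.0209, alternative = "less")
over$p.value   # close to 1: more news on the topic than expected
under$p.value  # close to 0: fewer news on the topic than expected
```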
All the functions described below therefore use a statistical test which identifies for which values it is possible to say that the proportion of news related to a topic is significantly lower or greater than a reference value. The R function `testchi2` receives as input a table with three columns:
Once the three variables are fixed, the function is applied according to a set of three parameters:

* `mintest`: the minimum value of the estimated number of successes needed to run the chi-square test in optimal statistical conditions. The literature suggests a threshold value of 5, but the user can decide to reduce or increase this threshold. The test is made if and only if the threshold is reached. We do not recommend using a low threshold: it will generate error messages from the `prop.test()` function and increase the time needed for the computation procedure.
* `cut_breaks` and `cut_names`: linked parameters used to define qualitative interpretations of the p-value. In our example, we are interested in values located both below and above the threshold. For this reason we use a double scale of significance in both directions around the expected probability.

The result of the procedure is the addition of four columns to the initial table:
estimate is the observed proportion of news related to the topic (i.e. the ratio of successes to trials), which has to be compared to the reference value,
salience is the ratio between estimate and null.value,
chi2 is the value of the chi-square test of the relation between expected and observed successes,
p.value is the result of the one-sided test \(p_{obs} < p_{est}\) with 1 degree of freedom.
NB.1: When the threshold `minsamp` is not reached, the columns estimate and salience are filled with missing values. The aim is to avoid interpreting variations of proportions that are not based on a reasonable sample of news. The default value for this parameter is 20.
NB.2: When the thresholds `minsamp` and `mintest` are not both reached, the columns chi2 and p.value are filled with missing values. The aim is to avoid running tests that do not fulfill the statistical conditions according to the rules of the art. The default value of `mintest` is 5.
#### ---------------- testchi2 ----------------
#' @title Compute the average salience of the topic and test significance of deviation
#' @name testchi2
#' @description add estimate, salience, chi2 and p.value columns to the input table
#' @param tabtest a table with variables trial, success and null.value
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest : Threshold of estimated value requested for chi-square test
testchi2<-function(tabtest=tabtest,
minsamp = 20,
mintest = 5)
{
tab<-tabtest
n<-dim(tab)[1]
tab$estimate<-NA
tab$salience<-NA
tab$chi2<-NA
tab$p.value<-NA
for (i in 1:n) {
# Compute salience if sample size sufficient (default : N > 20)
if (tab$trial[i] > minsamp) {
tab$estimate[i]<-round(tab$success[i]/tab$trial[i],5)
tab$salience[i]<-tab$estimate[i]/tab$null.value[i]
# Chi-square test if estimated value sufficient (default : Nij* >= 5)
if(tab$trial[i]*tab$null.value[i]>=mintest) {
test<-prop.test(x=tab$success[i],n=tab$trial[i], p=tab$null.value[i],
alternative = "less")
tab$chi2[i]<-round(test$statistic,2)
tab$p.value[i]<-round(test$p.value,5)
}
}
}
return(tab)
}
who | trial | success | null.value | estimate | salience | chi2 | p.value |
---|---|---|---|---|---|---|---|
fr_FRA_lmonde | 54665 | 1319 | 0.0209 | 0.02413 | 1.1545455 | 27.69 | 1.00000 |
fr_FRA_figaro | 154524 | 3926 | 0.0209 | 0.02541 | 1.2157895 | 153.17 | 1.00000 |
fr_FRA_lacroi | 157490 | 3278 | 0.0209 | 0.02081 | 0.9956938 | 0.05 | 0.40915 |
fr_FRA_libera | 24771 | 602 | 0.0209 | 0.02430 | 1.1626794 | 13.85 | 0.99990 |
de_DEU_tagspi | 46571 | 1369 | 0.0209 | 0.02940 | 1.4066986 | 163.86 | 1.00000 |
de_DEU_suddeu | 67332 | 1657 | 0.0209 | 0.02461 | 1.1775120 | 45.09 | 1.00000 |
de_DEU_diewel | 100192 | 3132 | 0.0209 | 0.03126 | 1.4956938 | 525.00 | 1.00000 |
de_DEU_frankf | 74823 | 1662 | 0.0209 | 0.02221 | 1.0626794 | 6.23 | 0.99373 |
en_GBR_mirror | 114458 | 951 | 0.0209 | 0.00831 | 0.3976077 | 886.16 | 0.00000 |
en_GBR_indept | 112004 | 2663 | 0.0209 | 0.02378 | 1.1377990 | 45.13 | 1.00000 |
en_GBR_dailyt | 70642 | 1419 | 0.0209 | 0.02009 | 0.9612440 | 2.24 | 0.06719 |
en_GBR_guardi | 148015 | 3474 | 0.0209 | 0.02347 | 1.1229665 | 47.67 | 1.00000 |
es_ESP_abcxxx | 236050 | 3683 | 0.0209 | 0.01560 | 0.7464115 | 323.45 | 0.00000 |
es_ESP_percat | 88766 | 2000 | 0.0209 | 0.02253 | 1.0779904 | 11.46 | 0.99964 |
es_ESP_mundo | 31957 | 592 | 0.0209 | 0.01852 | 0.8861244 | 8.69 | 0.00160 |
es_ESP_vangua | 40971 | 507 | 0.0209 | 0.01237 | 0.5918660 | 145.11 | 0.00000 |
it_ITA_repubb | 54165 | 1305 | 0.0209 | 0.02409 | 1.1526316 | 26.83 | 1.00000 |
it_ITA_messag | 75421 | 1205 | 0.0209 | 0.01598 | 0.7645933 | 89.09 | 0.00000 |
it_ITA_stampa | 56196 | 1175 | 0.0209 | 0.02091 | 1.0004785 | 0.00 | 0.50004 |
it_ITA_mattin | 49862 | 835 | 0.0209 | 0.01675 | 0.8014354 | 41.84 | 0.00000 |
#### ---------------- who.what ----------------
#' @title visualize variation of the topic between media
#' @name who.what
#' @description create a table of variation of the topic by media
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param title Title of the graphic
who.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
title = "Who says What ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(who)]
ref <-round(sum(tab$success)/sum(tab$trial),4)
tab$null.value<-ref
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~who,
y = ~estimate*100,
color= ~index,
colors= mycol,
hoverinfo = "text",
text = ~paste('Source: ',who,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
type = "bar") %>%
layout(title = title,
yaxis = list(title = "% news"),
barmode = 'stack')
output<-list("table" = tab, "plotly" =p)
return(output)
}
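The data.table aggregation step inside `who.what()` can be mimicked in base R to make the trial/success logic explicit (a sketch on made-up values, not the production code):

```r
# For each media outlet: trial = total weighted news,
# success = weighted news where the topic is present.
hc <- data.frame(who  = c("fr_FRA_lacroi", "fr_FRA_lacroi", "fr_FRA_figaro"),
                 what = c("REF", "_no_", "MIG"),
                 news = c(11.3, 51.5, 4.0))
hc$topic <- hc$what != "_no_"
trial   <- tapply(hc$news, hc$who, sum)
success <- round(tapply(hc$news * hc$topic, hc$who, sum), 0)
tab <- data.frame(who = names(trial), trial = as.numeric(trial),
                  success = as.numeric(success))
tab$null.value <- sum(tab$success) / sum(tab$trial)  # shared reference proportion
tab
```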
#### ---------------- when.what ----------------
#' @title visualize variation of the topic through time
#' @name when.what
#' @description create a table of variation of the topic through time
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param title Title of the graphic
when.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
title = "When says What ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(when)]
ref <-round(sum(tab$success)/sum(tab$trial),4)
tab$null.value<-ref
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~as.character(when),
y = ~estimate*100,
color= ~index,
colors= mycol,
hoverinfo = "text",
text = ~paste('Time: ',when,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
type = "bar") %>%
layout(title = title,
yaxis = list(title = "% news"),
barmode = 'stack')
output<-list("table" = tab, "plotly" =p)
return(output)
}
## order who when where1 where2 what news
## 1: 2 fr_FRA_lmonde 2014-01-06 POL POL _no_ 1.000000
## 2: 2 fr_FRA_figaro 2014-01-06 PRT PRT _no_ 1.500000
## 3: 2 fr_FRA_figaro 2014-01-06 DNK DNK _no_ 3.000000
## 4: 2 fr_FRA_figaro 2014-01-06 ESP ESP _no_ 9.950000
## 5: 1 fr_FRA_lmonde 2014-01-06 USA USA _no_ 8.651923
## ---
## 1808766: 4 it_ITA_repubb 2016-08-01 SDN SDN _no_ 0.500000
## 1808767: 1 it_ITA_repubb 2014-10-13 PRK PRK _no_ 1.000000
## 1808768: 1 it_ITA_repubb 2019-07-22 AUT AUT _no_ 1.000000
## 1808769: 1 it_ITA_repubb 2014-07-14 VAT VAT _no_ 1.000000
## 1808770: 1 it_ITA_stampa 2019-04-22 NIC NIC _no_ 1.000000
#### ---------------- where.what ----------------
#' @title visualize spatialization of the topic
#' @name where.what
#' @description create a table of variation of the topic by location
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param map a map with coordinates in lat-long
#' @param proj a projection accepted by plotly
#' @param title Title of the graphic
where.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
map = world_ctr,
proj = 'azimuthal equal area',
title = "Where said What ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = round(sum(news),0),success=round(sum(news*what),0)),by = list(where1)]
ref <-round(sum(tab$success)/sum(tab$trial),4)
tab$null.value<-ref
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
tab<-tab[order(-chi2),]
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
map<-merge(map,tab,all.x=T,all.y=F,by.x="ISO3",by.y="where1")
#map2<-map[is.na(map$pct)==F,]
#map2<-st_centroid(map2)
#map2<-st_drop_geometry(map2)
g <- list(showframe = TRUE,
framecolor= toRGB("gray20"),
coastlinecolor = toRGB("gray20"),
showland = TRUE,
landcolor = toRGB("gray50"),
showcountries = TRUE,
countrycolor = toRGB("white"),
countrywidth = 0.2,
projection = list(type = proj))
p<- plot_geo(map)%>%
add_markers(x = ~lon,
y = ~lat,
sizes = c(0, 250),
size = ~success,
# color= ~signif,
color = ~index,
colors= mycol,
hoverinfo = "text",
text = ~paste('Location: ',NAME,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4))) %>%
layout(geo = g,
title = title)
p
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- when.who.what ----------------
#' @title visualize variation of the topic by media through time
#' @name when.who.what
#' @description create a table of variation of the topic by media through time
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test (default = 5)
#' @param title Title of the graphic
when.who.what <- function (hc = hypercube,
test = FALSE,
minsamp = 20,
mintest = 5,
title = "What by Whom and When ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[is.na(when)==F,]
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(when,who)]
ref<-tab[,list(null.value = round(sum(success)/sum(trial),4)), by = list(who)]
tab<-merge(tab,ref,by="who")
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~when,
y = ~who,
z= ~index,
sizes = c(0, 250),
size = ~success,
colors= mycol,
hoverinfo = "text",
text = ~paste('Date: ',when,
'<br> Media: ',who,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
# name = ~tags,
type = "heatmap") %>%
layout(title = title,
yaxis = list(title = "media"),
xaxis = list(title = "time"))
p
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- where.who.what ----------------
#' @title visualize variation of the topic by location and media
#' @name where.who.what
#' @description create a table of variation of the topic by location and media
#' @param hc an hypercube prepared as data.table
#' @param maxloc maximum number of location
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test
#' @param title Title of the graphic
where.who.what <- function (hc = hypercube,
maxloc = 15,
test = FALSE,
minsamp=20,
mintest = 5,
title = "What by Whom and Where ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1,who)]
ref<-tab[,list(null.value = round(sum(success)/sum(trial),4)), by = list(who)]
tab<-merge(tab,ref,by="who")
# selection
sel<-tab[,list(success = sum(success)), by = list(where1)]
sel<-sel[order(-success),]
sel<- sel[1:maxloc,]
tab<-tab[where1 %in% sel$where1,]
tab$trial<-round(tab$trial,0)
tab$success<-round(tab$success,0)
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~where1,
y = ~who,
z= ~index,
sizes = c(0, 250),
size = ~success,
colors= mycol,
hoverinfo = "text",
text = ~paste('Location: ',where1,
'<br>Media: ',who,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
# name = ~tags,
type = "heatmap") %>%
layout(title = title,
yaxis = list(title = "Host media"),
xaxis = list(title = "Guest countries"))
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- when.where.what ----------------
#' @title visualize variation of the topic by location through time
#' @name when.where.what
#' @description create a table of variation of the topic by location through time
#' @param hc an hypercube prepared as data.table
#' @param maxloc maximum number of location
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest sample size of estimate for chi-square test
#' @param title Title of the graphic
when.where.what <- function (hc = hypercube,
maxloc = 15,
test = FALSE,
minsamp=20,
mintest = 5,
title = "What, Where and When ?")
{
tab<-hc
tab$what <-tab$what !="_no_"
tab<-tab[,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1,when)]
ref<-tab[,list(null.value = round(sum(success)/sum(trial),4)), by = list(where1)]
tab<-merge(tab,ref,by="where1")
# selection
sel<-tab[,list(success = sum(success)), by = list(where1)]
sel<-sel[order(-success),]
sel<- sel[1:maxloc,]
tab<-tab[where1 %in% sel$where1,]
tab$trial<-round(tab$trial,0)
tab$success<-round(tab$success,0)
tab<-testchi2(tabtest=tab,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab$index =tab$salience
tab<-tab[tab$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd")
}
else {tab$index=tab$p.value
tab<-tab[tab$trial*tab$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
p <- plot_ly(tab,
x = ~when,
y = ~where1,
z= ~index,
sizes = c(0, 250),
size = ~success,
colors= mycol,
hoverinfo = "text",
text = ~paste('Location: ',where1,
'<br>Date: ',when,
'<br /> Total news : ', round(trial,0),
'<br /> Topic news : ', round(success,0),
'<br /> % observed : ', round(estimate*100,2),'%',
'<br /> % estimated : ', round(null.value*100,2),'%',
'<br /> Salience : ', round(salience,2),
'<br /> p.value : ', round(p.value,4)),
# name = ~tags,
type = "heatmap") %>%
layout(title = title,
yaxis = list(title = "Guest countries"),
xaxis = list(title = "Time"))
p
output<-list("table" = tab, "plotly" =p)
return(output)
}
#### ---------------- where.where.what ----------------
#' @title visualize variation of the topic by co-location
#' @name where.where.what
#' @description create a table of variation of the topic by co-location
#' @param hc an hypercube prepared as data.table
#' @param test : visualize test (TRUE) or salience (FALSE)
#' @param minsamp : Threshold of sample size requested for salience computation
#' @param mintest minimum expected size of news for test
#' @param minedge minimum news with topic by edge
#' @param minnode minimum news with topic by node
#' @param title Title of the graphic
where.where.what <- function (hc = hypercube,
test=FALSE,
minsamp = 20,
mintest = 5,
minedge = 2,
minnode = 10,
title = "What, Where and Where ?")
{
#test...
tab<-hc
tab$what <-tab$what !="_no_"
# Palette
# Create edges
tab1<-tab[where1 !=where2,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1,where2)]
tab1$null.value<-sum(tab1$success)/sum(tab1$trial)
tab1<-testchi2(tabtest=tab1,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab1$index =tab1$salience
tab1<-tab1[tab1$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd") } else
{tab1$index=tab1$p.value
tab1<-tab1[tab1$trial*tab1$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
tab1<-tab1[order(success),]
# Create nodes
tab2<-tab[where1 !=where2,list(trial = sum(news),success=round(sum(news*what),0)),by = list(where1)]
tab2$null.value<-sum(tab2$success)/sum(tab2$trial)
tab2<-testchi2(tabtest=tab2,
minsamp = minsamp,
mintest = mintest)
if (test==FALSE) {tab2$index =tab2$salience
tab2<-tab2[tab2$trial > minsamp,]
mycol<-brewer.pal(7,"YlOrRd") } else
{tab2$index=tab2$p.value
tab2<-tab2[tab2$trial*tab2$null.value>mintest,]
mycol<-rev(brewer.pal(7,"RdYlBu"))
mycol[4]<-"lightyellow"
}
tab2<-tab2[order(success),]
# Build tibble graph object
tib_g=tidygraph::tbl_graph(nodes=tab2,edges=tab1)
# filter
sel_tib_g <-tib_g %>% activate(edges) %>%
filter(success > minedge) %>%
activate(nodes) %>%
filter(success > minnode) %>%
mutate(isol = node_is_isolated()) %>%
filter(isol == FALSE)
## Create a ggraph layout
g=sel_tib_g %>%
ggraph(layout="stress")
# visualize
gg<-g + geom_edge_link(aes(edge_width=success, edge_colour = index),
alpha = 0.3 , show.legend=c(TRUE,FALSE,FALSE, FALSE,FALSE)) +
scale_edge_colour_gradientn(colors = mycol)+
geom_node_point(aes(colour = index, size=success),
alpha=0.6) +
scale_color_gradientn(colors =mycol)+
geom_node_label(aes(label = where1, size = 2*sqrt(success)),alpha =1,label.size=0.1,show.legend = FALSE)
output<-list("table" = tab1, "plot" =gg)
return(output)
}