SDG Indicator Direction Estimation

This is the second blog post about thinking of the Sustainable Development Goals(SDGs) in network terms. Please see the first one to understand the general context of my interest in the SDGs.

Anyway, so the big list is 17 Goals. Each goal, has numeric Targets to achieve. Targets have indicators, also numeric (and many of the indicators can be calculated in many ways, but let’s not worry about that for now). I think the idea is loosely based around a temporal, or cause-and-effect (dare I say “causal”) structure, where we can measure indicators quarterly or yearly, measure the target progress every few years, and Goals I guess in 2030, when we either hit them or not. Or a causal version of that. I don’t want to get into the theory of the structure just yet… there’s a simpler question to consider:

How can we evaluate the effect of a specific indicator, or a collection of indicators, if the concept of a “target” maps to a separate location in the ontology? Said another way… if “the proportion of people catching a disease goes up… is this good or bad?” Answering that question is easy, but there’s over 250 indicators, and I want to know if “up is good or bad” for them all indicators. You will see why I wanted this in my next blogpost.

For this one, let’s just say I wanted to come up with a mapping of whether each indicator was “up is bad” or “up is good”. But I’m also lazy and don’t want to manually map 250+ indicators. How would you accomplish this task?

Bing!

Well, sentiment analysis, that’s how! At least that was my fist port of call, and by golly was it a good one!

Sentiment analysis basically determines whether the “sentiment” of a group of words is either good or bad… although there are other types that map out to more nuanced feelings. For this purpose, I just evaluated each indicator name, and if it was negative, I assigned it the title “up is bad”, and viceversa. I used the “Bing” library, for those of you that might be interested.

After I finished this first pass, I did look at every single indicator and mapped a few corrections. In the end, the sentiment analysis approach was right over 70% of the time!

Just goes to show that even poorly utilized tools can save a ton of time! Here are a few examples:

suppressMessages(library(tidyverse))
suppressMessages(library(gt))
df <- read_csv("data/indicator_directionality_final.csv")

df %>% sample_n(10) %>% gt::gt()

ind_code	ind	indicators	mean_sent	n	why	up_is	reviewed_up_is
SL_ISV_IFEM	8.3.1	Proportion of informal employment, by sector and sex (ILO harmonized estimates) (%)	1	11	harmon_positive	good	good
SP_ACS_BSRVH2O	1.4.1	Proportion of population using basic drinking water services, by location (%)	1	10	us_positive	good	good
SG_NHR_IMPLN	16.a.1	Countries with National Human Rights Institutions in compliance with the Paris Principles, A status (1 = YES; 0 = NO)	1	17	human_positive; right_positive; principl_positive	good	good
DC_TOF_TRDDBMDL	8.a.1	Total official flows (disbursement) for Aid for Trade, by donor countries (millions of constant 2019 United States dollars)	0	17	offici_negative; state_positive	bad	good
SG_STT_NSDSIMPL	17.18.3	Countries with national statistical plans that are under implementation (1 = YES; 0 = NO)	NA	13	NA	NA	good
GB_XPD_RSDV	9.5.1	Research and development expenditure as a proportion of GDP (%)	NA	8	NA	NA	good
SE_GCEDESD_CUR	4.7.1	Extent to which global citizenship education and education for sustainable development are mainstreamed in curricula	1	14	educ_positive; sustain_positive	good	good
DC_ODA_POVDLG	1.a.1	Official development assistance grants for poverty reduction, by donor countries (percentage of GNI)	-1	13	offici_negative; poverti_negative	bad	good
VC_DSR_HOLH	1.5.2	Direct economic loss in the housing sector attributed to disasters (current United States dollars)	0	14	econom_positive; loss_negative; disast_negative; state_positive	bad	bad
EG_IFF_RANDN	7.a.1	International financial flows to developing countries in support of clean energy research and development and renewable energy production, including in hybrid systems (millions of constant United States dollars)	1	23	support_positive; clean_positive; renew_positive; product_positive; state_positive	good	good

And the full list can be downloaded here. I’m providing the internal analysis columns for fun, but the column that contains the final directions* are “reviewed_up_is”.

Please use caution, I did the check rapidly and thus I don’t guarantee the accuracy of the indicator directions. Use at your own risk.

Gratuitous Viz

Let’s take a look at what these directions look like, organized by goal. Just for fun!

Zoom in to see the Indicator numbers!

## First, create a df for the nodes to coexist:
df_proc <- df %>% select(ind, up_is = reviewed_up_is) %>% 
  mutate(root = gsub("\\..+", "", ind),
         second_level = gsub("..$", "", ind))

## Then the full edgelist is as follows.. but also give it a friendly start so they aren't all individual
edgelist <- bind_rows(
 df_proc %>% select(from = root, to = second_level),
 df_proc %>% select(from = second_level, to = ind),
 tibble(from = "SDGs", to = 1:17 %>% as.character)
)

## and easy-processed, it's:
g <- easyNetwork::edgeListToNodesEdges(edgelist)

## now, finally, let's correct the color... adding red if it's bad to go up, and green if it's good. In several cases, indicators have multiple directions due to different series. Since it doesn't REEEALLY matter, this is just visual candy, just select the first indicator row for each (but do use the full csv file for anything actually important).

g$nodes <- g$nodes %>% select(-color) %>% 
  left_join(df_proc %>% select(ind, color = up_is) %>% 
              mutate(color = ifelse(color == "good", "green", "red")) %>% 
              group_by(ind) %>% slice(1) %>% ungroup, by = c("name" = "ind")) %>% 
  ## and clean up colors for non-indicators, and sizes for all:
  mutate(color = ifelse(is.na(color), "grey", color), value = 1)

g$edges$value <- 1

library(visNetwork)
visNetwork(g$nodes, g$edges)

Conclusion

So there you have it.. indicator directions! Done. Useful on their own? No, I don’t think so… but I have plans for them! Stay tuned for the subsequent post in this series!