Intro

In this investigation, I use Paul Hensel’s ICOW Colonial History Data Set, which is hosted (here); I’ve also put a copy on GitHub (here).

The ICOW colonial history data set attempts to identify colonial or other dependency relationships for each state over the past two centuries. This includes states that have ruled each state as a colony, dependency, League of Nations mandate, UN trust territory, or other type of possession, as well as states that have seceded from existing states and states that have merged into existing states. This should be most useful for analyses of a variety of possible propositions on the general impact of colonial rule… and similar topics.

I look at potential relationships between variables in icow and certain socio-economic factors such as:

To be completed:


Data & Prep

All code is included in the appendix.

Load libraries

## [1] "loaded knitr, tidyr, dplyr, ggplot2, stringr, lubridate, mosaic, countrycode, viridis, and rvest libraries"

Load and prep icow

This Colonial History dataset (icow) features 217 observations across 15 variables, which are described in detail in the accompanying markdown file (here). I add a sixteenth variable, Year of Independence, which is an R-friendly conversion of IndDate.
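As a rough illustration of that conversion, here is a minimal sketch in base R. It assumes IndDate is stored as a numeric YYYYMM value (for example 177607); the toy data frame icow_toy and that date format are assumptions for illustration, and the appendix code may do this differently.

```r
# Toy stand-in for icow (hypothetical; the real data set has 217 rows).
icow_toy <- data.frame(
  Name    = c("United States of America", "Haiti", "Ghana"),
  IndDate = c(177607, 180401, 195703),  # assumed YYYYMM format
  stringsAsFactors = FALSE
)

# Integer division by 100 drops the month and keeps the four-digit year.
icow_toy[["Year of Independence"]] <- icow_toy$IndDate %/% 100

icow_toy
```

If IndDate were stored as text instead, substr() on the first four characters followed by as.integer() would do the same job.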

Let’s take a quick look at the variables we have available:

#dimensions and variables
dim(icow); names(icow)
## [1] 217  16
##  [1] "State"                "Name"                 "ColRuler"            
##  [4] "IndFrom"              "IndDate"              "IndViol"             
##  [7] "IndType"              "SecFrom"              "SecDate"             
## [10] "SecViol"              "Into"                 "IntoDate"            
## [13] "COWsys"               "GWsys"                "Notes"               
## [16] "Year of Independence"

Duplicate some icow entries

Because of inconsistent country naming on Wikipedia, e.g. United States and United States of America, or Congo, Dem. Rep. and Republic of Congo, I duplicated some entries under their alternate names so that dplyr::inner_join() would have a common key to join on. I considered using adist(), which measures approximate string distance, but it seemed more trouble than it was worth given the many edge cases where it might fail. Later, I do use the countrycode package to convert from wrld_data('world') to icow-friendly codes.
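The duplication idea can be sketched as follows. Both data frames are hypothetical miniatures (icow_toy, scraped_toy, and the placeholder value column are not from the real data); the point is only to show how a copied row with an alternate spelling gives inner_join() a matching key.

```r
library(dplyr)

# Hypothetical miniature of icow.
icow_toy <- data.frame(
  Name = c("United States of America", "Ghana"),
  Year = c(1776, 1957),
  stringsAsFactors = FALSE
)

# Hypothetical scraped table that uses a shorter country name.
scraped_toy <- data.frame(
  Name  = c("United States", "Ghana"),
  value = c(1.5, 2.5),  # placeholder numbers, not real data
  stringsAsFactors = FALSE
)

# Copy the row whose name differs and give the copy the alternate spelling.
alias <- icow_toy[icow_toy$Name == "United States of America", ]
alias$Name <- "United States"
icow_dup <- rbind(icow_toy, alias)

# The join now matches both countries instead of only Ghana.
joined <- inner_join(icow_dup, scraped_toy, by = "Name")
joined
```

The cost of this approach is that the duplicated table has more rows than there are countries, so it is best kept for joining only and not for counting.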

#dim after duplications
dim(icow) 
## [1] 243  16

Scraping function, etc.

I create two functions: scrapefun() and clean(), which scrapefun() calls. clean() takes a column (vector), strips any ()’s or []’s from every entry, and returns the result. This is necessary because many country names in Wikipedia’s tables are followed by additional names in parentheses or by footnote markers, e.g. “Serbia [5]”.
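A cleaning step of this kind might look like the sketch below. The name clean_sketch() is hypothetical and this is not the appendix implementation; it assumes that the bracketed text should be removed along with the brackets themselves.

```r
# Hypothetical stand-in for clean(): removes "(...)" and "[...]" groups
# from every element of a character vector, then trims leftover spaces.
clean_sketch <- function(x) {
  x <- gsub("\\s*\\([^)]*\\)", "", x, perl = TRUE)     # drop parenthesised text
  x <- gsub("\\s*\\[[^\\]]*\\]", "", x, perl = TRUE)   # drop footnote markers
  trimws(x)
}

cleaned <- clean_sketch(c("Serbia [5]", "Congo (Kinshasa)", "France"))
cleaned
```

Entries without brackets pass through unchanged, so the function can be applied to a whole column without first checking which rows need it.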

Data Scraping

I then scraped the datasets outlined in the intro.


Investigation

ICOW - a first look, Year of Independence

Year of Independence stands out as an interesting place to start exploring the icow dataset. On a continuous scale, we see peaks starting in the 1800s, but the picture isn’t very clear. Looking only at decades in which at least one country became independent provides better context.
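Grouping by decade can be sketched in base R as below. The years vector is illustrative only, not the icow data, and the appendix may bin the years differently.

```r
# Illustrative independence years (not the real icow values).
years <- c(1804, 1957, 1960, 1960, 1962, 1975, 1991)

# Floor each year to the start of its decade, e.g. 1962 -> 1960.
decade <- (years %/% 10) * 10

# table() only lists decades that actually occur, so empty decades drop out.
counts <- table(decade)
counts

# Decade with the most cases in this toy vector.
busiest <- names(counts)[which.max(counts)]
busiest
```

Because empty decades are omitted, a bar chart built from counts shows only the decades with at least one case, which is the view described above.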



Clearly, more countries (as we know them) became independent in the last few centuries, especially in the period 1950-1990. The 1960s stands out as the decade with the most countries becoming independent.

This pattern relates nicely to Samuel Huntington’s ‘Waves of Democracy’ model, which describes three major global waves of democratization: the first taking place in the nineteenth century, the second following WW2, and the third starting in the 1970s. You can read more about his hypothesis (here). The timing is consistent with the idea that democratization is correlated with independence, although it does not establish it on its own.

This effect can be seen by plotting the cumulative number of independent countries, as given by their Year of Independence, against the cumulative number of democracies.
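The two cumulative series can be built along these lines. Both input vectors are made up for illustration, and treating the democracy data as a list of years is an assumption; the real regime data may instead be a yearly count that needs no accumulation.

```r
# Illustrative inputs (not real data).
ind_years <- c(1804, 1957, 1960, 1960, 1975)  # years of independence
dem_years <- c(1848, 1960, 1990)              # assumed: years a country became democratic

timeline <- 1800:2000

# For each year on the timeline, count how many events happened up to that year.
cumulative <- data.frame(
  year        = timeline,
  independent = sapply(timeline, function(y) sum(ind_years <= y)),
  democracies = sapply(timeline, function(y) sum(dem_years <= y))
)

# Inspect a few years.
cumulative[cumulative$year %in% c(1850, 1960, 2000), ]
```

Putting both series on a shared yearly timeline makes them directly comparable, whether they are drawn as two lines over time or one against the other.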

Data on the latter is available thanks to C. Boix, M. Miller, and S. Rosato, who created A Complete Dataset of Political Regimes, 1800-2007, the most complete dataset on democracies (paper here). A copy of the raw count data is here.