Collection of Open Source GIS Work
Replication of
Original study by Malcomb, D. W., E. A. Weaver, and A. R. Krakowka. 2014. Vulnerability modeling for sub-Saharan Africa: An operationalized approach in Malawi. Applied Geography 48:17–30. DOI:10.1016/j.apgeog.2014.01.004
Replication Authors: Alitzel Villanueva, Joseph Holler, Kufre Udoh, Open Source GIScience students of fall 2019, and group collaborators: Drew An-Pham, Emma Clinton, Jacob Freedman, Maja Cannavo, and Nick Nonnenmacher.
Replication Materials Available at: Re-Malcomb
Created: 25 May 2021
Revised: 26 May 2021
The original study is a multi-criteria analysis of vulnerability to Climate Change in Malawi, and is one of the earliest sub-national geographic models of climate change vulnerability for an African country. The study aims to be replicable, and had 40 citations in Google Scholar as of April 8, 2021.
The study region is the country of Malawi. The spatial support of input data includes DHS survey points, Traditional Authority boundaries, and raster grids of flood risk (0.833 degree resolution) and drought exposure (0.416 degree resolution).
The original study was published without data or code, but has detailed narrative description of the methodology. The methods used are feasible for undergraduate students to implement following completion of one introductory GIS course. The study states that its data is available for replication in 23 African countries.
This section was written collaboratively with: Maja Cannavo, Emma Clinton, Jacob Freedman, Nick Nonnenmacher, and Drew An-Pham.
Demographic and Health Survey data are a product of the United States Agency for International Development (USAID). Variables contained in this dataset are used to represent adaptive capacity (access + assets) in the Malcomb et al.’s (2014) study. These data come from survey questionnaires with large sample sizes. The DHS data used in our study were collected in 2010. In Malawi, the provenance of the DHA data dates back as far as 1992, but has not been collected consistently every year. Each point in the household dataset represents a cluster of households with each cluster corresponding to some form of census enumeration units, such as villages in rural areas or city blocks in urban areas DHS GPS Manual. This means that each household in each cluster has the same GPS data. This data is collected by trained USAID staff using GPS receivers. Missing data is a common occurrence in this dataset as a result of negligence or incorrect naming. However, according to the DHS GPS Manual, these issues are easily rectified and typically sites for which data does not exist are recollected. Sometimes, however, missing information is coded in as such or assigned a proxy location. The DHS website acknowledges the high potential for inconsistent or incomplete data in such broad and expansive survey sets. Missing survey data (responses) are never estimated or made up; they are instead coded as a special response indicating the absence of data. As well, there are clear policies in place to ensure the data’s accuracy. More information about data validity can be found on the DHS’s Data Quality and Use site. In this analysis, we use the variables listed in Table 1 to determine the average adaptive capacity of each TA area. Data transformations are outlined below.
Transformations:
Table 1: DHS Variables used in Analysis
Variable Code | Definition |
---|---|
HHID | “Case Identification” |
HV001 | “Cluster number” |
HV002 | Household number |
HV246A | “Cattle own” |
HV246D | “Goats own” |
HV246E | “Sheep own” |
HV246G | “Pigs own” |
HV248 | “Number of sick people 18-59” |
HV245 | “Hectares for agricultural land” |
HV271 | “Wealth index factor score (5 decimals)” |
HV251 | “Number of orphans and vulnerable children” |
HV207 | “Has Radio” |
HV243A | “Has a Mobile Telephone” |
HV219 | Sex of Head of Household” |
HV226 | “Type of Cooking Fuel” |
HV206 | “Has electricty” |
HV204 | “Time to get to Water Source” |
Demographic and Health Surveys Program
Malawi Traditional Authorities
Major Lakes
Clara R. Burgert Blake Zachary Josh Colston — Authors of DHS document
The Livelihood zone data is created by aggregating general regions where similar crops are grown and similar ecological patterns exist. This data exists originally at the household level and was aggregated into Livelihood Zones. To construct the aggregation used for “Livelihood Sensitivity” in this analysis, we use these household points from the FEWSnet data that had previously been aggregated into livelihood zones. The four Livelihood Sensitivity categories are 1) Percent of food from own farm (6%); 2) Percent of income from wage labor (6%); 3) Percent of income from cash crops (4%); and 4) Disaster coping strategy (4%). In the original R script, household data from the DHS survey was used as a proxy for the specific data points in the livelihood sensitivity analysis (transformation: Join with DHS clusters to apply LHZ FNID variables). With this additional FEWSnet data at the household level, we can construct these four livelihood sensitivity categories using existing variables (Table 1). The LHZ data variables are outlined in Table 2. The four categories used to determine livelihood sensitivity were ranked from 1-5 based on percent rank values and then weighted using values taken from Malcomb et al. (2014).
Table 2: Constructing livelihood sensitivity categories
Livelihood Sensitivity Category (LSC) | Percent Contributing | How LSC was constructed |
---|---|---|
Percent of food from own farm | 6% | Sources of food: crops + livestock |
Percent of income from wage labor | 6% | Sources of cash: labour etc. / total * 100 |
Percent of income from cash crops | 4% | sources of cash (Crops): (tobacco + sugar + tea + coffee) + / total sources of cash * 100 |
Disaster coping strategy | 4% | Self-employment & small business and trade: (firewood + sale of wild food + grass + mats + charcoal) / total sources of cash * 100 |
Variables
Transformations
Create ecological areas: LHZ boundaries intersected with TA boundaries to clip out park/conservation boundaries and rename those park areas with the park information from TA data), combined with lake data to remove environmental areas from the analysis
Physical Exposure: Floods
This dataset stems from work collected by multiple agencies and funneled into the PREVIEW Global Risk Data Platform, “an effort to share spatial information on global risk from natural hazards.” The dataset was designed by UNEP/GRID-Europe for the Global Assessment Report on Risk Reduction (GAR), using global data. A flood estimation value is assigned via an index of 1 (low) to 5 (extreme).
Physical Exposure: Physical exposition to droughts events 1980-2001
This dataset uses the Standardized Precipitation Index to measure annual drought exposure across the globe. The Standardized Precipitation Index draws on data from a “global monthly gridded precipitation dataset” from the University of East Anglia’s Climatic Research Unit, and was modeled in GIS using methodology from Brad Lyon at Columbia University. The dataset draws on 2010 population information from the LandScanTM Global Population Database at the Oak Ridge National Laboratory. Drought exposure is reported as the expected average annual (2010) population exposed. The data were compiled by UNEP/GRID-Europe for the Global Assessment Report on Risk Reduction (GAR). The data use the WGS 1984 datum, span the years 1980-2001, and are reported in raster format with spatial resolution 1/24 degree x 1/24 degree.
Variables
Transformations
The original study was conducted using ArcGIS and STATA, but does not state which versions of these software were used. The replication study will use R.
Figure 1. Map of average Malawi adaptive capacity by traditional authority in 2010, similar to Figure 4 in Malcomb et al.
Figure 2. Map of difference between original average adaptive capacity scores and my replication average scores.
Overall, there was little difference between the original map and my replication for adaptive capacity (Figure 1) as seen in Figure 2 where much of the map is covered in minor differences between -1 and 0. The Spearman’s Rho value of difference in adaptive capacity was pretty close to 1, at 0.7497 with a p-value of 0, indicating that the Malcomb et al was supported by my replication of mapping adaptive capacity. These similarities between the original study and my replication were predicted due to the lack of planned deviation from Malcomb et al because both the assets and access data were provided in full, allowing for a better replication with all the data for adaptive capacity.
Figure 3. Map of Malawi Vulnerability to Climate Change, similar to figure 5 in Malcomb et al.
Figure 4. Map of difference between vulnerability from Malcomb et al and vulnerability from my replication.
Figure 5. Scatterplot of vulnerability difference between my replication and Malcomb et al.
Malcomb et al. was not supported by my replication (Figure 3), given the Spearman’s rho value was 0.2567068 with a p-value of 0. This significant difference between Malcomb et al’s vulnerability map and my replication is not clear in Figure 4, however, the lack of any trend in Figure 5 demonstrates the lack of correlation between the original study and my replication more explicitly. This difference between Figure 3 and the original vulnerability map was not surprising given the lack of data from Malcomb et al. This replication attempted to add to the work done previously by Kufre Udoh and Joe Holler to reproduce Malcomb et al’s findings. However, this was difficult to do without the livelihood sensitivity data that comprised part of the vulnerability information along with the physical exposure data that was provided by Malcomb et al. The ambiguity of the livelihood sensitivity data proved it difficult to accurately reproduce Malcomb et al, explaining why the vulnerability maps are so different between my replication and the original study.
Given just the Malcomb et al paper, it was difficult to know exactly which steps were necessary to fully reproduce this study, much less know the order by which each process would need to take place. The methods from Malcomb et al were unclear about much of the necessary detail to plan a workflow for a successful reproduction of this study. Although we were able to gather that aggregation, rasterization, and even calculation were necessary steps, without the data our preliminary workflow did not look anything like the final workflow that we used to replicate previous reproductions. Once we had access to most of the data used by Malcomb et al we were able to better account for the different levels of data being used due to privacy issues as well as see the effects of the holes in Malcomb et al’s methodology that failed to explain how they normalized indicators for households into 5 categories, scaling from 0 to 5, which is in fact 6 groups. These details were initially hard to account for without looking at the metadata and datasets that were utilized.
This replication was only half a failure given the high Spearman’s rho value for the replication of the adaptive capacity for traditional authorities despite the low Spearman’s rho value for vulnerability. The success of replicating the adaptive capacity portion of Malcomb et al can be attributed to the data provided that did not deviate from the dataset used in the original study. It is possible the Spearman’s rho value was not closer to 1 for adaptive capacity due to the overall lack of code that meant it was impossible to recreate the study exactly as Malcomb et al had. Overall though, the adaptive capacity replication was significantly similar to the original study enough to make that part a success undoubtedly.
Conversely, the vulnerability replication was undoubtedly a failure due to two main reasons: the lack of livelihood data used by Malcomb et al and the lack of details in the original analysis that created uncertainty in how to successfully reproduce their original results. As I mentioned earlier, it was no surprise the vulnerability replication had such a low Spearman’s rho value because we were never given the same dataset for the livelihood data which meant certain error in my replication. Additionally the lack of code from Malcomb et al overall negatively affected how well this study could be reproduced because even if they had not gone into enough detail about their methods in their paper the code could have remedied a lot of the vagueness or even contradictory nature of the methods. Due to the lack of code, the reliance on only the methods meant if there was any chance of a successful complete replication of household reliance there would have had to have been extreme detail and care taken to provide unquestionable methods. Throughout this process of attempting to replicate Kufre Udoh and Joe Holler’s reproduction of Malcomb et al, it has been apparent how the overall vagueness of Malcomb et al’s methods has been to the detriment of this replication.
This replication has been a great learning experience as I have seen the importance of a well-written methodology no matter the content of the study. With the rise in open source science, there is a possibility for more studies to be held accountable and reproduced for further validity as well as an opportunity to apply certain methodologies to other areas of study. This means there is a need to have a clear methodology and/or easy accessibility to the computation of the data analyzed. This is seen clearly as we were able to more or less successfully replicate the adaptive capacity portion of household resilience due largely to the availability of the data and clarity of the methods for that. However, the vulnerability part of the household resilience shows the importance of better documentation and openness in how a study was conducted if this methodology were to be applied elsewhere, seeing as climate change is a universal issue currently.
Malcomb, D. W., E. A. Weaver, and A. R. Krakowka. 2014. Vulnerability modeling for sub-Saharan Africa: An operationalized approach in Malawi. Applied Geography 48:17–30. DOI:10.1016/j.apgeog.2014.01.004
This template was developed by Peter Kedron and Joseph Holler with funding support from HEGS-2049837. This template is an adaptation of the ReScience Article Template Developed by N.P Rougier, released under a GPL version 3 license and available here: https://github.com/ReScience/template. Copyright © Nicolas Rougier and coauthors. It also draws inspiration from the pre-registration protocol of the Open Science Framework and the replication studies of Camerer et al. (2016, 2018). See https://osf.io/pfdyw/ and https://osf.io/bzm54/
Camerer, C. F., A. Dreber, E. Forsell, T.-H. Ho, J. Huber, M. Johannesson, M. Kirchler, J. Almenberg, A. Altmejd, T. Chan, E. Heikensten, F. Holzmeister, T. Imai, S. Isaksson, G. Nave, T. Pfeiffer, M. Razen, and H. Wu. 2016. Evaluating replicability of laboratory experiments in economics. Science 351 (6280):1433–1436. https://www.sciencemag.org/lookup/doi/10.1126/science.aaf0918.
Camerer, C. F., A. Dreber, F. Holzmeister, T.-H. Ho, J. Huber, M. Johannesson, M. Kirchler, G. Nave, B. A. Nosek, T. Pfeiffer, A. Altmejd, N. Buttrick, T. Chan, Y. Chen, E. Forsell, A. Gampa, E. Heikensten, L. Hummer, T. Imai, S. Isaksson, D. Manfredi, J. Rose, E.-J. Wagenmakers, and H. Wu. 2018. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour 2 (9):637–644. http://www.nature.com/articles/s41562-018-0399-z.