For data science in international development, user feedback is essential
This winter, Europeans complained heavily about the absence of snow, which made alpine skiing almost impossible in parts of the Alps. Strange new weather patterns with mild winter temperatures are becoming commonplace. However, first-world problems like these are trivial compared to climate developments in Africa. This January, humanitarian agencies alerted the public to an ongoing drought in East Africa that has significantly worsened food security and carries a severe risk of widespread famine. Millions of people need immediate food assistance to cope with the unfavorable climate. Even though news coverage remains slim, various new data-heavy approaches aim to tackle the challenge of drought-related famine in developing countries. One of those projects is ‘Satellite Technologies for Improved Drought Risk Assessment’ (SATIDA).
SATIDA is a research project, supported by Médecins sans Frontières and the Austrian Research Promotion Agency (FFG), that focuses on using satellite data (such as rainfall, land surface temperature, vegetation status, and surface and profile soil moisture) to better predict region-specific drought and famine risk. What sets the project apart is its inclusion of socio-economic data, gathered directly in the relevant communities via a smartphone app. We had the opportunity to catch up with Markus Enenkel, project lead of SATIDA and currently a postdoctoral fellow at Columbia University in New York, to talk about the project, the key lessons learned and the role of data science in international development efforts.
idalab: Mr Enenkel, what prompted you and your team to look into the benefits of satellite data for better coordination of humanitarian aid?
The project started from the large gap between the vast amount of data that is accessible nowadays and its actual use in the daily operations of humanitarian aid organizations. There is so much satellite data about the climate, but little of it is used outside of scientific research. So we were really eager to bring the knowledge of our domain into direct application and help humanitarian aid organizations leverage this data in their daily operations. For us, this had direct implications for the research design: we set out to interact directly with the user, in this case the humanitarian aid organization, and to maintain constant feedback loops to ensure that data is actually turned into information.
idalab: Put into the perspective of a data value chain, you are essentially claiming that only real-world applications turn existing data into value?
Yes, precisely. Data science generally encompasses two different domains. First, there is the handling of data, its integration and calibration. The second part, however, the direct translation of data into tools and applications, is much more interesting for the user. So at SATIDA we were challenged not only to rely on observational data, climate forecasts and the like, but also to try to establish a link between satellite data, climate shocks and local conditions. Only if we establish such a link can we ensure that our findings have a real-world impact.
idalab: Could you elaborate a little bit more on how you approached the integration of climate and socio-economic data?
The climate component of the model revolved around detecting anomalies in satellite data on variables such as precipitation, temperature, soil moisture and vegetation health. But this data is only interesting if put into a realistic context. The socio-economic part of the model concerns the effects of climate shocks on local conditions. The essential problem was that there is no harmonized, long-term dataset that captures this information. Therefore we teamed up with Médecins sans Frontières to develop a mobile application to gather this data in the Central African Republic. Our goal was essentially to detect patterns for the development of scenarios. For example, suppose there is a high soil moisture anomaly at the beginning of the growing season, and I saw a similar situation two years ago and know how it developed until June. Then I could potentially use that to predict how the situation is going to evolve this year.
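To make the idea of anomalies and look-alike years concrete, here is a minimal Python sketch. It is not SATIDA's code: the soil moisture values, the years and the function names are hypothetical, and the standardized anomaly (z-score) with a nearest-match lookup is just one simple way to express the reasoning described above.

```python
from statistics import mean, stdev


def standardized_anomaly(value, history):
    """Return how many standard deviations `value` lies from the historical mean."""
    return (value - mean(history)) / stdev(history)


def closest_analog_year(current_anomaly, past_anomalies):
    """Return the past year whose early-season anomaly is closest to this year's."""
    return min(
        past_anomalies,
        key=lambda year: abs(past_anomalies[year] - current_anomaly),
    )


# Hypothetical early-season soil moisture values (arbitrary units), one per year.
soil_moisture_by_year = {2010: 0.21, 2011: 0.18, 2012: 0.25, 2013: 0.30, 2014: 0.22}
history = list(soil_moisture_by_year.values())

# Anomaly of each past year relative to the whole record.
past_anomalies = {
    year: standardized_anomaly(value, history)
    for year, value in soil_moisture_by_year.items()
}

# Hypothetical observation for the current season.
current_anomaly = standardized_anomaly(0.29, history)
analog_year = closest_analog_year(current_anomaly, past_anomalies)

print(f"Current anomaly: {current_anomaly:+.2f} standard deviations")
print(f"Most similar past season: {analog_year}")
```

With a match like this, an analyst could look up how the analog season developed until June and treat that as one plausible scenario for the current year, not as a prediction with known accuracy.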
idalab: The detection of anomalies was solely focused on satellite data and then ‘validated’ by socio-economic feedback?
Our project was focused on the pre-operational stage. In order to really use both data sources for decision-making, one would need longer time series of socio-economic data, which we didn’t have. So our project essentially tried to establish the necessary preconditions for more effective forecasts and predictions in the future. If aid organizations in sensitive African areas periodically used smartphones to collect information on, for example, malnutrition and price developments, then we could slowly but surely build up the datasets we need for a better integration of satellite and socio-economic data.
idalab: Is the absence of coherent data sets one of the major challenges in unlocking the potential of data science in international development?
Yes. Even in Europe and the US, records of agricultural losses are often neither up to date nor easy to access, so you can imagine that the situation will certainly not be better in a country marked by political instability and poverty. There are some agricultural statistics from the Food and Agriculture Organization (FAO), but those are rarely specific enough. In addition, these statistics do not really cover the information we care about: What are the current vulnerabilities? How do people cope with the situation, and what mechanisms do they apply to strengthen their resilience? Do they borrow money? Do they sell their cattle? Those are things we need to understand, patterns we need to recognize. But the data for those endeavours still has to be collected. And that, I believe, is also part of data science in international development cooperation. While satellite data is abundant, its mere existence might be of little value on the ground. It is only beneficial when accompanied by direct feedback from the user, in order to foster an understanding of the data and to convey that insights are rarely black and white, but often gray. We believe that such an endeavor requires a more fine-grained local approach that engages the relevant stakeholders and enables them to make decisions based on data tailored to their needs.
idalab: Could you specify how predictions – mirrored by socio-economic specificities – differed in the regions?
When we gathered data in the Central African Republic in 2015, we found that one third of the people in our assessment had received a food distribution in the previous year. So we initially assumed that there had been a food shortage caused directly by a drought. But when we looked at the climate data, we found that there was no anomaly whatsoever. In the end, it turned out that the reason for the food shortage was an armed conflict between two rebel groups, which prevented the farmers from cultivating their fields and led to a decrease in yield. And these are precisely the relationships and impacts we have to understand.
idalab: You set out to improve the drought risk assessment with a new seasonal forecast. How satisfied were you with the results?
In the monitoring part of the project, we performed quite well. We composed a drought index for Ethiopia and the Central African Republic, which captured all relevant drought events. Unfortunately, though, the seasonal forecasting did not work out as expected. Essentially it boils down to the question of how much uncertainty a seasonal forecast can have and still create added value for the user, because it takes some time and effort to integrate such a tool into the decision-making process. If a seasonal forecast reveals that there is an 80% chance of very low precipitation in the next few months, this might lead some people to say ‘wow, 80%, we should equip ourselves’. But others might say ‘well, there is a 1-in-5 chance that we get away with doing nothing’. So how should resources be allocated based on such a forecast? Generally I would claim that in areas of central Africa, where there is a general vulnerability to climate shocks, additional resources are never a bad investment. But in international development, which is very donor-driven, every decision needs to be justified on a solid basis.
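One common way to frame the ‘80% versus 1-in-5’ dilemma is the textbook cost-loss model: act early whenever the forecast probability exceeds the ratio of the cost of acting to the loss suffered if a drought hits unprepared. The Python sketch below illustrates that model only. It is not something SATIDA is described as using, and all cost figures are hypothetical.

```python
def expected_costs(p_drought, cost_of_acting, loss_if_unprepared):
    """Return (expected cost of acting early, expected cost of waiting).

    Simplifying assumption: acting early fully avoids the loss.
    """
    cost_act = cost_of_acting
    cost_wait = p_drought * loss_if_unprepared
    return cost_act, cost_wait


def should_act(p_drought, cost_of_acting, loss_if_unprepared):
    """Act when the forecast probability exceeds the cost/loss ratio."""
    return p_drought > cost_of_acting / loss_if_unprepared


# Hypothetical figures in arbitrary budget units.
p = 0.8  # forecast probability of very low precipitation

# Case 1: early action is cheap relative to the potential loss.
print(expected_costs(p, cost_of_acting=1.0, loss_if_unprepared=3.0))
print(should_act(p, cost_of_acting=1.0, loss_if_unprepared=3.0))

# Case 2: early action costs almost as much as the loss it prevents.
print(should_act(p, cost_of_acting=1.0, loss_if_unprepared=1.1))
```

Under this framing, the same 80% forecast justifies action in one setting and not in another, which is why the two reactions quoted above can both be rational. The model also makes explicit what a donor would need to see: an estimate of the cost of acting and of the loss avoided.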
idalab: What precisely was your approach on the seasonal forecasting?
We tried to use our combined drought index, built from satellite data on precipitation, surface temperature, soil moisture and vegetation health, to calibrate seasonal forecasts. So the forecast is initially based to a large part on the monitoring index, then there is an overlapping period of monitoring index and seasonal forecast, and in the end the forecast is solely seasonal. Assessing our approach from a statistical perspective, we found that the forecasting precision decreases rapidly after 20 to 30 days. Initially, we wanted a forecast that works well for several months. But with our current model, I would not feel confident making a prediction that goes beyond one month.
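The hand-over from monitoring index to seasonal forecast can be pictured as a weighted blend whose weights shift with lead time. The following Python sketch assumes a simple linear ramp over a hypothetical 30-day window. The interview does not specify the actual weighting, so the function names, the window length and the index values are illustrative assumptions.

```python
def monitoring_weight(lead_day, blend_days=30):
    """Weight given to the monitoring index at a given lead time.

    Assumption: the weight falls linearly from 1 at day 0 to 0 at `blend_days`.
    """
    if blend_days <= 0:
        raise ValueError("blend_days must be positive")
    return max(0.0, 1.0 - lead_day / blend_days)


def blended_index(monitoring_value, forecast_value, lead_day, blend_days=30):
    """Blend the monitoring index with the seasonal forecast for one lead time."""
    w = monitoring_weight(lead_day, blend_days)
    return w * monitoring_value + (1.0 - w) * forecast_value


# Hypothetical index values: negative means drier than normal.
monitoring_value = -1.5  # latest satellite-based drought index
forecast_value = -0.5    # seasonal forecast expressed on the same scale

for lead_day in (0, 15, 30, 45):
    value = blended_index(monitoring_value, forecast_value, lead_day)
    print(f"day {lead_day:2d}: blended index = {value:+.2f}")
```

In this sketch the blend relies entirely on observations at day 0 and entirely on the seasonal forecast from day 30 onward. That is roughly the horizon beyond which the forecasting precision was found to drop.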
idalab: Despite the mentioned forecasting challenges, you are now looking into how to operationalize the research?
We are currently reviewing our findings with Médecins sans Frontières. It should not be our task to continuously go into the field and collect data. Rather, we aim for snowball effects, training local multipliers with the relevant smartphone app and helping them engage others. When we did that in the Central African Republic, it worked surprisingly well. Within a few days, the selected individuals were able to gather data without supervision.
idalab: How do you generally see the receptiveness of humanitarian aid organizations towards data-driven approaches?
There might be a lot of scepticism, but the key is definitely communication. If you just announce that you will try to use satellite data in a new and innovative way, juggling scientific acronyms, you will never receive good feedback. It is essential to involve the organizations in a dialogue and to understand their needs and wants. At the same time, it is crucial to communicate honestly the existing limitations of your own approach. If people see real value for their daily operations, they are definitely willing to invest time and effort.
idalab: You mentioned the role of donors as well. Are they pushing towards more data science?
As many aid organizations are supported by many individual donors, there is a great focus on results. Unfortunately, research always carries the possibility of failure: one might conclude that whatever was attempted did not work out. Such information is equally valuable, but might be tough to justify in front of donors. Thus, there is a tendency to stick with the usual approach. In this setting, the role of research-sponsoring organizations is crucial, because otherwise innovation would never spread into these domains.
idalab: Interestingly, though, all humanitarian aid organizations alike could heavily benefit from data. Are there initiatives for a joint effort?
Generally, I think it would be a very efficient approach to identify a pool of organizations with similar demands and then try to develop tools that cater to their needs. Is it possible to develop such local data pools across organizations? I believe it is, even though every organization is highly specific. A flexible approach could accommodate this particular situation. With our app, for example, organizations like the Red Cross, Médecins sans Frontières or the World Food Programme could all be involved in gathering core data regarding malnutrition, local prices and coping mechanisms. But all organizations would also have the opportunity to modify and extend the application to collect more specific information that benefits their own work. While the core data set would remain accessible to everyone, organizations could retain rights over the additionally collected data. This would provide an adequate infrastructure for building up reliable time series of socio-economic data in order to enable more sophisticated analysis.
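As a rough illustration of such a shared core with organization-specific extensions, a survey record could be modeled as below. This is a hypothetical Python sketch, not the data model of the SATIDA app: every field name, the example values and the `shared_view` helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SurveyRecord:
    """One survey entry: shared core fields plus private extensions."""

    # Core data, shared across all participating organizations.
    location: str
    survey_date: str  # ISO date string, e.g. "2015-06-01"
    malnutrition_rate: float
    staple_price: float
    coping_mechanisms: list
    collected_by: str
    # Organization-specific additions that stay with the collecting organization.
    extensions: dict = field(default_factory=dict)


def shared_view(record):
    """Return only the core fields that would go into the common data pool."""
    data = asdict(record)
    data.pop("extensions")
    return data


# Hypothetical record with one organization-specific field.
record = SurveyRecord(
    location="village_a",
    survey_date="2015-06-01",
    malnutrition_rate=0.12,
    staple_price=350.0,
    coping_mechanisms=["sold cattle"],
    collected_by="org_a",
    extensions={"clinic_visits_last_month": 4},
)

print(shared_view(record))
```

Keeping the extensions in a separate field means the core schema can stay stable across organizations, which is what a long, comparable time series would depend on.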
idalab: Mr Enenkel, thanks for these very interesting insights.