Large, publicly available data sets are increasingly being used to develop artificial intelligence (AI) and other technologies to improve the diagnosis of diseases and track their spread in communities. They are also being mobilised in the fight against COVID-19 to predict the burden of illness and determine which populations and social groups require the most help.
Yet many communities, including large areas of the Global South, are not represented in the data sets these technologies are based on. This so-called “data poverty” could result in a growing digital divide between rich countries with advanced data infrastructure and poorer ones, thereby exacerbating existing health inequalities.
Data poverty and COVID-19
An absence of reliable data is already impeding the global response to COVID-19. Countries’ ability to test their populations for current infections, or for antibodies that might indicate previous infection, varies – as does their willingness to publicly share this information. In some cases, this lack of data may result in the severity of the pandemic being underestimated, or be used to prop up false claims, such as the claim that Africans are immune to the virus.
Within populations, there are also communities that we know particularly little about, such as undocumented migrants and refugees; victims of domestic violence; and people working in unstable or precarious conditions, such as sex workers or farmhands. This data gap may result in such groups being excluded from pandemic recovery plans or economic subsidies.
Seeking to address some of these data gaps, activists in certain countries have begun mining their own data to document the spread of COVID-19. For instance, in China, citizen journalists have taken to sharing on-the-ground information about infection rates and lockdown measures.
Marginalised communities aren’t the only people being affected by this COVID-19 data gap. According to a report by Data2X and Open Data Watch, many countries, including upper middle- and high-income ones, are struggling to report infection rates by sex, which is important because we know the virus affects men differently from women.
Nor is there enough data on domestic violence or on the pandemic’s impact on the well-being of women and girls, the report says. Collecting this information is important because other studies suggest that women are bearing the brunt of the emotional stress related to COVID-19, in addition to job and income loss and increases in unpaid work, such as educating children at home during lockdowns and school closures.
Health tech and the digital divide
AI is also being developed to improve other areas of health. For instance, algorithms are being developed to aid the detection of breast cancer from mammograms and pneumonia from chest radiographs. AI is also increasingly being used in resource-poor settings such as the Philippines, where researchers have developed machine learning systems to identify weather and land-use patterns associated with the transmission of dengue, and South Africa, where AI has been developed to help predict cholera outbreaks. For this technological revolution to proceed equitably, access to broad data sets covering different geographical locations and diverse populations is essential.
Focus on eye health
Some of the pitfalls of limited access to data can be seen in a new study that examines the application of machine learning to eye health. Ophthalmology is considered particularly suitable for the development of machine learning tools because imaging is already widely used in the diagnosis of eye disease. Technology companies have therefore been developing automated tools for assessing and tracking the health of patients’ eyes, training their algorithms on publicly available data sets.
Researchers at the University of Birmingham, UK, carried out a global search to explore the extent to which these data sets represented the diversity and needs of the world’s population. Most of the data sets they identified came from populations in Asia and Europe, with very few covering sub-Saharan Africa and South America. In addition, closer analysis revealed that within each data set, information such as individuals’ age, sex and ethnicity was often missing – although the reason for this was unclear.
Another issue was disparities in the disease types covered by the data sets. Most of the images they contained related to diseases such as diabetic retinopathy, glaucoma and age-related macular degeneration – possibly because these conditions are more common in wealthy countries that possess the health infrastructure to routinely collect such images.
However, eye diseases like cataracts, trachoma and refractive error – which account for half of all global blindness and which the World Health Organization has designated as priority diseases – were under-represented. Digital technology could help improve the diagnosis and detection of such diseases in low- and middle-income countries, but only if the relevant data is gathered from across societies and made publicly available for developing and training these AI-based tools.