Lai, Yingsi. Bayesian geostatistical and mathematical models to assess the geographical distribution of neglected tropical diseases. 2016, Doctoral Thesis, University of Basel, Faculty of Science.
Official URL: http://edoc.unibas.ch/diss/DissB_11749
Abstract
Neglected tropical diseases (NTDs) are a group of communicable diseases affecting more than one billion of the world’s poorest people. Soil-transmitted helminth infections, schistosomiasis, and foodborne trematodiases are among the most important NTDs. Soil-transmitted helminth infections are caused by a group of parasitic nematode worms (i.e., Ascaris lumbricoides, Trichuris trichiura, and hookworm) and are acquired through contact with parasite eggs or larvae, which thrive in warm and moist soil. They are widely endemic in the tropics and sub-tropics and rank highest among all NTDs in terms of burden, contributing 5.2 million disability-adjusted life years (DALYs) to the global disease burden. Schistosomiasis is caused by trematode parasites of the genus Schistosoma. It ranks second in terms of NTD burden and is responsible for around 3.3 million DALYs worldwide. More than 90% of schistosomiasis cases occur in Africa. Clonorchiasis is one of the most important foodborne trematodiases and is caused by infection with the Chinese liver fluke, Clonorchis sinensis. China accounts for around 85% of infected people globally, and most cases occur in the southern and northeastern parts of the country. For all three diseases, preventive chemotherapy is advocated by WHO as a key strategy for morbidity control. Furthermore, integrated approaches are highly recommended to achieve sustainable control and elimination. Such approaches may include preventive chemotherapy in combination with improvement of water, sanitation, and hygiene, as well as better information, education, and communication.
To implement control strategies cost-effectively, high-resolution maps depicting the geographical distribution of disease risk are important. These maps provide useful information for spatial targeting of control measures and for long-term monitoring and surveillance. Geostatistical modeling is the most rigorous inferential approach for high-resolution risk mapping of NTDs. It is a data-driven approach that relates georeferenced disease data (usually point-referenced) to potential predictors (e.g., environmental and socioeconomic factors) that are considered important for disease transmission. Location-specific random effects can explain geographical variation in the data, under the assumption that neighboring areas have similar infection status because they share common disease exposures. Geostatistical models are highly parameterized. However, Bayesian model formulations provide a flexible inferential framework, and powerful computational tools such as Markov chain Monte Carlo (MCMC) simulation or approximations (e.g., integrated nested Laplace approximation (INLA)) are applied for model fit.
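As an illustration of the model class described above, a standard binomial geostatistical formulation can be written as follows. This is a generic textbook sketch, not the specification used in the thesis: the logit link, the exponential correlation function, and the symbols $Y_i$, $n_i$, $p_i$, $\mathbf{x}_i$, $\boldsymbol{\beta}$, $w$, $\sigma^2$, and $\phi$ are assumptions introduced here for illustration.

```latex
% Y_i: number of positive individuals among n_i examined at location s_i
\begin{align}
  Y_i \mid p_i &\sim \mathrm{Binomial}(n_i,\, p_i), \\
  \operatorname{logit}(p_i) &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + w(s_i), \\
  \mathbf{w} = \bigl(w(s_1), \dots, w(s_N)\bigr)^{\top} &\sim \mathcal{N}\bigl(\mathbf{0},\, \sigma^{2}\mathbf{R}(\phi)\bigr), \\
  R_{ij}(\phi) &= \exp\bigl(-\phi\, d_{ij}\bigr),
\end{align}
% x_i: predictors at s_i; d_ij: distance between s_i and s_j;
% sigma^2: spatial variance; phi: correlation decay parameter.
```

In such a formulation, the location-specific random effects $w(s_i)$ carry the spatial correlation, and priors on $\boldsymbol{\beta}$, $\sigma^2$, and $\phi$ complete the Bayesian model before it is fitted by MCMC or INLA.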
Good spatial coverage and a sufficient amount of disease data are necessary to capture the spatial heterogeneity of the infection risk. Due to the lack of large surveys covering the whole study region, this PhD thesis is based on historical survey data compiled via bibliometric searches. Publications, however, report the survey data either as point-referenced (with geographical information at the survey location) or as areal, i.e., aggregated over several locations within an administrative level (e.g., county or district). The areal data can provide useful information, especially when the spatial coverage of point-referenced data is low. Geostatistical models for jointly analysing point-level and areal survey data are not available. Furthermore, historical data are generated from studies whose designs differ between locations, including different population age groups. Geostatistical models that align survey data across locations to a common age group do not exist in the field of NTDs. Ignoring the age heterogeneity of the data can lead to biased estimation because models cannot distinguish whether risk differences between locations are due to differences in age or in exposure. Mathematical models can be used to age-align the surveys, but there is no model formulation that allows the shape of the age-prevalence curve to change over space as a result of varying endemicity.
The overall goal of the thesis is to develop Bayesian geostatistical and mathematical models for analysing georeferenced NTD survey data and to provide tools and knowledge for disease control and prevention.
In Chapter 2, surveys pertaining to soil-transmitted helminth infections in the People’s Republic of China (P.R. China) were compiled. Bayesian geostatistical models were developed and used to estimate the disease risk throughout the country at high spatial resolution. Advanced Bayesian variable selection methods were employed to identify the most important predictors. Results indicate that the prevalence of soil-transmitted helminth infections in P.R. China decreased considerably from 2005 onwards. Yet, some 144 million people were estimated to be infected in 2010. High prevalence (>20%) was predicted in large areas of Guizhou and the southern part of Hubei and Sichuan provinces for Ascaris lumbricoides infection, in large areas of Hainan, the eastern part of Sichuan, and the southern part of Yunnan provinces for hookworm infection, as well as in a few small areas of south P.R. China for Trichuris trichiura infection.
In Chapter 3, a systematic review was carried out to identify prevalence surveys of soil-transmitted helminth infections in South Asia. Bayesian geostatistical models were applied to identify important environmental and socioeconomic predictors, and to estimate infection risk at high spatial resolution across the study region. Results show that 397 million people in South Asia were infected with at least one species of soil-transmitted helminth in 2015. A. lumbricoides was the most common species. Moderate to high prevalence (>20%) of any soil-transmitted helminth infection was predicted in the northeastern part and some northern areas of the study region, as well as in the southern coastal areas of India. The annual treatment needs for the school-aged population requiring preventive chemotherapy were estimated at 187 million doses. The study highlights the need for up-to-date surveys to accurately evaluate the disease burden in the region.
In Chapter 4, georeferenced survey data on C. sinensis infection were obtained via a systematic review, and additional data were provided by the National Institute of Parasitic Diseases, Chinese Center for Diseases Control and Prevention. Bayesian geostatistical models were applied to quantify the relation between infection risk and important predictors, and to predict the risk of infection across P.R. China at high spatial resolution. The results show an increasing risk of C. sinensis infection over time, particularly from 2005 onwards, which calls for the Chinese government to pay more attention to the public health importance of the disease. Highly endemic areas (>20%) were concentrated in the southern and northeastern parts of the country. The provinces with the highest risk of infection and the largest number of infected people were Guangdong, Guangxi, and Heilongjiang.
In Chapter 5, a systematic review was conducted to identify relevant surveys pertaining to the prevalence of Schistosoma infection in sub-Saharan Africa. Bayesian geostatistical meta-analysis and rigorous variable selection were used to obtain up-to-date risk estimates of schistosomiasis at high spatial resolution, based on environmental and socioeconomic predictors. The literature search identified Schistosoma haematobium and Schistosoma mansoni surveys at 9,318 and 9,140 unique locations, respectively. Results show a decreased infection risk from 2000 onwards, yet suggest that 163 million Africans were infected in 2012. Mozambique had the highest prevalence of Schistosoma infection among 44 countries of sub-Saharan Africa. Annualised treatment needs with praziquantel were estimated at 123 million doses for school-aged children and 247 million for the entire population.
In Chapter 6, a Bayesian geostatistical modeling approach was developed to jointly analyse areal and point-referenced survey data. We assumed that the point-referenced data arise from a binomial distribution and that the aggregated areal data follow a Poisson binomial distribution, which was approximated by a two-parameter shifted binomial distribution. Results from extensive simulations show that our proposed model has better predictive ability and improves parameter estimation compared to models that treat areal data as points located at the centroids of the areas. We applied the new model to obtain high spatial resolution estimates of the infection risk of clonorchiasis in an endemic region of P.R. China.
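The idea of replacing a Poisson binomial distribution with a shifted binomial can be sketched in a few lines of Python. The sketch below is illustrative only and is not the thesis's implementation: the function names are hypothetical, each location is assumed to contribute a single Bernoulli trial with its own probability, and the integer shift is chosen by a simple moment-matching rule (exact mean, approximately matched variance), which may differ from the parameterization used in Chapter 6.

```python
import numpy as np
from math import comb


def poisson_binomial_pmf(p):
    """Exact PMF of a sum of independent Bernoulli(p_i) variables."""
    pmf = np.array([1.0])
    for pi in p:
        # Convolve with the two-point distribution of one Bernoulli trial.
        pmf = np.convolve(pmf, [1.0 - pi, pi])
    return pmf


def shifted_binomial_pmf(p):
    """Approximate the Poisson binomial by s + Binomial(m, q).

    Assumes 0 < p_i < 1. The shift s is an integer chosen so that the
    variance is roughly matched; q is then set so the mean is exact.
    """
    p = np.asarray(p, dtype=float)
    n = len(p)
    mu = p.sum()
    var = (p * (1.0 - p)).sum()
    # Continuous shift that would match mean and variance exactly.
    s_cont = (mu * (n - mu) - var * n) / ((n - mu) - var)
    # Round to an integer and keep the success probability q in [0, 1].
    s = min(max(int(round(s_cont)), 0), int(np.floor(mu)))
    m = n - s
    q = (mu - s) / m
    pmf = np.zeros(n + 1)
    for k in range(m + 1):
        pmf[s + k] = comb(m, k) * q**k * (1.0 - q) ** (m - k)
    return pmf, s, m, q


# Hypothetical location-specific infection probabilities within one area.
probs = [0.1, 0.3, 0.5, 0.7, 0.2, 0.4]
exact = poisson_binomial_pmf(probs)
approx, shift, trials, prob = shifted_binomial_pmf(probs)
# Total variation distance between the exact and approximate PMFs.
tv_distance = 0.5 * np.abs(exact - approx).sum()
print(shift, trials, prob, tv_distance)
```

The appeal of such an approximation is that the likelihood of an aggregated count depends on only two parameters rather than on every location-specific probability, which keeps model fitting tractable.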
In Chapter 7, we integrated geostatistical and mathematical transmission models of schistosomiasis within a single model formulation to analyse age-heterogeneous S. mansoni data from Côte d’Ivoire. A series of age-specific risk maps of S. mansoni infection in Côte d’Ivoire was produced at high geographical resolution, which allows us to identify the most important age groups of the population to treat at a given place. We predicted that the infection risk peaks at younger ages in high-risk areas and at older ages in low-risk areas. Moreover, infection risk declined more rapidly at older ages in high-risk areas than in moderate- and low-risk areas.
In summary, this PhD thesis contributes to the fields of spatial statistics and epidemiology of NTDs with (i) statistical methodology for modeling spatially structured disease data that have heterogeneous geographical support (i.e., georeferenced at point or area level) across the study region and that are collected over different age groups between locations, (ii) applications to soil-transmitted helminth infections, schistosomiasis, and clonorchiasis in sub-Saharan Africa, South Asia, and P.R. China, to obtain spatially explicit estimates of disease risk, number of infected people, and annual treatment needs for preventive chemotherapy at different administrative levels, and (iii) a large amount of georeferenced data on NTD surveys conducted at over 10,750 unique locations, which are available via the open-access Global Neglected Tropical Diseases Database (GNTD). The innovative statistical methodology for analysing historical survey data that are heterogeneous in space can be readily applied to other disease survey data. The up-to-date, model-based, high-resolution risk maps and estimates of treatment needs provide useful tools and information for guiding disease control and interventions.
Advisors: | Utzinger, Jürg and Vounatsou, Penelope and Stensgaard, Anna-Sofie |
---|---|
Faculties and Departments: | 09 Associated Institutions > Swiss Tropical and Public Health Institute (Swiss TPH) > Former Units within Swiss TPH > Health Impact Assessment (Utzinger) |
UniBasel Contributors: | Utzinger, Jürg and Vounatsou, Penelope |
Item Type: | Thesis |
Thesis Subtype: | Doctoral Thesis |
Thesis no: | 11749 |
Thesis status: | Complete |
Number of Pages: | 1 online resource (xvii, 184 pages) |
Language: | English |
Identification Number: | edoc DOI: |
Last Modified: | 02 Aug 2021 15:13 |
Deposited On: | 06 Sep 2016 08:27 |