A Pan-Amazonian dataset integrating 20 years of respiratory, cardiovascular, zoonotic and vector-borne disease cases and landscape changes

Study area

Geographic and temporal scope

The dataset encompasses the Amazon biome, which spans nine administrative territories: Brazil, Bolivia, Colombia, Peru, Ecuador, Venezuela, Guyana, Suriname, and French Guiana. Data were collected at the subnational level and cover 2,395 municipalities that fall fully or partially within the biome, giving the spatial coverage that the available health data allow. The dataset spans a 20-year period from 2000 to 2019, enabling analysis of the interplay between landscape changes and health dynamics in this biologically and culturally diverse region.

Data sources and collection

Health indicators

The dataset includes annual case counts of 21 diseases; Table 1 lists each disease along with its corresponding ICD-10 codes. These include respiratory and cardiovascular diseases as well as zoonotic/vector-borne diseases such as Chagas disease, hantavirus, cutaneous leishmaniasis, visceral leishmaniasis, rickettsial diseases, and malaria (Fig. 1). Compiling these records went beyond the open-source platforms of national health institutions, ministries, and other official bodies: it mostly involved significant ground-level efforts by local collaborators in each Amazonian country to access health data, local health reports, and surveys. To enable harmonization across countries with heterogeneous reporting systems, individual diseases were grouped into three broader categories: respiratory, cardiovascular, and zoonotic/vector-borne. These groupings maximize comparability across datasets with varying levels of diagnostic detail. For transparency and traceability, Table 2 lists the countries and the respective sources from which collaborators accessed health data.

Table 1 List of diseases included in the compiled dataset, with corresponding International Classification of Diseases codes (ICD-10) and grouped into disease types: respiratory, cardiovascular, vector-borne, and zoonotic.
Fig. 1
Temporal coverage of diseases available across Amazonian countries. Diseases are separated by panel and color, including respiratory diseases, cardiovascular diseases, Chagas disease, hantavirus, cutaneous leishmaniasis, visceral leishmaniasis, rickettsial diseases, and malaria. Each horizontal line shows a country’s data coverage, with dots marking the years in which cases were recorded for each disease.

Table 2 List of health data sources, institutions and platforms by countries in the Amazon biome.

To support incidence-based analyses, we included annual population estimates at the observation level (typically municipalities) alongside raw case counts. Population data were sourced from the WorldPop project (www.worldpop.org), which provides high-resolution estimates every five years. We linearly interpolated these values for each observation unit to cover the intervening years.
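The interpolation step can be illustrated with a minimal sketch. The function names, the example anchor values, and the per-100,000 scaling are assumptions made here for illustration; they are not taken from the repository scripts.

```python
def interpolate_population(anchors, years):
    """Linearly interpolate population between known anchor years.

    anchors: dict mapping year -> population estimate (e.g. five-yearly values).
    years:   iterable of years to fill; each must lie within the anchor range,
             otherwise a ValueError is raised.
    """
    known = sorted(anchors)
    filled = {}
    for year in years:
        if year in anchors:
            filled[year] = float(anchors[year])
            continue
        # Nearest anchor years on either side of the requested year.
        lower = max(k for k in known if k < year)
        upper = min(k for k in known if k > year)
        weight = (year - lower) / (upper - lower)
        filled[year] = anchors[lower] + weight * (anchors[upper] - anchors[lower])
    return filled


def incidence_per_100k(cases, population):
    """Crude annual incidence per 100,000 inhabitants."""
    return 100_000 * cases / population


# Hypothetical municipality with estimates for 2000 and 2005 only.
population = interpolate_population({2000: 1000, 2005: 2000}, range(2000, 2006))
rate_2002 = incidence_per_100k(7, population[2002])
```

Dividing case counts by the interpolated population in this way gives rates that are comparable across observation units of very different size.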

Environmental data

Habitat cover and forest fragmentation

Habitat cover and fragmentation metrics were extracted from MapBiomas Amazon Collection 4, which maps land cover at a 30-meter spatial resolution for 1985 to 2022. Metrics were extracted for the years coinciding with both health and pollution data (2000–2019). MapBiomas mapping includes 25 land cover classifications, among them forest, savanna, mangrove, flooded forest, grassland, rocky outcrop, pasture, agriculture, silviculture, palm oil, mining, other non-vegetated areas, and rivers, lakes, and oceans. A detailed methodology of the mapping is available in the MapBiomas documentation. From these land cover classifications, we retained the three main natural cover types in the region: forest, savanna, and other non-forest natural formations. They were measured at the observation unit level (typically the municipality, or the national level for Suriname), matching the resolution of the health data. Fragmentation metrics were computed from forest cover only, at the same observation unit level, and included: (1) forest edge density, the length of forest patch edges per unit area (ED = (E / A) × 10,000, where E is the total edge length in meters and A is the area in m²); (2) forest patch density, the number of forest patches per hectare (PD = (n / A) × 10,000, where n is the number of forest patches); and (3) the forest aggregation index, which describes the clumping of forest patches, with values near 100 indicating high spatial aggregation (AI = (g_ii / g_max) × 100). All landscape metrics were calculated using R, ArcGIS 10.8.1, and Fragstats 4.2.
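As a worked illustration of the three formulas, the sketch below evaluates them for a hypothetical 100-hectare observation unit. The function names and input values are assumptions for illustration; in the usual Fragstats definition, g_ii is the number of like adjacencies between forest pixels and g_max is the maximum possible number of such adjacencies. The study itself computed these metrics with R, ArcGIS, and Fragstats.

```python
def edge_density(total_edge_m, area_m2):
    """ED: meters of forest edge per hectare (10,000 m² = 1 ha)."""
    return total_edge_m / area_m2 * 10_000


def patch_density(n_patches, area_m2):
    """PD: number of forest patches per hectare."""
    return n_patches / area_m2 * 10_000


def aggregation_index(like_adjacencies, max_like_adjacencies):
    """AI: observed like adjacencies as a percentage of the maximum possible."""
    return like_adjacencies / max_like_adjacencies * 100


# Hypothetical unit: 1,000,000 m² (100 ha), 5,000 m of forest edge,
# 4 forest patches, 90 of 100 possible like adjacencies.
ed = edge_density(5_000, 1_000_000)   # 50 m of edge per hectare
pd_ = patch_density(4, 1_000_000)     # 0.04 patches per hectare
ai = aggregation_index(90, 100)       # 90 percent
```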

Forest fires and PM2.5 estimation

A yearly time series of total fire events for the period 2001–2019 was developed from the daily MODIS Terra Thermal Anomalies product (MOD14A1 V6.1) at 1 km spatial resolution. Daily thermal anomalies were masked to include only fires detected with high confidence (bits 0–3 = 9). The masked daily images were then summed to produce yearly images in which each pixel value is the number of days that pixel was burning during that year.
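The masking and summation can be sketched with NumPy on a small in-memory stack. The array layout (days, rows, columns) and the function name are assumptions for illustration; the actual processing was carried out on Google Earth Engine.

```python
import numpy as np

HIGH_CONFIDENCE_FIRE = 9  # value of bits 0-3 that marks a high-confidence fire


def yearly_fire_days(daily_fire_mask):
    """Count, per pixel, the days flagged as high-confidence fire.

    daily_fire_mask: unsigned integer array of shape (days, rows, cols)
    holding one year of daily fire-mask values.
    """
    fire_class = daily_fire_mask & 0b1111          # keep bits 0-3 only
    is_fire = fire_class == HIGH_CONFIDENCE_FIRE   # boolean mask per day
    return is_fire.sum(axis=0)                     # days burning per pixel


# Three hypothetical days over a 2 x 2 pixel tile.
daily = np.array(
    [
        [[9, 8], [0, 9]],
        [[9, 9], [5, 3]],
        [[25, 7], [9, 9]],  # 25 = 0b11001, whose low four bits equal 9
    ],
    dtype=np.uint8,
)
fire_days = yearly_fire_days(daily)
```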

To map the yearly spatial distribution of PM2.5, the Multi-angle Implementation of Atmospheric Correction (MAIAC) Land Aerosol Optical Depth product (AOD; MCD19A2 V6.1) was combined with the PM2.5 product from NASA’s Socioeconomic Data and Applications Center (SEDAC V4.03) [12] for 2001 to 2019, at a spatial resolution of 1 km. Total fire events in the Amazon had a better temporal relationship with MAIAC AOD than with SEDAC PM2.5. Although MAIAC AOD is a good proxy for PM2.5 ground concentrations [13], it needs to be calibrated into PM2.5 concentrations to be suitable for health impact analysis.

The AOD-calibrated PM2.5 dataset was calculated by extracting the daily MODIS MAIAC aerosol optical depth in the blue band (0.47 μm). Each daily image was masked to keep only pixels that were clear of clouds and flagged as best quality (bits 0 to 2 = 1, and bits 8 to 11 = 0). The mean yearly AOD was calculated at each pixel considering only those best-quality values. MAIAC AOD values were then calibrated into PM2.5 using NASA’s SEDAC PM2.5 as a reference through a pixel-level temporal Ordinary Least Squares (OLS) regression. For each 1 km pixel, a temporal linear regression was fitted with PM2.5 as the dependent variable and AOD as the independent variable. This yields, for each pixel, a slope and an intercept, which were then applied to the complete MAIAC AOD time series (2001–2019) to transform the AOD values into PM2.5. The final result is a 2001–2019 time series of calibrated AOD-PM2.5 (μg/m³). The calibrated AOD-PM2.5 product showed a better correlation with fires than AOD or SEDAC PM2.5 alone, and a high correlation with SEDAC PM2.5 for the years of overlap (Pearson r > 0.95). Finally, the average PM2.5 in each municipality was calculated.
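A pixel-level temporal OLS calibration of this kind can be sketched as follows. This is a minimal illustration that assumes PM2.5 is treated as the response, so that the fitted slope and intercept map AOD directly to PM2.5. It also assumes gap-free (years, rows, columns) arrays and a hypothetical function name; it is not the authors' Google Earth Engine implementation.

```python
import numpy as np


def calibrate_aod_to_pm25(aod, pm25_ref):
    """Fit PM2.5 = slope * AOD + intercept independently at every pixel.

    aod, pm25_ref: arrays of shape (years, rows, cols) with no missing values.
    Returns (slope, intercept, calibrated), where calibrated has the same
    shape as aod and is expressed in the units of pm25_ref.
    """
    aod = np.asarray(aod, dtype=float)
    pm25_ref = np.asarray(pm25_ref, dtype=float)

    aod_mean = aod.mean(axis=0)
    pm_mean = pm25_ref.mean(axis=0)
    aod_dev = aod - aod_mean

    # Closed-form OLS along the time axis: slope = cov(AOD, PM2.5) / var(AOD).
    cov = (aod_dev * (pm25_ref - pm_mean)).sum(axis=0)
    var = (aod_dev ** 2).sum(axis=0)
    # Pixels with zero AOD variance get slope 0 (prediction = mean PM2.5).
    slope = np.divide(cov, var, out=np.zeros_like(cov), where=var > 0)
    intercept = pm_mean - slope * aod_mean

    calibrated = slope * aod + intercept
    return slope, intercept, calibrated


# Synthetic check: three years, a 1 x 2 tile, and an exact linear relation.
aod_stack = np.array([[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.9]]])
pm_stack = 20.0 * aod_stack + 3.0
slope, intercept, calibrated = calibrate_aod_to_pm25(aod_stack, pm_stack)
```

Because each pixel gets its own coefficients, the calibration can absorb spatial differences in the AOD to PM2.5 relationship, at the cost of relying on a short time series per pixel.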

Since the pollution generated by forest fires in this region can be displaced by the wind over 500 kilometers [8], we calculated the sum of PM2.5 within this radius using a moving-window approach, and assumed that this captures the transboundary effect of pollutants that can affect human health. All spatial analyses were performed in Google Earth Engine and TerrSet, and outputs were projected in the South America Albers Equal Area Conic projection (ESRI:102033). The total area burned in the Amazon biome was calculated considering only the presence or absence of fire per pixel between 2001 and 2019. This way, even if a pixel caught fire every year during this period, it was counted only once when calculating the area burned. As each pixel is 1 km², the sum of all pixels that caught fire at least once gives the total area burned between 2001 and 2019. For the accumulated burned area, the frequency of forest fires per pixel was also considered.
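The two calculations described here, the circular moving-window sum and the burned-area totals, can be sketched with NumPy. The function names are hypothetical, and reading "accumulated burned area" as the sum of burned pixels over all years is an assumption. The explicit loop over window offsets is for clarity only and would be slow for a 500-pixel radius.

```python
import numpy as np


def circular_window_sum(grid, radius_px):
    """Sum grid values within radius_px pixels of each cell (zeros outside)."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    padded = np.pad(grid, radius_px, mode="constant", constant_values=0.0)
    total = np.zeros_like(grid)
    for dy in range(-radius_px, radius_px + 1):
        for dx in range(-radius_px, radius_px + 1):
            if dy * dy + dx * dx <= radius_px * radius_px:
                # Add the grid shifted by (dy, dx) for every in-radius offset.
                total += padded[
                    radius_px + dy : radius_px + dy + rows,
                    radius_px + dx : radius_px + dx + cols,
                ]
    return total


def burned_area_km2(yearly_fire_days, pixel_area_km2=1.0):
    """Area that burned at least once: each pixel is counted a single time."""
    ever_burned = (np.asarray(yearly_fire_days) > 0).any(axis=0)
    return ever_burned.sum() * pixel_area_km2


def accumulated_burned_area_km2(yearly_fire_days, pixel_area_km2=1.0):
    """Assumed reading: a pixel counts once for every year in which it burned."""
    years_burned = (np.asarray(yearly_fire_days) > 0).sum(axis=0)
    return years_burned.sum() * pixel_area_km2


# Hypothetical inputs: one PM2.5 source pixel, and two years of fire-day counts.
pm_grid = np.zeros((3, 3))
pm_grid[1, 1] = 1.0
window = circular_window_sum(pm_grid, 1)

fire_stack = np.array([[[2, 0], [0, 0]], [[1, 3], [0, 0]]])
area_once = burned_area_km2(fire_stack)
area_accumulated = accumulated_burned_area_km2(fire_stack)
```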

Data processing and integration

Data harmonization

We faced substantial heterogeneity in health surveillance systems across countries. Health data were available on open-source platforms in some countries (e.g. Colombia and Brazil) but only upon direct request to institutions in others (e.g. Suriname). Temporal ranges were complete and continuous in some countries (Fig. 1, e.g. Bolivia) but fragmented in others (e.g. Ecuador). Spatial granularity ranged from sub-municipal data in Venezuela and Ecuador (‘parroquia’, equivalent to a district) to national-level data in Suriname (Table 2). Disease classifications also varied (Fig. 2): some systems provided detailed breakdowns (e.g. individual cardiovascular diseases in Brazil, such as conduction disorders and cardiac arrhythmias, acute myocardial infarction, etc.), while others used broader categories (e.g. ‘cardiovascular diseases’ in Bolivia). Where possible, data were standardized to the municipal level; Suriname was retained at the national level due to data availability.

Fig. 2
Distribution of diseases by country compiled in the dataset. The bar plot (left) shows the total number of respiratory (red), cardiovascular (yellow), and zoonotic and vector-borne (green) diseases monitored by each country. The pie charts (bottom) display the relative proportions of six zoonotic diseases (Chagas disease, hantavirus, cutaneous leishmaniasis, visceral leishmaniasis, rickettsial diseases, and malaria) within each country. Note that some rare diseases may not be visually distinguishable in the pie charts.

To ensure comparability and standardize the diverse health data sources, the reported case counts of some individual diseases were grouped according to the physiological system affected, specifically the respiratory and the cardiac/vascular systems. This resulted in two aggregated categories, respiratory and cardiovascular diseases. Zoonotic and vector-borne diseases, by contrast, were kept as individually reported diseases to preserve their epidemiological specificity and relevance.
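The grouping rule can be sketched as a small aggregation routine. The record layout, the function names, and the use of ICD-10 chapter letters (I for circulatory, J for respiratory) as the grouping key are assumptions for illustration; the codes actually used are those listed in Table 1 and in the repository scripts.

```python
from collections import defaultdict

# Diseases kept as individual series (names taken from the text above).
INDIVIDUAL_DISEASES = {
    "chagas disease",
    "hantavirus",
    "cutaneous leishmaniasis",
    "visceral leishmaniasis",
    "rickettsial diseases",
    "malaria",
}

# Assumed grouping key: first letter of the ICD-10 code.
ICD10_CHAPTER_GROUPS = {"I": "cardiovascular", "J": "respiratory"}


def harmonized_label(disease, icd10_code):
    """Return the disease itself if zoonotic/vector-borne, else its broad group."""
    name = disease.strip().lower()
    if name in INDIVIDUAL_DISEASES:
        return name
    group = ICD10_CHAPTER_GROUPS.get(icd10_code.strip().upper()[:1])
    if group is None:
        raise ValueError(f"No harmonized group for {disease!r} ({icd10_code!r})")
    return group


def harmonize(records):
    """Sum cases per (unit, year, harmonized label).

    records: iterable of (unit, year, disease, icd10_code, cases) tuples.
    """
    totals = defaultdict(int)
    for unit, year, disease, icd10_code, cases in records:
        totals[(unit, year, harmonized_label(disease, icd10_code))] += cases
    return dict(totals)


# Hypothetical records for one municipality-year.
example = harmonize(
    [
        ("A", 2010, "Acute myocardial infarction", "I21", 5),
        ("A", 2010, "Cardiac arrhythmias", "I49", 3),
        ("A", 2010, "Malaria", "B54", 7),
        ("A", 2010, "Pneumonia", "J18", 4),
    ]
)
```

Collapsing detailed records to the coarsest shared category is what allows a country reporting only ‘cardiovascular diseases’ to be compared with one reporting individual diagnoses.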

All data processing steps, grouping decisions, and harmonization routines are documented in the repository scripts and metadata files. We also integrated the health indicators with the environmental data described above. Geospatial data in our repository use the projected coordinate reference system South America Albers Equal Area Conic (ESRI:102033).
