Data
Compiling the database
We compiled our database by identifying all sources providing data on wild meat consumption in seven Central African countries between 2000 and 2022: Cameroon, Central African Republic, Democratic Republic of the Congo, Equatorial Guinea, Gabon and Republic of the Congo. We also included data from the Cross River Forest landscape in southeast Nigeria, as forests there are contiguous with protected areas in Cameroon. We considered peer-reviewed articles, technical reports, PhD and master’s dissertations, online data repositories and unpublished data, adopting a snowball sampling approach70 to search reference lists and online libraries. We used wildmeat, wild meat, bushmeat, bush meat and viande de brousse as main keywords, and consumption, nutrition and food as secondary keywords. We defined a study as a set of data collected using a single methodology in a specific study area over a determined timeframe. In this way, each data source could provide more than one study. For example, large projects that monitored multiple regions in different countries were split so that each study area represented a single study. For consistency, we restricted our research to studies investigating wild meat consumption at the household level, discarding those monitoring consumption of individual consumers who could not be aggregated to households, for example by enquiring people randomly met in the streets, a methodology mostly used in cities, where household surveys are difficult to implement. This is also the reason for the limited number of cities included in our database. Most recent studies used KoboToolBox (https://www.kobotoolbox.org/) and different versions of the KoboCollect App (v.2.020.40 and subsequent releases). Older studies recorded data with pen and paper. When possible, we downloaded the raw data from online resources (such as publicly available databases). Alternatively, we contacted the authors to request the raw data.
Data preparation
Individual studies underwent a preliminary phase of data cleaning and standardization to conform with the format required by the database https://www.wildmeat.org/. The resulting database included studies providing three different datatypes: (1) consumption/non-consumption; (2) frequency of consumption; (3) quantity consumed, each requiring specific formatting of the raw data.
Consumption/non-consumption data were provided by all studies and were therefore available for all recall events in our database (that is, at the recall level). If a household declared to have consumed wild meat during the specific recall, we recorded a ‘consumption event’ and coded the recall as 1. By contrast, if no wild meat consumption was reported during the recall period, we coded the recall as 0. Here, we considered a minimum recall period of 24 h (see the ‘Statistical analyses’ section for a description of how the number of monitored days was accounted for in the analyses). Therefore, if wild meat consumption was available for multiple meals within 24 h, we aggregated the information available for single days. In other terms, if wild meat was consumed twice (for example, in the morning and afternoon) within the same 24 h, we coded the 24 h recall as 1.
Frequency of consumption—defined as the number of consumption events recorded over the number of days a household’s consumption was monitored—was provided by 24 out of the 30 studies included in our analysis, representing 11,582 households and 107,896 recall events. For studies that recorded the frequency of consumption in categories such as daily, weekly and monthly, we calculated the frequency on a scale from 0 to 1. For example, if a household reported consuming wild meat monthly, it was assigned a frequency of consumption of 0.033 (12/365 days). However, for studies that recorded several recall events from the same household, we calculated the frequency of consumption by dividing the number of consumption events, by the total number of recalled days. For example, if a household was interviewed for 6 days about wild meat consumption over the previous 24 h and consumption occurred on two occasions, we calculated frequency as 2/6 days = 0.33. In this way, the frequency of consumption referred to individual households, rather than to single recall events. Each household had to be monitored for a minimum of 2 days to be considered (see the ‘Statistical analyses’ section for a description of how the number of monitored days was accounted for in the analyses). In other words, studies that recorded wild meat consumption over 24 h, in a single occasion for each household, were considered as consumption/non-consumption data.
Finally, 19 studies provided the quantity consumed (in g, kg, or local units such as leg, piece or entire animal) by the households over a recall period, as well as information on the wild animal species consumed. These studies included 9,189 households and 105,503 recall events. Data were available at the recall level, and we standardized the data as the quantity (in kg) consumed per household per day. Therefore, if the recorded quantities represented the cumulative consumption over a recall period of >24 h, we divided the reported quantities by the duration of the recall (in days). So, if a household reported having consumed 12 kg of undressed meat over a 72 h recall (that is, 3 days), we considered the quantity consumed by the household in a day to be 4 kg. Conversely, if quantities consumed were recorded for multiple meals within 24 h, we summed the quantity of wild meat reported for a single day. Thus, if a household reported to have consumed 0.5 kg of wild meat in the morning and 1 kg in the evening, the quantity of wild meat consumed by the household in that day would be 1.5 kg. Finally, when consumed quantities were reported in local units of measure (Extended Data Table 1), we estimated consumed kilograms following procedures specific for each unit. If the consumed units were reported in local units (such as entire, half, quarter or gigot), we assigned the species-specific average mass value using data available from the literature71 or empirical data obtained in Gabon by the authors of this study (L.C., Dibouka, 2001–2010; D.F. and D.C., Lastourville area, 2021). For all other units, including piece, pile and plate, we used estimated conversion factors, based on empirical observation collected by various authors of this study (L.C., K.A., F.S., D.D.). Because in Central Africa wild meat is generally sold and cooked along with bones and sometimes skin, we considered the quantities in our database as the quantity of undressed meat (in kg) consumed per household per day.
Ethics statement
The procedure used to compile the database was approved by the ethics review committee of CIFOR/ICRAF (SLF6430000-UFW044-AI2; 13/12/2021) and included the anonymization of all sensitive data. All included studies obtained (1) an ethical review of data collection protocols, (2) the agreement of the local communities (focal groups/authorization of the communities’ representatives) before data collection; (3) prior informed consent from all respondents (Extended Data Table 1).
Statistical analyses
Our model jointly analyses the datatypes described above to estimate consumption rates in the region and investigate drivers of consumption. It is therefore composed of three submodels. The first estimates wild meat consumption probability using consumption/non consumption data as a function of relevant predictors and accounting for spatial autocorrelation between sites. The second submodel models wild meat frequency of consumption as a function of predictors, and, as above, accounts for spatial autocorrelation. The third submodel investigates predictors of daily quantity of wild meat consumed individually, using the AME transformation31 (see below for details).
Estimating wild meat consumption rates
We defined the wild meat consumption rate as the daily quantity of wild meat consumed per AME, using the following formula:
$$\mathrm{Consumption}\,\mathrm{rate}=\mathrm{consumption}\times \mathrm{frequency}\times \mathrm{quantity}$$
(1)
Where consumption represents a binary output, 0 or 1, of wild meat consumption, where 0 = non consumption and 1 = consumption; frequency represents how often is wild meat consumed on a scale from 0 (never) to 1 (every day); and quantity is the quantity (in kg) of wild meat consumed per AME in 24 h.
Consumption probability
The first level of our model estimated the consumption probability from binary data of consumption/non-consumption. Although binary data are the least informative towards the estimation of wild meat consumption, this submodel was important to include more studies into our analysis and increase the study coverage. Moreover, all datatypes could be scaled down to binary consumption/non-consumption data, and we were therefore able to obtain data for all recall events. Finally, modelling consumption probability enabled us to deal with the large proportion of non-consumption records (that is, 84%). We modelled the probability of consumption π for each recall event r as
$${\mathrm{consumption}}_{{r}} \sim \mathrm{Bernoulli}({\pi }_{{r}})$$
(2)
Where consumption is a vector of consumption/non-consumption data of length R, with R equivalent to the number of recall events in our database.
We defined π as a function of explanatory variables with logit-link, as
$$\mathrm{logit}({\pi }_{r})={\alpha }_{0}+{\alpha }_{1}\times {V1}_{r}+\ldots +\alpha k\times {{Vk}}_{r}+{\alpha }_{k+1}\times {\mathrm{days}}_{r}$$
(3)
Where α0 is the intercept; α1 to αk are parameters specific to each variable V (n = k) included in the model, slopes for continuous variables or factors, for categorical variables.
Here we accounted for the fact that longer recalls would increase the probability of recording at least on consumption event, by including the parameter αk + 1, defining the increment in π as a function of the recall duration in days.
Frequency of consumption
The second level of the model estimated the frequency of wild meat consumption, that is, how often a household reportedly consumed wild meat. If frequency data were not available, we assigned a missing code (that is, −1). As events with no consumption of wild meat were modelled in the previous submodel, and we modelled frequency conditional on being >0 (that is, a frequency of 0 was not considered) as
$${{\rm{f}}{\rm{r}}{\rm{e}}{\rm{q}}{\rm{u}}{\rm{e}}{\rm{n}}{\rm{c}}{\rm{y}}}_{h} \sim {\rm{B}}{\rm{e}}{\rm{t}}{\rm{a}}({\varphi }_{h}\times \kappa ,(1-{\varphi }_{h})\times \kappa )$$
(4)
Where frequency is a vector of length H, equivalent to the number of households included in our study; φ is the mean frequency of consumption for household h and κ is the sample size of the beta distribution.
Here, we wanted to account for the possibility that the precision of the observed frequency is conditional on the number of monitored days. For example, if a household was interviewed on a single day and asked how often it consumed wild meat over a year, we expected the uncertainty to be higher than cases in which a household was visited every day and asked about what they ate in the previous 24 h. We assumed that the latter case as the ideal scenario (n = 365 monitored days), where we could be certain that the recorded frequency was correct, with minimal associated error. Conversely, we assumed the first (n = 1 monitored days) to be the case with the highest uncertainty, as if a household was only visited once and asked what they ate in the previous 24 h (n = 1 monitored days). We therefore modelled φ for each household h as
$$\mathrm{logit}({\varphi }_{h}) \sim \mathrm{Normal}({\underline{\varphi }}_{h},{\Sigma }_{h})$$
(5)
where \(\underline{\varphi }\) is the mean of the frequency of consumption on the logit scale and Σ its s.d., which we modelled conditional on the number of monitored days as
$${\Sigma }_{h}=\sigma \times (365-{\mathrm{mdays}}_{h})$$
(6)
Here, mdays is the sum of monitored days for household h; and σ is the reduction rate in Σ for each monitored day, that is, σ scales to 0 if 365 days were monitored and is maximum if only 1 day was monitored.
Finally, we defined the mean frequency of consumption \(\underline{\varphi }\) for household h as a function of explanatory variables with logit-link, as
$$\mathrm{logit}({\underline{\varphi }}_{h})={\beta }_{0}+{\beta }_{1}\times V{1}_{h}+\cdots +\beta k\times V{k}_{h}$$
(7)
Where β0 is the intercept, β1 to βk are parameters specific to each variable V (n = k) included in the model (slopes for continuous variables; factors for categorical variables).
To improve sampling efficacy of our model, we used the equivalent non-centred parameterization of equation (5), defined as follows72:
$${\varphi }_{h}={\mathop{\varphi }\limits_{\_}}_{h}+{\Sigma }_{h}\times {\tau }_{h}$$
(8)
Quantity consumed
Finally, the third level of our model estimated the daily quantity in kg consumed per AME. Here we used data provided by studies that recorded the weight (in g, kg or local units of measure) consumed in a household over a certain recall period. If the quantity consumed was not available, we assigned a missing code (that is, −1). As the probability of consuming wild meat on a certain day and the frequency of wild meat consumption were analysed in previous submodels, we modelled the daily quantity consumed per AME in recall event r conditional on it being >0 (that is, kg of consumed wild meat were recorded and > 0), as
$${{\rm{Q}}{\rm{u}}{\rm{a}}{\rm{n}}{\rm{t}}{\rm{i}}{\rm{t}}{\rm{y}}}_{r}/{{\rm{A}}{\rm{M}}{\rm{E}}}_{r} \sim {\rm{G}}{\rm{a}}{\rm{m}}{\rm{m}}{\rm{a}}({\mu }_{r}\times \theta ,\theta )$$
(9)
where quantity is a vector of length equivalent to the number of recall events R considered in our study providing the quantity (in kg) of wild meat consumed per household h in recall r; AME is a vector of length R, storing the number of AME registered for each recall event r; μ is the mean quantity (in kg) of wild meat consumed per AME in recall r; and θ is the scale parameter of the gamma distribution.
Finally, we defined μ as a function of explanatory variables with log-link, as
$$\log ({\mu }_{r})={\gamma }_{0}+{\gamma }_{1}\times {\mathrm{AME}}_{r}+{\gamma }_{2}\times {V1}_{r}+\ldots +{\gamma }_{k}\times {{Vk}}_{r}$$
(10)
Where γ0 is the intercept, γ1 is covariate-specific slopes for the number of AME participating in a recall event; γ2 to γk are parameters specific to each variable V (n = k) included in the model (slopes for continuous variables, or factors for categorical variables).
Spatial autocorrelation
We expected geographically close locations to be more likely to share similar patterns of wild meat consumption. Our model therefore also included a spatial autocorrelation component, allowing for the similarities between two sites to decrease as the distance grows, using the quadratic kernel function. Specifically, we implemented a latent Gaussian process regression, exploiting the Euclidean distance between locations to estimate the covariance of each pair at different distances from each other72. In practice, we first built a distance matrix with dimension equal to the number of locations l in our model, Dl,j and then implemented the quadratic kernel function to build the covariance matrix Xl,j
$${X}_{l,j}={\zeta }^{2}\exp (-{\rho }^{2}{D}_{l,j}^{2})+\delta $$
(11)
Where ζ is the marginal s.d., representing the maximum covariance between sites, ρ is the rate of decrease in covariance (that is, length scale) and δ is a small positive scalar (that is, 10−9), added to the diagonal of X to ensure that it remains positive.
The resulting covariance matrix was then converted to a Cholesky factor LXl,j (that is, the product of the lower triangular matrix and its conjugate transpose) for more efficient numerical solution. Finally, LX was multiplied by η, a vector of length equal to the number of locations, used to generate a multivariate normal vector ε, corresponding to the latent Gaussian process72,73.
AME imputation
The number of AME per household was unavailable for 55.9% of the recall events. Having included AME in the linear model for μ (equation (10)), we were able to use Bayesian imputation to estimate missing values of AME, a method that is independent of the percentage of missing values in the dataset74. To do that, we assigned a distribution to the missing values, such as
$$\mathrm{missing}\,{\mathrm{AME}}_{m} \sim \mathrm{Normal}(\nu ,\psi )$$
(12)
Where missing AMEm is a vector of length equal to the number of missing values M; ν is the mean number of AME present in a recall event; and ψ its s.d. In this way we obtained the vector AME merged, of length equivalent to the number of recall events R and composed of both observed and estimated (that is, imputed) values of AME.
Priors
We set weakly informative priors to all our parameters (Supplementary Table 1), providing the model with enough information to avoid exploring impossible values75. In the case of the imputation of missing AME, we centred the mean ν in equation (12) to the mean number of AME, \(\underline{AME}\), calculated from available data, that is, 5.09.
Simulation study
Before running the model on real data, we evaluated its accuracy in retrieving the parameters of interest in a simulation study in which we investigated three different scenarios of coverage of the study area: 5%, 10% and 15%, similar to the coverage of our data (that is, 7.5% of our prediction grid; Supplementary Fig. 1). For that, we used a simple version of the model described above, including 2 continuous variables and 1 categorical variable.
We created a study area composed of 900 cells, and divided it into three regions, with different characteristics. For each cell, we simulated the mean value of two continuous variables V1 and V2. The first region (number of cells = 360) was simulated having high V1 and low V2. The second (180 cells), as having high V2 and low V1. Finally, the third (360 cells) was simulated with intermediate values of V1 and V2. For simplicity, we allocated one location in each cell and considered it as a cluster of villages. We then calculated a distance matrix of the simulated location and created a varying number of households according to the cells’ features. If the site fell within region one, it was given a low number of households (mean = 40). Region two had the highest mean number of households (mean = 100) and region three an intermediate number (mean = 65). We then simulated the number of AME for each household, using a mean of 5 AME per household, and calculated the total number of AME within the study area. We also assigned a categorical variable V3 (2 levels) to each household, with the first level being less frequent (30%) than the second (70%).
For each household, we simulated (1) consumption probability π (equations (2) and (3)); (2) the frequency of consumption φ (equations (7) and (8)) and (3) the mean quantity consumed μ (equations (9) and (11)), defining their mean values using the simulated variables. Specifically, in (1) the mean consumption probability was simulated as a function of V1, V3 and spatial autocorrelation ε, making use of the distance matrix between sites described above. In (2) the mean frequency of consumption was a function of V2, and V3. In (3) the mean quantity consumed was a function of V2 and the number of AME in the household. In all of the models, we set intercept and slopes (for continuous variables) varying by region (Supplementary Table 2).
By averaging the values of all simulated households, we calculated the ‘real’ average (1) consumption probability, (2) frequency of consumption and (3) quantity consumed per AME in the region, calculated consumption rates applying equation (1) and obtained the number of tonnes consumed in the study area by summing up the product of the consumption rates and the number of AME simulated in each cell (equation (29)).
We then randomly selected a number of locations, conditional on the coverage scenarios described above. All those selected were assumed to provide information on wild meat consumption/non-consumption. However, we also assumed that only 80% of those provided information on frequency of consumption and only 50% gave information on quantities of wild meat consumed. We also selected a proportion of households within each surveyed cell as well as a subset of household (20%) for which we assumed that the number of AME was unknown. Finally, we simulated (1) the number of monitored days for each selected household, (2) the uncertainty around the real frequency value conditional on the number of monitored days (that is, longer monitoring = lower uncertainty) and (3) the observed frequency values by applying the obtained uncertainty to the real simulated value of each household (equation (5)).
For each scenario, we generated 100 databases and run 1 chain of 2,000 iterations (warmup = 1,000) for each of them (n = 100) in R (v.4.2.0)76 using Rstan (v.2.26.11)77. We verified the accuracy of our model by comparing the posterior distribution of the parameters estimated in each scenario (from the 100 samples aggregated) with the true simulated values. The results of the simulation are provided in the Supplementary Results.
Correlates of wild meat consumption
To investigate the factors driving wild meat consumption in Central Africa, we evaluated a set of variables available at different levels (Supplementary Table 3): (1) the study level included information specific to the year and design of the studies included in the analysis; (2) the site level provided data relative to the sites where the studies were conducted, including geographical layers available for the entire region; (3) the household level provided information specific to characteristics of each household; and (4) the recall-level data specific to each recall event. Below, we describe continuous and categorical variables, state our hypothesis with respect to the effect on wild meat consumption rates and describe the process to format the data as used in the analysis. However, as random factors were simply identifiers (from 1 to n) of specific studies, sites, households and recalls, they did not require any data processing and are not mentioned below.
Study type
Wild meat consumption studies are generally conducted using recall interviews, where respondents are asked whether (consumption/non-consumption) or how often (that is, frequency of consumption) they consumed wild meat, and how much of it they consumed (quantity consumed), over a certain period of time, called the recall period. The studies included in our analysis used different recall periods, from 24 h to an entire year. A different approach was represented by ‘cooking pot’ studies, in which respondents were not asked what they consumed, but rather what they cooked. As in Central Africa, it is common to share what is cooked with other households78, we expected cooking-pot studies to overestimate the quantity consumed per capita in the interviewed household, as part of the cooked meat might have been consumed elsewhere.
Hypothesis
Cooking-pot studies tend to overestimate quantities consumed, but not the frequency of consumption or consumption/non-consumption.
Data processing
Studies that recorded quantities of consumed wild meat and used a recall period of 24, 48 or 72 h, were given a dummy study type (ST) value of 1; longer recall periods (1 week, 1 month or 1 year) were used only by studies focussing on the frequency or consumption/non-consumption and were assigned a ST value of 2; cooking-pot studies were given a ST value of 3 (Extended Data Table 1).
Location type
To fulfil nutritional requirements, Central African rural populations often have no/few alternatives to the consumption of wild meat. However, urban populations often do, particularly those living in metropolitan areas and capital cities, where affordable alternatives are available79. Here, wild meat consumption is less a matter of survival, and more of culture. In Central African cities, wild meat is perceived to be healthier than imported poultry and pork, and it represents both a way to maintain a connection with one’s place of origin (usually a rural area, where wild meat is the main source of protein) and a status symbol, as wild meat is generally more expensive than domestic alternatives27. In the region, another settlement type is represented by towns between 10,000 and 100,000 inhabitants, where alternatives are available but are generally more expensive than wild meat.
Hypothesis
Hypothesis: wild meat consumption probability and frequency of consumption are highest in the villages, and lowest in urban areas. We also expected the quantity consumed by AME to be higher in the cities (where wild meat is sometimes luxury product).
Data processing
None. The data collected in each site were given a dummy code (from 1 to 3; Supplementary Table 3), according to the settlement type.
Distance between locations
Wild meat consumption rates are known to vary with respect to many factors, including price, availability of wildlife and alternative sources of protein15 and seasonality78. Consumption rates are also likely to be driven by fine-scale characteristics at the site and household level, which are mostly expected to be cultural. Some of these factors have been measured in the past and were included in this study, but we suspected others unobserved factors could drive consumption rates in the region. For example, cultural features are shared by more neighbouring villages and change gradually as a function of distance. In other terms, locations closer to each other are more likely to share similar cultural and environmental features than those further apart.
Hypothesis
A substantial part of the variation in consumption rates can be explained by unmeasured characteristics shared between geographically related locations.
Data processing
We calculated the distance between each pair of locations included in our analysis by georeferencing the site and then using the Distance matrix algorithm in QGis (v.3.22.1)69, resulting in a square matrix D with dimension equal to the number of locations (n = 252), and zeroes on the diagonal.
Human population density
Human population density in Central Africa is growing at a 3% annual rate, increasing wild meat demand and, consequently, wildlife extraction rates. As human population density increases, wildlife becomes scarce, wild meat prices increases and consumption rates decrease32. However, even limited consumption rates from a large human population can have a major effect on the total amount of biomass consumed. An increasing number of people in the region are moving from rural to urban areas, and the urban population of Central Africa doubled between 2000 and 2020 (https://data.worldbank.org/indicator/SP.URB.TOTL).
Hypothesis
High human population density (HPD) values would result in lower consumption probability and frequency of consumption.
Data processing
We calculated HPD for each site included in our analysis using dynamic human population layer80. To provide a value representative of the surroundings of the site and not relative to a single point in space, we used QGis (v.3.22.1)69 to create a 40 km buffer around the georeferenced location of each site. This area represents the furthest distance that communities in Central Africa are willing to cover to procure wild meat81. We then averaged the values of HPD for each available year (that is, 2000, 2005, 2010, 2015, 2020) within each 40 km buffer and assigned it to each site according to the time the site was surveyed. In other terms, if a site was surveyed in 2002, it was given the value of HPD calculated for the year 2000; sites surveyed in 2013 were given the value calculated for the year 2015. Consequently, each household, and each recall related to a particular site, obtained the same value of HPD.
Remoteness
In Central Africa, remote areas are those where alternatives to wild meat are rarest, and even if available, cannot be afforded by most inhabitants46. Moreover, many communities do not have a history of livestock rearing38, and locally reared livestock is used as a security commodity in time of economic or nutritional need44.
Hypothesis
High remoteness (REM) values would result in a higher consumption probability and frequency of consumption, with either an opposite or a non-detectable effect on the daily quantity of wild meat consumed.
Data processing
We calculated REM for each site included in our analysis in the same way we did for HPD (see above) using a remoteness layer82 available for 2015 only (Supplementary Discussion). Therefore, REM values were assigned to each site independently of survey time.
Human development index
The human development index (HDI) is an indicator of human development, calculated as the geometric mean of the normalized indices of (1) life expectancy at birth, (2) average years (for adults >25 years) and expected years of schooling for children; (3) gross national income per capita83.
Hypothesis
High HDI values would result in lower consumption probability and frequency of consumption, with either an opposite or a non-detectable effect on the daily quantity of wild meat consumed.
Data processing
We allocated an HDI value to each site according to site-specific administrative level 1 and year of survey. Consequently, each site, household and recall related to a particular administrative level, obtained the same value of HDI.
Forest condition index
In Central Africa, a large proportion of consumed wild meat is sourced in the rainforest20. Intact habitats are essential to the persistence of abundant, healthy and diverse wildlife communities. Conversely, in human modified habitats, where forest is degraded, wildlife populations might be depleted and unable to provide a substantial amount of wild meat64. Consequently, wild meat consumption is most relevant in regions where the forest is healthy, wildlife is abundant and hunting is profitable, making wild meat a cheap source of food15.
Hypothesis
High FCI values would result in higher consumption probability and frequency of consumption, and in either a similar or non-detectable effect on the daily quantity of wild meat consumed.
Data processing
We calculated FCI for each site included in our analysis in the same way we did for HPD (see above). However, here the forest condition index layer84 was available for the year 2019 only (Supplementary Discussion). FCI values were therefore assigned to each site independently of survey time.
Education level
The education level attained in a household can be considered as a proxy of its wealth85. Higher education increases opportunities to find paid employment, which in turn gives access to more expensive sources of food58. However, poor job markets in rural areas limit the earning advantages of education. As such, we expected potentially opposing effects of education as a proxy for wealth in rural and urban areas. Where wild meat is cheaper than alternative sources of protein, education levels might have little effect on consumption rates15. However, in cities in which wild meat is expensive, education might be linked to higher consumption, as education is more likely to result in higher wealth in more vibrant job markets, and wealth is more likely to be used to purchase more expensive wild meat27. Accordingly, we investigated (1) the fixed effect of education level (ED) (not considering differences between location type) and (2) the interaction between ED and location type (LT) (that is, village, town, city).
Hypothesis
Households with higher education show lower consumption rates in rural areas (that is, villages), but higher rates in towns and cities. Education level has no effect on consumption rate when households from different settlement types are aggregated.
Data processing
Rural households where the reported highest education was primary (or no education) were given a dummy ED value of 1; town households where the reported highest education was primary (or no education) were given a dummy ED value of 2; city households where the reported highest education was primary (or no education) were given a dummy ED value of 3. In the fixed-effect model, these categories were aggregated by assigning an ED value of 1. Rural households that reported a secondary (or higher) education level were given an ED value of 4; town households that reported a secondary (or higher) education level were given an ED value of 5; city households that reported a secondary (or higher) education level were given an ED value of 6. In the fixed-effect model, these categories were aggregated by assigning an ED value of 2. Finally, households for which the education level was unknown were assigned ED value of 7 (in the interaction model) or 3 (in the fixed-effect model) regardless of the location type.
Household size
Finally, the number of people participating in a meal might affect the quantity of wild meat consumed. Assuming a household only has a certain budget to spend, or that the hunters in the household could only provide a certain amount of meat per day, the more people participate in the meal the smaller the quantity consumed per AME.
Hypothesis
Higher AME numbers present during a recall event result in lower quantities consumed per capita, and vice versa.
Data processing
If the number of AME was not calculated and the sex and age of each person present in the recall were available, we used the following formula86:
$$\mathrm{AME}=\mathrm{AM}\times 1+\mathrm{AF}\times 0.86+C(10\mbox{–}15\,\mathrm{years})\times 0.96+C(6\mbox{–}10\,\mathrm{years})\times 0.85+C(0\mbox{–}5\,\mathrm{years})\times 0.52$$
(13)
Where AM = adult male individual (>16 years old); AF = adult female individual (>16 years old); and C = Child. If a child’s age was not specified, we multiplied by 0.78, that is, the average between the three child multipliers. Similarly, if adult sex was not specified, we multiplied the number of adults by 0.93, that is, the average between the adult male and adult female multipliers. If the number of AME was available for the household but not for each recall event (in case of multiple recalls of the same household), we allocated the same AME value to each recall event related to a household. Finally, if no information regarding the age structure of the household was present, we coded with a missing code and imputed the value within the model (equation (12)).
Model selection process
There is substantial debate on the best process to be used when deciding the explanatory variable to include in a model to avoid (1) spurious correlations and (2) overfitting, while at the same time achieving sufficient predictive power. Here, to reduce the probability of spurious correlations, we made use of our knowledge to select variables that we considered as potential drivers of wild meat consumption probability, frequency of consumption and quantity consumed, based on a priori hypotheses (Extended Data Table 2). Accordingly, we defined three full submodels based on the hypotheses and research questions described above.
To account for study-specific features in terms of the methodology and cultural and contextual characteristics of the study area, we used an intercept varying by study ID. According to our hypotheses, we considered all continuous variables important and included two categorical variables (1) education level ED, to evaluate whether higher education resulted in lower consumption rates, and (2) location type LT, to test our hypothesis of higher consumption rates in rural areas. In the model estimating quantity consumed, we included the number of AME present during the recall period, to test our hypothesis of higher AME resulting in lower quantities consumed per capita. To account for multiple recall events recorded for the same household in the models estimating the probability of consumption and quantity consumed, we also included household H as a random factor. When evaluating consumption probability, we also included the duration (in days) of the recall period days, on the assumption that longer recall periods have a higher probability of recording a consumption event.
To test the submodels for overfitting, we evaluated collinearity in the continuous variables included in each submodel by examining the pairs plot of the residuals87. In case of issues, we (1) included the spatial autocorrelation component, (2) checked whether collinearity issues remained by visually inspecting the pairs plot of the residuals, and (3) assessed whether the spatial component improved the model’s predictive power by comparing the expected log predictive density (ELPD) using the R package loo (v.2.5.1)88. We considered a significant increase in ELPD as an indication of the importance of the autocorrelation term. We considered two models to be equivalent if (1) the ELPD difference was ≤4; (2) the standard error of the difference was ≥ the difference in ELPD89. In case of persisting issues, we (4) evaluated the importance of each variable included by removing one at a time to investigate the submodels predictive power using loo. Here, a drop in ELPD with respect to the full model was considered an indication of the importance of the removed variable, that is, the larger the drop, the more important the variable that was removed. Conversely, a non-significant drop indicated a limited importance of the removed variable in explaining the data and suggested that the reduced and full model’s predictive power was similar.
We run each submodel (2 chains, 2,000 iterations, 1,000 warmup) using a subset of our database including 5 studies, spanning 3 countries and 2 time periods, and representing 10 sites, 401 households and 6,628 recalls. The results of the variables selection process are provided in the Supplementary Results.
Past and present consumption rates
The final step in our study involved the prediction of consumption rates and the estimation of the amount of wild meat consumed per year in the entire region. To do so, we projected a grid of J cells over Central Africa, with J = 874 (Supplementary Fig. 1). Each cell j had size of 5,027 km2 (70.09 × 70.09 km), equal to the area of the circle (radius = 40 km) used to calculate the value of continuous variables for each site included in our analyses (see the ‘Correlates of wild meat consumption’ section).
As our data were mostly representative of the Central African forest region (Fig. 1 and Supplementary Discussion), we also restricted our predictions to an area that encompassed all the locations included in our analyses but excluded the Sahel regions of Cameroon and Central African Republic and southern Democratic Republic of the Congo (Supplementary Fig. 1), uncovered by the studies included in our database. To do so, we selected only cells intersecting a buffer around patches of continuous forest68 (>5,000 km2). To include areas of forest–savannah transition, we set a buffer radius equal to twice the side of the cells (that is, 140.18 km).
We defined 3 scenarios, predicting (1) past (2000–2010); (2) recent (2011–2021); and (3) present (2022) wild meat consumption in Central Africa. Within each cell, we calculated scenario specific values of (1) human population density, (2) remoteness, (3) human development index, (4) forest condition index, (5) education level, (6) location type and (7) number of AMEs, obtaining 7 vectors of length J, equal to the number of cells (see below for details).
To discriminate the parameters described above (observed and estimated) from those used for prediction, we annotated all predicted objects with the accent ~.
Calculating prediction variables
Prediction variables for each cell j, and scenario z were calculated as follow.
Human population density
To calculate cell and scenario specific \(\widetilde{\mathrm{HPD}}\) values, we used the human population density raster layer clipped over our prediction grid (Supplementary Fig. 2). We averaged HPD values from year 2000, 2005 and 2010 within cell j (past scenario), values from year 2015 and 2020 (recent scenario), and values from 2020 (that is, the most recent year).
Remoteness
Remoteness data were available for 2015 only. We therefore averaged values of the 2015 remoteness raster layer82 (Supplementary Fig. 3) within each cell j and use the obtained mean for all scenarios.
Human development index
We used subnational human development index83 values to calculate cell and scenario specific \(\widetilde{\mathrm{HDI}}\). We averaged HDI values (years: 2000 to 2010, past; 2011 to 2019, recent; 2019, present scenario) within each administrative level available in the region (Supplementary Fig. 4). If a cell j was completely within the boundaries of an administrative level, it was assigned an averaged \(\widetilde{\mathrm{HDI}}\) value calculated as described above. However, if a cell overlapped >1 administrative level, we first calculated the proportion of each administration within the cell and then calculated a weighted \(\widetilde{\mathrm{HDI}}\) value, conditional on the area of each administrative level represented in the cells.
Proportion of natural terrestrial habitat
The forest condition index layer84 was available for 2019 only (Supplementary Fig. 5). We therefore calculated the average \(\widetilde{\mathrm{FCI}}\) in 2019 within each cell j and use the obtained mean for each scenario z.
Location type
To predict consumption rates conditional on the type of settlements within each cell j, we needed a standardized categorization based on available data across the entire region. However, there is no regional, nor national, database available in Central African countries providing a classification for each settlement. In the same way, there are no databases of, for example, facilities present in each settlement, which could be used for a facility-based classification. As done by several other studies, either focusing specifically on wild meat30,90,91 or more generally on urbanization92,93,94,95, the only tested and replicable approach to (remotely) classify villages, cities and towns across Central Africa is to use population size. In our database, all villages (n = 224) had a population up to 10,000 people; towns (n = 24) had between 10,000 and 100,000 inhabitants; and cities (n = 4) all had more than 100,000 people (Supplementary Discussion). Accordingly, we used a global settlement type layer (resolution: 1 km2), available for year 2000, 2005, 2010, 2015 and 202037. We used settlement type data from year 2005 (that is, the midpoint of the period 2000–2010), year 2015 (that is, the midpoint of the period 2011–2021) and year 2020 (that is, most recent available year) for the past, recent and present scenarios, respectively. For each scenario, we first reclassified the settlement type raster to discriminate between rural (coded as 1) and urban (coded as 2) inhabited areas. We then converted the reclassified raster to obtain a vector file of polygons representing urban settlements within the study area (Supplementary Fig. 6) and calculated the number of inhabitants by summing up human population data within each polygon80. Based on our population-based classification, we coded polygons as 1 (that is, village) if the number of people calculated within it was <10,000; as 2 (that is, town) if the estimated population was >10,000 but <100,000; and as 3 (that is, city) if the estimated population was >100,000. In this way, we obtained the estimated number of people present, as well as the proportion of people living in villages \(\widetilde{{\rm{L}}{\rm{T}}1}\), towns \(\widetilde{{\rm{L}}{\rm{T}}2}\) and cities \(\widetilde{{\rm{L}}{\rm{T}}3}\), in each cell j.
Education level
To predict average education level within each cell, we first compiled a database composed of 11 ICF Demographic Health Surveys (DHS) (https://dhsprogram.com/methodology/survey-Types/dHs.cfm) and 14 UNICEF Multiple Indicator Cluster Surveys (MICS) (https://mics.unicef.org/) conducted between 2000 and 2021, and including information on the highest education level of 213,659 households, as well as the subnational district, that is, administration level 1, of the household (Supplementary Table 4 and Supplementary Fig. 7). For each cell j, we calculated the proportion of people that reported an education level ≥secondary and averaged values from within each administrative level covered by our prediction grid according to our scenarios. We used data from years 2000 to 2010 and 2011 to 2022 for the past and recent scenario, respectively, and data from the most recent available survey for each country (Supplementary Table 4), to calculate the present proportions of people attending secondary education. In cases in which a cell j was completely within the boundaries of an administrative level, the cell was assigned the specific calculated proportion of people attending secondary education \(\widetilde{\mathrm{ED}}\). However, if a cell overlapped >1 administrative levels, we first calculated the proportion of the cell that fell within each administration and then calculated a weighted average of the proportion of people attending secondary education, conditional on the areas of each administrative level represented in the cell. As the analysis of the interaction between education level and location type did not show any clear indication of such effect (Extended Data Fig. 2 and Supplementary Table 5), for our prediction, we used the simplest approach and did not consider the interaction, but only the fixed effect of education.
Number of AMEs
For each scenario z, we multiplied values of \(\widetilde{\mathrm{HPD}}\) previously calculated for each cell j, by 5,027, that is, the area of the cell in km2 to obtain the absolute number of people pop estimated to be present in each cell j. We calculated the proportion of children \({\mathrm{prop}}_{\mathrm{child}}\) in each cell j for each scenario z, using country-specific estimates of the proportion of children in the total population (https://data.worldbank.org/indicator/SP.POP.0014.TO.ZS) available from year 2000 to 2022. Similarly, we calculated the corresponding proportion of adults as
$${{\mathrm{prop}}_{\mathrm{adult}}}_{j,z}=1-{{\mathrm{prop}}_{\mathrm{child}}}_{j,z}$$
(14)
Finally, we obtained the predicted number of AME in each cell j and scenario z by adapting equation (13) as:
$${\widetilde{\mathrm{AME}}}_{j,z}={(\mathrm{pop}}_{j,z}\times {{\mathrm{prop}}_{\mathrm{adult}}}_{j,z}\times 0.5)\times 1+{(\mathrm{pop}}_{j,z}\times {{\mathrm{prop}}_{\mathrm{adult}}}_{j,z}\times 0.5)\times 0.86+{(\mathrm{pop}}_{j,z}\times {{\mathrm{prop}}_{\mathrm{child}}}_{j,z})\times 0.78$$
(15)
Here we assumed a sex ratio of 0.5 in the adult population, and used the same multipliers described above (see the ‘Drivers of wild meat consumption’ section) to convert the number of women and children into AME86. We calculated \({\mathrm{prop}}_{\mathrm{child}}\) as the average of the country-specific proportion of children from year 2000 to 2010 (past), 2011 to 2022 (recent) and 2022 (present) and obtained \({\mathrm{prop}}_{\mathrm{adult}}\) applying equation (14).
Predicting wild meat consumption
We used the variables described above to predict cell and scenario-specific consumption probability \(\widetilde{\pi }\), mean frequency of consumption \(\widetilde{\varphi }\) and mean quantity consumed \(\widetilde{\mu }\). For parameters varying by period t, we used the one specific to period 1 (2000–2010) for the past scenario, and the one specific to period 2 (2011–2021) for the recent and present scenarios. For random factors, we used the estimated average, annotated with the accent ‘-’. We weighed the parameters obtained for location type (equation (16), consumption probability; equation (17), frequency of consumption; equation (18), quantity consumed) and education level (equation (19), consumption probability; equation (20), frequency of consumption; equation (21), quantity consumed) by the proportion of people estimated living in villages, towns and cities and attending secondary school in each cell j, obtaining weighed parameters used for prediction by applying the following equations:
$${\widetilde{\alpha 1}}_{j,z}={\alpha 1}_{\mathrm{lt}1}\times {\widetilde{{\rm{L}}{\rm{T}}1}}_{j,z}+{\alpha 1}_{\mathrm{lt}2}\times {\widetilde{{\rm{L}}{\rm{T}}3}}_{j,z}+{\alpha 1}_{\mathrm{lt}3}\times {\widetilde{{\rm{L}}{\rm{T}}3}}_{j,z}$$
(16)
$$\widetilde{{\widetilde{\beta 1}}_{j,z}={\beta 7}_{\mathrm{lt}1}\times {\widetilde{{\rm{L}}{\rm{T}}1}}_{j,z}+{\beta 7}_{\mathrm{lt}2}\times {\widetilde{{\rm{L}}{\rm{T}}3}}_{j,z}+{\beta 7}_{\mathrm{lt}3}\times {\widetilde{{\rm{L}}{\rm{T}}3}}_{j,z}}$$
(17)
$${\widetilde{\gamma 2}}_{j,z}={\gamma 2}_{\mathrm{lt}1}\times {\widetilde{{\rm{L}}{\rm{T}}1}}_{j,z}+{\gamma 2}_{\mathrm{lt}2}\times {\widetilde{{\rm{L}}{\rm{T}}3}}_{j,z}+{\gamma 2}_{\mathrm{lt}3}\times {\widetilde{{\rm{L}}{\rm{T}}3}}_{j,z}$$
(18)
$${\widetilde{\alpha 5}}_{j,z}={\alpha 5}_{\mathrm{ed}1}\times (1-{\widetilde{\mathrm{ED}}}_{j,z})+{\alpha 5}_{\mathrm{ed}2}\times {\widetilde{\mathrm{ED}}}_{j,z}$$
(19)
$${\widetilde{\beta 5}}_{j,z}={\beta 5}_{\mathrm{ed}1}\times (1-{\widetilde{\mathrm{ED}}}_{j,z})+{\beta 5}_{\mathrm{ed}2}\times {\widetilde{\mathrm{ED}}}_{j,z}$$
(20)
$${\widetilde{\gamma 3}}_{j,z}={\gamma 2}_{\mathrm{ed}1}\times (1-{\widetilde{\mathrm{ED}}}_{j,z})+{\gamma 2}_{\mathrm{ed}2}\times {\widetilde{\mathrm{ED}}}_{j,z}$$
(21)
Finally, we generated predicted consumption probability \(\widetilde{\pi }\), mean frequency of consumption \(\widetilde{\varphi }\) and mean quantity consumed \(\widetilde{\mu }\) by replacing variables at the recall and household levels in the linear models specific to each submodel (Extended Data Fig. 1a), with those calculated for the prediction grid:
$$\mathrm{logit}({\widetilde{\pi }}_{j,z}) \sim \underline{\alpha }+{\widetilde{\alpha 1}}_{j,z}\times {\widetilde{\mathrm{HPD}}}_{j}+\alpha 2\times {\widetilde{\mathrm{HDI}}}_{j}+\alpha 3\times {\widetilde{\mathrm{REM}}}_{j}+\alpha 4\times {\widetilde{FCI}}_{j}+{\widetilde{\alpha 5}}_{j,z}+\underline{\alpha 6}+\alpha 7$$
(22)
$$\mathrm{logit}({\widetilde{\varphi }}_{j,z}) \sim \underline{\beta 0}+{\widetilde{\beta 1}}_{j,z}\times {\widetilde{\mathrm{HPD}}}_{j,z}+\beta 2\times {\widetilde{\mathrm{HDI}}}_{j,z}+\beta 3\times {\widetilde{\mathrm{REM}}}_{j}+\beta 4\times {\widetilde{\mathrm{FCI}}}_{j}+{\widetilde{\beta 5}}_{j,z}$$
(23)
$$\log ({\widetilde{\mu }}_{j,z}) \sim \underline{\gamma 0}+\gamma 1\times \underline{\mathrm{AME}}+{\widetilde{\gamma 2}}_{j,z}+{\widetilde{\gamma 3}}_{j,z}+{\underline{\gamma 4}}_{st}+\underline{\gamma 5}$$
(24)
Consequently, we generated the predicted consumption, frequency and quantity (see description of equation (1)) for cell j and scenario z using the following equations:
$$\widetilde{{{\rm{c}}{\rm{o}}{\rm{n}}{\rm{s}}{\rm{u}}{\rm{m}}{\rm{p}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}}_{j,z}}=\mathrm{Bernoulli}({\widetilde{\pi }}_{j,z})$$
(25)
$$\widetilde{{{\rm{f}}{\rm{r}}{\rm{e}}{\rm{q}}{\rm{u}}{\rm{e}}{\rm{n}}{\rm{c}}{\rm{y}}}_{j,z}}=\mathrm{Beta}({\widetilde{\varphi }}_{j,z}\times \kappa ,(1-{\widetilde{\varphi }}_{j,z})\times \kappa )$$
(26)
$$\widetilde{{{\rm{q}}{\rm{u}}{\rm{a}}{\rm{n}}{\rm{t}}{\rm{i}}{\rm{t}}{\rm{y}}}_{j,z}}=\mathrm{Gamma}({\widetilde{\pi }}_{j,z}\times \theta ,\theta )$$
(27)
In this way, we estimated consumption rates in each cell j, and for each scenario z, using equation (1) as
$$\widetilde{{{\rm{c}}{\rm{o}}{\rm{n}}{\rm{s}}{\rm{u}}{\rm{m}}{\rm{p}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}{\rm{r}}{\rm{a}}{\rm{t}}{\rm{e}}}_{j,z}}=\widetilde{{{\rm{c}}{\rm{o}}{\rm{n}}{\rm{s}}{\rm{u}}{\rm{m}}{\rm{p}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}}_{j,z}}\times \widetilde{{{\rm{f}}{\rm{r}}{\rm{e}}{\rm{q}}{\rm{u}}{\rm{e}}{\rm{n}}{\rm{c}}{\rm{y}}}_{j,z}}\times \widetilde{{{\rm{q}}{\rm{u}}{\rm{a}}{\rm{n}}{\rm{t}}{\rm{i}}{\rm{t}}{\rm{y}}}_{j,z}}$$
(28)
And calculated the biomass consumed (in kg) in each cell j, and for each scenario z, as
$$\mathrm{kg}\,{\mathrm{consumed}}_{j,z}=\widetilde{{{\rm{c}}{\rm{o}}{\rm{n}}{\rm{s}}{\rm{u}}{\rm{m}}{\rm{p}}{\rm{t}}{\rm{i}}{\rm{o}}{\rm{n}}{\rm{r}}{\rm{a}}{\rm{t}}{\rm{e}}}_{j,z}}\times \widetilde{{{\rm{A}}{\rm{M}}{\rm{E}}}_{j,z}}\times 365$$
(29)
Where the \(\widetilde{\mathrm{consumption}\,\mathrm{rate}}\) is the result of equation (28) for cell j and scenario z, \(\widetilde{\mathrm{AME}}\) is the number of AMEs estimated to be present in each cell j for scenario z; 365 is the number of days in a year.
Finally, by summing up the predicted consumption rates, we calculated the total quantity of wild meat (in tonnes) consumed in the region in one year for each scenario z as
$$\mathrm{Total}\,\mathrm{tonnes}\,{\mathrm{consumed}}_{z}=\mathop{\sum }\limits_{j=1}^{J}\left(\frac{\mathrm{Tonnes}\,{\mathrm{consumed}}_{{j},{z}}}{1,000}\right)$$
(30)
Where total tonnes consumed is the number of tonnes consumed in the region in a year for scenario z, tonnes consumed is the result of equation (29) for cell j and 1,000 is the factor converting consumed kilograms to tonnes.
Finally, we mapped the consumption rates (Fig. 3a) and tonnes of wild meat consumed (Fig. 3c) in Central Africa using QGis (.3.22.1)69 by interpolating the values thus obtained using the plugin Heatmap, which returns a density layer using kernel density estimation weighed using the predicted values. For that, we used a radius of 0.9 decimal degrees and a Quartic kernel decay rate.
Evaluating geographical uncertainty
As we predicted wild meat consumption over the entire Central African region, we wanted to evaluate the uncertainty of our estimates. To do so, we produced maps of uncertainty associated with our spatial estimates following two different approaches.
First, we extracted the s.d. of the posterior distribution of predicted values of each cell j (1) consumption rates (Extended Data Fig. 3b) and (2) biomass consumed (Extended Data Fig. 3c).
Second, we produced (3) a map of uncertainty based on the difference between the characteristics (that is, the continuous variables evaluated as potential drivers) of each cell of our prediction grid and the average values of our data. To do so, we first calculated the mean of the values of each continuous variable V (that is, human population density HPD, remoteness REM, human development index HDI and forest condition index FCI) assigned to each recall event r. By that, we obtained four average values \({\underline{M}}_{V}\), one for each variable V. Then, for each variable V, we subtracted \({\underline{M}}_{V}\) calculated from the data to the value x assigned to each cell j of our prediction grid. In doing so, we obtained a difference \({\delta }_{j}={x}_{j}-{\underline{M}}_{V}\), with 0 being equal to no difference between the average of the data and the value of the prediction cell. To standardize the difference, we converted negative values of \({\delta }_{j}\) to positive and then obtained the normalized differences \({\delta }_{j}^{{\prime} }\), with values of between 0 (that is, no difference) and 1 (maximum difference), one for each variable V. We then summed the values obtained in this manner for each prediction cell j and obtained an index of dissimilarity going from 0 (that is, no difference) to 4 (that is, maximum difference), as \(\triangle j={\sum }_{j=1}^{J=874}{\delta }_{j}^{{\prime} }\).
Finally, we mapped the uncertainty values obtained in QGis (v.3.22.1)69 (Extended Data Fig. 4). As above we used the plugin Heatmap, with radius around points of 0.9 decimal degrees and a Quartic kernel decay rate.
Compiling, running and checking the model
We coded our model in Stan96 and run it in R (v.4.2.0)76, using four chains in parallel for 5,000 (4,000 warmup) iterations each using RStan (v.2.26.11)77. We then evaluated the model convergence by examining the potential scale reduction factor Rhat of the estimated parameters (Extended Data Table 3) as well as the trace plots (Supplementary Fig. 15) of the realized iterations87. Finally, we ensured that the model did not show issues of collinearity by visually inspecting the pairs plots of the residuals of the explanatory variables87 (Supplementary Fig. 16).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

