Micro-level social structures and the success of COVID-19 national policies – Nature.com

Physical interaction network model

COVID-19 mainly spreads from person-to-person in interactions with close physical proximity. Consequently, one ideally needs to build a physical contact network-based on micro-level geographical co-location data for all (or at least a representative sample) of the population. However, collecting the data on all physical contacts between all individuals in a population is an impractical, time-consuming task40. Moreover, even if we did have access to all co-location data, we would run into computational issues due to the size of the resulting network, especially for policy applications that require analyzing many what-if scenario questions. To address these issues we use scaled, notional networks, constructed based on real demographic data to be used as the initial point (that is, pre-pandemic collocation) for in-person contacts. These types of scaled, notional networks provide us with a balance between the diversity of networks for different countries enabled by real data, and the practicality and computational efficiency needed for scenario studies.

But which class of notional networks is suitable for this purpose? The structure of physical contact networks (geographic co-location) is quite different in nature from those commonly observed in online social networks and other types of networks whose functions are primarily exchange of information. Unlike the latter in which (often) strong preferential attachment mechanisms can result in the emergence of high-degree network hubs and power-law distributions, the degree of physical contact networks are tightly linked to nodes capacity in terms of time and mobility. Small-world networks41, on the other hand, have been shown to be able to capture several empirical facts of the contact pattern such as, the heterogeneous number of contacts, people living in small groups which are overlapping, and people contacting across long distances due to work and social relations42. Similar qualities of the small-world network are also found in a simulated physical contact-based network, created by Eubank30. Eubank’s physical contact-based network is built by large-scale individual-based urban traffic simulations using data from the census, land-use and population-mobility. The degree distribution and diameter of this simulated network suggest that the people-contact graph is more like a small-world graph rather than a random graph. Evidences of the small-world network have also been shown in mobility networks. For example, the structure of the mobility network in Germany, generated by Schlosser43, shows properties of the small-world network, which are the large clustering coefficient and the small average shortest path length. In this work we use small-world networks to represent the pre-pandemic contact network of a given country and in the next section will discuss the process we use to set the parameters of such networks.

Generating pre-pandemic notional contact networks using demographic data

Building the notional small-world network requires three parameters41: the number of nodes, N, the mean number of neighbors of each node, M, and the probability of each link to rewire, P. In our model, we fix the number of nodes that represent individuals in our notional model for all of the countries and use the other two parameters (M and P) to introduce key variations across different countries. Although this decision is not necessary, it facilitates comparison of the effect of different policies across countries.

We use the next parameter, M, to capture the average number of in-person contacts of individuals per unit of time. Although this data is not available for all, we do have access to survey-based data for a number of countries. Such survey reports44,45,46,47 provide a large-scale quantitative approach for infections transmitted by in-person contact route. For this work, we use data from Joe Mossong’s population-based servery46, which includes a total of 7,290 participants and 97,904 physical contact records with different individuals during one day across five targeted European countries. To use the data from this survey, we simply round the mean reported contact number of each country in Joe Mossong’s survey report to the closest integer to approximate the mean degree (M) of corresponding simulated contact-based network. For example, Joe Mossong’s survey report shows that the mean daily reported contact number, including the familial contacts (within-household interactions) and external contacts (out of household in-person social interaction), in Germany is 7.96, then the mean degree is set as 8 in the simulated small-world network of Germany.

Once the values for N and M are set for each country, we are left with setting P. This parameter can drastically modulate the speed of disease transmission41,48, but how do we decide its value? This is not directly evident as, unlike the other two parameters, we don’t have a direct corresponding interpretation of this parameter in real networks, and to the best of our knowledge, there is no previous work that maps real data on physical contacts to estimate the value of rewiring probability in small-world networks. This becomes even more challenging in our work as we don’t want to estimate this parameter to match simulation to the real virus spread data. Here we propose an algorithm to transform the survey-based data on average household size that has been previously shown to be important in shaping the disease dynamic16,18, combined with the value of the average number of physical interactions used in the previous step, into an estimated value of rewiring probability. The data on average household size across different countries is readily available for most counties and is regularly updated by various organizations that collect such data (see the 2019 report in ref. 49, for example). The basic idea behind this transformation is assuming a limited close interaction capacity (time and cognitive capacity) for each individual50, which means that if people have a fixed contact number and more connections with their families, then they have fewer external connections with distant groups. That is, in the small-world network, such nodes have a lower probability to rewire links from neighbors to further nodes.

We use a four-step process to use the country-specific average household size for setting the rewiring probability in the representative small-world network. First, we build a small-world network with the given N and M, already determined, and an initial estimated value for P, which will be adjusted in the final step. Second, all nodes were separated into N/K different groups by order, where K is a new parameter and considered as the smallest integer that is larger than the average household size. After the first two steps, the links within each group are designated as the strong links (that is familial contacts), which are stable and—in our model—are assumed to not change during the social distancing phase. The rest of the links are considered as weak links (external contacts) and may be disconnected by social distancing policies. Third, we consider the full social distancing case in which all weak links are removed as a result of NPIs. This hypothetical transformation will leave us with many disconnected sub-networks, where each sub-network (connected components, but not necessarily full graphs), contains at least one node and at most K nodes, given how step two is designed. Given the way these sub-networks are constructed, we then consider each as a household and the number of nodes in each sub-network is denoted as the household size. We then calculate the average household size of the small-world network by averaging the size of all the sub-networks generated in step 3. In the final step, we adjust the value of P for each country, to match the average size of these sub-networks to the average household size from the survey data.

Contagion model

Similar to many recent studies on the spread of COVID-19, we model the dynamic process of virus transmission as a continuous-time SEIR stochastic model2,3,4,10,12,27,51. In the SEIR model, every individual in the contact-based network is in one of four possible states: susceptible (S), exposed (E), infected (I) and recovered (R). Compared to the other classic compartmental model in epidemiology, that is, the SIR (susceptible, infectious, recovered) model, SEIR adds an additional state, E, to denote the infected individuals who are in the incubation period, which is the transition time between the exposed and infected states. Assuming this state is crucial for many applications, given the incubation period of COVID-19 that can last up to 14 days52.

The infectability of pre-symptomatic cases has been a matter of many studies over the last year. Although some scholars assume that the individuals who are in the incubation period of COVID-19 are infected but not infectious2,3,4,51,52, others consider that the familial cluster of the COVID-19 infection indicates a possibility of transmission during the incubation period10,53,54,55. In the work of Aleta et al.10, which uses the SLIR model, the individuals in state L (which has a similar definition as the state E) is set as infected but not infectious. But Aleta et al. add one more state (P for pre-symptomatic) between L and I to express the infected, infectious and pre-symptomatic state. Furtmermore, Aleta and colleagues’ work separates the infected state (I) into two sub-states: where one is infectious and symptomatic, and the other is infectious but asymptomatic. In other words, Aleta considers the infected, but asymptomatic or pre-symptomatic individuals are infectious.

In our model we define that the exposed node is pre-symptomatic and infectious. Under this setting, our model show that even when social distancing policies were applied early when there were a small number of reported accumulative confirmed cases (the sum of nodes I and R), the epidemic still increased fast due to a mass of undetected infectious cases (E nodes). In our SEIR model, the susceptible individual (S nodes) can be infected by the pre-symptomatic but infectious individual (E nodes) or the symptomatic infectious individuals (I nodes). The infection rates of E nodes and I nodes are thus distinguished by: βE and βI. The main reason to set two different infection rates is that the pre-symptomatic individual can not be observed. As a consequence of unobservable symptoms, pre-symptomatic individuals can easily transmit the virus due to the lack of protection during the contact. By contrast, the symptomatic individual is observable and then the infection prevention can be applied during the contact.

We assume that an exposed individual will infect its susceptible neighbors after a period, with a probability that follows the exponential distribution with the parameter, βE. In other words, a susceptible node will be infected by one exposed neighbor after an average of \(\frac{1}{{\beta }_{E}}\) days. Following the same logic, a susceptible node will be infected by one infected neighbor after an average of \(\frac{1}{{\beta }_{I}}\) days. Aggregating the impact of contacts with exposed and infected neighbors, a given susceptible node, z, will be infected by one of its exposed or infected neighbors with the weighted infection rate, \({\beta }_{W}^{z}\), where

$${\beta }_{W}^{z}={\beta }_{I}\times {\mathrm{number}}\,{\mathrm{of}}\,{\mathrm{infected}}\,{\mathrm{neighbors}}+{\beta }_{E}\times {\mathrm{number}}\,{\mathrm{of}}\,{\mathrm{exposed}}\,{\mathrm{neighbors}}$$

After an average of \(\frac{1}{{\beta }_{W}^{z}}\) days, node z will move to the exposed state (E). In the early stages of the pandemic, we assume that βE and βI are the same due to inadequate special precautions such as observing social distancing or wearing masks. Once the exposed individuals (E) are showing symptoms or have tested positive, they are moved to the infected state (I). In the early stages of the outbreak and without sufficient test capacity, the incubation period follows the exponential distribution with the mean of \(\frac{1}{\alpha }\) days. Moreover, the expected period that it takes for an infectious individual to move into the recovered state follows an exponential distribution with γ. Once the I node is recovered, it will be moved to an R state, after which the node will no longer be infected.

For the SEIR simulation, we use the Gillespie algorithm56,57, which is one of the efficient methods for simulating stochastic processes, and has been adapted to work for temporal-network simulations58. The continuous nature of the method can make it easier to set the NPI timing more precisely in the simulation since this timing is a function of the number of cases (see Supplementary Section 1 for more details regarding the implementation of the SEIR model).

Modeling NPIs

In most countries, the first wave of the pandemic quickly caught the attention of local and federal governments who responded to the rapid spread of the virus by implementing a set of NPIs. In the early months of 2020, these interventions were mostly in the form of social distancing policies, ranging from limited rules for stores and restaurants capacity all the way to partial and total lockdown. What social distancing measures each country had in their NPIs portfolio in the early stages of the pandemic, and for how long, created a heterogeneity in the aggregate strength of cross-country NPIs. Creating a compact model requires a composite measure of policy stringency that incorporates all the essential heterogeneity factors. Moreover, once the stringency measure is quantified, we need to apply it to the country-specific small-world network to build a post-NPI network. In this section we discuss our method to tackle these two questions.

It has been empirically demonstrated that social distancing policies modified the contact pattern of people and reduced social interactions, even after taking into account the voluntary responses by the people to the pandemic39. In our model, the direct impact of social distancing policies needs to turn into a method for modification of the contact-based simulated network. One possible method is to infer policy stringency from changes in mobility data, using available resources provided by companies such as Google and SafeGraph. Although these datasets have been used by many scholars to estimate the degree of social distancing6,10,12,13,39, the complexity and scale of such datasets call for multi-level, elaborate models, something that is against the main goal of this work. Some of such elaborate models have been successfully implemented by other scholars: for instance, in Aleta and colleagues’ research10, a weighted multi-layer synthetic network is built to represent the interactions in different groups (school, workplace and community) by combining mobility data, census and demographic data in the metropolitan Boston area. In another work, Lai et al.12 analyzed the mobility change over a network, consisting of 340 prefecture-level cities in mainland China, by the dataset on near-real time daily relative outbound and inbound flow of smartphone users for each city in 2020.

The goal of this paper, however, is to build a simple compact model, using high-level country-level input data, rather than detailed data at the local levels. To maintain the simplicity needed for such a compact model, we capture the impact of policies on social interactions by the Government Stringency Index (GSI), defined and calculated by the Oxford COVID-19 Government Response Tracker59,60, which tracks policies and interventions of cross-national governments across a standardized series of indicators over the period of COVID-19 and creates a set of country-specific GSIs to measure the extent of governments’ responses. More details on the component of the GSI and the trajectories of GSI in targeted countries can be found in the Supplementary Sections 2 and 3.

Social distancing policies are dynamically adjusted based on the state of the epidemic. This dynamic process of policies is also reflected by the trajectory of the GSI60. In the early stage of the pandemic, the GSI in each country grew higher and higher until to the maximum. To simulate the intervention of policies on the pandemic, applying multi-time adjusted GSI is indeed more realistic. In doing so, however, the complexity of the model is also increased by more inputs in the GSIs and corresponding temporal shifts for the GSIs. To avoid this in favor of simplicity and since our primary goal is to explain/predict the relative trajectories of different countries, we focus on the strongest index for each country and the corresponding time in which the index jumped to its strongest level. This is especially a feasible assumption for the early stages of the pandemic during which the stringency index was a monotonic non-decreasing function. This level can also be interpreted as the accumulation of all the previous policies up to the point of the strongest measures.

Once the level and timing of NPIs for each country are determined, we need to change the small-world contact network accordingly. This involves deciding about the timing, types and the ratio of link removal. Given our decision to use a single accumulated change in the contact network, we make this change once the GSI reaches a critical point, GSIt. We then scale the GSIt to a ratio σSD [0, 1]. To decide which links to remove, we only focus on the weak links, as previous empirical studies have demonstrated that a substantial portion of the long-distance connections (weak links) disappeared after the implementation of policies, at least during the early stages of the pandemic43. Consequently, in the next step, we randomly remove the proportion σSD of weak links on the basis of how we defined weak versus strong links in the previous section. For example, if we set the GSIt as the maximum value of the GSI in the study period, then the GSIt in Germany is 76.85 and is scaled to 76.85% as σSD, which is the proportion of weak links that are randomly disconnected after the implementation of the social distancing policy in Germany. Importantly, for each country, the σSD is transformed from its GSIt with the same scale, which is aligned with our goal of not using any country-specific calibration. This proportional removal of close social contacts especially makes sense for the early stages of the pandemic, before other factors such as lockdown fatigue, social learning or the arrival of vaccination change the correlation between the two. We validated this assumption by comparing the stringency of country-level GSI with google mobility index of that demonstrates how much time people spend at their residence location (see Supplementary Section 4). After the removal of a subset of weak links, the modified contact-based network in our model exhibits a reduction of the network’s small-world property, something that is also captured by Schlosser’s empirical research43.

After the scaled GSIt (σSD) is set as the input to simulate the intervention of social distancing policies on the contact-based network for countries, we need to intervention time in the simulation. In our model, we set the increasing proportion of accumulative confirmed infected cases in the whole population (record by The Oxford Martin School38), rather than the date time, as the metric to describe when the intervention happened42. That is, in the simulation model, when the number of accumulative confirmed infected nodes account for a threshold of the whole population, the contact-based network is modified. Moreover, in the simulation, we need to scale the threshold smaller to ω, where ω (0, 1), due to the smaller population size in the simulation model, comparing to the real population in countries. Similar as the σSD, for each country, the ω is also transformed from its threshold with the same scale. More details on country-specific intervention timing’s selection, and including a figure of the demonstration for all countries can be found in the Supplementary Section 3.

Model validation with longer period

Our model is validated using data from the first wave of the pandemic by focusing on the first two months. This choice of modeling period was a conscious decision: our goal here is to capture the effect of the interaction of social distancing policies with micro-level factors. Thus, to capture as much of this interaction as possible by using a simple model, we focused on the first wave of COVID-19 to have clear data from the pre-treatment periods, both in terms of social interaction data, as well as policy data. Importantly, this was the only period that different countries started from no-policy to some level of policy, whereas all other subsequent waves had different starting points for different countries. Moreover, to limit the number of confounding factors, we decided to focus on the period where policies were changing monotonously— although at a heterogeneous rate in different countries. This is because capturing the ramping down of the policies in the model requires a more explicit way of accounting for country-to-country variations in voluntary social distancing after the number of cases starts falling. So, that brings us to the period between the beginning of the first wave, until the situation comes under control in most countries and they start reversing some of the policies.

With this background rationale in mind, we extended the period of the model to include 90 days and 120 days and the results are included. We did not try the model for longer than 120 days, since after the first four months, the number of cases in most of these countries is quite low and most of the social distancing restrictions are lifted. This leaves little room for the interaction effect to be captured by our model.