Complex Systems

Modelling a Pandemic

Building an agent-based model using Python and pandas

Christian Graf
Towards Data Science
8 min readSep 3, 2020

--

Photo by Christopher Burns on Unsplash

In our daily life, we encounter many complex systems where individuals are interacting with each other such as the stock market or rush hour traffic. Finding appropriate models for these complex systems may give us a better understanding of their dynamics and allows us to simulate its behaviour under changing conditions. One way of modelling complex systems is by using agent-based models, meaning that we are explicitly simulating individuals and their interactions instead of deriving the dynamics of the system in an aggregate way.

You can find an introductory article about agent based models in general here:

In this post, we want to develop such an agent-based model using python. As an example, we try to model the behaviour of a pandemic. Please note that I am not at all an epidemiologist. The goal of this post is not to build a sophisticated model capable of making real life predictions, but rather to see how we can build a simple agent-based model and study some of the resulting dynamics. Let's start with some basic considerations.

Foundations of Our Model

For our example we assume a non-lethal disease that may spread between individuals which were in contact with each other. The most basic approach is to consider three different groups:

  1. Individuals that are not yet infected, called the susceptible group.
  2. Individuals that are infected and may spread the disease.
  3. Individuals that have recovered from the disease and are now immune.

Because of the three involved groups (Susceptible, Infected, Recovered), these models are also called SIR-Models.

Analytical SIR-Model

We will start with a mathematical SIR-model that will serve us as a benchmark model. In the basic SIR-model, the flow between the three groups is: S -> I -> R . It is a one-way street where in the beginning most individuals are in the S group, eventually cascading via the I group into the R group. At each time step t a certain amount of individuals are traversing from S to I and from I to R, while the total number of individuals N = S+I+R stays constant. We can write these dynamics into a set of differential equations, or, in a bit more understandable form, we can write down by how much each of the groups changes for a certain time step:

Basic SIR-Model

The dynamics are governed by two variables β and γ. While β is the rate with which infectious individuals infect others, γ is the rate at which infectious individuals recover. These dynamics are visualized below for a fixed β and γ:

You can see that the number of infected individuals grows fast, peaking around day 40 which is when the number of susceptible individuals drops significantly, slowing down the rate of infections. This is simply because by then a significant amount of individuals already had the disease and cannot be infected anymore. Towards the end, the number of infected individuals drops to zero, eradicating the disease. Note that by then around 20% of the individuals were never infected. This so-called steady-state solution can also be calculated analytically and depends on the parameters β and γ.

With this simple SIR-model we can already observe some basic dynamics for our problem. However, we are looking at our groups only in an aggregate way. We assume that the individuals are a homogeneous, unstructured set organized into three well defined, perfectly mixed groups. The interactions that are modeled are only on average. Every infected individual infects on each day a fixed number of contacts and a constant fraction of all infected individuals is cured each day. There is no way of implementing complex social interactions of individuals within this model. In order to relax some of these assumptions we will now set up an agent-based model simulating each individual separately.

Agent-Based Model

Our first goal is to reproduce the results from the analytical SIR-model. As a data structure we want to use pandas dataframes. Let's start with initializing 10'000 agents represented as rows in the dataframe:

Currently, the dataframe has only one row called state which indicates the health state of the agent. We encode susceptible with 0, infected with 1 and recovered with 2.

Now we need some function that infects an agent. We want this function to take a list of agents that were in contact with an infected agent. Additionally, we want to give a probability with which these contacts actually get infected. Here some Monte Carlo methods come into play in order to add randomness. The function below does the required job.

The list of contacts allows to hold the same agent multiple times. We roll a random number between 0 to 1 for each unique agent in the contact list and update the state from susceptible (0) to infected (1) if this roll is below a probability threshold. The last line of the function is updating the state column accordingly.

Similarly, we need a function that recovers infected agents with a certain probability. Here, we use a flat chance of recovery in every time step.

The infect and recover functions are called at every time step. For this we create a step function. Here, we are generating the list of random contacts which has a length of a constant time the number of infected agents.

In order to get a feeling for the variations in the outcome of our agent based model we will run the simulation ten times. For each experiment we initialize a set of 10'000 agents with 5 infected patients zero to start with. We then perform 150 time steps.

Baseline Results

Visualizing the size of each of the three groups (susceptible, infected and recovered) at each time step, we can see that the dynamics of our agent based model are in agreement with the basic SIR-model.

Outcome of the base line model. Here we use the following parameters: _randomContacts=9, _chanceOfInfection=0.025 and _chanceOfRecovery=0.1. For the SIR-Model, the parameters are β=0.225 and γ=0.1 .

The solid lines show the median of our 10 runs of the simulation, while the shaded area shows the area between the 25%-75% quantile. Even though there is some variance in the central part of the simulation, all models arrive at a very similar endpoint, which equals to the analytical solution.

Up to now we have not gained much in comparison to the basic SIR-model, but we have setup an agent-based baseline model and verified that it behaves similar. With this setup we can now start to add extra complexity.

Spatial Agent-Based Model

It is intuitive that the assumption that an infected agent will have contact with a set of completely random agents may not hold true in real life. You would rather expect some social neighborhood, a group of contacts the infected agents acts with on a regular basis. An easy way of simulating this effect is to place the agents on a lattice and let them interact with their nine closest neighbors.

Outcome of the spatial model. Here we use the following parameters: _randomContacts=0, _chanceOfInfection=0.06 and _chanceOfRecovery=0.1. For the SIR-Model, the parameters are β=0.54 and γ=0.1 .

Note the prolonged x-axis. You can see that the dynamics are now much slower for the spatial agent based model. I even had to increase the chanceOfInfection significantly, to get it going. The structure that we introduced to the contacts leads to the fact that an infected agent lives in an environment were there are already many agents who are infected as well or have recovered already thus leading to a significant decrease in the spreading of the disease. We can have a look at the spatial distribution of the agents visually in the animation below:

Blue: Susceptible, Yellow: Infected, Green: Recovered

Adding Random Contacts

We saw that when we introduce spatial structure to the social interactions of the agents, the dynamics of the disease are slowed down significantly. What happens when we introduce for every agents an additional random contact besides its nine spatial neighbors?

Outcome of the spatial model. Here we use the following parameters: _randomContacts=1, _chanceOfInfection=0.06 and _chanceOfRecovery=0.1. For the SIR-Model, the parameters are β=0.6 and γ=0.1 .
Blue: Susceptible, Yellow: Infected, Green: Recovered

With only one additional random contact the dynamics of the infection are again much faster, quickly breaking the structure we introduced by placing the agents on the lattice.

Increasing Complexity

We have a working setup that one can now play with by increasing the complexity. One could think of modeling different separate clusters of agents that are only interconnected weakly, or introducing an age structure for the agents reflecting different kinds of interactions for different age groups. Additionally, one could start introducing measures to reduce the chance of infection at a certain time step or reducing the number of contacts.

Performance

One word about the performance of the model. Usually, I like using an object oriented approach for building agent-based models. Modelling the agents as a class makes the simulation and the coding quite intuitive. However, in python the simulation may quickly become relatively slow. By storing the data into pandas dataframes, where one row represents one agent, we are loosing a bit of flexibility, but we can rely on numpy functions doing the major workload, thus making the simulation reasonably fast. The presented examples run with about 50 steps per second on my machine for 100'000 simulated agents, producing the output of the simulation within a few seconds.

Conclusions

I showed you how to set up a basic agent-based model from scratch. We looked at the example of modelling a spreading disease. As a first step we were validating a minimal version of our model against a known mathematical model. We then started changing parameters in order to investigate changes in the dynamics of the system. By introducing a lattice structure to the agents we observed that the spread of the disease slowed down significantly, but allowing for only one random contact again lead to increasing dynamics. The presented implementation is a flexible setup that allows for an easy implementation of more complex interactions, heterogeneity and structure within the agents. Also, we are capable of studying agents on an individual level, or subgroups of agents within a complex, large scale simulation.

Feel free to use this setup as a starter and play with it. The full code can be accessed here:

--

--