Note: A shorter version of this post was written for Scientific American and published there on April 17th.
The Density Paradox
Anyone who follows the journalist Matt Yglesias on Twitter knows his style. He is sardonic and facetious, and he relies on barbs to make his point. His recent tweet on March 23, 2020, at 8:10 pm EST is a perfect example:
The moral of corona virus is that we should adopt the kind of low-density living patterns associated with Asian countries like South Korea, Japan, Taiwan, and Singapore that have successfully controlled its spread.
Though the tweet oozes sarcasm, it highlights how East and Southeast Asian cities, known for their hyper-density, were not only hit by the coronavirus pandemic but also figured out ways to slow its spread without destroying the very essence of what makes cities so successful. His tweet contrasts sharply with the perception that New York City, the epicenter of the epidemic in the United States, is somehow to blame for so much of the damage. For example, on March 23rd the New York Times ran an article entitled “Density Is New York City’s Big ‘Enemy’ in the Coronavirus Fight.” The journalist Brian Rosenthal writes:
New York has tried to slow the spread of the coronavirus by closing its schools, shutting down its nonessential businesses and urging its residents to stay home almost around the clock. But it faces a distinct obstacle in trying to stem new cases: its cheek-by-jowl density.
This kind of sentiment raises the question: what role does population density play in spreading the virus? Are big cities worse than smaller ones? Does New York deserve such harsh criticism?
How Contagions Conjugate
Before we answer these questions, let’s take a step back and look at how infectious diseases spread in general. For this, we assume that no preventive measures are taken and that everyone goes about business as usual, without social distancing or self-isolation. To understand how widely and quickly an infectious disease will spread, epidemiologists refer to something called the reproduction number, or simply R. It measures the average number of people that an infected individual will infect over the course of their infection.
Among extant diseases, measles has one of the highest reproduction numbers, at about 15. That is to say, in a population without the measles vaccine, one person with measles, if left untreated or unquarantined, can expect to pass it along to about 15 others (hence the 2019 outbreaks). For influenza, that number is only about 1.5. The current evidence puts the coronavirus at around 2.65: not as bad as measles, but worse than the typical flu strain.
The Reproduction Number
The number of new cases on any given day is directly tied to the reproduction number. Say, for example, that R is two. This means that the average infected person passes it on to two people, who then collectively pass it on to four people, who then collectively pass it on to eight people, and so on. Assuming the virus began with a single person, each round of infection multiplies the number of new cases by R, so the number of new cases on a given day depends on both the reproduction number and the number of days since “patient zero” was infected.
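The doubling example can be written out as a few lines of arithmetic. This is a minimal sketch, assuming unchecked growth from a single patient zero and a fully susceptible population. The serial interval (days between one round of infection and the next) is a hypothetical value chosen only to convert days into rounds; it is not a figure from this post.

```python
def new_cases_in_generation(r, generation):
    """Expected new infections in a given round of infection, starting from
    one patient zero (generation 0), with no interventions."""
    return r ** generation


def new_cases_on_day(r, day, serial_interval_days):
    """Same quantity indexed by day instead of by round.

    serial_interval_days is a hypothetical parameter: the assumed number of
    days between successive rounds of infection.
    """
    generation = day // serial_interval_days
    return new_cases_in_generation(r, generation)


# R = 2 gives 1, 2, 4, 8, 16 new cases in rounds 0 through 4
print([new_cases_in_generation(2, g) for g in range(5)])  # [1, 2, 4, 8, 16]

# With an assumed 5-day serial interval, day 15 is the third round: 2 ** 3 = 8
print(new_cases_on_day(2, 15, serial_interval_days=5))  # 8
```

Raising R from 2 to the 2.65 quoted above changes the base of the exponent, which is why even small differences in R compound quickly over many rounds.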
But to know why and how fast the disease spreads, we need to know more about the reproduction number and what determines if it is large or small. It turns out that the reproduction number is determined by four elements. Thus, to see the role that cities play in spreading the virus, we need to look more deeply at those components.
First is the contact rate. It is a measure of the average number of contacts people have with other people on a given day. Extroverts will have more contacts than their introverted counterparts. People who take public transportation and interact with many co-workers or customers each day will have more contacts than those who commute by car and sit at a solitary desk. Children in a crowded school typically have more contacts than their grandparents. Those sitting at a computer screen in Area 51 monitoring for extraterrestrial life will have fewer contacts than DJs (but maybe more with Martians or Klingons).
If we take all the people in a society and calculate how frequently they mix on average, then we get the contact rate. One thing to consider, however, is that contacts are not the same for all infectious diseases. Influenza and HIV spread through very different mechanisms, and thus what counts as a contact differs greatly as well. (More generally, the structure of social contacts, that is, who interacts with whom and whom those people interact with in turn, can also play a role, but we ignore that for the time being.)
Second is the transmission rate, which is the likelihood that a single contact between an infected person and a susceptible person passes the disease on. Not all diseases are equally transmissible. Some infectious diseases, such as HIV, are relatively difficult to transmit. The CDC estimates that only 63 out of 10,000 exposures to shared needles will result in the transmission of HIV. Estimates for the common cold are much higher: 2%-40%, depending on age. The coronavirus appears to be at the lower end of this range, perhaps around 1%-5%, although the data are far from complete. One of the key problems with the coronavirus is that even if an infected person chats with only a few other people on a given day, there is a significant chance that she passes on the infection.
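To see why “only a few” contacts can still be risky, here is a minimal sketch that turns a per-contact transmission probability into the chance of infecting at least one person. It assumes each contact transmits independently with the same probability, which is a simplification; the five-contacts-per-day figure is a hypothetical value for illustration.

```python
def prob_at_least_one_transmission(p, contacts):
    """Chance that an infected person infects at least one of `contacts`
    people, assuming each contact independently transmits with probability p."""
    return 1 - (1 - p) ** contacts


# HIV via a shared needle: 63 in 10,000 per exposure (the CDC figure above)
print(round(prob_at_least_one_transmission(63 / 10_000, 1), 4))  # 0.0063

# Coronavirus at the upper end of the 1%-5% range, with a hypothetical
# five contacts in a single day
print(round(prob_at_least_one_transmission(0.05, 5), 3))  # 0.226
```

Under these assumptions, five conversations in one day already give roughly a one-in-four chance of passing the virus on.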
Third is the duration of the infection: the time it takes for the disease to work its way through the human body from infection to illness to recovery. In the case of the coronavirus, this can take about a month: up to two weeks between infection and the first symptoms, and then about two weeks until recovery (or death).
Last is the fraction of people at a given time who are susceptible to the disease. For a novel virus newly introduced into a population, such as the coronavirus, susceptibility is typically near 100%. One may recall the stories of Hernán Cortés carrying smallpox to the Aztecs, who were devastated by its spread. As the disease spreads and people recover, the fraction of susceptible people in the population falls because survivors develop immunity.
In summary, the reproduction number, the average number of people each infected person passes the infection on to, is the product of the contact rate, the transmission rate, the duration of infection, and the fraction of susceptible people in the population.
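The product can be written as a one-line function. This is a sketch of the simple multiplicative model described above, not a full epidemiological model, and every input value below is hypothetical, chosen only to show the arithmetic.

```python
def reproduction_number(contacts_per_day, transmission_prob,
                        duration_days, susceptible_fraction):
    """R as the product of the four elements: daily contacts, per-contact
    transmission probability, days of infectiousness, and the share of the
    population still susceptible."""
    return contacts_per_day * transmission_prob * duration_days * susceptible_fraction


# Hypothetical inputs: 10 contacts a day, 2% per-contact transmission,
# 14 infectious days, fully susceptible population
print(round(reproduction_number(10, 0.02, 14, 1.0), 2))  # 2.8

# Doubling the contact rate (say, in a denser setting) doubles R
print(round(reproduction_number(20, 0.02, 14, 1.0), 2))  # 5.6
```

Because the terms multiply, halving any one of them, whether contacts, transmissibility, infectious days, or susceptibility, halves R.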
To offer an analogy, we can think of an infectious disease as operating like a fire. A spark ignites the wood, which burns bright and hot until, finally, all the fuel is spent and the fire goes out. The reproduction number determines how hot and bright the fire burns. If R is bigger than one, more and more people are infected, and the fire grows larger. But as the epidemic grows, more people are infected and then recover with immunity, which means the fraction of susceptible people shrinks.
In turn, as this fraction shrinks, the reproduction number gets smaller and smaller, eventually dropping below one. At this point the fire has reached its peak heat; it then begins to die out and eventually disappears because there are no longer enough susceptible people to keep it going, either because people have recovered and become immune or because they have died. This is the story of the Black Death, which killed somewhere between 75 and 200 million people in the mid-14th century. Once started, it burned until the hosts were buried in the ground.
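The “fire” can be simulated with a textbook SIR (susceptible, infected, recovered) model. This is a minimal discrete-time sketch, stepping one day at a time with no interventions; it is a standard technique, not the model behind the analysis in this post, and the population size and 14-day infectious period are assumptions for illustration. It tracks the effective reproduction number, R0 times the susceptible fraction, to show that the number of infected people peaks right as that number crosses one.

```python
def simulate_sir(r0, population, initial_infected=1, duration_days=14, days=365):
    """Minimal discrete-time SIR model with daily steps and no interventions.

    Each day, every infected person causes (r0 / duration_days) * (S / N)
    new infections, and 1 / duration_days of the infected recover.
    Returns a list of (day, susceptible, infected, effective_r) tuples.
    """
    gamma = 1 / duration_days   # daily recovery rate
    beta = r0 * gamma           # daily transmission rate, so beta / gamma == r0
    s = population - initial_infected
    i = float(initial_infected)
    history = []
    for day in range(days):
        effective_r = r0 * s / population   # R shrinks as susceptibles are used up
        history.append((day, s, i, effective_r))
        new_infections = beta * i * s / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
    return history


history = simulate_sir(r0=2.65, population=1_000_000)
peak_day, _, peak_infected, r_at_peak = max(history, key=lambda row: row[2])
print(peak_day, round(peak_infected), round(r_at_peak, 2))
```

The printed effective R at the peak sits just below one: once the fuel of susceptible people thins out enough, each infected person no longer replaces themselves, and the fire starts to die down.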
The Role of Density
But what role does density play in these fires? The intuition that population density increases the propensity of an epidemic to spread in cities is correct in the sense that increased density likely leads to an increase in the contact rate of individuals, which makes the reproduction number larger, leading to more infections in dense areas. If you live in an apartment building, commute in a subway, work in a skyscraper and take your lunches in crowded diners, you are more likely to interact with someone who passes the infection along to you.
And you’re likely to spread the infection more widely than people living in less dense areas. Thus, the reproduction number should be bigger in large, dense cities and lead to bigger fires. But how much bigger, and how much of an effect will it have? We aim to answer this question below for the current stage of the epidemic in the U.S.
The Missing Role of Time
If we think about an epidemic spreading across a spectrum of cities, the reproduction number misses a portion of the story. An infectious disease doesn’t enter all cities at the same time. In the U.S., we heard notable stories of outbreaks on the West and East Coasts early in the epidemic. Only more recently have we been hearing about New Orleans, Detroit, and other urban areas in North America. The reproduction number tells us little about how the epidemic first appears and spreads through these cities and across the globe.
To think through this spread, imagine each city or town as a small forest. Once a spark starts a fire anywhere in the city, the rest of the forest will burn until the fuel is spent. Because dense cities have higher contact rates, they will burn a little faster and a little longer, but what is currently happening in New York City and New Orleans will be replicated in cities across the country as soon as a spark arrives. As the epidemic fire burns, it spreads to nearby areas: the outbreak in New Rochelle, NY, led to New York City and nearby places in New Jersey and Connecticut. And as people travel between more distant parts of the country, they create new sparks in new locations, and new fires start.
A traveler from elsewhere arrives in New Orleans for Mardi Gras, or an otherwise anonymous spring-breaker arrives in Miami, and a new fire begins. Because we are early in the epidemic, we haven’t yet seen how the sparks are distributed across the landscape. But the fires will be bigger in areas where they started earlier.
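A small sketch makes the timing argument concrete. It assumes early-stage exponential growth from a single spark (reasonable while nearly everyone is still susceptible) and compares two hypothetical cities: a dense one whose cases grow faster each day but whose spark arrives later, and a sparser one that is seeded earlier. All growth rates and arrival days are invented for illustration.

```python
import math


def cases_since_spark(daily_growth_rate, days_since_spark):
    """Case count under unchecked exponential growth from one spark.

    Returns 0 if the spark has not arrived yet (a city the epidemic
    has not found).
    """
    if days_since_spark < 0:
        return 0.0
    return math.exp(daily_growth_rate * days_since_spark)


observation_day = 40

# Hypothetical dense city: 20% faster daily growth, but seeded on day 20
dense = cases_since_spark(0.24, observation_day - 20)
# Hypothetical sparser city: slower growth, but seeded on day 10
sparse = cases_since_spark(0.20, observation_day - 10)

print(round(dense), round(sparse))  # 122 403
```

Under these made-up numbers, a ten-day head start outweighs the dense city’s faster growth by more than a factor of three, which is the pattern the fire analogy predicts.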
Contact Rate versus Sparks: Time is of the Essence
As discussed above, the reproduction number is determined by four variables: the contact rate, the transmission rate, the duration of infection, and the fraction of the population that remains susceptible. But for analyzing the coronavirus epidemic in the U.S. during the last month, the transmission rate and the duration of infection are likely constant across the country. As for susceptibility, since we are analyzing things relatively early in the epidemic (at least for the U.S.), it’s safe to assume for now that the susceptible fraction is also constant across the U.S. and likely close to 100%. Even in a city with a large outbreak like New York, the more than 30,000 cases are still far below 1% of the population.
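The New York figure can be checked with a line of arithmetic. The population value below is an approximation we supply (New York City had roughly 8.3 million residents at the time), and confirmed cases count only people who were tested, so this is a rough sketch rather than a precise estimate of the susceptible share.

```python
# Approximate figures: population is our assumption, case count is from the text
nyc_population = 8_300_000
confirmed_cases = 30_000

share_infected = confirmed_cases / nyc_population
print(f"{share_infected:.2%} infected, {1 - share_infected:.2%} still susceptible")
# 0.36% infected, 99.64% still susceptible
```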
But the contact rate across the country is likely affected by factors such as population density or economic density (like regional gross domestic product). At the same time, we can gauge the number of sparks that could start a fire by, for example, the presence of a major international airport. Finally, we can investigate timing by looking at when each fire started, measured by the number of cases in each county in early March, a little over a month after the first confirmed case in the U.S.
The Density Effect
To this end, we have performed a statistical (regression) analysis of daily county-level COVID-19 case data as of March 27, 2020 (day 66 of the epidemic in the U.S.), along with several potential explanatory variables, to see how density, airports, and timing help determine the current spread of cases across the United States. (Technical details are here.) The results show that population density does matter, but its effect is not as large as the popular media would have you believe. On average, a 20% increase in county-level population density increases the number of cases by about 11-12% (conditional on the county having at least one case of the virus). In other words, because cases grow less than proportionally with density, denser counties are likely to have fewer cases per capita than sparser counties of similar area.
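The density result can be restated as an elasticity, the percentage change in cases per percentage change in density. This sketch assumes the effect is a constant elasticity (as it would be in a log-log regression); that functional form is our assumption, not a description of the technical details linked above, and we use 11.5% as the midpoint of the reported 11-12%.

```python
import math

# Reported effect: +20% density -> roughly +11-12% cases (midpoint used)
density_ratio = 1.20
case_ratio = 1.115

# Assumption: constant elasticity, i.e. cases scale like density ** elasticity
elasticity = math.log(case_ratio) / math.log(density_ratio)
print(round(elasticity, 2))  # 0.6

# Per-capita comparison for two counties with the same land area:
# population rises 20% with density, but cases rise only 11.5%
per_capita_ratio = case_ratio / density_ratio
print(round(per_capita_ratio, 3))  # 0.929
```

An elasticity below one is what drives the per-capita conclusion: in this sketch, the denser county ends up with roughly 7% fewer cases per resident.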
We find, however, that what matters more than population density is the presence of a very large airport. A county with an airport that serves at least one million passengers is likely to have nearly twice as many cases as counties with no airport or a smaller one.
Further, counties that had early cases of COVID-19 have much larger case counts today. Our findings suggest that if a county had at least one case on March 1 (day 40), then by March 27 it is likely to have nearly double the cases of counties without an early case. The timing of early case arrivals matters much more than population density. This suggests that, at this early stage of the epidemic, the current distribution of cases in the U.S. is determined more by the arrival of early cases in a county and less by the density of the county itself. There are no low-density, low-risk counties, just counties that haven’t been found by the pandemic yet.
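One way to compare the density and timing effects on the same scale is to ask how much denser a county would have to be to match the roughly twofold effect of an early case (the airport effect is of similar size). This sketch reuses the constant-elasticity assumption from the density result, rounds “nearly double” up to exactly 2, and is our illustration rather than a calculation from the regression itself.

```python
import math

# Elasticity implied by "+20% density -> about +11.5% cases" (assumed constant)
elasticity = math.log(1.115) / math.log(1.20)

# "Nearly double" the cases for counties with a case by March 1 (rounded to 2)
early_case_multiplier = 2.0

# Density ratio that alone would give the same multiplier:
# density_ratio ** elasticity == early_case_multiplier
equivalent_density_ratio = early_case_multiplier ** (1 / elasticity)
print(round(equivalent_density_ratio, 1))  # 3.2
```

Under these assumptions, a county would need to be roughly three times as dense to match the effect of simply having been reached by the virus early.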
Around the World
It’s also worth stressing that when we look at the epicenters around the world, we see that they are frequently not located in the largest cities of their respective countries. In Italy, the epicenter was in northern Italy around Milan, not in Rome, which has twice as many people. In China, the epicenter was in Wuhan, China’s 10th largest city. If population density were really the reason for the epidemic, the COVID-19 epicenters would have been in Rome and Shanghai, China’s largest city. Timing and bad luck are simply more important drivers.
The Future Course of the Virus
The analysis of density discussed here is based on the premise that the virus is free to run its course without any interventions. This is a reasonable assumption for the early stages in the U.S., before governments were able to mobilize resources and enact isolation measures. Now that states around the country have begun to fight back, there is evidence that these measures are slowing its spread. This raises the question: which policies best lower the reproduction number? Do self-isolation, social distancing, or testing provide better means for fighting back? Are cities better places to be if you get infected? These are questions we’ll return to in the future, when the virus has finally been vanquished.