We are in the middle of an information overload about the novel coronavirus, with hourly case updates and endless streams of information. How do you find the signal in the noise? What information do you need to make decisions in the face of this pandemic?
Having spent more than a decade working on data visualization in public health, I’ve personally seen how visualizations can be powerful tools for communicating information, but can also mislead, misinform, and—in the worst cases—incite panic. In this time of crisis, I’ve seen excellent graphs that clarify key concepts (like this animation of why flattening the curve matters by Harry Stevens) and graphs so complex, cluttered, or inaccurate that at best they confuse and at worst, mislead.
As we watch COVID-19 continue to spread, visuals should help people understand the severity of this pandemic, and point to actions that we can take to flatten the curve. To do this, we also need to become more conversant in the language of data visualization.
As you sift through the news and your Twitter feed, here are eight considerations to help you find relevant information and make complex epidemiology more accessible.
1. Case data is deceptively complex
The most commonly visualized data set is the number of confirmed cases and known outcomes (cases that have resolved or resulted in death). These numbers are reported to public health agencies and the World Health Organization. We use this data to tally the number of confirmed cases, recovered cases, and deaths. (You can learn more about the different kinds of cases, including COVID-19 definitions, from the CDC and WHO.)
Over the last few months we have seen significant fluctuations in case counts due to changes in what is being reported as a confirmed case. For example, in the earlier days of the outbreak in China, only laboratory-confirmed cases of COVID-19 were reported by China to the WHO. Adding clinically diagnosed cases, where a doctor has assessed a patient and classified their illness as COVID-19, changed what was included in the confirmed cases number. As a result, we see a sharp spike in the number of cases that may be partially due to the new definition.
When charts report on trends, look for annotations that clarify where changes in methodology explain odd spikes in the chart, as seen in this example from Datawrapper.
How cases are quantified impacts the summary numbers you see. Most visualizations report on summary case numbers in line with the CDC’s methodology, including both confirmed and presumptive positive cases, which have been confirmed in a private or state laboratory but need to be validated at a CDC site. The widely used data set underlying the Johns Hopkins COVID-19 Operations Dashboard uses this methodology.
When you’re reading footnotes describing which cases are included, look for presumptive cases—that means a higher count but likely one that better represents the actual impact of the disease.
Some state reports also disaggregate the confirmed and presumptive cases, rather than reporting an aggregate figure. What is important is that on a given chart, the definition of what is used is consistent.
2. The volume of data may feel staggering but what we know for certain is limited
Given the volume of data being publicly shared, it would be easy to assume we have more certainty in how the disease will progress and its long-term impacts. But in the early stages of an epidemic, the fact that our numbers, rates, and knowledge of the disease will change daily (if not hourly) is the only thing we know for certain.
We currently have data on reported cases and deaths, analysis of the highest risk populations, and a global community of scientists researching SARS-CoV-2 (the technical name for the novel coronavirus) and COVID-19. Researchers and clinicians are also documenting the impact of the virus on the human body. The volume of knowledge grows daily thanks to this global mobilization of brilliant minds.
That said, many cases have likely gone undetected thanks to limited availability of testing in many countries and the long incubation period during which a person can be contagious but asymptomatic.
These undetected cases result in a lag in reporting, during which people are infected with the virus but have not presented with symptoms or been tested. On visualizations, look for annotations and reference bands noting this uncertainty, as in this chart from the CDC.
3. COVID-19 will impact countries, cities, and demographic groups differently
In a global pandemic, demographic profiles and health systems can dramatically influence a disease’s spread and human impact. Government measures to combat the spread of infection (such as closing government offices and schools or restricting large gatherings) and individual actions can increase social distance and slow the spread of contagion.
We should learn from the experiences of countries that are at different stages of combating this virus, but numbers like fatality rates estimated for one country should not be extrapolated or assumed to be generalizable to a global population.
Look for annotations that specify who is represented in any visualizations of cases or case fatality rates and when different actions were taken that could influence the spread, like this graph from the Financial Times (image from March 13, 2020, updated on site daily).
Demographics can significantly shape the impact of the disease. For example, the population of Italy skews older than that of China or the U.S. Because elderly populations are higher risk and more likely to require hospital care, the percentage of cases requiring hospitalization may be higher in Italy than in countries with a younger population. (More on the ways demographics are influencing outcomes in Italy.)
Because of these differences, be mindful of the conclusions you draw from COVID-19 visualizations that compare the disease burden between countries with very different population sizes, political environments, and health systems.
4. Not all maps are created equal
Case numbers lend themselves well to maps as they are tied to specific locations, making them perfect for answering the question, “Are there cases near me?”
If you’re looking at a map, symbols (like bubbles) emphasize the number of cases rather than the size of the geographic territory, which is a good thing! For example, if you looked at a choropleth (filled) map, one case in Russia would look far more alarming than one case in Sweden, thanks to the sheer size of the area that seems affected. The same applies for comparing one case in Texas to one in Vermont.
Where we have more granular data, mapping to lower levels of geography provides more relevant information that better communicates the local risk of infection. For other mapping best practices and visual tricks to watch for, see Kenneth Field’s detailed recommendations for mapping COVID-19.
5. Be aware that small design choices can impact how you interpret a visualization
The design choices made by a visualization developer—color, text, chart selection, scale—all impact how we understand the data. Datawrapper has an excellent set of responsible visualizations of COVID-19 with notes on why they made specific choices.
For example, COVID-19 is not a death sentence, and our visualizations need to reflect that. Including “recovered cases” is an essential piece of context and is a headline worth celebrating.
The color red can invoke fear and a sense of warning and can be particularly alarming on a map. Using other colors, like blue, can minimize this startling effect of a map covered in countries filled with red.
Pay careful attention to the scale of the axes. For data on daily cases or cumulative counts be wary of bar charts that have inconsistent periods between the bars.
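If you have the dates behind a bar chart, one quick way to check for inconsistent periods is to compare the gaps between consecutive bars. The sketch below is a minimal illustration; the function name and the sample dates are hypothetical and not taken from any real chart.

```python
from datetime import date


def has_uneven_intervals(dates):
    """Return True if the gaps between consecutive dates are not all equal."""
    ordered = sorted(dates)
    # Collect the distinct gap lengths (in days) between neighboring dates.
    gaps = {(later - earlier).days for earlier, later in zip(ordered, ordered[1:])}
    return len(gaps) > 1


# Hypothetical bar labels: three daily bars followed by a five-day jump.
bar_dates = [date(2020, 3, 1), date(2020, 3, 2), date(2020, 3, 3), date(2020, 3, 8)]
print(has_uneven_intervals(bar_dates))  # True
```

A chart built on dates like these would place a five-day change next to one-day changes, which can make the most recent bar look like a sudden jump.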
Look for details in the visualization or article that provide clear indicator names, definitions, and data sources (see WHO definitions for COVID-19 cases, an explainer on R0, and the CDC Glossary). You may find this information in footnotes, titles, subtitles, annotations, or explainer text, which should clearly state what is represented.
While designers often champion simplicity, elegance, and decluttering in charts and graphs, communicating information in an emerging pandemic is a unique moment in time where the individual and policy decisions made on a daily basis can impact the scale of the human toll of the disease. As a society, we’re also inundated with charts, maps, and infographics about COVID-19 that communicate different numbers and messages.
Take the time to read the supporting information provided with a chart, whether it’s an article or an annotation. Those details should provide the necessary context to understand not only what is happening with COVID-19, but why the numbers are trending a particular way.
6. Calculating rates early in an epidemic is complex
Early in an outbreak of a new disease, epidemiologists estimate a case fatality rate (CFR). The metric is calculated by dividing the number of deaths related to a disease by the total number of cases in a given period and multiplying by 100 to create a percentage. Rates often help us compare different geographies and groups and are critical to understand early in a pandemic, but are more complex to estimate than the simple math may let on.
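The arithmetic itself can be sketched in a few lines. The function name and the counts below are hypothetical, chosen only to show how sensitive the result is to the number of cases in the denominator; they are not real surveillance figures.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return deaths divided by cases, expressed as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be a positive number")
    return 100 * deaths / cases


# Hypothetical counts for illustration only.
reported = case_fatality_rate(deaths=30, cases=1000)

# Same deaths, but assuming half of all cases were never detected.
with_undetected = case_fatality_rate(deaths=30, cases=2000)

print(f"{reported:.1f}% vs. {with_undetected:.1f}%")
```

With the same number of deaths, doubling the assumed case count halves the rate, which is one reason early estimates are usually reported as ranges.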
Early in the COVID-19 pandemic, WHO and other agencies have made CFR estimations, frequently reported as a range. These numbers can be helpful for understanding the severity of the epidemic, but should be approached with caution when you see them plotted as comparisons between countries and populations.
The challenge is that these calculations are complex due to the uncertainty of the estimates, both in the number of cases identified (the denominator) and the number of deaths due to the disease (the numerator), which lags behind case detection. If you want to see how an epidemiologist approaches this exercise, see this article (notably a pre-print, not peer reviewed).
When you see case fatality rates plotted, remember that there is still uncertainty around these figures. Look for estimated ranges for rates, rather than focusing on exact figures, and don’t be surprised if they change over time.
7. Comparing COVID-19 to other diseases is challenging
With all of this uncertainty and estimation, making comparisons between COVID-19 and other, more established, diseases is hard to represent accurately on the basic graphs that lend themselves to sharing on social media.
Early on, a comparison was frequently made on what was more alarming: COVID-19 or the flu. Now, we see media outlets trying to make comparisons to other diseases. For example, one chart about COVID-19 summed the total deaths to date and divided it by the known days in the epidemic to create a special disease deaths per day aggregation. Then, that number was compared with values for other diseases.
At best, this is an inaccurate comparison due to major differences in our knowledge of and resources for testing and treating COVID-19 compared to other diseases. At worst, it significantly understates the seriousness of COVID-19 and causes people to ignore the advice of public health professionals on social distancing and other individual actions that can slow the spread of the virus. (Thankfully, the design team behind this infographic have added some annotations since its original publication.)
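To see why a deaths-per-day average can understate a growing epidemic, consider the rough sketch below. The daily counts are invented for illustration (a series that doubles every two days) and the function name is hypothetical; the point is only that an average over the whole period sits well below the most recent day when counts are rising.

```python
# Hypothetical daily death counts that double every two days; not real data.
daily_deaths = [1, 1, 2, 2, 4, 4, 8, 8, 16, 16]


def average_deaths_per_day(daily):
    """Total deaths to date divided by the number of days observed."""
    return sum(daily) / len(daily)


average = average_deaths_per_day(daily_deaths)
latest = daily_deaths[-1]

print(f"average per day: {average}, most recent day: {latest}")
```

In this made-up series the average is less than half of the latest daily count, so comparing it with the steady daily toll of an established disease would hide the direction the numbers are heading.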
8. Validate which organization designed the visualization and where they sourced the data
Teams are making ready-to-use COVID-19 data sets easily accessible for the wider community. Johns Hopkins posts frequently updated data on their GitHub page, and Tableau, a visual analytics software company, has created a COVID-19 Resource Hub with the same data reshaped for use in Tableau dashboards.
These public assets are immensely helpful for public health professionals and authorities responding to the epidemic. They make data from multiple sources accessible, which can enable quick analysis.
But as a result of this accessibility, you’ll also see plenty of charts pop up from armchair epidemiologists. Some will be well-designed, clear, and compelling, while others may confuse or mislead.
Always check the source before you hit retweet or otherwise amplify a message. Read the accompanying text, which may detail limitations and assumptions around what is represented in the chart. Look to reputable sources like the WHO or CDC dedicated COVID-19 response pages for the most up-to-date information. Their visualizations may not be the most visually stunning, but they will be accurate. Or look to news outlets with comprehensive reporting grounded in science like The New York Times, whose graphics team has done incredible work making complex epidemiology accessible, and Reuters, whose article and graphics explaining contact tracing and the story of Patient 31 in South Korea visualize the ripple effects of one person’s choices.
As you read charts about COVID-19, remember that behind every data point is a person.
In the face of this global pandemic, our best response is individual actions that will collectively have an impact. Self-quarantine where appropriate to create social distance. Ensure we’re not stigmatizing people who are from countries and regions that have had a lot of cases. Understand what steps you can take to flatten the curve and slow the spread of the virus.
The goal of data visualization is to make it easier to understand complex information. Hopefully the guidance above will aid in your ability to understand COVID-19 visualizations more readily, sort fact from fiction, avoid alarmist messaging, and—above all else—stay accurately informed in this age of information overload.
Amanda Makulec is the senior data visualization lead at Excella and holds a master of public health from the Boston University School of Public Health. She worked with data in global health programs for eight years before joining Excella, where she leads teams and develops user-centered data visualization products for federal, nonprofit, and private sector clients. Amanda volunteers as the operations director for the Data Visualization Society and is a co-organizer for Data Visualization DC. Find her on Twitter at @abmakulec. And to read about how data visualization practitioners can create responsible visualizations, go here.