Lies, Damn Lies, and Statistics: How the COVID-19 Crisis Highlights Our Misuse of Data

Posted on April 17, 2020 by Jonathan Ettinger

As I was reading the latest statistics regarding the spread of COVID-19, I became frustrated.  My frustration stemmed not just from the fact that we are unprepared despite repeated warnings, but also from the way our elected officials and their teams present (and the media reports) the data.  Having practiced environmental law for over thirty years and observed countless instances of data misuse and misinterpretation, I am not surprised, but I am disappointed.

I am not talking about the inherent unreliability of the data due to selective and inconsistent testing or the fact that we cannot count infected but asymptomatic people.  For a good discussion of that, see Nate Silver’s recent article.  Rather, I am talking about something much simpler: how many people are getting infected and at what ages.  During the early stages of the pandemic, the media were reporting that the virus was unusual because it appeared to afflict not the young or the elderly but the middle-aged.  Then, of course, it became apparent that the elderly were dying at a much higher rate than others (and at a higher rate than those infected with an ordinary flu). 

I then had a discussion with someone who said “Yeah, but it turns out young adults are being infected at a high rate; they are vulnerable, too!”  It was this simple assertion I wished to validate (or invalidate).

But that was not easy.  Nearly every article on the topic (and most government updates, too) focused on percentages – but the wrong percentages.  It is easy to find statements like the following: “A USA TODAY analysis of data reported by 19 states shows that Americans of all ages seem to be equally susceptible to a coronavirus infection. States are reporting cases in every age range, though people in their 50s have slightly more confirmed cases on average.”  Here is the graph that accompanied it. 

It afflicts everyone roughly equally, right?  Those in their 30s and 40s are as likely to be infected as those in their 70s, right?  WRONG!  These are percentages of total coronavirus cases, not percentages of the population.  There is a fundamental difference between saying that 15% of all people in their 30s are infected and saying that 15% of all infections are among people in their 30s. 

According to the US Census Bureau, in 2016 there were roughly 323 million people in the United States – 43 million (13.3%) in their 30s and 20 million (6.2%) in their 70s.  If those percentages remain valid today, the graph above shows that those in their 70s are more than twice as likely to become infected as those in their 30s.  Regardless of whether that figure is accurate, it certainly means that one cannot say that “Americans of all ages seem to be equally susceptible to a coronavirus infection.”
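To make the arithmetic concrete, here is a minimal Python sketch of the comparison.  The population figures are the Census numbers quoted above.  The case shares are an assumption for illustration only: I have set both groups to a hypothetical 15% of total cases to mirror the "roughly equal" reading of the graph, not to reproduce its actual values.  The variable and function names are likewise my own.

```python
# Population figures quoted in the post (US Census Bureau, 2016), in millions.
TOTAL_POPULATION = 323
POPULATION = {"30s": 43, "70s": 20}

# HYPOTHETICAL case shares: assume each group accounts for the same
# fraction of total confirmed cases. These are not values from the graph.
CASE_SHARE = {"30s": 0.15, "70s": 0.15}


def representation_index(group):
    """Share of cases divided by share of population.

    A value of 1.0 means the group is infected in proportion to its size;
    above 1.0 means it is over-represented among cases.
    """
    population_share = POPULATION[group] / TOTAL_POPULATION
    return CASE_SHARE[group] / population_share


index_30s = representation_index("30s")
index_70s = representation_index("70s")
ratio = index_70s / index_30s

print(f"30s index: {index_30s:.2f}")
print(f"70s index: {index_70s:.2f}")
print(f"70s vs. 30s: {ratio:.2f}x")
```

When the two case shares are equal, they cancel, and the ratio reduces to 43 million divided by 20 million, or about 2.15.  That is where "more than twice as likely" comes from; plugging in the actual shares from any given data set would change the result accordingly.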

How the data are reported makes a big difference.  Let’s get it right.


