Predicting the future of COVID-19

For the past few months, I've been watching the COVID-19 Top 10 countries list. I'm sure some people have grown tired of the topic at this point, but I've been tracking India with great interest as they cracked the Top 10 and roared their way to a solid #3 spot. There has been ZERO coverage of India in mainstream media outlets despite the major influx of cases and the fact that they are really just getting started with COVID-19. What happens in India will impact the rest of the world, including the U.S. (e.g. call centers), so the lack of news is rather disappointing.

This post isn't about lamenting the lack of good global news coverage but rather my attempt to write some quick-n-dirty software to crunch some numbers in an effort to predict the future. We know the future is always in flux and so any attempt to predict it will be wrong in some way or other. The first step is to find a good dataset. I went with the Our World in Data - Coronavirus Source Data dataset since it was the first complete global dataset in a clean data format that I ran across. Most datasets seem to focus on individual states in the U.S., which doesn't really help.

Once I had a good enough dataset in hand, it was time to figure out what I wanted to build. The most important thing I want to know is approximately when India will take the #2 spot. That meant busting out linear regressions from a college statistics class that I never thought I would ever use in real life. Basically, the goal of using a linear regression here is to know the general direction a specific country is taking relative to its existing, known data. For the most part, the direction is a linear increase over a 30-day period. We know COVID-19 has an incubation period of up to about 14 days plus a couple of weeks of symptoms/illness, so any policy changes in a country will take approximately 20-ish days to start playing out in the numbers. As a result, I can be fairly confident that a linear regression will fit the last 30 days of data pretty well. The regression can then be used to project out another ~30 days to see what the ranking differences will look like on a global stage.
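The original script was written in PHP and isn't reproduced here, so what follows is only a rough Python sketch of the same idea: fit y = a + bx to the last 30 days of a cumulative series and extend the line 30 days out. The function names `fit_line` and `project` are made up for illustration, and using Unix timestamps in seconds for x is an assumption inferred from the per-second slopes reported in the results.

```python
from datetime import datetime, timedelta, timezone


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: change in y per unit of x
    a = mean_y - b * mean_x  # intercept
    return a, b


def project(series, window_days=30, ahead_days=30):
    """Fit the last `window_days` points and extrapolate `ahead_days` ahead.

    `series` is a list of (timezone-aware datetime, cumulative total) pairs,
    one per day, sorted by date. Returns (projected total, slope per second).
    """
    recent = series[-window_days:]
    xs = [day.timestamp() for day, _ in recent]  # assumption: x is in seconds
    ys = [total for _, total in recent]
    a, b = fit_line(xs, ys)
    target = recent[-1][0] + timedelta(days=ahead_days)
    return a + b * target.timestamp(), b


if __name__ == "__main__":
    # Synthetic series (not real data): 100 new cases every day for 30 days.
    start = datetime(2020, 8, 5, tzinfo=timezone.utc)
    demo = [(start + timedelta(days=i), 1000 + 100 * i) for i in range(30)]
    total, slope = project(demo)
    print(f"projected total: {total:,.0f}, slope: {slope:.6f} per second")
```

Fitting against seconds rather than day numbers only changes the units of the slope, not the projection itself.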

You probably just want the results, so here they are:

Top 25, Total Cases, Sep 3
--------------------------
1.  6,114,406 - United States
2.  3,997,865 - Brazil
3.  3,853,406 - India
4.  1,005,000 - Russia
5.  663,437 - Peru
6.  633,339 - Colombia
7.  630,595 - South Africa
8.  610,957 - Mexico
9.  479,554 - Spain
10.  428,226 - Argentina
11.  414,739 - Chile
12.  378,752 - Iran
13.  338,676 - United Kingdom
14.  317,528 - Bangladesh
15.  317,486 - Saudi Arabia
16.  297,014 - Pakistan
17.  293,024 - France
18.  273,301 - Turkey
19.  271,515 - Italy
20.  246,116 - Germany
21.  242,284 - Iraq
22.  226,440 - Philippines
23.  180,646 - Indonesia
24.  132,354 - Kazakhstan
25.  129,923 - Canada

Top 25, Total Cases, Oct 3
--------------------------
1.  7,539,634 - United States (r = 0.9982)
2.  5,769,985 - India (r = 0.9990)
3.  5,217,875 - Brazil (r = 0.9986)
4.  1,153,389 - Russia (r = 0.9999)
5.  957,563 - Colombia (r = 0.9990)
6.  907,222 - Peru (r = 0.9994)
7.  779,774 - Mexico (r = 0.9992)
8.  749,184 - South Africa (r = 0.9788)
9.  650,726 - Spain (r = 0.9858)
10.  636,828 - Argentina (r = 0.9967)
11.  468,362 - Chile (r = 0.9996)
12.  446,851 - Iran (r = 0.9990)
13.  396,207 - Bangladesh (r = 0.9990)
14.  381,056 - France (r = 0.9787)
15.  369,842 - United Kingdom (r = 0.9979)
16.  357,627 - Saudi Arabia (r = 0.9973)
17.  355,322 - Iraq (r = 0.9994)
18.  351,935 - Philippines (r = 0.9995)
19.  314,953 - Pakistan (r = 0.9936)
20.  311,000 - Turkey (r = 0.9988)
21.  291,373 - Italy (r = 0.9707)
22.  281,314 - Germany (r = 0.9983)
23.  242,447 - Indonesia (r = 0.9974)
24.  175,870 - Ukraine (r = 0.9962)
25.  164,771 - Kazakhstan (r = 0.9654)

Top 25, Total Deaths, Sep 3
---------------------------
1.  185,744 - United States
2.  123,780 - Brazil
3.  67,376 - India
4.  65,816 - Mexico
5.  41,514 - United Kingdom
6.  35,497 - Italy
7.  30,686 - France
8.  29,259 - Peru
9.  29,194 - Spain
10.  21,797 - Iran
11.  20,348 - Colombia
12.  17,414 - Russia
13.  14,389 - South Africa
14.  11,344 - Chile
15.  9,898 - Belgium
16.  9,321 - Germany
17.  9,135 - Canada
18.  8,971 - Argentina
19.  7,616 - Indonesia
20.  7,201 - Iraq
21.  6,619 - Ecuador
22.  6,462 - Turkey
23.  6,328 - Pakistan
24.  6,226 - Netherlands
25.  5,820 - Sweden

Top 25, Total Deaths, Oct 3
---------------------------
1.  216,061 - United States (r = 0.9979)
2.  152,779 - Brazil (r = 0.9989)
3.  95,689 - India (r = 0.9999)
4.  83,233 - Mexico (r = 0.9971)
5.  41,834 - United Kingdom (r = 0.9947)
6.  41,138 - Peru (r = 0.9446)
7.  35,927 - Italy (r = 0.9477)
8.  31,043 - France (r = 0.9932)
9.  29,879 - Spain (r = 0.9794)
10.  29,672 - Colombia (r = 0.9999)
11.  26,386 - Iran (r = 0.9960)
12.  20,539 - Russia (r = 0.9994)
13.  20,386 - South Africa (r = 0.9893)
14.  14,168 - Argentina (r = 0.9962)
15.  13,072 - Chile (r = 0.9992)
16.  10,121 - Belgium (r = 0.9798)
17.  9,756 - Indonesia (r = 0.9953)
18.  9,471 - Germany (r = 0.9944)
19.  9,452 - Iraq (r = 0.9996)
20.  9,331 - Canada (r = 0.9988)
21.  7,456 - Ecuador (r = 0.9913)
22.  7,080 - Bolivia (r = 0.9991)
23.  7,034 - Turkey (r = 0.9877)
24.  6,651 - Pakistan (r = 0.9891)
25.  6,314 - Netherlands (r = 0.9813)

Top 25, Largest Cases Slope (y = a + bx), Cases/sec
---------------------------------------------------
1.  0.7638 - India (3,853,406; r = 0.9990)
2.  0.5349 - United States (6,114,406; r = 0.9982)
3.  0.4675 - Brazil (3,997,865; r = 0.9986)
4.  0.1214 - Colombia (633,339; r = 0.9990)
5.  0.0919 - Peru (663,437; r = 0.9994)
6.  0.0852 - Argentina (428,226; r = 0.9967)
7.  0.0703 - Spain (479,554; r = 0.9858)
8.  0.0641 - Mexico (610,957; r = 0.9992)
9.  0.0570 - Russia (1,005,000; r = 0.9999)
10.  0.0471 - Philippines (226,440; r = 0.9995)
11.  0.0438 - Iraq (242,284; r = 0.9994)
12.  0.0417 - South Africa (630,595; r = 0.9788)
13.  0.0388 - France (293,024; r = 0.9787)
14.  0.0296 - Bangladesh (317,528; r = 0.9990)
15.  0.0257 - Iran (378,752; r = 0.9990)
16.  0.0252 - Indonesia (180,646; r = 0.9974)
17.  0.0205 - Chile (414,739; r = 0.9996)
18.  0.0203 - Ukraine (125,789; r = 0.9962)
19.  0.0173 - Israel (122,539; r = 0.9980)
20.  0.0153 - Morocco (65,453; r = 0.9991)
21.  0.0151 - Turkey (273,301; r = 0.9988)
22.  0.0147 - Saudi Arabia (317,486; r = 0.9973)
23.  0.0146 - Ethiopia (54,409; r = 0.9906)
24.  0.0144 - Bolivia (117,928; r = 0.9952)
25.  0.0138 - Romania (89,891; r = 0.9996)

Top 25, Largest Deaths Slope (y = a + bx), Deaths/sec
-----------------------------------------------------
1.  0.01141 - United States (185,744; r = 0.9979)
2.  0.01105 - Brazil (123,780; r = 0.9989)
3.  0.01100 - India (67,376; r = 0.9999)
4.  0.00660 - Mexico (65,816; r = 0.9971)
5.  0.00406 - Peru (29,259; r = 0.9446)
6.  0.00360 - Colombia (20,348; r = 0.9999)
7.  0.00216 - South Africa (14,389; r = 0.9893)
8.  0.00206 - Argentina (8,971; r = 0.9962)
9.  0.00170 - Iran (21,797; r = 0.9960)
10.  0.00120 - Russia (17,414; r = 0.9994)
11.  0.00087 - Iraq (7,201; r = 0.9996)
12.  0.00087 - Indonesia (7,616; r = 0.9953)
13.  0.00074 - Bolivia (5,203; r = 0.9991)
14.  0.00065 - Chile (11,344; r = 0.9992)
15.  0.00063 - Philippines (3,623; r = 0.9914)
16.  0.00049 - Romania (3,721; r = 0.9998)
17.  0.00046 - Bangladesh (4,351; r = 0.9991)
18.  0.00039 - Saudi Arabia (3,956; r = 0.9995)
19.  0.00034 - Ukraine (2,656; r = 0.9946)
20.  0.00033 - Ecuador (6,619; r = 0.9913)
21.  0.00032 - Morocco (1,216; r = 0.9907)
22.  0.00031 - Kazakhstan (1,889; r = 0.9543)
23.  0.00031 - Guatemala (2,790; r = 0.9968)
24.  0.00028 - Spain (29,194; r = 0.9794)
25.  0.00026 - Turkey (6,462; r = 0.9877)

I threw together the PHP code that generated that output in about two hours. I plan on updating this post about once a month until I'm not interested anymore. The last two sections are pretty telling about growth rates and give some indication of what a future Top 25 might be if nothing changes in those countries. The number of interest in those sections is the slope (b), which gives cases and deaths per second respectively. A rising case count tends to lead to more deaths, but it could also just reflect better testing. Deaths/sec, while pretty morbid to look at, is a little more definitive about which nations are at risk of seeing runaway numbers in the coming months.
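Per-second slopes are hard to picture, so here is a small Python sketch that converts a few of them to per-day figures. The three slope values are copied from the "Largest Cases Slope" table above; the helper name `per_day` is just for illustration.

```python
SECONDS_PER_DAY = 86_400


def per_day(slope_per_sec):
    """Convert a regression slope from units/second to units/day."""
    return slope_per_sec * SECONDS_PER_DAY


# Cases/sec slopes taken from the "Largest Cases Slope" table above.
cases_slopes = {
    "India": 0.7638,
    "United States": 0.5349,
    "Brazil": 0.4675,
}

# Print the countries from the steepest slope to the shallowest.
for country, slope in sorted(cases_slopes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: about {per_day(slope):,.0f} cases/day")
```

At 0.7638 cases/sec, India's fitted line is climbing by roughly 66,000 cases per day.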

Deaths/sec is where we see India clearly having serious problems, only slightly trailing the U.S. and Brazil. Keep in mind that India's per-million-population numbers are approximately 1/10 of those of the rest of the Top 5 at the time of this writing (Aug 15, 2020). In short, their Deaths/sec count (0.00921 on Aug 15, 2020) could easily become 10 times greater than it currently is...something they are very much on track to do. And somehow this isn't making news.

A note on 'r'. If you aren't familiar with statistics, 'r' is a measure of how well the associated line fits the data - aka the "correlation coefficient." An 'r' of 0 indicates the data does not fit a line at all while an 'r' of 1 is a perfect fit. Anything above 0.9 is generally considered a very good fit. Most of the data points above show r-values of 0.97 to 0.99, which suggests that my earlier guess that 30 days of prior data would fit a line pretty well was a fairly valid one.
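For the curious, the usual textbook way to compute 'r' (the Pearson correlation coefficient) looks something like the Python sketch below. It is not the code behind the tables above, and the function name `pearson_r` is just for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    days = [0, 1, 2, 3]
    print(pearson_r(days, [1, 3, 5, 7]))  # points that lie exactly on a line
    print(pearson_r(days, [1, 3, 2, 4]))  # points that wobble around a line
```

In general 'r' ranges from -1 to 1, with negative values indicating a falling line. Cumulative case and death totals never decrease, so only the 0 to 1 range shows up here.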

When a country has a poor reporting track record, the r-value can be lower but can still be a useful data point. For example, Kyrgyzstan isn't very good at reporting accurate results in a timely fashion and so its r-value tends to hang around 0.78. The associated line is not as good a predictor but it works well enough since the general direction is "up by some amount" and I'm only predicting one month into the future where any variance is going to be quite small given what we currently know about COVID-19. Predicting anything more than one month is going to have a much higher variance than I'm comfortable with.

At this point though, we can see that India will only close the gap to Brazil by ~200,000 cases per month. So maybe sometime in November 2020, India will take that not-coveted #2 spot from Brazil and then we'll see at that point if India can catch up to the U.S. for that #1 spot. I'll probably re-run my script on Sept 1 to see what Oct 1 looks like.
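The back-of-the-envelope math behind an estimate like that is just the gap divided by the rate at which it closes. Here is a minimal Python sketch; the function name `days_to_overtake` is hypothetical, and the 600,000-case gap in the example is a placeholder rather than a figure from the dataset. Only the ~200,000 cases/month closing rate comes from the paragraph above.

```python
def days_to_overtake(gap, closing_rate_per_day):
    """Days until a trailing country catches up, assuming both trends stay linear.

    Returns None if the gap is not shrinking at all.
    """
    if closing_rate_per_day <= 0:
        return None
    return gap / closing_rate_per_day


if __name__ == "__main__":
    closing_rate = 200_000 / 30  # ~200,000 cases per month, from the text above
    placeholder_gap = 600_000    # hypothetical gap, NOT taken from the dataset
    print(days_to_overtake(placeholder_gap, closing_rate))
```

The estimate is only as good as the assumption that both countries keep following their current lines.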
