Predicting the future of COVID-19

For the past few months, I've been watching the Top 10 countries list for COVID-19, the ever-popular topic of discussion. I'm sure some people have grown tired of it at this point, but I've been tracking India with great interest as they cracked the Top 10 and roared their way to a solid #3 spot. There has been ZERO coverage of India in mainstream media outlets despite the major influx of cases and the fact that they are really just getting started with COVID-19. What happens in India will impact the rest of the world, including the U.S. (e.g. call centers), so the lack of news is rather disappointing.

This post isn't about lamenting the lack of good global news coverage but rather my attempt to write some quick-n-dirty software to crunch some numbers in an effort to predict the future. We know the future is always in flux and so any attempt to predict it will be wrong in some way or other. The first step is to find a good dataset. I went with the Our World in Data - Coronavirus Source Data dataset since it was the first complete global dataset in a clean data format that I ran across. Most datasets seem to focus on individual states in the U.S., which doesn't really help.

Once I had a good enough dataset in hand, it was time to figure out what I wanted to build. The most important thing I want to know is approximately when India will take the #2 spot. That meant busting out linear regressions from a college statistics class that I never thought I would ever use in real life. Basically, the goal of using a linear regression here is to know the general direction a specific country is heading based on its existing, known data. For the most part, the direction is a linear increase over a 30 day period. COVID-19 has an incubation period of up to roughly 14 days plus a couple of weeks of symptoms/illness, so any policy changes in a country will take approximately 20-ish days to start playing out in the numbers. As a result, a linear regression can be expected to fit the last 30 days of data pretty well, and the regression can then be used to project out another ~30 days to see what the ranking differences will look like on a global stage.
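To make the fit-then-project idea concrete, here is a minimal Python sketch. It is not the PHP script that produced the numbers below; the function names and the sample data are made up for illustration, and it assumes the x-axis is a simple day index.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: growth per unit of x
    a = mean_y - b * mean_x  # intercept
    return a, b


def project(a, b, x):
    """Evaluate the fitted line at a future x."""
    return a + b * x


# Hypothetical data: 30 days of cumulative totals growing by 1,000 per day.
days = list(range(30))
totals = [50_000 + 1_000 * d for d in days]

a, b = fit_line(days, totals)
# Day index 59 is 30 days past the last known data point (day 29).
projected = project(a, b, 59)
print(f"slope = {b:.1f} per day, projected total = {projected:,.0f}")
```

Real data is noisier than this, so the projection is only as good as the assumption that the last 30 days keep heading in the same direction.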

You probably just want the results, so here they are:

Top 25, Total Cases, Oct 4
--------------------------
1.  109,557,656 - United States (32.91%)
2.  93,429,375 - Brazil (43.66%)
3.  70,155,781 - India (5.03%)
4.  43,530,000 - Mexico (33.42%)
5.  32,215,468 - Russia (22.08%)
6.  31,159,843 - Peru (93.41%)
7.  22,214,531 - Indonesia (8.04%)
8.  21,459,062 - United Kingdom (31.46%)
9.  20,473,593 - Italy (33.91%)
10.  19,750,156 - Colombia (38.52%)
11.  18,923,281 - Iran (22.26%)
12.  18,374,218 - France (27.20%)
13.  18,007,031 - Argentina (39.48%)
14.  14,655,937 - Germany (17.47%)
15.  13,715,625 - South Africa (22.84%)
16.  13,509,843 - Spain (28.90%)
17.  11,827,343 - Poland (31.29%)
18.  10,067,812 - Turkey (11.84%)
19.  9,455,312 - Ukraine (21.75%)
20.  6,057,500 - Philippines (5.45%)
21.  5,866,250 - Romania (30.67%)
22.  5,858,437 - Chile (30.49%)
23.  5,123,593 - Ecuador (28.64%)
24.  4,762,187 - Czechia (44.40%)
25.  4,718,593 - Hungary (48.98%)

Top 25, Total Cases, Position Changes
-------------------------------------
Oct 26 - 25.  4,761,218 - Malaysia
Oct 27 - 24.  4,786,536 - Malaysia
Oct 27 - 25.  4,780,882 - Czechia
Oct 29 - 10.  19,923,275 - Iran
Oct 29 - 11.  19,903,540 - Colombia

Top 25, Total Cases, Nov 4
--------------------------
1.  119,025,315 - United States (35.75%; r = 0.9706)
2.  96,253,370 - Brazil (44.98%; r = 0.9883)
3.  71,510,453 - India (5.13%; r = 0.9940)
4.  46,111,891 - Mexico (35.40%; r = 0.9862)
5.  36,515,860 - Russia (25.03%; r = 1.0000)
6.  31,264,333 - Peru (93.72%; r = 0.9533)
7.  22,715,690 - Indonesia (8.22%; r = 0.9940)
8.  22,124,562 - United Kingdom (32.44%; r = 0.9900)
9.  20,718,125 - Italy (34.32%; r = 0.9886)
10.  20,153,196 - Iran (23.70%; r = 0.9992)
11.  19,938,697 - Colombia (38.89%; r = 0.9986)
12.  18,651,905 - France (27.61%; r = 0.9896)
13.  18,258,873 - Argentina (40.04%; r = 0.9483)
14.  14,918,954 - Germany (17.78%; r = 0.9667)
15.  14,199,085 - South Africa (23.65%; r = 0.9641)
16.  13,651,855 - Spain (29.20%; r = 0.9436)
17.  11,933,948 - Poland (31.57%; r = 0.9900)
18.  10,931,797 - Turkey (12.85%; r = 0.9886)
19.  10,436,549 - Ukraine (24.01%; r = 0.9984)
20.  7,245,168 - Philippines (6.52%; r = 0.9698)
21.  6,780,766 - Romania (35.45%; r = 0.9987)
22.  5,900,748 - Chile (30.71%; r = 0.9930)
23.  5,150,681 - Ecuador (28.79%; r = 0.8521)
24.  4,989,081 - Malaysia (15.22%; r = 0.9871)
25.  4,787,042 - Czechia (44.64%; r = 0.9280)

Top 25, Total Deaths, Oct 4
---------------------------
1.  701,169 - United States
2.  597,948 - Brazil
3.  448,997 - India
4.  278,592 - Mexico
5.  206,179 - Russia
6.  199,423 - Peru
7.  142,173 - Indonesia
8.  137,338 - United Kingdom
9.  131,031 - Italy
10.  126,401 - Colombia
11.  121,109 - Iran
12.  117,595 - France
13.  115,245 - Argentina
14.  93,798 - Germany
15.  87,780 - South Africa
16.  86,463 - Spain
17.  75,695 - Poland
18.  64,434 - Turkey
19.  60,514 - Ukraine
20.  38,768 - Philippines
21.  37,544 - Romania
22.  37,494 - Chile
23.  32,791 - Ecuador
24.  30,478 - Czechia
25.  30,199 - Hungary

Top 25, Total Deaths, Position Changes
--------------------------------------
Oct 26 - 25.  30,472 - Malaysia
Oct 27 - 24.  30,634 - Malaysia
Oct 27 - 25.  30,598 - Czechia
Oct 29 - 10.  127,509 - Iran
Oct 29 - 11.  127,383 - Colombia

Top 25, Total Deaths, Nov 4
---------------------------
1.  761,762 - United States (r = 0.9706)
2.  616,022 - Brazil (r = 0.9883)
3.  457,667 - India (r = 0.9940)
4.  295,116 - Mexico (r = 0.9862)
5.  233,702 - Russia (r = 1.0000)
6.  200,092 - Peru (r = 0.9533)
7.  145,380 - Indonesia (r = 0.9940)
8.  141,597 - United Kingdom (r = 0.9900)
9.  132,596 - Italy (r = 0.9886)
10.  128,980 - Iran (r = 0.9992)
11.  127,608 - Colombia (r = 0.9986)
12.  119,372 - France (r = 0.9896)
13.  116,857 - Argentina (r = 0.9483)
14.  95,481 - Germany (r = 0.9667)
15.  90,874 - South Africa (r = 0.9641)
16.  87,372 - Spain (r = 0.9436)
17.  76,377 - Poland (r = 0.9900)
18.  69,964 - Turkey (r = 0.9886)
19.  66,794 - Ukraine (r = 0.9984)
20.  46,369 - Philippines (r = 0.9698)
21.  43,397 - Romania (r = 0.9987)
22.  37,765 - Chile (r = 0.9930)
23.  32,964 - Ecuador (r = 0.8521)
24.  31,930 - Malaysia (r = 0.9871)
25.  30,637 - Czechia (r = 0.9280)

Top 25, Percentage Infected, Oct 4
----------------------------------
1.  93.41% - Peru
2.  50.92% - Bosnia and Herzegovina
3.  50.30% - North Macedonia
4.  48.98% - Hungary
5.  48.12% - Montenegro
6.  47.66% - Bulgaria
7.  44.40% - Czechia
8.  43.66% - Brazil
9.  41.81% - San Marino
10.  39.48% - Argentina
11.  38.52% - Colombia
12.  36.25% - Slovakia
13.  35.48% - Georgia
14.  35.06% - Paraguay
15.  34.40% - Belgium
16.  34.38% - Slovenia
17.  33.91% - Italy
18.  33.42% - Mexico
19.  33.22% - Croatia
20.  32.91% - United States
21.  32.62% - Tunisia
22.  31.46% - United Kingdom
23.  31.29% - Poland
24.  30.67% - Romania
25.  30.49% - Chile

Bottom 25, Percentage Infected, Oct 4
-------------------------------------
1.  0.0496% - Vanuatu
2.  0.0502% - China
3.  0.0902% - New Zealand
4.  0.1262% - Niger
5.  0.1359% - Burkina Faso
6.  0.1491% - Burundi
7.  0.1607% - Chad
8.  0.1785% - South Sudan
9.  0.1827% - Tanzania
10.  0.1833% - Democratic Republic of Congo
11.  0.1867% - Eritrea
12.  0.1995% - Benin
13.  0.2003% - Tajikistan
14.  0.2013% - Nigeria
15.  0.2322% - Sierra Leone
16.  0.3176% - Central African Republic
17.  0.3344% - Bhutan
18.  0.3417% - Laos
19.  0.3696% - Cote d'Ivoire
20.  0.4009% - Papua New Guinea
21.  0.4113% - Mali
22.  0.4294% - Togo
23.  0.4387% - Guinea
24.  0.4406% - Hong Kong
25.  0.4756% - Nicaragua

Top 25, Largest Cases Slope (y = a + bx), Cases/sec (1 month Death rate basis)
------------------------------------------------------------------------------
1.  3.3402 - 288,594 per day - United States (109,557,656; r = 0.9706)
2.  1.5567 - 134,498 per day - Russia (32,215,468; r = 1.0000)
3.  1.0058 - 86,897 per day - Brazil (93,429,375; r = 0.9883)
4.  0.9162 - 79,157 per day - Mexico (43,530,000; r = 0.9862)
5.  0.4854 - 41,936 per day - India (70,155,781; r = 0.9940)
6.  0.4435 - 38,320 per day - Iran (18,923,281; r = 0.9992)
7.  0.4223 - 36,484 per day - Philippines (6,057,500; r = 0.9698)
8.  0.3530 - 30,502 per day - Ukraine (9,455,312; r = 0.9984)
9.  0.3293 - 28,454 per day - Romania (5,866,250; r = 0.9987)
10.  0.3145 - 27,176 per day - Turkey (10,067,812; r = 0.9886)
11.  0.2932 - 25,335 per day - Vietnam (3,080,468; r = 0.9978)
12.  0.2930 - 25,318 per day - Malaysia (4,169,218; r = 0.9871)
13.  0.2593 - 22,400 per day - Tanzania (112,343; r = 0.8660)
14.  0.2370 - 20,480 per day - United Kingdom (21,459,062; r = 0.9900)
15.  0.1965 - 16,981 per day - Thailand (2,658,437; r = 0.9967)
16.  0.1793 - 15,491 per day - Indonesia (22,214,531; r = 0.9940)
17.  0.1713 - 14,799 per day - South Africa (13,715,625; r = 0.9641)
18.  0.1225 - 10,586 per day - Bulgaria (3,287,187; r = 0.9872)
19.  0.1035 - 8,940 per day - Guatemala (2,145,312; r = 0.9968)
20.  0.1012 - 8,744 per day - Sri Lanka (2,040,468; r = 0.9987)
21.  0.0988 - 8,538 per day - France (18,374,218; r = 0.9896)
22.  0.0927 - 8,008 per day - Germany (14,655,937; r = 0.9667)
23.  0.0926 - 7,997 per day - Lesotho (101,562; r = 0.8756)
24.  0.0917 - 7,919 per day - Cuba (1,184,375; r = 0.9996)
25.  0.0914 - 7,896 per day - Myanmar (2,794,218; r = 0.9998)

Top 25, Largest Cases Slope (y = a + bx), Cases/sec (current basis)
-------------------------------------------------------------------
1.  1.1698 - 101,075 per day - United States (43,683,048; r = 0.9817)
2.  0.3889 - 33,598 per day - United Kingdom (7,937,810; r = 0.9993)
3.  0.2717 - 23,477 per day - India (33,834,702; r = 0.9992)
4.  0.2714 - 23,445 per day - Turkey (7,208,851; r = 0.9870)
5.  0.2707 - 23,390 per day - Russia (7,474,850; r = 0.9995)
6.  0.2091 - 18,069 per day - Brazil (21,468,121; r = 0.9916)
7.  0.1982 - 17,120 per day - Philippines (2,593,399; r = 0.9937)
8.  0.1492 - 12,894 per day - Iran (5,624,128; r = 0.9992)
9.  0.1346 - 11,632 per day - Malaysia (2,277,565; r = 0.9989)
10.  0.1297 - 11,207 per day - Romania (1,265,827; r = 0.9995)
11.  0.1283 - 11,082 per day - Thailand (1,637,432; r = 0.9995)
12.  0.1279 - 11,052 per day - Ukraine (2,566,875; r = 0.9972)
13.  0.1044 - 9,020 per day - Germany (4,260,494; r = 0.9919)
14.  0.0878 - 7,589 per day - Mexico (3,678,980; r = 0.9824)
15.  0.0872 - 7,536 per day - Vietnam (808,578; r = 0.9928)
16.  0.0840 - 7,259 per day - Serbia (961,006; r = 0.9974)
17.  0.0637 - 5,502 per day - France (7,120,214; r = 0.9962)
18.  0.0607 - 5,246 per day - Tanzania (25,846; r = 0.8660)
19.  0.0591 - 5,110 per day - Cuba (891,447; r = 0.9985)
20.  0.0411 - 3,555 per day - Canada (1,640,565; r = 0.9878)
21.  0.0394 - 3,406 per day - Italy (4,682,034; r = 0.9992)
22.  0.0393 - 3,399 per day - Israel (1,290,129; r = 0.9932)
23.  0.0324 - 2,797 per day - Guatemala (566,250; r = 0.9871)
24.  0.0282 - 2,440 per day - Singapore (103,843; r = 0.9992)
25.  0.0282 - 2,438 per day - Oceania (189,707; r = 0.9988)

Top 25, Largest Deaths Slope (y = a + bx), Deaths/sec
-----------------------------------------------------
1.  0.02138 - 1,847 per day - United States (701,169; r = 0.9706)
2.  0.00996 - 861 per day - Russia (206,179; r = 1.0000)
3.  0.00644 - 556 per day - Brazil (597,948; r = 0.9883)
4.  0.00586 - 507 per day - Mexico (278,592; r = 0.9862)
5.  0.00311 - 268 per day - India (448,997; r = 0.9940)
6.  0.00284 - 245 per day - Iran (121,109; r = 0.9992)
7.  0.00270 - 234 per day - Philippines (38,768; r = 0.9698)
8.  0.00226 - 195 per day - Ukraine (60,514; r = 0.9984)
9.  0.00211 - 182 per day - Romania (37,544; r = 0.9987)
10.  0.00201 - 174 per day - Turkey (64,434; r = 0.9886)
11.  0.00188 - 162 per day - Vietnam (19,715; r = 0.9978)
12.  0.00188 - 162 per day - Malaysia (26,683; r = 0.9871)
13.  0.00166 - 143 per day - Tanzania (719; r = 0.8660)
14.  0.00152 - 131 per day - United Kingdom (137,338; r = 0.9900)
15.  0.00126 - 109 per day - Thailand (17,014; r = 0.9967)
16.  0.00115 - 99 per day - Indonesia (142,173; r = 0.9940)
17.  0.00110 - 95 per day - South Africa (87,780; r = 0.9641)
18.  0.00078 - 68 per day - Bulgaria (21,038; r = 0.9872)
19.  0.00066 - 57 per day - Guatemala (13,730; r = 0.9968)
20.  0.00065 - 56 per day - Sri Lanka (13,059; r = 0.9987)
21.  0.00063 - 55 per day - France (117,595; r = 0.9896)
22.  0.00059 - 51 per day - Germany (93,798; r = 0.9667)
23.  0.00059 - 51 per day - Lesotho (650; r = 0.8756)
24.  0.00059 - 51 per day - Cuba (7,580; r = 0.9996)
25.  0.00058 - 51 per day - Myanmar (17,883; r = 0.9998)

I threw together the PHP code that generated that output in about two hours. I plan on updating this post about once a month until I'm not interested anymore. The last two sections are pretty telling about growth rates and give some indication of what a future Top 25 might be if nothing changes in those countries. The number of interest in those sections is the slope (b), which is used to calculate cases and deaths per second respectively. Rising cases tend to lead to increased deaths, but a rise could also just reflect better testing. Deaths/sec, while pretty morbid to look at, is a little more definitive about which nations are at risk of seeing runaway numbers in the coming months.
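The per-second and per-day columns in the slope tables are related by a plain unit conversion. A small Python sketch, assuming the regression's x-axis is measured in seconds (e.g. Unix timestamps), which is my guess at why the slope comes out per second:

```python
SECONDS_PER_DAY = 86_400


def per_day(slope_per_second):
    """Convert a regression slope in units/second to units/day."""
    return slope_per_second * SECONDS_PER_DAY


# U.S. deaths slope from the table above.
print(round(per_day(0.02138)))  # 1847, matching "1,847 per day"
```

The slopes shown in the tables are rounded, so recomputing a per-day figure from a displayed slope can be off by one or so compared to the listed value.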

Deaths/sec is where we see India clearly having serious problems, only slightly trailing the U.S. and Brazil. Keep in mind that India's per-million-population numbers are approximately 1/10 of those of the rest of the Top 5 at the time of this writing (Aug 15, 2020). In short, their Deaths/sec count (0.00921 on Aug 15, 2020) could easily become 10 times greater than it currently is...something they are very much on track to do. And somehow this isn't making news.

A note on 'r'. If you aren't familiar with statistics, 'r' is a measure of how well the associated line fits the data - aka the "correlation coefficient." An 'r' of 0 indicates the data does not fit a line at all while an 'r' of 1 is a perfect fit (or -1 for a perfectly fitting downward-sloping line). Anything above 0.9 is generally considered a very good fit. Most of the data points above show 0.97 to 0.99 r-values, which suggests that my earlier assumption that 30 days of prior data would fit a line pretty well was fairly valid.
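For the curious, 'r' can be computed directly from the same sums used in a least-squares fit. This is a generic Python sketch of the Pearson correlation coefficient with made-up sample data, not code from the script:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


xs = [0, 1, 2, 3, 4, 5]
perfect = [10 + 2 * x for x in xs]  # lies exactly on a line
noisy = [0, 3, 1, 4, 2, 6]          # trends upward, but bounces around

r_line = pearson_r(xs, perfect)
r_noisy = pearson_r(xs, noisy)
print(f"perfect line: r = {r_line:.4f}, noisy data: r = {r_noisy:.4f}")
```

The first value comes out at 1 and the second lands somewhere between 0 and 1, which is the same pattern as well-reported versus poorly-reported countries in the tables.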

When a country has a poor reporting track record, the r-value can be lower but can still be a useful data point. For example, Kyrgyzstan isn't very good at reporting accurate results in a timely fashion and so its r-value tends to hang around 0.78. The associated line is not as good a predictor but it works well enough since the general direction is "up by some amount" and I'm only predicting one month into the future where any variance is going to be quite small given what we currently know about COVID-19. Predicting anything more than one month is going to have a much higher variance than I'm comfortable with.

Update Oct 10, 2020: Looks like India will surpass the U.S. in total cases around October 26, 2020. I added per-day Top 25 shifting to the script so I could see what day the countries in the Top 25 will probably change positions and to also predict the date that India would take the #1 spot. It's not perfect since changing conditions could alter the date by a few days.
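The per-day shifting idea can be sketched roughly as follows in Python: evaluate every country's fitted line for each future day, rank the results, and report any position whose occupant differs from the previous day. The function name, the `fits` structure, and the sample countries are hypothetical; the actual PHP script may work differently.

```python
def position_changes(fits, start_day, end_day, top_n=25):
    """Return (day, position, name, projected_total) for every rank change.

    fits maps a country name to its fitted (a, b) for y = a + b*day.
    """
    def ranking(day):
        totals = {name: a + b * day for name, (a, b) in fits.items()}
        ordered = sorted(totals, key=totals.get, reverse=True)[:top_n]
        return ordered, totals

    changes = []
    previous, _ = ranking(start_day)
    for day in range(start_day + 1, end_day + 1):
        current, totals = ranking(day)
        for pos, name in enumerate(current):
            # A change is any position held by a different country than yesterday.
            if pos >= len(previous) or previous[pos] != name:
                changes.append((day, pos + 1, name, totals[name]))
        previous = current
    return changes


# Hypothetical fits: B starts behind A but grows three times as fast.
fits = {"A": (1000, 10), "B": (900, 30), "C": (100, 1)}
changes = position_changes(fits, start_day=0, end_day=10)
for day, pos, name, total in changes:
    print(f"day {day} - {pos}.  {total:,} - {name}")
```

With these made-up lines, A and B tie on day 5 and swap places on day 6, so both positions are reported for that day, much like the paired entries in the "Position Changes" sections above.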

Update Oct 5, 2021: I gave up on updating the data for many months since it looked like India got their problem mostly under control just a couple of weeks before they would have overtaken the U.S. Meanwhile the U.S. has continued to be #1 at spreading disease among its population such that no one else could possibly ever catch up. I don't know why anyone would ever want to come here and be a U.S. citizen after our extremely poor response to a global pandemic. Oh, yeah, for the freedom to die stupidly! 700K dead from a single disease in one year = yay for freedom! In short, there are a bunch of stupid people who don't understand the basics of what freedom costs and the Founding Fathers are rolling over in their graves. At any rate, enough commentary on that.

I updated the data today after making a few adjustments to the script. You might notice that the numbers are WAY higher than anyone else is reporting. The script now makes adjustments to the numbers based on death or testing rates, depending on which falls outside reality (heavily based on the WHO estimated infection death rate of 0.64%). I also added a few points of interest. Peru, for instance, has had approximately 93% of its mostly unvaccinated population exposed to at least one strain of COVID, which is the worst metric possible. Vanuatu, on the other hand, has managed to keep COVID out for the most part.

Based on the WHO infection death rate of 0.64%, we can reasonably estimate that 32% of the population in the U.S. has been infected by COVID. 56.4% are currently fully vaccinated, so we can also assume that group overlaps with those who have already encountered a single strain of COVID. However, since vaccination rates in the U.S. are slowing dramatically, we can further estimate that another 300K will die from COVID in the U.S. in the next year.

Singapore is the country of interest at the moment. They have a 79.7% fully vaccinated population but are somehow in the global Top 25 in (unadjusted) reported cases/sec. Russia's r-value for reported deaths is VERY suspicious. A linear regression perfectly fits the data? Hmm.... Note that these are just comments and shouldn't be used for anything serious/important.
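The death-rate-based adjustment can be reproduced from the tables above: dividing reported deaths by the 0.64% infection death rate gives the adjusted case totals. A short Python sketch follows. The function names are mine, the U.S. population figure of roughly 332.9 million is an assumption I supplied rather than a number from the post, and the testing-rate side of the adjustment is not covered here.

```python
INFECTION_DEATH_RATE = 0.0064  # the 0.64% figure quoted above


def estimated_infections(total_deaths, ifr=INFECTION_DEATH_RATE):
    """Back out total infections from deaths, assuming a fixed death rate."""
    return total_deaths / ifr


def percent_infected(total_deaths, population, ifr=INFECTION_DEATH_RATE):
    """Estimated share of the population infected, as a percentage."""
    return 100 * estimated_infections(total_deaths, ifr) / population


us_deaths = 701_169          # U.S. total deaths, Oct 4 table
us_population = 332_900_000  # assumed, approximate

us_infections = estimated_infections(us_deaths)
us_percent = percent_infected(us_deaths, us_population)
print(f"{us_infections:,.0f} estimated infections ({us_percent:.2f}%)")
```

Running this for the U.S. gives about 109,557,656 estimated infections, which lines up with the adjusted total in the Oct 4 table. The same division works for Brazil and India using their death totals.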
