New 14-Day National Baseline Estimate

My estimates (aka forecasts) are now being included in the CDC’s Ensemble model. Mine are the “CovidComplete” forecasts. CDC forecasts use 7-day periods from Sunday through Saturday.

My new 14-day baseline estimate for 8/23/20 – 9/5/20, which is included in the CDC forecast, is 12,400 deaths, with a range of 11,000-14,000.

Review of Previous 14-Day Estimation Period

My estimates for 8/10/20 – 8/23/20 worked out pretty well. Cumulative error for the period was 4.4%. After the first day of the period, the maximum cumulative error on any day in the period was 14.1%. Median error on daily estimates was 0.7%, with an average of 6.5%. Median error on the cumulative estimates was 5.9%, with an average of 9.5%. The overall pattern was that the estimates stayed closer to Method 2 for the first half of the period, then moved closer to Method 1 for the second half. This is the same pattern as the estimates for the preceding 14-day period.

Background on Estimation Methodology

Estimating Deaths from Infections and the Fatality Rate

If we knew the number of infections, and we knew the infection fatality rate (IFR), it would be very simple to forecast the number of deaths from the number of infections. Throughout the pandemic, however, we have been estimating both the number of infections and the IFR, and in many cases doing that badly. The numbers also seem to vary a lot from state to state.

IFR is in the Neighborhood of 0.5%

I’ve explained how I arrived at this conclusion in an article on Medium. The super quick explanation is that this number can be calculated, at least approximately, from the antibody tests that New York State conducted in late April, combined with information about New York state’s fatalities from Covid-19. Once we know the approximate IFR, we can bootstrap methods for estimating the number of infections from the number of deaths, and that helps us zero in on the relationship between positive tests and infections.
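As a rough illustration of that bootstrapping step, the sketch below converts a death count into an implied infection count using an IFR of 0.5%, then compares the result with positive tests. The function names and the input numbers are made up for illustration and are not actual data. A real calculation would also need to account for the lag between infection, testing, and death.

```python
IFR = 0.005  # 0.5%, the approximate IFR discussed above


def expected_deaths(infections, ifr=IFR):
    """Deaths implied by a number of infections at a given IFR."""
    return infections * ifr


def implied_infections(deaths, ifr=IFR):
    """Infections implied by a number of deaths at a given IFR."""
    if ifr <= 0:
        raise ValueError("ifr must be positive")
    return deaths / ifr


def infections_per_positive(deaths, positive_tests, ifr=IFR):
    """Ratio of implied infections to reported positive tests."""
    if positive_tests <= 0:
        raise ValueError("positive_tests must be positive")
    return implied_infections(deaths, ifr) / positive_tests


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    ratio = infections_per_positive(deaths=1_000, positive_tests=50_000)
    print(f"{ratio:.1f} infections per positive test")
```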

Positive Tests Do Not Equal Infections

Positive tests are a useful proxy for the number of infections, but the two are not the same thing. Overlooking this simple distinction seems to thwart many of the estimates that are created.

The relationship between positive tests and infections has been shifting over time. Early in the pandemic it appeared that the ratio of infections to positive tests was somewhere between 10:1 and 20:1. Now the ratio appears to be somewhere in the range of 3:1 to 5:1.

Naive CFR

The cornerstone of my estimation approaches has been tracking the changes in Naive CFR, as an indicator of the ratio of infections to positive tests. Naive CFR is calculated as today’s deaths divided by positive tests 14 days earlier. Because deaths lag positive tests, positive tests are a useful leading indicator of future deaths.
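The definition above can be sketched in a few lines. This is a minimal sketch, assuming two equal-length daily lists (`deaths` and `positives`, both hypothetical names) ordered from oldest to newest. Days without a usable 14-day-earlier count are returned as None.

```python
LAG_DAYS = 14  # deaths are compared with positive tests 14 days earlier


def naive_cfr(deaths, positives, lag=LAG_DAYS):
    """Return the Naive CFR for each day, or None where it is undefined."""
    if len(deaths) != len(positives):
        raise ValueError("deaths and positives must have the same length")
    result = []
    for day in range(len(deaths)):
        # Undefined for the first `lag` days and when the lagged count is zero.
        if day < lag or positives[day - lag] == 0:
            result.append(None)
        else:
            result.append(deaths[day] / positives[day - lag])
    return result
```

For example, 20 deaths today against 1,000 positive tests 14 days ago gives a Naive CFR of 2%.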

I continue to be amazed at the huge inaccuracy of many highly publicized short-term fatality estimates. If you simply took the average naive CFR for the past 7 days and used it to estimate deaths for the next 14 days, based on 14-day-lagging positive tests, you’d create more accurate estimates for deaths over the next 14 days than the vast majority of estimates that have been published.
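That simple baseline might look something like the sketch below. It assumes daily `deaths` and `positives` lists ending today (hypothetical names), takes the arithmetic mean of the last 7 daily Naive CFRs, and applies it to the positive tests that fall 14 days before each forecast day. Averaging the daily ratios is one interpretation of "average naive CFR"; dividing total deaths by total lagged positives would be another.

```python
def baseline_forecast(deaths, positives, lag=14, window=7, horizon=14):
    """Forecast daily deaths for the next `horizon` days.

    Uses the mean Naive CFR of the last `window` days and the positive
    tests reported `lag` days before each forecast day.
    """
    n = len(deaths)
    if len(positives) != n:
        raise ValueError("deaths and positives must have the same length")
    if horizon > lag:
        raise ValueError("horizon cannot exceed lag: lagged tests not yet known")
    if n < lag + window:
        raise ValueError("not enough history")

    # Naive CFR for each of the last `window` days.
    recent = [deaths[d] / positives[d - lag] for d in range(n - window, n)]
    avg_cfr = sum(recent) / window

    # Forecast day n + k uses positive tests from day n + k - lag.
    return [avg_cfr * positives[n + k - lag] for k in range(horizon)]
```

Because the horizon equals the lag, every forecast day relies only on positive tests that have already been reported.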

For my estimates through about August 1, I used the strong correlation between positive test percentage (positivity) and naive CFR. Because we know test positivity when test data is reported, it has been useful as a leading indicator of the death rate, and I’ve used recent positivity along with the number of positive tests to estimate near-term future death rates.

But Naive CFR has dropped prodigiously since the beginning of the pandemic. It started out over 50%, and for a while in early August it dropped as low as 1.4%. As Naive CFR has dropped, positivity has become less useful as a basis for estimating.

The main factor that has changed is that the average age of people being tested has been trending downward. The percentage of people testing positive who are under age 50 is also strongly correlated with Naive CFR. The CDC reports that information, but not in a very timely way. It lags about 2 weeks, if it’s published at all.

Updated Estimation Methodology

Because of uncertainty in the positivity model, I’ve reverted to using multiple estimation methods and comparing the results.

Method 1 – This is similar to the method I just described. It is a curve-fit model that is based on (a) positive test percentage (positivity) and (b) the trend in the average age of people being tested. Both of these factors have high R-squared values vs. Naive CFR, but as the average age trends downward, part (a) of the model becomes less accurate and part (b) becomes more accurate, although neither has been super accurate for the past 5-6 weeks. I’ve also had difficulty finding recent information on ages of people being tested. The main input to this model is positive test cases from 14 days earlier than the date of estimated deaths.

Method 2 – This method is based on the trend in Naive CFR. I take the history of recent Naive CFRs, calculate a trend line, and project that forward. I then apply the estimates of future Naive CFRs, day by day, to the positive tests from 14 days earlier. The challenge with this approach is deciding which Naive CFR trend to use. The numbers can be fairly different depending on how the trend is calculated. I start with daily smoothed Naive CFRs (based on 7-day smoothing). With that as input, I make one projection based on the one-week arithmetic average Naive CFR, one based on the two-week arithmetic average Naive CFR, and one based on the arithmetic average of the 7-day and 14-day geometric mean CFRs.
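The three averages in Method 2 can be sketched as follows. This is only an illustration: the function and key names are hypothetical, the input is assumed to be a list of already-smoothed daily Naive CFRs, and each candidate CFR is applied as a flat rate rather than as a projected trend line.

```python
from statistics import geometric_mean, mean


def candidate_cfrs(smoothed_cfr):
    """Return the three candidate Naive CFRs from smoothed daily values."""
    if len(smoothed_cfr) < 14:
        raise ValueError("need at least 14 days of smoothed Naive CFR")
    last7 = smoothed_cfr[-7:]
    last14 = smoothed_cfr[-14:]
    return {
        "mean_7d": mean(last7),    # one-week arithmetic average
        "mean_14d": mean(last14),  # two-week arithmetic average
        # arithmetic average of the 7-day and 14-day geometric means
        "geo_blend": mean([geometric_mean(last7), geometric_mean(last14)]),
    }


def project_deaths(cfr, lagged_positives):
    """Apply one CFR to the positive tests from 14 days before each day."""
    return [cfr * p for p in lagged_positives]
```

When Naive CFR has been falling, the three candidates fan out, with the one-week average lowest and the two-week average highest, which is why the choice of trend matters.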

Method 3 – This is the sum of the state estimates.

The composite model is the mean of Method 1, Method 3, and the minimum number from Method 2. This is then fine-tuned on a day-by-day basis to account for a history of day-of-week fluctuations in reporting.
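A minimal sketch of that combination is shown below. It assumes each method produces a list of daily death estimates of equal length, and it reads "the minimum number from Method 2" as the lowest of the Method 2 projections on each day, which is an assumption on my part here. The day-of-week factors are hypothetical multipliers, not values from the actual model.

```python
from statistics import mean


def composite(method1, method2_variants, method3):
    """Mean of Method 1, Method 3, and the lowest Method 2 projection per day."""
    days = len(method1)
    if not method2_variants:
        raise ValueError("need at least one Method 2 projection")
    if len(method3) != days or any(len(v) != days for v in method2_variants):
        raise ValueError("all estimates must cover the same days")
    return [
        mean([method1[d], method3[d], min(v[d] for v in method2_variants)])
        for d in range(days)
    ]


def apply_day_of_week(daily, first_weekday, factors):
    """Scale each day by the factor for its weekday (0 = Monday)."""
    if len(factors) != 7:
        raise ValueError("factors must have one entry per weekday")
    return [v * factors[(first_weekday + i) % 7] for i, v in enumerate(daily)]
```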