COVID-19: Cumulative Binomial Probability Analysis with Shiny Web Apps
Here is an example of how a “probability of death” from COVID-19 metric can be modelled using a simulation with Shiny Web Apps.
As mentioned in the disclaimer above, none of the below constitutes any form of medical or otherwise professional advice in what constitutes a rapidly evolving situation around the world. The purpose of this article is simply to use data science tools in demonstrating how the Shiny Web App platform can be used to conduct probability analysis, with simplifying assumptions pertaining to COVID-19 being used as an example.
COVID-19 is a pandemic that has caused a high number of deaths and has had massive economic and social ramifications for the world.
What is interesting, however, is that COVID-19 is actually not the most deadly disease that the world has recently faced.
When taking into account the actual death rate from this disease, e.g. deaths per 1,000 people, both SARS and MERS reportedly had a significantly higher death rate than COVID-19.
With that being said, what is unique about COVID-19 is that the virus is highly contagious and can seemingly transmit between humans much easier than the former two diseases.
In this regard, the total number of deaths from COVID-19 has exceeded that of SARS and MERS — simply due to the fact that many more people have been infected — even if the risk of an individual death from COVID-19 is lower.
Probability of Death
Here is a sample of the probability of death for ten countries across Europe (including Spain and Italy):
We can see that the probability of death (as in the number of deaths / number of cases) varies significantly across these countries.
Moreover, simply looking at the total number of deaths per country is not sufficient — countries with higher populations can naturally be expected to show a higher number of deaths.
Therefore, a “probability of death” simulation with Shiny allows us to inspect the differences in risk that each country faces from a probability standpoint.
Probability Analysis: The Theory
In conducting probability analysis, the two variables that take account of the chance of an event happening are N (number of observations) and λ (lambda — our hit rate/chance of occurrence in a single interval). When we talk about a cumulative binomial probability distribution, we mean to say that the greater the number of trials, the higher the overall probability of an event occurring.
probability = 1 — ((1 — λ)^N)
For instance, the odds of rolling a number 6 on a fair die is 1/6. However, suppose that same die is rolled 10 times:
1 — ((1–0.1667)^10) = 0.8385
We see that the probability of rolling a number 6 now increases to 83.85%.
Based on the law of large numbers, the larger the number of trials; the larger the probability of an event happening even if the probability within a single trial is very low. So, let us generate a cumulative binomial probability to demonstrate how probability increases given an increase in the number of trials.
Firstly, we define a function (with probabilities set at 2%, 4%, and 6%, along with trials of up to 100:
par(bg = '#191661', fg = '#ffffff', col.main = '#ffffff', col.lab = '#ffffff', col.axis = '#ffffff')#lambda = probability of event occuring in a single trial
#powers = number of trials
#mu = overall probability given n number of trialsmuCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
probability_at_lambda <- sapply(c(0.02, 0.04, 0.06), muCalculation, seq(0, 100, 1))Then, we can set up our data as a data frame and then plot as normal:probability_at_lambdadf=data.frame(probability_at_lambda)
col_headings <- c("probability1","probability2","probability3")
names(probability_at_lambdadf) <- col_headings
probability_at_lambdadf
attach(probability_at_lambdadf)
plot(probability_at_lambdadf$probability1,type="o",col="#b1aef4", xlab="N", ylab="Probability", xlim=c(0, 100), ylim=c(0.0, 1.0), pch=19)
lines(probability_at_lambdadf$probability2,type="o",col="red", xlab="N", ylab="Probability2", xlim=c(0, 100), ylim=c(0.0, 1.0), pch=19)
lines(probability_at_lambdadf$probability3,type="o",col="green", xlab="N", ylab="Probability3", xlim=c(0, 100), ylim=c(0.0, 1.0), pch=19)
title(main="Probability Chart")
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted",
lwd = par("lwd"), equilogs = TRUE)
legend("bottomright", probability[2], c("probability_at_lambda_1","probability_at_lambda_2", "probability_at_lambda_3"), cex=0.6, col=c("#b1aef4","red","green"), pch=21:22, lty=1:2)
proc.time()
Sample Table
Here is a sample table with the calculated probabilities (probability_at_lambdadf):
Plot
Accordingly, here is a plot of the probabilities:
Applying Cumulative Binomial Probability Theory to COVID-19
This is an example of a Shiny Web application that can calculate cumulative binomial probabilities on the fly.
The idea is that while the probability of an individual event happening may be low, the cumulative probability of the event happening increases with the number of trials.
Therefore, we are looking to examine how the individual probability of death would ultimately affect the probability of death across N number of cases.
1 — ((1 — λ)^N)
Here is an example of a Shiny Web App that allows us to manipulate the lambda values using a set of sliders and automatically update the probability curve.
To run this app, open the R Studio console and click File -> New File -> Shiny Web App and select either Single File to paste the ui.R and server.R codes together, or Multiple File to paste them separately.
A few points when setting up the UI (User Interface):
- lambda represents the probability of an event occurring in a single trial
- The slider input allows the user to set different values for lambda based on the associated probability
- The plot is then outputted with the output being designated the name “ProbPlot”
ui.R
library(shiny)# Define UI for application that draws a probability plot
shinyUI(fluidPage(
# Application title
titlePanel("COVID-19: Probability Analysis"),
# Sidebar with a slider input for value of lambda
sidebarLayout(
sidebarPanel(
sliderInput("lambda",
"Country 1: Probability of death (Deaths/Total Cases)",
min = 0,
max = 0.03,
value = 0.001),
sliderInput("lambda2",
"Country 2: Probability of death (Deaths/Total Cases)",
min = 0,
max = 0.03,
value = 0.001),
sliderInput("lambda3",
"Country 3: Probability of death (Deaths/Total Cases)",
min = 0,
max = 0.03,
value = 0.001),
sliderInput("lambda4",
"Country 4: Probability of death (Deaths/Total Cases)",
min = 0,
max = 0.03,
value = 0.001),
sliderInput("lambda5",
"Country 5: Probability of death (Deaths/Total Cases)",
min = 0,
max = 0.03,
value = 0.001)
),
# Show a plot of the generated probability plot
mainPanel(
plotOutput("ProbPlot")
)
)
))
Now, we set up the server — this is the part that takes the inputs and calculates the output that is eventually shown in the UI.
- The lambda values represent the inputs that we defined in the UI; i.e. the user sets the probability from the slider.
- The probability function is defined: {1 — ((1 — lambda)^powers)}
- The separate probability arrays are then calculated.
- The probability is then plotted.
server.R
Here is the slider application. When manipulating the individual probabilities (sliders) for each country — we can see that when the individual probability of death exceeds 1%, this is when the probability of at least 1 death per 100 cases approaches 80%. At 3%, the number of cases needed for a death to occur becomes even smaller still.
Let’s take a look at the examples of Germany, Ireland, Portugal, Sweden and Greece based on the table near the beginning of this article (all figures at the time of writing and rounded off to the nearest decimal point).
We can see that for Germany and Ireland, the number of cases would need to surpass 300 to result in a fatality (at least according to this model).
However, with Portugal, Sweden and Greece showing a probability of higher than 1%, it takes less than 100 cases to reach an 80% or higher probability of a person dying from coronavirus.
From this standpoint, the challenge in combating this virus is not only in reducing the number of cases, but also in reducing the probability of death per individual — i.e. paying special care to individuals who are elderly or have underlying health conditions. This is one of the reasons why Italy has been hit so hard by this virus — the country has the largest elderly population after Japan.
In terms of practical application — again reiterating that I am not a medical professional — probability readings could be used in this manner to assess at-risk groups (e.g. death rate of 1% or higher) by gauging the threshold for death — i.e. a death is expected to occur if N number of cases are exceeded in any one group.
Conclusion
In this article, you have seen how to:
- Use Shiny to visualise cumulative binomial probability
- Implement sliders to conduct interactive visualisations
- Use these tools to generate a probability analysis — in this case for COVID-19
Many thanks for your time, and the GitHub repository with the relevant code can be found here.
You can also find more of my data science content at https://www.psalmfield.info/?m=1