Dear friends!
Ever wonder how statisticians decide whether a hypothesis is true or false? The answer may surprise you — it depends on whether they follow the principles of frequentist or Bayesian statistics. From hypothesis testing to decision-making, 👋Shaun Wiechmann and 👋I will walk you through how frequentist and Bayesian statistics differ and how they impact the way we make decisions and draw conclusions from data.
What are frequentist and Bayesian statistics?
Bayesian statistics uses probability to combine observed data with prior beliefs when analyzing a statistical model, while frequentist statistics determines probabilities from the data alone and emphasizes experimentation to gather it. The frequentist approach is often seen as more objective because it does not factor in personal beliefs; the Bayesian approach deliberately incorporates them as prior probabilities and draws inferences from both the data and those priors. Both approaches have their own assumptions and limitations, and the choice between them can depend on the specific goals and context of the analysis.
The Bayesian approach to statistical inference incorporates prior information into a new analysis by treating hypotheses and parameters as probabilities. Beliefs formed from similar past experiments are included when testing a model, whereas frequentist methods do not rely on prior beliefs at all. These beliefs are encoded as priors: probability distributions summarizing what was known before the current experiment. Because researchers may interpret past experiments differently, priors can be subjective, and different priors can lead to different, possibly conflicting, hypotheses depending on the study. The payoff is additional insight when running current experiments. As new evidence and data arrive, Bayes' theorem combines the prior with the observations to yield a posterior distribution, the updated belief that reconciles what was assumed before with what the current findings show. A prior built from substantial past knowledge is referred to as an informative prior.
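To make the updating idea concrete, here is a minimal sketch in Python of a Beta-Binomial model, a standard conjugate setup for an unknown success rate. The numbers are purely illustrative, not from any real study: the prior Beta(2, 8) encodes a belief that the rate is around 0.2, and observing 30 successes in 100 trials shifts that belief.

```python
# Minimal sketch of Bayesian updating with a Beta-Binomial model.
# A Beta(a, b) prior over an unknown success rate, combined with
# k successes in n trials, yields the posterior Beta(a + k, b + n - k).

def update_beta(a, b, successes, trials):
    """Return the posterior Beta parameters after observing the data."""
    return a + successes, b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Informative prior: past experiments suggest a rate near 0.2 -> Beta(2, 8).
a_prior, b_prior = 2, 8

# New evidence: 30 successes out of 100 trials.
a_post, b_post = update_beta(a_prior, b_prior, successes=30, trials=100)

print(f"Prior mean:     {beta_mean(a_prior, b_prior):.3f}")
print(f"Posterior mean: {beta_mean(a_post, b_post):.3f}")
```

Note how the posterior mean lands between the prior's 0.2 and the data's 0.3: the stronger the prior (larger a + b), the more it pulls the result toward past beliefs.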
The Frequentist approach does not incorporate prior beliefs when estimating probabilities from a dataset. Instead, the experiment follows a predetermined structure that excludes any assumptions carried over from past studies. A null hypothesis is stated before any trials are run, and the long-run frequency of repeated random events is used to compute a P-value: the probability of seeing data at least as extreme as the actual result if the null hypothesis were true. Parameters such as the sample size and significance level are fixed ahead of time to control the model, and test statistics are compared against reference distributions such as the t-distribution or F-distribution. The analysis ultimately comes down to a confidence interval around the observed values: at the conventional 95% confidence level, a result is considered statistically significant if its P-value is 0.05 or less.
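As an illustration of that workflow, here is a small Python sketch of a frequentist test: an exact two-sided binomial test of whether a coin is fair. The data (65 heads in 100 flips) are made up for the example, and the helper assumes a symmetric null of p = 0.5.

```python
# Sketch of a frequentist hypothesis test: an exact two-sided binomial
# test. Null hypothesis: the coin is fair (p = 0.5). The p-value is the
# probability of a result at least this extreme if the null were true.

from math import comb

def binomial_p_value(successes, trials):
    """Two-sided exact p-value under a symmetric null of p = 0.5."""
    tail = max(successes, trials - successes)
    upper = sum(comb(trials, k) for k in range(tail, trials + 1)) / 2**trials
    return min(1.0, 2 * upper)

# Observed (hypothetical) data: 65 heads in 100 flips.
p = binomial_p_value(successes=65, trials=100)
print(f"p-value: {p:.4f}")
print("significant at 0.05" if p <= 0.05 else "not significant at 0.05")
```

Everything here was fixed in advance (the null hypothesis, the sample size, the 0.05 threshold); no prior belief about the coin enters the calculation.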
👣Example
Let us use the COVID-19 pandemic as an example in comparing a Bayesian versus a Frequentist approach. In March 2020, the world shut down due to the emergence of the then-novel coronavirus. In the medical field, statistics became a critical tool in understanding how the virus affected populations and how it spread. In the present day, experts can look at both past and current data to discuss the implications COVID-19 may have for society.
A Bayesian approach would take historical mortality, morbidity, and other relevant infection data into consideration when hypothesizing about the spread and severity of COVID-19. Because of the updating principle at the heart of Bayes' theorem, medical experts could revise health measures as new discoveries emerged, reanalyzing statistics from earlier in the pandemic. Examining posterior probabilities alongside hypotheses about the nature of COVID-19 has allowed experts to update their findings and develop better-informed measures to guide the public.
A Frequentist approach would set aside past hypotheses about the nature of COVID-19 and focus on statistics observed in a current experiment. This usually involves a treatment group of patients and a control group, with the sample size and other parameters specified in advance. Outcomes among treated patients are then compared against the probabilities of different health reactions expected from the control group. Experts may then use the findings from this more current, structured experiment to advise the public on what actions to take.
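The treatment-versus-control comparison described above can be sketched as a two-proportion z-test. The recovery counts below are hypothetical and serve only to illustrate the mechanics, not any real COVID-19 trial.

```python
# Sketch of a frequentist two-proportion z-test comparing recovery
# rates in a treatment group versus a control group (made-up data).

from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided tail
    return z, p_value

# Hypothetical trial: 78/100 recovered with treatment, 62/100 in control.
z, p = two_proportion_z_test(78, 100, 62, 100)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("significant at 0.05" if p <= 0.05 else "not significant at 0.05")
```

As in any frequentist design, the group sizes and significance threshold would be fixed before collecting the data; no information from past outbreaks enters the test.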
Both approaches have statistical upsides and downsides. The Bayesian method considers prior hypotheses about the coronavirus: how it spreads, the rate of infection, who is at the highest risk, and so on. However, choosing and interpreting those priors involves a good deal of subjectivity. The Frequentist approach relies on a predetermined sample size and other parameters set alongside a control group of patients. But to understand the effects of the virus, it is also important to account for how it behaved in the past.
In the end, both the Bayesian and Frequentist approaches to statistical inference are sound methods for building significant models. As seen in recent times with the COVID-19 pandemic, both provide valuable information for understanding how a virus operates. However, statisticians have prioritized Bayesian statistics in researching COVID-19: previous data about symptoms, infection rates, and the virus' nature are referenced when predicting which populations could become infected. Historical medical data is likewise used in analyses of many other diseases, which has made the Bayesian approach popular in the medical field. Even so, the circumstances of a model should be carefully considered when choosing a method. If you have well-grounded prior knowledge or beliefs about a case study, it may be beneficial to use the Bayesian approach; if you are starting from the collected data alone, the Frequentist approach might be more suitable. Either approach requires careful selection and definition of the data, and all analyses and methods should be properly disclosed regardless of the approach.