Bayesian statistics

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a degree of belief in an event, which can change as new information is gathered, rather than a fixed value based upon frequency or propensity.[1] The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation that views probability as the limit of the relative frequency of an event after a large number of trials.[2]

Bayesian statistical methods use Bayes' theorem to compute and update probabilities after obtaining new data. Bayes' theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event. For example, in Bayesian inference, Bayes' theorem can be used to estimate the parameters of a probability distribution or statistical model. Since Bayesian statistics treats probability as a degree of belief, Bayes' theorem can directly assign a probability distribution that quantifies the belief to the parameter or set of parameters.[2]

Bayesian statistics was named after Thomas Bayes, who formulated a specific case of Bayes' theorem in his paper published in 1763. In several papers spanning from the late 1700s to the early 1800s, Pierre-Simon Laplace developed the Bayesian interpretation of probability. Laplace used methods that would now be considered Bayesian to solve a number of statistical problems. Many Bayesian methods were developed by later authors, but the term was not commonly used to describe such methods until the 1950s. During much of the 20th century, Bayesian methods were viewed unfavorably by many statisticians due to philosophical and practical considerations. Many Bayesian methods required extensive computation, and most methods that were widely used during the century were based on the frequentist interpretation. However, with the advent of powerful computers and new algorithms like Markov chain Monte Carlo, Bayesian methods have seen increasing use within statistics in the 21st century.[2][3]

Bayes' theorem

Bayes' theorem is a fundamental theorem in Bayesian statistics, as it is used by Bayesian methods to update probabilities, which are degrees of belief, after obtaining new data. Given two events A and B, the conditional probability of A given that B is true is expressed as follows:[4]

    P(A | B) = P(B | A) P(A) / P(B)

where P(B) ≠ 0. Although Bayes' theorem is a fundamental result of probability theory, it has a specific interpretation in Bayesian statistics. In the above equation, A usually represents a proposition (such as the statement that a coin lands on heads fifty percent of the time) and B represents the evidence, or new data that is to be taken into account (such as the result of a series of coin flips). P(A) is the prior probability of A, which expresses one's beliefs about A before evidence is taken into account. The prior probability may also quantify prior knowledge or information about A. P(B | A) is the likelihood function, which can be interpreted as the probability of the evidence B given that A is true. The likelihood quantifies the extent to which the evidence B supports the proposition A. P(A | B) is the posterior probability, the probability of the proposition A after taking the evidence B into account. Essentially, Bayes' theorem updates one's prior beliefs P(A) after considering the new evidence B.[2]
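
As a minimal numerical sketch of this update (the coin-bias proposition, the alternative hypothesis, and all probabilities below are illustrative assumptions, not values from the text), the posterior can be computed directly in Python:

    # Minimal sketch of a single Bayes'-theorem update (illustrative numbers).
    # Proposition A: "the coin is fair (P(heads) = 0.5)"; the only alternative
    # considered here is "the coin is biased with P(heads) = 0.8".
    prior_fair = 0.7          # P(A): prior belief that the coin is fair
    prior_biased = 0.3        # P(not A)

    # Evidence B: the coin landed heads on a single flip.
    likelihood_fair = 0.5     # P(B | A)
    likelihood_biased = 0.8   # P(B | not A)

    # P(B) by the law of total probability.
    evidence = likelihood_fair * prior_fair + likelihood_biased * prior_biased

    # Posterior P(A | B) = P(B | A) P(A) / P(B).
    posterior_fair = likelihood_fair * prior_fair / evidence
    print(posterior_fair)     # ~0.593: belief in the fair coin drops after seeing heads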

The probability of the evidence P(B) can be calculated using the law of total probability. If {A_1, A_2, …, A_n} is a partition of the sample space, which is the set of all outcomes of an experiment, then,[2][4]

    P(B) = P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + … + P(B | A_n) P(A_n)

When there are an infinite number of outcomes, it is necessary to integrate over all outcomes to calculate P(B) using the law of total probability. Often, P(B) is difficult to calculate as the calculation would involve sums or integrals that would be time-consuming to evaluate, so often only the product of the prior and likelihood is considered, since the evidence does not change in the same analysis. The posterior is proportional to this product:[2]

    P(A | B) ∝ P(B | A) P(A)

The maximum a posteriori, which is the mode of the posterior and is often computed in Bayesian statistics using mathematical optimization methods, remains the same. The posterior can be approximated even without computing the exact value of P(B) with methods such as Markov chain Monte Carlo or variational Bayesian methods.[2]
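
The following Python sketch illustrates working with the product of prior and likelihood on a grid of parameter values; the uniform prior and the data (6 heads in 9 flips) are illustrative assumptions. The maximum a posteriori estimate is found without the normalizing constant P(B), and dividing by the grid sum approximates the normalized posterior:

    # Sketch of working with the unnormalized posterior (prior x likelihood) on a
    # grid, for a coin whose heads-probability theta is unknown.
    heads, flips = 6, 9
    grid = [i / 100 for i in range(101)]                 # candidate values of theta
    prior = [1.0 for _ in grid]                          # uniform prior over theta
    likelihood = [t**heads * (1 - t)**(flips - heads) for t in grid]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]

    # The maximum a posteriori estimate does not require the normalizing constant P(B).
    map_estimate = grid[max(range(len(grid)), key=lambda i: unnormalized[i])]
    print(map_estimate)                                  # 0.67, close to 6/9

    # Normalizing by the sum over the grid approximates the full posterior.
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]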

Outline of Bayesian methods

The general set of statistical techniques can be divided into a number of activities, many of which have special Bayesian versions.

Bayesian inference

Bayesian inference refers to statistical inference where uncertainty in inferences is quantified using probability. In classical frequentist inference, model parameters and hypotheses are considered to be fixed. Probabilities are not assigned to parameters or hypotheses in frequentist inference. For example, it would not make sense in frequentist inference to directly assign a probability to an event that can only happen once, such as the result of the next flip of a fair coin. However, it would make sense to state that the proportion of heads approaches one-half as the number of coin flips increases.[5]

Statistical models specify a set of statistical assumptions and processes that represent how the sample data is generated. Statistical models have a number of parameters that can be modified. For example, a series of coin flips can be represented as samples from a Bernoulli distribution, which models two possible outcomes. The Bernoulli distribution has a single parameter equal to the probability of one outcome, which in most cases is the probability of landing on heads. Devising a good model for the data is central in Bayesian inference. In most cases, models only approximate the true process, and may not take into account certain factors influencing the data.[2] In Bayesian inference, probabilities can be assigned to model parameters. Parameters can be represented as random variables. Bayesian inference uses Bayes' theorem to update probabilities after more evidence is obtained or known.[2][6]
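
A minimal sketch of such an update for the Bernoulli parameter, assuming a conjugate Beta(1, 1) prior and illustrative data, so that the posterior is available in closed form:

    # Sketch of Bayesian inference for the Bernoulli parameter using its conjugate
    # beta prior. The Beta(1, 1) (uniform) prior and the observed data are
    # illustrative assumptions.
    prior_alpha, prior_beta = 1.0, 1.0   # Beta(1, 1) prior on P(heads)
    heads, tails = 6, 3                  # observed coin flips

    # Conjugate update: the posterior is Beta(alpha + heads, beta + tails).
    post_alpha = prior_alpha + heads
    post_beta = prior_beta + tails

    posterior_mean = post_alpha / (post_alpha + post_beta)
    print(post_alpha, post_beta, posterior_mean)   # 7.0 4.0 ~0.636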

Statistical modeling

The formulation of statistical models using Bayesian statistics has the identifying feature of requiring the specification of prior distributions for any unknown parameters. Indeed, parameters of prior distributions may themselves have prior distributions, leading to Bayesian hierarchical modeling[7], or may be interrelated, leading to Bayesian networks.
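
A brief generative sketch of the hierarchical idea, with illustrative numbers: group-level parameters are themselves drawn from a population-level distribution whose parameters could in turn be given priors:

    import random

    # Sketch of the generative structure of a simple Bayesian hierarchical model:
    # each group's mean is itself drawn from a population-level prior whose
    # parameters (mu0, tau0) could in turn be given hyperpriors. All numbers are
    # illustrative assumptions.
    random.seed(1)
    mu0, tau0 = 0.0, 1.0          # population-level (hyper)parameters
    sigma = 0.5                   # within-group observation noise

    group_means = [random.gauss(mu0, tau0) for _ in range(3)]                 # theta_j ~ N(mu0, tau0)
    data = [[random.gauss(m, sigma) for _ in range(5)] for m in group_means]  # y_ij ~ N(theta_j, sigma)
    print(group_means)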

Design of experiments

The Bayesian design of experiments includes a concept called 'influence of prior beliefs'. This approach uses sequential analysis techniques to include the outcome of earlier experiments in the design of the next experiment. This is achieved by updating 'beliefs' through the use of prior and posterior distributions. This allows the design of experiments to make good use of resources of all types. An example of this is the multi-armed bandit problem.
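
One hedged sketch of this idea for the multi-armed bandit problem is Thompson sampling, in which each arm's success rate carries a beta posterior and the next pull is chosen by sampling from those posteriors; the true success rates below are illustrative assumptions:

    import random

    # Sketch of Bayesian sequential design for a Bernoulli multi-armed bandit via
    # Thompson sampling: each arm keeps a beta posterior over its success rate,
    # and the next "experiment" (arm pull) is chosen by sampling from those
    # posteriors.
    random.seed(0)
    true_rates = [0.3, 0.5, 0.7]
    alpha = [1.0] * 3   # beta posterior parameters per arm, starting from Beta(1, 1)
    beta = [1.0] * 3

    for _ in range(1000):
        samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = samples.index(max(samples))          # design choice: most promising arm
        reward = random.random() < true_rates[arm]
        alpha[arm] += reward                       # posterior update with the new outcome
        beta[arm] += 1 - reward

    print(alpha, beta)   # the best arm accumulates most of the pulls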

Statistical graphics

Statistical graphics includes methods for data exploration, for model validation, etc. The use of certain modern computational techniques for Bayesian inference, specifically the various types of Markov chain Monte Carlo techniques, has led to the need for checks, often made in graphical form, on the validity of such computations in expressing the required posterior distributions.
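
Alongside graphical checks such as trace plots, a commonly reported numerical companion is the Gelman-Rubin statistic; the sketch below (with stand-in chains rather than real sampler output) shows the computation it summarizes:

    from statistics import mean, variance

    # Sketch of the Gelman-Rubin statistic, which compares between-chain and
    # within-chain variance for several MCMC chains of the same quantity.
    # The chains below are illustrative stand-ins for real sampler output.
    def gelman_rubin(chains):
        n = len(chains[0])                      # draws per chain
        chain_means = [mean(c) for c in chains]
        w = mean(variance(c) for c in chains)   # within-chain variance
        b = n * variance(chain_means)           # between-chain variance
        var_hat = (n - 1) / n * w + b / n
        return (var_hat / w) ** 0.5             # values near 1 suggest convergence

    chains = [[0.1, 0.3, 0.2, 0.25, 0.15], [0.12, 0.28, 0.22, 0.18, 0.2]]
    print(gelman_rubin(chains))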

References

  1. ^ "What are Bayesian Statistics?". deepai.org.
  2. ^ a b c d e f g h i Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. (2013). Bayesian Data Analysis, Third Edition. Chapman and Hall/CRC. ISBN 978-1-4398-4095-5.
  3. ^ Fienberg, Stephen E. (2006). "When Did Bayesian Inference Become "Bayesian"?". Bayesian Analysis. 1 (1): 1–40.
  4. ^ a b Grinstead, Charles M.; Snell, J. Laurie (2006). Introduction to probability (2nd ed.). Providence, RI: American Mathematical Society. ISBN 978-0-8218-9414-9.
  5. ^ Wakefield, Jon (2013). Bayesian and frequentist regression methods. New York, NY: Springer. ISBN 978-1-4419-0924-4.
  6. ^ Congdon, Peter (2014). Applied Bayesian modelling (2nd ed.). Wiley. ISBN 978-1119951513.
  7. ^ Hajiramezanali, E.; Dadaneh, S. Z.; Karbalayghareh, A.; Zhou, Z.; Qian, X. (2018). "Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data". 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada. https://arxiv.org/pdf/1810.09433.pdf

Aumann's agreement theorem

In game theory, Aumann's agreement theorem is a theorem which demonstrates that rational agents with common knowledge of each other's beliefs cannot agree to disagree. It was first formulated in the 1976 paper titled "Agreeing to Disagree" by Robert Aumann, after whom the theorem is named.

Bayesian probability

Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief. The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses, i.e., propositions whose truth or falsity is uncertain. In the Bayesian view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability.

Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated to a posterior probability in the light of new, relevant data (evidence). The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.

The term Bayesian derives from the 18th century mathematician and theologian Thomas Bayes, who provided the first mathematical treatment of a non-trivial problem of statistical data analysis using what is now known as Bayesian inference. Mathematician Pierre-Simon Laplace pioneered and popularised what is now called Bayesian probability.

Coherence (philosophical gambling strategy)

In a thought experiment proposed by the Italian probabilist Bruno de Finetti in order to justify Bayesian probability, an array of wagers is coherent precisely if it does not expose the wagerer to certain loss regardless of the outcomes of events on which they are wagering, even if their opponent makes the most judicious choices.
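
A small numerical sketch of the idea, with illustrative betting rates: if a wagerer's rates for the two outcomes of a single event sum to more than one, an opponent can sell the wagerer both bets and lock in a profit whatever happens:

    # Sketch of de Finetti's coherence argument with illustrative numbers.
    rates = {"heads": 0.6, "tails": 0.6}   # incoherent: the rates sum to 1.2
    stake = 1.0
    total_paid = sum(r * stake for r in rates.values())   # price of buying both bets

    for outcome in rates:
        payout = stake                       # exactly one bet pays out, whichever occurs
        print(outcome, payout - total_paid)  # -0.2 either way: a guaranteed loss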

Expectation propagation

Expectation propagation (EP) is a technique in Bayesian machine learning.

EP finds approximations to a probability distribution. It uses an iterative approach that leverages the factorization structure of the target distribution. It differs from other Bayesian approximation approaches such as variational Bayesian methods.

More specifically, suppose we wish to approximate an intractable probability distribution p(x) with a tractable distribution q(x). Expectation propagation achieves this approximation by minimizing the Kullback-Leibler divergence KL(p || q). Variational Bayesian methods minimize KL(q || p) instead.

If q(x) is a Gaussian N(μ, Σ), then KL(p || q) is minimized with μ and Σ being equal to the mean of p(x) and the covariance of p(x), respectively; this is called moment matching.
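
A minimal sketch of that moment-matching step, with the intractable distribution stood in for by a handful of illustrative samples:

    from statistics import mean, pvariance

    # Moment matching: the Gaussian q that minimizes KL(p || q) is the one whose
    # mean and variance equal those of p. Here p is represented by illustrative
    # samples rather than an intractable distribution.
    samples_from_p = [0.1, 0.4, 0.35, 0.8, 0.5, 0.45, 0.6]

    q_mean = mean(samples_from_p)        # match the first moment of p
    q_var = pvariance(samples_from_p)    # match the second (central) moment of p
    print(q_mean, q_var)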

Graphical model

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning.
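
A small sketch of how such a factorization works, using an illustrative three-node rain/sprinkler/wet-grass network with made-up probabilities:

    # Sketch of how a graphical model encodes conditional dependence: in a small
    # rain -> sprinkler -> wet-grass style network, the joint probability factors
    # into one term per node given its parents. All probabilities are illustrative.
    p_rain = 0.2
    p_sprinkler_given_rain = {True: 0.01, False: 0.4}
    p_wet_given = {(True, True): 0.99, (True, False): 0.8,
                   (False, True): 0.9, (False, False): 0.0}   # keys: (rain, sprinkler)

    def joint(rain, sprinkler, wet):
        pr = p_rain if rain else 1 - p_rain
        ps = p_sprinkler_given_rain[rain]
        ps = ps if sprinkler else 1 - ps
        pw = p_wet_given[(rain, sprinkler)]
        pw = pw if wet else 1 - pw
        return pr * ps * pw             # P(rain) P(sprinkler | rain) P(wet | rain, sprinkler)

    print(joint(True, False, True))     # 0.2 * 0.99 * 0.8 = 0.1584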

Hyperparameter

In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish them from parameters of the model for the underlying system under analysis.

For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then:

p is a parameter of the underlying system (Bernoulli distribution), and

α and β are parameters of the prior distribution (beta distribution), hence hyperparameters.

One may take a single value for a given hyperparameter, or one can iterate and take a probability distribution on the hyperparameter itself, called a hyperprior.
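
A minimal generative sketch of this layering, with an illustrative exponential hyperprior on the hyperparameters:

    import random

    # Sketch of the distinction drawn above, with illustrative choices: alpha and
    # beta are hyperparameters of the Beta prior on the Bernoulli parameter p, and
    # a hyperprior (here, a shifted exponential) can be placed on them in turn.
    random.seed(2)
    alpha = random.expovariate(1.0) + 1   # hyperprior draw for the hyperparameter alpha
    beta = random.expovariate(1.0) + 1    # hyperprior draw for the hyperparameter beta
    p = random.betavariate(alpha, beta)   # prior draw for the model parameter p
    coin_flip = random.random() < p       # the underlying Bernoulli system
    print(alpha, beta, p, coin_flip)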

Isotonic regression

In statistics, isotonic regression or monotonic regression is the technique of fitting a free-form line to a sequence of observations under the following constraints: the fitted free-form line has to be non-decreasing everywhere, and it has to lie as close to the observations as possible.
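
A short sketch of one standard way to compute such a fit, the pool-adjacent-violators algorithm, under a squared-error notion of closeness (the input sequence is illustrative):

    # Sketch of isotonic regression via the pool-adjacent-violators algorithm
    # (PAVA): whenever a fitted value would decrease, the offending neighbours are
    # pooled and replaced by their average, so the result is non-decreasing and
    # (for squared error) as close to the observations as possible.
    def isotonic_regression(y):
        # Each block keeps [sum of values, count]; blocks are merged on violation.
        blocks = []
        for value in y:
            blocks.append([value, 1])
            while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
                s, c = blocks.pop()
                blocks[-1][0] += s
                blocks[-1][1] += c
        fitted = []
        for s, c in blocks:
            fitted.extend([s / c] * c)
        return fitted

    print(isotonic_regression([1, 3, 2, 4, 3, 5]))   # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]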

JASP

JASP is a free and open-source graphical program for statistical analysis, designed to be easy to use, and familiar to users of SPSS. Additionally, it provides many Bayesian statistical methods. JASP generally produces APA style results tables and plots to ease publication. It promotes open science by integration with the Open Science Framework and reproducibility by integrating the analysis settings into the results. The development of JASP is financially supported by several universities and research funds.

Just another Gibbs sampler

Just another Gibbs sampler (JAGS) is a program for simulation from Bayesian hierarchical models using Markov chain Monte Carlo (MCMC), developed by Martyn Plummer. JAGS has been employed for statistical work in many fields, for example ecology, management, and genetics.

JAGS aims for compatibility with WinBUGS/OpenBUGS through the use of a dialect of the same modeling language (informally, BUGS), but it provides no GUI for model building and MCMC sample postprocessing, which must therefore be treated in a separate program (for example calling JAGS from R through a library such as rjags and post-processing MCMC output in R).

The main advantage of JAGS in comparison to the members of the original BUGS family (WinBUGS and OpenBUGS) is its platform independence. It is written in C++, while the BUGS family is written in Component Pascal, a less widely known programming language. In addition, JAGS is already part of many repositories of Linux distributions such as Ubuntu. It can also be compiled as a 64-bit application on 64-bit platforms, thus making all the addressable space available to BUGS models.

JAGS can be used via the command line or run in batch mode through script files. This means that there is no need to redo the settings with every run and that the program can be called and controlled from within another program (e.g. from R via rjags as outlined above).

JAGS is licensed under the GNU General Public License.

Kernel (statistics)

The term kernel is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.

Marginal likelihood

In statistics, a marginal likelihood function, or integrated likelihood, is a likelihood function in which some parameter variables have been marginalized. In the context of Bayesian statistics, it may also be referred to as the evidence or model evidence.
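
A minimal sketch of a case where the marginalization can be done in closed form: coin-flip data with a beta prior on the heads-probability (the prior and data below are illustrative assumptions):

    from math import lgamma, exp

    # For a sequence of coin flips with a Beta(a, b) prior on the heads-probability,
    # integrating the likelihood over the parameter gives the marginal likelihood
    # (model evidence) B(a + heads, b + tails) / B(a, b).
    def log_beta(x, y):
        return lgamma(x) + lgamma(y) - lgamma(x + y)

    a, b = 1.0, 1.0        # Beta(1, 1) prior
    heads, tails = 6, 3    # observed data

    log_evidence = log_beta(a + heads, b + tails) - log_beta(a, b)
    print(exp(log_evidence))   # P(data | model), with the parameter marginalized out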

Markov logic network

A Markov logic network (MLN) is a probabilistic logic which applies the ideas of a Markov network to first-order logic, enabling uncertain inference. Markov logic networks generalize first-order logic, in the sense that, in a certain limit, all unsatisfiable statements have a probability of zero, and all tautologies have probability one.

OpenBUGS

OpenBUGS is a software application for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. OpenBUGS is the open source variant of WinBUGS (Bayesian inference Using Gibbs Sampling). It runs under Windows and Linux, as well as from inside the R statistical package. Versions from v3.0.7 onwards have been designed to be at least as efficient and reliable as WinBUGS over a range of test applications.

Posterior probability

In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account. Similarly, the posterior probability distribution is the probability distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey. "Posterior", in this context, means after taking into account the relevant evidence related to the particular case being examined. For instance, there is a ("non-posterior") probability of a person finding buried treasure if they dig in a random spot, and a posterior probability of finding buried treasure if they dig in a spot where their metal detector rings.
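
A numeric sketch of the buried-treasure example, with illustrative values for the prior and for the detector's hit and false-alarm rates:

    # Posterior probability of treasure once the metal detector rings, computed
    # from illustrative (assumed) numbers via Bayes' theorem.
    p_treasure = 0.01               # prior: treasure at a randomly chosen spot
    p_ring_given_treasure = 0.95    # detector sensitivity (assumed)
    p_ring_given_none = 0.10        # false-alarm rate (assumed)

    p_ring = (p_ring_given_treasure * p_treasure
              + p_ring_given_none * (1 - p_treasure))
    posterior = p_ring_given_treasure * p_treasure / p_ring
    print(posterior)                # ~0.088: higher than the prior, but far from certain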

Precision (statistics)

In statistics, precision is the reciprocal of the variance, and the precision matrix (also known as concentration matrix) is the matrix inverse of the covariance matrix. Some particular statistical models define the term precision differently.
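
A minimal sketch of both definitions, using an illustrative 2x2 covariance matrix:

    import numpy as np

    # Precision of a single variable is 1/variance; the precision (concentration)
    # matrix is the inverse of the covariance matrix. The covariance values are
    # illustrative.
    cov = np.array([[2.0, 0.6],
                    [0.6, 1.0]])

    precision_of_first = 1.0 / cov[0, 0]      # scalar precision of the first variable
    precision_matrix = np.linalg.inv(cov)     # concentration matrix
    print(precision_of_first)
    print(precision_matrix)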

One particular use of the precision matrix is in the context of Bayesian analysis of the multivariate normal distribution: for example, Bernardo & Smith prefer to parameterise the multivariate normal distribution in terms of the precision matrix, rather than the covariance matrix, because of certain simplifications that then arise.

In general, statisticians prefer to use the dual term variability rather than precision. Variability is the amount of imprecision.

Relevance vector machine

In mathematics, a Relevance Vector Machine (RVM) is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification. The RVM has an identical functional form to the support vector machine, but provides probabilistic classification.

It is actually equivalent to a Gaussian process model with covariance function:

    k(x, x′) = Σ_{j=1}^{N} (1/α_j) φ(x, x_j) φ(x′, x_j)

where φ is the kernel function (usually Gaussian), α_j are the variances of the prior on the weight vector w, and x_1, …, x_N are the input vectors of the training set.
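
A small sketch of evaluating this covariance function, with an illustrative Gaussian kernel, training inputs, and values of α_j:

    from math import exp

    # Evaluate the RVM-implied covariance k(x, x') for a tiny one-dimensional
    # training set; the kernel, inputs, and alpha_j are illustrative assumptions.
    def phi(x, z, length_scale=1.0):
        return exp(-0.5 * ((x - z) / length_scale) ** 2)   # Gaussian kernel

    x_train = [0.0, 1.0, 2.5]     # training inputs x_1, ..., x_N
    alpha = [2.0, 1.0, 4.0]       # the alpha_j appearing in the formula

    def k(x, x_prime):
        return sum(phi(x, xj) * phi(x_prime, xj) / aj for xj, aj in zip(x_train, alpha))

    print(k(0.5, 1.5))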

Compared with support vector machines (SVMs), the Bayesian formulation of the RVM avoids the SVM's set of free parameters (which usually require cross-validation-based post-optimization). However, RVMs use an expectation-maximization (EM)-like learning method and are therefore at risk of local minima. This is unlike the standard sequential minimal optimization (SMO)-based algorithms employed by SVMs, which are guaranteed to find a global optimum (of the convex problem).

The relevance vector machine is patented in the United States by Microsoft.

Stan (software)

Stan is a probabilistic programming language for statistical inference written in C++. The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function. Stan is licensed under the New BSD License. Stan is named in honour of Stanislaw Ulam, pioneer of the Monte Carlo method. Stan was created by Andrew Gelman and Bob Carpenter, with a development team consisting of 34 members.

Subjectivism

Subjectivism is the doctrine that "our own mental activity is the only unquestionable fact of our experience", instead of shared or communal experience, and that there is no external or objective truth.

The success of this position is historically attributed to Descartes and his methodic doubt. Subjectivism accords primacy to subjective experience as fundamental to all measure and law. In extreme forms such as solipsism, it may hold that the nature and existence of every object depends solely on someone's subjective awareness of it. One may consider the qualified empiricism of George Berkeley in this context, given his reliance on God as the prime mover of human perception.

WinBUGS

WinBUGS is statistical software for Bayesian analysis using Markov chain Monte Carlo (MCMC) methods.

It is based on the BUGS (Bayesian inference Using Gibbs Sampling) project started in 1989. It runs under Microsoft Windows, though it can also be run on Linux or Mac using Wine. It was developed by the BUGS Project, a team of UK researchers at the MRC Biostatistics Unit, Cambridge, and Imperial College School of Medicine, London.

The last version of WinBUGS was version 1.4.3, released in August 2007. Development is now focused on OpenBUGS, an open-source version of the package. WinBUGS 1.4.3 remains available as a stable version for routine use, but is no longer being developed.
