# Posterior probability

In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account. Similarly, the posterior probability distribution is the probability distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey. "Posterior", in this context, means after taking into account the relevant evidence related to the particular case being examined. For instance, there is a ("non-posterior") probability of a person finding buried treasure if they dig in a random spot, and a posterior probability of finding buried treasure if they dig in a spot where their metal detector rings.

## Definition

The posterior probability is the probability of the parameters ${\displaystyle \theta }$ given the evidence ${\displaystyle X}$: ${\displaystyle p(\theta |X)}$.

It contrasts with the likelihood function, which is the probability of the evidence given the parameters: ${\displaystyle p(X|\theta )}$.

The two are related as follows:

Given a prior belief that the probability distribution function is ${\displaystyle p(\theta )}$ and observations ${\displaystyle x}$ with the likelihood ${\displaystyle p(x|\theta )}$, the posterior probability is defined as

${\displaystyle p(\theta |x)={\frac {p(x|\theta )p(\theta )}{p(x)}}.}$[1]

The posterior probability can be written in the memorable form

${\displaystyle {\text{Posterior probability}}\propto {\text{Likelihood}}\times {\text{Prior probability}}}$.

## Example

Suppose there is a mixed school having 60% boys and 40% girls as students. The girls wear trousers or skirts in equal numbers; all boys wear trousers. An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem.

The event ${\displaystyle G}$ is that the student observed is a girl, and the event ${\displaystyle T}$ is that the student observed is wearing trousers. To compute the posterior probability ${\displaystyle P(G|T)}$, we first need to know:

• ${\displaystyle P(G)}$, or the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the percentage of girls among the students is 40%, this probability equals 0.4.
• ${\displaystyle P(B)}$, or the probability that the student is not a girl (i.e. a boy) regardless of any other information (${\displaystyle B}$ is the complementary event to ${\displaystyle G}$). This is 60%, or 0.6.
• ${\displaystyle P(T|G)}$, or the probability of the student wearing trousers given that the student is a girl. As they are as likely to wear skirts as trousers, this is 0.5.
• ${\displaystyle P(T|B)}$, or the probability of the student wearing trousers given that the student is a boy. This is given as 1.
• ${\displaystyle P(T)}$, or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since ${\displaystyle P(T)=P(T|G)P(G)+P(T|B)P(B)}$ (via the law of total probability), this is ${\displaystyle P(T)=0.5\times 0.4+1\times 0.6=0.8}$.

Given all this information, the posterior probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:

${\displaystyle P(G|T)={\frac {P(T|G)P(G)}{P(T)}}={\frac {0.5\times 0.4}{0.8}}=0.25.}$

The intuition behind this result is that, out of every hundred students (60 boys and 40 girls), a student observed in trousers must be one of the 80 students who wear them (60 boys and 20 girls); since 20/80 = 1/4 of these are girls, the probability that the student in trousers is a girl is 1/4.
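The arithmetic of this example can be checked with a short Python sketch. The function and variable names are illustrative choices, not part of the original text, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence


# Values taken from the school example, kept as exact fractions.
p_girl = Fraction(2, 5)                 # P(G) = 0.4
p_boy = Fraction(3, 5)                  # P(B) = 0.6
p_trousers_given_girl = Fraction(1, 2)  # P(T|G) = 0.5
p_trousers_given_boy = Fraction(1)      # P(T|B) = 1

# Law of total probability: P(T) = P(T|G)P(G) + P(T|B)P(B).
p_trousers = (p_trousers_given_girl * p_girl
              + p_trousers_given_boy * p_boy)

p_girl_given_trousers = posterior(p_girl, p_trousers_given_girl, p_trousers)

print(p_trousers)             # 4/5
print(p_girl_given_trousers)  # 1/4
```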

## Calculation

The posterior probability distribution of one random variable given the value of another can be calculated with Bayes' theorem by multiplying the prior probability distribution by the likelihood function, and then dividing by the normalizing constant, as follows:

${\displaystyle f_{X\mid Y=y}(x)={f_{X}(x){\mathcal {L}}_{X\mid Y=y}(x) \over {\int _{-\infty }^{\infty }f_{X}(u){\mathcal {L}}_{X\mid Y=y}(u)\,du}}}$

This gives the posterior probability density function for a random variable ${\displaystyle X}$ given the data ${\displaystyle Y=y}$, where

• ${\displaystyle f_{X}(x)}$ is the prior density of ${\displaystyle X}$,
• ${\displaystyle {\mathcal {L}}_{X\mid Y=y}(x)=f_{Y\mid X=x}(y)}$ is the likelihood function as a function of ${\displaystyle x}$,
• ${\displaystyle \int _{-\infty }^{\infty }f_{X}(u){\mathcal {L}}_{X\mid Y=y}(u)\,du}$ is the normalizing constant, and
• ${\displaystyle f_{X\mid Y=y}(x)}$ is the posterior density of ${\displaystyle X}$ given the data ${\displaystyle Y=y}$.
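When the integral in the normalizing constant has no convenient closed form, the same calculation can be approximated on a grid. The following Python sketch is one way to do this, under assumptions not found in the text above: the unknown quantity is a coin's probability of heads, the prior is uniform on [0, 1] (so the integral only needs to cover that range), and the data are a hypothetical 7 heads in 10 tosses.

```python
def grid_posterior(prior, likelihood, grid):
    """Posterior density values on an evenly spaced grid.

    Multiplies prior by likelihood pointwise, then divides by a
    Riemann-sum approximation of the normalizing constant.
    """
    step = grid[1] - grid[0]
    unnormalized = [prior(x) * likelihood(x) for x in grid]
    normalizer = sum(unnormalized) * step
    return [u / normalizer for u in unnormalized]


# Hypothetical data: 7 heads in 10 tosses.
heads, tosses = 7, 10


def uniform_prior(x):
    return 1.0


def likelihood(x):
    # The binomial coefficient is omitted: constant factors cancel
    # when dividing by the normalizing constant.
    return x**heads * (1 - x)**(tosses - heads)


n = 1000
grid = [i / n for i in range(n + 1)]
density = grid_posterior(uniform_prior, likelihood, grid)

step = grid[1] - grid[0]
posterior_mean = sum(x * d for x, d in zip(grid, density)) * step
print(round(posterior_mean, 3))
```

For this particular prior and likelihood the exact posterior is a Beta(8, 4) distribution with mean 8/12, which gives a check on the approximation.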

## Credible interval

Posterior probability is a conditional probability conditioned on randomly observed data. Hence it is a random variable. For a random variable, it is important to summarize its amount of uncertainty. One way to achieve this goal is to provide a credible interval of the posterior probability.
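As an illustration, a credible interval can be read off a posterior density by accumulating probability mass from the left. The Python sketch below computes an equal-tailed 95% interval; the Beta(8, 4)-shaped posterior, the grid resolution and the function name are assumptions made for the example, and other interval constructions (such as highest-density intervals) are equally valid.

```python
def equal_tailed_interval(grid, density, mass=0.95):
    """Equal-tailed credible interval from density values on an even grid."""
    step = grid[1] - grid[0]
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for x, d in zip(grid, density):
        cumulative += d * step
        if lower is None and cumulative >= tail:
            lower = x  # first grid point with at least `tail` mass to its left
        if cumulative >= 1.0 - tail:
            upper = x  # first grid point with at least `1 - tail` mass to its left
            break
    return lower, upper


# Assumed posterior for illustration: proportional to x^7 (1 - x)^3 on [0, 1].
n = 1000
grid = [i / n for i in range(n + 1)]
unnormalized = [x**7 * (1 - x)**3 for x in grid]
total = sum(unnormalized) / n
density = [u / total for u in unnormalized]

lower, upper = equal_tailed_interval(grid, density)
print(lower, upper)
```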

## Classification

In classification, posterior probabilities reflect the uncertainty of assigning an observation to a particular class; see also Class membership probabilities. While statistical classification methods by definition generate posterior probabilities, machine learners usually supply membership values that do not induce any probabilistic confidence. It is desirable to transform or re-scale membership values to class membership probabilities, since these are comparable and also more easily applicable for post-processing.
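One common way to re-scale raw scores is the softmax function, sketched below in Python. The text above does not name a specific method, so softmax is an assumption here, and the input scores are hypothetical. The output values are positive and sum to one, but they are not guaranteed to be well-calibrated probabilities without further adjustment.

```python
import math


def softmax(scores):
    """Map real-valued scores to positive values that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


membership_values = [2.0, 1.0, 0.1]  # hypothetical classifier scores
probabilities = softmax(membership_values)
print(probabilities)
```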

## References

### Citations

1. ^ Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer. pp. 21–24. ISBN 978-0-387-31073-2.

### Sources

Bayes' theorem

In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes' theorem, a person's age can be used to more accurately assess the probability that they have cancer, compared to the assessment of the probability of cancer made without knowledge of the person's age.

One of the many applications of Bayes' theorem is Bayesian inference, a particular approach to statistical inference. When applied, the probabilities involved in Bayes' theorem may have different probability interpretations. With the Bayesian probability interpretation the theorem expresses how a degree of belief, expressed as a probability, should rationally change to account for availability of related evidence. Bayesian inference is fundamental to Bayesian statistics.

Bayes' theorem is named after Reverend Thomas Bayes (1701?–1761), who first used conditional probability to provide an algorithm (his Proposition 9) that uses evidence to calculate limits on an unknown parameter, published as An Essay towards solving a Problem in the Doctrine of Chances (1763). In what he called a scholium, Bayes extended his algorithm to any unknown prior cause. Independently of Bayes, Pierre-Simon Laplace in 1774, and later in his 1812 "Théorie analytique des probabilités", used conditional probability to formulate the relation of an updated posterior probability from a prior probability, given evidence. Sir Harold Jeffreys put Bayes's algorithm and Laplace's formulation on an axiomatic basis. Jeffreys wrote that Bayes' theorem "is to the theory of probability what the Pythagorean theorem is to geometry".

Bayesian inference in phylogeny

Bayesian inference of phylogeny uses a likelihood function to create a quantity called the posterior probability of trees using a model of evolution, based on some prior probabilities, producing the most likely phylogenetic tree for the given data. The Bayesian approach has become popular due to advances in computing speeds and the integration of Markov chain Monte Carlo (MCMC) algorithms. Bayesian inference has a number of applications in molecular phylogenetics and systematics.

Bayesian linear regression

In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference. When the regression model has errors that have a normal distribution, and if a particular form of prior distribution is assumed, explicit results are available for the posterior probability distributions of the model's parameters.

Bayesian probability

Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief. The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses, i.e., the propositions whose truth or falsity is uncertain. In the Bayesian view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability.

Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated to a posterior probability in the light of new, relevant data (evidence). The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.

The term Bayesian derives from the 18th century mathematician and theologian Thomas Bayes, who provided the first mathematical treatment of a non-trivial problem of statistical data analysis using what is now known as Bayesian inference. Mathematician Pierre-Simon Laplace pioneered and popularised what is now called Bayesian probability.

Checking whether a coin is fair

In statistics, the question of checking whether a coin is fair is one whose importance lies, firstly, in providing a simple problem on which to illustrate basic ideas of statistical inference and, secondly, in providing a simple problem that can be used to compare various competing methods of statistical inference, including decision theory. The practical problem of checking whether a coin is fair might be considered as easily solved by performing a sufficiently large number of trials, but statistics and probability theory can provide guidance on two types of question: how many trials to undertake, and how accurate an estimate of the probability of turning up heads is when derived from a given sample of trials.

A fair coin is an idealized randomizing device with two states (usually named "heads" and "tails") which are equally likely to occur. It is based on the coin flip used widely in sports and other situations where it is required to give two parties the same chance of winning. Either a specially designed chip or more usually a simple currency coin is used, although the latter might be slightly "unfair" due to an asymmetrical weight distribution, which might cause one state to occur more frequently than the other, giving one party an unfair advantage. So it might be necessary to test experimentally whether the coin is in fact "fair" – that is, whether the probability of the coin falling on either side when it is tossed is exactly 50%. It is of course impossible to rule out arbitrarily small deviations from fairness such as might be expected to affect only one flip in a lifetime of flipping; also it is always possible for an unfair (or "biased") coin to happen to turn up exactly 10 heads in 20 flips. Therefore, any fairness test must only establish a certain degree of confidence in a certain degree of fairness (a certain maximum bias). In more rigorous terminology, the problem is of determining the parameters of a Bernoulli process, given only a limited sample of Bernoulli trials.

Computational phylogenetics

Computational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa. For example, these techniques have been used to explore the family tree of hominid species and the relationships between specific genes shared by many types of organisms. Traditional phylogenetics relies on morphological data obtained by measuring and quantifying the phenotypic properties of representative organisms, while the more recent field of molecular phylogenetics uses nucleotide sequences encoding genes or amino acid sequences encoding proteins as the basis for classification. Many forms of molecular phylogenetics are closely related to and make extensive use of sequence alignment in constructing and refining phylogenetic trees, which are used to classify the evolutionary relationships between homologous genes represented in the genomes of divergent species. The phylogenetic trees constructed by computational methods are unlikely to perfectly reproduce the evolutionary tree that represents the historical relationships between the species being analyzed. The historical species tree may also differ from the historical tree of an individual homologous gene shared by those species.

Credible interval

In Bayesian statistics, a credible interval is an interval within which an unobserved parameter value falls with a particular subjective probability. It is an interval in the domain of a posterior probability distribution or a predictive distribution. The generalisation to multivariate problems is the credible region. Credible intervals are analogous to confidence intervals in frequentist statistics, although they differ on a philosophical basis; Bayesian intervals treat their bounds as fixed and the estimated parameter as a random variable, whereas frequentist confidence intervals treat their bounds as random variables and the parameter as a fixed value. Also, Bayesian credible intervals use (and indeed, require) knowledge of the situation-specific prior distribution, while the frequentist confidence intervals do not.

For example, in an experiment that determines the distribution of possible values of the parameter ${\displaystyle \mu }$, if the subjective probability that ${\displaystyle \mu }$ lies between 35 and 45 is 0.95, then ${\displaystyle 35\leq \mu \leq 45}$ is a 95% credible interval.

Cromwell's rule

Cromwell's rule, named by statistician Dennis Lindley, states that the use of prior probabilities of 0 ("the event will definitely not occur") or 1 ("the event will definitely occur") should be avoided, except when applied to statements that are logically true or false, such as 2+2 equaling 4 or 5.

The reference is to Oliver Cromwell, who wrote to the General Assembly of the Church of Scotland on 5 August 1650, including a phrase that has become well known and frequently quoted:

I beseech you, in the bowels of Christ, think it possible that you may be mistaken.

As Lindley puts it, assigning a probability should "leave a little probability for the moon being made of green cheese; it can be as small as 1 in a million, but have it there since otherwise an army of astronauts returning with samples of the said cheese will leave you unmoved." Similarly, in assessing the likelihood that tossing a coin will result in either a head or a tail facing upwards, there is a possibility, albeit remote, that the coin will land on its edge and remain in that position.

If the prior probability assigned to a hypothesis is 0 or 1, then, by Bayes' theorem, the posterior probability (probability of the hypothesis, given the evidence) is forced to be 0 or 1 as well; no evidence, no matter how strong, could have any influence.
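This consequence of Bayes' theorem can be demonstrated numerically. The Python sketch below updates a binary hypothesis on a single piece of evidence; the function name and all likelihood values are hypothetical and chosen only for illustration.

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H|E) for a binary hypothesis H via Bayes' theorem."""
    evidence = (likelihood_if_true * prior
                + likelihood_if_false * (1 - prior))
    return likelihood_if_true * prior / evidence


# Evidence that is 1000 times more likely if H is true (hypothetical numbers).
for_h = dict(likelihood_if_true=0.999, likelihood_if_false=0.000999)
# Evidence that is 999 times more likely if H is false (hypothetical numbers).
against_h = dict(likelihood_if_true=0.001, likelihood_if_false=0.999)

print(update(0.0, **for_h))      # 0.0: a prior of 0 cannot be moved
print(update(1.0, **against_h))  # 1.0: a prior of 1 cannot be moved
print(update(1e-6, **for_h))     # a tiny nonzero prior rises to roughly 0.001
```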

A strengthened version of Cromwell's rule, applying also to statements of arithmetic and logic, alters the first rule of probability, or the convexity rule, 0 ≤ Pr(A) ≤ 1, to 0 < Pr(A) < 1.

Jump diffusion

Jump diffusion is a stochastic process that involves jumps and diffusion. It has important applications in magnetic reconnection, coronal mass ejections, condensed matter physics, in Pattern theory and computational vision and in option pricing.

Lindley's paradox

Lindley's paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys' 1939 textbook; it became known as Lindley's paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper. Although referred to as a paradox, the differing results from the Bayesian and frequentist approaches can be explained as using them to answer fundamentally different questions, rather than actual disagreement between the two methods.

Nevertheless, for a large class of priors the differences between the frequentist and Bayesian approach are caused by keeping the significance level fixed: as even Lindley recognized, "the theory does not justify the practice of keeping the significance level fixed" and even "some computations by Prof. Pearson in the discussion to that paper emphasized how the significance level would have to change with the sample size, if the losses and prior probabilities were kept fixed." In fact, if the critical value increases with the sample size suitably fast, then the disagreement between the frequentist and Bayesian approaches becomes negligible as the sample size increases.

Maximum a posteriori estimation

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity, that equals the mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to the method of maximum likelihood (ML) estimation, but employs an augmented optimization objective which incorporates a prior distribution (that quantifies the additional information available through prior knowledge of a related event) over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.
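The regularizing effect of the prior can be seen in a small Python sketch. It assumes a coin-tossing model with a Beta(α, β) prior on the probability of heads, for which the posterior mode has a simple closed form; the model, the data and the function names are illustrative assumptions, not taken from the text above.

```python
from fractions import Fraction


def ml_estimate(heads, tosses):
    """Maximum likelihood estimate of the probability of heads."""
    return Fraction(heads, tosses)


def map_estimate(heads, tosses, alpha, beta):
    """Mode of the Beta(alpha + heads, beta + tails) posterior.

    Assumes both posterior parameters exceed 1, so the mode is interior.
    """
    return Fraction(heads + alpha - 1, tosses + alpha + beta - 2)


# Hypothetical data: 7 heads in 10 tosses.
print(ml_estimate(7, 10))         # 7/10
print(map_estimate(7, 10, 2, 2))  # 2/3: the Beta(2, 2) prior pulls toward 1/2
print(map_estimate(7, 10, 1, 1))  # 7/10: a uniform prior reproduces ML
```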

Normalizing constant

The concept of a normalizing constant arises in probability theory and a variety of other areas of mathematics. The normalizing constant is used to reduce any probability function to a probability density function with total probability of one.

Occupancy grid mapping

Occupancy Grid Mapping refers to a family of computer algorithms in probabilistic robotics for mobile robots which address the problem of generating maps from noisy and uncertain sensor measurement data, with the assumption that the robot pose is known.

The basic idea of the occupancy grid is to represent a map of the environment as an evenly spaced field of binary random variables each representing the presence of an obstacle at that location in the environment. Occupancy grid algorithms compute approximate posterior estimates for these random variables.

Phrymaceae

Phrymaceae, also known as the lopseed family, is a small family of flowering plants in the order Lamiales. It has a nearly cosmopolitan distribution, but is concentrated in two centers of diversity, one in Australia, the other in western North America. Members of this family occur in diverse habitats, including deserts, river banks and mountains.

Phrymaceae is a family of mostly herbs and a few subshrubs, bearing tubular, bilaterally symmetric flowers. They can be annuals or perennials. Some of the Australian genera are aquatic or semiaquatic. One of these, Glossostigma, is among the smallest of flowering plants, larger than the aquatic Lemna but similar in size to the terrestrial Lepuropetalon. The smallest members of Phrymaceae are only a few centimeters long, while the largest are woody shrubs to 4 m tall. The floral structure of Phrymaceae is variable, to such an extent that a morphological assessment is difficult. Reproduction is also variable, being brought about by different mating systems which may be sexual or asexual, and may involve outcrossing, self-fertilization, or mixed mating. Some are pollinated by insects, others by hummingbirds. The most common fruit type in this family is a dehiscent capsule containing numerous seeds, but exceptions exist such as an achene, in Phryma leptostachya, or a berry-like fruit in Leucocarpus.

About 16 species are in cultivation. They are known horticulturally as "Mimulus" and were formerly placed in the genus Mimulus when it was defined broadly to include about 150 species. Mimulus, as a botanical name, rather than a common name or horticultural name, now represents a genus of only seven species. Most of its former species have been transferred to Diplacus or Erythranthe. Six of the horticultural species are of special importance. These are Diplacus aurantiacus, Diplacus puniceus, Erythranthe cardinalis, Erythranthe guttata, Erythranthe lutea, and Erythranthe cuprea.

Phrymaceae has recently become a model system for evolutionary studies.

Within the order Lamiales, Phrymaceae is a member of an unnamed clade of five families. This clade has the topology of a phylogenetic grade and can therefore be represented as {Mazaceae [Phrymaceae (Paulowniaceae )]}. Two of these families, Mazaceae and Rehmanniaceae, are not part of the APG III system. They were not formally validated until 2011.

The composition of Phrymaceae and the delimitation of genera changed radically from 2002 to 2012 as a result of molecular phylogenetic studies. Previously, Phrymaceae had been monotypic with Phryma leptostachya as its only species. It was limited in geographic range to eastern North America and eastern China. Phryma had been previously placed by Cronquist in Verbenaceae. Research on phylogenetic relationships revealed that several genera, traditionally included in Scrophulariaceae, were actually more closely related to Phryma than to Scrophularia. These genera became part of an expanded Phrymaceae. Mazus and Lancea were included in Phrymaceae for a short time before further studies indicated that they, along with Dodartia, should be segregated as a new family, Mazaceae.

As currently understood, Phrymaceae consists of about 210 species in 13 genera. Erythranthe (111 species) and Diplacus (46 species) are much larger than the other genera. Phrymaceae is distributed nearly worldwide but with the majority of species in western North America (about 130 species) and Australia (about 30 species). Phrymaceae consists of four clades, all of which have strong statistical support in cladistic analyses of DNA sequences. No relationships among these four clades have been strongly supported by the bootstrap or posterior probability assessments of clade support in any of the datasets that have been produced so far. One of the four main clades consists of a single species, Phryma leptostachya. Another consists of Mimulus sensu stricto (seven species) and six genera that have an Australian distribution. The other two clades have an American-Asian disjunct distribution. One of these includes the large genus Diplacus, while the other of these includes the other large genus, Erythranthe.

Estimates of the number of species in Phrymaceae have varied widely because of a lack of clear differences between species in certain genera, especially Diplacus and Erythranthe. When these two genera have been treated as segregates of Mimulus, the number of species assigned to Mimulus sensu lato has ranged from about 90 to about 150. A 2008 paper indicates that the actual number of species is well over 150. In 2012, a revision of Phrymaceae recognized 188 species in the family, but noted that 17 species from Australia and five from North America would be named and described in future publications. Ten of those unnamed species will be in Peplidium, raising the number of species in that genus from four to 14.

Posterior

Posterior may refer to:

Posterior (anatomy), the end of an organism opposite to its head

Buttocks, as a euphemism

Posterior probability, the conditional probability that is assigned when the relevant evidence is taken into account

Posterior tense, a relative future tense

Predictive probability of success

Predictive probability of success (PPOS) is a statistics concept commonly used in the pharmaceutical industry including by health authorities to support decision making. In clinical trials, PPOS is the probability of observing a success in the future based on existing data. It is one type of probability of success. A Bayesian means by which the PPOS can be determined is through integrating the data's likelihood over possible future responses (posterior distribution).

Prior probability

In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable.

Bayes' theorem calculates the renormalized pointwise product of the prior and the likelihood function, to produce the posterior probability distribution, which is the conditional distribution of the uncertain quantity given the data.

Similarly, the prior probability of a random event or an uncertain proposition is the unconditional probability that is assigned before any relevant evidence is taken into account.

Priors can be created using a number of methods. A prior can be determined from past information, such as previous experiments. A prior can be elicited from the purely subjective assessment of an experienced expert. An uninformative prior can be created to reflect a balance among outcomes when no information is available. Priors can also be chosen according to some principle, such as symmetry or maximizing entropy given constraints; examples are the Jeffreys prior or Bernardo's reference prior. When a family of conjugate priors exists, choosing a prior from that family simplifies calculation of the posterior distribution.

Parameters of prior distributions are a kind of hyperparameter. For example, if one uses a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then:

p is a parameter of the underlying system (Bernoulli distribution), and

α and β are parameters of the prior distribution (beta distribution); hence hyperparameters.

Hyperparameters themselves may have hyperprior distributions expressing beliefs about their values. A Bayesian model with more than one level of prior like this is called a hierarchical Bayes model.
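For the beta prior on a Bernoulli parameter mentioned above, the conjugate update amounts to adding observed counts to the hyperparameters. The Python sketch below shows this; the starting hyperparameters, the data and the function names are hypothetical.

```python
def beta_update(alpha, beta, heads, tails):
    """Posterior hyperparameters of a beta prior after Bernoulli observations."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior hyperparameters and data (7 successes, 3 failures).
alpha_prior, beta_prior = 2, 2
alpha_post, beta_post = beta_update(alpha_prior, beta_prior, heads=7, tails=3)

print(alpha_post, beta_post)             # 9 5
print(beta_mean(alpha_prior, beta_prior))  # 0.5
print(beta_mean(alpha_post, beta_post))
```

Because the posterior stays in the beta family, no integration is needed to obtain it, which is the simplification that conjugate priors provide.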

Probabilistic neural network

A probabilistic neural network (PNN) is a feedforward neural network, which is widely used in classification and pattern recognition problems. In the PNN algorithm, the parent probability distribution function (PDF) of each class is approximated by a Parzen window and a non-parametric function. Then, using PDF of each class, the class probability of a new input data is estimated and Bayes’ rule is then employed to allocate the class with highest posterior probability to new input data. By this method, the probability of mis-classification is minimized. This type of ANN was derived from the Bayesian network and a statistical algorithm called Kernel Fisher discriminant analysis. It was introduced by D.F. Specht in 1966. In a PNN, the operations are organized into a multilayered feedforward network with four layers:

Input layer

Pattern layer

Summation layer

Output layer

Template modeling score

In bioinformatics, the template modeling score or TM-score is a measure of similarity between two protein structures with different tertiary structures. The TM-score is intended as a more accurate measure of the quality of full-length protein structures than the often used RMSD measure. The TM-score indicates the difference between two structures by a score in the range ${\displaystyle (0,1]}$, where 1 indicates a perfect match between two structures (thus the higher the better). Generally, scores below 0.20 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 assume roughly the same fold. A quantitative study shows that proteins of TM-score = 0.5 have a posterior probability of 37% of being in the same CATH topology family and of 13% of being in the same SCOP fold family. The probabilities increase rapidly when TM-score > 0.5. The TM-score is designed to be independent of protein lengths. The Global Distance Test (GDT) algorithm, and its GDT TS score to represent "total score", is another measure of similarity between two protein structures with known amino acid correspondences (e.g. identical amino acid sequences) but different tertiary structures.

This page is based on a Wikipedia article written by its authors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.