# Almost surely

In probability theory, one says that an event happens almost surely (sometimes abbreviated as a.s.) if it happens with probability one. In other words, the set of possible exceptions may be non-empty, but it has probability zero. The concept is precisely the same as the concept of "almost everywhere" in measure theory.

In probability experiments on a finite sample space, there is often no difference between almost surely and surely. However, the distinction becomes important when the sample space is an infinite set, because an infinite set can have non-empty subsets of probability zero.

Some examples of the use of this concept include the strong and uniform versions of the law of large numbers, and the continuity of the paths of Brownian motion.

The terms almost certainly (a.c.) and almost always (a.a.) are also used. Almost never describes the opposite of almost surely: an event that happens with probability zero happens almost never.[1]

## Formal definition

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space. An event ${\displaystyle E\in {\mathcal {F}}}$ happens almost surely if ${\displaystyle P(E)=1}$. Equivalently, ${\displaystyle E}$ happens almost surely if the probability of ${\displaystyle E}$ not occurring is zero: ${\displaystyle P(E^{C})=0}$. More generally, any event ${\displaystyle E\subseteq \Omega }$ (not necessarily in ${\displaystyle {\mathcal {F}}}$) happens almost surely if ${\displaystyle E^{C}}$ is contained in a null set: a subset of some ${\displaystyle N\in {\mathcal {F}}}$ such that ${\displaystyle P(N)=0}$.[2] The notion of almost sureness depends on the probability measure ${\displaystyle P}$. If it is necessary to emphasize this dependence, it is customary to say that the event ${\displaystyle E}$ occurs P-almost surely, or almost surely (P).

## Illustrative examples

In general, an event can happen "almost surely" even if the probability space in question includes outcomes which do not belong to the event, as is illustrated in the examples below.

### Throwing a dart

Imagine throwing a dart at a unit square (i.e. a square with area 1) so that the dart always hits exactly one point of the square, and so that each point in the square is equally likely to be hit.

Now, notice that since the square has area 1, the probability that the dart will hit any particular subregion of the square equals the area of that subregion. For example, the probability that the dart will hit the right half of the square is 0.5, since the right half has area 0.5.

Next, consider the event that "the dart hits a diagonal of the unit square exactly". Since the areas of the diagonals of the square are zero, the probability that the dart lands exactly on a diagonal is zero. So, the dart will almost never land on a diagonal (i.e. it will almost surely not land on a diagonal). Nonetheless the set of points on the diagonals is not empty and a point on a diagonal is no less possible than any other point: the diagonal does contain valid outcomes of the experiment.
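The dart experiment can be imitated numerically. The following is a minimal sketch, not a definitive model: the function name `simulate_darts`, the seed and the number of throws are illustrative assumptions. Floating-point numbers form a finite grid, so on a computer a hit on a diagonal has an extremely small but strictly positive probability, rather than the probability zero of the idealised experiment.

```python
import random

def simulate_darts(n_throws, seed=0):
    """Throw n_throws uniformly distributed darts at the unit square.

    Returns the fraction landing in the right half and the fraction
    landing exactly on one of the two diagonals.
    """
    rng = random.Random(seed)
    right_half = 0
    on_diagonal = 0
    for _ in range(n_throws):
        x, y = rng.random(), rng.random()
        if x >= 0.5:
            right_half += 1
        # Main diagonal (y = x) or anti-diagonal (y = 1 - x).
        if y == x or y == 1.0 - x:
            on_diagonal += 1
    return right_half / n_throws, on_diagonal / n_throws

frac_right, frac_diag = simulate_darts(100_000)
print("right half:", frac_right)
print("on a diagonal:", frac_diag)
```

The first fraction should be close to the area 0.5 of the right half, while exact hits on a diagonal are not expected to show up in any simulation of realistic size.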

### Tossing a coin repeatedly

Consider the case where a (possibly biased) coin is tossed, corresponding to the probability space ${\displaystyle (\{H,T\},2^{\{H,T\}},P)}$, where the event ${\displaystyle \{H\}}$ occurs if heads is flipped, and ${\displaystyle \{T\}}$ if tails. For this particular coin, assume the probability of flipping heads is ${\displaystyle P(H)=p\in (0,1)}$ from which it follows that the complement event, flipping tails, has ${\displaystyle P(T)=1-p}$.

Suppose we conduct an experiment in which the coin is tossed repeatedly, with outcomes ${\displaystyle \omega _{1},\omega _{2},\ldots }$, and each flip's outcome is assumed to be independent of all the others; that is, the flips are i.i.d. Define the sequence of random variables on the coin toss space, ${\displaystyle (X_{i})_{i\in \mathbb {N} }}$ where ${\displaystyle X_{i}(\omega )=\omega _{i}}$, so that each ${\displaystyle X_{i}}$ records the outcome of the ${\displaystyle i}$-th flip.

Any infinite sequence of heads and tails is a possible outcome of the experiment. However, any particular infinite sequence of heads and tails has probability zero of being the exact outcome of the (infinite) experiment. To see why, note that the i.i.d. assumption implies that the probability of flipping all heads over ${\displaystyle n}$ flips is simply ${\displaystyle P(X_{i}=H,\ i=1,2,\dots ,n)=\left(P(X_{1}=H)\right)^{n}=p^{n}}$. Letting ${\displaystyle n\rightarrow \infty }$ yields zero, since ${\displaystyle p\in (0,1)}$ by assumption. The same argument applies to any other fixed sequence: its first ${\displaystyle n}$ flips occur with probability at most ${\displaystyle \max(p,1-p)^{n}}$, which also tends to zero. The result is the same no matter how much we bias the coin towards heads, so long as ${\displaystyle p}$ is strictly between 0 and 1.

In particular, the event "the sequence contains at least one ${\displaystyle T}$" happens almost surely (i.e., with probability 1). However, if instead of an infinite number of flips we stop flipping after some finite time, say a million flips, then the all-heads sequence has non-zero probability. The all-heads sequence has probability ${\displaystyle p^{1,000,000}\neq 0}$, while the probability of getting at least one tails is ${\displaystyle 1-p^{1,000,000}}$ and the event is no longer almost sure.
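The finite-flip probabilities above can be tabulated exactly. The sketch below is only an illustration: the bias `p = 9/10` and the helper names `prob_all_heads` and `prob_at_least_one_tail` are assumptions, and smaller values of ${\displaystyle n}$ than a million are used to keep the exact arithmetic cheap.

```python
from fractions import Fraction

def prob_all_heads(p, n):
    """Probability that n independent flips all show heads."""
    return p ** n

def prob_at_least_one_tail(p, n):
    """Probability that at least one of n independent flips shows tails."""
    return 1 - p ** n

# A coin heavily biased towards heads (illustrative value).
p = Fraction(9, 10)
for n in (1, 10, 100, 1000):
    print(n, float(prob_all_heads(p, n)), float(prob_at_least_one_tail(p, n)))
```

For every finite ${\displaystyle n}$ the all-heads probability stays strictly positive, so "at least one tails" is not almost sure; only in the limit does it reach probability 1.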

## Asymptotically almost surely

In asymptotic analysis, one says that a property holds asymptotically almost surely (a.a.s.) if, over a sequence of sets, the probability converges to 1. For instance, a large number is asymptotically almost surely composite, by the prime number theorem; and in random graph theory, the statement "${\displaystyle G(n,p_{n})}$ is connected" (where ${\displaystyle G(n,p)}$ denotes the graphs on ${\displaystyle n}$ vertices with edge probability ${\displaystyle p}$) is true a.a.s. when, for some ${\displaystyle \varepsilon >0}$

${\displaystyle p_{n}>{\tfrac {(1+\varepsilon )\ln n}{n}}}$.[3]

In number theory this is referred to as "almost all", as in "almost all numbers are composite". Similarly, in graph theory, this is sometimes referred to as "almost surely".[4]
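The statement that almost all numbers are composite can be checked numerically for small ranges. The following sketch uses a sieve of Eratosthenes; the function name `composite_fraction` and the chosen limits are illustrative assumptions.

```python
def composite_fraction(limit):
    """Fraction of the integers 2..limit that are composite."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    # Sieve of Eratosthenes: cross out the multiples of each prime.
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    n_primes = sum(is_prime)
    return 1 - n_primes / (limit - 1)

for limit in (10**2, 10**4, 10**6):
    print(limit, composite_fraction(limit))
```

The printed fraction grows towards 1 as the limit increases, although it never equals 1 for any finite limit.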

## Notes

1. ^ Grädel, Erich; Kolaitis, Phokion G.; Libkin, Leonid; Marx, Maarten; Spencer, Joel; Vardi, Moshe Y.; Venema, Yde; Weinstein, Scott (2007). Finite Model Theory and Its Applications. Springer. p. 232. ISBN 978-3-540-00428-8.
2. ^ Jacod, Jean; Protter, Philip (2004). Probability Essentials. Springer. p. 37. ISBN 978-3-540-43871-7.
3. ^ Friedgut, Ehud; Rödl, Vojtech; Rucinski, Andrzej; Tetali, Prasad (January 2006). "A Sharp Threshold for Random Graphs with a Monochromatic Triangle in Every Edge Coloring". Memoirs of the American Mathematical Society. AMS Bookstore. 179 (845): 3–4. ISSN 0065-9266.
4. ^ Spencer, Joel H. (2001). "0. Two Starting Examples". The Strange Logic of Random Graphs. Algorithms and Combinatorics. 22. Springer. p. 4. ISBN 978-3540416548.

## References

• Rogers, L. C. G.; Williams, David (2000). Diffusions, Markov Processes, and Martingales. 1: Foundations. Cambridge University Press. ISBN 978-0521775946.
• Williams, David (1991). Probability with Martingales. Cambridge Mathematical Textbooks. Cambridge University Press. ISBN 978-0521406055.
Almost everywhere

In measure theory (a branch of mathematical analysis), a property holds almost everywhere if, in a technical sense, the set for which the property holds takes up nearly all possibilities. The notion of almost everywhere is a companion notion to the concept of measure zero. In the subject of probability, which is largely based in measure theory, the notion is referred to as almost surely.

More specifically, a property holds almost everywhere if the set of elements for which the property does not hold is a set of measure zero (Halmos 1974), or equivalently if the set of elements for which the property holds is conull. In cases where the measure is not complete, it is sufficient that the set is contained within a set of measure zero. When discussing sets of real numbers, the Lebesgue measure is assumed unless otherwise stated.

The term almost everywhere is abbreviated a.e.; in older literature p.p. is used, to stand for the equivalent French language phrase presque partout.

A set with full measure is one whose complement is of measure zero. In probability theory, the terms almost surely, almost certain and almost always refer to events with probability 1, which are exactly the sets of full measure in a probability space.

Occasionally, instead of saying that a property holds almost everywhere, it is said that the property holds for almost all elements (though the term almost all also has other meanings).

Convergence of random variables

In probability theory, there exist several different notions of convergence of random variables. The convergence of sequences of random variables to some limit random variable is an important concept in probability theory, and its applications to statistics and stochastic processes. The same concepts are known in more general mathematics as stochastic convergence and they formalize the idea that a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behaviour that is essentially unchanging when items far enough into the sequence are studied. The different possible notions of convergence relate to how such a behaviour can be characterised: two readily understood behaviours are that the sequence eventually takes a constant value, and that values in the sequence continue to change but can be described by an unchanging probability distribution.

Degenerate distribution

In mathematics, a degenerate distribution is a probability distribution in a space (discrete or continuous) with support only on a space of lower dimension. If the degenerate distribution is univariate (involving only a single random variable) it is a deterministic distribution and takes only a single value. Examples include a two-headed coin and rolling a die whose sides all show the same number. This distribution satisfies the definition of "random variable" even though it does not appear random in the everyday sense of the word; hence it is considered degenerate.

In the case of a real-valued random variable, the degenerate distribution is localized at a point k0 on the real line. The probability mass function equals 1 at this point and 0 elsewhere.

The degenerate univariate distribution can be viewed as the limiting case of a continuous distribution whose variance goes to 0 causing the probability density function to be a delta function at k0, with infinite height there but area equal to 1.

The cumulative distribution function of the univariate degenerate distribution is:

${\displaystyle F_{k_{0}}(x)=\left\{{\begin{matrix}1,&{\mbox{if }}x\geq k_{0}\\0,&{\mbox{if }}x<k_{0}\end{matrix}}\right.}$

Diffusion process

In probability theory and statistics, a diffusion process is a solution to a stochastic differential equation. It is a continuous-time Markov process with almost surely continuous sample paths. Brownian motion, reflected Brownian motion and Ornstein–Uhlenbeck processes are examples of diffusion processes.

A sample path of a diffusion process models the trajectory of a particle embedded in a flowing fluid and subjected to random displacements due to collisions with other particles, which is called Brownian motion. The position of the particle is then random; its probability density function as a function of space and time is governed by an advection-diffusion equation.

Doob's martingale convergence theorems

In mathematics – specifically, in the theory of stochastic processes – Doob's martingale convergence theorems are a collection of results on the long-time limits of supermartingales, named after the American mathematician Joseph L. Doob.

Doob decomposition theorem

In the theory of stochastic processes in discrete time, a part of the mathematical theory of probability, the Doob decomposition theorem gives a unique decomposition of every adapted and integrable stochastic process as the sum of a martingale and a predictable process (or "drift") starting at zero. The theorem was proved by and is named for Joseph L. Doob. The analogous theorem in the continuous-time case is the Doob–Meyer decomposition theorem.

Erdős–Rényi model

In the mathematical field of graph theory, the Erdős–Rényi model is either of two closely related models for generating random graphs. They are named after mathematicians Paul Erdős and Alfréd Rényi, who first introduced one of the models in 1959, while Edgar Gilbert introduced the other model contemporaneously and independently of Erdős and Rényi. In the model of Erdős and Rényi, all graphs on a fixed vertex set with a fixed number of edges are equally likely; in the model introduced by Gilbert, each edge has a fixed probability of being present or absent, independently of the other edges. These models can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs.
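Gilbert's model, in which each edge is present independently with a fixed probability, is straightforward to sample. The sketch below is one possible implementation under stated assumptions: the names `sample_gnp` and `is_connected`, the adjacency-set representation and the seed are all illustrative choices, not part of the model's definition.

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=0):
    """Sample a graph on vertices 0..n-1 with each edge present with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def is_connected(adj):
    """Depth-first search from an arbitrary vertex; True if every vertex is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

graph = sample_gnp(200, 0.05)
print("connected:", is_connected(graph))
```

Repeating the experiment over many seeds and increasing values of `n` gives an empirical estimate of how often a property such as connectivity holds.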

Glivenko–Cantelli theorem

In the theory of probability, the Glivenko–Cantelli theorem, named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, determines the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows. The uniform convergence of more general empirical measures becomes an important property of the Glivenko–Cantelli classes of functions or sets. The Glivenko–Cantelli classes arise in Vapnik–Chervonenkis theory, with applications to machine learning. Applications can be found in econometrics making use of M-estimators.

Assume that ${\displaystyle X_{1},X_{2},\dots }$ are independent and identically-distributed random variables in ${\displaystyle \mathbb {R} }$ with common cumulative distribution function ${\displaystyle F(x)}$. The empirical distribution function for ${\displaystyle X_{1},\dots ,X_{n}}$ is defined by

${\displaystyle F_{n}(x)={\frac {1}{n}}\sum _{i=1}^{n}I_{[X_{i},\infty )}(x)}$

where ${\displaystyle I_{C}}$ is the indicator function of the set ${\displaystyle C}$. For every (fixed) ${\displaystyle x}$, ${\displaystyle F_{n}(x)}$ is a sequence of random variables which converges to ${\displaystyle F(x)}$ almost surely by the strong law of large numbers; that is, ${\displaystyle F_{n}}$ converges to ${\displaystyle F}$ pointwise. Glivenko and Cantelli strengthened this result by proving uniform convergence of ${\displaystyle F_{n}}$ to ${\displaystyle F}$.

Theorem

${\displaystyle \|F_{n}-F\|_{\infty }=\sup _{x\in \mathbb {R} }|F_{n}(x)-F(x)|\longrightarrow 0}$ almost surely.

This theorem originates with Valery Glivenko and Francesco Cantelli, in 1933.
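The uniform distance ${\displaystyle \|F_{n}-F\|_{\infty }}$ can be computed exactly for a sample when ${\displaystyle F}$ is known. The sketch below assumes, purely for illustration, that the observations are uniform on the unit interval, so that ${\displaystyle F(x)=x}$ there; the function name `sup_distance_uniform` and the seed are likewise assumptions.

```python
import random

def sup_distance_uniform(n, seed=0):
    """sup_x |F_n(x) - F(x)| for n i.i.d. Uniform(0, 1) samples, where F(x) = x."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))
    # F_n jumps from i/n to (i + 1)/n at the i-th order statistic (0-indexed),
    # so the supremum is attained just before or at one of the jumps.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

for n in (100, 10_000, 100_000):
    print(n, sup_distance_uniform(n))
```

The printed distances should shrink as the sample size grows, in line with the almost sure convergence to zero stated by the theorem.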

Hewitt–Savage zero–one law

The Hewitt–Savage zero–one law is a theorem in probability theory, similar to Kolmogorov's zero–one law and the Borel–Cantelli lemma, that specifies that a certain type of event will either almost surely happen or almost surely not happen. It is sometimes known as the Hewitt–Savage law for symmetric events. It is named after Edwin Hewitt and Leonard Jimmie Savage.

Infinite monkey theorem

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, such as the complete works of William Shakespeare. In fact, the monkey would almost surely type every possible finite text an infinite number of times. However, the probability that monkeys filling the observable universe would type a complete work such as Shakespeare's Hamlet is so tiny that the chance of it occurring during a period of time hundreds of thousands of orders of magnitude longer than the age of the universe is extremely low (but technically not zero).

In this context, "almost surely" is a mathematical term with a precise meaning, and the "monkey" is not an actual monkey, but a metaphor for an abstract device that produces an endless random sequence of letters and symbols. One of the earliest instances of the use of the "monkey metaphor" is that of French mathematician Émile Borel in 1913, but the first instance may have been even earlier.
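A standard way to make the claim concrete is to split the random output into disjoint blocks, each as long as the target text, and ask whether at least one block matches. The sketch below makes several illustrative assumptions: a keyboard with 26 equally likely keys, a six-letter target such as "banana", and the hypothetical function name `prob_text_in_blocks`. Restricting attention to disjoint blocks only gives a lower bound on the chance of the text appearing anywhere.

```python
import math

def prob_text_in_blocks(alphabet_size, text_length, n_blocks):
    """P(at least one of n_blocks disjoint, independent blocks equals the target text)."""
    q = alphabet_size ** (-text_length)  # chance that a single block matches
    # 1 - (1 - q)**n_blocks, evaluated stably for very small q.
    return -math.expm1(n_blocks * math.log1p(-q))

for n_blocks in (10**6, 10**9, 10**12):
    print(n_blocks, prob_text_in_blocks(26, 6, n_blocks))
```

The probability is below 1 for every finite number of blocks and approaches 1 as the number of blocks grows, which is the sense in which the text is typed almost surely.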

Variants of the theorem include multiple and even infinitely many typists, and the target text varies between an entire library and a single sentence. Jorge Luis Borges traced the history of this idea from Aristotle's On Generation and Corruption and Cicero's De natura deorum (On the Nature of the Gods), through Blaise Pascal and Jonathan Swift, up to modern statements with their iconic simians and typewriters. In the early 20th century, Borel and Arthur Eddington used the theorem to illustrate the timescales implicit in the foundations of statistical mechanics.

Kolmogorov's zero–one law

In probability theory, Kolmogorov's zero–one law, named in honor of Andrey Nikolaevich Kolmogorov, specifies that a certain type of event, called a tail event, will either almost surely happen or almost surely not happen; that is, the probability of such an event occurring is zero or one.

Tail events are defined in terms of infinite sequences of random variables. Suppose

${\displaystyle X_{1},X_{2},X_{3},\dots }$

is an infinite sequence of independent random variables (not necessarily identically distributed). Let ${\displaystyle {\mathcal {F}}}$ be the σ-algebra generated by the ${\displaystyle X_{i}}$. Then, a tail event ${\displaystyle F\in {\mathcal {F}}}$ is an event which is probabilistically independent of each finite subset of these random variables. (Note: ${\displaystyle F}$ belonging to ${\displaystyle {\mathcal {F}}}$ implies that membership in ${\displaystyle F}$ is uniquely determined by the values of the ${\displaystyle X_{i}}$ but the latter condition is strictly weaker and does not suffice to prove the zero-one law.) For example, the event that the sequence converges, and the event that its sum converges are both tail events. In an infinite sequence of coin-tosses, a sequence of 100 consecutive heads occurring infinitely many times is a tail event.

Intuitively, tail events are precisely those events whose occurrence can still be determined if an arbitrarily large but finite initial segment of the ${\displaystyle X_{i}}$ are removed.

In many situations, it can be easy to apply Kolmogorov's zero–one law to show that some event has probability 0 or 1, but surprisingly hard to determine which of these two extreme values is the correct one.

Law of large numbers

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the law only applies (as the name indicates) when a large number of observations is considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).
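The behaviour described above can be observed in a small simulation. The sketch below assumes a fair six-sided die, whose expected value is 3.5; the function name `running_mean_of_die` and the seed are illustrative.

```python
import random

def running_mean_of_die(n_rolls, seed=0):
    """Running averages of n_rolls independent rolls of a fair six-sided die."""
    rng = random.Random(seed)
    total = 0
    means = []
    for k in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / k)
    return means

means = running_mean_of_die(100_000)
for k in (10, 1_000, 100_000):
    print(k, means[k - 1])
```

Early averages can be far from 3.5, and nothing forces a short streak to be "balanced"; only the long-run average settles near the expected value.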

Law of the iterated logarithm

In probability theory, the law of the iterated logarithm describes the magnitude of the fluctuations of a random walk. The original statement of the law of the iterated logarithm is due to A. Y. Khinchin (1924). Another statement was given by A. N. Kolmogorov in 1929.

Liliopsida

Liliopsida Batsch (synonym: Liliatae) is a botanical name for the class containing the family Liliaceae (or Lily Family). It is considered synonymous (or nearly synonymous) with the name monocotyledon. Publication of the name is credited to Scopoli (in 1760): see author citation (botany). This name is formed by replacing the termination -aceae in the name Liliaceae by the termination -opsida (Art 16 of the ICBN).

Although in principle it is true that circumscription of this class will vary with the taxonomic system being used, in practice this name is very strongly linked to the Cronquist system, and the allied Takhtajan system. These two are the only major systems to use the name, and in both these systems it refers to the group more widely known as the monocotyledons. Earlier systems referred to this group by the name Monocotyledones, with Monocotyledoneae an earlier spelling (these names may be used in any rank). Systems such as the Dahlgren and Thorne systems (more recent than the Takhtajan and Cronquist systems) refer to this group by the name Liliidae (a name in the rank of subclass). Modern systems, such as the APG and APG II systems refer to this group by the name monocots (a name for a clade). Therefore, in practice the name Liliopsida will almost surely refer to the usage as in the Cronquist system.

In summary, the monocotyledons were named:

• Monocotyledoneae in the de Candolle system and the Engler system
• Monocotyledones in the Bentham & Hooker system and the Wettstein system
• class Liliatae and later Liliopsida in the Takhtajan system
• class Liliopsida in the Cronquist system (also in the Reveal system)
• subclass Liliidae in the Dahlgren system and the Thorne system (1992)
• clade monocots in the APG system, the APG II system and the APG III system

Each of the systems mentioned above uses its own internal taxonomy for the group.

Local martingale

In mathematics, a local martingale is a type of stochastic process, satisfying the localized version of the martingale property. Every martingale is a local martingale; every bounded local martingale is a martingale; in particular, every local martingale that is bounded from below is a supermartingale, and every local martingale that is bounded from above is a submartingale; however, in general a local martingale is not a martingale, because its expectation can be distorted by large values of small probability. In particular, a driftless diffusion process is a local martingale, but not necessarily a martingale.

Local martingales are essential in stochastic analysis, see Itō calculus, semimartingale, Girsanov theorem.

Optional stopping theorem

In probability theory, the optional stopping theorem (or Doob's optional sampling theorem) says that, under certain conditions, the expected value of a martingale at a stopping time is equal to its initial expected value. Since martingales can be used to model the wealth of a gambler participating in a fair game, the optional stopping theorem says that, on average, nothing can be gained by stopping play based on the information obtainable so far (i.e., without looking into the future). Certain conditions are necessary for this result to hold true. In particular, the theorem applies to doubling strategies.

The optional stopping theorem is an important tool of mathematical finance in the context of the fundamental theorem of asset pricing.

Sample-continuous process

In mathematics, a sample-continuous process is a stochastic process whose sample paths are almost surely continuous functions.

Weakly measurable function

In mathematics—specifically, in functional analysis—a weakly measurable function taking values in a Banach space is a function whose composition with any element of the dual space is a measurable function in the usual (strong) sense. For separable spaces, the notions of weak and strong measurability agree.

Wiener process

In mathematics, the Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often called standard Brownian motion process or Brownian motion due to its historical connection with the physical process known as Brownian movement or Brownian motion originally observed by Robert Brown. It is one of the best known Lévy processes (càdlàg stochastic processes with stationary independent increments) and occurs frequently in pure and applied mathematics, economics, quantitative finance, evolutionary biology, and physics.

The Wiener process plays an important role in both pure and applied mathematics. In pure mathematics, the Wiener process gave rise to the study of continuous time martingales. It is a key process in terms of which more complicated stochastic processes can be described. As such, it plays a vital role in stochastic calculus, diffusion processes and even potential theory. It is the driving process of Schramm–Loewner evolution. In applied mathematics, the Wiener process is used to represent the integral of a white noise Gaussian process, and so is useful as a model of noise in electronics engineering (see Brownian noise), instrument errors in filtering theory and unknown forces in control theory.

The Wiener process has applications throughout the mathematical sciences. In physics it is used to study Brownian motion, the diffusion of minute particles suspended in fluid, and other types of diffusion via the Fokker–Planck and Langevin equations. It also forms the basis for the rigorous path integral formulation of quantum mechanics (by the Feynman–Kac formula, a solution to the Schrödinger equation can be represented in terms of the Wiener process) and the study of eternal inflation in physical cosmology. It is also prominent in the mathematical theory of finance, in particular the Black–Scholes option pricing model.

This page is based on a Wikipedia article.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.