The effective population size is the number of individuals that an idealised population would need to have in order for some specified quantity of interest to be the same in the idealised population as in the real population. Idealised populations are based on unrealistic but convenient simplifications such as random mating, simultaneous birth of each new generation, constant population size, and equal numbers of children per parent. In some simple scenarios, the effective population size is the number of breeding individuals in the population. However, for most quantities of interest and most real populations, the census population size N of a real population is usually larger than the effective population size Ne. The same population may have multiple effective population sizes, for different properties of interest, including for different genetic loci.
The effective population size is most commonly measured with respect to the coalescence time. In an idealised diploid population with no selection at any locus, the expectation of the coalescence time in generations is equal to twice the census population size. The effective population size is measured as within-species genetic diversity divided by four times the mutation rate $\mu$, because in such an idealised population, the heterozygosity is equal to $4N\mu$. In a population with selection at many loci and abundant linkage disequilibrium, the coalescent effective population size may not reflect the census population size at all, or may reflect its logarithm.
Depending on the quantity of interest, effective population size can be defined in several ways. Ronald Fisher and Sewall Wright originally defined it as "the number of breeding individuals in an idealised population that would show the same amount of dispersion of allele frequencies under random genetic drift or the same amount of inbreeding as the population under consideration". More generally, an effective population size may be defined as the number of individuals in an idealised population that has a value of any given population genetic quantity that is equal to the value of that quantity in the population of interest. The two population genetic quantities identified by Wright were the one-generation increase in variance across replicate populations (variance effective population size) and the one-generation change in the inbreeding coefficient (inbreeding effective population size). These two are closely linked, and derived from F-statistics, but they are not identical.
Today, the effective population size is usually estimated empirically with respect to the sojourn or coalescence time, estimated as the within-species genetic diversity divided by four times the mutation rate, yielding a coalescent effective population size. Another important effective population size is the selection effective population size 1/s_critical, where s_critical is the critical value of the selection coefficient at which selection becomes more important than genetic drift.
In Drosophila populations of census size 16, the variance effective population size has been measured as equal to 11.5. This measurement was achieved through studying changes in the frequency of a neutral allele from one generation to another in over 100 replicate populations.
For coalescent effective population sizes, a survey of publications on 102 mostly wildlife animal and plant species yielded 192 Ne/N ratios. Seven different estimation methods were used in the surveyed studies. Accordingly, the ratios ranged widely from 10⁻⁶ for Pacific oysters to 0.994 for humans, with an average of 0.34 across the examined species. A genealogical analysis of human hunter-gatherers (Eskimos) determined the effective-to-census population size ratio for haploid (mitochondrial DNA, Y chromosomal DNA), and diploid (autosomal DNA) loci separately: the ratio of the effective to the census population size was estimated as 0.6–0.7 for autosomal and X-chromosomal DNA, 0.7–0.9 for mitochondrial DNA and 0.5 for Y-chromosomal DNA.
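In the Wright-Fisher idealized population model, the conditional variance of the allele frequency $p'$ in the next generation, given the allele frequency $p$ in the current generation, is

$$\operatorname{var}(p' \mid p) = \frac{p(1-p)}{2N}.$$

Let $\widehat{\operatorname{var}}(p' \mid p)$ denote the same, typically larger, variance in the actual population under consideration. The variance effective population size $N_e^{(v)}$ is defined as the size of an idealized population with the same variance. This is found by substituting $\widehat{\operatorname{var}}(p' \mid p)$ for $\operatorname{var}(p' \mid p)$ and solving for $N$, which gives

$$N_e^{(v)} = \frac{p(1-p)}{2\,\widehat{\operatorname{var}}(p' \mid p)}.$$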
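As a rough illustration of this definition, the sketch below estimates the variance effective size as p(1 − p) divided by twice the observed one-generation variance in allele frequency across replicate populations. The function name, the assumption that every replicate starts at exactly the same frequency p, and the example frequencies are illustrative choices, not data from any study.

```python
def variance_effective_size(p, next_freqs):
    """Estimate the variance effective size from replicate populations.

    p          -- allele frequency shared by all replicates in generation t
    next_freqs -- allele frequencies observed in generation t + 1
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not next_freqs:
        raise ValueError("at least one replicate is required")
    # Under pure drift the expected frequency stays at p, so the variance
    # is taken around p rather than around the sample mean (an assumption).
    var = sum((x - p) ** 2 for x in next_freqs) / len(next_freqs)
    if var == 0:
        raise ValueError("no dispersion observed; the estimate is undefined")
    return p * (1 - p) / (2 * var)


# Hypothetical replicate frequencies, for illustration only.
replicates = [0.40, 0.60, 0.50, 0.45, 0.55, 0.50]
ne_v = variance_effective_size(0.5, replicates)
print(round(ne_v, 2))
```

With many more replicates the same calculation gives a less noisy estimate; six values are used here only to keep the arithmetic easy to check by hand.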
In the following examples, one or more of the assumptions of a strictly idealised population are relaxed, while other assumptions are retained. The variance effective population size of the more relaxed population model is then calculated with respect to the strict model.
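For example, say the population size was N = 10, 100, 50, 80, 20, 500 for six generations (t = 6). Then the effective population size is the harmonic mean of these, giving:

$$\frac{1}{N_e} = \frac{1}{6}\left(\frac{1}{10} + \frac{1}{100} + \frac{1}{50} + \frac{1}{80} + \frac{1}{20} + \frac{1}{500}\right) = \frac{0.1945}{6} \approx 0.0324, \qquad N_e \approx 30.8.$$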
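The harmonic-mean calculation for this example can be checked with a few lines of Python. This is a minimal sketch; the function name is an illustrative choice.

```python
def harmonic_mean_ne(sizes):
    """Effective size of a population whose census size varies over generations."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty list of positive numbers")
    # Harmonic mean: number of generations divided by the sum of reciprocals.
    return len(sizes) / sum(1.0 / n for n in sizes)


sizes = [10, 100, 50, 80, 20, 500]
ne = harmonic_mean_ne(sizes)
arithmetic_mean = sum(sizes) / len(sizes)
print(round(ne, 1), round(arithmetic_mean, 1))
```

The harmonic mean (about 30.8) is far below the arithmetic mean (about 126.7) because the reciprocal of the smallest generation dominates the sum.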
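If a population is dioecious, i.e. there is no self-fertilisation, then

$$N_e = N + \tfrac{1}{2}$$

or more generally,

$$N_e = N + \tfrac{D}{2}$$

where D represents dioeciousness and may take the value 0 (for not dioecious) or 1 (for dioecious).
When N is large, Ne approximately equals N, so this is usually trivial and often ignored:

$$N_e = N + \tfrac{1}{2} \approx N.$$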
If population size is to remain constant, each individual must contribute on average two gametes to the next generation. An idealized population assumes that this follows a Poisson distribution, so that the variance of the number of gametes contributed, k, is equal to the mean number contributed, i.e. 2:

$$\operatorname{var}(k) = \bar{k} = 2.$$

However, in natural populations the variance is often larger than this. The vast majority of individuals may have no offspring, and the next generation stems only from a small number of individuals, so

$$\operatorname{var}(k) > 2.$$

The effective population size is then smaller, and given by:

$$N_e^{(v)} = \frac{4N - 2D}{2 + \operatorname{var}(k)}.$$

Note that if the variance of k is less than 2, Ne is greater than N. In the extreme case of a population experiencing no variation in family size, for example a laboratory population in which the number of offspring is artificially controlled, var(k) = 0 and Ne = 2N.
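The effect of family-size variance can be explored numerically. The sketch below assumes the relation Ne = (4N − 2D) / (2 + var(k)), with D = 0 unless the population is dioecious; the function name and the example variances are illustrative choices.

```python
def family_size_ne(n, var_k, dioecious=False):
    """Variance effective size for census size n and gamete-number variance var_k."""
    if n <= 0 or var_k < 0:
        raise ValueError("n must be positive and var_k non-negative")
    d = 1 if dioecious else 0
    return (4 * n - 2 * d) / (2 + var_k)


poisson = family_size_ne(100, 2.0)    # Poisson family sizes: Ne equals N
skewed = family_size_ne(100, 6.0)     # a few parents dominate: Ne falls below N
equalised = family_size_ne(100, 0.0)  # no family-size variation: Ne equals 2N
print(poisson, skewed, equalised)
```

The three calls reproduce the cases described in the text: Poisson reproduction, reproductive skew, and artificially equalised family sizes.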
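When the numbers of males and females differ, the effective population size is given by:

$$N_e = \frac{4 N_m N_f}{N_m + N_f}$$

where Nm is the number of males and Nf the number of females. For example, with 80 males and 20 females (an absolute population size of 100):

$$N_e = \frac{4 \times 80 \times 20}{80 + 20} = \frac{6400}{100} = 64.$$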
Again, this results in Ne being less than N.
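The sex-ratio example can be reproduced directly. This minimal sketch assumes the standard relation Ne = 4·Nm·Nf / (Nm + Nf); the function name is an illustrative choice.

```python
def sex_ratio_ne(n_males, n_females):
    """Effective size of a population with unequal numbers of breeding males and females."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must be represented")
    return 4 * n_males * n_females / (n_males + n_females)


skewed = sex_ratio_ne(80, 20)    # the example from the text
balanced = sex_ratio_ne(50, 50)  # an even sex ratio recovers Ne = N
print(skewed, balanced)
```

For a fixed total of 100 individuals, Ne is largest at a 1:1 ratio and shrinks as the ratio becomes more skewed.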
Alternatively, the effective population size may be defined by noting how the average inbreeding coefficient changes from one generation to the next, and then defining Ne as the size of the idealized population that has the same change in average inbreeding coefficient as the population under consideration. The presentation follows Kempthorne (1957).
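For the idealized population, the inbreeding coefficients follow the recurrence equation

$$F_t = \frac{1}{N}\left(\frac{1 + F_{t-2}}{2}\right) + \left(1 - \frac{1}{N}\right) F_{t-1}.$$

Using Panmictic Index (1 − F) instead of inbreeding coefficient, we get the approximate recurrence equation

$$1 - F_t = P_t = P_0 \left(1 - \frac{1}{2N}\right)^t.$$

The change per generation is

$$\frac{P_{t+1}}{P_t} = 1 - \frac{1}{2N}.$$

The inbreeding effective size can be found by solving

$$\frac{P_{t+1}}{P_t} = 1 - \frac{1}{2N_e^{(F)}}.$$

This is

$$N_e^{(F)} = \frac{1}{2\left(1 - \frac{P_{t+1}}{P_t}\right)},$$

although researchers rarely use this equation directly.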
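The relationship between the panmictic index and the inbreeding effective size can be sketched as follows. The code assumes the approximate decay P_t = P_0 (1 − 1/(2N))^t for the idealized population and inverts the per-generation ratio to recover Ne = 1 / (2 (1 − P_{t+1}/P_t)). Function names and the example size of 50 are illustrative.

```python
def panmictic_series(n, generations, p0=1.0):
    """Panmictic index P_t = 1 - F_t for an idealized population of size n."""
    return [p0 * (1 - 1 / (2 * n)) ** t for t in range(generations + 1)]


def inbreeding_effective_size(p_t, p_next):
    """Solve P_{t+1} / P_t = 1 - 1 / (2 Ne) for Ne."""
    ratio = p_next / p_t
    if not 0 < ratio < 1:
        raise ValueError("the panmictic index must decline between generations")
    return 1 / (2 * (1 - ratio))


series = panmictic_series(50, 10)
ne_f = inbreeding_effective_size(series[3], series[4])
print(round(ne_f, 6))
```

Applied to an idealized series, the inversion returns the size used to generate it. For a real population, the two panmictic indices would come from pedigree or marker data.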
When organisms live longer than one breeding season, effective population sizes have to take into account the life tables for the species.
Assume a haploid population with discrete age structure. An example might be an organism that can survive several discrete breeding seasons. Further, define the following age structure characteristics:
The generation time is calculated as
Then, the inbreeding effective population size is
Similarly, the inbreeding effective number can be calculated for a diploid population with discrete age structure. This was first given by Johnson, but the notation more closely resembles Emigh and Pollak.
Assume the same basic parameters for the life table as given for the haploid case, but distinguishing between male and female, such as N0ƒ and N0m for the number of newborn females and males, respectively (notice lower case ƒ for females, compared to upper case F for inbreeding).
The inbreeding effective number is
According to the neutral theory of molecular evolution, a neutral allele remains in a population for Ne generations, where Ne is the effective population size. An idealised diploid population will have a pairwise nucleotide diversity equal to $4\mu N_e$, where $\mu$ is the mutation rate. The sojourn effective population size can therefore be estimated empirically by dividing the nucleotide diversity by four times the mutation rate.
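Because the expected pairwise diversity of an idealised diploid population is four times the product of the mutation rate and Ne, the estimate reduces to a single division. The sketch below uses hypothetical values for the nucleotide diversity and the per-site, per-generation mutation rate; neither number comes from the text.

```python
def coalescent_ne(nucleotide_diversity, mutation_rate):
    """Estimate Ne from pairwise diversity pi = 4 * Ne * mu (diploid, neutral)."""
    if nucleotide_diversity < 0 or mutation_rate <= 0:
        raise ValueError("diversity must be non-negative and mutation rate positive")
    return nucleotide_diversity / (4 * mutation_rate)


pi = 1e-3     # hypothetical pairwise nucleotide diversity per site
mu = 1.25e-8  # hypothetical mutation rate per site per generation
ne_coal = coalescent_ne(pi, mu)
print(round(ne_coal))
```

Both inputs must be expressed per site and the mutation rate per generation, otherwise the result is off by the corresponding scale factor.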
The coalescent effective size may have little relationship to the number of individuals physically present in a population. Measured coalescent effective population sizes vary between genes in the same population, being low in genome areas of low recombination and high in genome areas of high recombination. Sojourn times are proportional to N in neutral theory, but for alleles under selection, sojourn times are proportional to log(N). Genetic hitchhiking can cause neutral mutations to have sojourn times proportional to log(N): this may explain the relationship between measured effective population size and the local recombination rate.
In an idealised Wright-Fisher model, the fate of an allele, beginning at an intermediate frequency, is largely determined by selection if the selection coefficient s ≫ 1/N, and largely determined by neutral genetic drift if s ≪ 1/N. In real populations, the cutoff value of s may depend instead on local recombination rates. This limit to selection in a real population may be captured in a toy Wright-Fisher simulation through the appropriate choice of Ne. Populations with different selection effective population sizes are predicted to evolve profoundly different genome architectures.
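A toy Wright-Fisher simulation of the kind mentioned above might look like the following sketch. It assumes genic selection, in which the allele frequency is reweighted by 1 + s before sampling, followed by binomial sampling of 2·Ne gene copies. The function name, parameter values and seed are illustrative choices.

```python
import random


def wright_fisher(ne, s, p0=0.5, generations=300, seed=0):
    """Return the allele frequency after drift plus selection in a toy model.

    ne -- effective number of diploid individuals (2 * ne gene copies)
    s  -- selection coefficient favouring the focal allele
    """
    rng = random.Random(seed)
    copies = 2 * ne
    p = p0
    for _ in range(generations):
        if p in (0.0, 1.0):  # allele lost or fixed: nothing more can change
            break
        # Deterministic selection step (assumed genic selection model).
        p_sel = p * (1 + s) / (1 + p * s)
        # Drift step: binomial sampling of the gene copies of the next generation.
        p = sum(rng.random() < p_sel for _ in range(copies)) / copies
    return p


strong = wright_fisher(ne=500, s=0.1)   # s much larger than 1/Ne
neutral = wright_fisher(ne=500, s=0.0)  # pure drift
print(strong, neutral)
```

Rerunning with a smaller `ne` while holding `s` fixed moves the allele towards the drift-dominated regime, which is the sense in which the choice of Ne sets the limit to selection in such a simulation.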
The Chinese mountain cat (Felis bieti), also known as Chinese desert cat and Chinese steppe cat, is a wild cat endemic to western China that has been listed as Vulnerable on the IUCN Red List since 2002, as the effective population size may be fewer than 10,000 mature breeding individuals. It was provisionally classified as a wildcat subspecies with the name F. silvestris bieti in 2007.
It has been recognised as a valid species since 2017, as it is morphologically distinct from wildcats.
Dominance (ecology)
Ecological dominance is the degree to which a taxon is more numerous than its competitors in an ecological community, or makes up more of the biomass.
Most ecological communities are defined by their dominant species.
In many examples of wet woodland in western Europe, the dominant tree is alder (Alnus glutinosa).
In temperate bogs, the dominant vegetation is usually species of Sphagnum moss.
Tidal swamps in the tropics are usually dominated by species of mangrove (Rhizophoraceae).
Some sea floor communities are dominated by brittle stars.
Exposed rocky shorelines are dominated by sessile organisms such as barnacles and limpets.
F-statistics
In population genetics, F-statistics (also known as fixation indices) describe the statistically expected level of heterozygosity in a population; more specifically the expected degree of (usually) a reduction in heterozygosity when compared to Hardy–Weinberg expectation.
F-statistics can also be thought of as a measure of the correlation between genes drawn at different levels of a (hierarchically) subdivided population. This correlation is influenced by several evolutionary processes, such as genetic drift, founder effect, bottleneck, genetic hitchhiking, meiotic drive, mutation, gene flow, inbreeding, natural selection, or the Wahlund effect, but it was originally designed to measure the amount of allelic fixation owing to genetic drift.
The concept of F-statistics was developed during the 1920s by the American geneticist Sewall Wright, who was interested in inbreeding in cattle. However, because complete dominance causes the phenotypes of homozygote dominants and heterozygotes to be the same, it was not until the advent of molecular genetics from the 1960s onwards that heterozygosity in populations could be measured.
F can be used to define effective population size.
Feeding frenzy
In ecology, a feeding frenzy occurs when predators are overwhelmed by the amount of prey available. For example, a large school of fish can cause nearby sharks, such as the lemon shark, to enter into a feeding frenzy. This can cause the sharks to go wild, biting anything that moves, including each other or anything else within biting range. Another functional explanation for feeding frenzy is competition amongst predators. This term is most often used when referring to sharks or piranhas. It has also been used as a term within journalism.
Fixation (population genetics)
In population genetics, fixation is the change in a gene pool from a situation where there exist at least two variants of a particular gene (allele) in a given population to a situation where only one of the alleles remains. In the absence of mutation or heterozygote advantage, any allele must eventually be lost completely from the population or fixed (permanently established at 100% frequency in the population). Whether a gene will ultimately be lost or fixed is dependent on selection coefficients and chance fluctuations in allelic proportions. Fixation can refer to a gene in general or to a particular nucleotide position in the DNA chain (locus).
In the process of substitution, a previously non-existent allele arises by mutation and undergoes fixation by spreading through the population by random genetic drift or positive selection. Once the frequency of the allele is at 100%, i.e. it is the only gene variant present in any member, it is said to be "fixed" in the population. Similarly, genetic differences between taxa are said to have been fixed in each species.
Genetic drift
Genetic drift (also known as allelic drift or the Sewall Wright effect) is the change in the frequency of an existing gene variant (allele) in a population due to random sampling of organisms. The alleles in the offspring are a sample of those in the parents, and chance has a role in determining whether a given individual survives and reproduces. A population's allele frequency is the fraction of the copies of one gene that share a particular form. Genetic drift may cause gene variants to disappear completely and thereby reduce genetic variation. It can also cause initially rare alleles to become much more frequent and even fixed.
When there are few copies of an allele, the effect of genetic drift is larger, and when there are many copies the effect is smaller. In the early 20th century, vigorous debates occurred over the relative importance of natural selection versus neutral processes, including genetic drift. Ronald Fisher, who explained natural selection using Mendelian genetics, held the view that genetic drift plays at the most a minor role in evolution, and this remained the dominant view for several decades. In 1968, population geneticist Motoo Kimura rekindled the debate with his neutral theory of molecular evolution, which claims that most instances where a genetic change spreads across a population (although not necessarily changes in phenotypes) are caused by genetic drift acting on neutral mutations.
Genetic monitoring
Genetic monitoring is the use of molecular markers to (i) identify individuals, species or populations, or (ii) to quantify changes in population genetic metrics (such as effective population size, genetic diversity and population size) over time. Genetic monitoring can thus be used to detect changes in species abundance and/or diversity, and has become an important tool in both conservation and livestock management. The types of molecular markers used to monitor populations are most commonly mitochondrial, microsatellites or single-nucleotide polymorphisms (SNPs), while earlier studies also used allozyme data. Species gene diversity is also recognized as an important biodiversity metric for implementation of the Convention on Biological Diversity.
Genetic viability
To be genetically viable, i.e. to have a realistic chance of avoiding the problems of inbreeding, a population of plants or animals requires a certain amount of genetic diversity, and consequently a certain minimum number of members. See effective population size. The minimum is normally somewhere in the region of a hundred unrelated individuals. Where a population has become extremely small in a population bottleneck, due for example to near-extinction of the species, it may have lost its genetic viability, and if numbers recover it will be through inbreeding, possibly leaving an unhealthy population.
Human evolutionary genetics
Human evolutionary genetics studies how one human genome differs from another human genome, the evolutionary past that gave rise to the human genome, and its current effects. Differences between genomes have anthropological, medical, historical and forensic implications and applications. Genetic data can provide important insights into human evolution.
Idealised population
In population genetics an idealised population is one that can be described using a number of simplifying assumptions. Models of idealised populations are either used to make a general point, or they are fit to data on real populations for which the assumptions may not hold true. For example, coalescent theory is used to fit data to models of idealised populations. The most common idealised population in population genetics is described in the Wright-Fisher model, named after Ronald Fisher (1922, 1930) and Sewall Wright (1931). Wright-Fisher populations have constant size, and their members can mate and reproduce with any other member. Another example is a Moran model, which has overlapping generations, rather than the non-overlapping generations of the Wright-Fisher model. The complexities of real populations can cause their behavior to match an idealised population with an effective population size that is very different from the census population size of the real population. For sexual diploids, idealised populations will have genotype frequencies related to the allele frequencies according to Hardy-Weinberg equilibrium.
Minimum viable population
Minimum viable population (MVP) is a lower bound on the population of a species, such that it can survive in the wild. This term is commonly used in the fields of biology, ecology, and conservation biology. MVP refers to the smallest possible size at which a biological population can exist without facing extinction from natural disasters or demographic, environmental, or genetic stochasticity. The term "population" is defined as a group of interbreeding individuals in a similar geographic area that undergo negligible gene flow with other groups of the species. Typically, MVP is used to refer to a wild population, but can also be used for ex-situ conservation (zoo populations).
Nearly neutral theory of molecular evolution
The nearly neutral theory of molecular evolution is a modification of the neutral theory of molecular evolution that accounts for the fact that not all mutations are either so deleterious that they can be ignored, or else neutral. Slightly deleterious mutations are reliably purged only when their selection coefficients are greater than one divided by the effective population size. In larger populations, a higher proportion of mutations exceed this threshold, above which genetic drift cannot overpower selection, leading to fewer fixation events and so slower molecular evolution.
The nearly neutral theory was proposed by Tomoko Ohta in 1973. The population-size-dependent threshold for purging mutations has been called the "drift barrier" by Michael Lynch, and used to explain differences in genomic architecture among species.
Paleodemography
Paleodemography is the study of human demography in antiquity and prehistory.
More specifically, paleodemography looks at the changes in pre-modern populations in order to determine something about the influences on the lifespan and health of earlier peoples.
Reconstructions of ancient population sizes and dynamics are based on bioarchaeology and ancient DNA, as well as inference from modern population genetics.
Population ecology
Population ecology is a sub-field of ecology that deals with the dynamics of species populations and how these populations interact with the environment. It is the study of how the population sizes of species change over time and space. The term population ecology is often used interchangeably with population biology or population dynamics.
The development of population ecology owes much to demography and actuarial life tables. Population ecology is important in conservation biology, especially in the development of population viability analysis (PVA), which makes it possible to predict the long-term probability of a species persisting in a given habitat patch. Although population ecology is a subfield of biology, it provides interesting problems for mathematicians and statisticians who work in population dynamics.
Population size
In population genetics and population ecology, population size (usually denoted N) is the number of individual organisms in a population. Population size is directly associated with the amount of genetic drift, and is the underlying cause of effects like population bottlenecks and the founder effect. Genetic drift is the major source of decrease of genetic diversity within populations, which drives fixation and can potentially lead to speciation events.
Proofreading (biology)
The term proofreading is used in genetics to refer to the error-correcting processes, first proposed by John Hopfield and Jacques Ninio, involved in DNA replication, immune system specificity, enzyme-substrate recognition among many other processes that require enhanced specificity. The proofreading mechanisms of Hopfield and Ninio are non-equilibrium active processes that consume ATP to enhance specificity of various biochemical reactions.
In bacteria, all three DNA polymerases (I, II and III) have the ability to proofread, using 3’ → 5’ exonuclease activity. When an incorrect base pair is recognized, DNA polymerase reverses its direction by one base pair of DNA and excises the mismatched base. Following base excision, the polymerase can re-insert the correct base and replication can continue.
In eukaryotes only the polymerases that deal with the elongation (delta and epsilon) have proofreading ability (3’ → 5’ exonuclease activity). Proofreading also occurs in mRNA translation for protein synthesis. In this case, one mechanism is release of any incorrect aminoacyl-tRNA before peptide bond formation. The extent of proofreading in DNA replication determines the mutation rate, and is different in different species.
For example, loss of proofreading due to mutations in the DNA polymerase epsilon gene results in a hyper-mutated genotype with >100 mutations per Mbase of DNA in human colorectal cancers. The extent of proofreading in other molecular processes can depend on the effective population size of the species and the number of genes affected by the same proofreading mechanism.
Small population size
Small populations can behave differently from larger populations. They are often the result of population bottlenecks from larger populations, leading to loss of heterozygosity, reduced genetic diversity, loss or fixation of alleles, and shifts in allele frequencies. A small population is then more susceptible to demographic and genetic stochastic events, which can impact the long-term survival of the population. Therefore, small populations are often considered at risk of endangerment or extinction, and are often of conservation concern.
Watterson estimator
In population genetics, the Watterson estimator is a method for describing the genetic diversity in a population. It was developed by Margaret Wu and G. A. Watterson in the 1970s. It is estimated by counting the number of polymorphic sites. It is a measure of the "population mutation rate" (the product of the effective population size and the neutral mutation rate) from the observed nucleotide diversity of a population: $\theta = 4N_e\mu$, where $N_e$ is the effective population size and $\mu$ is the per-generation mutation rate of the population of interest (Watterson 1975). The assumptions made are that there is a sample of $n$ haploid individuals from the population of interest, that there are infinitely many sites capable of varying (so that mutations never overlay or reverse one another), and that $n \ll N_e$. Because the number of segregating sites counted will increase with the number of sequences looked at, the correction factor $a_n$ is used.
The estimate of $\theta$, often denoted as $\hat{\theta}_w$, is

$$\hat{\theta}_w = \frac{K}{a_n},$$

where $K$ is the number of segregating sites (an example of a segregating site would be a single-nucleotide polymorphism) in the sample and

$$a_n = \sum_{i=1}^{n-1} \frac{1}{i}$$

is the $(n-1)$th harmonic number.
This estimate is based on coalescent theory. Watterson's estimator is commonly used for its simplicity. When its assumptions are met, the estimator is unbiased and the variance of the estimator decreases with increasing sample size or recombination rate. However, the estimator can be biased by population structure. For example, Watterson's estimator is downwardly biased in an exponentially growing population. It can also be biased by violation of the infinite-sites mutational model; if multiple mutations can overwrite one another, Watterson's estimator will be biased downward.
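The calculation can be sketched in a few lines: count the segregating sites K in an alignment of n sequences and divide by the harmonic number a_n, the sum of 1/i for i from 1 to n − 1. The function name and the toy alignment below are illustrative, and the result is per locus rather than per site.

```python
def watterson_theta(sequences):
    """Watterson's estimate of theta from aligned, equal-length sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # A site is segregating if more than one state is observed in its column.
    k = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    # Correction factor: the (n - 1)th harmonic number.
    a_n = sum(1.0 / i for i in range(1, n))
    return k / a_n


# Toy alignment with two segregating sites (hypothetical data).
alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
theta_w = watterson_theta(alignment)
print(round(theta_w, 4))
```

Dividing the result by the alignment length gives a per-site value that can be compared with the pairwise nucleotide diversity.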
Comparing the value of Watterson's estimator to the nucleotide diversity is the basis of Tajima's D, which allows inference of the evolutionary regime of a given locus.
World population estimates
This article lists current estimates of world population, as well as projections of population growth.
In summary, estimates for the progression of world population since the late medieval period are in the following ranges:
Estimates for pre-modern times are necessarily fraught with great uncertainties, and few of the published estimates have confidence intervals; in the absence of a straightforward means to assess the error of such estimates, a rough idea of expert consensus can be gained by comparing the values given in independent publications. Population estimates cannot be considered accurate to more than two decimal digits; for example, world population for the year 2012 was estimated at 7.02, 7.06 and 7.08 billion by the United States Census Bureau, the Population Reference Bureau and the United Nations Department of Economic and Social Affairs, respectively, corresponding to a spread of estimates of the order of 0.8%.