Time complexity

In computer science, the time complexity is the computational complexity that describes the amount of time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to differ by at most a constant factor.

Since an algorithm's running time may vary among different inputs of the same size, one commonly considers the worst-case time complexity, which is the maximum amount of time required for inputs of a given size. Less common, and usually specified explicitly, is the average-case complexity, which is the average of the time taken on inputs of a given size (this makes sense because there are only a finite number of possible inputs of a given size). In both cases, the time complexity is generally expressed as a function of the size of the input.[1]:226 Since this function is generally difficult to compute exactly, and the running time for small inputs is usually not consequential, one commonly focuses on the behavior of the complexity when the input size increases—that is, the asymptotic behavior of the complexity. Therefore, the time complexity is commonly expressed using big O notation, typically etc., where n is the input size in units of bits needed to represent the input.

Algorithmic complexities are classified according to the type of function appearing in the big O notation. For example, an algorithm with time complexity is a linear time algorithm and an algorithm with time complexity for some constant is a polynomial time algorithm.

Comparison computational complexity
Graphs of functions commonly used in the analysis of algorithms, showing the number of operations N versus input size n for each function

Table of common time complexities

The following table summarizes some classes of commonly encountered time complexities. In the table, poly(x) = xO(1), i.e., polynomial in x.

Name Complexity class Running time (T(n)) Examples of running times Example algorithms
constant time O(1) 10 Finding the median value in a sorted array of numbers

Calculating (–1)n

inverse Ackermann time O(α(n)) Amortized time per operation using a disjoint set
iterated logarithmic time O(log* n) Distributed coloring of cycles
log-logarithmic O(log log n) Amortized time per operation using a bounded priority queue[2]
logarithmic time DLOGTIME O(log n) log n, log(n2) Binary search
polylogarithmic time poly(log n) (log n)2 AKS primality test[3][4]
fractional power O(nc) where 0 < c < 1 n1/2, n2/3 Searching in a kd-tree
linear time O(n) n, 2n + 5 Finding the smallest or largest item in an unsorted array, Kadane's algorithm
"n log-star n" time O(n log* n) Seidel's polygon triangulation algorithm.
quasilinear time O(n log n) n log n, log n! Fastest possible comparison sort; Fast Fourier transform.
quadratic time O(n2) n2 Bubble sort; Insertion sort; Direct convolution
cubic time O(n3) n3 Naive multiplication of two n×n matrices. Calculating partial correlation.
polynomial time P 2O(log n) = poly(n) n2 + n, n10 Karmarkar's algorithm for linear programming;
quasi-polynomial time QP 2poly(log n) nlog log n, nlog n Best-known O(log2 n)-approximation algorithm for the directed Steiner tree problem.
sub-exponential time
(first definition)
SUBEXP O(2nε) for all ε > 0 O(2log nlog log n) Contains BPP unless EXPTIME (see below) equals MA.[5]
sub-exponential time
(second definition)
2o(n) 2n1/3 Best-known algorithm for integer factorization and graph isomorphism
exponential time
(with linear exponent)
E 2O(n) 1.1n, 10n Solving the traveling salesman problem using dynamic programming
exponential time EXPTIME 2poly(n) 2n, 2n2 Solving matrix chain multiplication via brute-force search
factorial time O(n!) n! Solving the traveling salesman problem via brute-force search
double exponential time 2-EXPTIME 22poly(n) 22n Deciding the truth of a given statement in Presburger arithmetic

Constant time

An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input. For example, accessing any single element in an array takes constant time as only one operation has to be performed to locate it. In a similar manner, finding the minimal value in an array sorted in ascending order; it is the first element. However, finding the minimal value in an unordered array is not a constant time operation as scanning over each element in the array is needed in order to determine the minimal value. Hence it is a linear time operation, taking O(n) time. If the number of elements is known in advance and does not change, however, such an algorithm can still be said to run in constant time.

Despite the name "constant time", the running time does not have to be independent of the problem size, but an upper bound for the running time has to be bounded independently of the problem size. For example, the task "exchange the values of a and b if necessary so that ab" is called constant time even though the time may depend on whether or not it is already true that ab. However, there is some constant t such that the time required is always at most t.

Here are some examples of code fragments that run in constant time :

int index = 5;
int item = list[index];
if (condition true) then
   perform some operation that runs in constant time
else
   perform some other operation that runs in constant time
for i = 1 to 100
   for j = 1 to 200
      perform some operation that runs in constant time

If T(n) is O(any constant value), this is equivalent to and stated in standard notation as T(n) being O(1).

Logarithmic time

An algorithm is said to take logarithmic time when T(n) = O(log n). Since loga n and logb n are related by a constant multiplier, and such a multiplier is irrelevant to big-O classification, the standard usage for logarithmic-time algorithms is O(log n) regardless of the base of the logarithm appearing in the expression of T.

Algorithms taking logarithmic time are commonly found in operations on binary trees or when using binary search.

An O(log n) algorithm is considered highly efficient, as the ratio of the number of operations to the size of the input decreases and tends to zero when n increases. An algorithm that must access all elements of its input cannot take logarithmic time, as the time taken for reading an input of size n is of the order of n.

An example of logarithmic time is given by dictionary search. Consider a dictionary which contains n entries, sorted by alphabetical order. We suppose that, for 1 ≤ kn, one may access to the kth entry of the dictionary in a constant time. Let D[k] denote this kth entry. Under these hypotheses, the test if a word D is in the dictionary may be done in logarithmic time: consider where denotes the floor function. If then we are done. Else, if continue the search in the same way in the left half of the dictionary, otherwise continue similarly with the right half of the dictionary. This algorithm is similar to the method often used to find an entry in a paper dictionary.

Polylogarithmic time

An algorithm is said to run in polylogarithmic time if T(n) = O((log n)k), for some constant k. For example, matrix chain ordering can be solved in polylogarithmic time on a parallel random-access machine.[6]

Sub-linear time

An algorithm is said to run in sub-linear time (often spelled sublinear time) if T(n) = o(n). In particular this includes algorithms with the time complexities defined above, as well as others such as the O(n½) Grover's search algorithm.

Typical algorithms that are exact and yet run in sub-linear time use parallel processing (as the NC1 matrix determinant calculation does), non-classical processing (as Grover's search does), or alternatively have guaranteed assumptions on the input structure (as the logarithmic time binary search and many tree maintenance algorithms do). However, formal languages such as the set of all strings that have a 1-bit in the position indicated by the first log(n) bits of the string may depend on every bit of the input and yet be computable in sub-linear time.

The specific term sublinear time algorithm is usually reserved to algorithms that are unlike the above in that they are run over classical serial machine models and are not allowed prior assumptions on the input.[7] They are however allowed to be randomized, and indeed must be randomized for all but the most trivial of tasks.

As such an algorithm must provide an answer without reading the entire input, its particulars heavily depend on the access allowed to the input. Usually for an input that is represented as a binary string b1,...,bk it is assumed that the algorithm can in time O(1) request and obtain the value of bi for any i.

Sub-linear time algorithms are typically randomized, and provide only approximate solutions. In fact, the property of a binary string having only zeros (and no ones) can be easily proved not to be decidable by a (non-approximate) sub-linear time algorithm. Sub-linear time algorithms arise naturally in the investigation of property testing.

Linear time

An algorithm is said to take linear time, or O(n) time, if its time complexity is O(n). Informally, this means that the running time increases at most linearly with the size of the input. More precisely, this means that there is a constant c such that the running time is at most cn for every input of size n. For example, a procedure that adds up all elements of a list requires time proportional to the length of the list, if the adding time is constant, or, at least, bounded by a constant.

Linear time is the best possible time complexity in situations where the algorithm has to sequentially read its entire input. Therefore, much research has been invested into discovering algorithms exhibiting linear time or, at least, nearly linear time. This research includes both software and hardware methods. There are several hardware technologies which exploit parallelism to provide this. An example is content-addressable memory. This concept of linear time is used in string matching algorithms such as the Boyer–Moore algorithm and Ukkonen's algorithm.

Quasilinear time

An algorithm is said to run in quasilinear time (also referred to as log-linear time) if T(n) = O(n logk n) for some positive constant k; linearithmic time is the case k = 1.[8][9] Using soft O notation these algorithms are Õ(n). Quasilinear time algorithms are also O(n1+ε) for every constant ε > 0, and thus run faster than any polynomial time algorithm whose time bound includes a term nc for any c > 1.

Algorithms which run in quasilinear time include:

In many cases, the n · log n running time is simply the result of performing a Θ(log n) operation n times (for the notation, see Big O notation § Family of Bachmann–Landau notations). For example, binary tree sort creates a binary tree by inserting each element of the n-sized array one by one. Since the insert operation on a self-balancing binary search tree takes O(log n) time, the entire algorithm takes O(n log n) time.

Comparison sorts require at least O(n log n) number of comparisons in the worst case because log(n!) = Θ(n log n), by Stirling's approximation. They also frequently arise from the recurrence relation T(n) = 2T(n/2) + O(n).

Sub-quadratic time

An algorithm is said to be subquadratic time if T(n) = o(n2).

For example, simple, comparison-based sorting algorithms are quadratic (e.g. insertion sort), but more advanced algorithms can be found that are subquadratic (e.g. Shell sort). No general-purpose sorts run in linear time, but the change from quadratic to sub-quadratic is of great practical importance.

Polynomial time

An algorithm is said to be of polynomial time if its running time is upper bounded by a polynomial expression in the size of the input for the algorithm, i.e., T(n) = O(nk) for some positive constant k.[1][10] Problems for which a deterministic polynomial time algorithm exists belong to the complexity class P, which is central in the field of computational complexity theory. Cobham's thesis states that polynomial time is a synonym for "tractable", "feasible", "efficient", or "fast".[11]

Some examples of polynomial time algorithms:

  • The selection sort sorting algorithm on n integers performs operations for some constant A. Thus it runs in time and is a polynomial time algorithm.
  • All the basic arithmetic operations (addition, subtraction, multiplication, division, and comparison) can be done in polynomial time.
  • Maximum matchings in graphs can be found in polynomial time.

Strongly and weakly polynomial time

In some contexts, especially in optimization, one differentiates between strongly polynomial time and weakly polynomial time algorithms. These two concepts are only relevant if the inputs to the algorithms consist of integers.

Strongly polynomial time is defined in the arithmetic model of computation. In this model of computation the basic arithmetic operations (addition, subtraction, multiplication, division, and comparison) take a unit time step to perform, regardless of the sizes of the operands. The algorithm runs in strongly polynomial time if [12]

  1. the number of operations in the arithmetic model of computation is bounded by a polynomial in the number of integers in the input instance; and
  2. the space used by the algorithm is bounded by a polynomial in the size of the input.

Any algorithm with these two properties can be converted to a polynomial time algorithm by replacing the arithmetic operations by suitable algorithms for performing the arithmetic operations on a Turing machine. If the second of the above requirements is not met, then this is not true anymore. Given the integer (which takes up space proportional to n in the Turing machine model), it is possible to compute with n multiplications using repeated squaring. However, the space used to represent is proportional to , and thus exponential rather than polynomial in the space used to represent the input. Hence, it is not possible to carry out this computation in polynomial time on a Turing machine, but it is possible to compute it by polynomially many arithmetic operations.

Conversely, there are algorithms which run in a number of Turing machine steps bounded by a polynomial in the length of binary-encoded input, but do not take a number of arithmetic operations bounded by a polynomial in the number of input numbers. The Euclidean algorithm for computing the greatest common divisor of two integers is one example. Given two integers and , the algorithm performs arithmetic operations on numbers with at most bits. At the same time, the number of arithmetic operations cannot be bounded by the number of integers in the input (which is constant in this case, there are always only two integers in the input). Due to the latter observation, the algorithm does not run in strongly polynomial time. Its real running time depends on the magnitudes of and and not only on the number of integers in the input.

An algorithm which runs in polynomial time but which is not strongly polynomial is said to run in weakly polynomial time.[13] A well-known example of a problem for which a weakly polynomial-time algorithm is known, but is not known to admit a strongly polynomial-time algorithm, is linear programming. Weakly polynomial-time should not be confused with pseudo-polynomial time.

Complexity classes

The concept of polynomial time leads to several complexity classes in computational complexity theory. Some important classes defined using polynomial time are the following.

P The complexity class of decision problems that can be solved on a deterministic Turing machine in polynomial time
NP The complexity class of decision problems that can be solved on a non-deterministic Turing machine in polynomial time
ZPP The complexity class of decision problems that can be solved with zero error on a probabilistic Turing machine in polynomial time
RP The complexity class of decision problems that can be solved with 1-sided error on a probabilistic Turing machine in polynomial time.
BPP The complexity class of decision problems that can be solved with 2-sided error on a probabilistic Turing machine in polynomial time
BQP The complexity class of decision problems that can be solved with 2-sided error on a quantum Turing machine in polynomial time

P is the smallest time-complexity class on a deterministic machine which is robust in terms of machine model changes. (For example, a change from a single-tape Turing machine to a multi-tape machine can lead to a quadratic speedup, but any algorithm that runs in polynomial time under one model also does so on the other.) Any given abstract machine will have a complexity class corresponding to the problems which can be solved in polynomial time on that machine.

Superpolynomial time

An algorithm is said to take superpolynomial time if T(n) is not bounded above by any polynomial. Using little omega notation, it is ω(nc) time for all constants c, where n is the input parameter, typically the number of bits in the input.

For example, an algorithm that runs for 2n steps on an input of size n requires superpolynomial time (more specifically, exponential time).

An algorithm that uses exponential resources is clearly superpolynomial, but some algorithms are only very weakly superpolynomial. For example, the Adleman–Pomerance–Rumely primality test runs for nO(log log n) time on n-bit inputs; this grows faster than any polynomial for large enough n, but the input size must become impractically large before it cannot be dominated by a polynomial with small degree.

An algorithm that requires superpolynomial time lies outside the complexity class P. Cobham's thesis posits that these algorithms are impractical, and in many cases they are. Since the P versus NP problem is unresolved, no algorithm for an NP-complete problem is currently known to run in polynomial time.

Quasi-polynomial time

Quasi-polynomial time algorithms are algorithms that run slower than polynomial time, yet not so slow as to be exponential time. The worst case running time of a quasi-polynomial time algorithm is for some fixed . For we get a polynomial time algorithm, for we get a sub-linear time algorithm.

Quasi-polynomial time algorithms typically arise in reductions from an NP-hard problem to another problem. For example, one can take an instance of an NP hard problem, say 3SAT, and convert it to an instance of another problem B, but the size of the instance becomes . In that case, this reduction does not prove that problem B is NP-hard; this reduction only shows that there is no polynomial time algorithm for B unless there is a quasi-polynomial time algorithm for 3SAT (and thus all of NP). Similarly, there are some problems for which we know quasi-polynomial time algorithms, but no polynomial time algorithm is known. Such problems arise in approximation algorithms; a famous example is the directed Steiner tree problem, for which there is a quasi-polynomial time approximation algorithm achieving an approximation factor of (n being the number of vertices), but showing the existence of such a polynomial time algorithm is an open problem.

Other computational problems with quasi-polynomial time solutions but no known polynomial time solution include the planted clique problem in which the goal is to find a large clique in the union of a clique and a random graph. Although quasi-polynomially solvable, it has been conjectured that the planted clique problem has no polynomial time solution; this planted clique conjecture has been used as a computational hardness assumption to prove the difficulty of several other problems in computational game theory, property testing, and machine learning.[14]

The complexity class QP consists of all problems that have quasi-polynomial time algorithms. It can be defined in terms of DTIME as follows.[15]

Relation to NP-complete problems

In complexity theory, the unsolved P versus NP problem asks if all problems in NP have polynomial-time algorithms. All the best-known algorithms for NP-complete problems like 3SAT etc. take exponential time. Indeed, it is conjectured for many natural NP-complete problems that they do not have sub-exponential time algorithms. Here "sub-exponential time" is taken to mean the second definition presented below. (On the other hand, many graph problems represented in the natural way by adjacency matrices are solvable in subexponential time simply because the size of the input is square of the number of vertices.) This conjecture (for the k-SAT problem) is known as the exponential time hypothesis.[16] Since it is conjectured that NP-complete problems do not have quasi-polynomial time algorithms, some inapproximability results in the field of approximation algorithms make the assumption that NP-complete problems do not have quasi-polynomial time algorithms. For example, see the known inapproximability results for the set cover problem.

Sub-exponential time

The term sub-exponential time is used to express that the running time of some algorithm may grow faster than any polynomial but is still significantly smaller than an exponential. In this sense, problems that have sub-exponential time algorithms are somewhat more tractable than those that only have exponential algorithms. The precise definition of "sub-exponential" is not generally agreed upon,[17] and we list the two most widely used ones below.

First definition

A problem is said to be sub-exponential time solvable if it can be solved in running times whose logarithms grow smaller than any given polynomial. More precisely, a problem is in sub-exponential time if for every ε > 0 there exists an algorithm which solves the problem in time O(2nε). The set of all such problems is the complexity class SUBEXP which can be defined in terms of DTIME as follows.[5][18][19][20]

Note that this notion of sub-exponential is non-uniform in terms of ε in the sense that ε is not part of the input and each ε may have its own algorithm for the problem.

Second definition

Some authors define sub-exponential time as running times in 2o(n).[16][21][22] This definition allows larger running times than the first definition of sub-exponential time. An example of such a sub-exponential time algorithm is the best-known classical algorithm for integer factorization, the general number field sieve, which runs in time about , where the length of the input is n. Another example is the best-known algorithm for the graph isomorphism problem, which runs in time .

Note that it makes a difference whether the algorithm is allowed to be sub-exponential in the size of the instance, the number of vertices, or the number of edges. In parameterized complexity, this difference is made explicit by considering pairs of decision problems and parameters k. SUBEPT is the class of all parameterized problems that run in time sub-exponential in k and polynomial in the input size n:[23]

More precisely, SUBEPT is the class of all parameterized problems for which there is a computable function with and an algorithm that decides L in time .

Exponential time hypothesis

The exponential time hypothesis (ETH) is that 3SAT, the satisfiability problem of Boolean formulas in conjunctive normal form with, at most, three literals per clause and with n variables, cannot be solved in time 2o(n). More precisely, the hypothesis is that there is some absolute constant c>0 such that 3SAT cannot be decided in time 2cn by any deterministic Turing machine. With m denoting the number of clauses, ETH is equivalent to the hypothesis that kSAT cannot be solved in time 2o(m) for any integer k ≥ 3.[24] The exponential time hypothesis implies P ≠ NP.

Exponential time

An algorithm is said to be exponential time, if T(n) is upper bounded by 2poly(n), where poly(n) is some polynomial in n. More formally, an algorithm is exponential time if T(n) is bounded by O(2nk) for some constant k. Problems which admit exponential time algorithms on a deterministic Turing machine form the complexity class known as EXP.

Sometimes, exponential time is used to refer to algorithms that have T(n) = 2O(n), where the exponent is at most a linear function of n. This gives rise to the complexity class E.

Double exponential time

An algorithm is said to be double exponential time if T(n) is upper bounded by 22poly(n), where poly(n) is some polynomial in n. Such algorithms belong to the complexity class 2-EXPTIME.

Well-known double exponential time algorithms include:

See also

References

  1. ^ a b Sipser, Michael (2006). Introduction to the Theory of Computation. Course Technology Inc. ISBN 0-619-21764-2.
  2. ^ Mehlhorn, Kurt; Naher, Stefan (1990). "Bounded ordered dictionaries in O(log log N) time and O(n) space". Information Processing Letters. 35 (4): 183–189. doi:10.1016/0020-0190(90)90022-P.
  3. ^ Tao, Terence (2010). "1.11 The AKS primality test". An epsilon of room, II: Pages from year three of a mathematical blog. Graduate Studies in Mathematics. 117. Providence, RI: American Mathematical Society. pp. 82–86. doi:10.1090/gsm/117. ISBN 978-0-8218-5280-4. MR 2780010.
  4. ^ Lenstra, Jr., H.W.; Pomerance, Carl (11 December 2016). "Primality testing with Gaussian periods" (PDF).
  5. ^ a b Babai, László; Fortnow, Lance; Nisan, N.; Wigderson, Avi (1993). "BPP has subexponential time simulations unless EXPTIME has publishable proofs". Computational Complexity. Berlin, New York: Springer-Verlag. 3 (4): 307–318. doi:10.1007/BF01275486.
  6. ^ Bradford, Phillip G.; Rawlins, Gregory J. E.; Shannon, Gregory E. (1998). "Efficient Matrix Chain Ordering in Polylog Time". SIAM Journal on Computing. Philadelphia: Society for Industrial and Applied Mathematics. 27 (2): 466–490. doi:10.1137/S0097539794270698. ISSN 1095-7111.
  7. ^ Kumar, Ravi; Rubinfeld, Ronitt (2003). "Sublinear time algorithms" (PDF). SIGACT News. 34 (4): 57–67. doi:10.1145/954092.954103.
  8. ^ Naik, Ashish V.; Regan, Kenneth W.; Sivakumar, D. (1995). "On Quasilinear Time Complexity Theory" (PDF). Theoretical Computer Science. 148: 325–349. doi:10.1016/0304-3975(95)00031-q. Retrieved 23 February 2015.
  9. ^ Sedgewick, R. and Wayne K (2011). Algorithms, 4th Ed. p. 186. Pearson Education, Inc.
  10. ^ Papadimitriou, Christos H. (1994). Computational complexity. Reading, Mass.: Addison-Wesley. ISBN 0-201-53082-1.
  11. ^ Cobham, Alan (1965). "The intrinsic computational difficulty of functions". Proc. Logic, Methodology, and Philosophy of Science II. North Holland.
  12. ^ Grötschel, Martin; László Lovász; Alexander Schrijver (1988). "Complexity, Oracles, and Numerical Computation". Geometric Algorithms and Combinatorial Optimization. Springer. ISBN 0-387-13624-X.
  13. ^ Schrijver, Alexander (2003). "Preliminaries on algorithms and Complexity". Combinatorial Optimization: Polyhedra and Efficiency. 1. Springer. ISBN 3-540-44389-4.
  14. ^ Braverman, Mark; Ko, Young Kun; Rubinstein, Aviad; Weinstein, Omri (2015), ETH hardness for densest-k-subgraph with perfect completeness, arXiv:1504.08352, Bibcode:2015arXiv150408352B.
  15. ^ Complexity Zoo: Class QP: Quasipolynomial-Time
  16. ^ a b Impagliazzo, R.; Paturi, R. (2001). "On the complexity of k-SAT". Journal of Computer and System Sciences. Elsevier. 62 (2): 367–375. doi:10.1006/jcss.2000.1727. ISSN 1090-2724.
  17. ^ Aaronson, Scott (5 April 2009). "A not-quite-exponential dilemma". Shtetl-Optimized. Retrieved 2 December 2009.
  18. ^ Complexity Zoo: Class SUBEXP: Deterministic Subexponential-Time
  19. ^ Moser, P. (2003). "Baire's Categories on Small Complexity Classes". Lecture Notes in Computer Science. Berlin, New York: Springer-Verlag: 333–342. ISSN 0302-9743.
  20. ^ Miltersen, P.B. (2001). "DERANDOMIZING COMPLEXITY CLASSES". Handbook of Randomized Computing. Kluwer Academic Pub: 843.
  21. ^ Kuperberg, Greg (2005). "A Subexponential-Time Quantum Algorithm for the Dihedral Hidden Subgroup Problem". SIAM Journal on Computing. Philadelphia: Society for Industrial and Applied Mathematics. 35 (1): 188. arXiv:quant-ph/0302112. doi:10.1137/s0097539703436345. ISSN 1095-7111.
  22. ^ Oded Regev (2004). "A Subexponential Time Algorithm for the Dihedral Hidden Subgroup Problem with Polynomial Space". arXiv:quant-ph/0406151v1.
  23. ^ Flum, Jörg; Grohe, Martin (2006). Parameterized Complexity Theory. Springer. p. 417. ISBN 978-3-540-29952-3. Retrieved 5 March 2010.
  24. ^ Impagliazzo, R.; Paturi, R.; Zane, F. (2001). "Which problems have strongly exponential complexity?". Journal of Computer and System Sciences. 63 (4): 512–530. doi:10.1006/jcss.2001.1774.
  25. ^ Mayr,E. & Mayer,A.: The Complexity of the Word Problem for Commutative Semi-groups and Polynomial Ideals. Adv. in Math. 46(1982) pp. 305–329
  26. ^ J.H. Davenport & J. Heintz: Real Quantifier Elimination is Doubly Exponential. J. Symbolic Comp. 5(1988) pp. 29–35.
  27. ^ G.E. Collins: Quantifier Elimination for Real Closed Fields by Cylindrical Algebraic Decomposition. Proc. 2nd. GI Conference Automata Theory & Formal Languages (Springer Lecture Notes in Computer Science 33) pp. 134–183
Analysis of algorithms

In computer science, the analysis of algorithms is the determination of the computational complexity of algorithms, that is the amount of time, storage and/or other resources necessary to execute them. Usually, this involves determining a function that relates the length of an algorithm's input to the number of steps it takes (its time complexity) or the number of storage locations it uses (its space complexity). An algorithm is said to be efficient when this function's values are small, or grow slowly compared to a growth in the size of the input. Different inputs of the same length may cause the algorithm to have different behavior, so best, worst and average case descriptions might all be of practical interest. When not otherwise specified, the function describing the performance of an algorithm is usually an upper bound, determined from the worst case inputs to the algorithm.

The term "analysis of algorithms" was coined by Donald Knuth. Algorithm analysis is an important part of a broader computational complexity theory, which provides theoretical estimates for the resources needed by any algorithm which solves a given computational problem. These estimates provide an insight into reasonable directions of search for efficient algorithms.

In theoretical analysis of algorithms it is common to estimate their complexity in the asymptotic sense, i.e., to estimate the complexity function for arbitrarily large input. Big O notation, Big-omega notation and Big-theta notation are used to this end. For instance, binary search is said to run in a number of steps proportional to the logarithm of the length of the sorted list being searched, or in O(log(n)), colloquially "in logarithmic time". Usually asymptotic estimates are used because different implementations of the same algorithm may differ in efficiency. However the efficiencies of any two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor called a hidden constant.

Exact (not asymptotic) measures of efficiency can sometimes be computed but they usually require certain assumptions concerning the particular implementation of the algorithm, called model of computation. A model of computation may be defined in terms of an abstract computer, e.g., Turing machine, and/or by postulating that certain operations are executed in unit time.

For example, if the sorted list to which we apply binary search has n elements, and we can guarantee that each lookup of an element in the list can be done in unit time, then at most log2 n + 1 time units are needed to return an answer.

Asymptotic computational complexity

In computational complexity theory, asymptotic computational complexity is the usage of asymptotic analysis for the estimation of computational complexity of algorithms and computational problems, commonly associated with the usage of the big O notation.

Clustal

Clustal is a series of widely used computer programs used in Bioinformatics for multiple sequence alignment. There have been many versions of Clustal over the development of the algorithm that are listed below. The analysis of each tool and its algorithm are also detailed in their respective categories. Available operating systems listed in the sidebar are a combination of the software availability and may not be supported for every current version of the Clustal tools. Clustal Omega has the widest variety of operating systems out of all the Clustal tools.

Complexity class

In computational complexity theory, a complexity class is a set of problems of related resource-based complexity. A typical complexity class has a definition of the form:

the set of problems that can be solved by an abstract machine M using O(f(n)) of resource R, where n is the size of the input.

Computational complexity

In computer science, the computational complexity, or simply complexity of an algorithm is the amount of resources required for running it. The computational complexity of a problem is the minimum of the complexities of all possible algorithms for this problem (including the unknown algorithms).

As the amount of needed resources varies with the input, the complexity is generally expressed as a function n → f(n), where n is the size of the input, and f(n) is either the worst-case complexity, that is the maximum of the amount of resources that are needed for all inputs of size n, or the average-case complexity, that is average of the amount of resources over all input of size n.

When the nature of the resources is not explicitly given, this is usually the time needed for running the algorithm, and one talks of time complexity. However, this depends on the computer that is used, and the time is generally expressed as the number of needed elementary operations, which are supposed to take a constant time on a given computer, and to change by a constant factor when one changes of computer.

Otherwise, the resource that is considered is often the size of the memory that is needed, and one talks of space complexity.

The study of the complexity of explicitly given algorithms is called analysis of algorithms, while the study of the complexity of problems is called computational complexity theory. Clearly, both areas are strongly related, as the complexity of an algorithm is always an upper bound of the complexity of the problem solved by this algorithm.

DLOGTIME

In computational complexity theory, DLOGTIME is the complexity class of all computational problems solvable in a logarithmic amount of computation time on a deterministic Turing machine. It must be defined on a random-access Turing machine, since otherwise the input tape is longer than the range of cells that can be accessed by the machine. It is a very weak model of time complexity: no random-access Turing machine with a smaller deterministic time bound can access the whole input.DLOGTIME-uniformity is important in circuit complexity.

Disjoint-set data structure

In computer science, a disjoint-set data structure (also called a union–find data structure or merge–find set) is a data structure that tracks a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It provides near-constant-time operations (bounded by the inverse Ackermann function) to add new sets, to merge existing sets, and to determine whether elements are in the same set. In addition to many other uses (see the Applications section), disjoint-sets play a key role in Kruskal's algorithm for finding the minimum spanning tree of a graph.

GOST (block cipher)

The GOST block cipher (Magma), defined in the standard GOST 28147-89 (RFC 5830), is a Soviet and Russian government standard symmetric key block cipher with a block size of 64 bits. The original standard, published in 1989, did not give the cipher any name, but the most recent revision of the standard, GOST R 34.12-2015, specifies that it may be referred to as Magma. The GOST hash function is based on this cipher. The new standard also specifies a new 128-bit block cipher called Kuznyechik.

Developed in the 1970s, the standard had been marked "Top Secret" and then downgraded to "Secret" in 1990. Shortly after the dissolution of the USSR, it was declassified and it was released to the public in 1994. GOST 28147 was a Soviet alternative to the United States standard algorithm, DES. Thus, the two are very similar in structure.

Generation of primes

In computational number theory, a variety of algorithms make it possible to generate prime numbers efficiently. These are used in various applications, for example hashing, public-key cryptography, and search of prime factors in large numbers.

For relatively small numbers, it is possible to just apply trial division to each successive odd number. Prime sieves are almost always faster.

Graham scan

Graham's scan is a method of finding the convex hull of a finite set of points in the plane with time complexity O(n log n). It is named after Ronald Graham, who published the original algorithm in 1972. The algorithm finds all vertices of the convex hull ordered along its boundary. It uses a stack to detect and remove concavities in the boundary efficiently.

Interpolation attack

In cryptography, an interpolation attack is a type of cryptanalytic attack against block ciphers.

After the two attacks, differential cryptanalysis and linear cryptanalysis, were presented on block ciphers, some new block ciphers were introduced, which were proven secure against differential and linear attacks. Among these there were some iterated block ciphers such as the KN-Cipher and the SHARK cipher. However, Thomas Jakobsen and Lars Knudsen showed in the late 90's that these ciphers were easy to break by introducing a new attack called the interpolation attack.

In the attack, an algebraic function is used to represent an S-box. This may be a simple quadratic, or a polynomial or rational function over a Galois field. Its coefficients can be determined by standard Lagrange interpolation techniques, using known plaintexts as data points. Alternatively, chosen plaintexts can be used to simplify the equations and optimize the attack.

In its simplest version an interpolation attack expresses the ciphertext as a polynomial of the plaintext. If the polynomial has a relative low number of unknown coefficients, then with a collection of plaintext/ciphertext (p/c) pairs, the polynomial can be reconstructed. With the polynomial reconstructed the attacker then has a representation of the encryption, without exact knowledge of the secret key.

The interpolation attack can also be used to recover the secret key.

It is easiest to describe the method with an example.

Memory hierarchy

In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies. Memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower level programming constructs involving locality of reference.

Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories (m1,m2,...,mn) in which each member mi is typically smaller and faster than the next highest member mi+1 of the hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and then signaling for activating the transfer.

There are four major storage levels.

Internal – Processor registers and cache.

Main – the system RAM and controller cards.

On-line mass storage – Secondary storage.

Off-line bulk storage – Tertiary and Off-line storage.This is a general memory hierarchy structuring. Many other structures are useful. For example, a paging algorithm may be considered as a level for virtual memory when designing a computer architecture, and one can include a level of nearline storage between online and offline storage.

Overhead (computing)

In computer science, overhead is any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to perform a specific task. It is a special case of engineering overhead. Overhead can be a deciding factor in software design, with regard to structure, error correction, and feature inclusion. Examples of computing overhead may be found in functional programming, data transfer, and data structures.

Parallel random-access machine

In computer science, a parallel random-access machine (PRAM) is a shared-memory abstract machine. As its name indicates, the PRAM was intended as the parallel-computing analogy to the random-access machine (RAM). In the same way that the RAM is used by sequential-algorithm designers to model algorithmic performance (such as time complexity), the PRAM is used by parallel-algorithm designers to model parallel algorithmic performance (such as time complexity, where the number of processors assumed is typically also stated). Similar to the way in which the RAM model neglects practical issues, such as access time to cache memory versus main memory, the PRAM model neglects such issues as synchronization and communication, but provides any (problem-size-dependent) number of processors. Algorithm cost, for instance, is estimated using two parameters O(time) and O(time × processor_number).

Prim's algorithm

In computer science, Prim's (also known as Jarník's) algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník and later rediscovered and republished by computer scientists Robert C. Prim in 1957 and Edsger W. Dijkstra in 1959. Therefore, it is also sometimes called the Jarník's algorithm, Prim–Jarník algorithm, Prim–Dijkstra algorithm

or the DJP algorithm.Other well-known algorithms for this problem include Kruskal's algorithm and Borůvka's algorithm. These algorithms find the minimum spanning forest in a possibly disconnected graph; in contrast, the most basic form of Prim's algorithm only finds minimum spanning trees in connected graphs. However, running Prim's algorithm separately for each connected component of the graph, it can also be used to find the minimum spanning forest. In terms of their asymptotic time complexity, these three algorithms are equally fast for sparse graphs, but slower than other more sophisticated algorithms.

However, for graphs that are sufficiently dense, Prim's algorithm can be made to run in linear time, meeting or improving the time bounds for other algorithms.

Push–relabel maximum flow algorithm

In mathematical optimization, the push–relabel algorithm (alternatively, preflow–push algorithm) is an algorithm for computing maximum flows. The name "push–relabel" comes from the two basic operations used in the algorithm. Throughout its execution, the algorithm maintains a "preflow" and gradually converts it into a maximum flow by moving flow locally between neighboring nodes using push operations under the guidance of an admissible network maintained by relabel operations. In comparison, the Ford–Fulkerson algorithm performs global augmentations that send flow following paths from the source all the way to the sink.The push–relabel algorithm is considered one of the most efficient maximum flow algorithms. The generic algorithm has a strongly polynomial O(V 2E) time complexity, which is asymptotically more efficient than the O(VE 2) Edmonds–Karp algorithm. Specific variants of the algorithms achieve even lower time complexities. The variant based on the highest label node selection rule has O(V 2√E) time complexity and is generally regarded as the benchmark for maximum flow algorithms. Subcubic O(VElog(V 2/E)) time complexity can be achieved using dynamic trees, although in practice it is less efficient.The push–relabel algorithm has been extended to compute minimum cost flows. The idea of distance labels has led to a more efficient augmenting path algorithm, which in turn can be incorporated back into the push–relabel algorithm to create a variant with even higher empirical performance.

Rope (data structure)

In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.

Selection sort

In computer science, selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n2) time complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

The algorithm divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right. Uniqueness of selection sort when compared to other sorting techniques:The time efficiency of selection sort is quadratic, so there exists a number of sorting techniques which have better time complexity than Selection Sort. Even then, considering the number of swaps made, the number of swaps will be n-1 both in worst as well as best case. That is, time efficiency of selection sort with respect to swaps is linear. This property distinguishes selection sort positively from many other sorting algorithms.

Worst-case complexity

In computer science, the worst-case complexity (usually denoted in asymptotic notation) measures the resources (e.g. running time, memory) an algorithm requires in the worst-case. It gives an upper bound on the resources required by the algorithm.

In the case of running time, the worst-case time-complexity indicates the longest running time performed by an algorithm given any input of size n, and thus this guarantees that the algorithm finishes on time. Moreover, the order of growth of the worst-case complexity is used to compare the efficiency of two algorithms.

The worst-case complexity of an algorithm should be contrasted with its average-case complexity, which is an average measure of the amount of resources the algorithm uses on a random input.

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.