Computational statistics

Computational statistics, or statistical computing, is the interface between statistics and computer science. It is the area of computational science (or scientific computing) specific to the mathematical science of statistics. This area is also developing rapidly, leading to calls that a broader concept of computing should be taught as part of general statistical education.[1]

As in traditional statistics, the goal is to transform raw data into knowledge,[2] but the focus lies on computer-intensive statistical methods, such as cases with very large sample sizes and non-homogeneous data sets.[2]

The terms 'computational statistics' and 'statistical computing' are often used interchangeably, although Carlo Lauro (a former president of the International Association for Statistical Computing) proposed making a distinction, defining 'statistical computing' as "the application of computer science to statistics", and 'computational statistics' as "aiming at the design of algorithm for implementing statistical methods on computers, including the ones unthinkable before the computer age (e.g. bootstrap, simulation), as well as to cope with analytically intractable problems" [sic].[3]

The term 'computational statistics' may also be used to refer to computationally intensive statistical methods, including resampling methods, Markov chain Monte Carlo methods, local regression, kernel density estimation, artificial neural networks and generalized additive models.
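
To make the flavor of such methods concrete, the following is a minimal sketch of the nonparametric bootstrap in Python. The function name, data values and number of resamples are illustrative assumptions, not taken from this article; the idea is to approximate the sampling distribution of a statistic by recomputing it on many resamples drawn with replacement from the observed data.

    import numpy as np

    def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
        """Approximate the standard error of a statistic by resampling with replacement."""
        rng = np.random.default_rng(seed)
        data = np.asarray(data)
        estimates = np.empty(n_resamples)
        for i in range(n_resamples):
            resample = rng.choice(data, size=data.size, replace=True)
            estimates[i] = statistic(resample)
        return estimates.std(ddof=1)

    # Example: standard error of the sample median, for which no simple formula exists.
    sample = np.array([2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5])
    print(bootstrap_se(sample, np.median))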

Figure: Students working in the Statistics Machine Room of the London School of Economics in 1964.

References

  1. Nolan, D.; Temple Lang, D. (2010), "Computing in the Statistics Curricula", The American Statistician, 64 (2): 97–107.
  2. Wegman, Edward J. (1988), "Computational Statistics: A New Agenda for Statistical Theory and Practice", Journal of the Washington Academy of Sciences, 78 (4): 310–322.
  3. Lauro, Carlo (1996), "Computational statistics or statistical computing, is that the question?", Computational Statistics & Data Analysis, 23 (1): 191–193, doi:10.1016/0167-9473(96)88920-1.

Further reading

Articles

  • Albert, J.H.; Gentle, J.E. (2004), Albert, James H; Gentle, James E (eds.), "Special Section: Teaching Computational Statistics", The American Statistician, 58: 1–1, doi:10.1198/0003130042872
  • Wilkinson, Leland (2008), "The Future of Statistical Computing (with discussion)", Technometrics, 50 (4): 418–435, doi:10.1198/004017008000000460

Books

  • Drew, John H.; Evans, Diane L.; Glen, Andrew G.; Leemis, Lawrence M. (2007), Computational Probability: Algorithms and Applications in the Mathematical Sciences, Springer International Series in Operations Research & Management Science, Springer, ISBN 0-387-74675-7
  • Gentle, James E. (2002), Elements of Computational Statistics, Springer, ISBN 0-387-95489-9
  • Gentle, James E.; Härdle, Wolfgang; Mori, Yuichi, eds. (2004), Handbook of Computational Statistics: Concepts and Methods, Springer, ISBN 3-540-40464-3
  • Givens, Geof H.; Hoeting, Jennifer A. (2005), Computational Statistics, Wiley Series in Probability and Statistics, Wiley-Interscience, ISBN 978-0-471-46124-1
  • Klemens, Ben (2008), Modeling with Data: Tools and Techniques for Statistical Computing, Princeton University Press, ISBN 978-0-691-13314-0
  • Monahan, John (2001), Numerical Methods of Statistics, Cambridge University Press, ISBN 978-0-521-79168-7
  • Rose, Colin; Smith, Murray D. (2002), Mathematical Statistics with Mathematica, Springer Texts in Statistics, Springer, ISBN 0-387-95234-9
  • Thisted, Ronald Aaron (1988), Elements of Statistical Computing: Numerical Computation, CRC Press, ISBN 0-412-01371-1
  • Gharieb, Reda. R. (2017), Data Science: Scientific and Statistical Computing, Noor Publishing, ISBN 978-3-330-97256-8


Bootstrap aggregating

Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.
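
As a rough illustration, the Python sketch below implements bagging by hand for classification, assuming non-negative integer class labels and scikit-learn decision trees as the base learners; all names and toy data are illustrative, not from this article. Each tree is fit to a bootstrap resample of the training set, and the ensemble predicts by majority vote.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagged_predict(X_train, y_train, X_test, n_estimators=25, seed=0):
        """Fit trees on bootstrap resamples and combine their predictions by majority vote."""
        rng = np.random.default_rng(seed)
        n = X_train.shape[0]
        votes = np.zeros((n_estimators, X_test.shape[0]), dtype=int)
        for b in range(n_estimators):
            idx = rng.integers(0, n, size=n)              # bootstrap sample of the training set
            tree = DecisionTreeClassifier(random_state=b)
            tree.fit(X_train[idx], y_train[idx])
            votes[b] = tree.predict(X_test)
        # majority vote across the ensemble (assumes non-negative integer class labels)
        return np.array([np.bincount(col).argmax() for col in votes.T])

    # Example with toy data: class 1 when the single feature exceeds 0.
    X = np.linspace(-1, 1, 40).reshape(-1, 1)
    y = (X.ravel() > 0).astype(int)
    print(bagged_predict(X, y, np.array([[-0.5], [0.5]])))

Averaging many high-variance base learners in this way is what reduces the variance of the combined predictor.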

Jackknife resampling

In statistics, the jackknife is a resampling technique especially useful for variance and bias estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. The jackknife estimator of a parameter is found by systematically leaving out each observation from the dataset, recomputing the estimate on the remaining observations, and then averaging these recomputed estimates. Given a sample of size n, the jackknife estimate is therefore obtained by aggregating the estimates computed on each of the n sub-samples of size n − 1.
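
A minimal Python sketch of this leave-one-out procedure follows; the function name and data are illustrative, and the standard-error formula is the usual jackknife variance estimate.

    import numpy as np

    def jackknife(data, statistic):
        """Leave-one-out estimates of a statistic and the jackknife estimate of its standard error."""
        data = np.asarray(data)
        n = data.size
        loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
        jack_mean = loo.mean()
        jack_se = np.sqrt((n - 1) / n * np.sum((loo - jack_mean) ** 2))
        return jack_mean, jack_se

    # Example: jackknife the sample mean of a small illustrative data set.
    sample = np.array([4.0, 5.5, 3.8, 6.1, 5.0, 4.7])
    print(jackknife(sample, np.mean))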

The jackknife technique was developed by Maurice Quenouille (1924-1973) from 1949, and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" since, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.

The jackknife is a linear approximation of the bootstrap.

Journal of Statistical Computation and Simulation

Journal of Statistical Computation and Simulation is a peer-reviewed scientific journal that publishes papers related to computational statistics. It is published by Taylor & Francis in English.

The journal began publication in 1972 and publishes 12 issues each year.

List of statistics journals

This is a list of scientific journals published in the field of statistics.

Markov chain Monte Carlo

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain a sample of the desired distribution by observing the chain after a number of steps. The more steps there are, the more closely the distribution of the sample matches the actual desired distribution.
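
A minimal sketch of one such algorithm, a random-walk Metropolis sampler, is given below in Python; the function, step size and target density are illustrative assumptions, not taken from this article. Proposals are accepted or rejected so that the chain's equilibrium distribution is the desired one.

    import numpy as np

    def metropolis(log_density, x0, n_steps=10000, step_size=1.0, seed=0):
        """Random-walk Metropolis sampler whose equilibrium distribution is exp(log_density)."""
        rng = np.random.default_rng(seed)
        samples = np.empty(n_steps)
        x = x0
        for i in range(n_steps):
            proposal = x + step_size * rng.normal()
            # accept with probability min(1, target(proposal) / target(x))
            if np.log(rng.uniform()) < log_density(proposal) - log_density(x):
                x = proposal
            samples[i] = x
        return samples

    # Example: sample from a standard normal distribution and discard early steps as burn-in.
    draws = metropolis(lambda x: -0.5 * x ** 2, x0=0.0)
    print(draws[1000:].mean(), draws[1000:].std())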

Outline of statistics

Statistics is a field of inquiry that studies the collection, analysis, interpretation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used and misused for making informed decisions in all areas of business and government.

Plug-in principle

In statistics, the plug-in principle is the method of estimation of functionals of a population distribution by evaluating the same functionals at the empirical distribution based on a sample.

For example, when estimating the population mean, this method uses the sample mean; to estimate the population median, it uses the sample median; to estimate the population regression line, it uses the sample regression line.
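
A brief Python sketch of the idea, using illustrative data: each population functional is simply evaluated on the empirical distribution of the sample.

    import numpy as np

    # A hypothetical sample drawn from some unknown population (values are illustrative).
    sample = np.array([12.3, 9.8, 15.1, 11.0, 13.7, 10.4, 14.2])

    # Plug-in estimates: evaluate each population functional on the empirical distribution.
    mean_hat = sample.mean()        # plug-in estimate of the population mean
    median_hat = np.median(sample)  # plug-in estimate of the population median
    var_hat = sample.var()          # plug-in estimate of the population variance (divides by n)

    print(mean_hat, median_hat, var_hat)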

It is called a principle because it is too simple to be anything more: it is a guideline, not a theorem.

Stan (software)

Stan is a probabilistic programming language for statistical inference written in C++. The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function. Stan is licensed under the New BSD License. Stan is named in honour of Stanislaw Ulam, pioneer of the Monte Carlo method. Stan was created by Andrew Gelman and Bob Carpenter, with a development team consisting of 34 members.

Technometrics

Technometrics is a journal of statistics for the physical, chemical, and engineering sciences, published quarterly since 1959 by the American Society for Quality and the American Statistical Association.

This page is based on Wikipedia articles written by their contributors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.