The R Journal is an online, open-access, refereed journal published by The R Foundation since 2009. The journal publishes research articles in statistical computing that are of interest to users of the R programming language. The journal is entirely free: it does not charge authors for publication nor are there fees for subscription. The journal includes a News and Notes section that supersedes the R News newsletter, which was published from 2001 to 2008.
The journal serves a dual role as a research journal in statistical computing and as the official newsletter of the R Project. It publishes regular news updates about The R Foundation, the CRAN repository system and the Bioconductor project, and also publishes articles foreshadowing new development directions for R.
The R Journal publishes short to medium-length research articles. Articles may describe innovations in the R system itself, new R software packages, or statistical computing theory implemented in R. Articles in The R Journal are often the primary references for the associated packages, for example for the core grid graphics packages or for parallel processing. The journal also publishes articles on best practice and innovation in modelling, for example in multivariate statistics or multi-level modelling. A distinctive feature of the journal is the inclusion in articles of complete code with which readers can reproduce results and examples.
The journal is indexed in the ISI Web of Knowledge. Despite including many non-citable news articles, it had a 2017 impact factor of 1.371 and a 5-year IF of 2.522. In 2017 the journal was ranked 79th out of 105 journals in computer science and 55th out of 124 journals in statistics and probability. Google Scholar Metrics give the R Journal an h5-index of 18 and an h5-median of 38.
The editors of the journal are appointed by the board of the R Foundation. Each editor-in-chief generally serves for one year and two issues. Editors-in-chief have been Vince Carey (2009), Peter Dalgaard (2010), Heather Turner (2011), Martyn Plummer (2012), Hadley Wickham (2013), Deepayan Sarkar (2014), Bettina Grün (2015), Michael Lawrence (2016), Roger Bivand (2017) and John Verzani (2018).
The R Journal
Edited by: Michael Lawrence
Publisher: The R Foundation (Austria)
This is a comparison of peer-reviewed scientific journals published in the field of statistics.

Computational statistics
Computational statistics, or statistical computing, is the interface between statistics and computer science. It is the area of computational science (or scientific computing) specific to the mathematical science of statistics. The area is developing rapidly, leading to calls for a broader concept of computing to be taught as part of general statistical education.

As in traditional statistics, the goal is to transform raw data into knowledge, but the focus lies on computer-intensive statistical methods, such as cases with very large sample sizes and non-homogeneous data sets.

The terms 'computational statistics' and 'statistical computing' are often used interchangeably, although Carlo Lauro (a former president of the International Association for Statistical Computing) proposed making a distinction, defining 'statistical computing' as "the application of computer science to statistics", and 'computational statistics' as "aiming at the design of algorithm for implementing statistical methods on computers, including the ones unthinkable before the computer age (e.g. bootstrap, simulation), as well as to cope with analytically intractable problems" [sic].

The term 'computational statistics' may also be used to refer to computationally intensive statistical methods, including resampling methods, Markov chain Monte Carlo methods, local regression, kernel density estimation, artificial neural networks and generalized additive models.

Hadley Wickham
Hadley Wickham is a statistician from New Zealand who is currently Chief Scientist at RStudio and an adjunct professor of statistics at the University of Auckland, Stanford University, and Rice University. He is best known for developing open-source statistical analysis packages for the R programming language that implement tools for data visualisation and data transformation. Wickham's packages and writing are known for advocating a tidy data approach to data import, analysis and modelling.

Hierarchical generalized linear model
In statistics, hierarchical generalized linear models (HGLMs) extend generalized linear models by relaxing the assumption that error components are independent. This allows models to be built in situations where more than one error term is necessary, and also allows for dependencies between error terms. The error components can be correlated and need not follow a normal distribution. When there are different clusters, that is, groups of observations, the observations in the same cluster are correlated; in fact, they are positively correlated, because observations in the same cluster share some common features. In this situation, using generalized linear models and ignoring the correlations may cause problems.

Inverse Gaussian distribution
In probability theory, the inverse Gaussian distribution (also known as the Wald distribution) is a two-parameter family of continuous probability distributions with support on (0,∞).
Its probability density function is given by

$$f(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^{3}}}\,\exp\!\left(-\frac{\lambda (x-\mu)^{2}}{2\mu^{2} x}\right)$$

for x > 0, where μ > 0 is the mean and λ > 0 is the shape parameter.
As λ tends to infinity, the inverse Gaussian distribution becomes more like a normal (Gaussian) distribution. The inverse Gaussian distribution has several properties analogous to a Gaussian distribution. The name can be misleading: it is an "inverse" only in that, while the Gaussian describes a Brownian motion's level at a fixed time, the inverse Gaussian describes the distribution of the time a Brownian motion with positive drift takes to reach a fixed positive level.
Its cumulant generating function (logarithm of the moment generating function) is the inverse of the cumulant generating function of a Gaussian random variable.
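As a numerical sanity check, the IG(μ, λ) density √(λ/(2πx³)) exp(−λ(x−μ)²/(2μ²x)) should integrate to 1 and reproduce the known moments (mean μ, variance μ³/λ). A sketch in Python with illustrative parameters:

```python
import numpy as np

def invgauss_pdf(x, mu, lam):
    """Density of the inverse Gaussian IG(mu, lam) on (0, inf)."""
    return np.sqrt(lam / (2 * np.pi * x**3)) * np.exp(-lam * (x - mu)**2 / (2 * mu**2 * x))

mu, lam = 1.5, 4.0
x = np.linspace(1e-6, 60.0, 400_000)  # fine grid; density is negligible beyond x = 60 here
f = invgauss_pdf(x, mu, lam)
dx = x[1] - x[0]

total = f.sum() * dx                  # should be close to 1 (proper density)
mean = (x * f).sum() * dx             # should be close to mu
var = (x**2 * f).sum() * dx - mean**2 # should be close to mu**3 / lam
```

The Riemann sum on a fine grid is enough here because the density is smooth and its tail decays exponentially.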
To indicate that a random variable X is inverse Gaussian-distributed with mean μ and shape parameter λ, we write X ∼ IG(μ, λ).

Kolmogorov–Smirnov test
In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test). It is named after Andrey Kolmogorov and Nikolai Smirnov.
The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution (in the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In the one-sample case, the distribution considered under the null hypothesis may be continuous (see Section 2), purely discrete or mixed (see Section 2.2). In the two-sample case (see Section 3), the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted.
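For a continuous reference distribution, the supremum defining the one-sample statistic is attained at the jumps of the empirical distribution function, so it can be computed exactly from the sorted sample. A minimal sketch in Python (the uniform example is illustrative):

```python
import numpy as np

def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F(x)| for a continuous
    reference CDF F; the supremum occurs at a jump of the empirical CDF."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    F = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - F)  # gap just after each jump
    d_minus = np.max(F - np.arange(0, n) / n)     # gap just before each jump
    return max(d_plus, d_minus)

rng = np.random.default_rng(42)
u = rng.uniform(size=500)
d_match = ks_statistic(u, lambda t: t)        # correct Uniform(0,1) reference: small
d_mismatch = ks_statistic(u, lambda t: t**2)  # wrong reference: near sup|t - t^2| = 0.25
```

With the correct reference the statistic stays near zero, while a mismatched reference drives it toward the true supremum distance between the two distribution functions.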
The two-sample K–S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.
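The two-sample statistic can likewise be computed exactly, since the largest gap between the two empirical distribution functions can only occur at an observed data point. A sketch in Python, with illustrative normal samples showing sensitivity to both location and shape:

```python
import numpy as np

def ks_2samp_stat(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical
    CDFs, evaluated over the pooled sample points."""
    a, b = np.sort(np.asarray(a)), np.sort(np.asarray(b))
    pooled = np.concatenate([a, b])
    F_a = np.searchsorted(a, pooled, side="right") / len(a)
    F_b = np.searchsorted(b, pooled, side="right") / len(b)
    return np.abs(F_a - F_b).max()

rng = np.random.default_rng(7)
same = ks_2samp_stat(rng.normal(0, 1, 400), rng.normal(0, 1, 400))   # same distribution: small
shift = ks_2samp_stat(rng.normal(0, 1, 400), rng.normal(1, 1, 400))  # location difference: large
wide = ks_2samp_stat(rng.normal(0, 1, 400), rng.normal(0, 3, 400))   # scale (shape) difference: large
```

Both the shifted and the rescaled alternative produce a clearly larger statistic than two samples from the same distribution, which is the sensitivity to location and shape described above.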
The Kolmogorov–Smirnov test can be modified to serve as a goodness of fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic: see below. Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the Shapiro–Wilk test or Anderson–Darling test. However, these other tests have their own disadvantages. For instance, the Shapiro–Wilk test is known not to work well in samples with many identical values.

List of statistics journals
This is a list of scientific journals published in the field of statistics.

Multi-label classification
In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to.
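One common baseline reduction is binary relevance: train one independent binary classifier per label. A toy sketch in Python (the nearest-centroid base learner here is hypothetical and purely for illustration; any binary classifier could be plugged in):

```python
import numpy as np

class BinaryRelevance:
    """Reduce multi-label classification to one independent binary problem
    per label, using a toy nearest-centroid rule as the base learner."""

    def fit(self, X, Y):
        # For each label j, remember the centroids of its positive and
        # negative training examples.
        self.centroids = [
            (X[Y[:, j] == 1].mean(axis=0), X[Y[:, j] == 0].mean(axis=0))
            for j in range(Y.shape[1])
        ]
        return self

    def predict(self, X):
        Y_hat = np.zeros((len(X), len(self.centroids)), dtype=int)
        for j, (pos, neg) in enumerate(self.centroids):
            Y_hat[:, j] = (np.linalg.norm(X - pos, axis=1)
                           < np.linalg.norm(X - neg, axis=1)).astype(int)
        return Y_hat

# Toy data: label 0 <=> first feature positive, label 1 <=> second feature
# positive, so an instance may carry zero, one, or two labels.
X = np.array([[2., 2.], [2., -2.], [-2., 2.], [-2., -2.], [3., 1.], [-1., -3.]])
Y = (X > 0).astype(int)
model = BinaryRelevance().fit(X, Y)
```

Binary relevance ignores correlations between labels, which is exactly the limitation that more elaborate multi-label methods try to address.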
Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y (assigning a value of 0 or 1 for each element (label) in y).

Predictive Model Markup Language
The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format conceived by Dr. Robert Lee Grossman, then the director of the National Center for Data Mining at the University of Illinois at Chicago. PMML provides a way for analytic applications to describe and exchange predictive models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feedforward neural networks. Version 0.9 was published in 1998; subsequent versions have been developed by the Data Mining Group.

Since PMML is an XML-based standard, the specification comes in the form of an XML schema. PMML is a mature standard, with over 30 organizations having announced products supporting it.

R (programming language)
R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in popularity in recent years. As of March 2019, R ranks 14th in the TIOBE index, a measure of popularity of programming languages.

A GNU package, the source code for the R software environment is written primarily in C, Fortran and R itself, and is freely available under the GNU General Public License. Pre-compiled binary versions are provided for various operating systems. Although R has a command line interface, there are several graphical user interfaces, such as RStudio, an integrated development environment.

Rattle GUI
Rattle GUI is a free and open-source software (GNU GPL v2) package providing a graphical user interface (GUI) for data mining using the R statistical programming language. Rattle is used in a variety of situations: 15 different government departments in Australia, in addition to various other organisations around the world, currently use Rattle in their data mining activities and as a statistical package.

Rattle provides considerable data mining functionality by exposing the power of the R statistical software through a graphical user interface. Rattle is also used as a teaching tool for learning the R language: its Log Code tab replicates the R code for any activity undertaken in the GUI, which can be copied and pasted. Rattle can be used for statistical analysis or model generation. It allows the dataset to be partitioned into training, validation and testing sets. The dataset can be viewed and edited, and there is also an option for scoring an external data file.

Stochastic volatility
In statistics, stochastic volatility models are those in which the variance of a stochastic process is itself randomly distributed. They are used in the field of mathematical finance to evaluate derivative securities, such as options. The name derives from the models' treatment of the underlying security's volatility as a random process, governed by state variables such as the price level of the underlying security, the tendency of volatility to revert to some long-run mean value, and the variance of the volatility process itself, among others.
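The state variables described above can be made concrete with a Heston-type model, in which the variance mean-reverts to a long-run level. A minimal Euler–Maruyama simulation sketch in Python (illustrative parameters and a full-truncation scheme; not a production pricer):

```python
import numpy as np

# Heston-type dynamics:
#   dS = mu * S dt + sqrt(v) * S dW1
#   dv = kappa * (theta - v) dt + xi * sqrt(v) dW2,  corr(dW1, dW2) = rho
def simulate_heston(s0=100.0, v0=0.04, mu=0.05, kappa=2.0, theta=0.04,
                    xi=0.3, rho=-0.7, T=1.0, n=2000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n
    s, v = np.empty(n + 1), np.empty(n + 1)
    s[0], v[0] = s0, v0
    for i in range(n):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal()  # correlated shocks
        vp = max(v[i], 0.0)  # full truncation keeps the variance non-negative in the diffusion
        s[i + 1] = s[i] * np.exp((mu - 0.5 * vp) * dt + np.sqrt(vp * dt) * z1)
        v[i + 1] = v[i] + kappa * (theta - vp) * dt + xi * np.sqrt(vp * dt) * z2
    return s, v

prices, variances = simulate_heston()
```

Here kappa sets the speed of mean reversion toward theta, xi is the volatility of the variance process, and the negative rho reproduces the leverage effect behind the volatility skew.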
Stochastic volatility models are one approach to resolving a shortcoming of the Black–Scholes model. In particular, models based on Black–Scholes assume that the underlying volatility is constant over the life of the derivative and unaffected by changes in the price level of the underlying security. However, such models cannot explain long-observed features of the implied volatility surface such as the volatility smile and skew, which indicate that implied volatility does tend to vary with respect to strike price and expiry. By assuming that the volatility of the underlying price is a stochastic process rather than a constant, it becomes possible to model derivatives more accurately.

Timothy Jurka
Timothy Paul Jurka (born September 21, 1988) is a Polish-American computer scientist and political scientist. He is the son of computational biologist Jerzy Jurka.