Controlling for a variable

In statistics, controlling for a variable is the attempt to reduce the effect of confounding variables in an observational study or experiment. It means that when looking at the effect of one variable, the effects of all other variable predictors are taken into account,[1] either by making the other variables take on a fixed value (in an experiment) or by including them in a regression to separate their effects from those of the explanatory variable of interest (in an observational study).


Experiments attempt to assess the effect of manipulating one or more independent variables on one or more dependent variables. To ensure the measured effect is not influenced by external factors, other variables must be held constant. These variables that are made to remain constant during an experiment are referred to as the control variables.

For example, if an outdoor experiment were to be conducted to compare how different wing designs of a paper airplane (the independent variable) affect how far it can fly (the dependent variable), one would want to ensure that they conduct the experiment at times when the weather is the same because one would not want weather to affect the experiment. In this case, the control variables may be wind speed and precipitation. If the experiment were conducted when it was sunny with no wind, but the weather changed, one would want to postpone the completion of the experiment until the control variables (the wind and precipitation level) were the same as when the experiment began.

In controlled experiments of medical treatment options on humans, researchers randomly assign individuals to a treatment group or control group. This is done to reduce the confounding effect of irrelevant variables that are not being studied, such as the placebo effect.

Observational studies

In an observational study, researchers have no control over the values of the independent variables, such as who receives the treatment. Instead, they must control for variables using statistics.

Observational studies are used when controlled experiments may be unethical or impractical. For instance, if a researcher wished to study the effect of unemployment (the independent variable) on health (the dependent variable), it would be considered unethical by most institutional review boards to randomly assign some participants to have jobs and some not to. Instead, the researcher will have to create a sample where some people are employed and some are unemployed. However, there could be factors that affect both whether someone is employed and how healthy he or she is. Any observed association between the independent variable and the dependent variable could be due instead to these outside, spurious factors rather than indicating a true link between them. This can be problematic even in a true random sample. By controlling for the extraneous variables, the researcher can come closer to understanding the true effect of the independent variable on the dependent variable.

In this context the extraneous variables can be controlled for by using multiple regression. The regression uses as independent variables not only the one or ones whose effects on the dependent variable are being studied, but also any potential confounding variables, thus avoiding omitted variable bias.

See also


  1. ^ Frost, Jim. "A Tribute to Regression Analysis | Minitab". Retrieved 2015-08-04.
  • Freedman, David; Pisani, Robert; Purves, Roger (2007). Statistics. W. W. Norton & Company. ISBN 0393929728.
Causal model

A causal model (or structural causal model) is a conceptual model that describes the causal mechanisms of a system. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.

They can allow some questions to be answered from existing observational data without the need for an interventional study such as a randomized controlled trial. Some interventional studies are inappropriate for ethical or practical reasons, meaning that without a causal model, some questions cannot be answered.

Causal models can help with the question of external validity (whether results from one study apply to unstudied populations). Causal models can allow data from multiple studies to be merged (in certain circumstances) to answer questions that cannot be answered by any individual data set.

Causal models are falsifiable, in that if they do not match data, they must be rejected as invalid.

Causal models have found applications in signal processing, epidemiology and machine learning.

Dependent and independent variables

In mathematical modeling, statistical modeling and experimental sciences, the values of dependent variables depend on the values of independent variables. The dependent variables represent the output or outcome whose variation is being studied. The independent variables, also known in a statistical context as regressors, represent inputs or causes, that is, potential reasons for variation. In an experiment, any variable that the experimenter manipulates can be called an independent variable. Models and experiments test the effects that the independent variables have on the dependent variables. Sometimes, even if their influence is not of direct interest, independent variables may be included for other reasons, such as to account for their potential confounding effect.

Design of experiments

The design of experiments (DOE, DOX, or experimental design) is the design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with experiments in which the design introduces conditions that directly affect the variation, but may also refer to the design of quasi-experiments, in which natural conditions that influence the variation are selected for observation.

In its simplest form, an experiment aims at predicting the outcome by introducing a change of the preconditions, which is represented by one or more independent variables, also referred to as "input variables" or "predictor variables." The change in one or more independent variables is generally hypothesized to result in a change in one or more dependent variables, also referred to as "output variables" or "response variables." The experimental design may also identify control variables that must be held constant to prevent external factors from affecting the results. Experimental design involves not only the selection of suitable independent, dependent, and control variables, but planning the delivery of the experiment under statistically optimal conditions given the constraints of available resources. There are multiple approaches for determining the set of design points (unique combinations of the settings of the independent variables) to be used in the experiment.

Main concerns in experimental design include the establishment of validity, reliability, and replicability. For example, these concerns can be partially addressed by carefully choosing the independent variable, reducing the risk of measurement error, and ensuring that the documentation of the method is sufficiently detailed. Related concerns include achieving appropriate levels of statistical power and sensitivity.

Correctly designed experiments advance knowledge in the natural and social sciences and engineering. Other applications include marketing and policy making.

Mediation (statistics)

In statistics, a mediation model is one that seeks to identify and explain the mechanism or process that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third hypothetical variable, known as a mediator variable (also a mediating variable, intermediary variable, or intervening variable). Rather than a direct causal relationship between the independent variable and the dependent variable, a mediation model proposes that the independent variable influences the (non-observable) mediator variable, which in turn influences the dependent variable. Thus, the mediator variable serves to clarify the nature of the relationship between the independent and dependent variables.Mediation analyses are employed to understand a known relationship by exploring the underlying mechanism or process by which one variable influences another variable through a mediator variable. Mediation analysis facilitates a better understanding of the relationship between the independent and dependent variables when the variables appear to not have a definite connection. They are studied by means of operational definitions and have no existence apart.

Scientific control

A scientific control is an experiment or observation designed to minimize the effects of variables other than the independent variable. This increases the reliability of the results, often through a comparison between control measurements and the other measurements. Scientific controls are a part of the scientific method.

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.