WBM STATS: Mediators, Moderators, Confounders, and Covariates in Research: What Sets Them Apart?

by W. B. Meitei, PhD

In empirical research, particularly in epidemiology, psychology, public health, and social sciences, it is essential to understand the roles of different types of variables in a statistical model. Terms like mediators, moderators, confounders, and covariates are often used, but they are frequently misunderstood or used interchangeably. This article aims to clarify their distinctions with examples and offer practical strategies for their identification and correct use in analysis.

Definitions

A mediator explains how or why an intervention affects an outcome. It lies in the causal pathway between exposure and outcome. For example, in a rehabilitation program designed to improve motor skills, increased motivation might be the mechanism through which improvement occurs. Thus, motivation is a mediator.

A moderator affects the strength or direction of the relationship between an intervention and its outcome. Unlike a mediator, it does not lie on the causal pathway; it is a condition under which the effect differs. For example, if a health treatment is more effective in younger patients than in older ones, age acts as a moderator of the treatment effect.

A confounder is a variable that is associated with both the exposure and the outcome but is not part of the causal pathway. Failure to control for confounders can lead to biased results. For example, pre-existing physical activity may influence both participation in a fitness intervention and health outcomes. If not adjusted for, it confounds the relationship.

A covariate is a variable included in a model to control for variability or improve the model’s precision. Not all covariates are confounders. For example, gender or education level may be included in a model to reduce unexplained variance, even if they don’t confound the relationship being studied.

How to identify them

1. Start with a conceptual model

Outline the theoretical pathways between the exposure and outcome variables.
Include all plausible variables based on literature, clinical knowledge, and contextual nuances (household dynamics, regional disparities, etc.).

2. Classify variables based on their roles

Variable Type	Identification Clue	Example
Mediator	Lies in the pathway between exposure and outcome	Behaviour change resulting from a health campaign that leads to better diabetes management
Moderator	Alters the strength or direction of the effect	Household income level influencing how effective dietary interventions are
Confounder	Associated with both exposure and outcome but not part of the causal chain	Age or genetic predisposition that affects both intervention uptake and disease prevalence
Covariate	Statistically controlled to improve precision or account for variability	Gender or education level included in the model to reduce residual variance

Analytical Strategies

Understanding the roles of mediators, confounders, and moderators requires both conceptual clarity and appropriate statistical techniques. Below is an outline of the key analytical strategies:

a. Common mediation analysis

Baron & Kenny Approach: It is a classic and widely cited method for testing mediation in statistical analysis. Introduced by Reuben M. Baron and David A. Kenny in 1986, it provides a step-by-step regression-based framework to determine whether the relationship between an independent variable and a dependent variable is transmitted through a third variable, called a mediator. See Baron & Kenny (1986) to know more.
Sobel Test: It is a statistical method used to assess whether a mediator variable significantly carries the influence of an independent variable to a dependent variable. It provides a formal test of the indirect effect. In mediation analysis, we examine Path A: the effect of the independent variable on the mediator, and Path B: the effect of the mediator on the dependent variable, controlling for the independent variable. The Sobel test checks whether this indirect effect (i.e., A × B) is significantly different from zero.
Bootstrapping Methods: Bootstrapping is a non-parametric resampling technique used to assess the significance and confidence intervals of the indirect effect in a mediation model without relying on the assumption of normality. It is particularly useful because the sampling distribution of the indirect effect (A × B) is often skewed, especially in small samples. If the 95% confidence interval for the indirect effect does not include 0, the indirect effect is statistically significant at the 5% level.
Structural Equation Modelling (SEM): SEM is widely used for establishing the direct and indirect relationship between intercorrelated dependent and independent variables. It enables simultaneous estimation of complex pathways (direct, indirect, and total effects).
Causal Mediation Analysis (CMA): One can use the mediation package in R to perform CMA. In addition to the estimation of causal mediation effects, the package also allows researchers to conduct sensitivity analysis for certain parametric models.
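As an illustration of the product-of-coefficients idea behind the Sobel test and the bootstrap, here is a minimal sketch in Python with NumPy. It is not the R mediation package mentioned above and not a full causal mediation analysis. The data are simulated, and the helper names (ols, indirect_effect), the sample size, the effect sizes, and the number of resamples are all assumptions chosen for the example.

```python
import numpy as np

def ols(y, *predictors):
    """OLS coefficients and standard errors; column 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

def indirect_effect(x, m, y):
    """Return a (X -> M), b (M -> Y given X) and their standard errors."""
    beta_m, se_m = ols(m, x)      # Path A: M ~ X
    beta_y, se_y = ols(y, x, m)   # Path B: Y ~ X + M
    return beta_m[1], beta_y[2], se_m[1], se_y[2]

# Simulated data (hypothetical effect sizes, for illustration only)
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)             # true a = 0.5
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # true b = 0.4, direct effect = 0.2

a, b, se_a, se_b = indirect_effect(x, m, y)
ab = a * b

# Sobel test: first-order delta-method standard error of a * b
sobel_se = np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
sobel_z = ab / sobel_se

# Percentile bootstrap: resample rows with replacement and recompute a * b
boot = np.empty(1000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)
    a_i, b_i, _, _ = indirect_effect(x[idx], m[idx], y[idx])
    boot[i] = a_i * b_i
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"indirect effect a*b = {ab:.3f}")
print(f"Sobel z = {sobel_z:.2f}")
print(f"95% bootstrap CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The Sobel z relies on a normal approximation for A × B, whereas the percentile interval is read directly from the resampled distribution, which is why the bootstrap is often preferred in small samples.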

b. Common moderator analysis

Interaction Terms in Regression: The test for moderation is done by adding interaction terms in regression. For example, in a linear regression equation, Y = β₀ + β₁X + β₂Z + β₃(X × Z) + ε, having a significant β₃ suggests moderation.
Stratified Analysis: It is a straightforward and intuitive method used to explore moderation by examining whether the relationship between an independent variable and a dependent variable varies across levels (strata) of the moderator. For example, perform the regression analysis in subgroups of the moderator (e.g., males vs. females) and examine the changes in the regression estimates.
Hierarchical Linear Modelling (HLM): It is a powerful statistical approach used when data are nested (e.g., schools, communities). It is particularly well-suited for moderator analysis when the moderator variable exists at a different level than the predictor or outcome. See Davison et al. (2002) to know more.
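The interaction-term and stratified approaches can be sketched side by side. The example below is a rough illustration on simulated data, assuming a binary moderator Z; the variable names, coefficients, and sample size are hypothetical. It fits Y = β₀ + β₁X + β₂Z + β₃(X × Z) + ε by ordinary least squares and then fits a separate slope within each stratum of Z.

```python
import numpy as np

# Simulated data (hypothetical values): the slope of X is 0.5 when Z = 0
# and 0.5 + 0.8 = 1.3 when Z = 1, so the true interaction is 0.8.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                          # exposure
z = rng.integers(0, 2, size=n).astype(float)    # binary moderator
y = 1.0 + 0.5 * x + 0.3 * z + 0.8 * x * z + rng.normal(size=n)

# Interaction model: columns are intercept, X, Z, X*Z
X = np.column_stack([np.ones(n), x, z, x * z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_interaction = beta[3] / se[3]   # t statistic for the interaction coefficient

# Stratified analysis: slope of Y on X within each level of Z
slope_z0 = np.polyfit(x[z == 0], y[z == 0], 1)[0]
slope_z1 = np.polyfit(x[z == 1], y[z == 1], 1)[0]

print(f"interaction coefficient = {beta[3]:.3f} (t = {t_interaction:.2f})")
print(f"slope when Z = 0: {slope_z0:.3f}, slope when Z = 1: {slope_z1:.3f}")
```

With a binary moderator and a full interaction model, the difference between the two stratum-specific slopes equals the estimated interaction coefficient, so the two approaches describe the same pattern. The interaction model additionally gives a single test of that difference.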

c. Common confounder analysis

Change-in-Estimate Criterion: It is a practical and commonly used method to determine whether a variable is a confounder in regression-based analyses. It assesses whether adjusting for a variable substantially changes the estimated effect of the exposure on the outcome. The method involves including a variable in the model and examining if the exposure-outcome estimate changes. A large change (e.g., exceeding a predefined threshold like 10%) suggests the variable is a confounder.
Multivariable Regression Adjustment: It helps in isolating the independent effects of exposure by adding potential confounders to the model.
Directed Acyclic Graphs (DAGs): A DAG is a visual tool used in epidemiology, statistics, and causal inference to represent assumptions about the causal relationships between variables. It can be used to conceptualise the confounding structure and select proper adjustment sets. See IBM (2025) to know more.
Other common analytical methods include matching, stratification, weighting (inverse propensity weighting), and randomisation.
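The change-in-estimate criterion can be illustrated with a short simulation. This is a sketch under stated assumptions: the data are simulated with a known confounder, the helper exposure_coef is hypothetical, and the percentage change is computed relative to the adjusted estimate, which is one convention among several.

```python
import numpy as np

def exposure_coef(y, *predictors):
    """OLS coefficient of the first predictor (the exposure); intercept added."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Simulated data (hypothetical values): C affects both exposure and outcome
rng = np.random.default_rng(1)
n = 2000
c = rng.normal(size=n)                        # confounder
x = 0.8 * c + rng.normal(size=n)              # exposure depends on C
y = 0.5 * x + 0.7 * c + rng.normal(size=n)    # true exposure effect = 0.5

crude = exposure_coef(y, x)         # Y ~ X
adjusted = exposure_coef(y, x, c)   # Y ~ X + C

# Relative change in the exposure estimate after adjustment
pct_change = 100 * abs(crude - adjusted) / abs(adjusted)

print(f"crude = {crude:.3f}, adjusted = {adjusted:.3f}")
print(f"change in estimate = {pct_change:.1f}%")
print("flag as potential confounder" if pct_change > 10 else "little change")
```

A large change only flags a candidate. Adjusting for a mediator also changes the estimate, so the criterion should be read alongside the conceptual model or DAG rather than on its own.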

d. Common covariate adjustment

To increase statistical precision by reducing unexplained variance:

Include variables that are known to affect the outcome but are not confounders or mediators.
Include variables to control for heterogeneity, not necessarily causality.
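To make the precision argument concrete, the sketch below simulates a randomised treatment together with a covariate that strongly predicts the outcome but is unrelated to treatment, so it is not a confounder. All names and values are assumptions made for the illustration. It compares the standard error of the treatment coefficient with and without the covariate.

```python
import numpy as np

def ols(y, *predictors):
    """OLS coefficients and standard errors; column 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

# Simulated data (hypothetical values)
rng = np.random.default_rng(2)
n = 400
t = rng.integers(0, 2, size=n).astype(float)   # randomised treatment
g = rng.normal(size=n)                         # covariate, independent of t
y = 1.0 * t + 2.0 * g + rng.normal(size=n)     # true treatment effect = 1.0

beta_unadj, se_unadj = ols(y, t)       # Y ~ T
beta_adj, se_adj = ols(y, t, g)        # Y ~ T + G

print(f"unadjusted: effect = {beta_unadj[1]:.3f}, SE = {se_unadj[1]:.3f}")
print(f"adjusted:   effect = {beta_adj[1]:.3f}, SE = {se_adj[1]:.3f}")
```

Because the covariate is independent of the treatment, both models target the same effect. The adjusted model simply leaves less residual variance, which shows up as a smaller standard error.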

Common mistakes when identifying these variables

a. Confusing moderators with mediators

Mistake: Assuming any variable that influences the outcome must be a mediator.
Why It’s Wrong: Mediators transmit the effect; moderators alter the strength or direction. For example, saying household income explains why a nutrition program works (mediator) when it actually makes the program more effective in some income groups (moderator).

b. Mislabelling covariates as confounders

Mistake: Treating all adjusted variables as confounders.
Why It’s Wrong: Not all covariates distort causal inference; some are simply included to improve precision. For example, controlling for education to tighten model estimates doesn’t mean education confounds the intervention-outcome pathway.

c. Overadjustment bias

Mistake: Including a mediator as a covariate during estimation of total effects.
Why It’s Wrong: This blocks part of the causal pathway, underestimating the total effect. For example, if motivation is a true mediator, adjusting for it when estimating the impact of a rehab program removes the part of the effect that runs through motivation and skews the findings.
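A small simulation can show the size of this bias. The sketch below loosely follows the rehab example, with a programme indicator, a motivation score as the mediator, and a motor-skill outcome. The data, the coefficients, and the helper first_coef are hypothetical and chosen only so that the total effect is known in advance.

```python
import numpy as np

def first_coef(y, *predictors):
    """OLS coefficient of the first predictor; an intercept is added."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Simulated data (hypothetical values)
rng = np.random.default_rng(3)
n = 2000
program = rng.integers(0, 2, size=n).astype(float)   # randomised rehab programme
motivation = 1.0 * program + rng.normal(size=n)      # mediator
skill = 0.3 * program + 0.6 * motivation + rng.normal(size=n)
# True direct effect = 0.3; true total effect = 0.3 + 1.0 * 0.6 = 0.9

total_effect = first_coef(skill, program)                  # skill ~ program
overadjusted = first_coef(skill, program, motivation)      # skill ~ program + motivation

print(f"total effect (mediator not adjusted) = {total_effect:.3f}")
print(f"estimate after adjusting for mediator = {overadjusted:.3f}")
```

The second estimate recovers only the direct effect. Reporting it as the impact of the programme would understate the total effect by the portion transmitted through motivation.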

d. Ignoring temporal ordering

Mistake: Identifying variables without considering time sequence.
Why It’s Wrong: Confounders must precede exposure; mediators occur post-exposure. For example, blood pressure measured after the intervention should not be treated as a confounder, because it might actually be a mediator or the outcome itself.

e. Lack of theoretical framework

Mistake: Relying solely on statistical associations.
Why It’s Wrong: Without a causal or conceptual model, variable misclassification is easy.

Best practices that can be adopted to avoid these errors

a. Conceptual clarity

Build a theoretical framework or DAG before any analysis

It helps distinguish causal pathways, spurious relationships, and variables needing adjustment.
Use DAGs to clarify: What comes before the exposure? What is the mechanism? What simply correlates?

Clearly define each variable’s hypothesised role before modelling

Don't let statistical output alone drive classification; context matters.

b. Temporal sequencing

Ensure correct variable timing

Confounders precede the exposure. For a variable to qualify as a confounder, it must occur before the intervention in a study. This timing is crucial because it ensures that the variable is not itself a consequence of the intervention. A variable measured after the exposure may have been affected by it, in which case it is a mediator or part of the outcome rather than a confounder, and adjusting for it can bias the estimated effect. This can lead researchers to draw incorrect conclusions about the causal effects of the intervention on the outcome.
Mediators occur after the exposure.
Avoid measuring “mediators” before the intervention; they aren’t mediators in that case.

c. Statistical discipline

Run sensitivity analyses

See how the inclusion/exclusion of variables shifts your effect estimates.
If effect estimates swing dramatically after adjusting for a variable, flag it as a potential confounder.

Separate models for direct and mediated effects

For SEM: estimate total, direct, and indirect effects carefully.
Avoid controlling for mediators when estimating the total effect.

Test interaction terms thoughtfully

Only include moderators with theoretical justification, not just because an interaction “looks” significant.

d. Smart covariate adjustment

Covariates ≠ confounders by default

Include them to reduce variance or improve precision, but don’t overinterpret their roles.
Avoid "kitchen-sink modelling"; every variable included should have a reason.

Document every assumption

Include notes in your reports/presentations explaining your rationale for each variable’s classification.

e. Clearly specify the hypothesis

Depending on the theory or hypothesis and models, a variable can be a mediator or a moderator. Below are a few examples.

Example 1: stress, social support, and depression

Suppose we want to study the effect of stress on depression, with social support as an additional variable.

Social support as mediator

Process: Stress affects social support, which in turn affects depression.
Hypothesis: Higher stress → reduced social support (people withdraw or perceive less support) → increased depression.

In this case, social support mediates the effect of stress on depression, i.e., it transmits the effect.

Social support as moderator

Process: Social support changes the impact of stress on depression.
Hypothesis: The effect of stress on depression is stronger among people with low social support than among those with high social support.

In this case, social support moderates the stress-depression relationship, i.e., it changes its strength.

Example 2: education, income, and gender

Suppose education predicts income. Gender can then be:

A mediator, if we hypothesise that education changes gender roles (in certain social contexts), which then influence income.
A moderator, if we hypothesise that the impact of education on income differs for men and women.

Suggested Readings:

Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of personality and social psychology, 51(6), 1173. https://www.sesp.org/files/The%20Moderator-Baron.pdf
Davison, M. L., Kwak, N., Seo, Y. S., & Choi, J. (2002). Using hierarchical linear models to examine moderator effects: Person-by-organization interactions. Organizational Research Methods, 5(3), 231-254. https://doi.org/10.1177/1094428102005003003
Field-Fote, E. (2019). Mediators and moderators, confounders and covariates: Exploring the variables that illuminate or obscure the “active ingredients” in neurorehabilitation. Journal of Neurologic Physical Therapy, 43(2), 83-84. https://pubmed.ncbi.nlm.nih.gov/30883494/
IBM (2025). What is a directed acyclic graph (DAG)? https://www.ibm.com/think/topics/directed-acyclic-graph
Imai, K., Keele, L., & Tingley, D. (2010). A general approach to causal mediation analysis. Psychological methods, 15(4), 309. https://cran.r-project.org/web/packages/mediation/vignettes/mediation-old.pdf
Kenny, D. A. (2025). Mediation. https://davidakenny.net/cm/mediate.htm
Kenny, D. A. (2018). Moderation. https://davidakenny.net/cm/moderation.htm

Suggested Citation: Meitei, W. B. (2025). Mediators, Moderators, Confounders, and Covariates in Research: What Sets Them Apart? WBM STATS.

Sunday, 27 July 2025
