DAG Confounder Picker
A causal DAG shows which variables influence which others, so you can spot confounders, mediators, and colliders without guesswork. Sketch your graph, mark the exposure and outcome, and get the minimum set of variables you need to adjust for to estimate a clean causal effect, plus the dagitty R code.
A 4-minute primer on causal DAGs
What a DAG is. A directed acyclic graph is a picture of your causal beliefs. Nodes are variables. Arrows mean “this causes that.” No arrow means no direct effect. The graph is acyclic: nothing causes itself, even by a long route. The DAG is the assumption set; the math just reads off its consequences.
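The "nothing causes itself" rule can be checked mechanically. The sketch below makes two assumptions: the graph is stored as a list of (cause, effect) pairs, and the function name `is_acyclic` is hypothetical, not part of this tool. It uses Kahn's algorithm, which repeatedly removes nodes that have no incoming arrows. If every node gets removed, the graph is a DAG.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the graph given as (cause, effect) pairs has no directed cycle.

    Kahn's algorithm: peel off nodes with no incoming arrows. Nodes that are
    never peeled off sit on a cycle or downstream of one.
    """
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)

confounder_dag = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
feedback_loop = [("X", "Y"), ("Y", "W"), ("W", "X")]  # X causes itself via Y and W
print(is_acyclic(confounder_dag))  # True
print(is_acyclic(feedback_loop))   # False
```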
Confounder vs collider. A confounder Z is a common cause of both X and Y (Z → X and Z → Y). It opens a back-door path X ← Z → Y that biases the X–Y association; conditioning on Z blocks it. A collider M is a common effect (X → M and Y → M). The path X → M ← Y is blocked by default; conditioning on M, or on any descendant of M, opens it and induces a spurious association.
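A quick simulation makes the collider claim concrete. This is a sketch that assumes linear-Gaussian data. The coefficients, the threshold M > 1 and the seed are arbitrary choices for illustration. X and Y are generated independently. Among units selected on their common effect M, they nonetheless become negatively correlated.

```python
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
n = 20_000
# X and Y are independent; M is their common effect (X -> M <- Y).
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
m = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

overall = correlation(x, y)
# "Condition" on the collider by stratifying: keep only units with high M.
kept = [(xi, yi) for xi, yi, mi in zip(x, y, m) if mi > 1]
within_stratum = correlation([k[0] for k in kept], [k[1] for k in kept])

print(f"corr(X, Y) overall:     {overall:+.2f}")
print(f"corr(X, Y) given M > 1: {within_stratum:+.2f}")
```

The overall correlation should sit near zero. The within-stratum correlation should come out clearly negative, around -0.3 with these settings. Nothing about X or Y changed; only the selection on M did.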
The back-door criterion. Pearl's rule: a set S is a sufficient adjustment set for the causal effect of X on Y if (1) no node in S is a descendant of X, and (2) S blocks every back-door path from X to Y. A back-door path is any path between X and Y that starts with an arrow into X. Block them all, and the X–Y association within strata of S, averaged over the distribution of S, equals the causal effect.
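In symbols, this is Pearl's back-door adjustment formula. Here do(X = x) denotes setting X to x by intervention rather than observing it. The formula is written for discrete S; for continuous S, replace the sum with an integral.

```latex
P\bigl(Y = y \mid do(X = x)\bigr) \;=\; \sum_{s} P\bigl(Y = y \mid X = x,\, S = s\bigr)\, P(S = s)
```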
Picking the set. The minimum sufficient set is the smallest collection of variables that meets the criterion. Often it is just the obvious confounders. Sometimes the answer looks counter-intuitive. Under M-bias, the empty set is sufficient, and adjusting for a tempting pre-treatment collider opens a bias path. Always pick the set before seeing the regression results; otherwise it's data dredging dressed up as causal inference.
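The M-bias case can be checked by simulation too. This sketch assumes linear-Gaussian data, with hypothetical unmeasured causes U1 and U2 and a true effect fixed at 1.0. Leaving M out recovers the effect. Putting M into the regression opens X ← U1 → M ← U2 → Y and shifts the estimate.

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs, with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def adjusted_slope(xs, ys, zs):
    """Coefficient on xs in ys ~ xs + zs, via Frisch-Waugh-Lovell:
    regress ys on the part of xs that zs cannot explain."""
    b = slope(zs, xs)
    n = len(zs)
    mz, mx = sum(zs) / n, sum(xs) / n
    x_resid = [xi - mx - b * (zi - mz) for xi, zi in zip(xs, zs)]
    return slope(x_resid, ys)

rng = random.Random(1)
n = 20_000
TRUE_EFFECT = 1.0  # assumed causal effect of X on Y in this toy model

# M-bias DAG: U1 -> X, U1 -> M, U2 -> M, U2 -> Y, X -> Y (U1, U2 unmeasured).
u1 = [rng.gauss(0, 1) for _ in range(n)]
u2 = [rng.gauss(0, 1) for _ in range(n)]
x = [a + rng.gauss(0, 1) for a in u1]
m = [a + b + rng.gauss(0, 1) for a, b in zip(u1, u2)]
y = [TRUE_EFFECT * xi + b + rng.gauss(0, 1) for xi, b in zip(x, u2)]

unadjusted = slope(x, y)                   # adjustment set {}: sufficient here
adjusted_for_m = adjusted_slope(x, y, m)   # adjustment set {M}: opens the M-path
print(f"Y ~ X     : {unadjusted:.2f}")
print(f"Y ~ X + M : {adjusted_for_m:.2f}")
```

Under these assumed coefficients, the population values work out to 1.0 for Y ~ X and 0.8 for Y ~ X + M. The simulated estimates should land close to those.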
Default example (classic confounding): Z causes both X and Y, so the unadjusted X–Y association is biased; conditioning on Z recovers the causal effect of X on Y.
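Here is the default example as a simulation. It is a sketch under assumed linear-Gaussian data: the effect sizes (1.0 for X on Y, 2.0 for Z on Y) and the seed are illustrative. The unadjusted slope absorbs the Z → Y path. Adding Z to the model brings the estimate back to the assumed effect.

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs, with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def adjusted_slope(xs, ys, zs):
    """Coefficient on xs in ys ~ xs + zs, via Frisch-Waugh-Lovell:
    regress ys on the part of xs that zs cannot explain."""
    b = slope(zs, xs)
    n = len(zs)
    mz, mx = sum(zs) / n, sum(xs) / n
    x_resid = [xi - mx - b * (zi - mz) for xi, zi in zip(xs, zs)]
    return slope(x_resid, ys)

rng = random.Random(7)
n = 20_000
TRUE_EFFECT = 1.0  # assumed causal effect of X on Y in this toy model

# Confounder DAG: Z -> X, Z -> Y, X -> Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [TRUE_EFFECT * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

naive = slope(x, y)                 # back-door path X <- Z -> Y left open
adjusted = adjusted_slope(x, y, z)  # back-door path blocked by conditioning on Z
print(f"Y ~ X     : {naive:.2f}")
print(f"Y ~ X + Z : {adjusted:.2f}")
```

With these assumed coefficients, the population values are 2.0 for the unadjusted slope and 1.0 once Z is in the model. The naive estimate therefore roughly doubles the true effect.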
Caveats: when this is the wrong tool
- Time-varying treatments, with confounders affected by prior treatment. Use instead: g-methods (g-computation, IPW with marginal structural models, g-estimation). Static back-door adjustment is biased when a confounder of later treatment is itself affected by earlier treatment.
- Unmeasured confounding (a confounder you cannot measure). Use instead: the front-door criterion if the structure allows, instrumental variables, negative controls, or sensitivity analysis (E-value). Back-door adjustment alone fails when a variable it needs is unobserved.
- A graph with a cycle (mutual causation). Use instead: structural equation models or feedback-loop methods. The back-door criterion assumes acyclicity; this tool refuses to score cyclic graphs.
- Mediation analysis (you want the indirect effect of X via M). Use instead: a mediation model (Baron & Kenny, or counterfactual mediation per VanderWeele). Adjusting for the mediator in a single regression removes the indirect path, so the X coefficient is no longer the total effect. It is also a biased direct effect if mediator–outcome confounders are left out.
- You're picking the adjustment set after seeing the regression coefficients. Instead: stop. Pick the DAG and the set before running the model. The other order is causal p-hacking, and reviewers will eat you alive.
See also
- Confounding variables, plain-English: the intuition before the algebra.
- Causal inference in R: methods beyond regression.
- lm() output interpreter: once you have the right adjustment set, fit and read the model.
- Confidence interval calculator: for the precision of the adjusted effect.
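Algorithm: enumerates every path between X and Y in the graph's skeleton (arrow directions ignored), records the orientation of each edge on the path, and then tries every candidate subset of V \ ({X, Y} ∪ desc(X)) in increasing size. For each subset, it applies Pearl's d-separation rules to check that the subset blocks all back-door paths. Results were cross-checked against dagitty::adjustmentSets() on the canonical confounder, collider, mediator, IV, and M-bias DAGs.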
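The search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's actual source. The graph is a list of (cause, effect) pairs, and the function name `minimum_adjustment_sets` is hypothetical. Blocking is checked path by path with the standard d-separation rules:
- A non-collider blocks the path when it is in S.
- A collider blocks the path unless it, or one of its descendants, is in S.

Brute force is exponential in the number of candidates. That is fine for hand-drawn DAGs but not for large graphs.

```python
from itertools import combinations

def minimum_adjustment_sets(edges, x, y):
    """Brute-force search for the smallest back-door adjustment sets.

    edges: list of (cause, effect) pairs describing an acyclic graph.
    Returns every sufficient set of minimum size, or [] if none exists.
    """
    nodes = {n for edge in edges for n in edge}
    arrows = set(edges)
    children = {n: set() for n in nodes}
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        children[a].add(b)
        neighbours[a].add(b)
        neighbours[b].add(a)

    def descendants(v):
        """All nodes reachable from v along directed edges (excluding v)."""
        seen, stack = set(), [v]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def paths(current, path):
        """All simple paths from `current` to y in the skeleton."""
        if current == y:
            yield path
            return
        for nb in neighbours[current]:
            if nb not in path:
                yield from paths(nb, path + [nb])

    def blocked(path, s):
        """d-separation check of a single path given conditioning set s."""
        for a, v, b in zip(path, path[1:], path[2:]):
            if (a, v) in arrows and (b, v) in arrows:   # a -> v <- b: collider
                if v not in s and not (descendants(v) & s):
                    return True
            elif v in s:                                # chain or fork
                return True
        return False

    # Back-door paths start with an arrow into X.
    backdoor = [p for p in paths(x, [x]) if (p[1], x) in arrows]
    candidates = sorted(nodes - {x, y} - descendants(x))
    for size in range(len(candidates) + 1):
        found = [frozenset(c) for c in combinations(candidates, size)
                 if all(blocked(p, set(c)) for p in backdoor)]
        if found:
            return found
    return []

confounder = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
m_bias = [("U1", "X"), ("U1", "M"), ("U2", "M"), ("U2", "Y"), ("X", "Y")]
print(minimum_adjustment_sets(confounder, "X", "Y"))  # [frozenset({'Z'})]
print(minimum_adjustment_sets(m_bias, "X", "Y"))      # [frozenset()]
```

The function returns every set of the minimum size, not just one. For a chain such as Z1 → Z2 → X with Z1 → Y, both {Z1} and {Z2} come back as alternatives.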