r‑statistics.co

DAG Confounder Picker

A causal DAG shows which variables influence which others, so you can spot confounders, mediators, and colliders without guesswork. Sketch your graph, mark the exposure and outcome, and get the minimum set of variables you need to adjust for to estimate a clean causal effect, plus the dagitty R code.

New to causal DAGs? Read the 4-min primer

What a DAG is. A directed acyclic graph is a picture of your causal beliefs. Nodes are variables. Arrows mean “this causes that.” No arrow means no direct effect. The graph is acyclic: nothing causes itself, even by a long route. The DAG is the assumption set; the math just reads off its consequences.
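As a rough illustration of the "acyclic" requirement, a DAG can be stored as a list of (cause, effect) pairs and checked with Kahn's algorithm. The function name `is_acyclic` and the two example graphs are assumptions made for this sketch; this is not the code the picker runs.

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    children = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        nodes.update((cause, effect))
        if effect not in children[cause]:
            children[cause].add(effect)
            indegree[effect] += 1
    # Kahn's algorithm: repeatedly remove nodes that have no remaining parents.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)

confounder_dag = [("Z", "X"), ("Z", "Y"), ("X", "Y")]   # Z -> X, Z -> Y, X -> Y
feedback_loop = [("X", "Y"), ("Y", "M"), ("M", "X")]    # X -> Y -> M -> X
print(is_acyclic(confounder_dag))  # True
print(is_acyclic(feedback_loop))   # False
```

A graph that fails this check is not a DAG, and the back-door criterion does not apply to it.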

Confounder vs collider. A confounder Z is a common cause of both X and Y (Z → X and Z → Y). It opens a back-door bias path X ← Z → Y; you must condition on Z to remove it. A collider M is a common effect (X → M and Y → M). The path X → M ← Y is naturally blocked; conditioning on M opens it and induces a spurious association.
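The distinction comes down to which way the two arrows at the middle node point. A minimal sketch, assuming the graph is a list of (cause, effect) pairs; `classify_middle` is a hypothetical helper written for this illustration:

```python
def classify_middle(edges, a, m, b):
    """Classify node m on the path a - m - b as 'collider', 'fork', or 'chain'."""
    edge_set = set(edges)

    def arrow_into_m(other):
        # True if the edge between `other` and m points into m.
        if (other, m) in edge_set:
            return True
        if (m, other) in edge_set:
            return False
        raise ValueError(f"no edge between {other} and {m}")

    into_from_a = arrow_into_m(a)
    into_from_b = arrow_into_m(b)
    if into_from_a and into_from_b:
        return "collider"  # a -> m <- b: blocked until you condition on m
    if not into_from_a and not into_from_b:
        return "fork"      # a <- m -> b: open until you condition on m
    return "chain"         # a -> m -> b or a <- m <- b: open until you condition on m

print(classify_middle([("Z", "X"), ("Z", "Y")], "X", "Z", "Y"))  # fork
print(classify_middle([("X", "M"), ("Y", "M")], "X", "M", "Y"))  # collider
print(classify_middle([("X", "M"), ("M", "Y")], "X", "M", "Y"))  # chain
```

A fork between X and Y is a confounder, a chain from X to Y is a mediator, and a collider is the case where adjustment hurts.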

The back-door criterion. Pearl's rule: a set S is a sufficient adjustment set for the causal effect of X on Y if (1) no node in S is a descendant of X, and (2) S blocks every back-door path from X to Y. A back-door path is any path from X to Y that starts with an arrow into X. Block them all and the X–Y association that remains after conditioning on S equals the causal effect.

Picking the set. The minimum sufficient set is the smallest collection of variables that meets the criterion. Often it is just the obvious confounders. Sometimes the smallest set looks counter-intuitive (M-bias: adjust for nothing rather than for a tempting collider). Always pick the set before seeing the regression results; otherwise it's data dredging dressed up as causal inference.

Pearl's back-door criterion · Confounders · mediators · colliders · Runs in your browser

Load a real-world example to try.

🧮 Classic confounder

Z causes both X and Y. The observed X-Y association is biased; condition on Z to recover the causal effect.
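To see the size of the bias, here is a small exact calculation for binary Z, X, and Y. All probabilities are made-up illustrative numbers, not output from the tool: Z raises both the chance of exposure and the chance of the outcome, and the true effect of X on Y is a risk difference of 0.2.

```python
# Hypothetical data-generating model for the DAG Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}           # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}  # P(X = 1 | Z = z)

def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z); the coefficient on x is the true effect (0.2)."""
    return 0.1 + 0.2 * x + 0.5 * z

def p_y1_given_x(x):
    """Observational P(Y = 1 | X = x): averages over P(Z = z | X = x)."""
    weights = {
        z: p_z[z] * (p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z])
        for z in p_z
    }
    total = sum(weights.values())
    return sum(p_y1(x, z) * w / total for z, w in weights.items())

def p_y1_do_x(x):
    """Back-door adjustment: averages over the marginal P(Z = z) instead."""
    return sum(p_y1(x, z) * p_z[z] for z in p_z)

crude = p_y1_given_x(1) - p_y1_given_x(0)
adjusted = p_y1_do_x(1) - p_y1_do_x(0)
print(round(crude, 3), round(adjusted, 3))  # 0.5 0.2
```

The crude contrast (0.5) mixes the effect of X with the effect of Z; weighting the Z-specific contrasts by the marginal distribution of Z recovers the 0.2 that was built in.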


Anatomy of the back-door criterion
Back-door path: any path P from X to Y where the first edge points INTO X (X ← ··· Y)
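Back-door paths. Any path between X and Y whose first arrow points into X is a back-door route for non-causal association. The classic case is X ← Z → Y. A path that leaves X along an outgoing arrow is either causal (every arrow points toward Y) or passes through a collider that is a descendant of X, which stays blocked as long as condition (1) keeps descendants of X out of S. So every open non-causal route must start by going against the arrow at X, and enumerating the back-door paths is exhaustive.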
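The enumeration can be sketched as a depth-first walk over the graph with edge directions ignored, keeping only paths whose first step goes against an arrow into X. The helper names `backdoor_paths` and `render` and the example graph are assumptions for this illustration.

```python
def backdoor_paths(edges, x, y):
    """List every simple path from x to y whose first edge points into x."""
    edge_set = set(edges)
    neighbours = {}
    for a, b in edge_set:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    found = []

    def walk(path):
        last = path[-1]
        if last == y:
            found.append(path)
            return
        for nxt in sorted(neighbours.get(last, ())):
            if nxt in path:
                continue  # simple paths only: never revisit a node
            if len(path) == 1 and (nxt, x) not in edge_set:
                continue  # the first edge must point INTO x
            walk(path + [nxt])

    walk([x])
    return found

def render(edges, path):
    """Draw a path with its arrow directions, e.g. 'X <- Z -> Y'."""
    edge_set = set(edges)
    out = path[0]
    for a, b in zip(path, path[1:]):
        out += (" -> " if (a, b) in edge_set else " <- ") + b
    return out

# Hypothetical graph: Z and W confound, M mediates.
dag = [("Z", "X"), ("Z", "Y"), ("W", "Z"), ("W", "Y"), ("X", "M"), ("M", "Y")]
for p in backdoor_paths(dag, "X", "Y"):
    print(render(dag, p))
# X <- Z <- W -> Y
# X <- Z -> Y
```

The causal path X -> M -> Y is not listed because its first edge leaves X.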
Path blocking (d-separation): S blocks path P if either (a) some non-collider on P is in S, OR (b) some collider on P has neither itself nor any descendant in S
Blocking a path. A non-collider on the path (a chain link or a fork) lets information flow until you condition on it. A collider (two arrows meeting head-to-head) does the opposite: it blocks the path naturally, and conditioning on it (or on any of its descendants) opens it. Conditioning on a collider is the mechanism behind selection bias.
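The blocking rule for a single path can be written directly from the two cases. This is a sketch under the assumption that a path is a list of node names and the graph is a list of (cause, effect) pairs; `descendants` and `is_blocked` are illustrative names, not the tool's own functions.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_blocked(edges, path, s):
    """True if conditioning on the set s blocks the given path."""
    edge_set, s = set(edges), set(s)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks unless it, or one of its descendants, is in s.
            if not ({mid} | descendants(edge_set, mid)) & s:
                return True
        elif mid in s:
            # A chain link or fork blocks once it is conditioned on.
            return True
    return False

# M-bias graph with an extra child D of the collider M (hypothetical example).
m_bias = [("U1", "X"), ("U1", "M"), ("U2", "M"), ("U2", "Y"), ("X", "Y"), ("M", "D")]
path = ["X", "U1", "M", "U2", "Y"]
print(is_blocked(m_bias, path, set()))         # True: the collider M blocks it
print(is_blocked(m_bias, path, {"M"}))         # False: conditioning on M opens it
print(is_blocked(m_bias, path, {"D"}))         # False: so does a descendant of M
print(is_blocked(m_bias, path, {"M", "U1"}))   # True: U1 closes it again
```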
Valid adjustment set S: (1) no node in S is a descendant of X; (2) S blocks every back-door path from X to Y
The criterion. Pearl (1995). Condition (1) keeps mediators and post-treatment variables out (adjusting for them either blocks the very effect you want, or induces collider bias). Condition (2) closes every non-causal route. The association that remains after conditioning on S equals the causal effect of X on Y.
Minimum sufficient set: search subsets S of (V \ {X, Y, desc(X)}) in size order; return the first valid S (may not be unique)
Finding the minimum. Enumerate candidate sets from smallest (the empty set) to largest, skipping descendants of X. The first size at which any set satisfies the criterion is the minimum, and every valid set of that size is minimum sufficient. Sometimes several different sets tie; the picker reports them all so you can choose by measurement cost or domain plausibility.
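Put together, the search can be sketched as a brute-force loop over subsets in size order. This is an illustrative reimplementation, not the picker's source; every function name here is an assumption, and the approach is only practical for small graphs because the number of subsets grows exponentially.

```python
from itertools import combinations

def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(edges, x, y):
    """All simple paths from x to y, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    found, stack = [], [[x]]
    while stack:
        path = stack.pop()
        if path[-1] == y:
            found.append(path)
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return found

def is_blocked(edges, desc, path, s):
    """Path-level d-separation test given the conditioning set s."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:  # collider
            if not ({mid} | desc[mid]) & s:
                return True
        elif mid in s:                                    # chain link or fork
            return True
    return False

def minimum_adjustment_sets(edges, x, y):
    """Every smallest set that blocks all back-door paths from x to y."""
    edges = set(edges)
    nodes = {n for edge in edges for n in edge}
    desc = {n: descendants(edges, n) for n in nodes}
    backdoor = [p for p in simple_paths(edges, x, y) if (p[1], x) in edges]
    candidates = sorted(nodes - {x, y} - desc[x])  # condition (1)
    for size in range(len(candidates) + 1):
        valid = [list(s) for s in combinations(candidates, size)
                 if all(is_blocked(edges, desc, p, set(s)) for p in backdoor)]
        if valid:
            return valid  # all ties at the smallest workable size
    return []

confounder = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
m_bias = [("U1", "X"), ("U1", "M"), ("U2", "M"), ("U2", "Y"), ("X", "Y")]
two_ways = [("A", "X"), ("A", "B"), ("B", "Y"), ("X", "Y")]
print(minimum_adjustment_sets(confounder, "X", "Y"))  # [['Z']]
print(minimum_adjustment_sets(m_bias, "X", "Y"))      # [[]]
print(minimum_adjustment_sets(two_ways, "X", "Y"))    # [['A'], ['B']]
```

The M-bias graph returns the empty set (adjust for nothing), and the last graph shows a tie: adjusting for either A or B closes the single back-door path X ← A → B → Y.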
Caveats: when this is the wrong tool

If you have: Time-varying treatments and confounders affected by prior treatment
Use instead: G-methods (g-computation, IPW with marginal structural models, g-estimation). Static back-door adjustment is biased when confounders are post-treatment with respect to earlier treatments.

If you have: Unmeasured confounding (a confounder you cannot measure)
Use instead: Front-door criterion if the structure allows, instrumental variables, negative controls, or sensitivity analysis (E-value). Back-door alone fails when a needed variable is unobserved.

If you have: A DAG with a cycle (mutual causation)
Use instead: Structural equation models or feedback-loop methods. The back-door criterion assumes acyclicity; this tool refuses to score cyclic graphs.

If you have: Mediator analysis (you want the indirect effect of X via M)
Use instead: A mediation model (Baron & Kenny, or counterfactual mediation per VanderWeele). Adjusting for the mediator in a single regression estimates at best a direct effect rather than the total effect, and it can open a collider path if the mediator and outcome share an unmeasured cause.

If you have: An adjustment set you are picking after seeing the regression coefficients
Use instead: Stop. Pick the DAG and the set before running the model. The other order is causal p-hacking and reviewers will eat you alive.

Algorithm: enumerates all undirected paths X-Y, classifies each edge orientation, and applies Pearl's d-separation test for every candidate subset of V \ {X, Y, desc(X)} in size order. Cross-checked against dagitty::adjustmentSets() for confounder, collider, mediator, IV, and M-bias canonical DAGs.