What is the difference between a confounder and a collider?

A confounder causes both the exposure and the outcome; adjust for it to remove bias. A collider is caused by both the exposure and the outcome; adjusting for a collider INTRODUCES bias by opening a non-causal path. The picker uses d-separation to classify each variable, given your DAG.

How do I choose which variables to control for in a regression?

Draw the DAG, then find a minimal sufficient adjustment set: variables that block all backdoor paths from exposure to outcome without opening collider paths. The picker computes this automatically. Avoid the "throw everything in" approach: it can introduce M-bias by conditioning on colliders.

What is M-bias in causal inference?

M-bias arises when you condition on a variable that is a common effect of two other variables, opening a non-causal path between them. The DAG looks like an M. Adjusting for the wrong "confounder" (one that is actually a collider) can flip the sign of your estimated effect. The picker flags these explicitly.
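A small simulation can make the M-shape concrete. The sketch below is illustrative only: the variable names (u1, u2, x, y, m), the unit coefficients, and the sample size are assumptions chosen for the demo, and the data are simulated, not real. X has no effect on Y by construction, yet adjusting for the collider M produces a spurious negative association.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, m):
    """Correlation of x and y after linearly adjusting both for m."""
    r_xy, r_xm, r_ym = pearson(x, y), pearson(x, m), pearson(y, m)
    return (r_xy - r_xm * r_ym) / math.sqrt((1 - r_xm ** 2) * (1 - r_ym ** 2))


# Hypothetical M-structure: u1 -> x, u1 -> m, u2 -> m, u2 -> y, and NO x -> y edge.
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
u1 = [rng.gauss(0, 1) for _ in range(n)]
u2 = [rng.gauss(0, 1) for _ in range(n)]
x = [a + rng.gauss(0, 1) for a in u1]
y = [b + rng.gauss(0, 1) for b in u2]
m = [a + b + rng.gauss(0, 1) for a, b in zip(u1, u2)]

unadjusted = pearson(x, y)        # close to zero: the path is blocked at m
adjusted = partial_corr(x, y, m)  # clearly negative: conditioning on m opens it

print(f"unadjusted: {unadjusted:.3f}, adjusted for m: {adjusted:.3f}")
```

With these coefficients the adjusted correlation is about -0.2 in expectation, while the unadjusted one is zero, so the "extra control" is what creates the bias.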

DAG Confounder Picker

A causal DAG shows which variables influence which others, so you can spot confounders, mediators, and colliders without guesswork. Sketch your graph, mark the exposure and outcome, and get the minimum set of variables you need to adjust for to estimate a clean causal effect, plus the dagitty R code.

Pearl's back-door criterion · Confounders · mediators · colliders · Runs in your browser

Pick a real-world example to load.

🧮 Classic confounder

Z causes both X and Y. The observed X-Y association is biased; condition on Z to recover the causal effect.

Context

How DAG-based adjustment works

Use when

You have a causal question (does X affect Y?) and a set of variables you suspect matter. The DAG encodes your assumptions; the back-door criterion turns them into a regression model.

e.g., does smoking cause lung cancer? Age is a likely confounder; tar in the lungs is a mediator; hospital admission could be a collider.

Inputs needed

V: node names (variables)
E: directed edges (cause → effect)
X: exposure (treatment) variable
Y: outcome variable
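The four inputs above can be held in one small structure. This is a minimal sketch, not the picker's internal representation: the class name CausalQuery and its field names are hypothetical, and the edges encode only the smoking example from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalQuery:
    """Hypothetical container for the four inputs: V, E, X, Y."""
    nodes: frozenset      # V: variable names
    edges: frozenset      # E: (cause, effect) pairs
    exposure: str         # X
    outcome: str          # Y

    def __post_init__(self):
        # Every edge endpoint must be a declared node.
        for cause, effect in self.edges:
            if cause not in self.nodes or effect not in self.nodes:
                raise ValueError(f"edge {cause} -> {effect} uses an unknown node")
        if self.exposure not in self.nodes or self.outcome not in self.nodes:
            raise ValueError("exposure and outcome must both be nodes")
        if self.exposure == self.outcome:
            raise ValueError("exposure and outcome must differ")


# The smoking example: age confounds, tar mediates.
query = CausalQuery(
    nodes=frozenset({"smoking", "cancer", "age", "tar"}),
    edges=frozenset({
        ("age", "smoking"),
        ("age", "cancer"),
        ("smoking", "tar"),
        ("tar", "cancer"),
    }),
    exposure="smoking",
    outcome="cancer",
)
```

Validating at construction time means later graph routines can assume every edge refers to a known variable.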

What it returns

The minimum sufficient adjustment set, all valid sets, every back-door path, and a list of variables you must not condition on (mediators of the X→Y effect, descendants of X, and unwarranted colliders).
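Part of the "must not condition on" list can be derived from the graph alone. The sketch below covers only descendants of X, split into mediators and other descendants; it does not try to detect colliders that are not downstream of X. The function name must_not_adjust and the node names are hypothetical.

```python
def must_not_adjust(edges, x, y):
    """Split descendants of x (other than y) into mediators and the rest."""
    children, parents = {}, {}
    for cause, effect in edges:
        children.setdefault(cause, set()).add(effect)
        parents.setdefault(effect, set()).add(cause)

    def reach(start, step):
        # All nodes reachable from start by repeatedly following `step`.
        seen, stack = set(), [start]
        while stack:
            for nxt in step.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    descendants_of_x = reach(x, children) - {y}
    ancestors_of_y = reach(y, parents)
    mediators = descendants_of_x & ancestors_of_y  # on a directed x -> ... -> y path
    return {
        "mediators": mediators,
        "other_descendants": descendants_of_x - mediators,
    }


# Smoking example with hospital admission as a common effect.
edges = [
    ("age", "smoking"), ("age", "cancer"),
    ("smoking", "tar"), ("tar", "cancer"),
    ("smoking", "hospital"), ("cancer", "hospital"),
]
forbidden = must_not_adjust(edges, "smoking", "cancer")
print(forbidden)
```

Here tar comes back as a mediator and hospital as a non-mediating descendant (a collider between exposure and outcome), while age is left available for adjustment.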

Adjustment

Minimum sufficient adjustment set

{ }

How we got there

R code RUNNABLE


DAG INTERACTIVE

drag nodes to rearrange · X = blue, Y = orange, adjustment = filled

Inference

We applied d-separation rules to your causal diagram to find the minimal sufficient adjustment set.

Anatomy of the back-door criterion

Back-door path: any path P from X to Y where the first edge points INTO X (X ← ... Y)

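Back-door paths. Any path between X and Y whose first arrow points into X is a back-door route for non-causal association. The classic case is X ← Z → Y. Walk the graph ignoring edge direction and keep every path whose first edge points into X. A non-causal route that instead leaves X along an outgoing arrow must pass through a collider downstream of X, so it stays closed unless you condition on that collider or one of its descendants; this is why enumerating back-door paths is enough.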
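The enumeration described above can be sketched as a depth-first walk over the undirected skeleton. This is an illustrative sketch, not the picker's own code; the function name backdoor_paths and the example node names are hypothetical, and the input is assumed to be an acyclic edge list.

```python
def backdoor_paths(edges, x, y):
    """List every simple path from x to y whose first edge points into x."""
    edge_set = set(edges)
    neighbours = {}
    for cause, effect in edge_set:
        # Ignore direction while walking; direction is checked only at x.
        neighbours.setdefault(cause, set()).add(effect)
        neighbours.setdefault(effect, set()).add(cause)

    found = []

    def walk(path):
        last = path[-1]
        if last == y:
            found.append(path)
            return
        for nxt in sorted(neighbours.get(last, ())):
            if nxt in path:
                continue  # simple paths only: never revisit a node
            if len(path) == 1 and (nxt, x) not in edge_set:
                continue  # first step must go against an arrow into x
            walk(path + [nxt])

    walk([x])
    return found


# Classic confounder plus a mediator: Z -> X, Z -> Y, X -> M -> Y.
edges = [("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")]
paths = backdoor_paths(edges, "X", "Y")
print(paths)
```

The causal route X → M → Y is skipped because its first edge leaves X; only X ← Z → Y is reported.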

Path blocking (d-separation): S blocks path P if either (a) some non-collider on P is in S, OR (b) some collider on P has neither itself nor any descendant in S

Blocking a path. A non-collider on the path (a chain link or a fork) lets information flow until you condition on it. A collider (two arrows meeting head-to-head) does the opposite: it blocks the path naturally, and conditioning on it (or on any of its descendants) opens it. Conditioning on a collider is the classic mechanism behind selection bias.
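The blocking rule can be checked one interior node at a time. The sketch below is an assumption-laden illustration: is_blocked is a hypothetical name, the path is given as a list of node names, and the graph as (cause, effect) pairs.

```python
def is_blocked(path, edges, conditioned):
    """Return True if the conditioning set blocks this path (d-separation rule)."""
    edge_set = set(edges)
    children = {}
    for cause, effect in edge_set:
        children.setdefault(cause, set()).add(effect)

    def descendants(node):
        seen, stack = set(), [node]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    s = set(conditioned)
    # Slide a window over each interior node together with its two path neighbours.
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is in S.
            if mid not in s and not (descendants(mid) & s):
                return True
        elif mid in s:
            # A chain link or fork blocks once it is conditioned on.
            return True
    return False


fork = [("Z", "X"), ("Z", "Y")]                   # X <- Z -> Y
collider = [("X", "C"), ("Y", "C"), ("C", "D")]   # X -> C <- Y, with C -> D

print(is_blocked(["X", "Z", "Y"], fork, set()))       # fork, nothing conditioned
print(is_blocked(["X", "Z", "Y"], fork, {"Z"}))       # fork, conditioned on Z
print(is_blocked(["X", "C", "Y"], collider, set()))   # collider left alone
print(is_blocked(["X", "C", "Y"], collider, {"D"}))   # descendant of collider conditioned
```

The last call shows why descendants matter: conditioning on D, a child of the collider C, opens the path just as conditioning on C would.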

Valid adjustment set S:
1. no node in S is a descendant of X
2. S blocks every back-door path from X to Y

The criterion. Pearl (1995). Condition (1) keeps mediators and post-treatment variables out (adjusting for them either blocks the very effect you want, or induces collider bias). Condition (2) closes every non-causal route. If the DAG is correct, the remaining association equals the causal effect of X on Y.
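When S satisfies both conditions, the causal effect can be written in terms of observed quantities through the standard back-door adjustment formula. The notation below assumes a discrete S; for continuous covariates the sum becomes an integral.

```latex
P(Y = y \mid \mathrm{do}(X = x)) \;=\; \sum_{s} P(Y = y \mid X = x,\, S = s)\, P(S = s)
```

In a linear model this corresponds to reading the coefficient on X from a regression of Y on X and the variables in S.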

Minimum sufficient set: search subsets S of (V \ {X, Y, desc(X)}) in size order; return the first valid S (may not be unique)

Finding the minimum. Enumerate candidate sets from smallest (the empty set) to largest, skipping descendants of X. The first that satisfies the criterion is minimum sufficient. Sometimes several different sets tie for the smallest size; the picker reports them all so you can choose by measurement cost or domain plausibility.
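The size-ordered search can be sketched end to end as a brute-force routine. This is a minimal illustration under stated assumptions, not the picker's implementation: minimum_adjustment_sets is a hypothetical name, the graph is assumed acyclic, and subset enumeration is exponential, so it only suits small DAGs.

```python
from itertools import combinations


def minimum_adjustment_sets(nodes, edges, x, y):
    """Return every smallest set that blocks all back-door paths from x to y."""
    edge_set = set(edges)
    children, neighbours = {}, {}
    for a, b in edge_set:
        children.setdefault(a, set()).add(b)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def descendants(node):
        seen, stack = set(), [node]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def backdoor_paths():
        found = []

        def walk(path):
            if path[-1] == y:
                found.append(path)
                return
            for nxt in neighbours.get(path[-1], ()):
                if nxt in path:
                    continue
                if len(path) == 1 and (nxt, x) not in edge_set:
                    continue  # first edge must point into x
                walk(path + [nxt])

        walk([x])
        return found

    def blocked(path, s):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            if (prev, mid) in edge_set and (nxt, mid) in edge_set:
                if mid not in s and not (descendants(mid) & s):
                    return True  # unconditioned collider
            elif mid in s:
                return True      # conditioned chain link or fork
        return False

    paths = backdoor_paths()
    # Condition (1): never consider x, y, or any descendant of x.
    candidates = sorted(set(nodes) - {x, y} - descendants(x))
    for size in range(len(candidates) + 1):
        valid = [set(c) for c in combinations(candidates, size)
                 if all(blocked(p, set(c)) for p in paths)]
        if valid:
            return valid  # all sets of the smallest valid size
    return []             # no valid set among the given variables


# X <- A -> B -> Y plus X -> Y: either A or B alone closes the back door.
tied = minimum_adjustment_sets(
    ["X", "Y", "A", "B"],
    [("A", "X"), ("A", "B"), ("B", "Y"), ("X", "Y")],
    "X", "Y",
)
print(tied)
```

In this example both {A} and {B} are returned, which is the tie case described above; an M-shaped graph returns the empty set because the collider already blocks the only back-door path.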

Caveats When this is the wrong tool

If you have…: Use instead
Time-varying treatments and confounders affected by prior treatment: G-methods (g-computation, IPW with marginal structural models, g-estimation). Static back-door adjustment is biased when confounders are post-treatment with respect to earlier treatments.
Unmeasured confounding (a confounder you cannot measure): Front-door criterion if the structure allows, instrumental variables, negative controls, or sensitivity analysis (E-value). Back-door alone fails when a needed variable is unobserved.
A DAG with a cycle (mutual causation): Structural equation models or feedback-loop methods. The back-door criterion assumes acyclicity; this tool refuses to score cyclic graphs.
Mediator analysis (you want the indirect effect of X via M): Use a mediation model (Baron & Kenny, or counterfactual mediation per VanderWeele). Adjusting for the mediator in a single regression removes the indirect part of the effect, so the coefficient on X is no longer the total effect.
You're picking the adjustment set after seeing the regression coefficients: Stop. Pick the DAG and the set before running the model. The other order is causal p-hacking and reviewers will eat you alive.
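The acyclicity requirement in the table above is easy to check before any path analysis. The sketch below uses Kahn's topological-sort algorithm as one possible approach; whether the picker does the same is not stated, and is_acyclic is a hypothetical name.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Kahn's algorithm: a graph is a DAG iff every node can be peeled off in order."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1

    # Start from nodes with no incoming edges and remove them one by one.
    queue = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Nodes stuck with a positive in-degree sit on or behind a cycle.
    return removed == len(indegree)


dag_ok = is_acyclic(["X", "Y", "Z"], [("Z", "X"), ("Z", "Y"), ("X", "Y")])
feedback = is_acyclic(["X", "Y"], [("X", "Y"), ("Y", "X")])
print(dag_ok, feedback)
```

A graph that fails this check has mutual causation somewhere, and the back-door criterion does not apply to it.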