Counting Principles in R: Permutations, Combinations & Birthday Problem
Counting principles in R let you count outcomes without listing them one by one. Use factorial() for orderings, choose() for unordered selections, and combn() to enumerate every subset — three tools that power permutations, combinations, and classic puzzles like the birthday problem.
Every probability starts with a count. This post walks you through the three counting tools every R user needs, then uses them to crack the birthday problem from first principles. We stick to base R throughout.
What is the multiplication rule in R?
Probability starts with counting, and counting starts with the multiplication rule. The rule says: if one step has a outcomes and a second independent step has b outcomes, the combined process has a * b outcomes. Every other counting formula in this post is a disguised version of this one idea. Let's count every possible result of rolling two dice.
expand.grid() builds every combination of the input vectors, one row per outcome. Counting the rows (36) matches 6 * 6 exactly — that is the multiplication rule in action. For small spaces, enumerating outcomes is fine. For larger ones, multiplication is the only practical path.
The rule generalises to any number of independent steps. For k steps with option counts $a_1, a_2, \ldots, a_k$, the total is:
$$N = a_1 \times a_2 \times \cdots \times a_k$$
Consider a 4-character password drawn from 26 letters plus 10 digits — 36 options per slot, four independent slots.
Over 1.6 million combinations for a weak four-character password. The count grows as an exponent of the alphabet size, which is why one extra character massively expands the search space for a brute-force attacker.
Try it: How many outcomes are there when you roll one die and flip one coin? Save the result to ex_outcomes.
Click to reveal solution
Explanation: Six die faces times two coin sides gives 12 outcomes — the multiplication rule with two independent steps.
How do you compute permutations in R?
A permutation is an ordered arrangement. "ABC" and "CAB" are different permutations of the same three letters because the positions are different. The count of ways to arrange all n distinct items is n!, read "n factorial", and R computes it with factorial().
Think of it as filling five slots. The first slot can hold any of five books, the second any of the remaining four, and so on — the multiplication rule, applied down to 1.
Often we only fill k slots out of n. The count of ordered arrangements of n items taken k at a time is the permutation formula:
$$P(n, k) = \frac{n!}{(n - k)!}$$
Where:
- $n$ = total items available
- $k$ = slots to fill
- $P(n, k)$ = number of ordered arrangements
The intuition: you fill k slots, with n, n−1, n−2, … options at each step — stopping after k multiplications. The formula is a shortcut for that product.
Both routes give 60. The first uses the permutation formula directly. The second says: pick the three people (unordered), then count the 3! ways to sit them in the chairs. That identity — "permutation = combination × k!" — shows up again in the birthday problem.
To see every permutation explicitly, we can filter expand.grid() for rows with no duplicates. This is the base-R trick when you do not want to reach for a package.
Six rows, matching factorial(3) = 6. For small n this is fine, but it scales terribly — factorial(10) is already 3.6 million orderings, so enumeration is rarely the right tool.
Try it: Use factorial() to compute P(10, 3) — the number of ways to award gold, silver, and bronze medals to 10 athletes. Save as ex_p.
Click to reveal solution
Explanation: Ten athletes for gold, nine remaining for silver, eight for bronze — 10 × 9 × 8 = 720, which is 10! / 7!.
How do you compute combinations in R?
A combination is an unordered selection. The team {Alice, Bob} is the same as {Bob, Alice} — only who is in matters. The count of k-element subsets of an n-element set is the binomial coefficient, written "n choose k":
$$C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$
R computes it with choose(n, k). No package needed.
Ten teams. The formula adds a k! to the denominator of the permutation formula, erasing the over-counting from orderings that combinations ignore.
choose() tells you how many, but combn() gives you the actual subsets. combn(x, m) returns an m-by-C(n, m) matrix where each column is one combination.
Ten rows match choose(5, 3). Order inside a row does not matter — combn() always emits combinations in lexicographic order.
combn(1:5, 3, sum) returns the sum of every 3-subset. This turns combn() into a one-liner for exhaustive search over small subsets.Try it: How many 5-card hands can be dealt from a standard 52-card deck? Save to ex_hands.
Click to reveal solution
Explanation: Poker deals five cards from 52 without regard to order, so this is choose(52, 5) — roughly 2.6 million distinct hands.
When do you use permutations vs combinations?
The entire decision comes down to one question: does order matter? If yes, use permutations. If no, use combinations. Phrased differently — if swapping two items in the arrangement produces a different outcome for your problem, order matters.
Same (n, k), wildly different counts. The PIN problem produces 60 arrangements because a PIN of 731 is a different PIN from 137. The committee problem produces 10 because {Alice, Bob, Cy} is the same committee regardless of how we list the names.
The relationship is exact: permutations over-count by the k! orderings within each combination.
This is why combinations are always smaller: they group all k! orderings of the same k items into a single unordered set.
choose() on a seating problem or a PIN problem is a top-5 probability bug. Before calling a counting function, write one sentence explaining whether swapping two items changes the answer.Try it: Given 7 books on a shelf, how many ways can you arrange 3 of them left-to-right? Save as ex_arrange.
Click to reveal solution
Explanation: Left-to-right arrangement means order matters, so this is P(7, 3). Seven books for the first slot, six for the second, five for the third — 7 × 6 × 5 = 210.
How does the birthday problem apply counting principles in R?
The birthday problem asks: in a room of n people, what is the probability that at least two share a birthday? The counterintuitive answer is that with just 23 people, the probability is already above 50%. Most readers guess closer to 100 or 200.
The direct way — "count every way at least one pair matches" — is awkward. The trick is the complement: compute the probability that everyone is different, then subtract from 1.
Assume 365 equally likely birthdays and independence. For n people, there are $365^n$ possible birthday sequences. For all n to be different, the first person has 365 options, the second 364, the third 363, and so on — a permutation count, $P(365, n)$. So:
$$P(\text{shared}) = 1 - \frac{P(365, n)}{365^n} = 1 - \frac{365!}{(365 - n)! \cdot 365^n}$$
Where:
- $n$ = number of people
- $P(\text{shared})$ = probability at least two share a birthday
Factorials of 365 overflow factorial() in R. The fix is lfactorial() — the logarithm of the factorial — and exponentiating at the end.
With 23 people, the probability is 0.5073 — just over 50%. The function stays numerically stable up to n = 365 because all arithmetic is in log space before a single exp() at the end.
Next, sweep n from 1 to 60 and find the first n where the probability crosses 0.5.
Crossover happens at exactly 23 people. By 30 there is a 70% chance, and by 50 it is 97% — the curve climbs fast once the number of pairs grows large.
The simulation route skips the formula entirely: sample birthdays, check for duplicates, and repeat thousands of times.
The simulation lands at 0.5063, within 0.001 of the analytic 0.5073 — a strong validation that the formula is correct. replicate() runs the expression 10,000 times, sample() draws 23 birthdays with replacement, and duplicated() flags any repeat.
Finally, visualise the full curve with both methods overlaid. The base-R plot runs directly in the browser.
The simulated points sit almost exactly on the analytic curve. Dashed guides mark the 50% line and n = 23 — the famous crossover point.
choose(23, 2) = 253 distinct pairs. Each pair has a 1/365 chance of matching, so the expected number of matches is about 0.69 — big enough that at least one match becomes likely. The probability grows with pairs, which grow quadratically in n.Try it: Modify bday_prob() so the year length is an argument days (default 365). Use it to find the crossover n for a leap year (days = 366). Save as ex_crossover.
Click to reveal solution
Explanation: Adding one day (Feb 29) barely shifts the probabilities — crossover is still at 23. The birthday problem is robust to small changes in the denominator because it is driven by the number of pairs.
Practice Exercises
These capstones combine several ideas from the tutorial. Use variable names prefixed with my_ so they do not overwrite tutorial variables.
Exercise 1: Probability of a flush in poker
Compute the probability that a random 5-card hand from a standard 52-card deck is a flush — all five cards of the same suit. Combine choose() with the multiplication rule: pick a suit, then pick 5 cards from that suit. Save to my_flush_prob.
Click to reveal solution
Explanation: Four suits times choose(13, 5) same-suit hands gives the flush count; dividing by choose(52, 5) gives roughly 0.2% — about 1 in 505 hands. (Poker convention excludes straight flushes from this count, but the formula above matches the general "all same suit" question.)
Exercise 2: Triple-birthday simulation
Simulate the probability that in a room of 30 random people, at least three people share a birthday (not just two). The closed form is messy, so use simulation only: draw 30 birthdays, check whether any single birthday appears three or more times, repeat 10,000 times. Save to my_triple_prob.
Click to reveal solution
Explanation: tabulate() counts occurrences of each integer, and max() >= 3 asks whether any birthday appears at least three times. Around 2.9% — much rarer than the pair version, but still non-trivial in a standard classroom.
Exercise 3: 4-digit PINs that contain a 7
Count how many 4-digit PINs with distinct digits from 0-9 contain the digit 7 at least once. Generate all such PINs with combn() plus permutation, then filter. Save the count to my_with7.
Click to reveal solution
Explanation: Using the complement is cleaner than enumeration. Total permutations P(10, 4) = 5040, PINs with no 7 are P(9, 4) = 3024, so 2016 PINs contain at least one 7 — about 40%.
Complete Example: the bootcamp-classroom question
A data science bootcamp has 28 students per cohort. The instructor wants to open each class with the fact that two students probably share a birthday. Write one script that (a) estimates the probability analytically, (b) validates it by simulation, and (c) reports how many distinct pairs of students exist.
A 28-person cohort has a 65% shared-birthday probability, backed by 378 distinct pairs — any of which could collide. Analytic and simulated probabilities agree to within 0.001. The same three-step pattern — formula, simulation, pair count — answers almost every intro-probability counting question that comes up in practice.
Summary
The three tools in this tutorial cover almost every counting problem in introductory probability.
| Concept | R function | Order matters? | Example |
|---|---|---|---|
| Multiplication rule | a * b |
N/A | Two dice → 36 outcomes |
| Factorial | factorial(n) |
Yes, all n | 5 books in a row → 120 |
| Permutations | factorial(n) / factorial(n - k) |
Yes | 5 people, 3 chairs → 60 |
| Combinations (count) | choose(n, k) |
No | 3-person team from 5 → 10 |
| Combinations (enumerate) | combn(x, k) |
No | List every team |
| Large-n formulas | lfactorial() |
— | Birthday problem |
Key takeaways:
- The multiplication rule is the foundation: everything else adjusts for order and repetition.
- If order matters use permutations, otherwise combinations. Writing the rule in a sentence prevents the most common probability bug.
choose()gives the count,combn()gives the actual subsets — both are base R.- For probabilities of rare events over large spaces,
lfactorial()keeps arithmetic stable; simulation viareplicate()is a sanity check that rarely disagrees. - The birthday problem's surprise is quadratic pair growth —
choose(n, 2)climbs fast, which is why 23 people already give you a 50% chance of a match.
References
- Grinstead, C. M. & Snell, J. L. — Introduction to Probability, Chapter 3: Combinatorics. Hosted on LibreTexts. Link/03:_Combinatorics)
- Wikipedia — Birthday problem. Link
- R Core Team —
choose,factorial, andcombnreference (base package). Link - Robinson, D. — The birthday paradox puzzle: tidy simulation in R. Variance Explained. Link
- Heiss, A. — Calculating birthday probabilities with R instead of math. Link
- Wickham, H. — Advanced R, 2nd Edition. CRC Press (2019). Link
Continue Learning
- Sample Spaces, Events and Probability Axioms in R — the parent post, where every counting principle here becomes a probability via the axioms.
- Conditional Probability in R — builds on counting to update beliefs when new information arrives.
- Central Limit Theorem in R — another Monte Carlo showcase, using
replicate()to validate a theoretical result.