Algorithmic Fairness in R: fairml & aif360 for Bias Auditing
Algorithmic fairness ensures that machine learning models don't systematically discriminate against protected groups. This guide teaches you to measure, audit, and improve fairness using R tools, because a model that's accurate on average can still be unfair to specific groups.
A hiring model that rejects 80% of female applicants but only 30% of male applicants is unfair, even if its overall accuracy is high. A credit scoring model that gives higher rates to minorities with the same creditworthiness as non-minorities is unfair. These aren't hypothetical: they've happened in real deployments. This guide gives you the tools to catch and fix these problems.
Fairness Definitions
There are multiple definitions of fairness, and they can conflict with each other. Understanding them is essential for choosing the right one for your context.
A critical impossibility result: when group base rates differ and the model is not a perfect predictor, you cannot simultaneously satisfy demographic parity, equalized odds, and calibration. You must choose which fairness criterion matters most for your application.
Demonstrate the fairness trade-off:

```r
# Demonstrating the fairness trade-off
set.seed(42)
n <- 1000

# Simulate two groups with different base rates
group <- rep(c("A", "B"), each = n / 2)
base_rate <- ifelse(group == "A", 0.4, 0.2)  # Different base rates
true_label <- rbinom(n, 1, base_rate)

# A perfectly calibrated model
score <- true_label + rnorm(n, 0, 0.3)
predicted <- as.integer(score > 0.5)

cat("=== Fairness Trade-off Demo ===\n")
cat("Base rates differ between groups:\n")
cat("  Group A base rate:", mean(true_label[group == "A"]), "\n")
cat("  Group B base rate:", mean(true_label[group == "B"]), "\n")

cat("\nSelection rates (demographic parity check):\n")
cat("  Group A:", mean(predicted[group == "A"]), "\n")
cat("  Group B:", mean(predicted[group == "B"]), "\n")

cat("\nTrue Positive Rates (equal opportunity check):\n")
tpr_a <- mean(predicted[group == "A" & true_label == 1])
tpr_b <- mean(predicted[group == "B" & true_label == 1])
cat("  Group A TPR:", round(tpr_a, 3), "\n")
cat("  Group B TPR:", round(tpr_b, 3), "\n")

cat("\nEqualizing selection rates would break calibration.\n")
cat("Equalizing TPR would change selection rates.\n")
cat("You must choose which fairness criterion to prioritize.\n")
```
Measuring Disparate Impact
The four-fifths rule: the selection rate for any protected group should be at least 80% of the rate for the most-selected group.
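The check itself is simple arithmetic over selection counts. A minimal sketch (the counts below are made up for illustration):

```r
# Hypothetical selection counts by group (illustrative numbers only)
selected   <- c(A = 120, B = 70)
applicants <- c(A = 200, B = 180)

selection_rate <- selected / applicants
di_ratio <- min(selection_rate) / max(selection_rate)

cat("Selection rates:\n")
print(round(selection_rate, 3))
cat("Disparate impact ratio:", round(di_ratio, 3), "\n")
cat("Passes four-fifths rule:", di_ratio >= 0.8, "\n")
```

Here group B's rate (0.389) is about 65% of group A's (0.6), so the model fails the four-fifths check.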
```r
cat("=== AIF360 for R ===\n\n")
cat("IBM's AI Fairness 360 toolkit (Python-based, R interface available):\n\n")
cat("Bias metrics:\n")
cat("  - Statistical parity difference\n")
cat("  - Disparate impact ratio\n")
cat("  - Equal opportunity difference\n")
cat("  - Average odds difference\n")
cat("  - Theil index\n\n")
cat("Bias mitigation algorithms:\n")
cat("  Pre-processing: Reweighting, Optimized Preprocessing\n")
cat("  In-processing: Adversarial Debiasing, Prejudice Remover\n")
cat("  Post-processing: Equalized Odds, Calibrated Equalized Odds\n\n")
cat("R usage via reticulate:\n")
cat('  library(reticulate)\n')
cat('  aif <- import("aif360.datasets")\n')
cat('  metrics <- import("aif360.metrics")\n')
```
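If AIF360 is not available, the first two difference metrics listed above are easy to compute in base R. The sketch below is an illustrative implementation, not the toolkit's code; the function names `statistical_parity_diff` and `equal_opportunity_diff` are my own, and it assumes a binary protected attribute:

```r
# Illustrative base-R versions of two AIF360-style metrics
# (assumes exactly two groups; a negative value means the
# unprivileged group is selected less often)
statistical_parity_diff <- function(pred, group, privileged) {
  mean(pred[group != privileged]) - mean(pred[group == privileged])
}

equal_opportunity_diff <- function(pred, actual, group, privileged) {
  tpr <- function(g) mean(pred[group == g & actual == 1])
  unpriv <- setdiff(unique(group), privileged)
  tpr(unpriv) - tpr(privileged)
}

# Small deterministic example
g <- rep(c("A", "B"), each = 4)
y <- c(1, 1, 0, 1,  1, 1, 0, 0)   # actual outcomes
p <- c(1, 1, 0, 0,  1, 0, 0, 0)   # model predictions

cat("SPD (B minus A):", statistical_parity_diff(p, g, "A"), "\n")
cat("EOD (B minus A):", equal_opportunity_diff(p, y, g, "A"), "\n")
```

Both metrics are zero under perfect fairness by their respective criteria; values near ±1 indicate severe disparity.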
Practical Audit Workflow
| Step | Action | Tool |
|------|--------|------|
| 1. Define protected attributes | List sensitive variables | Domain knowledge |
| 2. Choose fairness metric | Match to application context | See summary table below |
| 3. Measure baseline | Calculate metrics on current model | fairness_audit() function |
| 4. Set threshold | Define acceptable disparity level | Four-fifths rule or domain-specific |
| 5. Mitigate if needed | Apply debiasing technique | fairml, reweighting, threshold tuning |
| 6. Re-measure | Verify improvement | Same audit function |
| 7. Document | Record decisions and trade-offs | Analysis report |
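The workflow references a `fairness_audit()` helper. A minimal sketch of what such a function might look like (one possible implementation, not a package function; extend it with whatever metrics your context requires):

```r
# Sketch of an audit helper: per-group selection rate, TPR, and a
# four-fifths-style ratio against the most-selected group
fairness_audit <- function(pred, actual, group) {
  by_group <- split(seq_along(pred), group)
  sel <- sapply(by_group, function(i) mean(pred[i]))
  tpr <- sapply(by_group, function(i) mean(pred[i][actual[i] == 1]))
  data.frame(
    group          = names(by_group),
    selection_rate = round(sel, 3),
    tpr            = round(tpr, 3),
    di_vs_max      = round(sel / max(sel), 3)  # four-fifths check
  )
}

# Small deterministic example
pred   <- c(1, 1, 0, 0,  1, 0, 0, 0)
actual <- c(1, 1, 0, 1,  1, 1, 0, 0)
group  <- rep(c("A", "B"), each = 4)
print(fairness_audit(pred, actual, group))
```

Running the same function at steps 3 and 6 makes the before/after comparison direct.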
Tune thresholds for fairness:

```r
# Step 5 example: threshold tuning for fairness
set.seed(42)
n <- 500
df <- data.frame(
  group  = rep(c("A", "B"), each = n / 2),
  score  = c(rnorm(n / 2, 0.6, 0.2), rnorm(n / 2, 0.5, 0.2)),
  actual = c(rbinom(n / 2, 1, 0.6), rbinom(n / 2, 1, 0.4))
)

# Single threshold: may create disparate impact
single_threshold <- 0.5
df$pred_single <- as.integer(df$score > single_threshold)

# Group-specific thresholds: equalize selection rates
target_rate <- mean(df$actual)
thresh_a <- quantile(df$score[df$group == "A"], 1 - target_rate)
thresh_b <- quantile(df$score[df$group == "B"], 1 - target_rate)
df$pred_adjusted <- ifelse(
  df$group == "A",
  as.integer(df$score > thresh_a),
  as.integer(df$score > thresh_b)
)

cat("=== Threshold Tuning ===\n")
cat("Single threshold (0.5):\n")
cat("  Group A rate:", mean(df$pred_single[df$group == "A"]), "\n")
cat("  Group B rate:", mean(df$pred_single[df$group == "B"]), "\n")
cat("\nAdjusted thresholds:\n")
cat("  Group A threshold:", round(thresh_a, 3), "-> rate:",
    mean(df$pred_adjusted[df$group == "A"]), "\n")
cat("  Group B threshold:", round(thresh_b, 3), "-> rate:",
    mean(df$pred_adjusted[df$group == "B"]), "\n")
```
Summary
| Fairness Criterion | Best For | Trade-off |
|--------------------|----------|-----------|
| Demographic parity | Employment, lending | May select less qualified from one group |
| Equalized odds | Criminal justice, medical | Harder to achieve with different base rates |
| Equal opportunity | Scholarship, hiring | Only equalizes true positive rate |
| Calibration | Risk assessment, insurance | Doesn't guarantee equal rates |
| Individual fairness | Any | Hard to define similarity metric |
FAQ
Which fairness metric should I use? It depends on context. For hiring: demographic parity or disparate impact. For criminal risk: equalized odds (equal TPR and FPR). For medical diagnostics: equal opportunity (equal sensitivity). Discuss with stakeholders and domain experts; this is not a purely technical decision.
Can I just remove the protected attribute from the model? No. This is "fairness through unawareness" and it doesn't work because other features (zip code, name patterns, school attended) can serve as proxies. You need to test outcomes by protected group regardless.
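One way to test the proxy concern empirically is to check how well a candidate feature predicts the protected attribute itself. A sketch on synthetic data (the variable `zip_income` and all numbers are illustrative assumptions):

```r
# Proxy check: if a feature predicts group membership well, dropping
# the protected attribute alone will not remove the bias channel
set.seed(7)
n <- 1000
group <- rbinom(n, 1, 0.5)                      # protected attribute (0/1)
zip_income <- 50 + 15 * group + rnorm(n, 0, 5)  # feature correlated with group

fit <- glm(group ~ zip_income, family = binomial)
pred_group <- as.integer(fitted(fit) > 0.5)
cat("Proxy predicts group with accuracy:", mean(pred_group == group), "\n")
```

High accuracy here (well above the ~0.5 base rate) flags `zip_income` as a proxy, which is why outcome testing by group is needed even when the protected attribute is excluded.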
Is there a legal requirement for algorithmic fairness? Increasingly, yes. The EU AI Act classifies hiring and credit models as "high-risk" requiring bias audits. US agencies (EEOC, CFPB) use disparate impact analysis. Several US states have enacted algorithmic accountability laws. The legal landscape is evolving rapidly.