caret icr() in R: Independent Component Regression Models
The icr() function in caret fits an Independent Component Regression by extracting n.comp independent components from the predictors with fastICA::fastICA() and regressing the response on those components. It is the dimension-reduction cousin of principal component regression for cases where the underlying signals are statistically independent rather than orthogonal.

    library(caret)                                               # fastICA must be installed too
    fit <- icr(mpg ~ ., data = mtcars, n.comp = 3)               # formula interface
    fit_xy <- icr(x = mtcars[, -1], y = mtcars$mpg, n.comp = 3)  # x, y interface
    predict(fit, newdata = mtcars[1:5, ])                        # score new rows
    fit2 <- icr(mpg ~ ., data = mtcars, n.comp = 2)              # fewer components: refit
    icr(mpg ~ ., data = mtcars, n.comp = 5, maxit = 500)         # passthrough to fastICA
    train(mpg ~ ., method = "icr", data = mtcars)                # n.comp tuned by resampling
    fit$ica                                                      # stored ICA preprocessing

Need explanation? Read on for examples and pitfalls.
What icr() does in one sentence
icr() is caret's regression-on-independent-components wrapper. You hand it a formula or an x and y, set n.comp (the number of independent components to extract, which you should always set explicitly), and the function calls fastICA::fastICA() to rotate the predictors into statistically independent latent variables, then fits an ordinary linear regression of y on those components. predict() later applies the same rotation to new rows and scores them with the stored linear model.
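The two steps can be sketched by hand with fastICA alone. This is an illustration, not caret's internal code: it assumes the predictors are standardized with scale() first, and the helper score_rows() is a hypothetical name introduced here. It relies on fastICA's documented relation S = XKW between the estimated sources S, the pre-processed data X, the pre-whitening matrix K, and the unmixing matrix W.

```r
# Hand-rolled sketch of the two icr() steps using fastICA directly.
# Assumption: predictors are standardized first; caret's own
# preprocessing may differ in detail.
library(fastICA)

x <- scale(as.matrix(mtcars[, -1]))  # standardize the 10 predictors
set.seed(42)                          # fastICA starts from a random unmixing matrix
ica <- fastICA(x, n.comp = 3)

# Step 1: the independent components (32 x 3), S = X %*% K %*% W
S <- ica$S
colnames(S) <- paste0("IC", 1:3)

# Step 2: ordinary least squares of the response on the components
by_hand <- lm(mpg ~ ., data = data.frame(S, mpg = mtcars$mpg))

# Hypothetical helper: standardize new rows with the training centers and
# scales, apply the stored K and W, then score with the linear model
score_rows <- function(newx) {
  newx <- scale(as.matrix(newx),
                center = attr(x, "scaled:center"),
                scale  = attr(x, "scaled:scale"))
  comps <- newx %*% ica$K %*% ica$W
  colnames(comps) <- paste0("IC", 1:3)
  predict(by_hand, newdata = as.data.frame(comps))
}

score_rows(mtcars[1:5, -1])
```

Scoring the first five training rows this way should reproduce their fitted values, a quick check that K and W (plus the centering and scaling) are all a new row needs.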
icr() syntax and arguments
Two equivalent entry points cover formula and matrix workflows. The function mirrors the familiar lm() and pls::pcr() API, then passes any extra arguments straight to fastICA::fastICA().
Formula form:
icr(formula, data, ..., subset, na.action, contrasts = NULL)
Matrix form:
icr(x, y, ...)
- formula: a model formula like mpg ~ ., a numeric outcome regressed on the predictor columns.
- data: a data frame holding the columns named in the formula.
- x, y: a numeric matrix or data frame of predictors plus a numeric outcome vector.
- n.comp: number of independent components to extract, passed on to fastICA. Set it explicitly; it must be at most ncol(x).
- ...: forwarded to fastICA::fastICA(), including alg.typ, fun, maxit, tol, and row.norm.
- subset, na.action, contrasts: standard model-fitting controls used by the formula method.
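A minimal sketch of the matrix form with fastICA arguments passed through. It uses the x, y interface on mtcars (the first column is mpg, the rest are predictors); the seed and the choice of alg.typ = "deflation" with maxit = 500 are illustrative assumptions.

```r
library(caret)  # fastICA must also be installed

x <- mtcars[, -1]   # ten numeric predictors
y <- mtcars$mpg     # numeric outcome

set.seed(1)
fit_defl <- icr(x, y, n.comp = 3,
                alg.typ = "deflation",  # forwarded to fastICA::fastICA()
                maxit = 500)            # forwarded to fastICA::fastICA()

# The stored linear model has an intercept plus one slope per component
coef(fit_defl$model)
```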
icr() accepts fastICA()'s tuning arguments. Pass alg.typ = "deflation" to extract components one at a time instead of in parallel, or raise maxit (for example maxit = 500) if the default of 200 iterations is not enough to converge. The defaults work for most small tabular inputs.
icr() examples by use case
1. Fit a basic ICR model on mtcars
The shortest call extracts three independent components from the ten predictors and regresses mpg on them. The returned object stores the linear fit, the ICA rotation matrices, and the call.
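A sketch of that basic fit. It uses the x, y interface, which for an all-numeric data frame like mtcars covers the same predictors as mpg ~ .; the seed is an arbitrary choice so the component signs and order are reproducible.

```r
library(caret)  # fastICA must also be installed

set.seed(42)                                   # fastICA starts from a random unmixing matrix
fit <- icr(x = mtcars[, -1], y = mtcars$mpg, n.comp = 3)

coef(fit$model)                                # intercept plus one slope per component
summary(fit$model)$r.squared                   # in-sample R^2 of the regression on components
```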
The model object exposes fit$model (the underlying lm on the components) and fit$ica (the preprocessing object that stores the ICA rotation, built from fastICA's pre-whitening matrix K and unmixing matrix W). Together they map raw predictors to the latent components used for prediction.
2. Score new data with predict()
predict() applies the stored ICA rotation to newdata and runs the linear regression on the resulting components.
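A sketch of scoring five rows and lining them up with the observed values. It passes only the predictor columns as newdata, which is the safest assumption about what the stored rotation expects; exact predictions depend on the data and seed, so none are shown here.

```r
library(caret)  # fastICA must also be installed

set.seed(42)
fit <- icr(x = mtcars[, -1], y = mtcars$mpg, n.comp = 3)

new_rows <- mtcars[1:5, -1]                    # predictors only, same columns as training
pred <- predict(fit, newdata = new_rows)

data.frame(actual = mtcars$mpg[1:5],
           predicted = round(pred, 1),
           row.names = rownames(new_rows))
```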
Compare these to the actual mtcars$mpg[1:5] values (21.0, 21.0, 22.8, 21.4, 18.7). The fit tracks the truth on the high-mpg cars and slightly overestimates the Datsun, which is typical when three components retain most but not all of the predictive structure.
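3. Use fewer components by refitting
caret's predict() method for icr objects takes only the model, newdata, and ..., which it hands to predict.lm(); it has no argument for keeping a subset of the stored components. For ablation checks (how much does each component carry?), refit the model with a smaller n.comp and compare.
Because fastICA first whitens onto the leading principal component directions, the subspace spanned by k components sits inside the one spanned by k + 1, so in-sample error can only stay flat or fall as n.comp grows. Sweep n.comp from 1 up to the rank of the predictors, and judge the resulting curve on held-out data rather than training error.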
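A sketch of that sweep, refitting once per component count and recording training RMSE. The range 2:6 and the seed are illustrative assumptions; because the component subspaces are nested, the training RMSE should not increase along the sweep, which is exactly why it cannot be used on its own to pick n.comp.

```r
library(caret)  # fastICA must also be installed

x <- mtcars[, -1]
y <- mtcars$mpg
ks <- 2:6

# Refit once per component count and record the in-sample RMSE
train_rmse <- sapply(ks, function(k) {
  set.seed(42)
  fit_k <- icr(x = x, y = y, n.comp = k)
  sqrt(mean((y - predict(fit_k, newdata = x))^2))
})
names(train_rmse) <- paste0("n.comp=", ks)
train_rmse
```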
4. Hold out a test set and check RMSE
The training-set fit is optimistic. Always validate on rows that the ICA rotation has never seen.
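A sketch of a holdout check with createDataPartition() and caret's RMSE(pred, obs). The 75/25 split and both seeds are assumptions, and with only about eight test rows the number will move noticeably with the split.

```r
library(caret)  # fastICA must also be installed

set.seed(123)
in_train <- createDataPartition(mtcars$mpg, p = 0.75, list = FALSE)
train_df <- mtcars[in_train, ]
test_df  <- mtcars[-in_train, ]

set.seed(42)
fit <- icr(x = train_df[, -1], y = train_df$mpg, n.comp = 3)  # ICA never sees test rows

test_pred <- predict(fit, newdata = test_df[, -1])
test_rmse <- RMSE(test_pred, test_df$mpg)
test_rmse
```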
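An out-of-sample RMSE near 3.5 mpg is only a rough estimate on a holdout this small; it shifts with the random split, so prefer a cross-validated estimate when choosing between models. ICR is most useful on wider tabular inputs where predictors are linear mixtures of a handful of independent drivers.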
5. Tune n.comp through caret train()
For grid search over n.comp, hand icr to train() and let caret cross-validate.
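A sketch of that grid search with 5-fold cross-validation. The grid 2:5, the fold count, and the seed are assumptions chosen for a 32-row dataset; n.comp is the tuning parameter caret's "icr" method exposes.

```r
library(caret)  # fastICA must also be installed

set.seed(2024)
ctrl <- trainControl(method = "cv", number = 5)
tuned <- train(x = mtcars[, -1], y = mtcars$mpg,
               method = "icr",
               tuneGrid = expand.grid(n.comp = 2:5),
               trControl = ctrl)

tuned$results[, c("n.comp", "RMSE")]   # cross-validated RMSE per grid value
tuned$bestTune                         # n.comp with the lowest RMSE
```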
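train() refits ICR for every n.comp value in every resample, so a grid costs roughly (number of grid values) times (number of resamples) ICA decompositions, plus one final fit on the full training set. The decomposition dominates cost on high-dimensional x; tune on a subsample if the grid runs slow.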
icr() vs other dimension-reduction regressions
icr() is the right pick when the underlying signals are statistically independent. Other techniques optimize for different criteria.
| Function | Decomposition basis | Optimizes | When to use |
|---|---|---|---|
| caret::icr() | independent components | statistical independence, non-Gaussianity | latent signals are non-Gaussian mixtures |
| pls::pcr() | principal components | variance of predictors | collinear predictors, no response info in rotation |
| pls::plsr() | latent variables | covariance with response | supervised reduction, response-informed axes |
| glmnet(alpha = 0) | original predictors | shrinkage of coefficients | keep features, penalize magnitude |
| fastICA::fastICA() | independent components | independence only | inspect components without regression |
For the underlying decomposition, see the fastICA reference.
Common pitfalls
Pitfall 1: forgetting the fastICA package. fastICA is only a suggested dependency of caret, so installing caret does not install it. Run install.packages("fastICA") once; without it, icr() stops with an error saying the package cannot be found.
Pitfall 2: setting n.comp larger than ncol(x). ICA cannot extract more independent components than there are input features. Cap n.comp at ncol(x), or better at the rank of the predictor matrix, and call set.seed() before fitting so the random initialization is reproducible.
Pitfall 3: skipping predictor scaling. fastICA centers predictors but does not scale them. Variables on different scales (disp in cubic inches vs qsec in seconds) dominate the decomposition. Pre-process with caret::preProcess(method = c("center", "scale")) first.
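A sketch of that workflow: learn the centering and scaling on the training predictors, fit on the standardized data, and push new rows through the same preProcess object before predict(). Using the x, y interface and this particular seed are assumptions.

```r
library(caret)  # fastICA must also be installed

x <- mtcars[, -1]
y <- mtcars$mpg

# Learn centering and scaling from the training predictors only
pp <- preProcess(x, method = c("center", "scale"))
x_std <- predict(pp, x)

set.seed(42)
fit_std <- icr(x = x_std, y = y, n.comp = 3)

# New rows must pass through the same preProcess object first
new_std <- predict(pp, mtcars[1:5, -1])
pred_std <- predict(fit_std, newdata = new_std)
pred_std
```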
Pitfall 4: expecting interpretable signs and order. ICA returns components up to sign and permutation. The same data fit twice can produce sign-flipped or reordered components, even though predictions are identical. Compare by absolute loadings.
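A sketch that makes this visible by fitting twice with different seeds (both seeds are arbitrary). The coefficients on the components may flip sign or swap places, while the predictions should agree up to floating-point noise, since each unmixing solution is a rotation of the same whitened subspace.

```r
library(caret)  # fastICA must also be installed

x <- mtcars[, -1]
y <- mtcars$mpg

set.seed(1)
fit_a <- icr(x = x, y = y, n.comp = 3)
set.seed(99)
fit_b <- icr(x = x, y = y, n.comp = 3)

coef(fit_a$model)   # signs and order of component coefficients may differ...
coef(fit_b$model)

p_a <- predict(fit_a, newdata = x)
p_b <- predict(fit_b, newdata = x)
all.equal(p_a, p_b)  # ...but the predictions should match
```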
predict() does not accept a bare numeric vector. Pass a data frame or matrix with the same predictor columns as the training data. A single new observation should be passed as a one-row data frame (for example mtcars[1, -1]); a bare vector does not carry the column structure the stored ICA rotation expects.
Try it yourself
Try it: Fit an ICR model on mtcars with n.comp = 4, predict mpg for the first three rows, and compute the residuals. Save the predictions to ex_pred and residuals to ex_resid.
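One possible solution, sketched with the x, y interface and an arbitrary seed; the formula call icr(mpg ~ ., data = mtcars, n.comp = 4) covers the same predictors.

```r
library(caret)  # fastICA must also be installed

set.seed(42)
ex_fit   <- icr(x = mtcars[, -1], y = mtcars$mpg, n.comp = 4)
ex_pred  <- predict(ex_fit, newdata = mtcars[1:3, -1])
ex_resid <- mtcars$mpg[1:3] - ex_pred   # observed minus predicted

ex_pred
ex_resid
```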
Explanation: predict() rotates the three rows through the stored ICA matrices, then runs the fitted linear regression on the four resulting components. Subtracting the predictions from the true mpg values gives per-row residuals. These rows were part of the training data, so the residuals are in-sample and optimistic; use a holdout or train() for an honest error estimate.
Related caret functions
These complete a typical ICR workflow:
- train() with method = "icr": cross-validated search for the best n.comp
- preProcess(): center and scale predictors before the ICA decomposition
- createDataPartition(): stratified train and test split before fitting
- varImp(): variable importance from the linear model on the components
- bagEarth(): bagged MARS regression, a nonlinear alternative when MARS is the natural base learner
FAQ
What is icr in caret used for?
icr() fits a regression of a numeric response on a small number of independent components extracted from the predictors. It is the ICA-based counterpart to principal component regression, useful when the underlying drivers are non-Gaussian mixtures. The decomposition is computed by fastICA::fastICA(), and the regression on the resulting components is an ordinary lm fit, so all of lm's diagnostics still apply.
How do I choose n.comp for caret icr?
Pass tuneGrid = expand.grid(n.comp = 1:k) (where k is at most ncol(x)) to train(method = "icr", trControl = trainControl(method = "cv")) and pick the value with the lowest cross-validated RMSE. Start near the number of latent signals you expect from domain knowledge, then expand the grid. Compare training and holdout RMSE to confirm the choice does not over-fit.
How is icr different from principal component regression?
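Both rotate the predictors before fitting a linear regression, but they optimize different criteria. PCR uses principal components that maximize predictor variance and are uncorrelated; ICR uses independent components that maximize non-Gaussianity and are statistically independent. When latent factors are non-Gaussian mixtures, ICR recovers them more faithfully; when factors are roughly Gaussian, PCR is usually preferable and faster. Note that fastICA first whitens the data by projecting it onto its leading principal component directions, so with the same n.comp and the same preprocessing, the ICR and PCR components span the same subspace and the linear regressions produce the same predictions. The difference lies in how interpretable the individual components are, not in predictive accuracy.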
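A sketch of the subspace argument, using fastICA::fastICA() and prcomp() directly rather than caret::icr() or pls::pcr(). Standardizing the predictors first, k = 3, and the seed are assumptions. Regressing mpg on three independent components and on three leading principal component scores should give the same fitted values.

```r
library(fastICA)

x <- scale(as.matrix(mtcars[, -1]))   # assumption: standardized predictors
y <- mtcars$mpg
k <- 3

set.seed(7)
S  <- fastICA(x, n.comp = k)$S        # k independent components
PC <- prcomp(x)$x[, 1:k]              # k leading principal component scores

fit_ic <- lm(y ~ S)
fit_pc <- lm(y ~ PC)

# Same k-dimensional subspace, so the fitted values coincide
all.equal(unname(fitted(fit_ic)), unname(fitted(fit_pc)))
```

The coefficients of the two fits differ because the bases differ; only the span, and therefore the fit, is shared.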
Does caret icr support classification?
No. icr() regresses a numeric response with lm() under the hood, so it is regression-only. For ICA preprocessing on a classification task, use caret::preProcess(method = "ica") with an explicit n.comp to extract components, then fit any caret classifier (glm, glmnet, rf) on the rotated data.
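A sketch of that route on a two-class subset of iris, so that a plain logistic regression with glm() applies. The subset, n.comp = 2, the seed, and the 0.5 cutoff are assumptions; n.comp is passed through preProcess() to fastICA, and the accuracy shown is in-sample only.

```r
library(caret)  # preProcess(method = "ica") also needs fastICA

# Two-class subset: versicolor vs virginica
iris2 <- droplevels(subset(iris, Species != "setosa"))

set.seed(3)
pp <- preProcess(iris2[, 1:4], method = "ica", n.comp = 2)
comps <- predict(pp, iris2[, 1:4])     # predictors replaced by ICA scores
comps$Species <- iris2$Species

clf <- glm(Species ~ ., data = comps, family = binomial)
prob_virginica <- predict(clf, type = "response")   # P(second level) = P(virginica)
pred_class <- ifelse(prob_virginica > 0.5, "virginica", "versicolor")
acc <- mean(pred_class == comps$Species)            # in-sample accuracy
acc
```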
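Why do my icr results differ across reruns?
fastICA initializes its unmixing matrix randomly. Without a seed, the decomposition lands on different but equivalent solutions (sign and permutation differences), so the fitted coefficients on the components change from run to run. The predictions themselves should agree up to floating-point noise, because every solution is a rotation of the same whitened subspace; if they differ noticeably, check whether the training rows or the preprocessing changed between runs. Call set.seed() immediately before icr() for reproducible coefficients, and read absolute loadings rather than raw values.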