Ordinal logistic regression can be used to model a ordered factor response.
The polr() function from the MASS package can be used to build the proportional odds logistic regression and predict the class of multi-class ordered variables. One such use case is described below.
Example: Predict Cars Evaluation
Below is a example on how we can use ordered logistic regression to predict the cars evaluation based on cars evaluation dataset. The cars are evaluated as one amongst very good, good, acceptable or unacceptable. The attributes of the cars available to use to predict this decision are:
- buying : v-high, high, med, low
 - maint : v-high, high, med, low
 - doors : 2, 3, 4, 5-more
 - persons : 2, 4, more
 - lug_boot : small, med, big
 - safety : low, med, high
 
Also, it is worthwhile to note that about 70% of the cars are evaluated as unacceptable. The class distribution of the ordered multi class Y is as follows:
| class | N | N[%] | 
|---|---|---|
| unacc | 1210 | (70.023 %) | 
| acc | 384 | (22.222 %) | 
| good | 69 | (3.993 %) | 
| v-good | 65 | (3.762 %) | 
Lets being the modeling process by first importing the data and assigning the correct orders to the factor variables.
Import the data
carsdata <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", header=F, stringsAsFactors=F)  # import string variables as characters.
colnames(carsdata) <- c("buying", "maint", "doors", "persons", "lug_boot", "safety", "class")Reorder the levels of factors
In order logistic regression, the order of the levels in the factor variables matters. So, lets define them explicitly. This is an critical step, otherwise, predictions could go worng easily.
# Reorder
carsdata$buying <- factor(carsdata$buying, levels=c("low", "med", "high", "vhigh"), ordered=TRUE)
carsdata$maint <- factor(carsdata$maint, levels=c("low", "med", "high", "vhigh"), ordered=TRUE)
carsdata$doors <- factor(carsdata$doors, levels=c("2", "3", "4", "5more"), ordered=TRUE)
carsdata$persons <- factor(carsdata$persons, levels=c("2", "4", "more"), ordered=TRUE)
carsdata$lug_boot <- factor(carsdata$lug_boot, levels=c("small", "med", "big"), ordered=TRUE)
carsdata$safety <- factor(carsdata$safety, levels=c("low", "med", "high"), ordered=TRUE)
carsdata$class <- factor(carsdata$class, levels=c("unacc", "acc", "good", "vgood"), ordered=TRUE)Prepare training and test data
# Prepare Training and Test Data
set.seed(100)
trainingRows <- sample(1:nrow(carsdata), 0.7 * nrow(carsdata))
trainingData <- carsdata[trainingRows, ]
testData <- carsdata[-trainingRows, ]Build the ordered logistic regression model
### Build ordered logistic regression model
options(contrasts = c("contr.treatment", "contr.poly"))
polrMod <- polr(class ~ safety + lug_boot + doors + buying + maint, data=trainingData)
summary(polrMod)
#> Call:
#> polr(formula = class ~ safety + lug_boot + doors + buying + maint, 
#>     data = trainingData)
#> 
#> Coefficients:
#>               Value Std. Error   t value
#> safety.L    19.9443    0.06145  324.5411
#> safety.Q   -10.6548    0.10088 -105.6189
#> lug_boot.L   1.0119    0.14011    7.2224
#> lug_boot.Q  -0.3197    0.13355   -2.3940
#> doors.L      0.5415    0.15573    3.4774
#> doors.Q     -0.2787    0.15466   -1.8018
#> doors.C     -0.1096    0.15372   -0.7132
#> buying.L    -2.0945    0.18137  -11.5480
#> buying.Q    -0.1369    0.15659   -0.8746
#> buying.C     0.5219    0.15318    3.4069
#> maint.L     -1.8209    0.17533  -10.3856
#> maint.Q     -0.4768    0.15811   -3.0153
#> maint.C      0.3319    0.15518    2.1388
#> 
#> Intercepts:
#>            Value     Std. Error t value  
#> unacc|acc     9.4557    0.0740   127.8297
#> acc|good     11.8726    0.1345    88.2882
#> good|vgood   13.1331    0.1997    65.7533
#> 
#> Residual Deviance: 1300.15 
#> AIC: 1332.15Predict on test data
### Predict
predictedClass <- predict(polrMod, testData)  # predict the classes directly
head(predictedClass)
#> [1] unacc unacc unacc unacc unacc unacc
#> Levels: unacc acc good vgood
predictedScores <- predict(polrMod, testData, type="p")  # predict the probabilites
head(predictedScores)
#>        unacc          acc         good        vgood
#> 3  0.9774549 2.049194e-02 1.470224e-03 5.829671e-04
#> 6  0.9347665 5.904708e-02 4.424660e-03 1.761744e-03
#> 12 0.9774549 2.049194e-02 1.470224e-03 5.829671e-04
#> 13 1.0000000 3.574918e-14 2.664535e-15 8.881784e-16
#> 14 0.9762376 2.159594e-02 1.551314e-03 6.151902e-04
#> 18 0.9120030 7.946377e-02 6.099087e-03 2.434191e-03
## Confusion matrix and misclassification error
table(testData$class, predictedClass)  # confusion matrix
#>        predictedClass
#>       unacc acc good vgood
#> unacc   305  45    0     4
#> acc      60  60    0     0
#> good      0  17    0     0
#> vgood     0  18    0    10
mean(as.character(testData$class) != as.character(predictedClass))  # misclassification error
#> 0.277