Ordinal regression — proportional-odds model (POMLikelihood)

The proportional-odds model (POM) is the workhorse of ordinal-response analysis: a single set of regression coefficients β describes how covariates shift the cumulative log-odds of being at or below each class boundary. It is the cumulative-link cousin of the binary logit; where binary logit has one cutoff at zero, POM has K − 1 ordered cutoffs α_1 < α_2 < ⋯ < α_{K−1} carving the latent scale into K ordered classes.

The model dates back to McCullagh (1980) and is the default ordinal treatment in standard texts (Agresti 2010, Analysis of Ordinal Categorical Data; Tutz 2012, Regression for Categorical Data). This vignette walks through a synthetic recovery, matching the R-INLA oracle fixture under test/oracle/test_synthetic_pom.jl.

Model

For ordered response y_i ∈ {1, …, K} and linear predictor η_i (without an intercept — the cut points absorb it):

\[\Pr(y_i \le k \mid \eta_i, \alpha) \;=\; F(\alpha_k - \eta_i), \qquad k = 1, \dots, K-1\]

with α_0 = −∞, α_K = +∞, and F the standard logistic CDF. Class probabilities are differences:

\[\Pr(y_i = k \mid \eta_i, \alpha) \;=\; F(\alpha_k - \eta_i) \;-\; F(\alpha_{k-1} - \eta_i).\]

The "proportional odds" name comes from the log-odds:

\[\log \frac{\Pr(y_i \le k)}{\Pr(y_i > k)} \;=\; \alpha_k - \eta_i,\]

so a one-unit increase in covariate j shifts the cumulative log-odds of being at or below every boundary by the same amount, −β_j — the proportional-odds assumption.
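
Putting the two displays together, the class probabilities for given cut points and linear predictor are one pass over the cumulative CDF values. A minimal sketch — class_probabilities is a hypothetical helper for illustration, not part of the package:

# Logistic CDF and the implied class probabilities — a direct
# transcription of the two formulas above.
F(z) = 1 / (1 + exp(-z))

function class_probabilities(α, η, K)
    cdf = [F(a - η) for a in α]      # Pr(y ≤ k), k = 1, …, K − 1
    p = similar(cdf, K)
    p[1] = cdf[1]
    for k in 2:(K - 1)
        p[k] = cdf[k] - cdf[k - 1]   # Pr(y = k) as a difference
    end
    p[K] = 1 - cdf[K - 1]
    return p
end

class_probabilities((-1.0, 0.0, 1.2), 0.0, 4)   # ≈ [0.269, 0.231, 0.269, 0.231]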

Internal-scale parameterisation

The K − 1 cut points are carried as likelihood hyperparameters on R-INLA's internal scale:

\[\theta_1 = \alpha_1, \qquad \theta_k = \log(\alpha_k - \alpha_{k-1}),\;\; k = 2, \dots, K-1.\]

The increments α_k − α_{k−1} = exp(θ_k) are strictly positive, guaranteeing the ordering for any θ ∈ ℝ^{K−1}. The POMLikelihood(K) constructor exposes K as n_classes; nhyperparameters(ℓ) returns K − 1.
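
The two maps are inverses of each other. A minimal sketch — to_internal and from_internal are illustrative names, not POMLikelihood API:

# α → θ: the first cut point passes through, the rest become log-increments.
to_internal(α) = [α[1]; log.(diff(collect(α)))]
# θ → α: cumulative sum of (θ₁, exp(θ₂), …, exp(θ_{K−1})).
from_internal(θ) = cumsum([θ[1]; exp.(θ[2:end])])

θ = to_internal((-1.0, 0.0, 1.2))   # [-1.0, 0.0, log(1.2)]
from_internal(θ)                    # recovers [-1.0, 0.0, 1.2]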

Prior on the cut points

R-INLA's pom family hard-wires a single Dirichlet prior on the class probabilities implied by the cut points at η = 0:

\[\pi_k(\alpha) \;=\; F(\alpha_k) - F(\alpha_{k-1}), \qquad (\pi_1, \dots, \pi_K) \sim \text{Dirichlet}(\gamma, \dots, \gamma).\]

The single concentration γ > 0 (dirichlet_concentration, default 3 — matching R-INLA) is the only free prior parameter. All K − 1 θ slots inherit from this joint prior; you tune the shape of the prior by tuning γ alone.

γ = 1 is uniform on the simplex; large γ concentrates mass on the balanced-class corner π_k = 1/K. The default γ = 3 is a mild preference for balanced cut points that still lets the data move them.
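
To see what a given γ implies on the cut-point scale, draw π from the Dirichlet and invert the map α_k = F⁻¹(π_1 + ⋯ + π_k). A minimal sketch using Distributions.jl (not a package dependency; prior_α is a hypothetical helper):

using Distributions, Statistics

logit(p) = log(p / (1 - p))
# One prior draw of the cut points: π ~ Dirichlet(γ, …, γ), then
# α_k = logit(π_1 + ⋯ + π_k) for k = 1, …, K − 1.
prior_α(γ, K) = logit.(cumsum(rand(Dirichlet(K, γ)))[1:(K - 1)])

draws = [prior_α(3.0, 4) for _ in 1:10_000]
median(getindex.(draws, 2))   # middle cut point centres near 0 by symmetry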

Synthetic recovery

using Random, SparseArrays
using GMRFs, LatentGaussianModels
using LatentGaussianModels: POMLikelihood, FixedEffects,
                            LatentGaussianModel, inla,
                            fixed_effects, hyperparameters,
                            log_marginal_likelihood

rng        = MersenneTwister(20260505)
n          = 400
K          = 4
β_true     = 0.8
α_true     = (-1.0, 0.0, 1.2)            # cut points

x  = randn(rng, n)
η  = β_true .* x

# Sample y via the cumulative-logit construction:
# y_i = 1 + Σ_k 𝟙{α_k − η_i < logit(U_i)}, U_i ~ Uniform(0, 1).
function _sample_ordinal(rng, η, α, K)
    n = length(η)
    y = Vector{Int}(undef, n)
    for i in 1:n
        u = rand(rng); t = log(u / (1 - u))
        k = K
        for j in 1:(K - 1)
            if α[j] - η[i] >= t
                k = j; break
            end
        end
        y[i] = k
    end
    return y
end

y = _sample_ordinal(rng, η, α_true, K)
(class_counts = [count(==(k), y) for k in 1:K], β_true = β_true)
(class_counts = [114, 79, 86, 121], β_true = 0.8)

The class counts come out roughly balanced because the cut points straddle zero roughly symmetrically and the linear predictor β_true · x is centred at zero with sd 0.8.

Fit

The latent vector is the single β slot — POM has no intercept, so the design matrix is just the n × 1 covariate column.

ℓ      = POMLikelihood(K)                # default γ = 3, logit link
c_β    = FixedEffects(1)
A      = sparse(reshape(x, n, 1))
model  = LatentGaussianModel(ℓ, (c_β,), A)
res    = inla(model, y; int_strategy = :grid)

fe = fixed_effects(model, res)
hp = hyperparameters(model, res)
(β_julia      = fe[1].mean,
 β_sd         = fe[1].sd,
 β_true       = β_true,
 log_marginal = log_marginal_likelihood(res))
(β_julia = 0.8529081226170822, β_sd = 0.1030875955520214, β_true = 0.8, log_marginal = -520.8181073537527)

The recovered β should land within ±0.1 of the true β = 0.8 at this n. The θ posterior carries the cut-point information; you can recover the cut points themselves by walking the parameterisation in reverse.

Recovering the cut points

θ̂  = [hp[k].mean for k in 1:(K - 1)]
α̂  = Vector{Float64}(undef, K - 1)
α̂[1] = θ̂[1]
for k in 2:(K - 1)
    α̂[k] = α̂[k - 1] + exp(θ̂[k])
end
(α̂ = α̂, α_true = collect(α_true))
(α̂ = [-1.040738006412375, -0.05291603042905435, 1.002382153089295], α_true = [-1.0, 0.0, 1.2])

α̂ should track α_true = (-1.0, 0.0, 1.2) to within roughly 0.2 per component at this n — here the largest miss is on the top cut point. The mode and 0.025/0.975 quantiles for each θ slot are available on hp[k] if you want credible-interval bands on the cut points directly.
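
Since α_1 = θ_1 is an identity map, θ_1's marginal quantiles transfer unchanged to α_1; for k ≥ 2 the cut point α_k = θ_1 + Σ_{j=2}^{k} exp(θ_j) mixes several hyperparameters, so marginal θ quantiles do not combine into a valid band and you need joint posterior draws. A sketch for α_1 only, assuming hypothetical field names q025/q975 on the hyperparameter summaries (inspect hp[1] for the actual names):

# α₁ = θ₁ exactly, so θ₁'s quantiles are also α₁'s (monotone map).
# q025/q975 are assumed field names — check propertynames(hp[1]).
α1_band = (lower = hp[1].q025, upper = hp[1].q975)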

Choosing dirichlet_concentration

The default γ = 3 is a mild preference for balanced classes and matches R-INLA's default. Tune it via the keyword argument:

ℓ_uniform = POMLikelihood(K; dirichlet_concentration = 1.0)
ℓ_strong  = POMLikelihood(K; dirichlet_concentration = 10.0)
(uniform = ℓ_uniform.dirichlet_concentration,
 default = POMLikelihood(K).dirichlet_concentration,
 strong  = ℓ_strong.dirichlet_concentration)
(uniform = 1.0, default = 3.0, strong = 10.0)
  • γ = 1 is uniform on the simplex — every class-probability vector on Δ^{K−1} is equally likely a priori.
  • γ < 1 concentrates mass on the boundary of the simplex — i.e. a prior favouring unbalanced classes (some classes near-empty). Use with care: the sparsest classes typically carry the limiting evidence for cut-point identifiability.
  • γ > 3 increasingly pulls toward the balanced corner. With γ very large, the prior dominates the data and cut points sit near F^{-1}(k/K) regardless of y.

In practice the default works well unless you have strong external information about expected class balance.
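
As a quick sensitivity check, you can refit the same synthetic data under γ = 1 and compare the recovered slope — with n = 400 the likelihood dominates and β should barely move. A sketch reusing y, c_β, and A from the fit above:

model_uniform = LatentGaussianModel(ℓ_uniform, (c_β,), A)
res_uniform   = inla(model_uniform, y; int_strategy = :grid)
fe_uniform    = fixed_effects(model_uniform, res_uniform)
(β_default = fe[1].mean, β_uniform = fe_uniform[1].mean)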

Marginal log-likelihood — note on R-INLA agreement

POMLikelihood's log_hyperprior includes the full Jacobian of the θ → α → π chain (Σ log f(α_k) + Σ_{k≥2} θ_k), making the prior mathematically exact on the internal scale. R-INLA's documentation (see inla.doc("pom")) records that R-INLA's internal Dirichlet prior is "correct only up to a multiplicative constant due to a missing correction in the log-Jacobian for the sum-to-zero constraint."
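
Spelled out, the exact prior density on the internal scale is the Dirichlet density of π(α) times the two log-Jacobian terms quoted above, with f the logistic pdf. A sketch of the math — not the package's internal routine — using Distributions.jl for the Dirichlet term:

using Distributions

F(z) = 1 / (1 + exp(-z))             # logistic CDF
f(z) = exp(-z) / (1 + exp(-z))^2     # logistic pdf

function log_hyperprior_sketch(θ, γ)
    K = length(θ) + 1
    α = cumsum([θ[1]; exp.(θ[2:end])])   # θ → α (ordered cut points)
    p = diff([0.0; F.(α); 1.0])          # α → π (class probabilities at η = 0)
    logpdf(Dirichlet(K, γ), p) +         # Dirichlet prior on π
        sum(log, f.(α)) +                # log|dπ/dα| = Σ log f(α_k)
        sum(θ[2:end])                    # log|dα/dθ| = Σ_{k≥2} θ_k
end

log_hyperprior_sketch([-1.0, 0.0, log(1.2)], 3.0)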

The practical consequence: every posterior moment of α, β, and θ matches between Julia and R-INLA, but the absolute marginal log-likelihood reported by the two tools differs by a fixed θ-independent additive constant. Don't compare absolute mlik values across the two engines — use one engine throughout when ranking models by mlik. Within either engine, mlik differences between candidate POM models are directly comparable.
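
For example, comparing the default-γ fit to the γ = 1 refit sketched in the previous section stays within one engine, so the unknown additive constant cancels:

# mlik differences within one engine are invariant to the engine's
# additive constant and safe for model ranking.
Δmlik = log_marginal_likelihood(res) - log_marginal_likelihood(res_uniform)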

Notes on extending the model

  • Random effects. Add IID, BYM2, etc. components alongside FixedEffects — POM's projector is just the linear predictor, and any latent component composes. The cut points are likelihood hyperparameters and are independent of the latent structure.
  • Alternative links (probit, continuation-ratio, adjacent-category). Currently only the cumulative-logit link is supported, mirroring R-INLA's family = "pom". Cumulative probit could be added later via dispatch on the link argument; the closed forms in pom.jl are specific to logistic-CDF arithmetic.
  • Boundaries. For an observation in the lowest class the log-likelihood term collapses to log F(α_1 − η_i), with gradient ∂_η log F(α_1 − η) = −(1 − F(α_1 − η)) — identical to a binary logistic regression with intercept α_1 and slope −1 on η. The same applies symmetrically at the highest class; see the numerical check below.
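
A quick finite-difference check of the boundary gradient identity from the last bullet (self-contained; α₁ and η₀ are arbitrary test values):

# ∂η log F(α₁ − η) should equal −(1 − F(α₁ − η)).
F(z) = 1 / (1 + exp(-z))
α₁, η₀, h = -1.0, 0.3, 1e-6
lhs = (log(F(α₁ - (η₀ + h))) - log(F(α₁ - (η₀ - h)))) / (2h)
rhs = -(1 - F(α₁ - η₀))
(lhs = lhs, rhs = rhs)   # agree to ≈ 1e-9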

The regression test test/regression/test_likelihoods.jl covers the closed-form derivative and Jacobian math; the R-INLA oracle test/oracle/test_synthetic_pom.jl locks the posterior moments against R-INLA on a synthetic ordinal fixture.