Multinomial regression — independent-Poisson reformulation
Multinomial-logit regression with K classes does not appear as a first-class likelihood in either R-INLA or this package. Both engines fit it through the Multinomial-Poisson trick (Baker 1994; Chen 1985): for row i with K-vector of counts Y_i ~ Multinomial(N_i, π_i) and per-class linear predictor η_ik = x_i^⊤ β_k, the likelihood factorises as
\[Y_{ik} \mid \alpha_i \;\stackrel{\text{ind.}}{\sim}\; \text{Poisson}(\lambda_{ik}), \qquad \lambda_{ik} = \exp(\alpha_i + x_i^\top \beta_k),\]
where α_i is a per-row nuisance intercept that absorbs the row-sum information N_i. Conditional on (α_i)_{i=1}^{n} set to a fixed-precision IID prior (R-INLA's recipe: prec = list(initial = -10, fixed = TRUE)), the posterior on the β coefficients matches the original multinomial regression bit-for-bit. Reference-class identifiability comes from dropping the reference-class block (corner-point parameterisation, Agresti 2010 §8.5): β_K = 0.
This vignette walks through synthetic recovery of β against the R-INLA oracle fixture under test/oracle/test_synthetic_multinomial.jl. The decision to use the independent-Poisson route as the default is recorded in ADR-024.
Building blocks
LatentGaussianModels.jl exposes two small helpers that turn an (n_rows, K) count matrix into the long-format Poisson layout:
multinomial_to_poisson— reshapesYinto the long-format triple(y, row_id, class_id)with row-major layoutidx = (i - 1) * K + k.multinomial_design_matrix— builds the class-specific covariate block of the long-format design matrix; drops the reference-class block to identify the model.
The per-row nuisance α_i is attached as an IID(n_rows; τ_init, fix_τ) component with fix_τ = true. The IID constructor's τ_init and fix_τ keyword arguments mirror R-INLA's hyper = list(prec = list(initial = ..., fixed = TRUE)) directly.
Synthetic recovery
using Random, SparseArrays, Distributions
using GMRFs, LatentGaussianModels
using LatentGaussianModels: PoissonLikelihood, FixedEffects, IID,
LatentGaussianModel, inla, component_range,
log_marginal_likelihood,
multinomial_to_poisson,
multinomial_design_matrix
rng = MersenneTwister(20260504)
n_rows = 200
K = 3
p = 1
N_trials = 5
β_true = (0.7, -0.4) # β_1, β_2; β_3 = 0 (reference)
x = randn(rng, n_rows)
# Sample multinomial counts row-by-row.
Y = zeros(Int, n_rows, K)
for i in 1:n_rows
η = (β_true[1] * x[i], β_true[2] * x[i], 0.0)
p_i = collect(exp.(η .- maximum(η))); p_i ./= sum(p_i)
Y[i, :] .= rand(rng, Multinomial(N_trials, p_i))
end
(class_totals = sum(Y, dims = 1), N_total = sum(Y))(class_totals = [355 326 319], N_total = 1000)Long-format layout + design matrix
helper = multinomial_to_poisson(Y)
X = reshape(x, n_rows, p)
A_β = multinomial_design_matrix(helper, X)
A_α = sparse(1:helper.n_long, helper.row_id, ones(helper.n_long),
helper.n_long, n_rows)
A = hcat(A_β, A_α)
size(A)(600, 202)A_β carries the (K - 1) * p = 2 β columns (corner-point: β3 is dropped); `Aα` is the indicator matrix that maps each long-format observation to its row's nuisance intercept.
Fit
ℓ = PoissonLikelihood()
c_β = FixedEffects((K - 1) * p)
c_α = IID(n_rows; τ_init = -10.0, fix_τ = true)
model = LatentGaussianModel(ℓ, (c_β, c_α), A)
res = inla(model, helper.y)
β_rng = component_range(model, 1)
β_mean = res.x_mean[β_rng]
β_sd = sqrt.(res.x_var[β_rng])
(β_julia = β_mean,
β_sd = β_sd,
β_true = collect(β_true),
log_marginal = log_marginal_likelihood(res))(β_julia = [0.7265149738833178, -0.2714911935363326], β_sd = [0.08736601682748245, 0.07764649004523384], β_true = [0.7, -0.4], log_marginal = -2032.8346991107192)The recovered β should sit within ~5% of R-INLA's posterior mean and its standard deviation within ~10% — the oracle test test/oracle/test_synthetic_multinomial.jl locks those tolerances against the R-INLA fixture.
Notes on the dim(θ) = 0 fast path
The model above has zero hyperparameters: PoissonLikelihood has none, the FixedEffects block has none, and IID(n_rows; τ_init = -10.0, fix_τ = true) has none (the precision is hard-coded). The INLA fast path detects n_hyperparameters(model) == 0 and skips both the θ-mode optimisation and the θ-grid integration — a single Laplace approximation at θ = Float64[] is the exact posterior, and the returned INLAResult carries an empty θ̂ and a single integration point with weight one.
This fast path is automatic — inla(model, y) does the right thing without any extra arguments.
Why this layout, not a built-in MultinomialLikelihood
ADR-024 records the trade-off in detail. In short:
- Independent-Poisson is R-INLA's default and matches its posterior bit-for-bit. The latent dimension grows by
n_rows(one nuisance intercept per row), but the per-rowα_iis a fixed-precision IID with one global precision — no extra hyperparameters. - Stick-breaking would parameterise
K - 1independent binary-logistic sub-models conditional on each class boundary. The latent dimension is smaller (no nuisance intercepts), but the posterior diverges from R-INLA's, and the per-classβ_kare not exchangeable across the boundary chain — a different statistical object.
The independent-Poisson route is the right default for users coming from R-INLA and for ranking against R-INLA oracle fixtures.
Extending the model
- Random effects. Add
IID,BYM2, etc. components alongside the class-specificFixedEffectsblock. Theαnuisance lives on its own component slot and composes with anything else. - Per-class covariates.
multinomial_design_matrixaccepts a shared(n_rows, p)covariate matrix; if you want different covariates per class, build the long-formatA_βblock by hand usinghelper.row_id/helper.class_id. - Choosing the reference class. Pass
reference_class = ktomultinomial_design_matrixto drop a non-default class. The fittedβthen refers to log-odds against classk. - Class names.
multinomial_to_poisson(Y; class_names = ...)attaches names to the helper for downstream use; the design-matrix layout is independent of the names.