Areal — Scotland lip-cancer BYM2

The Scotland lip-cancer dataset (Clayton & Kaldor 1987; Breslow & Clayton

is the standard areal-disease-mapping example: 56 districts, observed

counts y_i, expected counts E_i from age-stratification, and a single covariate x_i (proportion of the workforce in agriculture, fishing, or forestry — "AFF"). The model is a Poisson regression with a BYM2 spatial random effect:

\[\begin{aligned} y_i &\sim \mathrm{Poisson}(E_i \exp(\eta_i)), \\ \eta_i &= \alpha + \beta\,x_i + b_i, \\ b &\sim \text{BYM2}(\tau, \phi; W), \end{aligned}\]

with the Riebler et al. (2016) BYM2 parameterisation and PC priors on τ and φ.

Loading the fixture

The Scotland BYM2 fixture lives under packages/LatentGaussianModels.jl/test/oracle/fixtures/scotland_bym2.jld2. It carries both the input data (y, E, x, W) and the R-INLA posterior summaries used as the validation oracle.

using JLD2, SparseArrays, LinearAlgebra
using GMRFs, LatentGaussianModels

const FIXTURE = joinpath(@__DIR__, "..", "..", "..",
    "packages", "LatentGaussianModels.jl",
    "test", "oracle", "fixtures", "scotland_bym2.jld2")

fx = jldopen(FIXTURE, "r") do f
    f["fixture"]
end

inp = fx["input"]
y   = Int.(inp["cases"])
E   = Float64.(inp["expected"])
x   = Float64.(inp["x"])
W   = inp["W"]
n   = length(y)

Building the model

Latent vector layout: x = [α; β; b; u]. The intercept and AFF slope go into η = A x directly; b is the per-region BYM2 effect; u is the BYM2 unstructured component, which is internally constrained and does not contribute to η directly (it shows up via b's scaled mixture).

ℓ      = PoissonLikelihood(; E = E)
c_int  = Intercept()
c_beta = FixedEffects(1)
c_bym2 = BYM2(GMRFGraph(W); hyperprior_prec = PCPrecision(1.0, 0.01))

A = sparse(hcat(
    ones(n),                        # intercept → α
    reshape(x, n, 1),               # AFF slope → β
    Matrix{Float64}(I, n, n),       # per-region pick of bᵢ
    zeros(n, n),                    # u does not enter η
))

model = LatentGaussianModel(ℓ, (c_int, c_beta, c_bym2), A)

Fitting

res = inla(model, y; int_strategy = :grid)

Posterior summaries

fixed_effects(model, res)

2-element Vector{@NamedTuple{name::String, mean::Float64, sd::Float64, lower::Float64, upper::Float64}}:
 (name = "Intercept[1]", mean = 0.13630188217356412, sd = 0.06725057335432606, lower = 0.004493180353150633, upper = 0.2681105839939776)
 (name = "FixedEffects[2]", mean = 0.3402699796003948, sd = 0.08804747486356884, lower = 0.16770009979897674, upper = 0.5128398594018129)

hyperparameters(model, res)

2-element Vector{@NamedTuple{name::String, mean::Float64, sd::Float64, lower::Float64, upper::Float64}}:
 (name = "BYM2[3][1]", mean = 1.4021607615377527, sd = 0.31179335987822615, lower = 0.7910570050650161, upper = 2.013264518010489)
 (name = "BYM2[3][2]", mean = 0.27066879110240943, sd = 0.8386534154930207, lower = -1.3730617001006076, upper = 1.9143992823054266)

log_marginal_likelihood(res)

-139.00324594831255

Comparing to R-INLA

The fixture's summary_fixed and summary_hyperpar carry the R-INLA posterior. Per plans/testing-strategy.md, the v0.1 oracle tolerances are 7% relative on fixed-effect means and 10% relative on τ. The full assertion suite lives in test/oracle/test_scotland_bym2.jl.

sf = fx["summary_fixed"]
rn = String.(sf["rownames"])
α_R = Float64(sf["mean"][findfirst(==("(Intercept)"), rn)])
β_R = Float64(sf["mean"][findfirst(==("x"), rn)])

fe = fixed_effects(model, res)
(α_julia = fe[1].mean, α_R = α_R, β_julia = fe[2].mean, β_R = β_R)

(α_julia = 0.13630188217356412, α_R = 0.07776027428876284, β_julia = 0.3402699796003948, β_R = 0.3181822705097178)

The marginal log-likelihood gap on Scotland (K = 4 connected components) currently sits around 5 nats / 3.4% — likely the global Sørbye–Rue scaling vs R-INLA's per-component scaling (Freni-Sterrantino et al. 2018). Asserted via @test_broken so the suite surfaces a future fix automatically; Pennsylvania (single component) passes within the 1% tolerance.

Same model, written with `@lgm`

LGMFormula.jl supplies a Tier-2 formula macro that lowers to the same LatentGaussianModel(...) constructor. The Scotland BYM2 fit reads:

using LGMFormula

df = (cases = y, x = x, area = collect(1:n))
model_lgm = @lgm cases ~ 1 + x + f(area, BYM2(GMRFGraph(W); hyperprior_prec = PCPrecision(1.0, 0.01))) data=df family=PoissonLikelihood(; E = E)

The macro is sugar — @macroexpand @lgm(...) shows the same LatentGaussianModel(family, (Intercept(), FixedEffects(1), BYM2(...)), A) constructor call we built explicitly above. The migration guide walks through the syntax in detail.