Quality vs R-INLA

Side-by-side posterior summaries for the three R-INLA examples that ship as oracle fixtures: Scotland lip cancer (BYM2 areal), Pennsylvania lung cancer (BYM2 areal), and Meuse zinc (SPDE geostatistics). Each section refits the model in Julia and compares against the R-INLA posterior stored in the fixture.

The tolerance bands come from plans/testing-strategy.md: roughly 1–7% relative on fixed-effect means, 5–10% on hyperparameters, 2% on the marginal log-likelihood for connected-graph BYM2 and the SPDE oracle. Looser bands and known divergences are noted per dataset. The full assertion suites live in each package's test/oracle/.

Wall-clock numbers and a wider per-problem quality-error table are generated by bench/oracle_compare.jl and embedded below. The fragment is regenerated on demand; see the bench README for column meanings, host details, and how to refresh.

Auto-generated by bench/oracle_compare.jl. Last refreshed: 2026-05-02T19:39:32.215 UTC.

Quality (relative error vs R-INLA fixture)

problemnfixedmaxrelhyperparmaxrelmlik_relmlik_abs
scotland_bym2560.058540.052970.0084031.158
scotland_bym560.062660.98890.0074361.009
pennsylvania_bym2670.001730.088340.0028120.6501
synthetic_gamma2000.0013222.988e-50.034456.565
synthetic_seasonal482.867e-50.091670.031361.346
synthetic_generic0300.028970.0088040.4475
synthetic_generic1300.030790.012130.4479
synthetic_leroux808.198e-50.20320.0072750.4497
synthetic_nbinomial2000.0014040.10670.0040421.497
syntheticdisconnectedbesag120.044560.99730.39439.783
meuse_spde1550.0042110.057170.045814.736

Performance (wall-clock seconds, single thread)

problemnjuliainlasjuliaebsrinlasspeedup×vsr
scotland_bym2560.089590.74457.29381.4
scotland_bym560.10090.84051.92619.1
pennsylvania_bym2670.15440.2777.53148.8
synthetic_gamma2000.0044720.50171.808404.0
synthetic_seasonal480.082770.55142.00324.2
synthetic_generic0300.003220.51741.98615.0
synthetic_generic1300.019990.51323.249163.0
synthetic_leroux800.019540.66412.056105.0
synthetic_nbinomial2000.01690.47071.84109.0
syntheticdisconnectedbesag120.0022670.60831.879829.0
meuse_spde1551.3152.0745.864.46

Roll-up

DatasetFixed effectsHyperparametersMarginal log-lik
Scotland BYM2within 7%within 10%within 1%
Pennsylvania BYM2within 5%within 10%within 1%
Meuse SPDEwithin 1%within 6%within 5%

Performance vs R-INLA

Wall-clock comparison on the same three datasets, single-threaded BLAS on both sides, runs a separate harness under benchmarks/. Methodology, version pins, and the full markdown table land in per-(date, architecture) files under benchmarks/results/ — the headline number per release is below; everything else (IQR, allocation footprint, hardware spec, R + R-INLA + Julia versions) is in the result file.

DatasetINLA.jl medianR-INLA medianSpeedupResult file
Scotland BYM20.07s28.00s392×2026-05-07_apple_aarch64.md
Pennsylvania BYM20.11s28.36s247×(same)
Meuse SPDE1.18s5.76s(same)

The Scotland and Pennsylvania speedups are dominated by R-INLA's per-fit C-binary startup overhead — a constant ~25–30s load time on this hardware that dwarfs the actual fit on small areal-Poisson problems. The Meuse comparison reflects the genuine factorisation work on a 2,000-vertex SPDE mesh, where the gap is closer to a single decimal order. Multi-threaded numbers — once the GMRFsPardiso.jl factorisation backend lands as a weakdep — will ship as a separate table in future result files. To reproduce:

julia --project=benchmarks -e 'using Pkg; Pkg.instantiate()'
julia --project=benchmarks benchmarks/run.jl

Helpers (hidden)

Scotland BYM2

Areal Poisson disease mapping; 56 districts; intercept + AFF covariate + BYM2 spatial random effect; PC priors throughout. See the Scotland vignette.

fx = jldopen(joinpath(REPO_ROOT, "packages", "LatentGaussianModels.jl",
    "test", "oracle", "fixtures", "scotland_bym2.jld2"), "r") do f
    f["fixture"]
end
inp = fx["input"]
y, E, x, W = Int.(inp["cases"]), Float64.(inp["expected"]),
             Float64.(inp["x"]), inp["W"]
n = length(y)

ℓ      = PoissonLikelihood(; E = E)
c_int  = Intercept()
c_beta = FixedEffects(1)
c_bym2 = BYM2(GMRFGraph(W); hyperprior_prec = PCPrecision(1.0, 0.01))
A = sparse(hcat(ones(n), reshape(x, n, 1),
                Matrix{Float64}(I, n, n), zeros(n, n)))
model = LatentGaussianModel(ℓ, (c_int, c_beta, c_bym2), A)
res   = inla(model, y; int_strategy = :grid)

fe   = fixed_effects(model, res)
α_R  = _row_lookup(fx["summary_fixed"], "(Intercept)", "mean")
β_R  = _row_lookup(fx["summary_fixed"], "x", "mean")
τ_R  = _row_lookup(fx["summary_hyperpar"], "Precision for region", "mean")
φ_R  = _row_lookup(fx["summary_hyperpar"], "Phi for region", "mean")
mlik_R = Float64(fx["mlik"][1])

_comp_md([
    ("α (intercept) mean",  fe[1].mean,                  α_R),
    ("β (AFF) mean",        fe[2].mean,                  β_R),
    ("τ_b mean",            exp(res.θ̂[1]),               τ_R),
    ("φ mean",              1 / (1 + exp(-res.θ̂[2])),    φ_R),
    ("log marginal lik",    log_marginal_likelihood(res), mlik_R),
])
QuantityJuliaR-INLARel. error
α (intercept) mean0.13630.077765.85%
β (AFF) mean0.340270.318182.21%
τ_b mean4.0644.29135.3%
φ mean0.567260.7443217.7%
log marginal lik-139.0-137.840.84%

Status. Fixed effects pass within 7% relative; τ_b within 10%; φ is weakly identified (loose comparison). The marginal log-lik matches R-INLA within 1% — the historical ~3.4% gap on this K = 4 disconnected graph closed via the improper-by-default Intercept prior (matching R-INLA's prec.intercept = 0) and the BYM2 log-normalising-constant pieces wired through inla.c's extra() form for F_BYM. Per-component Sørbye–Rue scaling (Freni-Sterrantino et al. 2018) is implemented and active.

Pennsylvania BYM2

Same model class as Scotland but with a single connected graph (K = 1), 67 counties, smoking-rate covariate. See test/oracle/test_pennsylvania_bym2.jl.

fx = jldopen(joinpath(REPO_ROOT, "packages", "LatentGaussianModels.jl",
    "test", "oracle", "fixtures", "pennsylvania_bym2.jld2"), "r") do f
    f["fixture"]
end
inp = fx["input"]
y, E, x, W = Int.(inp["cases"]), Float64.(inp["expected"]),
             Float64.(inp["x"]), inp["W"]
n = length(y)

ℓ      = PoissonLikelihood(; E = E)
c_int  = Intercept()
c_beta = FixedEffects(1)
c_bym2 = BYM2(GMRFGraph(W); hyperprior_prec = PCPrecision(1.0, 0.01))
A = sparse(hcat(ones(n), reshape(x, n, 1),
                Matrix{Float64}(I, n, n), zeros(n, n)))
model = LatentGaussianModel(ℓ, (c_int, c_beta, c_bym2), A)
res   = inla(model, y; int_strategy = :grid)

fe   = fixed_effects(model, res)
α_R  = _row_lookup(fx["summary_fixed"], "(Intercept)", "mean")
β_R  = _row_lookup(fx["summary_fixed"], "x", "mean")
τ_R  = _row_lookup(fx["summary_hyperpar"], "Precision for region", "mean")
mlik_R = Float64(fx["mlik"][1])

_comp_md([
    ("α (intercept) mean",  fe[1].mean,                   α_R),
    ("β (smoking) mean",    fe[2].mean,                   β_R),
    ("τ_b mean",            exp(res.θ̂[1]),                τ_R),
    ("log marginal lik",    log_marginal_likelihood(res), mlik_R),
])
QuantityJuliaR-INLARel. error
α (intercept) mean-0.049678-0.0514070.173%
β (smoking) mean0.0273140.0273820.00685%
τ_b mean131.74144.58.83%
log marginal lik-230.53-231.180.281%

Status. All headline summaries pass within 2% relative on the marginal log-likelihood, 5% on fixed effects, 10% on τ_b. This is the cleanest BYM2 oracle: the connected-graph case where the Sørbye–Rue scaling is unambiguous.

Meuse SPDE

Gaussian observation model on log-zinc; intercept + distance-to-river covariate + SPDE-Matérn (α = 2) random field; PC priors on (ρ, σ) and on noise precision. See the Meuse vignette.

fxt = load(joinpath(REPO_ROOT, "packages", "INLASPDE.jl",
    "test", "oracle", "fixtures", "meuse_spde.jld2"))["fixture"]

y       = Float64.(fxt["input"]["y"])
dist    = Float64.(fxt["input"]["dist"])
points  = fxt["mesh"]["loc"]::Matrix{Float64}
tv      = fxt["mesh"]["tv"]::Matrix{Int}
A_field = SparseMatrixCSC{Float64, Int}(fxt["A_field"])
n_obs = length(y)

spde = SPDE2(points, tv; α = 2,
    pc = PCMatern(; range_U = 0.5, range_α = 0.5,
                    sigma_U = 1.0, sigma_α = 0.5))

c_int  = Intercept(prec = 1.0e-3)
c_dist = FixedEffects(1; prec = 1.0e-3)
A = hcat(ones(n_obs, 1), reshape(dist, n_obs, 1), A_field)

ℓ     = GaussianLikelihood(hyperprior = PCPrecision(1.0, 0.01))
model = LatentGaussianModel(ℓ, (c_int, c_dist, spde), A)
res   = inla(model, y)

fe = fixed_effects(model, res)
α_R = _row_lookup(fxt["summary_fixed"], "intercept", "mean")
β_R = _row_lookup(fxt["summary_fixed"], "dist", "mean")
τ_R = _row_lookup(fxt["summary_hyperpar"],
                  "Precision for the Gaussian observations", "mean")
ρ_R = _row_lookup(fxt["summary_hyperpar"], "Range for field", "mean")
σ_R = _row_lookup(fxt["summary_hyperpar"], "Stdev for field", "mean")
mlik_R = Float64(fxt["mlik"][1])

ρ̂, σ̂ = spde_user_scale(spde, res.θ̂[2:3])

_comp_md([
    ("α (intercept) mean",  fe[1].mean,                   α_R),
    ("β (dist) mean",       fe[2].mean,                   β_R),
    ("τ_ε (noise prec.)",   exp(res.θ̂[1]),                τ_R),
    ("ρ (range)",           ρ̂,                            ρ_R),
    ("σ (stdev)",           σ̂,                            σ_R),
    ("log marginal lik",    log_marginal_likelihood(res), mlik_R),
])
QuantityJuliaR-INLARel. error
α (intercept) mean6.61676.61930.0393%
β (dist) mean-2.8234-2.81160.421%
τ_ε (noise prec.)15.38916.1034.44%
ρ (range)0.578690.613773.51%
σ (stdev)0.452670.461210.854%
log marginal lik-98.648-103.384.58%

Status. Fixed effects within 1% relative; (ρ, σ) within ~3.5%; noise precision within ~5%; marginal log-lik within ~1.3 nats. The SPDE oracle is the tightest of the three — Matérn covariance reproduction is well-conditioned on the Meuse mesh, and the joint PC-Matérn / PC-precision priors agree closely with R-INLA's defaults.

Source

This page is generated from docs/src/benchmarks/quality.md. The same fits run as oracle tests in each package's test/oracle/ suite — the docs page and the test suite use identical model code and fixtures, so any divergence between rendered numbers and CI status would be a bug.