Inverse Retrieval
The inverse module retrieves ice physical properties from observed albedo using emulator-powered optimisation. Given spectral or satellite-band observations, it estimates specific surface area (SSA), impurity concentrations, and optionally solar zenith angle, with uncertainty.
Quick start
from biosnicar.emulator import Emulator
from biosnicar.inverse import retrieve
emu = Emulator.load("data/emulators/glacier_ice_8_param_default.npz")
# Spectral retrieval (480-band observed albedo)
result = retrieve(
observed=measured_albedo,
parameters=["ssa", "glacier_algae"],
emulator=emu,
fixed_params={"direct": 1, "solzen": 50, "rho": 700},
)
print(result.summary())
print(f"SSA = {result.best_fit['ssa']:.2f} ± {result.uncertainty['ssa']:.2f} m²/kg")
Why SSA?
The spectral shape of ice albedo in the NIR is controlled by specific surface area (SSA), not by bubble radius (rds) or density (rho) individually. Many (rds, rho) combinations produce the same SSA and therefore nearly identical spectra. Retrieving them separately gives large errors:
| Parameter | Individual error | SSA error |
|---|---|---|
| rds | ~73% | — |
| rho | ~33% | — |
| SSA | — | ~5.5% |
Optimising directly in SSA space eliminates this degeneracy. The optimiser sees one well-constrained parameter instead of two degenerate ones.

Observation modes
Spectral mode (default)
Pass a 480-element array of spectral albedo (e.g. from a field spectrometer). This provides maximum information content and can constrain 4–5 parameters simultaneously.
result = retrieve(
observed=measured_albedo,
parameters=["ssa", "black_carbon", "glacier_algae", "dust"],
emulator=emu,
fixed_params={"direct": 1, "solzen": 50, "snow_algae": 0},
)Use wavelength_mask to exclude noisy bands (e.g. water vapour absorption):
import numpy as np
mask = np.ones(480, dtype=bool)
mask[117:122] = False # exclude 1.38 µm water vapour
mask[149:154] = False # exclude 1.87 µm water vapour
result = retrieve(observed=measured_albedo, ..., wavelength_mask=mask)Satellite band mode
Set platform and observed_band_names to work in band space. The emulator predicts the full spectrum internally, convolves to the platform’s bands, and compares:
import numpy as np
result = retrieve(
observed=np.array([0.82, 0.78, 0.75, 0.45, 0.03]),
parameters=["ssa", "black_carbon", "glacier_algae"],
emulator=emu,
platform="sentinel2",
observed_band_names=["B2", "B3", "B4", "B8", "B11"],
obs_uncertainty=np.array([0.02, 0.02, 0.02, 0.03, 0.05]),
fixed_params={"direct": 1, "solzen": 50, "dust": 1000, "snow_algae": 0},
)Supported platforms: sentinel2, sentinel3, landsat8, modis.
Important: Band mode provides less spectral information (4–5 values vs 480). Limit free parameters to 2–3 and fix well-constrained parameters via
fixed_params.
Optimisation methods
| Method | Use case | Typical time |
|---|---|---|
L-BFGS-B (default) | Fast hybrid: DE pre-search + gradient polishing | ~0.5 s |
Nelder-Mead | Derivative-free, robust to noisy cost surfaces | ~1 s |
differential_evolution | Global search for strongly multimodal problems | ~3 s |
mcmc | Full Bayesian posterior distributions | ~30 s |
The default L-BFGS-B method uses a two-phase strategy: a quick differential evolution pre-search finds the right basin of attraction, then gradient-based polishing converges to machine precision.
# MCMC for publication-quality uncertainty
result = retrieve(
observed=measured_albedo,
parameters=["ssa", "black_carbon", "glacier_algae"],
emulator=emu,
method="mcmc",
mcmc_walkers=32,
mcmc_steps=5000,
mcmc_burn=1000,
fixed_params={"direct": 1, "solzen": 50, "snow_algae": 0},
)Uncertainty estimation
Both methods below report 1-sigma uncertainties in result.uncertainty, accessible as:
print(f"SSA = {result.best_fit['ssa']:.2f} ± {result.uncertainty['ssa']:.2f} m²/kg")Hessian-based (default)
Applied automatically after L-BFGS-B, Nelder-Mead, and differential evolution. This is the Laplace approximation: the cost surface is assumed locally quadratic near the optimum, and the curvature of that quadratic (the Hessian matrix) encodes how tightly each parameter is constrained.
Concretely, BioSNICAR:
- Evaluates the cost function at small perturbations around the best fit using a 4-point central-difference stencil
- Assembles the Hessian matrix H (second derivatives of the cost with respect to each parameter pair)
- Inverts to get the approximate posterior covariance: C = H⁻¹
- Reports 1-sigma uncertainties as the square roots of the diagonal: σₖ = √|Cₖₖ|
This is the same mathematical framework as the optimal estimation approach widely used in atmospheric and cryospheric remote sensing (Rodgers, 2000; Bohn et al., 2021). It is computationally cheap — ~4n² additional emulator evaluations, negligible at microsecond speeds — and accurate when the cost surface is unimodal and approximately Gaussian near the minimum.
A singular Hessian (flat cost surface in some direction) produces infinite uncertainty, correctly flagging parameters that the observations cannot constrain.
When to trust it: For well-posed retrievals (2–4 parameters, good spectral coverage, moderate noise). The Hessian approximation tends to slightly underestimate uncertainty when the posterior is non-Gaussian (e.g., skewed or multimodal), but for most practical BioSNICAR retrievals the approximation is reliable.
MCMC posteriors
With method="mcmc", the full posterior distribution is explored using the affine-invariant ensemble sampler from emcee (Foreman-Mackey et al., 2013). Rather than assuming a shape for the posterior, MCMC draws samples from it directly.
BioSNICAR:
- Initialises an ensemble of walkers (default 32) in a tight Gaussian ball around the best-fit estimate
- Each walker proposes moves accepted or rejected based on the log-probability (uniform prior within bounds, chi-squared likelihood)
- After a burn-in period (default 500 steps), the remaining chain (default 2000 steps) samples the posterior
- Reports the posterior median as the point estimate and the posterior standard deviation as the 1-sigma uncertainty
result = retrieve(
observed=measured_albedo,
parameters=["ssa", "black_carbon", "glacier_algae"],
emulator=emu,
method="mcmc",
mcmc_walkers=32,
mcmc_steps=5000,
mcmc_burn=1000,
fixed_params={"direct": 1, "solzen": 50, "snow_algae": 0},
)
# Full chains are available for downstream analysis (corner plots, convergence checks)
chains = result.chains # shape: (n_steps, n_walkers, n_params)
print(f"Acceptance fraction: {result.acceptance_fraction:.2f}") # target ~0.2-0.5MCMC is slower (~30–60 s for a typical retrieval) but captures non-Gaussian posteriors, parameter correlations, and multimodality that the Hessian approximation misses. This matters for:
- Weakly constrained parameters (e.g. dust at low concentrations)
- Band-mode retrievals with few observations
- Publication-quality uncertainty reporting
MCMC-based inversion of radiative transfer models for snow and ice has been validated in the literature (Robledano et al., 2023; Dillon et al., 2025).
Requires
emcee>=3(pip install emcee).
Which should I use?
| Hessian (default) | MCMC | |
|---|---|---|
| Speed | < 1 s | 30–60 s |
| Assumptions | Gaussian posterior (quadratic cost surface) | None — samples the true posterior |
| Best for | Routine retrievals, well-constrained problems | Publication uncertainty, weakly constrained problems |
| Output | Point estimate + symmetric error bars | Full posterior distribution + chains |
For most users, the default Hessian uncertainty is sufficient. Switch to MCMC when you need to report uncertainties in a publication, when you suspect the posterior is non-Gaussian, or when you want to examine parameter correlations.
Regularization
Add prior information when parameters are weakly constrained by the observations. This adds a Gaussian penalty term to the cost function, pulling the retrieval towards the prior:
result = retrieve(
...,
regularization={"dust": (500, 200)}, # prior: 500 ± 200 ppb
)References
- Rodgers, C. D. (2000). Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific Publishing.
- Tarantola, A. (2005). Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.
- Bohn, N. et al. (2021). Optimal estimation of snow and ice surface parameters from imaging spectroscopy measurements. Remote Sensing of Environment, 264, 112613.
- Foreman-Mackey, D. et al. (2013). emcee: The MCMC Hammer. Publications of the Astronomical Society of the Pacific, 125(925), 306–312.
- Robledano, A. et al. (2023). Unraveling the optical shape of snow. Nature Communications, 14, 3955.
- Dillon, K. M. et al. (2025). Evaluating sensitivity of optical snow grain size retrievals to radiative transfer models, shape parameters, and inversion techniques. The Cryosphere, 19, 2913–2935.
Fixed parameters
Use fixed_params to constrain known parameters and reduce the retrieval dimensionality:
result = retrieve(
...,
parameters=["ssa", "black_carbon", "glacier_algae"],
fixed_params={"direct": 1, "solzen": 53.2, "dust": 500, "snow_algae": 0},
)The direct parameter must be fixed — it’s a binary flag that cannot be continuously optimised.
Retrievable parameters
| Parameter | Physical meaning | Spectral sensitivity |
|---|---|---|
ssa | Specific surface area | Very high (NIR) |
rds | Bubble effective radius | High (NIR) |
rho | Ice density | Moderate |
solzen | Solar zenith angle | Moderate |
black_carbon | Black carbon | High (VIS) |
snow_algae | Snow algae | High (VIS) |
glacier_algae | Glacier algae | High (VIS) |
dust | Mineral dust | Low (see note) |
Dust caveat: Mineral dust has a very flat spectral signature at typical environmental concentrations. Dust retrievals are unreliable unless concentrations are very high (> ~10,000 ppb). Consider fixing dust via
fixed_params.
Known limitations
- Only surface-layer impurity concentrations are retrieved
- The default emulator assumes
layer_type=1(solid ice) — build a custom emulator for snow - Satellite reflectances are assumed to be surface reflectance (no atmospheric correction)
ssacannot appear alongsiderdsorrhoin the same retrieval