
Bayesian Earthquake Modeling
Sarah Minson, USGS Earthquake Science Center



Presentation Transcript


  1. Bayesian Earthquake Modeling. Sarah Minson, USGS Earthquake Science Center. Featuring contributions from… Mark Simons, James L. Beck, Junle Jiang, Francisco H. Ortega, Asaf Inbal, Susan E. Owen, and Anthony Sladen (Tohoku source model); John Langbein, Jessica Murray, and Joan Gomberg (real-time GPS)

  2. Overview • Why you should consider using Bayesian analysis • How to use Bayesian methods, with examples: • Bayesian epicenter location (numerical) • Bayesian source model for the great 2011 Tohoku-Oki earthquake (numerical) • Bayesian detection of changes in river discharge (analytical) • Bayesian detection of slow slip events (analytical) • Bayesian real-time earthquake slip models (hybrid)

  3. Why Bayesian analysis? • Most important answer: • Because geophysics is filled with under-determined inverse problems (example: the Landers earthquake)

  4. Slip inversions for 2011 Tohoku-oki

  5. Consider: d = G m. If the problem is ill-posed, we can’t evaluate G⁻¹ • Traditional optimization: • Regularized inverse picks one solution which fits the data and the regularization scheme • Choice of regularization is often based on convenience, not physics • Choice of regularization can have enormous effects on the resulting solution • Bayesian inference: • Solution includes the ensemble of all plausible models • Bayesian solutions typically do not require evaluation of G⁻¹ • Regularization not required

  6. Embrace uncertainty • When faced with a lack of model resolution, the best answer to the inverse problem is to identify the ensemble of all plausible values for your model parameters • “Plausible” models are those that satisfy the data and which agree with your prior assumptions about the model parameters • The “prior” is what you believe the likely values of the model parameters are in the absence of data • i.e., The prior can be used to impose physics-based constraints Making explicit assumptions is a feature, not a bug

  7. Bayesian inference for inverse problems • Bayes’ theorem (1763): P(θ|D) ∝ P(D|θ) × P(θ), i.e., posterior ∝ data likelihood × prior
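Written out in full (the slide shows these labels over an equation image), Bayes’ theorem reads:

```latex
\underbrace{P(\theta \mid D)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{P(D \mid \theta)}^{\text{data likelihood}}\;
        \overbrace{P(\theta)}^{\text{prior}}}
       {P(D)}
  \;\propto\; P(D \mid \theta)\, P(\theta)
```

The denominator P(D) is a normalizing constant that can usually be ignored when sampling; it reappears later as the “evidence” used for model class selection.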

  8. Inverse problems without inversion • The posterior (the solution to the inverse problem) is the product of the data likelihood (which uses G, not G⁻¹) and the prior (the constraints, if any) • Only the forward problem is evaluated! • Notice that the only matrix inverse is C⁻¹, the inverse of the data covariance • Covariance matrices are symmetric positive definite, so this inverse exists • Regularization is not required even for ill-conditioned problems
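As a sketch of why no G⁻¹ appears: assuming the Gaussian error model used in the later examples, the data likelihood involves only the forward operator G and the inverse data covariance C⁻¹:

```latex
P(D \mid \theta) \;\propto\;
  \exp\!\left[-\tfrac{1}{2}\,\bigl(D - G(\theta)\bigr)^{T}\, C^{-1}\, \bigl(D - G(\theta)\bigr)\right]
```

G is applied only in the forward direction, and C is symmetric positive definite, so C⁻¹ always exists regardless of how ill-conditioned G is.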

  9. Why choose Bayesian analysis?

  10. How to do Bayesian analysis • Two options: • You can choose whatever the heck priors and data likelihood you want and draw samples of P(θ|D) by Monte Carlo simulation • If you choose your priors wisely, you can get an analytical expression for P(θ|D) • One way to do this is with a conjugate prior: e.g., • P(D|θ) Gaussian & P(θ) Gaussian → P(θ|D) Gaussian • Option 1 may require a whole lotta computer power. Option 2 requires brain power and possible compromise.

  11. Approach 1: Numerical Simulation • Works for just about any kind of model: • Can use any metric for data fit you like (L2 norm, Laplacian norm, etc.) • Can use any kind of model design (linear, non-linear, etc.) • Can use any kind of constraints on your parameters (linear, non-linear, etc.) • Solution is computed using Monte Carlo simulation • Due to “curse of dimensionality,” simulation may require infinite time…

  12. Monte Carlo simulation • There are only a few PDFs (e.g., the uniform distribution, the Gaussian distribution) that you can simulate directly with a random number generator • Monte Carlo sampling of a generic posterior PDF typically looks like: • Draw a random realization of your model parameters from a PDF you can simulate directly. (We call this the proposal PDF.) • Decide whether to keep or reject the candidate model based on whether its posterior probability (how well it fits the data and satisfies your model constraints) is higher than its probability under the proposal PDF • Go back to Step 1 again and again and again and again… • Note that not all candidate samples are accepted • Monte Carlo simulation can be extremely inefficient

  13. Curse of Dimensionality • Monte Carlo simulation of PDFs requires drawing enough samples to fill the model space • Huge numbers of samples required for high-dimensional problems • One sample = One forward model evaluation • The total numerical cost is: • Number of samples × Time to compute forward model • If you have a large number of model parameters, you need a very fast forward model • If you have a slow forward model, you must have a small number of model parameters (low-dimensional) http://www.iro.umontreal.ca/~bengioy/yoshua_en/research.html

  14. It gets worse (sort of) • As you go to higher-dimensional spaces, most of the volume is in the corners (illustrated with a 7-D cube) • Remember: Monte Carlo can be very inefficient • Inefficiency can increase in higher dimensions http://yaroslavvb.blogspot.com/2006/05/curse-of-dimensionality-and-intuition.html

  15. Don’t fear high-dimensional problems • Using efficient parallel sampling algorithms and Beowulf clusters, even very high-dimensional parameter spaces can be simulated

  16. More on Monte Carlo simulation • All algorithms for Monte Carlo simulation of an unknown target PDF are similar: • Generate a candidate sample (θproposed) from a proposal PDF (usually Gaussian) • Decide whether to keep that sample • e.g., for the Metropolis algorithm (random walk): • Let r = P(θproposed|D) / P(θcurrent|D) • If r > 1, accept candidate sample • Otherwise draw u~U(0,1), a random sample from the uniform distribution between 0 and 1 • If r > u, accept candidate sample • Otherwise reject candidate sample • Note that candidates that are “worse” are occasionally accepted, ensuring that the sampler will not be trapped forever in local maxima

  17. Metropolis algorithm (1953): A random walk through the parameter space • Choose a starting vector of model parameters, x(0), and a proposal PDF, q(x) • Commonly, q(x(j))~N(x(j-1),Σ) (Gaussian random walk) • Repeat for j=1,2,...,N • Generate y from q(x(j)) and u from U(0, 1) • Calculate r = min[p(y|D)/p(x(j)|D), 1] • If u ≤ r • Set x(j)=y • Else • Set x(j)=x(j-1) • {x(1), x(2),…, x(N)} is a set of samples of the posterior PDF p(x|D) Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), Equation of State Calculations by Fast Computing Machines, J. Chem. Phys., 21, 1087–1092.
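A minimal sketch of the algorithm above in Python. The target here is a 1-D Gaussian, chosen only so the example is self-contained and checkable; the function names, step size, and seed are all illustrative:

```python
import numpy as np

def metropolis(log_post, x0, n_samples, step=1.0, rng=None):
    """Gaussian random-walk Metropolis sampler.

    log_post : function returning ln p(x|D) up to an additive constant
    x0       : starting vector of model parameters
    step     : std. dev. of the Gaussian proposal q(x)
    """
    rng = rng or np.random.default_rng(0)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    samples = np.empty((n_samples, x.size))
    for j in range(n_samples):
        y = x + step * rng.standard_normal(x.size)   # candidate from q(x(j))
        lp_y = log_post(y)
        # Accept with probability min[p(y|D)/p(x|D), 1]; done in log space
        if np.log(rng.uniform()) <= lp_y - lp:
            x, lp = y, lp_y
        samples[j] = x            # on rejection, the current model is repeated
    return samples

# Example: sample a posterior that is Gaussian with mean 2 and std 1
chain = metropolis(lambda x: -0.5 * np.sum((x - 2.0) ** 2), x0=[0.0],
                   n_samples=100_000)
print(chain.mean(), chain.std())
```

Note that "worse" candidates (lp_y < lp) are still accepted whenever log u ≤ lp_y − lp, which is what keeps the walker from being trapped in local maxima.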

  18. Step-by-step example 1 Epicenter Location: A Toy Problem Solved Numerically

  19. The steps: • Design a forward model to stick into data likelihood, P(D|θ) • Choose prior assumptions to define prior PDF, P(θ) • Use Monte Carlo sampling to generate lots of random models and eventually fill in posterior PDF, P(θ|D)

  20. Epicenter location: inversion design • Data: arrival times (t) at stations located at (stx, sty), with uncertainty σ • Model parameters: location of epicenter, specified by θ=(hx, hy) • Forward model: tpred(i) = sqrt[(stx(i)−hx)² + (sty(i)−hy)²] / (6 km/s) • Prior PDF: uniform between 0 and 10 km • Posterior PDF: • P(θ|D) = P(hx, hy|D) • P(hx, hy|D) ∝ exp{−(t−tpred)ᵀ(t−tpred)/(2σ²)} × [1/100] if 0 ≤ (hx, hy) ≤ 10 • P(hx, hy|D) = 0 otherwise Albert Tarantola http://www.ipgp.fr/~tarantola/Files/Professional/Teaching/CalTech/index.html

  21. Epicenter location: running the Metropolis algorithm • Choose θ(0) (a vector containing the initial guess for hx and hy) to be (5 km, 5 km) • For j=1…50,000 • Generate a candidate model θproposed from the 2D Gaussian distribution q(θ(j))~N(θ(j−1), 0.05I) • The first coordinate is a candidate value for hx, the second coordinate is a candidate value for hy • Generate u from U(0, 1) • Calculate r = min[p(θproposed|D)/p(θ(j)|D), 1] • If u ≤ r • Set θ(j)=θproposed • Else • Set θ(j)=θ(j−1) • {θ(1), θ(2),…, θ(N)} is a set of samples of the posterior PDF p(θ|D)

  22. Try it yourself at home! • This example can be recreated with a few lines in Matlab, Mathematica, or the like • The first thing you’ll discover is that probabilities are numerically difficult: • Rewrite everything using log-likelihoods: • ln p(D|θ) = −(t−tpred)ᵀ(t−tpred)/(2σ²) + const. • ln p(θ) = −2 ln 10 (the uniform prior over the 10 km × 10 km region) • ln p(θ|D) = ln p(D|θ) + ln p(θ) + const. • r = min[exp{ln p(θproposed|D) − ln p(θ(j)|D)}, 1]
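The whole toy problem does indeed fit in a few lines. This sketch follows the recipe on the previous two slides (log-likelihoods, Gaussian random walk); the station layout, "true" epicenter, and noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic setup: 4 stations, a true epicenter, noisy arrival times
st = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_h = np.array([3.0, 7.0])
sigma, v = 0.1, 6.0                              # s, km/s
t_obs = np.linalg.norm(st - true_h, axis=1) / v + sigma * rng.standard_normal(4)

def log_post(h):
    """ln p(h|D) up to a constant: Gaussian misfit x uniform prior on [0,10] km."""
    if np.any(h < 0) or np.any(h > 10):
        return -np.inf
    t_pred = np.linalg.norm(st - h, axis=1) / v
    return -np.sum((t_obs - t_pred) ** 2) / (2 * sigma ** 2)

# Metropolis random walk, exactly as on the previous slide
h = np.array([5.0, 5.0])
lp = log_post(h)
samples = np.empty((50_000, 2))
for j in range(50_000):
    cand = h + 0.22 * rng.standard_normal(2)     # proposal ~ N(h, 0.05 I)
    lp_c = log_post(cand)
    if np.log(rng.uniform()) <= lp_c - lp:
        h, lp = cand, lp_c
    samples[j] = h

print(samples.mean(axis=0))   # posterior mean epicenter, close to true_h
```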

  23. Just a few of the many sampling algorithms in existence • Rejection method: parallel but inefficient • Metropolis algorithm: more efficient, but MCMC (e.g., a random walk) is serial • Gibbs sampling: certain restrictions on types of PDFs that can be simulated • Various tempering, annealing, and transitioning algorithms • Build your own • Good reference for Metropolis and Rejection sampling: • Chib, S., and E. Greenberg (1995), Understanding the Metropolis-Hastings Algorithm, Amer. Stat., 49(4), 327-335.

  24. Step-by-step example 2 Tohoku-Oki Rupture Model: The hit-it-with-a-big-numerical-hammer approach

  25. Forward model • Define a finite fault mesh, hypocenter, and elastic structure • For each patch, we solve for four parameters: • Strike-slip motion • Dip-slip motion • Slip duration of a triangular STF • Rupture velocity • For a mesh with N patches, θ is a vector with 4N elements • For GPS offsets, seafloor geodesy, and tsunami waveforms, the data predictions are linear in slip • For 1-Hz kinematic GPS, the data predictions are synthetic seismograms (non-linear in the source parameters), calculated by: • Scaling GFs for each patch by the slip on that patch • Convolving with the appropriate STF • Time-shifting each waveform according to the rupture velocity field • Summing all of the resulting per-patch synthetics
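The kinematic prediction described above (scale, convolve, shift, sum) can be sketched in a few lines. The Green's functions, slips, durations, and delays below are made-up toy arrays, not the Tohoku model:

```python
import numpy as np

dt = 1.0          # 1-Hz sampling interval, s
n_t = 200         # samples per seismogram

def triangle_stf(duration, dt):
    """Unit-area triangular source-time function."""
    n = max(int(round(duration / dt)), 2)
    stf = np.concatenate([np.linspace(0, 1, n // 2 + 1),
                          np.linspace(1, 0, n - n // 2)[1:]])
    return stf / (stf.sum() * dt)

def predict_waveform(slip, durations, delays, gf):
    """Stack per-patch synthetics: scale each GF by the patch slip,
    convolve with that patch's STF, shift by its rupture-time delay, sum."""
    syn = np.zeros(n_t)
    for s, dur, delay, g in zip(slip, durations, delays, gf):
        w = np.convolve(s * g, triangle_stf(dur, dt))[:n_t] * dt
        shift = int(round(delay / dt))
        syn[shift:] += w[:n_t - shift]
    return syn

# Toy example: 3 patches with Gaussian-pulse Green's functions
gf = [np.exp(-0.5 * (np.arange(n_t) - 20.0) ** 2 / 4.0) for _ in range(3)]
slip = [1.0, 2.0, 0.5]          # m
durations = [4.0, 6.0, 4.0]     # s
delays = [0.0, 5.0, 10.0]       # patch rupture times from the velocity field, s
syn = predict_waveform(slip, durations, delays, gf)
print(syn.shape)
```

This makes the linear/non-linear split concrete: syn is linear in `slip` but non-linear in `durations` and `delays`.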

  26. Prior PDF • Broad priors • What do the data resolve and what isn’t resolved? • Strike-slip prior is zero-mean Gaussian • On average, slip is thrust • Dip-slip prior is uniform, ~U(umin, ∞) with umin < 0 • All dip-slip motion is equally likely, except that large amounts of back-slip are forbidden • Priors on rise time and rupture velocity are uniform and broad

  27. Sample • For i=1…Gazillion • Draw a random source model, θ • Assign a prior probability to that model, P(θ) • Run your forward model, G(θ) • Evaluate P(D|θ) • Evaluate the unnormalized posterior probability, P(θ|D) ∝ P(D|θ)P(θ) • This is a very high-dimensional problem with O(1,000) free parameters • “Curse of dimensionality” • Simulation executed in parallel on the NASA Pleiades supercomputer using the CATMIP algorithm (Minson et al., submitted to GJI) • Required evaluation of ~50 billion candidate rupture scenarios

  28. Average posterior slip

  29. Slip PDFs

  30. 95% Credibility Interval on Source-Time Function

  31. Uncertainties on derived quantities

  32. Epicenter location and Tohoku-oki source model were both straightforward inversions… • …But there’s a lot more that you can do with Bayesian analysis than inverting equations

  33. Model class selection: which model design is best? • The denominator of Bayes’ theorem, P(D|M), is the evidence in favor of model design M • The denominator is also called the marginal likelihood • Can be shown to be a theoretically excellent method for maximizing data fit while penalizing model complexity (over-fitting) Muto and Beck (2008), Bayesian updating and model class selection for hysteretic structural models using stochastic simulation, J. Vib. Control, 14(1–2), doi:10.1177/1077546307079400

  34. Model class selection: which degree of polynomial is the best model for the data? The fit to the data increases with increasing polynomial degree (more parameters = better fits)… …But the evidence says that a cubic is best (linear and quadratic don’t fit the data well enough; degree 4 and higher are over-fitting) Tom Minka, MIT http://alumni.media.mit.edu/~tpminka/statlearn/demo/
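The polynomial example can be reproduced with a linear-Gaussian model, for which the evidence P(D|M) is available in closed form: with a zero-mean Gaussian prior on the coefficients, the data are marginally Gaussian with covariance σ²I + α²ΦΦᵀ. The data, noise level, and prior width below are made up for illustration; with low noise the evidence typically peaks at the true degree:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data generated from a cubic, plus noise
x = np.linspace(-1, 1, 30)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + 3.0 * x ** 3 + 0.1 * rng.standard_normal(x.size)

def log_evidence(x, y, degree, sigma=0.1, alpha=2.0):
    """ln P(D|M) for polynomial regression with Gaussian noise (std sigma)
    and a zero-mean Gaussian prior (std alpha) on the coefficients.
    Marginally, y ~ N(0, sigma^2 I + alpha^2 Phi Phi^T)."""
    Phi = np.vander(x, degree + 1)
    C = sigma ** 2 * np.eye(x.size) + alpha ** 2 * Phi @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)

evidences = {d: log_evidence(x, y, d) for d in range(1, 7)}
best = max(evidences, key=evidences.get)
print(best)   # degree with the highest evidence
```

Underfit models (degrees 1 and 2) are penalized through the misfit term; overfit models (degree 4 and up) are penalized through the determinant (Occam) term.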

  35. Step-by-step example 3 Nile River Discharge:The ridiculously cheap analytical solution approach

  36. Nile River Annual Discharge • Did the average annual discharge decrease? • If so, is it related to building the Aswan dam? [Figure: annual discharge vs. year] • Cobb, G. W. (1978), The Problem of the Nile: Conditional Solution to a Changepoint Problem, Biometrika, 65(2), 243–251. • Denison, D. G. T., C. C. Holmes, B. K. Mallick, and A. F. M. Smith (2002), Bayesian Methods for Nonlinear Classification and Regression, Wiley.

  37. Bayesian changepoint detection • Changepoint: the time that at least one model parameter changes • e.g., Let’s fit a mean level to the data and see if the data require that the mean changes • Can use Bayes’ theorem to compute the probability of a changepoint as a function of time, P(changepoint=t|D) • Can also use Bayes’ theorem to assess significance of potential changepoints

  38. Bayesian changepoint detection = Bayesian piecewise linear regression • Model: D = Gθ, where G is the model design matrix and θ is the vector of model parameters for the linear regression problem • Goal: P(changepoint = t|D, M) as a function of t

  39. Analytical solution (heuristically) • Let our data likelihood be a Gaussian distribution, P(D|θ) ~ N(·) • Let our prior distribution be a normal-inverse-gamma distribution, P(θ) ~ NIG(·). (It’s the product of a Gaussian and an inverse-gamma distribution, and it’s a real thing.) • Then our posterior distribution, P(θ|D) ∝ P(D|θ)P(θ), is proportional to the product of a Gaussian with the product of a Gaussian and an inverse-gamma distribution…which is another normal-inverse-gamma distribution. • The evidence, P(D|M), is a constant given by P(D|θ)P(θ)/P(θ|D)

  40. Analytical solution (mathematically) [Equation: closed-form expression for the evidence (marginal likelihood)]

  41. Evaluating the solution • For every time t: • Calculate the evidence associated with the linear regression model for a change in mean discharge at time t • Look at the evidence as a function of time to see when, if ever, there was a change in discharge.
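This loop over candidate changepoints can be sketched directly. For brevity the sketch assumes a known noise standard deviation and a Gaussian prior on each segment mean (simpler than the normal-inverse-gamma model above, but the evidence-per-time procedure is the same), and the series is synthetic rather than the Nile record:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic series whose mean drops at index 50
y = np.concatenate([5.0 + rng.standard_normal(50),
                    3.0 + rng.standard_normal(50)])
sigma, tau = 1.0, 10.0    # known noise std; prior std on each segment mean

def seg_log_ev(seg, sigma, tau):
    """ln marginal likelihood of one segment under y_i ~ N(mu, sigma^2),
    mu ~ N(0, tau^2): jointly y ~ N(0, sigma^2 I + tau^2 11^T), evaluated
    in closed form (rank-1 determinant and Sherman-Morrison identities)."""
    n, s = seg.size, seg.sum()
    logdet = 2 * n * np.log(sigma) + np.log(1 + n * tau ** 2 / sigma ** 2)
    quad = (seg @ seg / sigma ** 2
            - tau ** 2 * s ** 2 / (sigma ** 2 * (sigma ** 2 + n * tau ** 2)))
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Evidence for a changepoint at each candidate time t
ts = np.arange(2, y.size - 1)
log_ev = np.array([seg_log_ev(y[:t], sigma, tau) + seg_log_ev(y[t:], sigma, tau)
                   for t in ts])
best_t = ts[np.argmax(log_ev)]
print(best_t)   # should recover a changepoint near index 50
```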

  42. [Figure: annual Nile discharge, and the evidence for a changepoint as a function of time]

  43. [Same figure: the evidence peaks at 1898]

  44. [Same figure: the Aswan dam (1902) marked for comparison]

  45. Step-by-step example 4 Cascadia Slow-Slip Events: The ridiculously cheap analytical solution approach

  46. Again: an analytical solution • This is exactly the same problem as the Nile discharge problem • Our forward model will be a linear fit as a function of time, and we will solve for a changepoint which could be a change in rate, a step (offset), or both

  47. Hybrid Approaches • We are free to mix and match different Bayesian approaches • Consider a simultaneous inversion for fault slip and geometry: • Fault geometry inversion is non-linear • Inversion for slip on a fault given a fault geometry is linear • Why not solve the linear slip inversion analytically using a conjugate prior and solve the fault geometry inversion with Monte Carlo simulation?
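A toy version of this hybrid idea: the model d = a·sin(ωx) is linear in the amplitude a but non-linear in the frequency ω. With a Gaussian (conjugate) prior on a, the amplitude can be integrated out analytically, leaving only ω to be sampled by Metropolis. All values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: d = a * sin(omega * x) + noise, with a=2.0, omega=1.3
x = np.linspace(0, 10, 100)
d = 2.0 * np.sin(1.3 * x) + 0.2 * rng.standard_normal(x.size)
sigma, alpha = 0.2, 5.0       # noise std; Gaussian prior std on amplitude a

def log_marginal(omega):
    """ln p(d|omega): the linear amplitude a ~ N(0, alpha^2) is integrated
    out analytically (conjugate prior), so marginally
    d ~ N(0, sigma^2 I + alpha^2 g g^T) with g = sin(omega x)."""
    g = np.sin(omega * x)
    gg, gd = g @ g, g @ d
    logdet = 2 * x.size * np.log(sigma) + np.log(1 + alpha ** 2 * gg / sigma ** 2)
    quad = (d @ d / sigma ** 2
            - alpha ** 2 * gd ** 2 / (sigma ** 2 * (sigma ** 2 + alpha ** 2 * gg)))
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)

# Metropolis over the single non-linear parameter (uniform prior on [0.5, 2.5])
om, lp = 1.0, log_marginal(1.0)
chain = np.empty(20_000)
for j in range(20_000):
    cand = om + 0.05 * rng.standard_normal()
    if 0.5 <= cand <= 2.5:
        lp_c = log_marginal(cand)
        if np.log(rng.uniform()) <= lp_c - lp:
            om, lp = cand, lp_c
    chain[j] = om
print(chain[5000:].mean())    # posterior mean of omega, close to 1.3
```

The same structure carries over to the fault problem: the slip parameters play the role of a (solved analytically per sample), while the geometry parameters play the role of ω (sampled by Monte Carlo).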

  48. Step-by-step example 5 Real-Time Finite Fault Slip Models: Wrapping a Monte Carlo Simulation Around an Analytical Solution
