Bayesian Methods and Gaussian Processes in Data Modeling

Bayesian methods, priors and Gaussian processes John Paul Gosling Department of Probability and Statistics

Overview • The Bayesian paradigm • Bayesian data modelling • Quantifying prior beliefs • Data modelling with Gaussian processes An Overview of State-of-the-Art Data Modelling 2

Bayesian methods The beginning, the subjectivist philosophy, and an overview of Bayesian techniques.

Subjective probability • Bayesian statistics involves a very different way of thinking about probability in comparison to classical inference. • The probability of a proposition is defined to a measure of a person’s degree of belief. • Wherever there is uncertainty, there is probability • This covers aleatory and epistemic uncertainty An Overview of State-of-the-Art Data Modelling 4

Differences with classical inference To a frequentist, data are repeatable, parameters are not: P(data|parameters) To a Bayesian, the parameters are uncertain, the observed data are not: P(parameters|data) An Overview of State-of-the-Art Data Modelling 5

Bayes’s theorem for distributions • This can be extended to continuous distributions: • In Bayesian statistics, we use Bayes’s theorem in a particular way: • In early probability courses, we are taught Bayes’s theorem for events: An Overview of State-of-the-Art Data Modelling 6

Prior to posterior updating Data Bayes’s theorem is used to update our beliefs. The posterior is proportional to the prior times the likelihood. Prior Posterior An Overview of State-of-the-Art Data Modelling 7

Posterior distribution • So, once we have our posterior, we have captured all our beliefs about the parameter of interest. • We can use this to do informal inference, i.e. intervals, summary statistics. • Formally, to make choices about the parameter, we must couple this with decision theory to calculate the optimal decision. An Overview of State-of-the-Art Data Modelling 8

Sequential updating Today’s posterior is tomorrow’s prior Prior beliefs Data Posterior beliefs More data Posterior beliefs An Overview of State-of-the-Art Data Modelling 9

The triplot • A triplot gives a graphical representation of prior to posterior updating. Prior Likelihood Posterior An Overview of State-of-the-Art Data Modelling 10

Audience participation Quantification of our prior beliefs • What proportion of people in this room are left handed? – call this parameter ψ • When I toss this coin, what’s the probability of me getting a tail? – call this θ An Overview of State-of-the-Art Data Modelling 11

A simple example • The archetypal example in probability theory is the outcome of tossing a coin. • Each toss of a coin is a Bernoulli trial with the probability of tails given by θ. • If we carry out 10 independent trials, we know the number of tails(X) will follow a binomial distribution. [X | θ~ Bi(10, θ)] An Overview of State-of-the-Art Data Modelling 12

Our prior distribution • A Beta(2,2) distribution may reflect our beliefs about θ. An Overview of State-of-the-Art Data Modelling 13

Our posterior distribution • If we observe X = 3, we get the following triplot: An Overview of State-of-the-Art Data Modelling 14

Our posterior distribution • If we are more convinced, a priori, that θ= 0.5 and we observe X = 3, we get the following triplot: An Overview of State-of-the-Art Data Modelling 15

Credible intervals • If asked to provide an interval in which there is a 90% chance of θlying, we can derive this directly from our posterior distribution. • Such an interval is called a credible interval. • In frequentist statistics, there are confidence intervals that cannot be interpreted in the same way. • In our example, using our first prior distribution, we can report a 95% posterior credible interval for θ of (0.14,0.62). An Overview of State-of-the-Art Data Modelling 16

Basic linear model • Yesterday we saw a lot of this: • We have a least squares solution given by • Instead of trying to find the optimal set of parameters, we express our beliefs about them. An Overview of State-of-the-Art Data Modelling 17

Basic linear model • By selecting appropriate priors for the two parameters, we can derive the posterior analytically. • It is a normal inverse-gamma distribution. • The mean of our posterior distribution is then which is a weighted average of the LSE and prior mean. An Overview of State-of-the-Art Data Modelling 18

Bayesian model comparison • Suppose we have two plausible models for a set of data, M and N say. • We can calculate posterior odds in favour of M using An Overview of State-of-the-Art Data Modelling 19

Bayesian model comparison • The Bayes factor is calculated using • A Bayes factor that is greater than one would mean that your odds in favour of M increase. • Bayes factors naturally help guard against too much model structure. An Overview of State-of-the-Art Data Modelling 20

Advantages/Disadvantages • Bayesian methods are often more complex than frequentist methods. • There is not much software to give scientists off-the-shelf analyses. • Subjectivity: all the inferences are based on somebody’s beliefs. An Overview of State-of-the-Art Data Modelling 21

Advantages/Disadvantages • Bayesian statistics offers a framework to deal with all the uncertainty. • Bayesians make use of more information – not just the data in their particular experiment. • The Bayesian paradigm is very flexible and it is able to tackle problems that frequentist techniques could not. • In selecting priors and likelihoods, Bayesians are showing their hands – they can’t get away with making arbitrary choices when it comes to inference. • … An Overview of State-of-the-Art Data Modelling 22

Summary • The basic principles of Bayesian statistics have been covered. • We have seen how we update our beliefs in the light of data. • Hopefully, I’ve convinced you that the Bayesian way is the right way. An Overview of State-of-the-Art Data Modelling 23

Priors Advice on choosing suitable prior distributions and eliciting their parameters.

Importance of priors • As we saw in the previous section, prior beliefs about uncertain parameters are a fundamental part of Bayesian statistics. • When we have few data about the parameter of interest, our prior beliefs dominate inference about that parameter. • In any application, effort should be made to model our prior beliefs accurately. An Overview of State-of-the-Art Data Modelling 25

Weak prior information • If we accept the subjective nature of Bayesian statistics and are not comfortable using subjective priors, then many have argued that we should try to specify prior distributions that represent no prior information. • These prior distributions are called noninformative, reference, ignorance or weak priors. • The idea is to have a completely flat prior distribution over all possible values of the parameter. • Unfortunately, this can lead to improper distributions being used. An Overview of State-of-the-Art Data Modelling 26

Weak prior information • In our coin tossing example, Be(1,1), Be(0.5,0.5) and Be(0,0) have been recommended as noninformative priors. Be(0,0) is improper. An Overview of State-of-the-Art Data Modelling 27

Conjugate priors • When we move away from noninformative priors, we might use priors that are in a convenient form. • That is a form where combining them with the likelihood produces a distribution from the same family. • In our example, the beta distribution is a conjugate prior for a binomial likelihood. An Overview of State-of-the-Art Data Modelling 28

Informative priors • An informative prior is an accurate representation of our prior beliefs. • We are not interested in the prior being part of some conjugate family. • An informative prior is essential when we have few or no data for the parameter of interest. • Elicitation, in this context, is the process of translating someone’s beliefs into a distribution. An Overview of State-of-the-Art Data Modelling 29

Elicitation • It is unrealistic to expect someone to be able to fully specify their beliefs in terms of a probability distribution. • Often, they are only able to report a few summaries of the distribution. • We usually work with medians, modes and percentiles. • Sometimes they are able to report means and variances, but there are more doubts about these values. An Overview of State-of-the-Art Data Modelling 30

Elicitation • Once we have some information about their beliefs, we fit some parametric distribution to them. • These distribution almost never fit the judgements precisely. • There are nonparametric techniques that can bypass this. • Feedback is essential in the elicitation process. An Overview of State-of-the-Art Data Modelling 31

Normal with unknown mean Noninformative prior: An Overview of State-of-the-Art Data Modelling 32

Normal with unknown mean Conjugate prior: An Overview of State-of-the-Art Data Modelling 33

Normal with unknown mean Proper prior: An Overview of State-of-the-Art Data Modelling 34

Structuring prior information • It is possible to structure our prior beliefs in a hierarchical manner: • Here is referred to as the hyperparameter(s). x Data model: First level of prior: Second level of prior: An Overview of State-of-the-Art Data Modelling 35

Structuring prior information • An example of this type of hierarchical is a nonparametric regression model. • We want to know about μ so the other parameters must be removed. The other parameters are known as nuisance parameters. Data model: First level of prior: Second level of prior: An Overview of State-of-the-Art Data Modelling 36

Analytical tractability • The more complexity that is built into your prior and likelihood the more likely it is that you won’t be able to derive your posterior analytically. • In the ’90’s, computational techniques were devised to combat this. • Markov chain Monte Carlo (MCMC) techniques allow us to access our posterior distributions even in complex models. An Overview of State-of-the-Art Data Modelling 37

Sensitivity analysis • It is clear that the elicitation of prior distributions is far from being a precise science. • A good Bayesian analysis will check that the conclusions are sufficiently robust to changes in the prior. • If they aren’t, we need more data or more agreement on the prior structure. An Overview of State-of-the-Art Data Modelling 38

Summary • Prior distributions are an important part of Bayesian statistics. • They are far from being ad hoc, pick-the-easiest-to-use distributions when modelled properly. • There are classes of noninformative priors that allow us to represent ignorance. An Overview of State-of-the-Art Data Modelling 39

Gaussian processes A Bayesian data modelling technique that fully accounts for uncertainty.

Data modelling: a fully probabilistic method • Bayesian statistics offers a framework to account for uncertainty in data modelling. • In this section, we’ll concentrate on regression using Gaussian processes and the associated Bayesian techniques An Overview of State-of-the-Art Data Modelling 41

The basic idea We have: or and are uncertain. In order to proceed, we must elicit our beliefs about these two. can be dealt with as in the previous section. An Overview of State-of-the-Art Data Modelling 42

Gaussian processes A process is Gaussianif and onlyif every finite sample from the process is a vector-valued Gaussian random variable. • We assume that f(.) follows a Gaussian process a priori. • That is: • i.e. any sample of f(x)’s will follow a MV-normal. An Overview of State-of-the-Art Data Modelling 43

Gaussian processes We have prior beliefs about the form of the underlying model. We observe/experiment to get data about the model with which we train our GP. We are left with our posterior beliefs about the model, which can have a ‘nice’ form. An Overview of State-of-the-Art Data Modelling 44

A simple example Warning: more audience participation coming up An Overview of State-of-the-Art Data Modelling 45

A simple example • Imagine we have data about some one dimensional phenomenon. • Also, we’ll assume that there is no observational error. • We’ll start with five data points between 0 and 4. • A priori, we believe is roughly linear and differentiable everywhere. An Overview of State-of-the-Art Data Modelling 46

A simple example An Overview of State-of-the-Art Data Modelling 47

A simple example with error • Now, we’ll start over and put some Gaussian error on the observations. • Note: in kriging, this is equivalent to adding a nugget effect. , An Overview of State-of-the-Art Data Modelling 50

Bayesian Methods and Gaussian Processes in Data Modeling