Non-Informative Priors Via Sieves and Packing Numbers
S. Ghosal, J.K. Ghosh, R.V. Ramamoorthi
(presented by Priyam Das)
Abstract: In this paper, methods for constructing a non-informative prior are given using uniform distributions on approximating sieves. In parametric families satisfying regularity conditions, it is shown that Jeffreys' prior is obtained. The case with nuisance parameters is also considered. In the infinite-dimensional situation, it is shown that such a prior leads to a consistent posterior.
Definition of Probability Using Packing Numbers
Suppose K is a compact metric space with a metric ρ. A finite subset S of K is called ϵ-dispersed if ρ(x, y) ≥ ϵ for all x ≠ y in S. A maximal ϵ-dispersed set is called an ϵ-net. An ϵ-net with the maximum possible cardinality is called an ϵ-lattice, and its cardinality is called the packing number of K, denoted by D(ϵ, K) or D(ϵ, K, ρ). Since K is totally bounded, D(ϵ, K) is finite.
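As a small illustration (ours, not part of the paper), the sketch below greedily builds a maximal ϵ-dispersed set on K = [0, 1] with the Euclidean metric; its cardinality is a lower bound for, and here coincides with, the packing number D(ϵ, K). The function names and the discretization of K are our own choices.

```python
import numpy as np

def greedy_eps_net(points, eps, dist):
    """Greedily build a maximal eps-dispersed subset (an eps-net) of `points`."""
    net = []
    for p in points:
        # small tolerance guards against floating-point ties at exactly eps
        if all(dist(p, q) >= eps - 1e-9 for q in net):
            net.append(p)
    return net

if __name__ == "__main__":
    K = np.linspace(0.0, 1.0, 10001)      # fine discretization of K = [0, 1]
    d = lambda x, y: abs(x - y)
    for eps in [0.2, 0.1, 0.05]:
        net = greedy_eps_net(K, eps, d)
        # For an interval of length 1, D(eps, [0, 1]) = floor(1/eps) + 1.
        exact = int(np.floor(1.0 / eps + 1e-9)) + 1
        print(f"eps={eps}: greedy net size = {len(net)}, exact D(eps, K) = {exact}")
```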
Fix an ϵ-lattice S_ϵ. We define the ϵ-probability λ_ϵ on K by λ_ϵ(X) = #(X ∩ S_ϵ) / D(ϵ, K), X ⊆ K. • We note the following points: • 0 ≤ λ_ϵ(X) ≤ 1 • λ_ϵ(∅) = 0 • λ_ϵ(K) = 1 • λ_ϵ(X ∪ Y) = λ_ϵ(X) + λ_ϵ(Y) if X, Y ⊆ K are separated by at least ϵ
For small ϵ, the ϵ-probability λ_ϵ obtained from an ϵ-lattice can be thought of as an approximate uniform distribution on K. Definition of Uniformizable: If all the subsequential weak limits of λ_ϵ, as ϵ → 0, are the same, then K is called uniformizable and the common limit λ is called the uniform probability on K. Dembski's Theorem: Let (K, ρ) be a compact metric space. Then the following assertions hold: (i) If K is uniformizable with uniform probability λ, then λ(X) = lim_(ϵ→0) λ_ϵ(X) for all X with λ(∂X) = 0. (ii) If lim_(ϵ→0) λ_ϵ(X) exists on some convergence determining class in K, then K is uniformizable.
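A quick numerical check (again ours) of the uniformizability statement: on K = [0, 1], the ϵ-probability of the subinterval [0, 0.3], computed from a greedy ϵ-net standing in for an ϵ-lattice, approaches 0.3, the uniform probability identified by Dembski's theorem.

```python
import numpy as np

def greedy_eps_net(points, eps):
    """Maximal eps-dispersed subset of a finely discretized K = [0, 1]."""
    net = []
    for p in points:
        if all(abs(p - q) >= eps - 1e-9 for q in net):
            net.append(p)
    return np.array(net)

def eps_probability(net, indicator):
    """lambda_eps(X) = #(X intersect S_eps) / #S_eps."""
    return np.mean([indicator(p) for p in net])

if __name__ == "__main__":
    K = np.linspace(0.0, 1.0, 10001)
    in_X = lambda t: t <= 0.3                    # the subset X = [0, 0.3]
    for eps in [0.2, 0.05, 0.01, 0.002]:
        net = greedy_eps_net(K, eps)
        print(f"eps={eps}: lambda_eps([0, 0.3]) = {eps_probability(net, in_X):.4f}")
    # The values approach 0.3, the uniform probability of [0, 0.3].
```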
Jeffreys' Prior
Let X₁, X₂, … be i.i.d. with density f(·; θ) (with respect to a σ-finite measure μ), where θ ∈ Θ and Θ is an open subset of ℝ^d. We assume that {f(·; θ) : θ ∈ Θ} is a regular parametric family, i.e., writing ψ(·; θ) = √f(·; θ), there exists ψ̇(·; θ) (an element of the d-fold product of L₂(μ)) such that for any compact K ⊂ Θ,
sup_(θ∈K) ∫ [ψ(x; θ + h) − ψ(x; θ) − hᵀψ̇(x; θ)]² dμ(x) = o(‖h‖²) as ‖h‖ → 0.
Define the Fisher information by the relation
I(θ) = 4 ∫ ψ̇(x; θ) ψ̇(x; θ)ᵀ dμ(x),
and we assume that I(θ) is positive definite and the map θ ↦ I(θ) is continuous. We further assume that on every compact subset K ⊂ Θ,
inf {H(θ₁, θ₂) : θ₁, θ₂ ∈ K, ‖θ₁ − θ₂‖ ≥ ϵ} > 0 for every ϵ > 0,
where H is the Hellinger distance defined on the next slide.
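The defining relation I(θ) = 4 ∫ ψ̇(x; θ)² dμ(x) with ψ = √f can be checked numerically on a concrete family. The sketch below (our illustration, with our own function names) uses the exponential densities f(x; θ) = θ e^(−θx) on (0, ∞), for which the Fisher information is known to be 1/θ².

```python
import numpy as np
from scipy.integrate import quad

def psi(x, theta):
    """psi(x; theta) = sqrt(f(x; theta)) for f(x; theta) = theta * exp(-theta * x)."""
    return np.sqrt(theta) * np.exp(-theta * x / 2.0)

def psi_dot(x, theta, h=1e-5):
    """Numerical derivative of psi in theta (central difference)."""
    return (psi(x, theta + h) - psi(x, theta - h)) / (2.0 * h)

def fisher_info(theta):
    """I(theta) = 4 * integral over (0, inf) of psi_dot(x; theta)^2 dx."""
    val, _ = quad(lambda x: psi_dot(x, theta) ** 2, 0.0, np.inf)
    return 4.0 * val

if __name__ == "__main__":
    for theta in [0.5, 1.0, 2.0]:
        print(f"theta={theta}: 4*int(psi_dot^2) = {fisher_info(theta):.6f}, "
              f"1/theta^2 = {1.0 / theta ** 2:.6f}")
```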
For i.i.d. observations, it is natural to equip Θ with the Hellinger distance defined by
H(θ₁, θ₂) = (∫ (√f(x; θ₁) − √f(x; θ₂))² dμ(x))^(1/2).
Main Result: Fix a compact subset K of Θ. Then for all Q ⊆ K with vol(∂Q) = 0, we have
lim_(ϵ→0) D(ϵ, Q; H) / D(ϵ, K; H) = ∫_Q √det I(θ) dθ / ∫_K √det I(θ) dθ.
Now we see that the L.H.S. is nothing but lim_(ϵ→0) λ_ϵ(Q), and under the conditions of Dembski's theorem, lim_(ϵ→0) λ_ϵ(Q) = λ(Q), the uniform probability of Q.
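The Main Result can be illustrated numerically (our sketch, not from the paper). For the exponential family {θ e^(−θx) : θ ∈ K = [1, 2]}, the Fisher information is I(θ) = 1/θ², so for Q = [1, 1.5] the right-hand side is log(1.5)/log(2) ≈ 0.585; the Hellinger distance between two exponential densities has the closed form used below, and greedy ϵ-nets again stand in for ϵ-lattices.

```python
import numpy as np

def hellinger_exp(t1, t2):
    """Hellinger distance between Exp(t1) and Exp(t2): H^2 = 2 - 4*sqrt(t1*t2)/(t1+t2)."""
    return np.sqrt(2.0 - 4.0 * np.sqrt(t1 * t2) / (t1 + t2))

def greedy_net(thetas, eps):
    """Maximal eps-dispersed subset of a fine theta-grid under the Hellinger distance."""
    net = []
    for t in thetas:
        if all(hellinger_exp(t, s) >= eps for s in net):
            net.append(t)
    return np.array(net)

if __name__ == "__main__":
    K = np.linspace(1.0, 2.0, 20001)            # compact set K = [1, 2]
    rhs = np.log(1.5) / np.log(2.0)             # Jeffreys ratio for Q = [1, 1.5]
    for eps in [0.02, 0.01, 0.005]:
        net = greedy_net(K, eps)
        lam_eps_Q = np.mean(net <= 1.5)         # fraction of net points falling in Q
        print(f"eps={eps}: lambda_eps(Q) = {lam_eps_Q:.4f}, Jeffreys ratio = {rhs:.4f}")
```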
Thus, by using the Main Result, we see that the Jeffreys measure on Θ, defined by
J(Q) ∝ ∫_Q √det I(θ) dθ, Q ⊆ Θ,
is the non-informative prior on Θ.
Main Idea of the Proof: It can be shown that m‖θ₁ − θ₂‖ ≤ H(θ₁, θ₂) ≤ M‖θ₁ − θ₂‖ for θ₁, θ₂ ∈ K. Now cover K by J cubes of side length δ. In each cube, consider the concentric interior cube of side length δ − ϵ/m; given δ, take ϵ small enough that ϵ/m < δ. Then any two interior cubes are separated by at least ϵ/m in Euclidean distance, and hence by at least ϵ in terms of the Hellinger distance.
For Q ⊆ K, let Q_j be the intersection of Q with the j-th cube and Q_j° the intersection with the j-th interior cube, j = 1, 2, …, J. Thus, clearly,
∪_j Q_j° ⊆ Q = ∪_j Q_j.
Since the interior cubes are ϵ-separated in the Hellinger distance, ϵ-dispersed sets in the Q_j° can be combined, and hence
Σ_j D(ϵ, Q_j°; H) ≤ D(ϵ, Q; H) ≤ Σ_j D(ϵ, Q_j; H).
For Q = K, we have
Σ_j D(ϵ, K_j°; H) ≤ D(ϵ, K; H) ≤ Σ_j D(ϵ, K_j; H),
where K_j and K_j° are analogously defined. From the j-th cube, choose a point θ_j ∈ K. Then for all θ₁, θ₂ in the j-th cube it can be shown that
a_δ · ½ [(θ₁ − θ₂)ᵀ I(θ_j) (θ₁ − θ₂)]^(1/2) ≤ H(θ₁, θ₂) ≤ A_δ · ½ [(θ₁ − θ₂)ᵀ I(θ_j) (θ₁ − θ₂)]^(1/2),
where a_δ and A_δ tend to 1 as δ tends to 0.
Let ρ_j(θ₁, θ₂) = ½ [(θ₁ − θ₂)ᵀ I(θ_j) (θ₁ − θ₂)]^(1/2). So from the previous inequality we have
a_δ ρ_j(θ₁, θ₂) ≤ H(θ₁, θ₂) ≤ A_δ ρ_j(θ₁, θ₂).
Hence we conclude
D(ϵ, Q_j; H) ≤ D(ϵ/A_δ, Q_j; ρ_j) and D(ϵ, Q_j°; H) ≥ D(ϵ/a_δ, Q_j°; ρ_j).
Also, it can be shown that, as ϵ goes to 0,
D(ϵ, Q_j; ρ_j) ~ c_d ϵ^(−d) vol(Q_j) √det I(θ_j) and D(ϵ, K_j; ρ_j) ~ c_d ϵ^(−d) vol(K_j) √det I(θ_j),
where c_d is a constant depending only on the dimension d (the same constant in both, so it cancels in the ratio).
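The packing-number asymptotics just stated can be sanity-checked in dimension d = 1 (our toy computation, with made-up values vol(A) = 0.7 and I(θ_j) = 4): for ρ_j(θ₁, θ₂) = ½ √I(θ_j) |θ₁ − θ₂| on an interval A, a direct greedy count reproduces D(ϵ, A; ρ_j) ≈ ½ ϵ^(−1) vol(A) √I(θ_j), i.e., c_1 = ½.

```python
import numpy as np

def packing_number_interval(length, eps, info, grid_pts=50001):
    """Greedy maximal eps-dispersed set on [0, length] under the local metric
    rho_j(x, y) = 0.5 * sqrt(info) * |x - y|; returns its cardinality."""
    rho = lambda x, y: 0.5 * np.sqrt(info) * abs(x - y)
    net = []
    for t in np.linspace(0.0, length, grid_pts):
        if all(rho(t, s) >= eps - 1e-12 for s in net):
            net.append(t)
    return len(net)

if __name__ == "__main__":
    length, info = 0.7, 4.0                          # vol(A) and I(theta_j), toy values
    for eps in [0.05, 0.02, 0.01]:
        D = packing_number_interval(length, eps, info)
        approx = 0.5 * length * np.sqrt(info) / eps  # c_1 * eps^{-1} * vol(A) * sqrt(I)
        print(f"eps={eps}: D = {D}, asymptotic formula = {approx:.1f}")
```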
Now let ϵ → 0 for fixed δ, and then let δ → 0. By the convergence of Riemann sums,
Σ_j vol(Q_j) √det I(θ_j) → ∫_Q √det I(θ) dθ and Σ_j vol(K_j) √det I(θ_j) → ∫_K √det I(θ) dθ,
and similarly for the sums involving the Q_j°'s and K_j°'s. Also, we know that a_δ and A_δ tend to 1 as δ tends to 0. Thus the desired result follows.
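The Riemann-sum step can also be checked directly (our toy computation, with d = 1 and I(θ) = 1/θ² on K = [1, 2], so the limit is ∫₁² dθ/θ = log 2):

```python
import numpy as np

def riemann_sum(delta):
    """Sum over the covering cubes (here subintervals of length delta) of
    vol(K_j) * sqrt(I(theta_j)), with theta_j the left endpoint and I(theta) = 1/theta^2."""
    left_endpoints = np.arange(1.0, 2.0, delta)
    return sum(delta * np.sqrt(1.0 / t ** 2) for t in left_endpoints)

if __name__ == "__main__":
    for delta in [0.1, 0.01, 0.001]:
        print(f"delta={delta}: sum = {riemann_sum(delta):.6f}, log 2 = {np.log(2.0):.6f}")
```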
Case with Nuisance Parameter in a Compact Space: We now consider the case when there is a nuisance parameter. Let θ be the parameter of interest and λ be the nuisance parameter; we also assume that both are real valued. We can write the information matrix as
I(θ, λ) = ( I₁₁(θ, λ)  I₁₂(θ, λ) ; I₂₁(θ, λ)  I₂₂(θ, λ) ),
where I₂₁ = I₁₂. Due to the previous theorem, it is natural to put the prior for λ given θ as
π(λ|θ) = √I₂₂(θ, λ) / ∫ √I₂₂(θ, λ) dλ.
Now we need to construct a non-informative marginal prior for θ.
First, let us assume that the parameter space is compact. For n i.i.d. observations, the joint density of the observations given θ only (with λ integrated out against π(λ|θ)) is given by
g_n(x₁, …, x_n; θ) = c(θ) ∫ ∏_(i=1)^n f(x_i; θ, λ) √I₂₂(θ, λ) dλ,
where c(θ) = (∫ √I₂₂(θ, λ) dλ)^(−1) is the constant of normalization. Let I_n(θ; g) denote the information for the family {g_n(·; θ) : θ}. Under adequate regularity conditions, it can be shown that the information per observation I_n(θ; g)/n satisfies
lim_(n→∞) I_n(θ; g)/n = ∫ I₁₁.₂(θ, λ) π(λ|θ) dλ = J(θ),
where I₁₁.₂ = I₁₁ − I₁₂² / I₂₂.
Let H_n(θ, θ + h) be the Hellinger distance between g_n(·; θ) and g_n(·; θ + h). Locally, as h → 0, H_n(θ, θ + h) behaves like ½ |h| √I_n(θ; g). Hence, by the previous theorem, the non-informative prior for θ based on n observations would be proportional to √I_n(θ; g). Now, using
lim_(n→∞) I_n(θ; g)/n = J(θ)
and passing to the limit as n → ∞, the marginal non-informative prior for θ should be taken to be proportional to √J(θ), and so the prior for (θ, λ) is proportional to √J(θ) π(λ|θ).
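As a worked illustration of the compact-case construction (ours, not from the paper), take the hypothetical model N(θ, λ²) with the mean θ as the parameter of interest, the standard deviation λ as the nuisance parameter, and λ restricted to an illustrative compact interval [a, b] = [0.5, 2]. Here I₁₁ = 1/λ², I₁₂ = 0 and I₂₂ = 2/λ², so π(λ|θ) ∝ 1/λ, I₁₁.₂ = 1/λ², and J(θ) comes out constant in θ, giving a flat marginal prior √J(θ) on the compact range.

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.5, 2.0                      # compact range for the nuisance parameter lambda

def I22(lam):
    return 2.0 / lam ** 2            # Fisher information for lambda in N(theta, lambda^2)

def I112(lam):
    return 1.0 / lam ** 2            # I_{11.2} = I_11 - I_12^2 / I_22, with I_12 = 0

NORM, _ = quad(lambda l: np.sqrt(I22(l)), a, b)   # normalizer of sqrt(I22) on [a, b]

def cond_prior(lam):
    """pi(lam | theta) = sqrt(I22(lam)) / NORM (free of theta in this model)."""
    return np.sqrt(I22(lam)) / NORM

def J():
    """J(theta) = integral of I_{11.2}(theta, lam) * pi(lam | theta) d lam."""
    val, _ = quad(lambda l: I112(l) * cond_prior(l), a, b)
    return val

if __name__ == "__main__":
    print(f"J(theta) = {J():.6f} for every theta")
    # The joint non-informative prior is proportional to sqrt(J(theta)) * pi(lam | theta),
    # i.e., proportional to 1/lam on the compact range, for every theta.
```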
Case with Nuisance Parameter in a Non-Compact Space: We now consider the case when there is a nuisance parameter and the parameter space is non-compact. In this scenario, we fix a sequence of compact sets K₁ ⊂ K₂ ⊂ … increasing to the whole parameter space. Put
Λ_l(θ) = {λ : (θ, λ) ∈ K_l}
and normalize π(λ|θ) on Λ_l(θ) as
π_l(λ|θ) = c_l(θ) π(λ|θ) I{λ ∈ Λ_l(θ)},
where c_l(θ) = (∫_(Λ_l(θ)) π(λ|θ) dλ)^(−1). The marginal non-informative prior for θ at stage l is then defined, as in the compact case, by
π_l(θ) ∝ √J_l(θ), where J_l(θ) = ∫_(Λ_l(θ)) I₁₁.₂(θ, λ) π_l(λ|θ) dλ.
Let θ₀ be a fixed value of θ. The non-informative prior is finally defined by passing to the limit as l → ∞ in the stage-l priors π_l(θ) π_l(λ|θ), after renormalizing each stage so that its value at θ = θ₀ is held fixed.
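A small sketch of the stage-wise construction (ours, continuing the hypothetical N(θ, λ²) example with λ now ranging over (0, ∞)): take Λ_l = [1/l, l], normalize π(λ|θ) ∝ √I₂₂ = √2/λ on each Λ_l, and compute the stage-l quantity J_l(θ). The normalizing constants and J_l change with l, but the shape of the prior in λ does not, and that shape is what survives the final renormalization at a fixed θ₀.

```python
import numpy as np
from scipy.integrate import quad

def sqrt_I22(lam):
    return np.sqrt(2.0) / lam        # sqrt(I_22) for N(theta, lam^2), free of theta

def stage_prior(l):
    """Stage-l conditional prior pi_l(lam | theta) on Lambda_l = [1/l, l]."""
    a, b = 1.0 / l, float(l)
    norm, _ = quad(sqrt_I22, a, b)   # 1 / c_l(theta)
    return lambda lam: sqrt_I22(lam) / norm if a <= lam <= b else 0.0

def J_l(l):
    """Stage-l analogue of J(theta): integral of I_{11.2} * pi_l(lam | theta) d lam."""
    a, b = 1.0 / l, float(l)
    p = stage_prior(l)
    val, _ = quad(lambda lam: (1.0 / lam ** 2) * p(lam), a, b, limit=200)
    return val

if __name__ == "__main__":
    lam1, lam2 = 0.5, 2.0
    for l in [2, 5, 20, 50]:
        p = stage_prior(l)
        ratio = p(lam1) / p(lam2)    # shape in lam: equals lam2/lam1 = 4 at every stage
        print(f"l={l}: pi_l({lam1})/pi_l({lam2}) = {ratio:.3f}, J_l = {J_l(l):.3f}")
    # J_l (and hence the stage-l marginal for theta) grows with l but is constant in
    # theta, so renormalizing at theta_0 removes it; the limiting prior keeps the
    # 1/lam shape in the nuisance parameter.
```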
Theorem for the Infinite-Dimensional Case: Let ℱ be a family of densities which, metrized by the Hellinger distance H, is compact. Let (ϵ_n) be a positive sequence satisfying ϵ_n → 0. Let S_n be an ϵ_n-net in ℱ, let Π_n be the uniform distribution on S_n, and let Π be the probability on ℱ defined by Π = Σ_n λ_n Π_n, where the λ_n's are positive numbers adding up to unity. If for any β > 0,
lim_(n→∞) e^(nβ) λ_n / D(ϵ_n, ℱ; H) = ∞
(so that the prior mass λ_n / #S_n carried by each point of the n-th net is not exponentially small), then the posterior distribution based on the prior Π and i.i.d. observations X₁, X₂, … is consistent at every f₀ ∈ ℱ.
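To make the hierarchical prior Π = Σ_n λ_n Π_n concrete, here is a toy sampler (ours; the one-parameter exponential family, the weights λ_n ∝ 1/(n(n+1)), the choice ϵ_n = 1/n, and the truncation at n_max are illustrative assumptions, not the paper's): draw a level n with probability λ_n, then draw uniformly from a greedy ϵ_n-net S_n of the family under the Hellinger distance.

```python
import numpy as np

rng = np.random.default_rng(0)

def hellinger_exp(t1, t2):
    """Hellinger distance between Exp(t1) and Exp(t2)."""
    return np.sqrt(2.0 - 4.0 * np.sqrt(t1 * t2) / (t1 + t2))

def eps_net(eps, grid=np.linspace(1.0, 2.0, 4001)):
    """Greedy eps-net (maximal eps-dispersed set) for {Exp(t) : t in [1, 2]}."""
    net = []
    for t in grid:
        if all(hellinger_exp(t, s) >= eps for s in net):
            net.append(t)
    return np.array(net)

def sample_sieve_prior(n_max=30):
    """Draw from Pi = sum_n lambda_n Pi_n, with lambda_n proportional to 1/(n(n+1))
    (renormalized over n <= n_max) and Pi_n uniform on the (1/n)-net S_n."""
    weights = np.array([1.0 / (n * (n + 1)) for n in range(1, n_max + 1)])
    weights /= weights.sum()
    n = rng.choice(np.arange(1, n_max + 1), p=weights)
    S_n = eps_net(1.0 / n)
    return n, rng.choice(S_n)

if __name__ == "__main__":
    for _ in range(5):
        level, theta = sample_sieve_prior()
        print(f"level n={level} (eps_n={1.0 / level:.3f}), sampled parameter theta={theta:.4f}")
```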
In other words, it is equivalent to show the following. Fix an f₀ ∈ ℱ and a neighborhood U of f₀. Let P_(f₀) stand for the probability measure corresponding to the density f₀. We need to show that
Π(U^c | X₁, …, X_n) → 0 a.s. [P_(f₀)].
Example: Consider
ℱ = {f = g / ∫₀¹ g(t) dt : g on [0, 1], m ≤ g ≤ 1, |g^(j)(x) − g^(j)(y)| ≤ L_j |x − y|, j = 1, …, r},
where r is a positive integer, 0 < m < 1, and the L_j's are fixed constants. In that case, choosing ϵ_n properly and constructing the hierarchical prior by the last theorem leads to a consistent posterior.