KDD Group Research Seminar Fall, 2001 – Presentation 2b of 11

Adaptive Importance Sampling on Bayesian Networks (AIS-BN)
Friday, 05 October 2001
Julie A. Stilson
http://www.cis.ksu.edu/~jas3466

Reference
Cheng, J. and Druzdzel, M. (2000). “AIS-BN: An Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks.” Journal of Artificial Intelligence Research, 13, 155-188.

Outline
• Basic Algorithm
  • Definitions
  • Updating the importance function
  • Example using Sprinkler-Rain
• Why Adaptive Importance Sampling?
  • Heuristic initialization
  • Sampling with unlikely evidence
• Different Importance Sampling Algorithms
  • Forward Sampling (FS)
  • Logic Sampling (LS)
  • Self-Importance Sampling (SIS)
  • Differences between SIS and AIS-BN
• Gathering Results
  • How RMSE values are collected
  • Sample results for FS and AIS-BN

Definitions
• Importance Conditional Probability Tables (ICPTs)
  • Probability tables that represent the learned importance function
  • Initially equal to the CPTs
  • Updated after each updating interval (see below)
• Learning Rate
  • The rate at which the true importance function is learned
  • Learning rate after k updates: $\mathrm{LRate}(k) = a\,(b/a)^{k/k_{max}}$
  • a = initial learning rate, b = learning rate at the last step, k = number of updates made so far, k_max = total number of updates that will be made
• Frequency Table
  • Stores the frequency with which each instantiation of each query node occurs
  • Used to update the importance function
• Updating Interval
  • AIS-BN updates the importance function after every l samples, where l is the updating interval
  • If 1000 total samples are to be taken and the updating interval is 100, then 10 updates will be made in total
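
The learning-rate schedule and the update count above can be checked with a few lines of Python. This is a minimal sketch; the values a = 0.4 and b = 0.14 are illustrative assumptions, not taken from the slides:

```python
def learning_rate(a: float, b: float, k: int, kmax: int) -> float:
    """LRate(k) = a * (b / a) ** (k / kmax): equals a at k = 0 and b at k = kmax."""
    return a * (b / a) ** (k / kmax)


def number_of_updates(m: int, l: int) -> int:
    """Number of updates made when m samples are drawn with updating interval l."""
    return m // l


# Illustrative values (assumed): 1000 samples, interval 100, a = 0.4, b = 0.14.
kmax = number_of_updates(1000, 100)
rates = [learning_rate(0.4, 0.14, k, kmax) for k in range(kmax + 1)]
print(kmax)  # 10
print([round(r, 3) for r in rates])
```

Because the exponent k / k_max runs from 0 to 1, the rate decays geometrically from a to b, so early updates move the ICPTs aggressively and later ones only fine-tune them.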

Basic Algorithm
k := number of updates so far, m := desired number of samples, l := updating interval, T := collection of total samples

k = 0; totalweight = 0; T = null;
// Phase 1: learn the importance function
for (int i = 1; i <= m; i++) {
    if (i mod l == 0) {
        k++;
        update importance function Pr^k(X\E) based on total samples
    }
    generate a sample s according to Pr^k(X\E), add it to T
    totalweight += Pr(s,e) / Pr^k(s)
}
// Phase 2: sample from the final importance function
totalweight = 0; T = null;
for (int i = 1; i <= m; i++) {
    generate a sample s according to Pr^kmax(X\E), add it to T
    totalweight += Pr(s,e) / Pr^kmax(s)
    compute RMSE value of s using totalweight
}
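
The two-phase loop can be made concrete on a toy network. The sketch below is a simplified, hypothetical Python rendering, not the authors' implementation. It assumes the Sprinkler-Rain network used later in this talk with common textbook CPT values (not given on the slides), evidence Ground = Wet, illustrative learning-rate parameters a = 0.4 and b = 0.14, and it estimates each ICPT entry from the weighted samples of the most recent updating interval. It also draws each sample before checking for an update, so every update sees exactly l samples.

```python
import random

# Assumed textbook CPTs for the Sprinkler-Rain network (not from the slides).
# Each entry is P(node = True | parent values).
CPT = {
    "C": {(): 0.5},                        # Cloudy
    "S": {(True,): 0.1, (False,): 0.5},    # Sprinkler on | Cloudy
    "R": {(True,): 0.8, (False,): 0.2},    # Rain | Cloudy
}
PARENTS = {"C": (), "S": ("C",), "R": ("C",)}
ORDER = ["C", "S", "R"]                    # topological order of the non-evidence nodes
P_WET = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}  # P(Ground = Wet | S, R)


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def draw(icpt, rng):
    """Draw s from the importance function Pr^k; return (s, Pr(s, e) / Pr^k(s))."""
    s, q, p = {}, 1.0, 1.0
    for node in ORDER:
        pa = tuple(s[x] for x in PARENTS[node])
        s[node] = rng.random() < icpt[node][pa]
        q *= bern(icpt[node][pa], s[node])  # importance probability
        p *= bern(CPT[node][pa], s[node])   # true joint probability
    p *= P_WET[(s["S"], s["R"])]            # evidence likelihood
    return s, p / q


def learning_rate(a, b, k, kmax):
    return a * (b / a) ** (k / kmax)


def update(icpt, batch, lrate):
    """Move every ICPT entry toward its weighted estimate Pr'(x | pa, e)."""
    for node in ORDER:
        for pa in icpt[node]:
            num = den = 0.0
            for s, w in batch:
                if tuple(s[x] for x in PARENTS[node]) == pa:
                    den += w
                    num += w * s[node]
            if den > 0:                     # no weighted data: keep the entry
                icpt[node][pa] += lrate * (num / den - icpt[node][pa])


def ais_bn(m, l, a=0.4, b=0.14, seed=0):
    rng = random.Random(seed)
    icpt = {n: dict(t) for n, t in CPT.items()}  # ICPTs start as the CPTs
    kmax, k, batch = m // l, 0, []
    # Phase 1: learn the importance function, updating every l samples.
    for i in range(1, m + 1):
        batch.append(draw(icpt, rng))
        if i % l == 0:
            k += 1
            update(icpt, batch, learning_rate(a, b, k, kmax))
            batch = []
    # Phase 2: sample from the final ICPTs; weights estimate P(Rain | Wet).
    total = rain = 0.0
    for _ in range(m):
        s, w = draw(icpt, rng)
        total += w
        rain += w * s["R"]
    return rain / total, icpt


estimate, icpt = ais_bn(m=20_000, l=2_000)
print(round(estimate, 3))
```

Separating the phases this way means the final estimate uses only samples from the learned importance function, mirroring the reset of totalweight and T in the pseudocode.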

Updating Importance Function
• Theorem: if X_i ∈ X and X_i ∉ Anc(E), then Pr(X_i | Pa(X_i), E) = Pr(X_i | Pa(X_i))
  • Proved using d-separation
  • Only ancestors of evidence nodes need to have their importance function learned
  • The ICPTs of all other nodes do not change throughout sampling
• Algorithm for updating the importance function:
  1. Sample l points independently according to the current importance function, Pr^k(X\E)
  2. For every query node X_i that is an ancestor of evidence, estimate Pr'(x_i | pa(X_i), e) based on the samples
  3. Update Pr^k(X\E) according to the following formula, where LRate is the current learning rate:
     $\Pr^{k+1}(x_i \mid pa(X_i), e) = \Pr^{k}(x_i \mid pa(X_i), e) + \mathrm{LRate} \cdot \left(\Pr'(x_i \mid pa(X_i), e) - \Pr^{k}(x_i \mid pa(X_i), e)\right)$
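
The theorem means the set of nodes whose ICPTs must be learned can be found with a simple graph search, and each learned entry then follows the update formula. A minimal Python sketch; the parent-list representation, the function names and the Sprinkler-Rain edge structure are assumptions for illustration:

```python
from collections import deque


def ancestors(parents: dict[str, list[str]], evidence: set[str]) -> set[str]:
    """Return Anc(E): every node with a directed path into an evidence node."""
    found: set[str] = set()
    queue = deque(evidence)
    while queue:
        node = queue.popleft()
        for pa in parents.get(node, []):
            if pa not in found:
                found.add(pa)
                queue.append(pa)
    return found


def update_entry(old: float, estimate: float, lrate: float) -> float:
    """Pr^{k+1} = Pr^k + LRate * (Pr' - Pr^k) for a single ICPT entry."""
    return old + lrate * (estimate - old)


# Sprinkler-Rain structure (assumed): Cloudy -> Sprinkler, Cloudy -> Rain,
# Sprinkler -> Ground, Rain -> Ground.
SPRINKLER = {"C": [], "S": ["C"], "R": ["C"], "G": ["S", "R"]}

learn_if_ground = ancestors(SPRINKLER, {"G"}) - {"G"}     # C, S and R
learn_if_sprinkler = ancestors(SPRINKLER, {"S"}) - {"S"}  # only C
new_entry = update_entry(0.3, 0.43, 0.4)                  # about 0.352
```

With Sprinkler observed instead of Ground, only Cloudy is an ancestor of the evidence, so the ICPTs of Rain and Ground stay equal to their CPTs for the whole run.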

Example Using Sprinkler-Rain
[Figure: Sprinkler-Rain network with nodes Cloudy (C), Sprinkler (S), Rain (R), Ground (G)]
• Node states: Cloudy: Yes, No; Sprinkler: On, Off; Rain: Yes, No; Ground: Wet, Dry
• Suppose Ground is evidence, instantiated to Wet
• It then becomes more probable that the Sprinkler is on and that it is raining
• The ICPTs update the probabilities of the ancestors of evidence nodes to reflect this
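
Exact inference by enumeration makes the effect of the evidence concrete. A small Python sketch; the network structure (Cloudy → Sprinkler, Cloudy → Rain, Sprinkler and Rain → Ground) and the CPT values are common textbook assumptions, not numbers given on the slides:

```python
from itertools import product

P_C = 0.5                                   # P(Cloudy = Yes)
P_S = {True: 0.1, False: 0.5}               # P(Sprinkler = On | Cloudy)
P_R = {True: 0.8, False: 0.2}               # P(Rain = Yes | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}  # P(Ground = Wet | S, R)


def p(prob_true: float, value: bool) -> float:
    return prob_true if value else 1.0 - prob_true


def joint_wet(c: bool, s: bool, r: bool) -> float:
    """Pr(C=c, S=s, R=r, Ground=Wet) by the chain rule."""
    return p(P_C, c) * p(P_S[c], s) * p(P_R[c], r) * P_W[(s, r)]


states = list(product([True, False], repeat=3))
p_wet = sum(joint_wet(c, s, r) for c, s, r in states)
p_sprinkler = sum(joint_wet(c, s, r) for c, s, r in states if s) / p_wet
p_rain = sum(joint_wet(c, s, r) for c, s, r in states if r) / p_wet
prior_sprinkler = sum(p(P_C, c) * P_S[c] for c in (True, False))
prior_rain = sum(p(P_C, c) * P_R[c] for c in (True, False))
print(f"P(S=On): {prior_sprinkler:.2f} -> {p_sprinkler:.4f}")
print(f"P(R=Yes): {prior_rain:.2f} -> {p_rain:.4f}")
```

With these assumed CPTs, observing Ground = Wet raises P(Sprinkler = On) from 0.30 to about 0.43 and P(Rain = Yes) from 0.50 to about 0.71; this is the kind of shift the ICPTs of the ancestors are meant to capture.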

Why Adaptive Importance Sampling?
• Heuristic initialization: parents of evidence nodes
  • Changes the distributions of the parents of evidence nodes to uniform when the probability of that evidence is sufficiently small
  • Parents of evidence nodes are the nodes most affected by instantiating the evidence
  • A uniform distribution helps the importance function be learned faster
• Heuristic initialization: extremely small probabilities
  • States with extremely low probabilities are rarely sampled
  • As a result, the true importance function is learned slowly
  • AIS-BN raises extremely low probabilities to a set threshold and lowers extremely high probabilities accordingly
• Sampling with unlikely evidence
  • With unlikely evidence, the importance function is very different from the CPTs
  • Without changing the probability distributions, accurate sampling is difficult
  • AIS-BN performs better than other sampling algorithms with unlikely evidence
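
The slides do not give the exact threshold or how the removed probability mass is redistributed, so the following Python sketch is only one plausible reading: parents of evidence become uniform when Pr(e) is below a hypothetical `cutoff`, and entries below a hypothetical threshold `theta` are raised to it while the larger entries are scaled down proportionally.

```python
def uniform_row(n: int) -> list[float]:
    """Uniform initial distribution for a parent of an unlikely evidence node."""
    return [1.0 / n] * n


def init_parent_row(row: list[float], p_evidence: float, cutoff: float) -> list[float]:
    """Hypothetical rule: switch to uniform when Pr(e) falls below `cutoff`."""
    return uniform_row(len(row)) if p_evidence < cutoff else list(row)


def raise_small(row: list[float], theta: float) -> list[float]:
    """Raise entries below theta to theta and scale the remaining entries down
    proportionally so the row still sums to 1 (an assumed renormalization)."""
    n = len(row)
    if not 0.0 < theta <= 1.0 / n:
        raise ValueError("theta must lie in (0, 1/n]")
    low: set[int] = set()
    while True:
        if len(low) == n:                   # only possible when theta == 1/n
            return uniform_row(n)
        high_mass = sum(v for i, v in enumerate(row) if i not in low)
        scale = (1.0 - len(low) * theta) / high_mass
        # Scaling can push a borderline entry below theta; clamp it too.
        newly_low = {i for i, v in enumerate(row)
                     if i not in low and v * scale < theta}
        if not newly_low:
            return [theta if i in low else v * scale for i, v in enumerate(row)]
        low |= newly_low


clamped1 = raise_small([0.001, 0.05, 0.949], theta=0.04)
clamped2 = raise_small([0.01, 0.04, 0.95], theta=0.04)  # [0.04, 0.04, 0.92]
parent_row = init_parent_row([0.999, 0.001], p_evidence=1e-6, cutoff=1e-3)
```

The loop matters in the second example: raising 0.01 to 0.04 takes mass from the entry that was exactly 0.04, which would then fall below the threshold, so it is clamped as well.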

Different Importance Sampling Algorithms
• Forward Sampling / Likelihood Weighting (FS)
  • Similar to AIS-BN, but the importance function is not learned
  • Performs well under most circumstances
  • Performs poorly when evidence is unlikely
• Logic Sampling (LS)
  • The network is sampled randomly without regard to the evidence
  • Samples that don't match the evidence are then discarded
  • Simplest importance sampling algorithm
  • Also performs poorly with unlikely evidence
  • Inefficient when many nodes are evidence, since most samples are then discarded
• Self-Importance Sampling (SIS)
  • Also updates an importance function
  • Does not obtain samples from the learned importance function
  • Updates to the importance function do not use sampling information
  • For large numbers of samples, performs worse than FS
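
The contrast between logic sampling and forward sampling / likelihood weighting shows up even on a tiny network. A minimal Python sketch estimating P(Rain = Yes | Ground = Wet) both ways; it reuses assumed textbook Sprinkler-Rain CPTs (not from the slides), and the seed and sample count are arbitrary:

```python
import random

P_C = 0.5                                   # P(Cloudy = Yes)
P_S = {True: 0.1, False: 0.5}               # P(Sprinkler = On | Cloudy)
P_R = {True: 0.8, False: 0.2}               # P(Rain = Yes | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}  # P(Ground = Wet | S, R)


def forward(rng):
    """Sample Cloudy, Sprinkler and Rain from the prior CPTs."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    return c, s, r


def logic_sampling(n, rng):
    """Sample every node, then discard samples where Ground is not Wet."""
    kept = rain = 0
    for _ in range(n):
        c, s, r = forward(rng)
        wet = rng.random() < P_W[(s, r)]
        if wet:                             # only samples matching the evidence survive
            kept += 1
            rain += r
    return rain / kept, kept


def likelihood_weighting(n, rng):
    """Fix Ground = Wet and weight each sample by Pr(Wet | S, R)."""
    total = rain = 0.0
    for _ in range(n):
        c, s, r = forward(rng)
        w = P_W[(s, r)]
        total += w
        rain += w * r
    return rain / total


rng = random.Random(1)
ls_estimate, kept = logic_sampling(20_000, rng)
lw_estimate = likelihood_weighting(20_000, rng)
print(round(ls_estimate, 3), kept, round(lw_estimate, 3))
```

Logic sampling keeps only about 65% of its samples here because P(Wet) is about 0.65 under these CPTs; with evidence of probability 10^-4 it would keep roughly one sample in ten thousand, which is why it degrades so badly with unlikely evidence.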

Gathering Results
• Relative Root Mean Square Error:
  • P(i) is the exact probability of the sample
  • P^(i) is the estimated probability of the sample, taken from the frequency table
  • M := arity, T := number of samples
• RMSE Collection
  • Relative RMSE is computed after each sample
  • Each RMSE value is stored in an output file: printings.txt
• Graphing Results
  • Open the output file in Excel
  • Graph the results using “Chart”
• Example chart: ALARM network, 10000 samples, comparing FS and AIS-BN
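
The exact relative-RMSE formula from the slide is not preserved in this transcript, so the Python sketch below assumes the plain (non-relative) root mean square error over the M states of one query node, with P^(i) obtained by normalizing that node's frequency-table row. The function names and the example numbers are hypothetical:

```python
import math


def estimated_probs(counts: list[float]) -> list[float]:
    """Normalize one node's (weighted) frequency-table row into P^(i)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("frequency row has no mass")
    return [c / total for c in counts]


def rmse(exact: list[float], estimated: list[float]) -> float:
    """Plain RMSE over the M states of one node (assumed form)."""
    if len(exact) != len(estimated) or not exact:
        raise ValueError("need two non-empty rows of equal length M")
    m = len(exact)
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(exact, estimated)) / m)


exact_rain = [0.7079, 0.2921]  # hypothetical exact P(i) for a two-state node
error = rmse(exact_rain, estimated_probs([700, 300]))  # about 0.0079
```

In a sampling run this would be evaluated after every sample and each value written as one line of printings.txt, ready for charting in Excel.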
