
Foundations of Constraint Processing CSCE421/821, Spring 2008:

Evaluation of (Deterministic) BT Search Algorithms. Foundations of Constraint Processing CSCE421/821, Spring 2008: www.cse.unl.edu/~choueiry/S08-421-821/ Berthe Y. Choueiry (Shu-we-ri) Avery Hall, Room 123B choueiry@cse.unl.edu Tel: +1(402)472-5444. Outline.




  1. Evaluation of (Deterministic) BT Search Algorithms Foundations of Constraint Processing CSCE421/821, Spring 2008: www.cse.unl.edu/~choueiry/S08-421-821/ Berthe Y. Choueiry (Shu-we-ri) Avery Hall, Room 123B choueiry@cse.unl.edu Tel: +1(402)472-5444

  2. Outline • Evaluation of (deterministic) BT search algorithms [Dechter, 6.6.2] • CSP parameters • Comparison criteria • Theoretical evaluations • Empirical evaluations

  3. CSP parameters • Number of variables: n • Domain size: a, d • Constraint tightness: t = |forbidden tuples| / |all tuples| • Proportion of constraints (a.k.a. constraint density, constraint probability): p1 = e / emax, where e is the number of constraints and emax is the maximum possible number of constraints
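
As a small illustration of the two ratios on this slide, here is a minimal Python sketch. It assumes a binary CSP, so that emax = n(n-1)/2 and a constraint over two domains of size a has a·a tuples; the function names are hypothetical and not part of the course material.

```python
def constraint_density(n: int, e: int) -> float:
    """p1 = e / emax, assuming a binary CSP where emax = n(n-1)/2."""
    e_max = n * (n - 1) // 2
    return e / e_max


def constraint_tightness(forbidden: int, a: int) -> float:
    """t = |forbidden tuples| / |all tuples| for one binary constraint
    over two domains of size a (so there are a * a tuples in total)."""
    return forbidden / (a * a)


print(constraint_density(10, 9))   # 9 of 45 possible constraints -> 0.2
print(constraint_tightness(8, 4))  # 8 of 16 tuples forbidden -> 0.5
```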

  4. Comparison criteria • Number of nodes visited (#NV) • Every time you call label • Number of constraint checks (#CC) • Every time you call check(i,j) • CPU time • Be as honest and consistent as possible • Number of backtracks (#BT) • Every un-assignment of a variable in unlabel • Some specific criterion for assessing the quality of the improvement proposed • Presentation of values: • Descriptive statistics of criterion: average, median, mode, max, min • (Qualified) run-time distribution • Solution-quality distribution
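
The counters above can be instrumented in a few lines. The following is a minimal sketch of chronological backtracking in Python, not the label/unlabel formulation the slide refers to: it assumes #NV is incremented once per value tried, #CC once per pairwise check, and #BT once per un-assignment. The `Counters` class, `bt_search`, and the n-queens constraint are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    nv: int = 0  # nodes visited: one per value tried for a variable
    cc: int = 0  # constraint checks: one per pairwise check
    bt: int = 0  # backtracks: one per un-assignment of a variable


def queens_ok(i, vi, j, vj):
    """Binary n-queens constraint between rows i and j (illustrative only)."""
    return vi != vj and abs(vi - vj) != abs(i - j)


def bt_search(n, domain, check, counters, assignment=None):
    """Chronological backtracking with a static variable ordering 0..n-1.
    Returns the first solution found as a list of values, or None."""
    assignment = [] if assignment is None else assignment
    i = len(assignment)  # index of the variable to assign next
    if i == n:
        return list(assignment)
    for value in domain:
        counters.nv += 1
        consistent = True
        for j, vj in enumerate(assignment):
            counters.cc += 1
            if not check(j, vj, i, value):
                consistent = False
                break
        if consistent:
            assignment.append(value)
            result = bt_search(n, domain, check, counters, assignment)
            if result is not None:
                return result
            assignment.pop()  # un-assign the variable
            counters.bt += 1
    return None


counters = Counters()
solution = bt_search(4, range(4), queens_ok, counters)
print(solution, counters)
```

Keeping the counters in one object makes it easy to report #NV, #CC, and #BT side by side with CPU time for the same run.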

  5. Theoretical evaluations • Comparing NV and/or CC • Common assumptions: • for finding all solutions • static orderings

  6. Empirical evaluation: data sets • Use real-world data (anecdotal evidence) • Use benchmarks • csplib.org • Solver competition benchmarks • Use randomly generated problems • Various models of random generators • Guaranteed with a solution • Uniform or structured

  7. Empirical evaluations: random problems • Various models exist (use Model B) • Models A, B, C, E, F, etc. • Vary parameters: <n, a, t, p> • Number of variables: n • Domain size: a,d • Constraint tightness: t = |forbidden tuples| / | all tuples | • Proportion of constraints (a.k.a., constraint density, constraint probability): p1 = e / emax • Issues: • Uniformity • Difficulty (phase transition) • Solvability of instances (for incomplete search techniques)

  8. Model B • Input: n, a, t, p1 • Generate n nodes • Generate a list of n(n-1)/2 tuples of all combinations of 2 nodes • Choose e elements from the above list as constraints between the n nodes • If the graph is not connected, throw it away and go back to the previous step (choosing e elements); else proceed • Generate a list of a² tuples of all combinations of 2 values • For each constraint, randomly choose a number of tuples from the list to guarantee tightness t for the constraint
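
The steps of Model B can be sketched in Python as follows. This is one possible reading of the slide, under stated assumptions: e is taken as round(p1 · n(n-1)/2), the number of forbidden tuples per constraint as round(t · a²), and a retry cap (`max_tries`) is added so the loop always terminates. The function names and the dictionary representation of an instance are hypothetical.

```python
import itertools
import random


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected graph on nodes 0..n-1."""
    adjacency = {v: set() for v in range(n)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adjacency[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def model_b(n, a, t, p1, rng=None, max_tries=10_000):
    """Return {(i, j): set of forbidden (value_i, value_j) tuples}."""
    rng = rng or random.Random()
    all_pairs = list(itertools.combinations(range(n), 2))  # n(n-1)/2 pairs
    e = round(p1 * len(all_pairs))                         # assumed rounding
    if e < n - 1:
        raise ValueError("too few constraints for a connected graph")
    all_tuples = list(itertools.product(range(a), repeat=2))  # a^2 tuples
    n_forbidden = round(t * len(all_tuples))                  # assumed rounding
    for _ in range(max_tries):
        edges = rng.sample(all_pairs, e)
        if is_connected(n, edges):  # otherwise discard and choose again
            return {edge: set(rng.sample(all_tuples, n_forbidden))
                    for edge in edges}
    raise RuntimeError("no connected constraint graph found")


instance = model_b(n=8, a=4, t=0.25, p1=0.5, rng=random.Random(0))
print(len(instance), "constraints")
```

Passing an explicit `random.Random` seed makes the generated instances reproducible, which helps when the same instances must be stored and reused across algorithms.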

  9. Phase transition [Cheeseman et al. ’91] • Significant increase of cost around critical value • In CSPs, order parameter is constraint tightness & ratio • Algorithms compared around phase transition • [Figure: cost of solving plotted against the order parameter; labels: mostly un-solvable problems, mostly solvable problems, critical value of order parameter]

  10. Tests • Fix n, a, p1 and • Vary t in {0.1, 0.2, …,0.9} • Fix n, a, t and • Vary p1 in {0.1, 0.2, …,0.9} • For each data point (for each value of t/p1) • Generate (at least) 50 instances • Store all instances • Make measurements • #NV, #CC, CPU time, #messages, etc.
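
The test protocol above can be sketched as a small harness. This is a minimal illustration, assuming the generator and the solver are supplied as callables; `run_tests`, `summarize`, and the toy stand-ins `toy_generate` and `toy_solve` are hypothetical names, and the toy measurement is not a real #CC count.

```python
import random
import statistics


def run_tests(generate, solve, values, instances_per_point=50):
    """For each parameter value, generate the instances, keep them,
    and record one measurement per instance."""
    results = {}
    for value in values:
        instances = [generate(value) for _ in range(instances_per_point)]
        measurements = [solve(instance) for instance in instances]
        results[value] = {"instances": instances, "measurements": measurements}
    return results


def summarize(measurements):
    """Descriptive statistics for one data point."""
    return {
        "mean": statistics.mean(measurements),
        "median": statistics.median(measurements),
        "min": min(measurements),
        "max": max(measurements),
    }


# Toy stand-ins so the sketch runs on its own (hypothetical, not a real solver).
rng = random.Random(0)


def toy_generate(t):
    return {"t": t, "seed": rng.randrange(10**6)}


def toy_solve(instance):
    return 1 + instance["seed"] % 100  # pretend this is a #CC count


t_values = [k / 10 for k in range(1, 10)]  # t in {0.1, 0.2, ..., 0.9}
results = run_tests(toy_generate, toy_solve, t_values)
for t in t_values:
    print(t, summarize(results[t]["measurements"]))
```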

  11. Comparing two algorithms A1 and A2 • Store all measurements in Excel • Use Excel, R, SAS, etc. for statistical measurements • Use the t-test, paired test • Comparing measurements • A1, A2 are significantly different • Comparing ln measurements • A1 is significantly better than A2

  12. t-test in Excel • Using ln values • p = ttest(array1, array2, tails, type) • tails = 1 or 2 • type = 1 (paired) • t = tinv(p, df) • degree of freedom = #instances – 2

  13. t-test with 95% confidence • One-tailed test • Interested in direction of change • When t > 1.645, A1 is larger than A2 • When t < -1.645, A2 is larger than A1 • When -1.645 ≤ t ≤ 1.645, A1 and A2 do not differ significantly • |t|=1.645 corresponds to p=0.05 for a one-tailed test • Two-tailed test • Although it tells direction, not as accurate as the one-tailed test • When t > 1.96, A1 is larger than A2 • When t < -1.96, A2 is larger than A1 • When -1.96 ≤ t ≤ 1.96, A1 and A2 do not differ significantly • |t|=1.96 corresponds to p=0.05 for a two-tailed test • p=0.05 is a US Supreme Court ruling: any statistical analysis needs to be significant at the 0.05 level to be admitted in court
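
For readers working outside Excel, the decision rule on this slide can be sketched with the Python standard library. The sketch assumes a paired t-test on ln-transformed, strictly positive measurements and reuses the critical values 1.645 and 1.96 quoted on the slide; `paired_t_statistic` and `verdict` are hypothetical helper names.

```python
import math
import statistics


def paired_t_statistic(a1, a2):
    """t statistic of a paired t-test on ln-transformed measurements.
    a1[k] and a2[k] must be measured on the same instance k and be > 0."""
    if len(a1) != len(a2) or len(a1) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [math.log(x) - math.log(y) for x, y in zip(a1, a2)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def verdict(t, critical=1.645):
    """Apply the slide's rule; use critical=1.96 for the two-tailed test."""
    if t > critical:
        return "A1 is larger than A2"
    if t < -critical:
        return "A2 is larger than A1"
    return "A1 and A2 do not differ significantly"


cc_a1 = [1200, 950, 3100, 780, 1500]  # made-up #CC values for illustration
cc_a2 = [900, 800, 2500, 700, 1100]
t = paired_t_statistic(cc_a1, cc_a2)
print(t, verdict(t))
```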

  14. Computing the 95% confidence interval • The t test can be used to test the equality of the means of two normal populations with unknown, but equal, variance. • We usually use the t-test • Assumptions • Normal distribution of data • Sampling distribution of the mean approaches a normal distribution (holds when #instances ≥ 30) • Equality of variances • Sampling distribution: distribution calculated from all possible samples of a given size drawn from a given population
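
The transcript does not give a formula for the interval itself. A minimal sketch, assuming the large-sample normal approximation (critical value 1.96, reasonable when there are at least 30 instances) applied to a list of measurements or paired differences, might look as follows; `confidence_interval_95` is a hypothetical helper name.

```python
import math
import statistics


def confidence_interval_95(values, critical=1.96):
    """Approximate 95% confidence interval for the mean of `values`:
    mean +/- critical * (sample standard deviation / sqrt(n)).
    The default critical value assumes a large sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values")
    mean = statistics.mean(values)
    half_width = critical * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = confidence_interval_95([0.21, 0.35, 0.18, 0.40, 0.27, 0.31])
print(low, high)
```

For small samples, the critical value should instead come from a t table with the appropriate degrees of freedom.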

  15. Alternatives to the t test • To relax the normality assumption, a non-parametric alternative to the t test can be used, and the usual choices are: • for independent samples, the Mann-Whitney U test • for related samples, either the binomial test or the Wilcoxon signed-rank test • To test the equality of the means of more than two normal populations, an Analysis of Variance can be performed • To test the equality of the means of two normal populations with known variance, a Z-test can be performed
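
As an illustration of a non-parametric test for related samples, here is a minimal sketch of the sign test, one common form of the binomial test mentioned above. It assumes paired measurements, drops ties, and computes an exact two-sided p-value under the hypothesis that either algorithm is equally likely to win on an instance; `sign_test_p_value` is a hypothetical name.

```python
import math


def sign_test_p_value(a1, a2):
    """Exact two-sided sign test on paired samples a1, a2."""
    wins = sum(1 for x, y in zip(a1, a2) if x < y)    # a1 smaller on this pair
    losses = sum(1 for x, y in zip(a1, a2) if x > y)  # a2 smaller on this pair
    n = wins + losses                                 # ties are dropped
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of k or fewer successes under Binomial(n, 1/2).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p = sign_test_p_value([5, 7, 6, 9, 4, 8, 3, 6, 7, 5],
                      [6, 9, 8, 9, 7, 9, 5, 8, 6, 7])
print(p)
```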

  16. Alerts • For choosing the value of t in general, check http://www.socr.ucla.edu/Applets.dir/T-table.html • For a sound statistical analysis, consult the Help Desk of the Department of Statistics at UNL, held at least twice a week at Avery Hall. • Acknowledgments: Makram Geha, PhD candidate, Department of Statistics. All errors are mine.
