Foundations of Constraint Processing CSCE421/821, Spring 2008:

Evaluation of (Deterministic) BT Search Algorithms Foundations of Constraint Processing CSCE421/821, Spring 2008: www.cse.unl.edu/~choueiry/S08-421-821/ Berthe Y. Choueiry (Shu-we-ri) Avery Hall, Room 123B choueiry@cse.unl.edu Tel: +1(402)472-5444

Outline • Evaluation of (deterministic) BT search algorithms [Dechter, 6.6.2] • CSP parameters • Comparison criteria • Theoretical evaluations • Empirical evaluations

CSP parameters • Number of variables: n • Domain size: a, d • Constraint tightness: t = |forbidden tuples| / |all tuples| • Proportion of constraints (a.k.a. constraint density, constraint probability): p1 = e / emax, where e is the number of constraints
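The two ratios above can be computed directly. The sketch below assumes binary constraints and a single domain size a shared by all variables, so that emax = n(n-1)/2 and each constraint ranges over a·a tuples; the function names are hypothetical and chosen only for illustration.

```python
def max_constraints(n: int) -> int:
    """e_max for binary constraints: the number of distinct variable pairs."""
    return n * (n - 1) // 2


def density(n: int, e: int) -> float:
    """p1 = e / e_max."""
    return e / max_constraints(n)


def tightness(forbidden: int, a: int) -> float:
    """t = |forbidden tuples| / |all tuples| for one binary constraint
    between two variables whose domains both have size a (assumption)."""
    return forbidden / (a * a)


print(density(10, 9))    # 9 constraints out of 45 possible pairs
print(tightness(16, 8))  # 16 forbidden pairs out of 64
```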

Comparison criteria • Number of nodes visited (#NV) • Every time you call label • Number of constraint checks (#CC) • Every time you call check(i,j) • CPU time • Be as honest and consistent as possible • Number of backtracks (#BT) • Every un-assignment of a variable in unlabel • Some specific criterion for assessing the quality of the improvement proposed • Presentation of values: • Descriptive statistics of criterion: average, median, mode, max, min • (qualified) run-time distribution • Solution-quality distribution
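As a rough illustration of how these counters can be collected, here is a minimal sketch of chronological backtracking with a static variable ordering. It is not the course's label/unlabel code: the recursive structure, the `Counters` class, and the exact counting convention (one node per value tried, one backtrack per un-assignment after a failed subtree) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    nv: int = 0  # nodes visited: one per value tried for a variable
    cc: int = 0  # constraint checks: one per predicate evaluation
    bt: int = 0  # backtracks: one per un-assignment after a dead end


def solve(domains, constraints, counters):
    """Return the first solution found, or None.

    domains: list of value lists, one per variable (static order 0..n-1).
    constraints: dict mapping (i, j), i < j, to a predicate on (vi, vj).
    """
    assignment = []

    def consistent(j, value):
        # Check the candidate value against every already-assigned variable.
        for i in range(j):
            pred = constraints.get((i, j))
            if pred is not None:
                counters.cc += 1
                if not pred(assignment[i], value):
                    return False
        return True

    def extend(j):
        if j == len(domains):
            return True
        for value in domains[j]:
            counters.nv += 1
            if consistent(j, value):
                assignment.append(value)
                if extend(j + 1):
                    return True
                assignment.pop()  # un-assign variable j
                counters.bt += 1
        return False

    return list(assignment) if extend(0) else None


ne = lambda x, y: x != y
all_diff = {(0, 1): ne, (0, 2): ne, (1, 2): ne}
stats = Counters()
print(solve([[0, 1, 2]] * 3, all_diff, stats), stats)
```

Whatever convention is chosen, it should be applied identically to every algorithm being compared, in line with the advice to be honest and consistent.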

Theoretical evaluations • Comparing NV and/or CC • Common assumptions: • for finding all solutions • static orderings

Empirical evaluation: data sets • Use real-world data (anecdotal evidence) • Use benchmarks • csplib.org • Solver competition benchmarks • Use randomly generated problems • Various models of random generators • Guaranteed with a solution • Uniform or structured

Empirical evaluations: random problems • Various models exist (use Model B) • Models A, B, C, E, F, etc. • Vary parameters: <n, a, t, p> • Number of variables: n • Domain size: a,d • Constraint tightness: t = |forbidden tuples| / | all tuples | • Proportion of constraints (a.k.a., constraint density, constraint probability): p1 = e / emax • Issues: • Uniformity • Difficulty (phase transition) • Solvability of instances (for incomplete search techniques)

Model B • Input: n, a, t, p1 • Generate n nodes • Generate a list of n(n-1)/2 tuples of all combinations of 2 nodes • Choose e elements from the above list as constraints between the n nodes • If the graph is not connected, throw it away and go back to the previous step (choosing e elements); otherwise proceed • Generate a list of a^2 tuples of all combinations of 2 values • For each constraint, randomly choose a number of tuples from the list to guarantee tightness t for the constraint
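The steps above can be sketched as follows. This is one possible reading, not a reference generator: it assumes e = round(p1 · n(n-1)/2), that the tuples chosen for each constraint are the forbidden ones, and that round(t · a^2) of them are picked. The names `model_b`, `is_connected`, and `max_tries` are hypothetical.

```python
import itertools
import random


def is_connected(n, edges):
    """Depth-first search over the constraint graph, starting at node 0."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def model_b(n, a, t, p1, rng=None, max_tries=10_000):
    """Return {(i, j): set of forbidden (vi, vj) pairs} for a connected graph."""
    if n < 2:
        raise ValueError("need at least two variables")
    rng = rng or random.Random()
    all_pairs = list(itertools.combinations(range(n), 2))  # n(n-1)/2 pairs
    e = round(p1 * len(all_pairs))
    if e < n - 1:
        raise ValueError("too few constraints for a connected graph")
    for _ in range(max_tries):
        edges = rng.sample(all_pairs, e)
        if is_connected(n, edges):
            break  # keep this graph; otherwise throw it away and resample
    else:
        raise RuntimeError("no connected graph found within max_tries")
    all_tuples = list(itertools.product(range(a), repeat=2))  # a^2 tuples
    n_forbidden = round(t * len(all_tuples))
    return {edge: set(rng.sample(all_tuples, n_forbidden)) for edge in edges}


instance = model_b(n=10, a=5, t=0.4, p1=0.4, rng=random.Random(0))
print(len(instance), "constraints")
```

Passing a seeded `random.Random` makes an instance reproducible, which helps when all instances have to be stored and reused across algorithms.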

Phase transition [Cheeseman et al. '91] • Significant increase of cost around critical value • In CSPs, order parameter is constraint tightness & ratio • Algorithms compared around phase transition • [Figure: cost of solving plotted against the order parameter, with the critical value of the order parameter marked; regions labeled "mostly solvable problems" and "mostly un-solvable problems"]

Tests • Fix n, a, p1 and • Vary t in {0.1, 0.2, …,0.9} • Fix n, a, t and • Vary p1 in {0.1, 0.2, …,0.9} • For each data point (for each value of t/p1) • Generate (at least) 50 instances • Store all instances • Make measurements • #NV, #CC, CPU time, #messages, etc.
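The test plan above amounts to two parameter sweeps with a fixed number of instances per data point. The sketch below only enumerates that plan; the function name and the idea of attaching a distinct seed to each instance (so it can be regenerated or stored under a stable key) are assumptions, not part of the slides.

```python
def experiment_grid(n, a, fixed_p1, fixed_t, instances_per_point=50):
    """Yield (n, a, t, p1, seed) for the t-sweep, then for the p1-sweep."""
    sweep = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    points = [(t, fixed_p1) for t in sweep] + [(fixed_t, p1) for p1 in sweep]
    seed = 0
    for t, p1 in points:
        for _ in range(instances_per_point):
            yield (n, a, t, p1, seed)
            seed += 1


plan = list(experiment_grid(n=20, a=10, fixed_p1=0.5, fixed_t=0.3))
print(len(plan), "instances to generate and measure")
```

Each tuple would then be handed to a generator and to every algorithm under comparison, recording #NV, #CC, and CPU time per instance.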

Comparing two algorithms A1 and A2 • Store all measurements in Excel • Use Excel, R, SAS, etc. for statistical measurements • Use the t-test, paired test • Comparing measurements • A1, A2 are significantly different • Comparing ln measurements • A1 is significantly better than A2

t-test in Excel • Using ln values • p ← ttest(array1,array2,tails,type) • tails = 1 or 2 • type = 1 (paired) • t ← tinv(p,df) • degrees of freedom = #instances – 2

t-test with 95% confidence • One-tailed test • Interested in direction of change • When t > 1.645, A1 is larger than A2 • When t < -1.645, A2 is larger than A1 • When -1.645 ≤ t ≤ 1.645, A1 and A2 do not differ significantly • |t|=1.645 corresponds to p=0.05 for a one-tailed test • Two-tailed test • Although it tells direction, not as accurate as the one-tailed test • When t > 1.96, A1 is larger than A2 • When t < -1.96, A2 is larger than A1 • When -1.96 ≤ t ≤ 1.96, A1 and A2 do not differ significantly • |t|=1.96 corresponds to p=0.05 for a two-tailed test • p=0.05 is a US Supreme Court ruling: any statistical analysis needs to be significant at the 0.05 level to be admitted in court
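Outside Excel, the paired t statistic on ln values and the decision rule above can be sketched in a few lines of standard-library Python. The function names are hypothetical, measurements are assumed to be strictly positive (so that ln is defined), and the default threshold is the one-tailed value 1.645 quoted above. That value is a large-sample figure; with few instances, the critical value from a t-table for the actual degrees of freedom is larger.

```python
import math
import statistics


def paired_t_on_logs(a1, a2):
    """Paired t statistic computed on ln(a1[k]) - ln(a2[k])."""
    if len(a1) != len(a2) or len(a1) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [math.log(x) - math.log(y) for x, y in zip(a1, a2)]
    sd = statistics.stdev(diffs)  # sample standard deviation
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def verdict(t, critical=1.645):
    """Apply the decision rule with the given critical value."""
    if t > critical:
        return "A1 is larger than A2"
    if t < -critical:
        return "A2 is larger than A1"
    return "A1 and A2 do not differ significantly"


# Made-up #CC measurements on the same five instances, for illustration only.
cc_a1 = [1200, 950, 4300, 780, 2100]
cc_a2 = [800, 700, 2500, 640, 1500]
t = paired_t_on_logs(cc_a1, cc_a2)
print(round(t, 3), verdict(t))
```

For the two-tailed rule, pass `critical=1.96`.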

Computing the 95% confidence interval • The t test can be used to test the equality of the means of two normal populations with unknown, but equal, variance. • We usually use the t-test • Assumptions • Normal distribution of data • The sampling distribution of the mean approaches a normal distribution (holds when #instances ≥ 30) • Equality of variances • Sampling distribution: distribution calculated from all possible samples of a given size drawn from a given population

Alternatives to the t test • To relax the normality assumption, a non-parametric alternative to the t test can be used, and the usual choices are: • for independent samples, the Mann-Whitney U test • for related samples, either the binomial test or the Wilcoxon signed-rank test • To test the equality of the means of more than two normal populations, an Analysis of Variance can be performed • To test the equality of the means of two normal populations with known variance, a Z-test can be performed
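For related samples, the binomial test mentioned above is often applied as a sign test: count on how many instances A1 exceeds A2 and compare that count with a fair coin. The sketch below is a minimal exact two-sided version using only the standard library; the function name is hypothetical, and dropping ties is a common convention that is assumed here.

```python
import math


def sign_test_p_value(a1, a2):
    """Exact two-sided sign test on paired measurements (ties dropped)."""
    wins = sum(x > y for x, y in zip(a1, a2))
    losses = sum(x < y for x, y in zip(a1, a2))
    m = wins + losses
    if m == 0:
        return 1.0  # every pair tied: no evidence either way
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(m, 1/2), doubled for the two-sided test.
    tail = sum(math.comb(m, i) for i in range(k + 1)) / 2 ** m
    return min(1.0, 2 * tail)


# Made-up paired measurements, for illustration only.
a1 = [12, 15, 9, 20, 18, 14, 11, 16, 13, 19]
a2 = [10, 11, 8, 17, 15, 13, 10, 12, 12, 16]
print(sign_test_p_value(a1, a2))
```

The sign test ignores the size of each difference; the Wilcoxon signed-rank test uses the ranks of the differences as well and is usually the more powerful of the two.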

Alerts • For choosing the value of t in general, check http://www.socr.ucla.edu/Applets.dir/T-table.html • For a sound statistical analysis, consult the Help Desk of the Department of Statistics at UNL, held at least twice a week at Avery Hall. • Acknowledgments: Makram Geha, PhD candidate, Department of Statistics. All errors are mine.
