
Automated Parameter Setting Based on Runtime Prediction: Towards an Instance-Aware Problem Solver
Frank Hutter, Univ. of British Columbia, Vancouver, Canada; Youssef Hamadi, Microsoft Research, Cambridge, UK


Presentation Transcript


  1. Automated Parameter Setting Based on Runtime Prediction: Towards an Instance-Aware Problem Solver Frank Hutter, Univ. of British Columbia, Vancouver, Canada Youssef Hamadi, Microsoft Research, Cambridge, UK

  2. Motivation (1): Why automated parameter setting? • We want to use the best available heuristic for a problem • Strong domain-specific heuristics in tree search • Domain knowledge helps to pick good heuristics • But maybe you don't know the domain ahead of time ... • Local search parameters must be tuned • Performance depends crucially on the parameter setting • New application/algorithm: • Restart parameter tuning from scratch • Waste of time both for researchers and practitioners • Comparability • Is algorithm A faster than algorithm B, or did its authors just spend more time tuning it? ⇒ Automated Parameter Setting

  3. Motivation (2): operational scenario • CP solver has to solve instances from a variety of domains • Domains not known a priori • Solver should automatically use the best strategy for each instance • Want to learn from the instances we solve

  4. Overview • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04] • Part I: Automated parameter setting based on runtime prediction • Part II: Incremental learning for runtime prediction in a priori unknown domains • Experiments • Conclusions

  5. Previous work on runtime prediction for algorithm selection • General approach • Portfolio of algorithms • For each instance, choose the algorithm that promises to be fastest • Examples • [Lobjois and Lemaître, AAAI'98] CSP • Mostly propagations of different complexity • [Leyton-Brown et al., CP'02] Combinatorial auctions • CPLEX + 2 other algorithms (which were thought uncompetitive) • [Nudelman et al., CP'04] SAT • Many tree-search algorithms from the last SAT competition • On average considerably faster than each single algorithm

  6. Runtime prediction: Basics (1 algorithm) [Leyton-Brown, Nudelman et al. '02 & '04] • Training (expensive): Given a set of t instances z1,...,zt • For each instance zi • Compute features xi = (xi1,...,xim) • Run the algorithm to get its runtime yi • Collect (xi, yi) pairs • Learn a function f: X → R (features → runtime), yi ≈ f(xi) • Test (cheap): Given a new instance zt+1 • Compute features xt+1 • Predict runtime yt+1 = f(xt+1)
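A minimal sketch (Python/NumPy) of the training and test phases described on this slide; compute_features and run_algorithm are hypothetical helpers standing in for the feature computation and the actual solver run, and the learned function f is left abstract here (it is made concrete on the next slides).

```python
import numpy as np

def collect_training_data(instances, compute_features, run_algorithm):
    """Training phase: compute cheap features and measure (expensive) runtimes."""
    X, y = [], []
    for z in instances:
        X.append(compute_features(z))   # cheap: feature vector x_i
        y.append(run_algorithm(z))      # expensive: measured runtime y_i
    return np.array(X), np.array(y)

def predict_runtime(f, z_new, compute_features):
    """Test phase: only features are needed, so prediction is cheap."""
    return f(compute_features(z_new))
```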

  7. Runtime prediction: Linear regression [Leyton-Brown, Nudelman et al. '02 & '04] • The learned function f has to be linear in the features xi = (xi1,...,xim) • yi ≈ f(xi) = Σj=1..m (xij * wj) = xi * w • The learning problem thus reduces to fitting the weights w = (w1,...,wm) • To better capture the vast differences in runtime, estimate the logarithm of runtime: e.g. yi = 5 ⇒ runtime is 10^5 sec
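A minimal sketch of the fit described above, using a small made-up feature matrix: the weights w are fit by least squares against log10 runtime, so a prediction of 5 corresponds to 10^5 seconds.

```python
import numpy as np

# Illustrative training data: rows are feature vectors x_i (first column = constant bias feature).
X = np.array([[1.0, 120, 500, 4.1],
              [1.0, 300, 1400, 4.6],
              [1.0, 80, 310, 3.9]])
runtimes = np.array([0.4, 95.0, 0.07])     # measured runtimes in seconds

y = np.log10(runtimes)                      # predict log runtime: y_i = 5 means 10^5 sec
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit y_i ~ x_i . w by least squares

x_new = np.array([1.0, 150, 620, 4.2])
print("predicted runtime:", 10 ** (x_new @ w), "seconds")
```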

  8. Runtime prediction: Feature engineering [Leyton-Brown, Nudelman et al. '02 & '04] • Features can be computed quickly (in seconds) • Basic properties like #vars, #clauses, ratio • Estimates of search space size • Linear programming bounds • Local search probes • Linear functions are not very powerful • But you can use the same methodology to learn more complex functions • Let φ = (φ1,...,φq) be arbitrary combinations of the features x1,...,xm (so-called basis functions) • Learn a linear function of the basis functions: f(φ) = φ * w • Basis functions used in [Nudelman et al. '04] • Original features: xi • Pairwise products of features: xi * xj • Only a subset of these (drop useless basis functions)
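A small sketch of the basis-function expansion just described: original features plus pairwise products. The variance filter used below to drop useless basis functions is only an illustrative assumption, not the selection procedure of [Nudelman et al. '04].

```python
import numpy as np

def quadratic_basis(x):
    """Basis functions: raw features x_1..x_m plus pairwise products x_i * x_j (incl. squares)."""
    pairs = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate([x, pairs])

def drop_useless(Phi, eps=1e-12):
    """Keep only basis functions that actually vary across the training set (illustrative filter)."""
    keep = Phi.std(axis=0) > eps
    return Phi[:, keep], keep

X = np.random.rand(50, 4)                        # 50 instances, 4 raw features (toy data)
Phi = np.array([quadratic_basis(x) for x in X])  # expanded design matrix
Phi, kept = drop_useless(Phi)
```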

  9. Algorithm selection based on runtime prediction [Leyton-Brown, Nudelman et al. '02 & '04] • Given n different algorithms A1,...,An • Training (really expensive): • Learn n separate functions fj: φ → R, j=1...n • Test (cheap): • Predict runtime yjt+1 = fj(φt+1) for each of the algorithms • Choose the algorithm Aj with minimal yjt+1
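A sketch of the test-time selection rule, assuming one weight vector per algorithm has already been learned offline; the toy weights and basis-function vector are made up for illustration.

```python
import numpy as np

def select_algorithm(weights_per_algorithm, phi_new):
    """Predict log runtime for each algorithm and return the index of the predicted-fastest one."""
    predictions = np.array([phi_new @ w for w in weights_per_algorithm])
    return int(np.argmin(predictions)), predictions

# Toy illustration: three algorithms, four basis functions each (learned offline, really expensive).
weights = [np.array([0.1, 0.3, -0.2, 0.05]),
           np.array([0.0, 0.5, 0.1, -0.1]),
           np.array([0.2, 0.1, 0.0, 0.0])]
phi_new = np.array([1.0, 2.0, 0.5, 3.0])          # basis functions of the new instance (cheap)
best, preds = select_algorithm(weights, phi_new)
print("run algorithm", best, "with predicted log10 runtime", preds[best])
```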

  10. Overview • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04] • Part I: Automated parameter setting based on runtime prediction • Part II: Incremental learning for runtime prediction in a priori unknown domains • Experiments • Conclusions

  11. Parameter setting based on runtime prediction • Finding the best default parameter setting for a problem class • Generate special-purpose code [Minton '93] • Minimize estimated error [Kohavi & John '95] • Racing algorithm [Birattari et al. '02] • Local search [Hutter '04] • Experimental design [Adenso-Díaz & Laguna '05] • Decision trees [Srivastava & Mediratta '05] • Runtime prediction for algorithm selection on a per-instance basis • Predict runtime for each algorithm and pick the best [Leyton-Brown, Nudelman et al. '02 & '04] • Runtime prediction for setting parameters on a per-instance basis

  12. Naive application of runtime prediction for parameter setting • Given one algorithm with n different parameter settings P1,...,Pn • Training (too expensive): • Learn n separate functions fj: φ → R, j=1...n • Test (fairly cheap): • Predict runtime yjt+1 = fj(φt+1) for each of the parameter settings • Run the algorithm with the setting Pj with minimal yjt+1 • If there are too many parameter configurations: • Cannot run each parameter setting on each instance • Need to generalize (cf. human parameter tuning) • With separate functions there is no way to generalize

  13. Generalization by parameter sharing [Diagram: the naive approach fits separate weight vectors w1, w2, ..., wn, one per setting, from the data (X1:t, y1 1:t), ..., (X1:t, yn 1:t); our approach fits a single weight vector w from all of this data.] • Naive approach: n separate functions. • Information on the runtime of setting i cannot inform predictions for setting j ≠ i • Our approach: 1 single function. • Information on the runtime of setting i can inform predictions for setting j ≠ i

  14. Application of runtime prediction for parameter setting • View the parameters as additional features, learn a single function • Training (moderately expensive): Given a set of instances z1,...,zt • For each instance zi • Compute features xi • Pick some parameter settings p1,...,pn • Run the algorithm with settings p1,...,pn to get runtimes y1i,...,yni • Basis functions φ1i,...,φni include the parameter settings • Collect pairs (φji, yji) (n data points per instance) • Only learn a single function g: φ → R • Test (cheap): Given a new instance zt+1 • Compute features xt+1 • Search over parameter settings pj; to evaluate pj, compute φjt+1 and check g(φjt+1) • Run with the best predicted parameter setting p*
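A sketch of the shared-model idea on this slide, under simplifying assumptions: the basis function here is just a bias term plus the concatenation of instance features and parameter values, and the "search" over settings is a plain enumeration of a candidate grid (names like make_phi and best_setting are illustrative).

```python
import numpy as np

def make_phi(features, params):
    """Basis functions over instance features AND parameter values (here: bias + concatenation)."""
    return np.concatenate([[1.0], features, params])

def best_setting(w, features, candidate_settings):
    """Search over parameter settings: evaluate g(phi) = phi . w and pick the predicted-fastest."""
    preds = [make_phi(features, p) @ w for p in candidate_settings]
    return candidate_settings[int(np.argmin(preds))]

# Illustrative candidate grid of (alpha, rho) settings for a SAPS-style algorithm:
candidates = [(a, r) for a in (1.1, 1.2, 1.3) for r in (0.5, 0.6, 0.7, 0.8)]
```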

  15. Summary of automated parameter setting based on runtime prediction • Learn a single function that maps features and parameter settings to runtime • Given a new instance • Compute the features (they are fixed) • Search for the parameter setting that minimizes the predicted runtime for these features

  16. Overview • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04] • Part I: Automated parameter setting based on runtime prediction • Part II: Incremental learning for runtime prediction in a priori unknown domains • Experiments • Conclusions

  17. Problem setting: Incremental learning for multiple domains

  18. Solution: Sequential Bayesian Linear Regression • Update "knowledge" as new data arrives: a probability distribution over the weights w • Incremental (one (xi, yi) pair at a time) • Seamlessly integrates new data • "Optimal": yields the same result as a batch approach • Efficient • Computation: 1 matrix inversion per update • Memory: can drop data we have already integrated • Robust • Simple to implement (3 lines of Matlab) • Provides estimates of uncertainty in the prediction

  19. What are uncertainty estimates?

  20. Sequential Bayesian linear regression – intuition • Instead of predicting a single runtime y, use a probability distribution P(Y) • The mean of P(Y) is exactly the prediction of the non-Bayesian approach, but we get uncertainty estimates [Plot: distribution P(Y) over log runtime Y, showing the mean predicted runtime and the uncertainty of the prediction]

  21. Sequential Bayesian linear regression – technical • Standard linear regression: • Training: given training data φ1:n, y1:n, fit the weights w such that y1:n ≈ φ1:n * w • Prediction: yn+1 = φn+1 * w • Bayesian linear regression: • Training: given training data φ1:n, y1:n, infer a probability distribution P(w|φ1:n, y1:n) ∝ P(w) * Πi P(yi|φi, w) (Gaussian prior, assumed Gaussian likelihood, Gaussian posterior) • Prediction: P(yn+1|φn+1, φ1:n, y1:n) = ∫ P(yn+1|w, φn+1) * P(w|φ1:n, y1:n) dw • "Knowledge" about the weights: Gaussian (μw, Σw)
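A minimal sketch of sequential Bayesian linear regression with a zero-mean Gaussian prior and known Gaussian observation noise (both variances are illustrative assumptions); each update folds one (φ, y) pair into the posterior sufficient statistics, so the raw data can be dropped afterwards.

```python
import numpy as np

class SequentialBayesianLinReg:
    """Gaussian posterior over the weights w, updated one (phi, y) pair at a time."""

    def __init__(self, dim, prior_var=100.0, noise_var=1.0):
        self.A = np.eye(dim) / prior_var   # posterior precision (inverse covariance), vague prior
        self.b = np.zeros(dim)             # accumulated Phi^T y / noise_var, so that mu = A^-1 b
        self.noise_var = noise_var

    def update(self, phi, y):
        """Condition on one new observation: P(w | phi, y) ∝ P(y | phi, w) P(w)."""
        self.A += np.outer(phi, phi) / self.noise_var
        self.b += phi * y / self.noise_var

    def predict(self, phi):
        """Gaussian predictive distribution over log runtime: returns (mean, variance)."""
        Sigma = np.linalg.inv(self.A)      # one matrix inversion per update/prediction
        mean = phi @ (Sigma @ self.b)
        var = phi @ Sigma @ phi + self.noise_var
        return mean, var
```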

  22. Sequential Bayesian linear regression – visualized • Start with a prior P(w) with very high uncertainty • First data point (φ1, y1): P(w|φ1, y1) ∝ P(w) * P(y1|φ1, w) • Prediction with the prior over w vs. prediction with the posterior over w|φ1, y1 [Plots: prior P(wi) and posterior P(wi|φ1, y1) over weight wi; likelihood P(y1|φ1, w); predictive distributions P(y2|φ, w) over log runtime y2 under the prior and the posterior]

  23. Summary of incremental learning for runtime prediction • We have a probability distribution over the weights: • Start with a Gaussian prior, incrementally update it with more data • Given the Gaussian weight distribution, the predictions are also Gaussians • We know how uncertain our predictions are • For new domains, we will be very uncertain and only grow more confident after having seen a couple of data points

  24. Overview • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04] • Part I: Automated parameter setting based on runtime prediction • Part II: Incremental learning for runtime prediction in a priori unknown domains • Experiments • Conclusions

  25. Domain for our experiments • SAT • Best-studied NP-hard problem • Good features already exist [Nudelman et al. '04] • Lots of benchmarks • Stochastic Local Search (SLS) • Runtime prediction has never been done for SLS before • Parameter tuning is very important for SLS • Parameters are often continuous • SAPS algorithm [Hutter, Tompkins, Hoos '02] • Still amongst the state of the art • Default setting not always best • Well, I also know it well ;-) • But the approach is applicable to almost anything for which we can compute features!

  26. Stochastic Local Search for SAT: Scaling and Probabilistic Smoothing (SAPS) [Hutter, Tompkins, Hoos '02] • Clause-weighting algorithm for SAT, was state-of-the-art in 2002 • Start with all clause weights set to 1 • Hill-climbing until you hit a local minimum • In local minima: • Scaling: scale the weights of unsatisfied clauses: wc ← α * wc • Probabilistic smoothing: with probability Psmooth, smooth all clause weights: wc ← ρ * wc + (1-ρ) * average wc • Default parameter setting: (α, ρ, Psmooth) = (1.3, 0.8, 0.05) • Psmooth and ρ are very closely related
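A sketch of the local-minimum step of SAPS as described above; the hill-climbing search itself is omitted, and the representation of clauses as a weight list plus a set of unsatisfied clause indices is an illustrative simplification.

```python
import random

def saps_local_minimum_step(weights, unsatisfied, alpha=1.3, rho=0.8, p_smooth=0.05):
    """One SAPS escape step in a local minimum: scale unsatisfied clause weights, maybe smooth all."""
    # Scaling: increase the weight of every currently unsatisfied clause.
    for c in unsatisfied:
        weights[c] *= alpha
    # Probabilistic smoothing: with probability p_smooth, pull all weights toward their mean.
    if random.random() < p_smooth:
        avg = sum(weights) / len(weights)
        for c in range(len(weights)):
            weights[c] = rho * weights[c] + (1 - rho) * avg
    return weights

# Example: 5 clauses, all weights start at 1; clauses 1 and 3 are unsatisfied in this local minimum.
w = saps_local_minimum_step([1.0] * 5, unsatisfied=[1, 3])
```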

  27. Benchmark instances • Only satisfiable instances! • SAT04rand: SAT '04 competition instances • mix: mix of lots of different domains from SATLIB: random, graph colouring, blocksworld, inductive inference, logistics, ...

  28. Adaptive parameter setting vs. SAPS default on SAT04rand • Trained on mix and used to choose parameters for SAT04rand • ρ ∈ {0.5, 0.6, 0.7, 0.8} • α ∈ {1.1, 1.2, 1.3} • For SAPS: #steps ∝ time • Adaptive variant on average 2.5 times faster than the default • But the default is not strong here

  29. Where uncertainty helps in practice: qualitative differences in training & test set • Trained on mix, tested on SAT04rand [Scatter plot: predicted vs. actual runtime with estimates of the uncertainty of each prediction; the diagonal marks optimal prediction]

  30. Where uncertainty helps in practice (2): Zoomed in to predictions with low uncertainty [Scatter plot: predicted vs. actual runtime for low-uncertainty predictions; the diagonal marks optimal prediction]

  31. Overview • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04] • Part I: Automated parameter setting based on runtime prediction • Part II: Incremental learning for runtime prediction in a priori unknown domains • Experiments • Conclusions

  32. Conclusions • Automated parameter tuning is needed and feasible • Algorithm experts waste their time on it • Solver can automatically choose appropriate heuristics based on instance characteristics • Such a solver could be used in practice • Learns incrementally from the instances it solves • Uncertainty estimates prevent catastrophic errors in estimates for new domains

  33. Future work along these lines • Increase predictive performance • Better features • More powerful ML algorithms • Active learning • Run most informative probes for new domains (need the uncertainty estimates) • Use uncertainty • Pick the algorithm with maximal probability of success (not the one with minimal expected runtime!) • More domains • Tree search algorithms • CP

  34. Future work along related lines • If there are no features: • Local search in parameter space to find the best default parameter setting [Hutter '04] • If we can change strategies while running the algorithm: • Reinforcement learning for algorithm selection [Lagoudakis & Littman '00] • Low-knowledge algorithm control [Carchrae and Beck '05]

  35. The End • Thanks to • Youssef Hamadi • Kevin Leyton-Brown • Eugene Nudelman • You, for your attention :-)
