Simulated annealing for convex optimization

Adam Tauman Kalai, TTI-Chicago • Santosh Vempala, MIT

Three points of this talk • Design an efficient algorithm for a convex optimization problem • We get the current best (worst-case) bounds • Analysis of simulated annealing showing provable efficiency • Better understand simulated annealing • Simulated annealing is also a type of interior point algorithm • Rapid convergence to local/global min (we do not say anything about local vs. global min)

Outline • The optimization problem • Previous approaches • Simulated annealing • Results • Simulated annealing works fast • Geometric “cooling schedule” is optimal • Issues with shape/covariance

The optimization problem • Linear optimization (f(x) = c·x) over a convex set K • x* = argmin_{x∈K} c·x • Inputs: • n = number of dimensions (large) • unit vector c ∈ ℝ^n • accuracy ε > 0 • convex set K ⊂ ℝ^n • membership oracle K(x) = 1 if x ∈ K, 0 otherwise • starting point x_0 ∈ K • K contains a ball of radius r and is contained in a ball of radius R • Goal: output x where c·x ≤ c·x* + ε [Figure: convex set K between balls of radius r and R, with starting point x_0, direction c and optimum x*]

Previous approaches • min_{x∈K} c·x, c ∈ ℝ^n, convex K ⊂ ℝ^n • Ellipsoid method can solve this problem in O*(n^10) membership queries • O*(nS) Bertsimas-Vempala stochastic search • Uses a “uniform sample from convex set” subroutine: given a “good” starting point, a random walk finds an almost uniformly random point in K in S = O*(n^4) steps • We get O*(√n·S) • O* hides logarithmic factors: O*(n^10) = O(n^10 log^c(nR/(rεδ))) [Figure: set K with sample points x_1, x_2, x_3 and direction c; cut off sections…]

O*(nS) algorithm [BV03] • Elegant analysis • Requires Ω(n) phases in the worst case • In an n-dimensional cone, most of the mass is within 1/n of the top ⇒ ≈ n phases to cut the height in half [Figure: n-dim. cone with direction c]
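The cone claim can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a cone with its apex at the minimum, so that the volume below a fraction s of the height scales as s^n. The function names are made up for illustration, and the phase count further assumes that each phase cuts the height by a factor of about 1 − 1/n.

```python
import math

def top_slice_mass(n):
    """Fraction of an n-dim cone's volume in the top 1/n of its height.

    Assumes the apex is at the bottom, so the volume below a fraction s
    of the height is s**n.
    """
    return 1.0 - (1.0 - 1.0 / n) ** n

def phases_to_halve(n):
    """Phases needed to halve the height, ASSUMING each phase shrinks
    the height by a factor of (1 - 1/n)."""
    return math.log(2.0) / -math.log(1.0 - 1.0 / n)

for n in (10, 100, 1000):
    print(n, top_slice_mass(n), phases_to_halve(n))
```

The mass fraction stays above 1 − 1/e for every n, and the number of phases needed to halve the height grows linearly in n.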

Simulated annealing • Goal: minimize f(x) over a set K (discrete or continuous) • Approach: decreasing temperature 0 < T < ∞ • T = ∞: completely random • T = 0: global minimum • Phase i: temperature T_i is T_{i−1} times a constant factor < 1 (“geometric” cooling schedule), T_0 large • Biased random walk • During phase i, the stationary distribution is d_i(x) ∝ exp(−f(x)/T_i) [Figure: graph of f with points x, x′ and the global minimum x*]

Simulated annealing alg. for our problem • T_0 = R (radius of containing ball) • At temperature T_i, sample from density d_i(x) ∝ exp(−(c·x)/T_i) • Repeat “hit and run” random walk step S times: • At x, pick a random line L passing through x • Pick a random x′ on K ∩ L with prob. ∝ exp(−(c·x′)/T_i) • T_{i+1} = (1 − n^{−1/2})·T_i • Stop at T_final = ε/n • Temperature is cut in half every ≈ √n phases [Figure: set K with points x and x′ on line L]
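The steps on this slide can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation, and every name in it (`chord_end`, `truncated_exp`, `hit_and_run_step`, `anneal`, `in_box`) is made up here. It assumes the chord K ∩ L is found by bisection on the membership oracle, that 2R bounds any chord length, and it uses a small S for the demo rather than the S = O*(n^4) the analysis asks for.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def move(x, u, t):
    return [xi + t * ui for xi, ui in zip(x, u)]

def chord_end(in_K, x, u, far, iters=50):
    """Largest t in [0, far] (up to bisection error) with x + t*u in K."""
    lo, hi = 0.0, far              # x is in K; x + far*u is assumed outside
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if in_K(move(x, u, mid)):
            lo = mid
        else:
            hi = mid
    return lo

def truncated_exp(rate, length, rng):
    """Sample s in [0, length] with density proportional to exp(-rate*s)."""
    if rate * length < 1e-12:      # essentially flat: sample uniformly
        return rng.random() * length
    p = rng.random() * -math.expm1(-rate * length)
    return min(length, -math.log1p(-p) / rate)   # inverse CDF

def hit_and_run_step(in_K, x, c, T, far, rng):
    u = [rng.gauss(0.0, 1.0) for _ in x]         # random direction
    norm = math.sqrt(dot(u, u))
    u = [ui / norm for ui in u]
    if dot(c, u) < 0:                            # orient u so c.x grows along u
        u = [-ui for ui in u]
    back = chord_end(in_K, x, [-ui for ui in u], far)
    fwd = chord_end(in_K, x, u, far)
    # Along the chord t in [-back, fwd], the density is prop. to exp(-(c.u) t / T).
    s = truncated_exp(dot(c, u) / T, back + fwd, rng)
    y = move(x, u, s - back)
    return y if in_K(y) else x                   # guard against rounding

def anneal(in_K, x0, c, R, eps, S, seed=0):
    rng = random.Random(seed)
    n = len(x0)
    T, T_final = R, eps / n
    x = list(x0)
    while True:
        for _ in range(S):
            x = hit_and_run_step(in_K, x, c, T, 2.0 * R, rng)
        if T <= T_final:
            return x
        T = max(T_final, T * (1.0 - n ** -0.5))  # geometric cooling

# Demo (assumed setup): K = [-1, 1]^3, c = e_1, so c.x* = -1.
def in_box(x):
    return all(abs(xi) <= 1.0 for xi in x)

n = 3
c = [1.0] + [0.0] * (n - 1)
x = anneal(in_box, [0.0] * n, c, R=math.sqrt(n), eps=0.05, S=50)
print(dot(c, x))
```

Orienting u so that c·u ≥ 0 means only one truncated-exponential sampler with a non-negative rate is needed. The oracle is the only access to K, matching the problem statement.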

Analysis • Sampling at temperature T_final = ε/n brings you within ε of opt = c·x* • With a “good” starting point, after S = O*(n^4) steps, hit-and-run is located in K according to density d_i(x) ∝ exp(−(c·x)/T_i) (true for any log-concave density) [LV03] • “Good” start is a technical condition • d_i(x) and d_{i−1}(x) must be close
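To illustrate why T_final = ε/n is the right stopping temperature, consider a special case, which is an assumption of this sketch and not the general proof. Let K be a cone with apex x* that opens in the direction c, and let h = c·x − c·x* be the height above the optimum. Cross-sections grow like h^{n−1}, so, ignoring the truncation at the top of the cone, the height under the density ∝ exp(−(c·x)/T) is Gamma-distributed:

```latex
p(h) \propto h^{n-1} e^{-h/T}
\quad\Longrightarrow\quad
\mathbb{E}[h]
  = \frac{\int_0^\infty h^{n}   e^{-h/T}\,dh}{\int_0^\infty h^{n-1} e^{-h/T}\,dh}
  = \frac{n!\,T^{n+1}}{(n-1)!\,T^{n}}
  = nT,
\qquad
T = \frac{\varepsilon}{n}
\;\Longrightarrow\;
\mathbb{E}[c\cdot x] = c\cdot x^{*} + \varepsilon .
```

In this example a sample at temperature T sits about nT above the optimum on average, so cooling to ε/n gives expected error ε.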

Uniform distribution over the truncated cone has small std. dev. [Figure: cone along direction c with samples i−1 and i] • d_i(x) ∝ exp(−(c·x)/T) has much larger std. dev. (a factor of √n larger) [Figure: same cone with samples i−1 and i]
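One way to read this comparison is to take the cone example and compare the two distributions of the height at equal mean height. That reading is an assumption, and the function names below are made up for illustration. Under the uniform distribution the height fraction is Beta(n, 1). Under the exponential density it is Gamma with shape n, ignoring truncation. The sketch compares the standard deviation relative to the mean for each.

```python
import math

def uniform_cone_rel_std(n):
    """std/mean of the height under the uniform distribution on a cone.

    The height fraction has density n*s**(n-1) on [0, 1], i.e. Beta(n, 1).
    """
    mean = n / (n + 1)
    var = n / ((n + 1) ** 2 * (n + 2))
    return math.sqrt(var) / mean

def boltzmann_cone_rel_std(n):
    """std/mean of the height under density prop. to h**(n-1) * exp(-h/T).

    This is Gamma(shape n, scale T): mean n*T, std sqrt(n)*T, so the
    ratio does not depend on T (truncation of the cone is ignored).
    """
    return 1.0 / math.sqrt(n)

def spread_ratio(n):
    return boltzmann_cone_rel_std(n) / uniform_cone_rel_std(n)

for n in (10, 100, 1000):
    print(n, spread_ratio(n), math.sqrt(n))
```

The ratio works out to √(n+2), which matches the factor of √n on the slide up to lower-order terms.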

Optimal distributions and schedule • Cannot do better than √n phases • Assumptions: • Using a sequence of probability densities d_i(x) • d_i(x) is log-concave, i.e. log(d_i(x)) is concave • Variation distance |d_i − d_{i−1}| ≤ 1 − 1/poly(n) • Boltzmann distributions with a geometric cooling schedule are worst-case optimal for this class of stochastic search strategies

Shape estimation and covariance • I lied: to do the random walk, it’s important to have an estimate of the shape of the object • For “isotropic” shapes, can just step in a random direction • For non-isotropic shapes: • Maintain a sample of n points at all times • Use the covariance matrix of the current sample to bias direction selection
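The slide does not say how the covariance matrix biases the direction. One plausible reading, assumed here, is to draw the direction from a Gaussian with the sample covariance, so that long axes of the set are explored more often. All names below are hypothetical. The demo uses many more points than dimensions so that the sample covariance is non-singular and the Cholesky factorization succeeds.

```python
import math
import random

def covariance(points):
    """Sample covariance matrix of a list of equal-length points."""
    m, n = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / m for j in range(n)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (m - 1)
             for j in range(n)] for i in range(n)]

def cholesky(A):
    """Lower-triangular L with L * L^T = A (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def biased_direction(L, rng):
    """Unit vector along L*z with z standard normal (ASSUMED biasing rule)."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    u = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]
    norm = math.sqrt(sum(ui * ui for ui in u))
    return [ui / norm for ui in u]

# Demo: a sample from a box that is 10x longer along the first axis.
rng = random.Random(1)
points = [[rng.uniform(-10, 10), rng.uniform(-1, 1)] for _ in range(200)]
cov = covariance(points)
L = cholesky(cov)
dirs = [biased_direction(L, rng) for _ in range(1000)]
avg0 = sum(u[0] ** 2 for u in dirs) / len(dirs)
print(avg0)  # share of squared direction mass on the long axis
```

With an isotropic sample the factor L is close to a multiple of the identity, and this reduces to the uniform random direction used for isotropic shapes.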

Conclusions • In addition to possibly helping avoid local optima, simulated annealing converges rapidly to a local optimum • Simulated annealing ∼ interior point method • Justification for Boltzmann distributions with a geometric cooling schedule • Future work: same analysis for convex functions • Future work: understand how simulated annealing helps avoid local minima… • Reverse annealing is used for volume estimation [LV04]
