
Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Stefano Ermon*, Carla P. Gomes*, Ashish Sabharwal+, and Bart Selman* (*Cornell University, +IBM Watson Research Center). UAI 2013.


Presentation Transcript


  1. Optimization With Parity Constraints: From Binary Codes to Discrete Integration Stefano Ermon*, Carla P. Gomes*, Ashish Sabharwal+, and Bart Selman* *Cornell University +IBM Watson Research Center UAI - 2013

  2. High-dimensional integration
     • High-dimensional integrals arise in statistics, ML, and physics:
       • Expectations / model averaging
       • Marginalization
       • Partition function / rank models / parameter learning
     • Curse of dimensionality:
       • Quadrature involves a weighted sum over an exponential number of items (e.g., units of volume)
     [Figure: n-dimensional hypercube; volume labels L, L^2, L^3, L^4, …, L^n]

  3. Discrete Integration
     • We are given:
       • A set of 2^n items
       • Non-negative weights w
     • Goal: compute the total weight
     • Compactly specified weight function:
       • Factored form (Bayes net, factor graph, CNF, …)
     • Example 1: n = 2 variables, sum over 4 items. Goal: compute 5 + 0 + 2 + 1 = 8
     • Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable)
     [Figure: 2^n items; size visually represents weight]
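The sum in Example 1 can be reproduced by brute-force enumeration. The sketch below is only an illustration: the function name `brute_force_sum` and the mapping of the four weights (5, 0, 2, 1) to particular assignments are assumptions, since the slide gives the weights but not which item carries which weight.

```python
from itertools import product


def brute_force_sum(weight, n):
    """Sum weight(x) over all 2^n binary assignments x (exponential in n)."""
    return sum(weight(x) for x in product((0, 1), repeat=n))


# Hypothetical weight table for the n = 2 example; only the four
# weight values come from the slide, the assignment order is assumed.
EXAMPLE_WEIGHTS = {(0, 0): 5, (0, 1): 0, (1, 0): 2, (1, 1): 1}

total = brute_force_sum(lambda x: EXAMPLE_WEIGHTS[x], 2)
print(total)
```

The same loop visits 2^100 items for n = 100, which is why enumeration is not an option in Example 2.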

  4. Hardness
     • 0/1 weights case:
       • Is there at least one “1”? SAT
       • How many “1”s? #SAT
       • NP-complete vs. #P-complete: counting is much harder
     • General weights:
       • Find the heaviest item (combinatorial optimization, MAP)
       • Sum the weights (discrete integration)
     • [ICML-13] WISH: Approximate Discrete Integration via Optimization. E.g., partition function via MAP inference
     • MAP inference is often fast in practice:
       • Relaxations / bounds
       • Pruning
     [Figure: complexity classes from Easy to Hard: P, NP, PH, P^#P, PSPACE, EXP]

  5. WISH: Integration by Hashing and Optimization
     • The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor
     • Outer loop over n variables
     • Repeat log(n) times: MAP inference on the model augmented with random parity constraints
     • Aggregate the MAP inference solutions
     [Figure: augmented model = original graphical model over n binary variables σ ∈ {0,1}^n, plus parity check nodes enforcing A σ = b (mod 2)]

  6. Visual working of the algorithm
     • How it works: the function to be integrated is optimized subject to 1, 2, 3, … random parity constraints (n times), with each level repeated log(n) times
     • Estimate: Mode M0 + median M1 × 1 + median M2 × 2 + median M3 × 4 + …
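The loop structure described on slides 5 and 6 can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the MAP oracle is replaced by brute-force enumeration (so it only runs for tiny n), the names `constrained_max` and `wish_estimate` are hypothetical, the number of repetitions is left to the caller, and an infeasible constraint system is assumed to contribute weight 0.

```python
import random
from itertools import product
from statistics import median


def constrained_max(weight, n, A, b):
    """Brute-force stand-in for a MAP oracle: max weight(x) s.t. A x = b (mod 2)."""
    best = 0.0  # assumption: an infeasible system contributes weight 0
    for x in product((0, 1), repeat=n):
        feasible = all(
            sum(a * v for a, v in zip(row, x)) % 2 == rhs
            for row, rhs in zip(A, b)
        )
        if feasible:
            best = max(best, weight(x))
    return best


def wish_estimate(weight, n, repeats, rng):
    """Aggregate medians of constrained maxima: M0 + M1*1 + M2*2 + M3*4 + ..."""
    medians = []
    for i in range(n + 1):  # i = number of random parity constraints
        samples = []
        for _ in range(repeats):
            A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(i)]
            b = [rng.randint(0, 1) for _ in range(i)]
            samples.append(constrained_max(weight, n, A, b))
        medians.append(median(samples))
    return medians[0] + sum(medians[i + 1] * 2 ** i for i in range(n))


if __name__ == "__main__":
    w = lambda x: 1 + sum(x)  # hypothetical weight function
    print(wish_estimate(w, 4, repeats=7, rng=random.Random(0)))
```

With zero constraints the oracle returns the unconstrained maximum (the mode M0); each additional constraint roughly halves the feasible set, which is why the i-th median is scaled by a power of two.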

  7. Accuracy Guarantees
     • Theorem [ICML-13]: With probability at least 1 − δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
     • Theorem [ICML-13]: The approximation factor can be improved to (1 + ε) by adding extra variables and factors.
       • Example: factor-2 approximation with 4n variables
     • Remark: faster than enumeration only when combinatorial optimization is efficient

  8. Summary of contributions
     • Introduction and previous work:
       • WISH: Approximate Discrete Integration via Optimization
       • Partition function / marginalization via MAP inference
       • Accuracy guarantees
     • MAP inference subject to parity constraints:
       • Tractable cases and approximations
       • Integer Linear Programming formulation
       • New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (will reach within a constant factor)
     • Sparsity of the parity constraints:
       • Techniques to improve solution time and bounds quality
       • Experimental improvements over variational techniques

  9. MAP inference with parity constraints Hardness, approximations, and bounds

  10. Making WISH more scalable
     • Would approximations to the optimization (MAP inference with parity constraints) be useful? YES
     • Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral):
       • Lower bounds (local search) on MAP → lower bounds on Z
       • Upper bounds (LP, SDP relaxation) on MAP → upper bounds on Z
       • Constant-factor approximations on MAP → constant factor on Z
     • Question: Are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?

  11. Error correcting codes
     • Communication over a noisy channel: Alice sends x = 0100|1, Bob receives y = 0110|1
     • Redundant parity check bit of x: 0 XOR 1 XOR 0 XOR 0 = 1
     • The parity check on y fails: parity check bit = 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0
     • Bob: There has been a transmission error! What was the message actually sent by Alice?
       • Must be a valid codeword
       • As close as possible to the received message y
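The single parity check from this slide is easy to verify in code. The sketch below uses the two bit strings from the slide; the helper names `parity_bit` and `is_codeword` are illustrative, and the last position of each word is assumed to hold the check bit.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """XOR of all data bits."""
    return reduce(xor, bits, 0)


def is_codeword(word):
    """A word is valid when its last bit equals the parity of the data bits."""
    *data, check = word
    return parity_bit(data) == check


sent = [0, 1, 0, 0, 1]      # 0100|1
received = [0, 1, 1, 0, 1]  # 0110|1, one data bit flipped by the channel

print(is_codeword(sent), is_codeword(received))
```

A single parity bit detects the error but cannot locate it, which is what motivates the decoding question Bob asks.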

  12. Decoding a binary code
     • Noisy channel: x = 0100|1 sent, y = 0110|1 received
     • Max-likelihood decoding
     • ML-decoding graphical model:
       • Noisy channel model
       • Transmitted string x must be a codeword (parity check nodes)
       • LDPC codes are routinely solved: 10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, ...
     • Our more general case:
       • More complex probabilistic model (with parity check nodes)
       • Max w(x) subject to A x = b (mod 2)
       • Equivalent to MAP inference on the augmented model
     • MAP inference is NP-hard to approximate within any constant factor [Stern, Arora,..]
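Max-likelihood decoding is an instance of "max w(x) subject to A x = b (mod 2)" with b = 0. The sketch below makes that concrete under assumptions that are not in the slides: a binary symmetric channel with flip probability p, the (7,4) Hamming code as the example code, and exhaustive search in place of a real decoder. The names `H`, `is_codeword`, and `ml_decode` are illustrative.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code (an assumed example code).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def is_codeword(x, H):
    """x is a codeword when every parity check H x = 0 (mod 2) holds."""
    return all(sum(h * v for h, v in zip(row, x)) % 2 == 0 for row in H)


def ml_decode(y, H, p=0.1):
    """Exhaustive max-likelihood decoding for a binary symmetric channel."""
    n = len(y)

    def likelihood(x):
        flips = sum(a != b for a, b in zip(x, y))
        return p ** flips * (1 - p) ** (n - flips)

    codewords = (x for x in product((0, 1), repeat=n) if is_codeword(x, H))
    return max(codewords, key=likelihood)


received = (1, 1, 1, 0, 0, 0, 1)  # codeword 1110000 with the last bit flipped
print(ml_decode(received, H))
```

For p < 0.5 the likelihood decreases with the number of flipped bits, so this search returns the codeword closest to the received string.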

  13. Decoding via Integer Programming
     • MAP inference subject to parity constraints encoded as an Integer Linear Program (ILP):
       • Standard MAP encoding
       • Compact (polynomial) encoding by Yannakakis for parity constraints
     • LP relaxation: relax the integrality constraint
       • Polynomial-time upper bounds
     • ILP solving strategy: cuts + branching + LP relaxations
       • Solve a sequence of LP relaxations
       • Upper and lower bounds that improve over time
     [Figure: parity polytope]

  14. Iterative bound tightening
     • Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time
     • Recall: bounds on optimization (MAP) → (probabilistic) bounds on the partition function Z. New family of bounds.
     • WISH: When MAP is solved to optimality (LowerBound = UpperBound), guaranteed constant-factor approximation on Z

  15. Sparsity of the parity constraints Improving solution time and bounds quality

  16. Inducing sparsity
     • Observations:
       • Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes)
       • The quality of the LP relaxation depends on A and b, not just on the solution space. Elementary row operations (e.g., summing 2 equations) do not change the solution space but affect the LP relaxation.
     • Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field)
     • Greedy application of elementary row operations
     [Figure: matrix A in row-echelon form; the resulting parity check nodes are equivalent but sparser]
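The claim that row operations change the matrix but not the solution space can be checked on a small system. The sketch below performs Gauss-Jordan elimination over GF(2); the function names and the 3 × 4 example system are hypothetical, and the greedy sparsification step mentioned on the slide is not reproduced. Reduction happens to make this example sparser, which is not guaranteed in general.

```python
from itertools import product


def gf2_row_reduce(A, b):
    """Reduced row-echelon form of the augmented system [A | b] over GF(2)."""
    rows = [list(r) + [rhs] for r, rhs in zip(A, b)]
    pivot_row = 0
    for col in range(len(A[0])):
        if pivot_row == len(rows):
            break
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col]:
                # Adding two equations mod 2 is a bitwise XOR of the rows.
                rows[r] = [u ^ v for u, v in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return [r[:-1] for r in rows], [r[-1] for r in rows]


def solutions(A, b):
    """All x in {0,1}^n with A x = b (mod 2), by enumeration."""
    n = len(A[0])
    return {
        x
        for x in product((0, 1), repeat=n)
        if all(sum(a * v for a, v in zip(row, x)) % 2 == rhs for row, rhs in zip(A, b))
    }


A = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1]]  # hypothetical dense system
b = [0, 1, 1]
A_reduced, b_reduced = gf2_row_reduce(A, b)
print(A_reduced, b_reduced)
```

Here the original matrix has 10 ones and the reduced one has 4, while both systems admit exactly the same assignments.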

  17. Improvements from sparsity
     • Quality of LP relaxations significantly improves
     • Finds integer solutions faster (better lower bounds)
     [Figure: improvements from sparsification using the IBM CPLEX ILP solver for a 10x10 Ising grid. Annotations: upper bound improvement; without sparsification, the solver fails to find integer solutions (LB)]

  18. Generating sparse constraints
     • We optimize over solutions of A x = b (mod 2) (parity constraints)
     • WISH is based on universal hashing:
       • Randomly generate A in {0,1}^(i×n), b in {0,1}^i
       • Then A x + b (mod 2) is:
         • Uniform over {0,1}^i
         • Pairwise independent: given distinct variable assignments x and y, the events A x = b (mod 2) and A y = b (mod 2) are independent
     • Suppose we generate a sparse matrix A:
       • At most k variables per parity constraint (up to k ones per row of A)
       • A x + b (mod 2) is still uniform, but no longer pairwise independent
       • E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables. Lots of correlation: knowing A x = b tells me a lot about A y = b
     [Figure: A (i × n) times x (n) = b (i) (mod 2)]
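The loss of pairwise independence can be verified exactly for a single constraint (i = 1) by enumerating every choice of row and right-hand side. The sketch below assumes n = 3 and, for the sparse case, one particular sampling scheme (a row with exactly one 1 at a uniformly chosen position); the slides do not specify how sparse rows are drawn, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def hits(row, rhs, x):
    """True when the parity constraint row . x = rhs (mod 2) holds."""
    return sum(a * v for a, v in zip(row, x)) % 2 == rhs


def joint_probability(rows, x, y):
    """Pr[a.x = b and a.y = b] for a uniform over rows and b uniform over {0,1}."""
    total = good = 0
    for row in rows:
        for rhs in (0, 1):
            total += 1
            good += hits(row, rhs, x) and hits(row, rhs, y)
    return Fraction(good, total)


n = 3
dense_rows = list(product((0, 1), repeat=n))  # every row in {0,1}^n
sparse_rows = [tuple(int(j == k) for j in range(n)) for k in range(n)]  # k = 1

x, y = (0, 0, 0), (1, 0, 0)
for name, rows in (("dense", dense_rows), ("sparse", sparse_rows)):
    marginal = joint_probability(rows, x, x)  # Pr[a.x = b]
    print(name, marginal, joint_probability(rows, x, y))
```

Both schemes give each assignment probability 1/2 of satisfying the constraint, but only the dense rows make the joint probability equal to the product of the marginals.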

  19. Using sparse parity constraints
     • Theorem: With probability at least 1 − δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound of the partition function.
     • PRO: “Easier” MAP inference queries
       • For example, random parity constraints of length 1 (= on a single variable) are equivalent to MAP with some variables fixed.
     • CON: We lose the upper bound part. The output can underestimate the partition function.
     • CON: No constant-factor approximation anymore

  20. MAP with sparse parity constraints
     • Evaluation of MAP inference with sparse constraints
     • ILP and Branch&Bound outperform message-passing (BP, MP and MPLP)
     [Figure panels: 10x10 attractive Ising grid; 10x10 mixed Ising grid]

  21. Experimental results • ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)

  22. Experimental results (2) • ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW)

  23. Conclusions
     • [ICML-13] WISH: Discrete integration reduced to a small number of optimization instances (MAP)
       • Strong (probabilistic) accuracy guarantees
       • MAP inference is still NP-hard
     • Scalability: approximations and bounds
       • Connection with max-likelihood decoding
       • ILP formulation + sparsity (Gauss sparsification & uniform hashing)
       • New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function. Can be iteratively tightened (will reach within a constant factor)
     • Future work:
       • Extension to continuous integrals and variables
       • Sampling from high-dimensional probability distributions

  24. Extra slides
