230 likes | 352 Views
Automated discovery in math. Machine learning techniques (GP, ILP, etc.) have been successfully applied in science How about mathematics? Can they be used to discover interesting relationships in mathematical “data”? This is an exploration of using GP for that purpose
E N D
Automated discovery in math • Machine learning techniques (GP, ILP, etc.) have been successfully applied in science • How about mathematics? Can they be used to discover interesting relationships in mathematical “data”? • This is an exploration of using GP for that purpose • Specifically, using GP to automatically discover Euler’s identity (V – E + F = 2) from a fairly limited amount of data
Cubes V = 8 E = 12 F = 6 V – E + F = 8 – 12 + 6 = 2
Tetrahedra V = 4 E = 6 F = 4 V – E + F = 4 – 6 + 4 = 2
Octahedra V = 6 E = 12 F = 8 V – E + F = 6 – 8 + 12 = 2
At a glance • 50 generations • Population: 4000 ASTs • Generation #: 3600 (90% of population) • Maximum AST depth: 13 • Ramped half-and-half initialization • 3 non-terminals: +, -, * • 12 terminals: V, E, F, 1, 2, …, 9 • Crossover, no mutation
Genetic algorithms (GA) • Search a space of solution attempts (“individuals”) • Use natural selection to guide the search • Must have a fitness function that can evaluate any given individual • Individuals procreate by exchanging (recombining) “genetic material”
Example: SAT solving • Problem: Given a CNF formula P over n variables x1,…,xn, find a satisfying assignment • Search space: all n-bit strings • Fitness measure for a given individual b1 bn: # of satisfied clauses in P • Genetic operations: crossover and mutation
Crossover: a1 … aj-1|aj … an + b1 … bj-1|bj … bn a1 … aj-1| bj … bn b1 … bj-1| aj … an Mutation: 0 1 1 0 1 0 0 1 0 1 1 0 0 0 0 1
Generic GA algorithm Parameterized over: N, P, G • Construct a random initial population • Set i := 1 • If i > N then halt • Compute the fitness of each individual; if the fittest solves the problem, halt. • Create a new population: • Pick P – G individuals and copy them • Create G new individuals by repeated applications of genetic operations • Set i := i + 1 and go to step 3
Selection • How is an individual “picked” for reproduction or copying? • Main idea: the probability that an individual is selected should be proportional to the individual’s fitness • Many ways to ensure that. One method is tournament selection: • Pick 0 < k <= P individuals randomly • Select the fittest of the k • When k = 1: No selection pressure • When k = P: Too much selection pressure
Genetic Programming (GP) • An instance of the generic GA scheme • Individuals are now programs, i.e., syntactic objects • Search space is kept finite by bounding program size • Programs are represented as ASTs (abstract syntax trees)
Programs as ASTs if x > 0 then y := x * x else y := z + 1 Parsing if := := > + x y y 0 * x 1 x z
Program structure in GP • Programs are usually simple Herbrand terms, i.e., functional expressions • AST leaves are called terminals • Internal nodes are non-terminals • Non-terminals are function symbols (e.g. +) • Terminals are constants and variables • Terminals + non-terminals must be sufficient for expressing solutions
Viewing a functional AST as a “program” + * y x 2 The program has two “inputs”, x and y. Given specific values for these, it produces a unique result as output
AST Crossover Crossover pt 1 Crossover pt 2 + - + T4 * T3 T1 T2 T5 T6 Parents Children - + T4 * + T3 T1 T5 T6 T2
Initial population • Built randomly • Two methods for building a random AST: • Full method: All branches are equally long • Grow method: Different subtrees can have different sizes (but less than the maximum) • More usual: ramped half-and-half initialization: half of the trees are built with one method, the other half with the other method
Problem formulation • Can cast it as a standard symbolic regression problem • View F as a function of E and V, and search space of all rational functions of two variables (up to a max depth) • Error function: difference between actual # of faces and the result produced by the program • Optimization: minimize the error • Quick convergence
Another approach • Search space of all identities • Generated as follows: I T1 = T2 T L | T1 + T2 | T1 – T2 | T1 * T2 L V | E | F | 1 | 2 | … | 9 • Any other integer can be built from 1,…, 9 and the given non-terminals • Identity is not a non-terminal; it can only appear at the root of an AST
Details • Generate P identities randomly (using ramped half-and-half initialization) • Crossover on two identities S1 = S2 and T1 = T2: • Mate two random subterms Si and Tj from each identity, producing two new subterms Si’ and Tj’ • If either new term is deeper than the max depth, then use one of the original parents • Replace Si and Tj in the identities by Si’ and Tj’ • No mutation
Fitness • An identity is evaluated on a given triple of values for V, E, and F • Computing the fitness of an identity S = T: • For each of the k data triples ½: • If S = T holds for ½, then give the identity a point • Higher score, greater fitness • Maximum fitness: 9, minimum: 0
Problem • Trivially true identities can get perfect scores, e.g.: • V = V • 1 + 2 = 5 – 3 • E – E + E = E • Solution: negative triples, e.g.: • V = 0, E = 0, F = 1 • Trivial identities will hold for such negative triples, but plausible identities will not
Fitness computation • To evaluate an identity S = T: • For each of the k data triples p: • Allocate a point if S = T holds for p • Allocate a second point if S = T does not hold for the negative triple • Maximum score: 18, minimum: 0 • Also impose a penalty of b n/20 c points for an identity of length n (to discourage excessively long expressions)