Growth of Functions CS/APMA 202 Rosen section 2.2 Aaron Bloomfield
How does one measure algorithms • We can time how long it takes a computer • What if the computer is doing other things? • And what happens if you get a faster computer? • A 3 GHz Windows machine will run an algorithm at a different speed than a 3 GHz Macintosh • So that idea didn’t work out well…
How does one measure algorithms • We can measure how many machine instructions an algorithm takes • Different CPUs will require different numbers of machine instructions for the same algorithm • So that idea didn’t work out well…
How does one measure algorithms • We can loosely define a “step” as a single computer operation • A comparison, an assignment, etc. • Regardless of how many machine instructions it translates into • This allows us to put algorithms into broad categories of efficiency • An efficient algorithm on a slow computer will always beat an inefficient algorithm on a fast computer
Bubble sort running time • The bubble step takes (n²−n)/2 “steps” • Let’s say the bubble sort takes the following number of steps on specific CPUs: • Intel Pentium IV CPU: 58*(n²−n)/2 • Motorola CPU: 84.4*(n²−n)/2 • Intel Pentium V CPU: 44*(n²−n)/2 • Notice that each has an n² term • As n increases, the other terms will drop out
Bubble sort running time • This leaves us with: • Intel Pentium IV CPU: 29n² • Motorola CPU: 42.2n² • Intel Pentium V CPU: 22n² • As processors change, the constants will always change • The exponent on n will not • Thus, we don’t care about the constants
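To make the step-counting concrete, here is a minimal Python sketch (my own illustration, not from the original slides) of bubble sort instrumented to count comparisons; the count comes out to exactly (n²−n)/2:

    def bubble_sort(a):
        """Sort a in place, returning the number of comparison 'steps'."""
        steps = 0
        n = len(a)
        for i in range(n - 1):
            for j in range(n - 1 - i):
                steps += 1                      # one comparison = one "step"
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return steps

    for n in (10, 100, 1000):
        assert bubble_sort(list(range(n, 0, -1))) == (n * n - n) // 2

The CPU-specific constants (58, 84.4, 44) would multiply this count; they never change the n² shape of the curve.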
An aside: inequalities • If you have an inequality you need to show: x < y • You can replace the lesser side with something greater: x+1 < y • If you can still show this to be true, then the original inequality is true • Consider showing that 15 < 20 • You can replace 15 with 16, and then show that 16 < 20. Because 15 < 16 and 16 < 20, then 15 < 20
An aside: inequalities • If you have an inequality you need to show: x < y • You can replace the greater side with something lesser: x < y−1 • If you can still show this to be true, then the original inequality is true • Consider showing that 15 < 20 • You can replace 20 with 19, and then show that 15 < 19. Because 15 < 19 and 19 < 20, then 15 < 20
An aside: inequalities • What if you do such a replacement and can’t show anything? • Then you can’t say anything about the original inequality • Consider showing that 15 < 20 • You can replace 20 with 10 • But you can’t show that 15 < 10 • So you can’t say anything one way or the other about the original inequality
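Both replacement rules are really just the transitivity of <, which a short LaTeX rendering makes explicit (my summary, not one of the slides):

    \[
    x \le x' \ \text{and}\ x' < y \;\Longrightarrow\; x < y
    \qquad\qquad
    x < y' \ \text{and}\ y' \le y \;\Longrightarrow\; x < y
    \]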
Quick survey • I felt I understand running times and inequality manipulation… • Very well • With some review, I’ll be good • Not really • Not at all
The 2002 Ig Nobel Prizes • Biology: “Courtship behavior of ostriches towards humans under farming conditions in Britain” • Physics: “Demonstration of the exponential decay law using beer froth” • Interdisciplinary: a comprehensive study of human belly button lint • Chemistry: creating a four-legged periodic table • Mathematics: “Estimation of the surface area of African elephants” • Literature: “The effects of pre-existing inappropriate highlighting on reading comprehension” • Peace: for creating Bow-lingual, a computerized dog-to-human translation device • Hygiene: for creating a washing machine for cats and dogs • Economics: Enron et al. for applying imaginary numbers to the business world • Medicine: “Scrotal asymmetry in man in ancient sculpture”
Review of last time • Searches • Linear: n steps • Binary: log₂ n steps • Binary search is about as fast as you can get • Sorts • Bubble: n² steps • Insertion: n² steps • There are other, more efficient, sorting techniques • In principle, the fastest are heap sort, quick sort, and merge sort • These each take n·log₂ n steps • In practice, quick sort is the fastest, followed by merge sort
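For reference, here is a minimal Python sketch of the two searches (not part of the original slides; the function names are my own). Binary search halves the remaining range on every step, which is where the log₂ n comes from:

    def linear_search(a, target):
        """O(n): examine every element until a match is found."""
        for i, v in enumerate(a):
            if v == target:
                return i
        return -1

    def binary_search(a, target):
        """O(log n): a must already be sorted."""
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] == target:
                return mid
            elif a[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1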
Big-Oh notation • Let b(x) be the bubble sort algorithm • We say b(x) is O(n²) • This is read as “b(x) is big-oh of n²” • This means that as the input size increases, the running time of the bubble sort will increase in proportion to the square of the input size • In other words, by some constant times n² • Let l(x) be the linear (or sequential) search algorithm • We say l(x) is O(n) • Meaning the running time of the linear search increases in direct proportion to the input size
Big-Oh notation • Consider: b(x) is O(n²) • That means that b(x)’s running time is less than (or equal to) some constant times n² • Consider: l(x) is O(n) • That means that l(x)’s running time is less than (or equal to) some constant times n
Big-Oh proofs • Show that f(x) = x² + 2x + 1 is O(x²) • In other words, show that x² + 2x + 1 ≤ c·x² • Where c is some constant • For input sizes greater than some x • We know that 2x² ≥ 2x whenever x ≥ 1 • And we know that x² ≥ 1 whenever x ≥ 1 • So we replace 2x + 1 with the larger 3x² • We then end up with x² + 3x² = 4x² • This yields x² + 2x + 1 ≤ 4x² ≤ c·x² • Thus, for input sizes 1 or greater, when the constant is 4 or greater, f(x) is O(x²) • We could have chosen different values for c and x
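The same argument, written as a single chain of inequalities:

    \[
    x^2 + 2x + 1 \;\le\; x^2 + 2x^2 + x^2 \;=\; 4x^2
    \qquad \text{for all } x \ge 1,
    \]

so c = 4 and k = 1 witness that f(x) is O(x²).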
Rosen, section 2.2, question 2(b) • Show that f(x) = x² + 1000 is O(x²) • In other words, show that x² + 1000 ≤ c·x² • We know that x² > 1000 whenever x > 31 • Thus, we replace 1000 with x² • This yields x² + 1000 ≤ 2x² ≤ c·x² • Thus, f(x) is O(x²) for all x > 31 when c ≥ 2
Rosen, section 2.2, question 1(a) • Show that f(x) = 3x + 7 is O(x) • In other words, show that 3x + 7 ≤ c·x • We know that x > 7 whenever x > 7 • Duh! • So we replace 7 with x • This yields 3x + 7 ≤ 4x ≤ c·x • Thus, f(x) is O(x) for all x > 7 when c ≥ 4
Quick survey • I felt I understand (more or less) Big-Oh proofs… • Very well • With some review, I’ll be good • Not really • Not at all
A variant of the last question • Show that f(x) = 3x + 7 is O(x²) • In other words, show that 3x + 7 ≤ c·x² • We know that x > 7 whenever x > 7 • Duh! • So we replace 7 with x • This yields 3x + 7 ≤ 4x ≤ c·x² • This also holds for x > 7 when c ≥ 1, since 4x ≤ x² whenever x ≥ 4 • Thus, f(x) is O(x²) for all x > 7 when c ≥ 1
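A quick numeric sanity check of all four (c, k) pairs above (a sketch of my own; it samples only finitely many x, so it is evidence for the proofs, not a replacement for them):

    # Check f(x) <= c * g(x) for sampled x > k (evidence only, not a proof).
    def check(f, g, c, k, upto=10_000):
        return all(f(x) <= c * g(x) for x in range(k + 1, upto))

    assert check(lambda x: x*x + 2*x + 1, lambda x: x*x, c=4, k=1)   # O(x²)
    assert check(lambda x: x*x + 1000,    lambda x: x*x, c=2, k=31)  # O(x²)
    assert check(lambda x: 3*x + 7,       lambda x: x,   c=4, k=7)   # O(x)
    assert check(lambda x: 3*x + 7,       lambda x: x*x, c=1, k=7)   # O(x²)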
What that means • If a function is O(x) • Then it is also O(x²) • And it is also O(x³) • Meaning an O(x) function will grow at a rate slower than or equal to x, x², x³, etc.
Function growth rates • For input size n = 1000:

    O(1)         1
    O(log n)     ≈ 10
    O(n)         10³
    O(n log n)   ≈ 10⁴
    O(n²)        10⁶
    O(n³)        10⁹
    O(n⁴)        10¹²
    O(nᶜ)        10³ᶜ (c is a constant)
    2ⁿ           ≈ 10³⁰¹ (many interesting problems fall into this category)
    n!           ≈ 10²⁵⁶⁸
    nⁿ           10³⁰⁰⁰
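The table’s entries can be reproduced in a few lines of Python (a sketch of my own; the printed values round to the orders of magnitude shown above):

    import math

    n = 1000
    print(math.log2(n))                        # ≈ 10
    print(n * math.log2(n))                    # ≈ 10⁴
    print(n**2, n**3, n**4)                    # 10⁶, 10⁹, 10¹²
    print(n * math.log10(2))                   # ≈ 301, so 2ⁿ ≈ 10³⁰¹
    print(math.lgamma(n + 1) / math.log(10))   # ≈ 2568, so n! ≈ 10²⁵⁶⁸
    print(n * math.log10(n))                   # = 3000, so nⁿ = 10³⁰⁰⁰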
Function growth rates • (Figure: the growth rates above, plotted on a logarithmic scale)
Integer factorization • Factoring a composite number into its component primes is O(2ⁿ), where n is the number of bits • Thus, if we choose 2048-bit numbers (as in RSA keys), it takes 2²⁰⁴⁸ steps • That’s about 10⁶¹⁷ steps!
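To see why the running time is written in terms of the number of bits n, here is naive trial division (an illustration only, and far from the best known factoring method): for an n-bit number N, the loop can run about √N ≈ 2^(n/2) times, which is still exponential in n:

    def trial_division(N):
        """Factor N by trial division: roughly sqrt(N) = 2^(n/2)
        iterations for an n-bit N, i.e. exponential in n."""
        factors = []
        d = 2
        while d * d <= N:
            while N % d == 0:
                factors.append(d)
                N //= d
            d += 1
        if N > 1:
            factors.append(N)   # whatever remains is prime
        return factors

    print(trial_division(8051))   # [83, 97]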
Formal Big-Oh definition • Let f and g be functions. We say that f(x) is O(g(x)) if there are constants C and k such that |f(x)| ≤ C·|g(x)| whenever x > k
A note on Big-Oh notation • Assume that a function f(x) is O(g(x)) • It is sometimes written as f(x) = O(g(x)) • However, this is not a proper equality! • It’s really saying that |f(x)| ≤ C·|g(x)| • In this class, we will write it as “f(x) is O(g(x))”
The growth of combinations of functions • Ignore this part
Big-omega and Big-theta • Ignore this part
NP Completeness • Not in the textbook – this is additional material • A full discussion of NP-completeness takes 3 hours for somebody who already has a CS degree • We are going to do the 15-minute version of it • Any term of the form nᶜ, where c is a constant, is a polynomial • Thus, any function that is O(nᶜ) is a polynomial-time function • 2ⁿ, n!, and nⁿ are not polynomial functions
Satisfiability • Consider a Boolean expression of the form: • (x₁ ∨ x₂ ∨ x₃) ∧ (x₂ ∨ x₃ ∨ x₄) ∧ (x₁ ∨ x₄ ∨ x₅) • This is a conjunction of disjunctions • Is such an expression satisfiable? • In other words, can you assign truth values to all the xᵢ’s such that the expression is true?
Satisfiability • If given a solution, it is easy to check whether it works • Plug in the values – this can be done quickly, even by hand • However, there is no known efficient way to find such a solution • The only known definitive way is to try all possible values for the n Boolean variables • That means this brute-force approach is O(2ⁿ)! • Thus it is not a polynomial-time approach • NP stands for “nondeterministic polynomial time” • Cook’s theorem (1971) states that SAT is NP-complete • There still may be an efficient way to solve it, though!
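A minimal brute-force SAT sketch (my own illustration; the clause encoding and function names are hypothetical): checking one assignment is fast, but finding one this way may take up to 2ⁿ tries:

    from itertools import product

    # A clause is a list of literals: +i means xᵢ, -i would mean ¬xᵢ.
    # The expression from the earlier slide, in that encoding:
    formula = [[1, 2, 3], [2, 3, 4], [1, 4, 5]]

    def satisfied(formula, assignment):
        """Fast: verify one assignment against every clause."""
        return all(any(assignment[abs(l)] == (l > 0) for l in clause)
                   for clause in formula)

    def brute_force_sat(formula, n):
        """Slow: try up to 2^n assignments."""
        for bits in product([False, True], repeat=n):
            assignment = dict(enumerate(bits, start=1))
            if satisfied(formula, assignment):
                return assignment
        return None

    print(brute_force_sat(formula, 5))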
NP Completeness • There are hundreds of NP-complete problems • It has been shown that if you can solve one of them efficiently, then you can solve them all • Example: the traveling salesman problem • Given a number of cities and the costs of traveling from any city to any other city, what is the cheapest round-trip route that visits each city exactly once and then returns to the starting city? • Not all problems with O(2ⁿ) algorithms are NP-complete • In particular, integer factorization (also O(2ⁿ)) is not thought to be NP-complete
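A hedged sketch of the brute-force approach to the traveling salesman problem (again my own illustration, with a made-up cost matrix): it examines all (n−1)! round trips, which is what makes it intractable for large n:

    from itertools import permutations

    def tsp_brute_force(cost):
        """cost[i][j] = cost of traveling from city i to city j.
        Tries all (n-1)! round trips that start and end at city 0."""
        n = len(cost)
        best_trip, best_cost = None, float("inf")
        for order in permutations(range(1, n)):
            trip = (0,) + order + (0,)
            total = sum(cost[a][b] for a, b in zip(trip, trip[1:]))
            if total < best_cost:
                best_trip, best_cost = trip, total
        return best_trip, best_cost

    # Hypothetical 4-city cost matrix:
    cost = [[ 0,  2,  9, 10],
            [ 1,  0,  6,  4],
            [15,  7,  0,  8],
            [ 6,  3, 12,  0]]
    print(tsp_brute_force(cost))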
NP Completeness • It is “widely believed” that there is no efficient solution to NP-complete problems • In other words, just about everybody believes it, though nobody has proven it • If you could solve an NP-complete problem in polynomial time, you would be showing that P = NP • If this were possible, it would be like proving that Newton’s or Einstein’s laws of physics were wrong • In summary: • NP-complete problems are very difficult to solve, but their solutions are easy to check • It is believed that there is no efficient way to solve them
Quick survey • I sorta kinda get the hang of NP completeness • Very well • With some review, I’ll be good • Not really • Not at all
Star Wars: Episode III trailer • No, really!
Quick survey • I felt I understood the material in this slide set… • Very well • With some review, I’ll be good • Not really • Not at all
Quick survey • The pace of the lecture for this slide set was… • Fast • About right • A little slow • Too slow
Quick survey • How interesting was the material in this slide set? Be honest! • Wow! That was SOOOOOO cool! • Somewhat interesting • Rather boring • Zzzzzzzzzzz