This article explores the concept of program complexity, discussing the difference between hard and complex problems. It covers various examples and introduces the concepts of Halstead's metrics and McCabe's Cyclomatic Number.
Herbert G. Mayer, PSU CS, status 7/9/2011 • CS 410 Mastery in Programming • Chapter 3: Program and Language Complexity
Syllabus • Thoughts on Complexity • Hard to Understand Code? • Program Complexity • Complex vs. Hard • Halstead Program Metrics • McCabe Cyclomatic Number • Cyclomatic Number Samples • References
Thoughts on Complexity • Complexity used in this class: • Refers to the number of different paths of execution through a given program, dictated by possible flow of control … • Or expresses a degree of difficulty of expressing such an algorithm via a string of symbols, i.e. the source program; synonym: hard • Some hard to compute functions are easy to code and understand, once invented. E.g. Tarjan’s SCC algorithm, or Newton’s square-root method • Complexity, as used here, does not mean: • “intractable to compute”, such as NP-complete problems requiring too much compute power to ever terminate in human time • Complexity also does not mean: • “hard to understand”, as may be the case with obfuscated programming styles
Hard to Understand Code?
    #include <stdio.h>

    int a[ 1 ]; // just to have an array to index

    int p( char arg )
    { // p
        printf( "%c", arg );
        return 0;
    } //end p

    int main( )
    { // main
        a[ p( 'a' ) ] =
        a[ p( 'b' ) ] =
        a[ p( 'c' ) ] = a[ p( 'd' ) ];
        printf( "\n" );
        return 0;
    } //end main
Hard to Understand Code? • Output is: abcd • Is this correct? If not, what should the output be? • Is the assignment-statement rule respected: • to execute the right-hand side first? • Are other outputs feasible, i.e. correct according to C++ (or Java)?
Hard to Understand, Not Complex
    #include <stdio.h>
    #define MAX 7 // 7 redundant?

    int a[ MAX ] = { 0, 1, 2, 3, 4, 5, 6 };

    void p()
    { // p
        for( int i = 0; i < MAX; i++ ) {
            printf( " a[%d] = %d\n", i, a[ i ] );
        } //end for
        printf( "\n" );
    } //end p

    int main()
    { // main
        int x = 99;
        p();
        a[ x = 3 ] = a[ x = 5 ] = x = 6;
        p();
    } //end main
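Hard to Understand, Not Complex
• Output of the first call to p(): a[0] = 0, a[1] = 1, a[2] = 2, a[3] = 3, a[4] = 4, a[5] = 5, a[6] = 6
• Output of the second call to p(): a[0] = 0, a[1] = 1, a[2] = 2, a[3] = 6, a[4] = 4, a[5] = 6, a[6] = 6
• x in the end is 6 on [most] C++ run-time systems
• Note: the statement assigns to x three times without sequencing, which is undefined behavior in C and in C++ before C++17, so this result is not guaranteed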
Program Complexity • Some computable problems are hard, some NP-hard, some complex, some hard-to-understand! • Assuming an experienced designer and programmer: • Some problems are hard to solve; they are “complex” due to the amount of work • Others are hard, due to the elusiveness of a solution • Yet others are not solvable; non-computable functions, e.g. Halting Problem [10] • What is program complexity? • Is a large program complex, i.e. one with many lines of code (LOC)? • More complicated code? • Spaghetti code? • Recursive functions? • What unit-of-measure does complexity have? • Time to run? • Number of different paths through control-flow graph? • Space for memory locations needed to run? • Number of processors needed to solve computation? • Number of iterations for suitable solution? E.g. number of digits for π • Degree of “mental hardness” to identify a solution? E.g. in the chess game? • V(G) is a stab at a sample unit of complexity. But will it be universally acceptable?
Program Complexity • Programmatic solution for “chess” is hard or complex or both? • We can safely claim that chess is a hard problem • Yet the rules are simple and relatively few • And programs play it at the grand-master level, although the game itself has not been solved • Kasparov lost game 1 to “Deep Blue” in 1996 but won that match 4-2; Deep Blue won the 1997 rematch 3.5-2.5 [8] • Degree of difficulty for finding a solution quantifies complexity! • For example, solving Sudoku? • Some problems seem not hard, yet the number of special cases renders a solution virtually intractable • E.g. US tax code [9]; contains about 9800 different sections; ~75,000 pages • Could be simpler and fair, even equally applicable to all citizens • But instead is highly complex, due to “special cases” and requires experts to give definitive answers; has exceptions for individual tax payers! • Hence numerous CS attempts to formalize complexity, unit, computability • We cover 2 very briefly: Halstead’s and McCabe’s
Complex vs. Hard • Complex is to be interpreted as “Mathematically difficult to find a correct algorithm!” • E.g. find an algorithm to identify all strongly-connected components in a graph • Hard is to be interpreted as “Very much work to compute the correct solution” • E.g. compute the shortest tour through a Travelling Salesman’s n stopping points • Might take so long that we are no longer interested in the solution • Instead: use a heuristic that is provably no worse than x times the best solution • An incorrect solution, by contrast, is always easy to compute
Halstead Program Metrics • Measures a specific program’s complexity • Metrics developed by the late Maurice Halstead • To directly quantify complexity of any given source program • Solely from operators, operands used in source • Halstead introduced measures in 1977 • Early formal program complexity measures • [1], [2], [3] • Not formally derived, but postulated • Halstead metrics carry a strong element of arbitrariness • Lack scientific base; lack proof!
Halstead Program Metrics • Halstead's metrics count operators and operands in source code of program being analyzed • number of unique (distinct) operators (n1) • number of unique (distinct) operands (n2) • total number of operators (N1) • total number of operands (N2) • Number of unique operators and operands (n1 and n2) as well as the total number of operators and operands (N1 and N2) are calculated during lexical analysis of source program • Other Halstead measures are derived from these 4 units • but without proof or scientific derivation! • intuition of developer was used as the basis for deriving the measures • Halstead intended to provide formal proofs, but died before doing so
Halstead Program Metrics • Operands • Literals, AKA constants; e.g. 0, 1000, “hello” • User defined identifiers for values, AKA symbolic constants, e.g. MAX is an operand in: #define MAX 5 • Reserved keywords that denote value, e.g. NIL • Declarations like #define MAX 5 less obvious • Depending on language, some language-defined type specifiers are treated as operands, e.g. in C++ char, int, double
Halstead Program Metrics • Operators • Common arithmetic symbols, e.g. + - / * ^ % • Other arithmetic symbols, e.g. ( and ) • Symbols for boolean operations, e.g. > >= < <= != && || • Symbols for all kinds of operations, including cat for concatenation in some languages • Reserved keywords, e.g. or, or else, and, and then, xor • Function names, e.g. add( a, 8 ), sin( 45 ), sqrt( 3 ) • Reserved operations, e.g. try, catch, throw • Type qualifiers, e.g. const, volatile • Scope specifiers, e.g. extern, static • Note: "static" is an overloaded qualifier in C for scope and storage
Halstead Program Metrics • Operators that are control constructs: • if ( ... ) plus then-clause and optional else-clause • while ( ... ) • do ... • for( ; ; ) ... • catch() • return ... • switch ( ... ) { ... }
Halstead Program Metrics • Program Length N, Size n, Volume V: • Program length N is the sum of total number of operators and operands in the program analyzed: • N = N1 + N2 • Vocabulary size n is the sum of the number of unique operators and operands: • n = n1 + n2 • Program volume V is the information contents of the program: • V = N * log2 n
Halstead Program Metrics • Difficulty level D, AKA degree of error-proneness: • Level of difficulty D of program is proportional to number of unique operators n1 in program • And proportional to the total number of operands N2 • But with scale-factors applied to both • D is postulated to be: • D = ( n1 / 2 ) * ( N2 / n2 ) • Interestingly, total number of operators N1 is not part of formula for difficulty level D
Halstead Program Metrics • Program level L: • Program level L is inverse of error-proneness • i.e. a low level program is more prone to errors than a corresponding high level program for the same computable function • L = 1 / D
Halstead Program Metrics • Other measures, for you to elaborate in your paper • Effort to implement • Time to implement • Number of bugs delivered • Etc.
Cyclomatic Number • Goal of McCabe’s Cyclomatic Numbers: • To have a measure of source program complexity • To manage complexity, rather than dealing with an unknown • See [4], [6] • Builds on: • Graph theory • E.g. [7] Berge: “Graphs and Hypergraphs” • Fundamental units: • Graph G, not necessarily connected! • Number of edges e • Number of nodes n • Number of connected components p • i.e. if ( p > 1 ) then G is not connected
Cyclomatic Number V • Cyclomatic number V of a graph G is called V(G) • e number of edges • n number of nodes, AKA vertices in other literature • p number of connected components • then: • V(G) = e – n + 2 * p
Cyclomatic Number Samples • Sequence of 2 statements • e = 1 • n = 2 • p = 1 • V(G) = 1 – 2 + 2 * 1 = 1 • If Statement with Then- and Else- • e = 4 • n = 4 • p = 1 • V(G) = 4 – 4 + 2 * 1 = 2 • Sequence of 4 statements • e = 3 • n = 4 • p = 1 • V(G) = 3 – 4 + 2 * 1 = 1
Cyclomatic Number of While • While Loop • e = 3 • n = 3 • p = 1 • V(G) = 3 - 3 + 2 * 1 = 2
Cyclomatic Number of Program • Multiple-module program with no edges or vertices shared across modules • Main program = M • subroutine 1 = A() • subroutine 2 = B() • V(G) = V( M U A U B ) = V(M) + V(A) + V(B) • M: V(M) = 2 - 3 + 2 = 1 • A: V(A) = 4 - 4 + 2 = 2 • B: V(B) = 6 - 5 + 2 = 3 • V(G) = 12 - 12 + 2 * 3 = 6
References
[1] Halstead metrics: http://www.verifysoft.com/en_halstead_metrics.html
[2] Halstead’s book: Maurice Halstead, “Elements of Software Science”, Elsevier, 1977, ISBN 0444002057
[3] Detail on Halstead: http://www.horst-zuse.homepage.t-online.de/halstead.html
[4] Wiki page on Cyclomatic numbers: http://en.wikipedia.org/wiki/Cyclomatic_complexity
[5] Program complexity: http://www.acis.pamplin.vt.edu/faculty/tegarden/wrk-pap/DSS.PDF
[6] Thomas J. McCabe, “A Complexity Measure”, IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976
[7] C. Berge, “Graphs and Hypergraphs”, North-Holland, Amsterdam, 1973
[8] Deep Blue info: http://www.research.ibm.com/deepblue/
[9] Tax code info: http://www.fourmilab.ch/ustax/ustax.html
[10] Halting Problem: http://www.comp.nus.edu.sg/~cs5234/FAQ/halt.html
[11] Robert Tarjan, “Depth-First Search and Linear Graph Algorithms”, SIAM J. Computing, Vol. 1, No. 2, June 1972