
Getting Started in Program Analysis Research: Outline


Presentation Transcript


  1. Getting Started in Program Analysis Research: Outline • Background and useful skills • Ana • Using and developing analysis • Mary Lou • Identifying and building infrastructure • Lori • Evaluating your analysis • Ana

  2. Ana Milanova • I am from Bulgaria • National High School for Math and Science • American University in Bulgaria, 1997 • I have a degree in Business Administration • Rutgers University, PhD in CS, 2003 • Now Assistant Professor at RPI • Research: program analysis for software tools • Family • Husband Tony • Katarina, 5 and Petar, 2

  3. Program Analysis: Useful Background and Skills

  4. Program Analysis • Static program analysis • Analyzes the source code of the program • Run-time behavior properties without running the program • E.g., “The object values that flow to reference variable x are only of classes A and B, but not C.” • Static analyses are conservative: consider all possible run-time behaviors of the program

  5. Program Analysis • Dynamic program analysis • Analyzes a set of program executions • Reasons about run-time behavior properties over observed executions • E.g., “The object values that flowed to reference variable x during observed executions were only of classes A and B, but not C.” • Dynamic analyses are incomplete: consider only behaviors over particular executions • Goal: combine with static analysis
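The dynamic-analysis example on this slide can be sketched in a few lines. This is only an illustration: the classes `A`, `B`, `C`, the hook `record`, and the toy `program` are hypothetical names, not part of the talk. The sketch records which classes reach `x` over two observed runs and shows why the result is incomplete.

```python
class A: pass
class B: pass
class C: pass

observed = set()  # classes seen flowing to x during the observed executions

def record(value):
    """Hypothetical instrumentation hook: log the class of a value assigned to x."""
    observed.add(type(value).__name__)
    return value

def program(n):
    # Toy program under analysis: x may hold an A, a B, or a C.
    if n == 0:
        x = record(A())
    elif n == 1:
        x = record(B())
    else:
        x = record(C())
    return x

# Only two executions are observed; the path that creates a C is never run.
for test_input in (0, 1):
    program(test_input)

print(sorted(observed))  # ['A', 'B']
```

The report "only A and B" holds for the observed executions, but an input such as `2` would make a `C` flow to `x`, which is exactly the incompleteness the slide mentions.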

  6. Uses of Static Program Analysis • Compilers – traditional application domain • Enables optimizing transformation • Software engineering tools • Static debugging, verification, security • Uncover difficult errors and security flaws • Testing • Evaluate and improve test suites • Software understanding • Calling structure • Complex dependences • Change impacts

  7. Uses of Program Analysis • Analysis for compiler optimization is different from analysis for software tools • Different requirements, different success criteria (more later…)

  8. Static Analysis Methodologies • Data-flow analysis • Constraint-based program analysis • Abstract interpretation • Type and effect systems • Model checking

  9. Example: Data-flow Analysis • Flow facts • Information that we are propagating • E.g., set of definitions {(i,1), (i,4), (i,6)…} • Transfer functions • The effect of a statement on the incoming flow facts • E.g., statement i=11 at 6 “kills” the incoming definition (i,4), and “generates” definition (i,6) • Example CFG (statements with the reaching-definition sets annotated on the slide): 1. i=11 read x,y {(i,1)} 2. if x<y {(i,1)} {(i,1)} 3. p(i) 4. i=j+5 {(i,4)} 5. p(i) {(i,4)} 6. i=11 {(i,1)} {(i,6)} 7. i=i*i
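A transfer function for this reaching-definitions example might look like the sketch below. The encoding of a definition as a `(variable, statement_number)` pair follows the slide's notation; the function name `transfer` is an assumption made for illustration.

```python
# A definition is a (variable, statement_number) pair, e.g. ("i", 4) for (i,4).

def transfer(in_facts, var, stmt):
    """Reaching-definitions transfer function for the statement `stmt: var = ...`."""
    kill = {d for d in in_facts if d[0] == var}  # earlier definitions of var
    gen = {(var, stmt)}                          # the definition created here
    return (in_facts - kill) | gen

# Statement 6 (i=11) applied to the incoming fact shown on the slide.
out_6 = transfer({("i", 1)}, "i", 6)
print(out_6)  # {('i', 6)}

# The same statement would also kill (i,4) if that definition reached it.
out_6_alt = transfer({("i", 1), ("i", 4)}, "i", 6)
print(out_6_alt)  # {('i', 6)}
```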

  10. Theory • Data-flow frameworks • Control-flow graph CFG • Space of flow facts L • Space of transfer functions F • Certain properties of L and F allow a general solution procedure • Fixed-point iteration • Termination: the iterative computation terminates • Safety (correctness, soundness): the solution is conservative • For most problems the analysis produces “noise”
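Fixed-point iteration can be sketched with a worklist, here solving reaching definitions for the earlier data-flow example. The CFG shape (statement 2 branching to 3 and 6, both branches rejoining at 7) is an assumption read off the slide, and only definitions of `i` are tracked; `succ`, `defines`, and `solve` are illustrative names.

```python
from collections import deque

# Assumed CFG for the seven-statement example: 2 branches to 3 and 6,
# and both branches rejoin at 7.
succ = {1: [2], 2: [3, 6], 3: [4], 4: [5], 5: [7], 6: [7], 7: []}
pred = {n: [] for n in succ}
for n, targets in succ.items():
    for t in targets:
        pred[t].append(n)

# Statements that define i (other variables are ignored in this sketch).
defines = {1: "i", 4: "i", 6: "i", 7: "i"}

def transfer(n, in_facts):
    var = defines.get(n)
    if var is None:
        return set(in_facts)
    return {d for d in in_facts if d[0] != var} | {(var, n)}

def solve():
    in_sets = {n: set() for n in succ}
    out_sets = {n: set() for n in succ}
    worklist = deque(succ)
    while worklist:
        n = worklist.popleft()
        # Merge: a definition reaches n if it reaches the end of any predecessor.
        in_sets[n] = set().union(*(out_sets[p] for p in pred[n]))
        new_out = transfer(n, in_sets[n])
        if new_out != out_sets[n]:
            out_sets[n] = new_out
            worklist.extend(succ[n])  # successors must be recomputed
    return in_sets, out_sets

in_sets, out_sets = solve()
print(sorted(in_sets[7]))  # [('i', 4), ('i', 6)]
```

The loop terminates because the fact sets only grow and are drawn from a finite set of definitions, which is the kind of property of L and F the slide refers to.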

  11. Theory and Practice • Analysis cost – how much time, memory • Analysis precision – how much noise • E.g., for the call a.m(): a more precise analysis reports a: {B}, and a less precise analysis reports a: {A,B,C} • Typically, there is a tradeoff between cost and precision! • In practice, we need to analyze very large programs, 100K LOC, even 1M LOC

  12. Theory and Practice • Approximations - introduce noise • make the CFG “smaller” • make the set of flow facts “smaller” • make the transfer functions converge faster • Approximations are necessary • But be careful: different approximations for different analyses

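  13. Standard Approximations • Flow-sensitive vs. flow-insensitive • Program: x = true; y = false; x = y; • Flow-sensitive (facts after each statement): after x = true: x: {true} • after y = false: x: {true}, y: {false} • after x = y: x: {false}, y: {false} • Flow-insensitive (one solution for the whole program): x: {true,false}, y: {false}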
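The difference between the two approximations can be reproduced on the three-statement program from this slide. The sketch below is a toy value-set analysis under an assumed encoding (each assignment is a `(target, source)` pair); it is not taken from the talk.

```python
# The slide's program as (target, source) assignments; a source is either a
# boolean constant or the name of another variable.
program = [("x", True), ("y", False), ("x", "y")]

def flow_sensitive(stmts):
    """One value set per variable at each program point; assignments overwrite."""
    state, history = {}, []
    for target, source in stmts:
        values = state.get(source, set()) if isinstance(source, str) else {source}
        state = {**state, target: set(values)}  # replace the old facts for target
        history.append(state)
    return history

def flow_insensitive(stmts):
    """Statement order is ignored: one value set per variable for the whole program."""
    state = {t: set() for t, _ in stmts}
    changed = True
    while changed:
        changed = False
        for target, source in stmts:
            values = state.get(source, set()) if isinstance(source, str) else {source}
            if not values <= state[target]:
                state[target] |= values
                changed = True
    return state

sensitive = flow_sensitive(program)
insensitive = flow_insensitive(program)
print(sensitive[-1])   # x holds only False after the last statement
print(insensitive)     # x may hold True or False
```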

  14. Standard Approximations • Context-sensitive vs. context-insensitive • Program: A(bool X) { this.f = X; } a = new A(true); b = new A(false); • Context-insensitive (merged flow): a.f = true/false, b.f = true/false, so a.f: {true,false}, b.f: {true,false} • Context-sensitive: a.f = true, b.f = false, so a.f: {true}, b.f: {false}
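A minimal sketch of the same contrast, assuming the constructor body is just `this.f = X` as on the slide. The function names and the `(receiver, argument)` encoding of call sites are illustrative assumptions.

```python
# Call sites from the slide: (receiver variable, constructor argument).
calls = [("a", True), ("b", False)]

def context_insensitive(call_sites):
    # One abstraction of parameter X is shared by every call to A(...),
    # so the arguments of all call sites are merged before reaching this.f.
    x_values = {arg for _, arg in call_sites}
    return {f"{recv}.f": set(x_values) for recv, _ in call_sites}

def context_sensitive(call_sites):
    # The constructor is analyzed once per call site (the context),
    # so X keeps a separate value set for each call.
    return {f"{recv}.f": {arg} for recv, arg in call_sites}

merged = context_insensitive(calls)
separate = context_sensitive(calls)
print(merged)    # both fields may be True or False
print(separate)  # a.f is only True, b.f is only False
```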

  15. Useful Background and Skills • Higher-level undergraduate or graduate courses on: • Programming Languages, Compilers, Algorithms, Logic, Software Engineering, Architecture • Analytical and programming skills • Step 1: Design a program analysis algorithm • Understand your target language (e.g., Java and C++, C) • Step 2: Implement the analysis algorithm • Understand the language(s) of the infrastructure • Step 3: Evaluate the analysis algorithm

  16. Useful Resources • Books (my personal list) • “Compilers: Principles, Techniques and Tools” by Aho, Sethi, Ullman, Ch. 10 • An introduction to data-flow analysis • “Principles of Program Analysis” by Nielson, Nielson, Hankin • An excellent reference for advanced students • “Model Checking” by Clarke, Grumberg, Peled • Course material on the web • Classes taught by professors • My class (there are better ones, of course): www.cs.rpi.edu/~milanova/csci6961/lectures/

  17. Using and Developing Program Analysis Mary Lou Soffa University of Virginia

  18. About Mary Lou Soffa Confused about what I wanted to be • Ph.D. programs: • Mathematics, Sociology; Philosophy; Environmental Acoustics: disenchanted • Found what I really loved – computer science • After 25+ years at Pitt, moved to UVA • Small farm – grow “crops”; love my tractor • Passion – increasing the participation of women and minorities in computer science • Professional achievement – 24 Ph.D. students; ½ are women.

  19. Program analysis • How to apply program analysis in your research • What are the questions, and what do you have to do

  20. Solve a problem • Program behavior: static or dynamic • Determine information needed • What parts of program are involved • Develop appropriate representation • Develop analysis • Develop algorithm

  21. Have a goal – program code • Problem • Improve performance • Understand program • Find errors • Locate cause of errors • Need to collect information about the program that helps you infer properties of program • Static or dynamic code

  22. Determine information needed • What questions are you asking • What do you need to gather to answer questions • Examples: • Statements needed to compute an expression • Values that are always constant at a particular program point • Locations of dead statements • Branches that are correlated

  23. Example: redundancy • Remove redundancies with goal of improving performance • Redundant expressions • Redundant loads • Redundant stores • Dead code • Static: remove redundant expressions from program representation

  24. Redundant expressions • Does the value need to be computed for correct semantics? • X := A * B; F := C + E; C := C + 1; if (cond) then R := A * B; S := C + E else X := A * B; A := 6 end if; G := A * B

  25. What parts of program involved • Given information you need, what parts of program are involved • Examples: • branches and statements that change values in conditional • all possible execution paths • Array definitions and uses • Types • Loops

  26. Example: Redundant expressions • Expressions • Definitions • Control flow among definitions and expressions • Program paths

  27. Program representation • Program representation that enables collection of information • Granularity • Source, intermediate, binary • Issues: how to get representation from another representation

  28. Example: redundant expressions • Want to know how expressions flow • Is the value of an expression same as when expression used again • Need control flow graph with statements in nodes – intermediate level • X := A + B

  29. Available Expressions • Control flow graph • Entry block: X := A * B; F := C + E; C := C + 1 • Then branch: R := A * B; S := C + E • Else branch: X := A * B; A := 6 • Join block: G := A * B

  30. Formulate analysis over representation • How to gather information from representation • How many analyses • Direction of flow of analysis • Along all paths or any path • Local solution • Global solution

  31. Example: Redundant expressions • Local - basic block – single entry/exit • What expressions are generated • What expressions are “killed” by a definition • Global Flow over flow graph • Forward flow • Must be true on all paths

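  32. Redundant Expressions • Control flow graph with available expressions • Entry block: X := A * B; F := C + E; C := C + 1, available at exit: {A * B} • Then branch, available at entry: {A * B}: R := A * B; S := C + E, available at exit: {A * B, C + E} • Else branch, available at entry: {A * B}: X := A * B; A := 6 • Join block: G := A * B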

  33. Develop analyses • Data flow equations – use data flow framework • Algorithm • Preciseness • Expense

  34. Data flow equations • Gen(B) = all expressions • Kill(B) = all definitions – kill all incoming available expressions • Out(B) = Gen(B) ∪ (In(B) – Kill(B)) • In(B) = ∩ Out(j), over all predecessors j of B
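These equations can be exercised on the redundant-expressions example from the earlier slides. The sketch below assumes a four-block CFG (`B1` entry, `B2` then branch, `B3` else branch, `B4` join); the block names, the tuple encoding of expressions, and the helper names are all illustrative assumptions.

```python
# Statements are (target, expression); an expression is (left, op, right), or
# None when the right-hand side is a constant such as "A := 6".
blocks = {
    "B1": [("X", ("A", "*", "B")), ("F", ("C", "+", "E")), ("C", ("C", "+", "1"))],
    "B2": [("R", ("A", "*", "B")), ("S", ("C", "+", "E"))],
    "B3": [("X", ("A", "*", "B")), ("A", None)],
    "B4": [("G", ("A", "*", "B"))],
}
preds = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}

universe = {e for stmts in blocks.values() for _, e in stmts if e is not None}

def uses(expr, var):
    return var in (expr[0], expr[2])

def gen_kill(stmts):
    """Gen: expressions computed in the block and not killed afterwards.
    Kill: expressions with an operand that the block redefines."""
    gen, kill = set(), set()
    for target, expr in stmts:
        if expr is not None:
            gen.add(expr)
        killed = {e for e in universe if uses(e, target)}
        gen -= killed
        kill |= killed
    return gen, kill

def available_expressions():
    summary = {b: gen_kill(s) for b, s in blocks.items()}
    out = {b: set(universe) for b in blocks}  # optimistic start for a "must" analysis
    in_ = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            ps = preds[b]
            # In(B): intersection over predecessors; nothing is available at entry.
            in_[b] = set.intersection(*(out[p] for p in ps)) if ps else set()
            gen, kill = summary[b]
            new_out = gen | (in_[b] - kill)
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_, out

in_sets, out_sets = available_expressions()
print(in_sets["B2"])  # {('A', '*', 'B')}
print(in_sets["B4"])  # set()
```

Under these assumptions `A * B` is available on entry to both branches, so `R := A * B` and the second `X := A * B` are redundant, while `G := A * B` is not, because `A := 6` kills the expression on one incoming path.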

  35. Dynamic Optimization • Static optimizations • Apply before execution • Dynamic optimizations • Apply during execution – redundant expressions • Binary code • Program traces

  36. Example code and its control flow graph • 1. A = 4 2. T1 = A*B 3. L1: T2 = T1/C 4. If T2 < W go to L2 5. M = T1 * K 6. T3 = M + 1 7. L2: H = I 8. M = T3 - H 9. If T3 > 0 go to L3 10. Go to L1 11. L3: halt • Control flow graph over basic blocks B1, B2, B3, B4, B5, B6
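Splitting this listing into basic blocks can be sketched with the classic leader rule: the first instruction, every jump target, and every instruction following a jump starts a block. The tuple encoding below is an assumption, and line 9 is taken to test `T3 > 0`. The sketch does not claim which block on the slide carries which label.

```python
# Each instruction is (label or None, text, jump target label or None).
code = [
    (None, "A = 4", None),
    (None, "T1 = A*B", None),
    ("L1", "T2 = T1/C", None),
    (None, "if T2 < W go to L2", "L2"),
    (None, "M = T1*K", None),
    (None, "T3 = M + 1", None),
    ("L2", "H = I", None),
    (None, "M = T3 - H", None),
    (None, "if T3 > 0 go to L3", "L3"),
    (None, "go to L1", "L1"),
    ("L3", "halt", None),
]

def basic_blocks(instrs):
    """Return basic blocks as lists of 1-based statement numbers."""
    targets = {t for _, _, t in instrs if t is not None}
    leaders = {0}                                  # the first instruction
    for i, (label, _, target) in enumerate(instrs):
        if label in targets:
            leaders.add(i)                         # a jump target starts a block
        if target is not None and i + 1 < len(instrs):
            leaders.add(i + 1)                     # so does the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s + 1, e + 1)) for s, e in zip(starts, ends)]

partition = basic_blocks(code)
print(partition)
# [[1, 2], [3, 4], [5, 6], [7, 8, 9], [10], [11]]
```

The partition has six blocks, which matches the number of blocks drawn on the slide.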

  37. Program Trace Binary code A = 4 T1 = A*B T2 = T1/C If T2 !< W jump out H = I M = T3 - H If T3 > 0 go to L3 T2 = T1/C If T2 !< W jump out M = T1 * K T3 = M + 1 H = I M = T3 - H halt

  38. Dynamic optimization • Note: single entry; multiple exits • No loops • Need a representation: bring the trace up a level from binary code
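Because a trace has a single entry and no loops, redundant expressions can be found in one forward pass, with no fixed-point iteration. The sketch below encodes the trace from the previous slide under assumed conventions (branch exits and simple copies carry no expression). It illustrates the idea only; it is not the algorithm developed in the talk.

```python
# Trace statements as (target, expression); expression is (left, op, right) or
# None. Branch exits have target None and do not change any variable.
trace = [
    ("A", None),                 # A = 4
    ("T1", ("A", "*", "B")),
    ("T2", ("T1", "/", "C")),
    (None, None),                # if T2 !< W jump out
    ("H", None),                 # H = I
    ("M", ("T3", "-", "H")),
    (None, None),                # if T3 > 0 go to L3
    ("T2", ("T1", "/", "C")),
    (None, None),                # if T2 !< W jump out
    ("M", ("T1", "*", "K")),
    ("T3", ("M", "+", "1")),
    ("H", None),                 # H = I
    ("M", ("T3", "-", "H")),
]

def redundant_positions(stmts):
    """Return 1-based positions whose expression is already available."""
    available = {}  # expression -> variable currently holding its value
    redundant = []
    for pos, (target, expr) in enumerate(stmts, start=1):
        if expr is not None and expr in available:
            redundant.append(pos)
        if target is None:
            continue
        # Redefining target invalidates expressions that use it and any
        # value that was stored in it.
        available = {e: v for e, v in available.items()
                     if target not in (e[0], e[2]) and v != target}
        if expr is not None and target not in (expr[0], expr[2]):
            available[expr] = target
    return redundant

found = redundant_positions(trace)
print(found)  # [8]
```

Position 8 is the second `T2 = T1/C`: neither `T1` nor `C` has been redefined since the first computation, so the recomputation could be dropped on this trace.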

  39. Applying optimizations • Not as complicated • But, cannot tolerate much overhead • Phases in static • Developed algorithm that can apply multiple optimizations • Demand driven • Limit study of dynamic optimizations

  40. Conclusion • Need analysis in many different applications • Virtual execution environments • Multicore • Wireless sensor networks • Testing • Testing for wireless sensor networks • Testing for security

  41. Identifying and Building Infrastructure

  42. Lori’s Journey • Science/Math love: Started in chemistry at liberal arts college. • Field trip and first CS course -> CS major. • Advisor’s strong push for grad school -> U Pitt. • Took compilers course from Mary Lou -> PhD in compiler optimization. • Big year: 10/85-married Mark. 1/86-started at Rice. 4/86-PhD • Family: The yankees returned north 3 years later! • University of Delaware: 15+ yrs. Visiting, Assistant, Associate, Full • Family: Lauren (HS senior), Lindsay (16 and driving), Matt (11) • Support: Mark, Mark, Mark,… Mary Lou, Errol, Sandee, CRA-W • Currently: software tools, testing, compiler optimization

  43. Identifying and Building Infrastructure for Analysis Research • What kinds of infrastructure do you need? • How to identify and build infrastructure • Examples

  44. What kinds of infrastructure do you need? • For analysis research and evaluation: • People • Analysis framework software • Lab space • Hardware • Workloads

  45. Identifying Analysis Framework Software • Determine Goals: short term, long term • Specify Requirements: needed, desired (prioritized) • Search for Possibilities: peers/experts, technical papers, Internet search • Try Them Out: install + run tests, read docs, examine code, try small task • Weigh Choices: meet requirements? ease of use/change? ...

  46. Example: Identifying Analysis Framework Software • Determine Goals: evaluate new analysis on Java, on its own and in client tool • Specify Requirements: needed: call graph, CFG, CHG; realistic environment/apps; easy to extend/build client tools • Search for Possibilities: common environment is IDE, Java -> Eclipse platform • Try Them Out: install + explore, write a small plugin, use call graph, CHG, CFG for small task • Weigh Choices: learning curve vs. available analyses, realism

  47. Implementing Your Analysis • Once you have decided on an infrastructure: • Think Reuse!! Think modularity!! • Think prototype, but extensible and scalable • Test, test, test - try to be systematic • Debug – not easy

  48. Example: Implementing My NL Analysis • Build small modular components -> reuse • Analyzing method signatures to extract NL • Building program representation for NL • Traversing program rep • Building program rep for IR • Design reps to avoid loss of info -> reuse • Id’s and their roles and locations in code • Verb, Direct object rep -> extensible

  49. Managing the Evolving Software Infrastructure • Managing change over time and people • CVS, Subversion • Tracking tasks, bugs, deadlines/goals • Trac, Bugzilla, GForge • Maintaining documentation • Javadoc, Doxygen • Testing, testing, testing • Unit, system, regression -- test suites • Sounds like software engineering…

  50. Selecting Appropriate Hardware • Determine Goals: short term, long term • Specify Requirements: needed, desired (prioritized) • Search for Possibilities: peers/experts, system staff • Weigh Choices: meet requirements? costs within budget? need to ask for money?
