290 likes | 466 Views
The Pivot: Static Analysis of C++ Applications. Bjarne Stroustrup Texas A&M University http://www.research.att.com/~bs. Overview. Static analysis of C++ What would be useful Why it is hard C++0x The Pivot Context Aims Organization Basic representations
E N D
The Pivot:Static Analysis of C++ Applications Bjarne Stroustrup Texas A&M University http://www.research.att.com/~bs
Overview • Static analysis of C++ • What would be useful • Why it is hard • C++0x • The Pivot • Context • Aims • Organization • Basic representations • High-level program representation for HPC • Concept-based checking and transformation
What would be useful? • Direct representation of high-level ideas in code • E.g. no sideeffects, idempotent operation, always gives the same answer for the same element, no security violation, no memory leak, no race condition, no deadlock, being sorted, being band-diagonal, parallel application … • Use of such direct representation • For providing guarantees • For information • For optimization • For program transformation
It’s hard • C++ is • Large • Extremely flexible and general • Quite irregular • Has it’s type-unsafe C subset • High-level ideas tend to be represented as templated classes and functions • Generic programming, Template meta-programming, generative programming • We have little experience with tools representing and manipulating templates • Such templates tend to be provided as part of domain specific libraries
Bell Labs proverbs • Library design is language design • Language design is library design But the devil is in the details
C++0x • 1998: ISO C++ standard • 2009 (estimated): ISO C++ standard • Better libraries and better support for library building • Hash maps, regular expressions, file system, … • Threads and memory model • Concepts • A type system for types, integers, and operations • Auto, template aliases, general, initializer lists, …
Concept: trivial example // Caveat: likely C++0x template<ForwardIterator Iter, ValueType Val> where Assignable<Iter::value_type,Val> Iter find(Iter first, Iter last, Val v); template<Container Cont, ValueType Val> where Assignable<Cont::value_type,Val> Iter find(Iter first, Iter last, Val v); vector<int> v = { 2, 3, 5, 8, 13, 21, 34 }; auto p1 = find(v.begin(), v.end(), 42); auto p2 = find(v,42.3); auto p3 = find(7,42); // error: 7 is not a Container
Concepts • Can express many high-level abstractions • A type system for sets of types, integers, and operations • We have experimental implementations of concepts • A concept is a handle to which we can attach • some “standard semantics” within the language • essentially arbitrary semantics outside the language using tools • Until we get concepts, we can “fake them” with static analysts and transformation tools
Context for the Pivot • Semantically Enhanced Library (Language) • Enhanced notation through libraries • Restrict semantics through tools • And take advantage of that semantics Domain Specific Library C++ Semantic Restrictions
Context for the Pivot Domain Specific Library • Provide the advantages of specialized languages • Without introducing new “special purpose” languages • Without supporting special-purpose language tool chains • Avoiding the 99.?% language death rate • Provide general support for the SELL idea • Not just a specialized tool per application/library • The Pivot fits here C++ Semantic Restrictions
Example SELL: Safe C++ • Add • Range-checked std::vector • iterators • Resource handles • Any (if needed) (a typesafe union type) • Subtract • Arrays • Pointers • New/delete • Unions • Excessively complex/obscure code • Uses of undefined construct not caught by compilers (e.g. a[++i] = i) • Transforms • Pointers into iterators and resource handles (if porting) • New/delete into resource handle uses
Example SELL: STAPL • Wait for Lawrence’s talk
Aims • To allow fully general analysis of C++ source code • “What a human can do” • Foci • Templates (e.g. specialization) • C++0x features (e.g. concepts, generalized initializers) • Distributed programming • Embedded systems • Limitation: we work after macro expansion • To allow transformation of C++ code • i.e. production of new code from old source • Non-aim: handling other languages • e.g. Fortran, Java • but C and C++ dialects are relatively easy
Related work • Lots • 20+ tools for analyzing C++ • But • Most are specialized • E.g. alias analysis, flow analysis, numeric optimizations • Most are attached to a single compiler/parser • None handles all of C++ • E.g. C + classes, C++ but not standard libraries • (that requires full handling of templates) • Hardly two tools handle the same subset • None handles the key C++0x features (e.g. concepts) • Some are proprietary • No serious interoperability
The Pivot C++ source Object code Tool 1 Compiler Compiler Compiler IDL Tool 2 IPR C++ source Tool 3 Tool 4 Specialized representation (e.g. flow graph) XPR “information”
Why? The Original Project • Communication with remote mobile device • Calling interface • CORBA, DCOM, Java RMI, …, homebrew interface • Transport • TCP/IP, XML, …, homebrew protocol • Big, Ugly, Slow, Proprietary, … • Why can’t I just write ISO Standard C++?
The original Project Distributed programs in ISO C++ • “as similar as possible to non-distributed programming, but no more similar” // use local object: X x; // remote at “my host” A a; std::string s("abc"); // … x.f(a, s); // a function call // use remote object : proxy<X> x; x.connect("my_host"); A a; std::string s("abc"); // … x.f(a, s); // a message send
IPR high-level principles • Complete: Direct representation of C++ • Built-in types, classes, templates, expressions, statements, translation units … • Can represent erroneous and incomplete C++ programs • Regular • The structure contains all of C++ but doesn’t mimic irregularities • Programming effort proportional to complexity of task • IPR is not just a data structure • Extensible • Node types • Information associated with a node • Operations • No integration with compilers
IPR design choices • Type safe • IPR (not its users) handles memory management • Minimal (run-time and space) • Minimal number of nodes (unification) • Minimal number of checked indirections (usually, virtual function calls) • Expression-based regular superset of C++ • E.g. statements, declarations are expressions too • C++0x features (most important: concepts – types have types) • Interfaces: • Purely functional, abstract classes, for most users • No mutation operation on abstract classes • Users don't get pointers directly • Mutating (operates on concrete classes) • Users get to use pointers for in-place transformation • Traversals (and queries) • Several, most not in “the Pivot core”
IPR is minimal • Necessary for dealing with real-world code • Multi-million line programs are not uncommon • Given the constraint of completeness • C++ is complex • especially when we use the advanced template features essential for high-performance work • Unified representation • E.g., there is only one int and only one 1 • Type comparison becomes pointer comparison • Indirections are minimized • An indirection (only) when there is a choice of different types of information
Original idea (XTI) • Too large, too slow
Current hierarchy (IPR) • Compact • minimal call overhead
IPR – Example 1 void foo(float b = 2.4)
XPR (eXternal Program Representation) • Can be thought of as a specialized portable object database • Easy/fast to parse • Easy/fast to write • Compact • About as compact as C++ source code • Robust • Read/write without using a symbol table • LR(1), strictly prefix declaration syntax • Human readable • Human writeable • Can represent almost all of C++ directly • No preprocessor directives • No multiple declarators in a declaration • No <, >, >>, or << in template arguments, except in parentheses
XPR i : int // int i; C : class { // class C { m : const int // const int m; mm : *const int // const int* mm; f : (:int,:*char) double // double f(int,char*); f : (z:complex) C // C f(complex z); } // }; vector : <T> class { // template<class T> class vector { p : *T // T* p; sz : int // int sz; } // };
Extremely simple SELL example template <Parallelizable T> void f(const T& v) { double d = v[2]; // OK double* d = &v[2]; // not OK };
Current and future work • Complete infrastructure • Complete EDG and GCC interfaces • Represent headers (modularity) directly • Complete type representation in XPR • Initial applications • Style analysis • including type safety and security • Analysis and transformation of STAPL programs • Build alliances
References [GJS+06] Gregor, Douglas; Järvi, Jaako; Siek, Jeremy; Lumsdaine, Andrew; Dos Reis, Gabriel; Stroustrup, Bjarne: Concepts: Linguistic Support for Generic Programming in C++. to appear OOPSLA'06. [DRS05] Stroustrup, Bjarne; Dos Reis, Gabriel: A concept design. C++ Committee, paper N1782. April 2005. [SDR05] Stroustrup, Bjarne; Dos Reis, Gabriel: Supporting SELL for High Performance Computing. LCPC '05. [Str05] Stroustrup, Bjarne: A rational for semantically enhanced libraries. LCSD '05.