Example application: source code analysis. 125 file types; 8029 files, of which 4689 are non-Java; 1112 SVN revisions
Querying Software Artefacts [diagram: a query engine, fed by parsers, over a software repository of source code, version history, bug reports, build scripts, databases, spreadsheets, config files and web pages; front ends: IDE plugin, dashboard, Excel add-in; users: developer, manager, analyst]
The problem: design a query language and engine for accessing a vast repository of different types of source artefact • libraries of queries: tailor the framework to different types of artefact
Tough problem! Dozens of attempts in industry and academia since 1984: databases, Prolog, domain-specific query languages • Difficulties: • does not scale • efficient queries are extremely hard to write • specific to one kind of source artefact • 18 man-years of research at the University of Oxford (1996-2005) to discover the ingredients of a solution • 15 man-years to implement an industrial product • 3 patents pending, several more in the pipeline
The query language .QL • Object-oriented, for creating libraries of queries • Recursive queries, as in logic programming • Syntax familiar to Java and SQL developers • Runs on top of any traditional relational database • Syntax highlighting, error checking and auto-completion
How it works [diagram components: Java/JAR files, XML files, RDBMS, .QL library, .QL query, “bytecode for search”, Semmle optimiser, procedural SQL template for the RDBMS]
Demo • The source we shall explore: • Alfresco: Enterprise Content Management • Spring: Java/JEE Application Framework • Builds on Tomcat, JBoss, … • Vital statistics: 50553 Java methods, 6647 Java types, 516 XML files • Demo parts: • out of the box • writing your own queries • querying XML config files
Using SemmleCode out of the box • 115 pre-packaged queries • Find common bug patterns: e.g. compareTo/equals, cloning, serialisation, internationalisation • Compute metrics: 42 different metrics, including Robert Martin’s package metrics • Examine dependencies: e.g. cyclic package dependencies • Visualisation: • pie charts, bar charts, tables, graphs, warnings/errors • easy navigation to source • exportable for generating reports
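The dependency checks listed here can be approximated in a few lines. The Python sketch below computes Robert Martin's instability I = Ce / (Ca + Ce) and finds one cyclic dependency using a depth-first search. It is not SemmleCode's implementation. The package graph `DEPS` and the functions `instability` and `find_cycle` are hypothetical. Couplings are also counted between whole packages, not between classes as in Martin's original definition.

```python
from typing import Dict, List, Optional, Set

# Hypothetical package dependency graph: package -> packages it depends on.
DEPS: Dict[str, Set[str]] = {
    "app.web": {"app.service"},
    "app.service": {"app.model", "app.util"},
    "app.model": {"app.util"},
    "app.util": {"app.service"},  # closes a cycle back into app.service
}


def instability(deps: Dict[str, Set[str]], pkg: str) -> float:
    """Martin's instability I = Ce / (Ca + Ce), counted per package, not per class."""
    ce = len(deps.get(pkg, set()) - {pkg})  # efferent: packages pkg depends on
    ca = sum(1 for p, targets in deps.items() if p != pkg and pkg in targets)  # afferent
    return ce / (ca + ce) if ca + ce else 0.0


def find_cycle(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Depth-first search; returns one dependency cycle as a closed path, or None."""
    state: Dict[str, str] = {}  # missing = unvisited, "open" = on path, "done" = finished
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        state[p] = "open"
        path.append(p)
        for q in sorted(deps.get(p, set())):
            if state.get(q) == "open":  # back edge: q is on the current path
                return path[path.index(q):] + [q]
            if q not in state:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        state[p] = "done"
        return None

    for p in sorted(deps):
        if p not in state:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


print(instability(DEPS, "app.service"))  # 0.5
print(find_cycle(DEPS))  # ['app.model', 'app.util', 'app.service', 'app.model']
```

A package with I close to 1 depends on others but nothing depends on it. Any cycle found means the packages involved cannot be released or understood independently.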
Writing queries of your own:
from Method m
where m.fromSource()
  and m.hasName("compareTo")
  and not m.getDeclaringType().getAMethod().hasName("equals")
select m, "missing equals?"
In general: from <variable-declarations> where <conditions> select <results>
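This hedged Python sketch gives a concrete reading of from-where-select. It runs the compareTo/equals query over a toy in-memory model. The `RefType` and `Method` dataclasses only borrow the slide's names; they are hypothetical stand-ins, not the SemmleCode library. Note how `not ... getAMethod().hasName("equals")` becomes `not any(...)`. The negation applies to the existence of some method named `equals` in the declaring type.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)
class RefType:
    name: str
    methods: List["Method"] = field(default_factory=list)


@dataclass(eq=False)
class Method:
    name: str
    declaring_type: RefType = field(repr=False)  # repr=False avoids infinite recursion
    from_source: bool = True


def missing_equals(methods: List[Method]) -> List[Tuple[Method, str]]:
    """Python rendering of the compareTo-without-equals query."""
    return [
        (m, "missing equals?")                      # select m, "missing equals?"
        for m in methods                            # from Method m
        if m.from_source                            # where m.fromSource()
        and m.name == "compareTo"                   #   and m.hasName("compareTo")
        and not any(o.name == "equals" for o in m.declaring_type.methods)
        # ^ and not m.getDeclaringType().getAMethod().hasName("equals")
    ]


point = RefType("Point")
point.methods.append(Method("compareTo", point))
money = RefType("Money")
money.methods += [Method("compareTo", money), Method("equals", money)]

for m, msg in missing_equals(point.methods + money.methods):
    print(m.declaring_type.name, m.name, msg)  # Point compareTo missing equals?
```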
Writing queries of your own: aggregates
select sum(CompilationUnit cu | cu.fromSource() | cu.getNumberOfLinesOfCode())
In general: agg(T1 x1, …, Tn xn | condition | expr)
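Here is a sketch of the general form `agg(T1 x1, …, Tn xn | condition | expr)`, assuming each variable has a finite domain. The variables range over the Cartesian product of their domains, the condition filters the tuples, and the aggregate function combines the values of `expr`. The `aggregate` helper and the `CompilationUnit` data are hypothetical Python illustrations, not .QL's evaluation strategy.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List


def aggregate(agg: Callable, domains: List[Iterable], condition: Callable, expr: Callable):
    """agg(T1 x1, ..., Tn xn | condition | expr) over finite domains."""
    return agg(expr(*xs) for xs in product(*domains) if condition(*xs))


@dataclass
class CompilationUnit:
    name: str
    from_source: bool
    lines_of_code: int


units = [
    CompilationUnit("Foo.java", True, 120),
    CompilationUnit("Bar.java", True, 80),
    CompilationUnit("Lib.class", False, 500),  # library code: excluded by fromSource()
]

# select sum(CompilationUnit cu | cu.fromSource() | cu.getNumberOfLinesOfCode())
total = aggregate(sum, [units], lambda cu: cu.from_source, lambda cu: cu.lines_of_code)
print(total)  # 200

# Two variables: count ordered pairs of distinct source units, no group-by involved.
pairs = aggregate(sum, [units, units],
                  lambda a, b: a.from_source and b.from_source and a is not b,
                  lambda a, b: 1)
print(pairs)  # 2
```

Swapping `sum` for `max` or `min` gives the other aggregates. Python's `max` raises `ValueError` on an empty selection, so a fuller sketch would have to decide what an empty aggregate returns.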
Writing queries of your own: recursion
from RefType s, RefType t, RefType it
where it.hasName("PasswordInputTag")
  and it.hasSupertype*(s)
  and it.hasSupertype*(t)
  and t.hasSupertype(s)
select t, s
In general, you can write recursive predicate definitions.
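The `*` in `hasSupertype*` denotes the reflexive transitive closure of `hasSupertype`. The query therefore returns every direct-supertype edge in the hierarchy above `PasswordInputTag`, including its own edge. The sketch below computes the closure with a simple worklist over a hypothetical type hierarchy; apart from `PasswordInputTag`, the type names are invented. It illustrates only what the query means, not how the Semmle engine evaluates recursion.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical direct-supertype relation: type -> its direct supertypes.
SUPER: Dict[str, Set[str]] = {
    "PasswordInputTag": {"InputTag"},
    "InputTag": {"ElementTag"},
    "ElementTag": {"Object", "Serializable"},
    "OtherTag": {"ElementTag"},  # unrelated sibling branch, must not appear
    "Object": set(),
    "Serializable": set(),
}


def supertypes_star(rel: Dict[str, Set[str]], start: str) -> Set[str]:
    """All s with start.hasSupertype*(s): reflexive transitive closure by worklist."""
    seen = {start}  # reflexive: start is its own (improper) supertype
    work = [start]
    while work:
        t = work.pop()
        for s in rel.get(t, set()):
            if s not in seen:
                seen.add(s)
                work.append(s)
    return seen


def hierarchy_edges(rel: Dict[str, Set[str]], name: str) -> List[Tuple[str, str]]:
    # from RefType s, RefType t, RefType it
    # where it.hasName(name) and it.hasSupertype*(s) and it.hasSupertype*(t)
    #   and t.hasSupertype(s)
    # select t, s
    up = supertypes_star(rel, name)
    # `s in up` mirrors the query; it is already implied by t in up and t -> s
    return sorted((t, s) for t in up for s in rel.get(t, set()) if s in up)


for t, s in hierarchy_edges(SUPER, "PasswordInputTag"):
    print(t, "->", s)
# ElementTag -> Object
# ElementTag -> Serializable
# InputTag -> ElementTag
# PasswordInputTag -> InputTag
```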
Queries in .QL • from-where-select: auto-completion, type checking, emptiness tests • aggregates: arbitrary nesting, no group-by needed • recursion: implicit with chaining, or explicit
Defining new classes in .QL
class ClassAttribute extends XMLAttribute {
  ClassAttribute() { this.getName() = "class" }
  string getClassName() { this.getValue() = result }
  RefType getType() { result.getQualifiedName() = this.getClassName() }
  predicate noType() { not exists(this.getType()) }
}
from ClassAttribute ca
where ca.noType() and ca.getClassName().matches("org.alfresco%")
select ca, ca.getClassName() + " not found"
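Read as a program, the `ClassAttribute` query filters XML attributes by the characteristic predicate. It then reports class names that resolve to no type. The Python sketch below mirrors that structure under several assumptions:
- The set `known` stands in for the qualified names of all `RefType`s.
- `get_type` is a generator because a method may return zero or more results.
- `matches` is a hypothetical helper that handles only the `%` wildcard, assumed to behave like SQL `LIKE`.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Set, Tuple


@dataclass(frozen=True)
class XMLAttribute:
    name: str
    value: str


class ClassAttribute:
    """Sketch of the .QL class: an XMLAttribute whose name is "class"."""

    def __init__(self, attr: XMLAttribute, known_types: Set[str]):
        self.attr = attr
        self.known_types = known_types  # stand-in for all RefTypes, by qualified name

    @staticmethod
    def holds_for(attr: XMLAttribute) -> bool:
        # characteristic predicate: this.getName() = "class"
        return attr.name == "class"

    def get_class_name(self) -> str:
        return self.attr.value

    def get_type(self) -> Iterator[str]:
        # result.getQualifiedName() = this.getClassName(); may yield no result
        return (t for t in self.known_types if t == self.get_class_name())

    def no_type(self) -> bool:
        # not exists(this.getType())
        return next(self.get_type(), None) is None


def matches(s: str, pattern: str) -> bool:
    """Hypothetical helper: only '%' (any character sequence) is supported."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, s) is not None


def unresolved_classes(attrs: List[XMLAttribute], known: Set[str]) -> List[Tuple[XMLAttribute, str]]:
    results = []
    for a in attrs:
        if not ClassAttribute.holds_for(a):
            continue  # a is not a ClassAttribute
        ca = ClassAttribute(a, known)
        if ca.no_type() and matches(ca.get_class_name(), "org.alfresco%"):
            results.append((a, ca.get_class_name() + " not found"))
    return results


attrs = [
    XMLAttribute("class", "org.alfresco.repo.Present"),
    XMLAttribute("class", "org.alfresco.repo.Missing"),
    XMLAttribute("id", "org.alfresco.repo.Missing"),
    XMLAttribute("class", "com.example.Elsewhere"),
]
for attr, msg in unresolved_classes(attrs, {"org.alfresco.repo.Present"}):
    print(msg)  # org.alfresco.repo.Missing not found
```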
Classes in .QL • classes are logical properties • the “constructor” specifies the characteristic property • methods: the body is a relation between this, result and the parameters; more than one result is allowed • predicates: methods without a result; the body is a relation between this and the parameters
The key points of .QL • designed for creating libraries of queries: • classes are predicates • inheritance is implication • nondeterministic expressions • recursion with super-simple semantics • syntax familiar to SQL and Java programmers • excellent error checking and IDE integration
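The slogans “classes are predicates” and “inheritance is implication” can be sketched with plain predicates over a finite universe. A subclass is the conjunction of its superclass predicate with its own characteristic predicate, so membership in the subclass implies membership in the superclass. Multi-valued methods are modelled as relations. A call used inside a condition holds if some result satisfies it. Everything here is a hypothetical Python model, not how .QL compiles classes; that includes the `(name, value)` encoding of attributes, `GET_A_METHOD` and the helper functions.

```python
from typing import Dict, Set, Tuple

Attr = Tuple[str, str]  # hypothetical (name, value) encoding of an XML attribute

# Hypothetical universe of XML attributes found in the config files.
XML_ATTRIBUTES: Set[Attr] = {
    ("class", "org.alfresco.A"),
    ("class", "com.example.B"),
    ("id", "beanA"),
}


def xml_attribute(x: Attr) -> bool:
    """A class is a predicate: membership is a logical property of a value."""
    return x in XML_ATTRIBUTES


def class_attribute(x: Attr) -> bool:
    """'extends XMLAttribute': superclass predicate AND characteristic predicate."""
    return xml_attribute(x) and x[0] == "class"


# Inheritance is implication: every ClassAttribute is also an XMLAttribute,
# even when checked against values outside the XML universe.
candidates = XML_ATTRIBUTES | {("class", "not.an.Attribute")}
assert all(xml_attribute(x) for x in candidates if class_attribute(x))

# A method is a relation between `this` and `result`; one receiver may have
# several results (hypothetical getAMethod relation: type -> method names).
GET_A_METHOD: Dict[str, Set[str]] = {
    "Money": {"compareTo", "equals"},
    "Point": {"compareTo"},
}


def get_a_method(this: str) -> Set[str]:
    return GET_A_METHOD.get(this, set())


def has_equals(t: str) -> bool:
    # t.getAMethod().hasName("equals") holds if SOME result has that name
    return any(m == "equals" for m in get_a_method(t))


print(sorted(get_a_method("Money")))  # ['compareTo', 'equals']
print(sorted(t for t in GET_A_METHOD if has_equals(t)))  # ['Money']
```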
Couldn’t you use LINQ instead of .QL? • Different design goals: ORM versus libraries of queries • LINQ does not provide recursion • LINQ cannot do the optimisations across multiple queries that are key to efficiency in .QL • “Fortunately, there is light in the darkness. Based on decades of programming language research, the brilliant team at Semmle has created an elegant, industrial strength object-oriented query language called .QL with full support for recursive queries and aggregation… .QL has all the requisites to become a runaway success.” (Erik Meijer, Creator of LINQ, Microsoft)
Too good to be true? Jeff Ullman, 1991: “It is not possible for a query language to be seriously logical and seriously object-oriented at the same time.” The key breakthroughs are Semmle’s proprietary technology: • the design of .QL • optimisations on “bytecode for search”
Wrapping up • Java is not enough: source code analysis tools must process a multitude of artefacts • libraries of queries: a means to achieve such heterogeneous tools • .QL: object-oriented queries over trees and graphs, made fast and easy