Code and Pattern Mining in C/C++
Aditya S. Deshpande, Namratha Nayak
Guides: Dr. A. Serebrenik (TU/e); P. Kourzanov, ir (NXP); Y. Dajsuren, PDEng (Virage Logic)
Agenda
• Introduction
• Problem Definition
• Data flow
• Design patterns
• Summary
/ Faculteit Wiskunde en Informatica
Introduction
• Code mining: the process of extracting patterns from source code.
• Design patterns: a design pattern is a general, reusable solution to a commonly occurring problem in software design.
• Streaming data: data streaming is the transfer of data at a steady, high-speed rate sufficient to support applications such as high-definition television or a radio signal.
Problem Definition
• Lack of synchronisation between models and source code.
• Significant amount of repetitive code in different modules.
• Identifying patterns and integrating them in the framework.
• Objective
Approach
• Study the available data flow models.
• Study the various design pattern matching methods and tools.
Data flow models
• Kahn Process Networks
• Synchronous Data Flow
Kahn Process Network - Introduction
• Processes communicate via FIFO channels.
• Parallel computation is organized as follows:
  • Autonomous computing stations are connected to each other in a network by communication lines.
  • A station computes on data coming in on its input lines to produce output on some or all of its output lines.
• Assumptions:
  • Communication lines are the only means of communication between stations.
  • Communication lines transmit information within a finite time.
Kahn Process Network - Introduction
• Restrictions:
  • At any given time, a computing station is either computing or waiting for information on one of its input lines.
  • Each computing station follows a sequential program.
Kahn Process Network - Example
[Figure: process f with input channels u and v and output channel w]
• The process alternately reads from u and v, prints the data value, and writes it to w.
• From Kahn’s original 1974 paper:
    process f(in int u, in int v, out int w)
    {
        int i;
        bool b = true;
        for (;;) {
            i = b ? wait(u) : wait(v);
            printf("%i\n", i);
            send(i, w);
            b = !b;
        }
    }
Kahn Process Network - Example
• The process interface includes the FIFOs u, v and w.
• wait() returns the next token in an input FIFO, blocking if the FIFO is empty.
• send() writes a data value to an output FIFO.
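The blocking read of process f can be sketched in C++ on a single thread. This is a minimal sketch under stated assumptions: a FIFO is modelled as a std::deque, a blocked wait() is modelled by step() returning false, and the printf call is left out. The names Fifo, ProcessF, step and run_until_blocked are illustrative and not part of Kahn's notation.

```cpp
#include <deque>
#include <vector>

using Fifo = std::deque<int>;  // assumed model of an unbounded FIFO channel

// State that process f keeps between loop iterations.
struct ProcessF {
    bool read_u = true;  // plays the role of `b` in the slide's code

    // Try one loop iteration. Returns false when the process has to wait
    // because the input it wants to read is empty.
    bool step(Fifo& u, Fifo& v, Fifo& w) {
        Fifo& in = read_u ? u : v;
        if (in.empty()) return false;  // blocked; the other input is not inspected
        w.push_back(in.front());       // send(i, w)
        in.pop_front();                // the token returned by wait() is consumed
        read_u = !read_u;              // b = !b
        return true;
    }
};

// Hypothetical driver: run f until it blocks and return what it wrote to w.
std::vector<int> run_until_blocked(Fifo u, Fifo v) {
    Fifo w;
    ProcessF f;
    while (f.step(u, v, w)) {
    }
    return std::vector<int>(w.begin(), w.end());
}
```

With u holding 1, 3, 5 and v holding only 2, the sketch emits 1, 2, 3 and then waits on v even though u still holds a token, which is the "either computing or waiting on one of its input lines" restriction.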
SDF - Introduction
• A synchronous data flow (SDF) graph is a network of synchronous nodes (also called blocks).
• For a synchronous node, the numbers of samples consumed and produced per firing are known a priori.
• Homogeneous SDF: every node consumes and produces exactly one sample on each arc per firing.
SDF - Delay
[Figure: arc from node A to node B with delay d]
• Delay in the signal-processing sense
• A unit delay on the arc between A and B means that the nth sample consumed by B is the (n-1)th sample produced by A.
• An arc with delay d is initialized with d zero samples.
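The effect of a delay can be sketched by treating the arc as a queue that starts with d zero samples. The sketch assumes a homogeneous arc (A produces one sample and B consumes one sample per firing) and fires A and B alternately; the function name consume_through_delay is illustrative.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Samples seen by B when A and B each fire once per input sample and the
// arc between them carries `d` delays.
std::vector<int> consume_through_delay(const std::vector<int>& produced_by_a,
                                       std::size_t d) {
    std::deque<int> arc(d, 0);  // the arc is initialized with d zero samples
    std::vector<int> consumed_by_b;
    for (int sample : produced_by_a) {
        arc.push_back(sample);                 // A fires: one sample produced
        consumed_by_b.push_back(arc.front());  // B fires: oldest sample consumed
        arc.pop_front();
    }
    return consumed_by_b;
}
```

With d = 1 and A producing 10, 20, 30, B consumes 0, 10, 20: each sample B sees is the one A produced one firing earlier.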
SDF - Implementation
• Implementation requires:
  • Buffering the data samples passed between nodes
  • Scheduling nodes when their inputs are available
• Dynamic (runtime) implementation requires:
  • A runtime scheduler that checks when inputs are available and schedules nodes when a processor is free.
SDF - Implementation
• Contribution of Lee-87:
  • SDF graphs can be scheduled at compile time
  • No runtime scheduling overhead
• The compiler will:
  • Determine the execution order of the nodes on one or multiple processors or data path units
  • Determine the communication buffers between nodes.
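Compile-time scheduling for a single processor can be sketched in two steps: solve the balance equations q[src] * produce = q[dst] * consume for the firing counts of one period, then simulate firings to obtain an order. This is a simplified sketch, not the full method of Lee-87. It assumes a connected graph with positive rates, and the names Arc, repetition_vector and static_schedule are illustrative.

```cpp
#include <cstddef>
#include <vector>

struct Arc {
    std::size_t src, dst;   // producing and consuming node
    long produce, consume;  // samples per firing, known a priori
    long delay;             // initial samples on the arc
};

long gcd_of(long a, long b) {
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

// Smallest positive firing counts q with q[src] * produce == q[dst] * consume
// on every arc. Returns {} if no such q exists.
std::vector<long> repetition_vector(std::size_t nodes, const std::vector<Arc>& arcs) {
    if (nodes == 0) return {};
    std::vector<long> num(nodes, 0), den(nodes, 1);  // q[i] as a fraction num/den
    num[0] = 1;
    for (bool changed = true; changed;) {
        changed = false;
        for (const Arc& a : arcs) {
            bool forward = num[a.src] != 0 && num[a.dst] == 0;
            bool backward = num[a.dst] != 0 && num[a.src] == 0;
            if (!forward && !backward) continue;
            std::size_t known = forward ? a.src : a.dst;
            std::size_t fresh = forward ? a.dst : a.src;
            long n = num[known] * (forward ? a.produce : a.consume);
            long d = den[known] * (forward ? a.consume : a.produce);
            long g = gcd_of(n, d);
            num[fresh] = n / g;
            den[fresh] = d / g;
            changed = true;
        }
    }
    long scale = 1;  // least common multiple of all denominators
    for (long d : den) scale = scale / gcd_of(scale, d) * d;
    std::vector<long> q(nodes);
    for (std::size_t i = 0; i < nodes; ++i) q[i] = num[i] * (scale / den[i]);
    for (long count : q)
        if (count == 0) return {};  // node not connected to node 0
    for (const Arc& a : arcs)
        if (q[a.src] * a.produce != q[a.dst] * a.consume) return {};  // inconsistent rates
    return q;
}

// Firing order for one period, or {} if the graph is inconsistent or deadlocks.
std::vector<std::size_t> static_schedule(std::size_t nodes, const std::vector<Arc>& arcs) {
    std::vector<long> left = repetition_vector(nodes, arcs);
    if (left.empty()) return {};
    std::vector<long> tokens;
    for (const Arc& a : arcs) tokens.push_back(a.delay);
    std::vector<std::size_t> order;
    for (bool fired = true; fired;) {
        fired = false;
        for (std::size_t n = 0; n < nodes; ++n) {
            if (left[n] == 0) continue;
            bool ready = true;  // every input arc must hold enough samples
            for (std::size_t k = 0; k < arcs.size(); ++k)
                if (arcs[k].dst == n && tokens[k] < arcs[k].consume) ready = false;
            if (!ready) continue;
            for (std::size_t k = 0; k < arcs.size(); ++k) {
                if (arcs[k].dst == n) tokens[k] -= arcs[k].consume;
                if (arcs[k].src == n) tokens[k] += arcs[k].produce;
            }
            --left[n];
            order.push_back(n);
            fired = true;
        }
    }
    for (long count : left)
        if (count != 0) return {};  // deadlock: some firings could not be scheduled
    return order;
}
```

For a single arc where node 0 produces 2 samples and node 1 consumes 3, the firing counts are 3 and 2. A two-node cycle with unit rates only gets a schedule once one of its arcs carries a delay, which is where the initial samples from the delay slide matter.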
Design Patterns
• Describe solutions for common recurring problems
• Can be used in a wider context because they are defined informally
• Documenting them in a software system simplifies maintenance and program understanding
• They are usually not documented, so design patterns need to be discovered from the source code
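As a concrete instance of what such a solution looks like in C++, here is a minimal sketch of the Composite pattern from Gamma et al. The classes Shape, Rect and Group and the helper example_total_area are made up for illustration and do not come from any code base discussed here.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Component role: the interface shared by single objects and groups.
class Shape {
public:
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

// Leaf role: a single object.
class Rect : public Shape {
public:
    Rect(int w, int h) : w_(w), h_(h) {}
    int area() const override { return w_ * h_; }

private:
    int w_, h_;
};

// Composite role: inherits from Shape and also holds Shapes.
class Group : public Shape {
public:
    void add(std::unique_ptr<Shape> child) { children_.push_back(std::move(child)); }
    int area() const override {
        int total = 0;
        for (const auto& child : children_) total += child->area();
        return total;
    }

private:
    std::vector<std::unique_ptr<Shape>> children_;
};

// Hypothetical usage: a group that contains a rectangle and a nested group.
int example_total_area() {
    auto inner = std::make_unique<Group>();
    inner->add(std::make_unique<Rect>(2, 3));
    Group outer;
    outer.add(std::make_unique<Rect>(1, 1));
    outer.add(std::move(inner));
    return outer.area();  // 1 + 6
}
```

The structural fingerprint is that Group both derives from Shape and aggregates Shape. Relations of this kind are what remains in the source code when the pattern itself is not documented.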
Pattern Mining
• The structure of a design pattern is searched for in the source code.
• The pattern description should include the main properties of the design pattern.
• It should be flexible enough to describe slightly distorted pattern occurrences.
• Helps to understand the relationships between the different parts of a large system
Pattern Mining
• Reverse engineering: analysis of a system to
  • Identify the components and their interrelationships
  • Create representations of the system in another form
• Why tools for reverse engineering?
  • Existing legacy code
  • High number of participants in code development
• Tools have been developed to mine patterns from source code
Pattern Mining Tools
• Aspects in which the mining tools differ:
  • Programming language: tools exist for Java and C++
  • Method used to discover design patterns: graph matching, constraint satisfaction problem, pattern inference
  • Intermediate representation: abstract semantic graph, abstract syntax tree, matrix and vector
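To make the matching idea concrete, the following toy sketch searches a class graph for the structural constraint behind the Composite pattern: a class that both inherits from and aggregates the same type. It is an illustration only and does not reproduce the algorithm or data model of any tool named here. ClassGraph, Edge and find_composite_candidates are hypothetical names.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // (from class, to class)

// Assumed intermediate representation: two relations between class names.
struct ClassGraph {
    std::set<Edge> inherits;    // (derived, base)
    std::set<Edge> aggregates;  // (owner, element type)
};

// Returns every (composite, component) pair where the composite candidate
// both inherits from the component and aggregates it.
std::vector<Edge> find_composite_candidates(const ClassGraph& g) {
    std::vector<Edge> matches;
    for (const Edge& e : g.inherits)
        if (g.aggregates.count(e) != 0) matches.push_back(e);
    return matches;
}
```

A class that only inherits from Shape, or only aggregates it, is not reported. A real miner would need more constraints than this, plus some tolerance for the distorted occurrences mentioned earlier.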
Columbus - Design Pattern Mining Tool
• Reverse engineering framework
• Developed in cooperation between the Research Group on Artificial Intelligence in Szeged, the Software Technology Laboratory of the Nokia Research Center and FrontEndART Ltd.
• Analyzes large C/C++ projects and extracts data according to the Columbus Schema
• Supports project handling, data extraction, data representation, data storage, filtering and visualization
Columbus - Design Pattern Mining Tool
• Has a C/C++ extractor plug-in that parses the source code
• The information collected by the plug-in corresponds to the Columbus Schema
• The schema captures the C++ language at a low level of detail (i.e., the abstract syntax tree) and also contains higher-level elements (i.e., the semantics of types)
• Supports various file formats for exporting the extracted data
Other Pattern Mining Tools
• Other tools to be studied:
  • CPP2XMI
  • Maisa
  • CrocoPat
Issues to be considered
• Can the tools support NXP source code?
• Would it be possible to add proprietary patterns to these tools?
• Can these tools be extended to support other languages like C?
Summary
• Gave an overview of the data flow models
• Introduced the design pattern mining tool Columbus
• Find the patterns present in the NXP source code and check whether they can be mined using the available tools
References
• E. A. Lee and D. G. Messerschmitt, “Synchronous data flow”, Proc. IEEE, vol. 75, pp. 1235-1245, Sept. 1987.
• G. Kahn, “The semantics of a simple language for parallel programming”, Proc. IFIP Congress, Stockholm, Sweden, Aug. 1974, pp. 471-475.
• E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns - Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
References
• R. Ferenc, A. Beszedes, M. Tarkiainen, and T. Gyimothy, “Columbus - Reverse Engineering Tool and Schema for C++”, Proceedings of the 18th International Conference on Software Maintenance (ICSM 2002), pp. 172-181. IEEE Computer Society, Oct. 2002.
• R. Ferenc and A. Beszedes, “Data Exchange with the Columbus Schema for C++”, Proceedings of the 6th European Conference on Software Maintenance and Reengineering (CSMR 2002), pp. 59-66. IEEE Computer Society, Mar. 2002.
References
• Z. Balanyi and R. Ferenc, “Mining Design Patterns from C++ Source Code”, Proceedings of the 19th International Conference on Software Maintenance (ICSM 2003), pp. 305-314. IEEE Computer Society, Sept. 2003.
Questions?