Component Rank : Relative Significance Rank for Software Component Search

Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, and Shinji Kusumoto
Osaka University

SourceForge
• Large open source software development web site
• Version control, communication support, ...
• Hosted projects: 60,888
• Registered users: 613,792

Motivation
• Numerous software systems are being developed day by day
• Similar components (libraries, portions of code, or abstracted algorithms, ...) might be independently developed in different projects
• Reuse is a key factor for high productivity and reliability in today’s software development
• Exploring large software libraries is not easy
• Little support for searching components
• Consistent management by hand is difficult

Automated Component Library
• Collect software components eagerly without preserving their inherent structures
• Analyze relations among components by using various analysis techniques
• Rank the components based on their significance (Component Rank Model)
• Answer user’s queries according to the rank

Component Graph
[Figure: components A to I of System X and System Y drawn as nodes; each edge is a use relation between components]

Weight of Nodes
• Sum of all node weights = 1 ... (1)
• The weight of a node represents the significance of the node
[Figure: the component graph of System X and System Y (nodes A to I) with a weight on each node: 0.1, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

Weights of Edges
• Node weight is distributed to each outgoing edge
• Edge weights are collected at the destination node
• d: distribution ratio
• Sum of all outgoing edge weights = origin node weight ... (2)
• Sum of all incoming edge weights = destination node weight ... (3)
[Figure: nodes A and B with weighted edges; four edges are labeled d=1/4 with weight 0.05 each; other values shown: 0.2, 0.15, 0.05, 0.4, 0.2]
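Constraints (2) and (3) can be checked on small numbers. The sketch below takes one possible reading of the figure as an assumption: node B has weight 0.2 and four outgoing edges with d = 1/4, and node A collects incoming edge weights 0.2, 0.15, and 0.05. The target names T1 to T4 and the function name `edge_weights` are hypothetical.

```python
from fractions import Fraction


def edge_weights(node_weight, ratios):
    """Split a node weight over its outgoing edges by distribution ratio d."""
    if sum(ratios.values()) != 1:
        raise ValueError("distribution ratios of one node must sum to 1")
    return {target: node_weight * d for target, d in ratios.items()}


# Assumed reading of the figure: B has weight 0.2 and four outgoing
# edges, each with d = 1/4. T1..T4 are placeholder target names.
b_weight = Fraction(1, 5)
b_out = edge_weights(b_weight, {t: Fraction(1, 4) for t in ("T1", "T2", "T3", "T4")})

# Constraint (2): outgoing edge weights sum to the origin node weight.
assert sum(b_out.values()) == b_weight

# Constraint (3): incoming edge weights sum to the destination node weight.
# Assumed incoming edge weights of A: 0.2, 0.15, 0.05.
a_weight = sum([Fraction(1, 5), Fraction(3, 20), Fraction(1, 20)])
```

Exact fractions are used so that the equalities hold without floating-point tolerance.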

Definition of Weights
• Under constraints (1)~(3), we have the simultaneous equations
  $D^{t} W = W$
• $W$: node weight vector
• $D^{t}$: transposed matrix of distribution ratios

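Propagating Weights
[Figure: three-node graph A, B, C with node and edge weights; values shown: 0.34, 0.33, 0.17, 0.17, 0.33, 0.33, 0.33]

Propagating Weights
[Figure: the same graph at a later step; values shown: 0.33, 0.17, 0.175, 0.175, 0.5, 0.17, 0.5]

Propagating Weights
[Figure: the same graph at a later step; values shown: 0.25, 0.25, 0.345, 0.175, 0.5, 0.175, 0.345]

Propagating Weights
• Stable weight assignment
  • Next-step weights are the same as the previous ones
• Component Rank: order of nodes sorted by the weight
[Figure: the same graph in its stable state; values shown: 0.4, 0.2, 0.2, 0.2, 0.4, 0.2, 0.4]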
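The propagation on these slides can be sketched in a few lines of Python. The edge structure is an assumption, since the figure itself is not recoverable from the transcript: A uses B and C with equal ratios, B uses C, and C uses A. This structure is consistent with the stable values 0.4, 0.2, 0.4 shown on the last slide. The names `propagate` and `component_rank` are hypothetical.

```python
def propagate(weights, ratios):
    """One step: every node sends weight * d along each outgoing edge."""
    nxt = {node: 0.0 for node in weights}
    for src, outs in ratios.items():
        for dst, d in outs.items():
            nxt[dst] += weights[src] * d
    return nxt


def component_rank(ratios, tol=1e-12, max_steps=10_000):
    """Repeat propagation from equal weights until the weights are stable."""
    nodes = list(ratios)
    weights = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_steps):
        nxt = propagate(weights, ratios)
        if max(abs(nxt[n] - weights[n]) for n in nodes) < tol:
            return nxt
        weights = nxt
    raise RuntimeError("weights did not converge")


# Assumed edges and distribution ratios for the three-node example.
RATIOS = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}

stable = component_rank(RATIOS)
# Component rank: nodes ordered by stable weight, highest first.
ranking = sorted(stable, key=stable.get, reverse=True)
```

Under these assumptions the weights settle near A = 0.4, B = 0.2, C = 0.4, so B is ranked last and A and C share the top weight.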

Markov Model
• The component rank model can be considered a Markov chain of the user's focus
• The user's focus moves from one component to another along a use relation at fixed time intervals
• Node weight represents the probability that the user's focus is on that node in the infinite future
[Figure: graph with node weights 0.02, 0.01, 0.01, 0.05, 0.03, 0.001, 0.1]

Adjustment to Software Products (1): Pseudo Use Relation
• Weight computation does not always converge
• Add a pseudo edge from a node to another, if there is no 'real' edge
• Distribution ratios: pseudo edges << real edges
[Figure: three-node graph A, B, C]
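A minimal sketch of how pseudo edges might be added, under stated assumptions. The slides give only the ratio p = 0.85 between real and pseudo edges (on the Prototype System slide) and equal ratios on outgoing edges. The exact split used here is an assumption: a node with both kinds of edges sends p of its weight equally over real edges and 1 - p equally over pseudo edges, and a node with no real edge spreads all of its weight over pseudo edges. The function name `with_pseudo_edges` and the example graph are hypothetical.

```python
def with_pseudo_edges(uses, p=0.85):
    """Return distribution ratios after adding pseudo edges.

    `uses` maps every node to the list of nodes it really uses.
    Assumption: real edges share p, pseudo edges share 1 - p.
    """
    nodes = list(uses)
    ratios = {}
    for src in nodes:
        real = set(uses[src])
        # Pseudo edges go to every other node without a real edge from src.
        pseudo = [n for n in nodes if n != src and n not in real]
        if real and pseudo:
            real_total, pseudo_total = p, 1.0 - p
        elif real:
            real_total, pseudo_total = 1.0, 0.0
        else:
            real_total, pseudo_total = 0.0, 1.0
        row = {dst: real_total / len(real) for dst in real}
        row.update({dst: pseudo_total / len(pseudo) for dst in pseudo})
        ratios[src] = row
    return ratios


# Hypothetical chain A -> B -> C; C uses nothing, so without pseudo
# edges its weight would have nowhere to go.
ratios = with_pseudo_edges({"A": ["B"], "B": ["C"], "C": []})
```

With this split every node distributes its whole weight, so constraint (1) keeps holding from step to step.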

Adjustment to Software Products (2): Clustering Components
[Figure: a component graph with nodes A, B, C, D, E, F, G and the corresponding clustered component graph with nodes C, G, BF, AD, E]
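The step from a component graph to a clustered component graph can be sketched as follows. The clusters {A, D} and {B, F} follow the node names in the figure. The use relations are invented for illustration, and dropping edges inside a cluster is an assumption, not something the slides state. The function name `cluster_graph` is hypothetical.

```python
def cluster_graph(uses, clusters):
    """Merge similar components into cluster nodes.

    A cluster uses another node if any of its members does.
    Assumption: edges between members of one cluster are dropped.
    """
    label = {}
    for group in clusters:
        name = "".join(sorted(group))
        for member in group:
            label[member] = name
    clustered = {}
    for src, targets in uses.items():
        src_name = label.get(src, src)
        row = clustered.setdefault(src_name, set())
        for dst in targets:
            dst_name = label.get(dst, dst)
            if dst_name != src_name:
                row.add(dst_name)
    return clustered


# Hypothetical use relations; every node appears as a key.
USES = {
    "A": ["C"], "B": ["C", "G"], "C": [], "D": ["C", "E"],
    "E": [], "F": ["G"], "G": [],
}
clustered = cluster_graph(USES, [{"A", "D"}, {"B", "F"}])
```

The clustered graph has the five nodes C, G, BF, AD, and E, matching the node names in the figure.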

Prototype System
• Input: .java files (.java file = component)
• Measure similarity by SMMT (SMMT measures similarity by clone detection technique)
• Cluster similar components: similarity criterion t=0.8 (80% of statements are the same)
• Extract use relations: inheritance, method call, attribute access, abstract class impl.
• Construct clustered component graph
• Compute node weights: weight ratio p between real and pseudo edges: 0.85; equal distribution ratios d to outgoing edges
• De-cluster to original components
• Output: component ranks

Experiment 1: JDK1.3.0
• 575,000 lines, 1877 components
• 7 minutes on PC (Pentium IV, 2GHz, 2GB)

rank   class name                      weight
1      java.lang.Object                0.16126   (superclass of all classes)
2      java.lang.Class                 0.08712
3      java.lang.Throwable             0.05510   (superclass of any error or exception handler)
4      java.lang.Exception             0.03103
5      java.io.IOException             0.01343
6      java.lang.StringBuffer          0.01214
7      java.lang.SecurityManager       0.01169
8      java.io.InputStream             0.01027
9      java.lang.reflect.Field         0.00948
10     java.lang.reflect.Constructor   0.00936
...    ...
1256   sunw.util.EventListener         0.00011
...    ...
1256   ...
(these 622 classes at rank 1256 are not used by any other classes)

• Very general and core classes: ranked high
• Specific and independent classes: ranked low

Experiment 2: Collection of SE Tools and Libraries
• CK metrics measurement tools, component rank system
• ANTLR, JAMA, Caffe Cappuccino
• 582 components

rank   class name                                                   weight
1      antlr.Token                                                  0.10727
2      antlr.debug.Event                                            0.06189
2      antlr.debug.NewLineEvent                                     0.06189
4      antlr.collections.impl.Vector                                0.05434
5      jp.gr.java_conf.keisuken.text.html.HtmlParameter             0.05246
6      jp.gr.java_conf.keisuken.net.server.ServerProperties         0.03699
7      Jama.Matrix                                                  0.01564
8      jp.gr.java_conf.keisuken.util.IntegerArray                   0.01390
8      jp.gr.java_conf.keisuken.util.LongArray                      0.01390
10     jp.ac.osaka_u.es.ics.iip_lab.metrics.parser.IdentifierInfo   0.01365
...    ...
418    cktool_new.examples.Main                                     0.00050

• Indicator of generality and specialty w.r.t. usage from other classes

Experiment 3: Application to Industry
• Daiwa computer: a mid-sized software company in Osaka
• Shared Java application framework for web-based data management
• Framework + 5 applications on the framework
• 1538 components, 339 clustered nodes
• Classes in the framework and definitions of data structures are ranked high

Experiment 4: Document Processing Tools and Libraries
• JEDIT, jext, Enhydra, saxon, phex, JDK, etc. (7171 components)
• Perform string search by grep command with keyword getNodetype
• Order sorted by rank

order (rank)   class name                           weight
1 (67)         enhydra3.1 ... dom.Node              0.029110
2 (169)        saxon7_0 ... saxon.om.NodeInfo       0.000969
3 (275)        saxon7_0 ... saxon.pattern.NodeTest  0.000437
4 (316)        enhydra3.1 ... dom.DocumentImpl      0.000368
5 (355)        saxon7_0 ... saxon.pattern.Pattern   0.000324
6 (382)        saxon7_0 ... saxon.Controller        0.000296
7 (437)        enhydra3.1 ... xslt.XSLTEngineImpl   0.000241
8 (446)        enhydra3.1 ... dom.ElementImpl       0.000235
9 (500)        saxon7_0 ... saxon.style.StyleElement 0.000202
10 (506)       saxon7_0 ... saxon.tree.NodeImpl     0.000198
...            ...
125 (4441)     enhydra3.1 ... FuncID                0.000029
...            ...
125 (4441)     ...

• Figure callout: method definitions of obtaining node kinds in DOM tree
• We can easily find the core definitions of classes

Discussion 1: Weight Computation
• Reference Count Model: fragile to locally-made references, which may not be important globally
• Component Rank Model: more stable to local references
[Figure: the same five-node graph (A, B, C, D, E) weighted under each model; values shown: 0.31, 0.2, 0.6, 0.33, 0, 0, 0.2, 0.03, 0.03, 0.30]

Discussion 2: Clustering Policy (1)
• Eliminate the effect of simply duplicated components
• Same weight arrangement as the case with no duplicated components
[Figure: columns labeled original (A, B), copy (A, B), and others (X, Y); after clustering, nodes A, X, B, Y with weights 0.25, 0.25, 0.25, 0.25]

Discussion 2: Clustering Policy (2)
• Count only reused components which are not simply duplicated
• A's weight is higher than the others
[Figure: columns labeled original (A, B), modified (A, C), and others (X, Y); after clustering, nodes A, X, B, C, Y; values shown: 0.3, 0.2, 0.15, 0.15, 0.2]

Discussion 3: Similarity Criterion and Pseudo Use Relation
• Similarity criterion t: 0.8
  • Resulting ranks are fairly insensitive to t
  • Some inherently different components fall into the same cluster if t is less than 0.8
• Pseudo use relation ratio p: 0.85
  • Resulting ranks are stable for p between 0.75 and 0.95

Related Works
• Markov models of documentation traversal
  • Influence Weight: impact factor of journal publications through incoming references
  • Page Rank: weight of HTML pages on the Internet through incoming web links
  • Explicit use relations; no clustering (important for software products)
• Measuring reusability of components or interfaces
  • Use various characteristic metrics
  • Indirect indicator of reusability
• Our approach directly reflects usage of components

SPARS-J: Software Product Archiving, Analyzing and Retrieving System for Java
[Figure: SPARS-J Software Component Searcher; boxes labeled Component Collector, Analyzer and Evaluator, Component Archive, Query Handler, and Internet / Corporate Repositories]

Conclusion & Future Work
• Component Rank: a novel model for ranking software components
• Prototype system for Java
• Application to various collections of Java programs: promising results
• Future work
  • Developing SPARS-J
  • Statistical evaluation (recall & precision)
  • Practical evaluation using SPARS-J
  • Other models (weight distribution, similarity, ...)

END

Global Analysis of Software Data
[Figure: labels shown: Data Analysis, Data on the Internet, Collection, Feedback, Subsidiary Company Data, Company-Wide Project Data]

Weight Computation by Eigenvector
• $D^{t} W = W$
  • $W$: node weight vector
  • $D^{t}$: transposed matrix of distribution ratios
• W is the eigenvector of eigenvalue 1
• A math package can be used for the eigenvector computation, but it is generally slower than the propagation computation
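The eigenvector condition can be checked directly on a small example. This sketch assumes a three-node graph in which A uses B and C with equal ratios, B uses C, and C uses A, and tests whether W = (0.4, 0.2, 0.4) is left unchanged by the transposed matrix of distribution ratios. The graph and the helper name `apply_transposed` are assumptions for illustration.

```python
def apply_transposed(D, W):
    """Compute D^t W, where D[i][j] is the ratio from node i to node j."""
    n = len(W)
    return [sum(D[i][j] * W[i] for i in range(n)) for j in range(n)]


# Assumed distribution ratios, rows and columns in the order A, B, C.
D = [
    [0.0, 0.5, 0.5],  # A -> B, A -> C
    [0.0, 0.0, 1.0],  # B -> C
    [1.0, 0.0, 0.0],  # C -> A
]
W = [0.4, 0.2, 0.4]

# W is an eigenvector of eigenvalue 1 if D^t W reproduces W.
residual = max(abs(a - b) for a, b in zip(apply_transposed(D, W), W))
```

A residual of (nearly) zero confirms that this W is a stable weight assignment for the assumed graph.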
