
Component Rank: Relative Significance Rank for Software Component Search

Explore large software libraries efficiently with a tool to collect, analyze, and rank components based on significance for improved productivity and reliability in software development.

wdella




Presentation Transcript


  1. Component Rank: Relative Significance Rank for Software Component Search Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, and Shinji Kusumoto Osaka University

  2. SourceForge • Large open source software development web site • Version control, communication support, ... • Hosted projects: 60,888 • Registered users: 613,792

  3. Motivation • Numerous software systems are being developed day by day • Similar components (libraries, portions of code, abstracted algorithms, ...) might be independently developed in different projects • Reuse is a key factor for high productivity and reliability in today's software development • Exploring large software libraries is not easy • Little support for searching components • Consistent manual management is difficult

  4. Automated Component Library • Collect software components eagerly without preserving their inherent structures • Analyze relations among components by using various analysis techniques • Rank the components based on their significance • Answer user's queries according to the rank [Callout: Component Rank Model]

  5. Component Graph [Figure: components A, B, C, D, E, F, G, H, I of System X and System Y drawn as nodes; edges are use relations]

  6. Weight of Nodes • Weight of a node represents the significance of the node • Sum of all node weights = 1 ... (1) [Figure: the component graph of System X and System Y with a weight on each of the nodes A to I; weights shown: 0.1, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

  7. Weights of Edges • Node weight is distributed to each outgoing edge • Edge weights are collected at the destination node • Sum of all outgoing edge weights = origin node weight ... (2) • Sum of all incoming edge weights = destination node weight ... (3) • d: distribution ratio [Figure: nodes A and B with weighted edges; four edges are labelled d=1/4 and carry weight 0.05 each]

  8. Definition of Weights • Under constraints (1) to (3), we have the simultaneous equations $W = D^{t} W$ • W: node weight vector • D^t: transposed matrix of distribution ratios
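Written out per node, the constraints and the matrix equation on this slide can be read as follows. The index notation is an assumption introduced here for illustration: $w_i$ is the weight of node $i$, and $d_{ij}$ is the distribution ratio on the edge from node $i$ to node $j$ (taken as 0 when there is no such edge), so that the edge from $i$ to $j$ carries weight $d_{ij} w_i$.

```latex
\begin{aligned}
\text{(1)}\quad & \sum_{i} w_i = 1 \\
\text{(2)}\quad & \sum_{j} d_{ij}\, w_i = w_i
  \;\Longrightarrow\; \sum_{j} d_{ij} = 1 \quad \text{for each origin node } i \\
\text{(3)}\quad & \sum_{i} d_{ij}\, w_i = w_j \quad \text{for each destination node } j \\
& \text{Stacking (3) over all } j \text{ gives } W = D^{t} W,
  \quad W = (w_1, \dots, w_n)^{t},\; D = (d_{ij}).
\end{aligned}
```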

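  9. Propagating Weights [Figure: three-node graph A, B, C, initial assignment; values shown: 0.34, 0.33, 0.17, 0.17, 0.33, 0.33, 0.33]

  10. Propagating Weights [Figure: the same graph after one propagation step; values shown: 0.33, 0.17, 0.175, 0.175, 0.5, 0.17, 0.5]

  11. Propagating Weights [Figure: the same graph after a further step; values shown: 0.25, 0.25, 0.345, 0.175, 0.5, 0.175, 0.345]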

  12. Propagating Weights • Stable weight assignment: next-step weights are the same as previous ones • Component Rank: order of nodes sorted by the weight [Figure: the graph A, B, C at the stable assignment; values shown: 0.4, 0.2, 0.2, 0.2, 0.4, 0.2, 0.4]
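The propagation on slides 9 to 12 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The edge set (A uses B and C, B uses C, C uses A) is an assumption; the slides do not list the edges, but this set is consistent with the stable values 0.4, 0.2, 0.4. The names `propagate`, `edges`, and `start` are hypothetical.

```python
def propagate(edges, weights, tol=1e-12, max_steps=10_000):
    """Repeat the propagation step until the node weights stop changing."""
    for _ in range(max_steps):
        nxt = {node: 0.0 for node in weights}
        for src, targets in edges.items():
            # Equal distribution ratio d = 1 / (number of outgoing edges).
            share = weights[src] / len(targets)
            for dst in targets:
                nxt[dst] += share  # edge weights are collected at the destination
        if max(abs(nxt[n] - weights[n]) for n in weights) < tol:
            return nxt
        weights = nxt
    return weights


# Assumed edge set (hypothetical, chosen to match the numbers on the slides).
edges = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
start = {"A": 0.34, "B": 0.33, "C": 0.33}

stable = propagate(edges, start)
# Component rank: nodes ordered by descending weight.
ranking = sorted(stable, key=stable.get, reverse=True)
print({node: round(weight, 3) for node, weight in stable.items()})
print(ranking)
```

Under this assumed graph the weights settle at about 0.4 for A, 0.2 for B, and 0.4 for C, so B is ranked last.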

  13. Markov Model • Component rank model can be considered as a Markov chain of the user's focus • User's focus moves from one component to another along a use relation at fixed time intervals • Node weight represents the probability that the user's focus is on that node in the limit of infinite time [Figure: graph fragment with node weights; values shown: 0.02, 0.01, 0.01, 0.05, 0.03, 0.001, 0.1]

  14. Adjustment to Software Products (1): Pseudo Use Relation • Weight computation does not always converge • Add a pseudo edge from a node to another, if there is no 'real' edge • Distribution ratios: pseudo edges << real edges [Figure: nodes A, B, C]
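One way to read the pseudo use relation is sketched below. The slide only says that pseudo edges get much smaller distribution ratios than real edges, so the exact split is an assumption made here: each node gives a fraction `p` of its weight to its real edges and `1 - p` to pseudo edges toward every other node it does not really use, and a node with no real edges gives everything to pseudo edges. The functions `distribution_ratios` and `propagate` and the example graph are hypothetical.

```python
def distribution_ratios(nodes, edges, p=0.85):
    """Return d[src][dst], mixing real edges with pseudo edges (assumed scheme)."""
    ratios = {}
    for src in nodes:
        real = [dst for dst in edges.get(src, []) if dst != src]
        # Pseudo edges go to every other node that has no real edge from src.
        pseudo = [dst for dst in nodes if dst != src and dst not in real]
        real_share = p if pseudo else 1.0
        pseudo_share = (1.0 - p) if real else 1.0
        row = {}
        for dst in real:
            row[dst] = real_share / len(real)
        for dst in pseudo:
            row[dst] = pseudo_share / len(pseudo)
        ratios[src] = row
    return ratios


def propagate(d, steps=200):
    """Run a fixed number of propagation steps from a uniform start."""
    w = {node: 1.0 / len(d) for node in d}
    for _ in range(steps):
        nxt = {node: 0.0 for node in d}
        for src, row in d.items():
            for dst, ratio in row.items():
                nxt[dst] += w[src] * ratio
        w = nxt
    return w


# Hypothetical graph: C uses nothing, so without pseudo edges its weight
# would have nowhere to go and constraint (2) could not hold.
nodes = ["A", "B", "C"]
edges = {"A": ["B"], "B": ["C"]}
d = distribution_ratios(nodes, edges)
weights = propagate(d)
```

With pseudo edges every row of ratios sums to 1 and every node can reach every other node, which is what lets the repeated propagation settle to a single assignment in this example.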

  15. Adjustment to Software Products (2): Clustering Components [Figure: a component graph with nodes A, B, C, D, E, F, G, and the corresponding clustered component graph with nodes C, G, BF, AD, E]

  16. Prototype System
     • Input: .java file = component
     • Extract use relation: inheritance, method call, attribute access, abstract class impl.
     • Measure similarity by SMMT (SMMT measures similarity by clone detection technique)
     • Cluster similar components: similarity criterion t=0.8 (80% statements are the same)
     • Construct clustered component graph
     • Compute node weights: weight ratio p between real and pseudo edges: 0.85; equal distribution ratios d to outgoing edges
     • De-cluster to original components
     • Output: component ranks
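The clustering and de-clustering steps of the prototype can be illustrated with a small Python sketch. Two details are assumptions, since the slides do not state them: use relations inside a cluster are dropped, and de-clustering gives every member the weight of its cluster. The cluster membership follows slide 15 (B with F, A with D); the use relations, the cluster weights, and the function names `cluster_graph` and `decluster` are hypothetical.

```python
def cluster_graph(edges, cluster_of):
    """Build the clustered component graph from component-level use relations."""
    clustered = {cluster: set() for cluster in cluster_of.values()}
    for src, targets in edges.items():
        for dst in targets:
            # Assumption: a use relation inside one cluster is dropped.
            if cluster_of[src] != cluster_of[dst]:
                clustered[cluster_of[src]].add(cluster_of[dst])
    return clustered


def decluster(cluster_weights, cluster_of):
    """Assumption: each component inherits the weight of its cluster."""
    return {comp: cluster_weights[cluster] for comp, cluster in cluster_of.items()}


# Cluster membership as on slide 15; the edges below are made up.
cluster_of = {"A": "AD", "D": "AD", "B": "BF", "F": "BF",
              "C": "C", "G": "G", "E": "E"}
edges = {"A": ["B"], "D": ["A", "C"], "F": ["G"], "B": ["E", "F"]}

clustered = cluster_graph(edges, cluster_of)

# Made-up cluster weights, only to show the de-clustering step.
cluster_weights = {"AD": 0.1, "BF": 0.3, "C": 0.2, "G": 0.2, "E": 0.2}
component_weights = decluster(cluster_weights, cluster_of)
```

Node weights would be computed on `clustered` rather than on the original graph, so that near-identical copies of a component count once.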

  17. Experiment 1: JDK 1.3.0 • 575,000 lines, 1877 components • 7 minutes on PC (Pentium IV, 2GHz, 2GB)
     rank  class name                      weight
     1     java.lang.Object                0.16126  (superclass of all classes)
     2     java.lang.Class                 0.08712
     3     java.lang.Throwable             0.05510  (superclass of any error or exception handler)
     4     java.lang.Exception             0.03103
     5     java.io.IOException             0.01343
     6     java.lang.StringBuffer          0.01214
     7     java.lang.SecurityManager       0.01169
     8     java.io.InputStream             0.01027
     9     java.lang.reflect.Field         0.00948
     10    java.lang.reflect.Constructor   0.00936
     ...   ...
     1256  sunw.util.EventListener         0.00011
     ...   ...
     1256  (these 622 classes are not used by any other classes)
     • Very general and core classes: ranked high
     • Specific and independent classes: ranked low

  18. Experiment 2: Collection of SE Tools and Libraries • CK metrics measurement tools, component rank system • ANTLR, JAMA, Caffe Cappuccino • 582 components
     rank  class name                                                   weight
     1     antlr.Token                                                  0.10727
     2     antlr.debug.Event                                            0.06189
     2     antlr.debug.NewLineEvent                                     0.06189
     4     antlr.collections.impl.Vector                                0.05434
     5     jp.gr.java_conf.keisuken.text.html.HtmlParameter             0.05246
     6     jp.gr.java_conf.keisuken.net.server.ServerProperties         0.03699
     7     Jama.Matrix                                                  0.01564
     8     jp.gr.java_conf.keisuken.util.IntegerArray                   0.01390
     8     jp.gr.java_conf.keisuken.util.LongArray                      0.01390
     10    jp.ac.osaka_u.es.ics.iip_lab.metrics.parser.IdentifierInfo   0.01365
     ...   ...
     418   cktool_new.examples.Main                                     0.00050
     • Indicator of generality and specialty w.r.t. usage from other classes

  19. Experiment 3: Application to Industry • Daiwa Computer: a mid-sized software company in Osaka • Shared Java application framework for web-based data management • Framework + 5 applications built on the framework • 1538 components, 339 clustered nodes • Classes in the framework and definitions of data structures are ranked high

  20. Experiment 4: Document Processing Tools and Libraries • JEDIT, jext, Enhydra, saxon, phex, JDK, etc. (7171 components) • Perform string search by grep command with keyword getNodetype
     order sorted by rank  class name                             weight
     1(67)                 enhydra3.1 ... dom.Node                0.029110
     2(169)                saxon7_0 ... saxon.om.NodeInfo         0.000969
     3(275)                saxon7_0 ... saxon.pattern.NodeTest    0.000437
     4(316)                enhydra3.1 ... dom.DocumentImpl        0.000368
     5(355)                saxon7_0 ... saxon.pattern.Pattern     0.000324
     6(382)                saxon7_0 ... saxon.Controller          0.000296
     7(437)                enhydra3.1 ... xslt.XSLTEngineImpl     0.000241
     8(446)                enhydra3.1 ... dom.ElementImpl         0.000235
     9(500)                saxon7_0 ... saxon.style.StyleElement  0.000202
     10(506)               saxon7_0 ... saxon.tree.NodeImpl       0.000198
     ...                   ...
     125(4441)             enhydra3.1 ... FuncID                  0.000029
     ...                   ...
     125(4441)
     [Callout on the top-ranked entries: method definitions of obtaining node kinds in DOM tree]
     • We can easily find the core definitions of classes

  21. Discussion 1: Weight Computation • Reference Count Model: fragile to locally-made references, which may not be important globally • Component Rank Model: more stable to local references [Figure: the same graph of nodes A, B, C, D, E weighted under each model; values shown: 0.31, 0.2, 0.6, 0.33, 0, 0, 0.2, 0.03, 0.03, 0.30]

  22. Discussion 2: Clustering Policy (1) • Eliminate effect of simply duplicated components [Figure: an original (A, B), its copy (A, B), and others (X, Y); after clustering, nodes A, X, B, Y each have weight 0.25, the same weight arrangement as the case with no duplicated components]

  23. Discussion 2: Clustering Policy (2) • Count only reused components which are not simply duplicated [Figure: an original (A, B), a modified version (A, C), and others (X, Y); after clustering, the nodes A, X, B, C, Y are shown with the values 0.3, 0.2, 0.15, 0.15, 0.2; A's weight is higher than the others]

  24. Discussion 3: Similarity Criterion and Pseudo Use Relation • Similarity criterion t: 0.8 • Resulting ranks are fairly insensitive to t • Some inherently-different components are in the same cluster if t is less than 0.8 • Pseudo use relation ratios p: 0.85 • Resulting ranks are stable between 0.75 - 0.95

  25. Related Works • Markov models of documentation traversal • Influence Weight: impact factor of journal publications through incoming references • Page Rank: weight of HTML pages on the Internet through incoming web links • Differences from these: explicit use relations; no clustering (important for software products) • Measuring reusability of components or interfaces • Use various characteristic metrics • Indirect indicator of reusability • Our approach directly reflects usage of components

  26. SPARS-J: Software Product Archiving, Analyzing and Retrieving System for Java [Figure: architecture with Internet / corporate repositories, Component Collector, Component Archive, Analyzer and Evaluator, Query Handler, and the SPARS-J Software Component Searcher]

  27. Conclusion & Future Work • Component Rank: a novel model for software component • Prototype system for Java • Application to various collections of Java programs : promising results • Developing SPARS-J • Statistical evaluation (recall & precision) • Practical evaluation using SPARS-J • Other models (weight distribution, similarity, ...)

  28. END

  29. Global Analysis of Software Data [Figure: data on the Internet, company-wide project data, and subsidiary company data feed a cycle of data collection, data analysis, and feedback]

  30. Weight Computation by Eigenvector • W is the eigenvector of D^t for eigenvalue 1: $W = D^{t} W$ • W: node weight vector • D^t: transposed matrix of distribution ratios • A math package can be used for the eigenvector computation, but it is generally slower than the propagation computation
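As a small illustration of the direct route, the weights can also be obtained by solving the linear system instead of propagating. The sketch below is not a general eigenvector routine from a math package: it is a plain Gaussian elimination on (D^t - I) W = 0 with one equation replaced by the constraint that the weights sum to 1, and it assumes the solution is unique. The function name `solve_weights` and the three-node matrix (A uses B and C, B uses C, C uses A) are hypothetical.

```python
from fractions import Fraction


def solve_weights(d):
    """Solve W = D^t W together with sum(W) = 1 by Gaussian elimination."""
    n = len(d)
    # Row j of (D^t - I) encodes: sum_i d[i][j] * w_i - w_j = 0.
    a = [[Fraction(d[i][j]) - (1 if i == j else 0) for i in range(n)] + [Fraction(0)]
         for j in range(n)]
    # The n equations are linearly dependent, so the last one is replaced
    # by constraint (1): the weights sum to 1.
    a[-1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        # Raises StopIteration if no pivot exists (no unique solution).
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [x / a[col][col] for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[-1] for row in a]


# Hypothetical distribution ratios d[i][j] for nodes A, B, C (row = origin).
half = Fraction(1, 2)
d = [[0, half, half],
     [0, 0, 1],
     [1, 0, 0]]
weights = solve_weights(d)
print(weights)
```

For this assumed matrix the exact solution is 2/5, 1/5, 2/5. Elimination costs on the order of n^3 operations for n nodes, which fits the slide's remark that the direct computation is generally slower than propagation on large graphs.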
