ARIES: Refactoring Support Tool for Code Clone

Yoshiki Higo (1), Toshihiro Kamiya (2), Shinji Kusumoto (1), Katsuro Inoue (1)
(1) Osaka University
(2) National Institute of Advanced Industrial Science and Technology
{y-higo, kamiya, kusumoto, inoue}@ist.osaka-u.ac.jp

Contents • Background • Proposed Refactoring Support Method • Step1, Step2, Step3 • Metrics used in Step3 • Refactoring Support Tool Aries • Case Study • Overview • Filtering Conditions • Evaluation Method • Result • Conclusion

Background • What is a code clone? • A code fragment that has identical or similar fragments in the same or different files in a system • Introduced into source programs for various reasons, such as reusing code by copy-and-paste • Makes software maintenance more difficult

Our Refactoring Support Method for Code Clone • We have proposed a refactoring support method for code clones • The method uses three steps to obtain refactoring-oriented code clones quickly • First Step: Run CCFinder to get token-based code clones • Second Step: Extract structural parts from the code clones detected by CCFinder • Final Step: Characterize the extracted code clones to predict how they can be refactored • We have implemented our method as a tool, Aries • In this presentation, we explain our method and report the results of a case study using Aries

Our Refactoring Support Method: First Step: CCFinder (Outline) • CCFinder compares source code directly at the token level and detects code clones • Normalization of name space • Replacement of user-defined names • Removal of table initialization • Consideration of module delimiters • CCFinder can analyze systems on the scale of millions of lines in practical time

Our Refactoring Support Method: First Step: CCFinder (Clone Detection Process)
• Pipeline: Source files -> Lexical analysis -> Token sequence -> Transformation -> Transformed token sequence -> Match detection -> Clones on transformed sequence -> Formatting -> Clone pairs
• Example input:
    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }
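The lexical analysis, transformation, and match detection stages can be illustrated with a small sketch. This is not CCFinder's implementation: the class name `TokenCloneSketch`, the tiny keyword set, the regular-expression lexer, and the quadratic matching are all assumptions made for illustration. It only shows the idea of replacing user-defined names with a placeholder token before comparing token sequences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; CCFinder's real lexer and matcher are more elaborate.
class TokenCloneSketch {
    // Small illustrative keyword set; a real lexer would know the full language.
    private static final Set<String> KEYWORDS =
            Set.of("int", "for", "if", "new", "static", "void", "throws");

    // Identifiers, integer literals, string literals, or any single symbol.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|\\d+|\"[^\"]*\"|\\S");

    // Lexical analysis plus transformation: every user-defined name becomes "$".
    static List<String> normalize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            boolean identifier =
                    Character.isJavaIdentifierStart(t.charAt(0)) && !KEYWORDS.contains(t);
            tokens.add(identifier ? "$" : t);
        }
        return tokens;
    }

    // Match detection: length of the longest common contiguous token run.
    static int longestCommonRun(List<String> a, List<String> b) {
        int best = 0;
        int[][] run = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    run[i][j] = run[i - 1][j - 1] + 1;
                    best = Math.max(best, run[i][j]);
                }
            }
        }
        return best;
    }
}
```

With this normalization, `if (pat.match(a[i]))` and `if (exp.match(a[i]))` yield the same token sequence, which is why the bodies of `foo` and `goo` can be matched even though their variable names differ.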

Definitions: Clone Pair and Clone Set • Clone Pair: a pair of identical or similar fragments • Clone Set: a set of identical or similar fragments • CCFinder detects code clones as clone pairs • After the detection process, clone pairs are transformed into clone sets
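One way to picture the transformation from clone pairs to clone sets is to treat each pair as an equivalence between two fragments and group fragments with a union-find structure. The sketch below assumes this approach; the class `CloneSetBuilder`, its method names, and the fragment labels are hypothetical and are not taken from CCFinder or Aries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: groups fragments into clone sets from clone pairs.
class CloneSetBuilder {
    // Maps each fragment label to its representative's chain (union-find forest).
    private final Map<String, String> parent = new HashMap<>();

    // Finds the representative of a fragment, compressing the path on the way.
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p);
        }
        return p;
    }

    // Records that fragments a and b form a clone pair.
    void addClonePair(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        parent.put(rootA, rootB);
    }

    // Returns the clone sets: fragments connected through any chain of pairs.
    List<TreeSet<String>> cloneSets() {
        Map<String, TreeSet<String>> groups = new HashMap<>();
        for (String fragment : new ArrayList<>(parent.keySet())) {
            groups.computeIfAbsent(find(fragment), k -> new TreeSet<>()).add(fragment);
        }
        return new ArrayList<>(groups.values());
    }
}
```

For example, adding the pairs (C1, C2), (C2, C3), and (C4, C5) yields two clone sets, {C1, C2, C3} and {C4, C5}.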

Our Refactoring Support Method: Second Step: Extract Structural Code Clones • Structural code clones are regarded as the targets of refactoring • Structural parts are extracted as structural code clones from the detected clone sets • What is a structural code clone? • Example (Java language): • Declaration: class declaration, interface declaration • Method: method body, constructor, static initializer • Statement: do, for, if, switch, synchronized, try, while

Code clones which CCFinder detects / Code clones which our method detects
• Fragment 1 (listing of source lines 609-628):
    reset();
    grammar = g;
    // Lookup make-switch threshold in the grammar generic options
    if (grammar.hasOption("codeGenMakeSwitchThreshold")) {
        try {
            makeSwitchThreshold = grammar.getIntegerOption("codeGenMakeSwitchThreshold");
            //System.out.println("setting codeGenMakeSwitchThreshold to " + makeSwitchThreshold);
        } catch (NumberFormatException e) {
            tool.error(
                "option 'codeGenMakeSwitchThreshold' must be an integer",
                grammar.getClassName(),
                grammar.getOption("codeGenMakeSwitchThreshold").getLine()
            );
        }
    }
    // Lookup bitset-test threshold in the grammar generic options
    if (grammar.hasOption("codeGenBitsetTestThreshold")) {
        try {
            bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
• Fragment 2 (listing of source lines 623-642):
    }
    // Lookup bitset-test threshold in the grammar generic options
    if (grammar.hasOption("codeGenBitsetTestThreshold")) {
        try {
            bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
            //System.out.println("setting codeGenBitsetTestThreshold to " + bitsetTestThreshold);
        } catch (NumberFormatException e) {
            tool.error(
                "option 'codeGenBitsetTestThreshold' must be an integer",
                grammar.getClassName(),
                grammar.getOption("codeGenBitsetTestThreshold").getLine()
            );
        }
    }
    // Lookup debug code-gen in the grammar generic options
    if (grammar.hasOption("codeGenDebug")) {
        Token t = grammar.getOption("codeGenDebug");
        if (t.getText().equals("true")) {

Code clones which CCFinder detects
• First fragment (source lines 1007-1019):
    if ( inputState.guessing==0 ) {
        buf.append(a.getText());
    }
    {
    _loop144:
    do {
        if ((LA(1)==WILDCARD)) {
            match(WILDCARD);
            a=id();
            if ( inputState.guessing==0 ) {
                buf.append('.'); buf.append(a.getText());
            }
        }
• Second fragment (source lines 1527-1539):
    if ( inputState.guessing==0 ) {
        t=a.getText();
    }
    {
    _loop84:
    do {
        if ((LA(1)==COMMA)) {
            match(COMMA);
            id();
            if ( inputState.guessing==0 ) {
                t+=","+b.getText();
            }
        }

Our Refactoring Support Method: Final Step: Characterize Extracted Code Clones • The following refactoring patterns [1][2] can be applied to remove clone sets that include structural code clones • Extract Class • Extract Method • Extract Super Class • Form Template Method • Move Method • Parameterize Method • Pull Up Constructor • Pull Up Method • For each clone set, our method determines which refactoring pattern is applicable by using several metrics
[1] M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
[2] http://www.refactoring.com/, 2004.

Metrics (1): Coupling Metrics for Clone Set: NRV, NSV
• NRV(S): the average number of externally defined variables referred to in the fragments of a clone set S
• NSV(S): the average number of externally defined variables assigned to in the fragments of a clone set S
• Definition
• Clone set S includes fragments f1, f2, ..., fn
• si is the number of externally defined variables that fragment fi refers to
• ti is the number of externally defined variables that fragment fi assigns to
• $NRV(S) = \frac{1}{n}\sum_{i=1}^{n} s_i$ and $NSV(S) = \frac{1}{n}\sum_{i=1}^{n} t_i$
• Example
• Clone set S includes fragments f1 and f2
• In fragment f1, the externally defined variables b and c are referred to, and a is assigned to
• Fragment f2 is the same as f1
• Then NRV(S) = (2 + 2) / 2 = 2 and NSV(S) = (1 + 1) / 2 = 1
• Fragments f1 and f2 (identical):
    int a, b, c;
    …
    if ( … ) {
        …;
        … = b + c;   // reference
        a = …;       // assignment
        …;
    }
    …
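The two averages can be computed directly from the per-fragment counts. The following is a minimal sketch, assuming the counts si and ti have already been obtained by some analysis; the class name `CouplingMetrics` and its method signatures are hypothetical and are not part of Aries.

```java
import java.util.Arrays;

// Hypothetical sketch of the NRV and NSV averages for one clone set.
class CouplingMetrics {
    // referenced[i] is s_i: externally defined variables referred to by fragment f_i.
    static double nrv(int[] referenced) {
        return Arrays.stream(referenced).average().orElse(0.0);
    }

    // assigned[i] is t_i: externally defined variables assigned to by fragment f_i.
    static double nsv(int[] assigned) {
        return Arrays.stream(assigned).average().orElse(0.0);
    }
}
```

For the example with two identical fragments that each refer to b and c and assign to a, `nrv(new int[] {2, 2})` gives 2.0 and `nsv(new int[] {1, 1})` gives 1.0. Returning 0.0 for an empty clone set is a choice made for this sketch only.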

Metrics (2): Inheritance Metric for Clone Set: DCH
• DCH(S): represents the position of, and the distance between, the fragments of a clone set S in the class hierarchy
• Definition
• Clone set S includes fragments f1, f2, ..., fn
• Fragment fi exists in class Ci
• Class Cp is the common parent class of C1, C2, ..., Cn that is located at the lowest position in the class hierarchy
• If no common parent class of C1, C2, ..., Cn exists, the value of DCH(S) is -1
• This metric is measured only for the class hierarchy to which the target software belongs
• Example 1: Clone set S includes fragments f1 and f2. If all fragments of clone set S are included in the same class, then DCH(S) = 0
• Example 2: Clone set S includes fragments f1 and f2. If all fragments of clone set S are included in a class and its direct child classes, then DCH(S) = 1
• Example 3: Clone set S includes fragments f1 and f2. If the classes that include f1 and f2 have no common parent class, then DCH(S) = ∞
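A possible reading of the definition and examples is that DCH(S) is the largest distance from any class Ci up to the lowest common parent Cp. The sketch below implements that reading as an assumption, for single inheritance only, and returns -1 when no common parent exists, as the definition bullet states. The class `InheritanceMetric` and its methods are hypothetical; only classes of the target software are registered, so library superclasses are not counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of DCH, assuming an acyclic single-inheritance hierarchy.
class InheritanceMetric {
    // Maps a class of the target software to its direct parent, if any.
    private final Map<String, String> parentOf = new HashMap<>();

    void addInheritance(String child, String parent) {
        parentOf.put(child, parent);
    }

    // The class itself followed by its ancestors, nearest first.
    private List<String> chain(String cls) {
        List<String> chain = new ArrayList<>();
        for (String c = cls; c != null; c = parentOf.get(c)) {
            chain.add(c);
        }
        return chain;
    }

    // classes holds C1..Cn, the classes that contain the fragments of a clone set.
    int dch(List<String> classes) {
        // Walk upward from the first class; the first candidate found on every
        // chain is the lowest common parent.
        for (String candidate : chain(classes.get(0))) {
            int maxDistance = 0;
            boolean common = true;
            for (String cls : classes) {
                int distance = chain(cls).indexOf(candidate);
                if (distance < 0) {
                    common = false;
                    break;
                }
                maxDistance = Math.max(maxDistance, distance);
            }
            if (common) {
                return maxDistance;
            }
        }
        return -1; // no common parent class within the target software
    }
}
```

Under this reading, two fragments in the same class give 0, fragments in a class and its direct child classes give 1, and fragments in unrelated classes give -1.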

Aries: Refactoring Support Tool: Overview • Target: Java programs • Runtime environment: JDK 1.4 or above • Implementation • Analysis component: 32,000 lines of Java • CCFinder is used as the code clone detection component • JavaCC is used to construct the syntax and semantic analysis component • GUI component: 14,000 lines of Java • Users can specify target clone sets through GUI operations

Case Study: Overview • Business application developed by a software company in Japan • Size • LOC: 70,000 • Number of classes: 309 • Process of the case study • Step1: Extract clone sets that can be refactored using the following refactoring patterns • Extract Super Class • Move Method • Extract Method • Step2: A developer qualitatively evaluates the results of the refactorings from the viewpoints of size, design, cohesion, coupling, understandability, and reusability

Case Study: Filtering Conditions (1/3) • Conditions for “Extract Super Class” • The target is Class-Unit clone sets • Cloned classes have no parent
[Figure: before refactoring, the similar classes Employee and Department declare methods such as getName(), getId(), getAnnualCost(), getTotalAnnualCost(), and getHeadCount(); after refactoring, they share a new parent class Party]

Case Study: Filtering Conditions (2/3) • Conditions for “Move Method” • The target is Method-Body-Unit clone sets • Cloned methods use no other resources of their class
[Figure: before refactoring, clonedMethod() is duplicated in several of the classes ClassA to ClassD; after refactoring, clonedMethod() is placed in UtilityClass]

Case Study: Filtering Conditions (3/3) • Conditions for “Extract Method” • The target is Statement-Unit clone sets such as “if”, “while”, “for” … • Cloned statements assign a value to at most one externally defined variable • All statements of a clone set are in the same class
• Before refactoring:
    void methodA(int i) {
        methodZ();
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }
    void methodB(int i) {
        methodY();
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }
• After refactoring:
    void methodA(int i) {
        methodZ();
        methodC(i);
    }
    void methodB(int i) {
        methodY();
        methodC(i);
    }
    void methodC(int i) {
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }
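The three sets of filtering conditions can be read as simple predicates over properties of a clone set. The sketch below is one hypothetical encoding, not the filter implemented in Aries: the class `CloneSetFilter`, the enums, and the parameter names are assumptions, and "all statements in the same class" is expressed here as DCH = 0.

```java
import java.util.EnumSet;

// Hypothetical sketch of the case study's filtering conditions.
class CloneSetFilter {
    enum Unit { DECLARATION, METHOD, STATEMENT }

    enum Refactoring { EXTRACT_SUPER_CLASS, MOVE_METHOD, EXTRACT_METHOD }

    // unit: structural unit of the clone set.
    // classesHaveParent: whether any cloned class already has a parent class.
    // usesOwnClassMembers: whether a cloned method uses other resources of its class.
    // maxAssignedExternalVars: largest number of externally defined variables
    //                          assigned to by any fragment.
    // dch: inheritance metric of the clone set (0 means a single class).
    static EnumSet<Refactoring> candidates(Unit unit,
                                           boolean classesHaveParent,
                                           boolean usesOwnClassMembers,
                                           int maxAssignedExternalVars,
                                           int dch) {
        EnumSet<Refactoring> result = EnumSet.noneOf(Refactoring.class);
        if (unit == Unit.DECLARATION && !classesHaveParent) {
            result.add(Refactoring.EXTRACT_SUPER_CLASS);
        }
        if (unit == Unit.METHOD && !usesOwnClassMembers) {
            result.add(Refactoring.MOVE_METHOD);
        }
        if (unit == Unit.STATEMENT && maxAssignedExternalVars <= 1 && dch == 0) {
            result.add(Refactoring.EXTRACT_METHOD);
        }
        return result;
    }
}
```

For instance, a statement-unit clone set whose fragments all lie in one class and assign to a single external variable is reported as an Extract Method candidate, while the same clone set spread over a class and its child class is filtered out.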

Case Study: Result of Filtering • Number of detected clone sets • Declaration Unit: 4 • Method Unit: 13 • Statement Unit: 49 • Number of clone sets remaining after filtering • Extract Super Class: 4 • Move Method: 5 • Extract Method: 12

Case Study: Evaluation Method (1/3) • A developer of the target system evaluated the refactorings of the filtered clone sets from the following viewpoints • Status of clones (he judged each of the following items as (a) or (b)) • size of module • design of software • cohesion of class • coupling among classes • Scale: (a) deteriorating, (b) making no impact

Case Study: Evaluation Method (2/3) • Continued • Effectiveness of refactorings (he judged each of the following items on a scale from (a) to (d)) • size • design • cohesion • coupling • understandability • reusability • Scale: (a) improving, (b) preventing problems for the future, (c) having no impact, (d) deteriorating

Case Study: Evaluation Method (3/3) • Continued • Costs of refactorings (he judged two items of cost on a scale from (a) to (c)) • modifying source code • testing for regression • Scale: (a) very easy, (b) a little troublesome, (c) complicated • Comprehensive evaluation (he judged the comprehensive evaluation on a scale from (a) to (d)) • This clone set … (a) must be refactored as soon as possible, (b) should be refactored for the future, (c) does not matter with respect to refactoring, (d) should not be refactored

Case Study: The Result - Extract Super Class (1/4) • The status of “Declaration-Unit” clones • All clone sets deteriorate size and design • All clone sets have no impact on cohesion and coupling
[Chart legend: Deteriorating, Making no impact]

Case Study: The Result - Extract Super Class (2/4) • The effectiveness of “Declaration-Unit” refactorings • Refactorings of all clone sets are effective in terms of size, design, understandability, and reusability • Refactorings of all clone sets have no impact in terms of cohesion and coupling
[Chart legend: Improving, Preventing problems for the future, Having no impact, Deteriorating]

Case Study: The Result - Extract Super Class (3/4) • The cost of “Declaration-Unit” refactorings • Modifying is a little troublesome • Testing becomes complicated in some cases • The comprehensive evaluation of “Declaration-Unit” refactorings • All clone sets must or should be refactored
[Chart legends: cost: Very easy, A little troublesome, Complicated; comprehensive evaluation: Must be refactored, Should be refactored, No matter, Should not be refactored]

Case Study: The Result - Extract Super Class (4/4) • The methods doAfterBody() and doEndTag() are identical in both classes • The method doStartTag() provides different functions in each class
[Figure: before refactoring, AcceptHeadChecker and SakubanHeadChecker each declare doStartTag(), doAfterBody(), and doEndTag(); after refactoring, a new class declares doAfterBody() and doEndTag(), and AcceptHeadChecker and SakubanHeadChecker each keep only doStartTag()]

Case Study: The Result - Move Method (1/4) • The status of “Method-Unit” clones • All clone sets deteriorate size and design • Most of them (80%) deteriorate cohesion • All clone sets have no impact on coupling
[Chart legend: Deteriorating, Making no impact]

Case Study: The Result - Move Method (2/4) • The effectiveness of “Method-Unit” refactorings • Refactorings of all clone sets are effective in terms of size, design, cohesion, understandability, and reusability • Refactorings of all clone sets have no impact on coupling
[Chart legend: Improving, Preventing problems for the future, Having no impact, Deteriorating]

Case Study: The Result - Move Method (3/4) • The cost of “Method-Unit” refactorings • Both modifying and testing are very easy • The comprehensive evaluation of “Method-Unit” refactorings • All clone sets must or should be refactored
[Chart legends: cost: Very easy, A little troublesome, Complicated; comprehensive evaluation: Must be refactored, Should be refactored, No matter, Should not be refactored]

Case Study: The Result - Move Method (4/4) • The following method exists in 9 classes • This method converts ResultData to HashMap • This method uses no other fields or methods of its class • Moving it to another class is very easy • The developer judged that this refactoring is effective
    private HashMap convertToMap(ResultData rd) {
        String[] names = rd.getNames();
        HashMap map = new HashMap();
        // Copy every named value of the result data into the map.
        for (int i = 0; i < names.length; i++) {
            map.put(names[i], rd.getObject(names[i]));
        }
        return map;
    }

Case Study: The Result - Extract Method (1/4) • The status of “Statement-Unit” clones • Some of the clone sets deteriorate size and design • All clone sets have no impact on cohesion and coupling
[Chart legend: Deteriorating, Making no impact]

Case Study: The Result - Extract Method (2/4) • The effectiveness of “Statement-Unit” refactorings • The judged effectiveness of the refactorings is divided in terms of size, design, understandability, and reusability • Refactorings of all clone sets have no impact on cohesion and coupling
[Chart legend: Improving, Preventing problems for the future, Having no impact, Deteriorating]

Case Study: The Result - Extract Method (3/4) • The cost of “Statement-Unit” refactorings • Almost half of the clone sets are costly to both modify and test • The comprehensive evaluation of “Statement-Unit” refactorings • Almost half of the clone sets should not be refactored
[Chart legends: cost: Very easy, A little troublesome, Complicated; comprehensive evaluation: Must be refactored, Should be refactored, No matter, Should not be refactored]

Case Study: The Result - Extract Method (4/4) • The following if-statement appeared 5 times in a class • The variables marked in red are defined outside this if-statement • They must be added to the arguments of the extracted method • This statement can be extracted as a new method in the same class, but … • The argument list of the new method becomes long • Extracting it 5 times is costly
    if (vomrgpersonSyuningishi.getName() != null
            && !vomrgpersonSyuningishi.getName().equals("")) {
        checkOwnSettlement(vomrgpersonSyuningishi.getShimeiNo(),
                vomrgpersonSyuningishi.getName(),
                vomanager,
                voprojectinfo,
                vodocumentinfo,
                vomrgpersonSyuningishi.getBusyomei());
    }

Case Study: Discussions • The following table represents the ratio of refactorings judged as effective ((a) improving, or (b) preventing problems for the future) • Some “Statement-Unit” clones were judged as not suited to refactoring • Manually extracting a part of a method is complicated • Making the filtering conditions stricter • Supporting the modification work • Some clone sets depend on the framework used in the system • “Extract Super Class” and “Move Method” refactorings are effective • Those clone sets deteriorate some software qualities • Refactoring them can improve those qualities • They do not require much cost

Contents • Background • Proposed Refactoring Support Method • Step1, Step2, Step3 • Metrics used in Step3 • Refactoring Support Tool Aries • Case Study • Overview • Filtering Conditions • Evaluation Method • Result • Conclusion & Future Works

Conclusion & Future Works • We have • proposed a refactoring support method • implemented a refactoring support tool, Aries • conducted a case study on a business application • “Declaration-Unit” and “Method-Unit” refactorings are effective • Some “Statement-Unit” clones are not appropriate for refactoring • As future work, we are going to • evaluate refactoring effectiveness quantitatively • add an effectiveness measurement function to Aries to support the refactoring of code clones comprehensively

Code clone detection for refactoring: Related Works • Detect similar sub-graphs as clones on the program dependency graph [1] • High accuracy: this approach finds data dependences and control dependences in source code • High time complexity: it takes O(n^2) time to construct the program dependency graph • Detect similar methods and functions as clones using metrics [2] • Low accuracy: if the target method or function is small, the metric values make no difference • Detection unit restriction: only method-unit and function-unit clones can be detected
[1] R. Komondoor and S. Horwitz, “Using slicing to identify duplication in source code”, In Proc. of the 8th International Symposium on Static Analysis, Paris, France, July 16-18, 2001.
[2] Magdalena Balazinska, Ettore Merlo, Michel Dagenais, Bruno Lague, and Kostas Kontogiannis, “Advanced Clone-Analysis to Support Object-Oriented System Refactoring”, WCRE 2000, pp. 98-107.

Properties • Size: the number of lines or tokens of modules • Design: the structure of the class hierarchy or the encapsulation of classes • Cohesion: the responsibility of classes. If a class provides no functions or too many different functions, the cohesion of the class becomes poor (low) • Coupling: the usage of fields or methods of other classes. If fields or methods are defined in an inappropriate class, the coupling becomes poor (high)
