
A Software Engineering Tool for Distributed Development

Jason Carter, Prasun Dewan. University of North Carolina at Chapel Hill.


Presentation Transcript


  1. A Software Engineering Tool for Distributed Development. Jason Carter, Prasun Dewan. University of North Carolina at Chapel Hill.

  2. Motivation. Programmer Bob: “Grrr…” Programmer’s mentor/teammate: “Hmm… is Bob stuck?”

  3. Applications. “Need help?” Offer help to student programmers who are too shy to ask for it. Significantly improve programmer productivity. Benefits of this idea may also occur in industry. (Pictured: manager, student.)

  4. Co-located vs. Distributed. Distributed team: less productivity. Co-located team: more productivity. • [2] Herbsleb, J.D., et al. Distance, dependencies, and delay in a global collaboration. In Proc. CSCW 2000.

  5. Productivity Higher in War-rooms Than in Cubicles. War-room vs. cubicle. • Teasley, S., et al. How does radical collocation help a team succeed? In Proc. CSCW 2000. Combined, these studies show:

  6. Distance Impedes Deduction. Bob: “Hmm… is Alice stuck?” Alice: “Grrr…” Distance separates them. Developers often do not explicitly ask for help. How do we reduce this gap?

  7. CollabVS. Developers are aware of the methods their distributed teammates are working on. Use this information with project information to manually determine if a teammate is stuck. Gives distributed users the feeling of “being there” in a single location. • Hegde, R. and Dewan, P. Connecting Programming Environments to Support Ad-Hoc Collaboration. In Proc. 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2008.

  8. Can We Do Better Than Being There? Face-to-face interaction: “being there” (Bob, Alice). “Beyond being there” (Bob, Alice). How do we plan to go “beyond being there”? • Hollan, J. and Stornetta, S. Beyond being there. In Proc. CHI ’92.

  9. “Beyond Being There”. To programmer Bob: “You are having difficulty.” To the programmer’s mentor/teammate: “Bob is having difficulty.” There are several ways to infer this information.

  10. Automatic Prediction of Frustration. Equipment used: posture-sensing chairs, wireless Bluetooth skin conductance tests, video camera, pressure mice. PROBLEM: overhead of using this non-standard equipment. Alternative approach: determine this information by logging interaction with some component of the system. • Kapoor, A., Burleson, W., and Picard, R.W. "Automatic Prediction of Frustration," International Journal of Human-Computer Studies, Vol. 65, Issue 8, 2007.

  11. Determine If Programmers Are Interruptible. Developed a tool that uses developers’ actions to determine if they are interruptible. Randomly interrupted developers. Interruptibility scale: 0 to 100. Can we use this approach? • Fogarty, J., Ko, A.J., Aung, H.H., Golden, E., Tang, K.P. and Hudson, S.E. Examining task engagement in sensor-based statistical models of human interruptibility. In Proc. CHI 2005, ACM Press (2005), 331-340.

  12. Information about Events. No random interruption would find that a developer is having difficulty. Interruptibility scale: 0 to 100. A better alternative is to allow developers to report their status.

  13. Use Buttons To Collect Information. Buttons used to indicate status. “Eureka” button: captures situations in which developers did not realize they had been having a problem until they had solved it. “Notifications Enabled”: allowed developers to determine if they received status change notifications; useful to run an initial naïve algorithm. These buttons are useful only for the training phase.

  14. Our Approach

  15. Basic Intuition. “You are having difficulty.” Monitor the progress of developers on a scale from 0 to 100. Progress below some threshold indicates that they are having difficulty. Progress and productivity are related but fundamentally different.

  16. Relationship Between Productivity and Progress. Productivity: usually measured after developers have written code. Progress: measured while developers write code. Little work has been done on measuring progress. The only work we could find was done by Kersten and Murphy.

  17. Mylar: Tool to Reduce Navigation. Edit Ratio = (# of Edit Commands) / (# of Navigation Commands). • Kersten, M. and Murphy, G.C. Mylar: A degree-of-interest model for IDEs. In Proc. Conference on Aspect-Oriented Software Development, 2005, 159-168.
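As a rough illustration of the edit ratio on this slide, the sketch below reads the slide's diagram as a fraction (edit commands over navigation commands). The command labels, the `edit_ratio` name, and the handling of a zero denominator are assumptions made for the example, not details from the presentation.

```python
def edit_ratio(commands):
    """Return (# of edit commands) / (# of navigation commands).

    `commands` is a list of category labels; the labels "edit" and
    "navigation" are hypothetical names chosen for this sketch.
    """
    edits = sum(1 for kind in commands if kind == "edit")
    navigations = sum(1 for kind in commands if kind == "navigation")
    if navigations == 0:
        # Assumption: no navigation means the ratio is unbounded (or 0 if idle).
        return float("inf") if edits else 0.0
    return edits / navigations


sample = ["edit", "edit", "navigation", "edit", "navigation", "debug"]
print(edit_ratio(sample))  # 3 edits / 2 navigations = 1.5
```

A low ratio means the developer is navigating much more than editing, which is the kind of signal the naïve threshold idea relies on.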

  18. Our Approach: Determine Measure Of Progress

  19. Metrics To Measure Progress. [Diagram: Edit Ratio, # of Debug Commands, “> Low Threshold”, Stuck / Having Difficulty.] The naïve algorithm did not predict the progress status well. Explore the logs and corrections to derive a better algorithm. • Y. Sharon. Eclipseye - spying on eclipse. Bachelor’s thesis, University of Lugano, June 2007.

  20. Our Approach: Derive Mining Algorithm

  21. Deriving Mining Algorithm. Analyze logs: to find patterns when developers indicated they were having difficulty. Features: values that change when programmers are making progress and having difficulty. A manual inspection of the logs showed that the frequency of certain edit commands decreased when developers were having difficulty.

  22. Commands Grouped Into Five Categories. Depending on the developer, the frequency of execution of other commands increased. We used these categories to create our features.

  23. Identifying Features. For different segments of the log we calculated, for each category of commands: 100 * (occurrences of that category in the segment) / (total # of commands in the segment). Used these percentages as features to identify patterns. The size of these segments is an important issue.
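The percentage features described here can be sketched as follows. The five category names are taken from the feature names listed later in the deck (edit, debug, focus, navigation, remove); the function names and the sample data are hypothetical.

```python
from collections import Counter

# Category names follow the feature list on the "Does Approach Work In
# Practice?" slide; the lowercase spelling is an assumption.
CATEGORIES = ("edit", "debug", "focus", "navigation", "remove")


def split_into_segments(events, size):
    """Cut a list of command categories into consecutive segments of `size`."""
    return [events[i:i + size] for i in range(0, len(events), size)]


def category_percentages(segment):
    """100 * occurrences of each category / total commands in the segment."""
    total = len(segment)
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    counts = Counter(segment)
    return {category: 100 * counts[category] / total for category in CATEGORIES}


log = ["edit", "edit", "debug", "navigation", "focus", "focus", "remove", "edit"]
features = [category_percentages(seg) for seg in split_into_segments(log, 4)]
print(features[0]["edit"])  # 2 of 4 commands are edits: 50.0
```

Each segment yields one feature vector, so the segment size directly controls how many training records a log produces.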

  24. Determining Segment Size. Sample log:
      <action>
        <eventType>SOLUTION_OPENED</eventType>
        <solutionEvent>
          <timestamp>9/20/2009 12:44:02 PM</timestamp>
        </solutionEvent>
      </action>
      <action>
        <commandEvent>
          <command>Debug.Start</command>
          <timestamp>9/20/2009 12:45:33 PM</timestamp>
        </commandEvent>
      </action>
      <action>
        <eventType>WINDOW_LOST_FOCUS</eventType>
        <windowEvent>
          <timestamp>12:46:01 PM</timestamp>
        </windowEvent>
      </action>
      <action>
        <eventType>WINDOW_GAINED_FOCUS</eventType>
        <windowEvent>
          <timestamp>12:48:01 PM</timestamp>
        </windowEvent>
      </action>
      Segment sizes: Whole Log, 200, 100, 25, 50. Graphed the programming behavior of all participants to determine usefulness of features.
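To show how a log like the one on this slide might be turned into an event list before segmenting, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the stray spaces inside the timestamp tags are extraction damage and wraps the fragment in a hypothetical `<log>` root element; the real log format and the authors' parser are not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Two actions copied from the slide; the enclosing <log> root added below
# is an assumption, since the slide shows only a fragment.
LOG_FRAGMENT = """
<action>
  <eventType>SOLUTION_OPENED</eventType>
  <solutionEvent><timestamp>9/20/2009 12:44:02 PM</timestamp></solutionEvent>
</action>
<action>
  <commandEvent>
    <command>Debug.Start</command>
    <timestamp>9/20/2009 12:45:33 PM </timestamp>
  </commandEvent>
</action>
"""


def parse_actions(fragment):
    """Return (name, timestamp) pairs, one per <action> element."""
    root = ET.fromstring(f"<log>{fragment}</log>")
    events = []
    for action in root.iter("action"):
        # An action carries either an event type or a command name.
        name = action.findtext("eventType") or action.findtext("commandEvent/command")
        stamp = next((t.text.strip() for t in action.iter("timestamp") if t.text), None)
        events.append((name, stamp))
    return events


events = parse_actions(LOG_FRAGMENT)
print(events[1])  # ('Debug.Start', '9/20/2009 12:45:33 PM')
```

Timestamps are kept as strings because some entries on the slide carry only a time of day.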

  25. Our Approach: Validate Algorithm

  26. Graphs to Validate Features

  27. Graphs to Validate Features (cont.)

  28. Graphs to Validate Features (Cont.). The two graphs validate our feature choice and show that a general model must account for differences in which percentages change when developers are having difficulty. There are several standard ways to build a general model.

  29. Number of Stuck Events Significantly Less than Total Number of Events. Total events: 2288. This leads to the imbalanced class distribution problem.

  30. Imbalanced Class Distribution. Disproportionate number of having-difficulty segments to making-progress segments: a needle in a haystack. “Standard” algorithms predict making progress ~97% of the time. Accuracy of this model: 83%. Problem: the model can’t identify when a developer is having difficulty.
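A short worked example of why imbalance is a problem, using the 2212 making-progress and 76 having-difficulty counts that appear on the SMOTE slide. The "always predict making progress" classifier is a hypothetical baseline for illustration, not the model whose accuracy is reported above.

```python
# Counts as shown on the SMOTE slide of this deck.
making_progress = 2212
having_difficulty = 76
total = making_progress + having_difficulty  # 2288

# Hypothetical baseline: label every segment "making progress".
accuracy = making_progress / total
difficulty_recall = 0 / having_difficulty  # it never finds a stuck developer

print(f"accuracy: {accuracy:.1%}")                    # accuracy: 96.7%
print(f"recall on having difficulty: {difficulty_recall:.0%}")  # 0%
```

High overall accuracy therefore says little here; what matters is how many of the rare having-difficulty segments a model actually catches.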

  31. SMOTE Algorithm. Replicates rare data, having difficulty, until there is more of a balance between having-difficulty statuses and making-progress statuses. The replicated data of all developers were combined and used as input to several standard algorithms to build a model. Making progress: 2212. Having difficulty: 76, replicated to 1216.
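For readers unfamiliar with SMOTE (Synthetic Minority Over-sampling Technique): as published, it adds minority-class records by interpolating between a rare record and one of its nearest rare neighbours, which the slide summarizes as replication. The sketch below is a simplified pure-Python version of that idea under stated assumptions (two hypothetical percentage features per record, Euclidean distance, a fixed seed); it is not the implementation used in the study.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create `n_synthetic` records by interpolating between minority records."""
    if len(minority) < 2:
        raise ValueError("need at least two minority records to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # Keep the k minority records closest to the chosen base record.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the line from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical (edit %, navigation %) features for having-difficulty segments.
rare = [(10.0, 60.0), (20.0, 50.0), (15.0, 70.0)]
extra = smote_sketch(rare, n_synthetic=6, k=2)
print(len(rare) + len(extra))  # 9
```

Because every synthetic record lies between two real rare records, the new data stays inside the region the rare class already occupies.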

  32. Build A Model. Applied mining algorithms to the logs (replicated data) of Participants 1-2, 3-4, and 5-6. Standard 10-fold cross-validation: 10 trials of model construction executed. Each trial used 90% of the data for training. The remaining 10% was used as test data to evaluate the model in that trial.
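The 10-fold procedure described on this slide can be sketched as below. The shuffling, the fixed seed, and the `k_fold_indices` name are assumptions for the example; the slides do not say whether the folds were stratified.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(100))
train, test = splits[0]
print(len(splits), len(train), len(test))  # 10 90 10
```

Each of the 10 trials trains on 90% of the records and evaluates on the held-out 10%, and the per-trial results are then averaged.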

  33. Accuracy of Model Using Decision Tree Algorithm. Developers were having difficulty 1216 times. Developers were making progress 2288 times. • Witten, I.H. and Frank, E. (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.

  34. Classification via Clustering. Designed to identify rare events without replicating records. Developers were having difficulty 76 times. Developers were making progress 2288 times. • Witten, I.H. and Frank, E. (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.

  35. Our Approach. The approach is promising. It left several unanswered questions.

  36. Does Approach Work In Practice? Models: Decision Tree Model, Classification Via Clustering Model. Features: Edit Percentage, Debug Percentage, Focus Percentage, Navigation Percentage, Remove Percentage. Research group and one industrial developer used the software. Learned several important lessons.

  37. Having Difficulty Lesson. The Stuck button and the having-difficulty status hurt my advisor’s ego.

  38. Frequent False Positives Lesson. Industry developer, workflow system: building a new product, started a new session. The navigations performed to build the working set of files. Sometimes needed more time to determine if the predicted change of status was correct.

  39. Label Aggregation Technique. [Diagram: the event stream is split into segments of 50 events, features are computed for each segment, and the results are aggregated over 250 events.] Two techniques account for the fact that developers’ status does not change instantaneously. A more detailed explanation of how this works follows.

  40. Implementation of Developer Notifications. There are 5 status predictions when reporting a dominant status every 250 events. [Diagram: sequences of per-segment predictions (Indeterminate, Slow Progress, Making Progress) and the status prediction reported for each sequence.] Allowed the developer to correct a predicted status to indeterminate.
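One way to read "reporting a dominant status every 250 events" is a majority vote over five consecutive per-segment predictions. The sketch below implements that reading; the tie-breaking rule (the status seen first wins) and the function names are assumptions, since the slides do not specify them.

```python
from collections import Counter


def dominant_status(predictions):
    """Most frequent status; ties go to the status seen first (an assumption)."""
    return Counter(predictions).most_common(1)[0][0]


def aggregate(predictions, group_size=5):
    """Report one dominant status per complete group of `group_size` predictions."""
    last_start = len(predictions) - group_size
    return [
        dominant_status(predictions[i:i + group_size])
        for i in range(0, last_start + 1, group_size)
    ]


per_segment = (
    ["slow progress"] * 3 + ["making progress"] * 2   # first 250 events
    + ["making progress"] * 5                          # next 250 events
)
print(aggregate(per_segment))  # ['slow progress', 'making progress']
```

Voting over several predictions smooths out single-segment flips, which matches the observation that a developer's status does not change instantaneously.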

  41. Results of Pilot Study. A total of 88 predictions were made. Every hour we switched models without interrupting the user. Accuracy in this study is good. Large number of false negatives. How do we improve our accuracy?

  42. Cost of Processing Incremental Input Events Lesson. Advisor, on a 3-year-old laptop: noticeable, intolerable.

  43. Changes In The Tool Increased Programming Time and Effort. The Decision Tree Model algorithm and the Classification Via Clustering Model do not share code.

  44. Solution To Creating New Code For Each Programming Environment. Build an architecture that is independent of the programming environment. [Diagram: developers’ actions feed an architecture that hosts the Decision Tree Algorithm and the Classification Via Clustering Algorithm.] Supports interoperability. Also puts processing on a server.

  45. Architecture. Made up of several modules.

  46. Event-Interception Module. [Diagram: developers’ actions passed on as a serialized object or as WOX XML.] This module does several things: captures events from both Eclipse and Visual Studio, and passes these events to the prediction modules. Prediction modules are written in C#, so events from Visual Studio could be passed directly. Java events were converted to C# using standard libraries.

  47. Prediction Modules. Components shown: Mediator, Event Aggregator, Feature Extractor, Prediction Manager, Status Aggregator, Previous Model, WOX, IKVM. The Mediator allows modules to be loosely coupled. We can use several algorithms for event aggregation.

  48. Discrete Chunks/Sliding Window Aggregation Algorithm. Discrete chunk of 3 events. Sliding window, window size = 3. Can this tool work with professional programmers?
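The difference between the two aggregation strategies named on this slide can be sketched as follows, using the slide's size of 3. The function names and the choice to drop an incomplete trailing chunk are assumptions made for the example.

```python
def discrete_chunks(events, size=3):
    """Non-overlapping groups: each event belongs to exactly one chunk."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, size)]


def sliding_windows(events, size=3):
    """Overlapping groups: the window advances one event at a time."""
    return [events[i:i + size] for i in range(len(events) - size + 1)]


events = ["e1", "e2", "e3", "e4", "e5", "e6"]
print(discrete_chunks(events))  # [['e1', 'e2', 'e3'], ['e4', 'e5', 'e6']]
print(len(sliding_windows(events)))  # 4
```

Discrete chunks produce a prediction only once per `size` events, while a sliding window can update the prediction after every new event at the cost of recomputing features more often.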

  49. Controlled User Study. 14 developers: 9 student programmers and 5 industry programmers. Having difficulty is rare. Make sure developers face difficulty during the study. Tasks are not impossible to solve. We use ACM programming problems.

  50. ACM Programming Problems. Mid-Atlantic ACM Programming Contest: http://midatl.radford.edu/ Is self-reporting reliable?
