
CoxR: Open Source Development History Search System

CoxR: Open Source Development History Search System. Makoto Matsushita, Kei Sasaki, and Katsuro Inoue, Osaka University. Contents: Background; Open-source software development; Repository analysis system “CoxR”; Supporting Dynamic Communication System; Future research interests.


Presentation Transcript


  1. CoxR: Open Source Development History Search System. Makoto Matsushita, Kei Sasaki, and Katsuro Inoue, Osaka University

  2. Contents • Background • Open-source software development • Repository analysis system “CoxR” • Supporting Dynamic Communication System • Future research interests Supporting Knowledge Collaboration in Software Development

  3. Open Source Software Development • Open and parallel software development • Anybody can join the party at any time • Developers live all over the world [Figure: developers work on source code and manuals kept in CVS, exchange requests and fixes by email (stored in email archives), and submit bug reports and feature-enhancement requests to GNATS]

  4. Reusing repositories • System repositories hold valuable information, such as the evolution histories of products and information about each developer • processes applied to products • knowledge of requirements and design • Analyzing and reusing these contents may help reduce the time and effort of software development as a whole • reuse the ways bugs were fixed • understand a project one is about to join • reuse (parts of) products/components However, reusing these contents is difficult…

  5. Problem 1: Few links between systems • User: “It seems that the ‘bktr’ driver has a bug, so I’d like to fix it… Where can I find what I want?” [Figure: source code fixes, other files that also need to be changed, a proposed fix for the bktr driver, and discussions on the bktr driver are scattered across CVS, GNATS, and the email archive]

  6. Problem 2: Interests may vary • Even for the same problem, a solution used in the past does not suit everyone • knowledge and processes vary among developers • information needs vary over time • Example problem: there’s a bug in the bktr driver • “Maybe similar bugs appeared in other drivers, so I’ll search for them” • “I’d like to find the experts on graphics drivers” • “I’d like to have a new version of the bktr driver”

  7. Objective • Analyze past processes and histories kept in existing systems to help developers search, understand, and reuse them • Model the information in these systems (CVS, email, and GNATS) as a “development community” • Propose an approach for extracting information from the development community • Build a prototype of the proposed approach

  8. Topics • Step 1: Modeling information • Step 2: Information extraction algorithm • Step 3: System implementation

  9. Model elements • People: developers registered in the CVS, email archive, and GNATS databases • Knowledge: the contents of CVS, email, and GNATS [Figure: CVS, email archives, and GNATS combined into an integrated model]

  10. Extracting people/knowledge • CVS: Knowledge = file path, revision #, tag, date, source code, comments; People = developer, contributor • E-mail: Knowledge = Subject:, body, Message-Id:, Date:; People = From:, To:, Cc: • GNATS: Knowledge = modification base, file path, PR #, date, last modified, category, bug class, description, fix, audit-trail, status; People = Originator, Responsible
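
The split into People and Knowledge fields on this slide can be illustrated with a small data model. The following is a minimal sketch in Python, not CoxR's actual schema: the class names `Person` and `Knowledge`, the helper `extract_from_email`, and the sample message are hypothetical, and only the header names (From:, To:, Cc:, Subject:, Message-Id:, Date:) are taken from the slide.

```python
from dataclasses import dataclass, field
from email import message_from_string
from email.utils import getaddresses


@dataclass(frozen=True)
class Person:
    source: str  # "CVS", "E-mail", or "GNATS"
    role: str    # e.g. "From", "To", "Cc", "developer", "Originator"
    name: str


@dataclass
class Knowledge:
    source: str
    ident: str  # e.g. a Message-Id, a file path plus revision, or a PR number
    attrs: dict = field(default_factory=dict)


def extract_from_email(raw: str):
    """Split one raw message into People (addresses) and one Knowledge element."""
    msg = message_from_string(raw)
    people = []
    for header in ("From", "To", "Cc"):
        for _display_name, address in getaddresses(msg.get_all(header, [])):
            if address:
                people.append(Person("E-mail", header, address))
    knowledge = Knowledge(
        source="E-mail",
        ident=msg.get("Message-Id", ""),
        attrs={
            "subject": msg.get("Subject", ""),
            "date": msg.get("Date", ""),
            "body": msg.get_payload(),
        },
    )
    return people, knowledge


# Hypothetical sample message, used only to exercise the sketch.
SAMPLE = (
    "From: fjoe@example.org\n"
    "To: hackers@example.org\n"
    "Cc: phk@example.org\n"
    "Subject: bktr module unloading\n"
    "Message-Id: <1@example.org>\n"
    "Date: Tue, 1 Jan 2002 00:00:00 +0000\n"
    "\n"
    "Patch attached.\n"
)

people, knowledge = extract_from_email(SAMPLE)
```

CVS and GNATS records could be mapped onto the same two classes in the same way, which is what allows all three sources to share one network.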

  11. People/Knowledge network • We assume that the network has three types of edges: • People-Knowledge • People-People • Knowledge-Knowledge [Figure: the development community network]

  12. Extracting network edges (1/2) • People-Knowledge edge • People and Knowledge elements that appear in the same CVS, email, or GNATS record • People-People edge • People who appear in the same CVS, email, or GNATS record • People subscribed to the same mailing lists • People working in the same directory

  13. Extracting network edges (2/2) • Knowledge-Knowledge edge • Directly connected • Revision histories of the same file • Files in the same directory • Modified at the same time • Email threads • Email/PR IDs • Similar knowledge • Source code • Keywords • Base/modification information in GNATS
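
The edge rules on the two slides above can be sketched for the CVS case only. This is an illustrative sketch under stated assumptions, not the CoxR implementation: the class `DevelopmentNetwork`, the function `build_from_commits`, the commit dictionary layout, and the edge labels are all hypothetical. It covers three of the listed rules: people and files in the same CVS record, files modified at the same time or kept in the same directory, and people working in the same directory.

```python
from collections import defaultdict
from itertools import combinations
import posixpath


class DevelopmentNetwork:
    """Undirected graph; each edge carries the set of rules that produced it."""

    def __init__(self):
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, a, b, label):
        if a != b:
            self.adj[a][b].add(label)
            self.adj[b][a].add(label)

    def neighbors(self, node):
        return {n: set(labels) for n, labels in self.adj.get(node, {}).items()}


def build_from_commits(commits):
    """commits: iterable of {"developer": str, "files": [path, ...]} (assumed layout)."""
    net = DevelopmentNetwork()
    files_by_dir = defaultdict(set)
    people_by_dir = defaultdict(set)
    for commit in commits:
        dev = ("person", commit["developer"])
        files = [("knowledge", path) for path in commit["files"]]
        for f in files:
            # People-Knowledge: both appear in the same CVS record.
            net.add_edge(dev, f, "same CVS record")
            directory = posixpath.dirname(f[1])
            files_by_dir[directory].add(f)
            people_by_dir[directory].add(dev)
        # Knowledge-Knowledge: files changed in the same commit.
        for a, b in combinations(files, 2):
            net.add_edge(a, b, "modified at the same time")
    # Knowledge-Knowledge: files in the same directory.
    for files in files_by_dir.values():
        for a, b in combinations(sorted(files), 2):
            net.add_edge(a, b, "same directory")
    # People-People: people working in the same directory.
    for people in people_by_dir.values():
        for a, b in combinations(sorted(people), 2):
            net.add_edge(a, b, "same directory")
    return net
```

Keeping the rule name on each edge is a design choice of this sketch: it lets a search result say why two elements are related, not only that they are.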

  14. Topics • Step 1: Modeling information • Step 2: Information extraction algorithm (finding a small network that matches the user’s input) • Step 3: System implementation

  15. Topic community • Topic = reusable process and information • The elements related to a topic can be defined as a sub-network of the development community • The topic community may differ for each user [Figure: a topic community, containing elements such as experts in the area and patches, shown as a sub-network of the development community]

  16. Topic community extraction (1/6) • Select the initial knowledge elements • Assume that a topic is given by a user • Extract the knowledge that matches the topic • Select the initial knowledge elements • User: “I found a register error in the bktr driver while watching TV with the fxtv program…” • A topic can be given as: code fragments, directory/file names, mailing list names, bug class/description, keywords, dates • Search results for the keyword “bktr”: CVS: bktr_core.c 1.20, comment: fix register error; E-mail: subject “bktr module unloading” (2002); GNATS: description “fix bktr option error” (2000)

  17. Topic community extraction (2/6) • Select the initial knowledge elements • Assume that a topic is given by a user • Extract the knowledge that matches the topic • Select the initial knowledge elements • User: “It seems that bktr_core.c rev. 1.20 is good” • From the search results (CVS: bktr_core.c 1.20, comment: fix register error; E-mail: subject “bktr module unloading” (2002); GNATS: description “fix bktr option error” (2000)), the user selects bktr_core.c

  18. Topic community extraction (3/6) • Show related people and knowledge using the network • The user selects appropriate elements again • User: “I’d like to know the people working on bktr_core.c” • Search results for elements related to bktr_core.c: developer: fjoe; contributor: phk; contributor: roger

  19. Topic community extraction (4/6) • Show related people and knowledge using the network • The user selects appropriate elements again • User: “Hmm, fjoe is the actual developer, so I want to know more about him.” • From the results for bktr_core.c (developer: fjoe; contributor: phk; contributor: roger), the user selects fjoe

  20. Topic community extraction (5/6) • “Search and select elements” is repeated • User: “OK, are there any other elements from when fjoe changed bktr_core.c…” • Search results for elements related to developer fjoe and bktr_core.c: variables changed in yuv422_pro(); changed at the same time: bktr_card.c

  21. Topic community extraction (6/6) • “Search and select elements” is repeated • Tracking the GNATS elements that discuss bktr_card.c • Search results: variables changed in yuv422_pro(); changed at the same time: bktr_card.c; GNATS PR:41437 (closed), description: problems in bktr_card.c:yuv422_pro(); email commenting on the change: PR:41437 causes a register error • Together with developer fjoe and bktr_core.c, these elements make up the topic community • The user finally gets information about the changes to bktr_card.c that helps fix the register error
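
The six steps above amount to a keyword search followed by a loop of "show neighbors, let the user pick one". A minimal sketch of that loop follows, assuming a simplified model in which only the neighbors of the most recently selected element are shown; the function names `search_initial` and `extract_topic_community`, the `choose` callback that stands in for the user, and the tiny edge list are hypothetical and only echo the bktr walkthrough.

```python
def search_initial(knowledge_index, keyword):
    """Step 1: return the knowledge elements whose text mentions the keyword."""
    kw = keyword.lower()
    return sorted(k for k, text in knowledge_index.items() if kw in text.lower())


def extract_topic_community(edges, start, choose, max_steps=10):
    """Repeat 'search related elements, then select' until the user stops.

    edges:  iterable of (a, b) pairs from the development network
    start:  the initial knowledge element selected by the user
    choose: callback(current, candidates) -> selected element, or None to stop
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    community = [start]
    current = start
    for _ in range(max_steps):
        # Offer only elements that are not already in the community.
        candidates = sorted(adjacency.get(current, set()) - set(community))
        picked = choose(current, candidates)
        if picked is None:
            break
        if picked not in candidates:
            raise ValueError(f"{picked!r} is not related to {current!r}")
        community.append(picked)
        current = picked
    return community


# Hypothetical data echoing the walkthrough on the slides.
INDEX = {
    "CVS:bktr_core.c 1.20": "fix register error (bktr)",
    "E-mail:bktr module unloading": "bktr module unloading",
    "GNATS:fix bktr option error": "fix bktr option error",
    "CVS:other.c 1.1": "unrelated change",
}
EDGES = [
    ("bktr_core.c", "fjoe"),
    ("bktr_core.c", "phk"),
    ("bktr_core.c", "roger"),
    ("fjoe", "bktr_card.c"),
    ("bktr_card.c", "PR:41437"),
]

# A scripted "user" who picks fjoe, then bktr_card.c, then PR:41437, then stops.
script = iter(["fjoe", "bktr_card.c", "PR:41437"])
community = extract_topic_community(
    EDGES, "bktr_core.c", lambda current, candidates: next(script, None)
)
```

Because the selection is delegated to `choose`, two users starting from the same element can end up with different communities, which matches the earlier point that the topic community may differ for each user.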

  22. Topics • Step 1: Modeling information • Step 2: Information extraction algorithm • Step 3: System implementation (CoxR: a web-based system using FreeBSD data)

  23. CoxR implementation • Uses FreeBSD development data from 1994 to 2004 • System development environment • CPU: Pentium 4, 1.5 GHz • RAM: 512 MB (SDRAM) • OS: Debian GNU/Linux • System size: about 10,000 LOC • CVS: FreeBSD CVS repository (57,822 files and 618,186 revisions in total) • E-mail: “Committed changes” mailing lists (213,723 in total) • BTS: FreeBSD GNATS PRs (82,350 in total)

  24. System overview [Figure: architecture of CoxR. The user sends topic words and selections to a web server and receives search results. Components shown: System Control, History DB, Information Extraction (returning matched People/Knowledge), a Relation DB holding People-Knowledge, Knowledge-Knowledge, and People-People relations, Relation extraction, People and Knowledge extracted from CVS, E-mail, and GNATS, and CoxR-C. Legend: information search flow, relation extraction flow, data extraction flow]

  25. System evaluation • Purpose • Check whether CoxR provides useful information to developers through appropriate search results • Process • Announce CoxR on the ‘freebsd-hackers’ and ‘freebsd-current’ mailing lists, which are mainly for FreeBSD developers • Trace users’ behavior through the web server’s log • Evaluation period: January 31, 2005 to February 21, 2005 • Total users: 79 (31 unique users)

  26. Initial knowledge selection • Unfortunately, not all users selected knowledge from the topic search results • Maybe they were just trying out the CoxR search, or the search results were not useful to them • 18 out of 31 users selected initial knowledge • Type of information selected: • CVS: 12 • E-mail: 4 • GNATS: 2 • Average number of selections: 4 per topic (min 1, max 9)

  27. Topic community search • Users who actually searched for a topic community • 12 out of 18 • They tended to search for related people and knowledge within the same subsystem • Average number of network traversals: 2 • People-People: 1 • People-Knowledge: 8 • Knowledge-Knowledge: 13

  28. Discussions • Initial knowledge selection • 56% of search results would lead to valuable information • “Search by keyword, then search by developer name and/or date” is a typical search pattern • Topic community selection • 67% of the users who found initial knowledge elements (12 out of 18) successfully found their own topic community • They tended to trace the Knowledge-Knowledge and People-Knowledge edges of the development network

  29. Conclusion • “CoxR”, a search system for open-source software development • CVS, email, and GNATS • Development network, topic community • Evaluation with the help of real developers • Keywords may have their own information costs • Easy to find important keywords • Links between similar keywords • Developer roles • Easy to find people by their roles • Reuse topic communities found by others • They can serve as suggestions for finding a topic community

  30. Fin

  31. [Figure: CoxR system structure. A user sends a query word (source code, commit log keyword, time, file name, or developer name) to the CoxR web server. Components shown: CGI-Main, Data Display Record System, lexical analysis tool, token compare tool, CVS info DB, E-mail info DB, fusion info DB, token/code DB, and the tools that create the CVS, e-mail, fusion, and DB information from the CVS repository and the e-mail archive. Search results list commit logs, keywords, times, file names, developer names, similarity, source code, and related files/data. Other labels in the figure: CoDS, SPxR]

  32. Example case • Source code that sends a password: if (i != 0) error("Permission denied, please try again."); password = read_passphrase(prompt, 0); packet_start(SSH_CMSG_AUTH_PASSWORD); packet_put_string(password, strlen(password)); memset(password, 0, strlen(password)); xfree(password); packet_send(); packet_write_wait(); • Concern: password attack • Sending a password this way needs improvement

  33. Searching the repositories • Starting from the password-sending source code of the example case, the user can search by: directory structure and list archive; keywords in the source code; file name and repository • Identify similar code

  34. Searching similar code • After matching files are detected, identify which are most important in this case, using logs and diffs • Developer green committed here at 2001/03/20 02:06:40 • packet_put_string() was changed to ssh_put_password() • Commit log: “pad passwords sent to not give hints to the length” • There is evidence of an improvement, but it is hard to understand what actually changed • Next: understanding with related information

  35. Searching related information • Display the change history of the file • Search files and emails from the same time as this commit • Search files and emails by the same developer

  36. Search by revision histories • Search differences between revisions • Search files by the same developer and at the same time

  37. Search by development time • Detailed information about this file • Email information

  38. Search by keyword • Search by the keyword “openssh” • Combining search results makes it easy to find what we need

  39. Search similar information • Definition of ssh_put_password() • Search the current source code in related source files • Files committed at the same time (2001/03/20 02:06:40) by the same developer (green) • CoxR finds the actual source code that hides the password packet length

  40. Solutions • Original code: if (i != 0) error("Permission denied, please try again."); password = read_passphrase(prompt, 0); packet_start(SSH_CMSG_AUTH_PASSWORD); packet_put_string(password, strlen(password)); memset(password, 0, strlen(password)); xfree(password); packet_send(); packet_write_wait(); • Search “how to fix” • Fix found: void ssh_put_password(char *password) { int size; char *padded; size = roundup(strlen(password) + 1, 32); padded = xmalloc(size); memset(padded, 0, size); strlcpy(padded, password, size); packet_put_string(padded, size); memset(padded, 0, size); xfree(padded); } • Understanding changes • Reuse it later
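
The effect of the fix is easier to see with concrete lengths. The sketch below is a Python illustration of the padding arithmetic in ssh_put_password(), not a port of OpenSSH: the helper names `roundup` and `padded_password` are chosen here for illustration, and `roundup` is assumed to round up to the next multiple of its second argument, as the C macro of that name conventionally does.

```python
def roundup(x: int, y: int) -> int:
    """Round x up to the next multiple of y."""
    return ((x + y - 1) // y) * y


def padded_password(password: bytes, block: int = 32) -> bytes:
    """Pad with NUL bytes to a multiple of `block`, as the C code on the slide does.

    The + 1 reserves room for the terminating NUL that strlcpy() writes.
    """
    size = roundup(len(password) + 1, block)
    return password.ljust(size, b"\0")


# Passwords of 0 to 31 bytes all produce a 32-byte string, so the packet
# length reveals only the 32-byte bucket, not the exact password length.
lengths = {n: len(padded_password(b"x" * n)) for n in (0, 3, 8, 31, 32, 63, 64)}
```

In this sketch the 3-byte and 31-byte passwords both travel as 32 bytes, while a 32-byte password moves to the next bucket of 64 because of the reserved NUL.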

  41. Discussions • Searching similar code: shows the actual changes • Searching related information: helps to understand how the security hole was fixed • It is easy to find what we need, since any kind of information, including keywords, time, developer name, and code fragments, can be used • It is easy to understand search results because related information is easy to find; this helps to grasp not only “what” changed but also “why” the change happened

  42. Concluding Remarks • We implemented “CoxR”, a search system for both CVS revisions and email archives. • Using actual open-source development data, CoxR provides an easy and quick way to search for useful information on software development. • Future work: • Broader experimentation • Improvements to the search method (multiple searches at one time) • Information scoring (define an “importance/relation level” for each piece of information)
