Geographical Locations of Developers at SourceForge: Gregorio Robles, Jesus M. Gonzalez-Barahona • Presented by Brian Chan, Cisc 864
Overview • Background • Motivation • Data Gathering Methods • Results • Conclusions
Background Information • Developers on a project are distributed across the world • e.g. libre software projects • Hard to account for all the developers, and harder still to control these resources
Motivation • To accurately account for all personnel in the world for a given project • Interesting for academic and economic reasons
Data Gathering Methods • Two Primary Sources of Information: • Private email address • Time Zone of the User • Acquired from special database for research purposes
Data Gathering Methods • Useful information in email and time zone: • ccTLD: Country Code Top Level Domain • e.g. gsyc.escet.urjc.es • Figure 1.0 • Figure 2.0
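The ccTLD idea can be illustrated with a short sketch. This is not the authors' code: the function name `cctld_of` and the small `GENERIC_TLDS` set are assumptions made for illustration, and a real study would check the suffix against the full list of country codes.

```python
# Illustrative, incomplete set of generic TLDs (an assumption, not from the slides).
GENERIC_TLDS = {"com", "org", "net", "edu", "gov", "mil", "int", "info", "biz"}

def cctld_of(address):
    """Return the two-letter country-code suffix of an email address or host name,
    or None when the last label does not look like a ccTLD."""
    # Keep only the host part if an email address was given.
    host = address.rsplit("@", 1)[-1].strip().lower().rstrip(".")
    # The top-level domain is the last dot-separated label.
    tld = host.rsplit(".", 1)[-1]
    if len(tld) == 2 and tld.isascii() and tld.isalpha() and tld not in GENERIC_TLDS:
        return tld
    return None

print(cctld_of("gsyc.escet.urjc.es"))  # host from the slide, suffix "es"
print(cctld_of("dev@example.com"))     # generic TLD, no country information
```

A two-letter suffix only hints at a country; the lookup from suffix to country name is left out here.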
Data Gathering Methods • Hard to pinpoint location when data consists of: • gTLD (Generic Top Level Domain) • Time zones which are GMT (Greenwich Mean Time) • Figure 3.0
Data Gathering Methods • Use a distribution method to estimate where users should go: • e.g. 22 users in one domain, 10 of whom are unaccounted for because their only time zone information is GMT • Figure 4.0
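A minimal sketch of the distribution idea, assuming the unaccounted users are spread in proportion to the users that could be located. The slide does not state the exact formula, so proportional allocation is an assumption here, and `distribute_unknown`, `region_a` and `region_b` are hypothetical names. The split of the 12 located users into 8 and 4 is also made up; only the totals (22 users, 10 unaccounted for) come from the slide.

```python
def distribute_unknown(located, unknown):
    """Spread `unknown` users over the regions in `located`,
    in proportion to the users already located in each region."""
    total = sum(located.values())
    if total == 0:
        raise ValueError("need at least one located user to set the proportions")
    return {region: n + unknown * n / total for region, n in located.items()}

# 22 users in the domain: 12 located (hypothetical split 8 + 4), 10 GMT-only.
estimate = distribute_unknown({"region_a": 8, "region_b": 4}, unknown=10)
print(estimate)
```

The estimates are fractional on purpose: the goal is aggregate counts per region, not assigning any individual user to a place.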
Data Gathering Methods • What if the user actually was in the GMT time zone? • Need to rebalance the equation to account for the data that was ignored • Figure 5.0
Data Gathering Methods • Weight the results by that factor • Ratio of own time zone to GMT for GMT countries taken to be the same as for non-GMT regions • Figure 6.0
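One possible reading of this weighting step, as a sketch only: take the share of users in non-GMT regions who set their own time zone (rather than leaving GMT), and assume the same share holds among GMT-tagged users attributed to a GMT country. The function `split_gmt_users`, its parameters, and all the numbers below are hypothetical; the slides do not give the actual equation.

```python
def split_gmt_users(gmt_users, own_tz_elsewhere, gmt_elsewhere):
    """Split GMT-tagged users of a GMT country into (chose GMT, left GMT unset),
    using the own-time-zone share observed in non-GMT regions."""
    observed = own_tz_elsewhere + gmt_elsewhere
    if observed == 0:
        raise ValueError("need observations from non-GMT regions")
    # Share of users elsewhere who set their own time zone instead of GMT.
    own_share = own_tz_elsewhere / observed
    chose_gmt = gmt_users * own_share
    return chose_gmt, gmt_users - chose_gmt

# Hypothetical counts: elsewhere, 300 users set their own zone and 100 left GMT.
print(split_gmt_users(100, own_tz_elsewhere=300, gmt_elsewhere=100))
```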
Data Gathering Methods • Different types of data sets • Figure 7.0
Results • Top 50 countries account for 96.5% of developers in SourceForge • Top 20 countries account for 83.9% of developers in SourceForge • Figure 8.0
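Figures such as "top 20 countries account for 83.9%" are cumulative shares. A small sketch of that calculation follows; the function name `top_n_share` and the country counts are hypothetical placeholders, not data from the study.

```python
def top_n_share(counts, n):
    """Percentage of all developers found in the n largest countries."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no developers counted")
    largest = sorted(counts.values(), reverse=True)[:n]
    return 100 * sum(largest) / total

# Hypothetical per-country developer counts.
counts = {"aa": 50, "bb": 30, "cc": 15, "dd": 5}
print(top_n_share(counts, 2))
```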
Results • Most developers are from Europe and North America, in an almost 50-50 ratio • Penetration of libre software relative to population is higher in North America, because Europe has the larger population • Figure 9.0
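The penetration argument is a per-capita comparison: with similar developer counts, the region with the smaller population has the higher rate. A sketch with made-up numbers (the function `developers_per_million` and both populations are hypothetical, chosen only to show the arithmetic):

```python
def developers_per_million(developers, population):
    """Developers per one million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return developers * 1_000_000 / population

# Same developer count, different (hypothetical) populations.
smaller_region = developers_per_million(1000, 2_000_000)
larger_region = developers_per_million(1000, 4_000_000)
print(smaller_region, larger_region)
```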
Conclusion • Method for redistributing developers to their place of origin • The aim is not to tie each user to a single geographical location, but to estimate aggregate numbers of developers of a given national origin • Can be used to look for correlations with GDP, GDP per capita or other economic patterns
Personal Thoughts • Good Points • Interesting results: North America accounts for almost half of total activity • Interesting method for redistributing data of unknown origin to a particular region
Personal Thoughts • Points for Improvement • Questionable: is it really that hard to ascertain a developer's nationality or geographical location directly (even though this is private information)? • SourceForge might be one of the most common open source systems, but is it representative of all open source systems?