Diversity of User Activity and Content Quality in Online Communities

This study examines the diversity of user activity and content quality in online communities, using Essembly as a case study. It explores the patterns of user activity, content ratings, and voting history to understand the factors that contribute to user engagement and high-quality contributions. The study also discusses the limitations of the data set and the potential impact of user privacy on research usefulness.


Presentation Transcript


  1. Diversity of User Activity and Content Quality in Online Communities • Tad Hogg and Gabor Szabo, HP Labs • thanks to: C. Chan and J. Kittiyachavalit (Essembly); M. Brzozowski and D. Wilkinson (HP)

  2. online communities and the “wisdom of crowds” • examples: Bugzilla, Essembly, delicious

  3. Why model online communities? • predict • e.g., which new content will become popular? • design web sites • e.g., what to show users? • encourage high-quality contributions • e.g., what incentives?

  4. activity heterogeneity is pervasive • most activity from a few ‘top users’ • most interest in small fraction of content • broad, long-tail distributions: typical << average << maximum • [figure: number of cases vs. activity]

  5. topics • case study: Essembly • user activity • content ratings

  6. What is Essembly? • political discussion web site • help people identify others with similar views • self-organize for political activity

  7. Essembly: resolves • users create resolves • e.g., “free trade is good for American workers” • other users vote & comment • 4-point scale • agree, lean agree, lean against, against

  8. Why study Essembly? • voting history since start of site • modest-sized community • can examine all users and content • useful to study diversity • distinct link semantics • friend, ally, nemesis • similar diversity as other communities • Digg, Wikipedia, …

  9. data set: Aug. 2005 to Dec. 2006 • 15,424 users • 24,953 resolves • 1.3 million votes • networks • comments • 50 new resolves per day • [figures: users active each month; new resolves each month; number of votes each month]

  10. data limitations • anonymous • no user characteristics • e.g., demographics, political party, … • no content of resolves or comments • e.g., political topic area • environment, economics, foreign aid, … • hence: • can’t test if characteristics explain diversity • trade-off: user privacy vs. research usefulness • no info on • which resolves users view (but don’t vote on) • how users find resolves (e.g., via networks)

  11. topics • case study: Essembly • user activity • content ratings

  12. user activity • 4741 active users with at least one action • actions: create a resolve, vote on a resolve, form a link

  13. user model • [diagram: active user, inactive user; actions: create, vote, link] • inactive: no activity for at least 30 days (conventional, but somewhat arbitrary, definition) • model components: • how long user is active • how often user contributes while active • correlation between activity time and rate: -0.07 • model as independent components of user behavior • caveat: users active only a short time have larger (negative) correlation: -0.2
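The two components of the user model on slide 13 can be computed per user from action timestamps. The following is a minimal Python sketch, assuming that activity time is the span from a user's first to last recorded action and that the rate is actions per day over that span. The function name `activity_summary` and its arguments are hypothetical; only the 30-day inactivity threshold comes from the slide.

```python
from datetime import datetime, timedelta

# Threshold from the slide: no activity for at least 30 days means inactive.
INACTIVITY_CUTOFF = timedelta(days=30)


def activity_summary(timestamps, data_end):
    """Return (activity_days, actions_per_day, still_active) for one user.

    Assumptions (not stated in the slides): activity time is the span from
    first to last action, and a user whose last action falls within 30 days
    of the end of the data set is treated as still active.
    """
    ts = sorted(timestamps)
    span_days = (ts[-1] - ts[0]).total_seconds() / 86400.0
    still_active = (data_end - ts[-1]) < INACTIVITY_CUTOFF
    # The rate is undefined when all actions share one timestamp.
    rate = len(ts) / span_days if span_days > 0 else None
    return span_days, rate, still_active


if __name__ == "__main__":
    actions = [datetime(2006, 1, 1), datetime(2006, 1, 11), datetime(2006, 1, 3)]
    print(activity_summary(actions, data_end=datetime(2006, 12, 31)))
```

Users flagged as still active have a censored activity time, so a fit of the activity time distribution would need to treat them separately.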

  14. user model • [diagram: active user, inactive user; actions: create, vote, link] • this model considers whether a user votes on a resolve, not how the user voted (agree, …, disagree) or comments • note: how users vote correlates with link type (friend, ally, nemesis) • M. Brzozowski et al., "Friends and Foes: Ideological Social Networking", Proc. of CHI 2008

  15. user activity: model components • activity time • activity rate

  16. activity time distribution: stretched exponential • for users active at least 1 day • diverse time scales for user participation • users active a long time less likely to quit in next day than new users • applies to many online communities [Wilkinson 2008]
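A stretched exponential is commonly written as a survival function S(t) = exp(-(t/τ)^β) with 0 < β < 1. The Python sketch below illustrates why this form matches the slide's observation that long-time users are less likely to quit: the quit rate (hazard) falls as time on the site grows. The values of `tau` and `beta` are illustrative assumptions; the slides do not report fitted parameters.

```python
import math


def survival(t, tau, beta):
    """Fraction of users still active after t days: exp(-(t/tau)**beta)."""
    return math.exp(-((t / tau) ** beta))


def hazard(t, tau, beta):
    """Instantaneous quit rate, -d/dt log S(t); requires t > 0.

    For beta < 1 this decreases with t, so users who have been active
    longer are less likely to quit on the next day.
    """
    return (beta / tau) * (t / tau) ** (beta - 1.0)


if __name__ == "__main__":
    tau, beta = 10.0, 0.5  # hypothetical values, for illustration only
    for days in (1.0, 10.0, 100.0):
        print(days, survival(days, tau, beta), hazard(days, tau, beta))
```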

  17. user activity: model components • activity time • activity rate

  18. activity rate distribution: lognormal • normal distribution fit to log(ρ) values • [figure: distribution of the natural logarithm of actions per day; axis annotations: 2 months/action, 60 actions/day] • actions: create a resolve, vote on a resolve, form a link
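Fitting a lognormal amounts to fitting a normal distribution to the logarithms of the rates, as slide 18 describes. This is a minimal Python sketch using the sample mean and standard deviation of log(ρ); the function name `fit_lognormal` is hypothetical, and the slides do not say which estimator the authors used.

```python
import math
import statistics


def fit_lognormal(rates):
    """Return (mu, sigma) of a normal fit to the natural log of the rates.

    Rates are in actions per day; non-positive values are skipped because
    their logarithm is undefined.
    """
    logs = [math.log(r) for r in rates if r > 0]
    return statistics.fmean(logs), statistics.stdev(logs)


if __name__ == "__main__":
    # Made-up rates spanning several orders of magnitude, for illustration.
    example_rates = [1 / 60, 0.1, 0.5, 1.0, 2.0, 10.0, 60.0]
    print(fit_lognormal(example_rates))
```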

  19. user activity • activity time • activity rate • combined model

  20. user activity distribution • product: (activity time) x (activity rate) • 4741 active users with at least one action • mismatch for small number of actions: negative correlation of time and rate for less active users • e.g., a few actions to “try out” the site over a day or so • model captures diversity of action counts, but not bursts of activity (“sessions” of ~3 hours with longer breaks)
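The combined model on slide 20 can be simulated by drawing an activity time and an activity rate independently and multiplying them. The Python sketch below assumes a stretched exponential for time and a lognormal for rate, as on the earlier slides; all parameter values and function names are hypothetical, and the sketch ignores both integer rounding of counts and the negative time-rate correlation noted for less active users.

```python
import math
import random
import statistics


def sample_activity_time(rng, tau, beta):
    """Inverse-transform sample from S(t) = exp(-(t/tau)**beta), in days."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return tau * (-math.log(u)) ** (1.0 / beta)


def sample_action_count(rng, tau, beta, mu, sigma):
    """Expected number of actions: (activity time) x (activity rate)."""
    days_active = sample_activity_time(rng, tau, beta)
    actions_per_day = rng.lognormvariate(mu, sigma)
    return days_active * actions_per_day


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = [
    sample_action_count(rng, tau=10.0, beta=0.5, mu=0.0, sigma=2.0)
    for _ in range(5000)
]

if __name__ == "__main__":
    # Typical << average << maximum is the signature of a long-tail distribution.
    print(statistics.median(counts), statistics.fmean(counts), max(counts))
```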

  21. What determines user activity? • diversity from two underlying broad distributions: • activity time (stretched exponential) • multiple time scales for losing interest in site • activity rate (lognormal) • multiplicative process leading to activity rate heterogeneity • open question: • What user characteristics and community properties produce these distributions?

  22. activity time: prior interest or experience? • “nature”: initially heterogeneous • cohort increasingly dominated by high-utility users who are less likely to quit • “nurture”: initially homogeneous • change due to experience on site • cohort increasingly dominated by users with good experience who are less likely to quit • [figures: utility vs. time user is active, one panel per case]

  23. How to encourage participation? • “nature” • attract users whose interests fit the community • expose potential users to site, word of mouth, … • “nurture” • improve rewards of use to keep people engaged • “top contributor” status, niche subgroups, …

  24. topics • case study: Essembly • user activity • content ratings

  25. votes on resolves • 24953 resolves • similar broad distribution in other online communities: Digg, Wikipedia, … [Wilkinson 2008]

  26. vote model • [flow diagram: user comes to Essembly → see the resolve? → vote on the resolve? → vote] • visibility • how easily users find a resolve • interestingness • probability that users who see a resolve vote on it • similar model for Digg [Lerman 2007]

  27. content ratings: model components • visibility • interest

  28. visibility: how users find content • browse • e.g., recent or popular • in general and within online network • word of mouth • from people aware of, and liking, the content • e.g., link on a blog • search

  29. visibility distribution: power-law • recency is key factor for visibility in Essembly • fewer votes to older resolves: approximately a power law, cf. “law of surfing” [Huberman et al. 1998] • large drop in visibility from user interface • contrast with controversy (standard dev. of votes): not correlated with number of votes • [figure: plotted against age (number of subsequently introduced resolves)]

  30. content ratings: model components • visibility • interest

  31. interestingness: how much users like what they see • persistent property of resolves • resolves consistently get few or many votes compared to average at similar age • may have time dependence • novelty decay [Wu & Huberman 2007] • e.g., current news stories (Digg) • vs. ideological discussions (e.g., free trade)

  32. model parameter estimation • model: • visibility based on recency • next vote goes to resolve x with relative probability r_x f(a_x) • r is resolve’s interestingness • a is resolve’s age • number of subsequently introduced resolves • simultaneously estimate • ‘aging’ visibility function f(a) • interestingness for resolves: r_1, r_2, … • arbitrary scale factor for f and r • we take f(1)=1
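The rule on slide 32, that the next vote goes to resolve x with relative probability r_x f(a_x), can be sketched in Python as follows. The power-law form f(a) = a^(-alpha) and the value of `alpha` are assumptions chosen only so that f(1) = 1 holds; the slides estimate f(a) from the data rather than fixing its form, and the function name `vote_probabilities` is hypothetical.

```python
def vote_probabilities(interest, ages, alpha=1.0):
    """Probability that the next vote goes to each resolve.

    interest: r_x for each resolve (interestingness).
    ages:     a_x >= 1, the number of subsequently introduced resolves.
    alpha:    exponent of the assumed visibility function f(a) = a**(-alpha),
              which satisfies f(1) = 1.
    """
    weights = [r * a ** (-alpha) for r, a in zip(interest, ages)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two equally interesting resolves; the newer one is more visible.
    print(vote_probabilities([0.5, 0.5], [1, 2]))
    # Rescaling every r_x leaves the probabilities unchanged, which is why
    # a normalization such as f(1) = 1 is needed to fix the scale.
    print(vote_probabilities([5.0, 5.0], [1, 2]))
```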

  33. interestingness distribution: lognormal • normal distribution fit to log(r) values

  34. growth in number of votes for high and low interestingness • two examples: r=0.65 and r=0.01 • [figure: log scale, plotted against age (number of subsequently introduced resolves)]

  35. content ratings • visibility • interest • combined model

  36. vote distribution • 24953 resolves • lognormal center, power law tails • sample at different ages from a multiplicative process: double Pareto lognormal distribution [Reed & Jorgensen 2004]
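One standard way to generate a double Pareto lognormal sample is to run a multiplicative (geometric Brownian) process from a lognormal starting value and observe it at an exponentially distributed age. The Python sketch below follows that recipe as an illustration of slide 36. The exponential age distribution, all parameter values, and the function name `sample_value` are assumptions; in the slides, age is the number of subsequently introduced resolves, and no fitted parameters are given.

```python
import math
import random
import statistics


def sample_value(rng, drift, vol, age_rate, init_sigma):
    """Value of a multiplicative process observed at a random age."""
    age = rng.expovariate(age_rate)          # assumed exponential age
    log_start = rng.gauss(0.0, init_sigma)   # lognormal starting value
    # Log of the value performs a random walk with drift over the age.
    log_value = log_start + drift * age + vol * math.sqrt(age) * rng.gauss(0.0, 1.0)
    return math.exp(log_value)


rng = random.Random(1)  # fixed seed so the run is repeatable
values = [
    sample_value(rng, drift=0.1, vol=1.0, age_rate=1.0, init_sigma=0.5)
    for _ in range(5000)
]

if __name__ == "__main__":
    # A mean well above the median indicates the heavy upper tail.
    print(statistics.median(values), statistics.fmean(values), max(values))
```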

  37. What determines content value? • lognormal: arises from multiplication of factors • possible mechanisms • “rich get richer” • “inherited wealth” • or a mix of both

  38. model: visibility and interest lead to votes • votes increase visibility (“popular resolves”) • [diagrams: flow from “user comes to Essembly” to “see the resolve?” to “vote on the resolve?”; relations among votes, visibility, interest]

  39. “rich get richer”: votes → more votes • new votes • proportional to number of prior votes • with some variation • influenced by observed popularity • among all users or just friends • examples • costly to evaluate content personally • ‘fashion’, latest ‘cool’ product • [diagram: votes, visibility, interest]

  40. “inherited wealth”: match user interests • new votes • from matching users’ prior interests • with some variation • e.g., popular vs. niche political topics • why a broad distribution? • possibly: information cascade & confirmation bias • M. Shermer, “The Political Brain”, Scientific American, July 2006 • S. Bikhchandani et al., “A Theory of Fads …”, J. Political Economy 100:992 (1992) • [diagram: votes, visibility, interest]

  41. topics • case study: Essembly • user activity • content ratings • additional behaviors

  42. predictions from early behavior • model can identify • new users likely to be very active • new resolves likely to have high interest • by factoring • web site properties (visibility) • user properties (interest in content) • also with other sites: Digg, YouTube • e.g., [Crane & Sornette 2008; Lerman & Galstyan 2008; Szabo & Huberman 2008]

  43. number of links per user • model: links due to common votes • as intended to link ideologically similar users • caveat: linked users also share visibility → votes • [figure: degree distribution] • Hogg & Szabo, in Europhysics Letters (to appear)

  44. Do active users create interesting resolves? • 1827 active users who introduced at least one resolve • little correlation between a user’s activity and interestingness of resolves from that user • [figures: r vs. user activity rate (actions/day); r vs. user activity time]

  45. future work & summary

  46. distinguishing mechanisms (future work) • experiments • alter information shown to random groups of users • can change both visibility and popularity measures • e.g., music downloads [Salganik et al., 2006] • correlation → causal factors • do votes depend on how users find content? • e.g., influence of friends • relate to characteristics of content and users

  47. summary • heterogeneous behavior • user activity • interest in content • model via components of behavior • steps toward identifying mechanisms • example: political discussion Essembly • user activity: time on site & activity rate • votes: visibility & interestingness • experiments to distinguish mechanisms
