Uncorking the Varietals: Social Tagging, Folksonomies & Controlled Vocabularies Margaret Maurer Head, Catalog and Metadata Kent State University Libraries and Media Services
In wine making - What is a Varietal? • A wine made from a single, named grape variety. • Cabernet Sauvignon wines are made from cabernet sauvignon grapes • Chardonnay wines are made from chardonnay grapes
In information seeking – on the Web or in the catalog • Access and identification systems may be controlled by librarians (controlled vocabularies) • Access and identification systems may be dynamically generated by users (social tagging, folksonomies) • These are different varieties of access and identification systems
This presentation • Controlled vocabularies • Social tagging • Folksonomies • My recommendations. First we’ll talk about the cabernet sauvignons – the controlled vocabularies
Purpose of a controlled vocabulary • To create sets of objects • To serve as a bridge between the searcher’s language and the author’s language • To provide consistency • To improve precision and recall
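Precision and recall, mentioned in the last bullet, can be made concrete with a small sketch. Precision is the share of retrieved records that are relevant; recall is the share of relevant records that were retrieved. The record IDs below are hypothetical and only illustrate the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records the search actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 records retrieved, 3 of them relevant,
# out of 6 relevant records in the whole catalog.
retrieved = {"r1", "r2", "r3", "r9"}
relevant = {"r1", "r2", "r3", "r4", "r5", "r6"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

A controlled vocabulary aims to raise both numbers: one authorized heading gathers records that use different words (recall) and separates records that merely share a word (precision).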
Characteristics of a controlled vocabulary • Features a single, authorized form of heading • Often features a syndetic structure of cross-references • Based on the belief that successful use of the catalog depends on the quality of the individual records
The authority record structure • Records the standardized form • Ensures the gathering together of records via that access point • Enables standardized catalog records • Documents decisions taken • Records all other heading forms and provides links from them to the standardized form
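The structure described above can be sketched as a small data model. This is a minimal illustration, not any particular system's format: the class name, field names, and record IDs are assumptions, and the Mark Twain heading is used only as a familiar example.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    authorized: str                                # the standardized form
    variants: list = field(default_factory=list)   # other heading forms
    note: str = ""                                 # documents the decision taken


def build_lookup(records):
    """Link every known form (case-folded) to its standardized form."""
    lookup = {}
    for rec in records:
        for form in [rec.authorized, *rec.variants]:
            lookup[form.lower()] = rec.authorized
    return lookup


def gather(bib_records, lookup):
    """Group bibliographic record IDs under the authorized heading."""
    gathered = {}
    for record_id, heading_as_entered in bib_records:
        heading = lookup.get(heading_as_entered.lower(), heading_as_entered)
        gathered.setdefault(heading, []).append(record_id)
    return gathered


# Illustrative authority file with one record (hypothetical note text).
authority_file = [
    AuthorityRecord(
        authorized="Twain, Mark, 1835-1910",
        variants=["Clemens, Samuel Langhorne, 1835-1910"],
        note="Heading established under the pseudonym",
    )
]
lookup = build_lookup(authority_file)

# Hypothetical bibliographic records entered under different forms.
bib_records = [
    ("b1", "Twain, Mark, 1835-1910"),
    ("b2", "Clemens, Samuel Langhorne, 1835-1910"),
]
gathered = gather(bib_records, lookup)
print(gathered)
```

Both records end up under one access point, which is the gathering function the slide describes.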
Benefits of controlled vocabularies • Promotes discovery generally • Promotes discovery when the aboutness of something has nothing to do with words in the resource or its representation • Imaginative literature (genre headings) • Humanities • Pre-coordinated displays expand access – http://cinema.library.ucla.edu
Benefits when combined with keyword searching • Keywords hook into strings of terms most efficiently • Users can be routed by pre-coordinated strings
Controlled vocabularies support faceted catalogs • Encore • Evergreen • Endeca • WorldCat Local. All provide hyperlinks to authorized headings
Weaknesses of controlled vocabularies • The artificially controlled language is not necessarily natural language—Cookery anyone? • Subject searches are the most problematic for users • It may work better in theory than in practice • It is costly to perform necessary maintenance • Cost is seen to outweigh the benefits by many administrators
Library of Congress Subject Headings - LCSH • Has a long and well-documented history • Commonly used • Is contained in millions of bibliographic records • Strong institutional support from LC
More benefits of LCSH • The rich vocabulary covers most subjects • It imposes synonym and homograph control • There are machine-assisted authority control mechanisms • There is pre-coordination with LCC • The music subject heading system is well developed
Weaknesses of LCSH • It is a generalist taxonomy that can’t always provide needed granularity • Its terminology is not always current • It doesn’t allow for post-search coordination (it is pre-coordinated) • It suffers from LC collection bias
More weaknesses of LCSH • Training needed: it requires some orientation to use effectively, and it is not always accurately applied by catalogers • Maintenance: it is difficult to maintain when changes occur
Authority control outside the catalog • Data critical mass tipping point? • Homogeneity of data in terms of subject matter • Requirements within data community’s users for specificity • Size • Computing power • Wikipedia’s “disambiguation”
What if we did open up our authority files to the web? • National Library of Australia’s People Australia Project http://www.nla.gov.au/initiatives/peopleaustralia/ • Wikipedia Persondata-Tool http://www.ifla.org/IV/ifla73/papers/113-Danowski-en.pdf
Is ontology overrated? • Physicality requires ontologies for searching, but systems with hyperlinks do not • Browse versus search may eliminate the need for creating lists of authorized headings
Ontological classification • Works well when the domain to be organized is small, has formal categories, has stable entities, is restricted and has clear edges • Does not work well when the domain to be organized is large, has no formal categories, is unstable, is unrestricted and has no clear edges
Ontological classification • Works well when the participants are expert catalogers, authoritative sources of judgement, coordinated users or expert users • Does not work well when the participants are uncoordinated, amateur, naïve or non-authoritative
Now we talk about the Chardonnays – social tagging and folksonomies
What are tags? • Keywords or terms associated with or assigned to a piece of information • They enable keyword-based classification and search of information
Common Web sites that use tags include • Del.icio.us – Social bookmarking site • Flickr – Image tagging • LibraryThing • Gmail - Webmail • YouTube
Tags, and therefore social tags and folksonomies, are • Dynamic categorization systems • Often created on-the-fly • Chosen as relevant to the user – not to the creator, cataloger or researcher • A social activity (more on this later) • Hopefully one small step toward a more interactive and responsive library system
Social tags are • Non-hierarchical • A way to create links between items by the creation of sets of objects • A means of connecting with others interested in the same things
Way baaack in 2003… • Del.icio.us includes identity in its social bookmarking • Flickr includes tags • Lists of tags became a tool for serendipitous discovery (folksonomies)
Why is tagging so popular? • It is easy and enjoyable • It has a low cognitive cost • It is quick to do • It provides self and social feedback immediately
People tag things • To find them again • To get exposure and traffic • To voice their opinions • Incidentally as they perform other tasks • To take advantage of functionality built on top of a folksonomy • To play a game or earn points
Putting the social in tagging • Tags allow for social interaction because when we navigate by tags we are directly connecting with others • People tag for their own benefit
Don’t confuse tags with keywords or full-text searching • Keywords are behind the scenes, tags are often visibly aggregated for use and browsing • Keywords cannot be hyperlinked • Keywords imply searching, tags imply linking • Full-text searching is passive, tagging is active • Tagging is more about connecting items than categorizing them
What is a Folksonomy? • Folksonomy refers to an “emergent, grassroots taxonomy” • An aggregate collection of tags • A bottom-up categorical structure development • An emergent thesaurus • A term coined by Thomas Vander Wal
How do folksonomies work? • The searcher defines the access, but • The aggregation of the terms has public value • It’s a typically messy democratic approach
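The aggregation step can be sketched in a few lines. Each user tags for personal reasons, and the public value comes from pooling those assignments. The user names, items, and tags below are hypothetical, and real tagging sites will store and weight this data in their own ways.

```python
from collections import Counter

# Hypothetical tag assignments: (user, item, tag).
assignments = [
    ("ann", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("bob", "photo1", "beach"),
    ("cat", "photo2", "beach"),
    ("ann", "photo2", "vacation"),
]


def tag_counts(assignments):
    """Aggregate individual tagging acts into overall tag frequencies."""
    return Counter(tag for _user, _item, tag in assignments)


def items_for_tag(assignments, tag):
    """Everything any user filed under a tag: the 'linking' view of a tag."""
    return sorted({item for _user, item, t in assignments if t == tag})


folksonomy = tag_counts(assignments)
print(folksonomy)
print(items_for_tag(assignments, "beach"))
```

The counts are what a tag cloud displays, and the per-tag item list is what a user sees when clicking a tag someone else applied.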
What makes folksonomies popular? • Their dynamic nature works well with dynamic resources • They’re personal • They lower barriers to cooperation
Tagging and the consequent folksonomies work best when • It’s easy to do • It’s not commercial in nature • Taggers have ownership • Taggers are more likely to tag their own stuff than they are your stuff • It has been shown to work well on the Web
The unexpected development: terminological consensus • Collective action yields common terms • Stabilization may be caused by imitation and shared knowledge • The wisdom of the crowd
Is your tagging influenced by my tagging? • Of course it is! • People are beginning to tag in ways that make it easier for others to find like stuff • Shared meaning consequently evolves for tags • Most used tags become most visible
Strengths of folksonomies • Cost-effective way to organize the Internet • Social benefits • It’s inclusive • For many environments, they work well
Issues with meaning • They do not yield the level of clarity that controlled vocabularies do • Term ambiguity – words with multiple meanings • No synonym control
Issues with specificity • Variable specificity for related terms • Broadness of terms impacts precision – terms are often imprecise • Mixed perspectives
Issues with structure • Singular and plural forms create redundant headings • No guidelines for the use of compound headings, punctuation, word order • No scope notes • No cross references
Issues with accuracy • Collective ‘wisdom’ of the tagging community • How does wrong information impact retrieval? • Conflicting cultural norms • Sometimes authority counts
“Spagging” and other problems • Opening doors to opinion tags • Tagging wars • “Spagging” (spam tagging)
Tidying up the tags…? • Lists of tagging norms have been developed • Are there programmatic solutions? • Users know they are looking at tags • By tidying, do we destroy the essence of why this works? • Do we realistically have the resources?
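One possible programmatic approach to the structural issues (case, punctuation, singular versus plural forms) is rule-based normalization. The sketch below is a deliberately crude assumption, not an established tool: the plural rule is naive English-only suffix stripping, and it does nothing for synonyms or ambiguous terms.

```python
import re
from collections import defaultdict


def normalize_tag(tag):
    """Fold case, separators, punctuation, and simple English plurals."""
    t = tag.strip().lower()
    t = re.sub(r"[\s_\-]+", " ", t)   # treat spaces, underscores, hyphens alike
    t = re.sub(r"[^\w ]", "", t)      # drop remaining punctuation
    if len(t) > 3 and t.endswith("ies"):
        t = t[:-3] + "y"              # libraries -> library
    elif len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]                    # cats -> cat (but glass stays glass)
    return t


def merge_variants(tags):
    """Group the tags as entered under their normalized form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return dict(groups)


merged = merge_variants(["Cats", "cat", "CAT", "dogs", "Libraries"])
print(merged)
```

Rules like these misfire on words such as "news", which is part of why the questions on this slide about resources and about destroying what makes tagging work remain open.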
Recommendations: Don’t assume that one size fits all • Retain controlled vocabularies in the catalog • Explore ways to use controlled vocabularies to help organize the Internet by re-purposing controlled vocabularies that already exist • Invite folksonomies to the party in the catalog to gain their benefits • Explore ways to combine the two systems
Recommendations: When you invite folksonomies into the catalog, do so strategically and carefully • Don’t put tags in the same index as controlled vocabulary terms • Find ways to associate tags applied across editions of works • There is a need for mediation, or at least observation • The crowd is not necessarily the best arbiter of specific terminology
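Keeping the two kinds of terms in separate indexes might look like the sketch below. The class, method names, and record IDs are hypothetical, and a real ILS would do this inside its own indexing layer; the point is only that a search can consult both indexes while still telling the user which kind of term matched.

```python
from collections import defaultdict


class Catalog:
    def __init__(self):
        self.subject_index = defaultdict(set)  # authorized headings -> record IDs
        self.tag_index = defaultdict(set)      # user tags -> record IDs

    def add_subject(self, record_id, heading):
        self.subject_index[heading.lower()].add(record_id)

    def add_tag(self, record_id, tag):
        self.tag_index[tag.lower()].add(record_id)

    def search(self, term):
        """Query both indexes but report the results separately."""
        key = term.lower()
        return {
            "subjects": sorted(self.subject_index.get(key, set())),
            "tags": sorted(self.tag_index.get(key, set())),
        }


catalog = Catalog()
catalog.add_subject("b1", "Cookery")   # heading assigned by a cataloger
catalog.add_tag("b2", "cookery")       # tag assigned by a user
catalog.add_tag("b1", "recipes")
results = catalog.search("cookery")
print(results)
```

Because the sources stay distinct, tags can be observed or mediated without touching the authority-controlled headings.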
Recommendations: Always remember why people tag • People tag things because they want to find them, not because they want others to find them • Be aware that this will impact the quality of the terms, and their frequency
Recommendations: Controlled vocabularies could be better utilized than they currently are • Subject structures are underutilized in the ILS • Controlled vocabularies that exist are not being exported to the Web • Well-connected terms foster discovery – let’s connect them. Index those cross references where available
Questions? Margaret Maurer mbmaurer@kent.edu