Leonard Will

Added: December 03, 2007 This Presentation is Public 
Presentation Category : Entertainment All Rights Reserved
Presentation Transcript

Costs of vocabulary mapping:
Leonard Will
http://www.willpowerinfo.co.uk


Different kinds of subject vocabularies:
Classification schemes
Subject headings
Thesauri
Free text searching
Terms and phrases extracted from text by computer


Ways of mapping


Input and output mapping:
[Diagram]
Components: Mapping service thesaurus; User's local thesaurus; User; Information provider's thesaurus; Information provider's database
Labels on flows: User's free text terms; Local preferred terms; Information provider's preferred terms; Information provider's preferred terms (perhaps expanded)


Output mapping via remote thesaurus:
[Diagram]
Components: Mapping service thesaurus; User's local thesaurus; User; Information provider's thesaurus; Information provider's database
Labels on flows: Information provider's preferred terms; User's free text terms; Information provider's preferred terms (perhaps expanded); Possible off-line consultation; Possible off-line term information


Output mapping direct to remote database:
[Diagram]
Components: Mapping service thesaurus; User's local thesaurus; User; Information provider's thesaurus; Information provider's database
Labels on flows: Off-line provision of term information; User's free text terms; Information provider's preferred terms; Possible off-line consultation; Possible off-line term information


Other projects and tools:
CARMEN
RENARDUS
AQUARELLE
SIS-TMS
UMLS
GenThes
CERES
Knowledgecite Library


Personal experience:
Editing terms into a thesaurus structure consistent with AAT
  25 terms per hour = 150 terms per day
  4500 terms take 30 days
Assigning Dewey numbers to UNESCO terms
  15 terms per hour = 90 terms per day
  4500 terms take 50 days
Note: AAT has about 125,000 terms
  At 90 terms per day, mapping them would need about 1400 days = 6.3 years
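The arithmetic on this slide can be checked with a short sketch. The 6-hour working day is an assumption inferred from "25 terms per hour = 150 terms per day", and the 222 working days per year is an assumption inferred from "1400 days = 6.3 years"; neither figure is stated on the slide, and the function name is illustrative.

```python
HOURS_PER_DAY = 6            # assumption: implied by 25 terms/hour = 150 terms/day
WORKING_DAYS_PER_YEAR = 222  # assumption: implied by 1400 days = 6.3 years


def days_needed(total_terms, terms_per_hour, hours_per_day=HOURS_PER_DAY):
    """Working days needed to process total_terms at a given hourly rate."""
    terms_per_day = terms_per_hour * hours_per_day
    return total_terms / terms_per_day


# Editing 4500 terms into an AAT-consistent structure at 25 terms/hour.
editing_days = days_needed(4500, 25)

# Assigning Dewey numbers to 4500 UNESCO terms at 15 terms/hour.
dewey_days = days_needed(4500, 15)

# Scaling the Dewey rate up to the whole AAT (about 125,000 terms).
aat_days = days_needed(125_000, 15)
aat_years = aat_days / WORKING_DAYS_PER_YEAR

print(f"Editing: {editing_days:.0f} days")
print(f"Dewey assignment: {dewey_days:.0f} days")
print(f"Whole AAT: {aat_days:.0f} days, about {aat_years:.1f} years")
```

The slide rounds the AAT figure up to 1400 days; the unrounded value is slightly lower.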


English Heritage estimator for thesaurus construction:
Based on creating 10-20 simple terms per day
Complex terms: 2-8 per day
At 10 terms per day, 4,500 terms take 450 person-days = just over 2 person-years
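Because the estimator gives rates as ranges, the effort for a vocabulary can be bracketed between a fast and a slow case. The sketch below is illustrative only: the function names are hypothetical, and the 222 working days per person-year is an assumption chosen so that 450 person-days comes out at "just over 2 person-years".

```python
WORKING_DAYS_PER_YEAR = 222  # assumption: not stated by the estimator


def person_days(total_terms, terms_per_day):
    """Person-days needed to create total_terms at a given daily rate."""
    return total_terms / terms_per_day


def effort_range(total_terms, slowest_rate, fastest_rate):
    """Return (fewest_days, most_days) for a range of daily rates."""
    return (
        person_days(total_terms, fastest_rate),
        person_days(total_terms, slowest_rate),
    )


# Simple terms: 10-20 per day. Complex terms: 2-8 per day.
simple_low, simple_high = effort_range(4500, 10, 20)
complex_low, complex_high = effort_range(4500, 2, 8)

for label, low, high in [
    ("simple", simple_low, simple_high),
    ("complex", complex_low, complex_high),
]:
    print(
        f"4500 {label} terms: {low:.0f} to {high:.0f} person-days "
        f"({low / WORKING_DAYS_PER_YEAR:.1f} to "
        f"{high / WORKING_DAYS_PER_YEAR:.1f} person-years)"
    )
```

A real vocabulary mixes simple and complex terms, so the two ranges would need to be weighted by the share of each kind.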


Factors affecting calculation:
Number of terms
Number of uses
Candidate terms per year
Number of external terms mapped per year
Number of licences


Start-up requirements (EH):
User assessment / market testing: 3 days
Introduction: 10 days
Peer review: 5-15 days
Initial documentation: 2 days
Promotion: 3 days
Audit of existing usage: 0.1 day per 500 uses
Research (e.g. reconciliation of names): 10% of terms require 1 day per 5 terms
Training: 1 day per 5 licences
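The start-up figures above combine fixed costs with costs that scale with the number of terms, uses, and licences. The following sketch adds them up under two assumptions that are not in the source: the research line is read as "10% of terms need research, at 1 day per 5 such terms", and peer review defaults to 10 days, the midpoint of the 5-15 day range. The function name and the example inputs are hypothetical.

```python
def startup_days(num_terms, num_uses, num_licences, peer_review_days=10):
    """Estimate start-up effort in days from the EH start-up figures.

    peer_review_days should lie in the quoted 5-15 day range; the default
    of 10 is an assumed midpoint.
    """
    fixed = (
        3     # user assessment / market testing
        + 10  # introduction
        + 2   # initial documentation
        + 3   # promotion
    )
    audit = 0.1 * (num_uses / 500)       # 0.1 day per 500 uses
    research = (0.10 * num_terms) / 5    # 10% of terms, 1 day per 5 terms
    training = num_licences / 5          # 1 day per 5 licences
    return fixed + peer_review_days + audit + research + training


# Hypothetical example: 4500 terms, 10,000 existing uses, 10 licences.
total = startup_days(num_terms=4500, num_uses=10_000, num_licences=10)
print(f"Estimated start-up effort: {total:.1f} days")
```

With these example inputs the research component dominates, which suggests the estimate is most sensitive to how many terms need name reconciliation or similar checking.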


Annual maintenance tasks:
Candidate term evaluation: 1 day per 5 terms received
Mapping of existing terminology: 1 day per 50 terms received
Tracking and version control: 1 day per 1000 terms
Licence management: 0.5 days per licence
(c) English Heritage 2001
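The annual maintenance rates can be combined in the same way. This is a minimal sketch; the function name, parameter names, and example inputs are hypothetical, and only the four rates come from the slide.

```python
def annual_maintenance_days(candidate_terms, mapped_terms, total_terms, num_licences):
    """Estimate yearly maintenance effort in days from the EH rates."""
    evaluation = candidate_terms / 5   # 1 day per 5 candidate terms received
    mapping = mapped_terms / 50        # 1 day per 50 external terms received
    tracking = total_terms / 1000      # 1 day per 1000 terms in the thesaurus
    licences = 0.5 * num_licences      # 0.5 days per licence
    return evaluation + mapping + tracking + licences


# Hypothetical year: 100 candidate terms, 500 external terms to map,
# a 4500-term thesaurus, and 10 licences.
yearly = annual_maintenance_days(
    candidate_terms=100, mapped_terms=500, total_terms=4500, num_licences=10
)
print(f"Estimated annual maintenance: {yearly:.1f} days")
```

Evaluating a candidate term is costed at ten times the rate of mapping an existing term, so the candidate-term volume tends to drive the total.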


One to one mapping:
Abandoned children -> 305.906945 Abandoned children
Abbreviations -> 401.48 Abbreviations
Ability -> 153.9 Intelligence and aptitudes
Ability grouping -> 371.254 Homogeneous grouping


One to many mapping:
Abortion ->
  179.76 Abortion (ethics)
  294.356976 Abortion (ethics - religion - Buddhism)
  304.667 Abortion (demographic effects)
  342.084 Abortion (law and comprehensive works)
  342.085 Abortion (rights of fetuses)
  342.0878 Abortion (rights of women)
  344.04192 Abortion (medical law)
  363.46 Abortion (social problems)
  363.96 Abortion (birth control)
  364.185 Abortion (criminal offences)
  615.766 Abortion (drugs causing)
  618.392 Abortion (spontaneous)
  618.88 Abortion (surgical)
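One way to hold both kinds of mapping in a single structure is to store every source term with a list of targets, so that a one to one mapping is simply a list of length one. The sketch below is an illustration rather than the format used in the project; the names MAPPINGS, targets_for, and is_one_to_one are hypothetical, and only a subset of the entries from the slides is included. Dewey numbers are kept as strings so that their digits are preserved exactly.

```python
# Each source term maps to a list of (Dewey number, Dewey caption) pairs.
MAPPINGS = {
    "Abandoned children": [("305.906945", "Abandoned children")],
    "Abbreviations": [("401.48", "Abbreviations")],
    "Ability": [("153.9", "Intelligence and aptitudes")],
    "Ability grouping": [("371.254", "Homogeneous grouping")],
    "Abortion": [
        ("179.76", "Abortion (ethics)"),
        ("304.667", "Abortion (demographic effects)"),
        ("363.46", "Abortion (social problems)"),
        ("618.88", "Abortion (surgical)"),
    ],
}


def targets_for(term):
    """Return the list of (number, caption) targets, or [] if unmapped."""
    return MAPPINGS.get(term, [])


def is_one_to_one(term):
    """True when the term has exactly one target."""
    return len(targets_for(term)) == 1


for term in ("Ability", "Abortion"):
    kind = "one to one" if is_one_to_one(term) else "one to many"
    numbers = ", ".join(number for number, _ in targets_for(term))
    print(f"{term} ({kind}): {numbers}")
```

A one to many entry cannot be resolved automatically: a person or a context rule has to choose among the targets, which is part of why such mappings take longer to produce and to use.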