Presentation Transcript
Costs of vocabulary mapping: Leonard Will, http://www.willpowerinfo.co.uk
Different kinds of subject vocabularies: Classification schemes
Subject headings
Thesauri
Free text searching
Terms and phrases extracted from text by computer
Ways of mapping
Input and output mapping: [diagram]
Components: User; User’s local thesaurus; Mapping service thesaurus; Information provider’s thesaurus; Information provider’s database
Flows: User’s free text terms; Local preferred terms; Information provider’s preferred terms; Information provider’s preferred terms (perhaps expanded)
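The chain of flows named on this slide can be sketched as two successive lookups. This is only an illustration: the direction of the chain is inferred from the flow labels, the table contents are hypothetical (the pair "Ability" / "Intelligence and aptitudes" is borrowed from the one to one mapping slide), and the function names are invented here. The optional expansion step is left out.

```python
# Hypothetical lookup tables standing in for the two thesauri.
LOCAL_THESAURUS = {
    "ability": "Ability",
    "aptitude": "Ability",  # hypothetical synonym, for illustration only
}
MAPPING_SERVICE = {
    "Ability": "Intelligence and aptitudes",
}


def to_local_preferred(free_text):
    """User's free text term -> local preferred term, or None if unknown."""
    return LOCAL_THESAURUS.get(free_text.strip().lower())


def to_provider_preferred(local_term):
    """Local preferred term -> information provider's preferred term, or None."""
    return MAPPING_SERVICE.get(local_term)


def translate(free_text):
    """Run both lookups in sequence; None means no mapping was found."""
    local_term = to_local_preferred(free_text)
    if local_term is None:
        return None
    return to_provider_preferred(local_term)


print(translate("Aptitude"))
```

A term with no entry in either table returns None, which is where off-line consultation of a thesaurus would come in.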
Output mapping via remote thesaurus: [diagram]
Components: User; User’s local thesaurus; Mapping service thesaurus; Information provider’s thesaurus; Information provider’s database
Flows: User’s free text terms; Information provider’s preferred terms; Information provider’s preferred terms (perhaps expanded); Possible off-line consultation; Possible off-line term information
Output mapping direct to remote database: [diagram]
Components: User; User’s local thesaurus; Mapping service thesaurus; Information provider’s thesaurus; Information provider’s database
Flows: User’s free text terms; Information provider’s preferred terms; Off-line provision of term information; Possible off-line consultation; Possible off-line term information
Other projects and tools: CARMEN
RENARDUS
AQUARELLE
SIS-TMS
UMLS
GenThes
CERES
Knowledgecite Library
Personal experience: Editing terms into thesaurus structure consistent with AAT
25 terms per hour = 150 terms per day
4500 terms take 30 days
Assigning Dewey numbers to UNESCO terms
15 terms per hour = 90 terms per day
4500 terms take 50 days
Note: AAT has about 125,000 terms
At 90 terms per day, AAT would need about 1,400 days = 6.3 years
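The arithmetic on this slide can be reproduced with a short sketch. The 6-hour working day and the 222 working days per year are not stated on the slide; they are inferred from "25 terms per hour = 150 terms per day" and "1400 days = 6.3 years" and should be treated as assumptions.

```python
HOURS_PER_DAY = 6            # assumption, inferred from 25/hour = 150/day
WORKING_DAYS_PER_YEAR = 222  # assumption, inferred from 1400 days = 6.3 years


def days_needed(terms, terms_per_hour, hours_per_day=HOURS_PER_DAY):
    """Working days needed to process `terms` at a given hourly rate."""
    return terms / (terms_per_hour * hours_per_day)


print(days_needed(4500, 25))   # editing terms into an AAT-consistent structure
print(days_needed(4500, 15))   # assigning Dewey numbers to UNESCO terms

aat_days = days_needed(125_000, 15)
print(round(aat_days))                             # the slide rounds this to 1400
print(round(aat_days / WORKING_DAYS_PER_YEAR, 1))  # years of work
```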
English Heritage estimator for thesaurus construction: Based on creating 10-20 simple terms per day
Complex terms: 2-8 per day
At 10 terms per day,
4,500 terms take 450 person-days
= just over 2 person-years
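Because the estimator gives a range of rates, the effort is a range too. The sketch below computes both ends for simple and complex terms. The figure of 220 working days per person-year is an assumption chosen to match "450 person-days = just over 2 person-years"; the function name is invented for this example.

```python
DAYS_PER_PERSON_YEAR = 220  # assumption, chosen to match the slide's "just over 2"

SIMPLE_RATE = (10, 20)  # simple terms per day (slowest, fastest)
COMPLEX_RATE = (2, 8)   # complex terms per day (slowest, fastest)


def person_days_range(terms, rate):
    """Return (fewest, most) person-days for a (slowest, fastest) daily rate."""
    slowest, fastest = rate
    return terms / fastest, terms / slowest


fewest, most = person_days_range(4500, SIMPLE_RATE)
print(fewest, most)                           # 225.0 450.0
print(round(most / DAYS_PER_PERSON_YEAR, 2))  # 2.05 person-years
print(person_days_range(4500, COMPLEX_RATE))  # (562.5, 2250.0)
```

The slide quotes the slow end of the simple-term range; if every term were complex, the same 4,500 terms would take between 562.5 and 2,250 person-days.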
Factors affecting calculation: Number of terms
Number of uses
Candidate terms per year
Number of external terms mapped per year
Number of licences
Start up requirements (EH): User assessment / market testing: 3 days
Introduction: 10 days
Peer review: 5-15 days
Initial documentation: 2 days
Promotion: 3 days
Audit of existing usage: 0.1 day per 500 uses
Research (e.g. reconciliation of names): 10% of terms require research, at 1 day per 5 terms
Training: 1 day per 5 licences
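The start-up figures combine into a single estimate, sketched below. Two points are assumptions: the research line is read as "10% of all terms need research, at 1 day per 5 such terms", and peer review is passed in as a parameter because the slide gives a range of 5-15 days. The example inputs (10,000 uses, 10 licences) are hypothetical.

```python
def startup_days(terms, uses, licences, peer_review_days=10):
    """Estimate start-up effort in days from the English Heritage figures."""
    fixed = (
        3                   # user assessment / market testing
        + 10                # introduction
        + peer_review_days  # peer review: 5-15 days on the slide
        + 2                 # initial documentation
        + 3                 # promotion
    )
    audit = uses / 5000      # 0.1 day per 500 uses = 1 day per 5,000 uses
    research = terms / 50    # assumed: 10% of terms, 1 day per 5 = 1 day per 50
    training = licences / 5  # 1 day per 5 licences
    return fixed + audit + research + training


# Hypothetical vocabulary: 4,500 terms, 10,000 recorded uses, 10 licences.
low = startup_days(4500, 10_000, 10, peer_review_days=5)
high = startup_days(4500, 10_000, 10, peer_review_days=15)
print(low, high)
```

With these inputs the research component (90 days) dominates; the fixed tasks add only 23 to 33 days.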
Annual maintenance tasks: Candidate term evaluation: 1 day per 5 terms received
Mapping of existing terminology: 1 day per 50 terms received
Tracking and version control: 1 day per 1000 terms
Licence management: 0.5 days per licence
(c) English Heritage 2001.
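The annual maintenance rates can be turned into a per-task breakdown in the same way. This is a sketch only: the function and key names are invented here, and the yearly volumes in the example (100 candidate terms, 500 mapped terms, 10 licences) are hypothetical.

```python
def annual_maintenance_days(candidate_terms, mapped_terms, total_terms, licences):
    """Days per year for each maintenance task, using the rates on the slide."""
    return {
        "candidate term evaluation": candidate_terms / 5,      # 1 day per 5 terms received
        "mapping of existing terminology": mapped_terms / 50,  # 1 day per 50 terms received
        "tracking and version control": total_terms / 1000,    # 1 day per 1000 terms
        "licence management": 0.5 * licences,                  # 0.5 days per licence
    }


# Hypothetical yearly volumes for a 4,500-term thesaurus.
breakdown = annual_maintenance_days(
    candidate_terms=100, mapped_terms=500, total_terms=4500, licences=10
)
for task, days in breakdown.items():
    print(f"{task}: {days} days")
print("total:", sum(breakdown.values()), "days")
```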
One to one mapping: Abandoned children → 305.906945 Abandoned children
Abbreviations → 401.48 Abbreviations
Ability → 153.9 Intelligence and aptitudes
Ability grouping → 371.254 Homogeneous grouping
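A one to one mapping is naturally held as a dictionary keyed by the source term. The sketch below stores the four examples from this slide; the variable and function names are invented for illustration. Class numbers are kept as strings so that their digits are preserved exactly.

```python
# Source term -> (Dewey number, Dewey caption), taken from the slide.
ONE_TO_ONE = {
    "Abandoned children": ("305.906945", "Abandoned children"),
    "Abbreviations": ("401.48", "Abbreviations"),
    "Ability": ("153.9", "Intelligence and aptitudes"),
    "Ability grouping": ("371.254", "Homogeneous grouping"),
}


def dewey_for(term):
    """Return the Dewey number for a term, or None if the term is unmapped."""
    entry = ONE_TO_ONE.get(term)
    return entry[0] if entry else None


print(dewey_for("Ability"))           # 153.9
print(dewey_for("Ability grouping"))  # 371.254
```

Note that the caption on the target side need not match the source term, as with "Ability" and "Intelligence and aptitudes".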
One to many mapping: Abortion →
179.76 Abortion (ethics)
294.356976 Abortion (ethics - religion - Buddhism)
304.667 Abortion (demographic effects)
342.084 Abortion (law and comprehensive works)
342.085 Abortion (rights of fetuses)
342.0878 Abortion (rights of women)
344.04192 Abortion (medical law)
363.46 Abortion (social problems)
363.96 Abortion (birth control)
364.185 Abortion (criminal offences)
615.766 Abortion (drugs causing)
618.392 Abortion (spontaneous)
618.88 Abortion (surgical)
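In the one to many case each source term carries a list of candidate numbers, and something (a person or a rule) has to choose among them. The sketch below stores the thirteen targets from this slide and narrows them with a simple keyword match on the qualifier. The keyword filter is an assumption made for illustration, not a method described in the presentation.

```python
# Source term -> list of (Dewey number, qualifier), taken from the slide.
ONE_TO_MANY = {
    "Abortion": [
        ("179.76", "ethics"),
        ("294.356976", "ethics - religion - Buddhism"),
        ("304.667", "demographic effects"),
        ("342.084", "law and comprehensive works"),
        ("342.085", "rights of fetuses"),
        ("342.0878", "rights of women"),
        ("344.04192", "medical law"),
        ("363.46", "social problems"),
        ("363.96", "birth control"),
        ("364.185", "criminal offences"),
        ("615.766", "drugs causing"),
        ("618.392", "spontaneous"),
        ("618.88", "surgical"),
    ],
}


def candidates(term, keyword=None):
    """Return Dewey numbers for a term, optionally filtered by qualifier keyword."""
    targets = ONE_TO_MANY.get(term, [])
    if keyword is not None:
        targets = [t for t in targets if keyword.lower() in t[1].lower()]
    return [number for number, _ in targets]


print(len(candidates("Abortion")))    # 13
print(candidates("Abortion", "law"))  # ['342.084', '344.04192']
```

Each source term here costs thirteen decisions rather than one, which is part of why the mapping rate for Dewey numbers is lower than for straightforward editing.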