Introduction to Clinical Terminology and Classification
Clinical Decision Support L4
AL Rector
OpenGALEN TopThing UK
The Medical Informatics Group, U of Manchester
www.cs.man.ac.uk/mig/galen www.opengalen.org www.topthing.com rector@cs.man.ac.uk
The Vision
Best Practice
OpenGALEN: Philosophy
Terminology is software
Terminology is the interface between people and machines
Re-use is the key
Patient-centred information
Terminology must have a purpose
Always ask: “What’s it for?”
Not art for art’s sake
Terminology supports clinical applications - not vice versa
Applications for someone to do something for somebody
Keep the ‘Horse before the Cart’
Always ask: “How will we know if it works?” “How will we know if it fails?”
OpenGALEN: Key ideas
Separation of kinds of knowledge
Terminology, medical record and information system schemas
Concepts, language, Coding, Indexing, Pragmatics
Machine level, User level
Knowledge is fractal!
There will always be more detail to be added
Therefore terminologies must be extensible
Formal logical Support
Too big and complicated to maintain by hand
Extensibility requires rules
Software needs logical rigour
Axes for kinds of Knowledge
Machine level / Human level
Concepts / Language / Coding / Indexing / Pragmatics & User Interface
Terminology / Medical Records / Information systems
Uses of Terminology
Clinical
Epidemiology and quality assurance
Reproducibility / Comparability
Indexing
Software
Re-use !
Integration and Messaging between systems
Authoring and configuring systems
Data capture and presentation (user interface)
Indexing information and knowledge (meta-data, The Web)
History: Origins of existing terminologies
Epidemiology
ICD - Farr in 1860s to ICD9 in 1979
International reporting of morbidity/mortality
ICPC - 1980s
Clinically validated epidemiology in primary care
Now expanded for use in Dutch GP software
Librarianship
MeSH - NLM from around 1900 - Index Medicus & Medline
EMTree - from Elsevier in 1950s - EMBase
Remuneration
ICD9-CM (Clinical Modification) 1980
10 x larger than ICD; aimed at US insurance reimbursement
Traditional Systems
Built by people for interpretation by people (Coding clerks)
Most knowledge implicit in rubrics
Must understand medicine to use intelligently
Not built for software
On paper for use on paper
Enumerated - top down all possibilities listed
Serial - Single use - Single View
Hierarchical Thesauri
Traditional terminological techniques from librarianship
‘Broader than’ / ‘Narrower than’ (ISO 1087)
no logical foundation
Focused on ‘terms’
Language and concepts mixed
Synonyms, preferred terms, etc caused confusion
History (2)
Pathology indexing
SNOMED 1970s to 1990 (SNOMED International)
First faceted or combinatorial system
Topography, morphology, aetiology, function
Plus diseases cross referenced to ICD9
Specialty Systems
Mostly similar hierarchical systems
ACRNEMA/SDM - Radiology
NANDA, ICNP… - Nursing
…
History (3)
Early computer systems
Read I (4 digit Read)
Aimed at saving space on early computers
1-5 Mbyte / 10,000 patients
Hierarchical modelled on ICD9
Detailed signs and symptoms for primary care
Purchased by UK government in 1990
Single use
Morbidity indexing
Medical Entities Dictionary (MED)
Jim Cimino
History (4)
Aspirations for electronic patient records (EPRs)
Weed’s Problem Oriented Medical Record
Direct entry by health care professionals
Aspirations for decision support
Ted Shortliffe (MYCIN), Clem McDonald (Computer based reminders), Perry Miller (Critiquing),..
Aspirations for re-use
Patient centred information
Needed common multi-use multi-purpose terminology
None worked
Summary of Changes at end of 1st Generation
From terminologies for people to terminologies for machines
From paper to software
From single use to multiple re-use for patient centred systems
From entry by coding clerks to direct entry by health care professionals
From pre-defined reporting for statistics to reliable indexing for decision support
Problems with ‘First Generation’ Enumerated Systems in coping with these changes
Problems (1)
Scaling !!!
More detail and more specialities required scaling up, but...
The combinatorial explosion
Example: Burns:
100 sites x 3 depths → 404 codes
5 subsites/site x chemical or thermal → 7272
x 3 extents x 3 durations → 116,352
‘The Persian chessboard’
2^64 ≈ 10^19
10^19 grains of rice ≈ 100 billion tonnes of rice
10^19 nanoseconds ≈ 300 years
Read II grew from 20,000 to 250,000 terms in ~100 staff-years
still too small to be useful
but too big to use
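A rough sketch of the arithmetic behind the burns example. It assumes (an inference, not something stated on the slide) that every axis carries one extra "unspecified" value, since that is the reading that reproduces the slide's totals of 404, 7272 and 116,352. The function name `enumerated_codes` is hypothetical.

```python
from math import prod

def enumerated_codes(axis_sizes, allow_unspecified=True):
    """Count the codes an enumerated scheme must list to cover every combination."""
    extra = 1 if allow_unspecified else 0  # assumed "unspecified" option per axis
    return prod(n + extra for n in axis_sizes)

# Axis sizes taken from the burns example on the slide.
sites_depths = enumerated_codes([100, 3])
with_subsite_and_cause = enumerated_codes([100, 3, 5, 2])
with_extent_and_duration = enumerated_codes([100, 3, 5, 2, 3, 3])

print(sites_depths, with_subsite_and_cause, with_extent_and_duration)

# The Persian chessboard: doubling on each of 64 squares.
print(2 ** 64)
```

Each new axis multiplies the list rather than adding to it, which is why adding detail to an enumerated scheme outruns any editorial effort.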
Problems (2)
Information implicit in the rubrics
“Hypertension excluding pregnancy”
Computers can’t read!
Invisible to software
No explicit information except the hierarchy
Minimal support for software
No opportunity to use software to help
Language and concepts confused
Synonyms
Preferred terms
Homonyms
Only simple look up and spelling correction
Problems (3)
Mixed Organisation
‘Heart diseases’ in 13 of 19 chapters of ICD
Tumours, infections, congenital abnormalities, toxic, …
‘Steroids’ in five chapters of standard drug classifications
Anti-inflammatories, anti-asthmatics, …
Unreliable for indexing or Abstractions
How to say something about ‘all heart diseases’?
Fixed organisation
Single hierarchy - Single use
Where to put ‘gout’ - arthritis or metabolic disease?
Back and forth in each edition of ICD
No re-use
Problems 3b
Thesauri rather than Classifications
Problems (4)
‘Semantic identifiers’
Codes really paths - moving a concept meant changing its code
3 Cardiovascular disorders
… 3.4 Disorders of Artery ...
... 3.4.2 Disorders of coronary artery ...
… 3.4.2.3 Coronary thrombosis …
Easy to process but...
Reorganisation requires changing codes
Codes cannot be permanent
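A minimal sketch of why path-style codes cannot be permanent, using the four codes shown above. The helper names and the reorganisation to a heading "3.9" are hypothetical and only illustrate the point.

```python
# Path-style ("semantic") codes from the slide: the identifier encodes the position.
path_codes = {
    "3": "Cardiovascular disorders",
    "3.4": "Disorders of Artery",
    "3.4.2": "Disorders of coronary artery",
    "3.4.2.3": "Coronary thrombosis",
}

def parent_of(code):
    """Easy to process: the parent is just the code minus its last segment."""
    head, _, _ = code.rpartition(".")
    return head or None

def recode_after_move(new_parent, child_number):
    """Moving a concept means minting a new path, so the old code cannot stay valid."""
    return f"{new_parent}.{child_number}"

old_code = "3.4.2.3"
# Hypothetical reorganisation: file coronary thrombosis under a new heading "3.9".
new_code = recode_after_move("3.9", 1)

print(parent_of(old_code))   # parent recovered from the code itself
print(old_code, "->", new_code)
```

Any record that stored the old code now points at a position that no longer means the same thing.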
Problems (5)
Maintenance
20 Years from ICD9 to ICD10
~100 person-years from Read 1 to Read 3
Mega francs/guilders/crowns/marks on European coding schemes
Thousands of unpaid hours of committee time
Impossible / meaningless decisions take longest
You can search forever for something that is not there
Multiple uses compete -
Must choose one use
Most successful were clear about their purpose - ICD, ICPC, MeSH
Codes change meaning with version changes
Old data misleading!
Problems (6)
Version specific artefacts
“Not otherwise specified” (NOS)
Used to move a general concept ‘down’
Not elsewhere classified (NEC)
Catch all - Nowhere else in coding system e.g. ‘Tumour not elsewhere classified’
dependent on version,
“Other”
Catch all - Not listed below, e.g. “Other diseases of the cardiovascular system”
dependent on version
Not used consistently
Problem (7): Language is slippery: Two hands or Four?
Language/Concepts are slippery
Human cognition makes it look easy
Logic fails to capture it
Classification is easy until you try to do it
Trying since Aristotle in the West and Ancient Chinese in the East
Words/Concepts mean what a community decides they mean
Does a chimpanzee have four hands?
Is a prion alive?
Is surgery on the ovary a kind of ‘Endocrine surgery’?
Easier to agree on the concrete than the abstract
Easy to agree on useful abstractions and generalisations
Harder to agree on how to name them
Problems (8)
There is no re-use - there is no standard
The ‘grand challenge’: A common controlled vocabulary for medicine
But re-use requires multiple different views
People’s needs differ / People do and find different things
By profession
Doctors and specialties, nurses, physiotherapists, dentists…
By situation
Inpatient, outpatient, primary care, community…
By task
Diagnosis, management, prescribing,
patient care, public health, quality assurance, management, planning
By country and community
US, UK, France, Germany, Japan, Korea, ...
Summary of Problems 1st Generation Enumerated Systems
Enumerated Single Hierarchies
List all possibilities in advance
Cannot cope with fractal knowledge
Most knowledge implicit
Invisible to software
Can’t agree on common concepts and classification
Unreliable for indexing
Difficult to use for healthcare professionals
No support for user interface
Can’t build and maintain big classifications
Language and concepts don’t translate easily to logic and software
Cimino’s Desiderata (1)
Concept orientation
Separate language (terms) and concepts (codes)
Concept permanence
Never re-use a code (‘retire’ it)
Nonsemantic concept identifiers
Separate the code from the path
Polyhierarchy
Allow one concept to be classified in multiple ways
Gout can be both a metabolic disease and an arthritis
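One way to picture these four desiderata together, as a sketch under stated assumptions: the identifiers `C001` to `C003` are made up, terms live in a separate table from concepts, and the hierarchy is a parent set rather than a path, so gout can sit under two parents without its identifier changing.

```python
# Concept orientation: opaque codes (hypothetical) kept apart from the terms that name them.
concept_terms = {
    "C001": ["Metabolic disease"],
    "C002": ["Arthritis"],
    "C003": ["Gout"],
}

# Polyhierarchy: a concept may have several parents; the code itself carries no path.
parents = {
    "C001": set(),
    "C002": set(),
    "C003": {"C001", "C002"},
}

# Concept permanence: retired codes are remembered and never handed out again.
retired = set()

def ancestors(concept_id):
    """Collect every concept reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(concept_id, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen

def is_a(child, ancestor):
    return child == ancestor or ancestor in ancestors(child)

print(is_a("C003", "C001"), is_a("C003", "C002"))
```

Reclassifying gout here means editing `parents`, not reissuing `C003`.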
Cimino’s Desiderata (2)
Formal Definitions
i.e. ‘Be compositional’
Reject ‘Not elsewhere classified’
concept permanence and NEC
Multiple granularities
Organ, tissue, cellular, molecular
Grades, types, classes of diseases
Special clinical criteria
Multiple consistent views
Allow different organisations
e.g. functional, anatomical, pathological
Cimino’s Desiderata (3)
Represent context
Family history, risk, source of information
Evolve gracefully
Allow controlled changes
Recognise redundancy (equivalence)
‘Carcinoma’ + ‘Lung’ ?=? ‘Carcinoma of the lung’
How would we know?
How could a machine know?
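A toy answer to "How could a machine know?", assuming that pre-coordinated concepts carry formal definitions. If both the listed concept and the composed expression reduce to the same canonical form, a program can detect the equivalence. The attribute name `site` and the helper names are hypothetical, and a real description logic classifier does far more than this exact-match comparison.

```python
def canonical(base, **qualifiers):
    """Normalise an expression so that ordering of qualifiers does not matter."""
    return (base, frozenset(qualifiers.items()))

# Formal definition attached to a pre-coordinated term (assumed for illustration).
definitions = {
    "Carcinoma of the lung": canonical("Carcinoma", site="Lung"),
}

def find_equivalent(expression):
    """Return the names of listed concepts whose definition matches the expression."""
    return sorted(name for name, d in definitions.items() if d == expression)

composed = canonical("Carcinoma", site="Lung")
print(find_equivalent(composed))
print(find_equivalent(canonical("Carcinoma", site="Liver")))
```

Without the formal definition, 'Carcinoma of the lung' is just a string and the comparison has nothing to work on.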
Solution Generation 1 Megaterm + Crossmapping = UMLS
Clinical Applications / Medical Records / Data entry / Decision support
Solution 1 Cross-mapping & UMLS
Unified Medical Language System (UMLS) from US National Library of Medicine
De facto common registry for vocabularies
Concept Unique Identifiers (CUIs) and Lexical Unique Identifiers (LUIs) are de facto the common nomenclature
Solution 1 Cross-mapping & UMLS
An invaluable resource, but...
No better than the vocabularies which are mapped
Limited detail for patient care
Unreliable for indexing or abstraction of knowledge
Best for relating everything to MeSH for indexing literature
Still limited by combinatorial explosion
Still can’t cope with fractal knowledge
Not extensible - no help in building or extending terminologies
No help in reorganising existing terminologies to re-use for new purposes
Top down
Information still implicit
Minimal help with software
No help with data capture, user interfaces
Solutions Generations 2-3 Compositional Systems
Beat the combinatorial explosion
Build concepts out of pieces - Lego
Dictionary and grammar rather than phrasebook
But hard
Solution Generation 1.5: Faceted
Faceted systems: SNOMED International
Inflammation + Lung + Infection + Pneumococcus → Pneumococcal pneumonia
Limit combinatorial explosion, but…
Rigid - a limited number of axes / facets / chapters
Each facet has the problems of a first generation enumerated system
Much knowledge still implicit
No way to know how identifiers relate
No explicit relations, only ‘+’
No way to recognise redundancy / equivalence
No help with data capture or user interface / No way to recognise nonsense
Carcinoma + Hair + Donkey + Emotional ????
Still can’t cope with fractal knowledge
Limited extensibility: limited help with building, extending or reorganising
Still Top Down
Generation 2: Enumerated Compositional
Read III with qualifiers
Inflammation: site: lung, cause: pneumococcus → Pneumococcal Pneumonia
More semantics but…
Limited qualifiers - limited views - limited re-use
Limited help with data capture - User interface difficult
Much information still implicit - limited software support
No way to recognise redundancy / equivalence / errors
Organisation still mixed - indexing better but still unreliable
Limited separation of language and concepts
Still can’t cope with fractal knowledge
Limited extensibility; limited help with building and reorganising terminologies
Top down
CT Vocabulary
“Reference Terminology” vs “Interface Terminologies”
Reference terminology = enumerated hierarchy of formally defined terms
Interface terminology = navigation structure for user interface
Explicitly excluded from SNOMED-RT
“Terming”, “Coding”, and “Grouping”
Terming - finding the lexical string
Coding - finding the correct unique code (concept)
Grouping - putting codes into groupers for epidemiological or other purposes
Generation 2.5 Pre-coordinated Formal Compositions
SNOMED-RT (SNOMED-CT?)
Formal logical model for classifying a fixed list of definitions
Simple fixed ontology (7 links)
GALEN derived terminologies
UK Drug Ontology
Procedure classifications
Generation 2.5 Pre-coordinated Formal Compositions More semantics
Limited ability to cope with combinatorial explosion
Any one pre-coordinated terminology of fixed size
but arbitrarily many terminologies might be derived
Limited ability to cope with fractal knowledge
Limited extensibility
Extensibility requires access to ‘Workbench’
Bottom up / middle out
More explicit information
Logical criteria for correctness / redundancy / equivalence
Based on knowledge representation (ontologies) and description logics
Limited support for data capture and user interface
Generation 3: Post-Coordinated Formal Concept Model with Constraints delivered as Software Services
OpenGALEN Reference Model - PEN&PAD/Clinergy™
Inflammation which hasCause (Infection which hasCause Pneumococcus) → PneumococcalPneumonia → “Pneumococcal Pneumonia”
A dictionary and grammar rather than a phrase book
Software rather than data
A sound logical and ontological foundation
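A toy illustration of "a dictionary and grammar rather than a phrase book": expressions are composed on demand, and a small table of constraints decides which compositions are allowed. This only imitates the surface notation of the example above. It is not GRAIL, and the `KIND` and `SANCTIONED` tables are invented for the sketch.

```python
# Hypothetical category for each primitive concept.
KIND = {"Inflammation": "Process", "Infection": "Process", "Pneumococcus": "Organism"}

# Hypothetical constraints: which (base kind, link, value kind) triples make sense.
SANCTIONED = {("Process", "hasCause", "Process"), ("Process", "hasCause", "Organism")}

def head(expr):
    """The primitive at the head of an expression (a name or a nested triple)."""
    return expr if isinstance(expr, str) else expr[0]

def which(base, link, value):
    """Compose base-which-link-value, refusing combinations that are not sanctioned."""
    triple = (KIND[base], link, KIND[head(value)])
    if triple not in SANCTIONED:
        raise ValueError(f"not sanctioned: {base} {link} {head(value)}")
    return (base, link, value)

def render(expr):
    """Print an expression in the 'X which link (Y ...)' style used on the slide."""
    if isinstance(expr, str):
        return expr
    base, link, value = expr
    inner = render(value)
    if not isinstance(value, str):
        inner = f"({inner})"
    return f"{base} which {link} {inner}"

pneumonia = which("Inflammation", "hasCause", which("Infection", "hasCause", "Pneumococcus"))
print(render(pneumonia))
```

Nothing is listed in advance: the grammar accepts any sanctioned composition and rejects the rest, for example `which("Pneumococcus", "hasCause", "Inflammation")` raises `ValueError` under these assumed constraints.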
Generation 3: Post-Coordinated Formal Concept Models
Copes with combinatorial explosion
Indefinitely many compositions possible
Lists not pre-enumerated
Copes with fractal knowledge
Easily extensible to add more detail
Most information explicit
More comprehensive ontology (50-250 links)
Good support for data capture / user interface
But requires additional pragmatic knowledge layer
Separates user view and machine view
Intermediate representation vs GRAIL
Case Study 1: The exploding bicycle
ICD-9 (E826) 8
READ-2 (T30..) 81
READ-3 87
ICD-10 (V10-19) 587
V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
Description Logics: A crash course
Thing + feature: pathological + (feature: pathological)
Defusing the exploding bicycle: 500 codes in pieces
10 things to hit…
Pedestrian / cycle / motorbike / car / HGV / train / unpowered vehicle / a tree / other
5 roles for the injured…
Driving / passenger / cyclist / getting in / other
5 activities when injured…
resting / at work / sporting / at leisure / other
2 contexts…
In traffic / not in traffic
V12.24 Pedal cyclist injured in collision with two- or three-wheeled motor vehicle, unspecified pedal cyclist, nontraffic accident, while resting, sleeping, eating or engaging in other vital activities
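The counts above can be checked with a short sketch: four independent axes give 500 combinations from only 22 pieces. The axis sizes and the role, activity and context values come from the slide; the `describe` helper and its output wording are hypothetical.

```python
from math import prod

# Axis sizes as stated on the slide.
axis_sizes = {"things to hit": 10, "roles": 5, "activities": 5, "contexts": 2}

combinations = prod(axis_sizes.values())  # codes an enumerated scheme must list
pieces = sum(axis_sizes.values())         # building blocks a compositional scheme needs

# Values named on the slide for three of the axes.
ROLES = {"driving", "passenger", "cyclist", "getting in", "other"}
ACTIVITIES = {"resting", "at work", "sporting", "at leisure", "other"}
CONTEXTS = {"in traffic", "not in traffic"}

def describe(hit, role, activity, context):
    """Compose one accident description from independently chosen pieces."""
    if role not in ROLES or activity not in ACTIVITIES or context not in CONTEXTS:
        raise ValueError("unknown piece")
    return f"{role} injured in collision with {hit}, {context}, while {activity}"

print(combinations, pieces)
print(describe("motorbike", "cyclist", "resting", "not in traffic"))
```

Adding one more activity costs one more piece here, against 100 more codes in the enumerated list.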
Goodbye to picking lists…
What you hit
Your Role
Activity
Location
Cycling Accident
Other important links and initiatives
HL7 Vocabulary group
See HL7 web site
Or join list server
SNOMED-DICOM-Microglossary (Radiology)
Nursing initiatives - see Nick Hardiker papers
ISO TC215 WG2 / CEN TC251 WG3
See web sites
Criteria for success
Re-use
A recognised growing library of common decision support modules
Stop starting from scratch!
Integration
2+ independently developed DSSs integrated with 2+ independently developed EPRs without exponentially increasing effort.
Criteria for success
Authoring
No individual invests in their own terminology
enterprise-wide terminology servers
Indexing
Simplification of systems
a sharp drop in special cases and exceptions
a sharp increase in authors’ productivity
Criteria for success
User interfaces
Real systems in real use with real patients by real clinicians
transparent systems
OpenGALEN
www.opengalen.org