Improving MEDLINE Query Precision With Pseudo Relevance Feedback : Improving MEDLINE Query Precision With Pseudo Relevance Feedback January 30, 2004
Jay Urbain
Advisor: Ophir Frieder
Improving MEDLINE Query Precision With Pseudo Relevance Feedback : Improving MEDLINE Query Precision With Pseudo Relevance Feedback Improving the precision of MEDLINE search results is an important challenge.
Prior investigations on improving retrieval performance using thesaurus and relevance feedback have produced mixed results.
The use of a mixture of IR strategies & utilities along with RF makes it difficult to determine the contributions of each technique.
We review prior work, and use a systematic method to evaluate best techniques & parameters for using pseudo RF on MEDLINE.
Our research showed an increase in average precision of 21.5% using pseudo RF, exceeding the results of similar efforts.
MEDLINE : MEDLINE Collection of more than 10,000,000 citations from more than 4,200 medical journals, indexed from a MetaThesaurus containing more than 19,000 Medical Subject Headings (MeSH) .
MeSH organized in a related hierarchy of categories and sub-categories.
Myocardial Ischemia (304,753)
Coronary Disease (140,720)
Angina Pectoris (33,938)
Angina, Pectoris, Variant (1,889)
Angina, Unstable
Syndrome X
Coronary Aneurysm
Coronary Arteriosclerosis
Coronary Thrombosis
Coronary Vasospasm
http://www.nlm.nih.gov/mesh/MBrowser.html
Sample MeSH : Sample MeSH
Adult
Cardiovascular Diseases
epidemiology
prevention & control
Diabetes Mellitus, Type II
epidemiology
…
UMLS MetaThesaurus : UMLS MetaThesaurus The MetaThesaurus is a comprehensive vocabulary that includes definitions MeSH terms [13].
Thesauri can have three types of relationships:
Synonym
Represents same underlying concept.
“cancer vs. carcinoma”
Hierarchical
Broader/narrower “is-a” relationship.
“Angiotensin Converting Enzyme Inhibitors vs. Captopril”
Related
Some other type of relationship
“hypertensive vs. anti-hypertensive”
Searching the MEDLINE Database : Searching the MEDLINE Database Requires knowledge of the hierarchical MeSH indexing scheme.
Even “expert” users tend to select MeSH terms that are too general.
Studies by Hersh, et al. [5] found novice users using VSM search can be as effective as experts using Boolean search.
IR systems are becoming essential tools for the practice of evidence based medicine [12].
Novice and expert users alike need to find relevant documents with the facts needed to make decisions.
Retrieval Strategies : Retrieval Strategies Vector Space Model (VSM)
Measure similarity by inner product, or Cosine angle between query & document vectors.
IDF = log(N/df) // N= total # of docs; df=# of docs with term
SC = ∑ (tfd*idf * tfq*idf) // dot product.
SC = ∑ (tfd*idf * tfq*idf)/nd // cosine normalized by document length.
Probabilistic
Probability estimate of the chance of a document being relevant to a query.
P(R | t1) = (num relevant with t1 / num relevant)
(num with t1 / all documents)
P(R | t1, t2, …, tn) = P(R | t1) x P(R | t1) x P(R | t2) x … x P(R | tn)
Relevance Feedback : Relevance Feedback Refine search queries based on prior relevance judgments.
Manual –
Relevant documents identified by the user.
New terms selected by the user or automatically.
Automatic “Pseudo” –
Assume the top-ranked documents are relevant.
New terms are selected automatically.
Prior Work : Prior Work
Experimental Methodology : Experimental Methodology Prior work used combinations of several IR strategies and utilities, making the identification of the contribution due to RF difficult.
Results from Hersh and Srinivasan, and others vary considerably.
Used a variation of the systematic method outlined by Lundquist, Grossman, and Frieder [7] for identifying optimal techniques to improve relevance feedback.
Our method include calibrating the number of top-ranked documents, the number of feedback terms used for relevance feedback, and adjusting the term weight.
Methodology – Experimental Steps : Methodology – Experimental Steps Evaluate baseline retrieval strategy and term weights.
Dot Product
Cosine
Probabilistic
Explore expansion using MeSH and title terms.
Identify optimal query expansion parameters:
Term relevance (term count, idf, term count and idf)
Number of top documents to include in term selection
Number of terms added to the query
Number of iterations
Data set, Sample Query : Data set, Sample Query Data set:
The TREC-9 training subset of the OHSUMED MEDLINE training collection: 54,710 MEDLINE citations from 1987 and 63 queries developed by the Hersh team at Oregon Health Sciences University [5].
Sample query:
Number: OHSU1
60 year old menopausal woman without hormone replacement therapy
Description:
Are there adverse effects on lipids when progesterone is given with estrogen replacement therapy
Sample MEDLINE Citation from Dataset : Sample MEDLINE Citation from Dataset .I 48200
.U
87316326
.S
Obstet Gynecol Clin North Am 8712; 14(1):299-320
.M
Drug Administration Schedule; Endometrium/*DE/PA; Estrogens/AD/*PD; Female; Human; Menopause/*DE; Middle Age; Norethindrone/*AA/AD/PD; Support, Non-U.S. Gov't; Uterine Hemorrhage/ET/PA; Uterine Neoplasms/CI/PC.
.T
The effects of estrogens and progestogens on the endometrium. Modern approach to treatment.
.P
JOURNAL ARTICLE; REVIEW.
.W
The major hazard of postmenopausal cyclic estrogen therapy is endometrial hyperstimulation. The incidence of hyperplasia is dose dependent; the incidence of carcinoma is both dose and duration dependent. The risk of carcinoma is small. Invasive procedures such as endometrial biopsy to detect those patients with hyperplasia and carcinoma are unlikely to be cost-effective and have other disadvantages. …
.A
Whitehead MI; Fraser D.
Methodology – Baseline : Methodology – Baseline Phrasing*:
Phrasing techniques were not included in the study.
Initial testing with 2-word phrasing showed no significant increase in precision.
Significantly increased the size of the index, and index build time.
Stemming*
Standard Porter and Lovins stemmers significantly reduce precision in early testing.
*Would have also increased the number of variables in the study.
Environment:
The SimpleIR IR engine used in Information Retrieval classes at IIT was enhanced to include weighted terms, support for the tested information retrieval strategies, and modified to improve scalability.
All runs were performed on a 400 MHz Pentium III Dell laptop with 500M of memory.
Results - Baseline retrieval strategy : Results - Baseline retrieval strategy Identify Document Term Weights
Results - Baseline retrieval strategy : Results - Baseline retrieval strategy Identify Retrieval Strategy with Constant Relevance Feedback Parameters*
*3 additional terms from top 10 documents with one additional iteration, feedback term weight: termFrequency*idf*idf;
Expansion using MeSH and title terms : Expansion using MeSH and title terms Query Term Expansion: MeSH terms, Title terms
*3 additional terms from top 10 documents with one additional iteration
Identify query expansion parameters : Identify query expansion parameters Term Relevance (term count, idf, term count and idf)
3 additional terms from top 10 documents with one additional iteration, dot product SC, max(1, 0.1tf) document term weighting, query expansion with MeSH terms only
Identify query expansion parameters : Identify query expansion parameters Number of Top Documents to Include In Term Selection
3 additional terms from top documents with one additional iteration, dot product SC, max(1, 0.1tf) document term weighting, query expansion with MeSH terms only, term relevance measure of termFrequency*idf.
Identify query expansion parameters : Identify query expansion parameters Number of Terms Added To Query Terms*
*15 top documents with one additional iteration, dot product SC, max(1, 0.1tf) document term weighting, query expansion with MeSH terms only, term relevance measure of termFrequency*idf.
Identify query expansion parameters : Identify query expansion parameters Number of Feedback Iterations
3 terms selected from 15 top documents, dot product SC, max(1, 0.1tf) document term weighting, query expansion with MeSH terms only, term relevance measure of termFrequency*idf.
Summary - RF Parameters : Summary - RF Parameters Optimal parameters:
Select the 3 most discriminating terms
Use tf*idf term selection criteria
Select from the 15 most highly ranked documents
Use one feedback iteration.
These relevance feedback parameters yielded an average precision of .3590 over 11-points of recall and represented an increase of 21.5% over the average precision of .2955 achieved without any relevance feedback
RF vs. No RF Precision : RF vs. No RF Precision
RF vs. No RF Precision at n Docs : RF vs. No RF Precision at n Docs
Discussion : Discussion Pseudo RF using MeSH terms uniformly increased precision at all points of recall.
Results exceeded the improvement in average precision in similar studies by Hersh [21], Srinivasan [11], and Aronson [1], though they can’t be directly compared due to differences in underlying IR strategies and utilities.
Results indicate a broad based robustness of utilizing relevance feedback to improve precision.
Results and tuning parameters, may be subject to some bias introduced by the order in which we conducted our experiments.
By RF’s very nature of expanding queries with terms from top ranked documents, relevance feedback is bound to provide improvement since we are utilizing the criteria we have defined for relevance to identify “relevant.” terms.
Future Work : Future Work Apply adaptive learning methods at each iteration of RF.
Restrict MeSH term expansion for highly ranked retrieved documents to MetaThesaurus concepts related to the initial query.
Expansion using Subject heading/sub-headings relations.
“fuzzy” term matcher utilizing machine learning.
Explore causes of rare diseases by looking for indirect links in different subsets of MEDLINE biomedical literature (Swanson [15]).
UMLS co-occurrence statistics.
Add page Score algorithm to help identify “gold standard” papers.
Full OSHUMED data set.
Automated weighting of feedback terms.
Jay Urbain : Jay Urbain Professional:
7/99 to Present. Principal, Upstream Development LLC
8/03 to Present. Lecturer, Milwaukee School of Engineering, EECS.
9/00 to Present. Certified Instructor, Learning Tree International
7/97 to 7/99. VP Software, ThinkMed LLC
3/93 to 7/97. Senior Engineering Manager, Marquette Medical Systems
2/86 to 8/87. Senior Systems Engineer, Rockwell, Computer Vision Group
4/84 to 2/86. Senior Software Engineer, GCA Corp.
1/82 to 4/84. Software Engineer, Northrop Defense Systems
Academic:
Post Grad. Studies in Computer Science, IIT, Chicago
MBA, May ’97, UNIVERSITY OF WISCONSIN, Madison
MSEE, Computer Engineering, 1986, IIT, Chicago
BS Engineering, Computer & Info. Systems, 1981, UNIVERSITY OF ILLINOIS, Chicago
Certifications:
Sun Certified Enterprise Architect for Java Platform Enterprise Edition Technology
Sun Certified Programmer for the Java 2 Platform
Java Enterprise Certified Professional (Learning Tree International)
Old Stuff follows : Old Stuff follows
Retrieval Strategy – VSM : Retrieval Strategy – VSM VSM represents each document and query as a vector of terms.
Measure similarity (SC) by taking dot product, or measuring angle (Cosine) between them.
Documents most similar to the query are deemed to be most relevant.
Each term in a document and a query is weighted by term frequency and inverse document frequency score (IDF).
IDF provides basic measure of how discriminating a term is in identifying a document
Retrieval Strategy – VSM : Retrieval Strategy – VSM
Measure similarity (SC) by taking dot product, or measuring angle (Cosine) between document and query vector.
Each term in a document and a query is weighted by term frequency and inverse document frequency score (IDF).
IDF = log(N/df) // N= total # of docs; df=# of docs with term
SC = ∑ (tfd*idf * tfq*idf) // dot product.
SC = ∑ (tfd*idf * tfq*idf)/nd // cosine normalized by document length.
Retrieval Strategy – Probabilistic : Retrieval Strategy – Probabilistic Use probability estimate of the chance of a document being relevant to a query.
Odds a single term will be in a relevant document:
P(R | t1) = (num relevant with t1 / num relevant)
(num with t1 / all documents)
Probability for multiple terms:
P(R | t1, t2, …, tn) = P(R | t1) x P(R | t1) x P(R | t2) x … x P(R | tn)
Retrieval Strategy – Issues : Retrieval Strategy – Issues Small size of the query vector makes finding a higher percentage of relevant documents difficult since query vector contains only a few “relevant” terms.
In a MEDLINE application, it is difficult for the average user to identify relevant MeSH terms to facilitate high precision searches.
Relevance feedback can address both issues by adjusting the query vector through term expansion of highly relevant/discriminating terms.
Relevance Feedback : Relevance Feedback Automatically expand query by adding terms not in the original query, but in the ranked relevant documents.
The “more like this” link available with search engines allows user to select which documents are relevant.
In Rocchio’s approach:
VSM is used to rank documents.
After feedback from the user, relevant document vectors are added to the query, and non-relevant document vectors are removed:
Relevance Feedback : Relevance Feedback Many extensions of Rocchio's algorithm have been proposed, like the Ide regular algorithm and Ide dec-hi algorithm.
Salton and Buckley found a variation of “Ide dec-hi” method, where relevant terms from top ranked doc’s are added directly to the queries, to be the most effective technique in their study.
Pseudo RF – Approach : Pseudo RF – Approach
Assume the top ranking retrieved documents are relevant.
Identify most relevant terms (MeSH, title, abstract):
Term frequency
How effectively a term uniquely identifies a document (IDF).
Add most relevant terms to query.
Re-execute the expanded query.
Repeat process (if helpful).
Prior Work - Hersh : Prior Work - Hersh Hersh, Price, and Donohoe [21] conducted a study assessing thesaurus-based query expansion using the MetaThesaurus.
OHSUMED test collection: 106 queries, ~350,000 MEDLINE ref’s.
Created MetaThesaurus concepts for each query.
Used SAPHIRE concept matching program to identify “best” terms
Used synonym, hierarchical, and related term information
Found all types of query expansion degraded aggregate retrieval performance as measured by recall and precision.
Some queries showed some improvement.
Prior Work - Srinivasan : Prior Work - Srinivasan Srinivasan demonstrated effectiveness of expanding MEDLINE queries with three expansion strategies:
expansion of MeSH terms.
expansion of document free text.
expansion of both MeSH and document free text terms.
Used subset of OHSUMED test collection of 75 queries and 2,344 MEDLINE documents.
Results showed significant improvement for all three strategies.
Showed that expansion of MeSH terms outperformed free text query expansion.
MeSH term expansion improved 11-AvgP by 14.4% over baseline of .5169.
Prior Work - Aranson : Prior Work - Aranson Aronson, et. al. [1] used the UMLS MetaMapTM program for associating UMLS Metathesaurus concepts with terms in the original query.
Query expansion included human assigned MeSH terms, along with terms through MetaThesaurus lookup.
Used probabilistic retrieval strategy.
Used same OHSUMED test collection as Srinivasan (75 queries and 2,344 MEDLINE documents.
Results compared favorable to automated relevance feedback and the results of Srinivasan (2.2% to 9.4% over baseline).
Authors suggest an optimal strategy is to use both UMLS Metathesaurus and some form of automated relevance feedback.
Prior Work - Yang and Chute : Prior Work - Yang and Chute Yang and Chute [14] utilized a least squared error technique to map query terms to MeSH terms in the MetaThesaurus.
They reported a 32.2% improvement in average precision over their baseline.
Difficult to compare results since they utilized a custom collection.
An additional concern is the risk of over-fitting their data by utilizing 88% of their data as the training set.
Prior Work - Mao and Chu : Prior Work - Mao and Chu Mao and Chu[8] use a variant on query expansion through the use of concepts and relations in the MetaThesaurus.
Define a phrase as consisting of word stems and “concepts.”
The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems.
Concepts represent a UMLS concept identifier, e.g., “high body temperature” is a hyponym of “hyperthermia”, and “hyperthermia” is synonymous with “fever.”
Their results showed an increase of 16% in precision, but they only tested with one query.
Prior Work - Mendonca and Cimino : Prior Work - Mendonca and Cimino Mendonca and Cimino [23] investigated the utilization of co-occurrence of MeSH terms in MEDLINE citations.
Interested in terms associated with the search strategies optimal for evidence–based medicine to automate construction of a knowledge base.
Used UMLS MRCOC & MRREL table.
Found good specificity and sensitivity for co-occurrence of MeSH terms in MEDLINE citations.
These results lend support to the notion of utilizing MeSH terms from highly ranked documents as part of a query expansion strategy for relevance feedback.