Presentation Transcript

SIFTER: A Content-based Information Filtering System: 

Mathew J. Palakal, Rajeev R. Raje, Snehasis Mukhopadhyay, Javed Mostafa
Indiana University Purdue University Indianapolis
Indiana University Bloomington
http://sifter.indiana.edu
http://sifter.cs.iupui.edu

SIFTER: Motivation: 

- Information Overload -- a Reality in Today’s World
- A Need to Locate Highly ‘Relevant Information’
- A Need for a Single Tool for Accessing Multiple Data Sources and Formats
- Continuous Updating of Relevant Information
- Privacy
- Collaborative Work Environments

The Current Model: 

[Diagram labels: Yahoo!; Private Data Sources; Internal Data; Unexplored Data; X]

The New Model: 

[Diagram labels: Yahoo!; Private Data Sources; Internal Data; Unexplored Data; SIFTER]

Usages of SIFTER: 

- A Personalized Content Manager
- A Productivity Enhancement Tool
- An Nth Degree Information Personalization Tool
- A Collaborative Research Tool
- A Private Information Tool

Single Agent Filter (SIFTER): 

SIFTER: 

- Acquisition: File, Known Sources
- Representation: Vector-space -- tf-idf
- Classification: Maximin, Centroids, Sample Documents
- User Profiling: Reinforcement Learning
- Presentation: GUI

Document Representation and Vector Space Model: 

- Identify the concepts that describe the content of the given document
- Convert a document to a numeric or symbolic form
- Documents are vectors of weighted terms, defined in a thesaurus -- How to generate?
- Weights -- tf (term frequency) and idf (inverse document frequency) -- Simple and effective
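The weighting described on this slide can be illustrated with a minimal sketch. The slide does not give SIFTER's exact formula, so this uses the common raw-count tf and log(N/df) idf as an assumption. The function name `tfidf_vectors`, the whitespace tokenizer, and the sample thesaurus are hypothetical and only serve the illustration.

```python
import math
from collections import Counter

def tfidf_vectors(documents, thesaurus):
    """Return one tf-idf vector per document, with one weight per thesaurus term.

    Assumptions (not from the slides): terms are single lowercase tokens,
    tf is the raw count, and idf is log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each thesaurus term.
    df = {term: sum(1 for tokens in tokenized if term in tokens)
          for term in thesaurus}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = []
        for term in thesaurus:
            # A term that appears in no document gets weight 0.
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            vector.append(counts[term] * idf)
        vectors.append(vector)
    return vectors

docs = [
    "insulin insulin glucose",
    "glucose receptor",
    "glucose insulin receptor receptor",
]
terms = ["insulin", "glucose", "receptor"]
vectors = tfidf_vectors(docs, terms)
```

In this sample, "glucose" occurs in every document, so its idf is log(1) = 0 and it contributes nothing to any vector. That is the sense in which idf down-weights terms that do not discriminate between documents.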

Classification: 

- Maximin-Distance: an unsupervised clustering algorithm based on the document set
- Distance Metric: Cosine similarity measure (Salton)
- The point with the largest distance from the existing centroids is chosen and added as a new centroid if this distance is larger than a threshold
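The centroid-selection rule on this slide can be sketched as follows. This is an illustration under stated assumptions, not SIFTER's implementation: distance is taken as 1 minus cosine similarity, the first document seeds the centroid list, and the threshold is a fixed number chosen by the caller. All function names here are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def maximin_centroids(vectors, threshold):
    """Pick centroids with the maximin-distance rule.

    Assumption: the first vector seeds the centroid list.
    """
    centroids = [vectors[0]]
    while len(centroids) < len(vectors):
        # For every point, the distance to its nearest existing centroid.
        nearest = [min(cosine_distance(v, c) for c in centroids) for v in vectors]
        # The point that is farthest from all existing centroids.
        farthest = max(range(len(vectors)), key=nearest.__getitem__)
        if nearest[farthest] <= threshold:
            break  # no point is far enough away to justify a new cluster
        centroids.append(vectors[farthest])
    return centroids

def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda k: cosine_distance(v, centroids[k]))
            for v in vectors]

points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = maximin_centroids(points, threshold=0.5)
labels = assign(points, centroids)
```

With these four sample points the sketch finds two centroids, `[1.0, 0.0]` and `[0.0, 1.0]`, and assigns the points the labels `[0, 0, 1, 1]`. The number of clusters is not fixed in advance; it follows from the threshold.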

User Profiling: 

- Learns user interest levels for given categories
- Relies on relevance feedback from the user
- Uses a simple reinforcement learning algorithm (known as Pursuit Learning)
  - Maintains an action probability vector and an estimated relevance probability vector
  - Both vectors are updated continuously
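The two vectors named on this slide can be illustrated with a sketch of a standard pursuit-style update. It assumes that each action corresponds to a document category and that feedback is binary (relevant or not). The class name, the learning rate, and the exact update rule are assumptions for illustration; the slides do not specify SIFTER's version.

```python
import random

class PursuitLearner:
    """Pursuit-style learner over document categories (illustrative sketch)."""

    def __init__(self, n_categories, learning_rate=0.1):
        self.rate = learning_rate
        # Action probability vector: starts uniform over the categories.
        self.p = [1.0 / n_categories] * n_categories
        # Bookkeeping for the estimated relevance probability vector.
        self.relevant = [0] * n_categories
        self.shown = [0] * n_categories
        self.estimates = [0.0] * n_categories

    def choose(self, rng=random):
        """Sample a category according to the action probabilities."""
        return rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, category, is_relevant):
        """Fold one piece of relevance feedback into both vectors."""
        self.shown[category] += 1
        if is_relevant:
            self.relevant[category] += 1
        self.estimates[category] = self.relevant[category] / self.shown[category]
        # Move the action probabilities toward the category that currently
        # has the highest estimated relevance (the "pursued" action).
        best = max(range(len(self.p)), key=self.estimates.__getitem__)
        self.p = [(1.0 - self.rate) * p_i + (self.rate if i == best else 0.0)
                  for i, p_i in enumerate(self.p)]

learner = PursuitLearner(n_categories=3, learning_rate=0.5)
learner.update(1, True)   # user marked a category-1 document as relevant
learner.update(0, False)  # user marked a category-0 document as not relevant
```

After these two feedback events the estimates are `[0.0, 1.0, 0.0]`, and the action probabilities have shifted toward category 1 while still summing to 1. A smaller learning rate makes the profile adapt more slowly but less erratically.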

SIFTER BioSifter: 

- Aimed at Customizing and Adapting SIFTER to the Biological Domain
- Successfully Customized
  - PubMed as the Document Source
  - Documents and Thesaurus for Type II Diabetes
- Stand-alone Version in Java and HTML
- Tested and Deployed at Eli Lilly & Co.

BioSifter Interface: 

How Does BioSifter Help Pharmaceutical Researchers?: 

- Reducing the Information Overhead
- Rapidly Adapting to User Interests and New Sources
- Detecting New Information Sources
- Discovering Novel Correlations
- Identifying Internal/External Collaborators -- Acquiring/Selling In/Out-house Knowledge
- Creating a Dynamic Web of Intelligent Filters

Knowledge Discovery: 

Gene-Pair Relationship:
[Figure: association graph with nodes Actinin, desmin, FUS, ank1, TLS, myoglobin, filamin, nebulin, titin, CSE1, importin, FKBP54, FKBP51, hsp90]
- Data based on 5000 PubMed documents.
- Thesaurus consists of 67 Gene Terms.
- The thickness and color of the lines indicate the relative strengths of the associations.

Future Plans: 

System
- Automatic Thesaurus Discovery
- Retrieval from Multiple Sources
- Ability to Filter Multiple Formats
- Different Approaches to User Profiling

Application
- Sequence and 3-D Structure Data Retrieval, Representation and Filtering
- Knowledge Discovery

D-SIFTER and SIFTER II: 

D-SIFTER
- Distributed Filtering System
- Homogeneous Classification/Profiling
- Collaboration Models

SIFTER II
- Uniform Structure of an Agent
- Multiple and Heterogeneous Agents
- Collaboration Models

Thank You: 

{mpalakal, rraje, smukhopa}@cs.iupui.edu
jm@indiana.edu
sifter@cs.iupui.edu