Presentation Transcript

SIFTER: A Content-based Information Filtering System
Indiana University Purdue University Indianapolis
Indiana University Bloomington
Mathew J. Palakal, Rajeev R. Raje, Snehasis Mukhopadhyay, Javed Mostafa
http://sifter.indiana.edu
http://sifter.cs.iupui.edu
Added: December 10, 2007

SIFTER: Motivation
- Information Overload -- Reality in Today’s World
- A Need for locating highly ‘Relevant Information’
- A Necessity for a single tool for accessing multiple data sources and formats
- Continuous Updating of relevant information
- Privacy
- Collaborative Work environments

The Current Model
[Diagram labels: Yahoo!, Private Data Sources, Internal Data, Unexplored Data]

The New Model
[Diagram labels: Yahoo!, Private Data Sources, Internal Data, Unexplored Data, SIFTER]

Usages of SIFTER
- A Personalized Content Manager
- A Productivity Enhancement Tool
- An Nth Degree Information Personalization Tool
- A Collaborative Research Tool
- A Private Information Tool

Single Agent Filter (SIFTER)

SIFTER
- Acquisition: File, Known Sources
- Representation: Vector-space -- tf-idf
- Classification: Maximin, Centroids, Sample Documents
- User Profiling: Reinforcement Learning
- Presentation: GUI

Document Representation and Vector Space Model
- Identify the concepts that describe the content of the given document
- Convert a document to a numeric or symbolic form
- Documents are vectors of weighted terms, defined in a thesaurus -- How to generate?
- Weights -- tf (term frequency) and idf (inverse document frequency) -- Simple and effective

Classification
- Maximin-Distance: unsupervised clustering algorithm based on the document set
- Distance Metric: Cosine similarity measure (Salton)
- A point is chosen that has the largest distance from the centroids and is added as a new centroid if this distance is larger than a threshold

User Profiling
- Learn user interest levels for given categories
- Relies on relevance feedback from the user
- Uses a simple reinforcement learning algorithm (known as Pursuit Learning)
- Maintains an action probability vector and an estimated relevance probabilities vector
- Both vectors are updated continuously

SIFTER BioSifter
- Aimed at Customizing and Adapting SIFTER to Biological Domain
- Successfully Customized PubMed as the Document Source
- Documents and Thesaurus for Type II Diabetes
- Stand-alone Version in Java and HTML
- Tested and Deployed at Eli Lilly & Co.

BioSifter Interface

How Does BioSifter Help Pharmaceutical Researchers?
- Reducing the Information Overhead
- Rapidly Adapting to User Interests and New Sources
- Detecting New Information Sources
- Discovering Novel Correlations
- Identifying Internal/External Collaborators -- Acquiring/Selling In/Out-house Knowledge
- Creating a Dynamic Web of Intelligent Filters

Knowledge Discovery
[Figure: Gene-Pair Relationship. Node labels: Actinin, desmin, FUS, ank1, TLS, myoglobin, filamin, nebulin, titin, CSE1, importin, FKBP54, FKBP51, hsp90]
- Data based on 5000 PubMed documents.
- Thesaurus consists of 67 Gene Terms.
- The thickness and color of lines indicate relative strengths of associations.

Future Plans
- System: Automatic Thesaurus Discovery; Retrieval from Multiple Sources; Ability to Filter Multiple Formats; Different Approaches to User Profiling
- Application: Sequence and 3-D Structure Data Retrieval, Representation and Filtering; Knowledge Discovery

D-SIFTER and SIFTER II
- D-SIFTER: Distributed Filtering System; Homogeneous Classification/Profiling; Collaboration Models
- SIFTER II: Uniform Structure of an Agent; Multiple and Heterogeneous Agents; Collaboration Models

Thank You
{mpalakal, rraje, smukhopa}@cs.iupui.edu
jm@indiana.edu
sifter@cs.iupui.edu
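
The Document Representation and Classification slides can be illustrated with a small sketch. This is not SIFTER's own code (the slides give none); it assumes a tiny made-up thesaurus, raw term counts for tf, log(N/df) for idf, cosine distance taken as 1 minus cosine similarity, and the first document as the initial centroid. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, thesaurus):
    """Map each tokenized document to a tf-idf vector over the thesaurus terms."""
    n_docs = len(docs)
    # tf: raw counts of thesaurus terms per document (an assumed weighting).
    counts = [Counter(tok for tok in doc if tok in thesaurus) for doc in docs]
    # df: number of documents that contain each term.
    df = {t: sum(1 for c in counts if c[t] > 0) for t in thesaurus}
    # idf = log(N / df); terms that never occur get weight 0.
    idf = {t: math.log(n_docs / df[t]) if df[t] else 0.0 for t in thesaurus}
    return [[c[t] * idf[t] for t in thesaurus] for c in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)


def maximin_centroids(vectors, threshold):
    """Return indices of documents chosen as centroids by the maximin rule."""
    chosen = [0]  # assumption: seed with the first document
    while len(chosen) < len(vectors):
        # Distance from each remaining document to its nearest centroid.
        nearest = {
            i: min(cosine_distance(v, vectors[j]) for j in chosen)
            for i, v in enumerate(vectors)
            if i not in chosen
        }
        # The farthest such document becomes a centroid if it is far enough.
        best = max(nearest, key=nearest.get)
        if nearest[best] <= threshold:
            break
        chosen.append(best)
    return chosen


# Hypothetical thesaurus and documents, for illustration only.
thesaurus = ["insulin", "glucose", "gene", "protein"]
docs = [
    ["insulin", "glucose", "insulin"],
    ["glucose", "insulin"],
    ["gene", "protein", "gene"],
    ["protein", "gene"],
]
vectors = tfidf_vectors(docs, thesaurus)
print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # 0.949
print(maximin_centroids(vectors, threshold=0.5))  # [0, 2]
```

With these four documents the rule stops at two centroids, one per topic, because every remaining document lies within the threshold of an existing centroid. A lower threshold would yield more, finer-grained clusters.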
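
The User Profiling slide names Pursuit Learning and its two vectors but gives no update rule. The sketch below shows one common form of a pursuit scheme, offered as an assumption rather than as SIFTER's actual algorithm: relevance estimates are running averages of user feedback, and the action probability vector is moved a fixed step toward the category with the highest current estimate. The class name, the learning rate, and the simulated feedback probabilities are all hypothetical.

```python
import random


class PursuitLearner:
    """Minimal pursuit-style learner over a fixed set of categories."""

    def __init__(self, n_categories, learning_rate=0.1):
        # Action probability vector: how likely each category is to be shown.
        self.p = [1.0 / n_categories] * n_categories
        # Estimated relevance probability per category.
        self.d = [0.0] * n_categories
        self.shown = [0] * n_categories
        self.relevant = [0] * n_categories
        self.rate = learning_rate

    def choose(self, rng=random):
        """Sample a category index according to the action probabilities."""
        return rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, category, was_relevant):
        """Fold one piece of relevance feedback into both vectors."""
        self.shown[category] += 1
        if was_relevant:
            self.relevant[category] += 1
        self.d[category] = self.relevant[category] / self.shown[category]
        # Pursue the category with the best estimate (ties go to the lowest index).
        best = max(range(len(self.d)), key=lambda i: self.d[i])
        self.p = [
            (1 - self.rate) * p_i + (self.rate if i == best else 0.0)
            for i, p_i in enumerate(self.p)
        ]


# Simulated user whose true interest levels are made up for illustration.
true_interest = [0.2, 0.8, 0.5]
rng = random.Random(0)
learner = PursuitLearner(n_categories=3)
for _ in range(200):
    category = learner.choose(rng)
    learner.update(category, rng.random() < true_interest[category])
print([round(x, 2) for x in learner.p])
print([round(x, 2) for x in learner.d])
```

Because each update is a convex combination of the old vector and a unit vector, the action probabilities always sum to 1. The learning rate trades adaptation speed against the risk of locking onto a category before its estimate is reliable.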