KM Techniques Examples 2002 Bruno

Category: Education. License: All Rights Reserved. Added: March 08, 2008.

Presentation Transcript

Knowledge Management Systems: Development and Applications. Part II: Techniques and Examples
Hsinchun Chen, Ph.D.
McClelland Professor; Director, Artificial Intelligence Lab and Hoffman E-Commerce Lab, The University of Arizona
Founder, Knowledge Computing Corporation
University of Arizona, USA: Dr. Hsinchun Chen (陳炘鈞)
Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, NCSA, HP, SAP

Slide2: Knowledge Management Systems: Overview

Slide3: KMS Root: Intersection of IR and AI
Information Retrieval (IR) and Gerard Salton
• Inverted Index, Boolean, and Probabilistic, 1970s
• Expert Systems, User Modeling and Natural Language Processing, 1980s
• Machine Learning for Information Retrieval, 1990s
• Internet Search Engines, late 1990s

Slide4: KMS Root: Intersection of IR and AI
Artificial Intelligence (AI) and Herbert Simon
• General Problem Solvers, 1970s
• Expert Systems, 1980s
• Machine Learning and Data Mining, 1990s
• Autonomous Agents, late 1990s

Slide5: Representing Knowledge
• IR Approach
  • Indexing and Subject Headings
  • Dictionaries, Thesauri, and Classification Schemes
• AI Approach
  • Cognitive Modeling
  • Semantic Networks, Production Systems, Logic, Frames, and Ontologies

Knowledge Retrieval Vendor Direction (Source: GartnerGroup)
[Chart; vendor positions are not recoverable from the transcript. Labels: Technology Innovation, Content Experience, Market Target. Groups: Knowledge Retrieval Newbies, IR Leaders, Niche Players. Vendors shown: grapeVINE, Sovereign Hill, CompassWare, Intraspect, KnowledgeX, WiseWire, Lycos, Autonomy, Perspecta, Lotus, Netscape*, Verity, Fulcrum, Excalibur, Dataware, Microsoft, IDI, Oracle, Open Text, Folio, IBM, InText, PCDOCS, Documentum. * Not yet marketed.]

Slide7: KM Software Vendors
[Chart; vendor positions are not recoverable from the transcript. Axes: Ability to Execute, Completeness of Vision. Quadrants: Niche Players, Visionaries, Challengers, Leaders. Vendors shown: Microsoft, Lotus, Dataware, Verity, Excalibur, Netscape, Documentum, IBM, Inference, Lycos/InMagic, CompassWare, KnowledgeX, SovereignHill, Semio, IDI, PCDOCS/Fulcrum, OpenText, Autonomy, GrapeVINE, InXight, WiseWire, Intraspect.]

Competitive Analysis: Text Analysis

Competitive Analysis: Collection Creation

Competitive Analysis: Retrieval/Display

Slide11: Knowledge Management Systems: Techniques

Slide12: KMS Techniques
• Linguistic analysis/NLP: identify key concepts (who/what/where…)
• Statistical/co-occurrence analysis: create automatic thesaurus, link analysis
• Statistical and neural network clustering/categorization: identify similar documents/users/communities and create knowledge maps
• Visualization and HCI: tree/network, 1/2/3D, zooming/detail-in-context

Slide13: KMS Techniques: Linguistic Analysis
• Word and inverted index: stemming, suffixes, morphological analysis, Boolean, proximity, range, fuzzy search
• Phrasal analysis: noun phrases, verb phrases, entity extraction, mutual information
• Sentence-level analysis: context-free grammar, transformational grammar
• Semantic analysis: semantic grammar, case-based reasoning, frame/script

Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Slide16: KMS Techniques: Statistical/Co-Occurrence Analysis
• Similarity functions: Jaccard, Cosine
• Weighting heuristics
• Bi-gram, tri-gram, N-gram
• Finite State Automata (FSA)
• Dictionaries and thesauri

Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Slide18: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Slide19: KMS Techniques: Clustering/Categorization
• Hierarchical clustering: single-link, multi-link, Ward's
• Statistical clustering: multi-dimensional scaling (MDS), factor analysis
• Neural network clustering: self-organizing map (SOM)
• Ontologies: directories, classification schemes

Slide20: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Slide22: KMS Techniques: Visualization/HCI
• Structures: trees/hierarchies, networks
• Dimensions: 1D, 2D, 2.5D, 3D, N-D (glyphs)
• Interactions: zooming, spotlight, fisheye views, fractal views

Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1

Automatic Generation of CL

Slide25: Automatic Generation of CL (Continued)
• Entity extraction and co-reference based on TREC and MUC
• Visualization techniques based on Fisheye, Fractal, and Spotlight
• Text segmentation and summarization based on TextTiling and Wavelets

Slide26: Integration of CL
• Lexicon-enhanced indexing (e.g., UMLS Specialist Lexicon)
• Ontology-enhanced semantic tagging (e.g., UMLS Semantic Nets)
• Ontology-enhanced query expansion (e.g., WordNet, UMLS Metathesaurus)
• Spreading-activation based term suggestion (e.g., Hopfield net)

YAHOO vs. OOHAY
• YAHOO: manual, high-precision
• OOHAY: automatic, high-recall
Acknowledgements: NSF, NIH, NLM, NIJ, DARPA

Slide28: From YAHOO! to OOHAY?
OOHAY: Object Oriented Hierarchical Automatic Yellowpage

Slide29: Knowledge Management Systems: Examples

Web Analysis (1M): Web pages, spidering, noun phrasing, categorization

OOHAY: Visualizing the Web

Slide32: OOHAY: Visualizing the Web

Slide33: Lessons Learned
• Web pages are noisy: need filtering
• Spidering needs help: domain lexicons, multi-threads
• SOM is computationally feasible for large-scale applications
• SOM performance for web pages = 50%
• Web knowledge map (directory) is interesting for browsing, not for searching
• Techniques applicable to Intranet and marketing intelligence

News Classification (1M): Chinese news content, mutual information indexing, PAT tree, categorization

Slide41: Lessons Learned
• News readers are not knowledge workers
• News articles are professionally written and precise
• SOM performance for news articles = 85%
• Statistical indexing techniques perform well for Chinese documents
• Corporate users may need multiple sources and dynamic search help
• Techniques applicable to eCommerce (eCatalogs) and ePortal

Personal Agents (1K): Web spidering, meta searching, noun phrasing, dynamic categorization

Slide44: OOHAY: CI Spider
1. Enter starting URLs and key phrases to be searched.
2. Search results from spiders are displayed dynamically.
For project information and free download: http://ai.bpa.arizona.edu

Slide45: OOHAY: CI Spider, Meta Spider, Med Spider
1. Enter starting URLs and key phrases to be searched.
2. Search results from spiders are displayed dynamically.
For project information and free download: http://ai.bpa.arizona.edu

Slide46: OOHAY: Meta Spider, News Spider, Cancer Spider
For project information and free download: http://ai.bpa.arizona.edu

Slide47: OOHAY: CI Spider, Meta Spider, Med Spider
3. Noun phrases are extracted from the web pages, and the user can select preferred phrases for further summarization.
4. A SOM is generated based on the phrases selected. Steps 3 and 4 can be repeated to refine the results.
For project information and free download: http://ai.bpa.arizona.edu

Slide48: Lessons Learned
• Meta spidering is useful for information consolidation
• Noun phrasing is useful for topic classification (dynamic folders)
• SOM usefulness is suspect for small collections
• Knowledge workers like personalization, client searching, and collaborative information sharing
• Corporate users need multiple sources and dynamic search help
• Techniques applicable to marketing and competitive analyses

CRM Data Analysis (5K): Call center Q/A, noun phrasing, dynamic categorization, problem analysis, agent assistance

Slide52: Lessons Learned
• Call center data are noisy: typos and errors
• Noun phrasing is useful for Q/A classification
• Q/A classification could identify problem areas
• Q/A classification could improve agent productivity: email, online chat, and VoIP
• Q/A classification could improve new agent training
• Techniques applicable to virtual call center and CRM applications

Newsgroup Categorization (1K): Workgroup communication, noun phrasing, dynamic categorization, glyphs visualization

Slide54: Thread Disadvantages
• No sub-topic identification
• Difficult to identify experts
• Difficult to learn participants' attitude toward the community

Slide55: Thread Representation
[Figure labels: Time, Message, Person, Length of Time]

Slide56: People Representation
[Figure labels: Time, Message, Thread, Length of Time]

Slide57: Visual Effects
• Thickness = how active a sub-topic is
• Length in x-dimension = the time duration of a sub-topic

Slide58: Proposed Interface (Interaction Summary)
Visual Effects:
• Healthy sub-garden with many blooming high flowers = popular, active sub-topic
• A long, blooming flower is a healthy thread

Slide59: Proposed Interface (Expert Indicator)
Visual Effects:
• Healthy sub-garden with many blooming high flowers = popular sub-topic
• A long, blooming people flower is a recognized expert

Slide60: Lessons Learned
• P1000: A picture is indeed worth 1000 words
• Expert identification is critical for KM support
• Glyphs are powerful for capturing multi-dimensional data
• Techniques applicable to collaborative applications, e.g., email, online chats, newsgroups, and such

GIS Multimedia Data Mining (10GBs): Geoscience data, texture image indexing, multimedia content

Slide62: Airphoto analysis: Texture (Gabor filter)

Slide63: AVHRR satellite data: Temperature/vegetation

Slide64: Lessons Learned
• Image analysis techniques are application dependent (unlike text analysis)
• Image killer apps not found yet
• Multimedia applications require integration of data, text, and image mining techniques
• Multimedia KMS not ready for prime-time consumption yet
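
Slide16 names Jaccard and Cosine as the similarity functions used in statistical/co-occurrence analysis but gives no formulas. The sketch below shows the standard textbook definitions applied to word sets and word counts. The toy documents and the function names `jaccard` and `cosine` are assumptions made for illustration; they are not taken from the presentation or from the OOHAY software.

```python
import math
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy documents (not from the slides).
docs = {
    "d1": "knowledge management systems",
    "d2": "knowledge retrieval systems",
    "d3": "texture image indexing",
}
tokens = {name: text.split() for name, text in docs.items()}

# d1 and d2 share two of four distinct words, so Jaccard is 0.5.
print(jaccard(set(tokens["d1"]), set(tokens["d2"])))
# Cosine on raw counts for the same pair is approximately 0.667.
print(cosine(Counter(tokens["d1"]), Counter(tokens["d2"])))
# d1 and d3 share no words, so both measures are 0.0.
print(jaccard(set(tokens["d1"]), set(tokens["d3"])))
```

Jaccard ignores how often a term occurs, while cosine uses the counts; with the weighting heuristics the slide mentions, the raw counts would be replaced by weighted values before computing the cosine.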
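
Slide13 lists the word-level inverted index with Boolean search as the first linguistic-analysis technique. As a minimal sketch of that idea, the code below builds an inverted index over a few short texts and answers AND/OR queries by set operations on posting sets. The sample texts, document ids, and the function names `build_inverted_index`, `boolean_and`, and `boolean_or` are hypothetical; stemming, proximity, range, and fuzzy search from the same slide are not covered.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one query term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


# Hypothetical sample collection, loosely worded after the lessons-learned slides.
docs = {
    1: "Web pages are noisy",
    2: "Call center data are noisy",
    3: "News articles are professionally written",
}
index = build_inverted_index(docs)

print(boolean_and(index, "are", "noisy"))  # documents 1 and 2
print(boolean_and(index, "noisy", "web"))  # document 1 only
print(boolean_or(index, "web", "news"))    # documents 1 and 3
```

Lookups use `index.get` rather than `index[term]` so that querying an unknown word does not add an empty entry to the index.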