KM-Overview-2005 (added December 19, 2008)
Presentation Transcript

Knowledge Management Systems: Development and Applications
Part I: Overview and Related Fields
Hsinchun Chen, Ph.D.
McClelland Professor, Director, Artificial Intelligence Lab, The University of Arizona
Founder, Knowledge Computing Corporation
Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, DHS, NCSA, HP, SAP

Slide 2: My Background (A Mixed Bag!)
BS, NCTU, Management Science, 1981
MBA, SUNY Buffalo, Finance, MS, MIS
Ph.D., NYU, Information Systems, Minor: CS, 1989
Dissertation: “An AI Approach to the Design of Online Information Retrieval Systems” (GEAC Online Cataloging System)
Assistant/Associate/Full/Chair Professor, University of Arizona, MIS Department
Scientific Counselor, National Library of Medicine (USA), National Library of China, Academia Sinica

Slide 3: My Background (A Mixed Bag!)
Founder/Director, Artificial Intelligence Lab, 1990
Founder/Director, Hoffman eCommerce Lab, 2000
PIs: NSF CISE DLI-1, DLI-2, NSDL, DG, DARPA, NIJ, NIH, CIA, DHS
Associate Editors: JASIST, DSS, ACM TOIS, IEEE SMC, IEEE ITS
Conference/Program Co-chairs: ICADL 1998-2004, China DL 2002/2004, NSF/NIJ ISI 2003-2006, JCDL 2004
Industry Consulting: HP, IBM, AT&T, SGI, Microsoft, SAP
Founder, Knowledge Computing Corporation, 2000

Slide 4: Knowledge Management: Overview

Slide 5: Knowledge Management Overview
What is Knowledge Management
Data, Information, and Knowledge
Why Knowledge Management?
Knowledge Management Processes

Unit of Analysis
Data (1980s): factual; structured, numeric; Oracle, Sybase, DB2
Information (1990s): factual; unstructured, textual; Yahoo!, Excalibur, Verity, Documentum
Knowledge (2000s): inferential, sensemaking, decision making; multimedia; ???

Slide 7: Data, Information and Knowledge
According to Alter (1996), Tobin (1996), and Beckman (1999):
Data: Facts, images, or sounds (+ interpretation + meaning =)
Information: Formatted, filtered, and summarized data (+ action + application =)
Knowledge: Instincts, ideas, rules, and procedures that guide actions and decisions

Application and Societal Relevance
Ontologies, hierarchies, and subject headings
Knowledge management systems and practices: knowledge maps
Digital libraries, search engines, web mining, text mining, data mining, CRM, eCommerce
Semantic web, multilingual web, multimedia web, and wireless web

The Third Wave of Net Evolution
[Figure: timeline 1965, 1975, 1985, 1995, 2000, 2010, showing three waves]
ARPANET: Company: IBM; Function: Server Access; Unit: Server; Example: Email
Internet: Company: Microsoft/Netscape; Function: Info Access; Unit: File/Homepage; Example: WWW (“World Wide Wait”)
“Semantic Web”: Company: ???; Function: Knowledge Access; Unit: Concepts; Example: Concept Protocols

Knowledge Management Definition
“The system and managerial approach to collecting, processing, and organizing enterprise-specific knowledge assets for business functions and decision making.”

Knowledge Management Challenges
“… making high-value corporate information and knowledge easily available to support decision making at the lowest, broadest possible levels …”
Personnel turnover
Organizational resistance
Manual top-down knowledge creation
Information overload

Knowledge Management Landscape
Research Community:
NSF / DARPA / NASA, Digital Library Initiative I & II, NSDL ($120M)
NSF, Digital Government Initiative ($60M)
NSF, Knowledge Networking Initiative ($50M)
NSF, Information Technology Research ($300M)
Business Community:
Intellectual Capital, Corporate Memory, Knowledge Chain, Competitive Intelligence

Knowledge Management Foundations
Enabling Technologies:
Information Retrieval (Excalibur, Verity, Oracle Context)
Electronic Document Management (Documentum, PC DOCS)
Internet/Intranet (Yahoo!, Google)
Groupware (Lotus Notes, MS Exchange)
Consulting and System Integration:
Best practices, human resources, organizational development, performance metrics, methodology, framework, ontology (Delphi, E&Y, Arthur Andersen, AMS, KPMG)

Knowledge Management Perspectives
Process perspective (management and behavior): consulting practices, methodology, best practices, e-learning, culture/reward, existing IT → new information, old IT, new but manual process
Information perspective (information and library sciences): content management, manual ontologies → new information, manual process
Knowledge Computing perspective (text mining, artificial intelligence): automated knowledge extraction, thesauri, knowledge maps → new IT, new knowledge, automated process

Slide 15: KM Perspectives

KM, Emergence of a Discipline (Ponzi, 2004)
Influences from three disciplines: Management and Policy (40%), Computer Science (30%), Information/Library Science (20%)
Continuous, steady growth since 1990: academic publications and industry articles; not a fad (unlike BPR, TQM)
Seminal books and articles in Knowledge Management (e.g., Drucker, Davenport, Nonaka): the 50 most-cited KM articles

KM Thoughts and Thinkers
Future organizations are information-based organizations of knowledge workers; specialization, cross-discipline task teams, disappearance of middle managers (Drucker, “The Coming of the New Organization”)
The Japanese management style: tacit knowledge, redundancy, slogans, metaphors; the “Ba”; the SECI Model: Socialization, Externalization, Combination, and Internalization (Nonaka, “The Knowledge-Creating Company”)

KM Thoughts and Thinkers (cont’d)
Knowledge generation (acquisition, dedicated resources, fusion, adaptation, knowledge networking); knowledge codification (mapping and modeling knowledge); knowledge transfer; technologies for KM; learning from experiments (Davenport, “Working Knowledge”)
Deep smarts: seeing the big picture and knowing the skills; learning from experience (Leonard, “Deep Smarts”)

KM Thoughts and Thinkers (cont’d)
Teaching smart people how to learn; defensive reasoning and the doom loop; learning how to reason productively (Argyris, “Teaching Smart People How to Learn”)
Technology gets in the way; research on work practices; harvesting local innovation and innovating with the customer; PARC anthropologists (John Seely Brown, “Research that Reinvents the Corporation”)
Inverting organizations (individual professionals leading); creating intellectual webs (Quinn, “Managing Professional Intellect”)

Slide 20: Knowledge Management: The Industry and Status

Slide 21: Andersen Consulting (Accenture)
(1) Acquire
(2) Create
(3) Synthesize
(4) Share
(5) Use to Achieve Organizational Goals
(6) Environment Conducive to Knowledge Sharing

Slide 22: Ernst & Young
(1) Knowledge Generation
(2) Knowledge Representation
(3) Knowledge Codification
(4) Knowledge Application

Slide 23: Reason for Adopting KM (Source: Knowledge Management and IDC, May 2001)
Retain expertise of personnel: 51.9%
Increase customer satisfaction: 43.1%
Improve profits, grow revenues: 37.5%
Support e-business initiatives: 24.7%
Shorten product development cycles: 23%
Provide project workspace: 11.7%

Slide 24: Business Uses of KM Initiative
Capture and share best practices: 77.7%
Provide training, corporate learning: 62.4%
Manage customer relationships: 58%
Deliver competitive intelligence: 55.7%
Provide project workspace: 31.4%
Manage legal, intellectual property: 31.4%

Slide 25: Leader of KM Initiative (Source: Knowledge Management and IDC, May 2001)

Slide 26: Implementation Challenges
Employees have no time for KM: 41%
Current culture does not encourage sharing: 36.6%
Lack of understanding of KM and benefits: 29.5%
Inability to measure financial benefits of KM: 24.5%
Lack of skill in KM techniques: 22.7%
Organization’s processes are not designed for KM: 22.2%

Slide 27: Implementation Challenges, continued (Source: Knowledge Management and IDC, May 2001)
Lack of funding for KM: 21.8%
Lack of incentives, rewards to share: 19.9%
Have not yet begun implementing KM: 18.7%
Lack of appropriate technology: 17.4%
Lack of commitment from senior management: 13.9%
No challenges encountered: 4.3%

Slide 28: Types of Software Purchased
Messaging e-mail: 44.7%
Knowledge base, repository: 40.7%
Document management: 39.2%
Data warehousing: 34.6%
Groupware: 33.1%
Search engines: 32.3%

Slide 29: Types of Software Purchased, continued (Source: Knowledge Management and IDC, May 2001)
Web-based training: 23.8%
Workflow: 23.8%
Enterprise information portal: 23.2%
Business rules management: 11.6%

Slide 30: Spending on IT Services for KM (Source: Knowledge Management and IDC, May 2001)
[Chart labels as extracted: 27% Implementation 27.8% Consulting Planning 15.3% Training 13.7% Maintenance 15.3% Operations, outsourcing]

Slide 31: Software Budget Allotments
[Chart labels as extracted: 35.6% 24.4% Enterprise information portal Document management 26.2% Groupware Workflow 22.9% Data warehousing 19.3% Search engines 13.0%]

Slide 32: Software Budget Allotments, continued (Source: Knowledge Management and IDC, May 2001)
Web-based training: 11.4%
Messaging e-mail: 10.8%
Other: 29.2%

Slide 33: Knowledge Management Systems: Overview

Slide 34: Knowledge Management Systems (KMS)
Characteristics of KMS
The Industry and the Market
Major Vendors and Systems

Knowledge Management Systems Definition
KMSs are computer-based information systems that:
can help an enterprise acquire, manage, retain, analyze, and retrieve mission-critical information;
help turn enterprise information into well-organized, abstract, and actionable knowledge;
can help an enterprise identify and interconnect experts, managers, and knowledge workers; and
help extract, retain, and disseminate their knowledge in an organization.

KM Architecture (Source: GartnerGroup)
[Figure: Enterprise Knowledge Architecture. Layer labels: Network Services; Platform Services; Distributed Object Models; Databases; Database Indexes; Text Indexes; Text and Database Drivers; Physical; Application Index; KR Functions; Conceptual; Knowledge Maps; Knowledge Retrieval; Web UI; Web Browser; “Workgroup” Applications; Intranet and Extranet Applications]

Knowledge Retrieval Level (Source: GartnerGroup)
[Figure: Retrieved Knowledge, by level]
Concept (“Yellow Pages”): clustering and categorization (“table of contents”)
Semantic: semantic networks (“index”), dictionaries, thesauri, linguistic analysis, data extraction
Collaboration: collaborative filters, communities, trusted advisor, expert identification
Value: “Recommendation”

Knowledge Retrieval Vendor Direction (Source: GartnerGroup)
[Figure: vendors plotted by Technology Innovation, Content Experience, and Market Target, in three groups: Knowledge Retrieval Newbies, IR Leaders, and Niche Players. Vendors shown: grapeVINE, Sovereign Hill, CompassWare, Intraspect, KnowledgeX, WiseWire, Lycos, Autonomy, Perspecta, Lotus, Netscape*, Verity, Fulcrum, Excalibur, Dataware, Microsoft, IDI, Oracle, Open Text, Folio, IBM, InText, PCDOCS, Documentum. * Not yet marketed]

Slide 39: KM Software Vendors
[Figure: quadrant chart, Ability to Execute vs. Completeness of Vision, with quadrants Niche Players, Visionaries, Challengers, and Leaders. Vendors shown: Microsoft, Lotus, Dataware, Verity, Excalibur, Netscape, Documentum, IBM, Inference, Lycos/InMagic, CompassWare, KnowledgeX, SovereignHill, Semio, IDI, PCDOCS/Fulcrum, OpenText, Autonomy, GrapeVINE, InXight, WiseWire, Intraspect]

Two Approaches to Codify Knowledge
Top-Down Approach: structured, manual, human-driven
Bottom-Up Approach: unstructured, system-aided, data/info-driven

Slide 41: Sample KMS
Search Engine and Web Portal
Data Mining
Text Mining
Web Mining

Slide 42: Managing Information: Search Engine and Web Portal
(Source: Jan Peterson and William Chang, Excite)

Basic Architectures: Search
[Figure: Web, Spider, Index, SE (search engine), Log, Browser. Annotations: spam, freshness, quality results, 20M queries/day, 800M pages?, 24x7]

Basic Architectures: Directory
[Figure: Web, Browser, URL submission, surfing, ontology, reviewed URLs, SE]

Spidering
Web HTML data
Hyperlinked
Directed, disconnected graph
Dynamic and static data
Estimated 2 billion indexable pages
Freshness: how often are pages revisited?

Indexing
Size: from 50M to 150M to 3B URLs
50 to 100% indexing overhead
200 to 400 GB indices
Representation: fields, meta-tags, and content
NLP: stemming?

Search
Augmented vector-space
Ranked results with Boolean filtering
Quality-based re-ranking, based on hyperlink data or user behavior
Spam: manipulation of content to improve placement

Queries
Short expressions of information need: 2.3 words on average
Relevance overload is a key issue: users typically only view top results
Search is a high-volume business:
Yahoo! 50M queries/day
Excite 30M queries/day
Infoseek 15M queries/day

Slide 49: Alta Vista: within-site search, machine translation

Directory
Manual categorization and rating
Labor intensive: 20 to 50 editors
High quality, but low coverage: 200-500K URLs
Browsable ontology
Open Directory is a distributed solution

Slide 51: Yahoo: manual ontology (200 ontologists)

Special Collections
Newswire
Newsgroups
Specialized services (Deja)
Information extraction
Shopping catalog
Events, recipes, etc.

The Hidden Web
Non-indexable content: behind passwords, firewalls; dynamic content
Often searchable through a local interface
Network of distributed search resources
How to access? Ask Jeeves!

The Role of NLP
Many search engines do not stem
Precision bias suggests conservative term treatment
What about non-English documents?
N-grams are popular for Chinese
Language ID, anyone?
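
The remark that n-grams are popular for Chinese can be made concrete with a small sketch. Chinese text has no spaces between words, so one common workaround is to index overlapping character bigrams instead of segmented words. The function names (`char_ngrams`, `build_index`, `search`) and the three sample documents below are hypothetical, chosen only for illustration; they are not taken from the slides or from any particular search engine.

```python
from typing import Dict, List, Set


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Return overlapping character n-grams of `text`, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(docs: Dict[str, str], n: int = 2) -> Dict[str, Set[str]]:
    """Build an inverted index mapping each n-gram to the ids of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str, n: int = 2) -> Set[str]:
    """Return ids of documents that contain every n-gram of the query (Boolean AND)."""
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical sample documents (illustrative only).
docs = {
    "d1": "知识管理系统",  # "knowledge management system"
    "d2": "数据挖掘",      # "data mining"
    "d3": "知识发现",      # "knowledge discovery"
}
index = build_index(docs)
print(char_ngrams("知识管理"))
print(sorted(search(index, "知识")))
```

Matching on all query bigrams is a conservative choice in the spirit of the precision bias noted above; a real engine would typically rank matches rather than require every n-gram.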

Link Analysis
Authors vote via links: pages with more inlinks are higher quality
Not all links are equal:
Links from higher-quality sites are better
Links in context are better
Resistant to spam: only cross-site links considered

Page Rank (Page ’98)
Limiting distribution of a random walk:
Jump to a random page with probability ε
Follow a link with probability 1 - ε
Probability of landing at a page D: P(D) = ε/T + (1 - ε) Σ P(C)/L(C)
Sum over pages C leading to D
L(C) = number of links on page C; T = total number of pages

Who asks What?
Query logs revisited
Query-based indexing: why index things people don’t ask for?
If they ask for A, give them B
From atomic concepts to query extensions
Structure of questions and answers
Shyam Kapur’s chunks

Futures
Vertical markets: healthcare, real estate, jobs and resumes, etc.
Localized search
Search as embedded app
Shopping ’bots
Open problems: has the bubble burst?

From SE to Web Portal
Spidering: intranet and Internet crawling
Integration: legacy systems and databases
Content: aggregation and conversion
Process: collaboration, chat, workflow management, calendaring, and such
Analysis: data and text mining, agent/alert, web mining

Slide 60: Discovering Knowledge: Data Mining
(Source: Michael Welge, Automated Learning Group, NCSA)

Why Data Mining? -- Potential Applications
Database analysis, decision support, and automation:
Market and sales analysis
Fraud detection
Manufacturing process analysis
Risk analysis and management
Experimental results analysis
Scientific data analysis
Text document analysis

Data Mining: Confluence of Multiple Disciplines
Database systems, data warehouses, and OLAP
Machine learning
Statistics
Mathematical programming
Visualization
High-performance computing

Data Mining: A KDD Process

Required Effort for Each KDD Step

Data Mining Models and Methods

Deviation Detection
Identify outliers in a dataset.
Typical techniques: OLAP charting, probability distribution contrasts, regression analysis, discriminant analysis

Link Analysis (Rule Association)
Given a database, find all associations of the form: IF <LHS> THEN <RHS>
Prevalence = frequency of the LHS and RHS occurring together
Predictability = fraction of the RHS out of all items with the LHS
E.g., beer and diapers

Database Segmentation
Regroup datasets into clusters that share common characteristics.
Typical techniques: hierarchical clustering, neural network clustering (SOM), k-means

Predictive Modeling
Use past data to predict future response and behavior.
Typical technique: supervised learning (neural networks, decision trees, naïve Bayes)
E.g., who is most likely to respond to a direct mailing?

Data/Information Visualization
Gain insight into the contents and complexity of the database being analyzed
Vast amounts of underutilized data
Time-critical decisions hampered
Key information difficult to find
Results presentation
Reduced perceptual, interpretative, cognitive burden

Rule Association - Basket Analysis

Text Mining Visualization
This data is considered to be confidential and proprietary to Caterpillar and may only be used with prior written consent from Caterpillar.

Decision Tree Visualizer

From Data Mining to Text Mining
Techniques: linguistic analysis, clustering, unsupervised learning, case-based reasoning
Ontologies: XML/RDF, content management
P1000: A picture is worth 1000 words
Formats/types: email, reports, web pages, etc.
Integration: KMS and IT infrastructure
Cultural: rewards and unintended consequences
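
The prevalence and predictability measures from the Link Analysis (Rule Association) slide can be worked through on a toy basket dataset. The sketch below assumes prevalence is a relative frequency (the share of all transactions containing both LHS and RHS, often called support) and predictability is the share of LHS transactions that also contain the RHS (often called confidence). The basket contents and function names are hypothetical, made up for illustration rather than drawn from the presentation.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]


def prevalence(transactions: List[Basket], lhs: Basket, rhs: Basket) -> float:
    """Fraction of all transactions that contain both the LHS and the RHS."""
    both = lhs | rhs
    return sum(1 for t in transactions if both <= t) / len(transactions)


def predictability(transactions: List[Basket], lhs: Basket, rhs: Basket) -> float:
    """Among transactions containing the LHS, the fraction that also contain the RHS."""
    with_lhs = [t for t in transactions if lhs <= t]
    if not with_lhs:
        return 0.0  # rule never fires, so it predicts nothing
    return sum(1 for t in with_lhs if rhs <= t) / len(with_lhs)


# Hypothetical market baskets (illustrative data only).
baskets: List[Basket] = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "milk"}),
    frozenset({"diapers", "milk"}),
    frozenset({"bread", "milk"}),
]

lhs, rhs = frozenset({"beer"}), frozenset({"diapers"})
# Rule: IF beer THEN diapers
print(f"prevalence     = {prevalence(baskets, lhs, rhs):.2f}")      # 2 of 5 baskets
print(f"predictability = {predictability(baskets, lhs, rhs):.2f}")  # 2 of 3 beer baskets
```

A rule miner would enumerate candidate LHS/RHS pairs and keep those whose prevalence and predictability both clear chosen thresholds; this sketch only scores a single rule.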
KM-Overview-2005 aSGuest7411 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 535 Category: Business & Fin.. License: All Rights Reserved Like it (0) Dislike it (0) Added: December 19, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Knowledge Management Systems: Development and ApplicationsPart I: Overview and Related Fields : Knowledge Management Systems: Development and ApplicationsPart I: Overview and Related Fields Hsinchun Chen, Ph.D. McClelland Professor, Director, Artificial Intelligence Lab The University of Arizona Founder, Knowledge Computing Corporation Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, DHS, NCSA, HP, SAP ????????, ??? ?? Slide 2: My Background: ( A Mixed Bag!) BS NCTU Management Science, 1981 MBA SUNY Buffalo Finance, MS, MIS Ph.D. NYU Information System, Minor: CS, 1989 Dissertation: “An AI Approach to the Design Of Online Information Retrieval Systems” (GEAC Online Cataloging System) Assistant/Associate/Full/Chair Professor, University of Arizona, MIS Department Scientific Counselor, National Library of Medicine USA), National Library of China, Academia Sinica Slide 3: My Background: (A Mixed Bag!) 
Founder/Director, Artificial Intelligent Lab, 1990 Founder/Director, Hoffman eCommerce Lab, 2000 PIs: NSF CISE DLI-1 DLI-2, NSDL, DG, DARPA, NIJ, NIH, CIA, DHS Associate Editors: JASIST, DSS, ACM TOIS, IEEE SMC, IEEE ITS Conference/program Co-hairs: ICADL 1998-2004, China DL 2002/2004, NSF/NIJ ISI 2003-2006, JCDL 2004 Industry Consulting: HP, IBM, AT&T, SGI, Microsoft, SAP Founder, Knowledge Computing Corporation, 2000 Slide 4: Knowledge Management: Overview Slide 5: Knowledge Management Overview What is Knowledge Management Data, Information, and Knowledge Why Knowledge Management? Knowledge Management Processes Unit of Analysis : Unit of Analysis Data: 1980s Factual Structured, numeric Oracle, Sybase, DB2 Information: 1990s Factual Yahoo!, Excalibur, Unstructured, textual Verity, Documentum Knowledge: 2000s Inferential, sensemaking, decision making Multimedia ??? Slide 7: According to Alter (1996), Tobin (1996), and Beckman (1999): Data: Facts, images, or sounds (+interpretation+meaning =) Information: Formatted, filtered, and summarized data (+action+application =) Knowledge: Instincts, ideas, rules, and procedures that guide actions and decisions Data, Information and Knowledge: Application and Societal Relevance : : Application and Societal Relevance : Ontologies, hierarchies, and subject headings Knowledge management systems and practices: knowledge maps Digital libraries, search engines, web mining, text mining, data mining, CRM, eCommerce Semantic web, multilingual web, multimedia web, and wireless web The Third Wave of Net Evolution : 1965 1975 1985 1995 2000 2010 ARPANET Internet “SemanticWeb” Company IBM ??? 
Microsoft/Netscape The Third Wave of Net Evolution Function Server Access Knowledge Access Info Access Unit Server Concepts File/Homepage Example Email Concept Protocols WWW: “World Wide Wait” Knowledge Management Definition : Knowledge Management Definition “The system and managerial approach to collecting, processing, and organizing enterprise-specific knowledge assets for business functions and decision making.” Knowledge Management Challenges : Knowledge Management Challenges “… making high-value corporate information and knowledge easily available to support decision making at the lowest, broadest possible levels …” Personnel Turn-over Organizational Resistance Manual Top-down Knowledge Creation Information Overload Knowledge Management Landscape : Knowledge Management Landscape Research Community NSF / DARPA / NASA, Digital Library Initiative I & II, NSDL ($120M) NSF, Digital Government Initiative ($60M) NSF, Knowledge Networking Initiative ($50M) NSF, Information Technology Research ($300M) Business Community Intellectual Capital, Corporate Memory, Knowledge Chain, Competitive Intelligence Knowledge Management Foundations : Enabling Technologies: Information Retrieval (Excalibur, Verity, Oracle Context) Electronic Document Management (Documentum, PC DOCS) Internet/Intranet (Yahoo!, Google) Groupware (Lotus Notes, MS Exchange) Consulting and System Integration: Best practices, human resources, organizational development, performance metrics, methodology, framework, ontology (Delphi, E&Y, Arthur Andersen, AMS, KPMG) Knowledge Management Foundations Knowledge Management Perspectives: : Knowledge Management Perspectives: Process perspective (management and behavior): consulting practices, methodology, best practices, e-learning, culture/reward, existing IT ? new information, old IT, new but manual process Information perspective (information and library sciences): content management, manual ontologies ? 
new information, manual process Knowledge Computing perspective (text mining, artificial intelligence): automated knowledge extraction, thesauri, knowledge maps ? new IT, new knowledge, automated process Slide 15: KM Perspectives KM, Emergence of a Discipline (Ponzi, 2004): : KM, Emergence of a Discipline (Ponzi, 2004): Influences from three disciplines: Management and Policy (40%), Computer Science (30%), Information/Library Science (20%) Continuous, steady growth since 1990: academic publications and industry articles; not a fad (unlike BPR, TQM) Seminal books and articles in Knowledge Management (e.g., Drucker, Davenport, Nonaka): the 50 most-cited KM articles KM Thoughts and Thinkers: : KM Thoughts and Thinkers: Future organizations are information-based organizations of knowledge workers; Specialization, cross-discipline task teams, disappearance of middle managers (Drucker, “The Coming of the New Organization”) The Japanese Management Style: Tacit knowledge, redundancy, slogans, metaphors; the “Ba”; the SECI Model – Socialization, Externalization, Combination, and Internalization (Nonaka, “The Knowledge-Creating Company) KM Thoughts and Thinkers: (cont’d) : KM Thoughts and Thinkers: (cont’d) Knowledge generation (acquisition, dedicated resources, fusion, adaptation, knowledge networking); Knowledge codification (mapping and modeling knowledge); Knowledge transfer; Technologies for KM; Learning from experiments (Davenport, “Working Knowledge”) Deep Smart: Seeing the big picture and knowing the skills; learning from experience (Leonard, “Deep Smart”) KM Thoughts and Thinkers: (cont’d) : KM Thoughts and Thinkers: (cont’d) Teaching smart people how to learn; Defensive reasoning and doom loop; Learning how to reason productively (Argyris, “Teaching Smart People How to Learn”) Technology gets in the way; Research on work practices; Harvesting local innovation and innovating with customer; PARC anthropologists (John Seely Brown, “Research that Reinvents the 
Corporation”) Inverting organizations (individual professionals leading); Creating intellectual webs (Quinn, “Managing Professional Intellect”) Slide 20: Knowledge Management: The Industry and Status Slide 21: Anderson Consulting (Accenture) (1) Acquire (2) Create (3) Synthesize (4) Share (5) Use to Achieve Organizational Goals (6) Environment Conducive to Knowledge Sharing Slide 22: Ernst & Young (1) Knowledge Generation (2) Knowledge Representation (3) Knowledge Codification (4) Knowledge Application Slide 23: Reason for Adopting KM 51.9% Retain expertise of personnel Increase customer satisfaction 43.1% Improve profits, grow revenues 37.5% Support e-business initiatives 24.7% Shorten product development cycles 23% Provide project workspace 11.7% Knowledge Management and IDC May 2001 Slide 24: Business Uses Of KM Initiative 77.7% Capture and share best practices Provide training, corporate learning 62.4% Manage customer relationships 58% Deliver competitive intelligence 55.7% Provide project workspace 31.4% Manage legal, intellectual property 31.4% Continue Slide 25: Leader Of KM Initiative Knowledge Management and IDC May 2001 Slide 26: 41% Employees have no time for KM Current culture does not encourage sharing 36.6% Lack of understanding of KM and Benefits 29.5% Inability to measure financial benefits of KM 24.5% Lack of Skill in KM techniques 22.7% Organization’s processes are not designed for KM 22.2% Continue Implementation Challenges Slide 27: 21.8% Lack of funding for KM Lack of incentives, rewards to share 19.9% Have not yet begun implementing KM 18.7% Lack of appropriate technology 17.4% Lack of commitment from senior management 13.9% No challenges encountered 4.3% Implementation Challenges Knowledge Management and IDC May 2001 Slide 28: 44.7% Messaging e-mail Knowledge base, repository 40.7% Document management 39.2% Data warehousing 34.6% Groupware 33.1% Search engines 32.3% Types of Software Purchased Continue Slide 29: 23.8% Web-based training 
Workflow 23.8% Enterprise information portal 23.2% Business rules management 11.6% Types of Software Purchased Knowledge Management and IDC May 2001 Slide 30: Spending On IT Services For KM 27% Implementation 27.8% Consulting Planning 15.3% Training 13.7% Maintenance 15.3% Operations, outsourcing Knowledge Management and IDC May 2001 Slide 31: 35.6% 24.4% Enterprise information portal Document management 26.2% Groupware Workflow 22.9% Data warehousing 19.3% Search engines 13.0% Software Budget Allotments Continue Slide 32: 11.4% Web-based training Messaging e-mail 10.8% Other 29.2% Software Budget Allotments Knowledge Management and IDC May 2001 Slide 33: Knowledge Management Systems: Overview Slide 34: Knowledge Management Systems (KMS) Characteristics of KMS The Industry and the Market Major Vendors and Systems Knowledge Management Systems Definition : Knowledge Management Systems Definition KMSs are computer-based information systems that: can help an enterprise acquire, manage, retain, analyze, and retrieve mission-critical information; and help turn enterprise information into well-organized, abstract, and actionable knowledge; and can help an enterprise identify and inter-connect experts, managers, and knowledge workers; and help extract, retain, and disseminate their knowledge in an organization. 
KM Architecture (Source: GartnerGroup) : KM Architecture (Source: GartnerGroup) Network Services Platform Services Distributed Object Models Databases Database Indexes Conceptual Knowledge Maps Web Browser “Workgroup” Applications Text Indexes Enterprise Knowledge Architecture Intranet and Extranet Applications Web UI KR Functions Text and Database Drivers Physical Application Index Knowledge Retrieval Knowledge Retrieval Level (Source: GartnerGroup) : Knowledge Retrieval Level (Source: GartnerGroup) Concept “Yellow Pages” Value “Recommendation” Retrieved Knowledge Semantic Collaboration Clustering — categorization “table of contents” Semantic Networks “index” Dictionaries Thesauri Linguistic analysis Data extraction Collaborative filters Communities Trusted advisor Expert identification Knowledge Retrieval Vendor Direction(Source: GartnerGroup) : Knowledge Retrieval Vendor Direction(Source: GartnerGroup) grapeVINE Sovereign Hill CompassWare Intraspect KnowledgeX WiseWire • Lycos • Autonomy • Perspecta Lotus Netscape* Technology Innovation Niche Players IR Leaders Verity Fulcrum Excalibur Dataware Microsoft Content Experience • IDI Oracle • Open Text • Folio • IBM • InText PCDOCS Documentum Knowledge Retrieval NewBies Newbies: IR Leaders: Niche Players: Market Target * Not yet marketed Slide 39: KM Software Vendors Ability to Execute Completeness of Vision Niche Players Visionaries Challengers Leaders Microsoft * Lotus * Dataware * * Verity * Excalibur Netscape * Documentum* * IBM Inference* Lycos/InMagic* CompassWare* KnowledgeX* SovereignHill* Semio* IDI* PCDOCS/* Fulcrum OpenText* Autonomy* GrapeVINE* * InXight WiseWire* *Intraspect Two Approaches to Codify Knowledge : Two Approaches to Codify Knowledge Structured Manual Human-driven Unstructured System-aided Data/Info-driven Bottom-Up Approach Top-Down Approach Slide 41: Sample KMS: Search Engine and Web Portal Data Mining Text Mining Web Mining Slide 42: Managing Information: Search Engine and Web Portal 
(Source: Jan Peterson and William Chang, Excite) Basic Architectures: Search : Basic Architectures: Search Web Log Index SE Spider Spam Freshness Quality results 20M queries/day Browser 800M pages? 24x7 SE SE Basic Architectures: Directory : Basic Architectures: Directory Web Browser Url submission Surfing Ontology Reviewed Urls SE SE SE Spidering : Spidering Web HTML data Hyperlinked Directed, disconnected graph Dynamic and static data Estimated 2 billion indexible pages Freshness How often are pages revisited? Indexing : Indexing Size from 50M to 150M to 3B urls 50 to 100% indexing overhead 200 to 400GB indices Representation Fields, meta-tags and content NLP: stemming? Search : Search Augmented Vector-space Ranked results with Boolean filtering Quality-based re-ranking Based on hyperlink data or user behavior Spam Manipulation of content to improve placement Queries : Queries Short expressions of information need 2.3 words on average Relevance overload is a key issue Users typically only view top results Search is a high volume business Yahoo! 50M queries/day Excite 30M queries/day Infoseek 15M queries/day Slide 49: Alta Vista: within site search, machine translation Directory : Directory Manual categorization and rating Labor intensive 20 to 50 editors High quality, but low coverage 200-500K urls Browsable ontology Open Directory is a distributed solution Slide 51: Yahoo: manual ontology (200 ontologists) Special Collections : Special Collections Newswire Newsgroups Specialized services (Deja) Information extraction Shopping catalog Events; recipes, etc. The Hidden Web : The Hidden Web Non-indexible content Behind passwords, firewalls Dynamic content Often searchable through local interface Network of distributed search resources How to access? Ask Jeeves! The Role of NLP : The Role of NLP Many Search Engines do not stem Precision bias suggests conservative term treatment What about non-English documents N-grams are popular for Chinese Language ID anyone? 
Link Analysis
Authors vote via links: pages with more inlinks are treated as higher quality
Not all links are equal: links from higher-quality sites are better; links in context are better
Resistant to spam: only cross-site links are considered

Page Rank (Page ’98)
Limiting distribution of a random walk: jump to a random page with probability ε; follow a link with probability 1 - ε
Probability of landing at a page D: P(D) = ε/T + (1 - ε) Σ P(C)/L(C), summing over pages C that link to D
L(C) = number of links on page C (T is presumably the total number of pages)

Who asks What?
Query logs revisited
Query-based indexing: why index things people don’t ask for?
If they ask for A, give them B
From atomic concepts to query extensions
Structure of questions and answers
Shyam Kapur’s chunks

Futures
Vertical markets: healthcare, real estate, jobs and resumes, etc.
Localized search
Search as embedded app
Shopping 'bots
Open Problems
Has the bubble burst?

From SE to Web Portal
Spidering: Intranet and Internet crawling
Integration: legacy systems and databases
Content: aggregation and conversion
Process: collaboration, chat, workflow management, calendaring, and such
Analysis: data and text mining, agent/alert, web mining

Slide 60: Discovering Knowledge: Data Mining (Source: Michael Welge, Automated Learning Group, NCSA)

Why Data Mining? -- Potential Applications
Database analysis, decision support, and automation:
Market and Sales Analysis
Fraud Detection
Manufacturing Process Analysis
Risk Analysis and Management
Experimental Results Analysis
Scientific Data Analysis
Text Document Analysis

Data Mining: Confluence of Multiple Disciplines
Database Systems, Data Warehouses, and OLAP
Machine Learning
Statistics
Mathematical Programming
Visualization
High Performance Computing

Data Mining: A KDD Process [figure]

Required Effort for Each KDD Step [figure]

Data Mining Models and Methods [figure]

Deviation Detection
Identify outliers in a dataset.
Typical techniques: OLAP charting, probability distribution contrasts, regression analysis, discriminant analysis

Link Analysis (Rule Association)
Given a database, find all associations of the form: IF <LHS> THEN <RHS>
Prevalence = frequency of the LHS and RHS occurring together
Predictability = fraction of the RHS out of all items with the LHS
E.g., beer and diapers

Database Segmentation
Regroup datasets into clusters that share common characteristics.
Typical techniques: hierarchical clustering, neural network clustering (SOM), k-means

Predictive Modeling
Use past data to predict future response and behavior.
Typical technique: supervised learning (Neural Networks, Decision Trees, Naïve Bayesian)
E.g., who is most likely to respond to a direct mailing?

Data/Information Visualization
Gain insight into the contents and complexity of the database being analyzed
Vast amounts of underutilized data
Time-critical decisions hampered
Key information difficult to find
Results presentation
Reduced perceptual, interpretative, cognitive burden

Rule Association - Basket Analysis [figure]

Text Mining Visualization [figure]
This data is considered to be confidential and proprietary to Caterpillar and may only be used with prior written consent from Caterpillar.

Decision Tree Visualizer [figure]

From Data Mining to Text Mining
Techniques: linguistic analysis, clustering, unsupervised learning, case-based reasoning
Ontologies: XML/RDF, content management
P1000: A picture is worth 1000 words
Formats/types: email, reports, web pages, etc.
Integration: KMS and IT infrastructure
Cultural: rewards and unintended consequences
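
The Rule Association slides define prevalence and predictability only in words, so a small sketch may make the arithmetic concrete. It assumes prevalence is the fraction of transactions containing both LHS and RHS, and predictability is the fraction of LHS transactions that also contain the RHS (commonly called support and confidence). The baskets are made-up illustration data, and `rule_metrics` is a hypothetical helper name, not something from the presentation.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (prevalence, predictability) for the rule IF lhs THEN rhs."""
    lhs, rhs = set(lhs), set(rhs)
    n_total = len(transactions)
    # Transactions that contain every item of the LHS.
    n_lhs = sum(1 for t in transactions if lhs <= set(t))
    # Transactions that contain every item of both LHS and RHS.
    n_both = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    prevalence = n_both / n_total if n_total else 0.0
    predictability = n_both / n_lhs if n_lhs else 0.0
    return prevalence, predictability


# Made-up market baskets, for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"milk", "diapers"},
    {"milk", "bread"},
]

prevalence, predictability = rule_metrics(baskets, {"beer"}, {"diapers"})
print(f"prevalence={prevalence:.2f}, predictability={predictability:.2f}")
```

In this toy data, beer and diapers appear together in 2 of 5 baskets, and 2 of the 3 beer baskets also contain diapers. A rule miner would typically keep only rules whose two values exceed chosen thresholds.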