Presentation Transcript

Knowledge Management Systems: Development and Applications, Part I: Overview and Related Fields : 

Knowledge Management Systems: Development and Applications, Part I: Overview and Related Fields. Hsinchun Chen, Ph.D., McClelland Professor, Director, Artificial Intelligence Lab, The University of Arizona; Founder, Knowledge Computing Corporation. Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, DHS, NCSA, HP, SAP

Slide 2: 

My Background: (A Mixed Bag!) BS, NCTU, Management Science, 1981. MBA, SUNY Buffalo, Finance, MS, MIS. Ph.D., NYU, Information Systems, Minor: CS, 1989. Dissertation: “An AI Approach to the Design of Online Information Retrieval Systems” (GEAC Online Cataloging System). Assistant/Associate/Full/Chair Professor, University of Arizona, MIS Department. Scientific Counselor, National Library of Medicine (USA), National Library of China, Academia Sinica

Slide 3: 

My Background: (A Mixed Bag!) Founder/Director, Artificial Intelligence Lab, 1990. Founder/Director, Hoffman eCommerce Lab, 2000. PIs: NSF CISE DLI-1, DLI-2, NSDL, DG, DARPA, NIJ, NIH, CIA, DHS. Associate Editors: JASIST, DSS, ACM TOIS, IEEE SMC, IEEE ITS. Conference/program Co-chairs: ICADL 1998-2004, China DL 2002/2004, NSF/NIJ ISI 2003-2006, JCDL 2004. Industry Consulting: HP, IBM, AT&T, SGI, Microsoft, SAP. Founder, Knowledge Computing Corporation, 2000

Slide 4: 

Knowledge Management: Overview

Slide 5: 

Knowledge Management Overview What is Knowledge Management Data, Information, and Knowledge Why Knowledge Management? Knowledge Management Processes

Unit of Analysis : 

Unit of Analysis Data (1980s): factual; structured, numeric; Oracle, Sybase, DB2. Information (1990s): factual; unstructured, textual; Yahoo!, Excalibur, Verity, Documentum. Knowledge (2000s): inferential, sensemaking, decision making; multimedia; ???

Slide 7: 

Data, Information and Knowledge: According to Alter (1996), Tobin (1996), and Beckman (1999): Data: facts, images, or sounds. (+ interpretation + meaning =) Information: formatted, filtered, and summarized data. (+ action + application =) Knowledge: instincts, ideas, rules, and procedures that guide actions and decisions.

Application and Societal Relevance : 

Application and Societal Relevance: Ontologies, hierarchies, and subject headings. Knowledge management systems and practices: knowledge maps. Digital libraries, search engines, web mining, text mining, data mining, CRM, eCommerce. Semantic web, multilingual web, multimedia web, and wireless web.

The Third Wave of Net Evolution : 

The Third Wave of Net Evolution [Figure: timeline 1965, 1975, 1985, 1995, 2000, 2010 showing three waves. ARPANET: function Server Access; unit Server; example Email; company IBM. Internet: function Info Access; unit File/Homepage; example WWW (“World Wide Wait”); company Microsoft/Netscape. “Semantic Web”: function Knowledge Access; unit Concepts; example Concept Protocols; company ???]

Knowledge Management Definition : 

Knowledge Management Definition “The system and managerial approach to collecting, processing, and organizing enterprise-specific knowledge assets for business functions and decision making.”

Knowledge Management Challenges : 

Knowledge Management Challenges “… making high-value corporate information and knowledge easily available to support decision making at the lowest, broadest possible levels …” Personnel turnover; organizational resistance; manual top-down knowledge creation; information overload

Knowledge Management Landscape : 

Knowledge Management Landscape Research Community: NSF / DARPA / NASA, Digital Library Initiative I & II, NSDL ($120M); NSF, Digital Government Initiative ($60M); NSF, Knowledge Networking Initiative ($50M); NSF, Information Technology Research ($300M). Business Community: Intellectual Capital, Corporate Memory, Knowledge Chain, Competitive Intelligence

Knowledge Management Foundations : 

Knowledge Management Foundations Enabling Technologies: Information Retrieval (Excalibur, Verity, Oracle Context); Electronic Document Management (Documentum, PC DOCS); Internet/Intranet (Yahoo!, Google); Groupware (Lotus Notes, MS Exchange). Consulting and System Integration: best practices, human resources, organizational development, performance metrics, methodology, framework, ontology (Delphi, E&Y, Arthur Andersen, AMS, KPMG)

Knowledge Management Perspectives : 

Knowledge Management Perspectives: Process perspective (management and behavior): consulting practices, methodology, best practices, e-learning, culture/reward, existing IT → new information, old IT, new but manual process. Information perspective (information and library sciences): content management, manual ontologies → new information, manual process. Knowledge Computing perspective (text mining, artificial intelligence): automated knowledge extraction, thesauri, knowledge maps → new IT, new knowledge, automated process

Slide 15: 

KM Perspectives

KM, Emergence of a Discipline (Ponzi, 2004) : 

KM, Emergence of a Discipline (Ponzi, 2004): Influences from three disciplines: Management and Policy (40%), Computer Science (30%), Information/Library Science (20%) Continuous, steady growth since 1990: academic publications and industry articles; not a fad (unlike BPR, TQM) Seminal books and articles in Knowledge Management (e.g., Drucker, Davenport, Nonaka): the 50 most-cited KM articles

KM Thoughts and Thinkers : 

KM Thoughts and Thinkers: Future organizations are information-based organizations of knowledge workers; specialization, cross-discipline task teams, disappearance of middle managers (Drucker, “The Coming of the New Organization”). The Japanese management style: tacit knowledge, redundancy, slogans, metaphors; the “Ba”; the SECI Model: Socialization, Externalization, Combination, and Internalization (Nonaka, “The Knowledge-Creating Company”)

KM Thoughts and Thinkers: (cont’d) : 

KM Thoughts and Thinkers: (cont’d) Knowledge generation (acquisition, dedicated resources, fusion, adaptation, knowledge networking); knowledge codification (mapping and modeling knowledge); knowledge transfer; technologies for KM; learning from experiments (Davenport, “Working Knowledge”). Deep Smarts: seeing the big picture and knowing the skills; learning from experience (Leonard, “Deep Smarts”)

KM Thoughts and Thinkers: (cont’d) : 

KM Thoughts and Thinkers: (cont’d) Teaching smart people how to learn; Defensive reasoning and doom loop; Learning how to reason productively (Argyris, “Teaching Smart People How to Learn”) Technology gets in the way; Research on work practices; Harvesting local innovation and innovating with customer; PARC anthropologists (John Seely Brown, “Research that Reinvents the Corporation”) Inverting organizations (individual professionals leading); Creating intellectual webs (Quinn, “Managing Professional Intellect”)

Slide 20: 

Knowledge Management: The Industry and Status

Slide 21: 

Andersen Consulting (Accenture) (1) Acquire (2) Create (3) Synthesize (4) Share (5) Use to Achieve Organizational Goals (6) Environment Conducive to Knowledge Sharing

Slide 22: 

Ernst & Young (1) Knowledge Generation (2) Knowledge Representation (3) Knowledge Codification (4) Knowledge Application

Slide 23: 

Reason for Adopting KM: Retain expertise of personnel (51.9%); Increase customer satisfaction (43.1%); Improve profits, grow revenues (37.5%); Support e-business initiatives (24.7%); Shorten product development cycles (23%); Provide project workspace (11.7%). Source: Knowledge Management and IDC, May 2001

Slide 24: 

Business Uses Of KM Initiative: Capture and share best practices (77.7%); Provide training, corporate learning (62.4%); Manage customer relationships (58%); Deliver competitive intelligence (55.7%); Provide project workspace (31.4%); Manage legal, intellectual property (31.4%)

Slide 25: 

Leader Of KM Initiative (Source: Knowledge Management and IDC, May 2001)

Slide 26: 

Implementation Challenges: Employees have no time for KM (41%); Current culture does not encourage sharing (36.6%); Lack of understanding of KM and benefits (29.5%); Inability to measure financial benefits of KM (24.5%); Lack of skill in KM techniques (22.7%); Organization’s processes are not designed for KM (22.2%) (continued)

Slide 27: 

Implementation Challenges (continued): Lack of funding for KM (21.8%); Lack of incentives, rewards to share (19.9%); Have not yet begun implementing KM (18.7%); Lack of appropriate technology (17.4%); Lack of commitment from senior management (13.9%); No challenges encountered (4.3%). Source: Knowledge Management and IDC, May 2001

Slide 28: 

Types of Software Purchased: Messaging e-mail (44.7%); Knowledge base, repository (40.7%); Document management (39.2%); Data warehousing (34.6%); Groupware (33.1%); Search engines (32.3%) (continued)

Slide 29: 

Types of Software Purchased (continued): Web-based training (23.8%); Workflow (23.8%); Enterprise information portal (23.2%); Business rules management (11.6%). Source: Knowledge Management and IDC, May 2001

Slide 30: 

Spending On IT Services For KM 27% Implementation 27.8% Consulting Planning 15.3% Training 13.7% Maintenance 15.3% Operations, outsourcing (Source: Knowledge Management and IDC, May 2001)

Slide 31: 

Software Budget Allotments 35.6% 24.4% Enterprise information portal Document management 26.2% Groupware Workflow 22.9% Data warehousing 19.3% Search engines 13.0% (continued)

Slide 32: 

Software Budget Allotments (continued): Web-based training (11.4%); Messaging e-mail (10.8%); Other (29.2%). Source: Knowledge Management and IDC, May 2001

Slide 33: 

Knowledge Management Systems: Overview

Slide 34: 

Knowledge Management Systems (KMS) Characteristics of KMS The Industry and the Market Major Vendors and Systems

Knowledge Management Systems Definition : 

Knowledge Management Systems Definition KMSs are computer-based information systems that: can help an enterprise acquire, manage, retain, analyze, and retrieve mission-critical information; and help turn enterprise information into well-organized, abstract, and actionable knowledge; and can help an enterprise identify and inter-connect experts, managers, and knowledge workers; and help extract, retain, and disseminate their knowledge in an organization.

KM Architecture (Source: GartnerGroup) : 

KM Architecture (Source: GartnerGroup) [Figure: Enterprise Knowledge Architecture. Labels: Network Services; Platform Services; Distributed Object Models; Databases; Database Indexes; Text Indexes; Text and Database Drivers; Conceptual Knowledge Maps; KR Functions; Web UI; Web Browser; “Workgroup” Applications; Intranet and Extranet Applications. Levels: Physical, Application, Index, Knowledge Retrieval.]

Knowledge Retrieval Level (Source: GartnerGroup) : 

Knowledge Retrieval Level (Source: GartnerGroup) [Figure: retrieved knowledge along four dimensions. Concept: “Yellow Pages”. Value: “Recommendation”. Semantic: clustering and categorization (“table of contents”), semantic networks (“index”), dictionaries, thesauri, linguistic analysis, data extraction. Collaboration: collaborative filters, communities, trusted advisor, expert identification.]

Knowledge Retrieval Vendor Direction (Source: GartnerGroup) : 

Knowledge Retrieval Vendor Direction(Source: GartnerGroup) grapeVINE Sovereign Hill CompassWare Intraspect KnowledgeX WiseWire • Lycos • Autonomy • Perspecta Lotus Netscape* Technology Innovation Niche Players IR Leaders Verity Fulcrum Excalibur Dataware Microsoft Content Experience • IDI Oracle • Open Text • Folio • IBM • InText PCDOCS Documentum Knowledge Retrieval NewBies Newbies: IR Leaders: Niche Players: Market Target * Not yet marketed

Slide 39: 

KM Software Vendors [Figure: quadrant chart. Axes: Ability to Execute, Completeness of Vision. Quadrants: Niche Players, Visionaries, Challengers, Leaders. Vendors plotted: Microsoft, Lotus, Dataware, Verity, Excalibur, Netscape, Documentum, IBM, Inference, Lycos/InMagic, CompassWare, KnowledgeX, SovereignHill, Semio, IDI, PCDOCS/Fulcrum, OpenText, Autonomy, GrapeVINE, InXight, WiseWire, Intraspect.]

Two Approaches to Codify Knowledge : 

Two Approaches to Codify Knowledge Top-Down Approach: structured, manual, human-driven. Bottom-Up Approach: unstructured, system-aided, data/info-driven.

Slide 41: 

Sample KMS: Search Engine and Web Portal Data Mining Text Mining Web Mining

Slide 42: 

Managing Information: Search Engine and Web Portal (Source: Jan Peterson and William Chang, Excite)

Basic Architectures: Search : 

Basic Architectures: Search [Figure: Web, Spider, Index, Log, SE servers, Browser. Annotations: spam; freshness; quality results; 20M queries/day; 800M pages?; 24x7.]

Basic Architectures: Directory : 

Basic Architectures: Directory [Figure: Web, Browser, SE servers, Ontology, Reviewed URLs. Annotations: URL submission; surfing.]

Spidering : 

Spidering Web: HTML data; hyperlinked (directed, disconnected graph); dynamic and static data; estimated 2 billion indexable pages. Freshness: how often are pages revisited?
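
A minimal sketch of spidering as a breadth-first traversal of a directed link graph. The toy `TOY_WEB` dictionary and the `crawl` function are hypothetical stand-ins for fetching and parsing real HTML; the point is only that pages not reachable from the seeds (the "disconnected" part of the graph) are never found.

```python
from collections import deque

def crawl(links, seeds):
    """Breadth-first traversal of a directed link graph.

    links: dict mapping a URL to the URLs it links to (toy stand-in for fetching HTML).
    seeds: list of starting URLs.
    Returns the URLs in the order they were first visited.
    """
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:  # visit each page at most once
                seen.add(target)
                queue.append(target)
    return order

# Hypothetical four-page web; nothing links to c.example/.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": [],
    "c.example/": ["a.example/"],
}

visited = crawl(TOY_WEB, ["a.example/"])
```

A real spider would also schedule revisits to keep the index fresh, which this sketch leaves out.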

Indexing : 

Indexing Size: from 50M to 150M to 3B URLs; 50 to 100% indexing overhead; 200 to 400 GB indices. Representation: fields, meta-tags, and content; NLP: stemming?
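
One way to picture a fielded index with optional stemming is the small inverted index below. This is an illustrative sketch, not how any particular engine works: `crude_stem` is a toy suffix stripper (an assumption, not a real stemmer), and the field names are made up.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def crude_stem(token):
    """Toy suffix stripper (assumption): drops 'ing' or 's' if at least 3 characters remain."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index(docs, stem=False):
    """docs: dict doc_id -> dict field_name -> text.

    Returns a dict mapping (field, term) to the set of doc_ids containing it.
    """
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for tok in tokenize(text):
                if stem:
                    tok = crude_stem(tok)
                index[(field, tok)].add(doc_id)
    return dict(index)

DOCS = {
    1: {"title": "Mining Texts", "body": "text mining tools"},
    2: {"title": "Search engines", "body": "engine design"},
}
plain = build_index(DOCS)
stemmed = build_index(DOCS, stem=True)
```

Without stemming, "engines" in a title and "engine" in a body are different terms; with it they collapse, which raises recall at some cost in precision.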

Search : 

Search Augmented vector-space: ranked results with Boolean filtering. Quality-based re-ranking: based on hyperlink data or user behavior. Spam: manipulation of content to improve placement.
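
A minimal sketch of vector-space ranking with a Boolean filter, assuming TF-IDF weights and cosine similarity (the slide does not say which weighting the engines used). The `rank` function and its `must_have` parameter are hypothetical names.

```python
import math
from collections import Counter

def rank(query, docs, must_have=()):
    """Rank docs (dict id -> text) by cosine similarity to the query.

    must_have: lowercase terms every result must contain (Boolean AND filter).
    Returns a list of (doc_id, score), best first; zero-score docs are dropped.
    """
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(tokenized)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vector(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vector(query.lower().split())
    results = []
    for d, toks in tokenized.items():
        if all(term in toks for term in must_have):  # Boolean filter first
            score = cosine(q, vector(toks))
            if score > 0:
                results.append((d, score))
    return sorted(results, key=lambda r: r[1], reverse=True)

DOCS = {
    "d1": "knowledge management systems",
    "d2": "knowledge retrieval and text mining",
    "d3": "data mining",
}
hits = rank("text mining", DOCS)
filtered = rank("text mining", DOCS, must_have=("data",))
```

Quality-based re-ranking would then reorder `hits` using link or click data, which is outside this sketch.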

Queries : 

Queries Short expressions of information need: 2.3 words on average. Relevance overload is a key issue: users typically only view top results. Search is a high-volume business: Yahoo! 50M queries/day; Excite 30M queries/day; Infoseek 15M queries/day.

Slide 49: 

Alta Vista: within site search, machine translation

Directory : 

Directory Manual categorization and rating. Labor intensive: 20 to 50 editors. High quality, but low coverage: 200-500K URLs. Browsable ontology. Open Directory is a distributed solution.

Slide 51: 

Yahoo: manual ontology (200 ontologists)

Special Collections : 

Special Collections Newswire Newsgroups Specialized services (Deja) Information extraction Shopping catalog Events; recipes, etc.

The Hidden Web : 

The Hidden Web Non-indexable content: behind passwords, firewalls; dynamic content. Often searchable through local interface. Network of distributed search resources: how to access? Ask Jeeves!

The Role of NLP : 

The Role of NLP Many search engines do not stem: precision bias suggests conservative term treatment. What about non-English documents? N-grams are popular for Chinese. Language ID, anyone?
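
To illustrate why n-grams suit Chinese, which has no spaces between words, the sketch below indexes overlapping character bigrams instead of segmented words. The function name `char_ngrams` is hypothetical.

```python
def char_ngrams(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

# "知识管理" means "knowledge management"; no word segmenter is needed.
grams = char_ngrams("知识管理")
```

A query is split the same way, so matching happens on shared bigrams rather than on dictionary words.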

Link Analysis : 

Link Analysis Authors vote via links: pages with more inlinks are higher quality. Not all links are equal: links from higher-quality sites are better; links in context are better. Resistant to spam: only cross-site links considered.
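
A small sketch of the "only cross-site links" idea: count inlinks per page, ignoring links whose source and target share a host. Treating the URL host as the "site" is an assumption made here for illustration; the URLs are made up.

```python
from collections import Counter
from urllib.parse import urlsplit

def site(url):
    """Use the URL host as the site identity (a simplifying assumption)."""
    return urlsplit(url).netloc.lower()

def cross_site_inlinks(edges):
    """edges: iterable of (source_url, target_url).

    Returns a Counter of inlinks per target, counting each distinct
    cross-site link once and skipping same-site links.
    """
    counts = Counter()
    seen = set()
    for src, dst in edges:
        if site(src) != site(dst) and (src, dst) not in seen:
            seen.add((src, dst))
            counts[dst] += 1
    return counts

EDGES = [
    ("http://a.example/1", "http://b.example/home"),
    ("http://c.example/x", "http://b.example/home"),
    ("http://b.example/nav", "http://b.example/home"),  # same site: ignored
    ("http://a.example/1", "http://b.example/home"),    # duplicate: ignored
]
votes = cross_site_inlinks(EDGES)
```

Ignoring same-site links means a site cannot raise its own pages simply by linking to itself.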

Page Rank (Page’98) : 

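Page Rank (Page’98) Limiting distribution of a random walk: jump to a random page with probability $\epsilon$; follow a link with probability $1-\epsilon$. Probability of landing at a page D: $P(D) = \epsilon/T + (1-\epsilon)\sum_{C \to D} P(C)/L(C)$, where the sum is over pages C leading to D, L(C) = number of links on page C, and T = total number of pages.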
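
The random walk above can be approximated by simple power iteration. This is a sketch under stated assumptions: the jump probability is called `eps` and set to 0.15, and pages with no outgoing links spread their rank uniformly, a common fix that the slide does not mention.

```python
def pagerank(links, eps=0.15, iterations=50):
    """links: dict page -> list of pages it links to.

    Returns a dict page -> probability of the random walker being there.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    total = len(pages)
    rank = {p: 1.0 / total for p in pages}
    for _ in range(iterations):
        new = {p: eps / total for p in pages}  # random-jump share
        for p in pages:
            out = links.get(p, [])
            if out:
                share = (1 - eps) * rank[p] / len(out)
                for t in out:
                    new[t] += share
            else:
                # Dangling page: spread its rank over all pages (assumption).
                for t in pages:
                    new[t] += (1 - eps) * rank[p] / total
        rank = new
    return rank

# Toy graph: C is linked from both A and B, so it should rank above B.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

Because each pass redistributes exactly the probability mass it started with, the scores keep summing to one.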

Who asks What? : 

Who asks What? Query logs revisited. Query-based indexing: why index things people don’t ask for? If they ask for A, give them B. From atomic concepts to query extensions. Structure of questions and answers. Shyam Kapur’s chunks.

Futures : 

Futures Vertical markets: healthcare, real estate, jobs and resumes, etc. Localized search. Search as embedded app. Shopping 'bots. Open Problems: Has the bubble burst?

From SE to Web Portal : 

From SE to Web Portal Spidering: Intranet and Internet crawling Integration: legacy systems and databases Content: aggregation and conversion Process: Collaboration, chat, workflow management, calendaring, and such Analysis: data and text mining, agent/alert, web mining

Slide 60: 

Discovering Knowledge: Data Mining (Source: Michael Welge, Automated Learning Group, NCSA)

Why Data Mining? -- Potential Applications : 

Why Data Mining? -- Potential Applications Database analysis, decision support, and automation: market and sales analysis; fraud detection; manufacturing process analysis; risk analysis and management; experimental results analysis; scientific data analysis; text document analysis

Data Mining: Confluence of Multiple Disciplines : 

Data Mining: Confluence of Multiple Disciplines Database Systems, Data Warehouses, and OLAP; Machine Learning; Statistics; Mathematical Programming; Visualization; High Performance Computing

Data Mining: A KDD Process : 

Data Mining: A KDD Process

Required Effort for Each KDD Step : 

Required Effort for Each KDD Step

Data Mining Models and Methods : 

Data Mining Models and Methods

Deviation Detection : 

Deviation Detection Identify outliers in a dataset. Typical techniques: OLAP charting, probability distribution contrasts, regression analysis, discriminant analysis
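
As a simple illustration of outlier detection, the sketch below flags values far from the mean using a z-score rule. This is a swapped-in technique chosen for brevity, not one of the techniques the slide lists, and the threshold of 2.0 and the sample data are arbitrary assumptions.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / sd > threshold]

# Made-up daily order counts with one suspicious spike.
outliers = zscore_outliers([10, 11, 9, 10, 12, 10, 11, 50])
```

A single extreme value also inflates the standard deviation, so robust alternatives such as median-based rules are often preferred on real data.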

Link Analysis (Rule Association) : 

Link Analysis (Rule Association) Given a database, find all associations of the form: IF <LHS> THEN <RHS>. Prevalence = frequency of the LHS and RHS occurring together. Predictability = fraction of the RHS out of all items with the LHS. E.g., beer and diapers.
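
The two measures can be computed directly from a list of transactions. In this sketch the basket data are made up, and `rule_stats` is a hypothetical helper; prevalence and predictability correspond to what is usually called support and confidence.

```python
def rule_stats(transactions, lhs, rhs):
    """Return (prevalence, predictability) for the rule IF lhs THEN rhs.

    transactions: list of item collections; lhs, rhs: collections of items.
    """
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    with_lhs = [t for t in transactions if lhs <= set(t)]
    with_both = [t for t in with_lhs if rhs <= set(t)]
    prevalence = len(with_both) / n if n else 0.0
    predictability = len(with_both) / len(with_lhs) if with_lhs else 0.0
    return prevalence, predictability

# Made-up market baskets.
BASKETS = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer"},
    {"milk"},
    {"diapers", "milk"},
]
prevalence, predictability = rule_stats(BASKETS, {"beer"}, {"diapers"})
```

Here beer and diapers occur together in 2 of 5 baskets, and 2 of the 3 beer baskets also contain diapers.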

Database Segmentation : 

Database Segmentation Regroup datasets into clusters that share common characteristics. Typical techniques: hierarchical clustering, neural network clustering (SOM), k-means
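
Of the techniques named above, k-means is the easiest to sketch. The version below is a bare-bones illustration, assuming Euclidean distance and a deterministic start from the first k points (real implementations usually randomize the start); the points are made up.

```python
def kmeans(points, k, iterations=20):
    """Cluster points (tuples of numbers) into k groups.

    Returns (centroids, assignment) where assignment[i] is the cluster of points[i].
    """
    centroids = [tuple(p) for p in points[:k]]  # deterministic init (assumption)
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

POINTS = [(1, 1), (1, 2), (8, 8), (9, 8), (2, 1), (8, 9)]
centroids, assignment = kmeans(POINTS, k=2)
```

The two groups found here are the points near (1, 1) and the points near (8, 8).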

Predictive Modeling : 

Predictive Modeling Use past data to predict future response and behavior. Typical technique: supervised learning (Neural Networks, Decision Trees, Naïve Bayesian). E.g., who is most likely to respond to a direct mailing?
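
As one concrete case of supervised learning, here is a small categorical naive Bayes classifier applied to the direct-mailing question. The customer attributes and labels are invented for illustration, and add-one (Laplace) smoothing is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """rows: list of tuples of categorical features; labels: list of class labels."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_nb(model, row):
    """Return the label with the highest log-probability for the given row."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, v in enumerate(row):
            # Add-one smoothing so unseen values do not zero out the product.
            num = feature_counts[(i, label)][v] + 1
            den = count + len(values[i])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = label, score
    return best

# Made-up past mailings: (age group, housing) -> responded?
ROWS = [("young", "renter"), ("young", "owner"), ("senior", "owner"),
        ("senior", "owner"), ("young", "renter"), ("senior", "renter")]
LABELS = ["no", "yes", "yes", "yes", "no", "no"]
model = train_nb(ROWS, LABELS)
likely = predict_nb(model, ("senior", "owner"))
unlikely = predict_nb(model, ("young", "renter"))
```

In practice the predicted probabilities would be used to rank customers so that the mailing goes to those most likely to respond.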

Data/Information Visualization : 

Data/Information Visualization Gain insight into the contents and complexity of the database being analyzed: vast amounts of underutilized data; time-critical decisions hampered; key information difficult to find. Results presentation: reduced perceptual, interpretative, cognitive burden.

Rule Association - Basket Analysis : 

Rule Association - Basket Analysis

Text Mining Visualization : 

Text Mining Visualization This data is considered to be confidential and proprietary to Caterpillar and may only be used with prior written consent from Caterpillar.

Decision Tree Visualizer : 

Decision Tree Visualizer

From Data Mining to Text Mining : 

From Data Mining to Text Mining Techniques: linguistic analysis, clustering, unsupervised learning, case-based reasoning. Ontologies: XML/RDF, content management. P1000: a picture is worth 1000 words. Formats/types: email, reports, web pages, etc. Integration: KMS and IT infrastructure. Cultural: rewards and unintended consequences.
