Presentation Transcript
Cyborg Categorization: Salvation for Search?
Tom Reamy
Information Architect
Charles Schwab © 2001 Charles Schwab & Co., Inc., member NYSE/SIPC. All rights reserved. (0401-6450)

Categorization Explosion
Autonomy
Semio
Verity
Inxight
Topical Net
Mohomine
Simile
H5Technologies
GammaSite
MetaTagger
Applied Semantics
Sageware
SmartLogik
Quiver
PurpleYogi
Other - Tacit

Categorization: Why Now?
Forrester: Must Search Stink?
Browse and Search
Need a Taxonomy
Problem: Expensive to develop Taxonomies
Buy Search to get Categorization

News Feeds - Corporate Intranets
News Feeds and Content providers
uniform content, size and structure
professional writers
Simple or standard vocabulary
Corporate intranet
Wildly varied content
Mix of good, bad, and ugly writers
Tower of Babel: Acronyms, special meanings

Auto-Categorization: the How
Rules
Catalog by Example
Statistical Clustering
Support Vector Machines
Machine Learning
World Knowledge
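
To make "Catalog by Example" concrete, here is a minimal sketch, assuming a simple bag-of-words cosine similarity against per-category example documents. The slides do not say which algorithm any vendor uses; the category names, example texts, and function names below are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_profiles(examples):
    """Sum the term counts of each category's example documents."""
    profiles = {}
    for category, docs in examples.items():
        profile = Counter()
        for doc in docs:
            profile.update(tokenize(doc))
        profiles[category] = profile
    return profiles

def categorize(text, profiles):
    """Return (best_category, score), or (None, 0.0) if no terms overlap."""
    vec = Counter(tokenize(text))
    best, best_score = None, 0.0
    for category, profile in profiles.items():
        score = cosine(vec, profile)
        if score > best_score:
            best, best_score = category, score
    return best, best_score

# Hypothetical training examples chosen by a person for each category.
EXAMPLES = {
    "Benefits": ["health insurance enrollment deadline",
                 "retirement plan contribution limits"],
    "IT Support": ["reset your network password",
                   "laptop software installation request"],
}
PROFILES = build_profiles(EXAMPLES)

print(categorize("how to reset a password", PROFILES)[0])
```

The score doubles as a rough confidence value, which matters later when deciding which documents a person should look at.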

Automatic vs. Humanatic
Humans are better, but not as consistent
General bin, understandable mistakes
Bring outside contexts to the document
Purpose, similar documents, common sense
Computers are faster and cheaper.
Faster, yes. Cheaper?
Cost of poorer quality categorization
Intranet: 20,000 users taking 60 seconds longer = $20,000 a week
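
The slide gives the result but not the inputs behind it. One set of assumptions that reproduces the figure: each user loses 60 seconds once a week, and an hour of employee time costs $60. Both numbers are assumptions for illustration, not values from the slide.

```python
def weekly_cost(users, extra_seconds_per_user, hourly_rate):
    """Dollar cost of time lost per week, given extra seconds per user per week."""
    lost_hours = users * extra_seconds_per_user / 3600
    return lost_hours * hourly_rate

# Assumed inputs: one 60-second delay per user per week, $60 per employee hour.
cost = weekly_cost(users=20_000, extra_seconds_per_user=60, hourly_rate=60)
print(round(cost))  # 20000
```

If the delay happens on every search rather than once a week, the same function shows the cost scaling linearly with the number of searches.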

The Answer is Cyborg
Integration, not Assimilation
Human and Computer Integration
Iterative, distributed work flow, ease of use
Cyborg and Content Management
Categorization and keywords by Subject Matter Experts
Cyborg and Search
Computers and people learn from each other

Create the Taxonomy
Top Level Taxonomy - 7-12 Categories
Human intensive, Cluster - random creativity
Grow the Taxonomy - 2nd - 3rd Levels
Humans - create rules, select training sets
Computers - Taxonomy Builders, Refine rules or training sets
Essential Feature
White Box Categorization
Customize algorithm, not just results
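
One way to read "White Box Categorization" is that a person can see why a document landed in a category and can edit the rule that put it there. The sketch below assumes a two-level taxonomy with plain keyword rules; the `Node` class, the category names, and the keywords are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One taxonomy category with a human-editable keyword rule."""
    name: str
    keywords: set
    children: list = field(default_factory=list)

def classify(text, nodes, path=()):
    """Walk the taxonomy top-down.

    Returns (path, matched_terms) for the deepest matching category,
    or None when no rule fires. The matched terms are the explanation.
    """
    words = set(text.lower().split())
    for node in nodes:
        hits = words & node.keywords
        if hits:
            here = (path + (node.name,), sorted(hits))
            deeper = classify(text, node.children, here[0])
            return deeper if deeper is not None else here
    return None

TAXONOMY = [
    Node("Human Resources", {"benefits", "payroll", "vacation"}, [
        Node("Benefits", {"benefits", "insurance", "401k"}),
        Node("Payroll", {"payroll", "paycheck"}),
    ]),
    Node("Technology", {"laptop", "password", "network"}),
]

print(classify("open enrollment for insurance benefits", TAXONOMY))

# Customizing the algorithm, not just the result: a person edits the rule itself.
TAXONOMY[1].keywords.add("vpn")
print(classify("vpn outage", TAXONOMY))
```

Because the rule and the matched terms are both visible, a wrong placement can be traced to a specific keyword and fixed at the source.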

Refine the Taxonomy
Initial Phase: Information Architect Effort
Suggest
Provisional Categorization, Meta Data
Automatic Summarization
Support
Distributed Work flow
Visualization of taxonomic relationships
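
The "Suggest" and "Support" points can be sketched as a small review loop: the computer proposes a provisional category, confident suggestions are published, and the rest wait for a person, whose decisions are recorded so they can later be fed back as training examples. The class name, the 0.5 threshold, and the stub suggester are all assumptions for illustration.

```python
class ReviewWorkflow:
    """Route machine suggestions: publish confident ones, queue the rest for people."""

    def __init__(self, suggest, threshold=0.5):
        self.suggest = suggest      # callable: text -> (category, confidence)
        self.threshold = threshold  # hypothetical cut-off; tune per collection
        self.published = []         # (text, category) pairs visible to users
        self.queue = []             # provisional items awaiting a human decision
        self.corrections = []       # (text, suggested, final) human decisions

    def submit(self, text):
        """Publish a confident suggestion, otherwise hold the item for review."""
        category, confidence = self.suggest(text)
        if category is not None and confidence >= self.threshold:
            self.published.append((text, category))
            return category
        self.queue.append((text, category))
        return None

    def review(self, decide):
        """Ask a person (the decide callable) to settle every queued item."""
        while self.queue:
            text, suggested = self.queue.pop(0)
            final = decide(text, suggested)
            self.published.append((text, final))
            self.corrections.append((text, suggested, final))

def stub_suggest(text):
    """Stand-in for a real categorizer; returns a made-up confidence."""
    return ("IT Support", 0.9) if "password" in text else ("Benefits", 0.2)

workflow = ReviewWorkflow(stub_suggest)
workflow.submit("reset password")    # confident: published automatically
workflow.submit("parking permits")   # low confidence: queued for a person
workflow.review(lambda text, suggested: "Facilities")
print(workflow.corrections)
```

The `corrections` list is where the two sides learn from each other: it shows the machine's guess next to the human's decision for every item that needed review.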

Maintain the Taxonomy
Intranets - ongoing human efforts
Can’t pass on the cost to your customers - they work for the same company as you
Continue and Improve Refinement
Collaborative Categorization
Features:
Smart Learning categorization
Integration - Content management, Search

Apply the Taxonomy
Integration of Search and Categorization
Browse and Search
Real-time clustering, customization of results
support collaborative filtering
Integration with Content Management
Integrated Distributed Work Flow
Support Taxonomic Publishing Model
Integration with Expertise & Processes
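
A minimal sketch of what integrating "Browse and Search" could look like: a keyword search whose results are grouped by taxonomy category, and which can also be restricted to a single category. The documents, categories, and function names are hypothetical, and the matching is deliberately naive (whole-word match on titles only).

```python
from collections import defaultdict

# Hypothetical documents that have already been assigned a category.
DOCS = [
    {"title": "Reset your network password", "category": "IT Support"},
    {"title": "Password rules for benefits portal", "category": "Benefits"},
    {"title": "Open enrollment dates", "category": "Benefits"},
]

def search(query, docs, category=None):
    """Return docs whose title contains every query word, optionally in one category."""
    terms = query.lower().split()
    return [
        d for d in docs
        if all(t in d["title"].lower().split() for t in terms)
        and (category is None or d["category"] == category)
    ]

def group_by_category(results):
    """Group result titles under their category, for a browsable result page."""
    grouped = defaultdict(list)
    for d in results:
        grouped[d["category"]].append(d["title"])
    return dict(grouped)

# Search first, then browse the hits by category.
print(group_by_category(search("password", DOCS)))

# Browse first, then search within the chosen category.
print(search("password", DOCS, category="Benefits"))
```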

Lessons Learned
Out of the Box, Out of Your Mind
Play well with others
Brain surgery is fun
World revolves around you
Quality counts and size matters
Let a Hundred Flowers Bloom
The End

The END
Really.