An Automatic Classification Approach to Business Stakeholder Analysis on the Web : An Automatic Classification Approach to Business Stakeholder Analysis on the Web Wingyan Chung, Hsinchun Chen,
Edna O. F. Reid
January 16, 2003
Agenda : Agenda Introduction
Literature Review
Research Questions
Research Approach and Testbed
Evaluation Methodology
Experimental Results and Discussion
Conclusions and Future Directions
Introduction : Introduction
Current Business Environment : Current Business Environment Networked business environment facilitates information sharing
Collaborative commerce integrates business processes among partners through electronic sharing of information
Sales support, vendor management, planning and scheduling, demand planning, etc.
Knowledge sharing about stakeholder relationships through a company’s Web sites and pages
Textual content or annotated hyperlinks
Problems : Problems Information overload on the Web
Hinders analysis of stakeholder relationships
Knowledge hidden in interconnected Web resources
Posing challenges to identifying and classifying various business stakeholders
e.g., a company’s manager may not know who is using the company’s Web resources
Problem of traditional stakeholder analysis
The emergence of electronic commerce
An Automatic Classification Approach : An Automatic Classification Approach Need better approaches to uncovering such knowledge
Enhance understanding of business stakeholders
Enhance understanding of competitive environments
We propose an automatic classification approach to business stakeholder analysis
Human knowledge + machine-learned information
We will review related areas in stakeholder analysis and Web page classification techniques
Literature Review : Literature Review
Stakeholder Analysis : Stakeholder Analysis Stakeholder theories evolve over time as the view of the firm changes
Production view (19th century): Suppliers and Customers
Managerial view (20th century): + Owners, Employees
Stakeholder view (1960-80s) (Freeman, 1984): + Competitors, Governments, News Media, Environmentalists, …
E-commerce view (1990s - now): + International partners, Online communities, Multinational employees, …
Slide9 : These types, ordered by their relevance to those appearing on the Web, are important for practical understanding of stakeholders of firms
Slide10 : P = Partners/suppliers, E = Employees/Unions, C = Customers,
S = Shareholders/investors, U = Education/research institutions, M=Media/Portals,
G = Public/government, R = Recruiters, V = Reviewers, O = Competitors,
T = Trade associations, F = Financial institutions, I = Political groups,
N = SIG/Communities
(Note that a class “Unknown” is not included here)
Comments on Stakeholder Research : Comments on Stakeholder Research Existing stakeholder research has strong explanatory power but is weak at practical classification of stakeholders
Conclusions drawn from old data
Previous research rarely considers the many opportunities offered by the Web for stakeholder analysis, e.g.,
Business intelligence, which is obtained from the business environment, is likely to help in stakeholder activities
Tools have been developed to exploit business intelligence but not yet applied to stakeholder analysis
BI and Stakeholder Analysis : BI and Stakeholder Analysis Advanced BI tools often rely on Web mining techniques to discover patterns on the Web automatically (Etzioni 1996; Kosala & Blockeel 2000), e.g.,
PageRank (Brin & Page 1998), HITS (Kleinberg 1999), Web IF (Ingwersen 1998)
External links mirror social communication phenomena (e.g., stakeholder relationships)
Tools and approaches exploit Web content and link structure information
Ong et al. 2001; Tan et al. 2002; Reiterer et al. 2000; Chung et al. 2003; Reid 2003; Byrne 2003
Information on the Web : Information on the Web Structural and textual content
But commercial BI tools lack analysis capability (Fuld et al. 2002)
Need to automate stakeholder classification, a primary step in stakeholder analysis
Automatic classification of Web pages is a promising way to alleviate the problem
Web Page Classification : Web Page Classification The process of assigning pages to predefined categories
Helps to discover companies’ stakeholders on the Web and enables companies to understand the competitive environment better
Major approaches include k-nearest neighbor, neural network, Support Vector Machines, and Naïve Bayesian network (Chen & Chau 2004)
Previous work
Kwon and Lee 2003; Mladenic 1998; Furnkranz 1999; Lee et al. 2002; Glover et al. 2002
Feature selection in Web Page Classification : Feature selection in Web Page Classification Features considered
Page textual content: full text, page title, headings
Link related textual content: anchor text, extended anchor text, URL strings
Page structural information: #words, #page out-links, inbound outlinks (i.e., links that point to its own company), outbound outlinks (i.e., links that point to external Web sites)
Methods for selection
Human judgment / Use of domain lexicon
Feature ratios and thresholding
Frequency counting / mutual information (MI)
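As an illustration of the frequency counting / MI method listed above, the sketch below scores a term against a stakeholder class using pointwise mutual information estimated from page counts. The function name `mutual_information`, the toy pages, and the choice of the pointwise form are assumptions made for illustration; the slides do not state which MI variant was used.

```python
import math

def mutual_information(pages, term, label):
    """Pointwise MI between a term's presence and a class label.

    pages: list of (text, label) pairs.
    Returns log2(P(term, label) / (P(term) * P(label))),
    or None when any of the required counts is zero.
    """
    n = len(pages)
    n_t = sum(1 for text, _ in pages if term in text.lower())
    n_c = sum(1 for _, lab in pages if lab == label)
    n_tc = sum(1 for text, lab in pages
               if lab == label and term in text.lower())
    if n == 0 or n_t == 0 or n_c == 0 or n_tc == 0:
        return None
    return math.log2((n_tc / n) / ((n_t / n) * (n_c / n)))

# Hypothetical toy pages tagged with type codes (M = media, P = partner)
toy_pages = [
    ("Press release: new search tool announced", "M"),
    ("News article reviewing the press release", "M"),
    ("Our partner program and reseller network", "P"),
    ("Partner solutions and integrations", "P"),
]
```

A term that appears only in pages of one type gets a positive score for that type, which is the kind of signal a thresholding step could then use to keep or drop the term.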
Research Questions : Research Questions
Research Gaps : Research Gaps Stakeholder research provides rich theoretical background but rarely considers the tremendous opportunities offered by the Web for stakeholder analysis
Conclusions drawn from old data may not reflect rapid development in e-commerce
Existing BI tools lack stakeholder analysis capability
Automatic Web page classification techniques are well developed but have not yet been applied to business stakeholder classification
Research Questions : Research Questions How can we develop an automated approach to business stakeholder analysis on the Web?
How can Web page textual content and structural information be used in such an approach?
What are the effectiveness (measured by accuracy) and efficiency (measured by time requirement) of such an approach for business stakeholder classification on the Web?
Research Approach and Testbed : Research Approach and Testbed
Automatic Classification Approach : Automatic Classification Approach Purpose: To automatically classify the stakeholders of businesses on the Web in order to facilitate stakeholder analysis
Rationale
Business stakeholders should have identifiable clues that can be used to distinguish their types
The Web content and structural information is important for understanding the clues for stakeholder classification
Two generic steps:
Creation of a domain lexicon that contains key textual attributes for identifying stakeholders
Automatic classification of Web pages (stakeholders) linking to selected companies based on textual and structural content of Web pages
Building a Research Testbed : Building a Research Testbed Business stakeholders of the KM World top 100 KM companies (McKellar 2003)
Used backlink search function of the Google search engine to search for Web pages having hyperlinks pointing to the companies’ Web sites
For each host company, we considered only the first 100 results returned
Removed self links and extra links from same sites
After filtering, we obtained 3,713 results in total
Randomly selected the results of 9 companies as training examples (414 283 pages stored in DB)
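The filtering steps above (first 100 results per host company, no self links, no extra links from the same site) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `filter_backlinks` is hypothetical, the URLs are placeholders, and "same site" is assumed to mean "same hostname". The backlink search itself is not reproduced here.

```python
from urllib.parse import urlparse

def filter_backlinks(host_site, result_urls, limit=100):
    """Keep the first `limit` results, drop self links, keep one link per site."""
    host = urlparse(host_site).netloc.lower()
    seen, kept = set(), []
    for url in result_urls[:limit]:
        site = urlparse(url).netloc.lower()
        # Skip unparsable URLs, links from the host company itself,
        # and additional links from a site that was already kept.
        if not site or site == host or site in seen:
            continue
        seen.add(site)
        kept.append(url)
    return kept

# Placeholder search results for a hypothetical host company
results = [
    "http://www.example.com/about",
    "http://news.example.org/a",
    "http://news.example.org/b",
    "http://partner.example.net/x",
]
filtered = filter_backlinks("http://www.example.com", results)
```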
Creation of a Domain Lexicon : Creation of a Domain Lexicon Manually read through all the Web pages of the nine companies’ business stakeholders to identify one-, two-, and three-word terms that were indicative of business stakeholder types
Extracted a total of 329 terms (67 one-word terms, 84 two-word terms, and 178 three-word terms), e.g.,
Automatic Stakeholder Classification : Automatic Stakeholder Classification Three steps: manual tagging, feature selection, and automatic classification
Manual Tagging : Manual Tagging Manually classified each of the stakeholder pages of the nine selected companies into one of the 11 stakeholder types (based on our review on slides 9-10)
Feature Selection : Feature Selection Structural content features: binary variables indicating whether certain lexicon terms are present in the structural content
A term could be one, two, or three words long
Considered occurrences in title, extended anchor text, and full text
Textual content features: frequencies of occurrences of the extracted features
The first set of features was selected based on human knowledge, while the second was selected based on statistical aggregation, thereby combining both kinds of knowledge
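The two feature sets described above can be sketched as follows: binary indicators for lexicon terms in the title, extended anchor text, and full text, and frequency counts for selected terms in the full text. The field names (`title`, `anchor_text`, `full_text`), the function names, and the three-term lexicon are hypothetical and chosen only for illustration; the actual lexicon contained 329 terms.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, max_n=3):
    """All one-, two-, and three-word terms in a token list."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def structural_features(page, lexicon):
    """Binary indicators: is each lexicon term present in each page field?"""
    features = []
    for field in ("title", "anchor_text", "full_text"):
        present = set(ngrams(tokenize(page.get(field, ""))))
        features.extend(1 if term in present else 0 for term in lexicon)
    return features

def textual_features(page, terms):
    """Frequency of each selected term in the page's full text."""
    counts = Counter(ngrams(tokenize(page.get("full_text", ""))))
    return [counts[term] for term in terms]

# Hypothetical page record and lexicon
page = {
    "title": "Search and Discovery in the Post-Cold War Era",
    "anchor_text": "I just saw a demo by ClearForest",
    "full_text": "It's truly amazing, and truly the search tool "
                 "for the post-Cold War era.",
}
lexicon = ["search", "search tool", "press release"]
```

Concatenating the two vectors gives one possible form of the combined feature set compared in the experiments.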
An Example (a media type) :
David Schatsky: Search and Discovery in the Post-Cold War Era ...
I just saw a demo by ClearForest, a company that provides tools for analyzing unstructured textual information. It's truly amazing, and truly the search tool for the post-Cold War era. ...
...
[Figure annotations: link to the host company (ClearForest); HTML hyperlink and extended anchor text]
Automatic Classification : Automatic Classification A feedforward/backpropagation neural network (Lippmann 1987) and SVM (Joachims, 1998) were used due to their robustness in automatic classification
Train the algorithms using the stakeholder pages of the 9 training companies and obtain a model or sets of weights for classification
Test the algorithms on sets of stakeholder pages of 10 companies different from training examples
Evaluation Methodology : Evaluation Methodology
Experimental Design : Experimental Design Consisted of algorithm comparison, feature comparison, and a user evaluation study
Compared the performance of neural network (NN), SVM, baseline method (random classification), human judgment
Compared structural content features, textual content features, and a combination of the two sets of features
36 University of Arizona business students performed manual stakeholder classification and provided comments on the approach
Performance Measures : Performance Measures Effectiveness:
Overall accuracy
Within-class accuracy
Efficiency: time used (in minutes)
User subjective ratings and comments
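The two effectiveness measures can be sketched as below. The slides do not define within-class accuracy; this sketch assumes it is the fraction of pages of a given true stakeholder type that were assigned that type. Function names and the toy labels are hypothetical.

```python
from collections import defaultdict

def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all pages whose predicted type matches the true type."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def within_class_accuracy(true_labels, predicted_labels):
    """Per-type accuracy: correct pages of a type / all pages of that type."""
    totals, hits = defaultdict(int), defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Toy example using the type codes M (media), P (partner), C (customer)
true_types = ["M", "M", "P", "P", "C"]
predicted_types = ["M", "P", "P", "P", "M"]
```

Reporting both measures matters because a classifier can score well overall while doing poorly on rarely occurring types.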
User Study : User Study Each subject was introduced to stakeholder analysis and was asked to use our system named “Business Stakeholder Analyzer (BSA)” to browse companies’ stakeholder lists
We randomly selected three companies (Intelliseek, Siebel, and WebMethods) from testing companies to be the targets of analysis
Hypotheses (1) : Hypotheses (1) H1: NN and SVM would achieve similar effectiveness when the same set of features was used
Both techniques were robust
Procedure: created 30 sets of stakeholder pages by randomly selecting groups of 5 stakeholder pages of each of the 10 testing companies
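The sampling procedure above can be sketched as follows. It assumes that pages are drawn without replacement within a company, that the 30 sets are drawn independently (so they may overlap), and that a fixed seed is used for repeatability; none of these details is stated in the slides. The function name and the placeholder page identifiers are hypothetical.

```python
import random

def make_test_sets(pages_by_company, n_sets=30, per_company=5, seed=0):
    """Build n_sets samples, each with per_company pages from every company."""
    rng = random.Random(seed)
    test_sets = []
    for _ in range(n_sets):
        sample = []
        for company in sorted(pages_by_company):
            # Sample without replacement within one company
            sample.extend(rng.sample(pages_by_company[company], per_company))
        test_sets.append(sample)
    return test_sets

# Placeholder data: 10 testing companies with 20 stakeholder pages each
pages_by_company = {
    f"company_{i}": [f"c{i}_page{j}" for j in range(20)] for i in range(10)
}
test_sets = make_test_sets(pages_by_company)
```

With 10 testing companies and 5 pages per company, each set holds 50 pages, and each algorithm's accuracy can then be computed once per set.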
Hypotheses (2) : Hypotheses (2) H2: NN and SVM would perform better than the baseline method
Incorporated human knowledge and machine learning capability into the classification
H3: Human judgment in stakeholder classification would achieve effectiveness similar to that of machine learning but would be less efficient
Humans could make use of the Web page’s textual and structural content in classifying stakeholders
Humans might spend more time on it
Hypotheses (3) : Hypotheses (3) H4 & H5 examined the use of different types of features in automatic stakeholder classification
H4: structural = textual
H5: combined > structural or textual alone
Experimental Results and Discussion : Experimental Results and Discussion
Algorithm Comparison : Algorithm Comparison H1 not confirmed
NN performed significantly differently from SVM when the same set of features was used
NN performed significantly better than SVM when structural content features were used
SVM performed significantly better than NN when textual content features or a combination of both feature sets were used
More studies would be needed to identify optimal feature sets for each algorithm
Effectiveness of the Approach : Effectiveness of the Approach H2 confirmed
The use of any combination of features and techniques in automatic stakeholder classification outperformed the baseline method significantly
Our approach has integrated human knowledge with machine-learned information related to stakeholder types …
and was significantly better than a random conjecture
Comparing with Human Judgment : Comparing with Human Judgment H3b and H3d (efficiency) confirmed
Human: 22 minutes (average), varied
Algorithms: 1 – 30 seconds (average)
Showing high efficiency of using the automatic approach to facilitate stakeholder analysis
H3a and H3c (effectiveness) not confirmed
Humans were significantly more effective than NN or SVM
They could rely on more clues in performing classification
Experience in Internet browsing and searching helped narrow down choices
However, the algorithms achieved better within-class accuracies than humans in frequently occurring types … : However, the algorithms achieved better within-class accuracies than humans in frequently occurring types …
Use of Features : Use of Features To our surprise, hypotheses H4a-b, H5a-b, and H5d were not confirmed
Different feature sets yielded different performances of the algorithms
Structural features enabled NN to achieve better effectiveness than textual ones
Textual and combined features enabled SVM to achieve better effectiveness than structural ones
The exact reasons are not yet known
Future research: studying the effect of features and the nature of algorithms
H5c was confirmed: structural content features did not add value to the performance of SVM
Subjects’ Comments : Subjects’ Comments Overwhelmingly positive
“It would be very helpful!”
“That’s cool!”
“I want to use it.”
Conclusions and Future Directions : Conclusions and Future Directions
Conclusions : Conclusions Proposed an automatic classification approach to business stakeholder analysis on the Web
Integrated human expert knowledge + machine-learned information
Promising in terms of effectiveness and efficiency
A strong potential to use the approach to augment traditional stakeholder classification
Could potentially facilitate business analysts’ interaction with automated stakeholder analysis systems in today’s networked enterprises
Future Directions : Future Directions To automate the next steps of business stakeholder analysis
With more expert participation and more Web page data
Type-specific stakeholder analysis
e.g., partner relationships are often important in developing business strategies
Automating cross-regional business stakeholder analysis
Study multinational business partnerships and cooperation and related HCI issues