webCommunity Mining

Presentation Transcript

Web Community Mining and Analysis: 

Web Community Mining and Analysis Yanchun Zhang, Guandong Xu School of Computer Science and Mathematics & Innovative e-Research Centre Victoria University, Australia

e-Research @ Victoria University Prof. Yanchun Zhang : 

e-Research @ Victoria University Prof. Yanchun Zhang

Innovative e-Research: 

Innovative e-Research e-Research has become an extremely important research direction. e-Research is application-driven and multidisciplinary. e-Research projects include: e-Business, e-Health, e-Learning, e-Law, e-Water, etc.

Current E-Research at VU: 

Current E-Research at VU E-Research activities at VU: E-Water, E-Health, E-Law, E-Tourism, E-Learning.

Web Community Mining and Analysis Outline: 

Web Community Mining and Analysis Outline Introduction: Problems & Challenges Mining Web: Content, structure, usage, recommendation Web Community Discovering User Access Pattern Future work

Y Zhang, J X Yu and J. Hou, Web Communities: Analysis and Construction, Springer 2006: 

Y Zhang, J X Yu and J. Hou, Web Communities: Analysis and Construction, Springer 2006 Chapters: Introduction; Preliminaries; HITS and Related Algorithms; PageRank Related Algorithms; Affinity and Co-Citation Analysis Approaches; Building a Web Community; Web Community Related Techniques; ........; Conclusions. http://e-research.csm.vu.edu.au/books/web-community/index.php

Cafe Babe 2006-11-02[research] : 

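Cafe Babe 2006-11-02[research] Web Communities -Analysis and Construction- (Springer-Verlag) In a word, this is a book that summarizes the analysis of Web hyperlinks. It covers, for example, methods for judging the importance of Web pages such as HITS and PageRank, Web page similarity measures and clustering techniques such as Hierarchical Clustering, Matrix-Based Clustering and Co-Citation, and the extraction of Web communities. Web Communities: Analysis and Construction. Authors: Yanchun Zhang, Jeffrey Xu Yu, Jingyu Hou. Publisher: Springer-Verlag New York Inc. Format: hardcover. What is good about this book is that, for HITS and PageRank for instance, it covers the related algorithms quite comprehensively. In other words, reading it gives a rough grasp of the topics surrounding these techniques. However, it does not go deeply into the content, so the original papers must be read for details. If you do research and development on link analysis, it may well be an essential book.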

ACM Reviews: 

ACM Reviews The book can be used by applied mathematicians, search industry professionals, and anyone who wants to learn more about how search engines work. I recommend it for any course on Web information retrieval. I firmly believe that this book and the book by Langville and Meyer are the top two books about the algorithmic aspects of modern search engines. (Yannis Manolopoulos, Aristotle University, Thessaloniki, Greece in ACM REVIEWS, 2006)

1. Introduction: 

1. Introduction The Internet and the Web are used to disseminate and search for information, to conduct business, and to perform collaborative work. Problem: information overload.

Web mining: 

Web mining Web mining utilizes data mining methods to induce and extract useful information from web data and services. Categories: web content mining, web structure mining, web usage mining.

Web Content mining: 

Web Content mining Web content mining tries to discover valuable information from web contents (i.e. web documents or web pages); it is sometimes also called text mining.

Web structure mining: 

Web structure mining Web structure mining involves modeling a web site in terms of its link structure. The mutual linkage information obtained can be used to cluster web pages or to find relevant pages based on the similarity or relevance between different web pages.

Web usage mining: 

Web usage mining Web usage mining tries to reveal the underlying access patterns from web user transactions. User patterns can be utilized to recommend or personalize web content for users. Capturing the access interests of web users helps in understanding user navigational behavior and in improving web site structure design.

Slide15: 

Web usage mining is utilized to make recommendations: firstly, extract informative knowledge from web log files and identify the underlying functional interests of users; secondly, create user profiles that represent common user navigational behavior, based on the observed usage data; finally, present the desired web content to the user in a personalized style by matching the current active user session with the discovered patterns.

Web recommendation: 

Web recommendation Web recommendation is a process that recommends customized web content to users according to their specific preference. Two common approaches: content-based filtering and collaborative filtering.

Web recommendation: 

Web recommendation Content-based filtering systems, such as WebWatcher and the client-side agent Letizia, generally generate recommendations from pre-constructed user profiles by measuring the similarity of web content to these profiles. Collaborative filtering systems make recommendations by estimating the current user's rating of objects from the preferences of other users.

Problem with Web Searching: 

Problem with Web Searching Current Web search tools such as Yahoo!, AltaVista, Google, … return more information than is required. Example: for the search “Java Programming”, Google returns 1,330,000 (more than 1 million) Web pages and AltaVista returns 16,921,862 (more than 16 million) Web pages. Users are only concerned with a small and interesting portion of the returned search results, so it is necessary to build web communities.

Web Community: 

Web Community A web community is defined as a set of web pages that link to more web pages within the set than to web pages outside the community (Flake/Lawrence/Giles 2000). Hyperlink analysis: if a web page A has a hyperlink to page B, then the author of page A usually considers that page B contains valuable information that is related to page A; hyperlinks therefore convey a considerable amount of latent human judgment.

3.1 Constructing Web Community: 

3.1 Constructing Web Community Given a query/topic, a web community consists of two distinct, but interrelated, types of pages: authority pages (authorities) and hub pages (hubs). Figure 2. A skeletal web community (hubs and authorities).

Slide21: 

Construct Good Quality Web Page Communities [Diagram labels: World Wide Web; Search Engines; Concerned Info. Space; Web Page Community (Hubs, Authorities)]

Slide22: 

Kleinberg’s HITS Algorithm (Hypertext Induced Topic Search) [Diagram labels: Concerned Info. Space; Calculate Authority/Hub values; Web Page Community] R: root set of pages. B: base set of pages. R construction: collect the r highest-ranked pages from the search results. B construction: add to R the pages that are pointed to by, or point to, the pages in R.

HITS Algorithm: 

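HITS Algorithm Hub and authority value computation: "p → q" denotes "page p has a hyperlink to page q". For each page p in the base set, $a(p) = \sum_{q \to p} h(q)$ and $h(p) = \sum_{p \to q} a(q)$. Normalize the a and h vectors after each iteration. Improved HITS algorithm (Bharat and Henzinger, 1998): the same updates with weighted links, $a(p) = \sum_{q \to p} h(q) \cdot \mathrm{auth\_wt}(q, p)$ and $h(p) = \sum_{p \to q} a(q) \cdot \mathrm{hub\_wt}(p, q)$.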
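The hub and authority iteration can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: each authority value is the sum of the hub values of the pages linking in, each hub value is the sum of the authority values of the pages linked to, and both vectors are normalized to unit length after every round. The toy graph and the names `hits` and `toy_links` are hypothetical.

```python
import math

def hits(links, iterations=50):
    """links maps a page to the set of pages it points to; returns (authority, hub)."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(p): sum of h(q) over all q with q -> p
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]
        # h(p): sum of a(q) over all q with p -> q
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}
        # normalize both vectors to unit Euclidean length
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return auth, hub

# Hypothetical toy graph: h1 and h2 point at a1 and a2; h2 also points at a3.
toy_links = {"h1": {"a1", "a2"}, "h2": {"a1", "a2", "a3"}}
authority, hub_score = hits(toy_links)
```

In this toy graph a1 and a2 receive links from both hubs and therefore end up with a higher authority value than a3, while h2 is the stronger hub.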

Slide24: 

Problems with various HITS algorithms [Diagram labels: Concerned Info. Space; Community Construction Algorithms; Web Page Community] R: root set of pages. B: base set of pages. R construction: collect the r highest-ranked pages from the search results. B construction: add to R the pages that are pointed to by, or point to, the pages in R. Problems: topic drift; noise pages.

Example: 

Example Noise page: a page that contains no query terms. Experiment: query term: Harvard; search engine: AltaVista; size of base set: 8064; size of root set: 200.

Slide26: 

Ten arbitrarily selected noise pages

Slide27: 

Ten arbitrarily selected topic-related pages

Slide28: 

Authorities & Hubs (by HITS algorithm)

Slide29: 

Matrix Approach & Noise Page Elimination [Diagram labels: Concerned Info. Space; Noise Page Elimination Algorithms; Community Construction Algorithms; Good Quality Web Page Community]

Hyperlink Analysis via SVD: 

Hyperlink Analysis via SVD Hyperlinks → hyperlink matrix → singular value decomposition of the hyperlink matrix → intrinsic relationships between the pages.

SVD of a Matrix: 

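SVD of a Matrix Definition of SVD: let $A$ be an $m \times n$ real matrix of rank $r$; then $A = U \Sigma V^{T}$, where $U$ ($m \times m$) and $V$ ($n \times n$) are orthogonal matrices and $\Sigma$ is an $m \times n$ diagonal matrix whose nonzero diagonal entries are the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$.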

Slide32: 

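Approximation Property $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^{T}$, where $\sigma_i$ are the $k$ largest singular values of $A$ and $u_i$, $v_i$ the corresponding left and right singular vectors, is the best approximation to $A$ with rank $k$ ($k \le \mathrm{rank}(A)$).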

Noise Page Elimination Algorithm: 

Noise Page Elimination Algorithm Main procedure of the algorithm: hyperlink matrix A → SVD of A → approximation matrix Ak. Ak captures the main structure information of A; Ak filters out minor factors in A; for a proper k, the computing cost can be reduced.
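To illustrate how a low-rank approximation keeps the main link structure and damps minor factors, here is a minimal pure-Python sketch of the rank-1 special case (k = 1). It finds the leading singular triple by power iteration instead of a full SVD. This is a swapped-in simplification for illustration only, not the Noise Page Elimination Algorithm itself, and the toy link matrix is hypothetical.

```python
import math

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rank1_approximation(A, iterations=200):
    """Return (sigma1, A1) with A1 = sigma1 * u1 * v1^T, the rank-1 approximation of A."""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iterations):
        # power iteration on A^T A converges to the leading right singular vector
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))   # leading singular value
    u = [x / sigma for x in Av]                  # leading left singular vector
    A1 = [[sigma * ui * vj for vj in v] for ui in u]
    return sigma, A1

# Hypothetical link matrix: three pages all link to targets 0 and 1;
# the extra link in the last row/column plays the role of a minor factor.
A = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0]]
sigma1, A1 = rank1_approximation(A)
frobenius_sq = sum(x * x for row in A for x in row)
residual_sq = sum((A[i][j] - A1[i][j]) ** 2 for i in range(3) for j in range(3))
```

The entries of `A1` for the shared links stay close to 1, while the entry for the isolated link is reduced, which is the filtering effect the slide attributes to Ak.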

Slide34: 

Authorities & Hubs after Elimination (NPEA and HITS algorithm)

3.2 Finding Related Web Pages: 

3.2 Finding Related Web Pages Given a URL or web page, find a set of pages that address the same topic as the original page. Co-citation algorithm: the sibling pages of the given URL u that have the largest number of common parent pages with u are the related pages. Given a query, with the HITS and Improved HITS algorithms, the authority pages with the highest authority values are the related pages.

Slide36: 

Finding related pages: Co-Citation Algorithm
Given page (URL) u, the page source S is constructed in the following way:
1. choose up to B (e.g. 2000) arbitrary parents of u;
2. for each of these parents p, add to S up to BF (e.g. 8) children of p that surround the link from p to u. The elements of S are siblings of u;
3. for each s in S, determine the co-citation degree of s and u;
4. return the 10 pages that have the highest co-citation degrees with u as the relevant pages.
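A minimal Python sketch of the co-citation ranking follows. It assumes the co-citation degree of s and u is the number of parents they share, and it simplifies "children that surround the link from p to u" to "the first BF children of p other than u". The function name, data layout and toy graph are hypothetical.

```python
def cocitation_related(u, children_of, B=2000, BF=8, top=10):
    """Rank the siblings of u by the number of parents they share with u."""
    # derive the parent lists from the child lists
    parents_of = {}
    for parent, children in children_of.items():
        for child in children:
            parents_of.setdefault(child, []).append(parent)
    parents = parents_of.get(u, [])[:B]          # up to B parents of u
    siblings = set()
    for p in parents:
        others = [c for c in children_of[p] if c != u]
        siblings.update(others[:BF])             # up to BF children per parent
    chosen = set(parents)
    degree = {s: len(chosen & set(parents_of.get(s, []))) for s in siblings}
    # highest co-citation degree first; ties broken by name for determinism
    return sorted(degree, key=lambda s: (-degree[s], s))[:top]

# Hypothetical toy graph: p1, p2, p3 all cite u; "a" is co-cited three times.
toy_children = {
    "p1": ["u", "a", "b"],
    "p2": ["u", "a"],
    "p3": ["u", "a", "c"],
    "p4": ["b", "c"],
}
related = cocitation_related("u", toy_children)
```

Here `related` lists "a" first because it shares all three parents with u, followed by "b" and "c", which share one parent each.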

Slide37: 

Extended Co-Citation Algorithm Given page u: choose up to B parent pages of u, and add up to BF children pages (different from u) of each parent page; choose up to F children pages of u, and add up to FB parent pages (different from u) of each child page. B, F, BF and FB are used to keep the page source to a reasonable size. In practice, choose B = FB = 200, F = BF = 40.

3.3 Web Page Clustering: 

3.3 Web Page Clustering Given a set of pages, the key is to find a similarity measure between the pages. Text content based algorithms: regard a web page as a set of words or phrases; each page is represented as a feature vector; the similarity is measured as $sim(P_i, P_j) = \frac{\sum_{k=1}^{L} w_{ik} w_{jk}}{\sqrt{\sum_{k=1}^{L} w_{ik}^2}\,\sqrt{\sum_{k=1}^{L} w_{jk}^2}}$, where $w_{ik}$ is the k-th component of the feature vector of the i-th page, and L is the length of the feature vectors.

Slide39: 

Hyperlink based algorithms Linkage vector algorithm: each page P is represented as two linkage vectors, Pout and Pin. The similarity is defined using the dot product of the vectors Pout and Qout.

Web Page Clustering : 

Web Page Clustering Page Source Construction: given a user’s query topic, page source S = R ∪ BV ∪ FV with hyperlinks.

Slide41: 

Page Similarity Assume: size(R) = m, size(V) = n, V = BV ∪ FV. Hyperlinks adjacency matrix: C.

Slide42: 

Out-link relationship of page i in R; in-link relationship of page i in R; out-link similarity of pages i and j; in-link similarity of pages i and j.

Slide43: 

Similarity between any two pages i and j in R: Similarity Matrix $SM = (sm_{i,j})_{m \times m}$, where $MOD_{ij} = \|row_i\| + \|row_j\| + \|col_i\| + \|col_j\|$

Slide44: 

Matrix-Based Clustering Algorithms Three main steps: (1) Similarity matrix permutation. Purpose: put closely related pages together in the similarity matrix SM. (2) Matrix partition. Purpose: cluster pages at one level with the partition. (3) Hierarchical clustering: recursively apply the above two steps to the existing clusters.

Slide45: 

Matrix-based hierarchical clustering diagram: page source → similarity matrix SM (m×m) → permuted similarity matrix SM’ (m×m) → matrix partition (CL1, CL2) → hierarchical clusters.

Slide46: 

An example of matrix permutation and partition

(a) A similarity matrix

      P1   P2   P3   P4   P5   P6   P7   P8   P9
P1    1    0    0    0    .47  0    .6   0    0
P2    0    1    0    0    0    0    .49  0    0
P3    0    0    1    0    0    0    0    .7   .66
P4    0    0    0    1    0    0    0    0    .42
P5    .47  0    0    0    1    0    .9   0    0
P6    0    0    0    0    0    1    0    .92  .45
P7    .6   .49  0    0    .9   0    1    0    0
P8    0    0    .7   0    0    .92  0    1    0
P9    0    0    .66  .42  0    .45  0    0    1

(b) Permuted matrix and partition

      P1   P5   P7   P2   P4   P9   P3   P8   P6
P1    1    .47  .6   0    0    0    0    0    0
P5    .47  1    .9   0    0    0    0    0    0
P7    .6   .9   1    .49  0    0    0    0    0
P2    0    0    .49  1    0    0    0    0    0
P4    0    0    0    0    1    .42  0    0    0
P9    0    0    0    0    .42  1    .66  0    .45
P3    0    0    0    0    0    .66  1    .7   0
P8    0    0    0    0    0    0    .7   1    .92
P6    0    0    0    0    0    .45  0    .92  1

Cluster1 = { P1, P5, P7, P2 }, Cluster2 = { P4, P9, P3, P8, P6 }
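The permutation and partition in this example can be checked with a short Python sketch. It takes the page order P1, P5, P7, P2, P4, P9, P3, P8, P6 from the slide as given (it does not compute the permutation), reorders the similarity matrix accordingly, and verifies that no similarity crosses the split between the two clusters. The helper name `permute` is hypothetical.

```python
SM = [
    [1, 0, 0, 0, .47, 0, .6, 0, 0],
    [0, 1, 0, 0, 0, 0, .49, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, .7, .66],
    [0, 0, 0, 1, 0, 0, 0, 0, .42],
    [.47, 0, 0, 0, 1, 0, .9, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, .92, .45],
    [.6, .49, 0, 0, .9, 0, 1, 0, 0],
    [0, 0, .7, 0, 0, .92, 0, 1, 0],
    [0, 0, .66, .42, 0, .45, 0, 0, 1],
]

def permute(matrix, order):
    """Reorder rows and columns of a square matrix by 1-based page numbers."""
    idx = [p - 1 for p in order]
    return [[matrix[i][j] for j in idx] for i in idx]

order = [1, 5, 7, 2, 4, 9, 3, 8, 6]      # page order taken from the slide
permuted = permute(SM, order)

split = 4                                 # first four pages form Cluster1
cluster1 = ["P%d" % p for p in order[:split]]
cluster2 = ["P%d" % p for p in order[split:]]
# largest similarity between a page of Cluster1 and a page of Cluster2
cross_similarity = max(permuted[i][j]
                       for i in range(split)
                       for j in range(split, len(order)))
```

After the permutation all nonzero similarities sit in two diagonal blocks, so `cross_similarity` is 0 and the split after the fourth page separates the clusters cleanly.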

Slide47: 

Examples of Some Major Clusters

4. Discovering User Access Pattern via Probabilistic Latent Factor Model: 

4. Discovering User Access Pattern via Probabilistic Latent Factor Model Web usage mining includes three steps. Data pre-processing: selecting specific information from the web log. Pattern mining: discovering general patterns. Pattern application: utilizing the mined patterns to provide better recommendation / personalization.

Web usage mining: 

Web usage mining

Web data model: 

Web data model Usage data: web page identification; web user sessionization. Linkage data: hyperlinks and hyperlink transitivity. Integrating linkage data into usage data. Apply PLFM.

Web data model: 

Web data model A data model for analyzing web hyperlink information and web usage data. User access interest may be reflected by the varying degree of visits to different web pages during a session. Mutual linking reflects the content relevance provided by the designer. Figure: a skeletal Web data model.

Usage data: 

Usage data Page identification: P = {p1, p2, …, pn}. User sessionization: S = {s1, s2, …, sm}^T. Matrix expression: each user session is considered as an n-dimensional page vector si = {ai1, ai2, …, ain}, where aij denotes the weight of pageview pj in user session si. Session-page matrix: SP (m × n).

Matrix expression Entry (weight) is determined by visiting duration or hit numbers: 

Matrix expression Entry (weight) is determined by visiting duration or hit numbers

Linkage data: 

Linkage data Linkage can reveal semantic information between web pages. If a hyperlink is reasonable, it may reveal a mutual semantic relationship among the web pages. If two users visited two web pages that are linked to each other, they exhibit some similar interests.

Linkage data (cont.): 

Linkage data (cont.) Definition 1: If there is a direct link from page A to page B, then the length of the path from page A to page B is 1, denoted as l(A,B) = 1. If page A has a link to page B via n other pages, then l(A,B) = n+1. The distance from page A to page B, denoted as sl(A,B), is the shortest path length from A to B, i.e. sl(A,B) = min(l(A,B)). Definition 2: The correlation factor, denoted as F, 0 < F < 1, is a constant that measures the correlation coefficient between two pages with a direct link, i.e. if page A has a direct link to page B, then the correlation rate from page A to page B is F. Definition 3: The correlation degree from page i to page j, denoted as cij, is defined as $c_{ij} = F^{sl(i,j)}$.
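The three definitions can be turned into a small Python sketch: a breadth-first search gives the shortest path length sl(i,j), and the correlation degree is F raised to that length. Two conventions here are assumptions not stated on the slide: a page has distance 0 to itself (so its correlation degree is 1), and an unreachable page gets correlation degree 0. The function names and toy site are hypothetical.

```python
from collections import deque

def shortest_lengths(links, source):
    """Breadth-first search: shortest link-path length from source to every reachable page."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist

def correlation_matrix(pages, links, F=0.5):
    """c_ij = F ** sl(i, j); assumed 1 on the diagonal and 0 when j is unreachable from i."""
    matrix = []
    for i in pages:
        dist = shortest_lengths(links, i)
        matrix.append([F ** dist[j] if j in dist else 0.0 for j in pages])
    return matrix

# Hypothetical three-page site: A links to B, B links to C.
pages = ["A", "B", "C"]
links = {"A": ["B"], "B": ["C"]}
C = correlation_matrix(pages, links, F=0.5)
```

With F = 0.5 the correlation degree from A to B is 0.5 and from A to C (two links away) is 0.25, while nothing links back from C to A.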

Usage data and linkage data integration: 

Usage data and linkage data integration The session-pageview matrix is multiplied by the correlation matrix to generate hyperlink-enhanced sessions. A simple example: for s1 = (1,0) and s2 = (0,1), the cosine similarity between the enhanced sessions is 0.45 instead of 0 when F = 0.5.
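The figure of 0.45 can be reproduced with a short Python sketch. The correlation matrix is not preserved in this transcript, so the one used here is an assumption: page p1 links directly to page p2 but not back, giving C = [[1, F], [0, 1]]. Under that assumption the numbers match the slide.

```python
import math

def enhance(session, C):
    """Multiply a session row vector by the correlation matrix C."""
    return [sum(session[i] * C[i][j] for i in range(len(session)))
            for j in range(len(C[0]))]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

F = 0.5
# Assumed correlation matrix: p1 -> p2 is a direct link, p2 -> p1 is not linked.
C = [[1.0, F],
     [0.0, 1.0]]
s1, s2 = [1, 0], [0, 1]

plain_similarity = cosine(s1, s2)
enhanced_similarity = cosine(enhance(s1, C), enhance(s2, C))
```

The plain sessions share no page, so their cosine similarity is 0; after enhancement s1 becomes (1, 0.5), and the similarity rises to 1/√5 ≈ 0.45.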

An example: 

An example
1) Main Movies: 20 sec; Movies News: 15 sec; News Box: 43 sec; Box-Office Evita: 52 sec; News Argentina: 31 sec; Evita: 44 sec
2) Music Box: 11 sec; Box-Office Crucible: 12 sec; Crucible Book: 13 sec; Books: 19 sec
3) Main Movies: 33 sec; Movies Box: 21 sec; Box-Office Evita: 44 sec; News Box: 53 sec; Box-Office Evita: 61 sec; Evita: 31 sec
4) Main Movies: 19 sec; Movies News: 21 sec; News Box: 38 sec; Box-Office Evita: 61 sec; News Evita: 24 sec; Evita News: 31 sec; News Argentina: 19 sec; Evita: 39 sec
5) Movies Box: 32 sec; Box-Office News: 17 sec; News Jordan: 64 sec; Box-Office Evita: 19 sec; Evita: 50 sec
6) Main Box: 17 sec; Box-Office Evita: 33 sec; News Box: 41 sec; Box-Office Evita: 54 sec; Evita News: 56 sec; News: 47 sec

Probabilistic latent semantic analysis model (PLSA) : 

Probabilistic latent semantic analysis model (PLSA) The core of PLSA is an aspect model, which can be used to identify the hidden semantic relationships among web pages and users. Aspect model assumes that there is a latent factor variable zk ∈ Z = {z1, z2, · · · , zl} associated with co-occurrence observation data, e.g. page-session data. The degree of relationships “explained” by each factor, is derived from the conditional probabilities associated with factors.

Probabilities definitions: 

Probabilities definitions P(si) denotes the probability that a particular user session si will be observed in the occurrence data; P(zk|si) denotes a user session-specific probability distribution over the unobserved class factor zk explained above; P(pj|zk) denotes the class-conditional probability distribution of pages over a specific latent variable zk.

Probabilistic latent semantic model: 

Probabilistic latent semantic model Generative process: (1) select a user session si with probability P(si); (2) pick a hidden factor zk with probability P(zk|si); (3) generate a page pj with probability P(pj|zk).

Generated conditional-probabilities: 

Generated conditional-probabilities

Slide62: 

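Formulation of PLSA Re-parameterized version: $P(s_i, p_j) = \sum_{z_k \in Z} P(z_k)\,P(s_i|z_k)\,P(p_j|z_k)$. Total likelihood: $L_i = \sum_{s_i \in S} \sum_{p_j \in P} m(s_i, p_j) \log P(s_i, p_j)$, where $m(s_i, p_j)$ is the entry of the session-page matrix for session $s_i$ and page $p_j$. Goal: maximize $L_i$.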

Expectation & Maximization (EM) algorithm : 

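Expectation & Maximization (EM) algorithm Firstly, randomized initial values of P(zk), P(si|zk), P(pj|zk) are given. Expectation (E) step: $P(z_k|s_i,p_j) = \frac{P(z_k)\,P(s_i|z_k)\,P(p_j|z_k)}{\sum_{z_{k'} \in Z} P(z_{k'})\,P(s_i|z_{k'})\,P(p_j|z_{k'})}$. Maximization (M) step: $P(p_j|z_k) = \frac{\sum_{s_i} m(s_i,p_j)\,P(z_k|s_i,p_j)}{\sum_{s_i}\sum_{p_{j'}} m(s_i,p_{j'})\,P(z_k|s_i,p_{j'})}$, $P(s_i|z_k) = \frac{\sum_{p_j} m(s_i,p_j)\,P(z_k|s_i,p_j)}{\sum_{s_{i'}}\sum_{p_j} m(s_{i'},p_j)\,P(z_k|s_{i'},p_j)}$, $P(z_k) = \frac{1}{R}\sum_{s_i}\sum_{p_j} m(s_i,p_j)\,P(z_k|s_i,p_j)$ with $R = \sum_{s_i}\sum_{p_j} m(s_i,p_j)$, where $m(s_i,p_j)$ is the weight of page $p_j$ in session $s_i$.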

Implementation of EM algorithm: 

Implementation of EM algorithm The E-step and M-step are iterated until Li approaches a local optimum. We then obtain the conditional probability estimates P(zk), P(si|zk), P(pj|zk) corresponding to the local maximum of the likelihood Li. These estimates are then used to perform web clustering and to induce user access patterns.
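The EM iteration can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard PLSA updates for the parameters P(zk), P(si|zk), P(pj|zk) and a weighted log-likelihood over the session-page matrix; it is not the authors' code, and the function name, the seed and the toy matrix are hypothetical.

```python
import math
import random

def plsa_em(SP, n_factors, iterations=30, seed=0):
    """Fit P(z_k), P(s_i|z_k), P(p_j|z_k) to a session-page weight matrix SP by EM."""
    rng = random.Random(seed)
    m, n = len(SP), len(SP[0])

    def random_distribution(size):
        values = [rng.random() + 0.01 for _ in range(size)]
        total = sum(values)
        return [v / total for v in values]

    Pz = random_distribution(n_factors)
    Ps_z = [random_distribution(m) for _ in range(n_factors)]  # Ps_z[k][i] = P(s_i|z_k)
    Pp_z = [random_distribution(n) for _ in range(n_factors)]  # Pp_z[k][j] = P(p_j|z_k)
    log_likelihoods = []
    for _ in range(iterations):
        acc_z = [0.0] * n_factors
        acc_s = [[0.0] * m for _ in range(n_factors)]
        acc_p = [[0.0] * n for _ in range(n_factors)]
        log_likelihood = 0.0
        for i in range(m):
            for j in range(n):
                weight = SP[i][j]
                if weight == 0:
                    continue
                # E-step: posterior P(z_k | s_i, p_j) equals joint[k] / total
                joint = [Pz[k] * Ps_z[k][i] * Pp_z[k][j] for k in range(n_factors)]
                total = sum(joint)
                log_likelihood += weight * math.log(total)
                for k in range(n_factors):
                    r = weight * joint[k] / total
                    acc_z[k] += r
                    acc_s[k][i] += r
                    acc_p[k][j] += r
        log_likelihoods.append(log_likelihood)
        # M-step: re-estimate the three distributions from the weighted posteriors
        grand_total = sum(acc_z)
        for k in range(n_factors):
            denom = acc_z[k] or 1.0
            Ps_z[k] = [v / denom for v in acc_s[k]]
            Pp_z[k] = [v / denom for v in acc_p[k]]
        Pz = [v / grand_total for v in acc_z]
    return Pz, Ps_z, Pp_z, log_likelihoods

# Hypothetical toy data: sessions 0-1 visit pages 0-1, sessions 2-3 visit pages 2-3.
toy_SP = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
Pz, Ps_z, Pp_z, log_likelihoods = plsa_em(toy_SP, n_factors=2)
```

The recorded log-likelihood never decreases from one iteration to the next, which is the property that lets the loop stop at a local optimum. P(zk|si), needed for session clustering, follows from Bayes' rule as P(si|zk)P(zk) normalized over k.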

User session clustering and user profile generating algorithm : 

User session clustering and user profile generating algorithm For each user session si, we can compute a set of probabilities P(zk|si) over the different latent class factors.
Algorithm 1 Clustering user sessions
Input: P(zk|si), user session-page matrix SPij, threshold μ.
Output: a set of clusters SCL = (SCL1, SCL2, …, SCLk)
Step 1: SCL1 = SCL2 = … = SCLk = Φ
Step 2: For each si ∈ S, select P(zk|si); if P(zk|si) ≥ μ, then SCLk = SCLk ∪ si
Step 3: If there are still user sessions to be clustered, go back to step 2
Step 4: Return clusters SCL = {SCLk}

User profile generating: 

User profile generating
Algorithm 2 Generating user profiles
Input: session cluster set SCL = {SCLk}
Output: user profiles PF
Step 1: for each factor zk, choose all candidate sessions in SCLk
Step 2: represent each session as a pageview vector and compute the centroid pageview vector of the cluster as the mean of these vectors, i.e. their sum divided by |R|, where |R| denotes the total number of sessions in the cluster
Step 3: if there are still user session clusters to be processed, go back to step 1
Step 4: output the centroid pageview vector as the aggregated user profile corresponding to each factor zk
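Session clustering by threshold and profile generation by centroid can be sketched together in Python. This is a minimal illustration under two assumptions: a session joins every cluster k for which P(zk|si) ≥ μ, and the profile of a cluster is the plain mean of its session vectors. The function names and toy values are hypothetical.

```python
def cluster_sessions(Pz_given_s, mu):
    """Session i joins cluster k whenever P(z_k | s_i) >= mu."""
    n_factors = len(Pz_given_s[0])
    clusters = [[] for _ in range(n_factors)]
    for i, probabilities in enumerate(Pz_given_s):
        for k, p in enumerate(probabilities):
            if p >= mu:
                clusters[k].append(i)
    return clusters

def user_profiles(SP, clusters):
    """Profile of a cluster: centroid (mean) of the pageview vectors of its sessions."""
    n_pages = len(SP[0])
    profiles = []
    for members in clusters:
        if not members:
            profiles.append(None)   # no session passed the threshold for this factor
            continue
        profiles.append([sum(SP[i][j] for i in members) / len(members)
                         for j in range(n_pages)])
    return profiles

# Hypothetical inputs: three sessions, two latent factors, two pages.
Pz_given_s = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
SP = [[4, 0], [2, 0], [0, 6]]
clusters = cluster_sessions(Pz_given_s, mu=0.5)
profiles = user_profiles(SP, clusters)
```

With μ = 0.5 the first two sessions fall into the cluster of factor 1 and the third into the cluster of factor 2, and each profile is the average page weight within its cluster.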

Characterizing factors: 

Characterizing factors
Algorithm 3 Clustering web pages
1. Input: P(pj|zk), predefined threshold μ
2. For each zk, choose all pages with P(pj|zk) ≥ μ and construct PCLk = {pj | P(pj|zk) ≥ μ}
3. If there are still pages to be classified, go back to step 2
4. Output: PCL = {PCLk}

experimental evaluation : 

experimental evaluation Real world data sets: The first data set is downloaded from the KDDCUP website (www.ecn.purdue.edu/KDDCUP/) and includes 9308 user sessions and 69 pageviews, where every session consists of 11.88 pageviews on average. The second data set is downloaded from msnbc.com (http://kdd.ics.uci.edu/databases/) and describes the page visits by users who visited msnbc.com on September 28, 1999.

Experimental results : 

Experimental results Table 1: factor examples from msnbc data set Table 2: factor examples from KDDCUP data set

Explanations of results: 

Explanations of results In Table 1, the two extracted factors indicate that factor #1 is associated with all kinds of local information coming from miscellaneous information channels such as bbs, while factor #2 reflects interests or opinions that are often linked with health, sport and technical developments in physical exercise.

Explanations of results: 

Explanations of results In Table 2, factor #3 indicates concerns about vendor service messages such as customer service, contact number, payment methods and delivery support. Factor #8 describes a specific process that may include customer login, product order, express checkout and financial information input, i.e. the steps that occur in an internet shopping scenario.

Table 3: User access profiles from msnbc data set: 

Table 3: User access profiles from msnbc data set

Explanations of results: 

Explanations of results In Table 3, user profile #1 reveals that this group of users is interested in local and miscellaneous information, while user profile #2 captures the common intention of users who want to share all kinds of opinions on health, sport and technology issues. Such interpretations of the generated user profiles are in accordance with the meaning of the unobserved factors discovered above.

Table 4: User access profiles from KDDCUP data set : 

Table 4: User access profiles from KDDCUP data set

Explanations of results: 

Explanations of results In Table 4, user profile #8 represents internet-shopping activities in detail, especially those that occur when purchasing leg-wear products or fashion clothes, whereas user profile #14 reflects one kind of customer concern exhibited while visiting the website, which is mainly focused on information about the department store itself.

Discovery of Latent Factor and User Profile: 

Discovery of Latent Factor and User Profile
Algorithm 4 Characterizing latent factors
Input: P(pj|zk) and P(zk|pj), predefined threshold μ
Output: a set of characteristic page base sets LF = (LF1, LF2, …, LFk)
1. LF1 = LF2 = ··· = LFk = Φ
2. For each zk, choose all pages pj ∈ P: if P(pj|zk) ≥ μ and P(zk|pj) ≥ μ, then LFk = LFk ∪ pj; else go back to step 2
3. If there are still pages to be classified, go back to step 2
4. Output: LF = {LFk}

Clustering Web Sessions : 

Clustering Web Sessions
Algorithm 5 Clustering Web Sessions (i)
Input: the set of web sessions, predefined threshold μ
Output: a set of user session clusters
1. Select the first session s1 as the initial cluster SCL1 and as the centroid of this cluster: SCL1 = {s1} and Cid1 = s1.
2. For each session si, measure the similarity with the centroid of each existing cluster, sim(si, Cidj).
3. If the highest similarity, sim(si, Cidt), is above the threshold μ, then insert si into the cluster SCLt and update the centroid of SCLt as $Cid_t = \frac{1}{|SCL_t|} \sum_{j \in SCL_t} sr_j$

Clustering Web Session (cont.): 

Clustering Web Session (cont.) Algorithm 5 Clustering Web Sessions (ii) where srj is the transformed user session over the factor space and |SCLt| is the number of sessions in the cluster. Otherwise, the session creates a new cluster and is the centroid of the new cluster. If there are still sessions to be classified into one of the existing clusters, or a session that is itself a cluster, go back to step 2, iterating until convergence (i.e. until the centroids of all clusters no longer change). Output: SCL = {SCLp}, Cid = {Cidp}
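One pass of this incremental assignment can be sketched in Python as below. The sketch assumes cosine similarity and a strict "greater than μ" test, and it stops after a single pass over the sessions; the repeated passes until the centroids stop changing are left out. The function names and toy sessions are hypothetical.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def one_pass_clustering(sessions, mu):
    """Assign each session to the most similar cluster, or start a new cluster."""
    clusters = [[0]]                      # the first session seeds the first cluster
    centroids = [list(sessions[0])]
    dims = len(sessions[0])
    for i in range(1, len(sessions)):
        sims = [cosine(sessions[i], c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] > mu:
            clusters[best].append(i)
            members = clusters[best]
            # centroid = mean of the member session vectors
            centroids[best] = [sum(sessions[t][d] for t in members) / len(members)
                               for d in range(dims)]
        else:
            clusters.append([i])          # the session becomes its own centroid
            centroids.append(list(sessions[i]))
    return clusters, centroids

# Hypothetical sessions already expressed over a two-factor space.
toy_sessions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
session_clusters, session_centroids = one_pass_clustering(toy_sessions, mu=0.8)
```

In this toy run the first two sessions form one cluster and the last two form another, because the third session is nearly orthogonal to the first centroid and therefore opens a new cluster.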

Web Recommendation: 

Web Recommendation

Experimental Results : 

Experimental Results Real world data sets: The first data set is downloaded from KDDCUP and includes 9308 user sessions and 69 pageviews (KDD dataset). The second data set is a 2-week web log file from a university website containing 13745 sessions and 683 pages (CTI dataset).

Evaluation Metrics : 

Evaluation Metrics For web session clusters: Weighted Average Visit Percentage (WAVP), the likelihood that a user session which contains any pages of the session cluster will include the rest of the pages in the cluster during the same session.

Evaluation Metrics: 

Evaluation Metrics For web recommendation: hit precision (hitp), defined in terms of S(N) and |T|, which denote the number of pages allocated in the top-N recommended set and the test set size respectively. Baseline method: clustering-based method [5].

Table 1: Titles of factors from KDDCUP : 

Table 1: Titles of factors from KDDCUP

Table 2: Titles of factors from CTI: 

Table 2: Titles of factors from CTI

Analysis on the results: 

Analysis on the results These tables show that the titles of the latent factors are characterized by some “dominant” pages whose probabilistic weights exceed a predefined threshold. The titles are obtained by interpreting the content of the corresponding pages, since these “dominant” pages contribute greatly to the latent factors. With the derived characteristic factors, we may semantically discover usage-based task patterns.

Experimental Results (Fig.1:WAVP): 

Experimental Results (Fig.1:WAVP)

Experimental Result (Fig.2: hitp): 

Experimental Result (Fig.2: hitp)

Result analysis : 

Result analysis Figures 1 and 2 depict the results for WAVP and hitp. The results demonstrate that the proposed technique outperforms the conventional algorithm in terms of clustering quality and recommendation accuracy. Moreover, this approach is able to identify the hidden factors associated with task patterns.

E-Health with WHO : 

E-Health with WHO Collaboration with the World Health Organization (WHO): develop a joint project on healthcare data collection, monitoring and risk factor analysis; apply transparent computing and heterogeneous system integration to e-Health.

Outline of WHO eSTEPS, EPi-Survey/EPi-SMART: 

Outline of WHO eSTEPS, EPi-Survey/EPi-SMART The WHO STEPwise approach to surveillance (STEPS) is the WHO-recommended surveillance tool. The tool is used for the monitoring and analysis of chronic disease risk factors and of chronic disease-specific morbidity and mortality.

Outline of WHO eSTEPS/EPiSurvey: 

Outline of WHO eSTEPS/EPiSurvey The tool provides an entry point for low and middle income countries to get started on chronic disease surveillance activities. It is also designed to help countries build and strengthen their capacity to conduct surveillance.

The Design of EPiSurvey: 

The Design of EPiSurvey EPiSurvey employs the latest web techniques to provide a web-based questionnaire designer; is intended to run on Pocket PCs; also runs on other hardware/operating system platforms such as PalmOS; and provides additional analysis and calculation functionality, as well as reliability, for PDA applications.

The Architecture of EPiSurvey: 

The Architecture of EPiSurvey The architecture of EPiSurvey consists of four components: front-end Pocket PC component; Web Questionnaire Designer; W-eSTEPS Manager; Web Data Management.

e Health : 

e Health Cluster Analysis of Gait Patterns for Detecting Risk of Falling. Issue: Australia is facing an ageing population. Falls and related injuries in the elderly are a major public health issue (costing Australia ~$2.4 billion pa). Little has been done so far for the automated detection of potential fallers. Solution: Gait or walking pattern analysis can detect abnormalities and evaluate walking performance. Cluster analysis of gait features will be used to identify potential fallers and to assess any improvement in gait function due to various interventions (e.g., an exercise program). Impact: This research will develop an automated diagnostic system to screen potential fallers from gait/balance measures so that steps can be taken to prevent falls. This model can also be applied to other pathologies (e.g., stroke). It will strengthen research collaboration with other universities, NARI, nursing homes, etc. to develop an innovative way of preventing falls in the elderly.

e-Water: Three Stage Project : 

e-Water: Three Stage Project The challenging issues involve semantic and heterogeneous data integration, metadata extraction and schema transformation, modeling and model interoperability in water management domain as well as accessibility and security. Stage 1: integration of data collected in individual cases, such as Queensland Healthy Waterways Partnership, the Yellow River and Erdos Basin (China), and make them available to research collaborators; Stage 2: accessibility of data through the internet to enable closer collaboration; and Stage 3: binding of models, using the existing modelling environment to form the basis for further studies on application to a wider set of data and modelling environment.

e Law: 

e Law Peer-to-peer collaborative research network for sharing and managing digital legal information. Issue: the p2p network is one of the most popular Internet applications, but the open and anonymous nature of p2p networks results in a complete lack of accountability for any content uploaded onto the network, in widespread illegal distribution of copyright-protected works, and in networks ‘polluted’ with inauthentic files, opening the door to abuses by malicious peers. Solution: collaboration between legal and research professionals to create a p2p research network that enables effective control over uploaded content, issues a range of “machine or network readable” licenses, and facilitates open exchange of digital legal information. Impact: the p2p network will contribute to the advancement of knowledge and to development in non-traditional applications, pave a way to overcome legal barriers for p2p networks to share resources, and significantly expand the scope of e-research for sharing and managing digital legal information. This work is supported by the ARC e-Research Scheme.

Slide104: 

[Flowchart: licence selection. What rights do you want to grant? Reproduce & Modify; Reproduce Only; Personal Use Only. Do you want modifications to be licensed on the same terms? YES / NO. Do you want to restrict the use of the work? No restriction; Non-Commercial Use Only; Educational Use Only. Resulting licence types: A, B, C, D, E, F, G, H, I.]

Abstraction Levels: 

Abstraction Levels [Diagram labels: JXTA Services; Abstraction Levels; Conceptual Level; Petri Net Model for Conceptual Level; Document; Document Licence Services; Modified Document; Modifiable Document; Reproducible Document; Personal Use Only Document; Document Modification; Licensed Document; Requirements; Collaboration Levels]