Semantic Web and Web Services in KEG
Jie Tang
Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University
July 4, 2006

Outline:
- Knowledge Engineering Group (KEG)
- Our Research Work
  - Ontology Development
  - Semantic Web Availability
  - Domain Knowledge Management
  - Web Service
- Prototype
  - Personal Network Search, SWARMS, and SEWSIP
- Publications

Knowledge Engineering Group (KEG):
- Staff
  - Professor: Kehong Wang
  - Associate Professors: Juanzi Li and Yueru Cai
  - Assistant Professors: Bin Xu and Jie Tang
  - Postdoctoral researchers: 2
- Graduate Students
  - PhD students: 9 + 4 (graduated)
  - Master's students: 13
- Research Directions
  - Semantic Web
  - Web Services

Major Research Projects:
- Ontology Granularity Partition in Distributed Ontology System. Supported by the Natural Science Foundation of China (2003-2004)
- Research of Domain-Specific Semantic Content Management. Supported by the Natural Science Foundation of China (2006-2008)
- CNML Specification Management System. Co-operation project with China News Agency (2004-2005)
- Research of Service Oriented Architecture. Supported by IBM (2006-2007)
- Advanced Semantic Web Technologies to Support Ontology based Enterprise Content Management. Supported by Greece-China (2006-2007)
- The Intelligent Processing and Semi-structured Information System. Co-operation Project of Tsinghua-ITF Co-Lab (2002-2005)
- GUI XML. Co-operation Project of Tsinghua-ITF Co-Lab (2003-2005)

Our Research Work:
- Ontology Development
  - CNML (recommended as the National Standard for News Exchange)
- Semantic Web Availability
  - Semantic Annotation
  - Ontology Mapping
- Domain Knowledge Management
  - Search & Association Finding
  - Caching
  - Visualization & Navigation
- Semantic Web Service
  - Service Annotator
  - Web Service Evaluation
  - Semantic based Service Integration

Ontology Development:
- Chinese News Markup Language (CNML): for sharing and managing news in Chinese
- Travel Ontology: for exchanging travel information (http://www.luopan.com)
- Software Ontology: for managing software

Semantic Web Content Availability:
- Semantic web content: machine ‘understandable’ content
- One of the major challenges for the semantic web
- Annotating the existing web pages using the ontological information
- Integrating/exchanging heterogeneous structured information

Problems:
- Information of almost all types exists.
- Semantic annotation differs from domain to domain
- The information is noisy
- Much of the information is semi-structured
- The information structure is heterogeneous
- The data representation is different
- The amount of information is huge
- The information is redundant
- …

Our Proposal:
Our proposal is to employ machine learning for semantic annotation and ontology mapping.
- Semantic Annotation
  - iASA: learning rules for semantic annotation
  - Email annotation using a classification based method
  - Tree-structured Conditional Random Fields for sequential annotation
- Ontology Mapping
  - Risk Minimization based Ontology Mapping

iASA: Learning Annotation Rules:
- Existing semantic annotation systems usually employ manual or rule based methods
  - Manual annotation is tedious, time consuming, and error prone.
  - Conventional rule learning algorithms adopt a random strategy of rule selection during learning.
- We propose a rule learning based method that selects the ‘most’ similar rules for induction in the learning process
  - Automatically determining the context size
  - Finding similar rules
- Funded by Tsinghua-ITF Co-Lab (2002-2005) and by NSFC (2003-2004)
- A paper was published in the Journal on Data Semantics [Tang, 05]

Applications -- TIPSI:
- TIPSI
  - Extracting information from company annual reports
  - A practical project supported by ITF Frontier Co.
- Main Features
  - Table identification and table reconstruction (a paper was accepted by APWEB’06 [Li and Tang, 06])
  - Two-stage annotation: logic structure construction and semantic annotation
  - Rule learning
  - Semantic search
  - Friendly user interaction

Annotating Company Annual Report:
Shanghai Shangling Electric Appliances Co., Ltd.
1999 Annual Report
Important notice: The Board of Directors of the Company guarantees that the information in this report contains no false records, misleading statements, or material omissions, and accepts individual and joint liability for the truthfulness, accuracy, and completeness of its contents.
I. Company Profile
1. Legal Chinese name of the company: 上海上菱电器股份有限公司
   Legal English name of the company: SHANGHAI SHANGLING ELECTRIC APPLIANCES CO.,LTD.
   English abbreviation of the company: SLEC
2. Legal representative of the company: Xia Yuzhuo (夏毓灼)
3. Secretary of the Board of Directors: Cao Jun (曹俊)
   Contact address: No. 2 Jianping Road, Pudong New Area, Shanghai
   Telephone: (021)58857888 (switchboard)-2273
   Fax: (021)58857367
   Email: sldsh@shangling.com
V. Report of the Board of Directors
(1) Company operations
1. The company operates in the tire manufacturing industry, which is part of the rubber processing industry. The company is one of the four major national tire production bases planned by the former Ministry of Chemical Industry during the Ninth Five-Year Plan period, and is a key enterprise in the industry.
2. Scope and status of the company's main business
(1) During the reporting period the company achieved main business revenue of 705,190,775 yuan, down 24.84% from the same period last year, and main business profit of 125,097,141 yuan, down 36.87% from the same period last year. All main business revenue and main business profit came from the production and sale of tires.
(2) By product structure, 52.8% of main business revenue and 49.7% of main business profit came from automobile tires.

Item                     1999 (yuan)      1998 (yuan)      Change (%)
Total assets             1,877,855,011    1,303,317,456     44.08
Long-term liabilities       78,600,000       65,480,000     20.04
Shareholders' equity       797,710,985      423,623,203     88.31
Main business profit       125,097,141      198,149,514    -36.87
Net profit                  21,654,518       58,788,065    -63.17

Search in TIPSI: [no text in transcript]

Semantic Annotation on Email:
- The goal is to identify and annotate different kinds of information in emails, e.g. header, signature, forwarded message, and program code.
- Email is one of the most common modes of communication.
- Many text mining applications work on emails
  - Email classification
  - Email summarization
  - Term extraction from email
- Part of the work was done at MSRA
- We consider one application of semantic annotation on email: Email Data Cleaning
- A paper was published at SIGKDD’2005 [Tang, 05]

Email Annotation: [no text in transcript]

Our Cascaded Approach: [no text in transcript]

Implementation -- Technical Issues:
- Block Annotation
  - Forwarded message detection
  - Header detection
  - Signature detection
  - List detection
  - Program code detection
- Block Metadata Annotation
  - Header metadata detection
  - Signature metadata detection
- Extra line break detection

Block Detection Using SVMs:
- Header detection
- Signature detection
- Program code detection

Signature Detection: [no text in transcript]

Tree Structured CRFs for Hierarchical Annotation:
- The rule and classification based methods cannot model dependencies between pieces of information
- Sequential labeling models
  - HMM, MEMM, and CRFs
- Previous linear-chain models cannot handle non-linear features
  - Hierarchical dependencies
  - Document structure: HTML, XML

Hierarchical Semantic Annotation:
3. Company Directorate Info
   Company directorate secretary: Haokui Zhou
   Representative of directorate: He Zhang
   Address: No. 583-14, Road Linling, Shanghai, China
   Zipcode: 200030
   Email: ajcoob@mail2.online.sh.cn
   Phone: 021-64396600
   Fax: 021-64392118
4. Company Registration Info
   Company registration address: No. 838, Road Zhang Yang, Shanghai, China
   Zipcode: 200122
   Company office address: No. 583-14, Road Linling, Shanghai, China
   Zipcode: 200030
   Email: ajcorp@online.sh.cn
   Phone: 021-64396654
[figure labels: Dependency, Dependency]

Tree-structured CRFs:
- Linear CRFs: can only model linear dependency
- TCRFs: can model hierarchical dependency

TCRF Model: [no text in transcript]

Ontology Mapping:
- Ontology mapping is the task of finding semantic correspondences between elements of two ontologies
- It is needed in many applications
  - Integration of web data sources
  - XML message mapping
  - Query across different data sources
- The project is supported by NSFC
- Published papers: WWW’05 [Tang, 2005] and Journal of Web Semantics [Tang, 2006]

Problem Description: [no text in transcript]

Our Approach:
- Formalizing ontology mapping as a loss minimization problem
- Conducting ontology mapping by running several passes of processing
  - Candidate collection
  - Multi-strategy execution
  - Strategy combination
  - Mapping discovery
The mapping process can take place iteratively.
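
The four passes listed above (candidate collection, multi-strategy execution, strategy combination, mapping discovery) can be illustrated with a small sketch. This is not the KEG implementation: the two strategies (`name_similarity`, `token_overlap`), the equal weights, the `threshold`, and the use of a 0-1 loss are all assumptions chosen for illustration, and the combined score is simply treated as a rough estimate of the probability that a mapping is correct.

```python
import re
from difflib import SequenceMatcher


def split_tokens(name):
    """Split a name such as 'bookTitle' or 'author_name' into lowercase tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", name)]


def name_similarity(a, b):
    """Hypothetical strategy 1: character-level similarity of the two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(a, b):
    """Hypothetical strategy 2: Jaccard overlap of the name tokens."""
    ta, tb = set(split_tokens(a)), set(split_tokens(b))
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0


# Assumed strategy weights; a real system would learn or tune these.
STRATEGIES = [(name_similarity, 0.5), (token_overlap, 0.5)]


def discover_mappings(source, target, threshold=0.5):
    """Map each source element to the target element with the lowest expected loss."""
    mappings = {}
    for s in source:
        # Candidate collection: in this sketch, every target element is a candidate.
        candidates = list(target)
        best, best_loss = None, 1.0
        for t in candidates:
            # Multi-strategy execution and strategy combination (weighted sum).
            p = sum(weight * strategy(s, t) for strategy, weight in STRATEGIES)
            # Under a 0-1 loss, the expected loss of asserting s -> t is 1 - p.
            loss = 1.0 - p
            if loss < best_loss:
                best, best_loss = t, loss
        # Mapping discovery: keep the best candidate only if its loss is low enough.
        mappings[s] = best if best_loss <= 1.0 - threshold else None
    return mappings


if __name__ == "__main__":
    print(discover_mappings(["author_name", "bookTitle", "zipcode"],
                            ["title", "authorName", "price"]))
```

An iterative variant could feed the mappings a user confirms or rejects back in as fixed entries before the next pass.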
In each iteration, user interaction is supported.

Mapping Processing Flow:
Conducting mapping in the following steps [steps not present in transcript]

Domain Knowledge Management:
- Domain Knowledge Management
  - Semantic Search and Association Finding
  - Caching
  - Visualization and Navigation
- Supported by Chinese Xinhua News Agency and NSFC
- Papers were published at WWW’05 [Liang, 05], Workshop of AAAI’05 [Liang, 06], APWEB’06 [Zhang, 06]

Semantic Search: [no text in transcript]

Visualization-Navigation: [no text in transcript]

Visualization-Search: [no text in transcript]

Semantic Search:
- Index-based Search: using keywords to retrieve instances
- Constraint-based Search: setting constraints on properties and instances in the search
- Association Search: identifying two instances and finding their possible associations

Association Search:
- Searching for relationships between two entities
- For example:
  - Input “Jack” and “Tom”: “Jack” <reads> “document” <writtenby> “Tom”
  - Input “Jack” and “book1”: “Jack” <hasfriend> “Tom” <writes> “book1”
- A paper was published as a WWW’05 poster [Liang, 05]
- A paper was published at APWEB’06 [Zhang, 06]

Association Search:
- Input: Keyword1, keyword2
- Search from the centers of the two sub graphs
- Ranked paths:
  - Developer1 <develop> Project_11 <developedBy> Developer2 (score: 0.64)
  - Developer1 <develop> Project_11 <developedBy> Developer3 <develop> Project_21 <developedBy> Developer2 (score: 0.43)
  - …

Ontology Caching:
- The task is to cache the concepts/instances that are frequently accessed
- Existing Methods
  - FIFO (First In First Out)
  - LUFO (Least Used First Out)
  - Relation based method
- Our method
  - Formalizing the problem as classification
  - Features: instance similarity, hierarchical similarity, property similarity, access log, etc.
- A paper was published at APWEB’06 [Liang, 06]

Visualization and Navigation:
- A graph is used to model the domain data
- We propose using a “Focus+Context” strategy to visualize the domain data
- Using “one click” mode to navigate in the visualized data
- Constraint based navigation
  - Specifying the properties’ values to navigate to the matched instances

WSDL Data Crawler and Annotator:
- Crawler: finds web services on the Internet and filters the services of a specific domain

Service Evaluation: [no text in transcript]

Service Deployment and Discovery in P2P:
- Rough matching based on keywords
- Precise matching based on semantic description

Semantic based Service Integration: [no text in transcript]

Personal Network Search:
For finding persons…

Personal Network Search:
- An IDC white paper entitled "The High Cost of Not Finding Information" reports:
  - Information workers spend from 15% to 35% of their time searching for information.
  - About 40% of information workers complain that they cannot find the information they need to do their job
- Existing systems such as Google, Yahoo, and MSN Search have been developed for general-purpose use
- General-purpose search cannot satisfy the requirements of domain specific search, e.g. personal network search

Personal Network Search:
- Personal network search is an important research area.
- A person usually has different types of information
  - Personal profile (including portrait, homepage, position, affiliation, publications, and documents)
  - Contact information (including address, email, telephone, and fax number)
  - Friends
- Unfortunately, the information is often hidden in heterogeneous and distributed web pages

Our Approach:
Personal Network Search = Semantic Mining + Semantic Search
- Collecting relevant docs
- Semantic annotation
- Semantic integration
- Semantic mining
- Visualization

Processing Flow:
[flow diagram labels: Submitted to, Returned pages, Classifier, Fed to, Extracting and saving to, Database, Query]

Snapshots: [no text in transcript]

SWARMS:
For managing semantic content…

SWARMS:
- Designed as a tool for domain knowledge exploration
- Integrating the following research results:
  - Visualization and navigation
  - Search
  - Caching
- Design Goals
  - Easy to adapt to other domains
  - Manage large scale domain data

Implementation:
- Backend Storage: Database + Jena
- Index-based Search: Lucene + Database
- Constraint-based Search, Association Search: SPARQL based
- Visualization and Navigation: by extending JUNG

Architecture:
[architecture diagram labels: Personal Search, Database, Jena based FOAF API, FOAF repository, Lucene, Index-based repository, Index-based Search, Visualization and navigation, Constraint based and Association Search, User interface]

Snapshots of SWARMS: [no text in transcript]

Publications:
- Jie Tang, Juan-Zi Li, Hong-Jun Lu, Bang-Yong Liang, and Ke-Hong Wang. iASA: Learning to Annotate the Semantic Web. Journal on Data Semantics, IV. Springer Press, 2005. pp. 110-145.
- Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang. Email Data Cleaning. SIGKDD2005. August 21-24, 2005, Chicago, Illinois, USA. Full paper. pp. 489-499.
- Jie Tang, Bang-Yong Liang, and Juan-Zi Li. Multiple Strategies Detection in Ontology Mapping. Poster Paper. The 14th International World Wide Web Conference (WWW2005).
- Jie Tang, Juan-Zi Li, Bang-Yong Liang, Xiao-Tong Huang, Yi Li, and Ke-Hong Wang. Using Bayesian Decision for Ontology Mapping. Journal of Web Semantics. (Accepted)
- Jie Tang, Bang-Yong Liang, and Juan-Zi Li. Toward Detecting Strategies for Ontology Mapping. Workshop on Semantic Computing in the 14th International World Wide Web Conference (WWW2005).
- Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Multi-senses and Multi-Dependencies Discovery Among Words. NLP-KE2003 (2003 International Conference on Natural Language Processing and Knowledge Engineering). IEEE Press. 2003, 10.
- Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Loss Minimization based Keyword Distillation. 6th Asia-Pacific Web Conference (APWeb 2004), Springer 2004, 4, LNCS 3007, ISBN 3-540-21371-6. pp. 572-577.
- Jie Tang, Bang-Yong Liang, Juan-Zi Li, and Ke-Hong Wang. Risk Minimization based Ontology Mapping. 2004 Advanced Workshop on Content Computing (AWCC). Springer-Verlag, LNCS/LNAI.
- Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Research of Atomic and Anonymous Electronic Commerce Protocol. 9th RSFDGrC (International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing). Springer-Verlag, LNCS/LNAI 2639. 2003, 5. pp. 711-714.
- Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Research of Satisfied Atomic and Anonymous Electronic Commerce Protocol. GCC2003. Springer-Verlag, LNCS/LNAI.
- Bang-Yong Liang, Jie Tang, Juan-Zi Li, and Ke-Hong Wang. Semantic Similarity Based Ontology Cache. APWEB'06.
- Bang-Yong Liang, Jie Tang, and Juanzi Li. SWARMS: A Tool for Exploring Domain Knowledge. The Workshop of Contexts and Ontologies on the 20th International AAAI Conference (AAAI’05).
- Bang-Yong Liang, Jie Tang, and Juan-Zi Li. Association Search in Semantic Web: Search + Inference. Poster Paper. The 14th International World Wide Web Conference (WWW2005).
- Juanzi Li, Jie Tang, Qiang Song, and Peng Xu. Table Detection from Plain Text Using Machine Learning and Document Structure. APWEB'06.
- Kuo Zhang, Jie Tang, Juan-Zi Li, and Ke-Hong Wang. Feature-Correlation Based Multi-view Detection. ICCSA 2005. Lecture Notes in Computer Science (LNCS).
- Kuo Zhang, Jie Tang, MingCai Hong, and JuanZi Li. Weighted ontology based semantic search exploiting semantic similarity. APWEB'06.
- Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. The Research of Knowledge Correctness Checking in the Semantic Web. To appear in Chinese Journal of CIMS (in Chinese).
- Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. Keyword Extraction Based Peer Clustering. To appear in the Third International Conference on Grid and Cooperative Computing 2004 (GCC'04). Springer-Verlag, LNCS/LNAI.
- Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. Using DAML+OIL to Enhance Search Semantic. In IEEE/WIC/ACM Web Intelligence-2004 (WI'04).
- Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. Resource Association Discovery in Semantic Web. In IEEE International Conference on e-Commerce Technology for Dynamic E-Business (CEC'04EAST).
- Mingcai Hong, Jie Tang, and Juanzi Li. Semantic annotation by using horizontal and vertical context information. In Proceedings of ASWC’2006.
- Xiao-Qin Xie, Jie Tang, Juan-Zi Li, and Ke-Hong Wang. A Component Retrieval Method based on Facet-Weight Self-Learning. 2004 Advanced Workshop on Content Computing (AWCC). Springer-Verlag, LNCS/LNAI.
- Kuo Zhang, Hui Xu, Jie Tang, and Juanzi Li. Keyword extraction using support vector machine. In Proceedings of WAIM’2006.
- Bangyong Liang, Juanzi Li, and Kehong Wang. Knowledge sharing by Grid Technology. In Proceedings of GCC2003.
- Jianjun Xu, Qian Zhu, Juanzi Li, Po Zhang, and Kehong Wang. Modeling and Implementation of Unified Semantic Web Platform. In Proceedings of the IEEE/WIC/ACM 2004.
- Jianjun Xu, Qian Zhu, Juanzi Li, Po Zhang, and Kehong Wang. Semantic Based Web Services Discovery. In Proceedings of the Advanced Workshop on Content Computing (AWCC), 2004.
- Cao Weiqi, Li Juanzi, and Wang Kehong. Asynchronous Communication for complicated e-commerce applications. In Proceedings of the Third International Conference on Quality Software, 2003, Dallas, Texas, USA. IEEE Society Press.
- Li Juanzi, Xu Bin, etc. Semantic based Web Services Integration System in P2P. IEEE Workshop on Service Oriented System Engineering, 2005.
- Yang Wenjun, Li Juanzi, etc. Interactive Service Composition in SEWSIP. IEEE Workshop on Service Oriented System Engineering, 2005.
- Draft of China News Markup Language (submitted to China standard committee).
- Weiqi Cao, Juanzi Li, and Kehong Wang. Hybrid Messaging of Adaptive Workflow Engine. In Proceedings of the IEEE International Conference on Robotics, Intelligent Systems and Signal Processing, 2003, ChangSha, Hunan, China. IEEE Society Press.
- Mingqin Li, Juanzi Li, Zhendong Dong, Zuoying Wang, and Dajin Lu. Building a Large Chinese Corpus Annotated with Semantic Dependency. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, ACL2003, Sapporo, Japan, 2003, pp. 84-91.

Homepage of KEG: http://keg.cs.tsinghua.edu.cn
Personal Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj/
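
As a supplement to the Association Search slides, the "Jack"/"Tom" example can be reproduced with a small path search over labeled triples. This is only a sketch under stated assumptions: it uses a plain breadth-first search from one entity rather than the search "from the centers of the two sub graphs" mentioned on the slide, it orders paths by length only (the scoring behind the 0.64 and 0.43 values is not given in the slides), and the function names and the `max_hops` limit are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Index (subject, predicate, object) triples by subject."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph


def find_associations(graph, start, goal, max_hops=3):
    """Return all loop-free paths from start to goal, shortest first.

    A path is a list alternating nodes and predicates:
    [node, predicate, node, predicate, node, ...].
    """
    paths = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        hops = len(path) // 2
        if node == goal and hops > 0:
            paths.append(path)
            continue
        if hops == max_hops:
            continue
        for pred, nxt in graph.get(node, []):
            if nxt not in path[::2]:  # path[::2] holds the nodes visited so far
                queue.append((nxt, path + [pred, nxt]))
    return paths


def format_path(path):
    """Render a path in the slide's notation, e.g. 'Jack <reads> document'."""
    return " ".join(f"<{p}>" if i % 2 else p for i, p in enumerate(path))


if __name__ == "__main__":
    triples = [
        ("Jack", "reads", "document"),
        ("document", "writtenby", "Tom"),
        ("Jack", "hasfriend", "Tom"),
        ("Tom", "writes", "book1"),
    ]
    graph = build_graph(triples)
    for path in find_associations(graph, "Jack", "book1"):
        print(format_path(path))
```

Because breadth-first search expands paths in order of length, shorter associations are returned before longer ones, which matches the ordering (though not the scores) shown in the ranked-path example.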
swkeg research activate 3 July 2006 GenX Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 76 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: March 21, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Semantic Web and Web Services in KEG: Semantic Web and Web Services in KEG Jie Tang Knowledge Engineering Group, Department of Computer Science and Technology Tsinghua University July 4, 2006Outline: Knowledge Engineering Group (KEG) Our Research Work Ontology Development Semantic Web Availability Domain Knowledge Management Web Service Prototype Personal Network Search, SWARMS, and SEWSIP Publications OutlineKnowledge Engineering Group (KEG): Knowledge Engineering Group (KEG) Staffs Professor: Kehong Wang Associate Professors: Juanzi Li and Yueru Cai Assistant Professors: Bin Xu and Jie Tang Post Doctors: 2 Graduate Students PhD Students: 9 + 4 (graduated) Master Students: 13 Research Direction Semantic Web Web ServicesMajor Research Projects: Major Research Projects Ontology Granularity Partition in Distributed Ontology System Supported by Natural Science Foundation of China (2003-2004) Research of Domain-Specific Semantic Content Management Supported by Natural Science Foundation of China (2006-2008) CNML Specification Management System Co-operation project with China News Agency (2004-2005) Research of Service Oriented Architecture Supported by IBM (2006-2007) Advanced Semantic Web Technologies to Support Ontology based Enterprise Content Management Supported by Greece-China (2006-2007) 
The Intelligent Processing and Semi-structured Information System Co-operation Project of Tsinghua-ITF Co-Lab (2002-2005) GUI XML Co-operation Project of Tsinghua-ITF Co-Lab (2003-2005)Outline: Knowledge Engineering Group (KEG) Our Research Work Ontology Development Semantic Web Availability Domain Knowledge Management Web Service Prototype Personal Network Search, SWARMS, and SEWSIP Publications OutlineOur Research Work: Our Research Work Ontology Development CNML (Recommended as the National Standard for News Exchange) Semantic Web Availability Semantic Annotation Ontology Mapping Domain Knowledge Management Search & Association Finding Caching Visualization & Navigation Semantic Web Service Service Annotator Web Service Evaluation Semantic based Service IntegrationOntology Development: Ontology Development Chinese News Markup Language (CNML) For sharing and managing the News in Chinese Travel Ontology For exchanging travel information http://www.luopan.com Software Ontology For managing softwaresOur Research Work: Our Research Work Ontology Development CNML (Recommended as the National Standard of News Exchange) Semantic Web Availability Semantic Annotation Ontology Mapping Domain Knowledge Management Search & Association Finding Caching Visualization & Navigation Semantic Web Service Service Annotator Web Service Evaluation Semantic based Service IntegrationSemantic Web Content Availability: Semantic Web Content Availability Semantic web content: machine ‘understandable’ content As one of the major challenges for semantic web Annotating the existing web pages using the ontological information Integrating/exchanging heterogeneous structured informationProblems: Information of almost all types exist. 
Semantic annotation is different in different domains The information is noisy Much of the information is semi-structured The information structure is heterogeneous The data representation is different The amount of information is huge The information is redundant … ProblemsOur Proposal: Our Proposal Our proposal is to employ machine learning for semantic annotation and ontology mapping Semantic Annotation iASA: Learning rules for semantic annotation Email annotation using classification based method Tree-structured Conditional Random Fields for sequential annotation Ontology Mapping Risk Minimization based Ontology MappingiASA: Learning Annotation Rules: iASA: Learning Annotation Rules Existing semantic annotation systems usually employ manual or rule based methods Manual annotation is tedious, time consumption, and error prone. Conventional rule learning algorithms adopt an random strategy of rule selection in the learning. We propose a rule learning based method that can select the ‘most’ similar rules for induction in the learning process Automatically determining the context size Finding similar rules Funded by Tsinghua-ITF Co-Lab (2002-2005) and by NSFC (2003-2004) A paper was published on Journal on Data Semantics [Tang, 05]Applications -- TIPSI: Applications -- TIPSI TIPSI Extracting information from company annual reports A practical project supported by ITF Frontier Co. Main Features Table identification and table reconstruction A paper was accepted by APWEB’06 [Li and Tang, 06] Two-stage annotation: Logic structure construction and semantic annotation Rule learning Semantic search Friendly user interactionAnnotating Company Annual Report: Annotating Company Annual Report 上海上菱电器股份有限公司 一九九九年年度报告 重要提示:本公司董事会保证本报告所载资料不存在任何虚假记载、误导性陈述或者重大遗漏,并对其内容的真实性、准确性和完整性负个别及连带责任。 一、公司简介 1、公司法定中文名称:上海上菱电器股份有限公司 公司法定英文名称:SHANGHAI SHANGLING ELECTRIC APPLIANCES CO.,LTD. 
公司英文缩写:SLEC 2、公司法定代表人:夏毓灼 3、公司董事会秘书:曹俊 联系地址:上海市浦东新区建平路2号 电话:(021)58857888(总机)-2273 传真:(021)58857367 电子信箱:sldsh@shangling.com 五 董事会报告 (一)公司经营情况 1、 公司所在的行业为橡胶加工业所属的轮胎制造业。公司是原化工部"九五"期间规划的全国四大轮胎生产基地之一,属行业内重点企业。 2、 公司主营业务的范围及其经营状况 (1)、报告期内公司实现主营业务收入705,190,775元,比去年同期下降24.84%,实现主营业务利润125,097,141元,比去年同期下降36.87%。公司主营业务收入、主营业务利润全部来自轮胎的生产和销售。 (2)、从产品结构分析,52.8%的主营业务收入和49.7%的主营业务利润来自汽车轮胎。 项 目 1999年(元) 1998年(元) 增 减(%) 总资产 1,877,855,011 1,303,317,456 44.08 长期负债 78,600,000 65,480,000 20.04 股东权益 797,710,985 423,623,203 88.31 主营业务利润 125,097,141 198,149,514 -36.87 净利润 21,654,518 58,788,065 -63.17 Search in TIPSI: Search in TIPSISemantic Annotation on Email: Semantic Annotation on Email The goal is to identify and annotate different kinds of information from emails E.g. Header, Signature, Forwarded message, and Program code, etc. Email is one of the most common modes of communication. Many text mining applications on emails Email classification Email summarization Term extraction from email Part of the work was finished at MSRA We consider one application of semantic annotation on email as Email Data Cleaning A paper was published at SIGKDD’2005 [Tang, 05]Email Annotation: Email Annotation Our Cascaded Approach: Our Cascaded Approach Implementation -- Technical Issues: Implementation -- Technical Issues Block Annotation Forwarded message detection Header detection Signature detection List detection Program code detection Block Metadata Annotation Header metadata detection Signature metadata detection Extra line break detection VS.Block Detection Using SVMs: Block Detection Using SVMs Header detection Signature detection Program code detectionSignature Detection: Signature Detection Tree Structured CRFs for Hierarchical Annotation: Tree Structured CRFs for Hierarchical Annotation The rule and classification based methods can not model dependencies between information Sequential labeling models HMM, MEMM, and CRFs Previous linear-chain models cannot handle non-linear features 
Hierarchical dependencies Document structure HTML, XMLHierarchical Semantic Annotation: Hierarchical Semantic Annotation 3. Company Directorate Info Company directorate secretary: Haokui Zhou Representative of directorate: He Zhang Address: No. 583-14, Road Linling, Shanghai, China Zipcode: 200030 Email: ajcoob@mail2.online.sh.cn Phone: 021-64396600 Fax: 021-64392118 4. Company Registration Info Company registration address: No. 838, Road Zhang Yang, Shanghai, China Zipcode: 200122 Company office address: No. 583-14, Road Linling, Shanghai, China Zipcode: 200030 Email: ajcorp@online.sh.cn Phone: 021-64396654 Dependency DependencyTree-structured CRFs: Tree-structured CRFs Linear CRFs. Can only model linear dependency TCRFs. Can model hierarchical dependency Vs.TCRF Model: TCRF Model Ontology Mapping: Ontology Mapping Ontology mapping is the task of finding semantic correspondences between elements of two ontologies It is needed in many applications Integration of web data sources XML message mapping Query across different data sources The project is supported by NSFC Published papers: WWW’05 [Tang, 2005] and Journal of Web Semantics [Tang, 2006]Problem Description: Problem DescriptionOur Approach: Our Approach Formalizing ontology mapping as that of loss minimization Conducting ontology mapping by running several passes of processing Candidate collection Multi-strategy execution Strategy combination Mapping discovery Mapping process can take place iteratively. 
In each iteration, user interaction is supported.Mapping Processing Flow: Mapping Processing Flow Conducting mapping in following steps Our Research Work: Our Research Work Ontology Development CNML (Recommended as the National Standard of News Exchange) Semantic Web Availability Semantic Annotation Ontology Mapping Domain Knowledge Management Search & Association Finding Caching Visualization & Navigation Semantic Web Service Service Annotator Web Service Evaluation Semantic based Service IntegrationDomain Knowledge Management: Domain Knowledge Management Domain Knowledge Management Semantic Search and Association Finding Caching Visualization and Navigation Supported by Chinese Xinhua News Agency and NSFC Papers was published at WWW’05 [Liang, 05], Workshop of AAAI’05 [Liang, 06], APWEB’06 [Zhang, 06]Semantic Search: Semantic SearchVisualization-Navigation: Visualization-NavigationVisualization-Search: Visualization-Search Semantic Search: Semantic Search Index-based Search Using keywords to retrieve instances Constraint-based Search Setting constraints on properties and instances in the search Association Search Identifying two instances and finding their possible associationsAssociation Search: Association Search Searching for relationships between two entities For example: Input “Jack” and “Tom” “Jack” <reads> “document” <writtenby> “Tom” Input “Jack” and “book1” “Jack” <hasfriend> “Tom” <writes> “book1” A paper was published as WWW’05 poster [Liang, 05] A paper was published at APWEB’06 [Zhang, 06]Association Search: Association Search Keyword1 keyword2 Search from the centers of the two sub graphs Ranked path: Developer1 <develop> Project_11 <developedBy> Developer2 score: 0.64 Developer1 <develop> Project_11 <developedBy> Developer3 <develop> Project _21 <developedBy> Developer2 score: 0.43 … … l lOntology Caching: Ontology Caching The task is aimed at caching the concepts/ instances that are frequently accessed Existing Methods FIFO (First In First Out) 
LUFO (Least used First Out) Relation based method Our method Formalizing the problem as classification Features: instance similarity, hierarchical similarity, property similarity, access log, etc. A paper was published at APWEB’06 [Liang, 06]Visualization and Navigation: Visualization and Navigation A graph is used to model the domain data We propose using a strategy of “Focus+Context” to visualize the domain data Using “one click” mode to navigate in the visualized data Constraint based navigation Specifying the properties’ values to navigate to the matched instances Our Research Work: Our Research Work Ontology Development CNML (Recommended as the National Standard of News Exchange) Semantic Web Availability Semantic Annotation Ontology Mapping Domain Knowledge Management Search & Association Finding Caching Visualization & Navigation Semantic Web Service Service Annotator Web Service Evaluation Semantic based Service IntegrationWSDL Data Crawler and Annotator: WSDL Data Crawler and Annotator Crawler: finds the web services on Internet and filter the services of a specific domain Service Evaluation: Service EvaluationService Deployment and Discovery in P2P: Service Deployment and Discovery in P2P Rough matching based on keywords Precious matching based on semantic DescriptionSemantic based Service Integration: Semantic based Service IntegrationOutline: Knowledge Engineering Group (KEG) Our Research Work Ontology Development Semantic Web Availability Domain Knowledge Management Web Service Prototype Personal Network Search, SWARMS, and SEWSIP Publications OutlinePersonal Network Search: Personal Network Search For finding person…Personal Network Search: Personal Network Search IDC white paper entitled The High Cost of Not Finding Information reports Information workers spent from 15% to 35% of their time on searching for info. 
About 40% of information workers complain that they cannot find the information they need to do their jobs. Existing systems such as Google, Yahoo, and MSN Search have been developed for general use. General-purpose search methods cannot satisfy the requirements of domain-specific search, e.g. personal network search.
Personal Network Search: Personal network search is an important research area. A person usually has different types of information: personal profile (including portrait, homepage, position, affiliation, publications, and documents); contact information (including address, email, telephone, and fax number); friends. Unfortunately, the information is often hidden in heterogeneous and distributed web pages.
Our Approach: Personal Network Search = Semantic Mining + Semantic Search. Collecting relevant docs; semantic annotation; semantic integration; semantic mining; visualization.
Processing Flow (diagram labels): Submitted to; Returned pages; Classifier; Fed to; Extracting and saving to; Database; Query.
Snapshots
SWARMS: For managing semantic content…
SWARMS: Designed as a tool for domain knowledge exploration. Integrates the following research results: visualization and navigation, search, caching. Design goals: easy to adapt to other domains; manage large-scale domain data.
Implementation: Backend storage: Database + Jena. Index-based Search: Lucene + Database. Constraint-based Search and Association Search: SPARQL based. Visualization and Navigation: by extending JUNG.
Architecture (diagram labels): Personal Search; Database; Jena based FOAF API; FOAF repository; Lucene; Index-based repository; Index-based Search; Visualization and navigation; Constraint based and Association Search; User interface.
Snapshots of SWARMS
Outline: Knowledge Engineering Group (KEG); Our Research Work (Ontology Development, Semantic Web Availability, Domain Knowledge Management, Web Service); Prototype: Personal Network Search, SWARMS,
and SEWSIP; Publications.
Publications:
Jie Tang, Juan-Zi Li, Hong-Jun Lu, Bang-Yong Liang, and Ke-Hong Wang. iASA: Learning to Annotate the Semantic Web. Journal on Data Semantics IV. Springer Press, 2005. pp. 110-145.
Jie Tang, Hang Li, Yunbo Cao, and Zhaohui Tang. Email Data Cleaning. SIGKDD 2005. August 21-24, 2005, Chicago, Illinois, USA. Full paper. pp. 489-499.
Jie Tang, Bang-Yong Liang, and Juan-Zi Li. Multiple Strategies Detection in Ontology Mapping. Poster Paper. The 14th International World Wide Web Conference (WWW2005).
Jie Tang, Juan-Zi Li, Bang-Yong Liang, Xiao-Tong Huang, Yi Li, and Ke-Hong Wang. Using Bayesian Decision for Ontology Mapping. Journal of Web Semantics. (Accepted)
Jie Tang, Bang-Yong Liang, and Juan-Zi Li. Toward Detecting Strategies for Ontology Mapping. Workshop on Semantic Computing in the 14th International World Wide Web Conference (WWW2005).
Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Multi-senses and Multi-Dependencies Discovery Among Words. NLP-KE2003 (2003 International Conference on Natural Language Processing and Knowledge Engineering). IEEE Press. 2003, 10.
Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Loss Minimization based Keyword Distillation. 6th Asia-Pacific Web Conference (APWeb 2004), Springer 2004, 4, LNCS 3007, ISBN 3-540-21371-6. pp. 572-577.
Jie Tang, Bang-Yong Liang, Juan-Zi Li, and Ke-Hong Wang. Risk Minimization based Ontology Mapping. 2004 Advanced Workshop on Content Computing (AWCC). Springer-Verlag, LNCS/LNAI.
Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Research of Atomic and Anonymous Electronic Commerce Protocol. 9th RSFDGrC (International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing). Springer-Verlag, LNCS/LNAI 2639. 2003, 5. pp. 711-714.
Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai. Research of Satisfied Atomic and Anonymous Electronic Commerce Protocol. GCC2003. Springer-Verlag, LNCS/LNAI.
Bang-Yong Liang, Jie Tang, Juan-Zi Li, and Ke-Hong Wang. Semantic Similarity Based Ontology Cache. APWEB'06.
Bang-Yong Liang, Jie Tang, and Juanzi Li. SWARMS: A Tool for Exploring Domain Knowledge. The Workshop of Contexts and Ontologies on the 20th International AAAI Conference (AAAI’05).
Bang-Yong Liang, Jie Tang, and Juan-Zi Li. Association Search in Semantic Web: Search + Inference. Poster Paper. The 14th International World Wide Web Conference (WWW2005).
Juanzi Li, Jie Tang, Qiang Song, and Peng Xu. Table Detection from Plain Text Using Machine Learning and Document Structure. APWEB'06.
Kuo Zhang, Jie Tang, Juan-Zi Li, and Ke-Hong Wang. Feature-Correlation Based Multi-view Detection. ICCSA 2005. Lecture Notes in Computer Science (LNCS).
Kuo Zhang, Jie Tang, MingCai Hong, and JuanZi Li. Weighted ontology based semantic search exploiting semantic similarity. APWEB'06.
Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. The Research of Knowledge Correctness Checking in the Semantic Web. To appear in Chinese Journal of CIMS (in Chinese).
Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. Keyword Extraction Based Peer Clustering. To appear in the Third International Conference on Grid and Cooperative Computing 2004 (GCC'04). Springer-Verlag, LNCS/LNAI.
Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. Using DAML+OIL to Enhance Search Semantic. In IEEE/WIC/ACM Web Intelligence 2004 (WI'04).
Bangyong Liang, Jie Tang, Juanzi Li, and Kehong Wang. Resource Association Discovery in Semantic Web. In IEEE International Conference on e-Commerce Technology for Dynamic E-Business (CEC'04EAST).
Mingcai Hong, Jie Tang, and Juanzi Li. Semantic annotation by using horizontal and vertical context information. In Proceedings of ASWC’2006.
Xiao-Qin Xie, Jie Tang, Juan-Zi Li, and Ke-Hong Wang. A Component Retrieval Method based on Facet-Weight Self-Learning. 2004 Advanced Workshop on Content Computing (AWCC). Springer-Verlag, LNCS/LNAI.
Kuo Zhang, Hui Xu, Jie Tang, and Juanzi Li. Keyword extraction using support vector machine. In Proceedings of WAIM’2006.
Bangyong Liang, Juanzi Li, and Kehong Wang. Knowledge sharing by Grid Technology. In Proceedings of GCC2003.
Jianjun Xu, Qian Zhu, JuanZi Li, Po Zhang, and Kehong Wang. Modeling and Implementation of Unified Semantic Web Platform. In Proceedings of the IEEE/WIC/ACM 2004.
Jianjun Xu, Qian Zhu, Juanzi Li, Po Zhang, and Kehong Wang. Semantic Based Web Services Discovery. In Proceedings of the Advanced Workshop on Content Computing (AWCC), 2004.
Cao Weiqi, Li Juanzi, and Wang Kehong. Asynchronous Communication for complicated e-commerce applications. In Proceedings of the Third International Conference on Quality Software, 2003, Dallas, Texas, USA. IEEE Society Press.
Li Juanzi, Xu Bin, et al. Semantic based Web Services Integration System in P2P. IEEE Workshop on Service Oriented System Engineering, 2005.
Yang Wenjun, Li Juanzi, et al. Interactive Service Composition in SEWSIP. IEEE Workshop on Service Oriented System Engineering, 2005.
Draft of China News Markup Language (submitted to China standard committee).
Weiqi Cao, Juanzi Li, and Kehong Wang. Hybrid Messaging of Adaptive Workflow Engine. In Proceedings of the IEEE International Conference on Robotics, Intelligent Systems and Signal Processing, 2003, ChangSha, Hunan, China. IEEE Society Press.
Mingqin Li, Juanzi Li, Zhendong Dong, Zuoying Wang, and Dajin Lu. Building a Large Chinese Corpus Annotated with Semantic Dependency. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, ACL2003, Sapporo, Japan, 2003, pp. 84-91.
Homepage of KEG: http://keg.cs.tsinghua.edu.cn
Personal Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj/
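
Illustration for the Association Search slides: the idea of finding and ranking property paths between two instances can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the method from the cited papers: it uses a plain breadth-first search from one instance rather than a search from the centers of two subgraphs, the triples are made-up toy data echoing the slide's "Jack"/"Tom" example, and the score (inverse of the path length) is an assumed stand-in, since the slides do not give the ranking formula. The names TRIPLES, build_index, and find_associations are hypothetical.

```python
from collections import deque

# Hypothetical toy graph of (subject, property, object) triples; the names
# echo the slide's example, but the data is made up for illustration.
TRIPLES = [
    ("Jack", "reads", "document"),
    ("document", "writtenby", "Tom"),
    ("Jack", "hasfriend", "Tom"),
    ("Tom", "writes", "book1"),
]


def build_index(triples):
    """Map each subject to its outgoing (property, object) edges."""
    index = {}
    for subject, prop, obj in triples:
        index.setdefault(subject, []).append((prop, obj))
    return index


def find_associations(triples, source, target, max_edges=4):
    """Enumerate simple directed paths from source to target and rank them."""
    index = build_index(triples)
    found = []
    # Each queue entry is (current node, path so far); a path alternates
    # node, "<property>", node, ... so nodes sit at even positions.
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        if node == target and len(path) > 1:
            found.append(path)
            continue
        if (len(path) - 1) // 2 >= max_edges:
            continue  # do not extend paths beyond the edge limit
        for prop, nxt in index.get(node, []):
            if nxt in path[::2]:
                continue  # skip nodes already on the path (no cycles)
            queue.append((nxt, path + [f"<{prop}>", nxt]))
    # Assumed score: 1 / number of edges, so shorter paths rank higher.
    scored = [(1.0 / ((len(p) - 1) // 2), " ".join(p)) for p in found]
    return sorted(scored, key=lambda item: -item[0])


if __name__ == "__main__":
    for score, path in find_associations(TRIPLES, "Jack", "book1"):
        print(f"{path}  score: {score:.2f}")
```

Calling find_associations(TRIPLES, "Jack", "Tom") returns both the direct <hasfriend> path and the longer path through "document", with the shorter one ranked first. A real system would replace the inverse-length score with whatever relevance measure the ontology supports.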