Presentation Transcript

Pushing the Quality Level in Networked News Business: semantic-based content retrieval and composition in international news publishing
Markus Schranz, schranz@infosys.tuwien.ac.at
Added: September 29, 2007. License: All Rights Reserved.

Agenda:
- Problem and Project Description
- Goals and Objectives
- Approaches and Results
- Architectural Design & Communication
- Multinational and Multilingual Services
- Semantic Content Relations
- Future Steps and Exploitation

Environmental Situation:
- The Internet is gaining importance in news distribution.
- A large amount of distributed business information is available.
- European business today is highly segmented and widely unrecognised beyond national borders.
- Business news mostly bears national relevance but holds the potential to spread cooperation opportunities and business chances towards an economically and socially integrated Europe.
- Business is global; news needs to be as well.
- Support for the old and new economy across all of Europe is required.
- An appropriate solution benefits business in the EU, with a special focus on supporting and integrating the new member states.

(Section: problem description)

Existing approaches:
- National solutions are available: a business news distribution service in the German-speaking area.
- Interest in multinational solutions is increasing among both subscribers and press distributors within the existing services.
- Limitations: restricted to a single language; not attractive for European companies to join.

Project description:
NEDINE was EC-funded (Apr 2004 - Apr 2006). The objective of the project is to establish a distributed news network aimed at European journalists and opinion leaders. NEDINE provides participants with a network for news exchange and distribution. It supports mutual awareness of relevant topics and information content within all European countries. NEDINE focuses on availability and affordability, so that all partners can transport national information to the addressed target group regardless of the origin, nationality and financial capability of the information provider.
(Objectives)

The Challenge:
Diagram elements: a small company with a good product; Austrian, Czech and Slovakian readers; Austrian, Czech and Slovakian news agencies (NA).

The Solution:
Diagram elements: a small company with a good product; Austrian, Czech and Slovakian readers; Austrian, Czech and Slovakian news agencies (NA).

News agency offers to its customers:
- Single access point for international press releases
- Distribution
- Payment
- Editing / translation
- Price advantage compared to collecting single press releases

News agency benefits from the NEDINE network:
- Common business model
- Additional customers
- More revenue
- New contacts
- International presence

News agency offers to its readers:
- Multilingual news
- International news
- News from various sources
- (Semantic) relationships independent of source
- Relevance ranking for search

Architecture Reasoning: First Approach - Centralized Architecture
Pros:
- Single maintenance point
- Clear infrastructure
- One traffic channel (news agency to NEDINE)
- No additional infrastructure required for partners
Cons:
- Single point of failure (whole network down)
- Huge amount of network traffic
- Storage of complete articles
- Which organization maintains the central server?

(Section: approaches and results)

Centralized configuration:
Diagram elements: NEDINE Central Server; ČIA, SITA, PTE; Web Service Interface.

Architecture Reasoning: Alternative Approach - Hybrid P2P Architecture
Why peer-to-peer?
- Better scalability
- No single point of failure
- No downtime if central services are down
- Less network traffic
- The network remains transparent for the peers (they only see NEDINE)

Final Approach - Hybrid P2P Architecture
Properties of this architecture:
- Democratic system
- Identical software components are installed at each partner
- NEDINE becomes a logically centralized platform
- NEDINE is technically distributed, which remains transparent to all participating peers
- Semantic relations and the necessary steps for news distribution are computed in a local context

P2P configuration:
Diagram elements: Virtually Central Services; ČIA, SITA, PTE; Web Service Interface; three NEDINE Peers.

Communication: Peer - Agency
- Web Services serve as the communication protocol
- Standard interfaces for default peers (SOAP, NewsML data transfer, queries, network data)
- Customized interfaces for each partner, if necessary (database access based on document ID)
- Location and functionality of the NEDINE peer are defined in the corresponding WSDL file
- Functionality is visible only to the local peer, which increases network security

Inter-Peer Communication:
- Also implemented with XML Web Services
- Inter-peer communication is invisible to the agencies
- High flexibility: a peer is easy to upgrade or change without influencing the rest of the network
- Network traffic is encrypted via PKI (public key infrastructure)

Multinational and Multilingual Services: Multinational Service Integration
- Standardized news exchange format: NewsML
- Local service to peer communication: SOAP
- Local service providers hold business-critical information
- Installing a local peer with well-known (open) source increases the trust of the participating organizations and underlines the local character of the relevant business data
- Peer-to-peer communication: SOAP
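The hybrid P2P reasoning above (identical software at every partner, virtually central services, no downtime if those services are down) can be illustrated with a minimal sketch. Everything in it is hypothetical: the class names `Directory` and `Peer`, the in-memory `inbox`, and the idea that a peer falls back to a cached peer list are assumptions made for illustration. The real NEDINE peers communicate over SOAP web services with NewsML payloads, which this sketch does not attempt to model.

```python
class Directory:
    """Hypothetical stand-in for the 'virtually central services'.

    It only knows which peers exist; it never stores news content.
    """

    def __init__(self):
        self.peers = {}
        self.online = True

    def register(self, peer):
        self.peers[peer.name] = peer

    def list_peers(self):
        if not self.online:
            raise ConnectionError("central services unavailable")
        return list(self.peers.values())


class Peer:
    """Hypothetical NEDINE peer; every partner runs the same class."""

    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.known_peers = []  # cached copy of the peer list
        self.inbox = []        # (sender name, news item) pairs received
        directory.register(self)

    def refresh(self):
        # Update the cached peer list; keep the old cache if the
        # central services cannot be reached.
        try:
            self.known_peers = [
                p for p in self.directory.list_peers() if p is not self
            ]
        except ConnectionError:
            pass

    def publish(self, item):
        # Send the item directly to every known peer (no central relay).
        self.refresh()
        for peer in self.known_peers:
            peer.inbox.append((self.name, item))
        return len(self.known_peers)


directory = Directory()
cia = Peer("ČIA", directory)
sita = Peer("SITA", directory)
pte = Peer("PTE", directory)

delivered_first = cia.publish("release 1")   # directory reachable
directory.online = False                     # central services go down
delivered_second = cia.publish("release 2")  # cached peer list is used
```

In this sketch the second `publish` call still reaches both other peers, because delivery is peer-to-peer and the directory is consulted only to discover peers. A centralized design would lose the whole network at that point, which is the single-point-of-failure argument from the slides.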
Multilingual News Publishing and Distribution:
- Automatic translation?
- Multilingual content presentation?
- Multilingual information distribution & retrieval
- Semantic relations between the (multilingual) business news contents

ML Peer Registration:
Diagram elements: Virtually Central Services; ČIA, SITA, PTE; Web Service Interface; three NEDINE Peers.
- Register Peer, Languages = SK, EN
- Register Peer, Languages = CZ, DE, EN
- Register Peer, Languages = DE, EN

Semantic News Enrichment: Pushing the Quality Level by Semantics
- International news items describe local business and lack relevant interrelations
- "Linking" between sensible business news has been manual work and thus costly
- Semantic relationships increase the business value of news items, but how can they be created with reasonable effort?

The Vector Space Engine:
- Every news article is assigned a vector representing keyword occurrences (weights)
- Vectors are technically small portions of data, feasible to integrate into the peer component
- Semantic relationships increase the business value of news items
- Similarities are recognized automatically by creating a vector space over relevant keywords

What is a keyword?
- All words (except stopwords)
- Relevant words: from frequencies with weights (vector space model), or from the domain

What does a keyword look like?
- A word: bodies
- A stem: bodi
- A lemma: body
- A phrase: public bodies

Query and Document Processing:
Diagram elements: Query, Query Processing, Document, Document Processing, Document Matching.
- Query processing: stemming and/or PN detection and/or n-gram detection …
- Document processing: stemming and/or PN detection and/or n-gram detection …
- Query: $Q = (w_{q1}, \dots, w_{qn})$
- Document: $D = (w_{d1}, \dots, w_{dn})$

Vector Space Model combined with statistical and linguistic processing.
Statistical metrics included are:
- $tf_{ij}$ = term frequency for word $i$ in document $j$
- $IDF_i$ = inverse document frequency for word $i$ in the whole document collection
- $N$ = total number of documents
- $df_i$ = document frequency for term $i$

$$IDF_i = 1 + \log\frac{N}{df_i}$$

$$w_{ij} = tf_{ij} \cdot IDF_i$$

Vector Space Model:
- Documents are indexed by vectors
- Documents are retrieved by similarity
- Query and documents are compared using the cosine formula:

$$Sim(Q, D) = \frac{\sum_{i=1}^{n} w_{qi}\, w_{di}}{\sqrt{\sum_{i=1}^{n} w_{qi}^{2}}\;\sqrt{\sum_{i=1}^{n} w_{di}^{2}}}$$

- Local archives must provide term frequency data (internal and document)

The used model:
- Taggers and stemmers
- Proper-name heuristics
- Syntactic patterns
- Semantic resources (EWN)
- Metadata information
- Statistical process
- Linguistic processing

Use case: distributing news in the Czech Republic and in Austria
Diagram elements: a news item from ČIA (CZ, DE); ČIA (CZ, DE, EN), SITA (SK, EN) and PTE (DE, EN), each with a NEDINE Peer.
Labeled steps: 1. Distribution & Enrichment; 2. Enrichment (DE); 5. CZ, DE Subscriber; 7. DE Subscriber. Steps 3, 4 and 6 carry no label in the transcript.

Future Exploitation: Recent developments and open issues
- NEDINE has been extended with translation services (an additional service on the P2P architecture)
- A secure communication infrastructure has been implemented
- Performance and scalability tests
- Market & business orientation
- The Nedine Association was founded at the end of 2005

Good News from Europe:
Have a look at NEDINE; we are open to recommendations, news providers and partners from all over Europe.
Website: http://www.nedine.org/
E-mail: info@nedine.org
Nedine contact person: Dr. Markus Schranz, Tel. ++43-1-81140-444, schranz@pressetext.at
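The weighting and matching described in the Vector Space Model slides (term frequency times inverse document frequency, then cosine similarity between query and document vectors) can be sketched as follows. This is a minimal illustration and not the NEDINE engine. The transcript's IDF formula breaks off after "1 +", so the sketch assumes the common form 1 + ln(N / df_i). The stopword list, the whitespace tokenizer and all function names are hypothetical, and the stemming, proper-name and n-gram steps from the slides are left out.

```python
import math
from collections import Counter

# Hypothetical toy stopword list; a real system would use a per-language list.
STOPWORDS = {"the", "a", "in", "of", "and", "for", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no stemming here)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def idf_table(docs_tokens):
    """IDF_i = 1 + ln(N / df_i); the log term is an assumption (see lead-in)."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: 1.0 + math.log(n / count) for term, count in df.items()}


def weight_vector(tokens, idf):
    """w_ij = tf_ij * IDF_i, stored sparsely as {term: weight}."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(q, d):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank(query, documents):
    """Return (score, document index) pairs, best match first."""
    docs_tokens = [tokenize(d) for d in documents]
    idf = idf_table(docs_tokens)
    q_vec = weight_vector(tokenize(query), idf)
    scored = [
        (cosine(q_vec, weight_vector(tokens, idf)), index)
        for index, tokens in enumerate(docs_tokens)
    ]
    return sorted(scored, reverse=True)


documents = [
    "public bodies fund energy projects",
    "energy prices rise in austria",
    "football results",
]
ranking = rank("energy projects", documents)
```

With these three toy documents, the first one shares two terms with the query, the second shares one, and the third shares none, so `ranking` lists them in that order. Query terms that do not occur in the collection are dropped, since no document frequency exists for them. Because each vector is a small dictionary of weights, it fits the slides' point that vectors are small enough to live inside the peer component.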