18 IBM Text Analytic OS Architecture 011606 mod

Presentation Transcript

IOP ’06 Open Source Intelligence Lesson Learned:

IOP ’06 Open Source Intelligence Lesson Learned

Issues in using open source for intelligence: 

Issues in using open source for intelligence
- Growth and complexity of heterogeneous content
- Not all open source data is equal – Quantitative vs. Qualitative
- Requirements of Ecoinformatics Architectures

Slide3: 

Digital content is growing at a dramatic rate
10^24 bytes = 1 trillion terabytes of data, which is equivalent to all the information consumed visually by all humans in a year
[Chart: growth of digital content; x-axis: Years]
Source: IBM 2005 GTO
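The figure on this slide can be checked with simple arithmetic. The sketch below assumes decimal (SI) units, where one terabyte is 10^12 bytes, and assumes the slide's number is meant as 10^24 bytes; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 terabyte = 10**12 bytes.
BYTES_PER_TERABYTE = 10**12
ONE_TRILLION = 10**12

# One trillion terabytes expressed in bytes.
total_bytes = ONE_TRILLION * BYTES_PER_TERABYTE

# Compare against the slide's figure, read here as 10**24 bytes.
matches_slide = total_bytes == 10**24
print(matches_slide)
```

Under these assumptions the two quantities agree, so "10^24" and "1 trillion terabytes" describe the same volume.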

Slide4: 

The scale of open source data and its heterogeneous form increase the complexity of extracting intelligence
[Figure: Scalable Heterogeneity Intelligence; scale marks: 10^9, 10^12, 10^15, 10^21, 10^24, 10^27; heterogeneity ranges from Structured data to Free-form text; labels: Storage online, Medical data stored, Personal multimedia, Surveillance bytes, Photos multimedia]
Source: IBM 2005 GTO

Slide5: 

Open Source Intelligence from the periphery requires an understanding of its topology, including the strengths and weaknesses of sources in the periphery

Ecoinformatics Architectures need to be multi-layered:

Ecoinformatics Architectures need to be multi-layered
[Diagram: layered architecture; axes: Relevancy, Volume; throughput labels: 10’s, 100’s, 1000’s (pages/second)]
- DATA ACQUISITION: Un-Structured Data, Structured Data
- Sources: World Wide Web, Blogs, Newspapers, Licensed Feeds, Data Bases, Intranet Data, Taxonomies, Commercial Data Bases
- Index, Store
- Per-Page Annotators: Auto Entity Spotters, Auto Geography Spotter, Porn & Dup Detection, Customer Taxonomy Spotter
- Cross-Page Annotators: Classification, Clustering, Communities, Ranking
- Applications: Network Associations, Search, Topic Tracking, Buzz Analysis
- Parsing/Tokenizing, Annotation, Searching, Natural Clustering, Affinity Analysis, Snippet Analysis, Trending
- Customer Applications: Performance Management, Drug Research, Business Insights Workbench
- Products: WebFountain, Business Insights Workbench, WS OmniFind II
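The split between per-page and cross-page annotators can be illustrated with a minimal sketch. It is not the WebFountain or OmniFind implementation; the `Page` class, the function names, and the substring-based entity spotter are all assumptions made for illustration. Per-page steps look at one document at a time, while the cross-page step needs the whole collection.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical container for one acquired document."""
    url: str
    text: str
    annotations: dict = field(default_factory=dict)


def tokenize(page):
    # Per-page layer: naive whitespace tokenizer (assumption).
    page.annotations["tokens"] = page.text.lower().split()
    return page


def spot_entities(page, known_entities):
    # Per-page layer: toy entity spotter using substring matching.
    text = page.text.lower()
    page.annotations["entities"] = sorted(
        e for e in known_entities if e.lower() in text
    )
    return page


def rank_entities(pages):
    # Cross-page layer: rank entities by how many pages mention them.
    counts = Counter()
    for page in pages:
        counts.update(page.annotations.get("entities", []))
    return counts.most_common()


def run_pipeline(pages, known_entities):
    # Per-page annotators run first, then the cross-page annotator.
    for page in pages:
        tokenize(page)
        spot_entities(page, known_entities)
    return rank_entities(pages)


pages = [
    Page("http://example.org/a", "IBM and WebFountain"),
    Page("http://example.org/b", "IBM research"),
]
ranking = run_pipeline(pages, {"IBM", "WebFountain"})
print(ranking)
```

Because per-page steps are independent of each other, they can run in parallel at high page rates, whereas cross-page steps such as ranking or clustering have to wait for the annotated collection.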

Slide7: 

Finding intelligence can require a different view of the same information
[Chart: Open Source Trend on Web; y-axis: % of OSI web documents, 0.0% to 4.5%; series: Congressman Rob Simmons, Douglas Rushkoff, Eliot Jardines, Major General Patrick Cammaert, Mr Arno Reuser, Robert Steele; annotations: "Some event happened in August", "One dominant voice"]
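A measure like "% of OSI web documents" could be computed as the share of documents that mention each name. The sketch below is one plausible reading, not the method used for the slide; the function name, the substring matching, and the placeholder documents and names are assumptions.

```python
def mention_share(documents, names):
    """Return the percentage of documents that mention each name."""
    total = len(documents)
    if total == 0:
        return {name: 0.0 for name in names}
    shares = {}
    for name in names:
        # Case-insensitive substring match (a simplifying assumption).
        hits = sum(1 for doc in documents if name.lower() in doc.lower())
        shares[name] = 100.0 * hits / total
    return shares


# Placeholder documents and names, invented for illustration only.
docs = [
    "Speaker A on open source",
    "Speaker A and Speaker B panel",
    "unrelated note",
    "Speaker A keynote",
]
shares = mention_share(docs, ["Speaker A", "Speaker B"])
print(shares)
```

Running the same computation per month would give a trend line per name, which is where a single dominant voice or a one-off event would become visible.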

Slide8: 

Context Network of Conference Attendees to auto-spotted Companies and Universities
In this network view we don’t care about association with 'Open Source Intelligence' but with companies and universities
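A context network of this kind can be sketched as a weighted bipartite graph, where an edge links an attendee to an organization whenever both are spotted in the same document. This is a minimal sketch under that assumption; the function name, the substring-based spotting, and the placeholder data are hypothetical and do not describe the tool behind the slide.

```python
from collections import defaultdict
from itertools import product


def build_context_network(documents, attendees, organizations):
    """Count attendee/organization co-occurrences per document."""
    edges = defaultdict(int)
    for text in documents:
        lowered = text.lower()
        # Toy "spotters": case-insensitive substring matches.
        people = [a for a in attendees if a.lower() in lowered]
        orgs = [o for o in organizations if o.lower() in lowered]
        for person, org in product(people, orgs):
            edges[(person, org)] += 1
    return dict(edges)


# Placeholder data, invented for illustration only.
docs = [
    "Attendee A spoke at Example University",
    "Attendee A joined Example Corp",
    "Attendee B and Attendee A visited Example Corp",
]
network = build_context_network(
    docs,
    ["Attendee A", "Attendee B"],
    ["Example Corp", "Example University"],
)
print(network)
```

The edge weights are co-occurrence counts, so a heavier edge only indicates that two names appear together more often, not what the relationship between them is.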

Slide9: 

Conclusions on Open Source Intelligence
- Computers don’t create intelligence, people do – computers enable smart people
- Not all open source content is equal – know the sources
- Not everything you see is right – it’s all about the CONTEXT
- Ecoinformatics architecture supports
  - Large scale analytics of open source content
  - Integration of content other than open source
  - Powerful text analytic tools to support analysis of on-topic stores