Parallel and Distributed Computing for Cyber Security


Presentation Transcript

Parallel and Distributed Computing for Cyber Security : 

Parallel and Distributed Computing for Cyber Security

Progress in HPC - past 6 decades : 

ENIAC (1945): 100 kHz, 5 K additions/second, 357 multiplications/second
CPU power has increased by a factor of 50-100 every decade
Multi-gigahertz, gigabyte PCs are commodity items
Teraflop computers are common at large organizations
Petaflop-scale computing is within reach in this decade
Example systems: Japanese Earth Simulator, IBM Blue Gene, Virginia Tech InfiniBand cluster

Applications Drive the Technology : 

“I think there is a world market for maybe 5 computers” - Thomas Watson Sr. (1943)
Drivers: scientific computing and data-driven computing

Data Mining - A Driver for Parallel/ Distributed Computing : 

Lots of data is being collected in the commercial and scientific world
Strong competitive pressure to extract and use the information in the data
Scaling data mining to large data sets requires HPC
Data and/or computational resources needed for analysis are often distributed
Sometimes the choice is distributed data mining or no data mining
Ownership, privacy, and security issues

Cyber Intrusion Detection - Motivation : 

Sophistication of cyber attacks and their severity are increasing
Large-scale denial of service attacks
Identity theft / fraud
Espionage
DOD and other U.S. Government agencies are major targets for sophisticated state-sponsored cyber attacks
Security mechanisms always have inevitable vulnerabilities
Firewalls are not sufficient to ensure security in computer networks
Insider attacks are difficult to detect
[Figure: Spread of the SQL Slammer worm 10 minutes after its deployment]
[Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center, 1990-2003]

Slide 6: 

News headlines, August 2003:
Experts Race to Beat Computer Worm: U.S., Canada Try to Thwart Sobig by Disconnecting 17 Machines
Hackers cut off SCO Web site, Martin LaMonica, Staff Writer, August 25, 2003
An Onslaught of Computer Viruses, August 23, 2003
Blackout spurs cyberattack worry, by Kevin Maney and Michelle Kessler, USA TODAY, August 19, 2003
Hackers Steal 13,000 Credit Card Numbers: Navy Says No Fraud Has Been Noticed, Saturday, August 23, 2003

What are Intrusions? : 

Intrusions are actions that attempt to bypass the security mechanisms of computer systems. They are caused by:
Attackers accessing the system from the Internet
Insider attackers: authorized users attempting to gain and misuse non-authorized privileges
[Figure: Typical intrusion scenario, showing an attacker, scanning activity, and a computer network]

Intrusion Detection Systems : 

Intrusion Detection System
Combination of software and hardware that attempts to perform intrusion detection
Raises an alarm when a possible intrusion happens
Traditional intrusion detection system (IDS) tools, such as SNORT (www.snort.org), are based on signatures of known attacks
Example of a SNORT rule (MS-SQL “Slammer” worm), written out in full rule syntax; the msg text and the local sid are illustrative:
alert udp any any -> any 1434 (msg:"MS-SQL Slammer worm"; content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send"; sid:1000001;)
Limitations
Signature database has to be manually revised for each new type of discovered intrusion
Substantial latency in deployment of newly created signatures across the computer system
Cannot detect emerging cyber threats
Not suitable for detecting policy violations and insider abuse
Do not provide understanding of network traffic
Generate too many false alarms
Not suited for detecting multi-step attacks
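The signature matching described on this slide can be sketched in a few lines. This is only an illustration of the idea, not how SNORT is implemented: the Signature class, the matches function, and both payloads are hypothetical, and the content patterns are treated as unordered substrings of the payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical, simplified signature: header fields plus byte patterns."""
    name: str
    protocol: str
    dst_port: int
    contents: tuple  # every pattern must appear somewhere in the payload


# Byte patterns taken from the Slammer rule quoted on the slide.
SLAMMER = Signature(
    name="MS-SQL Slammer worm",
    protocol="udp",
    dst_port=1434,
    contents=(bytes.fromhex("81F10301049B81F101"), b"sock", b"send"),
)


def matches(sig, protocol, dst_port, payload):
    """Return True when the header and the payload both satisfy the signature."""
    if protocol != sig.protocol or dst_port != sig.dst_port:
        return False
    return all(pattern in payload for pattern in sig.contents)


# Made-up payloads for illustration only.
worm_like = b"\x04\x01" + bytes.fromhex("81F10301049B81F101") + b"..sock..send.."
benign = b"\x02 ordinary request"

print(matches(SLAMMER, "udp", 1434, worm_like))  # True
print(matches(SLAMMER, "udp", 1434, benign))     # False
```

The sketch also makes the first limitation concrete: a worm with different byte patterns is missed until someone adds a new Signature by hand.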

Data Mining for Intrusion Detection : 

Increased interest in data mining based intrusion detection over the past decade
Misuse detection
Builds predictive models from labeled data sets (instances are labeled as “normal” or “intrusive”) to identify known intrusions
Suitable for attacks for which it is difficult to build signatures
Cannot detect unknown and emerging attacks
Examples: MADAM ID project, ADAM project, fuzzy association rules [Bridges00], decision trees [Sinclair99], neural networks [Lippmann00, Ghosh99], genetic algorithms [Bridges00, Sinclair99], cost sensitive modeling (AdaCost [Fan99], MetaCost [Domingos99, Ting00]), learning from rare class [Kubat97, Fawcett97, Provost01, Japkowicz01, Joshi02, Lazarevic03]
Anomaly detection
Detects emerging/novel attacks as deviations from “normal” behavior
Potentially high false alarm rate: previously unseen (yet legitimate) system behaviors may also be recognized as anomalies
Examples: PHAD, ALAD [Chan01, Cha02], ADAM [Barbara01], finite mixture model [Yamanishi00], χ² based [Ye01], temporal sequence learning [Lane98], neural networks [Ryan98], generating artificial anomalies [Fan01], clustering [Eskin02], unsupervised SVM [Eskin02, Lazarevic03], outlier detection schemes (MINDS), Bayesian net [Valdes00], Hidden Markov models [Ourston03]

Data Mining for Intrusion Detection : 

Misuse Detection – Building Predictive Models
Learn a classifier from a labeled training set whose attributes are categorical, temporal, and continuous, plus a class label
Summarization of attacks using association rules
Rules discovered: {Src IP = 206.163.37.95, Dest Port = 139, Bytes ∈ [150, 200]} --> {ATTACK}
Other diagram elements: Clustering & Anomaly Detection, Link Analysis, Live data
Key Technical Challenges
Large data size
High dimensionality
Temporal nature of the data
Skewed class distribution
Data preprocessing
On-line analysis
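To make the discovered rule concrete, the sketch below checks it against a small labeled training set and reports its support and confidence. The records, the field names, and the two helper functions are hypothetical and made up for illustration; only the rule itself comes from the slide.

```python
def rule_matches(conn):
    """Antecedent of the slide's rule: Src IP, Dest Port, and a Bytes range."""
    return (
        conn["src_ip"] == "206.163.37.95"
        and conn["dst_port"] == 139
        and 150 <= conn["bytes"] <= 200
    )


def support_and_confidence(connections):
    """Support = share of all records that are covered attacks;
    confidence = share of covered records that are attacks."""
    covered = [c for c in connections if rule_matches(c)]
    hits = [c for c in covered if c["label"] == "attack"]
    support = len(hits) / len(connections)
    confidence = len(hits) / len(covered) if covered else 0.0
    return support, confidence


# Made-up labeled connections (documentation IP ranges for the other hosts).
training_set = [
    {"src_ip": "206.163.37.95", "dst_port": 139, "bytes": 192, "label": "attack"},
    {"src_ip": "206.163.37.95", "dst_port": 139, "bytes": 160, "label": "attack"},
    {"src_ip": "206.163.37.95", "dst_port": 139, "bytes": 180, "label": "normal"},
    {"src_ip": "206.163.37.95", "dst_port": 139, "bytes": 900, "label": "normal"},
    {"src_ip": "192.0.2.10", "dst_port": 80, "bytes": 170, "label": "normal"},
    {"src_ip": "192.0.2.11", "dst_port": 80, "bytes": 4000, "label": "normal"},
    {"src_ip": "198.51.100.5", "dst_port": 443, "bytes": 1200, "label": "normal"},
    {"src_ip": "198.51.100.6", "dst_port": 25, "bytes": 600, "label": "normal"},
]

support, confidence = support_and_confidence(training_set)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

With a skewed class distribution, support for attack rules is small by construction, which is one reason plain accuracy is a poor guide for this kind of model.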

MINDS – Minnesota INtrusion Detection System : 

Data mining based anomaly detection system
Used at the University of Minnesota to analyze network traffic to/from 40,000 computers
Incorporated into the Interrogator architecture at the ARL Center for Intrusion Monitoring and Protection (CIMP); PoC: Bencevenko and Long (ARL)
Helps analyze data from multiple sensors at DoD sites around the country
Routinely detects attacks and intrusive behavior not detected by widely used intrusion detection systems
Insider abuse / policy violations / worms / scans
ARL-CIMP considers MINDS to be the first effective anomaly intrusion detection system
[Figure: MINDS system diagram. Components: network, data capturing device (net flow tools, tcpdump), filtering, feature extraction, known attack detection (detected known attacks), anomaly detection (anomaly scores), human analyst (labels), association pattern analysis (summary and characterization of attacks), detected novel attacks]
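The slide says only that the anomaly detection module assigns a score to each connection. As a stand-in for whatever outlier scheme MINDS actually uses, the sketch below scores each connection by its mean distance to its k nearest neighbors and ranks the results for an analyst. The feature choice, the data, and the function name are all assumptions.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Score each point by the mean distance to its k nearest neighbors.
    A generic outlier score, used here only as an illustrative stand-in."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical features per connection: (log10 of bytes, distinct ports contacted).
connections = [
    (2.0, 1.0), (2.1, 1.0), (1.9, 2.0), (2.0, 2.0), (2.2, 1.0),
    (5.5, 40.0),  # made-up scanner-like connection
]

scores = knn_anomaly_scores(connections)
# Highest score first, as in a ranked list handed to the human analyst.
ranked = sorted(range(len(connections)), key=lambda i: scores[i], reverse=True)
print(ranked[0])  # 5
```

Note that this brute-force version compares every pair of connections, so it would not scale to the traffic volumes mentioned in the talk without sampling or parallelism.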

Typical Anomaly Detection Output : 

Anomalous connections that correspond to the “slammer” worm
Anomalous connections that correspond to the ping scan
Connections corresponding to UM machines connecting to “half-life” game servers

Summarization Using Association Patterns : 

[Figure: Ranked connections from the anomaly detection system, labeled attack or normal, feed a discriminating association pattern generator, which updates a knowledge base]
Example patterns: R1: TCP, DstPort=1863 → Attack ... R100: TCP, DstPort=80 → Normal
Uses of the patterns:
Build normal profile
Study changes in normal behavior
Create attack summary
Detect misuse behavior
Understand nature of the attack
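One simple way to read "discriminating association pattern" is a feature value that is frequent among attack-labeled connections and rare among normal ones. The sketch below ranks single attribute-value pairs by that support gap. It is a deliberately reduced illustration (real pattern generators mine multi-item sets), and the records, threshold, and function name are assumptions.

```python
from collections import Counter


def discriminating_patterns(attack, normal, min_support=0.5):
    """Return (item, attack_support, normal_support) tuples, sorted by the
    gap between attack support and normal support, largest gap first."""
    def supports(records):
        counts = Counter(item for r in records for item in set(r.items()))
        return {item: n / len(records) for item, n in counts.items()}

    sup_attack, sup_normal = supports(attack), supports(normal)
    patterns = [
        (item, s, sup_normal.get(item, 0.0))
        for item, s in sup_attack.items()
        if s >= min_support
    ]
    return sorted(patterns, key=lambda p: p[1] - p[2], reverse=True)


# Made-up connections, split by the label the anomaly detector assigned.
attack = [
    {"proto": "TCP", "dst_port": 1863},
    {"proto": "TCP", "dst_port": 1863},
    {"proto": "TCP", "dst_port": 1863},
    {"proto": "TCP", "dst_port": 80},
]
normal = [
    {"proto": "TCP", "dst_port": 80},
    {"proto": "TCP", "dst_port": 80},
    {"proto": "UDP", "dst_port": 53},
    {"proto": "TCP", "dst_port": 443},
]

patterns = discriminating_patterns(attack, normal)
print(patterns[0])  # (('dst_port', 1863), 0.75, 0.0)
```

Here proto=TCP is frequent in the attack set but almost as frequent in normal traffic, so it ranks below DstPort=1863, which is the behavior the R1 example suggests.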

Typical MINDS Output : 

UMN computer connecting to a remote FTP server, running on port 5002
Summarized TCP reset packets received from 64.156.X.74, which is a victim of a DoS attack; we were observing backscatter, i.e. replies to spoofed packets
Summarization of an FTP scan from a computer in Columbia, 200.75.X.2
Summary of IDENT lookups, where a remote computer tries to get the user name
Summarization of a USENET server transferring a large amount of data

Typical MINDS Output : 

UMN computers doing bulk transfers
160.94.122.142 is running a rogue FTP server on 60000/TCP
UMN computers doing large transfers via BitTorrent to many outside hosts
This computer is scanning for computers on port 139/TCP. The majority of the packets are 192 bytes or 144 bytes, except for the second summary (score 88.2)
UMN computer running a RealMedia server that was not known to the analyst
Odd-looking P2P traffic to/from a UMN computer (potentially KaZaA or Gnutella)
The remote computer was scanning for 57/TCP; RESET packets are sent back from computers that do not have 57/TCP open

Detecting Modes of Network Traffic Using Clustering : 

Used Shared Nearest Neighbor (SNN) clustering
Not distracted by “noise” in the data
CPU intensive: O(N²)
Requires storing an N x K matrix
K (number of neighbors) is typically between 10 and 20
K should be about the size of the smallest expected mode
Clustered 850,000 connections collected over one hour at one US Army fort
Took 10 hours on a 16-CPU cluster
Found 3135 clusters
Largest clusters had around 500 records; the smallest cluster had 10 records
Large clusters correspond to normal behavior
Many small clusters correspond to policy violations or other undesired behavior
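The two cost figures on this slide, O(N²) time and an N x K neighbor matrix, both come from the first step of SNN clustering: building each point's K-nearest-neighbor list and then measuring similarity by shared neighbors. The sketch below shows only that step on made-up 2-D points; it is a brute-force illustration, not the parallel implementation used in the talk, and the function names are hypothetical.

```python
import math


def k_nearest(points, k):
    """Brute-force neighbor lists: O(N^2) distance computations, N x K storage."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared neighbors, counted only when i and j appear in each
    other's neighbor lists; otherwise the similarity is zero."""
    if i in neighbors[j] and j in neighbors[i]:
        return len(neighbors[i] & neighbors[j])
    return 0


# Two made-up, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
nn = k_nearest(points, k=3)

print(snn_similarity(nn, 0, 1))  # 2: same group, two shared neighbors
print(snn_similarity(nn, 0, 4))  # 0: different groups
```

For the 850,000 connections on the slide with K = 20, the neighbor lists hold 17 million entries, while the number of point pairs is about 3.6 × 10^11, which is why the distance step dominates the running time.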

Detecting Modes of Network Traffic Using Clustering : 

Large clusters of VPN traffic (hundreds of connections)
Used between forts for secure sharing of data and working remotely

Detecting Modes of Network Traffic Using Clustering : 

Clusters involving GoToMyPC.com (Army data)
Policy violation: allows remote control of a desktop

Detecting Modes of Network Traffic Using Clustering : 

Clusters involving mysterious ping and SNMP traffic
Misconfigured computer subjected to SNMP surveillance

Detecting Modes of Network Traffic Using Clustering : 

Clusters involving unusual repeated ftp sessions
Further investigation revealed that a misconfigured Army computer was trying to contact Microsoft

Need for HPC : 

Very large data size
Typical network traffic at the university level reaches around 500 million connections per day
Compute-intensive nature of the pattern finding algorithms
Associative analysis
Clustering
Sequential pattern analysis
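A quick back-of-the-envelope calculation shows what the 500 million figure means for on-line analysis. The only input from the slide is the daily volume; the variable names are illustrative.

```python
connections_per_day = 500_000_000  # figure quoted on the slide
seconds_per_day = 24 * 60 * 60

# Average rate an on-line analysis would have to sustain.
rate = connections_per_day / seconds_per_day
print(round(rate))  # 5787 connections per second

# Pairwise comparisons needed by an O(N^2) method on one hour of traffic.
per_hour = connections_per_day // 24
pairs = per_hour * (per_hour - 1) // 2
print(f"{pairs:.2e}")
```

Even a single hour of such traffic yields on the order of 10^14 pairs, so quadratic algorithms need either heavy sampling or parallel hardware.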

Need for Distributed Intrusion Detection : 

Attacks on the network infrastructure may be launched from several different locations and may target multiple destinations
Stealthy coordinated attacks with low traffic volumes are difficult to detect by IDSs based at a single network site
Detection of such attacks at an early stage requires correlation of data at multiple network sites
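The point about low-volume coordinated attacks can be illustrated with a toy correlation step: each site reports counts of suspicious connections per source, and a source is flagged when it stays under every site's local alarm threshold yet shows up at several sites. The site names, thresholds, addresses, and the correlate function are hypothetical.

```python
from collections import defaultdict


def correlate(site_reports, local_threshold=10, min_sites=3):
    """Return (source, total_count) for sources that no single site would
    flag (count below local_threshold everywhere) but that appear at
    min_sites or more sites."""
    sites_seen = defaultdict(set)
    totals = defaultdict(int)
    for site, counts in site_reports.items():
        for src, n in counts.items():
            sites_seen[src].add(site)
            totals[src] += n
    return sorted(
        (src, totals[src])
        for src in totals
        if len(sites_seen[src]) >= min_sites
        and all(site_reports[s][src] < local_threshold for s in sites_seen[src])
    )


# Made-up per-site reports using documentation address ranges.
site_reports = {
    "site_a": {"198.51.100.7": 4, "203.0.113.9": 25},
    "site_b": {"198.51.100.7": 3},
    "site_c": {"198.51.100.7": 5, "192.0.2.44": 2},
}

print(correlate(site_reports))  # [('198.51.100.7', 12)]
```

The noisy source 203.0.113.9 would already trip a local alarm at site_a; the stealthy one only becomes visible once the three reports are combined.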

Slide 26: 

Map of the Global IP Space

Slide 27: 

Suspicious Traffic on Port 80
999 unique sources, 1126 unique destinations, 1516 total flows involved
[Figure: Source IPs of suspicious connections in the global IP space; destination IPs of suspicious connections within the 3 class B networks at the U of M. Legend: + failed connections, O successful connections]

Slide 29: 

Suspicious Traffic on Port 445
7982 unique sources, 6184 unique destinations, 9930 total flows involved
[Figure: Source IPs of suspicious connections in the global IP space; destination IPs of suspicious connections within the 3 class B networks at the U of M. Legend: + failed connections, O successful connections]

Need for Grid-based IDS : 

How to detect a distributed network attack?
Centralizing data is not possible
Data needed for analysis is distributed
Cost of centralizing data is too high
Security and privacy issues
Computational resources needed for analysis are distributed

Data Mining Middleware for Grids : 

NSF/ITR funded project jointly with B. Grossman, S. Ranka, and J. Weissman
[Figure: Middleware diagram with the following components]
Application
Data Mining and Exploration Services
Execution, Representation and Management Systems (e.g. Chimera, Pegasus)
Scheduling and Replication Services
Data and Policy Management Services
Data & Model Transport Services
Grid Control Services
Grid and Web Services (e.g. Globus, XML-RPC, DWTP, Condor)
Data Services (e.g. JDBC, SQL, SRB)

Grid-Based Data Mining: Distributed Network Intrusion Detection : 

Detection of attacks by correlating suspicious events across sites
Locate computing resources needed for time-critical execution of the data mining query
Need to protect privacy, but allow necessary data access

Publications : 

Managing Cyber Threats: Issues, Approaches and Challenges, edited by V. Kumar, J. Srivastava, and A. Lazarevic, Kluwer Academic Publishers (forthcoming).
MINDS - Minnesota Intrusion Detection System, L. Ertöz, E. Eilertson, A. Lazarevic, P. Tan, J. Srivastava, V. Kumar, P. Dokas, in Data Mining: Next Generation Challenges and Future Directions, editors: H. Kargupta, A. Joshi, K. Sivakumar, Y. Yesha, MIT/AAAI Press, 2004. AHPCRC Technical Report # 2003-121.
Detection of Novel Network Attacks Using Data Mining, L. Ertöz, E. Eilertson, A. Lazarevic, P. Tan, P. Dokas, V. Kumar, J. Srivastava, Workshop on Data Mining for Computer Security, IEEE International Conference on Data Mining, Melbourne, FL, November 19, 2003. AHPCRC Technical Report # 2003-108.
Visit http://www.cs.umn.edu/~kumar for further information